Contents:
Introduction
Background and purposes
Problems with home-made applications and advantages of proprietary software
Methods
Results and Discussion
Procedure
Some Comments
Results
Conclusions
Bibliography
Bibliometric information systems are the workbench of Science and Technology (S&T) indicators research. As an important part of this field of endeavor, such a system requires a flexible design in order to obtain accurate and customized indicators as well as to incorporate new features resulting from the latest developments. This paper describes an open and flexible bibliometric information system. The system is built on a simple modular design, offers connectivity for desktop work and has a low cost. It is useful for practical work as well as for education and training. Limitations are mainly due to available memory; about one million documents is the maximum volume that can be treated.
Today, the need to find one's way through the huge volumes of information generated by so many different means and supports has become a challenge for information professionals. The birth and development of new disciplines such as "data mining" and "knowledge discovery" (Fayyad et al., 1996; Adriaans and Zantinge, 1996; Cabena et al., 1997; Dhar and Stein, 1997; Swanson and Smalheiser, 1997), specifically oriented towards the search for and interpretation of new knowledge through the study of information production and consumption processes (IPCP), is a clear sign of the increasing importance of everything related to the quantitative and qualitative analysis of huge data corpora.
Bearing the former in mind, bibliometric techniques, aimed at the study, classification and assessment of the production and consumption of scientific information by means of quantitative methods and statistical treatment of data, become one of the fundamental tools available to information professionals in their quest for indicators, allowing them a "critical appraisal" of scientific research as well as of the interaction among researchers, institutions and knowledge areas.
The former has driven an increase in efforts towards the systematization and standardization of the methods and tools used in bibliometrics. For Glanzel (1996) bibliometrics is a complex discipline: although it is classified among the social sciences, it is narrowly conditioned by the pure and technological sciences. For this reason any methodological characterization requires, on the one hand, well documented methods and data processing, a clear description of the sources and an exact definition of indicators and, on the other hand, an effective selection and integration of the applied technologies. Ravichandra Rao (1996) asserts that there is no unique method that can be applied to every research problem by means of bibliometric techniques, but rather different procedures for different problems. Grivel, Polanco and Kaplan (1997) emphasize what they call the "informatic infrastructure" upon which bibliometrics could develop its full potential. For these authors bibliometrics should be characterized not only by an adequate mathematical representation but also by an effective "informatic architecture". In the same direction are the works of Katz and Hicks (1997), Small (1998), Sotolongo-Aguilar, Guzmán-Sánchez and García-Díaz (1998) and others. They describe projects that point to the integration of different informatic tools by means of proprietary or public-domain software, building up platforms that fit the needs of different approaches to bibliometric research. Probably one of the most exciting product-projects is DATAVIEW from CRRM, which could be classified as "very highly integrated" software.
In this paper we report one of the components of an ongoing research project devoted to the definition and assessment of the different stages of a procedure for studying the production and consumption of information by means of bibliometric techniques. The work is based on the integration of different widely available, easy-to-use software packages with the aims described below.
In bibliometrics research everybody has experienced the need to build in-house applications. This is a fact. The problem arises when the work must be generalized. In-house applications are rarely well documented and their use by others becomes difficult. The result is that only the members of the original team are able to replicate the use of such applications, and standardization stalls.
On the other hand, proprietary software is well documented, and the validation of its techniques is straightforward. Moreover, many teams of developers are continuously improving the performance of such software.
The system integrates six modules based on proprietary software, each performing well-defined functions, as follows:
Bibliographic searches (module 1, Bibliographic Searches) are conducted online or on CD-ROM. The resulting files are downloaded and converted by module 2, File Conversion & Handling. The converted files are the input to module 3, Bibliographic Reference Management, where the standardization of the database is performed. The different fields under study, or combinations of them, are exported and saved as text files. Those files are then processed in module 4, Basic Statistics, where all basic statistics can be computed with built-in functions based on frequency analysis. The input for module 5, Basic Bibliometric Analysis, is prepared in module 3, where the preparation of the input for module 6, Advanced Statistics, also takes place. Different scenarios can be implemented by varying the elements inside each module.
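To make the handoff between modules 3 and 4 concrete, the following minimal Python sketch shows how a field exported from the reference manager as a plain-text file could be turned into a frequency table. The file name "authors.txt" and the one-value-per-line export format are assumptions made for illustration only; in the system described here this step is performed with the spreadsheet's own frequency-analysis features.

    # Illustrative sketch: count how often each value of an exported field occurs.
    # The file name and its one-value-per-line format are assumptions,
    # not part of the described system.
    from collections import Counter

    def field_frequencies(path):
        """Read one field value per line and return values ranked by frequency."""
        with open(path, encoding="utf-8") as f:
            values = [line.strip() for line in f if line.strip()]
        return Counter(values).most_common()

    if __name__ == "__main__":
        for value, count in field_frequencies("authors.txt"):
            print(f"{count:5d}  {value}")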
SCENARIO
A possible software scenario associated with each module could be the following: (1) Bibliographic Searches: online or CD-ROM retrieval services; (2) File Conversion & Handling: BiblioLink II; (3) Bibliographic Reference Management: Pro-Cite; (4) Basic Statistics: EXCEL, with Word as an auxiliary tool; (5) Basic Bibliometric Analysis: The Bibliometrics Toolbox; (6) Advanced Statistics: STATISTICA.
The above-mentioned scenario operates according to the following procedure. Bibliographic searches are conducted online or on CD-ROM. The resulting files are downloaded and treated by BiblioLink II, which converts them according to a selected configuration that depends on the host and on the fields to be studied. The converted file is already in Pro-Cite format, so it is possible to switch directly to the bibliographic reference management features of Pro-Cite. Here the standardization of the database is conducted. Many different treatments can take place, including the building of Authority Lists from the contents of different fields, and even an Authority List of all the words in any field or in the whole database. The different fields under study, or combinations of them, are exported and saved as text files. Afterwards, EXCEL imports those files. All basic statistics can be performed based on frequency analysis, aided by the Pivot Table feature of EXCEL and complemented by the built-in Analysis Functions available in the Tools menu. Frequency tables obtained from the Pivot Table are copied and pasted into Word, where each table is converted to text, saving the data with paragraph marks as separators. The resulting text file is the input for basic bibliometric processing by The Bibliometrics Toolbox, and all the analyses it performs can be recorded in a text file. EXCEL is also used to build the matrices that serve as input for Cluster Analysis, Factor Analysis and Multidimensional Scaling. Those matrices are exported as EXCEL sheets, imported by STATISTICA and finally processed.
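As a hedged illustration of how such an input matrix could be assembled outside EXCEL, the sketch below builds a keyword co-occurrence matrix from an exported text file. The input format (one record per line, keywords separated by semicolons) and the file name are assumptions for illustration only; the resulting matrix could be handed to any cluster analysis or multidimensional scaling routine in place of STATISTICA.

    # Illustrative sketch: build a symmetric keyword co-occurrence matrix
    # from "keywords.txt" (one record per line, keywords separated by ";").
    # File name and format are assumptions, not part of the described system.
    from collections import Counter
    from itertools import combinations

    def cooccurrence_matrix(path, top_n=20):
        """Return the top_n most frequent keywords and their co-occurrence matrix."""
        records = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                terms = {t.strip() for t in line.split(";") if t.strip()}
                if terms:
                    records.append(terms)

        # keep only the most frequent keywords so the matrix stays manageable
        counts = Counter(t for terms in records for t in terms)
        keywords = [t for t, _ in counts.most_common(top_n)]
        index = {t: i for i, t in enumerate(keywords)}

        matrix = [[0] * len(keywords) for _ in keywords]
        for terms in records:
            for a, b in combinations(sorted(terms & set(keywords)), 2):
                i, j = index[a], index[b]
                matrix[i][j] += 1
                matrix[j][i] += 1
        return keywords, matrix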
Regarding bibliographic searches in biomedicine, we have been using the Query e-mail retrieval system from NLM, a very convenient retrieval engine operated by e-mail that works very well. As for bibliographic reference management software, we have used Pro-Cite extensively, from version 2.02 (MS-DOS) up to the latest available version, 4.01 (for Windows). The advantage of the latter is that it integrates its companion file-conversion software, BiblioLink II, and it also works very smoothly. Other reference management packages have been tested, e.g. EndNote and Reference Manager, including their latest versions. All of them, with their advantages and disadvantages, fit this model; nearly 40 reference management packages on the market could be eligible for these tasks. Statistical packages are another important component. Undoubtedly EXCEL is widely used and performs very well for many bibliometric tasks. A very good complement, as already mentioned, is xlStat, with many useful features for cluster analysis and multidimensional scaling.
Although not yet tested by this team, the incorporation of DATAVIEW into the platform presented here seems to fit very well. Its "very high integration", combined, for example, with a reference management package, could return very good results.
This system platform guarantees comprehensive traceability of all data, from the first data downloaded to the last chart obtained. At the same time, consistent results are attained through the reproducibility of every step described above. The bibliographic data in the database can also be used to build bibliographies.
The bibliometric output of the system includes, among others, the following indicators:
The benefits of following the development outlined in this paper could be twofold. On the one hand, by integrating public domain software in a flexible modular design, comprehensive automated processing and data representation stages of research can be achieved, in contrast to the cumbersome tasks that would otherwise have to be performed by other means. On the other hand, this platform rests on widely used software that is regularly updated and upgraded, in contrast to ad hoc software, which becomes outdated very rapidly.
The described bibliometric information system has proven to be a working platform that is upgradeable, flexible and useful in practice. Improvements are foreseen. Participation in the testing of this platform is welcome, as are new ideas for incorporating new modules or improving the existing ones.
Adriaans, P., Zantinge, D. Data Mining. Addison-Wesley, 1996.
Cabena, P. et al. Discovering Data Mining: From Concept to Implementation. Prentice Hall, 1997.
Dhar, V., Stein, R. Seven Methods for Transforming Corporate Data into Business Intelligence. Prentice Hall, 1997.
Fayyad, U.M. et al. Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.
Glanzel, W. The need for standards in bibliometric research and technology. Scientometrics 35(2): 167-176, 1996.
Grivel, L., Polanco, X., Kaplan, A. A computer system for big scientometrics at the age of the world wide web. Scientometrics 40(3): 493-506, 1997.
Katz, J.S., Hicks, D. Desktop Scientometrics. Scientometrics 38 (1): 141-153, 1997.
Ravichandra Rao, I.K. Methodological and conceptual questions of bibliometric standards. Scientometrics 35(2): 265-270, 1996.
Small, H. A general framework for creating large-scale maps of science in two or three dimensions: the SCIVIZ system. Scientometrics 41(1-2): 125-133, 1998.
Sotolongo-Aguilar, G., Guzmán-Sánchez, M.V., García-Díaz, I. Bibliometric Information System for Desktop Research. 5th International Conference on Science and Technology Indicators: Use of S&T Indicators for Science Policy and Decision-Making, 4-6 June 1998, Hinxton, Cambridge, England.
Swanson, D.R., Smalheiser, N.R. An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence 91: 183-203, 1997.
* The Finlay Institute; Address: Calle 212 #3112, e/31 y 37, Lisa, Habana, CUBA; Mailing Address: P.O. Box 16017, Cod. 11600 Habana, CUBA; Phone: 53-7-336212, 53-7-212280 (work); 53-7-215639 (home); Fax: 53-7-336075, 53-7-336754; E-mail: finlayci@infomed.sld.cu
** Universidad de La Habana, Facultad de Comunicación; E-mail: csbgv@bib.uc3m.es