Exemplary Practices in Electronic Records and Information Access Programs



Patterns of Exemplary Practice in Electronic Access to Information

Metadata

The quality and completeness of metadata are key factors in access practices of all kinds. The search and interactivity capabilities described above depend in large part on the metadata resources available to the searchers, applications, and engines that do the work. The same applies to methods for integrating information from diverse sources. Managing and sharing information resources depends on the ability to describe and interpret the contents of data repositories, and so is also a direct function of metadata resources. Despite the centrality of metadata to these access programs, the research found two distinct types of metadata practice. The first had to do with improving the quality and usefulness of metadata for structured data sets, primarily statistical in nature. The second consisted of ways to create metadata for data resources that lacked it altogether or had substantial gaps in the available metadata. The strategies for the two differ markedly, so they are discussed separately.

The repositories that were concerned primarily with structured statistical data sets devoted more attention to the quality and completeness of metadata resources. Part of the proactive acquisition discussed for the central archive above involves working with principal investigators who are developing new data resources. By working with these investigators prior to data collection, the staff of the central archive could ensure the quality and completeness of the metadata provided with those new data sets. A similar proactive approach was used by the UK data archives. These archives developed metadata standards for use by providers of data for their repository. They also worked closely with ICPSR in developing the standards and applying them to development of the NASA program. Part of the effort to provide adequate metadata to users of statistical databases was directed to the problem of multiple languages in use. The central archive and the UK data archives both deal extensively with researchers from many countries. This raises the problem of translating metadata to make it accessible internationally. The UK data archives are working with the European Community to develop a multilingual thesaurus for metadata and to develop automatic indexing capabilities. They are also working to develop what they referred to as "contextual metadata." This type of metadata would provide information to the user about the circumstances surrounding the data collection.

Standardizing metadata and ensuring its adequacy is a particular problem for repositories, especially those that accept datasets from a wide variety of sources. The ICPSR reported investing substantial staff resources in reviewing the metadata received with datasets. The staff require additional documentation from suppliers when necessary. Standardized metadata is also important for repositories that provide search capability based on metadata files. This is true of the NASA Global Climate Change Archive and of the Federal justice statistics maintained by the Urban Institute. For the Global Climate Change Archive, NASA relies on the many suppliers of datasets to maintain the accuracy and currency of metadata on the NASA system.

Complete, high-quality metadata is much less likely to be available for data sets that come from administrative processes, collections of text, and other archival material. Metadata for these kinds of resources is typically created through indexing or tagging. For small volumes of material, indexing and tagging can be done manually, but that is infeasible for large volumes of information. Automatic indexing is a form of computer-based text analysis that assigns an index term, or tag, to a section of text or other material. Systems that do this kind of indexing automatically can be very valuable, but they are also very difficult to develop and maintain. For a general-purpose library, such as the Washington State Library, the variety of material submitted is very large, making the indexing problem even more difficult. The Washington State Library reported some degree of success in indexing up to 40,000 current documents using its automated system. Library staff also described efforts to work with information providers so that they contribute to the indexing process, and are attempting to provide support and standards that help the originators of information supply adequate indexing and other metadata to the repositories.
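
To make the idea of automatic indexing more concrete, the following is a minimal sketch in Python of one very simple approach: matching the words in a text against a controlled vocabulary and assigning the index terms that match often enough. It does not describe the system used by the Washington State Library or any other program in this study. The vocabulary, the function name assign_index_terms, and the min_hits threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: index term -> words that suggest it.
# A real repository would maintain a far larger, curated thesaurus.
VOCABULARY = {
    "budget": {"budget", "appropriation", "expenditure"},
    "transportation": {"highway", "transit", "ferry"},
    "education": {"school", "student", "teacher"},
}


def assign_index_terms(text, vocabulary=VOCABULARY, min_hits=2):
    """Return index terms whose trigger words occur at least min_hits times."""
    # Count lowercase alphabetic words in the text.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each index term by the total occurrences of its trigger words.
    scores = {
        term: sum(words[w] for w in triggers)
        for term, triggers in vocabulary.items()
    }
    # Keep terms that clear the threshold, strongest match first.
    hits = [term for term, score in scores.items() if score >= min_hits]
    return sorted(hits, key=lambda term: (-scores[term], term))


sample = (
    "The school budget covers teacher pay; "
    "the budget also funds student meals and one ferry."
)
index_terms = assign_index_terms(sample)
```

Even this toy version hints at why such systems are hard to develop and maintain: it matches exact words only, so plurals, synonyms, and new subject areas all require ongoing work on the vocabulary, and the difficulty grows with the variety of material submitted.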