The Research Issues: Using Integrated Multiple Data Sources
What is Data Integration?
Data integration is the standardization of data definitions and data structures through a common conceptual schema across a collection of data sources (Heimbigner and McLeod, 1985; Litwin et al., 1990). Integrated data are consistent and logically compatible across different systems or databases, and can be used across time and users (Martin, 1986).
Goodhue et al. (1992, p. 294) defined data integration as "the use of common field definitions and codes across different parts of an organization". According to Goodhue et al. (1992), data integration increases along one or both of two dimensions: (1) the number of fields with common definitions and codes, or (2) the number of systems or databases adhering to these standards. Data integration is an example of a highly formalized language for describing the events occurring in an organization's domain. The scope of data integration is the extent to which that formal language is used across multiple organizations or across sub-units of the same organization. The objective of data integration is to bring together data from multiple data sources that hold relevant information contributing to the achievement of the users' goals (AFT, 1997).
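The two dimensions described by Goodhue et al. (1992) can be made concrete with a small sketch. The schema, the system names, the field names, and the counting rules below are all hypothetical and chosen only for illustration: a field counts toward the first dimension when at least two systems use it with the agreed codes, and a system counts toward the second dimension when it conforms on every field of the common schema. The cited authors do not prescribe these thresholds.

```python
# Hypothetical common schema: field name -> agreed code set (None = free-form value).
COMMON_SCHEMA = {
    "customer_id": None,
    "country": {"CA", "US", "MX"},
    "status": {"A", "I"},
}

# Hypothetical systems, each described by the fields it stores and the codes it uses.
SYSTEMS = {
    "billing": {
        "customer_id": None,
        "country": {"CA", "US", "MX"},
        "status": {"A", "I"},
    },
    "shipping": {
        "customer_id": None,
        "country": {"CA", "US", "MX"},
        "status": {"active", "inactive"},  # same field name, different codes
    },
    "marketing": {
        "cust_no": None,                   # different field name
        "country": {"Canada", "USA"},      # different codes
    },
}


def conforming_fields(system_fields, schema=COMMON_SCHEMA):
    """Return the schema fields a system defines with exactly the agreed codes."""
    return {
        name
        for name, codes in schema.items()
        if name in system_fields and system_fields[name] == codes
    }


def integration_dimensions(systems, schema=COMMON_SCHEMA):
    """Return (number of shared fields, number of fully adhering systems).

    Assumed rules: a field is "shared" if two or more systems conform on it;
    a system "adheres" if it conforms on every field in the schema.
    """
    per_system = {
        name: conforming_fields(fields, schema) for name, fields in systems.items()
    }
    shared = {
        field
        for field in schema
        if sum(field in conf for conf in per_system.values()) >= 2
    }
    adhering = [name for name, conf in per_system.items() if conf == set(schema)]
    return len(shared), len(adhering)


print(integration_dimensions(SYSTEMS))  # (2, 1)
```

In this toy setting, integration would increase along the first dimension if the shipping system adopted the agreed status codes, and along the second if the marketing system were brought onto the common schema.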
The Advanced Forest Technologies in Canada (AFT, 1997) identified the following factors which must be addressed to integrate data properly:
- identification of an optimal subset of the available data sources for integration
- estimation of the levels of noise and distortions due to sensory, processing, and environmental conditions when the data are collected
- the spatial resolution, the spectral resolution, and the accuracy of the data
- the formats of the data, the archive systems, and the data storage and retrieval
- the computational efficiency of the integrated data sets to achieve the goals of the users