The Prototype would not have been usable without specific data sets to support each application. Because creating new data sets was unrealistic within the scope of the project, data was extracted from production systems, scrubbed, and integrated into the Prototype so that its functions and features could be exercised in ways similar to real-world use. The data sources used were not necessarily the most authoritative sources for a production system; they were chosen because they provided enough appropriate data to support the Prototype.
Because the scope of the Prototype could not support data for the entire state, each application was populated with data about county and municipal governments from the following 15 NYS counties: Monroe, Niagara, Ontario, Broome, Cayuga, Cortland, Jefferson, Clinton, Essex, Albany, Saratoga, Schenectady, Washington, Ulster, and Westchester. Data about three NYS agencies (NYS Department of Agriculture and Markets, NYS Office of the State Comptroller, and NYS Office of Real Property Services) was also included. All data in the Prototype was specific to one or more of the applications. Applications and their corresponding data sources are shown in Table 3.
| Application | Data Source | Notes |
|---|---|---|
| Overall Gateway | | |
| Contact Repository Application | | |
| Dog Licensing Application | | |
| Parcel Transfer Verification Check Application | | |
To be usable by the Prototype, all the data sets needed to go through at least one of four transitions:
- migration – a one-time move from one system to another,
- integration – combining multiple data sources into a single set,
- cleaning – scrubbing for inconsistencies, or
- re-creation – creating a new data set with new business rules.

- How was the data collected?
- How was it managed?
- What does each of the data fields mean and how do they relate to one another?
- How will the data be used in the prototype?
- How can the existing data fields be mapped into the new structure?
As seen in the steps above, the Prototype Team and the Corporate Partners addressed all the traditional data issues such as:
- "dirty data" (e.g., inaccurate, duplicated, conflicting, or improperly defined),
- moving data from several sources into a centralized, relational structure,
- accounting for historical features and tracking over time, and
- incorporating new data fields that are not in the current sources but extend the usefulness of the data (e.g., email addresses for dog licenses).
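To make these issues concrete, the following is a minimal Python sketch of how source records might be mapped into a new structure, cleaned, and integrated from several sources. It is an illustration only, not the Prototype's actual implementation: the source column names (`OWNER_NM`, `DOG_NM`, `CNTY`), the `DogLicense` structure, the sample rows, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from source column names to target field names.
FIELD_MAP = {"OWNER_NM": "owner_name", "DOG_NM": "dog_name", "CNTY": "county"}


@dataclass(frozen=True)
class DogLicense:
    """Hypothetical target structure for an integrated dog license record."""
    owner_name: str
    dog_name: str
    county: str
    email: Optional[str] = None  # new field that no source provides yet


def normalize(value: str) -> str:
    """Collapse extra whitespace and title-case the value (a simplification)."""
    return " ".join(value.split()).title()


def map_record(source_row: dict) -> DogLicense:
    """Rename source fields to the target structure and clean each value."""
    cleaned = {target: normalize(source_row[src]) for src, target in FIELD_MAP.items()}
    return DogLicense(**cleaned)


def integrate(*sources: list) -> list:
    """Merge several sources into one list, dropping duplicates after cleaning."""
    seen = set()
    merged = []
    for rows in sources:
        for row in rows:
            record = map_record(row)
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


# Made-up sample rows from two hypothetical sources.
county_a = [{"OWNER_NM": "pat  smith", "DOG_NM": "rex", "CNTY": "albany"}]
county_b = [
    {"OWNER_NM": "Pat Smith", "DOG_NM": "Rex", "CNTY": "Albany"},
    {"OWNER_NM": "Lee Jones", "DOG_NM": "Fido", "CNTY": "Ulster"},
]
records = integrate(county_a, county_b)
```

In this sketch the two spellings of the same license collapse into one record once whitespace and capitalization are normalized, and the `email` field stays empty until a source can supply it. Real data would likely need more careful matching rules than exact equality after cleaning.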
© 2003 Center for Technology in Government
