A Reference Model for Data Mapping tools

Draft Update August 2013

1. Introduction

This is a draft of a reference model for mapping tools. The intention is to address a comprehensive and sustainable functionality in a scenario of information providers and aggregators, including the long-term maintenance of resources. We assume a distribution of responsibilities in which the information provider curates his/her resources and provides updates at regular intervals, whereas the aggregator is responsible for homogeneous access to the integrated data and for the resolution of co-references (multiple URIs for the same thing) across all provided data. In the course of transforming resources to the target system, some kinds of quality control can be performed that the provider has no means to do (see also the services provided by OCLC). The information provider therefore receives and benefits from data cleaning information produced by the aggregator. The challenge is to define a modular architecture of as many components as possible that can be developed and optimized independently, with minimal interfaces between them, without hindering integrated UI development for the different user roles involved.

The first part of this model is a sort of requirements specification, which breaks down in the usual way into a definition of the user roles involved, the primary kinds of data the users aim at handling, and the complete definition of the processes users carry out to manage these data.
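The co-reference resolution mentioned above (multiple URIs for the same thing) can be illustrated with a minimal sketch. It assumes the aggregator already holds a list of URI pairs judged to denote the same thing; the function name, the example URIs and the rule of keeping the lexicographically smallest URI as canonical are illustrative assumptions, not part of this model.

```python
def resolve_coreferences(same_as_pairs):
    """Group URIs that denote the same thing; map each URI to one canonical URI."""
    parent = {}

    def find(uri):
        # Follow parent links to the representative of the group.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the lexicographically smaller URI becomes canonical.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {uri: find(uri) for uri in list(parent)}


# Hypothetical co-reference statements from two providers.
pairs = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://b.example/x", "http://c.example/9"),
    ("http://d.example/7", "http://d.example/8"),
]
canonical = resolve_coreferences(pairs)
```

Deciding which pairs are co-references is the hard part and remains an expert or tool decision outside this sketch.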

2. Process Model

In the following, we specify user roles and data objects in order to have a vocabulary to define user processes. Of course, these things are interdependent, so a certain redundancy is inevitable.

We do not regard IT processes as self-contained and opposed to user processes; rather, we regard them as part of the user processes, replacing manual work. Once such a model has been defined, it allows for a dynamic definition of which user processes are replaced or assisted by IT processes, and for justifying IT processes in terms of their utility for the functions users are ultimately interested in.

2.1 User Roles

Primary user roles are:

Provider: In this model, we call the maintainers of local information systems providers; we may also refer to these systems simply as source systems. Following CIDOC CRM v5.0, “These are either collection management systems or content management systems that constitute institutional memories and are maintained by an institution. They are used for primary data entry, i.e. a relevant part of the information, be it data or metadata, is primary information in digital form that fulfils institutional needs”. In practice, and more generally, these are individual museums, archives, libraries, sites and monument records, academic institutes, private research societies etc., represented by their curators, IT staff or researchers. Providers ultimately have the knowledge about the meaning of their data in the real world (if anybody has it), or know who knows, or know how to verify it.

Aggregator: In this model, we call the maintainers of integrated access systems aggregators; we may also refer to these systems simply as target systems. Following CIDOC CRM v5.0, “These provide an homogeneous access layer to multiple local systems. The information they manage resides primarily on local systems.” Aggregators will maintain some form of business agreement with providers to send data, primarily metadata, from local systems to the aggregators’ systems. In this model, we are not interested in aggregators that harvest without any business interaction with the provider; the model is useful for such scenarios in a trivial way that we do not explicitly describe. Aggregators have no direct knowledge about the meaning of the data they aggregate.

Secondary user roles are experts whose knowledge or services contribute to the mapping:


Source Schema Expert: The curator, researcher or data manager of the local system who is responsible for the semantically correct data entry into the local system, i.e., the one who knows how fields, tables or elements in the schema correspond to the reality described by them following local use and practice.

Target Schema Experts: The expert(s) for the semantics of the schema employed by the aggregator (“integration model”). It is very likely that the aggregator uses a more widely known standard schema. Typically but not exclusively, we talk in this model about the CIDOC CRM and extensions of it. Therefore the target schema experts may not need to have intimate knowledge of the aggregator’s context. Moreover, there does not yet exist an established practice of a curator role at the aggregator side. Nevertheless, the requirement for semantic consistency in practice forces such a role to exist de facto, often a user team including some provider representatives. Therefore the target schema experts should include or be in close contact with a sort of curators of the integrated access system to fulfill this role.

URI Expert: The expert of the aggregator, normally an IT specialist, who is responsible for maintaining the referential integrity of the (meta)data in the integrated access system and who knows how to generate from provider data valid URIs for the integrated access system.

Source Terminology Expert: The curator, maintainer or other expert of one of the terminologies which the provider uses as a reference in the local system. If the terminology is provided by a third party, such as the Getty Research Institute, there may exist independent experts trained in this terminology. If it is local or even uncontrolled, it is typically the curators or other local data managers who know the meaning of the local terms.

Target Terminology Expert: The curator, maintainer or other expert of one of the terminologies which the aggregator uses as a reference in the integrated access system. Aggregators normally want to avoid engaging in the terminology maintenance business. They will rather use and refer to third-party terminology, or take over provider term lists.

Mapping Manager: The actor responsible for the maintenance of the data transformation process from the provider format to the aggregator format. This role may split into a semantic and a technical part, and may be regarded as either an aggregator task, a provider task or a user consortium’s task. In our opinion, the mapping technology this model aims at should enable scalable management of the data transformation process by the aggregator.
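The URI Expert's task described above, generating valid URIs for the integrated access system from provider data, can be sketched as follows. This is only an illustration under stated assumptions: the base namespace, the function name `make_uri` and the path layout (provider, entity type, local identifier) are hypothetical and not prescribed by this model.

```python
import re
import unicodedata
from urllib.parse import quote

# Hypothetical namespace of the integrated access system.
BASE = "https://aggregator.example.org/resource"


def make_uri(provider_id, entity_type, local_id):
    """Build a deterministic URI from provider, entity type and local identifier."""

    def slug(value):
        # Normalize Unicode, trim, lower-case and collapse whitespace to hyphens.
        text = unicodedata.normalize("NFKC", str(value)).strip().lower()
        text = re.sub(r"\s+", "-", text)
        # Percent-encode everything outside the unreserved URI characters.
        return quote(text, safe="-._~")

    parts = [slug(provider_id), slug(entity_type), slug(local_id)]
    if not all(parts):
        raise ValueError("provider, entity type and local id must be non-empty")
    return "/".join([BASE] + parts)


uri = make_uri("Museum A", "object", " Inv 12/3 ")
```

Because the function is deterministic, resubmitting the same provider record yields the same URI, which is one simple way to support referential integrity across updates.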

2.2 Data objects

We distinguish the following primary data objects:

Source systems: i.e., local information systems in the sense of the CIDOC CRM v5.0 (“These are either collection management systems or content management systems that constitute institutional memories and are maintained by an institution”) from which (meta)data are sent on a regular basis or in a single action to some aggregator. We are interested here in their typical role relative to the processes we describe, regardless of whether they may also play some target system role to a third party.

Target systems: i.e., integrated access systems in the sense of the CIDOC CRM v5.0 (“These provide an homogeneous access layer to multiple local systems. The information they manage resides primarily on local systems”), to which (meta)data are sent on a regular basis or in a single action by several providers. We are interested here in their typical role relative to the processes we describe, regardless of whether they may also play some source system role to a third party.

Terminologies: Controlled vocabularies of terms that appear as individual data values in the source or target systems. We never use the term “vocabulary” for metadata schemata. Terminologies may be flat lists of words or be described and organized in more elaborate structures as so-called “thesauri” or “knowledge organization systems”, the most popular format now being SKOS. Here, we do not use the term “ontology”, even if the terminology may qualify as such, as long as its use in this context is to provide data values.

Content objects: Individual files or information units with an internal structure that is not described in terms of schema elements of the source or target systems. These are typically images or text documents, and are searched by content retrieval indices such as keyword search, rather than by associative queries. They are described as objects by metadata records, which are searched by associative queries. What qualifies an information unit as a content object (sometimes also called a “blob”) in this context is not its actual structure, but the way it is treated in the information system. Many aggregators do not collect content objects but only link them back to the provider system.

Metadata records: Information units with an internal structure that is described in terms of schema elements of the source or target systems. In our context, these are often data records describing content objects (hence the term “metadata”), but by a bad analogy the term has also come into use for data describing physical objects and other historical contexts. Therefore we define it here by the way it is treated in the information system, and not as “data about data”. The metadata records are the common target of submission to aggregators and therefore of transformation from the source to the target schema.

Secondary data objects are those that support the mapping processes:

Source schema definitions: Data dictionaries, XML schemata, RDFS/OWL files etc. describing the data structures that are managed and can be searched by associative queries in the source systems.

Integration Model: The definition of the schema of the target system, now mostly an RDFS/OWL knowledge representation model (“ontology”).

Other kinds of data objects which are part of this reference model in the sense of products or interface definitions of the components it foresees, such as schema matching definitions etc. (see below).

2.3 Mapping processes

2.3.1 Overview

This reference model aims at identifying, supporting or managing the processes that need to be executed or maintained when a provider and an aggregator agree (1) to transfer data from the provider to the aggregator, (2) to transform their format to the (homogeneous) format of the aggregator, (3) to curate the semantic consistency of source and target data and the global referential integrity and (4) to keep the transferred data up to date with whatever relevant changes occur in the source and target systems and the employed terminologies.

Note that experienced aggregators keep the original data from the provider, so that they can reexecute the data transformation process without asking for resubmission.
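The practice noted above, keeping the provider's original data so that the transformation can be re-executed without resubmission, can be sketched as follows. The class name, the field names and the representation of a mapping as a plain field-renaming dictionary are simplifying assumptions made for illustration only.

```python
class SubmissionArchive:
    """Keeps original provider records so transformations can be re-run."""

    def __init__(self):
        # (provider, record_id) -> original record, stored unchanged
        self._originals = {}

    def submit(self, provider, record_id, record):
        # Store a copy so later changes by the caller do not alter the archive.
        self._originals[(provider, record_id)] = dict(record)

    def transform_all(self, provider, mapping):
        """Re-run the transformation over all stored originals of one provider.

        `mapping` is a hypothetical source-field -> target-field dictionary;
        source fields without a mapping entry are skipped.
        """
        return {
            record_id: {
                mapping[field]: value
                for field, value in record.items()
                if field in mapping
            }
            for (prov, record_id), record in self._originals.items()
            if prov == provider
        }


archive = SubmissionArchive()
archive.submit("museum-a", "1", {"obj_title": "Vase", "obj_year": "1900"})

mapping_v1 = {"obj_title": "title"}
mapping_v2 = {"obj_title": "title", "obj_year": "date"}

first_run = archive.transform_all("museum-a", mapping_v1)
second_run = archive.transform_all("museum-a", mapping_v2)  # no resubmission needed
```

When the mapping is revised, as from `mapping_v1` to `mapping_v2` here, the aggregator re-runs the transformation on the stored originals rather than asking the provider to send the records again.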

Figure 1: Mapping Processes diagram

At a first level, this breaks down into the following independent processes:

(a) Management of which data will be delivered and processed at what time, including updates.

(b) The mapping definition, i.e., specification of the parameters for the data transformation process, such that complete sets of data records can automatically be transformed, manual exception processing notwithstanding. This includes harmonization between multiple providers.

(c) The actual transfer of data until a first consistent state is achieved. This includes transformation of sets of data records submitted to the aggregator, the necessary exception processing of irregular input data between provider and aggregator, ingestion of the transformed records into the target system and initial referential integrity processing, possibly on both sides.

(d) Referential integrity processing at the aggregator side outside the context of a particular data submission, which is not our concern in this model.

(e) Change detection and update processing to restore the ability to transform data and the semantic consistency. This comprises changes in the source or target records, in the source or target terminologies, in the source or target schemata, in the target URI policy and in the good practice of interpreting the source and target schema in the mapping definition.

Only if these processes are sustained can an aggregator provide valid and consistently integrated data in the long term, and thereby deliver the full added value of an aggregation service as a resource for professional and private research, which ultimately justifies its existence. We observe that none of the dozens or hundreds of mapping tools and frameworks created in numerous projects has ever systematically addressed this comprehensive scenario.
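Process (e) above, change detection, can be illustrated with a minimal sketch that compares content fingerprints of the artifacts a mapping depends on. Using SHA-256 over a canonical JSON serialization is an assumption chosen for illustration; the artifact names in the example are hypothetical and this model does not prescribe any particular detection technique.

```python
import hashlib
import json


def fingerprint(obj):
    """Return a stable hash of a JSON-serializable artifact."""
    canonical = json.dumps(obj, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two {artifact name: fingerprint} snapshots."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical snapshots taken at two submissions.
before = {
    "source_schema": fingerprint({"fields": ["title", "year"]}),
    "source_terminology": fingerprint(["bronze", "clay"]),
    "record:1": fingerprint({"title": "Vase", "year": "1900"}),
}
after = {
    "source_schema": fingerprint({"fields": ["title", "year", "material"]}),
    "source_terminology": fingerprint(["bronze", "clay"]),
    "record:2": fingerprint({"title": "Bowl", "year": "1850"}),
}
changes = detect_changes(before, after)
```

A report of this kind only says that something changed; deciding whether the mapping must be redefined, the transformation re-executed or records resubmitted remains a task for the roles defined in section 2.1.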

2.3.2 Analytical Representation

Data Delivery Management

Data Delivery Management deals with which data will be delivered and processed at what time, including updates. A Mapping Manager may be responsible for this task, possibly with custodial participation or supervision on the provider and aggregator sides.

Here we do not further analyze the subprocesses which do not affect the mapping itself. In general, IT support will be provided by log files of delivered content, facilities to query (recent) changes of source system records or other data requiring resubmission, and queries to support selection criteria in the source systems. Also, queries in the target system may be used to reveal semantic needs in the composition of the aggregation and to derive requests for certain materials from providers.

There is, however, a set of characteristic changes in the provider-aggregator environment that affect the mapping and may require (1) redefinition of the mapping, (2) re-executing the transformation of records already submitted to the aggregator and updating the transformed records in the target system, or (3) resubmission of records from the source system.

The Mapping Manager must monitor such changes and initiate the respective actions.

Mapping Definition

Mapping definition breaks down into (1) syntax normalization, (2) schema matching, (3) URI generation specification and (4) terminology mapping. The Mapping Manager may be responsible for issuing and coordinating these tasks.
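How the four parts of a mapping definition named above could fit together is sketched below. The class, its attribute names, the target property names and the example URIs are all hypothetical; in particular, representing schema matching and terminology mapping as simple lookup dictionaries is a strong simplification chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MappingDefinition:
    normalize: Callable[[dict], dict]  # syntax normalization
    schema_map: Dict[str, str]         # schema matching: source field -> target property
    uri_for: Callable[[dict], str]     # URI generation specification
    term_map: Dict[str, str]           # terminology mapping: source term -> target term

    def apply(self, record):
        """Transform one source record into a target record."""
        clean = self.normalize(record)
        target = {"uri": self.uri_for(clean)}
        for field, prop in self.schema_map.items():
            if field in clean:
                value = clean[field]
                # Simplification: any value found in term_map is replaced,
                # all other values are passed through unchanged.
                target[prop] = self.term_map.get(value, value)
        return target


mapping = MappingDefinition(
    normalize=lambda r: {k.strip().lower(): v.strip() for k, v in r.items()},
    schema_map={"title": "label", "material": "consists_of"},
    uri_for=lambda r: "https://aggregator.example.org/object/" + r["id"],
    term_map={"Bronze": "http://vocab.example.org/bronze"},
)

result = mapping.apply({" ID ": "42", "Title": " Vase ", "Material": "Bronze"})
```

Keeping the four parts as separate attributes mirrors the modularity the model asks for: each part can be maintained by a different expert role and replaced without touching the others.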
