PharmaSUG2011 - Paper CD21

Ensuring Consistent Data Mapping Across SDTM-based Studies – a Data Warehouse Approach

Annie Guo, ICON Clinical Research, North Wales, PA


ABSTRACT

SDTM is about standardization of clinical trial data. This paper presents a tool that helps ensure consistent data mapping across SDTM-based studies. The tool consists of a series of SAS® programs that take three sources as input: the annotated CRF, the SDTM data set specifications, and the SDTM SAS data sets. The SAS programs run on each study and summarize the information from the input files. The output is a set of standardized SAS data sets per study that serves as a data warehouse, storing the metadata and data contained in the SDTM data sets. This data warehouse approach allows direct access to and comparison among existing studies without going back to the original sources, and it provides a reference database that facilitates the programming of new studies.


SDTM is about standardizing and normalizing clinical trial data. Consistent mapping from CRF raw data to SDTM is therefore important, especially for projects with multiple studies. However, discrepancies may arise as the number of studies increases. New studies may be assigned to new programmers, who sometimes refer to only the one or two previous studies thought to be most similar to the new one and miss information in other existing studies. On the other hand, it is not realistic to expect anyone to look up the documentation and SDTM SAS data sets of all previous studies in order to cover everything, because that is time consuming. Alternatively, we may designate a lead programmer to oversee and ensure consistency, but it is more efficient if everyone stays on top of his or her own assignment and does not rely on another person to check the work. What we need is a tool that centralizes the metadata and data in our SDTM-based studies. Like a data warehouse, it stores our experience with SDTM and serves as a one-stop source for looking up anything we may need when developing new SDTM-based studies.


Requirements focus on two areas: data warehouse, and reporting of the data in the data warehouse.

Data Warehouse

On the data warehouse side, the three sources of SDTM metadata and data, i.e., the annotated CRF (aCRF) in PDF, the SDTM data set Specifications (Specs) in Excel, and the SDTM SAS data sets, must all be integrated into a set of SAS data sets for each study. The SAS data sets follow a uniform structure to store SDTM domain names, variable names, and variable values from the three sources. The uniform structure allows for listing and comparison across studies for harmonization.

Report 1: List of CRFs and associated SDTM domain(s), with hypertext links to aCRF and Specifications

This is a high-level overview of the association between CRF and SDTM. In many cases the name of the CRF determines the SDTM domain it is mapped to. For example, the Concomitant Medications CRF goes to the published Concomitant Medications (CM) domain. However, confusion may arise when it comes to custom domains. For example, does the Human Anti-Human Antibody Samples CRF go to a custom domain, and if so, have we already created one for that CRF? A list like Table 1 answers those questions.

The hypertext links to aCRF and Specs in Table 1 provide direct access to the SDTM documents. This saves time, because we do not need to navigate through the folder structure on the server to locate the file and then open the specific PDF page or Excel tab.

Table 1: Report requirement 1

Report 2: List of distribution of SDTM variables on CRF annotations, Specifications and SDTM data sets, including variable values if they are annotated on CRF

The structure of this list is one row per SDTM variable per variable value annotated on the aCRF, as in the sample in Table 2. The focus is on the CRF annotations, because in general not all SDTM variables are annotated on the aCRF. However, the variables or variable values that are annotated on the aCRF must appear in the Specifications for the study. In addition, the Specifications and the SDTM data sets must have exactly the same SDTM variables within a study. This list points out any deviation from those rules.

Table 2: Report requirement 2

This list summarizes CRF annotations by SDTM domain and variable, and gives us an idea of the data collected on the CRF without having to open and look at the aCRF files. For example, in Table 2, both studies collect adverse event causality, AEREL. There is no annotation for AEREL variable values, so most likely the values follow the pre-printed text on the CRF. For the CM domain, Study 2 has CMDOSE annotated on the CRF, so this is probably a numeric data field that collects medication dose. Study 1 has CMDOSTXT annotated instead, so we can guess that the data field on the CRF collects character data covering not only medication dose but also medication unit or other information.

This list also helps identify inconsistencies among studies. For example, for EGORRES in Table 2, the variable values for abnormal ECG test results differ according to the CRF annotations. Without actually opening the aCRF to verify, we may guess that Study 1 collects the ECG test result as either Abnormal or Normal, while Study 2 also asks whether an abnormal test result is clinically significant. Another possible cause of the difference is inconsistent mapping between the two studies. In other words, both studies collect Clinical Significance, but Study 1 maps that piece of information to SUPPEG and sets EGORRES to ABNORMAL regardless of Clinical Significance.
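The two within-study rules stated for Report 2 (annotated variables must appear in the Specifications, and the Specifications and SDTM data sets must hold the same variables) can be sketched as set comparisons. The paper's tool is written in SAS; the sketch below uses Python only to illustrate the logic, and the function name, input shape, and sample variable lists are assumptions rather than anything taken from the paper.

```python
def check_variable_rules(acrf_vars, spec_vars, dataset_vars):
    """Return deviations from the two Report 2 rules for one study.

    All three arguments are collections of SDTM variable names
    (hypothetical inputs for illustration).
    """
    acrf_vars, spec_vars, dataset_vars = (
        set(acrf_vars), set(spec_vars), set(dataset_vars))
    return {
        # Rule 1: anything annotated on the aCRF must appear in the Specs.
        "annotated_not_in_specs": sorted(acrf_vars - spec_vars),
        # Rule 2: Specs and SDTM data sets must hold the same variables.
        "in_specs_not_in_data": sorted(spec_vars - dataset_vars),
        "in_data_not_in_specs": sorted(dataset_vars - spec_vars),
    }


# Made-up variable lists for a single study.
deviations = check_variable_rules(
    acrf_vars={"AEREL", "CMDOSTXT"},
    spec_vars={"AEREL", "CMDOSE", "CMTRT"},
    dataset_vars={"AEREL", "CMDOSE", "CMTRT", "CMDOSU"},
)
print(deviations)
```

An empty list under every key would mean the study satisfies both rules.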

Report 3: List of all variable values in SDTM SAS data sets, cross referencing controlled terminology terms in Specifications

The structure of the list is one row per combination of SDTM variable name, variable value, and variable label from the SDTM SAS data sets and/or Specifications across studies. Variables subject to controlled terminology are cross-checked to show whether their labels and values are consistent with the Specifications.

Table 3: Report requirement 3

The purpose of this list is to show variable value mapping across studies. For example, in Table 3, the variable AEREL has the values MULTIPLE, NOT RELATED, POSSIBLY RELATED and RELATED across the three studies. Broken down by study, Study 1 has all the values except MULTIPLE according to the data set Specifications, and only RELATED was actually collected on the CRF and stored in the AE data set. For Study 2, the only value of AEREL is MULTIPLE. Although Study 1 and Study 2 look somewhat different, this could be because the two studies follow different versions of the SDTM IG. So overall there seems to be no discrepancy.

This list can also identify possible data issues in the SDTM SAS data sets. For example, in Table 3, Study 3 seems to have an error. The values of AEREL in the AE data set are NOT RELATED and RELATED, but the Specifications file has MULTIPLE as the only controlled terminology term for AEREL.
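The cross-check behind Report 3 can be illustrated with a small comparison of the values found in a data set against the controlled terminology terms in the Specifications. This is a Python sketch of the idea, not the paper's SAS implementation; the function name is hypothetical, and the sample values are the AEREL values described for Table 3.

```python
def compare_values_to_terminology(data_values, spec_terms):
    """Compare values stored in an SDTM data set with the controlled
    terminology terms listed in the Specifications for one variable."""
    data_values, spec_terms = set(data_values), set(spec_terms)
    return {
        # Values in the data that the Specs do not allow: a likely error.
        "in_data_not_in_specs": sorted(data_values - spec_terms),
        # Terms allowed by the Specs but never collected: not necessarily
        # an error, since not every term has to occur in the data.
        "in_specs_not_in_data": sorted(spec_terms - data_values),
    }


# AEREL as described for Study 1 and Study 3 in Table 3.
study1 = compare_values_to_terminology(
    data_values=["RELATED"],
    spec_terms=["NOT RELATED", "POSSIBLY RELATED", "RELATED"],
)
study3 = compare_values_to_terminology(
    data_values=["NOT RELATED", "RELATED"],
    spec_terms=["MULTIPLE"],
)
print(study1)
print(study3)
```

Only Study 3 produces entries under `in_data_not_in_specs`, which matches the error noted in the text.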

Report 4: List of paired variable values for --TESTCD and --TEST variables, and paired QNAM and QLABEL in SDTM SAS data sets, cross referencing value level metadata in Specifications

Test codes and test names in Findings domains, and QNAM and QLABEL in supplemental qualifiers, have a one-to-one relationship. They can be used consistently across studies unless there are study-specific or sponsor-specific requirements.

The structure of this list is one row per pair of variable values from the SDTM SAS data sets and/or the value level metadata in Specifications. Since it displays all the existing combinations, anyone can easily look up what we have used for --TESTCD / --TEST and QNAM / QLABEL, and judge whether to stick to the existing convention or create new values.

Table 4: Report requirement 4

This list is also a tool for identifying differences across studies or validating SDTM SAS data sets against Specifications within a study. For example, as the red text in Table 4 shows, Study 3 uses the value EGCS for the variable EGTESTCD, whereas the other two studies use EGCLSIG. Another problem in Study 3 is that the value INTP is stored in the SDTM SAS data set, but the Specifications file has INTRP as the controlled terminology term for the variable EGTESTCD.

For supplemental qualifiers, if a combination of QNAM and QLABEL is specified in Specifications but missing from the SDTM SAS data sets, the list displays ** Absence **, as for the paired values EGCLSP and ECG Clinical Significance Specify in Table 4. The reason is that not all QNAM and QLABEL values defined in Specifications are required to appear in the SDTM SAS data sets. If the CRF data field for that QNAM is completely blank in the raw data, the QNAM / QLABEL pair is not included in the SUPP-- data set.
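The pairing logic of Report 4 can be sketched in two small steps: listing each code/label pair with an absence marker when it is missing from the data, and flagging labels that carry more than one code across studies. The Python below is only an illustration of that logic, not the paper's SAS code; the function names and the row layout are assumptions, and the label attached to EGCS is made up for the example.

```python
from collections import defaultdict

ABSENT = "** Absence **"  # marker text taken from the report description


def pair_report(spec_pairs, data_pairs):
    """One row per (code, label) pair found in the Specs and/or the data."""
    rows = []
    for code, label in sorted(set(spec_pairs) | set(data_pairs)):
        rows.append({
            "code": code,
            "label": label,
            "in_specs": "Y" if (code, label) in spec_pairs else "N",
            "in_data": "Y" if (code, label) in data_pairs else ABSENT,
        })
    return rows


def codes_per_label(pairs_by_study):
    """Labels mapped to more than one code across studies, which breaks
    the expected one-to-one relationship."""
    by_label = defaultdict(set)
    for pairs in pairs_by_study.values():
        for code, label in pairs:
            by_label[label].add(code)
    return {lbl: sorted(c) for lbl, c in by_label.items() if len(c) > 1}


spec = {("EGCLSIG", "Clinically Significant"),
        ("EGCLSP", "ECG Clinical Significance Specify")}
data = {("EGCLSIG", "Clinically Significant")}
rows = pair_report(spec, data)

conflicts = codes_per_label({
    "Study 1": {("EGCLSIG", "Clinically Significant")},
    "Study 3": {("EGCS", "Clinically Significant")},  # hypothetical label
})
print(rows)
print(conflicts)
```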

Report 5: Extension of Report 4, listing paired variable values on --TESTCD / --TEST plus --CAT / --STRESU, and paired QNAM / QLABEL plus QORIG / QEVAL in SDTM SAS data sets, cross referencing value level metadata in Specifications

Table 5: Report requirement 5

This list drills down into Report 4 by adding the category variable --CAT and the standard unit variable --STRESU for SDTM Findings domains, and the qualifier variable origin QORIG and evaluator QEVAL for supplemental qualifier data sets. The structure of the list is one row per combination of the respective four variables.

This list displays detailed information about the paired variables. For example, in Table 5, the data related to LBTESTCD are fairly consistent except for Study 3. Study 3 has two standard units, mmol/L and mg/dL, for the lab test Blood Urea Nitrogen, but only mmol/L is present in the value level metadata for that test in Specifications. This suggests a possible programming issue in Study 3.

For supplemental qualifier data sets, QORIG and QEVAL can be standardized unless there are study-specific or sponsor-specific requirements. For example, if QEVAL = INVESTIGATOR has been used as the default mapping, we may stick to it rather than use a different value such as SPONSOR, as Study 3 does in Table 5.
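The standard-unit check described for Report 5 amounts to grouping units by test code and comparing the data against the value level metadata. The following Python sketch illustrates that comparison under stated assumptions: the function names are hypothetical, the input is a list of (test code, unit) tuples, and the test code BUN for Blood Urea Nitrogen is assumed here since the text does not spell it out.

```python
from collections import defaultdict


def units_by_test(records):
    """Collect the set of standard units seen for each test code."""
    units = defaultdict(set)
    for testcd, unit in records:
        units[testcd].add(unit)
    return units


def unexpected_units(data_records, spec_records):
    """Units present in the data but missing from the value level metadata."""
    data_units = units_by_test(data_records)
    spec_units = units_by_test(spec_records)
    result = {}
    for testcd, found in data_units.items():
        extra = found - spec_units.get(testcd, set())
        if extra:
            result[testcd] = sorted(extra)
    return result


# Study 3 as described for Table 5: two units in the data, one in the Specs.
issues = unexpected_units(
    data_records=[("BUN", "mmol/L"), ("BUN", "mg/dL"), ("BUN", "mmol/L")],
    spec_records=[("BUN", "mmol/L")],
)
print(issues)
```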


Base SAS is the programming environment for both the data warehouse and the reports. It is used to import the CRF annotations and the metadata in Specifications into SAS. For the SDTM SAS data sets of each study, as illustrated in Flowchart 1, PROC FREQ is used to summarize the values of individual variables and of paired variables. The data from the three sources are then merged by SDTM domain, variable and value, and saved in a set of SAS data sets.
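To make the summarize-then-merge step concrete, here is a minimal Python sketch of the same idea: count the distinct values of each variable, then combine the three sources on the key (domain, variable, value). It stands in for the PROC FREQ and merge steps only conceptually; the function names, the record layout, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def summarize(records, variables):
    """Count each (variable, value) combination, similar in spirit to a
    one-way frequency table per variable."""
    counts = Counter()
    for rec in records:
        for var in variables:
            counts[(var, rec[var])] += 1
    return counts


def merge_sources(domain, acrf_keys, spec_keys, data_counts):
    """One row per (domain, variable, value), flagging each source."""
    keys = set(acrf_keys) | set(spec_keys) | set(data_counts)
    return [
        {
            "domain": domain,
            "variable": var,
            "value": val,
            "on_acrf": (var, val) in acrf_keys,
            "in_specs": (var, val) in spec_keys,
            "n_in_data": data_counts.get((var, val), 0),
        }
        for var, val in sorted(keys)
    ]


# Made-up EG records for one study.
eg = [
    {"EGTESTCD": "INTP", "EGORRES": "NORMAL"},
    {"EGTESTCD": "INTP", "EGORRES": "ABNORMAL"},
    {"EGTESTCD": "INTP", "EGORRES": "NORMAL"},
]
counts = summarize(eg, ["EGTESTCD", "EGORRES"])
rows = merge_sources(
    "EG",
    acrf_keys={("EGTESTCD", "INTP"), ("EGORRES", "NORMAL")},
    spec_keys={("EGTESTCD", "INTP")},
    data_counts=counts,
)
for row in rows:
    print(row)
```

Keeping one uniform row structure for every domain is what lets the per-study outputs be stacked and compared across studies later.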

Flowchart 1: Data processes across each study

Flowchart 2: Reports across multiple studies

The output SAS data sets are merged across studies, and reports are created as illustrated in Flowchart 2.

Reports are created with SAS ODS. Final reports are in Excel to take advantage of its AutoFilter tool.


Annotated CRF → Data Warehouse

The annotations on the CRF are created as Comments in the PDF file. They have a consistent format and layout because they follow the SDTM Submission Guidelines. For example, in Table 6, the annotation for a test code can be EGTESTCD = INTP, and for a test result EGORRES = NORMAL. QVAL for a supplemental qualifier is annotated as, for example, EGCLSIG = N in SUPPEG.

To import the annotations into SAS, first save the annotations in an ASCII file. In the Adobe Acrobat menu bar, click Comments and Summarize Comments…, and then click Comments Only. This extracts the annotation text to a separate window. Copy all the text, paste it into a text editor, and save it as an ASCII file. For example, in Table 6 the annotations from the PDF file for the 12-LEAD ECG CRF can be saved as the text in the ASCII file on the right.

Table 6: Converting annotations from PDF to ASCII file

Once the annotations are in an ASCII file, they can be read into SAS with a DATA step. Note that in Table 6 only the text in red is CRF annotations, plus Page 12, which is the page number from the PDF file and is part of the PDF Comments. The text is structured because we follow certain rules when creating the annotations. We can therefore scan the imported text and extract the SDTM domains, variable names, and variable values. Table 7 is the sample SAS code to process the text. The output SAS data set from the sample code is shown in Table 8.
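The scanning step can be illustrated with a short parser for the annotation patterns quoted above (`VARIABLE = VALUE`, `VARIABLE = VALUE in SUPPxx`, and `Page N`). This is a Python sketch, not the SAS code of Table 7. It assumes that a page line precedes the annotations for that page and that, outside supplemental qualifiers, the domain is the first two letters of the variable name; both are simplifications made for the example.

```python
import re

PAGE_RE = re.compile(r"^Page\s+(\d+)$")
# VARIABLE = VALUE, optionally followed by "in SUPPxx"
ANNOT_RE = re.compile(r"^(\w+)\s*=\s*(.+?)(?:\s+in\s+(SUPP\w+))?$")


def parse_annotations(lines):
    """Turn summarized PDF comment text into one row per annotation."""
    rows, page = [], None
    for raw in lines:
        line = raw.strip()
        page_match = PAGE_RE.match(line)
        if page_match:
            page = int(page_match.group(1))
            continue
        annot = ANNOT_RE.match(line)
        if not annot:
            continue  # skip text that is not an annotation
        variable, value, supp = annot.groups()
        rows.append({
            "page": page,
            # Assumption: domain is the variable's two-letter prefix
            # unless the annotation names a SUPP-- data set.
            "domain": supp or variable[:2],
            "variable": variable,
            "value": value,
        })
    return rows


sample = [
    "Page 12",
    "12-LEAD ECG",
    "EGTESTCD = INTP",
    "EGORRES = NORMAL",
    "EGCLSIG = N in SUPPEG",
]
annotations = parse_annotations(sample)
for row in annotations:
    print(row)
```

Variables without a domain prefix, such as VISIT, would need an explicit lookup instead of the two-letter rule.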

Table 7: Sample SAS code to import ASCII file to SAS

Table 8: CRF annotations saved in SAS data set

Data Set Specifications → Data Warehouse

The sample SDTM data set Specifications are saved in CSV files. They include SDTM domain names, variable names, variable labels, type, length, controlled terminology terms and other required metadata. Table 9 is an example for the EG domain and its supplemental qualifier SUPPEG.

Table 9: Sample data set Specifications in CSV file for EG and SUPPEG

A DATA step combined with a SAS macro is used to loop through all domains and read the individual CSV files into SAS. Table 10 is the sample code. Table 11 is the output SAS data set, and it matches the Specifications in Table 9.
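The loop over per-domain CSV files can be sketched as follows. This is an illustrative Python equivalent, not the SAS macro of Table 10. The column headers, the semicolon separator for controlled terms, and the lengths are all hypothetical, and in-memory strings stand in for the CSV files so the example is self-contained.

```python
import csv
import io

# Stand-ins for one CSV file per domain; headers and values are made up.
SPEC_FILES = {
    "EG": (
        "Variable,Label,Type,Length,Controlled Terms\n"
        "EGTESTCD,ECG Test or Examination Short Name,Char,8,INTP;PR\n"
        "EGORRES,Result or Finding in Original Units,Char,200,\n"
    ),
    "SUPPEG": (
        "Variable,Label,Type,Length,Controlled Terms\n"
        "QNAM,Qualifier Variable Name,Char,8,EGCLSIG\n"
    ),
}


def read_specs(spec_files):
    """Read every domain's Specifications into one uniform list of rows."""
    rows = []
    for domain, text in spec_files.items():
        for rec in csv.DictReader(io.StringIO(text)):
            terms = [t for t in rec["Controlled Terms"].split(";") if t]
            rows.append({
                "domain": domain,
                "variable": rec["Variable"],
                "label": rec["Label"],
                "type": rec["Type"],
                "length": int(rec["Length"]),
                "terms": terms,
            })
    return rows


specs = read_specs(SPEC_FILES)
for row in specs:
    print(row)
```

With real files, each `io.StringIO(text)` would be replaced by an opened file handle for that domain's CSV.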

Table 10: Sample SAS code to import Specifications CSV files to SAS

Table 11: SDTM data set Specifications saved in SAS data set

Two other output data sets are created, as shown in Table 12 and Table 13. In Table 12, the column Value contains the controlled terminology terms extracted from the Specifications, and the column Label contains the SDTM variable labels.

Table 13 is the value level metadata extracted from the Specifications. The column Variable holds the target variable names from the Specifications, i.e., EGTESTCD and QNAM, and the column Value stores the values of the two target variables, i.e., INTP and PR for EGTESTCD, and EGCLSIG for QNAM. The column Label in Table 13 holds the corresponding test names from EGTEST, i.e., ECG Interpretation and PR Interval, and the qualifier variable label from QLABEL, i.e., Clinically Significant.
