QUALITATIVE DATA COLLECTION INGEST
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
WIVENHOE PARK, COLCHESTER, ESSEX, CO4 3SQ
T +44 (0)1206 872001
E email@example.com
www.data-archive.ac.uk
PUBLIC
30 MARCH 2015
Version 08.00
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International Licence. To view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/4.0/
WE ARE SUPPORTED BY THE UNIVERSITY OF ESSEX, THE ECONOMIC AND SOCIAL RESEARCH COUNCIL, AND THE JOINT INFORMATION SYSTEMS COMMITTEE
CD093-QualitativeDataCollectionIngestProcessingProcedures_08_00w.doc
Contents
1. Qualitative data
2. Pre-processing checks on qualitative data collections
2.1. Processing Plan
3. Qualitative data formats
4. Ingest processing of interview transcripts
4.1. Making processing copies of original files
4.2. Formatting the interview transcript
4.2.1. Standard header information
4.2.2. Transcript template font
4.2.3. Interviewer/Respondent demarcation tags
4.2.4. Formatting of transcripts and non-transcript items
4.3. Reading and ingest processing transcripts
5. Confidentiality and anonymisation of textual data
6. Documentation metadata
7. The Data List
7.1. Confidentiality of the Data List
8. Read and Note files
8.1. Providing Read file information for mixed-methods data collections
9. Data file naming conventions for qualitative data
9.1. Interview Data
9.2. Focus Group Data
9.3. Photographic and Video Data
9.4. Naming Files according to Data Type
10. File naming conventions for qualitative documentation
10.1. User Guides
10.2. Data Lists
10.3. Other documentation
11. Digitisation of paper collections
11.1. Digitisation and scanning
Scope
This document covers the ingest processing of qualitative data collections archived at the UK Data Archive. It does not cover quantitative ingest processing, which is covered in the document Quantitative Data Ingest Processing Procedures. However, where mixed-method data collections contain both qualitative and quantitative data elements, the procedures in this document and in Quantitative Data Ingest Processing Procedures must be followed as appropriate. Finally, this document refers to, but does not fully cover, documentation ingest processing, which is covered in the document Documentation Ingest Processing Procedures.
It should be noted that some of the documents referenced within the text below are not publicly available, but external readers may of course contact the Archive in case of query.
quantitative data are covered in Quantitative Data Ingest Processing Procedures. It may be possible to link quantitative and qualitative data elements to respondents within the same study, for example a survey file may exist alongside a set of interview transcripts. This should be explicitly mentioned in the information provided to users with the collection.
2. Pre-processing checks on qualitative data collections
2.1. Processing Plan
Once all administrative materials have been received, the Collections Development team will pass the materials to the Ingest Services team for assessment. The data, documentation and administrative materials will be checked, a unique study number allocated, the materials placed into the standard archival directory structure, and the Calm processing database entry created. In addition, a processing plan is drawn up (based on a template) with recommendations for processing. The plan is produced as an aid to processing staff and is based on knowledge of the collection, the deposit negotiations and an initial review of the material after deposit. Collections Development and Producer Support staff may also have been involved in assisting the depositor with preparing the data for deposit, suggesting anonymisation strategies and advising on access restrictions. The Ingest Services team will consult with the other teams as needed.
Production of the processing plan will involve:
• Final assessment of completeness and quality of the deposit
• Assessment of confidentiality issues and wishes of depositor (e.g. access conditions)
• Advice on the ingest processing tasks to be done (layout edits, data listing, user guides)
• Notes on unusual details - such as early issue of study numbers, handling of audio or images
The ingest processing demands of qualitative collections vary considerably depending on the individual collection. Ingest staff are therefore encouraged to see the plan as flexible and open to discussion. Its purpose is to help orientate the processing officer to a collection and provide a clear statement on its size, content, access conditions and individual requirements.
Once the processing plan has been created, the collection is ready for ingest processing. Before processing commences, the processor must make a copy of the data collection on their allotted network area and work on that, so there is always a ‘master’ copy available during ingest processing, in case of problems.
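The working-copy step above can be sketched in Python. This is only an illustration, not the Archive's own tooling: the function names, the directory arguments and the use of SHA-256 checksums to confirm that the copy matches the master are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(master: Path, working: Path) -> list[str]:
    """Copy `master` to `working` and return the relative paths that differ.

    An empty list means every file in the master copy has an identical
    counterpart in the working copy. (Hypothetical helper, for illustration.)
    """
    # copytree raises FileExistsError if `working` already exists,
    # so an earlier working copy is never silently overwritten.
    shutil.copytree(master, working)
    mismatches = []
    for source in sorted(master.rglob("*")):
        if source.is_file():
            relative = source.relative_to(master)
            if file_digest(source) != file_digest(working / relative):
                mismatches.append(relative.as_posix())
    return mismatches
```

All subsequent edits would then be made under the working directory only, leaving the master untouched.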
3. Qualitative data formats
Most qualitative data collections currently comprise sets of interview transcripts, most commonly in Rich Text Format (RTF), Microsoft (MS) Word (.doc files), or plain text format (.txt). For further information on how to process sets of interview transcripts in these common formats, see below. However, specialist qualitative software package files are also sometimes included in the deposit.
Qualitative software packages such as NVivo, NUD*IST, ATLAS-ti and MaxQDA have export facilities that enable one to save a whole 'project' consisting of the raw data, coding tree, coded data and associated memos and notes. The coding process is often subjective and geared towards specific themes, and therefore may not be useful for the secondary analyst's topic of investigation. However, for larger studies, coded data may be helpful to aid searching and navigation through voluminous bodies of text. For archival purposes the raw data, the final coding tree and any useful memos should be exported as separate files prior to deposit. The coded elements of the collection can be provided to users as long as raw data are also made available. Depositors are requested to export all coded data to export formats prior to deposit, and checks should be made prior to processing to ensure this has been done. It may not be possible to reliably preserve coded and annotated data for the long term at the Archive, as they cannot all be exported in a common non-proprietary format (though some packages do use XML to export coded data, which is suitable for archiving).
A set of procedures for processing ATLAS-ti format data collections is included in the document Atlas Processing Guide (not currently a controlled document).
4. Ingest processing of interview transcripts
The processing of qualitative material is an exercise in comprehension. The need for careful checking of content means it is not sufficient to read a sample. All data and documentation are read to protect the interests of participants, data creators, data users and the UK Data Archive. Whilst this need is unavoidable, ways of standardising stages in ingest processing work are sought wherever possible. This saves time and allows a diverse range of collections to be presented consistently.
4.1. Making processing copies of original files
Before processing begins, good data management and archival practice dictate that a separate processing copy of all data and documentation files must be made, and that all processing work is done using these copies rather than the original files. The copies can be edited to fix errors and resolve confidentiality issues (see section 5 below), whereas the original files should be left in their original state, in case of future query (unless the depositor requires their destruction). Original files are placed in the data collection structure under noissue/original/, with subdirectories as appropriate, and are not disseminated to users. Capitalisation and spaces should be removed from the original filenames before plattering (where a large number of files are affected, file renaming software is available for this purpose).
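As an illustration of the filename clean-up described above, the following Python sketch lower-cases names and deletes spaces. It is not the file renaming software the Archive refers to; the function names are hypothetical, and reading "removed" as "spaces deleted" (rather than replaced by another character) is an assumption.

```python
from pathlib import Path


def normalised_name(filename: str) -> str:
    """Lower-case a filename and delete spaces (assumed reading of 'removed')."""
    return filename.lower().replace(" ", "")


def plan_renames(directory: Path) -> dict[Path, Path]:
    """Map each file that needs renaming to its new path.

    Raises ValueError if two files would end up with the same name, so
    that nothing is overwritten when the plan is applied.
    """
    plan: dict[Path, Path] = {}
    targets: set[Path] = set()
    for path in sorted(directory.rglob("*")):
        if not path.is_file():
            continue
        target = path.with_name(normalised_name(path.name))
        if target in targets:
            raise ValueError(f"two files would both become {target}")
        targets.add(target)
        if target != path:
            plan[path] = target
    return plan
```

Building the plan first and applying it afterwards (for example, `for old, new in plan.items(): old.rename(new)`) lets the processor review the proposed names before any file is changed.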
4.2. Formatting the interview transcript
If the depositor has provided consistently clear and well-formatted interview transcripts throughout the set, these may be retained with the minimum of editing being undertaken. Depositors should be encouraged to set transcripts to the standard Archive format prior to deposit. A transcript example is available for their use.
However, a pragmatic approach is required if this is likely to prove difficult for the depositor, for example where the project has long since ended and project staff have moved to other posts.
A large proportion of qualitative digital data collections are deposited in MS Word format. However, Rich Text Format (RTF) is the standard Archive preservation format for this kind of text. Qualitative data are typically distributed in RTF because it is more readable than plain text and usable across platforms. Therefore, while the original Word files are of course retained for archival storage, an RTF copy of all MS Word files must be created for ingest processing, dissemination and archival purposes, using the appropriate Archive naming convention (see section 9 below).
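The document does not say which tool is used to create the RTF copies. As one possible approach, the sketch below drives LibreOffice in headless mode from Python; the choice of LibreOffice, the assumption that `soffice` is on the PATH, and the function names are all illustrative. The output keeps the source file's base name, so renaming to the Archive convention in section 9 would be a separate step.

```python
import subprocess
from pathlib import Path


def rtf_conversion_command(word_file: Path, out_dir: Path) -> list[str]:
    """Build a LibreOffice headless command that writes an RTF copy to out_dir."""
    return [
        "soffice",  # assumes LibreOffice is installed and on PATH
        "--headless",
        "--convert-to", "rtf",
        "--outdir", str(out_dir),
        str(word_file),
    ]


def convert_to_rtf(word_file: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path of the RTF copy."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rtf_conversion_command(word_file, out_dir), check=True)
    # LibreOffice names the output after the input file's stem.
    rtf_path = out_dir / (word_file.stem + ".rtf")
    if not rtf_path.exists():
        raise FileNotFoundError(f"no RTF produced for {word_file}")
    return rtf_path
```

The original Word file is only read, never modified, which matches the requirement to retain it for archival storage.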
Editing and formatting of text are normally kept to a minimum during ingest processing, as this can be very time-consuming and resource-intensive. Where the interview transcripts are in reasonable format as received from the depositor, editing will be limited as long as demarcation between the interviewer’s questions and the interviewee’s responses is clear. However, where the data collection is subject to enhanced processing, for example if it is part of a special project such as the Pioneers data collection, an RTF interview transcript template using the standard Archive interview transcript format may be constructed.
This will ensure that consistent font specification and speech demarcation is applied, and that standard header information is included.
4.2.1. Standard header information
The Archive’s standard header information comprises the study number and title (which may be copied from the unpublished catalogue record entry in the catalogue input programs), along with the depositor’s name. For brevity, this is enough for most standard collections, as long as the interview ID number is displayed at the head of the interview. The text file name (created at the Archive) may also be added for enhanced collections.
For file naming conventions, see section 9 below. Where the depositor has already added the bulk of the information in a consistent format and font throughout the set of transcripts, the Archive-specific information only need be added.
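Where headers need to be added to many transcripts, the fields described in this section could be assembled programmatically. The Python sketch below is illustrative only: the class name and field names are hypothetical, the field labels follow the Archive's enhanced header example, and the optional filename reflects the statement that it is added only for enhanced collections.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptHeader:
    """Header fields for one interview transcript (hypothetical structure)."""

    study_number: int
    title: str
    depositor: str
    interview_id: str
    filename: Optional[str] = None  # only added for enhanced collections

    def render(self) -> str:
        """Return the header as plain text, one field per line."""
        lines = [
            f"Name of Project: SN {self.study_number} {self.title}",
            f"Depositor: {self.depositor}",
            f"Interview ID: {self.interview_id}",
        ]
        if self.filename is not None:
            lines.append(f"Filename: {self.filename}")
        return "\n".join(lines)
```

Keeping the filename optional means the same structure serves both standard and enhanced collections.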
Example of enhanced header information:
Name of Project: SN 9999 Mothers’ Relationships with their Teenage Daughters, 2010
Depositor: Bloggs, J.
Interview ID: M001
Filename: 9999int001
4.2.2. Transcript template font
If the depositor has provided well-formatted and clear fonts consistently throughout the interview transcripts set, these may be retained. The Archive’s standard font for interview transcripts is Verdana 11pt for the body text and Verdana 9pt for headers and footers. Depositors should be encouraged to format transcripts in this font prior to deposit.
4.2.3. Interviewer/Respondent demarcation tags
Where the interview transcript follows the conversation recorded at interview, the text should be clearly separated into sections to show the interviewer’s questions/statements and the respondent’s responses, in the interview order.
If the deposited transcripts already identify interviewers and respondents by a method that has been applied consistently across the set of interviews, e.g. first names (where permitted), pseudonyms or initials, this may be used. If identification has not been applied consistently, a logical method of tagging that suits the particular collection may be chosen and used for all transcripts within the study. Some examples of the kind of tags that may be used are given below.
The tags should be in 11pt bold Verdana font. This will ensure that they are the same size as the standard transcript font, but emphasised by the bold setting. Standard formatting should be applied to the whole transcript (see below).
1. Where the depositor has used the Interviewer’s first name and a pseudonym for the Respondent:
3. Where the depositor has used initials for both interviewer (SJ in this example) and respondent (MH in this example). (Check the study materials to ensure the use of the respondent’s initials does not compromise their confidentiality.):
7. Where the depositor has used initials for both Interviewer and Respondent (two Interviewers and two Respondents in this example; increase numbers as appropriate for the study):
4.2.4. Formatting of transcripts and non-transcript items
If the depositor has used clear formatting consistently throughout the interview transcripts set, this may be retained. Where this has not been applied, the Archive’s standard format for interview transcripts is as