Organising your data

 

File naming

Both digital and analogue files should be named following the best practice guidelines recommended by the UK Data Service:

  • Create meaningful but brief names e.g. use acronyms or formal IDs for: broad content types; project details; creator information.
  • Do not use spaces; use hyphens '-' or underscores '_' to separate logical elements.
  • Avoid using special characters (& ? !).
  • If record creation is time specific, add the date in YYYY-MM-DD format to allow for sorting.
  • With digital files, reserve the file extension (e.g. .doc, .xls) for the file format.

Examples:

  1.  FG1_CONS_2010-02-12.rtf is the file that contains the transcript of the first focus group with consumers, which took place on 12 February 2010.
  2.  Int024_AP_2008-06-05.doc is an interview with participant 024, interviewed by Anne Parsons (AP) on 5 June 2008.
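
The naming rules above can also be applied programmatically. The following is a minimal Python sketch, not part of the UK Data Service guidance: the function name build_file_name and its parameters are illustrative assumptions. It joins the name elements with underscores, removes spaces and special characters, and appends the date in YYYY-MM-DD format before the file extension.

```python
import re
from datetime import date


def build_file_name(elements, created, extension):
    """Join name elements with underscores, then add an ISO date and the extension.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for element in elements:
        # Replace spaces with hyphens, then drop everything except
        # letters, digits and hyphens (this removes & ? ! and similar).
        element = element.strip().replace(" ", "-")
        element = re.sub(r"[^A-Za-z0-9-]", "", element)
        if element:
            cleaned.append(element)
    # date.isoformat() gives YYYY-MM-DD, so file names sort chronologically.
    cleaned.append(created.isoformat())
    return "_".join(cleaned) + "." + extension.lstrip(".")


print(build_file_name(["FG1", "CONS"], date(2010, 2, 12), "rtf"))
print(build_file_name(["Int024", "AP"], date(2008, 6, 5), ".doc"))
```

The two calls reproduce the example names given above, FG1_CONS_2010-02-12.rtf and Int024_AP_2008-06-05.doc.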

Version control should also be used where appropriate, for example by adding a version number to the file name:

File name               Changes to file
Interviewschedule_1.0   Original document
Interviewschedule_1.1   Minor revisions made
Interviewschedule_1.2   Further minor revisions
Interviewschedule_2.0   Substantive changes
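
The numbering scheme in the table can be sketched in Python as follows. This is an illustration only: the function next_version and its substantive flag are assumptions, and it expects file names that end in an underscore followed by a major.minor version, as in the table.

```python
def next_version(file_name, substantive=False):
    """Return the file name with its trailing major.minor version incremented.

    Hypothetical helper for illustration only.
    """
    stem, version = file_name.rsplit("_", 1)
    major, minor = (int(part) for part in version.split("."))
    if substantive:
        # Substantive changes start a new major version (e.g. 1.2 becomes 2.0).
        major, minor = major + 1, 0
    else:
        # Minor revisions increase the second number (e.g. 1.0 becomes 1.1).
        minor += 1
    return f"{stem}_{major}.{minor}"


print(next_version("Interviewschedule_1.0"))
print(next_version("Interviewschedule_1.2", substantive=True))
```

The first call gives Interviewschedule_1.1 and the second gives Interviewschedule_2.0, matching the table.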

If a number of individuals are contributing to a project and its related files, a version control registry can also be used to record who was responsible for each version, what edits were made and when. This can be stored alongside the relevant data. An example of such a registry is available from the UK Data Service site.
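
A registry of this kind can be as simple as a CSV file with one row per version. The sketch below is one possible layout, not the UK Data Service example: the column names, the function record_version and the file name version_registry.csv are all assumptions made for illustration. It records who made each version, what was changed and when.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

# Assumed column layout; adapt it to the needs of the project.
FIELDS = ["file_name", "author", "changes", "date"]


def record_version(registry_path, file_name, author, changes, when):
    """Append one row to the registry, writing the header if the file is new."""
    registry_path = Path(registry_path)
    is_new = not registry_path.exists()
    with registry_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "file_name": file_name,
            "author": author,
            "changes": changes,
            "date": when.isoformat(),  # YYYY-MM-DD
        })


# Demonstration in a temporary folder so that no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    registry = Path(folder) / "version_registry.csv"
    record_version(registry, "Interviewschedule_1.0", "AP",
                   "Original document", date(2008, 6, 5))
    record_version(registry, "Interviewschedule_1.1", "AP",
                   "Minor revisions made", date(2008, 6, 12))
    print(registry.read_text(encoding="utf-8"))
```

In practice the registry would be kept in the same folder as the files it describes rather than in a temporary folder.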

 

File structure

A best practice methodology for structuring files is also given by the UK Data Archive as follows:

  • Data and documentation files should be held in separate folders.
  • Data files should be further organised according to data type and then according to research activity.
  • Documentation files should also be organised according to type of documentation file and research activity.

An example of such a hierarchy is given on the UK Data Service site.
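
As a rough illustration of the three rules above, the following Python sketch creates separate data and documentation folders, each organised by type and then by research activity. The folder names (transcripts, audio, consent-forms, schedules, interviews, focus-groups) and the function create_structure are hypothetical and are not taken from the UK Data Service example.

```python
import tempfile
from pathlib import Path


def create_structure(root, data_types, doc_types, activities):
    """Create <root>/data/<type>/<activity> and <root>/documentation/<type>/<activity>."""
    root = Path(root)
    created = []
    for top, kinds in (("data", data_types), ("documentation", doc_types)):
        for kind in kinds:
            for activity in activities:
                folder = root / top / kind / activity
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created


# Demonstration in a temporary folder; the names below are examples only.
with tempfile.TemporaryDirectory() as project:
    folders = create_structure(
        project,
        data_types=["transcripts", "audio"],
        doc_types=["consent-forms", "schedules"],
        activities=["interviews", "focus-groups"],
    )
    for folder in folders:
        print(folder.relative_to(project).as_posix())
```

Running the sketch lists the eight folders created, beginning with data/transcripts/interviews.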

Digital file formats

File formats should be chosen early in the research cycle so that they suit every anticipated use of the data. For example, the initial file format may be determined by the specialist software used to create or collate the data. However, for the data to be shared or used in the long term, it may need to be converted to a more widely used format.

The UK Data Archive recommends using open and standard formats so that data remain usable in the future. Open formats include PDF/A (documents), CSV (comma-separated values; spreadsheets) and TIFF (images). Standard formats include those of widely used proprietary (licensed) products such as Microsoft Office (Word, Excel, etc.) and SPSS.

The "File Formats Table" produced by the UK Data Service is a useful resource to help researchers select the most appropriate file format for long-term use of and access to their data.