Formats & Data Organization

  • What should I focus on when organizing data?
  • How should I approach naming my files?
  • What are the issues around file formats?
  • How do I keep track of changes?

What should I focus on when organizing data?

There are some fundamental decisions you need to make when you start your research, and data organization should be one of them. Your choices will vary with the type of research you do, but everyone must address the same issues:

  • File Version Control: Versioning means saving a new copy of a file each time you change it, so that you can later go back and retrieve a specific version.
  • Directory Structure: Keep your files organized in a structure that is logical for understanding and using your data.
  • File Naming Conventions: You should name files in a way that helps you locate them and understand what they contain without having to open each file. Some disciplines have their own file naming conventions.
  • Directory Structure for Backups: Be sure that your backups are organized in the same way as your originals.
  • Metadata: Keep your metadata files with your data files. See our Describing your Data page for more information on metadata.
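As one possible starting point, the Python sketch below creates a project folder tree with a ReadMe stub, then mirrors the whole tree to a backup location so that the backup keeps the same structure as the originals. The folder names in LAYOUT and the functions create_project and mirror_backup are hypothetical examples, not a required standard; adapt them to your discipline.

```python
import shutil
from pathlib import Path

# Hypothetical layout: rename or extend these folders to suit your project.
LAYOUT = [
    "data/raw",        # original data, never edited
    "data/processed",  # cleaned or derived data
    "data/metadata",   # codebooks and data dictionaries, kept with the data
    "scripts",         # analysis code
    "results",         # figures, tables, reports
]


def create_project(root):
    """Create the LAYOUT folders under root plus a README.txt listing them."""
    root = Path(root)
    for sub in LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite an existing ReadMe
        readme.write_text("Folder layout:\n" + "\n".join(LAYOUT) + "\n", encoding="utf-8")
    return root


def mirror_backup(root, backup_root):
    """Copy the whole project tree so the backup has the same folder structure."""
    # dirs_exist_ok (Python 3.8+) lets repeated backups copy into the same target.
    shutil.copytree(root, backup_root, dirs_exist_ok=True)
    return Path(backup_root)
```

Copying the tree as a whole, rather than file by file, is what keeps the backup's organization identical to the original's.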

How should I approach naming my files?

Best Practices

  • Be consistent.
    • Have naming conventions for (1) directory structure, (2) folder names, and (3) file names
    • Always include the same information (e.g., date and time)
    • Keep elements in the same order, and write dates as YYYYMMDD, not MMDDYYYY, so that files sort chronologically
    • Document your file naming conventions so that other users understand the structure of your file names and any abbreviations or codes you use.
    • Consider creating a file inventory (a manifest) for each folder in a ReadMe file
  • Be descriptive so others can understand your meaning. Include other relevant information such as:
    • Unique identifier (e.g., project name or grant number in the folder name)
    • Project name
    • Conditions (Lab instrument, Solvent, Temperature, etc.)
    • Run of experiment (sequential)
    • Date (in file properties too)
    • File type, using the application's standard three-letter extension (e.g., MOV, TIF, WRL)
  • Keep track of versions
    • Use a sequential numbered system: v1, v2, v3, etc.
    • Don't use confusing labels: revision, final, final2, etc.
    • Consider version control software, if applicable
    • Record all changes, no matter how small
    • Discard obsolete versions (but never the raw copy)
    • Use auto-backup instead of self-archiving, if possible
  • Avoid slashes (/ or \) in file names; operating systems treat them as folder separators
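A sequential version scheme is easy to automate. The sketch below assumes names of the form <stem>_v<N>.<ext> (a hypothetical convention; adjust the pattern to your own) and returns the next free version name from a list of existing file names. The function next_version is illustrative only.

```python
import re


def next_version(existing, stem, ext):
    """Return the next name in the sequence <stem>_v1.<ext>, <stem>_v2.<ext>, ...

    existing is a list of file names already in the folder; names that do not
    match the pattern (e.g. "survey_final.csv") are ignored.
    """
    if "/" in stem or "\\" in stem:
        raise ValueError("file names must not contain slashes")
    pattern = re.compile(rf"{re.escape(stem)}_v(\d+)\.{re.escape(ext)}")
    numbers = [int(m.group(1)) for name in existing if (m := pattern.fullmatch(name))]
    # Compare as integers so that v10 comes after v9 (string order would not).
    return f"{stem}_v{max(numbers, default=0) + 1}.{ext}"
```

For example, next_version(["survey_v1.csv", "survey_v2.csv", "survey_final.csv"], "survey", "csv") returns survey_v3.csv. Parsing the number rather than sorting names alphabetically is the design choice that keeps v10 and later in order.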

File Name Example

Project_instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext

Elements in square brackets are optional.
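Here is a sketch of how the pattern above could be applied consistently in Python. It assumes that square brackets mark optional parts, that the underscore is reserved as the field separator (so individual fields may not contain underscores, slashes, spaces, or dots), and that the time is recorded to a chosen precision. build_file_name and parse_file_name are hypothetical helpers written for this example.

```python
from datetime import datetime

# Timestamp formats for each precision the [hh][mm][ss] part allows.
PRECISIONS = {
    "date": "%Y%m%d",
    "hour": "%Y%m%d%H",
    "minute": "%Y%m%d%H%M",
    "second": "%Y%m%d%H%M%S",
}
# The same formats keyed by the length of the timestamp they produce.
FORMAT_BY_LENGTH = {8: "%Y%m%d", 10: "%Y%m%d%H", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}
# Assumption: "_" separates fields; the other characters cause path or parsing trouble.
FORBIDDEN = set("_/\\ .")


def build_file_name(project, instrument, location, when, ext, extra=None, precision="date"):
    """Build Project_instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext."""
    fields = [project, instrument, location] + ([extra] if extra else [])
    for field in fields:
        if FORBIDDEN & set(field):
            raise ValueError(f"field {field!r} contains a reserved character")
    stamp = when.strftime(PRECISIONS[precision])
    # Insert the timestamp after the first three fields, before the optional extra.
    return "_".join(fields[:3] + [stamp] + fields[3:]) + "." + ext


def parse_file_name(name):
    """Split a name produced by build_file_name back into its parts."""
    stem, ext = name.rsplit(".", 1)
    project, instrument, location, stamp, *rest = stem.split("_")
    return {
        "project": project,
        "instrument": instrument,
        "location": location,
        "when": datetime.strptime(stamp, FORMAT_BY_LENGTH[len(stamp)]),
        "extra": rest[0] if rest else None,
        "ext": ext,
    }
```

For example, build_file_name("Lakes", "CTD", "Site4", datetime(2024, 3, 15), "csv", extra="raw") gives Lakes_CTD_Site4_20240315_raw.csv. Because every name follows one pattern, parse_file_name can recover the fields without opening the file.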

File Renaming Applications

If you have many files already named and need to revise your naming system, you might consider using a file renaming application such as:

What are the issues around file formats?

It is important to think carefully about which file formats will best support long-term preservation of and continued access to your data.

Formats most likely to be accessible in the future are:

  • Non-proprietary and not tied to a specific piece of software
  • Open, documented standard
  • Common, used by the research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Below are some common examples of preferred formats. For more, see the UK Data Service's recommended formats list, or contact us through the Research Facilitation Services (RFS) intake form.

  • PDF, not Word
  • CSV, not Excel
  • MPEG-4, not QuickTime
  • TIFF or JPEG 2000, not GIF or JPEG
  • XML or RDF, not a proprietary relational database (RDBMS) file
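For tabular data, saving as CSV means writing plain text in a standard encoding. The sketch below uses Python's built-in csv module and writes UTF-8, which keeps non-ASCII characters such as "ü" intact. The helper names save_as_csv and load_csv are hypothetical; converting an existing Excel workbook would additionally need a third-party library (for example openpyxl or pandas), which is not shown.

```python
import csv
from pathlib import Path


def save_as_csv(header, rows, path):
    """Write a header row and data rows to a UTF-8 CSV file."""
    path = Path(path)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)  # fields containing commas are quoted automatically
    return path


def load_csv(path):
    """Read a CSV written by save_as_csv; every value comes back as a string."""
    with Path(path).open(encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [row for row in reader]
    return header, rows
```

CSV stores every value as text, so numbers come back as strings; record units and data types in your metadata or codebook.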

How do I keep track of changes?

If more than one person works on your research, tracking changes is critical. As you plan this step, keep the following in mind:

  • If you make significant changes to a file, consider file versioning (see above)
  • Use file naming conventions (see above)
  • Consider file sharing and collaboration software with built-in version history (e.g., Google Drive, OSF, Dropbox)
  • Consider a dedicated version control system (e.g., Git)
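Version control tools such as Git track changes for you. For folders you do not keep under version control, a lightweight complement is a checksum manifest: record a SHA-256 hash for every file, then compare two manifests to see which files were added, removed, or changed. The sketch below uses only Python's standard library; the function names are hypothetical, and the approach detects that a file changed but not what changed inside it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder, with / separators) to its hash."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old, new):
    """List files added, removed, or changed between two manifests."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Saving each manifest alongside the folder's ReadMe gives a dated record of what changed between versions.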