A sustainable digital format is one that is compatible, for the foreseeable future, with software needed to open and read it.
Unfortunately, as software applications change or disappear over time, data file formats can become obsolete. If you use a proprietary or obscure file format, there is a risk that the format will become obsolete, leaving your data unusable.
If you are working in a proprietary or less sustainable format, consider converting your data to an open, widely used format when you preserve and share it. Many software programs can save or convert datasets into more open formats (for example, saving an SPSS dataset as CSV). Converting makes it more likely that your data will remain usable by others, now and in the future.
Wherever possible, select data formats that have the following sustainability attributes:
- Adheres to publicly documented specifications rather than proprietary ones (example: TIFF for images)
- Is in widespread use and readable with available software (examples: HTML for hypertext, CSV for tabular data)
- Is self-describing, i.e., contains embedded metadata that helps interpret the context and structure of the data file (example: XML files contain headers and tags describing the file's content)
- Contains as much of the original information as possible (example: Motion JPEG 2000, a "lossless" format for digital video)
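To make the "self-describing" attribute concrete, here is a minimal sketch in Python that writes a small dataset as XML with its descriptive metadata embedded alongside the values. The element names (`dataset`, `metadata`, `readings`) and the sample values are assumptions made up for illustration; they do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET


def build_dataset_xml(title, creator, units, readings):
    """Return an XML string that carries its own descriptive metadata."""
    root = ET.Element("dataset")

    # Embedded metadata: what the data is, who produced it, how to read it.
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "creator").text = creator
    ET.SubElement(meta, "units").text = units

    # The data itself: every value is labelled by a tag and a timestamp.
    data = ET.SubElement(root, "readings")
    for timestamp, value in readings:
        reading = ET.SubElement(data, "reading", timestamp=timestamp)
        reading.text = str(value)

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, used only to show the structure.
xml_text = build_dataset_xml(
    "Example temperature log",
    "Example Lab",
    "degrees Celsius",
    [("2020-01-01T00:00:00Z", 4.2), ("2020-01-01T01:00:00Z", 3.9)],
)
print(xml_text)
```

A reader who opens this file in any text editor can see the units and the meaning of each value without access to the software that produced it.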
If you are uncertain of which file formats to select for long-term preservation of your research data, here are some tips to help you decide:
- Select formats that ensure the best chance for long-term access to data
- Favor commonly used and non-proprietary formats
- Consider longevity, popularity, and potential for migration
- Investigate detailed technical information about file formats using the UK National Archives' PRONOM Registry
- Consider the requirements of your chosen data repository: if you intend to deposit your data in a repository, it may have guidelines on how data should be structured and which file formats it will accept
- Many institutions also provide file format recommendations and preferences based on content type
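When reviewing the files you already hold, it helps to confirm what format each one really is, since a file extension can be wrong. Format registries record, among other details, the characteristic byte sequences ("signatures") that mark the start of a file. The sketch below checks a handful of well-known signatures. It is a simplified illustration, not PRONOM's data or interface; the `SIGNATURES` table and the function names are assumptions chosen for this example, and a real identification tool covers far more formats.

```python
# A few widely documented leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
]


def sniff_format(header: bytes) -> str:
    """Guess a format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))


print(sniff_format(b"%PDF-1.7\n"))
print(sniff_format(b"plain text"))
```

A result of "unknown" only means the file matched none of the listed signatures; plain-text formats such as CSV have no fixed signature at all.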
To emphasize: select the most appropriate file format for the long-term preservation of, and continued access to, research data.
The following should be taken into account when selecting an appropriate format:
- future accessibility
- open, documented standard
- common, used by the research community
- standard representation (ASCII, Unicode)
- preferably not software specific.
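One way to check the "standard representation" criterion for a text file is to test whether its bytes decode cleanly as ASCII or UTF-8 (a common Unicode encoding). The following is a rough sketch; the function name `text_encoding` is an assumption for illustration, and the check says nothing about whether the content is meaningful text, only that the bytes are valid in that encoding.

```python
def text_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Neither encoding fits: possibly a legacy code page or binary data.
        return "other"


print(text_encoding(b"id,value\n1,2\n"))
print(text_encoding("café".encode("utf-8")))
print(text_encoding("café".encode("latin-1")))
```

Files reported as "other" are candidates for re-encoding to UTF-8 before deposit, provided you know which encoding they were originally written in.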
Best file formats include:
- PDF, not Word
- ASCII, not Excel
- MPEG-4, not Quicktime
- TIFF or JPEG 2000, not GIF or JPG
- XML or RDF, not RDBMS
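For the last pairing, the practical step is exporting tables out of the database engine into a plain, documented text format. The sketch below uses Python's standard `sqlite3` and `csv` modules and writes CSV rather than XML, purely to keep the example short. SQLite stands in for whatever database you use, and the table name, column names, and rows are hypothetical.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, with a header row.

    `table` is interpolated into the SQL text, so pass only table
    names you control, never untrusted input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    # cursor.description holds one tuple per column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical in-memory database, used only to demonstrate the export.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, site TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, "north", 4.2), (2, "south", 3.9)],
)

buffer = io.StringIO()
export_table_to_csv(conn, "samples", buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass that file object as `out`. A CSV export does not carry column types or relationships between tables, so document those separately.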
The Digital Curation Centre (DCC) has compiled a comprehensive guide to various aspects of file formats.