File Formats

Scientific data should be stored in formats that are likely to remain readable in the future, so that the data is not lost.

Why is the choice of file format important for publication on RODARE or long-term archiving?

Photo: Data ©Copyright: Created with AI / https://tse2.mm.bing.net/th?id=OIG2.OIrpE7oNDUjbPv_rgcBI&pid=ImgGn / 20 June 2024 at 8:29 AM

Digital objects of all kinds can be created quickly, often easily and without much background knowledge.

A correspondingly large number of objects are created every day. In science in particular, it is important to preserve some of these objects for as long as possible, because research results should remain usable in the future and lost objects often cannot be restored or reproduced.

However, as hardware, software and formats are constantly evolving, data runs the risk of becoming unreadable at some point.

Choosing the right formats that can be converted if necessary is therefore particularly important for storing data over a longer period of time.


What is important about file formats with regard to the publication and archiving of (research) data?

Photo: Data at the HZDR ©Copyright: Created with AI / https://tse1.mm.bing.net/th?id=OIG2.z7Nw82RZSMyWHaIvjQ_M&pid=ImgGn / 20 June 2024 at 1:51 PM

It is not possible to predict which formats will prevail in the future. However, there are some general guidelines that are relevant for the future use of data:

Open formats: Avoid proprietary formats that are restricted by vendor-specific software or patents. Open formats allow for further development by third parties and are less dependent on specific manufacturers.

Transparency: Choose formats whose content can be inspected directly, e.g. by opening the file in a text editor. Use standardized text encodings such as Unicode (for example UTF-8).

Distribution: Choose formats that are widely used worldwide, so that as many people as possible can work with the data.

Losslessness: Lossless formats preserve all original data. Archiving periods are long, and every lossy conversion along the way discards information, so minimize data loss from the start.

Standards and documentation: Make sure the chosen data format is sufficiently documented or standardized.

Metadata support: Use formats that can store metadata, so that the data is easier to understand and find.
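
The transparency and metadata guidelines can be illustrated with a minimal sketch that uses only the Python standard library. It writes a small table as UTF-8 encoded CSV, stores descriptive metadata in a JSON file next to it, and checks whether a file decodes as UTF-8 text. The function names, the ".meta.json" naming convention and the metadata fields are assumptions made for this example, not a RODARE requirement.

```python
import csv
import json
from pathlib import Path


def write_csv_with_metadata(rows, header, data_path, metadata):
    """Write rows as UTF-8 CSV and store metadata in a JSON sidecar file."""
    data_path = Path(data_path)
    # newline="" lets the csv module control line endings itself.
    with data_path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    # Hypothetical convention: the sidecar is named <data file>.meta.json
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar


def is_utf8_text(path):
    """Return True if the file decodes as UTF-8 and contains no NUL bytes."""
    raw = Path(path).read_bytes()
    if b"\x00" in raw:
        return False  # NUL bytes usually indicate binary content
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A call such as write_csv_with_metadata(rows, ["id", "value"], "measurement.csv", {"title": "Example"}) would produce measurement.csv and measurement.csv.meta.json. The UTF-8 check is only a rough heuristic for "readable in a text editor"; it says nothing about whether the content is documented or meaningful.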


| Category | Recommended formats for long-term archiving | Proprietary formats |
|---|---|---|
| Documents | PDF/A, XML, ASCII, LaTeX | DOC (Microsoft Word), PAGES (Apple Pages) |
| Images | TIFF, JPEG 2000, PNG | PSD (Adobe Photoshop), HEIC (Apple) |
| Audio | WAV, FLAC | MP3 (partly proprietary), AAC (Apple) |
| Video | MKV, AVI | MOV (Apple QuickTime), WMV (Microsoft) |
| Databases | CSV, XML, SQL (open implementations) | MDB (Microsoft Access), SQL (proprietary implementations) |
| Text | TXT, UTF-8, Markdown | RTF (Microsoft), WPD (WordPerfect) |
| Spreadsheets | CSV, ODS (OpenDocument Spreadsheet) | XLSX (Microsoft Excel), NUMBERS (Apple Numbers) |
| Geo data | GeoTIFF, Shapefile (SHP), NetCDF | KML (Google Earth), GDB (Esri Geodatabase) |
| Statistical data | CSV, RData (R), NetCDF | SAV (SPSS), DTA (Stata) |
| 2D/3D Design | SVG (Scalable Vector Graphics) and DXF (AutoCAD Drawing Interchange Format) for vector graphics, shapefile, CGM, X3D, 3MF | DWG, CDR |
| 3D printing | AMF (Additive Manufacturing File Format), 3MF (3D Manufacturing Format) | --- |
| Metadata | XML, JSON, RDF, HDF5 | --- |

Please note that this table contains general recommendations and it is important to consider specific requirements and standards in your context.
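
As a rough illustration of how the table could be applied before a deposit, the following sketch sorts file names into "recommended", "proprietary" or "unknown" based on their extension. It covers only a subset of the table, and the mapping from format names to file extensions (for example JPEG 2000 to .jp2, NetCDF to .nc) is an assumption made for this example. PDF is left out on purpose, because the extension alone does not show whether a file is PDF/A.

```python
from pathlib import Path

# Subset of the table above. The extension chosen for each format is an
# assumption for this sketch, not an official mapping.
RECOMMENDED = {
    ".xml", ".tex", ".tif", ".tiff", ".jp2", ".png", ".wav", ".flac",
    ".mkv", ".csv", ".txt", ".md", ".ods", ".nc", ".svg", ".json",
}
PROPRIETARY = {
    ".doc", ".pages", ".psd", ".heic", ".mov", ".wmv", ".mdb",
    ".wpd", ".xlsx", ".numbers", ".sav", ".dta", ".dwg", ".cdr",
}


def classify(filename):
    """Classify a file by its extension, ignoring upper/lower case."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in PROPRIETARY:
        return "proprietary"
    return "unknown"


def audit(filenames):
    """Group file names by their classification."""
    report = {"recommended": [], "proprietary": [], "unknown": []}
    for name in filenames:
        report[classify(name)].append(name)
    return report


if __name__ == "__main__":
    for status, names in audit(["scan.TIFF", "thesis.doc", "run42.dat"]).items():
        print(status, names)
```

An extension is only a hint: it does not prove that the file content actually conforms to the format, so a check like this can flag candidates for conversion but cannot replace a proper format validation.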

