15 April 2015

HDF5 Overview

An HDF5 data file has a hierarchical organization made up of groups and datasets. Groups can contain other groups and datasets, while datasets contain complex multi-dimensional data. The organization resembles a regular UNIX file system: groups are analogous to directories, and datasets hold arbitrary data, like regular files. Like directories, groups name the objects they contain. This makes it easy to navigate from the “top” (the root group) of the file to any object in it, using a path such as /groupA/groupB/dataset1.
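A minimal sketch of this hierarchy, assuming the Python h5py package and NumPy (the post itself does not name any particular API). The file name `example.h5` is hypothetical, and the file is kept in memory only (`driver="core"`, `backing_store=False`), so nothing is written to disk.

```python
import h5py
import numpy as np

# driver="core" with backing_store=False keeps the whole file in memory
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Like `mkdir -p`, a path creates any missing intermediate groups
    grp = f.create_group("groupA/groupB")
    grp.create_dataset("dataset1", data=np.arange(6))

    # Any object can be reached by its absolute path from the root group "/"
    dset = f["/groupA/groupB/dataset1"]
    full_name = dset.name       # "/groupA/groupB/dataset1"
    values = dset[:].tolist()   # [0, 1, 2, 3, 4, 5]

    # Walk the whole hierarchy, similar to running `find .` on a directory tree
    names = []
    f.visit(names.append)
    print(names)
```

`visit` calls the given function with the path of every group and dataset below the root, relative to the root, which mirrors how paths work in a file system.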

Each dataset in an HDF5 file has a defined element type and layout. The layout determines the dimensionality of the data and the size of each dimension (the shape of the dataset); the HDF5 documentation calls this the dataspace. A special case of layout, called scalar, means that the dataset contains exactly one element of the given type. Multidimensional layouts are more common. The element type describes the structure of the basic element of these multidimensional arrays or scalars. The type can be atomic, compound, or array. Atomic types include the usual numeric types such as integer, floating-point, etc. A compound type is a collection of atomic types or other compound types, very much like a structure in programming languages. An array element type means that each dataset element is itself an array. Keep in mind that all elements of a dataset have the same type and structure; e.g., if the element type is an array, then all elements must have exactly the same dimensions.
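The layouts and element types above can be illustrated with a short sketch, again assuming h5py and NumPy. All dataset names (`run_number`, `matrix`, `points`, `blocks`) and the compound fields are hypothetical. NumPy structured dtypes map to HDF5 compound types, and NumPy subarray dtypes map to HDF5 array types.

```python
import h5py
import numpy as np

with h5py.File("types.h5", "w", driver="core", backing_store=False) as f:
    # Scalar layout: shape () holds exactly one element
    scalar = f.create_dataset("run_number", data=np.int32(42))

    # Two-dimensional layout of atomic 64-bit floats, shape (3, 4)
    matrix = f.create_dataset("matrix", shape=(3, 4), dtype="f8")

    # Compound element type, analogous to a C struct {float x, y; int id;}
    point_t = np.dtype([("x", "f4"), ("y", "f4"), ("id", "i4")])
    points = f.create_dataset("points", shape=(10,), dtype=point_t)

    # Array element type: each of the 5 elements is a fixed 2x2 array of int16,
    # so every element necessarily has the same dimensions
    block_t = np.dtype(("i2", (2, 2)))
    blocks = f.create_dataset("blocks", shape=(5,), dtype=block_t)

    # Record shapes and a few properties before the file is closed
    shapes = {name: f[name].shape for name in f}
    run_number = int(scalar[()])
    point_fields = points.dtype.names
```

Note that the array type lives in the element type, not in the shape: `blocks` is a one-dimensional dataset of 5 elements, each of which happens to be a 2x2 array.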

Groups and datasets in HDF5 can have any number of attributes attached to them, which can be used to store meta-information about the group or dataset as a whole. Each attribute has a name and an associated value, which usually has a basic type: integer, float, string, etc. Attributes could be used, for example, to store the begin/end time of a run or the run number.
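A minimal sketch of attaching such run metadata, assuming h5py (version 3 or later, where string attributes are read back as Python `str`). The group name `run` and the attribute values are hypothetical, and storing times as ISO 8601 strings is just one possible choice.

```python
import h5py

with h5py.File("attrs.h5", "w", driver="core", backing_store=False) as f:
    run = f.create_group("run")  # hypothetical group for one run's data

    # Attributes behave like a small dictionary attached to the object
    run.attrs["run_number"] = 1234
    run.attrs["begin_time"] = "2015-04-15T08:00:00"
    run.attrs["end_time"] = "2015-04-15T09:30:00"

    # Read every attribute back by name
    attrs = {key: run.attrs[key] for key in run.attrs}
    print(attrs)
```

Attributes are meant for small pieces of metadata like these; bulk data belongs in datasets.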
