Compressed Arrays

zfp‘s compressed arrays are C++ classes that implement random-accessible single- and multi-dimensional floating-point arrays whose storage size, specified in number of bits per array element, is set by the user. Such arbitrary storage is achieved via zfp‘s lossy fixed-rate compression mode, by partitioning each d-dimensional array into blocks of 4d values and compressing each block to a fixed number of bits. The more smoothly the array values vary along each dimension, the more accurately zfp can represent them. In other words, these arrays are not suitable for representing data where adjacent elements are not correlated. Rather, the expectation is that the array represents a regularly sampled and predominantly continuous function, such as a temperature field in a physics simulation.

The rate, measured in number of bits per array element, can be specified in fractions of a bit (but see FAQs #12 and #18 for limitations). Note that array dimensions need not be multiples of four; zfp transparently handles partial blocks on array boundaries.

The C++ templated array classes are implemented entirely as header files that call the zfp C library to perform compression and decompression. These arrays cache decompressed blocks to reduce the number of compression and decompression calls. Whenever an array value is read, the corresponding block is first looked up in the cache, and if found the uncompressed value is returned. Otherwise the block is first decompressed and stored in the cache. Whenever an array element is written (whether actually modified or not), a “dirty bit” is set with its cached block to indicate that the block must be compressed back to persistent storage when evicted from the cache.

This section documents the public interface to the array classes, including base classes and member accessor classes like proxy references/pointers and iterators.

Array Classes

Currently there are six array classes for 1D, 2D, and 3D arrays, each of which can represent single- or double-precision values. Although these arrays store values in a form different from conventional single- and double-precision floating point, the user interacts with the arrays via floats and doubles.

The array classes can often serve as direct substitutes for C/C++ single- and multi-dimensional floating-point arrays and STL vectors, but have the benefit of allowing fine control over storage size. All classes below belong to the zfp namespace.

Base Class

class array

Virtual base class for common array functionality.

array::array(const array &a)

Copy constructor. Performs a deep copy. NOTE: Deep copies are not yet fully implemented.

array::~array()

Destructor.

array &array::operator=(const array &a)

Assignment operator–performs a deep copy of the whole array.

double array::rate() const

Return rate in bits per value.

double array::set_rate(double rate)

Set desired compression rate in bits per value. Return the closest rate supported. See FAQ #12 and FAQ #18 for discussions of the rate granularity. This method destroys the previous contents of the array.

virtual void array::clear_cache() const

Empty cache without compressing modified cached blocks, i.e. discard any cached updates to the array.

virtual void array::flush_cache() const

Flush cache by compressing all modified cached blocks back to persistent storage and emptying the cache. This method should be called before writing the compressed representation of the array to disk, for instance.

size_t array::compressed_size() const

Return number of bytes of storage for the compressed data. This amount does not include the small overhead of other class members or the size of the cache. Rather, it reflects the size of the memory buffer returned by compressed_data().

uchar *array::compressed_data() const

Return pointer to compressed data for read or write access. The size of the buffer is given by compressed_size().

Common Methods

The following methods are common to 1D, 2D, and 3D arrays, but are implemented in the array class specific to each dimensionality rather than in the base class.

size_t array::size() const

Total number of elements in array, e.g. nx × ny × nz for 3D arrays.

size_t array::cache_size() const

Return the cache size in number of bytes.

void array::set_cache_size(size_t csize)

Set minimum cache size in bytes. The actual size is always a power of two bytes and consists of at least one block. If csize is zero, then a default cache size is used, which requires the array dimensions to be known.

void array::get(Scalar *p) const

Decompress entire array and store at p, for which sufficient storage must have been allocated. The uncompressed array is assumed to be contiguous (with default strides) and stored in the usual “row-major” order, i.e. with x varying faster than y and y varying faster than z.

void array::set(const Scalar *p)

Initialize array by copying and compressing data stored at p. The uncompressed data is assumed to be stored as in the get() method.

const Scalar &array::operator[](uint index) const

Return scalar stored at given flat index (inspector). For a 3D array, index = x + nx * (y + ny * z).

reference array::operator[](uint index)

Return proxy reference to scalar stored at given flat index (mutator). For a 3D array, index = x + nx * (y + ny * z).

iterator array::begin()

Return iterator to beginning of array.

iterator array::end()

Return iterator to end of array. As with STL iterators, the end points to a virtual element just past the last valid array element.

1D, 2D, and 3D Arrays

Below are classes and methods specific to each array dimensionality. Since the classes and methods share obvious similarities regardless of dimensionality, only one generic description for all dimensionalities is provided.

template<typename Scalar>
class array1 : public array
template<typename Scalar>
class array2 : public array
template<typename Scalar>
class array3 : public array

This is a 1D/2D/3D array that inherits basic functionality from the generic array base class. The template argument, Scalar, specifies the floating type returned for array elements. The suffixes f and d can also be appended to each class to indicate float or double type, e.g. array1f is a synonym for array1<float>.

array1::array1()
array2::array2()
array3::array3()

Default constructor. Creates an empty array.

array1::array1(uint n, double rate, const Scalar *p = 0, size_t csize = 0)
array2::array2(uint nx, uint ny, double rate, const Scalar *p = 0, size_t csize = 0)
array3::array3(uint nx, uint ny, uint nz, double rate, const Scalar *p = 0, size_t csize = 0)

Constructor of array with dimensions n (1D), nx × ny (2D), or nx × ny × nz (3D) using rate bits per value, at least csize bytes of cache, and optionally initialized from flat, uncompressed array p. If csize is zero, a default cache size is chosen.

uint array2::size_x() const
uint array2::size_y() const
uint array3::size_x() const
uint array3::size_y() const
uint array3::size_z() const

Return array dimensions.

void array1::resize(uint n, bool clear = true)
void array2::resize(uint nx, uint ny, bool clear = true)
void array3::resize(uint nx, uint ny, uint nz, bool clear = true)

Resize the array (all previously stored data will be lost). If clear is true, then the array elements are all initialized to zero.

const Scalar &array1::operator()(uint i) const
const Scalar &array2::operator()(uint i, uint j) const
const Scalar &array3::operator()(uint i, uint j, uint k) const

Return scalar stored at multi-dimensional index given by i, j, and k (inspector).

reference array1::operator()(uint i)
reference array2::operator()(uint i, uint j)
reference array3::operator()(uint i, uint j, uint k)

Return proxy reference to scalar stored at multi-dimensional index given by i, j, and k (mutator).

Caching

As mentioned above, the array classes maintain a software write-back cache of at least one uncompressed block. When a block in this cache is evicted (e.g. due to a conflict), it is compressed back to permanent storage only if it was modified while stored in the cache.

The size cache to use is specified by the user, and is an important parameter that needs careful consideration in order to balance the extra memory usage, performance, and quality (recall that data loss is incurred only when a block is evicted from the cache and compressed). Although the best choice varies from one application to another, we suggest allocating at least two “layers” of blocks, e.g. 2 × (nx / 4) × (ny / 4) blocks for 3D arrays, for applications that stream through the array and perform stencil computations such as gathering data from neighboring elements. This allows limiting the cache misses to compulsory ones. If the csize parameter provided to the constructor is set to zero bytes, then this default of two layers is used.

The cache size can be set during construction, or can be set at a later time via array::set_cache_size(). Note that if csize = 0, then the array dimensions must have already been specified for the default size to be computed correctly. When the cache is resized, it is first flushed if not already empty. The cache can also be flushed explicitly if desired by calling array::flush_cache(). To empty the cache without compressing any cached data, call clear_cache(). To query the byte size of the cache, use array::cache_size().

By default a direct-mapped cache is used with a hash function that maps block indices to cache lines. A faster but more collision prone hash can be enabled by defining the preprocessor macro ZFP_WITH_CACHE_FAST_HASH. A two-way skew-associative cache is enabled by defining the preprocessor macro ZFP_WITH_CACHE_TWOWAY.

References

template<typename Scalar>
class array1::reference
template<typename Scalar>
class array2::reference
template<typename Scalar>
class array3::reference

Array indexing operators must return lvalue references that alias array elements and serve as vehicles for assigning values to those elements. Unfortunately, zfp cannot simply return a standard C++ reference (e.g. float&) to an uncompressed array element since the element in question may exist only in compressed form or as a transient cached entry that may be invalidated (evicted) at any point.

To address this, zfp provides proxies for references and pointers that act much like regular references and pointers, but which refer to elements by array and index rather than by memory address. When assigning to an array element through such a proxy reference or pointer, the corresponding element is decompressed to cache (if not already cached) and immediately updated.

zfp references may be freely passed to other functions and they remain valid during the lifetime of the corresponding array element. One may also take the address of a reference, which yields a proxy pointer. When a reference appears as an rvalue in an expression, it is implicitly converted to a value.

The following operators are defined for zfp references. They act on the referenced array element in the same manner as operators defined for conventional C++ references.

reference reference::operator=(const reference &ref)

Assignment (copy) operator. The referenced element, elem, is assigned the value stored at the element referenced by ref. Return *this.

reference reference::operator=(Scalar val)
reference reference::operator+=(Scalar val)
reference reference::operator-=(Scalar val)
reference reference::operator*=(Scalar val)
reference reference::operator/=(Scalar val)

Assignment and compound assignment operators. For a given operator op, update the referenced element, elem, via elem op val. Return *this.

pointer reference::operator&()

Return pointer to the referenced array element.

Finally, zfp proxy references serve as a building block for implementing proxy pointers and iterators.

Pointers

template<typename Scalar>
class array1::pointer
template<typename Scalar>
class array2::pointer
template<typename Scalar>
class array3::pointer

Similar to references, zfp supports proxies for pointers to individual array elements. From the user’s perspective, such pointers behave much like regular pointers to uncompressed data, e.g. instead of

float a[ny][nx];     // uncompressed 2D array of floats
float* p = &a[0][0]; // point to first array element
p[nx] = 1;           // set a[1][0] = 1
*++p = 2;            // set a[0][1] = 2

one would write

zfp::array2<float> a(nx, ny, rate);       // compressed 2D array of floats
zfp::array2<float>::pointer p = &a(0, 0); // point to first array element
p[nx] = 1;                                // set a(0, 1) = 1
*++p = 2;                                 // set a(1, 0) = 2

However, even though zfp‘s proxy pointers point to individual scalars, they are associated with the array that those scalars are stored in, including the array’s dimensionality. Pointers into arrays of different dimensionality have incompatible type. Moreover, pointers to elements in different arrays are incompatible. For example, one cannot take the difference between pointers into two different arrays.

Unlike zfp‘s proxy references, its proxy pointers support traversing arrays using conventional pointer arithmetic. In particular, unlike the iterators below, zfp‘s pointers are oblivious to the fact that the compressed arrays are partitioned into blocks, and the pointers traverse arrays element by element as though the arrays were flattened in standard C row-major order. That is, if p points to the first element of a 3D array a(nx, ny, nz), then a(i, j, k) == p[i + nx * (j + ny * k)]. In other words, pointer indexing follows the same order as flat array indexing (see array::operator[]()).

A pointer remains valid during the lifetime of the scalar that it points to. Like conventional pointers, proxy pointers can be passed to other functions and manipulated there, for instance by passing the pointer by reference via pointer&.

The following operators are defined for proxy pointers. Below p refers to the pointer being acted upon.

pointer pointer::operator=(const pointer &q)

Assignment operator. Assigns q to p.

reference pointer::operator*() const

Dereference operator. Return proxy reference to the value pointed to by p.

reference pointer::operator[](ptrdiff_t d) const

Index operator. Return reference to the value stored at p[d].

pointer &pointer::operator++()
pointer &pointer::operator--()

Pre increment (decrement) pointer, e.g. ++p. Return reference to the incremented (decremented) pointer.

pointer pointer::operator++(int)
pointer pointer::operator--(int)

Post increment (decrement) pointer, e.g. p++. Return a copy of the pointer before it was incremented (decremented).

pointer pointer::operator+=(ptrdiff_t d)
pointer pointer::operator-=(ptrdiff_t d)

Increment (decrement) pointer by d. Return a copy of the incremented (decremented) pointer.

pointer pointer::operator+(ptrdiff_t d) const
pointer pointer::operator-(ptrdiff_t d) const

Return a copy of the pointer incremented (decremented) by d.

ptrdiff_t pointer::operator-(const pointer &q) const

Return difference p - q. Defined only for pointers within the same array.

bool pointer::operator==(const pointer &q) const
bool pointer::operator!=(const pointer &q) const

Pointer comparison. Return true (false) if p and q do (do not) point to the same array element.

Iterators

template<typename Scalar>
class array1::iterator
template<typename Scalar>
class array2::iterator
template<typename Scalar>
class array3::iterator

Iterators provide a mechanism for sequentially traversing a possibly multi-dimensional array without having to track array indices or bounds. They are also the preferred mechanism, compared to nested index loops, for initializing arrays, because they are guaranteed to visit the array one block at a time. This allows all elements of a block to be initialized together and ensures that the block is not compressed to memory before it has been fully initialized, which might otherwise result in poor compression and, consequently, larger errors than when the entire block is initialized as a whole. Note that the iterator traversal order differs in this respect from traversal by pointers.

The order of blocks visited is row-major (as in C), and the elements within a block are also visited in row-major order, i.e. first by x, then by y, and finally by z. All 4d values in a block are visited before moving on to the next block.

The iterators provided by zfp are sequential forward iterators, except for 1D array iterators, which are random access iterators. The reason why higher dimensional iterators do not support random access is that this would require very complicated index computations, especially for arrays with partial blocks. zfp iterators are STL compliant and can be used in STL algorithms that support forward and random access iterators.

All Iterators

Per STL mandate, the iterators define several types:

type iterator::value_type

The scalar type associated with the array that the iterator points into.

type iterator::difference_type

Difference between two iterators in number of array elements.

type iterator::reference
type iterator::pointer

The reference and pointer type associated with the iterator’s parent array class.

type iterator::iterator_category

Type of iterator: std::random_access_iterator_tag for 1D arrays; std::forward_iterator_tag for all other arrays.

The following operations are defined on iterators:

iterator iterator::operator=(const iterator &it)

Assignment (copy) operator. Make the iterator point to the same element as it.

reference iterator::operator*() const

Dereference operator. Return reference to the value pointed to by the iterator.

iterator &iterator::operator++()

Pre increment. Return a reference to the incremented iterator.

iterator iterator::operator++(int)

Post increment. Return the value of the iterator before being incremented.

bool iterator::operator==(const iterator &it) const
bool iterator::operator!=(const iterator &it) const

Return true (false) if the two iterators do (do not) point to the same element.

uint iterator::i() const
uint iterator::j() const
uint iterator::k() const

Return array index of element pointed to by the iterator. iterator::i() is defined for all arrays. iterator::j() is defined only for 2D and 3D arrays. iterator::k() is defined only for 3D arrays.

1D Array Iterators

The following operators are defined only for 1D arrays:

reference iterator::operator[](difference_type d) const

Random access index operator.

iterator &iterator::operator--()

Pre decrement. Return a reference to the decremented iterator.

iterator iterator::operator--(int)

Post decrement. Return the value of the iterator before being decremented.

iterator iterator::operator+=(difference_type d)
iterator iterator::operator-=(difference_type d)

Increment (decrement) iterator d times. Return value of incremented (decremented) iterator.

iterator iterator::operator+(difference_type d) const
iterator iterator::operator-(difference_type d) const

Return a new iterator that has been incremented (decremented) by d.

difference_type iterator::operator-(const iterator &it) const

Return difference between this iterator and it in number of elements. The iterators must refer to elements in the same array.

bool iterator::operator<=(const iterator &it) const
bool iterator::operator>=(const iterator &it) const
bool iterator::operator<(const iterator &it) const
bool iterator::operator>(const iterator &it) const

Return true if the two iterators satisfy the given relationship. For two iterators, p and q, within the same array, p < q if and only if q can be reached by incrementing p one or more times.