
ChunkedArray< N, T > Class Template Reference [abstract]

Interface and base class for chunked arrays.

#include <vigra/multi_array_chunked.hxx>

Inheritance diagram for ChunkedArray< N, T >:
ChunkedArrayCompressed< N, T, Alloc > ChunkedArrayFull< N, T, Alloc > ChunkedArrayHDF5< N, T, Alloc > ChunkedArrayLazy< N, T, Alloc > ChunkedArrayTmpFile< N, T >

Public Member Functions

std::string backend () const
 Return the class that implements this ChunkedArray.
 
iterator begin ()
 Create a scan-order iterator for the entire chunked array.
 
const_iterator begin () const
 Create a read-only scan-order iterator for the entire chunked array.
 
template<unsigned int M>
MultiArrayView< N-1, T, ChunkedArrayTag > bind (difference_type_1 index) const
 Create a lower dimensional view to the chunked array.
 
MultiArrayView< N-1, T, ChunkedArrayTag > bindAt (MultiArrayIndex dim, MultiArrayIndex index) const
 Create a lower dimensional view to the chunked array.
 
template<int M, class Index>
MultiArrayView< N-M, T, ChunkedArrayTag > bindInner (const TinyVector< Index, M > &d) const
 Create a lower dimensional view to the chunked array.
 
MultiArrayView< N-1, T, ChunkedArrayTag > bindInner (difference_type_1 index) const
 Create a lower dimensional view to the chunked array.
 
template<int M, class Index>
MultiArrayView< N-M, T, ChunkedArrayTag > bindOuter (const TinyVector< Index, M > &d) const
 Create a lower dimensional view to the chunked array.
 
MultiArrayView< N-1, T, ChunkedArrayTag > bindOuter (difference_type_1 index) const
 Create a lower dimensional view to the chunked array.
 
std::size_t cacheMaxSize () const
 Get the number of chunks the cache will hold.
 
int cacheSize () const
 Number of chunks currently fitting into the cache.
 
const_iterator cbegin () const
 Create a read-only scan-order iterator for the entire chunked array.
 
const_iterator cend () const
 Create the end iterator for read-only scan-order iteration over the entire chunked array.
 
template<class U, class Stride>
void checkoutSubarray (shape_type const &start, MultiArrayView< N, U, Stride > &subarray) const
 Copy an ROI of the chunked array into an ordinary MultiArrayView.
 
chunk_iterator chunk_begin (shape_type const &start, shape_type const &stop)
 Create an iterator over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_begin (shape_type const &start, shape_type const &stop) const
 Create a read-only iterator over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_cbegin (shape_type const &start, shape_type const &stop) const
 Create a read-only iterator over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_cend (shape_type const &start, shape_type const &stop) const
 Create the end iterator for read-only iteration over all chunks intersected by the given ROI.
 
chunk_iterator chunk_end (shape_type const &start, shape_type const &stop)
 Create the end iterator for iteration over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_end (shape_type const &start, shape_type const &stop) const
 Create the end iterator for read-only iteration over all chunks intersected by the given ROI.
 
virtual shape_type chunkArrayShape () const
 Number of chunks along each coordinate direction.
 
shape_type const & chunkShape () const
 Return the global chunk shape.
 
shape_type chunkShape (shape_type const &chunk_index) const
 Find the shape of the chunk indexed by 'chunk_index'.
 
shape_type chunkStart (shape_type const &global_start) const
 Find the chunk that contains array element 'global_start'.
 
shape_type chunkStop (shape_type global_stop) const
 Find the chunk that is beyond array element 'global_stop'.
 
template<class U, class Stride>
void commitSubarray (shape_type const &start, MultiArrayView< N, U, Stride > const &subarray)
 Copy an ordinary MultiArrayView into an ROI of the chunked array.
 
const_view_type const_subarray (shape_type const &start, shape_type const &stop) const
 Create a read-only view to the specified ROI.
 
std::size_t dataBytes () const
 Bytes of main memory occupied by the array's data.
 
std::size_t dataBytesPerChunk () const
 Number of data bytes in an uncompressed chunk.
 
iterator end ()
 Create the end iterator for scan-order iteration over the entire chunked array.
 
const_iterator end () const
 Create the end iterator for read-only scan-order iteration over the entire chunked array.
 
value_type getItem (shape_type const &point) const
 Read the array element at index 'point'.
 
bool isInside (shape_type const &p) const
 Check if the given point is in the array domain.
 
template<class U, class C1>
bool operator!= (MultiArrayView< N, U, C1 > const &rhs) const
 Check if two arrays differ in at least one element.
 
template<class U, class C1>
bool operator== (MultiArrayView< N, U, C1 > const &rhs) const
 Check if two arrays are elementwise equal.
 
std::size_t overheadBytes () const
 Bytes of main memory needed to manage the chunked storage.
 
virtual std::size_t overheadBytesPerChunk () const =0
 Bytes of main memory needed to manage a single chunk.
 
void releaseChunks (shape_type const &start, shape_type const &stop, bool destroy=false)
 
void setCacheMaxSize (std::size_t c)
 Set the number of chunks the cache will hold.
 
void setItem (shape_type const &point, value_type const &v)
 Write the array element at index 'point'.
 
shape_type const & shape () const
 Return the shape of this array.
 
MultiArrayIndex size () const
 Return the number of elements in this array.
 
view_type subarray (shape_type const &start, shape_type const &stop)
 Create a view to the specified ROI.
 
const_view_type subarray (shape_type const &start, shape_type const &stop) const
 Create a read-only view to the specified ROI.
 

Detailed Description

template<unsigned int N, class T>
class vigra::ChunkedArray< N, T >

Interface and base class for chunked arrays.

Very big data arrays (possibly bigger than the available RAM) can only be processed in smaller pieces. To support quick access to these pieces, it is advantageous to store big arrays in chunks, i.e. as a collection of small rectangular subarrays. The class ChunkedArray encapsulates storage and handling of these chunks and provides various APIs to easily access the data.

#include <vigra/multi_array_chunked.hxx>
Namespace: vigra

Template Parameters
N    the array dimension
T    the type of the array elements

(these are the same as in MultiArrayView). The actual way of chunk storage is determined by the derived class the program uses:

  • ChunkedArrayFull: Provides the chunked array API for a standard MultiArray (i.e. there is only one chunk for the entire array).

  • ChunkedArrayLazy: All chunks reside in memory, but are only allocated upon first access.

  • ChunkedArrayCompressed: Like ChunkedArrayLazy, but temporarily unused chunks are compressed in memory to save space.

  • ChunkedArrayTmpFile: Chunks are stored in a memory-mapped file. Temporarily unused chunks are written to the hard-drive and deleted from memory.

  • ChunkedArrayHDF5: Chunks are stored in an HDF5 dataset by means of HDF5's native chunked storage capabilities. Temporarily unused chunks are written to the hard drive in compressed form and deleted from memory.

You must use these derived classes to construct a chunked array because ChunkedArray itself is an abstract class.

Chunks can be in one of the following states:

  • uninitialized: Chunks are only initialized (i.e. allocated) upon the first write access. If an uninitialized chunk is accessed in a read-only manner, the system returns a pseudo-chunk whose elements have a user-provided fill value.

  • asleep: The chunk is currently unused and has been compressed and/or swapped out to the hard drive.

  • inactive: The chunk is currently unused, but still resides in memory.

  • active: The chunk resides in memory and is currently in use.

  • locked: Chunks are briefly in this state during transitions between the other states (e.g. while loading and/or decompression is in progress).

  • failed: An unexpected error occurred, e.g. the system is out of memory or a write to the hard drive failed.

In-memory chunks (active and inactive) are placed in a cache. If a chunk transitions from the 'asleep' to the 'active' state, it is added to the cache, and an 'inactive' chunk is removed and sent 'asleep'. If there is no 'inactive' chunk in the cache, the cache size is temporarily increased. All state transitions are thread-safe.
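The cache behavior described above can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: the names `ToyState`, `ToyChunkCache`, `activate`, `release` and `evictInactive` are hypothetical and not part of VIGRA, each chunk has at most one user, the 'uninitialized', 'locked' and 'failed' states are omitted, and no thread synchronization is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Simplified chunk states (hypothetical names, not VIGRA's).
enum class ToyState { Asleep, Inactive, Active };

// Toy model of the chunk cache: tracks only states and cache membership.
struct ToyChunkCache
{
    std::vector<ToyState> states;   // state of every chunk, indexed by chunk id
    std::list<std::size_t> cache;   // ids of in-memory chunks, oldest first
    std::size_t maxSize;            // nominal maximum number of cached chunks

    ToyChunkCache(std::size_t chunkCount, std::size_t maxCacheSize)
    : states(chunkCount, ToyState::Asleep), maxSize(maxCacheSize)
    {}

    // Mark a chunk as in use. A sleeping chunk is added to the cache first,
    // which may push inactive chunks out.
    void activate(std::size_t id)
    {
        assert(id < states.size());
        bool wasAsleep = (states[id] == ToyState::Asleep);
        states[id] = ToyState::Active;
        if (wasAsleep)
        {
            cache.push_back(id);
            evictInactive();
        }
    }

    // Mark a chunk as unused; it stays in memory until it is evicted.
    void release(std::size_t id)
    {
        assert(id < states.size());
        if (states[id] == ToyState::Active)
            states[id] = ToyState::Inactive;
    }

    // Send the oldest inactive chunks asleep until the cache fits again.
    // If only active chunks remain, the cache stays above maxSize.
    void evictInactive()
    {
        auto it = cache.begin();
        while (cache.size() > maxSize && it != cache.end())
        {
            if (states[*it] == ToyState::Inactive)
            {
                states[*it] = ToyState::Asleep;
                it = cache.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }
};
```

In this model, activating a third chunk while two are active in a cache of nominal size 2 simply lets the cache grow to three entries, which mirrors the temporary size increase mentioned above.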

In order to optimize performance, the user should adjust the cache size (via setCacheMaxSize() or ChunkedArrayOptions) so that it can hold all chunks that are frequently needed (e.g. all chunks forming a row of the full array).
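As a rough illustration of this sizing rule, the following standalone sketch counts how many chunks are needed to cover one axis of the array, which is the number of chunks in a row along that axis. The helper `chunksAlongAxis` is hypothetical and not part of the VIGRA API; it only shows the ceiling division involved.

```cpp
#include <cassert>
#include <cstddef>

// Number of chunks needed to cover 'extent' elements along one axis when
// each chunk holds 'chunkExtent' elements (ceiling division).
// Hypothetical helper for illustration, not part of VIGRA.
std::size_t chunksAlongAxis(std::size_t extent, std::size_t chunkExtent)
{
    assert(chunkExtent > 0);
    return (extent + chunkExtent - 1) / chunkExtent;
}
```

For an array that is 2000 elements wide with a chunk extent of 64, this gives 32 chunks per row, so passing a value of at least 32 to setCacheMaxSize() would be a plausible starting point for row-wise access.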

Another performance critical parameter is the chunk shape. While the system uses sensible defaults (512x512 for 2D arrays, 64x64x64 for 3D, 64x64x16x4 for 4D, and 64x64x16x4x4 for 5D), the shape may need to be adjusted via the array's constructor to match the access patterns of the algorithms to be used. For speed reasons, chunk shapes must be powers of 2.
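One general reason why power-of-2 chunk extents are fast is that the division and modulo needed to locate an element can be replaced by a shift and a bit mask. The sketch below illustrates this technique for a single axis. The helper names are hypothetical, and the code is not claimed to reproduce VIGRA's internal implementation.

```cpp
#include <cassert>
#include <cstddef>

// Base-2 logarithm of a power-of-two chunk extent, e.g. 64 -> 6.
// Hypothetical helper for illustration only.
unsigned log2OfPowerOfTwo(std::size_t chunkExtent)
{
    // A power of two has exactly one bit set.
    assert(chunkExtent != 0 && (chunkExtent & (chunkExtent - 1)) == 0);
    unsigned bits = 0;
    while ((std::size_t(1) << bits) != chunkExtent)
        ++bits;
    return bits;
}

// Index of the chunk containing 'coord' along one axis:
// a right shift replaces the division coord / chunkExtent.
std::size_t chunkIndex(std::size_t coord, unsigned bits)
{
    return coord >> bits;
}

// Offset of 'coord' inside its chunk:
// a bit mask replaces the remainder coord % chunkExtent.
std::size_t offsetInChunk(std::size_t coord, unsigned bits)
{
    return coord & ((std::size_t(1) << bits) - 1);
}
```

With a chunk extent of 64, coordinate 1000 falls into chunk 15 at offset 40, the same result as 1000 / 64 and 1000 % 64.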

The data in the array can be accessed in several ways. The simplest is via calls to checkoutSubarray() and commitSubarray(): These functions copy an arbitrary subregion of a chunked array (possibly straddling many chunks) into a standard MultiArrayView for processing, and write results back into the chunked array:

ChunkedArray<3, float> & chunked_array = ...;
Shape3 roi_start(1000, 500, 500);
MultiArray<3, float> work_array(Shape3(100, 100, 100));
// copy data from region (1000,500,500)...(1100,600,600)
chunked_array.checkoutSubarray(roi_start, work_array);
... // work phase: process data in work_array as usual
// write results back into chunked_array
chunked_array.commitSubarray(roi_start, work_array);

The required chunks in chunked_array will only be active while the checkout and commit calls are executing. During the work phase, other threads can use the chunked array's cache to checkout or commit different subregions.

Alternatively, one can work directly on the chunk storage. This is most easily achieved by means of chunk iterators:

ChunkedArray<3, float> & chunked_array = ...;
// define the ROI to be processed
Shape3 roi_start(100, 200, 300), roi_end(1000, 2000, 600);
// get a pair of chunk iterators ( = iterators over chunks)
auto chunk = chunked_array.chunk_begin(roi_start, roi_end),
     end   = chunked_array.chunk_end(roi_start, roi_end);
// iterate over the chunks in the ROI
for(; chunk != end; ++chunk)
{
    // get a view to the current chunk's data
    // Note: The view actually refers to the intersection of the
    //       current chunk with the ROI. Thus, chunks which are
    //       partially outside the ROI are appropriately trimmed.
    MultiArrayView<3, float> chunk_view = *chunk;
    ... // work phase: process data in chunk_view as usual
}

No memory is duplicated in this approach, and only the current chunk needs to be active, so that a small chunk cache is sufficient. The iteration over chunks can be distributed over several threads that process the array data in parallel. The programmer must make sure that write operations to individual elements are synchronized between threads. This is usually achieved by ensuring that the threads are responsible for non-overlapping regions of the output array.

An even simpler method is direct element access via indexing. However, the chunked array has no control over the access order in this case, so it must potentially activate the present chunk upon each access. This is rather expensive and should only be used for debugging:

ChunkedArray<3, float> & chunked_array = ...;
Shape3 index(100, 200, 300);
// access data at coordinate 'index'
chunked_array.setItem(index, chunked_array.getItem(index) + 2.0);

Two additional APIs provide access in a way compatible with an ordinary MultiArrayView. These APIs should be used in functions that are supposed to work unchanged on both ordinary and chunked arrays. The first possibility is the chunked scan-order iterator:

ChunkedArray<3, float> & chunked_array = ...;
// get a pair of scan-order iterators ( = iterators over elements)
auto iter = chunked_array.begin(),
     end  = chunked_array.end();
// iterate over all array elements
for(; iter != end; ++iter)
{
    // access current element
    *iter = *iter + 2.0;
}

A new chunk must potentially be activated whenever the iterator crosses a chunk boundary. Since the overhead of the activation operation can be amortized over many within-chunk steps, the iteration (excluding the workload within the loop) takes only twice as long as the iteration over an unstrided array using an ordinary StridedScanOrderIterator.

The final possibility is the creation of a MultiArrayView that accesses an arbitrary ROI directly:

ChunkedArray<3, float> & chunked_array = ...;
// define the ROI to be processed
Shape3 roi_start(100, 200, 300), roi_end(1000, 2000, 600);
// create view for ROI
auto view = chunked_array.subarray(roi_start, roi_end);
... // work phase: process view like any ordinary MultiArrayView

Similarly, a lower-dimensional view can be created with one of the bind functions. This approach has the advantage that 'view' can be passed to any function which is implemented in terms of MultiArrayViews. However, there are two disadvantages: First, data access in the view requires two steps (first find the chunk, then find the appropriate element in the chunk), which causes the chunked view to be slower than an ordinary MultiArrayView. Second, all chunks intersected by the view must remain active throughout the view's lifetime, which may require a big chunk cache and thus keeps many chunks in memory.

Member Function Documentation

◆ dataBytes()

template<unsigned int N, class T>
std::size_t dataBytes ( ) const

Bytes of main memory occupied by the array's data.

Compressed chunks are only counted with their compressed size. Chunks swapped out to the hard drive are not counted.

◆ chunkStop()

template<unsigned int N, class T>
shape_type chunkStop ( shape_type global_stop) const

Find the chunk that is beyond array element 'global_stop'.

Specifically, this computes

chunkStart(global_stop - shape_type(1)) + shape_type(1)
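The relation between the two functions can be sketched with plain integer arithmetic. The following standalone example assumes a 3D array and uses the hypothetical names `Coord3`, `chunkStartSketch` and `chunkStopSketch`; it mirrors the formula above but is not VIGRA's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Plain 3D coordinate type for this sketch (hypothetical, not a VIGRA type).
using Coord3 = std::array<std::ptrdiff_t, 3>;

// Chunk coordinate of the chunk that contains element 'globalStart'.
Coord3 chunkStartSketch(Coord3 const & globalStart, Coord3 const & chunkShape)
{
    Coord3 res;
    for (std::size_t k = 0; k < res.size(); ++k)
    {
        assert(chunkShape[k] > 0 && globalStart[k] >= 0);
        res[k] = globalStart[k] / chunkShape[k];
    }
    return res;
}

// Chunk coordinate one past the last chunk touched by an ROI whose
// exclusive upper bound is 'globalStop' (each entry must be >= 1):
// step back to the last element inside the ROI, find its chunk, add one.
Coord3 chunkStopSketch(Coord3 globalStop, Coord3 const & chunkShape)
{
    for (auto & v : globalStop)
        v -= 1;
    Coord3 res = chunkStartSketch(globalStop, chunkShape);
    for (auto & v : res)
        v += 1;
    return res;
}
```

Subtracting one before the division matters when the ROI ends exactly on a chunk boundary: with a chunk extent of 64, a stop coordinate of 128 yields chunk stop 2 rather than 3.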

◆ chunkShape() [1/2]

template<unsigned int N, class T>
shape_type chunkShape ( shape_type const & chunk_index) const

Find the shape of the chunk indexed by 'chunk_index'.

This may differ from the global chunk shape because chunks at the right/lower border of the array may be smaller than usual.
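The border effect can be illustrated with a short standalone sketch for the 2D case. The names `Coord2` and `borderAwareChunkShape` are hypothetical, and the code only demonstrates the clipping arithmetic rather than VIGRA's actual implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Plain 2D coordinate type for this sketch (hypothetical, not a VIGRA type).
using Coord2 = std::array<std::ptrdiff_t, 2>;

// Shape of the chunk at 'chunkIndex': the global chunk shape, clipped so
// that the chunk does not extend beyond the array's domain.
Coord2 borderAwareChunkShape(Coord2 const & arrayShape,
                             Coord2 const & globalChunkShape,
                             Coord2 const & chunkIndex)
{
    Coord2 res;
    for (std::size_t k = 0; k < res.size(); ++k)
    {
        // first array coordinate covered by this chunk along axis k
        std::ptrdiff_t begin = chunkIndex[k] * globalChunkShape[k];
        assert(begin >= 0 && begin < arrayShape[k]);
        res[k] = std::min(globalChunkShape[k], arrayShape[k] - begin);
    }
    return res;
}
```

For a 1000x600 array with 512x512 chunks, the chunk at index (0, 0) has the full shape 512x512, whereas the chunk at index (1, 1) is clipped to 488x88.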

◆ chunkShape() [2/2]

template<unsigned int N, class T>
shape_type const & chunkShape ( ) const

Return the global chunk shape.

This is the shape of all chunks that are completely contained in the array's domain.

◆ releaseChunks()

template<unsigned int N, class T>
void releaseChunks ( shape_type const & start,
shape_type const & stop,
bool destroy = false )

Sends all chunks asleep which are completely inside the given ROI. If destroy == true and the backend supports destruction (currently: ChunkedArrayLazy and ChunkedArrayCompressed), chunks will be deleted entirely. The chunk's contents after releaseChunks() are undefined. Currently, chunks retain their values when sent asleep, and assume the array's fill_value when deleted, but applications should not rely on this behavior.
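To make the phrase "completely inside the given ROI" concrete, the following standalone sketch computes, for a single axis, the range of chunk indices whose chunks lie entirely within [start, stop). The names `ChunkRange` and `chunksFullyInside` are hypothetical, and the sketch ignores the smaller chunks at the array border, so it should be read as an illustration of the idea rather than as the rule VIGRA applies.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [first, beyond) of chunk indices along one axis
// (hypothetical type for this sketch).
struct ChunkRange
{
    std::ptrdiff_t first;
    std::ptrdiff_t beyond;
};

// Chunks that lie completely inside the element range [start, stop):
// round the lower bound up and the upper bound down to chunk boundaries.
ChunkRange chunksFullyInside(std::ptrdiff_t start, std::ptrdiff_t stop,
                             std::ptrdiff_t chunkExtent)
{
    assert(chunkExtent > 0 && 0 <= start && start <= stop);
    ChunkRange r;
    r.first = (start + chunkExtent - 1) / chunkExtent;  // ceiling division
    r.beyond = stop / chunkExtent;                       // floor division
    if (r.beyond < r.first)
        r.beyond = r.first;  // no chunk fits: return an empty range
    return r;
}
```

With a chunk extent of 64, the element range [100, 1000) fully contains chunks 2 to 14, whereas the chunks with indices 1 and 15 are only partially covered and would be left untouched in this model.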

◆ checkoutSubarray()

template<unsigned int N, class T>
template<class U, class Stride>
void checkoutSubarray ( shape_type const & start,
MultiArrayView< N, U, Stride > & subarray ) const

Copy an ROI of the chunked array into an ordinary MultiArrayView.

The ROI's lower bound is given by 'start', its upper bound (in 'beyond' sense) is 'start + subarray.shape()'. Chunks in the ROI are only activated while the read is in progress.

◆ commitSubarray()

template<unsigned int N, class T>
template<class U, class Stride>
void commitSubarray ( shape_type const & start,
MultiArrayView< N, U, Stride > const & subarray )

Copy an ordinary MultiArrayView into an ROI of the chunked array.

The ROI's lower bound is given by 'start', its upper bound (in 'beyond' sense) is 'start + subarray.shape()'. Chunks in the ROI are only activated while the write is in progress.

◆ subarray() [1/2]

template<unsigned int N, class T>
view_type subarray ( shape_type const & start,
shape_type const & stop )

Create a view to the specified ROI.

The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.

◆ subarray() [2/2]

template<unsigned int N, class T>
const_view_type subarray ( shape_type const & start,
shape_type const & stop ) const

Create a read-only view to the specified ROI.

The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.

◆ const_subarray()

template<unsigned int N, class T>
const_view_type const_subarray ( shape_type const & start,
shape_type const & stop ) const

Create a read-only view to the specified ROI.

The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.

◆ getItem()

template<unsigned int N, class T>
value_type getItem ( shape_type const & point) const

Read the array element at index 'point'.

Since the corresponding chunk must potentially be activated first, this function may be slow and should mainly be used in debugging.

◆ setItem()

template<unsigned int N, class T>
void setItem ( shape_type const & point,
value_type const & v )

Write the array element at index 'point'.

Since the corresponding chunk must potentially be activated first, this function may be slow and should mainly be used in debugging.

◆ bindAt()

template<unsigned int N, class T>
MultiArrayView< N-1, T, ChunkedArrayTag > bindAt ( MultiArrayIndex dim,
MultiArrayIndex index ) const

Create a lower dimensional view to the chunked array.

Dimension 'dim' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

◆ bind()

template<unsigned int N, class T>
template<unsigned int M>
MultiArrayView< N-1, T, ChunkedArrayTag > bind ( difference_type_1 index) const

Create a lower dimensional view to the chunked array.

Dimension 'M' (given as a template parameter) is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

◆ bindOuter() [1/2]

template<unsigned int N, class T>
MultiArrayView< N-1, T, ChunkedArrayTag > bindOuter ( difference_type_1 index) const

Create a lower dimensional view to the chunked array.

Dimension 'N-1' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

◆ bindOuter() [2/2]

template<unsigned int N, class T>
template<int M, class Index>
MultiArrayView< N-M, T, ChunkedArrayTag > bindOuter ( const TinyVector< Index, M > & d) const

Create a lower dimensional view to the chunked array.

The M rightmost dimensions are bound to the indices given in 'd'. All chunks intersecting the view remain active throughout the view's lifetime.

◆ bindInner() [1/2]

template<unsigned int N, class T>
MultiArrayView< N-1, T, ChunkedArrayTag > bindInner ( difference_type_1 index) const

Create a lower dimensional view to the chunked array.

Dimension '0' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

◆ bindInner() [2/2]

template<unsigned int N, class T>
template<int M, class Index>
MultiArrayView< N-M, T, ChunkedArrayTag > bindInner ( const TinyVector< Index, M > & d) const

Create a lower dimensional view to the chunked array.

The M leftmost dimensions are bound to the indices given in 'd'. All chunks intersecting the view remain active throughout the view's lifetime.
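The difference between binding inner and outer dimensions is easiest to see in the shape of the resulting view. The standalone sketch below computes only these shapes. The helpers `shapeAfterBindInner` and `shapeAfterBindOuter` are hypothetical and operate on plain vectors, not on VIGRA arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shape that remains after binding the m leftmost (inner) dimensions.
// Hypothetical helper for illustration only.
std::vector<std::size_t> shapeAfterBindInner(std::vector<std::size_t> const & shape,
                                             std::size_t m)
{
    assert(m <= shape.size());
    return std::vector<std::size_t>(shape.begin() + static_cast<std::ptrdiff_t>(m),
                                    shape.end());
}

// Shape that remains after binding the m rightmost (outer) dimensions.
// Hypothetical helper for illustration only.
std::vector<std::size_t> shapeAfterBindOuter(std::vector<std::size_t> const & shape,
                                             std::size_t m)
{
    assert(m <= shape.size());
    return std::vector<std::size_t>(shape.begin(),
                                    shape.end() - static_cast<std::ptrdiff_t>(m));
}
```

For an array of shape (100, 200, 300), binding one inner dimension leaves a view of shape (200, 300), while binding one outer dimension leaves a view of shape (100, 200).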

◆ cacheMaxSize()

template<unsigned int N, class T>
std::size_t cacheMaxSize ( ) const

Get the number of chunks the cache will hold.

If there are any inactive chunks in the cache, these will be sent asleep until the max cache size is reached. The max cache size may be temporarily overridden when more chunks need to be active simultaneously.

◆ setCacheMaxSize()

template<unsigned int N, class T>
void setCacheMaxSize ( std::size_t c)

Set the number of chunks the cache will hold.

This should be big enough to hold all chunks that are frequently needed and must therefore be adapted to the application's access pattern.


The documentation for this class was generated from the following file:

  • vigra/multi_array_chunked.hxx

© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany

html generated using doxygen and Python
vigra 1.12.2 (Mon Apr 14 2025)