C arrays vs pointers – VLA magic

2021-06-17

Note: intermediate knowledge of C is recommended to understand this post (read: comfortable with pointers).

Stack vs Heap and VLAs

Quick question: how can you allocate an array of 5 integers in C? There are two main ways:

There are differences between these alternatives; the most notable are that the storage reserve is done by the compiler in the first version (by reserving stack space), while the second one is done by the runtime (or whichever library implements malloc), which will allocate the space in the heap. This fact has a number of implications, namely that the de-allocation is statically handled and automatic at the end of the scope for the first version, while in the second one we are tasked with managing the deallocation (or we can let the operating system deal with that at the end of the process) 2.

Almost all programmers would choose the first alternative, as it is simpler and faster. However, there are situations where the size is not known beforehand, or it could be too large for a stack allocation. In the case where we do not know the size, but we are confident that it won’t be too large, we can use VLAs - Variable Length Arrays. For instance:

n = random() % 256;
int a[n];
int *b = malloc(sizeof *b * n);

In this case, variable a is an array without a previously known length. As before, the compiler will try to reserve space in the stack, but because it doesn’t know the size at compile time, it emits code to resize the stack at runtime. In comparison, the emited code for the malloc() version is almost the same3; and because tipically the heap is larger than the stack, this solution handles larger arrays. In my experience, the majority of people will reach out for the second version, when faced with uncertain sizes - which typically is a good choice.


Multi-dimensional arrays: exploiting VLA types

Other common topic about arrays is talking about multi-dimensional arrays. For instance, to declare a 3 by 4 matrix, we can just declare it:

double a[3][4];

a[2][3] = 42.0;

Logically, our matrix (or multidimensional array) is layed out like this:

+---------+---------+---------+---------+
| a[0][0] | a[0][1] | a[0][2] | a[0][3] |
+---------------------------------------+
| a[1][0] | a[1][1] | a[1][2] | a[1][3] |
+---------------------------------------+
| a[2][0] | a[2][1] | a[2][2] | a[2][3] |
+---------+---------+---------+---------+

But actually, all memory in current systems follows a linear geometry, so it is stored in this order:

+---------+---------+---------+---------+---------+---------+---------+
| a[0][0] | ....... | a[0][3] | a[1][0] | ....... | ....... | a[2][3] |
+---------+---------+---------+---------+---------+---------+---------+

In order to access the memory position of a[1][0], the compiler takes note that each position of a[3] is comprised of an array of 4 doubles (double [4]), and therefore a[1][0] is in position number 1 * 4 + 0 = 4, whereas a[2][3] is in the position 2 * 4 + 3 = 11, which is the last of 12 positions (we count including position 0, as usual). If, instead of a fixed position, we want to access an runtime-determined one, the compiler will emit code to make these calculations.

A tricky question is: how to instantiate and use a matrix with variable sizes? Again, if we are confident about the size bounds, we can use VLAs:

int height = random() % 256;
int width = random() % 256;

double a[height][width];

But what if we can’t get away just with the stack space? Some alternatives are possible. The most hacky one (probably) is to increase the default stack space on your computer; this “solution” just sweeps the problem under the rug, the fundamental issue is still there.

So, we will have to use the heap, and by extension malloc(). However, malloc just gives us a simple array of bytes, and we want to use more than one dimension. To handle this, the most common solutions are:

The first alternative is coded like this:

double *a = malloc(sizeof *a * height * width);

// To access element (2,3)
a[2 * width + 3] = 42.0;

In terms of memory layout, this is really similar, but instead of letting the compiler emit the necessary code to access a given row and collumn, we do it ourselves.

+---------+---------+---------+---------+---------+---------+---------+
| a[0]    | ....... | a[3]    | a[4]    | ....... | ....... | a[w*h]  |
+---------+---------+---------+---------+---------+---------+---------+

This can be annoying, because we always have to remember to multiply the correct units by the correct dimension, else everything goes haywire4. Extending this example to 3 or higher dimensions is left as an exercise for the reader. Also, we have changed the type of a, from double [][] to double *. Any code that used the previous declaration is going to be changed to accomodate this - which may introduce new bugs.

To avoid this “interface” change, we can create an array of arrays. In fact, what is usually done is declare a as pointer to pointer, and then each position has a pointer to a piece of memory where we can store the actual information. This is confusing to write, so I will just show you how it can be done:

double **a = malloc(sizeof *a * height);

for(int i=0; i<height; ++i) {
    a[i] = malloc(sizeof *a[i] * width);
}

// To access element (2,3)
a[2][3] = 42.0;

This is different than what we were doing previously. Now, instead of having a single continuous array, we have two: array double ** a, where each position points to an array of doubles (double *) representing a row. These rows are allocated one by one, and therefore we do not know if they are contiguous in memory, or even ordered.

                  +---------+---------+---------+---------+
             +--->| a[0][0] | a[0][1] | a[0][2] | a[0][3] |
 +------+    |    +---------+---------+---------+---------+
 | a[0] |----+
 +------+         +---------+---------+---------+---------+
 | a[1] |--+  +-->| a[2][0] | a[2][1] | a[2][2] | a[0][3] |
 +------+  |  |   +---------+---------+---------+---------+
 | a[2] |--|--+
 +------+  |      +---------+---------+---------+---------+
           +----->| a[1][0] | a[1][1] | a[1][2] | a[1][3] |
                  +---------+---------+---------+---------+

And to clean up all these allocations, we will have to go to all rows and free() them one by one. To avoid so many small (and potentially non-contiguous) allocations, a slight improvement can be made, by allocating the “base” memory in one single malloc() call and then compute the correct offsets:

double **a = malloc(sizeof *a * height);
double *_a = malloc(sizeof *_a * height * width);

for(int i=0; i<height; ++i) {
    a[i] = _a + i*width;
}

In this case, the memory layout is similar, but instead of many small allocations, we will have only 1 array per dimension; in our example, this means we have a single allocation for the rows, instead of many small ones - and this single allocation is actually an allocation following the previous logic; What this does is instead of always calculate the offset, we have pre-calculated it and saved in a.

This is a                     This is _a

+------+       +---------+---------+---------+---------+
| a[0] |------>| a[0][0] | a[0][1] | a[0][2] | a[0][3] |
+------+       +---------------------------------------+
| a[1] |------>| a[1][0] | a[1][1] | a[1][2] | a[1][3] |
+------+       +---------------------------------------+
| a[2] |------>| a[2][0] | a[2][1] | a[2][2] | a[2][3] |
+------+       +---------+---------+---------+---------+

Both these last solutions have a big drawback - each dimension introduces at least one new allocated array, and we have to make sure the various levels of pointers are correctly set up before starting to use the array. Also, these intermediate arrays may not be easy to deallocate. To deallocate an array, we have to use free(), which receives as a single argument a pointer, which must point to the first location of the array - it cannot point to the middle.

If, after setting up these arrays, we never touch the order of the rows, to deallocate we need to do:

free(a[0]); // to free _a
free(a);

However, if we change the row order (for instance, because we sorted them), it may happen that a[0] is not _a. If we lose the original value of _a, we can no longer deallocate.

Until know, we don’t have a version as good as the original that used VLAs - either because we become tasked with doing our own access calculations (which can lead to bugs) or because we have to use more memory. Recently5, I realized we can actually use VLA types with heap storage. This means that the usage is completely equal to the VLA version, but we malloc the storage (and therefore are tasked with freeing it as well). It seems monstruous, but is quite elegant:

int (*a)[width] = malloc(sizeof *a * height);

// Now we can just proceed as usual
a[2][3] = 42.0;

// and to free it...
free(a);

In terms of memory layout, is just the same as the VLA version with fixed dimensions, but stored in the heap. Note that the innermost dimension, height, is no longer declared as an array (that would use square brackets), but as a pointer - because we are using a memory location on the heap, and that is a pointer. However, all other dimensions are needed (in this case, width), because if not there would be no way to know at which offset of the base pointer does each collumn start. So, we got the simple interface (not counting with the cryptic type declaration), without the memory overhead and associated bookkeeping.

To emphasize how good this is, let’s extend the example with an extra dimension, the depth. (You may want to try it yourself, as an exercise, before seeing the solution).

This is the “array of array of array” version:

double ***a = malloc(sizeof *a * height);
double **_a = malloc(sizeof *_a * height * width);
double *__a = malloc(sizeof *__a * height * width * depth);

for(int i=0; i<height; ++i) {
    a[i] = _a + i*width;
    for(int j=0; j<width; ++j) {
        a[i][j] = __a + i*width*depth + j*depth;
    }
}

a[2][3][4] = 42.0;

free(a[0][0]); // To free __a
free(a[0]);    // To free _a
free(a);

As said before, lots of bookkeeping, but the usage is simple. The opposite would be the “one-dimensional array and doing offset calculation by hand” approach:

double *a = malloc(sizeof *a * height * width * depth);

a[2*width*depth + 3*depth + 4] = 42.0;

free(a);

Simpler, but less ergonomic in terms of use.

Now, the VLA type version:

double (*a)[width][depth] = malloc(sizeof *a * height);

a[2][3][4] = 42.0;

free(a);

Perfect, or close to it. Actually, when the compiler is faced with this code, it emits code to store the values of width and depth, and when we want to access a memory position it will essentially do what we did in the “1d” approach.

A drawback of this solution is that VLAs were introduced in C99, and where marked as optional in C11, which means some modern compilers do not support VLAs (such as MSVC); also, C++ does not support VLAs without using compiler-specific extensions. Also, the assembly generated by the compiler may be worse, compared with doing the calculations by hand - do a performance comparison for your particular use case. Final drawback - gdb doesn’t know how to compute the memory accesses for VLAs, so in this case you need to cast to a fixed size, and you need to substitute for the numbers themselves, you cannot use a variable. Therefore, when debugging, and if the width is 4 and depth is 5, you have to do:

(gdb) p ((double (*)[4][5]) a)[2][3][4]

instead of the more natural

(gdb) p ((double (*)[width][depth]) a)[2][3][4]

  1. Before allocating an array, it is necessary to know the size of each element, in this case, the size of *b (or b[0]). There are two ways of getting the size of a given variable: get the size of the type of the variable, which is the most common approach, and that would be written as sizeof(int), or ask for the size of a variable. In the compilation step, the compiler will know the type of the variable, and therefore sizeof *b == sizeof(*(int *)) == sizeof(int). Using the variable version is sometimes preferred to avoid mistakes in specifying the type, and this is the version I will use in this post.↩︎

  2. Other difference is that we can use the first alternative verbatim outside of functions, whereas the second alternative has to be broken up - the pointer itself can be declared beforehand, but the call to malloc() has to be done afterwards.↩︎

  3. The multiplication to know the total size can be done in the compile step if n is known at compile time.↩︎

  4. Actually, what prompted me to look into this topic was a bug that appeared when some code was modified to use this alternative instead of VLAs, and the results were different - the root cause was because the height variable was being altered, and because in the VLA version the access offset is not computed using the variable, but the original value, it was not affected by the changing height.↩︎

  5. To be precise, in the night of 16th of June of 2021↩︎