Range Constrained Types in C++

One of the most common software bugs is the humble buffer overflow. In its simplest form this can happen when an array is accessed out of bounds:

cpp
 1 | int arr[5];
 2 | 
 3 | int foo(int i) {
 4 |     return arr[i];
 5 | }
 6 | 
 7 | int main() {
 8 |     return foo(5);
 9 | }

The code above causes 🌩Undefined Behavior🌩TM (or UB for short). That is to say that a standard-compliant compiler can produce code that does literally anything. Roughly speaking this is to give the compiler more leeway in making aggressive optimizations: it means that it can assume UB never happens and optimize the code based on that assumption.

So more likely than not foo will simply compile to machine code that reads and returns the memory at &arr + i (in fact you can click the little green “C” in the corner of the code to see that clang’s codegen goes exactly that). Since the array only occupies memory up to &arr + 4, calling foo like above will try to read the memory of whatever object the compiler decided to place after arr. If that object is always the same, then executing the binary will consistently produce the same return code. Or maybe there is no object after arr and the address right after arr is not mapped to a region that the created process is allowed to read, resulting in a SEGFAULT.

# 1 std::array

In this respect C-style arrays are considered unsafe, and many “modern” C++ style guides suggest using std::array instead. Objects of this type implement the at function which checks that the supplied index is within the bounds of the array. If it isn’t, it throws an exception.

cpp
 1 | #include <array>
 2 | 
 3 | std::array<int, 5> arr;
 4 | 
 5 | int foo(int i) {
 6 |     return arr.at(i);
 7 | }
 8 | 
 9 | int main() {
10 |     return foo(5);
11 | }

Now the program will (predictably!) fail; And of course we could add error handling by catching the exception. Somewhat annoyingly though, there are also lots of folks who consider using exceptions at all to be bad practice. If we compile the code with clang and -fno-exceptions the code above will still compile (and still fail) but whenever any element is accessed out of bounds, the program will call abort killing the process and precluding any error handling.

And besides, who wants bounds checks anyway? We are writing C++ after all. What we want is performance! Every unnecessary CPU cycle brings dishonor to us and our family.

When iterating over an array the fancy new std::array type can still help us out. Using a range based for-loop allows us to access all array elements without bounds checking, and the guarantee that we don’t overflow the buffer.

cpp
 1 | #include <array>
 2 | 
 3 | std::array<int, 5> arr;
 4 | 
 5 | int sum() {
 6 |     int x = 0;
 7 |     for (int el : arr) {
 8 |         x += el;
 9 |     }
10 | 
11 |     return x;
12 | }

But what about accessing just a single element? In some other languages it is possible to create integer subtypes, where only certain values allowed. Then the api for array accesses could explicitly take only elements of that subtype; let’s call it InRange<T, n, m>, where T is an integral type, and n and m are of type T. These objects then carry a value of type T that is somehow guaranteed to be >= n and < m. Leaving aside the question of how to implement this behavior for now, this is what array’s subscript operator could look like:

cpp
 1 | #include <cstddef>
 2 | 
 3 | template<typename T, T n, T m>
 4 | class InRange;
 5 | 
 6 | template<typename T, size_t n>
 7 | class safe_array {
 8 | //...
 9 | public:
10 |     T& operator[](InRange<size_t, 0, n> x) {
11 |         return data[x];
12 |     }
13 | //...
14 | private:
15 |     T data[n];
16 | };

Another way of thinking about this is that we are encoding a precondition of safe_array’s subscript operator in the type system. So how can we design InRange to guarantee this implied precondition? Ideally we would want to be able to use it just like the integral type T that it is built on top of, i.e. being able to do arithmetic, performing comparisons, etc.

# 2 Constraining the argument

For one thing whenever we create a new InRange object or assign to an existing one from an arbitrary integer we need to obviously check that the constraint is satisfied:

cpp
 1 | template<typename T, T n, T m>
 2 | class InRange {
 3 | public:
 4 |     InRange(T value) : x(value) {
 5 |         if (!check_constraint())
 6 |             throw 1;
 7 |     }
 8 |     InRange<T, n, m>& operator=(T value) {
 9 |         x = value;
10 |         if (!check_constraint())
11 |             throw 1;
12 |         return *this;
13 |     }
14 | 
15 | private:
16 |     bool check_constraint() {
17 |         return (x >= n && x < m);
18 |     }
19 |     T x;
20 | };

With the single-argument constructor objects of type T can now also implicitly convert to InRange<T, ...> , meaning the introductory example already almost works with our custom integer subtype. The only thing missing is some way of retrieving the x from within InRange. By adding a conversion operator back to T we can both fix this and partly achieve our objective of being able to use InRange<T, ...> just like T:

cpp
 1 | operator T() {
 2 |     return x;
 3 | }

The code below will now compile:

cpp
 1 | #include "static/in_range_naive.h"
 2 | 
 3 | safe_array<int, 5> arr;
 4 | 
 5 | int main() {
 6 |     InRange<int, 0, 5> var(3);
 7 |     var = var + 1;
 8 |     return arr[5] + var;
 9 | }

… however, we still have all the same problems as with std::array and .at: a runtime check takes place when the 5 is implicitly converted to InRange<size_t, 0, 5>, and when it fails an exception is thrown that has to either be caught or else the process will terminate.

We also have runtime checks on the creation of, and assignment to var. While any half-descent optimizer will likely be able to get rid of these here (after all the compiler can see all the integer literals involved in the calculations) this is not always the case and we would like a stronger guarantee that the information available at compile-time is in fact used to eliminate unnecessary checks.

Ideally it should also realize at compile-time that the access to arr[5] is illegal, refusing to compile, instead of throwing an exception at run-time.

# 3 Compile-time checks

Since C++ 20 there is a very simple solution: add the consteval keyword to the constructor, and assignment operator of InRange:

cpp
 1 | template<typename T, T n, T m>
 2 | class InRange {
 3 | public:
 4 |     consteval InRange(T value) : x(value) {
 5 |         if (!check_constraint())
 6 |             throw 1;
 7 |     }
 8 |     consteval InRange<T, n, m>& operator=(T value) {
 9 |         x = value;
10 |         if (!check_constraint())
11 |             throw 1;
12 |         return *this;
13 |     }
14 | 
15 | private:
16 |     constexpr bool check_constraint() {
17 |         return (x >= n && x < m);
18 |     }
19 |     T x;
20 | };

This guarantees that the functions will be called at a compile time, and when it throws this results in a compilation failure. Try compiling the same code as before in compiler explorer this time with our new implementation:

cpp
 1 | #include "static/in_range.h"
 2 | 
 3 | safe_array<int, 5> arr;
 4 | 
 5 | int main() {
 6 |     InRange<int, 0, 5> var(3);
 7 |     return arr[5] + var;
 8 | }