One of the most common software bugs is the humble buffer overflow. In its simplest form this can happen when an array is accessed out of bounds:
1 | int arr[5]; 2 | 3 | int foo(int i) { 4 | return arr[i]; 5 | } 6 | 7 | int main() { 8 | return foo(5); 9 | }
The code above causes 🌩Undefined Behavior🌩TM (or UB for short). That is to say that a standard-compliant compiler can produce code that does literally anything. Roughly speaking this is to give the compiler more leeway in making aggressive optimizations: it means that it can assume UB never happens and optimize the code based on that assumption.
So more likely than not foo will simply compile to
machine code that reads and returns the memory at
&arr + i (in fact you can click the little green “C” in
the corner of the code to see that clang’s codegen goes exactly that).
Since the array only occupies memory up to &arr + 4,
calling foo like above will try to read the memory of
whatever object the compiler decided to place after arr. If
that object is always the same, then executing the binary will
consistently produce the same return code. Or maybe there is no object
after arr and the address right after arr is
not mapped to a region that the created process is allowed to read,
resulting in a SEGFAULT.
# 1 std::array
In this respect C-style arrays are considered unsafe, and many
“modern” C++ style guides suggest using std::array instead.
Objects of this type implement the at function which checks
that the supplied index is within the bounds of the array. If it isn’t,
it throws an exception.
1 | #include <array> 2 | 3 | std::array<int, 5> arr; 4 | 5 | int foo(int i) { 6 | return arr.at(i); 7 | } 8 | 9 | int main() { 10 | return foo(5); 11 | }
Now the program will (predictably!) fail; And of course we could add
error handling by catching the exception. Somewhat annoyingly though,
there are also lots of folks who consider using exceptions at all to be
bad
practice. If we compile the code with clang and
-fno-exceptions the code above will still compile (and
still fail) but whenever any element is accessed out of bounds, the
program will call abort killing the process and precluding
any error handling.
And besides, who wants bounds checks anyway? We are writing C++ after all. What we want is performance! Every unnecessary CPU cycle brings dishonor to us and our family.
When iterating over an array the fancy new
std::array type can still help us out. Using a range based
for-loop allows us to access all array elements without bounds checking,
and the guarantee that we don’t overflow the buffer.
1 | #include <array> 2 | 3 | std::array<int, 5> arr; 4 | 5 | int sum() { 6 | int x = 0; 7 | for (int el : arr) { 8 | x += el; 9 | } 10 | 11 | return x; 12 | }
But what about accessing just a single element? In some other
languages it is possible to create integer subtypes, where only certain
values allowed. Then the api for array accesses could explicitly take
only elements of that subtype; let’s call it
InRange<T, n, m>, where T is an integral
type, and n and m are of type T.
These objects then carry a value of type T that is somehow
guaranteed to be >= n and < m. Leaving
aside the question of how to implement this behavior for now, this is
what array’s subscript operator could look like:
1 | #include <cstddef> 2 | 3 | template<typename T, T n, T m> 4 | class InRange; 5 | 6 | template<typename T, size_t n> 7 | class safe_array { 8 | //... 9 | public: 10 | T& operator[](InRange<size_t, 0, n> x) { 11 | return data[x]; 12 | } 13 | //... 14 | private: 15 | T data[n]; 16 | };
Another way of thinking about this is that we are encoding a precondition
of safe_array’s subscript operator in the type system. So
how can we design InRange to guarantee this implied
precondition? Ideally we would want to be able to use it just like the
integral type T that it is built on top of, i.e. being able
to do arithmetic, performing comparisons, etc.
# 2 Constraining the argument
For one thing whenever we create a new InRange object or
assign to an existing one from an arbitrary integer we need to obviously
check that the constraint is satisfied:
1 | template<typename T, T n, T m>
2 | class InRange {
3 | public:
4 | InRange(T value) : x(value) {
5 | if (!check_constraint())
6 | throw 1;
7 | }
8 | InRange<T, n, m>& operator=(T value) {
9 | x = value;
10 | if (!check_constraint())
11 | throw 1;
12 | return *this;
13 | }
14 |
15 | private:
16 | bool check_constraint() {
17 | return (x >= n && x < m);
18 | }
19 | T x;
20 | };With the single-argument constructor objects of type T
can now also implicitly convert to InRange<T, ...> ,
meaning the introductory example already almost works with our custom
integer subtype. The only thing missing is some way of retrieving the
x from within InRange. By adding a conversion
operator back to T we can both fix this and partly achieve
our objective of being able to use InRange<T, ...>
just like T:
1 | operator T() {
2 | return x;
3 | }The code below will now compile:
1 | #include "static/in_range_naive.h" 2 | 3 | safe_array<int, 5> arr; 4 | 5 | int main() { 6 | InRange<int, 0, 5> var(3); 7 | var = var + 1; 8 | return arr[5] + var; 9 | }
… however, we still have all the same problems as with
std::array and .at: a runtime check takes
place when the 5 is implicitly converted to
InRange<size_t, 0, 5>, and when it fails an exception
is thrown that has to either be caught or else the process will
terminate.
We also have runtime checks on the creation of, and assignment to
var. While any half-descent optimizer will likely be able
to get rid of these here (after all the compiler can see all the integer
literals involved in the calculations) this is not always the case and
we would like a stronger guarantee that the information available at
compile-time is in fact used to eliminate unnecessary checks.
Ideally it should also realize at compile-time that the
access to arr[5] is illegal, refusing to compile, instead
of throwing an exception at run-time.
# 3 Compile-time checks
Since C++ 20 there is a very simple solution: add the
consteval keyword to the constructor, and assignment
operator of InRange:
1 | template<typename T, T n, T m>
2 | class InRange {
3 | public:
4 | consteval InRange(T value) : x(value) {
5 | if (!check_constraint())
6 | throw 1;
7 | }
8 | consteval InRange<T, n, m>& operator=(T value) {
9 | x = value;
10 | if (!check_constraint())
11 | throw 1;
12 | return *this;
13 | }
14 |
15 | private:
16 | constexpr bool check_constraint() {
17 | return (x >= n && x < m);
18 | }
19 | T x;
20 | };This guarantees that the functions will be called at a compile time, and when it throws this results in a compilation failure. Try compiling the same code as before in compiler explorer this time with our new implementation:
1 | #include "static/in_range.h" 2 | 3 | safe_array<int, 5> arr; 4 | 5 | int main() { 6 | InRange<int, 0, 5> var(3); 7 | return arr[5] + var; 8 | }