We’re spraying spaces, surfaces and our hands way more often, so why not sanitize our code while we’re at it? After all, software runs the world, and bugs that cause programs to malfunction can cause serious damage – much like their viral counterparts. 

If you’re developing in C or C++, you know this all too well. It’s easy to allocate a piece of memory and forget to free it later, or to accidentally write past the end of a buffer. These issues are extremely hard to find without proper tools and often cause sporadic, sudden crashes.

Using sanitizers as you’re building and testing your program can help you catch a great deal of issues in your source code early on, including memory leaks, buffer overflows and undefined behavior. 

Today, we’ll be taking a look at three types of Clang sanitizers, how they’re used and what bugs they can help us nip in the bud.

Let’s spray away!

Cleaning up your address space with AddressSanitizer (ASan)

AddressSanitizer (ASan for short) is used for detecting use-after-free, double-free, buffer (stack, heap and global buffer) overflows and underflows, along with other memory errors. 

It consists of a compiler instrumentation module, which adds a check before each memory access, and a run-time library that replaces malloc and free. The library inserts red zones around each block of bytes allocated with malloc, poisons freed bytes, and keeps track of the call stack for each malloc/free pair.

This is what our code looks like without ASan:

code-without-asan

And here’s what ASan does to it in order to detect address related bugs:

code-with-asan
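To make the idea of red zones and poisoning more concrete, here is a minimal toy model of the check that is conceptually performed before each access. It is only a sketch: ToyArena and its member functions are hypothetical names invented for this illustration, it keeps one flag per byte, and it returns false where real ASan would print a report and abort. Real ASan uses a far more compact shadow encoding and has the compiler insert the checks for you.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Toy model only: one "poisoned" flag per byte of a small fixed arena.
// Real ASan uses a compact shadow encoding and compiler-inserted checks.
class ToyArena
{
public:
    ToyArena() { poisoned_.fill( true ); } // everything starts out poisoned

    // Unpoisons `size` bytes and leaves a poisoned red zone on either side.
    // Returns the offset of the first usable byte.
    std::size_t allocate( std::size_t const size )
    {
        std::size_t const begin{ next_ + redZone };
        if ( begin + size + redZone > arenaSize ) { throw std::length_error{ "toy arena exhausted" }; }
        for ( std::size_t i{ begin }; i < begin + size; ++i ) { poisoned_[ i ] = false; }
        next_ = begin + size;
        return begin;
    }

    // Models free(): the bytes become poisoned again, so later use is caught.
    void release( std::size_t const begin, std::size_t const size )
    {
        for ( std::size_t i{ begin }; i < begin + size && i < arenaSize; ++i ) { poisoned_[ i ] = true; }
    }

    // The check that instrumentation conceptually adds before each write.
    // Returns false where real ASan would report an error.
    bool write( std::size_t const offset, char const value )
    {
        if ( offset >= arenaSize || poisoned_[ offset ] ) { return false; }
        bytes_[ offset ] = value;
        return true;
    }

private:
    static constexpr std::size_t arenaSize{ 256u };
    static constexpr std::size_t redZone{ 8u };

    std::array< char, arenaSize > bytes_{};
    std::array< bool, arenaSize > poisoned_{};
    std::size_t next_{ 0u };
};
```

In this model, writing one byte past either end of an allocation lands in a red zone and is rejected, and so is any write to a block after release() has been called on it.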

Using ASan is as simple as passing -fsanitize=address to both the compiler and the linker. You can also set a large number of run-time options via the ASAN_OPTIONS environment variable (the ASan documentation has a full list of the flags you can use).

To truly grasp how useful this tool is, here’s a little test for you. Try finding a bug in the code snippet below:

char const * src{ "Hello world!" };
auto const dst{ std::make_unique< char[] >( std::strlen( src ) ) };

std::strcpy( dst.get(), src );
std::puts( dst.get() );

It’s hard, isn’t it? Now, imagine this bug was part of a much larger codebase. It’d take us ages to debug it by hand, whereas with ASan turned on, we can identify the hidden heap buffer overflow in seconds (see the demo):

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001c at pc 0x00000044b754 bp 0x7ffc9ea586d0 sp 0x7ffc9ea57e80
WRITE of size 13 at 0x60200000001c thread T0
#0 0x44b753 (/app/output.s+0x44b753)
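The report points at the root cause: std::strlen does not count the terminating '\0', so the buffer is 12 bytes long while std::strcpy writes 13. One possible fix is sketched below; copyString is a hypothetical helper name used only for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: returns a heap copy of `src`, including its terminating '\0'.
std::unique_ptr< char[] > copyString( char const * const src )
{
    std::size_t const size{ std::strlen( src ) + 1u }; // +1 for the terminator
    auto dst{ std::make_unique< char[] >( size ) };
    std::memcpy( dst.get(), src, size );
    return dst;
}
```

In most real code, reaching for std::string instead of a raw char buffer would sidestep this class of bug altogether.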

ASan will also warn you if your program continues to use a pointer after it’s been freed (a common security vulnerability called a use-after-free error). Here’s an example program containing the bug:

constexpr std::size_t bodyOffset{ 96u };
char * message{ new char[ 1024 ] };

// fill in the message

char * bodyBegin{ message + bodyOffset };

delete [] message;

// many lines of code ...

std::puts( bodyBegin ); // heap-use-after-free bug

And here’s a report you get from ASan after testing the code:

==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x6190000000e0 at pc 0x000000470a69 bp 0x7ffe047e6b90 sp 0x7ffe047e6340
READ of size 929 at 0x6190000000e0 thread T0

Pretty neat, right?

Now, you might be asking yourself: how much does this cost? Well, nothing comes free in today’s world, and unfortunately, the same is true for ASan. 

Still, if you consider the amount of time this tool could save you, it’s not that expensive at all. The average overhead is ~2x both performance-wise and on memory usage. 

Speaking of performance, ASan’s main competitor, Valgrind, can slow your program down by anywhere from 10x to 100x. That’s a solid improvement in our book.

Now that we have cleaned up our address space, let’s do some more sanitization of our memory.

Detecting uninitialized memory reads with MemorySanitizer (MSan)

You might think that AddressSanitizer covers all memory-related bugs, but that’s not the case. It doesn’t handle uninitialized memory reads, which is where another sanitizer called MemorySanitizer (MSan for short) comes in. 

MSan will let you copy uninitialized memory and perform simple logical and arithmetical operations on it, but when you try to use it in a decision-making statement, you’ll get a red flag. In order to understand this better, take a look at the following code snippet (demo):

int *src{ new int[ 5 ] };
int dst[ 5 ];

std::memcpy( dst, src, sizeof( dst ) ); // copying uninitialized memory, OK

if ( src[ 0 ] ) // MSan warning: use-of-uninitialized-value
{
// more code ...
}

As you can see, MSan will warn us of any uninitialized value in use. The memory leak you might’ve noticed (src is never freed) won’t get flagged, because leak detection is part of what ASan deals with, not MSan. It’s important that we don’t confuse the two.

To trace an uninitialized value back to its origin, add the -fsanitize-memory-track-origins flag alongside -fsanitize=memory.

Now, let’s talk a bit about how MSan actually works. It implements a bit-to-bit shadow mapping, as shown in the figure below, where 1 marks a “poisoned” (uninitialized) bit. This allows for very efficient computation of the shadow memory address: given the application memory address ProductAddr, the shadow address ShadowAddr is computed as ProductAddr & ShadowMask, where ShadowMask is a platform-specific constant.

bit-to-bit-shadow-mapping

Whenever a poisoned bit influences an observable side effect (e.g. the outcome of a branch), a warning is raised. Maintaining the shadow memory introduces roughly a 2.5x CPU and 2x memory overhead. The overhead is higher if memory origins are tracked too: about 5x on CPU and 3x on memory.
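The shadow mapping and the "copy freely, complain on use" rule can be sketched in a few lines. This is a toy model, not MSan's implementation: exampleShadowMask is an illustrative value chosen for this sketch (the real constant depends on the platform and the MSan version), and Tracked, bitXor and wouldReport are hypothetical names.

```cpp
#include <cstdint>

// Illustrative mask only; the real constant depends on platform and MSan version.
constexpr std::uint64_t exampleShadowMask{ ~0x400000000000ull };

// ShadowAddr = ProductAddr & ShadowMask, as described in the text.
constexpr std::uint64_t shadowAddress( std::uint64_t const productAddr )
{
    return productAddr & exampleShadowMask;
}

// Toy model of one 32-bit value and its shadow: a set shadow bit means
// the corresponding value bit is uninitialized ("poisoned").
struct Tracked
{
    std::uint32_t value;
    std::uint32_t shadow;
};

// XOR propagates poison: a result bit is poisoned if either input bit is.
constexpr Tracked bitXor( Tracked const a, Tracked const b )
{
    return Tracked{ a.value ^ b.value, a.shadow | b.shadow };
}

// Models the check before a branch: any poisoned bit triggers a report.
constexpr bool wouldReport( Tracked const condition )
{
    return condition.shadow != 0u;
}
```

Copying a Tracked value simply carries its shadow along without complaint, and bitXor passes the poison on to its result. Only wouldReport, standing in for a branch on the value, turns poisoned bits into a warning.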

Catching undefined behavior with UndefinedBehaviorSanitizer (UBSan)

Last but not least, UndefinedBehaviorSanitizer (UBSan for short). 

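UBSan will catch signed integer overflow, use of null pointers, division by zero and other undefined behavior as you’re executing your program. The -fsanitize=undefined compiler flag enables most of these checks at once, and individual checks can also be switched on separately to hunt for more specific bugs. Note that -fsanitize=unsigned-integer-overflow is not part of -fsanitize=undefined: unsigned overflow is well-defined in C++, though often unintentional, so this check has to be enabled explicitly. The available flags include: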

-fsanitize=bounds
-fsanitize=vptr
-fsanitize=enum
-fsanitize=signed-integer-overflow
-fsanitize=null
-fsanitize=unsigned-integer-overflow
-fsanitize=return
-fsanitize=integer-divide-by-zero
-fsanitize=unreachable
-fsanitize=alignment

UBSan deals with simple bugs like the one shown in the following example quite well (demo):

#include <limits>

int main()
{
    int m = std::numeric_limits< int >::max();
    return m + 1; // signed integer overflow: undefined behavior
}

And here’s the report we get:

runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
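Once UBSan has pointed out an overflow like this, one way to deal with it is to check the operands before adding. The sketch below assumes C++17 (for std::optional); checkedAdd is a hypothetical helper name, not part of any library.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the sum
// cannot be represented in an int. The checks themselves never overflow.
std::optional< int > checkedAdd( int const a, int const b )
{
    if ( b > 0 && a > std::numeric_limits< int >::max() - b ) { return std::nullopt; }
    if ( b < 0 && a < std::numeric_limits< int >::min() - b ) { return std::nullopt; }
    return a + b;
}
```

With this helper, adding 1 to std::numeric_limits< int >::max() yields an empty optional instead of undefined behavior, and the caller decides how to handle it.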

UBSan really shows its worth in day-to-day development and in large codebases. Regarding performance, there’s roughly a 1.25x CPU overhead and no impact on memory usage (UBSan doesn’t affect address space layout).

Downsides to consider

Now that we’ve seen how sanitizers can help us build better code, we still need to mention a couple of downsides of using them. 

First, unlike Valgrind, which performs all of its checks on an unmodified binary, sanitizers require your code to be recompiled. This might take some time, especially if you work on a large codebase.

The second (and arguably more important) downside is that ASan and MSan can’t be enabled in the same build. This means that you’ll need separate builds and multiple runs to test your software which, again, can take quite some time. Maybe this is the reason why sanitizers still don’t get much love from the C++ community.

sanitizers-in-builds
Meeting C++ survey results show a large number of developers aren’t using sanitizers in their builds at all.

Despite their drawbacks, we strongly recommend using sanitizers. They will catch issues that may look safe to competing tools and they won’t break your workflow with hefty slowdowns. 

It makes perfect sense then to integrate them with your ‘fast’ development processes like continuous integration or pull request pipelines. 

However, let’s not dismiss Valgrind just yet, as it can still be of much use to us. We’ll be talking about its advantages in a future post.

That’s all from us for now — stay tuned for more interesting posts coming up and remember that building reliable software starts with clean code.