Be wise, sanitize: Keeping your C++ code free from bugs

May 28, 2021

For all of the losses it has inflicted, this pandemic has at least made us more conscious about our personal hygiene.

We’re spraying spaces, surfaces and our hands way more often, so why not sanitize our code while we’re at it? After all, software runs the world, and bugs that cause programs to malfunction can cause serious damage – much like their viral counterparts. 

If you’re developing in C and C++, you know this all too well. It’s easy to allocate a piece of memory and forget to free it later, or accidentally write past the memory buffer. These issues are extremely hard to find without proper tools and often cause sporadic, sudden crashes.  

Using sanitizers as you’re building and testing your program can help you catch a great many issues in your source code early on, including memory leaks, buffer overflows and undefined behavior. 

Today, we’ll be taking a look at three types of Clang sanitizers, how they’re used and what bugs they can help us nip in the bud.

Let’s spray away!

Cleaning up your address space with AddressSanitizer (ASan)

AddressSanitizer (ASan for short) is used for detecting use-after-free, double-free, buffer (stack, heap and global buffer) overflows and underflows, along with other memory errors. 

It consists of both a compiler instrumentation module and a run-time library that inserts red zones around each set of bytes allocated with the malloc function. It also poisons the freed bytes and keeps track of the call stack for each malloc/free pair.

This is what our code looks like without ASan:

And here’s what ASan does to it in order to detect address related bugs:
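
The principle can be illustrated with a toy model. The sketch below is not how ASan is implemented: `ToyHeap`, `allocate`, `release` and `checkedWrite` are hypothetical names, and one flag per byte stands in for ASan’s compact shadow memory. It only shows the idea of red zones, poisoned freed bytes, and a check placed in front of every store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t arenaSize{ 64u };
constexpr std::size_t redZoneSize{ 8u };

// Toy model, not ASan itself: one "poisoned" flag per byte of a tiny arena.
struct ToyHeap
{
    std::array< char, arenaSize > bytes{};
    std::array< bool, arenaSize > poisoned{};
    std::size_t next{ 0u };

    ToyHeap() { poisoned.fill( true ); } // nothing is addressable at first

    // Hands out a block and leaves a poisoned red zone in front of and behind it.
    std::size_t allocate( std::size_t const size )
    {
        std::size_t const begin{ next + redZoneSize };
        assert( begin + size + redZoneSize <= arenaSize ); // toy arena must not run out
        for ( std::size_t i{ begin }; i < begin + size; ++i ) { poisoned[ i ] = false; }
        next = begin + size;
        return begin;
    }

    // Freed bytes are poisoned again, so later accesses are caught.
    void release( std::size_t const begin, std::size_t const size )
    {
        for ( std::size_t i{ begin }; i < begin + size; ++i ) { poisoned[ i ] = true; }
    }

    // The kind of check that instrumentation places before a store.
    // Returns false where a real sanitizer would print a report.
    bool checkedWrite( std::size_t const offset, char const value )
    {
        if ( offset >= arenaSize || poisoned[ offset ] ) { return false; }
        bytes[ offset ] = value;
        return true;
    }
};
```

In this model, writing one byte past the end of a 12-byte block lands in the red zone, and any write after `release` lands on poisoned memory. In both cases `checkedWrite` returns false instead of silently corrupting the arena.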

Using ASan is as simple as passing -fsanitize=address as both a compiler and a linker flag. You can also set a large number of run-time options via the ASAN_OPTIONS environment variable (here is a full list of ASan flags you can use).
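
Besides the environment variable, the ASan documentation describes a hook for baking default run-time options into the binary. A minimal sketch, assuming a Clang toolchain; the file name in the build comment and the particular options chosen here are illustrative only:

```cpp
// Build with the flag on both the compile and the link step, for example:
//   clang++ -g -fsanitize=address -fno-omit-frame-pointer example.cpp -o example
// (example.cpp is a placeholder name.)

// The ASan run-time looks this function up at startup. Options passed through
// the ASAN_OPTIONS environment variable take precedence over these defaults.
extern "C" char const * __asan_default_options()
{
    return "detect_leaks=1:detect_stack_use_after_return=1";
}
```

Keeping the defaults in the source means every developer and CI job gets the same baseline without having to remember to export the variable.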

To truly grasp how useful this tool is, here’s a little test for you. Try finding a bug in the code snippet below:

char const * src{ "Hello world!" };
auto const dst{ std::make_unique< char[] >( std::strlen( src ) ) };

std::strcpy( dst.get(), src );
std::puts( dst.get() );

It’s hard, isn’t it? Now, imagine this bug was part of a much larger codebase. It would take ages to debug by hand, whereas with ASan turned on, we can pinpoint the hidden heap buffer overflow in seconds (see the demo):

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001c at pc 0x00000044b754 bp 0x7ffc9ea586d0 sp 0x7ffc9ea57e80
WRITE of size 13 at 0x60200000001c thread T0
#0 0x44b753 (/app/output.s+0x44b753)
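
The report points at a 13-byte write into a smaller buffer: `std::strlen` does not count the terminating null character, so the allocation is one byte short. One possible fix, sketched with a hypothetical helper named `duplicate`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: returns an owning copy of a C string.
std::unique_ptr< char[] > duplicate( char const * const src )
{
    assert( src != nullptr );
    std::size_t const size{ std::strlen( src ) + 1u }; // + 1 for the terminating '\0'
    auto dst{ std::make_unique< char[] >( size ) };
    std::memcpy( dst.get(), src, size ); // copies the terminator as well
    return dst;
}
```

In application code, `std::string` removes the need for this kind of manual size arithmetic altogether.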

ASan will also warn you if your program continues to use a pointer after it’s been freed (a common security vulnerability called a use-after-free error). Here’s an example program containing the bug:

constexpr std::size_t bodyOffset{ 96u };
char * message{ new char[ 1024 ] };

// fill in the message

char * bodyBegin{ message + bodyOffset };

delete [] message;

// many lines of code ...

std::puts( bodyBegin ); // heap-use-after-free bug

And here’s a report you get from ASan after testing the code:

==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x6190000000e0 at pc 0x000000470a69 bp 0x7ffe047e6b90 sp 0x7ffe047e6340
READ of size 929 at 0x6190000000e0 thread T0

Pretty neat, right?
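
One way to avoid this class of bug is to copy what you need while the buffer is still alive and let an owning type manage the memory. A minimal sketch, where `extractBody` is a hypothetical helper and the message contents are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

constexpr std::size_t bodyOffset{ 96u };
constexpr std::size_t messageSize{ 1024u };

std::string extractBody( char const * const body )
{
    assert( body != nullptr );
    std::size_t const bodyLength{ std::strlen( body ) };
    assert( bodyOffset + bodyLength < messageSize ); // leave room for '\0'

    // make_unique value-initializes the array, so every byte starts as '\0'.
    auto const message{ std::make_unique< char[] >( messageSize ) };
    std::memcpy( message.get() + bodyOffset, body, bodyLength ); // fill in the message

    // Copy the body into a std::string while the buffer is still alive.
    return std::string{ message.get() + bodyOffset };
} // the buffer is released here; the returned string owns its own storage
```

Because the returned `std::string` holds its own copy, no pointer into the released buffer survives the function.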

Now, you might be asking yourself: how much does this cost? Well, nothing comes free in today’s world, and unfortunately, the same is true for ASan. 

Still, if you consider the amount of time this tool could save you, it’s not that expensive at all. The average overhead is ~2x both performance-wise and on memory usage. 

Speaking of performance, ASan’s main competitor, Valgrind, slows your program down by a factor of 10 to 100. That’s a solid improvement in our book.

Now that we have cleaned up our address space, let’s do some more sanitization of our memory.

Detecting uninitialized memory reads with MemorySanitizer (MSan)

You might think that AddressSanitizer covers all memory-related bugs, but that’s not the case. It doesn’t handle uninitialized memory reads, which is where another sanitizer called MemorySanitizer (MSan for short) comes in. 

MSan will let you copy uninitialized memory and perform simple logical and arithmetical operations on it, but when you try to use it in a decision-making statement, you’ll get a red flag. In order to understand this better, take a look at the following code snippet (demo):

int *src{ new int[ 5 ] };
int dst[ 5 ];

std::memcpy( dst, src, sizeof dst ); // copying uninitialized memory, OK

if ( src[ 0 ] ) // MSan warning: use-of-uninitialized-value
{
// more code ...
}

As you can see, MSan will warn us of any uninitialized value in use. That memory leak you might’ve noticed won’t get flagged up because that’s a part of what ASan deals with — it’s important that we don’t get confused by this.

To trace an uninitialized value back to its origin, use the -fsanitize-memory-track-origins flag along with -fsanitize=memory.

Now, let’s talk a bit about how MSan actually works. It implements a bit-to-bit shadow mapping, as shown in the figure below, where 1 means a “poisoned”, or uninitialized, bit. This allows for very efficient computation of the shadow memory address. Given the application memory address ProductAddr, the computed ShadowAddr is ProductAddr & ShadowMask, where ShadowMask is a platform-specific constant.
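
The mapping can be sketched in a few lines. The mask value below is an assumption chosen for illustration only (it clears a single high bit); the real constant depends on the platform and the MSan version. `shadowAddress` and `readsPoison` are hypothetical names.

```cpp
#include <cstdint>

// Assumed value for illustration: clears bit 46 of a 64-bit address.
constexpr std::uint64_t ShadowMask{ ~( std::uint64_t{ 1 } << 46 ) };

// ShadowAddr = ProductAddr & ShadowMask
constexpr std::uint64_t shadowAddress( std::uint64_t const productAddr )
{
    return productAddr & ShadowMask;
}

// Bit-to-bit mapping: a set shadow bit marks the matching application bit as
// uninitialized. A use is suspicious only if it touches at least one such bit.
constexpr bool readsPoison( std::uint8_t const shadowByte, std::uint8_t const usedBits )
{
    return ( shadowByte & usedBits ) != 0;
}

static_assert( shadowAddress( 0x7fff00001234 ) == 0x3fff00001234, "bit 46 is cleared" );
```

A single bitwise operation per access is what keeps the address computation cheap; no lookup table or search is involved.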

Whenever a poisoned bit is used in a way that has a side effect (e.g. in branching), a warning is raised. Maintaining this shadow state introduces roughly a 2.5x CPU and 2x memory overhead. The overhead is higher if memory origins are tracked too: 5x on CPU and 3x on memory, to be specific.

Catching undefined behavior with UndefinedBehaviorSanitizer (UBSan)

Last but not least, UndefinedBehaviorSanitizer (UBSan for short). 

UBSan catches signed integer overflow, null pointer dereferences, division by zero and other undefined behavior as your program runs. Apart from the -fsanitize=undefined compiler flag, which enables most of the checks at once, you can turn on individual checks to hunt for specific bugs. Note that unsigned integer overflow is well defined in C++, so its check is not part of -fsanitize=undefined and has to be requested explicitly. The individual flags include:

-fsanitize=bounds
-fsanitize=vptr
-fsanitize=enum
-fsanitize=signed-integer-overflow
-fsanitize=null
-fsanitize=unsigned-integer-overflow
-fsanitize=return
-fsanitize=integer-divide-by-zero
-fsanitize=unreachable
-fsanitize=alignment

As you can see, UBSan deals with simple bugs like the one shown in the following example quite well (demo):

#include <limits>

int main()
{
      int m = std::numeric_limits< int >::max();
      return m + 1; // signed integer overflow: undefined behavior
}

And here’s the report we get:

runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
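
Once the overflow is reported, the usual fix is to test the operands before adding. A minimal sketch, with `additionOverflows` as a hypothetical helper name:

```cpp
#include <limits>

// Returns true when a + b cannot be represented in an int.
// The comparisons themselves never overflow.
constexpr bool additionOverflows( int const a, int const b )
{
    return ( b > 0 && a > std::numeric_limits< int >::max() - b )
        || ( b < 0 && a < std::numeric_limits< int >::min() - b );
}
```

Calling `additionOverflows( m, 1 )` before computing `m + 1` lets the program take a well-defined path, such as returning an error, instead of overflowing.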

UBSan will demonstrate its power best in day-to-day development as well as in large codebases. Regarding performance, there’s roughly a 1.25x CPU overhead and no impact on memory usage at all (UBSan doesn’t affect address space layout).

Downsides to consider

Now that we’ve seen how sanitizers can help us build better code, we still need to mention a couple of downsides of using them. 

First, unlike Valgrind, which does all its checks with no need for recompilation, sanitizers require your code to be rebuilt with instrumentation. This might take some time, especially if you work on a large codebase. 

The second (and arguably more important) downside is that ASan and MSan can’t be used together in the same build. This means that you’ll need to perform multiple builds and runs to test your software which, again, can take quite some time. Maybe this is the reason why sanitizers still don’t get much love from the C++ community. 

Meeting C++ survey results show a large number of developers aren’t using sanitizers in their builds at all.

Despite their drawbacks, we strongly recommend using sanitizers. They will catch issues that may look safe to competing tools and they won’t break your workflow with hefty slowdowns. 

It makes perfect sense then to integrate them with your ‘fast’ development processes like continuous integration or pull request pipelines. 

However, let’s not dismiss Valgrind just yet as it can still be of much use to us. We’ll be talking about its advantages in one of the future posts.

That’s all from us for now — stay tuned for more interesting posts coming up and remember that building reliable software starts with clean code.
