ShaderOp.com

Hi there. I have moved to a new website at Mhmmd.org. I'm no longer updating this one, but I'm keeping it around so the Internet wouldn't break. See you there.

Minimizing Header Bloat in C++: An Example

I claimed in my previous post that validating one’s assumptions before committing to a particular solution is a good habit, but I was being slightly hypocritical. The fact of the matter is that I actually wrote a SHA-1-based ID generator before doing any tests.

The effort wasn’t a total waste. It reminded me of a subject I had long forgotten after living in .NET land for so long: the importance of minimizing header file dependencies in one’s code. The problem and the solution are explained nicely in the Google C++ Style Guide, in the section aptly titled “Header File Dependencies.” The gist of it is to use as few #include directives as possible in your own header files.

This is often easier said than done, and it sometimes leads to interface design choices that might look unintuitive to someone coming from a language with proper support for modules, such as C# or Java. But such choices are usually justified.

I’ll try to present such an example here from my own SHA-1 experiment.

I wanted to write a function that takes a string and returns its SHA-1 hash. SHA-1 hashes are 160 bits long, so the return type has to be some sort of array or container.

I could have written something like the following in the header file:

#ifndef GENERATESHA1HASH_H_
#define GENERATESHA1HASH_H_

#include <stdint.h>
#include <vector>
#include <string>

std::vector<int8_t> GenerateSha1Hash(const std::string& data);

#endif // GENERATESHA1HASH_H_

But then any code that includes this header will also take a dependency on the headers <vector>, <string>, and <stdint.h>, which will lead to longer compilation times. And no one likes longer compilation times.

The argument can be changed to const char* without any loss of readability or ease of use:

#ifndef GENERATESHA1HASH_H_
#define GENERATESHA1HASH_H_

#include <stdint.h>
#include <vector>

std::vector<int8_t> GenerateSha1Hash(const char* data);

#endif // GENERATESHA1HASH_H_
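As a rough sketch of what this change means for callers: code that already holds a std::string can still call the function through c_str(), and only the caller's own source file needs <string>. The names VisibleLength and VisibleLengthOf below are hypothetical stand-ins that share only the parameter type with GenerateSha1Hash. One caveat the sketch makes visible is that a const char* stops at the first null byte, so this signature suits text rather than arbitrary binary data.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <string>  // needed only by the caller, not by the header

// Hypothetical stand-in with the same parameter type as GenerateSha1Hash.
// It only reports how many bytes the callee can see through a const char*.
size_t VisibleLength(const char* data) {
    assert(data != NULL);  // a null pointer is treated as a caller error here
    return strlen(data);
}

// A caller that already holds a std::string simply passes c_str().
size_t VisibleLengthOf(const std::string& text) {
    return VisibleLength(text.c_str());
}
```

A string literal can be passed directly, and a std::string with an embedded null byte is seen only up to that byte.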

That’s one less header file to worry about. What about <stdint.h>? Do I really need int8_t in there? Firstly, I don’t want to worry about compiler- and platform-dependent sizes of the native types. Secondly, while it’s possible to return a container of ints and document somewhere that only the lower 8 bits of each int will be used, I would rather have the intent clearly defined in code.

So <stdint.h> stays. I can take solace in the fact that it’s a smaller header file and won’t be too much of a hit on compilation times.

Next up is <vector>. I can opt to use a different container, but that will only mean switching one header for another. Using a shared_array has the same issue. Returning a naked dynamically allocated pointer to the data and expecting the calling code to free it is simply unacceptable and is just plain asking for trouble.

There’s another possibility: Make the calling code carry the burden of providing the storage for the hash:

#include <stdint.h>

void GenerateSha1Hash(const char* data, int8_t* hash, int hashSize);

With this version the caller is expected to pass a pointer to an array of bytes in hash and the size of that array in hashSize. The two possible issues with this are that the caller has to allocate the memory, and that the caller has to know that a SHA-1 hash takes 20 bytes of storage.
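The calling pattern might look like the following sketch. It assumes hash is a single-level int8_t* pointing at the caller's buffer, as the description above suggests. The constant kSha1HashSize and the helper HashOnStack are hypothetical names, and the function body is a placeholder that zero-fills the buffer rather than computing a real SHA-1 hash.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical constant so callers need not hard-code 20 (160 bits / 8).
const int kSha1HashSize = 20;

// Placeholder body: zero-fills the buffer instead of computing SHA-1,
// only so that the calling pattern can be compiled and run.
void GenerateSha1Hash(const char* data, int8_t* hash, int hashSize) {
    assert(data != NULL && hash != NULL);
    int count = hashSize < kSha1HashSize ? hashSize : kSha1HashSize;
    if (count > 0) {
        memset(hash, 0, static_cast<size_t>(count));
    }
}

// Caller side: the storage lives in an array owned by the caller,
// so there is nothing to free afterwards.
void HashOnStack(const char* text, int8_t (&out)[kSha1HashSize]) {
    GenerateSha1Hash(text, out, kSha1HashSize);
}
```

Publishing a constant like this alongside the function would be one way to spare callers from memorizing the hash size, at no cost in extra includes.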

It can be made slightly better:

#include <stdint.h>

int GenerateSha1Hash(const char* data, int8_t* hash, int hashSize);

In this case the function returns -1 on error. Otherwise it returns the number of bytes written: the value of hashSize if the output buffer is smaller than the SHA-1 hash, or the actual size of the hash (20 bytes) if the buffer is at least that large.

This is slightly more resilient. I chose to be forgiving and let the call succeed, with a truncated hash, when the buffer is smaller than the generated hash, but I could just as well have chosen to fail in that case.
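The two policies can be sketched side by side. This is only an illustration of the return-value rules described above, not the actual implementation: CopyDigest, CopyDigestStrict, and kSha1HashSize are hypothetical names, and the digest is assumed to have been computed elsewhere.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

const int kSha1HashSize = 20;  // hypothetical name; 160 bits / 8

// Forgiving policy: copies as many digest bytes as fit in the buffer and
// returns that count, or -1 for invalid arguments.
int CopyDigest(const int8_t* digest, int8_t* hash, int hashSize) {
    if (digest == NULL || hash == NULL || hashSize < 0) {
        return -1;
    }
    int count = hashSize < kSha1HashSize ? hashSize : kSha1HashSize;
    assert(count >= 0 && count <= kSha1HashSize);
    memcpy(hash, digest, static_cast<size_t>(count));
    return count;
}

// Strict policy: a buffer that cannot hold the whole digest is an error.
int CopyDigestStrict(const int8_t* digest, int8_t* hash, int hashSize) {
    if (hashSize < kSha1HashSize) {
        return -1;
    }
    return CopyDigest(digest, hash, hashSize);
}
```

With the forgiving version a caller has to compare the return value against the expected hash size to detect truncation. The strict version makes that mistake harder to miss.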

All together now and with some Doxygen documentation thrown in:

#ifndef GENERATESHA1HASH_H_
#define GENERATESHA1HASH_H_

#include <stdint.h>

/// 
/// Generates a 20-byte SHA-1 hash for the provided string.
///
/// @param[in] data the string for which a hash will be generated.
/// @param[out] hash a pointer to the buffer that will receive the hash
/// @param[in] hashSize the size of the buffer
///
/// @return the actual number of bytes stored in <em>hash</em> on success,
///  or -1 on failure.
///
int GenerateSha1Hash(const char* data, int8_t* hash, int hashSize);

#endif // GENERATESHA1HASH_H_

That will do. Now all that is left is to actually implement the thing. But that’s for later.

Final thoughts

“Who in their right mind would want to worry about the cost of returning an array of bytes?” one might say. And it’s an entirely valid opinion in my view. But there are two important points to consider here:

Firstly, you don’t have to worry about it if you don’t want to. If this were a private function not meant for sharing among applications, and if my application were already using, for example, shared_array all over the place, I would certainly go ahead and just return a shared_array and be done with it.

Secondly, I would argue that having an eye for detail is always a good thing, regardless of programming language. For example, which would be better: A C# method that takes an input parameter of type List<T> or of type IList<T>? And wouldn’t an IEnumerable<T> be better yet?

Most developers these days don’t need to work with C++, and thank goodness for that. But tinkering with different programming languages is a lot like solving crossword puzzles: it keeps the mind active and (arguably) healthy. And C++ is, I think, The New York Times of crossword puzzles.
