ShaderOp.com

Hi there. I have moved to a new website at Mhmmd.org. I'm no longer updating this one, but I'm keeping it around so the Internet doesn't break. See you there.

Minimizing Header Bloat in C++: An Example

I claimed in my previous post that validating one's assumptions before committing to a particular solution is a good habit, but I was being slightly hypocritical. The fact of the matter is that I did actually write a SHA-1-based ID generator before doing any tests.

The effort wasn't a total waste. It reminded me of a subject I had long forgotten after living in .NET land for so long: the importance of minimizing header file dependencies in one's code. The problem and the solution are explained nicely in the Google C++ Style Guide, in the section aptly titled "Header File Dependencies." But the gist of it is to use as few #include directives as possible in your own header files.

This is often easier said than done, and it sometimes leads to interface design choices that might look unintuitive to someone coming from a programming language with proper support for modules, such as C# or Java. But such choices are usually justified.

I’ll try to present such an example here from my own SHA-1 experiment.

I wanted to write a function that takes a string and returns its SHA-1 hash. SHA-1 hashes are 160 bits long, so the return type has to be some sort of array or container.

I could have written something like the following in the header file:

#ifndef GENERATESHA1HASH_H_
#define GENERATESHA1HASH_H_

#include <stdint.h>
#include <vector>
#include <string>

std::vector<int8_t> GenerateSha1Hash(const std::string& data);

#endif // GENERATESHA1HASH_H_

But then any code that includes this header will also take a dependency on the headers <vector>, <string>, and <stdint.h>, which will lead to longer compilation times. And no one likes longer compilation times.

The argument can be changed to const char* without any loss of readability or ease of use:

#ifndef GENERATESHA1HASH_H_
#define GENERATESHA1HASH_H_

#include <stdint.h>
#include <vector>

std::vector<int8_t> GenerateSha1Hash(const char* data);

#endif // GENERATESHA1HASH_H_

That’s one less header file to worry about. What about <stdint.h>? Do I really need int8_t in there? Firstly, I don’t want to worry about compiler- and platform-dependent sizes of the native types. Secondly, while it’s possible to return a container of ints and document somewhere that only the lower 8 bits of each int will be used, I would rather have the intent clearly defined in code.

So <stdint.h> stays. I can take solace in the fact that it’s a smaller header file and won’t be too much of a hit on compilation times.

Next up is <vector>. I can opt to use a different container, but that will only mean switching one header for another. Using a shared_array has the same issue. And returning a naked dynamically allocated pointer to the data and expecting the calling code to free it is simply unacceptable: it's just plain asking for trouble.

There’s another possibility: Make the calling code carry the burden of providing the storage for the hash:

#include <stdint.h>

void GenerateSha1Hash(const char* data, int8_t* hash, int hashSize);

With this version the caller is expected to pass in a pointer to an array of bytes in hash and the size of the array in hashSize. The two possible issues with this are that the caller has to allocate the memory himself, and that he also has to know that a SHA-1 hash will take 20 bytes of storage.

It can be made slightly better:

#include <stdint.h>

int GenerateSha1Hash(const char* data, int8_t* hash, int hashSize);

In this case the function will return -1 in case of errors, the value of hashSize if the output buffer is smaller than the SHA-1 hash (in which case only hashSize bytes are stored), or the actual size of the SHA-1 hash if the output buffer is large enough.

This is slightly more resilient. I made the choice to be perhaps too forgiving and allow the call to succeed if the buffer is smaller than the generated hash, but I could have chosen to fail in that case instead.

All together now and with some Doxygen documentation thrown in:

#ifndef GENERATESHA1HASH_H_
#define GENERATESHA1HASH_H_

#include <stdint.h>

/// 
/// Generates a 20-byte SHA-1 hash for the provided string.
///
/// @param[in] data the string for which a hash will be generated.
/// @param[out] hash a pointer to the buffer that will receive the hash
/// @param[in] hashSize the size of the buffer
///
/// @return the actual number of bytes stored in <em>hash</em> on success,
///  or -1 on failure.
///
int GenerateSha1Hash(const char* data, int8_t* hash, int hashSize);

#endif // GENERATESHA1HASH_H_

That will do. Now all that is left is to actually implement the thing. But that’s for later.
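For completeness, here's a quick sketch of what calling code will look like once the function exists (the header name and the hashed string are hypothetical):

#include <stdint.h>
#include <stdio.h>

#include "GenerateSha1Hash.h"

int main()
{
    int8_t hash[20];  // a SHA-1 digest is 160 bits, i.e. 20 bytes
    int written = GenerateSha1Hash("MyNamespace::MyType", hash, 20);
    if (written == -1)
        return 1;  // hashing failed

    for (int i = 0; i < written; ++i)
        printf("%02x", hash[i] & 0xff);  // print the digest in hex
    printf("\n");

    return 0;
}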

Final thoughts

"Who in their right mind would want to worry about the cost of returning an array of bytes?" one might say. And that's an entirely valid opinion in my view. But there are two important points to consider here:

Firstly, you don’t have to worry about it if you don’t want to. If this was a private function not meant for sharing among applications, and if my application was already using, for example, shared_array all over the place, I would certainly go ahead and just return a shared_array and be done with it.

Secondly, I would argue that having an eye for detail is always a good thing, regardless of programming language. For example, which would be better: a C# method that takes an input parameter of type List<T>, or one that takes IList<T>? And wouldn't an IEnumerable<T> be better yet?
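To make the comparison concrete, here's a minimal sketch (the Sum methods are hypothetical):

using System;
using System.Collections.Generic;
using System.Linq;

static class ParameterChoices
{
    // Ties callers to one concrete collection type.
    static int SumList(List<int> numbers) { return numbers.Sum(); }

    // Accepts List<int>, int[], or any other indexable collection.
    static int SumIList(IList<int> numbers) { return numbers.Sum(); }

    // Accepts anything enumerable, including lazy LINQ queries;
    // this is the least demanding contract of the three.
    static int SumEnumerable(IEnumerable<int> numbers) { return numbers.Sum(); }

    static void Main()
    {
        int[] array = { 1, 2, 3 };
        // SumList(array);               // does not compile
        Console.WriteLine(SumIList(array));      // 6
        Console.WriteLine(SumEnumerable(array)); // 6
    }
}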

Most developers these days don't need to work with C++, and thank goodness for that. But tinkering with different programming languages is a lot like solving crossword puzzles: it keeps the mind active and (arguably) healthy. And C++ is, I think, The New York Times of crossword puzzles.

Bad Assumptions: Hashing Algorithms

I’m doing some prep work for a little personal project I’m working on. For reasons that will hopefully become clear in future posts, I want to do the following:

Given the fully-qualified name of a type (whether it's a class, struct, or function), generate a unique identifier for it.

The identifier has to have the following properties:

  1. It should fit into 32 bits.
  2. It should be unique only within the scope of a given program.
  3. Speed isn’t a big factor since the results can be cached, but the faster it can be generated the better.

Since the scope of uniqueness I care for is limited, I theorized that the first four bytes of a SHA-1 hash of the fully qualified name of the type should be unique enough in a given program space. It’s an assumption that sounds reasonable, but I had to test it before basing any code on it.

And while I was at it, I decided to test MD5 as well because it's supposedly faster. I also threw in CRC32 just for laughs: I reasoned that while it would be the fastest of the three, it would fail miserably at generating unique IDs once the set of types grew large enough. Or so I thought.

Generating SHA-1 and MD5 hashes in .NET is trivial thanks to the classes in the System.Security.Cryptography namespace, but I had to look elsewhere for CRC32. I finally found an excellent and robust implementation on David Anson's blog.

To test all three I wrote a little C# console application that went through all the types in the System assembly, computed all three hashes of each type's full name, took the first four bytes of each hash, and stuffed them into a UInt32. The program ends by printing out the total number of types found and the total number of unique IDs generated.
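The gist of it looks something like this (a sketch, not the original program; the CRC32 class is left out here, but it plugs into the same helper as long as it derives from HashAlgorithm like the built-in algorithms do):

using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class HashUniquenessTest
{
    // Hash a string and keep only the first four bytes as a 32-bit ID.
    static uint ToId(HashAlgorithm algorithm, string text)
    {
        byte[] hash = algorithm.ComputeHash(Encoding.UTF8.GetBytes(text));
        return BitConverter.ToUInt32(hash, 0);
    }

    static void Main()
    {
        // typeof(Uri).Assembly is the System assembly.
        var names = new List<string>();
        foreach (var type in typeof(Uri).Assembly.GetTypes())
            names.Add(type.FullName);

        var sha1Ids = new HashSet<uint>();
        var md5Ids = new HashSet<uint>();

        using (var sha1 = SHA1.Create())
        using (var md5 = MD5.Create())
        {
            foreach (var name in names)
            {
                sha1Ids.Add(ToId(sha1, name));
                md5Ids.Add(ToId(md5, name));
            }
        }

        Console.WriteLine("Number of types         = {0}", names.Count);
        Console.WriteLine("Number of sha1 hashes   = {0}", sha1Ids.Count);
        Console.WriteLine("Number of md5 hashes    = {0}", md5Ids.Count);
    }
}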

Here’s the first run with just the types in the System assembly:

Number of types         = 2779
Number of sha1 hashes   = 2779
Number of md5 hashes    = 2779
Number of crc32 hashes  = 2779

That looks reasonable enough. CRC32 was doing pretty well, but I was sure it wouldn't take long to break it.

Same test again, but this time with System, System.Xml, and System.Xml.Linq:

Number of types         = 3740
Number of sha1 hashes   = 3740
Number of md5 hashes    = 3740
Number of crc32 hashes  = 3740

I kept adding assemblies and rerunning the test, and still all three hashing algorithms managed to produce unique IDs. The final test I ran was this:

Number of types         = 10183
Number of sha1 hashes   = 10183
Number of md5 hashes    = 10183
Number of crc32 hashes  = 10183

10,000 types and still CRC32 was holding its ground. At that point I began to suspect that something was wrong with my code. More drastic testing was required.

So I downloaded Moby Dick, modified the program to run through the file and store every line in a list, removing duplicates and trimming whitespace, and then ran the hashing algorithms on each line in that list:

Number of unique lines  = 18847
Number of sha1 hashes   = 18847
Number of md5 hashes    = 18847
Number of crc32 hashes  = 18847

And again with War and Peace:

Number of unique lines  = 50605
Number of sha1 hashes   = 50604
Number of md5 hashes    = 50605
Number of crc32 hashes  = 50605

Finally one algorithm produced a duplicate, but it was SHA-1. CRC32 was still chugging along happily.

I wasn’t going to give up until the others broke, so I tried again with The Bible:

Number of unique lines  = 98377
Number of sha1 hashes   = 98377
Number of md5 hashes    = 98376
Number of crc32 hashes  = 98377

The word of God did manage to shake MD5, but not the humble CRC32. Underestimating the meek is never a good idea.

All three books now combined in one giant 10-megabyte file:

Number of unique lines  = 167309
Number of sha1 hashes   = 167307
Number of md5 hashes    = 167305
Number of crc32 hashes  = 167307

Finally CRC32 cracks, but not before finishing head to head with SHA-1.

Conclusion

I don't know much about statistics and cryptography, so I don't know what conclusions someone with better knowledge of the subject would draw from all of this. But in my case I'm now more than comfortable assuming that CRC32 will cover my needs quite adequately. I will probably still add some debug-only checks to my code to ensure that no duplicate IDs are generated in my program, but otherwise, and contrary to my initial assumption, CRC32-based IDs are unique enough.
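That said, even back-of-the-envelope birthday-paradox arithmetic agrees with these numbers: hashing n distinct strings into a 32-bit space produces roughly n(n-1)/2^33 collisions on average, regardless of which (well-mixing) algorithm generated the bits. Plugging in the test sizes above:

n =  10,183  ->  ~0.01 expected collisions (none observed)
n = 167,309  ->  ~3.3  expected collisions (2 to 4 observed)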

Using HtmlUnit to Test .NET Applications

I have been reading a lot about Behavior-Driven Development and test automation lately. Steven Sanderson’s blog proved to be a goldmine of practical information in that regard, specifically his post about BDD, SpecFlow, and ASP.NET MVC.

From my limited understanding of the subject, an ideal BDD test should verify a feature by running a series of tests against a real system using a real web browser. With the help of a suitable library, like WatiN for example, automating the web browser becomes a doable task. The problem is that as the number of tests increases, running them against a real browser becomes time-consuming.

Mr. Sanderson lists another option in another blog post, and that is using a headless browser called HtmlUnit. It’s built with Java, but Mr. Sanderson has managed to get it running in .NET using IKVM, and I was able to replicate his experience successfully.

Close, But No Cigar

When working on a feature, I would like to be able to:

  • Use a real browser with BDD testing while developing the feature. This would allow me to actually see what’s going on and would make debugging easier.
  • Once I'm done working on it and the BDD tests are passing, I would like to switch to a headless browser for regression tests, since it would make the tests faster, and thus more likely to be maintained and run with every build.
  • If a feature breaks during regression, I would like to switch back to using a real browser since, again, it would make debugging easier.

Using HtmlUnit and some other browser automation framework (e.g. WatiN) would mean that each test would have to be implemented twice, once for each framework, and that's far from ideal.

Enter Selenium’s WebDriver

I have never used Selenium before, so my knowledge of their range of products is peripheral at best. But apparently they are implementing a new browser automation API called “WebDriver,” which will be part of the upcoming Selenium 2.0 release. This new API should allow programmatic automation of IE, Chrome, Firefox, and HtmlUnit using a uniform set of interfaces.

The latest public release is Alpha 5, which can be downloaded at this link. They have binaries for both Java and .NET, and the source code has libraries for Python and Ruby as well. I’ve only tried the .NET implementation, and found the Firefox driver to be quite solid, unlike the IE version, which was a bit too buggy for my taste. Hopefully it will all be sorted out by the final release.

Mr. Sanderson gives an example of how to use HtmlUnit in the blog post mentioned earlier. Here's how the same example would be implemented using WebDriver and Firefox:

[TestFixture]
public class GoogleSearchWithFirefox
{
    private IWebDriver _driver;

    [SetUp]
    public void SetUp()
    {
        _driver = new FirefoxDriver();
    }

    [TearDown]
    public void TearDown()
    {
        _driver.Close();
    }

    [Test]
    public void Can_Load_Google_Homepage()
    {
        _driver.Url = "http://www.google.com/";
        Assert.That("Google", Is.EqualTo(_driver.Title));
    }

    [Test]
    public void Google_Search_For_AspNetMvc_Yields_Link_To_AspDotNet()
    {
        _driver.Url = "http://www.google.com/";
        _driver.FindElement(By.Name("q")).SendKeys("asp.net mvc");
        _driver.FindElement(By.Name("btnG")).Click();

        // Should be on the results page now:
        var resultLinks =
            from element in _driver.FindElements(By.TagName("a"))
            let href = element.GetAttribute("href")
            where href.StartsWith("http://")
            let uri = new Uri(href)
            where uri.Host.ToLower().EndsWith("asp.net")
            select uri
            ;

        CollectionAssert.IsNotEmpty(resultLinks);
    }
}

I think the above code is self-explanatory and not too hard on the eyes. To run the same code against IE, it would be a simple matter of changing the following line:

_driver = new FirefoxDriver();

To:

_driver = new InternetExplorerDriver();

And everything should (at least once the final version of the IE driver is released) continue to work as before.

Now, as I have mentioned before, the WebDriver library does contain an implementation for an HtmlUnit-based driver, but unfortunately it’s only available in the Java version.

My Modest Contribution

So, I thought it would be a good idea to write a .NET implementation of the WebDriver API using HtmlUnit. I compiled the HtmlUnit JAR files to .NET assemblies, downloaded the WebDriver source code, and went to town. The WebDriver source code has about 340 unit tests. So far I’ve managed to get 303 of those to pass.
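For anyone attempting the same thing, the JAR-to-assembly conversion is a single ikvmc invocation along these lines (the exact file names depend on the HtmlUnit version, and the dependency JARs from its lib folder have to be included as well):

ikvmc -target:library -out:HtmlUnit.dll *.jar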

Using what I have so far, the above code could be rewritten as follows:

[TestFixture]
public class GoogleSearchWithHtmlUnit
{
    private IWebDriver _driver;

    [SetUp]
    public void SetUp()
    {
        _driver = new HtmlUnitDriver(true); // true enables JavaScript support
    }

    [TearDown]
    public void TearDown()
    {
        _driver.Close();
    }

    [Test]
    public void Can_Load_Google_Homepage()
    {
        _driver.Url = "http://www.google.com/";
        Assert.That("Google", Is.EqualTo(_driver.Title));
    }

    [Test]
    public void Google_Search_For_AspNetMvc_Yields_Link_To_AspDotNet()
    {
        _driver.Url = "http://www.google.com/";
        _driver.FindElement(By.Name("q")).SendKeys("asp.net mvc");
        _driver.FindElement(By.Name("btnG")).Click();

        // Should be on the results page now:
        var resultLinks =
            from element in _driver.FindElements(By.TagName("a"))
            let href = element.GetAttribute("href")
            where href.StartsWith("http://")
            let uri = new Uri(href)
            where uri.Host.ToLower().EndsWith("asp.net")
            select uri
            ;

        CollectionAssert.IsNotEmpty(resultLinks);
    }
}

As you can see, only the SetUp method changes. Everything else stays the same, the tests pass, and they take a lot less time to run compared to the Firefox version.

I have uploaded the code, including the examples above, to BitBucket at http://bitbucket.org/shaderop/htmlunitdriver. I would appreciate any help that anyone reading this might offer, especially bug reports and patches. I know I'll be using it myself at some point in the very near future, so I'll try to work on it as I go along.

Good luck.

Visual Studio Wallpapers: Full Resolution

I’ve recently posted a few Visual Studio 2010 desktop wallpapers on Scott Hanselman’s Visual Studio 2010 Community Wallpapers site. Unfortunately, all the wallpapers there are scaled down to 1280 by 720 pixels (or thereabouts).

So I thought I should repost them here in lossless, luscious 1080p resolution:

Visual Studio MMX: Black on White

Visual Studio MMX: White on Black

Visual Studio 2010: Jolly Roger Flag

Visual Studio 2010: God Rays

Someone was kind enough to send me an email asking me how I made them. I used Autodesk Softimage for 3d modeling and cloth simulation, 3d Coat for some minor texture painting, and Adobe Photoshop for contrast correction and some other minor tweaks.

Enjoy.

UV Geometry Constraint Goes To The Attic

A couple of weeks ago I proudly proclaimed to the world the release of version 1.1 of the UV Geometry Constraint plug-in for Softimage. Unfortunately, the great enthusiasm the community displayed when I first announced the idea for the plug-in failed to translate into any kind of meaningful interest when I actually made it. So far it has been downloaded a whopping twenty-three times, and the related discussion thread is slowly but surely heading towards archival.

I'm a bit disappointed but not really surprised. In my own experience with software development, users are quickly excited by new ideas, but are reluctant to adopt them once they're turned into working solutions. It's a complicated issue, and one that I might get around to musing about in a future post.

Back to the plug-in at hand. It was fun making it, especially since it required getting re-acquainted with C++, which was my programming language of choice before the advent of .NET and C#. But it’s time to shelve it for the time being. I’ll keep the code in the BitBucket repository, where anyone is free to download it and tinker with it. I’m also more than happy to answer any questions anyone might have. Just hit the contact form on this website.

And that’s that.
