ShaderOp.com

ShaderOp.com

Hi there. I have moved to a new website at Mhmmd.org. I'm no longer updating this one, but I'm keeping it around so the Internet wouldn't break. See you there.

Uniquely Identifying Types in C++ without using RTTI

One of the books on my current reading list is Programming Game AI by Example by Mat Buckland. So far it has been an excellent read, except that some of the code snippets have a certain Win32 API feel to them that I find a bit less than optimal.

For example, Chapter 2 has a subsection about adding inter-object messaging capabilities to the sample presented in that chapter. The author proposes using enums to identify each message type that the system can send or receive. So somewhere in the program there should be one (possibly huge) header with the following:

enum message_type
{
    Msg_HeroMoved,
    Msg_WeaponFired,
    Msg_BulletHit,
    .
    .
};

There’s also a corresponding message class for each message_type enum that holds information related to the message, like the new location of the hero in case of a Msg_HeroMoved message, or the type of projectile fired in the case of a Msg_WeaponFired message.

I happen to think that using unique IDs for each message is actually a good idea because if its simplicity. I just have a problem with how it’s implemented. Maintaining both a long list of message IDs and the corresponding message classes feels redundant and error-prone to me.

One could move the ID to the message classes, like so:

class HeroMovedMessage
    : public MessageBase
{
    /* Some details omitted ... */

public:
    static unsigned int TypeId();
    Point Location() const;
}

But that leads to more potential issues. It’s not possible to force every message class to include a static TypeId method, and ensuring that the IDs don’t collide becomes a burden requiring the code author to either maintain a list of IDs externally or check all other message classes manually before adding a new ID.

I can think of a few ways that could make the process less tiresome, but the added complications make the original enum list solution look simple and elegant in comparison.

Something simpler is needed.

Back to the drawing board

What I would like to have is a method that looks like this:

template <typename T> unsigned int TypeId()
{
    // Should return a unique value for each T...
}

If such a method can be implemented, then it would become trivial to identify any message class (or any class in the system for that matter) like so:

class A{};

int main(int argc, char** argv)
{
    unsigned int id = TypeId<A>();

    return 0;
}

This would also work nicely in an ID-based messaging system, since both sender and receiver already know about the message classes they care about and can directly identify them using their type IDs. Message passing could look like the following code:

class HeroMovedMessage
    : public MessageBase
{
    /* some details omitted ... */
public:
    Point Location() const;
};

class Hero
{
private:
    void SendMessage(unsigned int messageId, Message* msg)
    {
        // ...
    }
public:
    void Update()
    {
        HeroMovedMessage msg = HeroMovedMessage(x_, y_);
        SendMessage(TypeId<HeroMovedMessage>(), &msg);
    }
};

class HerosPet
{
public:
    void ReceiveMessage(unsigned int messageId, Message* msg)
    {
        if (messageId == TypeId<HeroMovedMessage>())
        {
            Message* msg_ = static_cast<HeroMovedMessage>(msg);
            // ...
        }
    }
};

So, yes, a working TypeId function would actually work pretty well in this particular case. Now all that is left is an implementation for it. For that, I’ll need two ingredients:

  1. A way to obtain the name of a class as a string.
  2. A hashing algorithm that will produce a unique ID from said string.

Item number two is a solved problem. As discussed at length in the post titled “Hashing Algorithms,” a CRC32 checksum or a SHA-1 or MD5 digest of the name of the class should give a unique ID for any reasonably-sized set of classes. I’ll be using CRC32 for this example because it’s already implemented in Boost.

Item number one requires more effort.

Getting the type name through RTTI

The closest thing to reflection in C++ is Run-Time Type Information (RTTI). The bits I’ll need are the typeid operator and the type_info class. Extracting a class name this was is a breeze:

// Forward declaration
int GenerateCrcChecksum(const char* data);

template <typename T> unsigned int TypeId()
{
    return GenerateCrcChecksum(typeid(T).name());
}

The name method returns the fully qualified name of the class, including the namespace and template parameters (if any). And since it’s part of the standard it should be portable across all compilers.

But we’re not quite done yet. RTTI has a bit of a reputation for being slow. How slow it is is a subject of long and lengthy debates, but I would rather err on the side of mature optimization and avoid RTTI altogether if possible.

Fortunately, there is a way.

Getting some help from the preprocessor

After some furious head scratching I ended up looking through the Microsoft Visual C++ predefined macros for help. The most promising one was the nonstandard __FUNCTION__ macro, described thusly:

Valid only within a function and returns the undecorated name of the enclosing function (as a string). __FUNCTION__ is not expanded if you use the /EP or /P compiler option.

The initial test was disappointing. Here’s the code:

class A{};

template <typename T> void WhatsMyName()
{
    std::cout << __FUNCTION__ << std::endl;
}

int main(int argc, char** argv)
{
    WhatsMyName<A>();
    return 0;
}

And here’s the output

WhatsMyName

Which not useful at all since there’s no mention of the template type used to invoke the function.

I took a shot in the dark and tried using a static member function instead of a free standing one:

class A{};

template <typename T> class WhatsMyName
{
public:
    static void Name()
    {
        std::cout << __FUNCTION__ << std::endl;
    }
};

int main(int argc, char** argv)
{
    WhatsMyName<A>::Name();
    return 0;
}

Here’s the output:

WhatsMyName<class A>::Name

Which is just perfect. I don’t even need to trim the resulting string to extract only the name of the class. I only need to have the fully qualified name somewhere inside the string since all the other text around it is fixed. I have also tested this method with templates and namespaces, and it still works.

GCC has a similar macro called __PRETTY_FUNCTION__ that works along the same lines. Since MSVC and GCC are the only two compilers that I care about, I can fall back to the RTTI method for other compilers.

So the new implementation for TypeId (or TypeIdentifier as it’s now called) becomes:

#include "GenerateTypeId.h"

template<typename T> class TypeIdentifier
{
public:
    static unsigned int id()
    {
        #if defined (_MSC_VER)
            return GenerateTypeId(__FUNCTION__);
        #elif defined(__GNUC__)
            return GenerateTypeId(__PRETTY_FUNCTION__);
        #else
            return GenerateTypeId(typeid(T).name());
        #endif
    }
private:
    TypeIdentifier();
};

I have moved the actual generation of the CRC checksum to a separate function in a different file and added a bit of caching and some checking for duplicate IDs while I was at it. The full source code is provided at the end of this post.

Final thoughts

One caveat here is that I would be very careful about using IDs generated this way for persistence or for sending objects across the wire. CRC checksums are deterministic, but the data used to generate them in this case is not the same across different compilers or possibly different versions of the same compiler. So caveat emptor there.

The other issue is that the IDs might collide given a large set of classes (or a particularly unlucky choice of names). But that problem is easy to detect early on in debug builds, which is what I opted to do in my code. And if that problem does occur then switching from CRC32 to SHA-1 or MD5 or simply renaming one of the two offending classes should take care of it.

Now I’ll leave you with the code. I tested it with Visual C++ 2010, but it should compile on G++ with C++0x extensions enabled.

5 Comments

# By Reddog on September 23, 2010, 9:21 pm

Hi there,

nice article. But I’m not completely happy with using CRC checksums as truly unique IDs. I’ve had the same problem recently so I came up with the following solution using the “curiously recurring template pattern”:

My requirements were the following:
1. Classes with the unique ID should have one base class (so I could define a type-safe pointer to them).
2. Each derived class should get an ID automatically without additional code in derived classes.
3. It should be possible to retrieve the ID from the class and from an instance.

My solution was (a little stripped down):

typedef unsigned int ComponentID;
const ComponentID INVALID_COMPONENT_ID = 0;

class IComponent {
public:
    virtual ComponentID GetID();
}

template <typename Derived>
class Component : public IComponent {

private:
    static ComponentID sFamilyID;

public:
    static ComponentID StaticFamilyID() {
        if (sFamilyID == INVALID_COMPONENT_ID) {
            sFamilyID = ++sNextFamilyID;
        }
        return sFamilyID;
    }

    ComponentID GetFamilyID() {
        if (sFamilyID == INVALID_COMPONENT_ID) {
            sFamilyID = ++sNextFamilyID;
        }
        return sFamilyID;
    }
};

template <typename Derived>
ComponentID Component::sFamilyID = INVALID_COMPONENT_ID;

Derived classes look like this:

class Positional : public Component<Positional> ...

I ended up having 2 different methods for static and dynamic lookup which is not perfect, but I couldn’t find a better solution.

# By Reddog on September 24, 2010, 11:46 am

Oh, I’ve just notices, that it interpreted my templates as HTML tags…

the IComponent class should really be a template with (class Derived) as parameter. And also the derived class would have itself as parameter for the IComponent base class, like:

class Positional : public IComponent[Positional] <- think "<" instead of "["

Sorry for this mistake, I actually don't know how to write the correct brackets here :(

# By Mohammad on September 24, 2010, 7:30 pm

I fixed the formatting of your code for you. I hope it’s the way you intended it to be.

I get the gist of what you’re trying to do, and I think it’s a valid solution. Only possible issues I could have with it is the requirement to derive from a common base class, which is something you actually want to have and I would prefer to avoid, and the need to fiddle with static member variables, which would require more care in multi-threaded scenarios.

I also understand why you (or anyone else for that matter) would be cautious with using CRCs as type identifiers. I myself am still surprised that it actually works without collisions.

But regardless, thank you for your input :)

# By Maxim on January 12, 2011, 6:02 am

Thanks a lot! This is the best way to detect typename without RTTI and memory leaks in type_info::name()!!!

# By Mohammad on January 12, 2011, 7:23 am

I’m glad it worked out for you :)

Powered by: