Improving string comparison performance by about 1100%

Usually, strings in D are compared either this way…

Snippet 1 – The convenient way

if(stringA == stringB) // This will give you a good (if not even best) performance in most cases since arrays are handled by reference.
{
    // strings are equal
}

…or this way…

Snippet 2 – The OldSchool-C-way

if(strcmp(stringA, stringB) == 0)
{
    // strings are equal
}

Let’s do a little benchmark first ;–)

We’ll compare two strings 100 million times to get expressive numbers

The convenient way (complete program)

import std.stdio;
import std.perf;

void main()
{
    auto pc = new PerformanceCounter();

    string str = "POST";

    pc.start();

    for(uint i=0;i<100_000_000;++i)
    {
        if(str == "POST")
        {
            // strings are equal
        }
    }

    pc.stop();
    writefln("Execution took %d ms", pc.milliseconds);
}

Note that “str” isn’t constant. A constant string wasn’t representative.

result: 3984ms

The OldSchool way (complete program)

import std.stdio;
import std.perf;
import std.string; // for (str)cmp

void main()
{
    auto pc = new PerformanceCounter();

    string str = "POST";

    pc.start();

    for(uint i=0;i<100_000_000;++i)
    {
        if(cmp(str, "POST") == 0) // cmp() is similar to strcmp()
        {
            // strings are equal
        }
    }

    pc.stop();
    writefln("Execution took %d ms", pc.milliseconds);
}

result: 3388ms

3984ms vs 3388ms – A 17% increase that won’t knock your socks off

Surprisingly, I’ve an ace in the hole and you hopefully a Little Endian CPU.

Using the C(O)MP(ARE)-Instruction of your CPU, we are able to compare 4 characters at once very quickly without the need of an internal loop within the strcmp()-Function. Simply treat the 4 characters as an 32 bit integer and let the magic happen. Don’t forget to bit shift the characters since it’s Little Endian.

I’ve wrapped everything into this nifty template

template str4_cmp(const char[] m, const char c0, const char c1, const char c2, const char c3)
{
    const char[] str4_cmp = "*(cast(uint*)"~m~")==(('"~c3~"'<<24)|('"~c2~"'<<16)|('"~c1~"'<<8)|'"~c0~"')";
}

So let’s put everything together…

import std.stdio;
import std.perf;

template str4_cmp(const char[] m, const char c0, const char c1, const char c2, const char c3)
{
    const char[] str4_cmp = "*(cast(uint*)"~m~")==(('"~c3~"'<<24)|('"~c2~"'<<16)|('"~c1~"'<<8)|'"~c0~"')";
}

void main()
{
    auto pc = new PerformanceCounter();

    string str = "POST";

    pc.start();

    for(uint i=0;i<100_000_000;++i)
    {
        if(mixin(str4_cmp!("str", 'P', 'O', 'S', 'T')))
        {
            // strings are equal
        }
    }

    pc.stop();
    writefln("Execution took %d ms", pc.milliseconds);
}

…and finally, compile and execute it.

result: 276ms – That’s an increase about 1100%!

Pronghorn uses this trick to determine the desired HTTP Request method (“GET ” <– note the whitespace, “POST”, etc.). The only drawback is that the string length has to be a multiple of 2 (using an unsigned short) or 4 (using an unsigned integer, as shown here). Hence we’ve to came up with an hybrid aproach to transparently replace existing strcmp()/strncmp() functions.

I highly recommend reading http://www.codeproject.com/KB/string/optimize_strings.aspx?msg=1009488 for further details on this

Note that the code snippets have been compiled with DMD 2.0.49 without any optimizations (-O has been omitted).

the new virtual host map - unfolded

Pronghorn's virtualHostMap class provides an internal API for creating, accessing and deleting virtual hosts objects.

But how's the data organized? The most common way is probably to simply store the hostnames in a hashmap. We did so prior to pronghorn release 0.8 since it's the simplest and probably most efficient way - unless it comes to wildcard subdomains.

What's so tricky about wildcard subdomains? Having a configured wildcard subdomain "*.example.com" and a request to "dharmainitiative.foo.bar.example.com", several lookups are required to find the appropriate virtual host which makes the lookup process quite expensive. The following steps are needed to find the associated virtual host:

VirtualHost: *.example.com, Request: dharmainitiative.foo.bar.example.com

  1. lookup dharmainitiative.foo.bar.example.com (doesn't exist)
  2. lookup *.foo.bar.example.com (doesn't exist)
  3. lookup *.bar.example.com (doesn't exist)
  4. lookup *.example.com (oh, we finally got it)

As you can see, four steps are required in this example.

 

As mentioned before, pronghorn 0.8 comes up with another approach, which will be explained in the following lines.

keyword: tree

All Hostnames are organized in a tree (as the domain name system, DNS, does). If a new virtual host is registered to the virtual host map, all its hostnames will be inserted into the tree by splitting the hostname into its labels ("example.com" becomes an array of "example" and "com") whereas the labels are placed into reverse order ("com", "example" - the TLD is represented by the first level of the tree). If the root node of the tree doesn't contain a child node with name "com", it will be created instead. The same happens with the "example" label. If the "com" node doesn't hold a child node with name "example", a new node will be created and linked to the "com" node.

Lookups work in a similar way. The root node is accessed, the existence of the "com" label is ensured, and if the child label "example" doesn't exist, the "com" label will be checked for a child label with name "*" - a wildcard one. Of course expensive string comparisons can't be abandonded, which are needed to find the desired label. But compared to flat hashmaps lookups are performed in an incremental way - step by step, hostname label by hostname label. In so doing there's no need to compare long strings at once which saves time and makes this approach more efficient. And last but not least memory is saved since each label is only stored once (at the respective position in the tree).

Deleting hostnames from the tree is the hardest part. Before deleting a label, it needs to be ensured that it isn't shared with other hostnames. But as of now, the new hostname implementation works like a charm.