gdc w/ d2.0.50 - GC crash @amd64

To my great joy, GDC has been merged with the current D frontend (which is 2.0.50, to express it numerically).

But apparently the garbage collector is badly broken in 64-bit builds. When compiled in 64-bit mode, Pronghorn segfaults after exactly 94 sequential requests unless we disable the GC by calling "GC.disable()".

And as I don't want to abandon 64-bit support for now, the only way around this bug is to disable the GC and manage memory explicitly, as we did in the good old times (which I still prefer over garbage collection). But as Walter Bright points out here, garbage-collected programs are (usually) faster. Moreover, D is designed as a garbage-collected language: some features, such as array concatenation, rely on the GC. Garbage collection, however, isn't well suited to realtime applications such as server daemons, since a collection can start at an unpredictable time and temporarily blocks all threads. Hence we need to tune the GC behaviour anyway and perform manual collections whenever the server is idle.
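As a rough illustration of "manual collections whenever the server is idle", here is a minimal sketch using the GC interface from core.memory. It assumes a build on which the collector works (the 32-bit one, for instance). handleRequest() and serveBurst() are made-up names for this example and are not taken from Pronghorn.

```d
import core.memory : GC;
import std.stdio;

// Hypothetical stand-in for real request handling; it only produces garbage.
void handleRequest()
{
    auto buffer = new ubyte[](4096);
    buffer[0] = 1;
}

// Serves a burst of requests with automatic collections suppressed, then
// collects once the burst is over. Returns the number of requests served.
int serveBurst(int pendingRequests)
{
    GC.disable();             // suppress automatic collections during the burst
    scope(exit) GC.enable();  // restore the default behaviour when we leave

    int served = 0;
    while (pendingRequests-- > 0)
    {
        handleRequest();
        ++served;
    }

    // The queue is empty: treat this as idle time and collect on our own terms.
    GC.collect();
    GC.minimize();            // hand unused memory back to the OS where possible
    return served;
}

void main()
{
    writefln("served %d requests", serveBurst(3));
}
```

Keep in mind that GC.disable() only suppresses automatic collections; the runtime may still collect if it runs out of memory, so this reduces pauses rather than ruling them out.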

In a few days we'll see whether I managed to fix that GC bug on my own, whether "ibuclaw" has fixed it, or whether the official 64-bit DMD has been published.

32-bit builds, however, are working fine.

Improving string comparison performance by about 1100%

Usually, strings in D are compared either this way…

Snippet 1 – The convenient way

if(stringA == stringB) // This will give you a good (if not even best) performance in most cases since arrays are handled by reference.
{
    // strings are equal
}

…or this way…

Snippet 2 – The OldSchool-C-way

if(strcmp(toStringz(stringA), toStringz(stringB)) == 0) // D strings aren't NUL-terminated, hence toStringz() from std.string
{
    // strings are equal
}

Let’s do a little benchmark first ;–)

We’ll compare two strings 100 million times to get meaningful numbers.

The convenient way (complete program)

import std.stdio;
import std.perf;

void main()
{
    auto pc = new PerformanceCounter();

    string str = "POST";

    pc.start();

    for(uint i=0;i<100_000_000;++i)
    {
        if(str == "POST")
        {
            // strings are equal
        }
    }

    pc.stop();
    writefln("Execution took %d ms", pc.milliseconds);
}

Note that “str” isn’t constant. A constant string wouldn’t be representative, since the compiler could evaluate the comparison at compile time.

result: 3984ms

The OldSchool way (complete program)

import std.stdio;
import std.perf;
import std.string; // for (str)cmp

void main()
{
    auto pc = new PerformanceCounter();

    string str = "POST";

    pc.start();

    for(uint i=0;i<100_000_000;++i)
    {
        if(cmp(str, "POST") == 0) // cmp() is similar to strcmp()
        {
            // strings are equal
        }
    }

    pc.stop();
    writefln("Execution took %d ms", pc.milliseconds);
}

result: 3388ms

3984ms vs 3388ms – A 17% improvement that won’t knock your socks off.

Fortunately, I have an ace up my sleeve, and you hopefully have a little-endian CPU.

Using your CPU’s compare (CMP) instruction, we can compare 4 characters at once very quickly, without the internal loop of the strcmp() function. Simply treat the 4 characters as a 32-bit integer and let the magic happen. Don’t forget to bit-shift the characters: on a little-endian machine the first character ends up in the least significant byte.
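To make the byte order concrete, here is a small sketch that packs the four characters by hand and compares the result with what the CPU actually loads from the string. pack4() is a name made up for this example, and the code assumes a CPU that tolerates unaligned 32-bit loads, as x86 and amd64 do.

```d
import std.stdio;

// Packs four characters into the 32-bit value a little-endian CPU sees when
// it loads them from memory as one uint (first character in the lowest byte).
uint pack4(char c0, char c1, char c2, char c3)
{
    return (cast(uint)c3 << 24) | (cast(uint)c2 << 16) | (cast(uint)c1 << 8) | c0;
}

void main()
{
    string method = "POST";

    // Reinterpret the first four bytes of the string as one unsigned integer.
    uint loaded = *cast(const(uint)*)method.ptr;

    // Only on little-endian machines does the load match the packed value.
    version(LittleEndian)
        assert(loaded == pack4('P', 'O', 'S', 'T'));

    writefln("0x%08X", pack4('P', 'O', 'S', 'T')); // prints 0x54534F50
}
```

Reading the hex value from right to left gives 0x50 ('P'), 0x4F ('O'), 0x53 ('S') and 0x54 ('T'), which is exactly the order of the bytes in memory.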

I’ve wrapped everything into this nifty template

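// Builds, at compile time, the source text of an expression that loads the first
// 4 bytes of m as one uint and compares them with c0..c3 packed in little-endian order.
template str4_cmp(const char[] m, const char c0, const char c1, const char c2, const char c3)
{
    const char[] str4_cmp = "*(cast(uint*)"~m~")==(('"~c3~"'<<24)|('"~c2~"'<<16)|('"~c1~"'<<8)|'"~c0~"')";
}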

So let’s put everything together…

import std.stdio;
import std.perf;

template str4_cmp(const char[] m, const char c0, const char c1, const char c2, const char c3)
{
    const char[] str4_cmp = "*(cast(uint*)"~m~")==(('"~c3~"'<<24)|('"~c2~"'<<16)|('"~c1~"'<<8)|'"~c0~"')";
}

void main()
{
    auto pc = new PerformanceCounter();

    string str = "POST";

    pc.start();

    for(uint i=0;i<100_000_000;++i)
    {
        if(mixin(str4_cmp!("str", 'P', 'O', 'S', 'T')))
        {
            // strings are equal
        }
    }

    pc.stop();
    writefln("Execution took %d ms", pc.milliseconds);
}

…and finally, compile and execute it.

result: 276ms – That’s an increase of about 1100%!

Pronghorn uses this trick to determine the HTTP request method (“GET ” <– note the trailing space, “POST”, etc.). The only drawback is that the string length has to be a multiple of 2 (using an unsigned short) or 4 (using an unsigned integer, as shown here). Hence we had to come up with a hybrid approach to transparently replace existing strcmp()/strncmp() functions.
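One possible shape for such a hybrid is sketched below: compare 4 bytes at a time for as long as possible, then finish the remaining 0 to 3 bytes one by one. This is not Pronghorn's actual code; fastEquals() is a made-up name, it only tests for equality (it doesn't give the ordering that strcmp() returns), and it again assumes a CPU that tolerates unaligned 32-bit loads.

```d
import std.stdio;

// Hypothetical helper: word-wise equality test with a byte-wise tail.
bool fastEquals(const(char)[] a, const(char)[] b)
{
    if (a.length != b.length)
        return false;

    size_t i = 0;

    // Compare 4 bytes at a time while at least 4 bytes remain.
    for (; i + 4 <= a.length; i += 4)
    {
        if (*cast(const(uint)*)(a.ptr + i) != *cast(const(uint)*)(b.ptr + i))
            return false;
    }

    // Compare the remaining 0 to 3 bytes individually.
    for (; i < a.length; ++i)
    {
        if (a[i] != b[i])
            return false;
    }
    return true;
}

void main()
{
    assert(fastEquals("POST", "POST"));
    assert(!fastEquals("POST", "POSX"));
    assert(fastEquals("DELETE", "DELETE"));   // 4 bytes word-wise, 2 byte-wise
    assert(!fastEquals("DELETE", "DELETX"));
    assert(!fastEquals("GET", "GET "));       // different lengths
    writeln("all comparisons behaved as expected");
}
```

Since both operands are loaded the same way, this variant doesn't depend on the byte order. The length check up front also keeps the 32-bit loads inside the strings, which the bare template doesn't guarantee for inputs shorter than 4 characters.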

I highly recommend reading http://www.codeproject.com/KB/string/optimize_strings.aspx?msg=1009488 for further details on this

Note that the code snippets have been compiled with DMD 2.0.49 without any optimizations (-O has been omitted).