Under The Microscope

Archive for March 7th, 2007

How To Shrink Your Source Code

Prior to submitting my entry to a certain infamous contest, I discovered that even after reducing all possible identifiers to one or two characters, my source code was still far over the limit required by the contest. Needing to make up something like a 30% margin on my already insanely compressed code, I thought I was doomed. After several days of on and off effort, however, I just squeaked in under the limit. The ultimate limit is a count of 2048 characters excluding whitespace, or the characters {, }, or ; followed by whitespace. The final version of my entry came in at 2017 such characters.

Today I’m going to show you how to shrink your C source code like a real professional. Pay careful attention, and you can put these skills in to use at your next job.1

1. Use short identifiers
This one may seem obvious, but it’s the most effective if your code isn’t already using it. C has 53 legal one-character identifiers (yes, _ is a legal identifier too), and you should use every single one of them.

2. Optimize the use of short identifiers
If your program is large enough then you probably have more than 53 different identifiers in your program, meaning that you’ll be forced to use some two-character identifiers. For best effect, you want to spend your 53 one-character identifiers on the symbols that are used the most frequently.

To that end, I wrote a small python script that will tell you about underused short identifiers. Pipe your program into its standard input, and as its standard output it will print a list of all identifiers in your application, sorted by frequency of use. It will also print all legal one-character identifiers even if you aren’t using some of them. Then you can rename things so that all single character identifiers are at the top of the list.

#!/usr/bin/python
 
import re
import sys
 
def hasprefix(str, prefix):
    return len(str) >= len(prefix) and str[:len(prefix)] == prefix
 
def readin(f):
    str = ''
    for l in f.readlines():
        if not hasprefix(l, '#include'):
            str += l
    return str
 
def makedict(str):
    dict = {}
    
    for c in '_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ':
        dict[c] = 0
    
    ident_re = re.compile(r"([a-zA-Z_]([a-zA-Z0-9_]*))")
    for match in ident_re.findall(str):
        ident = match[0]
        if not dict.has_key(ident):
            dict[ident] = 0
        dict[ident] = dict[ident] + 1
    return dict
 
def sortedarray(dict):
    array = []
    for k in dict.keys():
        array += ((k, dict[k]),)
    array.sort(lambda x, y: len(x[0]) - len(y[0]))
    array.sort(lambda x, y: -x[1] + y[1])
    return array
 
str = readin(sys.stdin)
dict = makedict(str)
array = sortedarray(dict)
for item in array:
    print '%02d --> %s' % (item[1], item[0])

3. Reuse identifiers
You’ve run out of precious one-character identifiers but you still need to save more space, what to do? Shadowing is your friend. Any one-character function that isn’t called in the current function, or one-character global that isn’t used, can be reused as a local variable name with no ill effect.

If you want to take this further, you can directly reuse global variables for local storage as long as the types match, and you can be absolutely certain that nothing you call will manipulate that global. This not only lets you reuse a name, but also saves on lengthy declarations.

4. typedef
Type names get used a lot, and they’re pretty long. Make them shorter with typedef. Of course you have to pay for the typedef statement itself, but if the type name is used enough then this can easily pay off. For example, the string typedef int i; is 12 non-whitespace characters, but will save you 2 characters in each use of int, so you only need 7 uses of int in your code to make this a win.

5. #define
This is like the typedef tip, but more general. Use #define to shrink common bits of code. For example, if you use the typedef trick more than a couple of times, you can #define t typedef and save even more.

You can get even fancier and take advantage of macro arguments. If you have a bunch of code that’s similar but not identical, you can #define it to get it shorter. For example, if you have a bunch of functions with the same signature, you can save on space:

#define f(n) void n(int a, float b, char *c) {
 
f(q)
    int aLocal;
    ...
}
 
f(r)
    ...
}

You can get really fancy using the ## preprocessor operator to glue strings together. For example, here is a bit of code from an earlier version of my entry:

v D()
{
    tcgetattr(0, &B);
    A = B.c_lflag;
    B.c_lflag &= ~ICANON & ~ECHO;
    tcsetattr(0, TCSANOW, &B);
}
 
v C()
{
    tcgetattr(0, &B);
    B.c_lflag = A;
    tcsetattr(0, TCSANOW, &B);
}

All of those tcget/setattr calls take up valuable space, and I can’t rename them because they’re library functions. And I only call each one twice, so there’s not much gain with a #define on them. But with a bit of cleverness I can make a pair of #defines that will cover all four calls, and cover much more than just the function names:

v D()
{
#define O(x) tc ## x ## etattr(0,
#define TD O(s) TCSANOW, &B);
    O(g) &B);
    A = B.c_lflag;
    B.c_lflag &= ~ICANON & ~ECHO;
    TD
}
 
v C()
{
    O(g) &B);
    B.c_lflag = A;
    TD
}

6. Avoid character constants
A character constant uses three characters in code. But most characters have an ASCII value less than 100. Save one character per use by writing out the ASCII value directly instead. Instead of 'A', write 65. This is a small savings but it can really add up.

7. Use - instead of !=
In nearly every situation, the != operator can be replaced with the - operator with no change in functionality. This is not true when you have code that relies on the result of the comparison to be either 0 or 1, if you're assigning the result to a variable of a type where a non-zero result might be converted to 0, or if you're simply assigning the result to a variable of an incompatible type. Again, you save one character per use.

8. Use & and |
In many situations, the use of && and || is unnecessary, and they can be replaced with their one-character bitwise cousins. Be careful that you preserve the semantics of your program when doing this substitution, particularly if you use the advice in #7. Bitwise or will always give you the same result, truth-wise, as logical or, but bitwise and will not; use bitwise and only when you know that the truth values on both sides will always have at least one 1 bit in common. If you can't guarantee this then you may be able to get away with using the * operator, but beware of overflow. You can take advantage of the fact that C's comparison operators will always give either a 0 or a 1 as their result, so there's no problem in doing (a < b) & (c > d).

9. Use the ternary operator and short circuiting
?: is shorter than if/else and is equivalent in most situations, so use it when you can. For situations without the else, you may also take advantage of the short-circuiting properties of && and ||.

10. Adjust your constants
Using constants can help reduce code size by substituting a short identifier for a longer number. Watch carefully how your constant is used. If you discover that most code using the constant is immediately subtracting one from it, for example, you can simply subtract one from the constant itself, then modify the minority code to add one, saving valuable characters. In the extreme case, you may discover that the manipulation drops a digit from the constant, making it use less space to simply inline the new, shorter number in your code.

Putting it into practice
To demonstrate these principles, I wrote a short calculator program. It takes input on stdin in prefix notation, and prints the results of the calculations. You can give it input such as + * 5 5 7 and it will give you the answer of 32. This original non-shrunk version is 411 significant characters:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
int getnum(int c)
{
    int n = 0;
    do
    {
        n *= 10;
        n += c - '0';
        c = getchar();
    }
    while(c >= '0' && c <= '9');
    
    return n;
}
 
int calc(void)
{
    int c = getchar();
    if(c == EOF)
        exit(0);
    
    if(c == ' ' || c == '\n')
        return calc();
    
    if(c == '*')
        return calc() * calc();
    if(c == '+')
        return calc() + calc();
    if(c == '-')
        return calc() - calc();
    if(c == '/')
        return calc() / calc();
    
    // read a number
    return getnum(c);
}
 
int main(int argc, char **argv)
{
    while(1)
        printf("%d\n", calc());
    return 0;
}

By putting these principles into practice, I was able to shrink it to a mere 284 characters. There was a certain loss of readability, but I'm sure you will all agree that the tradeoff is well worth it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
#define r return
#define h c = getchar();
typedef int i;
 
i _(i c)
{
    i _ = 0;
    do
    {
        _ = _ * 10 + c - 48;
        h
    }
    while(c > 47 & c < 58);
    
    r _;
}
 
i d(void)
{
    i h
    if(c + 1)
    {
        r
        c - 32 && c - 10 ?
    
    #define o(x, y) c == x ? d() y d() :
        o(42, *)
        o(43, +)
        o(45, -)
        o(47, /)
        
        _(c) :
        d();
    }
    exit(0);
}
 
i main(i x, char **y)
{
    while(1)
        printf("%d\n", d());
    r 0;
}

Footnotes:
1. If you haven't caught on yet, this article is intended to be humorous. It's not truly helpful, unless you too are entering a contest whose goal is to write horrible code.