Under The Microscope

How To Shrink Your Source Code

This post was written by Rogue Amoeba alumnus Mike Ash.

Prior to submitting my entry to a certain infamous contest, I discovered that even after reducing all possible identifiers to one or two characters, my source code was still far over the limit required by the contest. Needing to make up something like a 30% margin on my already insanely compressed code, I thought I was doomed. After several days of on and off effort, however, I just squeaked in under the limit. The ultimate limit is a count of 2048 characters excluding whitespace, or the characters {, }, or ; followed by whitespace. The final version of my entry came in at 2017 such characters.
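
For the curious, here is a rough sketch of how that counting rule could be checked in C. It is not the contest's own tool. The function name count_significant is made up for illustration, and treating end-of-input the same as whitespace is an assumption on my part, since the rule as stated only talks about whitespace.

```c
#include <ctype.h>
#include <stddef.h>

/* Count "significant" characters in src under the rule described above:
 * whitespace is free, and so is '{', '}' or ';' when the very next
 * character is whitespace. Treating end-of-input like whitespace is an
 * assumption of this sketch. */
size_t count_significant(const char *src)
{
    size_t n = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];
        unsigned char next = (unsigned char)src[i + 1];

        if (isspace(c))
            continue;               /* whitespace never counts */
        if ((c == '{' || c == '}' || c == ';') &&
            (next == '\0' || isspace(next)))
            continue;               /* free brace or semicolon */
        n++;
    }
    return n;
}
```

Read your source file into a string, pass it to count_significant, and compare the result against the limit.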

Today I’m going to show you how to shrink your C source code like a real professional. Pay careful attention, and you can put these skills to use at your next job.1

1. Use short identifiers
This one may seem obvious, but it’s the most effective if your code isn’t already using it. C has 53 legal one-character identifiers (yes, _ is a legal identifier too), and you should use every single one of them.

2. Optimize the use of short identifiers
If your program is large enough then you probably have more than 53 different identifiers in your program, meaning that you’ll be forced to use some two-character identifiers. For best effect, you want to spend your 53 one-character identifiers on the symbols that are used the most frequently.

To that end, I wrote a small Python script that will tell you about underused short identifiers. Pipe your program into its standard input, and on its standard output it will print a list of all identifiers in your program, sorted by frequency of use. It will also print all legal one-character identifiers even if you aren’t using some of them. Then you can rename things so that all single-character identifiers are at the top of the list.

#!/usr/bin/env python3

import re
import sys

def readin(f):
    # Skip #include lines so header names aren't counted as identifiers.
    return ''.join(l for l in f if not l.startswith('#include'))

def makedict(text):
    counts = {}

    # Seed every legal one-character identifier so unused ones show up too.
    for c in '_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ':
        counts[c] = 0

    ident_re = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
    for ident in ident_re.findall(text):
        counts[ident] = counts.get(ident, 0) + 1
    return counts

def sortedarray(counts):
    # Most frequently used first; ties go to the shorter identifier.
    return sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0])))

text = readin(sys.stdin)
for ident, count in sortedarray(makedict(text)):
    print('%02d --> %s' % (count, ident))

3. Reuse identifiers
You’ve run out of precious one-character identifiers but you still need to save more space. What to do? Shadowing is your friend. Any one-character function that isn’t called in the current function, or one-character global that isn’t used, can be reused as a local variable name with no ill effect.

If you want to take this further, you can directly reuse global variables for local storage as long as the types match, and you can be absolutely certain that nothing you call will manipulate that global. This not only lets you reuse a name, but also saves on lengthy declarations.
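
A small sketch of the shadowing trick, with made-up names (f, g and sum_to are purely illustrative and not from my entry):

```c
int g = 10;                 /* a one-character global */

int f(int x)                /* a one-character function */
{
    return x * 2;
}

/* This function never calls f() and never reads the global g, so both
 * names are free to be reused as locals. The outer f and g are hidden
 * only inside this scope and are untouched everywhere else. */
int sum_to(int n)
{
    int f = 0;                      /* shadows the function f */
    for (int g = 1; g <= n; g++)    /* shadows the global g */
        f += g;
    return f;
}
```

Your compiler may grumble about this if you turn on shadowing warnings, which is one more reason not to turn them on.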

4. typedef
Type names get used a lot, and they’re pretty long. Make them shorter with typedef. Of course you have to pay for the typedef statement itself, but if the type name is used enough then this can easily pay off. For example, the string typedef int i; is 12 non-whitespace characters, but will save you 2 characters in each use of int, so you only need 7 uses of int in your code to make this a win.
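
Here is a sketch of that arithmetic in action. The functions dot and net_savings are hypothetical helpers invented for this example.

```c
typedef int i;              /* 12 non-whitespace characters, paid once */

/* Every i below would have cost 3 characters as int, so each use saves 2. */
i dot(i a, i b, i c, i d)
{
    i s = a * c + b * d;
    return s;
}

/* Net characters saved after n uses of the short name:
 * 2 per use, minus the 12 spent on the typedef itself. */
i net_savings(i n)
{
    return 2 * n - 12;
}
```

Six uses only breaks even, and the seventh is where the typedef starts to pay for itself.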

5. #define
This is like the typedef tip, but more general. Use #define to shrink common bits of code. For example, if you use the typedef trick more than a couple of times, you can #define t typedef and save even more.

You can get even fancier and take advantage of macro arguments. If you have a bunch of code that’s similar but not identical, you can #define it to get it shorter. For example, if you have a bunch of functions with the same signature, you can save on space:

#define f(n) void n(int a, float b, char *c) {
 
f(q)
    int aLocal;
    ...
}
 
f(r)
    ...
}

You can get really fancy using the ## preprocessor operator to glue tokens together. For example, here is a bit of code from an earlier version of my entry:

v D()
{
    tcgetattr(0, &B);
    A = B.c_lflag;
    B.c_lflag &= ~ICANON & ~ECHO;
    tcsetattr(0, TCSANOW, &B);
}
 
v C()
{
    tcgetattr(0, &B);
    B.c_lflag = A;
    tcsetattr(0, TCSANOW, &B);
}

All of those tcget/setattr calls take up valuable space, and I can’t rename them because they’re library functions. And I only call each one twice, so there’s not much gain with a #define on them. But with a bit of cleverness I can make a pair of #defines that will cover all four calls, and cover much more than just the function names:

v D()
{
#define O(x) tc ## x ## etattr(0,
#define TD O(s) TCSANOW, &B);
    O(g) &B);
    A = B.c_lflag;
    B.c_lflag &= ~ICANON & ~ECHO;
    TD
}
 
v C()
{
    O(g) &B);
    B.c_lflag = A;
    TD
}

6. Avoid character constants
A character constant uses three characters in code. But most characters have an ASCII value less than 100. Save one character per use by writing out the ASCII value directly instead. Instead of 'A', write 65. This is a small savings but it can really add up.
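
A quick sketch, assuming an ASCII execution character set (which is what this whole tip depends on). The function names are made up for illustration.

```c
/* Readable version: character constants cost 3 characters each. */
int is_upper_readable(int c)
{
    return c >= 'A' && c <= 'Z';
}

/* Shrunk version: 65 is 'A' and 90 is 'Z' in ASCII, 2 characters each. */
int is_upper_shrunk(int c)
{
    return c >= 65 && c <= 90;
}
```

The savings disappear for characters with three-digit codes such as 'z' (122), so check the value before you convert.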

7. Use - instead of !=
In nearly every situation, the != operator can be replaced with the - operator with no change in functionality. This is not true when you have code that relies on the result of the comparison being either 0 or 1, when you’re assigning the result to a variable of a type where a non-zero result might be converted to 0, or when you’re simply assigning the result to a variable of an incompatible type. Again, you save one character per use.
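
The sketch below shows one case where the swap is safe and two where it is not. All of the names are hypothetical, and the narrowing example assumes an 8-bit unsigned char and ASCII.

```c
/* Safe: only the truth value matters. s[n] - 32 is nonzero exactly
 * when s[n] != ' ' (32 is the ASCII space). */
int first_word_len(const char *s)
{
    int n = 0;
    while (s[n] && s[n] - 32)
        n++;
    return n;
}

/* Not safe when the value itself is used: != yields exactly 0 or 1,
 * while - yields the whole difference. */
int ne(int a, int b)  { return a != b; }
int sub(int a, int b) { return a - b; }

/* Not safe when the result is squeezed into a narrow type: a nonzero
 * difference of 256 becomes 0 in an 8-bit unsigned char. */
unsigned char narrow_sub(int a, int b)
{
    return (unsigned char)(a - b);
}
```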

8. Use & and |
In many situations, the use of && and || is unnecessary, and they can be replaced with their one-character bitwise cousins. Be careful that you preserve the semantics of your program when doing this substitution, particularly if you use the advice in #7. Bitwise or will always give you the same result, truth-wise, as logical or, but bitwise and will not; use bitwise and only when you know that the truth values on both sides will always have at least one 1 bit in common. If you can’t guarantee this then you may be able to get away with using the * operator, but beware of overflow. You can take advantage of the fact that C’s comparison operators will always give either a 0 or a 1 as their result, so there’s no problem in doing (a < b) & (c > d).
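
To make the pitfall concrete, here is a sketch with illustrative function names. The values 1 and 2 are both true, but they share no 1 bits.

```c
int and_logical(int a, int b) { return a && b; }  /* the original      */
int and_bitwise(int a, int b) { return a & b; }   /* one char shorter  */
int and_product(int a, int b) { return a * b; }   /* the * fallback    */
int or_bitwise(int a, int b)  { return a | b; }   /* always truth-safe */

/* Comparisons always produce 0 or 1, so & is safe between them. */
int in_range(int x, int lo, int hi)
{
    return (x >= lo) & (x <= hi);
}
```

Here and_logical(1, 2) is true while and_bitwise(1, 2) is 0, which is exactly the bug you are signing up for. Remember too that & and | evaluate both sides, so anything that relied on short circuiting will now run unconditionally.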

9. Use the ternary operator and short circuiting
?: is shorter than if/else and is equivalent in most situations, so use it when you can. For situations without the else, you may also take advantage of the short-circuiting properties of && and ||.
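
Both substitutions, sketched side by side with made-up function names (the 32 is the ASCII space again, per tip 6):

```c
/* if/else ... */
int max_if(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

/* ... becomes ?: */
int max_ternary(int a, int b)
{
    return a > b ? a : b;
}

/* if without else ... */
int count_non_spaces_if(const char *s)
{
    int n = 0;
    for (; *s; s++)
        if (*s != ' ')
            n++;
    return n;
}

/* ... becomes &&: ++n is evaluated only when *s - 32 is nonzero. */
int count_non_spaces_sc(const char *s)
{
    int n = 0;
    for (; *s; s++)
        *s - 32 && ++n;
    return n;
}
```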

10. Adjust your constants
Using constants can help reduce code size by substituting a short identifier for a longer number. Watch carefully how your constant is used. If you discover that most code using the constant is immediately subtracting one from it, for example, you can simply subtract one from the constant itself, then modify the minority code to add one, saving valuable characters. In the extreme case, you may discover that the manipulation drops a digit from the constant, making it use less space to simply inline the new, shorter number in your code.
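
A sketch of the idea using a hypothetical board width of 10. The macro and function names are invented for the example.

```c
/* Before: the constant holds the width, and most uses subtract one. */
#define W 10
int last_col_before(void)  { return W - 1; }
int clamp_before(int x)    { return x > W - 1 ? W - 1 : x; }
int width_before(void)     { return W; }

/* After: the constant holds width minus one. The majority of uses get
 * shorter, and only the minority pays for a + 1. */
#define V 9
int last_col_after(void)   { return V; }
int clamp_after(int x)     { return x > V ? V : x; }
int width_after(void)      { return V + 1; }
```

In this case the adjusted value has also dropped from two digits to one, so writing 9 directly costs no more than the macro name and the #define can go entirely.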

Putting it into practice
To demonstrate these principles, I wrote a short calculator program. It takes input on stdin in prefix notation, and prints the results of the calculations. You can give it input such as + * 5 5 7 and it will give you the answer of 32. This original non-shrunk version is 411 significant characters:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
int getnum(int c)
{
    int n = 0;
    do
    {
        n *= 10;
        n += c - '0';
        c = getchar();
    }
    while(c >= '0' && c <= '9');
    
    return n;
}
 
int calc(void)
{
    int c = getchar();
    if(c == EOF)
        exit(0);
    
    if(c == ' ' || c == '\n')
        return calc();
    
    if(c == '*')
        return calc() * calc();
    if(c == '+')
        return calc() + calc();
    if(c == '-')
        return calc() - calc();
    if(c == '/')
        return calc() / calc();
    
    // read a number
    return getnum(c);
}
 
int main(int argc, char **argv)
{
    while(1)
        printf("%d\n", calc());
    return 0;
}

By putting these principles into practice, I was able to shrink it to a mere 284 characters. There was a certain loss of readability, but I’m sure you will all agree that the tradeoff is well worth it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
#define r return
#define h c = getchar();
typedef int i;
 
i _(i c)
{
    i _ = 0;
    do
    {
        _ = _ * 10 + c - 48;
        h
    }
    while(c > 47 & c < 58);
    
    r _;
}
 
i d(void)
{
    i h
    if(c + 1)
    {
        r
        c - 32 && c - 10 ?
    
    #define o(x, y) c == x ? d() y d() :
        o(42, *)
        o(43, +)
        o(45, -)
        o(47, /)
        
        _(c) :
        d();
    }
    exit(0);
}
 
i main(i x, char **y)
{
    while(1)
        printf("%d\n", d());
    r 0;
}

Footnotes:
1. If you haven’t caught on yet, this article is intended to be humorous. It’s not truly helpful, unless you too are entering a contest whose goal is to write horrible code.

Cocoa Shaders

Just as Mike Ash occasionally posts more technical content on his blog, AHT lead programmer Guy English has started his own blog. Visit the Kicking Bear blog, and read all about Cocoa Shaders in the first entry.

Experiences In Web Hosting

Major hosting provider Dreamhost.com had a scheduled but overly-long downtime this weekend, and conversations on web hosting have come up several times since. At Rogue Amoeba, we’ve gone through several web hosts over the years, and seen at least two die off in the process. As such, I thought I’d detail our own experiences with web hosting, and provide a bit of advice.

Web Hosting Rule #1: You get what you pay for.

This rule is true of most things in life, but it is perhaps less obvious in web hosting. Most hosts can offer you 99% or even 99.9% uptime, and the latter amounts to being down less than nine hours per year. That sounds great on paper. In practice, however, it’s a whole different matter.

When Rogue Amoeba got started, we found a dirt-cheap web host, iSuperWeb.com. “iSuperWeb”? What the hell were we thinking? But it was the carefree days of 2002 and we were young and foolish. Within our first month, we had to upgrade our account twice as we went over bandwidth limits. This host actually shut down our account the first time this happened, posting a “Bandwidth exceeded” message, instead of simply charging us for the overage. That probably should have been a hint to us. Within three months (early 2003) they were dead, and we’d switched to a new host.

That new host was Eryxma, who gave us plenty of bandwidth. Eryxma was run by a couple of guys who were in over their heads, but it was fine for us. We were a small company with very little in the way of traffic and we had better problems to consider1.

In our case, Eryxma gave us a Virtual Private Server (VPS), basically a slice of a full machine made to look like a real machine. You have full control of your virtual machine, and can reboot it, access it as root, and so on. You’re the administrator of this VPS, so you can do whatever you like. My email records show me happily recommending Eryxma to at least one person (Sorry Tim!).

Eryxma was great: it was very inexpensive, and it met our needs. Until one day, in early 2004, when it didn’t. In a short period of time, we started seeing many problems with Eryxma as the company crumbled2. We decided to start looking for a new host. Quentin was fed up with a string of crappy web hosts going back to well before we founded Rogue Amoeba and convinced me to take a second look at Pair Networks.

Web Hosting Rule #2: Everyone loves their current host until the server is down.

Pair’s been around since 1995 and many of their servers have been up almost as long. Their prices have always had a premium attached to them compared to other hosts, and this had turned me off for a long time. But we’ve been using them for three years now, so when I was recently asked “Is Pair’s network and overall reputation so great that they can command those prices?”, my answer was simple: “Yes”.

As noted, everyone will recommend their current host, until their site goes down. With Pair, we’ve been down once, when our HD died. This was fixed within about two hours, with no data loss. If our server never goes down, our love for Pair never stops. We trust them to provide rock-solid hosting, and that’s the most important thing to us.

Pair locks down your account, meaning you don’t get the same level of control as with a VPS, or even with many other similar hosting providers. But here’s the rub: that’s generally a good thing. Less power leads to increased stability across the board, and when you’re sharing a machine with others, that’s a very good thing.

Web Hosting Rule #3: Have A Contingency Plan

So we set up an inexpensive Pair account to host our main web site (rogueamoeba.com), while keeping our downloads on Eryxma. This two-pronged approach meant that we didn’t need a very high-end account with Pair, as our downloads account for the vast majority of our bandwidth usage. That kept costs down at Pair, and let us pay for what we call a “bandwidth sink” elsewhere.

Having two hosts probably seems a bit odd, but it’s worked out very well. If our less reliable host fails, we flip over and let Pair handle our downloads. In a true worst-case scenario, if Pair fails, we have an account all set up and can transition our site to the other host machine. See below for more on this setup.

Meanwhile, Eryxma was still falling apart, so we went looking for a more reliable VPS host and found ServInt. They too provided us with a VPS, and while the price was higher, they were (and remain) far more stable. We played around on the VPS, using it to run our automated processes such as order processing.

Web Hosting Rule #4: Run your software company, not your web server.

As noted, this two-pronged approach worked quite well for us and we continue to use it now. However, having a VPS just proved to be a hassle for us. We all know our way around a command line, but while the VPS gave us as much power as we could want, we paid the price when we had to administer it ourselves. So a year ago, we looked at upgrading our Pair account.

Pair offers dedicated hosting (your own rented machine) through their QuickServe plans. These are quite expensive, a couple hundred dollars a month, so this was a big step up for us. However, we took advantage of their discounted QS-X plan, which offers an over-stock machine at a reduced price. With the QuickServe, we have a full machine to ourselves, and lots of bandwidth. We shut down our VPS and moved everything over to Pair.

Even with our own machine, we’re still locked down pretty well. We don’t have root access, and we can’t install new packages, which means Pair nickels and dimes us for add-ons like installing Subversion ($50). That’s a lot of money for something we could do ourselves if we just had the access, and this used to grate on me. Now, however, I have a different mindset: we’re paying a sysadmin in discrete chunks. He gets $50 for his task and we know it will be done right. Meanwhile, we don’t have to pay a full salary to someone who spends most of the day reading Slashdot.

Here We Are Now

I said that we still use a two-pronged approach, but when we got our QuickServe, we moved everything there. Around the same time, we realized that we would still do well to have more bandwidth if needed, and it might be nice to be able to noodle around a bit more than we can on Pair. On recommendation from Mike, we checked out DreamHost, whose prices are rather unbelievable. They offer plans with what is effectively unlimited bandwidth and storage for just a few dollars a month.

How do they do it? I don’t really know. Perhaps it’s all a house of cards. The important thing for us is that it doesn’t matter. DreamHost hosts only one important public-facing site for us, BigBlueAmoeba.com3 (BBA). BBA is our download server, as long as it’s alive.

We have a mirror checker script that runs on our main site every six minutes. It checks the local files and makes sure BBA has the same copies. If it can’t reach BBA, or if the files don’t match, our Pair.com site (rogueamoeba.com) takes over handing out downloads. Worst case, we have 6 minutes of downtime for our downloads, and this has never been an issue. The upshot is that we again have a bandwidth sink, and it’s very inexpensive.

So that brings us up to our current setup. Pair.com hosts rogueamoeba.com, our main website, as well as rogueamoeba.net, which handles our order processing and other internal needs. This hosting costs us something like $1500 a year, but we know we can count on it 100%. Meanwhile, DreamHost handles our downloads and other various needs (I put MacSanta.com there, for instance). This costs us around $100 a year.

Closing Advice

We had a lot of experience with inexpensive but unreliable hosting, and it was just never worth it. I certainly would recommend not using a host like DreamHost or one of the many other cut-rate hosts for a website that you want available 24/7. We love DreamHost, but if we cared about reliability there, we’d be gone in a heartbeat.

I firmly believe that everyone would do well to use Pair, or a company with a similarly bullet-proof track record4. I wouldn’t suggest a QuickServe to every software company, but Pair’s simpler web hosting plans are quite affordable. They cost more than some other hosts but you really do get what you pay for. For us, that peace of mind is worth a few dollars more per month.

If you’re interested in switching web hosts, you can click these shameless and self-serving referral links:

As well, if you’re switching to Pair, use coupon code REFUGEE to pay $0 on setup fees.

Footnotes:
1. Important questions such as “Can we sell a tool that lets you record any audio without getting sued out of existence?”

2. Maybe we need a parallel rule set. Web Hosting Rule #A: Never go with a host whose name looks like a skin disease

3. 10 AmmoPoints* to the first person who can tell me (here in the comments) what the domain BigBlueAmoeba.com is parodying.
* Even with the weak US Dollar, 1 AmmoPoint is equal to $0.00001, and may thus only be exchanged in blocks of 1000 AmmoPoints

4. Another host often mentioned in the same breath as Pair (pair.com) is Hurricane Electric (he.com). They’ve been around since 1994 and have a similarly-excellent track record (despite the name). This leads me to another parallel rule, Web Hosting Rule #B: The shorter your host’s domain name, the better.

This Is Only A Test

Update #2: Hello TUAW and Ars readers. This post really wasn’t meant for you, but apparently on this so-called “Interweb”, anyone can link to anything. This information is disturbing, to say the least, and I’ve already written my representatives about it.

Anyhow, this post was written for the regular readers of our weblog, so we could pick up some testers for our next application. As audio users and users of our existing product, they’re the perfect group of testers, and I firmly believe they’ll find AHT to be revolutionary. As for the rest of the world, we’ll have to see.

The problem here is we’re not yet ready to reveal any information about AHT, so there’s really nothing to see here. Maybe you’d be interested in this recent article on our web hosting experience?

We didn’t intend for this to make news anywhere but with our existing readers, but by getting linked from the aforementioned sites, we’ve apparently “built buzz”. Oddly enough, we prefer to do our marketing when we have an application available to buy, not before. Weird, I know. Here, all we really wanted to do was get some testers. I must say, this certainly worked better last time around! I guess we’ll have to switch things up for the future.
-----
Our next major application has been in development since last year, and we’re ready to begin a private testing process. We’re not saying much about this new application, but its code name is AHT and I’m confident I can use the word revolutionary in describing it.

If you’re interested in testing AHT, just follow these simple steps.

Step 1: Register for an account on our Forums (or Fora, if you prefer).

Step 2: E-mail us with your full name, email address, and forum member name.

-----
Update #1: We got only a few dozen submissions through Sunday, but as of Monday afternoon, we’ve received several hundred emails. Apparently, no one works at work. Anyhow, we’re all set for now, so we’re closing the submissions. We’ll choose randomly, and email those who get in Wednesday evening. If you don’t hear from us, stay tuned, as we’ll likely bring on a second wave. Thanks to everyone for your interest!
-----

Step 3: Sit back, relax, grab a drink, and wait to hear from us.

Please just email us once – we won’t be able to respond to emails about this. We’ll be drawing our first pool of testers next Wednesday, so if you’re chosen you’ll hear from us then. If you don’t hear back from us, stay tuned as we’ll likely bring on additional testers in the future.

Feel free to guess at what AHT means in the comments here.

The Planets Are Aligning Favorably

I’m up late most nights (around 4 am) so I don’t usually get up before 10 am (the Independent Software Vendor life has its perks). As such, I don’t imagine I’ll be up at 9:30 am EST later today (February 22), but perhaps you will be. If so, or if we’re past that already, you should check out http://www.red-sweater.com/.

Especially if you’re a blogger who uses a Mac.