Posted By Quentin Carnicelli on May 4th, 2009
Back in December, immediately after we released Radioshift Touch, I sat down to solve the sales data problem we had. Although Apple provides App Store developers with sales data, the raw numbers are only available for the past 7 days. We’re accustomed to having sales data going back…well, forever. For example, on this day six years ago, we sold 7 copies of Audio Hijack. Wanting to have this same level of historic data, I set about building a sales data collection system for iTunes Connect and our iPhone applications.
Luckily for me, someone had already gone down this path. Kirby Turner had released AppDailySales, a Python script that would automatically log into iTunes Connect and fetch the raw sales data. This would provide the base for what would become iTunesConnectArchiver.
My first task was rewriting AppDailySales to use BeautifulSoup for web scraping. BeautifulSoup is a wonderful little library that I will probably talk more about in a future post (we use it in Pulsar). This conversion gave us a web scraper that I could trust wouldn’t break every other week, as such things are prone to doing.
Next I set about adding some basic data parsing and munging. Raw sales data out of iTunes Connect is almost impossible to make any sense of, and needs to be heavily massaged to give you even a simple “I made $X today” figure. A big part of this is dealing with all the various currencies. iTunesConnectArchiver attempts to convert every foreign currency it sees into USD, basing it on the exchange rate for the day of that sale. No more will you be lulled into a false sense of richness because you made 25,000 yen.
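To make the currency step concrete, here is a minimal sketch of converting a sale into USD using the rate for the day of the sale. It is not the code from iTunesConnectArchiver: the `RATES_TO_USD` table, the `to_usd` function, and the rates themselves are made up for illustration, and a real tool would need to get daily rates from somewhere.

```python
from datetime import date
from decimal import Decimal

# Hypothetical rate table: (currency, day of sale) -> USD per one unit.
# The numbers are invented for illustration, not real exchange rates.
RATES_TO_USD = {
    ("JPY", date(2009, 5, 1)): Decimal("0.0101"),
    ("EUR", date(2009, 5, 1)): Decimal("1.33"),
}

CENT = Decimal("0.01")


def to_usd(amount, currency, day):
    """Convert an amount (given as a string) to USD, rounded to cents."""
    value = Decimal(amount)
    if currency != "USD":
        # Look up the rate for the day the sale happened, not today's rate.
        # A missing rate raises KeyError rather than silently guessing.
        value = value * RATES_TO_USD[(currency, day)]
    return value.quantize(CENT)


print(to_usd("25000", "JPY", date(2009, 5, 1)))
```

With the invented rate above, 25,000 yen comes out to a rather less exciting 252.50.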
The final piece of the puzzle was persistence. For this, I dumped all the parsed sales figures into an SQLite database. A big worry here was “missing a day”. A lot of people seem to collect their iTunes Connect sales data by hand, and if they forget to grab a day, it can disappear and never be seen again. Thus, iTunesConnectArchiver downloads all available data (typically 7 days’ worth of sales) and merges all of that into the database. It can miss as many as 6 days in a row and still not lose any data. We’ve been running it nightly with a cron task, and have a complete set of sales data thus far.
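The merge idea can be sketched with nothing but the sqlite3 module. This is an illustration under assumptions, not the script's actual schema: the `daily_sales` table, its columns, and the `open_db` and `merge_reports` helpers are all hypothetical. The point is that keying rows on the day (plus app and country) makes overlapping downloads harmless, since re-importing a day you already have just overwrites the same row.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) sales table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_sales (
               day     TEXT    NOT NULL,
               app     TEXT    NOT NULL,
               country TEXT    NOT NULL,
               units   INTEGER NOT NULL,
               usd     REAL    NOT NULL,
               PRIMARY KEY (day, app, country)
           )"""
    )
    return conn


def merge_reports(conn, rows):
    """Merge every downloaded row; rows for days already stored are replaced."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO daily_sales "
            "(day, app, country, units, usd) VALUES (?, ?, ?, ?, ?)",
            rows,
        )


conn = open_db()
# Two nightly runs whose windows overlap on 2009-05-02 (made-up figures).
merge_reports(conn, [("2009-05-01", "Radioshift Touch", "US", 3, 8.97),
                     ("2009-05-02", "Radioshift Touch", "US", 1, 2.99)])
merge_reports(conn, [("2009-05-02", "Radioshift Touch", "US", 1, 2.99),
                     ("2009-05-03", "Radioshift Touch", "US", 2, 5.98)])
```

After both runs the table holds three rows, one per day, with no duplicate for the overlapping day.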
I also hacked in some basic ASCII charts and graphs to get a rough idea of how things are doing, but didn’t go too far down this road, as my main concern was just getting the data into storage for future usage.
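For a sense of what an ASCII chart of this kind might look like, here is a small sketch. It is not the charting code from the script; the `ascii_bar_chart` function and its output format are my own invention for illustration.

```python
def ascii_bar_chart(totals, width=40):
    """Render {label: value} as one bar of '#' characters per label."""
    peak = max(totals.values()) if totals else 0
    lines = []
    for label, value in sorted(totals.items()):
        # Scale each bar against the largest value. If everything is zero,
        # draw empty bars instead of dividing by zero.
        bar_len = int(round(width * value / peak)) if peak > 0 else 0
        lines.append("%s | %s %.2f" % (label, "#" * bar_len, value))
    return "\n".join(lines)


print(ascii_bar_chart({"2009-05-01": 10.0, "2009-05-02": 5.0}, width=10))
```

The labels here are ISO dates, so sorting them alphabetically also sorts them chronologically.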
After I finished the script in December, I decided to test it out for a few weeks to see if there were any bugs before releasing it. And sure enough, 4 months later when we shipped Airfoil Speakers Touch, there was a divide-by-zero bug (because AFSTouch is free!). With that sorted out, I am finally releasing it here:
It requires Python 2.6 or later with the sqlite3 module, and BeautifulSoup (which is included). Also included is a ReadMe that describes in more detail how it works and how to use it.
I should note that I mainly intend this as something for others to build their own tools from. I will accept bug fix patches, but will probably reject any feature patches or feature requests which I don’t use myself. So, feel free to take this script and steal whatever you need from it for your own projects.