November 23, 2009
My co-worker Colin mentioned the paper Unix for Poets (PDF, 60 KB) to me a while ago, and I thought it was a fun read. It covers a lot of basic text processing and command-line magic, and is a really cool introduction to the power of the command-line interface and the Unix philosophy.
I wouldn’t bring out the big guns (fancy machines, fancy algorithms, data collection committees, bigtime favors) unless you have a lot of text (e.g., hundreds of million words or more), or you are trying to count really long ngrams (e.g., 50-grams). This chapter will describe a set of simple Unix-based tools that should be more than adequate for counting trigrams on a corpus the size of the Brown Corpus. I’d recommend that you do it yourself for basically the same reason that home repair stores like DIY and Home Depot are as popular as they are. You can always hire a pro to fix your home for you, but a lot of people find that it is better not to, unless they are trying to do something moderately hard. Hamming used to say it is much better to solve the right problem naively than the wrong problem expertly.
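To give a feel for how little machinery trigram counting needs, here is a minimal sketch in Python. It is my own illustration, not the paper's method (the paper builds its counts from standard Unix tools), and the function name `count_ngrams`, the crude tokenizer, and the sample sentence are all made up for the example.

```python
from collections import Counter
import re


def count_ngrams(text, n=3):
    """Count word n-grams in text (hypothetical helper, not from the paper)."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Zip n offset copies of the word list to slide a window of n words.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the hat"
    # Print the three most frequent trigrams with their counts.
    for gram, freq in count_ngrams(sample).most_common(3):
        print(freq, " ".join(gram))
```

Everything is held in memory here, which should be fine for a corpus of roughly Brown Corpus size but is exactly the sort of naive choice that would need rethinking for much larger collections.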
November 23, 2009
Seen on Gizmodo: an 8-Bit Wedding Invitation!
November 22, 2009
Seen on Planet Debian: Mimesweeper.
November 22, 2009
An article on Lambda the Ultimate about literate programming. I didn’t read all of these papers, only "Programming on a Team Project", and have never really done any literate programming myself, but it’s an interesting methodology and I sometimes wish it had caught on more widely.
In my experience, healthy projects either have very little in the way of comments, beyond architectural descriptions, or have lots and lots of comments, sometimes one per each line of code (mature projects tend towards this end). I think XP is probably right to suggest that energy spent commenting is better spent refactoring or improving the codebase.
And yet LP still has a compelling power (at least for me)! My feeling is that there are some applications which benefit a lot from a literate style — namely research papers, data analyses, and tutorials. But for everything else I think it’s probably better relegated to the museum of history.
November 19, 2009
Seen on Planet Debian: Joey Hess, author of ikiwiki, takes a look at CouchDB. As ikiwiki is the inspiration for one of my side projects, it’s important to me what he has to say on the subject of backends.
Couchdb is very unlike a distributed VCS, and yet it’s moved from traditional database country much closer to VCS land. It’s document oriented, not normalized; the data stored in it has significant structure, but is also in a sense freeform. It doesn’t necessarily preserve all history, but it does support multiple branches, merging, and conflict resolution.
I’m still not sure that CouchDB is good for the sort of things I want a backend for: history-aware computing still seems to me to need complete history, and for that you need a VCS (or to build your own). Still, interesting times.
November 19, 2009
Via Suzanne: a fascinating autobiographical article by Paul Lutus.
You may have heard about me. In the computer business I’m known as the Oregon Hermit. According to rumor, I write personal computer programs in solitude, shunning food and sleep in endless fugues of work. I hang up on important callers in order to keep the next few programming ideas from evaporating, and I live on the end of a dirt road in the wilderness. I’m here to tell you these vicious rumors are true.
Personal favorite line?
Also, I’ve been told that good programmers rarely have mates. This is usually offered as evidence of how asocial we are. Without fail, we’re pictured as disheveled cyber-hobos hanging around computer centers, shunning serious relationships, coding for the sake of coding. I can’t really disagree with this view, but there is something interesting behind it - at least for me. I began to notice, as I got more involved with computers, that acceptance by the machine required absolute precision on my part. The slightest misstep caused the instant erasure of many hours of work; the machine would reject everything with perfect dispassion until each detail was just right. Then the program would suddenly function beautifully, and never fail again… The result of this strange relationship was that for a time I became too spoiled for the flesh-and-blood women around me. I got tired of hearing, "If I’ve told you once, I’ve told you a thousand times - the answer is maybe!"
November 18, 2009
Seen on Planet Debian, this parody of this Slashdot comment.
I get the impression that the Windows 7 launch is a lot like seeing an old boyfriend suddenly show up on your doorstep wanting to get back together. He’s had some work done, apparently: stomach stapling to take off some of the weight, teeth whitening, and a radical nosejob to make him look as much like your current boyfriend as medical science will allow.
He’s handsome, of course, almost too handsome. He still uses far too much product in his hair and carries that desperate look in his eyes. The fragrant haze around him is the cologne he overuses to mask the scent of failure.
Call me crazy, but I find the old-boyfriend version more compelling than the old-girlfriend version. I hadn’t seen the version on Slashdot, so to me it was just a fascinating analogy between Windows 7 and an old abusive boyfriend. I guess I missed the whole "deconstruction".
November 17, 2009
Did you know Wikipedia has a list of unusual articles? My favorite is the chicken gun ("a large diameter compressed air cannon used to test the strength of aircraft windshields and the safety of jet engines… The chicken gun is designed to simulate high speed bird impacts. It is named after its unusual projectile: a whole dead standard-sized chicken, as would be used for cooking").
November 16, 2009
I found this fascinating site for the GRIB API:
The ECMWF GRIB API is an application program interface accessible from C and FORTRAN programs developed for encoding and decoding WMO FM-92 GRIB edition 1 and edition 2 messages. A useful set of command line tools is also provided to give quick access to grib messages.
Being as ignorant as I am, I had never heard of GRIB (WP: from "GRIdded Binary", "a mathematically concise data format commonly used in meteorology to store historical and forecast weather data"). For the first ten minutes I was staring at this page trying to figure out if it was an elaborate hoax — that someone had invented a funny-sounding acronym for a technology and pretended to develop APIs for interacting with it. It sounds wrong in just the right way! Especially convincing is the mention of the older, now deprecated GRIBEX package.
Related work: HORG, the Holotypic Occlupanid Research Group; SCIgen and their video Near Science; and Dresden Codak’s Dungeons and Discourse.