Summary

  • English has an estimated one million unique words, far too many to fit in the 64KB of memory available to the spellchecking program on early Unix machines.
  • To work within this limitation, engineers squeezed the dictionary down to 25,000 words, using an algorithm that strips affixes and a Bloom filter for lookups.
  • When that proved insufficient, they expanded the dictionary with hash compression, then stored hash differences and encoded them with a specialized compression method that came close to the theoretical limit.
  • While today’s computers have virtually limitless storage compared with 1970s machines, the spirit of innovation lives on in modern text compression developments such as large language models.
  • And they didn’t have to waste time on bullsh*t concepts like “managed languages,” FuSi, MISRA, and Scrum.
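Affix stripping lets one dictionary entry cover many inflected forms: a word counts as known if it, or a stem derived by removing a suffix, is in the word list. The sketch below assumes a handful of hypothetical suffix rules (SUFFIX_RULES, candidate_stems and is_known are illustrative names); the original program's rules were far more elaborate and are not reproduced here.

```python
# Hypothetical, greatly simplified suffix rules: (suffix, replacement).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]


def candidate_stems(word: str):
    """Yield the word itself, then each stem produced by one suffix rule."""
    word = word.lower()
    yield word
    for suffix, replacement in SUFFIX_RULES:
        # Require a few letters to remain so short words are not mangled.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            yield word[: -len(suffix)] + replacement


def is_known(word: str, stems: set) -> bool:
    """A word is accepted if any candidate stem is in the stem list."""
    return any(stem in stems for stem in candidate_stems(word))


stems = {"carry", "jump", "walk", "box"}
print(is_known("carries", stems))  # True: "carries" -> "carry"
print(is_known("walking", stems))  # True: "walking" -> "walk"
```

Storing only stems is what shrinks the list: "jump", "jumps" and "jumped" all cost one entry.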
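A Bloom filter answers "is this word in the dictionary?" using only a bit array: each word sets k bits chosen by k hash functions, and a lookup checks whether all k bits are set. It can return false positives but never false negatives, which is an acceptable trade for a spellchecker. This is a minimal sketch; the bit-array size, the number of hashes, and the use of salted SHA-256 as the hash family are assumptions chosen for illustration, not the original program's parameters.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word: str) -> bool:
        # False means definitely absent; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


bf = BloomFilter(num_bits=8192, num_hashes=4)
for w in ["cat", "dog", "spell"]:
    bf.add(w)
print(bf.might_contain("cat"))  # True
```

The memory cost is fixed by num_bits regardless of word length, which is why the technique suited a 64KB budget.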
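Hash compression with differences works by hashing every word to a b-bit integer, sorting the hashes, and storing only the gaps between consecutive values, which are small numbers that compress well. The summary does not name the final encoding, so this sketch uses a Golomb-Rice code (quotient in unary, remainder in k binary bits) as an illustrative stand-in. The helper names, the SHA-256 word hash, and the parameters hash_bits and k are all assumptions.

```python
import hashlib


def word_hash(word: str, hash_bits: int) -> int:
    """Map a word to a hash_bits-wide integer (assumed hash function)."""
    digest = hashlib.sha256(word.encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << hash_bits)


def rice_encode(n: int, k: int) -> str:
    """Golomb-Rice code: quotient in unary (ones, then a zero), remainder in k bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder_bits = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder_bits


def compress(words, hash_bits: int, k: int) -> str:
    """Sort the distinct hashes and Rice-encode the gaps between them."""
    hashes = sorted({word_hash(w, hash_bits) for w in words})
    out, prev = [], 0
    for h in hashes:
        out.append(rice_encode(h - prev, k))
        prev = h
    return "".join(out)


def decompress(bitstring: str, k: int) -> list:
    """Decode the gaps and rebuild the sorted hash list by running sums."""
    hashes, prev, i = [], 0, 0
    while i < len(bitstring):
        q = 0
        while bitstring[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the zero that terminates the unary quotient
        r = int(bitstring[i : i + k], 2) if k > 0 else 0
        i += k
        prev += (q << k) | r
        hashes.append(prev)
    return hashes


def lookup(word: str, hashes: list, hash_bits: int) -> bool:
    # A match may be a hash collision, so this is also "probably present".
    return word_hash(word, hash_bits) in set(hashes)


words = ["cat", "dog", "spell", "unix", "hash"]
bits = compress(words, hash_bits=16, k=12)
table = decompress(bits, k=12)
print(lookup("spell", table, hash_bits=16))  # True
```

Choosing k near log2 of the average gap (roughly 2**hash_bits / number_of_words) keeps each code close to its minimum length; a wider hash_bits lowers the false-positive rate at the cost of larger gaps.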

By Bryan Cockfield
