FC8 – Faster 68K Decompression (2016)

https://www.bigmessowires.com/2016/05/06/fc8-faster-68k-decompression/

68 points by electricant on 2024-04-28 | 10 comments

Automated Summary

The article introduces FC8, a new compression scheme designed for fast decompression on a 68K CPU. It is inspired by LZG but with improved decompression speed. The algorithm is based on LZ77, using a sliding history window and backrefs to replace duplicated data. FC8 maintains similar compression density as LZG, but decompresses 1.5x-2x faster on a 68030 CPU. The 68K decompressor code is provided, with a main loop of 256 bytes for caching. The article also explores the possibility of on-the-fly decompression and its trade-offs.

Other submissions

Comments

dansalvato on 2024-04-29

I'm working on a game for Amiga (another 68k-based platform) and settled on ZX0 to decompress assets on the fly: https://github.com/einar-saukas/ZX0

I was originally using LZ4, but I switched to ZX0 after learning that it can do in-place decompression, which means I don't have to allocate separate memory for the compressed data. I'm very happy with the compression ratio, and decompression of large assets (~48kb) only takes a few frames on a 7MHz 68000.

Also of note is LZ4W, included in Sega Genesis Dev Kit (and discussed in the comments section of OP's article), a variant of LZ4 that only uses word-aligned operations. That makes it much faster on the 68000, which can struggle to efficiently handle 8-bit data. More info here: https://github.com/Stephane-D/SGDK/blob/master/bin/lz4w.txt

geon on 2024-04-30

I wrote a naive lz77 packer/unpacker in C for a c64 game. https://github.com/geon/woorm/blob/master/tools/lz77.c

Not fast, but the compression ratio was decent, and made it easy to fit a bunch of levels into the game.

bananaboy on 2024-04-29

Nice! When you say “on the fly” are you literally decompressing assets from disk during gameplay? Can you do asynchronous IO on the Amiga?

dansalvato on 2024-04-30

As of now, I'm loading all the level assets in advance, but many of the assets in RAM stay compressed until they're needed, such as backgrounds and fullscreen graphics. I decompress those asynchronously into video memory when I need them. The second video in this blog post shows the decompression happening onscreen in realtime: https://dansalva.to/coding-the-anime-woosh-screen-on-amiga/

bananaboy on 2024-05-02

Thank you, great blog post! I do a lot of MS-DOS/Pentium and earlier retro programming so I love this stuff. The Amiga is a platform that I missed out on in my youth but I'm keen to have a crack at it some day. Most recently I made this demo with my group https://www.pouet.net/prod.php?which=95524 which won the 2024 Meteoriks award for best midschool production!

the-rc on 2024-04-30

It's actually harder to do synchronous floppy I/O on the Amiga. Data transfer is done over DMA, then you perform MFM decoding. The latter can be done by the CPU, or, asynchronously again, by the blitter, in which case you can't use it at the same time for graphic operations.

VelesDude on 2024-04-30

Depending on what you are going for in terms of gameplay experience, that can be a reasonable trade off. Say an RPG where you page in/out assets as needed but doesn't break the flow is probably going to be a bigger issue on gameplay styles that need more compute to achieve.

miohtama on 2024-04-29

In “faster than memcpy” series we have also Blosch for modern CPUs

https://www.blosc.org/pages/blosc-in-depth/

I have never been able to use Blosch myself. But it sounds really interesting, outperforming RAM. Not sure what are the applications - columnar data processing, Parquet files, etc?

jeramey on 2024-04-29

It gets used a fair amount in the weather data space. Forecasting and climate reanalysis grids are typically large (gigabytes) N-dimensional arrays of float32 values and Blosc provides enough tunable knobs that it's fairly easy to find a combination that performs acceptably without writing a bunch of custom handling code to keep track of which underlying compression schemes and settings were used. Additionally, it supports byte- and bit-shuffling filters which can really help boost the compressibility of certain data sets.

thechao on 2024-04-29

Why doesn't BLOSC have a little chart comparing itself to LZ4, Zstd, etc? Kinda like this:

https://stackoverflow.com/questions/37614410/comparison-betw...

Because it seems like such a trivial chart to make.