Hey, I'm Max.

This is my journal and sketchbook.

/ xkcd-machine

Development notes from xkcd's "Machine"

On April 5th, xkcd released Machine, the 15th annual April Fools project I’ve made with them.

It’s a game we’d been dreaming of for years: a giant Rube Goldberg machine builder in the style of the classic Incredible Machine games, made of a patchwork of machines created by individual xkcd readers. For more details, check out Explain xkcd’s wonderful writeup.

This is the story of how we built Machine in 3 weeks, and what I learned along the way.

April 1st is a special occasion where I and others collaborate with xkcd on ambitious interactive comics. This project had the largest group of contributors to date! Expand for full credits.
  • Randall, davean, and I created the art, backend, and frontend respectively.
  • Ed White designed and built the moderator UI.
  • Alex Garcia implemented the hook, wheel, prism, and cat widgets, with contributions to the React physics integration.
  • Kevin Cotrone wrote the meta machine generator which determines which inputs and outputs each tile has, and built backend infrastructure.
  • Conor Stokes (with his daughter Ami) implemented the cushion and bumper family of widgets, and refined the physics stepper.
  • Liran Nuna implemented the boat that floats at the bottom of the comic.
  • Benjamin Staffin improved the deployment pipeline and moderated submissions.
  • Manish Goregaokar, Patrick, Amber, and Michael Leuchtenburg moderated submissions and gave creative feedback.

Early machinations

It took us deep into March, turning around ideas we were kinda excited about, to find the one that had us all sitting bolt upright.

“Could we make a really big tiled mechanism like the blue balls GIF? Where everyone contributes a small square?”

This referenced a classic viral GIF from 2005 (warning: loud music), which was a collaboration composed of tiles made by Something Awful users:

A classic internet GIF animation of blue balls moving around a complicated Rube Goldberg mechanism

Sometimes an idea feels like it emerges fully-formed, but when you start talking about it, you realize there’s still a dizzying array of decisions to make. Thus ensued 5 days of brainstorming, in which we discovered each of us had slightly different core beliefs about what this comic should be:

  • Where do the balls come from?
  • Does everyone see the same machine? What is its purpose?
  • How can players interact with it?
  • And most importantly… why do they?

Learning from previous attempts

My favorite and least favorite interactive comics we’ve ever done have centered around user-contributed content. My personal fave was Lorenz, an exquisite corpse where readers evolved jokes and storylines by writing in panel text. So much fun!

Screenshot of comic #1350, "Lorenz"

It doesn’t always work out how we hoped, though. Take 2020’s Collector’s Edition:

Screenshot of comic #2288, "Collector's Edition"

In Collector’s Edition, players found stickers scattered across the xkcd archives. They could then place each sticker once, permanently, on a global shared canvas.

Wouldn’t it be cool if readers could make their own comic panels together? This was the idea we started with, which got pared down to the sticker concept.

Unfortunately, the game design didn’t yield the desired results:

  • The initial view for all players was the center of the map, which started out blank. It quickly descended into chaos. Chaos became every player’s first impression of the game.

  • There was no incentive to carefully consider where to place a sticker. Players didn’t have enough agency to advance the plot through their individual action. This limited creativity to simple patterns like tiling similar stickers or forming lines.

  • We didn’t provide an overarching story or goal. The stickers you had didn’t obviously relate to the others already on the page (the fridge poetry magnets were fun, though).

For a collective canvas to shine, the experience should teach you by example what’s cool to make with it. It helps to have a shared context and purpose which motivates what to create.

Designing constraints

Once we knew we were building a big collaborative marble drop, we were awash with too many choices. Many early approaches seemed like unsatisfying trade-offs, or very difficult to implement. The only thing we were really sure of was that players would create a grid of interconnected machines.

How big should the overall machine be? Let’s consider 100x100, arbitrarily. How would we simulate it? Running 10,000 tiles in realtime on the client, each with tens of balls, seemed like a risky goal.

Also, how could players create subdivisions of a large, complex machine without communicating directly? How would we know tiles designed in isolation would work when integrated together?

Many thought experiments later, we ended up with 3 core principles:

1. Maximize player expressiveness at the cost of correctness.

How predictable did the machine need to be? We considered running the whole thing server side. Another option was to simulate individual machine tiles to validate them. This would give us some assurance that when everything was connected, the machine would work.

Perhaps if the machines were deterministic enough, we could also estimate the rate at which balls exited each tile. We could use that to approximate the overall flow of the machine, so we could feed tiles balls at the proper rate without running every tile.

Once we had a prototype editor running, davean quickly dispelled this idea by creating a machine with long patterns of chaotic ball collisions:

Clearly, unless balls moved in straight, uninterrupted paths, it was easy for players to make very unpredictable machines. Randall wryly suggested we add double pendulums.

From a design standpoint, this settled that making the machines more predictable would trade against the degrees of freedom players had. Also, in the face of a tight deadline, it’s best to keep things simple, which favored an approach light on prediction or simulation.

We decided to prioritize players having tons of flexibility in what they could build — even extremely nondeterministic or broken machines. This meant we’d need active moderation, both to verify that machines satisfied the constraints, and to remove any offensive content.

2. Give players firm constraints that encourage resilient, interchangeable machines.

Accepting moderation and unpredictable player machines made another useful decision for us: ironically, it forced us to require more order between the machines.

Early on, we’d considered making the inputs and outputs of machines totally free-form: wherever previous tiles output balls on their edges, future players would build outwards incrementally. Then we looked at how moderation would work, and realized we might need to replace a tile from early on.

If tile designs depended on previous ones, this could break a large portion of the machine. This led us to design tight enough constraints that multiple players would create compatible designs within the same tile space.

This is the Robustness principle in action: “be conservative in what you send, be liberal in what you accept”.

To provide players with input and output constraints, we’d need a map of the whole machine from the start. Generating the map also gave us the opportunity to vary how challenging the machines would be (we called the tile configurations “puzzles”). Kevin’s map generator transitions from simple single-input, single-output puzzles to complex 4-in-4-out merges in the middle, then back to 2 outputs per tile at the end.
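
To give a feel for the shape of this, here’s a hypothetical sketch of such a difficulty ramp (this is not Kevin’s actual generator, which also has to make adjacent tiles’ edges agree):

```ts
// Hypothetical difficulty ramp by row; all numbers are illustrative.
type Puzzle = { inputs: number; outputs: number }

function puzzleShapeForRow(row: number, totalRows: number): Puzzle {
  const t = row / totalRows
  if (t < 0.2) return { inputs: 1, outputs: 1 } // gentle single-in/single-out start
  if (t < 0.8) {
    const n = 2 + Math.floor(Math.random() * 3) // up to 4-in/4-out merges mid-map
    return { inputs: n, outputs: n }
  }
  return { inputs: 2, outputs: 2 } // wind down to 2 per tile at the end
}
```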

On the player side, we designed the constraints so we could give players realtime feedback as they constructed their tile. By requiring that tiles output balls on average at roughly the same rate as they received them, we could discourage machines that ate balls or created a lot of latency (e.g. pooling them up). We chaos tested tiles by randomizing the rate of balls entering the editor to reflect the variance upstream.

Our general philosophy became “run the machines for a while, see if on average they meet the constraints given uneven input”.
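
In code, that philosophy might look something like this sketch (a hypothetical TypeScript rendering, not the actual editor’s validation; the `Tile` interface and thresholds are made up):

```ts
// A minimal sketch of the chaos-testing loop (all names hypothetical).
interface Tile {
  inputRate: number            // nominal balls per second
  spawnBallAtInput(): void
  step(): number               // advances one frame, returns balls that exited
}

const jitter = () => 0.5 + Math.random() // 0.5x–1.5x the nominal rate

function meetsConstraints(tile: Tile, seconds = 30, fps = 60): boolean {
  let ballsIn = 0
  let ballsOut = 0
  for (let frame = 0; frame < seconds * fps; frame++) {
    // Randomize the arrival rate to reflect variance upstream.
    if (Math.random() < (tile.inputRate * jitter()) / fps) {
      tile.spawnBallAtInput()
      ballsIn++
    }
    ballsOut += tile.step()
  }
  // On average, outputs should roughly keep pace with inputs:
  // a tile that eats balls or pools them up will fall short.
  return ballsIn > 0 && ballsOut / ballsIn > 0.75
}
```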

3. Machines should reach a steady state in the first 30 seconds.

This led to a new question: how long would moderators have to watch? We made the arbitrary decision that it should take 30 seconds for machines to enter a steady state, based on napkin math for how long it’d take to moderate the whole machine (e.g. 10k tiles × 30 seconds each ≈ 83 hours).

We also made balls expire after 30s. Initially, when there was no expiration, I noticed that everyone’s first experience was balls piling up and filling their screen while they learned how to play the game. This would also bog down the physics simulation as it accumulated a huge number of active rigid bodies. Instead of being fun, the balls were getting in the way!

Screenshot of Machine with a tutorial popup reading "For security reasons, balls that remain in your device for more than 30 seconds will be removed and destroyed."

Expiring the balls helped players fall into a pit of success, because machines would not accumulate errors over time. It also drastically simplified moderation, because after watching for 30 seconds, you’ve seen where most balls can end up in their lifetime.
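
Mechanically, expiry can be as simple as a sweep in the frame loop. A minimal sketch, assuming each ball records its spawn time:

```ts
// Sketch of 30-second ball expiry (names hypothetical).
const BALL_TTL_MS = 30_000

interface Ball {
  spawnedAt: number  // performance.now() timestamp at creation
  despawn(): void    // removes the rigid body and DOM node
}

function expireBalls(balls: Set<Ball>, now: number) {
  for (const ball of balls) {
    if (now - ball.spawnedAt > BALL_TTL_MS) {
      ball.despawn()      // frees the physics body...
      balls.delete(ball)  // ...so it stops costing simulation time
    }
  }
}
```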

Simulation and hyperreality

The architecture of Machine made two big bets. The first was that, with all of the above design constraints in place, connecting disparate tiles into an overall machine would work. We generated and solved a few smaller maps to shake that out.

Back to another problem, though: how could we display a giant machine if we couldn’t run it in realtime on either the server or client?

Before reading further, I’d encourage you to spend a little time scrolling around the comic and imagining how it works. Because what follows will spoil it in a big way.

As a north star, I wanted it to be possible to follow a single ball from the top of the machine to the bottom. This meant that even if the whole machine wasn’t being simulated, a window around what the player sees would need to be.

Once an early version of the map viewer was working, I started testing out an infinite map with only the viewable area simulated. It looked pretty good — but you can see gaps in the flow when I scroll up, because the initial state of the tiles was empty as they entered the simulation.

Instead of an empty tile, we needed them to appear to already have activity in them. So here’s the second bet: we’d snapshot tiles after they’d reached their steady state, only bringing the snapshots into existence just before they scrolled into view. Would players notice?

Here’s a view of the final comic, with display clipping turned off (you can do this by disabling the overflow: hidden and contain: paint CSS properties on the containers):
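
If you want to poke at this yourself, something along these lines in the DevTools console will strip the clipping (it sweeps every element, since the real container class names are an implementation detail):

```ts
// Peel back the clip regions to see offscreen simulation state.
for (const el of document.querySelectorAll<HTMLElement>('*')) {
  const style = getComputedStyle(el)
  if (style.overflow === 'hidden') el.style.overflow = 'visible'
  if (style.contain.includes('paint')) el.style.contain = 'none'
}
```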

Did you notice the snapshots? Unless I’m really looking for them, I don’t.

Only the tiles you see rendered exist in the physics simulation. Note that there’s also a minor display optimization going on: even though you only see the balls inside the viewing area, they’re simulated within the whole tile extents. To pretend there’s more machine up above the view, balls are created and fed to the tiles at the top row of the simulation (based on the expected rate of their input constraints).
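
Here’s a rough sketch of what that feeding could look like (names and shapes hypothetical):

```ts
// Feed the simulation's top row at the expected input rates, standing in
// for the unsimulated machine above the view.
interface TileInput { expectedRate: number }          // balls per second
interface TopTile {
  inputs: TileInput[]
  spawnBallAtInput(index: number): void
}

function feedTopRow(tiles: TopTile[], dt: number) {   // dt in seconds
  for (const tile of tiles) {
    tile.inputs.forEach((input, i) => {
      // Bernoulli approximation of a Poisson arrival process.
      if (Math.random() < input.expectedRate * dt) {
        tile.spawnBallAtInput(i)
      }
    })
  }
}
```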

To create snapshots, we tied them into the moderation UI. Mods must wait at least 30 seconds before approving a tile. We then take the snapshot when they click the approve button. This gives mods discretion to wait a little longer for the machine to enter a nice looking state.
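
For the snapshot itself, one plausible shape (hypothetical, not the actual format) is just enough per-ball state to respawn the tile mid-motion:

```ts
import type * as RAPIER from '@dimforge/rapier2d'

interface BallSnapshot {
  x: number; y: number    // position within the tile
  vx: number; vy: number  // linear velocity
}

// Capture every ball's pose and velocity at the moment of approval.
function snapshotBalls(balls: RAPIER.RigidBody[]): BallSnapshot[] {
  return balls.map((body) => {
    const p = body.translation()
    const v = body.linvel()
    return { x: p.x, y: p.y, vx: v.x, vy: v.y }
  })
}
```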

Snapshotting worked way better than we expected. A really nice consequence is that it resets accumulated error in the machine. As you scroll around, your first impression of a tile is a clean good state that a moderator liked. In practice, if you watch long enough, many machines can get wedged into stuck or broken states, but you’ll never see them if you keep exploring, because you’ll enter fresh snapshots.

The machine you’re scrolling around in the comic isn’t real. It’s hyperreal. The whole thing is never simulated in its entirety, and I think it turned out better that way!

Rendering thousands of balls with React and DOM

Machine is built on the Rapier physics engine. Rapier was fantastic to work with: it has great docs, a clean API with lots of useful primitives, and has impressive performance thanks to its Rust implementation (running as WASM in the browser). I was also initially drawn to Rapier’s determinism guarantees, though we didn’t end up doing any server side simulation.

On top of Rapier, I wrote a custom React context, <PhysicsContext>, which creates Rapier physics objects and manages them within the React component lifecycle. This made it easy to develop a “widget” component for each placeable object with physics or collision surfaces. Effectively, React functioned as a quick and dirty scene graph. This simplified loading and unloading tiles as the view scrolled: when a tile unmounts, all of the physics and DOM are cleaned up. As a bonus, it made it easy to wire up hot reloading with fast refresh, which was really nice for tweaking collision shapes:

Another cool aspect of the React context approach is that all of the physics hooks noop when they’re not inside a <PhysicsContext>. This is used to render static previews of tiles for the moderation UI.
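
To make this concrete, here’s a condensed sketch of the pattern (not the actual Machine source; wasm init and collider setup are elided):

```tsx
import { createContext, useContext, useEffect, useRef } from 'react'
import type * as RAPIER from '@dimforge/rapier2d'

// The context owns the Rapier world; widgets hook into it.
export const PhysicsContext = createContext<RAPIER.World | null>(null)

export function useRigidBody(desc: RAPIER.RigidBodyDesc) {
  const world = useContext(PhysicsContext)
  const bodyRef = useRef<RAPIER.RigidBody | null>(null)

  useEffect(() => {
    // Outside a <PhysicsContext> this no-ops, so the same widget
    // components can render static previews (e.g. in the moderation UI).
    if (!world) return
    const body = world.createRigidBody(desc)
    bodyRef.current = body
    return () => {
      // Unmounting a widget (or a whole tile) cleans up its physics state.
      world.removeRigidBody(body)
      bodyRef.current = null
    }
    // Caveat: a fresh `desc` each render tears the body down and recreates
    // it, which is one drawback of the hook approach.
  }, [world, desc])

  return bodyRef
}
```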

I wish I had used components instead of hooks to create Rapier objects. I later discovered this is the approach react-three-rapier takes, and it fits better with React’s diffing than useEffect, which destroys the old instance and recreates it on dependency change.
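
For comparison, a sketch in the style of react-three-rapier’s component API, where bodies and colliders are elements and React’s diffing owns their lifetimes:

```tsx
import { RigidBody, BallCollider } from '@react-three/rapier'

// Reordering or re-keying these elements maps onto React's diff,
// rather than tearing down and recreating bodies in effects.
function Ball({ position }: { position: [number, number, number] }) {
  return (
    <RigidBody position={position}>
      <BallCollider args={[0.5]} restitution={0.8} />
    </RigidBody>
  )
}
```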

Machine is rendered entirely using the DOM. During early dev I was leery I’d reach the end of my rope perf-wise. I expected I’d eventually ditch DOM rendering for PixiJS or canvas when it got too slow. However, I wanted to see how far I could take it, since it meant less to build.

To optimize rendering performance, the frame loop applies styles directly to widgets with physics simulation, so React’s diff only runs when structural changes are made to the scene graph. Initially balls were rendered by React, but their frequent creates and removes were low-hanging fruit for reducing diffs, so I gave them their own optimized renderer. Another win was draw culling for balls and widgets out of view. This performed well with 4000 balls in simulation and hundreds onscreen, so I settled on the DOM-only rendering approach.
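
A simplified sketch of that frame loop (names hypothetical):

```ts
// Per-frame motion bypasses React entirely and writes styles directly.
interface BallSprite {
  el: HTMLElement
  body: { translation(): { x: number; y: number } } // Rapier RigidBody shape
}

function renderBalls(balls: BallSprite[], view: DOMRect) {
  for (const ball of balls) {
    const { x, y } = ball.body.translation()
    // Draw culling: offscreen balls stay simulated but aren't painted.
    const visible = x >= view.left && x <= view.right &&
                    y >= view.top && y <= view.bottom
    if (!visible) {
      ball.el.style.display = 'none'
      continue
    }
    ball.el.style.display = ''
    ball.el.style.transform = `translate(${x}px, ${y}px)`
  }
}
```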

I’ve heard comparisons drawn between modern browsers and game engines, with their tightly optimized GPU rendering and DOM / scene graph. The similarities have never felt more apt.

API and Moderation

Machine’s backend was written in Haskell by davean and Kevin, with Redis as the backing store. We used OpenAPI with openapi-fetch to share types between the codebases. This approach had some teething pains adapting Haskell types, but ended up very helpful for coordinating late-breaking API changes. This was also my first project using TanStack Query, which was quite handy for caching and automatically refreshing the machine without server push.
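
Wiring those together looks roughly like this (the endpoint and schema names are hypothetical; `paths` comes from openapi-typescript codegen):

```ts
import createClient from 'openapi-fetch'
import { useQuery } from '@tanstack/react-query'
import type { paths } from './generated/api' // openapi-typescript output

const client = createClient<paths>({ baseUrl: '/api' })

export function useMachineTiles() {
  return useQuery({
    queryKey: ['tiles'],
    queryFn: async () => {
      const { data, error } = await client.GET('/tiles')
      if (error) throw error
      return data
    },
    // No server push, so poll to keep the machine fresh.
    refetchInterval: 30_000,
  })
}
```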

The moderation UI, designed by Ed White, was critical for us because it bottlenecks all submissions being published. Mods must choose from potentially hundreds of designs for a particular tile. We used a simple but unreasonably effective approach to prioritize the queue: each type of widget has an interestingness score, and we count each instance to sort candidate tiles. This biases towards maximalist solutions, though mods counteract that by reviewing the middle of the list for more minimal ones.
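
A sketch of that ordering (the widget weights are made up):

```ts
type Submission = { widgets: { type: string }[] }

// Hypothetical per-widget-type interestingness weights.
const interestingness: Record<string, number> = {
  prism: 3, fan: 3, hook: 2, wheel: 2, plank: 1,
}

// Sum the weight of every widget instance in the design.
const score = (s: Submission) =>
  s.widgets.reduce((sum, w) => sum + (interestingness[w.type] ?? 1), 0)

// Most "interesting" (maximalist) designs sort to the top; scrolling to
// the middle of the list surfaces more minimal solutions.
const prioritize = (queue: Submission[]) =>
  [...queue].sort((a, b) => score(b) - score(a))
```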

The large imbalance between the number of submitted designs and those published in the machine is unfortunate — it’s my least favorite thing about this comic. We searched for a way to make more of the back catalog available prior to launching, but there wasn’t a good compromise given our moderation time constraints. We’d like to find ways to share more of the submission dataset after live submissions are finished.

One nice UX finding came from the moderation approve cooldown. Since tile snapshot quality is so important, I hacked in a countdown timer which disabled the moderator approve button until at least 30 seconds had passed running the simulation. This ensures that snapshots are taken of a steady state, and gives time to check that outputs are receiving balls at the expected rate. I initially expected this to be annoying to mods, but to my surprise, they liked how it prevented hasty decisions.

Post-launch, I added a slider that allows moderators to speed up the simulation to much faster than realtime. This saves a ton of moderator time, because now the first 30 seconds of a submission can be viewed in under 5 seconds. It’s also quite useful for reviewing the behavior over a longer span of time.

A note of appreciation for the “Jamslunt Interfoggle”

Finally, I’d like to take a moment to appreciate one of my favorite machines. It’s a great example of how, even with all our editor constraints in place, serendipitous and funny unintended consequences happen between tiles.

The “Jamslunt Interfoggle” was posted within the first couple hours the comic was up. It’s a clever mechanism that exploits the narrow field of the fans, queuing blue balls in a chute until they accumulate enough weight to spill out the sides.

However.

The tile that ended up above the Interfoggle, “Bouncy”, is a chaos engine launching balls across 3 crossing paths. Every once in a while, it will send a green ball through the wrong output, which wrecking-balls through the logjam and sends a cascade of blue balls through the Interfoggle.

The Interfoggle can’t have been designed with this behavior in mind, because we only feed the correct color in the editor (this was a conscious decision to make inputs easier to understand). Yet, this machine is so much better with the green balls in the mix.

One of the great joys of making a project like this is discovering all the creative ways people use it, intentional or not. Even though I know it’s coming, I’m continually amazed by how brilliant the internet is when given a shared canvas. Thanks to everyone who contributed tiles.

At the time of writing, there’s still a little time to add your own design to the final machine.


You can check out the source code of Machine here. Feel free to drop me a line on Mastodon if you have any questions about it. One cool thing to hack on would be implementing a full global simulation of the machine. I’m quite curious to see how well it would work.

I hope you’ve enjoyed this deep dive into “Machine”. For more xkcd stories, check out these notes from our space exploration games and 2021’s Morse Code April Fool’s comic.

/ coalesce-dev-diary-first-post

Coalesce dev diary: first post

Listen to the spoken version of this post:

So, I’m going to try something new here.

For a long time I’ve had an intention to write more about what I create, but never seemed to really find the time. And I don’t think I’m particularly unique in this regard.

I’ve been working for the past six months on an audio editor called Coalesce that transcribes your audio and lets you edit it as text. This is the first of what I hope will be a series of development diaries where I just sit down and talk about what I’ve been working on lately.

So as an experiment, I wanted to see if speaking extemporaneously would help me to get these ideas out and avoid the perfectionism creeping in. Why not dogfood my own project and share what I’ve been working on by talking about it? I can have Coalesce generate the transcriptions and then it should be pretty easy to post that.

So, here goes!

Why build a podcast editor?

Coalesce is a transcription-based podcast editor. You put your audio files into it, it turns those into text, and then you can edit the text like any other text document. Move words around, delete words, edit it more as an outline. Coalesce, the audio editor, will chop up the audio to match the edits that you make to the text.
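
A minimal sketch of the core idea (not Coalesce’s actual code): the transcriber attaches timestamps to each word, so an edited word sequence maps back to a list of audio clips to splice together:

```ts
interface Word {
  text: string
  start: number  // seconds into the source recording
  end: number
}

// Turn the edited word sequence into a splice list for the audio.
function toClips(editedWords: Word[]): { start: number; end: number }[] {
  const clips: { start: number; end: number }[] = []
  for (const word of editedWords) {
    const last = clips[clips.length - 1]
    // Words still contiguous in the source merge into one clip; a deletion
    // or reordering starts a new clip (a cut point in the audio).
    if (last && Math.abs(word.start - last.end) < 0.05) {
      last.end = word.end
    } else {
      clips.push({ start: word.start, end: word.end })
    }
  }
  return clips
}
```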

One of the reasons why I decided to build a podcast editor is because I really love podcasts. I love listening to people speak. I think that there’s a lot of really interesting nuance that comes from the spoken word.

And it’s also, frankly, something that is multitaskable: I really like that I can have a headset on, I can be doing something with my hands and be learning something or consuming information at the same time.

I also really like improvisation, and that leads to an interesting conflict, because it takes time, at least for me, to get to the core of the ideas I want to talk about. As a listener, I just want to cut through all of that. I want to hear the author’s thoughts in a clear and concise way. I want economy of my time as a listener.

But as a creator, I like meandering. I like exploring. I like seeing what comes up when I’m speaking or thinking extemporaneously.

One of the things that really excites me about what we can now do with transcription tools is that there’s this best-of-both-worlds opportunity: I can see what I said as a document, edit it, cut out all the stuff that I don’t think makes sense, move things around so that the ideas connect together better, and produce something that hopefully respects your time as a listener or as a reader.

Project goals

There’s a couple things I’m trying to do with Coalesce.

Transcription-based

First of all, it’s transcription-based.

I think it’s a really interesting time to be working on transcription because a lot of organizations like OpenAI and Meta are releasing research projects around voice transcription and generation. Many of them are open-source and unencumbered for me to build upon. I hope that over the next couple of years, even more will be released.

Voice generation seems more morally and ethically complex to work in. I understand why these models aren’t more widespread, but I’m hoping in a couple years’ time, there will be even more tools to work with here. It’s very exciting to me because as a non-expert, I can integrate close-to-state-of-the-art techniques. The quality is amazing, frankly, and I only expect it to get better. So that’s cool.

Collaborative

Another core aspect of Coalesce is that it’s collaborative. Making the editor collaborative actually accomplishes two things: it makes it possible for everyone to pitch in, and it blurs the line between creator and editor. Where that’s exciting to me is in making podcasts with groups.

So in my own personal experience recording a podcast with my friends, I took on the brunt of the editing process and as I was editing things, making choices about whose narrative would lead and making tweaks to the ordering and structure of the conversation, I could feel my subjectivity creeping in there.

These are the things that sounded good to me, and I should lean into the things that sound good to me. But I recognize that sometimes those might come at the expense of what would sound good to my collaborators.

So I think it could be interesting to give more creative control across all the speakers in a podcast. Because now it’s just a shared text document that we can all edit together. I’m not sure if this is going to resonate with people. It’s something that could end up being a solution in search of a problem. But from my own experience, it was what I wanted.

Open-source

When I was about 10 or 11, I had the idea to search the web for a tool that would let me create 3D art. And naturally, I stumbled upon Blender. Its creator, Ton Roosendaal, is a major inspiration of mine. What’s cool about Blender is it was always offered as freeware. Even in the early 2000s, while it was being developed as a for-profit tool, you could download and have access to a ton of really interesting features.

Maybe not as good as some of the professional tools at the time, but for a kid like me, it was mind-blowing. I must have spent hundreds of hours making pictures in Blender, and it inspired me to pursue digital art.

Another piece of software I used a lot as a teenager was Ardour, a digital audio workstation by Paul Davis, which was open source and available on the Linux distros I used at the time. It was my gateway into amateur audio production.

When you get a new tool, it defines your possibility space as a creator. Having a rich and full-featured audio editor for free in my bedroom accelerated my path towards making music. Ardour’s an especially fun example because I’m sitting here 20 years later using Ardour 7 to record this.

So I’m really grateful to projects like Blender, Inkscape, and Ardour for giving me creative tools. My hope is to contribute a tool that can help save creators time and help them make better podcasts. Especially for folks for whom the professional tools might be less accessible.

Balancing quality vs. speed

I’ve been working on Coalesce for about six months now.

There’s a bunch of topics I’d love to do a deep dive on: Coalesce’s audio scheduler, how building a collaborative editor has changed the way I needed to think about the backend, and the infrastructure complexities that come from how these different decisions stack together.

But like many projects of mine, now it’s at that uncomfortable middle point where it’s probably not good enough or polished enough to stand on its own as a tool for other people to use. But at the same time, it’s past the prototype phase.

I have some confidence in it, and I want to evolve it to the point where people are editing real podcasts using this thing. So I’ve been working on taking this prototype to production over the last couple months.

I think that the best way to introduce this tool to people is to give them a place right away where they can try it out. And not everyone has a GPU or a fast CPU available to process the transcriptions.

So for its initial incarnation, I’m building it as a web app. That’s led me down the path of building something that is reasonably scalable out of the gate. Because I think this thing is really cool, and when I put it out there into the world, I don’t want it to immediately fall over.

If this never finishes because I spent too much time on scaling, and too much time thinking ahead to building a durable platform, that’s not good either. I’m trying to find a happy medium between rapid prototyping and putting something out there that I can stand behind and say, if you start making a podcast with this, it’s going to keep working.

To be continued

I’ve thought a lot about these trade-offs, and I think that by writing them down or by talking them out, it’s not just a useful reflection for me, but it’s a good way to capture a history of what I was weighing when I made technical decisions, and maybe can serve as food for thought if you’re building things too. It’s all been a lot of fun to think through and put together.

But for now I’m going to leave it at that.

This is an experiment. I hope you’ve enjoyed reading or listening to this and I’m looking forward to coming back and making some more.

Thanks for listening!