{"version":"https://jsonfeed.org/version/1","title":"jonno.nz","home_page_url":"https://jonno.nz/","feed_url":"https://jonno.nz/feed.json","description":"Tech, teams, and projects — John Gregoriadis","author":{"name":"John Gregoriadis","url":"https://jonno.nz"},"items":[{"id":"https://jonno.nz/posts/zero-metres-of-track/","url":"https://jonno.nz/posts/zero-metres-of-track/","title":"Six Years, $228 Million, Zero Metres of Track","content_html":"<p>Auckland Light Rail ran for six years, spent <a href=\"https://www.beehive.govt.nz/release/government-cancels-auckland-light-rail\">$228 million</a>, and laid zero metres of track.</p>\n<p>At its peak the project was paying about $920,000 a week to two engineering firms. Not to build anything. To plan, re-plan, and re-plan again, through three different versions of the route. In January 2024 a new government cancelled it as part of a 100-day plan, and the public got nothing back for a quarter of a billion dollars except some very expensive PDFs.</p>\n<p>I want to be upfront about what this series is, because the material is political and the timing is an election year. This isn't a political hit job. It's pattern recognition on the public record, and the pattern has both parties' fingerprints all over it.</p>\n<h2>The same meeting, every three years</h2>\n<p>I've spent a little bit of time working with exec teams, at startups and at big companies, and I've seen my share of strategy rewrites. Even one rewrite, done inside one company, with everyone trying their best, costs you about a year. Roadmaps get binned. Half-finished work gets written off. Your best people quietly update their CVs. The new strategy is usually about 70% the old strategy with new diagrams.</p>\n<p>New Zealand runs like a company that's contractually required to put its entire exec team up for replacement every three years. 
Each incoming team arrives with a mandate to show movement in a hundred days, and the fastest way to show movement is to bin whatever the last team was building. The reorg is the announcement. The announcement is the work.</p>\n<p>Any one of these execs might be competent on their own. The dysfunction is the team, and the team is permanent: it spans parties, decades and ideologies, and it cannot leave a predecessor's decision alone. Meanwhile the shareholders of NZ Inc can't sell their shares. A decent chunk of them are still at primary school.</p>\n<p>That's the frame. Now the receipts.</p>\n<p>The Cook Strait ferries are the cleanest one. In 2021 KiwiRail signed a fixed-price contract for two new rail-capable ferries, with port upgrades to match. By December 2023 the cost had escalated badly and the incoming government cancelled the lot. By then <a href=\"https://www.rnz.co.nz/news/political/560273\">$507.3 million had already been spent</a>, and 1News later confirmed we paid Hyundai another $144 million to walk away from the shipbuilding contract. So: $651 million, no ferries. Then we ordered different ferries anyway, for $1.86 billion, due 2029. The old rail-enabled Aratere got retired in the meantime, which severed the Cook Strait rail link for roughly five years.</p>\n<p>We paid $144 million to not get ships, and then bought ships.</p>\n<p>Three Waters took seven years from the Havelock North outbreak to a full legislative framework, then got <a href=\"https://www.dia.govt.nz/Water-services-reform-about-the-reform-programme\">repealed under urgency</a> in an afternoon in February 2024. You can think the reform was wrong and still notice the maths: the $120 to $185 billion of ageing pipes that justified it didn't get repealed with it.</p>\n<p>The Resource Management Act replacement is my favourite, in the way a slow-motion replay of a faceplant is someone's favourite. 
Parliament spent years building the RMA's successor, passed it in August 2023, and the next government repealed it 122 days later. The replacement's replacement arrived in late 2025, and The Spinoff noted its principles &quot;all but mirror&quot; the laws that got repealed. Thirty-plus years of everyone agreeing the RMA must go, and the RMA is still here, having outlived its own successor.</p>\n<p>Te Pūkenga merged sixteen polytechnics into one national institute in 2020. In January 2026 it finished un-merging them back into ten. Between those two restructures: 855 staff gone, $9.5 million in redundancy payouts, and $325 million allocated to stand up the new polytechs that look a lot like the old polytechs. If you've ever survived a corporate restructure followed by a de-restructure, you know exactly what those six years felt like from inside.</p>\n<p>The Smokefree generation law passed in December 2022 as a world first, got studied by half the planet, and was repealed before a single clause took effect.</p>\n<p>And if you want the purest specimen of the whole genre, it's the <a href=\"https://www.opespartners.co.nz/tax/bright-line\">bright-line test</a>. National invented it in 2015 at two years. Labour stretched it to five, then ten. National snapped it back to two in 2024. Four settings in nine years, on a tax rule that decides what happens when ordinary people sell a house. Both teams, same dial.</p>\n<p>Worth noting who started and stopped each of these. Labour built ALR, iReX, Three Waters and Te Pūkenga; National stopped them. National invented the bright-line test; Labour extended it; National reverted it. And Labour announced a $785 million harbour cycle bridge in June 2021, then cancelled it itself by October, which proves you don't even need to change the government to get the U-turn. 
The cycle doesn't care who's driving.</p>\n<h2>Only money actually spent</h2>\n<p>Time to add it up, and this is where I need to be careful, because the temptation with a topic like this is to quote the scariest number available.</p>\n<p>Politicians on every side do exactly that. When light rail was cancelled, ministers cited builds costing &quot;$15 billion, rising to $29.2 billion&quot;. Those are projections of money we now won't spend. Counting avoided future costs as waste is fantasy accounting, and the moment you do it, anyone with a calculator can dismiss your whole argument.</p>\n<p>So the rules for this series are boring on purpose. Count money that actually left the public account on things that were then cancelled or reversed: sunk planning, design, contracts, land, plus the penalties paid to walk away. Never count &quot;would have cost&quot; figures. Attribute anything contested to whoever claims it, like Labour's Tangi Utikere putting the full iReX cancellation at $1.16 billion once you include ongoing maintenance, a figure that bundles in things I won't count.</p>\n<p>On those rules: light rail's $228 million, iReX's $507.3 million plus the $144 million exit, the taxpayer share of <a href=\"https://en.wikipedia.org/wiki/Let%27s_Get_Wellington_Moving\">Let's Get Wellington Moving</a> ($109.7 million of the $180.7 million it spent over nine years, mostly on consultants; Wellington's ratepayers carried the rest), Te Pūkenga's redundancy bill, and the $80 million-plus in public service redundancy payouts across 24 agencies as the workforce swung up by 13,000 and back down again.</p>\n<p>That's well over a billion dollars in directly sunk and cancellation costs from a single change of government, counted conservatively. 
A billion dollars is roughly half a new Dunedin Hospital, a project which has itself been rescoped so many times that 35,000 people marched down George Street about it.</p>\n<p>I lived on the North Shore through the announcement years, and the ritual became familiar: the render, the press conference at the empty site, the revised render, the quiet line in a later Budget. SkyPath, then the standalone cycle bridge, then neither. Light rail to the airport in three different flavours. Somewhere around 2021 I stopped believing artists' impressions the way I'd stopped believing crypto whitepapers.</p>\n<p>Here's the backdrop that turns this from annoying into expensive. Te Waihanga's 2022 stocktake found that from 2010 to 2019 New Zealand was the biggest infrastructure investor in the OECD as a share of GDP, while sitting near the bottom 10% of high-income countries for how much infrastructure each dollar buys. We're not under-spending. We're spending like a rich country and delivering like a country that re-litigates its decisions every electoral cycle, into an infrastructure deficit Te Waihanga and Sense Partners size at somewhere between $100 and $210 billion over 30 years.</p>\n<p>I grew up in Christchurch, and the rebuild taught me what both versions of this look like on the ground. The bits that got locked in and left alone got built; the city got a genuinely world-class convention centre and a repaired arts centre. The bits that kept getting re-litigated dragged on so long they became civic jokes. The stadium took fifteen years of arguments and only opened this year. Same city, same decade, same money. The difference was whether each new set of decision-makers honoured the last set's call.</p>\n<h2>When stopping is the right call</h2>\n<p>A fair pushback: sometimes cancelling a project is the competent move, and an exec team that can't kill anything is its own kind of disaster.</p>\n<p>True, and the record backs it in places. 
Treasury papers showed about 80% of the iReX cost escalation was seismic strengthening at the ports, work that's needed no matter whose ferries dock there. Pulling the pin on a runaway project can be governance working, and I'd rather have a government willing to stop a bad bet than one that rides it down out of pride.</p>\n<p>The target of this series is narrower: the re-litigation of settled, long-horizon decisions. Ferries wear out on a schedule that doesn't care about coalition agreements. Pipes corrode on chemistry's timetable, not the electoral one. When the underlying problem has a 30-year horizon and the decision-making has a 3-year one, reversal isn't prudence. It's the exec team marking its territory, and the bill lands on shareholders who can't vote yet.</p>\n<p>So that's Part 1: the cost, counted conservatively, with the fantasy numbers left out.</p>\n<p>Part 2 asks why New Zealand specifically does this more than nearly anyone, because it turns out we're rigged for it in ways most democracies aren't, and the man who wrote the book on it in 1979 has updated his diagnosis.</p>\n<p>Part 3 is where I stop reading the receipts and start pulling them. The entire statute book is published as open data: every act, every amendment, every repeal. I've started parsing all of it, back to 2008, to measure the churn properly. It's a personal project and a few evenings of code, not a royal commission. 
But nobody else has built it, most of NZ Inc's shareholders haven't been born yet, and somebody should be keeping the receipts.</p>\n","date_published":"Fri, 12 Jun 2026 00:00:00 GMT","image":"https://jonno.nz/og/zero-metres-of-track.png"},{"id":"https://jonno.nz/posts/the-holes-that-kill-you-are-the-ones-you-never-tested/","url":"https://jonno.nz/posts/the-holes-that-kill-you-are-the-ones-you-never-tested/","title":"The holes that kill you are the ones you never tested","content_html":"<p>Every time something goes down, the first instinct in the room is to add a\nlayer. Another replica. A second region. One more health check in front of the\nthing that broke. It feels like progress, and the Swiss cheese model hands that\ninstinct a lovely picture to point at.</p>\n<p>You know the diagram even if you've never heard the name. Stacked slices of\ncheese, each slice a layer of defence, every slice riddled with holes. An\naccident only makes it through when the holes in all the slices happen to line\nup. <a href=\"https://en.wikipedia.org/wiki/Swiss_cheese_model\">James Reason</a> built it to\nexplain how hospitals and aircraft fail, and it's still the best tool we have\nfor explaining why a system with five safety nets can still face-plant.</p>\n<p>The picture has a side effect though. It teaches you to count slices. And\ncounting slices is mostly the wrong job.</p>\n<p>Take redundancy, since that's where slice-counting does the most damage. The\nmodel says two of everything beats one. Two servers, two regions. That holds\nright up until both copies share a single reason to die at the same instant.</p>\n<p>On 19 July 2024 CrowdStrike\n<a href=\"https://www.crowdstrike.com/en-us/blog/channel-file-291-rca-available/\">shipped a content update</a>\nto its Falcon sensor and bricked around 8.5 million Windows machines, by\nMicrosoft's estimate, inside a few hours. Every one of those hosts was\n&quot;redundant&quot; in somebody's architecture diagram. Made no difference. 
They ran the\nsame agent and swallowed the same bad file at the same moment, so they all\ndropped together. When the failure is perfectly correlated like that, redundancy\nbuys you nothing. NASA's reliability folks put it bluntly: if one fault can take\nout the backup too, two subsystems fail twice as often as one.</p>\n<p><img src=\"https://jonno.nz/img/posts/the-holes-that-kill-you-are-the-ones-you-never-tested-fig-1.png\" alt=\"A blueprint cross-section of stacked redundant defence layers, each riddled with holes, with a single skewer passing through a hole aligned in the exact same spot on every layer.\"></p>\n<p>There's a quieter version of the same trap hiding in your SLOs. String together\na dozen services that are each up 99.9% of the time and your ceiling is about\n98.8% before you've written a line of your own code, because every dependency is\none more slice with its own holes. You can't promise more reliability than the\nflakiest thing you lean on, and each extra nine costs more than the last.</p>\n<p>The slices you can draw on a whiteboard are the safe ones. The holes that get\nyou are the ones nobody can see: the failover that's never been run, the\nassumption that stopped being true six months ago when no one was looking, the\ndead code still sitting in production behind a flag.</p>\n<p>Knight Capital is the one I'd tattoo on a junior engineer. On 1 August 2012 they\nswitched on new trading code and pushed it to seven of their eight servers. On\nthe eighth, a reused flag woke a dead function called Power Peg that should have\nbeen ripped out years earlier. 
For about 45 minutes it hurled orders into the\nmarket with nothing counting the fills:\n<a href=\"https://www.sec.gov/files/litigation/admin/2013/34-70694.pdf\">more than four million executions and 397 million shares</a>.\nKnight's own books put the loss at roughly $440 million, and the firm didn't\nsurvive the week.</p>\n<p>Pull it apart and every hole was survivable on its own: dead code left in the\nbuild, a flag doing double duty, a deploy one person ran with nobody reviewing\nit, 97 warning emails before the open that everyone read as noise. None of those\nsinks you alone. Line them up on a single Tuesday morning and the company is\ngone.</p>\n<p>That's Reason's real point, and it's a good one. The model is right. It's just\ncoarse. Richard Cook said it better in\n<a href=\"https://how.complexsystems.fail/\">How Complex Systems Fail</a>: complex systems\nrun in a degraded state the whole time, stuffed with small faults, staying up\nbecause people quietly hold them together. Catastrophe needs a few of those\nfaults to meet. A sixth slice does nothing about the holes already living in the\nfive you've got.</p>\n<p>Your architecture diagram and your runbook describe the system you imagine you\nhave. The real system is whatever your on-call engineer does at 3am to keep it\nlimping along. Erik Hollnagel calls that gap\n<a href=\"https://www.england.nhs.uk/signuptosafety/wp-content/uploads/sites/16/2015/10/safety-1-safety-2-whte-papr.pdf\">work-as-imagined versus work-as-done</a>,\nand the holes love living in it.</p>\n<p>The useful work is dragging those invisible holes into daylight while you still\nget to pick the timing. 
Adrian Cockcroft has the perfect name for skipping it.\nHe calls an untested failover\n<a href=\"https://www.gremlin.com/blog/adrian-cockroft-chaos-engineering-what-it-is-and-where-its-going-chaos-conf-2018\">&quot;availability theatre&quot;</a>:\nyou've got the runbook, you've got the standby, you feel great, and you have no\nclue whether any of it works because you've never once pulled the plug to watch.</p>\n<p><img src=\"https://jonno.nz/img/posts/the-holes-that-kill-you-are-the-ones-you-never-tested-fig-2.png\" alt=\"A blueprint cross-section of the same holey layers slid apart and probed by hand, a thin light beam catching two holes about to line up.\"></p>\n<p>This is why the teams who are good at this spend their hours on stuff that looks\nunglamorous. They run game days and break things in production on purpose, the\nway Netflix set Chaos Monkey loose to kill its own servers at random, just to\nprove the system could take it. They write\n<a href=\"https://sre.google/sre-book/postmortem-culture/\">blameless postmortems</a>,\nbecause the day people get punished for an outage is the day they stop telling\nyou where the holes are. They treat reliability as a budget they spend, not a\nfeeling they chase.</p>\n<p>Most of that is a people problem wearing a technology costume. The\nhighest-leverage slice in your whole stack is usually the on-call engineer at\n3am, and whether your culture lets them say &quot;yeah, I broke it, here's how&quot;\nwithout flinching. No cloud provider sells that one.</p>\n<p>I've built billing and payments platforms where downtime means somebody doesn't\nget paid, and shipped to production many times a day. The urge to bolt on\nanother layer never goes away. It just gets more expensive, and more comforting.</p>\n<p>By all means keep your redundancy. Keep the regions and the replicas. Just don't\nkid yourself that a standby you've never failed over to is a second slice of\ncheese. 
It's a hole with a comforting label, and you'll find out which one it\nreally is on the worst possible morning.</p>\n","date_published":"Sun, 31 May 2026 00:00:00 GMT","image":"https://jonno.nz/og/the-holes-that-kill-you-are-the-ones-you-never-tested.png"},{"id":"https://jonno.nz/posts/what-happens-when-a-worm-drives-claude/","url":"https://jonno.nz/posts/what-happens-when-a-worm-drives-claude/","title":"What Happens When a Worm Drives Claude?","content_html":"<p>A few weeks ago I was on the couch with my eight-year-old, and she was going on\nabout worms. Kids ask the kind of questions we forget to ask as adults, the ones\nthat drop off once we get all mature and sensible. That conversation is the\nreason a worm ended up driving Claude.</p>\n<p>Right now everyone builds LLM agents the same way. Take a model, bolt more tools\nonto it, hand it a bigger toolbox. I wanted to try the opposite. Leave the model\nalone, and put a brain inside the loop to do the steering.</p>\n<p>I'll say this up front: it's a fun project, not a serious one. But the thing it\ndoes is genuinely strange, so it's worth a watch.</p>\n<div style=\"position:relative;aspect-ratio:16/9;margin:2rem 0;\">\n  <iframe src=\"https://www.youtube-nocookie.com/embed/9ydt-dVCY8Q\"\n    title=\"What Happens When a Worm Drives Claude?\" loading=\"lazy\"\n    allow=\"accelerometer; clipboard-write; encrypted-media; picture-in-picture; web-share\"\n    allowfullscreen style=\"position:absolute;inset:0;width:100%;height:100%;border:0;\"></iframe>\n</div>\n<p>(Can't see the embed? <a href=\"https://youtu.be/9ydt-dVCY8Q\">Watch it on YouTube</a>.) All\nthe code is <a href=\"https://github.com/jonnonz1/c302\">on GitHub</a> if you'd rather pull\nit apart yourself.</p>\n<h2>The setup is a car</h2>\n<p>Think of it as a car. Claude (Sonnet 4) is the engine. A big engine with nobody\nat the wheel. 
It has five tools: read, grep, write, bash, and run the tests.\nThat's the lot.</p>\n<p>The driver is a small controller program. Every tick it looks at what Claude\njust did and decides what Claude does next. It never writes a line of code\nitself. It just shapes the behaviour, the way you'd drive a car without ever\ntouching the pistons.</p>\n<p>And the driver doesn't read the code either. It reads the dashboard. Five\nnumbers, every tick, and only these five: how many tests are failing, the share\npassing, how many files Claude has touched, how many ticks have gone by, and the\nlast tool it used. That's the whole view of the world.</p>\n<p>Out the other side there are seven knobs. The big one is the gear: diagnose,\nedit, test, or stop. Then a handful more, like how much risk to take, how long\nto think, how widely to read, how hard to commit, and how done it reckons it is.\nFive numbers in, seven knobs out.</p>\n<h2>So who turns the knobs?</h2>\n<p>The worm.</p>\n<p>It's C. elegans, a roundworm about a millimetre long. It has 302 neurons, and\nscientists mapped every single connection between them back in 1986 (the year I\nwas born, which I'm choosing not to read anything into). It's one of the only\nnervous systems we understand all the way down.</p>\n<p>I took 14 of those neurons and ran them live in a simulation. Four of them do\nthe heavy lifting. One is a salt sensor, and to a worm salt means food, so I\nwired it to reward. Its opposite I wired to bad results. One drives the worm\nforward, so I wired it to how much work is left. One throws it into reverse, so\nI wired it to errors.</p>\n<p>I didn't make any of that wiring up. It's the published connectome. 
All I did\nwas connect the worm's senses to the agent's situation and let it run.</p>\n<p>(If you want the longer argument for why a worm of all things, I made the case\nwhen I\n<a href=\"https://jonno.nz/posts/what-if-a-worm-could-make-ai-agents-smarter/\">first started this</a>.)</p>\n<h2>One run, 14 ticks, 45 seconds</h2>\n<p>Does it crash, or does it do something useful?</p>\n<p>It starts at baseline with one failing test. Claude opens a few files, runs a\nsearch. The tests are failing, so the worm leans on the accelerator. I never\nwrote a rule that says &quot;press the accelerator when there's work to do&quot;. It falls\nstraight out of the wiring.</p>\n<p>Then Claude runs the tests and gets a real failure. The error neuron lights up,\nthe worm changes gear, and Claude makes its first proper edit. Risk drops, it\nreads less, it commits harder. Again, no rule for that combination. The error\nsignal and the work-left signal crossed a threshold together, and the worm\nturned that into &quot;stop wandering, make a change&quot;. It found a scent.</p>\n<p>A test goes green. Then another. The worm is cooking. If the run ended there I'd\nhave gone home happy.</p>\n<p>It didn't. The second edit was sloppy and broke something else. The pass rate\ndrops, the errors jump, and the punish neuron (my favourite, for the record)\nlights up hot.</p>\n<p>This is the moment I cared about. The obvious move is to panic, drop back to\nsquare one, and start over. The worm doesn't. It stays in gear, eases off, gets\ncareful, reads before it edits. Two ticks later the careful edit lands and the\npass rate climbs back higher than it was before the break.</p>\n<p>It recovered without restarting, which is exactly what a good pair would do.\nLook at the diff, don't burn the house down and start fresh. And nothing in the\nwiring connects a regression to a change of gear. The punish signal pulls one\nknob down. 
It doesn't yank the worm out of edit mode, because that's a different\nknob entirely.</p>\n<p>By tick 14 the last test goes green, ten out of ten, and the worm stops and\nparks the car. Not because I told it to stop at ten, but because there was\nnothing left pulling it forward.</p>\n<h2>Same brain, one of them alive</h2>\n<p>So I ran it properly. About 200 times. The live worm averaged a 0.96 pass rate.</p>\n<p>Then I ran the exact same connectome, same wiring, same neurons, but\npre-recorded, like playing a tape of the worm's brain instead of letting it\nreact. That version averaged 0.87, and the live worm beat it every single time.\nSame brain. The only difference is one of them was alive and getting feedback,\nand the other was just replaying.</p>\n<p>Now the part I have to be honest about. I ran a colder version where Claude\nhadn't indexed the repo first, and plain Claude with no worm and no driver beat\nevery controller I built, the live worm included. The worm is great at following\na gradient once the ground is mapped. It cannot plan its way around a codebase\nit has never seen. So this isn't &quot;worm beats Claude&quot;. It's &quot;feedback beats no\nfeedback&quot;, which is a smaller and far more useful claim. (The full methodology\nand numbers are\n<a href=\"https://github.com/jonnonz1/c302/blob/main/research/PAPER.md\">in the paper</a>.)</p>\n<h2>What I took from it</h2>\n<p>Give your agent a way to know when it's done. The baseline that couldn't tell\nkept running long after the job was finished, burning money for nothing.</p>\n<p>Don't restart on a setback. The worm that held its nerve through the regression\ndid most of the real work.</p>\n<p>And feedback beats wiring. Two identical brains, and the live one won purely\nbecause it was in the loop. 
The data you feed an agent and the loop you close\naround it matter more than how clever the thing in the middle is.</p>\n<h2>The bit that isn't about a worm</h2>\n<p>I built this in about three weeks, a couple of evenings here and there, on\nroughly $50 of API credits. I used Claude Code as my research assistant, which\nmeans my brain was steering Claude to build a thing that lets a worm's brain\nsteer Claude.</p>\n<p>A few years ago a question this daft would have been a research project. Now\nit's a weekend and a coffee budget. The cost of chasing a strange idea has\nfallen through the floor.</p>\n<p>So chase the strange ideas. The code's\n<a href=\"https://github.com/jonnonz1/c302\">on GitHub</a>. Go on.</p>\n","date_published":"Sun, 31 May 2026 00:00:00 GMT","image":"https://jonno.nz/og/what-happens-when-a-worm-drives-claude.png"},{"id":"https://jonno.nz/posts/the-18-laws-of-human-nature/","url":"https://jonno.nz/posts/the-18-laws-of-human-nature/","title":"The Laws of Human Nature","content_html":"<style>\n.laws-feature {\n  --social-tint: #9aaecb;\n  --bg-surface-2: #19243480;\n}\n.laws-feature *,\n.laws-feature *::before,\n.laws-feature *::after {\n  box-sizing: border-box;\n}\n.laws-feature button {\n  font-family: inherit;\n  cursor: pointer;\n}\n\n/* Section rule used between intro and the map */\n.laws-feature .section-rule {\n  display: flex; align-items: baseline; gap: 0.75rem;\n  margin: 3rem 0 1.5rem;\n  padding-bottom: 0.65rem;\n  border-bottom: 1px solid var(--border);\n}\n.laws-feature .section-rule h2 {\n  font-family: var(--font-mono); font-size: 0.75rem; font-weight: 400;\n  color: var(--text-muted); letter-spacing: 0.06em; text-transform: uppercase;\n  margin: 0;\n  border-bottom: 0;\n  padding: 0;\n}\n.laws-feature .section-rule .count {\n  font-family: var(--font-mono); font-size: 0.75rem; color: var(--accent);\n}\n.laws-feature .section-rule .spacer { flex: 1; }\n.laws-feature .section-rule .hint {\n  font-family: var(--font-mono); 
font-size: 0.7rem; color: var(--text-muted);\n}\n\n/* Laws app */\n.laws-app {\n  --gap: 1rem;\n  --card-pad: 1.1rem;\n}\n.laws-toolbar {\n  margin-bottom: 1.75rem;\n  border: 1px solid var(--border);\n  border-radius: 0.3rem;\n  background: rgba(255, 255, 255, 0.015);\n  overflow: hidden;\n}\n.lt-row {\n  display: flex; align-items: center; gap: 0.75rem;\n  padding: 0.75rem 1rem;\n  font-family: var(--font-mono);\n  font-size: 0.72rem;\n}\n.lt-label {\n  color: var(--text-muted); text-transform: uppercase;\n  letter-spacing: 0.08em; font-size: 0.62rem;\n  min-width: 4.5rem;\n}\n.lt-progress .lt-bar {\n  flex: 1; height: 4px; background: var(--border); border-radius: 2px; position: relative; overflow: hidden;\n}\n.lt-progress .lt-fill {\n  position: absolute; top: 0; left: 0; bottom: 0;\n  background: linear-gradient(90deg, var(--accent), var(--accent-hover));\n  border-radius: 2px;\n  transition: width 0.35s cubic-bezier(.4,0,.2,1);\n  box-shadow: 0 0 8px rgba(212, 168, 83, 0.4);\n}\n.lt-progress .lt-count { color: var(--text-bright); font-size: 0.75rem; }\n.lt-progress .lt-count b { color: var(--accent); font-weight: 500; }\n.lt-progress .lt-count .of { color: var(--text-muted); margin-left: 0.1rem; }\n.lt-progress .lt-clear {\n  background: none; border: 1px solid var(--border);\n  color: var(--text-muted); padding: 0.25rem 0.55rem;\n  border-radius: 0.15rem; font-family: inherit; font-size: 0.62rem;\n  text-transform: uppercase; letter-spacing: 0.08em;\n  transition: color 0.15s, border-color 0.15s;\n}\n.lt-progress .lt-clear:hover {\n  color: var(--accent); border-color: rgba(212, 168, 83, 0.4);\n}\n\n/* Theme tokens */\n.laws-feature [data-theme=\"self\"]   { --t-color: var(--accent);      --t-glow: 212, 168, 83; }\n.laws-feature [data-theme=\"others\"] { --t-color: var(--tag-color);   --t-glow: 126, 184, 168; }\n.laws-feature [data-theme=\"social\"] { --t-color: var(--social-tint); --t-glow: 154, 174, 203; }\n\n/* Sigil animations */\n@keyframes sig-draw { 
to { stroke-dashoffset: 0; } }\n@keyframes sig-dot  { to { opacity: 1; } }\n@keyframes sig-pulse {\n  0%, 100% { transform: scale(1); filter: none; }\n  50%      { transform: scale(1.4); }\n}\n\n/* Theme band grid */\n.grid-view { display: flex; flex-direction: column; gap: 2rem; }\n.theme-band {}\n.tb-head {\n  display: grid;\n  grid-template-columns: auto auto 1fr auto;\n  align-items: baseline;\n  gap: 0.85rem;\n  padding: 0.65rem 0 0.75rem;\n  margin-bottom: 1rem;\n  border-bottom: 1px solid var(--border);\n  position: relative;\n}\n.tb-head.with-stripe::before {\n  content: ''; position: absolute; left: 0; bottom: -1px; height: 1px; width: 4rem;\n  background: var(--t-color); opacity: 0.7;\n}\n.tb-range {\n  font-family: var(--font-mono); font-size: 0.65rem;\n  color: var(--t-color); letter-spacing: 0.12em;\n  padding: 0.15rem 0.5rem; border: 1px solid currentColor;\n  border-radius: 0.15rem; opacity: 0.75;\n}\n.tb-label {\n  font-family: var(--font-text); font-size: 1.05rem;\n  font-weight: 500; color: var(--text-bright); letter-spacing: -0.02em;\n  margin: 0;\n}\n.tb-sub {\n  font-family: var(--font-mono); font-size: 0.7rem;\n  color: var(--text-muted); letter-spacing: 0.02em;\n}\n.tb-count {\n  font-family: var(--font-mono); font-size: 0.72rem;\n  color: var(--t-color);\n}\n.tb-count .of { color: var(--text-muted); }\n.tb-cards {\n  display: grid;\n  grid-template-columns: repeat(3, 1fr);\n  gap: var(--gap);\n}\n\n/* Law card */\n.law-card {\n  text-align: left;\n  background: rgba(255, 255, 255, 0.015);\n  border: 1px solid var(--border);\n  border-radius: 0.35rem;\n  padding: var(--card-pad);\n  color: var(--text);\n  transition: border-color 0.18s, background 0.18s, transform 0.18s;\n  display: flex;\n  flex-direction: column;\n  gap: 0.85rem;\n  position: relative;\n  overflow: hidden;\n  min-height: 13rem;\n  font-family: var(--font-text);\n}\n.law-card::before {\n  content: ''; position: absolute; top: 0; left: 0; bottom: 0; width: 2px;\n  background: 
var(--t-color); opacity: 0; transition: opacity 0.18s;\n}\n.law-card:hover {\n  border-color: rgba(255, 255, 255, 0.14);\n  background: rgba(255, 255, 255, 0.025);\n}\n.law-card:hover::before { opacity: 0.5; }\n.law-card.is-active {\n  border-color: var(--t-color);\n  background: rgba(var(--t-glow), 0.06);\n}\n.law-card.is-active::before { opacity: 1; }\n.law-card.is-read .lc-title { color: var(--text-muted); }\n.law-card.is-read .lc-illus { opacity: 0.55; }\n.lc-illus {\n  width: 56px; height: 56px;\n  display: flex; align-items: center; justify-content: center;\n  transition: opacity 0.18s, transform 0.25s cubic-bezier(.4,0,.2,1);\n}\n.lc-illus img {\n  width: 100%; height: 100%; display: block;\n}\n.law-card:hover .lc-illus {\n  transform: translateY(-1px);\n}\n.lc-body { display: flex; flex-direction: column; gap: 0.45rem; }\n.lc-meta { display: flex; align-items: center; justify-content: space-between; }\n.num-glyph {\n  font-family: var(--font-mono); font-size: 0.7rem;\n  color: var(--t-color); letter-spacing: 0.05em;\n  opacity: 0.85;\n}\n.lc-read {\n  font-family: var(--font-mono); font-size: 0.6rem;\n  color: var(--tag-color); text-transform: uppercase; letter-spacing: 0.1em;\n  padding: 0.1rem 0.4rem; border: 1px solid rgba(126, 184, 168, 0.3);\n  border-radius: 0.15rem; opacity: 0.85;\n}\n.lc-title {\n  font-family: var(--font-text);\n  font-size: 1.05rem;\n  font-weight: 500;\n  color: var(--text-bright);\n  letter-spacing: -0.02em;\n  line-height: 1.25;\n  text-wrap: balance;\n  transition: color 0.18s;\n  margin: 0;\n}\n.lc-essence {\n  font-size: 0.85rem;\n  line-height: 1.55;\n  color: var(--text-muted);\n  text-wrap: pretty;\n  margin: 0;\n}\n\n/* Theme badge */\n.theme-badge {\n  display: inline-flex; align-items: center; gap: 0.4rem;\n  font-family: var(--font-mono); font-size: 0.62rem;\n  color: var(--t-color); text-transform: uppercase; letter-spacing: 0.1em;\n  padding: 0.2rem 0.55rem;\n  border: 1px solid currentColor; border-radius: 
0.15rem;\n  opacity: 0.85;\n  white-space: nowrap;\n}\n.tb-dot { width: 5px; height: 5px; border-radius: 50%; background: currentColor; }\n\n/* Modal — reusable wrapper around native <dialog> via window.Modal */\ndialog.laws-modal {\n  border: none;\n  padding: 0;\n  margin: auto;\n  background: transparent;\n  color: var(--text);\n  width: min(48rem, 92vw);\n  max-width: 48rem;\n  max-height: 88vh;\n  border-radius: 0.6rem;\n  overflow: visible;\n  box-shadow:\n    0 30px 80px -20px rgba(0, 0, 0, 0.7),\n    0 0 0 1px var(--border-strong);\n  /* allow the inner panel to define its own backdrop colour */\n}\ndialog.laws-modal::backdrop {\n  background:\n    radial-gradient(ellipse at center, rgba(0,0,0,0.6) 0%, rgba(12, 21, 32, 0.92) 70%);\n  -webkit-backdrop-filter: blur(8px) saturate(140%);\n  backdrop-filter: blur(8px) saturate(140%);\n  animation: modal-backdrop-in 0.18s ease-out;\n}\ndialog.laws-modal[open] {\n  animation: modal-in 0.22s cubic-bezier(.2,0,.2,1);\n}\n@keyframes modal-in {\n  from { opacity: 0; transform: translateY(8px) scale(0.97); }\n  to   { opacity: 1; transform: translateY(0)   scale(1); }\n}\n@keyframes modal-backdrop-in {\n  from { opacity: 0; }\n  to   { opacity: 1; }\n}\n\n/* Focus panel — now lives inside the dialog */\n.focus-panel {\n  display: flex;\n  flex-direction: column;\n  border: 1px solid var(--border-strong);\n  border-radius: 0.6rem;\n  background:\n    linear-gradient(180deg, rgba(255,255,255,0.025), transparent 30%),\n    var(--bg-surface);\n  overflow: hidden;\n  position: relative;\n  max-height: 88vh;\n}\n.focus-panel::before {\n  content: ''; position: absolute; top: 0; left: 0; right: 0; height: 2px;\n  background: linear-gradient(90deg, transparent, var(--t-color), transparent);\n  opacity: 0.85;\n  z-index: 3;\n}\n\n/* Scrollable body inside the panel; head + foot stay docked */\n.fp-scroll {\n  flex: 1 1 auto;\n  overflow-y: auto;\n  overscroll-behavior: contain;\n  scrollbar-width: thin;\n  scrollbar-color: 
var(--border-strong) transparent;\n}\n.fp-scroll::-webkit-scrollbar { width: 8px; }\n.fp-scroll::-webkit-scrollbar-track { background: transparent; }\n.fp-scroll::-webkit-scrollbar-thumb {\n  background: var(--border-strong);\n  border-radius: 4px;\n}\n.fp-scroll::-webkit-scrollbar-thumb:hover {\n  background: rgba(212, 168, 83, 0.45);\n}\n\n.fp-head {\n  display: flex; justify-content: space-between; align-items: center;\n  padding: 0.9rem 1.5rem;\n  border-bottom: 1px solid var(--border);\n  background: rgba(0,0,0,0.25);\n  flex-shrink: 0;\n}\n.fp-head-l { display: flex; align-items: center; gap: 0.85rem; flex-wrap: wrap; }\n.fp-head-l .num-glyph { font-size: 0.78rem; }\n.fp-close {\n  background: none; border: 1px solid var(--border);\n  color: var(--text-muted); padding: 0.25rem 0.65rem;\n  border-radius: 0.15rem; font-family: var(--font-mono); font-size: 0.62rem;\n  text-transform: uppercase; letter-spacing: 0.08em;\n  transition: color 0.15s, border-color 0.15s;\n  white-space: nowrap;\n}\n.fp-close:hover { color: var(--accent); border-color: rgba(212, 168, 83, 0.4); }\n.fp-hero {\n  display: grid;\n  grid-template-columns: 180px 1fr;\n  gap: 1.75rem;\n  padding: 2rem 1.75rem 1.75rem;\n  align-items: start;\n  border-bottom: 1px solid var(--border);\n}\n.fp-hero-illus {\n  width: 180px; height: 180px;\n  display: flex; align-items: center; justify-content: center;\n  border: 1px solid var(--border);\n  background:\n    radial-gradient(circle at center, rgba(var(--t-glow), 0.12), transparent 70%),\n    rgba(0,0,0,0.28);\n  border-radius: 0.5rem;\n  padding: 1.25rem;\n  position: relative;\n  overflow: hidden;\n}\n.fp-hero-illus::before {\n  content: ''; position: absolute; inset: 0;\n  background: radial-gradient(circle at top left, rgba(var(--t-glow), 0.1), transparent 60%);\n  pointer-events: none;\n}\n.fp-hero-illus img {\n  width: 100%; height: 100%; display: block; position: relative; z-index: 1;\n  filter: drop-shadow(0 4px 14px rgba(var(--t-glow), 
0.18));\n}\n.fp-hero-text { min-width: 0; }\n.fp-title {\n  font-family: var(--font-text);\n  font-size: 1.85rem;\n  font-weight: 500;\n  color: var(--text-bright);\n  letter-spacing: -0.025em;\n  line-height: 1.12;\n  margin: 0 0 0.85rem;\n  text-wrap: balance;\n}\n.fp-essence {\n  font-size: 1.05rem;\n  line-height: 1.6;\n  color: var(--text);\n  margin: 0 0 1.1rem;\n  text-wrap: pretty;\n}\n.fp-pull {\n  border-left: 2px solid var(--t-color);\n  padding: 0.25rem 0 0.25rem 1rem;\n  margin: 0 0 1.1rem;\n  color: var(--text-muted);\n  font-style: italic;\n  font-size: 1.02rem;\n  line-height: 1.5;\n  text-wrap: balance;\n}\n.fp-kw { display: flex; flex-wrap: wrap; gap: 0.5rem; }\n.fp-kw .kw {\n  font-family: var(--font-mono);\n  font-size: 0.7rem;\n  color: var(--tag-color);\n}\n\n/* Article in focus panel */\n.fp-article {\n  padding: 1.25rem 1.75rem 1.5rem;\n  max-width: 44rem;\n  margin: 0 auto;\n}\n.fp-section {\n  padding-top: 2.5rem;\n  margin-top: 0;\n}\n.fp-section:first-child { padding-top: 1.5rem; }\n.fp-section-label {\n  display: flex; align-items: baseline; gap: 0.65rem;\n  margin-bottom: 1.25rem;\n  padding-bottom: 0.65rem;\n  border-bottom: 1px solid var(--border);\n}\n.fp-section-n {\n  font-family: var(--font-mono);\n  font-size: 0.65rem;\n  color: var(--t-color);\n  letter-spacing: 0.1em;\n  padding: 0.15rem 0.5rem;\n  border: 1px solid currentColor;\n  border-radius: 0.15rem;\n  opacity: 0.85;\n}\n.fp-section-name {\n  font-family: var(--font-mono);\n  font-size: 0.74rem;\n  color: var(--text-muted);\n  text-transform: uppercase;\n  letter-spacing: 0.08em;\n}\n.fp-section-body {\n  font-family: var(--font-text);\n  font-weight: 400;\n  font-size: 1.02rem;\n  line-height: 1.78;\n  color: var(--text);\n}\n.fp-section-body p {\n  margin: 0 0 1.35em;\n  text-wrap: pretty;\n}\n.fp-section-body p:last-child { margin-bottom: 0; }\n.fp-section-body em { font-style: italic; color: var(--text-bright); }\n.fp-section-body strong, .fp-section-body b {\n  
font-weight: 500; color: var(--text-bright);\n}\n.fp-section-body h4 {\n  font-family: var(--font-text);\n  font-size: 1.05rem;\n  font-weight: 500;\n  color: var(--text-bright);\n  letter-spacing: -0.015em;\n  margin: 2rem 0 0.85rem;\n  line-height: 1.3;\n  display: flex; align-items: baseline; gap: 0.65rem; flex-wrap: wrap;\n  border-bottom: 0;\n  padding: 0;\n}\n.fp-section-body h4:first-child { margin-top: 0.25rem; }\n.fp-section-body h4 .ex-year {\n  font-family: var(--font-mono);\n  font-size: 0.65rem;\n  color: var(--text-muted);\n  font-weight: 400;\n  letter-spacing: 0.1em;\n  text-transform: uppercase;\n  padding: 0.15rem 0.5rem;\n  border: 1px solid var(--border);\n  border-radius: 0.15rem;\n}\n.fp-section-body ol, .fp-section-body ul {\n  padding-left: 0;\n  list-style: none;\n  counter-reset: fpitem;\n  margin: 0;\n}\n.fp-section-body ol li {\n  counter-increment: fpitem;\n  padding: 0.5rem 0 0.75rem 2.75rem;\n  position: relative;\n  border-bottom: 1px solid var(--border);\n  margin-bottom: 0.25rem;\n}\n.fp-section-body ol li:last-child { border-bottom: none; }\n.fp-section-body ol li::before {\n  content: counter(fpitem, decimal-leading-zero);\n  position: absolute;\n  left: 0;\n  top: 0.7rem;\n  font-family: var(--font-mono);\n  font-size: 0.7rem;\n  color: var(--t-color);\n  letter-spacing: 0.06em;\n  padding: 0.1rem 0.45rem;\n  border: 1px solid currentColor;\n  border-radius: 0.15rem;\n  opacity: 0.85;\n  line-height: 1.4;\n}\n.fp-section-body ul li {\n  padding: 0.25rem 0 0.35rem 1.5rem;\n  position: relative;\n}\n.fp-section-body ul li::before {\n  content: '—';\n  position: absolute;\n  left: 0;\n  color: var(--t-color);\n  opacity: 0.6;\n}\n\n/* Focus footer */\n.fp-foot {\n  display: flex; justify-content: space-between; align-items: center;\n  padding: 0.85rem 1.5rem;\n  border-top: 1px solid var(--border);\n  background: rgba(0,0,0,0.32);\n  flex-wrap: wrap;\n  gap: 0.75rem;\n  flex-shrink: 0;\n}\n.fp-foot-l, .fp-foot-r { display: flex; 
gap: 0.5rem; align-items: center; flex-wrap: wrap; }\n.fp-link {\n  background: none; border: 1px solid var(--border);\n  color: var(--text-muted);\n  padding: 0.4rem 0.8rem; border-radius: 0.15rem;\n  font-family: var(--font-mono); font-size: 0.68rem;\n  text-transform: uppercase; letter-spacing: 0.08em;\n  transition: color 0.15s, border-color 0.15s;\n  display: inline-flex; align-items: center; gap: 0.4rem;\n  white-space: nowrap;\n}\n.fp-link:hover { color: var(--text-bright); border-color: var(--border-strong); }\n.fp-arrow { color: var(--accent); font-size: 0.78rem; line-height: 1; }\n.fp-mark {\n  background: none; border: 1px solid var(--border);\n  color: var(--text-muted);\n  padding: 0.4rem 0.85rem; border-radius: 0.15rem;\n  font-family: var(--font-mono); font-size: 0.68rem;\n  text-transform: uppercase; letter-spacing: 0.08em;\n  transition: color 0.15s, border-color 0.15s, background 0.15s;\n  display: inline-flex; align-items: center; gap: 0.45rem;\n  white-space: nowrap;\n}\n.fp-mark:hover { color: var(--text-bright); border-color: var(--border-strong); }\n.fp-mark.is-read {\n  color: var(--tag-color);\n  border-color: rgba(126, 184, 168, 0.4);\n  background: rgba(126, 184, 168, 0.06);\n}\n.fp-tick { font-size: 0.8rem; line-height: 1; }\n/* Reading paths */\n.reading-paths {\n  display: grid;\n  grid-template-columns: repeat(3, 1fr);\n  gap: 0.75rem;\n  margin: 1rem 0 2.5rem;\n}\n.rp-card {\n  padding: 1rem 1.1rem;\n  border: 1px solid var(--border);\n  border-radius: 0.3rem;\n  background: rgba(255, 255, 255, 0.015);\n  transition: border-color 0.15s, background 0.15s;\n  color: var(--text);\n  display: block;\n}\n.rp-card:hover {\n  border-color: rgba(212, 168, 83, 0.3);\n  background: rgba(212, 168, 83, 0.03);\n  color: var(--text);\n}\n.rp-card .rp-tag {\n  font-family: var(--font-mono); font-size: 0.62rem;\n  color: var(--accent); letter-spacing: 0.1em; text-transform: uppercase;\n  display: block; margin-bottom: 0.5rem;\n}\n.rp-card .rp-title 
{\n  font-size: 0.98rem;\n  font-weight: 500;\n  color: var(--text-bright);\n  letter-spacing: -0.015em;\n  margin-bottom: 0.4rem;\n}\n.rp-card .rp-desc {\n  font-size: 0.82rem;\n  color: var(--text-muted);\n  line-height: 1.5;\n  margin: 0;\n}\n.rp-card .rp-laws {\n  margin-top: 0.6rem;\n  font-family: var(--font-mono);\n  font-size: 0.65rem;\n  color: var(--text-muted);\n  letter-spacing: 0.06em;\n}\n.rp-card .rp-laws b { color: var(--accent); font-weight: 500; }\n\n/* Make all interactive bits friendly to touch */\n.laws-feature button,\n.laws-feature .rp-card,\n.laws-feature dialog.laws-modal button {\n  touch-action: manipulation;\n}\n\n/* Responsive */\n@media (max-width: 900px) {\n  .laws-feature .tb-cards { grid-template-columns: repeat(2, 1fr); }\n  .laws-feature .reading-paths { grid-template-columns: 1fr; }\n}\n@media (max-width: 640px) {\n  /* Grid + cards */\n  .laws-feature .tb-cards { grid-template-columns: 1fr; }\n  .laws-feature .tb-head { grid-template-columns: auto 1fr; }\n  .laws-feature .tb-head .tb-sub { display: none; }\n\n  /* Modal sizing — nearly fullscreen with safe margin, dvh for iOS dynamic toolbar */\n  .laws-feature dialog.laws-modal {\n    width: 96vw;\n    max-width: 96vw;\n    max-height: 92vh;\n    max-height: 92dvh;\n    border-radius: 0.5rem;\n  }\n  .laws-feature .focus-panel {\n    max-height: 92vh;\n    max-height: 92dvh;\n    border-radius: 0.5rem;\n  }\n  /* Cheaper backdrop blur on phones */\n  .laws-feature dialog.laws-modal::backdrop {\n    -webkit-backdrop-filter: blur(4px);\n    backdrop-filter: blur(4px);\n  }\n\n  /* Head/foot tighten and grow tap targets */\n  .laws-feature .fp-head { padding: 0.7rem 1rem; }\n  .laws-feature .fp-close {\n    padding: 0.5rem 0.85rem;\n    min-height: 36px;\n    font-size: 0.66rem;\n  }\n  .laws-feature .fp-foot {\n    padding: 0.7rem 0.85rem;\n    gap: 0.5rem;\n  }\n  .laws-feature .fp-foot-l,\n  .laws-feature .fp-foot-r {\n    gap: 0.4rem;\n    flex-wrap: wrap;\n  }\n  
.laws-feature .fp-link,\n  .laws-feature .fp-mark {\n    padding: 0.5rem 0.75rem;\n    min-height: 36px;\n    font-size: 0.66rem;\n  }\n\n  /* Hero stack + comfy reading */\n  .laws-feature .fp-hero {\n    grid-template-columns: 1fr;\n    gap: 1rem;\n    padding: 1.25rem 1.1rem;\n  }\n  .laws-feature .fp-hero-illus {\n    width: 100%;\n    max-width: 180px;\n    height: 140px;\n    margin: 0 auto;\n  }\n  .laws-feature .fp-title { font-size: 1.35rem; }\n  .laws-feature .fp-essence { font-size: 0.98rem; line-height: 1.55; }\n  .laws-feature .fp-pull { font-size: 0.95rem; }\n  .laws-feature .fp-article { padding: 1rem 1.1rem 1.25rem; }\n  .laws-feature .fp-section-body { font-size: 0.98rem; line-height: 1.72; }\n  .laws-feature .fp-section-body ol li { padding-left: 2.5rem; }\n\n  /* Toolbar / progress on phones */\n  .laws-feature .lt-row { padding: 0.65rem 0.85rem; gap: 0.6rem; }\n  .laws-feature .lt-label { min-width: 3.5rem; }\n  .laws-feature .lt-progress .lt-clear { padding: 0.3rem 0.55rem; }\n}\n\n@media (max-width: 420px) {\n  /* Very small phones — keep buttons readable, allow foot to wrap to two rows */\n  .laws-feature .fp-link,\n  .laws-feature .fp-mark { padding: 0.45rem 0.6rem; font-size: 0.62rem; }\n  .laws-feature .fp-head { padding: 0.6rem 0.85rem; }\n  .laws-feature .fp-title { font-size: 1.2rem; }\n  .laws-feature .fp-hero-illus { height: 120px; max-width: 140px; }\n  .laws-feature .fp-section-label { gap: 0.5rem; }\n}\n\n@media (prefers-reduced-motion: reduce) {\n  .laws-feature *,\n  .laws-feature *::before,\n  .laws-feature *::after {\n    animation-duration: 0.01ms !important;\n    transition-duration: 0.01ms !important;\n  }\n}\n</style>\n<div class=\"laws-feature\">\n<div class=\"section-rule\">\n  <h2>The 18 laws</h2>\n  <span class=\"count\">18</span>\n  <span class=\"spacer\"></span>\n  <span class=\"hint\">tap a card to read it in full</span>\n</div>\n<div id=\"laws-root\"></div>\n<div class=\"section-rule\" style=\"margin-top: 
3.5rem;\">\n  <h2>Reading paths</h2>\n  <span class=\"count\">3</span>\n</div>\n<div class=\"reading-paths\">\n  <div class=\"rp-card\">\n    <span class=\"rp-tag\">canonical</span>\n    <div class=\"rp-title\">Start at the beginning</div>\n    <p class=\"rp-desc\">Greene's order: master yourself, then decode others, then handle the social dynamics. Builds the vocabulary you need for the later laws.</p>\n    <div class=\"rp-laws\"><b>01</b> → <b>18</b></div>\n  </div>\n  <div class=\"rp-card\">\n    <span class=\"rp-tag\">most useful</span>\n    <div class=\"rp-title\">If you only read five</div>\n    <p class=\"rp-desc\">The five that change the most about how you read a room. Irrationality, role-playing, envy, defensiveness, aimlessness.</p>\n    <div class=\"rp-laws\"><b>01</b> · <b>06</b> · <b>08</b> · <b>13</b> · <b>15</b></div>\n  </div>\n  <div class=\"rp-card\">\n    <span class=\"rp-tag\">most uncomfortable</span>\n    <div class=\"rp-title\">The ones that sting</div>\n    <p class=\"rp-desc\">Greene at his most surgical. Repression, grandiosity, envy, death denial. Read these slowly and with someone who'll tell you the truth.</p>\n    <div class=\"rp-laws\"><b>03</b> · <b>04</b> · <b>08</b> · <b>10</b></div>\n  </div>\n</div>\n</div>\n<p>I recently finished Robert Greene's <em>The Laws of Human Nature</em> and it wouldn't\nleave me alone. Eighteen laws, each one a separate pattern in how people\nactually behave (not how we like to think we behave). I kept catching myself\nwatching the patterns play out in real life: in meetings, in news cycles, in my\nown head.</p>\n<p>So I built the thing above. Eighteen cards, one per law. Tap any of them to read\nmy full take, with two real-world examples and a short list of behavioural\nthings you can do this week. About a thousand words a law, around twenty\nthousand total. Your read-progress sticks between visits.</p>\n<p>If a card grabs you, that's the one to start with. 
If not, the canonical order\nat the top is a fine path. Either way, the actual book is much richer than this\nmap. <strong>Robert Greene</strong> wrote it; if you want the deep version,\n<a href=\"https://www.amazon.com.au/Laws-Human-Nature-Robert-Greene/dp/1781259194\" target=\"_blank\" rel=\"noopener\">grab a copy on Amazon</a>.</p>\n<p>A note on attribution. The names, the structure and the underlying observations\nare all Greene's (<em>The Laws of Human Nature</em>, Profile Books, 2018). The writing\nis mine, the modern examples are mine, and any wrong reading of his ideas is\nmine.</p>\n<script src=\"https://unpkg.com/react@18.3.1/umd/react.production.min.js\" crossorigin=\"anonymous\"></script>\n<script src=\"https://unpkg.com/react-dom@18.3.1/umd/react-dom.production.min.js\" crossorigin=\"anonymous\"></script>\n<script src=\"https://unpkg.com/@babel/standalone@7.29.0/babel.min.js\" crossorigin=\"anonymous\"></script>\n<script src=\"https://jonno.nz/laws/laws-data.js\"></script>\n<script src=\"https://jonno.nz/laws/laws-content.js\"></script>\n<script type=\"text/babel\" src=\"https://jonno.nz/lib/modal.js\" data-presets=\"react\"></script>\n<script type=\"text/babel\" src=\"https://jonno.nz/laws/laws-app.js\" data-presets=\"react\"></script>\n","date_published":"Tue, 26 May 2026 12:00:00 GMT","image":"https://jonno.nz/og/the-18-laws-of-human-nature.png"},{"id":"https://jonno.nz/posts/conscious-minimalism/","url":"https://jonno.nz/posts/conscious-minimalism/","title":"Conscious Minimalism","content_html":"<p>I build with Conscious Minimalism.</p>\n<p>Conscious Minimalism means every decision, every feature, every commitment is a\nchoice. Nothing ships by default. The shape grows with the customer, not the\nroadmap.</p>\n<p>I obsess over the interface, not the internals. The boundary is what you live\nwith, and what you refactor and scale.</p>\n<p>I optimise for speed to de-validation. 
The faster I kill a bad idea, the less\nsunk cost owns me.</p>\n<p>I fix one narrow painpoint, not a broad category or a platform play. One thing,\ndone properly.</p>\n<p>I aim to be the best at that one feature, not one of the good ones in the list.</p>\n<p>I chase niche over crowded. Niche means the idea is more unique, with a better\noutcome than the legacy approach.</p>\n<p>Crowded is a signal of competition. Without a 10x better solution, you probably\nwill not make it.</p>\n<p>I bet on AI-native platforms and APIs. Anything that does not use AI as a\nprimitive is already legacy.</p>\n","date_published":"Thu, 14 May 2026 12:00:00 GMT","image":"https://jonno.nz/og/conscious-minimalism.png"},{"id":"https://jonno.nz/posts/the-dent-and-the-crater/","url":"https://jonno.nz/posts/the-dent-and-the-crater/","title":"The dent and the crater","content_html":"<blockquote>\n<p>&quot;The world breaks every one and afterward many are strong at the broken\nplaces. But those that will not break it kills.&quot;</p>\n<p>— Ernest Hemingway, <em>A Farewell to Arms</em> (1929)</p>\n</blockquote>\n<p>Hemingway wasn't writing about leadership but he might as well have been. Some\nof the senior leaders I've worked with have taken serious dents over their\ncareers. The board that turns. The launch that lands flat. The cofounder who\nwalks. The round that doesn't close. The dents come with the job. The question\nis what's there a year later: a place you went back to and got strong at, or a\ncrater you've been quietly walking around since.</p>\n<p>The crater is the part that doesn't make the leadership playbooks. The bad year\ndoesn't get you on its own. The silent processing of the experience afterwards\ndoes, the quiet work of making sure that bad year can't happen again. You start\nshowing up to meetings already half-armoured. You stop trusting the kind of\ndecision that hurt you the last time. 
You add a layer of process around the work\nbecause process is what you wish you'd had then. Most of it doesn't land as a\nwound on the day. By the time it sets in, you've already started telling\nyourself a different story about it: you've learned, you've matured, you're\noperating at a higher level now.</p>\n<p>That's the bit that makes it nasty. Hardening reads as wisdom from the inside.\nYou flinch at the next pitch and call it prudence. The new hire brings an idea\nyou don't like, so you tell yourself you've seen this movie before. You shut\nyour door and call it focus. The mind narrates the armour as growth more often\nthan not, and the people around you go along with it because senior leaders\naren't expected to second-guess themselves out loud.</p>\n<p>The cost shows up sideways. You don't notice the day you stopped being\npersuadable by a junior engineer. You notice five years later, when the people\nworking for you have stopped bringing you the rough version of an idea because\nthey know what you'll do with it. The day you got bored of customer calls is the\nsame. You feel it a quarter later, when the deck you wrote turns out to be three\ncalls behind reality. The dents you didn't tend become the things you stop being\nable to see. Walls have that habit. They keep the threat out and the signal out\nat the same rate.</p>\n<p>The leaders who stay good at this over time do something specific about it. They\ndidn't get there by accident. Some have a coach they actually use, not the one\nHR signed them up with. A few run a peer group of two or three other operators\nwhere the deal is you say the real thing about the quarter, including the bit\nyou're embarrassed about. Others lean on a partner or a mate outside the\nindustry who couldn't care less what your title is and wouldn't be impressed if\nyou told them. The best ones I've watched have a practice that takes them\noff-stage. Riding, surfing, building, training. Doesn't matter what. 
The point\nis having a part of life where you're not on stage.</p>\n<p>The shape underneath these is the same. They keep at least one room in their\nlife where the armour comes off, and they go in there often enough that the dent\ndoesn't get to set. That's the work. It looks soft from outside. It isn't. It's\nmost of the reason these people are still doing the job at fifty in a way that\nresembles how they did it at thirty.</p>\n<p>People who skip this are the senior leaders you've been managed by at some point\nin your career, the ones you probably promised yourself you wouldn't become.\nThey've stopped reading the room. Most new ideas look like traps. Their\ncalendars are locked down to keep surprises out. They're not bad people. They're\npeople whose dents won, and the orgs around them are downstream of that win in\nways few people have the standing to call out.</p>\n<p>The team picks it up. That's the part founders and senior people miss most. You\ndon't get to decide which of your behaviours your reports model and which they\nignore. They copy the closed-off ones at the same rate as the rest, because\nyou're the senior person and that's the working definition of culture in\npractice. Your defensiveness teaches them caution. Your shortened patience\nteaches them to close things down faster than they should. The day you refused\nto say &quot;I don't know&quot; is the day they stopped saying it too. The crater spreads\noutward into the org and stops being something you can fix in private.</p>\n<p>Tending the dent is leadership work, the same line item as strategy, hiring, and\nboard management. 
Skip it for long enough and the dent becomes a crater, the\ncrater becomes the shape of the org, and the people who used to bring you their\nbest ideas start bringing them to someone else.</p>\n","date_published":"Wed, 06 May 2026 12:00:00 GMT","image":"https://jonno.nz/og/the-dent-and-the-crater.png"},{"id":"https://jonno.nz/posts/built-a-read-later-chrome-extension-because-pocket-died/","url":"https://jonno.nz/posts/built-a-read-later-chrome-extension-because-pocket-died/","title":"I Built a Read-Later Chrome Extension Because Pocket Died","content_html":"<p>I open about thirty tabs a day. I read maybe three of them. The rest are &quot;oh\nthat looks interesting, I'll come back to that,&quot; and then they sit there until\nmy browser starts choking and I close everything in a fit of guilt.</p>\n<p>Bookmarks aren't the answer. Bookmarks go into a folder I never open. Pocket was\nthe answer, except\n<a href=\"https://support.mozilla.org/en-US/kb/future-of-pocket\">Mozilla shut Pocket down in July 2025</a>,\nwhich I found out the way most people did: by going to save something and\ndiscovering the service was gone.</p>\n<p>So I built\n<a href=\"https://chromewebstore.google.com/detail/read-later/oedgonnjlnokhocngfjmflchbjdmgmag\">Read Later</a>.\nIt's a Chrome extension. It saves the page you're on. That's the whole pitch.</p>\n<h2>What it actually is</h2>\n<p>You hit <code>Cmd+Shift+L</code> and the current tab gets saved with its title, URL, and\nfavicon. Add a tag if you want. Done. There's a popup if you'd rather click a\nbutton, and a context menu if you'd rather right-click a link without opening it\nfirst.</p>\n<p>There's also a full-tab &quot;shelf&quot; view at <code>Cmd+Shift+K</code> where all your saved\narticles live. Search, filter by tag, mark as read, archive. The aesthetic is\nwarm parchment and moss green because I got tired of every productivity tool\nlooking like a Linear screenshot.</p>\n<p>That's it. 
No AI summaries, no recommendations, no &quot;people who saved this also\nsaved...&quot;, no newsletter, no account.</p>\n<h2>Why I didn't just use bookmarks</h2>\n<p>I tried. Bookmarks are designed for things you'll come back to many times. A\nread-later list is the opposite, you read each thing once and then it's done.\nThe lifecycle is different. Stuffing them into the same UI is why my bookmarks\nbar has been a graveyard for ten years.</p>\n<p>The other read-later apps that survived Pocket all want an account, a sync\nserver, and usually a subscription. For something whose whole job is &quot;remember\nthis URL until I read it,&quot; that's a lot of infrastructure. I wanted local\nstorage and nothing else.</p>\n<h2>Local-only on purpose</h2>\n<p>Everything lives in <code>chrome.storage.local</code>. Nothing leaves your browser. There's\nno server because there's nothing for a server to do. If you want to back up\nyour list, there's an Export button that gives you JSON. If you want to move it\nto another machine, Import takes the JSON back. If you want to share your shelf\nwith someone, &quot;Copy as Markdown&quot; gives you a clean list you can paste into Slack\nor an email.</p>\n<p>This was a deliberate call. Sync is the feature that turns a small tool into a\nservice, and a service needs accounts, and accounts need a backend, and a\nbackend needs my time and money forever. JSON in, JSON out is the version of\n&quot;sync&quot; that costs me nothing and gives the user complete control. Want it on two\nmachines? Export on one, Import on the other. Good enough.</p>\n<h2>Shipping to the Chrome Web Store</h2>\n<p>The real experiment was the launch. I've shipped plenty of things over the years\nbut never to the Chrome Web Store, and I wanted to see what that pipeline felt\nlike end to end. Pretty smooth, no drama. You pay the developer fee, fill in the\nlisting, upload a zipped manifest, wait for review. 
Mine went through first\ntime.</p>\n<p>The work was in the listing copy and the screenshots. You're suddenly writing\nfor a discovery page where people decide in three seconds whether to click\ninstall. That's a different muscle from writing a README. It made me think\nharder about what the extension actually does for someone who isn't me.</p>\n<p>I've got a folder on my laptop called &quot;side projects&quot; with about a dozen things\nin it, each solving some specific bit of friction in my own life. Most of them\nstay there. This one made it out because the friction was real, the build took\nan afternoon, and putting something on the Chrome Web Store turned out to be a\nsatisfying little exercise.</p>\n<p>If you've got tabs piling up and don't want to hand your reading list to a\nstartup, it's\n<a href=\"https://chromewebstore.google.com/detail/read-later/oedgonnjlnokhocngfjmflchbjdmgmag\">there</a>.\nSource on <a href=\"https://github.com/jonnonz1/read-later\">GitHub</a>, MIT licensed. Pin\nthe leaf icon and have a go.</p>\n","date_published":"Tue, 05 May 2026 12:00:00 GMT","image":"https://jonno.nz/og/built-a-read-later-chrome-extension-because-pocket-died.png"},{"id":"https://jonno.nz/posts/product-market-fit-is-a-gauntlet/","url":"https://jonno.nz/posts/product-market-fit-is-a-gauntlet/","title":"Product market fit isn't a stage, it's a gauntlet","content_html":"<p>Product market fit gets sold as a milestone. Find it and you're off to the\nraces. That's the bit nobody who's been through it actually believes.</p>\n<p>PMF is a gauntlet. It eats teams, it bends founders, and it quietly poisons the\ntechnical decisions you're proud of at the time. Most of the damage I've watched\ndone to good companies wasn't from missing PMF. 
It was from how they behaved\nwhile they were looking for it.</p>\n<p><img src=\"https://jonno.nz/img/posts/pmf-gauntlet/gauntlet-loop.svg\" alt=\"The PMF gauntlet loop — Vision feeding Hypothesis, Ship, Market signal, and Adapt, with three drag forces (rigid roadmap, comprehension debt, over-scaled architecture) pulling on the loop.\"></p>\n<h2>It's not for everyone, and that's fine</h2>\n<p>There's a particular kind of person who does well in the pre-PMF phase. High\ntolerance for ambiguity, low need for closure. The deck never feels finished,\nthe metric you're chasing changes every six weeks, and the answer to &quot;what are\nwe doing in three months&quot; is &quot;depends.&quot;</p>\n<p>Plenty of really good operators just cannot function in that environment. That's\nnot a character flaw — it's a stage mismatch. Some people thrive at zero to one.\nSome thrive at one to ten. Almost no one thrives at both, and the industry\npretending otherwise has cost a lot of careers and a lot of sanity.</p>\n<p>Naming this honestly so people can self-select is one of the kindest things a\nfounder can do. You're not letting someone down by saying &quot;this stage probably\nisn't for you, but the next one will be.&quot; You're saving them eighteen months of\nfeeling broken.</p>\n<h2>The variables you don't control will eat you alive</h2>\n<p>Timing, market, economy, what your one big regulator decides on a Tuesday — none\nof that is yours. You can have a sharp thesis and a great team and ship\nsomething nobody buys, because something three layers above you shifted while\nyou were heads-down.</p>\n<p>The only protection I've found against this is a vision the team is genuinely\nbought into. Not the slide. Not the wall poster. The actual reason you all got\nout of bed this morning. 
When the macro turns and the metric you were proud of\nlast quarter goes sideways, that vision is what stops the org from devouring\nitself.</p>\n<p>I've watched startups where the thesis was right but the timing was a year early\nlose half their team in three months because nobody could explain why they were\nstill doing what they were doing. It wasn't a strategy problem. It was an\nalignment problem dressed up as a strategy problem.</p>\n<p>This is also where founders take the most damage personally. You can do\neverything well and still get hit by something nobody could have predicted. If\nyour sense of self is tied to PMF being a verdict on you, that breaks people.\nThe ones I've seen come through it healthy treated PMF like weather they were\nnavigating, not a test they were passing.</p>\n<h2>Agility is the actual moat at this stage</h2>\n<p>Your moat isn't the product. It isn't the tech. It definitely isn't the brand.\nYour moat is how fast the org can spot a shift in TAM or target market and\ntranslate it into a product move.</p>\n<p>Days, ideally. Weeks if you have to. Not quarters.</p>\n<p>This is where the technical decisions made in the name of &quot;scaling&quot; quietly\ncripple you. The microservices you split out before you needed to. The custom\ninfrastructure someone stood up because their last job had it. The platform\nabstractions that mean a small UI change touches four repos. Each of those felt\ndisciplined at the time. Each is now a tax on the only thing you actually have —\nspeed.</p>\n<p>Andreessen wrote the\n<a href=\"https://pmarchive.com/guide_to_startups_part4.html\">original PMF essay</a> almost\ntwenty years ago, and the line that's aged best is the bit about doing whatever\nit takes — changing people, rewriting the product, moving markets. That's not a\nlicense to be chaotic. It's a reminder that the org needs to be physically\ncapable of those moves. 
If your architecture, process, or contracts make\nrewriting the product a six-month project, you've already lost the gauntlet\nwhether you know it yet or not.</p>\n<p>I've got a strong opinion on this one: when in doubt, build it boring. Boring is\nfast to change.</p>\n<h2>The first cohort is a dance, and you have to lead</h2>\n<p>The customers who signed up first kept you alive. They also signed up for a\nslightly different company than the one you're now trying to become. That gap is\nwhere a lot of startups quietly die.</p>\n<p>Keep them too happy and you slow your evolution. Push too hard toward the new\nvision and you churn the cohort that's funding your runway. The actual job is to\ndo both at once, which is why I sometimes call it internal schizophrenia. You're\na different company to them than you are to yourselves, and that's not a bug —\nthat's the mode you're operating in.</p>\n<p>The skill is being honest with the early cohort about where you're going without\nselling them something they didn't buy. The art is using their feedback to\nsharpen the bigger vision rather than letting yourself be pulled back into being\ntheir bespoke vendor. The dance is doing both of those without your team\nthinking you've gone off-piste, because the gap between &quot;what we're shipping\ntoday&quot; and &quot;where we're going&quot; looks weird from the inside.</p>\n<h2>Where PMF teams quietly self-sabotage</h2>\n<p>Three patterns I keep seeing.</p>\n<p><img src=\"https://jonno.nz/img/posts/pmf-gauntlet/discipline-vs-fragility.svg\" alt=\"Discipline vs fragility — three patterns where what looks like discipline (microservices on day one, rigid quarterly planning, founder comprehension debt) becomes fragility (can't pivot, defending old assumptions, context never reaches the team).\"></p>\n<p>Engineering over-scales the architecture. The team builds for the company they\nwant to be in two years instead of the company they need to be this quarter. 
By\nthe time PMF actually shows up, the org can't move. Worse, the engineers feel\nbusy and capable the whole time it's happening, which is why it's so hard to\nstop. Nobody is asking to slow down — everyone is shipping.</p>\n<p>Product holds the roadmap too tightly. The roadmap <em>is</em> the experiment at this\nstage. Treating it like a commitment is a category error. The product teams I've\nseen do this well treat the roadmap like a hypothesis with version numbers —\nlast month's was wrong, this month's is less wrong, and that's how it's supposed\nto feel. The ones who don't end up defending decisions they made when they knew\nless.</p>\n<p>Founder comprehension debt builds up faster than anyone notices. The founder is\nheads-down on signal — every customer call, every dropped deal, every weird\npattern in the data lands in their head and gets metabolised on the spot. The\nteam is two beats behind, working from last week's mental model. Each individual\ndelay feels minor. The cumulative gap is the thing that kills decisions.</p>\n<p>Each of these looks like discipline from the inside. Each of these is fragility\nwearing discipline's clothes.</p>\n<h2>AI changes the moat conversation, not the gauntlet</h2>\n<p>Moats in the AI space are shifting quarter by quarter right now. Feature moats\nhave basically collapsed — anything you can describe in a screenshot can be\ncloned in a weekend with the current generation of tools. What's\n<a href=\"https://www.latitudemedia.com/news/in-the-age-of-ai-can-startups-still-build-a-moat/\">actually defensible has moved</a>\ntoward proprietary data, deeply embedded workflows, distribution, trust, and\nregulatory positioning.</p>\n<p>For a founder in the PMF gauntlet that means the playbook is unreliable in a way\nit wasn't five years ago. You can't just lift what worked for the last cohort of\nSaaS winners and run it. 
You have to reason from first principles about where\nyour actual edge is going to come from over the next eighteen months, and place\nchips accordingly.</p>\n<p>The gauntlet itself hasn't changed. The chips you're placing have. That's harder\nthan it sounds, because most of us were trained in an era when the moat\nconversation was settled.</p>\n<h2>The version of PMF nobody puts on a slide</h2>\n<p>The version of PMF that gets written up in case studies is always cleaner than\nthe thing itself. There's a chart, a moment, a story arc. From the inside it\nnever looks like that.</p>\n<p>It looks like a year of conflicting signals. Two customers who love it, four who\nchurned, one who would buy if you built a feature you're not sure fits the\nvision. A board update where you're trying to make a noisy graph sound coherent.\nAn engineer asking a perfectly reasonable question about the roadmap that you\ncan't answer without lying a little.</p>\n<p>The companies that come through it don't seem, in retrospect, to have had better\ninformation than the ones that didn't. They had roughly the same fog. They just\nkept reasoning out loud, kept revising in public, and didn't pretend the chart\nwas already drawn.</p>\n<p>The case studies leave the fog out. That's most of it.</p>\n","date_published":"Fri, 01 May 2026 00:00:00 GMT","image":"https://jonno.nz/og/product-market-fit-is-a-gauntlet.png"},{"id":"https://jonno.nz/posts/change-management/","url":"https://jonno.nz/posts/change-management/","title":"Change management","content_html":"<p>There's a whole business discipline called change management. Frameworks,\ncertifications, consultancies, the lot. Every big company has someone running it\nduring a restructure or a tech migration. 
Nobody runs it for you when your life\nturns over.</p>\n<p>Which is strange, because the personal version is the harder problem — and right\nnow, more people are facing it than at any point in recent memory.</p>\n<p>More than\n<a href=\"https://www.cnbc.com/2026/04/24/20k-job-cuts-at-meta-microsoft-raise-concern-of-ai-labor-crisis-.html\">92,000 tech workers have been laid off in 2026 alone</a>,\nbringing the total close to 900,000 since 2020. Meta cut 8,000 jobs last week.\nMicrosoft offered buyouts to 7% of its US workforce — the first time in its\n51-year history. Oracle has started cuts that could reach 30,000 by year end.\nCloser to home, Xero, Sharesies, Spark, One NZ and Eroad have all run their own\nrounds. AI is the headline reason, but the impact lands the same regardless of\nthe cause: hundreds of thousands of people closing a laptop and discovering\ntheir working identity has just been deleted.</p>\n<p>That's a lot of people being handed a forced version of personal change\nmanagement without ever signing up for the course.</p>\n<p>The business framing has the right insight buried in it.\n<a href=\"https://wmbridges.com/about/what-is-transition/\">William Bridges</a> made a\ndistinction in the 90s that most people miss: change is external, transition is\ninternal. Change is the new org chart, the redundancy email, the merger.\nTransition is what happens inside people's heads while all that is going on.\nChange can happen overnight. Transition takes as long as it takes.</p>\n<p>Personal change management is just transition without the org chart.</p>\n<p>I've been through a few years of it now. Not one big event — more like a slow\nstack of endings, some chosen, some not. Companies, relationships, versions of\nmyself I'd been building for a decade. The kind of stretch where you don't\nreally notice you're changing until you look up one day and the old you is gone.</p>\n<p>That's the part nobody warns you about. Real change isn't transformation. 
It's a\ncontrolled demolition followed by a slow rebuild, with a long, weird middle bit\nwhere neither the old you nor the new you is really there.</p>\n<h2>Something has to die</h2>\n<p>The thing that goes is usually the organising self. Whatever the old you was\narranged around — a fear, a need for approval, a story about who you had to be,\nan ambition that was really a wound. When that goes, the structure it was\nholding up collapses. That's the death. It's real.</p>\n<p>What survives is everything that wasn't load-bearing on the old arrangement.\nYour humour, your curiosity, the way you actually see people, the things you\ngenuinely care about. Those don't die because they weren't propping anything up.\nThey were just you, underneath.</p>\n<p>The disorienting part is feeling like a stranger to yourself and entirely\ncontinuous, at the same time. Both are true. The continuous parts are\ncontinuous. The organising self is gone. You're in between.</p>\n<h2>The middle is the work</h2>\n<p>Bridges calls this the neutral zone. The old reality has gone, the new one isn't\nthere yet. He says it's the hardest phase to manage, and most organisations rush\nthrough it because it looks unproductive. People do the same thing to\nthemselves.</p>\n<p>The temptation is to build a new identity fast, because the empty space is\nuncomfortable. Don't. Whatever you grab in a hurry will be made of whatever was\nlying around — which usually means the old patterns sneak back in wearing new\nclothes. Workaholism becomes &quot;building my legacy&quot;. Approval-seeking becomes\n&quot;being of service&quot;. Avoidance becomes &quot;protecting my peace&quot;. Same machine, new\npaint.</p>\n<p>The test is always: is this coming from fear or from truth? You'll know. The\nbody knows before the mind does. Pay attention to the part of you that goes\nquiet around certain people, certain projects, certain decisions. 
That's the\nsignal.</p>\n<h2>Fearlessness is a side effect</h2>\n<p>You don't get to fearless by trying. You get there by going through enough\nendings that the bluff stops working.</p>\n<p>Fear runs on a specific con: <em>if this thing happens, you won't survive it</em>. Not\nliterally die — but the you that exists now won't continue. You'll be broken,\nfinished, unrecognisable. The con works as long as it's untested. Then the thing\nhappens, and you go through it, and on the other side you notice you're still\nhere. Different, scarred, but continuous. The fear was lying about its hand.</p>\n<p>After that, fear can still show up — it doesn't leave — but it can't run the\nsame con. You've seen the card it was holding. Next time it says <em>you won't\nsurvive this</em>, some quiet part of you knows: I already did.</p>\n<p>That's not the absence of fear. It's knowing you can act from what's true even\nwith the fear in the room.</p>\n<h2>What's on the other side is ordinary</h2>\n<p>Here's the bit that surprised me. Once the demolition is done and the rebuild\nstarts, what comes back isn't impressive. It's just real. Less reactive. Less\nnoise. Less performance. You stop needing to be seen a particular way, partly\nbecause you've watched a few of those selves die and you don't trust the next\none enough to stake everything on it.</p>\n<p>The goal of change management — the personal kind — isn't to become someone\nadmirable. It's to become someone who's the same alone as in public. Someone who\ndoes the next true thing without announcing it. Most of the depth of this stuff\nlives in the texture of regular days. How you handle a boring Tuesday. Whether\nyou rest when you're tired or push through to prove something to nobody.</p>\n<p>Business change management has all this in it, and most people read it as a\nproject manager's manual. It's also a personal one. Endings, neutral zone, new\nbeginnings. 
Same shape, different blast radius.</p>\n<p>The seeds that grow through the demolition are the ones worth tending. The rest\nsorts itself out.</p>\n","date_published":"Sat, 25 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/change-management.png"},{"id":"https://jonno.nz/posts/three-ways-to-look-at-time/","url":"https://jonno.nz/posts/three-ways-to-look-at-time/","title":"Three Ways to Look at Time","content_html":"<p>ST-ResNet's core insight is that not all history is created equal.</p>\n<p>When you're predicting crime in Auckland next month, three different kinds of\npast information matter. What happened in the last couple of months: the recent\ntrend. What happened at the same time last year: the seasonal pattern. And\nwhat's been happening over the longer term: whether crime is generally rising or\nfalling in an area.</p>\n<p>ConvLSTM treats all of this as one continuous sequence and hopes the network\nfigures out which parts matter. <a href=\"https://arxiv.org/abs/1610.00081\">ST-ResNet</a>\ntakes a more opinionated approach. It separates these three temporal scales\nexplicitly and gives each one its own dedicated neural network branch.</p>\n<p>The original paper by Zhang et al. was about predicting crowd flows in Beijing.\nPeople move through cities in patterns that look a lot like crime patterns:\ndaily rhythms, weekly cycles, long-term trends. The architecture\n<a href=\"https://www.nature.com/articles/s41598-025-24559-7\">translates well to crime data</a>,\nwith some modifications.</p>\n<h2>Closeness, period, trend</h2>\n<p>The three branches each look at different slices of history:</p>\n<p><strong>Closeness</strong> captures what's been happening recently. For our monthly data,\nthis means the last 3 months. If South Auckland has been trending upward over\nthe last quarter, the closeness branch sees that momentum.</p>\n<p><strong>Period</strong> captures seasonal patterns. It looks at the same month in previous\nyears. 
So to predict January 2026, it pulls in January 2025 and January 2024.\nThe assumption is that crime has an annual rhythm, and the same month tends to\nlook similar year to year.</p>\n<p><strong>Trend</strong> captures longer-term shifts. It uses quarterly averages from further\nback: broad strokes of whether an area is seeing more or less crime over time.\nThis is the slowest-moving signal.</p>\n<p>Each branch independently processes its temporal slice through a stack of\nresidual convolutional blocks, then a learned fusion layer combines the three\noutputs:</p>\n<pre><code>prediction = W_c · closeness + W_p · period + W_t · trend + bias\n</code></pre>\n<p>Where <code>W_c</code>, <code>W_p</code>, and <code>W_t</code> are learned weights that vary by grid cell. This\nis a nice touch. It means the model can decide that the CBD's crime is mostly\ndriven by recent trends (closeness), while a residential suburb might be more\nseasonal (period). Different areas get different temporal recipes.</p>\n<h2>Residual blocks</h2>\n<p>Each branch uses residual convolutional units, the building blocks that made\n<a href=\"https://arxiv.org/abs/1512.03385\">ResNet</a> so successful in image recognition.</p>\n<p>The key idea: instead of learning the full output at each layer, the network\nlearns the <em>residual</em>, the difference between input and output. The identity\nshortcut connection means gradients flow cleanly through the network during\ntraining, which lets you stack more layers without the signal degrading.</p>\n<pre><code>ResUnit(X) = ReLU(Conv(ReLU(Conv(X))) + X)\n</code></pre>\n<p>That <code>+ X</code> at the end is the skip connection. If the layer has nothing useful to\nadd, it can learn weights near zero and just pass the input through. This makes\ndeeper networks stable, which matters when you're trying to learn spatial\nfeatures at multiple scales.</p>\n<p>For our grid, I use 4 residual units per branch. 
Each unit has two 3×3\nconvolutional layers with 32 filters. That's deep enough to capture spatial\nrelationships across several kilometres without being so deep that the model\noverfits on 36 months of training data.</p>\n<h2>The NZ-specific problem</h2>\n<p>Here's where theory meets reality, and it gets a bit awkward.</p>\n<p>ST-ResNet was designed for dense, high-frequency data. The Beijing crowd flow\npaper used 30-minute intervals over months of data: thousands of timesteps. The\ncrime papers that report strong results typically use daily data over several\nyears.</p>\n<p>We have 48 monthly timesteps. Total. The period branch (which looks at the same\nmonth in previous years) has at most 3 data points per month (2022, 2023, 2024\nto predict 2025/2026). The trend branch is working with quarterly averages from\na four-year window. It's not a lot of temporal data for an architecture that's\nspecifically designed to decompose temporal patterns.</p>\n<p>I had a feeling this would be the bottleneck, and it was.</p>\n<h2>Implementation</h2>\n<pre><code>Closeness branch:\n  Input: last 3 months (3 × 6 channels = 18 input channels)\n  → 4 ResUnits (32 filters, 3×3 kernels)\n  → Output: 32 channels\n\nPeriod branch:\n  Input: same month from 2 prior years (2 × 6 = 12 input channels)\n  → 4 ResUnits (32 filters, 3×3 kernels)\n  → Output: 32 channels\n\nTrend branch:\n  Input: 2 quarterly averages (2 × 6 = 12 input channels)\n  → 4 ResUnits (32 filters, 3×3 kernels)\n  → Output: 32 channels\n\nFusion:\n  → Learned weighted sum across branches\n  → Conv2d(32, 6, 1×1) → 6 crime type predictions\n</code></pre>\n<p>Total parameters: roughly 180k. Slightly smaller than the ConvLSTM, which is\nfine. ST-ResNet's power is supposed to come from the temporal decomposition, not\nfrom model size.</p>\n<p>Training uses the same setup as ConvLSTM: Adam optimiser, learning rate 1e-4,\nMSE loss on <code>log1p</code>-transformed values, early stopping with patience of 15\nepochs. 
On CPU, each run takes about 35 minutes, a bit faster than ConvLSTM\nsince there's no sequential recurrence to deal with.</p>\n<h2>Results</h2>\n<table>\n<thead>\n<tr>\n<th>Crime Type</th>\n<th>Hist. Avg MAE</th>\n<th>ConvLSTM MAE</th>\n<th>ST-ResNet MAE</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Theft</td>\n<td>1.28</td>\n<td>1.14</td>\n<td>1.18</td>\n</tr>\n<tr>\n<td>Burglary</td>\n<td>0.35</td>\n<td>0.32</td>\n<td>0.33</td>\n</tr>\n<tr>\n<td>Assault</td>\n<td>0.20</td>\n<td>0.19</td>\n<td>0.19</td>\n</tr>\n<tr>\n<td>Robbery</td>\n<td>0.04</td>\n<td>0.04</td>\n<td>0.04</td>\n</tr>\n<tr>\n<td>Sexual</td>\n<td>0.03</td>\n<td>0.03</td>\n<td>0.03</td>\n</tr>\n<tr>\n<td>Harm</td>\n<td>0.01</td>\n<td>0.01</td>\n<td>0.01</td>\n</tr>\n<tr>\n<td><strong>All types</strong></td>\n<td><strong>0.39</strong></td>\n<td><strong>0.35</strong></td>\n<td><strong>0.36</strong></td>\n</tr>\n</tbody>\n</table>\n<p>ST-ResNet beats the historical average but doesn't quite match ConvLSTM. The\naggregate MAE of 0.36 is a 7.7% improvement over the baseline, compared to\nConvLSTM's 10.3%.</p>\n<p>That's not a terrible result, but it's not what I was hoping for.</p>\n<h2>Why ConvLSTM wins here</h2>\n<p>When I dug into the learned fusion weights, the story became clear. The\ncloseness branch dominates. It gets 60–70% of the weight across most grid cells.\nThe period branch gets 20–25%, and the trend branch barely contributes at\n10–15%.</p>\n<p>The model is basically saying: &quot;Recent months matter most, seasonal patterns\nhelp a bit, and long-term trends are mostly noise.&quot; That's not a failure of the\narchitecture. It's a fair assessment of what's in the data.</p>\n<p>With only 2–3 examples of each calendar month, the period branch can't reliably\nlearn seasonal patterns. It's overfitting to individual years rather than\nextracting a stable seasonal signal. 
ConvLSTM handles this better because it\nprocesses the full sequence and implicitly learns seasonality from the\ncontinuous flow of months, without needing to explicitly align calendar periods.</p>\n<p>The trend branch suffers even more. Quarterly averages over a four-year window\ndon't give it much to work with. In the original crowd flow papers with years of\nhalf-hourly data, the trend branch captures genuine long-term shifts in\npopulation movement. Here, it's essentially learning a constant.</p>\n<h2>Where ST-ResNet does shine</h2>\n<p>Despite losing on aggregate, ST-ResNet has one clear advantage: it's better at\npredicting seasonal transitions.</p>\n<p>The months where crime shifts gears (the spring uptick in September/October and\nthe February dip) ST-ResNet handles more gracefully than ConvLSTM. The period\nbranch, sparse as its data is, does capture enough of the annual rhythm to\nanticipate these transitions a bit earlier.</p>\n<p>ConvLSTM tends to lag these transitions by about a month. It needs to &quot;see&quot; the\nuptick starting before it predicts continuation. ST-ResNet, by explicitly\nlooking at last year's same month, can anticipate the shift before it fully\nmaterialises in the recent sequence.</p>\n<p>For an operational forecasting tool, that one-month lead time on seasonal\ntransitions could be valuable. 
But in our test set metrics, it's a small\nadvantage that doesn't overcome ST-ResNet's overall weaker performance on\nmonth-to-month dynamics.</p>\n<h2>Head to head</h2>\n<table>\n<thead>\n<tr>\n<th>Metric</th>\n<th>Historical Avg</th>\n<th>ConvLSTM</th>\n<th>ST-ResNet</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Overall MAE</td>\n<td>0.39</td>\n<td>0.35</td>\n<td>0.36</td>\n</tr>\n<tr>\n<td>Theft MAE</td>\n<td>1.28</td>\n<td>1.14</td>\n<td>1.18</td>\n</tr>\n<tr>\n<td>Training time (CPU)</td>\n<td>N/A</td>\n<td>~40 min</td>\n<td>~35 min</td>\n</tr>\n<tr>\n<td>Parameters</td>\n<td>0</td>\n<td>~200k</td>\n<td>~180k</td>\n</tr>\n<tr>\n<td>Seasonal transitions</td>\n<td>Poor</td>\n<td>Lagging</td>\n<td>Better</td>\n</tr>\n<tr>\n<td>Spatial dynamics</td>\n<td>None</td>\n<td>Good</td>\n<td>Good</td>\n</tr>\n</tbody>\n</table>\n<p>ConvLSTM is the better model for this specific dataset. Not by a lot. We're\ntalking about small differences on already-small error values. But consistently\nbetter on the main crime types that have enough signal to matter.</p>\n<p>Neither model is a revelation. A 7–10% improvement over &quot;just use the historical\naverage&quot; is real but modest. Deep learning's strengths (learning complex\nnonlinear dynamics from huge datasets) are somewhat wasted on 48 monthly\ntimesteps over a relatively low-crime city.</p>\n<p>If I had daily data instead of monthly, or ten years instead of four, I'd expect\nST-ResNet to close the gap or pull ahead. Its architecture is fundamentally\nsound. The temporal decomposition is a genuinely good idea. It's just starved of\nthe data it needs to shine.</p>\n<p>Both models meaningfully beat the baselines. Both learn spatial patterns that\nsimple averages can't capture. And both are honest about the sparse crime types:\nthey predict near-zero and move on, which is the right call.</p>\n<p>Next up: we'll take these predictions and build something you can actually look\nat. 
A 3D interactive dashboard where you can watch crime patterns evolve across\nAuckland over time. The modelling was the hard bit. Making it visual is the fun\nbit.</p>\n","date_published":"Thu, 23 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/three-ways-to-look-at-time.png"},{"id":"https://jonno.nz/posts/what-an-hour-of-your-attention-is-worth/","url":"https://jonno.nz/posts/what-an-hour-of-your-attention-is-worth/","title":"What an hour of your attention is worth","content_html":"<p>I stood up a working social network for eight mates last weekend. Profile pages,\na shared feed, a photo wall, a jukebox bolted onto a spare domain. It took me a\nSaturday, about forty bucks in Claude credits, and exactly zero\nproduct-market-fit meetings.</p>\n<p>The same weekend, Meta earned about six bucks off me. Google made ten. LinkedIn,\nYouTube, TikTok, X — all quietly billing in the background, none of them sending\na receipt. If you add them all up for the average American, the annual total is\nnorth of $1,000. You just never see it, because no money changes hands and no\ninvoice arrives.</p>\n<p>The clever thing about &quot;free&quot; on the internet isn't that the trade doesn't\nexist. It's that it's been designed so you can't see it. No money moves. No\ninvoice lands. No app shows you the meter ticking as you scroll. The exchange is\nreal — your attention and your data in, Instagram and Google and LinkedIn out —\nbut by the time the numbers get tallied, they live in a quarterly earnings\nreport you'll never read. So the trade feels weightless.</p>\n<p>It isn't. You just can't see the price tag.</p>\n<p>The strange thing is the price tag has been public the whole time. Every\nplatform listed on a stock exchange tells you, four times a year, exactly what\nyou're worth to them. 
You've just never been shown how to read it — and until\nrecently, the only practical alternative to reading it was &quot;live in a cabin.&quot;\nThat part has changed, and it's the part almost nobody is talking about.</p>\n<p>The invisibility isn't an accident either. If Meta had to send you a cheque\nevery month for the money they made off you, you'd treat the relationship very\ndifferently. You'd notice when the amount went up. You'd notice that the\nteenager version of the payment looks nothing like the adult version. You'd\nwonder why the Auckland cheque was ten times the Jakarta one for the exact same\nhour of scrolling. The whole edifice of &quot;free&quot; rests on keeping the accounting\none-sided — they measure you in basis points to three decimal places, you\nexperience the trade as a vague sense of having lost your afternoon.</p>\n<h2>The price tag they're legally required to print</h2>\n<p>The number you want is called ARPU — average revenue per user. Every public\nplatform reports it, because investors demand it. The maths is blunt: take the\ncompany's annual revenue, divide by monthly active users. What comes out is what\nthe platform earns off the average human who shows up, per year.</p>\n<p>For Meta last year the global figure was about $52 per user. For YouTube's\nad-supported side, around $24. For\n<a href=\"https://www.linkedin.com/posts/dshapero_earnings-update-to-close-out-our-2025-fiscal-activity-7361399679256858624-vVg7\">LinkedIn it's $15 averaged across all 1.2B members</a>,\nbut much higher once you strip out the dormant accounts.</p>\n<p>These aren't guesses from a watchdog group. They're from the companies\nthemselves, in the part of the earnings release where the whole purpose is to\nconvince shareholders each user is worth more than last quarter. The incentive\nis to talk the number up, not down.</p>\n<p>Whatever ARPU says, the reality on the ground probably isn't lower. 
If anything,\nit's a floor.</p>\n<h2>Your annual bill, itemised</h2>\n<p>Rough figures, from the companies' own filings:</p>\n<ul>\n<li><strong>Meta</strong>: ~$52/yr global, ~$320 in the US</li>\n<li><strong>Google (all products)</strong>: ~$100/yr globally, ~$500 US —\n<a href=\"https://abc.xyz/investor/\">$400B in revenue</a> across ~4B users spanning\nSearch, Android, YouTube, Cloud and Workspace combined</li>\n<li><strong>YouTube ads alone</strong>: ~$24/yr global, ~$80 US</li>\n<li><strong>LinkedIn</strong>: $15/yr averaged across all 1.2B members, but ~$57/yr across the\n<a href=\"https://www.linkedin.com/posts/dshapero_earnings-update-to-close-out-our-2025-fiscal-activity-7361399679256858624-vVg7\">310M monthly active ones</a></li>\n<li><strong>TikTok</strong>: ~$16 global, ~$70 US — doubled in two years</li>\n<li><strong>Snapchat, Reddit, Pinterest, X</strong>: all in the $10–30/user/yr range</li>\n</ul>\n<p>The geographic skew is the part most people miss. Meta's figure in the US is\nroughly ten times what it is in Asia-Pacific. Europe sits in the middle at about\n$92. Same product, same features, same algorithm — different rate card, because\nad buyers pay more to reach wealthier audiences. You are literally worth more in\nAuckland than you are in Jakarta, and your feed is tuned accordingly.</p>\n<p><img src=\"https://jonno.nz/img/posts/arpu-meta-by-region.svg\" alt=\"Meta ARPU by region — US $320, Europe $92, global $52, Asia-Pacific $32\"></p>\n<p>The same skew shows up across every ad-funded platform. The US rate card is the\none the rest of the world gets compared to:</p>\n<p><img src=\"https://jonno.nz/img/posts/arpu-us-vs-global.svg\" alt=\"US vs global ARPU comparison — Google $500/$100, Meta $320/$52, YouTube $80/$24, TikTok $70/$16, LinkedIn $57/$15\"></p>\n<p>Marketplaces don't fit ARPU cleanly, but the extraction is still there if you\nlook for it. Uber and Lyft take around 20% of each fare. 
Airbnb combines host\nand guest fees for about 14–16%. DoorDash and Uber Eats take closer to 25%.\nShopify's card take is 2.9% plus 30 cents per transaction. Different mechanism,\nsame game — a percentage of every transaction, quietly skimmed, never itemised.</p>\n<h2>The meter, in dollars per hour</h2>\n<p>ARPU is annual. Attention isn't spent in years though — it's spent in hours, in\nthe little windows between other things. So the honest conversion is to divide.</p>\n<p>The average US Meta user burns about 200 hours a year across Facebook and\nInstagram. $320 ÷ 200 = roughly $1.60 per hour of your attention. YouTube works\nout to about $0.27/hour. TikTok $0.22. Snapchat cheaper still. Do the same sum\non global averages and Meta drops to around 26 cents an hour, YouTube to 8.</p>\n<p>Those rates are only what the platform <em>earns</em> this year, mind. They aren't what\nyour data is ultimately <em>worth</em>. Everything you click and hover and pause on\nfeeds ad targeting across the wider web, plus — now — AI training corpora. ARPU\nis the rent. The equity is bigger, and the equity compounds.</p>\n<p>The AI-training bit is genuinely new and worth pausing on. For fifteen years the\ndata you generated on these platforms powered one thing: better ad targeting on\nthose same platforms. It was a closed loop. You scrolled, they learned, they\nsold the targeting back to advertisers, the advertisers bought your attention\nagain. Bounded. Weird, but bounded.</p>\n<p>That loop isn't bounded anymore. Your posts and comments and DMs are now\ntraining data for models that will be sold, resold, and embedded into every\npiece of software you touch for the next decade. The $320 Meta earned off you in\nthe US last year is a rounding error next to what the underlying corpus is worth\nto the next generation of AI products. ARPU doesn't capture any of that. 
It's\nliterally last quarter's ad rent, with none of the capital gains on the asset.</p>\n<p>Even the rent, laid out per hour, makes one thing obvious: you can see exactly\nwhy every platform is obsessed with &quot;time spent&quot; as a north-star metric. If one\nextra hour a week on Facebook is worth ~$83 a year per US user, multiplied\nacross three billion users, the maths for why the feed never stops scrolling is\nnot mysterious. The feed is a meter. Keeping it running is the business. Every\n&quot;new feature&quot; that shows up in your settings — reels, shorts, a nudge to open\nthe app on your commute — is a hand on that meter.</p>\n<p>Once you see it that way, a lot of product decisions stop looking like product\ndecisions.</p>\n<h2>Run your own numbers</h2>\n<p>The point of making the numbers this concrete is that you can plug in your own\nusage and see what you personally throw into the machine each year. Drag the\nsliders for how much time goes into each platform and watch the ledger tally up.\nRates are global averages.</p>\n<section class=\"ledger\" id=\"ledger\" aria-label=\"The Ledger calculator\"><style>\n.ledger{--lb:#141f2e;--lb2:#1a2637;--li:#e4e9ee;--ld:#bfc8d2;--lm:#6a7d92;--la:#d4a853;--lbd:rgba(255,255,255,.08);--lbs:rgba(255,255,255,.16);max-width:38rem;background:var(--lb);border:1px solid var(--lbs);border-radius:.35rem;padding:1.15rem 1.25rem 1rem;margin:2rem auto;color:var(--ld);font-family:text,'Roboto',-apple-system,sans-serif;font-size:.88rem;line-height:1.5;text-align:left}\n.ledger *,.ledger *::before,.ledger *::after{box-sizing:border-box}\n.ledger h3,.ledger h4{font-family:inherit;margin:0;padding:0;color:inherit;font-weight:inherit;font-size:inherit;letter-spacing:0;line-height:1.2}\n.ledger h3::before,.ledger h4::before{content:none}\n.ledger p{margin:0;padding:0}\n.ledger 
input{font:inherit;color:inherit;background:transparent;border:none;outline:none}\n.l-head{display:flex;justify-content:space-between;align-items:baseline;gap:.75rem;padding-bottom:.75rem;margin-bottom:.9rem;border-bottom:1px solid var(--lbs)}\n.l-title{font-family:serif,'Fraunces',Georgia,serif;font-weight:400;font-size:1.15rem;letter-spacing:-.015em;color:var(--li)}\n.l-tag{font-family:code,'JetBrains Mono',monospace;font-size:.58rem;letter-spacing:.16em;text-transform:uppercase;color:var(--lm)}\n.l-total{display:flex;align-items:baseline;justify-content:space-between;gap:1rem;padding:.85rem 1rem;background:#0e1623;border:1px solid var(--lbd);border-radius:.25rem;margin-bottom:1.1rem}\n.l-total-amt{font-family:serif,'Fraunces',Georgia,serif;font-weight:400;font-size:1.9rem;line-height:1;color:var(--la);letter-spacing:-.02em;font-variant-numeric:tabular-nums}\n.l-total-lab{font-family:code,'JetBrains Mono',monospace;font-size:.58rem;letter-spacing:.18em;text-transform:uppercase;color:var(--lm);text-align:right}\n.l-sec{margin-top:1.15rem}\n.l-sec:first-of-type{margin-top:0}\n.l-row{padding:.9rem 0;border-bottom:1px dashed var(--lbd)}\n.l-row:last-child{border-bottom:none}\n.l-row-top{display:flex;align-items:baseline;justify-content:space-between;gap:.75rem;margin-bottom:.55rem}\n.l-row-n{color:var(--li);font-size:.94rem;line-height:1.25}\n.l-row-meta{font-family:code,'JetBrains Mono',monospace;font-size:.6rem;letter-spacing:.05em;color:var(--lm);margin-top:.15rem;display:block}\n.l-row-a{font-family:serif,'Fraunces',Georgia,serif;font-size:1.05rem;color:var(--la);text-align:right;font-variant-numeric:tabular-nums;white-space:nowrap;flex-shrink:0}\n.l-row-a.z{color:var(--lm)}\n.l-row-c{display:flex;align-items:center;gap:.9rem}\n.l-row-v{font-family:code,'JetBrains 
Mono',monospace;font-size:.66rem;letter-spacing:.05em;color:var(--ld);white-space:nowrap;min-width:6rem;text-align:right}\n.l-sl{-webkit-appearance:none;appearance:none;flex:1;min-width:0;height:32px;background:transparent;cursor:pointer;padding:0;margin:0;touch-action:manipulation}\n.l-sl::-webkit-slider-runnable-track{height:3px;background:var(--lbd);border-radius:2px}\n.l-sl::-webkit-slider-thumb{-webkit-appearance:none;appearance:none;width:22px;height:22px;border-radius:50%;background:var(--la);border:3px solid var(--lb);margin-top:-10px;box-shadow:0 0 0 1px var(--la),0 2px 6px rgba(0,0,0,.3);cursor:grab}\n.l-sl:active::-webkit-slider-thumb{cursor:grabbing;box-shadow:0 0 0 1px var(--la),0 0 0 6px rgba(212,168,83,.22)}\n.l-sl::-moz-range-track{height:3px;background:var(--lbd);border-radius:2px}\n.l-sl::-moz-range-thumb{width:22px;height:22px;border-radius:50%;background:var(--la);border:3px solid var(--lb);box-shadow:0 0 0 1px var(--la)}\n.l-sl:focus::-webkit-slider-thumb{box-shadow:0 0 0 1px var(--la),0 0 0 6px rgba(212,168,83,.28)}\n.l-notes{margin-top:1.25rem;padding-top:.9rem;border-top:1px solid var(--lbs)}\n.l-notes h5{font-family:code,'JetBrains Mono',monospace;font-size:.56rem;letter-spacing:.18em;text-transform:uppercase;color:var(--lm);margin:0 0 .55rem;font-weight:400}\n.l-notes p{font-family:serif,'Fraunces',Georgia,serif;font-size:.85rem;line-height:1.55;color:var(--ld);margin:0 0 .4rem;text-wrap:pretty}\n.l-notes p:last-child{margin-bottom:0}\n.l-notes strong{color:var(--li);font-weight:500}\n@media (max-width:560px){\n.ledger{padding:1rem .9rem;margin:1.5rem auto;font-size:.92rem}\n.l-total{flex-direction:column;align-items:flex-start;gap:.25rem;padding:.75rem .9rem}\n.l-total-lab{text-align:left}\n.l-row{padding:1rem 0}\n.l-row-c{gap:.75rem}\n.l-row-v{min-width:5rem;font-size:.7rem}\n.l-sl::-webkit-slider-thumb{width:26px;height:26px;margin-top:-12px}\n.l-sl::-moz-range-thumb{width:26px;height:26px}\n}\n</style><div class=\"l-head\"><h3 
class=\"l-title\">The Ledger</h3><span class=\"l-tag\">global averages · per year</span></div><div class=\"l-total\"><div class=\"l-total-amt\" id=\"l-total\">$0</div><div class=\"l-total-lab\">extracted per year</div></div><div class=\"l-sec\"><div id=\"l-attn-rows\"></div></div><div class=\"l-notes\"><h5>Notes on the method</h5><p><strong>ARPU is rent, not equity.</strong> What a platform earns this year isn't what the underlying data is worth across the wider web and AI training corpora.</p><p><strong>Averages hide heavy users.</strong> Freemium smears free and paying users into one figure. If you're all-in, you're worth more than average.</p><p><strong>Multi-product companies cheat the top line.</strong> Google's per-user number isn't all Search — it's Search plus Android plus YouTube plus Cloud.</p></div><script>(function(){var R={meta:.26,youtube:.08,tiktok:.05,x:.07,reddit:.12,snap:.07,pin:.15,li:.15,gq:.04};\nvar ATTN=[{id:'meta',name:'Meta (FB / IG / WhatsApp)',rate:'meta',unit:'hrs/day',mult:365,max:6,step:.25},{id:'youtube',name:'YouTube (ad-supported)',rate:'youtube',unit:'hrs/day',mult:365,max:6,step:.25},{id:'tiktok',name:'TikTok',rate:'tiktok',unit:'hrs/day',mult:365,max:6,step:.25},{id:'x',name:'X (Twitter)',rate:'x',unit:'hrs/day',mult:365,max:4,step:.25},{id:'reddit',name:'Reddit',rate:'reddit',unit:'hrs/day',mult:365,max:4,step:.25},{id:'snap',name:'Snapchat',rate:'snap',unit:'hrs/day',mult:365,max:4,step:.25},{id:'pin',name:'Pinterest',rate:'pin',unit:'hrs/day',mult:365,max:4,step:.25},{id:'li',name:'LinkedIn (free)',rate:'li',unit:'hrs/day',mult:365,max:2,step:.1},{id:'gq',name:'Google Search',rate:'gq',unit:'searches/day',mult:365,max:100,step:1}];\nvar state={attn:{}};\nfunction fmt(n){n=Math.round(n);if(n===0)return'$0';if(n>=1000)return'$'+n.toLocaleString();return'$'+n;}\nfunction recalc(){var tot=0;\nATTN.forEach(function(s){var v=state.attn[s.id]||0;var amt=v*R[s.rate]*s.mult;tot+=amt;var 
el=document.getElementById('l-a-'+s.id);if(el){el.textContent=fmt(amt);el.classList.toggle('z',amt<1);}var vl=document.getElementById('l-v-'+s.id);if(vl)vl.textContent=v+' '+s.unit;});\ndocument.getElementById('l-total').textContent=fmt(tot);}\nfunction renderAttn(){document.getElementById('l-attn-rows').innerHTML=ATTN.map(function(s){var meta='$'+R[s.rate].toFixed(2)+(s.unit==='searches/day'?' / search':' / hr');return '<div class=\"l-row\"><div class=\"l-row-top\"><div><span class=\"l-row-n\">'+s.name+'</span><span class=\"l-row-meta\">'+meta+'</span></div><span class=\"l-row-a z\" id=\"l-a-'+s.id+'\">$0</span></div><div class=\"l-row-c\"><input type=\"range\" class=\"l-sl\" data-cat=\"attn\" data-svc=\"'+s.id+'\" min=\"0\" max=\"'+s.max+'\" step=\"'+s.step+'\" value=\"0\" aria-label=\"'+s.name+' '+s.unit+'\"><span class=\"l-row-v\" id=\"l-v-'+s.id+'\">0 '+s.unit+'</span></div></div>';}).join('');}\nfunction renderAll(){renderAttn();recalc();}\nvar root=document.getElementById('ledger');\nroot.addEventListener('input',function(e){var t=e.target;if(t.classList.contains('l-sl')){state[t.dataset.cat][t.dataset.svc]=parseFloat(t.value)||0;recalc();}});\nrenderAll();})();</script></section>\n<p>The rates come from the earnings-report maths above — global ARPU divided by\naverage annual hours on the platform.</p>\n<h2>The weekend social network</h2>\n<p>Once the number has somewhere to sit, it's much harder to ignore.</p>\n<p>Most people look at a total over $1,000/yr and go quiet for a second. Not\nbecause any one platform is egregious — on a per-hour basis they really aren't —\nbut because the aggregate is real, and it's been invisible until now. That's the\nfirst useful thing the exercise does. It makes a choice possible.</p>\n<p>The obvious next move is to look at alternatives. Signal instead of WhatsApp.\nKagi or Brave Search instead of Google. Paid Spotify instead of ad-supported\nSpotify. Bluesky or Mastodon instead of X. Fastmail instead of Gmail. 
None are\nperfect, and some cost actual money — but once you can price what you're\ncurrently &quot;not paying&quot;, the paid alternative often looks less expensive than it\ndid five minutes ago. Fastmail at\n$5/month stops being a luxury when the honest comparison is &quot;$60/yr vs being the\nproduct for an ad network that paid $500 for me last year.&quot;</p>\n<p>That's the defensive move. It's the one everyone talks about, every time one of\nthese pieces gets written. You switch to the more honest vendor, you feel\nslightly better, and the fundamental shape of the market doesn't move.</p>\n<p>The more interesting move is what's happened on the <em>build</em> side, and it's the\npart almost nobody has internalised yet.</p>\n<p>Standing up a social app used to take a small team months. You needed a backend\nengineer, a frontend engineer, a designer, probably a DevOps person, and a spare\nthree months. That was the real moat — not the network effects, not the\nalgorithm, but the sheer human-hours required to put a working thing on the\ninternet. That's why the only viable answer for twenty years was to build\nsomething big enough to run ads against. Small social didn't exist because small\nsocial couldn't pay the salaries.</p>\n<p>With Claude Code, Cursor, v0, and Lovable, that equation has quietly inverted. A\nprofile page, a shared feed, a wall for photos, maybe a jukebox, a chat wall — a\nMySpace-sized thing for you and a dozen friends, on a domain you own, with none\nof it feeding anyone's ad platform — is a weekend. I know because I just did it.\nNot as some Silicon Valley startup trying to replace Facebook. As a Saturday\nproject for eight mates, on a domain that cost twelve bucks, running on a box\nthat costs ten a month.</p>\n<p>The bill of materials is embarrassingly short. A boring Postgres. A boring\nNext.js app. Auth via magic link. Storage for photos. An LLM for the fiddly bits\nnobody wants to write from scratch. 
All of it plumbed together in an afternoon\nof prompting, an evening of cleanup, and a Sunday of adding the jukebox because\nmy mate Hamish wouldn't stop asking.</p>\n<p>It is not good software. It is good <em>enough</em> software for eight humans who know\neach other.</p>\n<p>That qualifier is the whole thing. Facebook has to be good software at planet\nscale because Facebook is selling ad impressions at planet scale. A group of\neight doesn't need p99 latency and a content moderation policy. A group of eight\nneeds a place to put photos from the weekend where the photos don't end up\ntraining someone's image model in twelve months' time. Those are very different\nengineering problems, and the second one is much, much easier than the first.</p>\n<p>A lot of things genuinely don't work on the weekend version. There's no\nrecommendation algorithm. There's no real search. The feed is\nreverse-chronological and that's it. When someone posts something at 3am nobody\nsees it until the morning. There's no cleverness about which photos get surfaced\nor which memories get resurrected. If you go on holiday for two weeks, you come\nback to a feed that's exactly what your eight mates posted, in the order they\nposted it.</p>\n<p>That sounds like a limitation until you notice the thing it is not doing is\noptimising for your engagement. Reverse-chronological across eight friends is\nnot a meter. It's a wall. You check it, you see what's there, you leave. There's\nno reason for the software to try to keep you around because there's nobody\npaying the software to keep you around. That inversion — from meter to wall — is\nthe entire point.</p>\n<p>The thing that would have been a VC round in 2015 is now a side quest you finish\nbefore the roast is in the oven. The tools genuinely got that much better in the\nlast eighteen months. 
We just haven't updated our intuitions yet about what that\nmeans.</p>\n<p>What it means, specifically, is that the ad-supported social network is no\nlonger the only technically viable answer. For twenty years it was. That was the\nconstraint the whole &quot;free web&quot; was built around. The constraint is gone, and\nnobody has sent the memo.</p>\n<p>The cheapest social network in 2026 is the one you and six mates build on a\nSaturday afternoon. It doesn't scale. It doesn't need to. It costs less than a\nmonth of Netflix, produces no ad revenue for anyone, and feeds no one's training\nset. You own the domain. You own the data. You own the product decisions — which\nin practice means there are no product decisions, because nobody is trying to\nsqueeze another hour out of anyone's week.</p>\n<p>None of this replaces the platforms, to be clear. You still need Gmail for the\nrecruiter, LinkedIn for the job hunt, YouTube for the tutorial, WhatsApp for the\ngroup chat your family refuses to leave. The ad-supported internet isn't going\nanywhere and I'm not pretending it is. What's changed is that it's no longer the\nonly game in town. For the circle of people you actually care about — the eight\nmates, the cousins, the old uni flat — you don't have to hand them over to the\nad machine anymore. You can build them a room of their own, and the tools to\nbuild that room have become trivial in a way we haven't fully absorbed yet.</p>\n<p>The meter's been running your whole life. 
You just got the tools to turn it off.</p>\n","date_published":"Tue, 21 Apr 2026 12:00:00 GMT","image":"https://jonno.nz/og/what-an-hour-of-your-attention-is-worth.png"},{"id":"https://jonno.nz/posts/teaching-a-neural-network-to-watch-crime-like-video/","url":"https://jonno.nz/posts/teaching-a-neural-network-to-watch-crime-like-video/","title":"Teaching a Neural Network to Watch Crime Like Video","content_html":"<p>ConvLSTM was invented to predict rainstorms.</p>\n<p>Specifically,\n<a href=\"https://arxiv.org/abs/1506.04214\">Shi et al. at the Hong Kong Observatory</a>\nneeded to forecast radar echo maps: 2D grids of rainfall intensity that evolve\nover time. They had sequences of spatial images and wanted to predict the next\nframes. Sound familiar?</p>\n<p>That's exactly what we built in Part 3. Crime on a 500m grid, one frame per\nmonth, six channels for crime types. The Auckland crime tensor is structurally\nidentical to a weather radar sequence. Same dimensionality, same prediction\ntask, just a very different domain.</p>\n<h2>Why not regular LSTM?</h2>\n<p>Standard LSTM networks are fantastic at learning sequences. They're the backbone\nof a lot of time-series forecasting. But they have a fundamental problem with\nspatial data: they need flat vectors as input.</p>\n<p>To feed our 77×59 grid into a regular LSTM, we'd have to flatten it into a\nvector of 4,543 values per crime type. That's 27,258 values per timestep across\nall six channels. The network would process this as a sequence of big flat\nvectors, with no concept that cell (10, 5) is <em>next to</em> cell (10, 6).</p>\n<p>All the spatial structure (the fact that crime clusters, that hotspots have\nneighbourhoods, that the CBD is a contiguous area) gets thrown away. The model\nwould have to rediscover spatial relationships from scratch, purely from\ncorrelations in the flattened vector. 
With only 36 training months, that's not\nhappening.</p>\n<h2>The convolutional trick</h2>\n<p>ConvLSTM's insight is elegant. Take the standard LSTM equations (the input gate,\nforget gate, output gate, cell state update) and replace every matrix\nmultiplication with a convolution operation.</p>\n<p>In a regular LSTM:</p>\n<pre><code>input_gate = sigmoid(W_xi * x_t + W_hi * h_{t-1} + b_i)\n</code></pre>\n<p>In ConvLSTM:</p>\n<pre><code>input_gate = sigmoid(W_xi ∗ X_t + W_hi ∗ H_{t-1} + b_i)\n</code></pre>\n<p>That <code>∗</code> is a convolution instead of a matrix multiply. <code>X_t</code> is the full 2D\ngrid at time <code>t</code>, and <code>H_{t-1}</code> is the previous hidden state, also a 2D grid.\nThe convolution kernel slides across the spatial dimensions, so each cell's gate\nvalues depend on its local neighbourhood.</p>\n<p>This means the network naturally learns that a spike in cell (10, 5) might\naffect predictions for cell (10, 6). Spatial proximity is baked into the\narchitecture. It doesn't need to learn it from data.</p>\n<p>The kernel size controls how much spatial context each cell sees. A 3×3 kernel\nmeans each cell looks at its immediate 8 neighbours. Stack multiple ConvLSTM\nlayers and the effective receptive field grows. Deeper layers can capture\nrelationships between cells that are several kilometres apart.</p>\n<h2>Architecture choices</h2>\n<p>Here's what I settled on after a fair bit of experimentation (which on CPU means\n&quot;a lot of patient waiting&quot;):</p>\n<pre><code>Input: (batch, 6, 6, 77, 59), 6 months, 6 crime types, 77×59 grid\n  ↓\nConvLSTM2d(in=6, hidden=32, kernel=3×3, padding=1)\n  ↓\nBatchNorm2d\n  ↓\nConvLSTM2d(in=32, hidden=32, kernel=3×3, padding=1)\n  ↓\nBatchNorm2d\n  ↓\nConv2d(in=32, out=6, kernel=1×1), project to 6 crime type channels\n  ↓\nOutput: (batch, 6, 77, 59), next month prediction\n</code></pre>\n<p>Two ConvLSTM layers with 32 hidden channels each. 
The 3×3 kernel gives each cell\na neighbourhood view, and stacking two layers means the effective receptive\nfield covers about 1–1.5 km. Enough to capture the spatial extent of most crime\nhotspots.</p>\n<p>Why only 32 hidden channels? This is where the CPU constraint actually helps. A\nbigger model would be tempting with a GPU, but on a Ryzen 5 we need to keep it\ntight. 32 channels gives us about 200k trainable parameters: small enough to\ntrain in under an hour, large enough to learn meaningful spatial-temporal\npatterns.</p>\n<p>The 1×1 convolution at the end is a channel projection. It maps the 32 learned\nfeatures back to 6 crime type predictions.</p>\n<h2>Sequence length: six months</h2>\n<p>The lookback window is six months. The model sees January through June and\npredicts July. Then February through July to predict August. And so on.</p>\n<p>Six months captures one half of the seasonal cycle, which turned out to be the\nsweet spot. Shorter sequences (3 months) missed seasonal context. Longer\nsequences (12 months) didn't improve results, likely because the model doesn't\nhave enough data to learn year-long dependencies with only 36 training months\ntotal.</p>\n<p>The training set gives us 30 sequences (months 1–6 predict 7, months 2–7 predict\n8, all the way to months 30–35 predict 36). That's not a lot. Every sequence\ncounts.</p>\n<h2>Training details</h2>\n<pre><code class=\"language-python\">optimiser = 'Adam'\nlearning_rate = 1e-4\nloss = 'MSE'  # on log1p-transformed values\nbatch_size = 4  # small because sequences are large\nmax_epochs = 150  # with early stopping\npatience = 15  # early stopping patience\n</code></pre>\n<p>The <code>log1p</code> transformation from Part 3 is critical here. Raw crime counts range\nfrom 0 to 50+. After <code>log1p</code>, the range compresses to 0–4. Without this, the\nloss function would be dominated by the handful of high-count CBD cells, and the\nmodel would essentially ignore the rest of the grid.</p>\n<p>Training on CPU takes about 40 minutes per run. 
Not fast, but manageable. I\ncould typically fit in 3–4 experimental runs per evening, which meant progress\nwas slow but steady. Each run I'd tweak one thing (kernel size, hidden channels,\nlearning rate) and compare validation MAE.</p>\n<p>Early stopping triggers around epoch 80–100 in most runs. The model converges\nrelatively quickly, which makes sense given the small dataset and architecture.</p>\n<h2>Results</h2>\n<p>So how does ConvLSTM stack up against the baselines from Part 5?</p>\n<table>\n<thead>\n<tr>\n<th>Crime Type</th>\n<th>Hist. Avg MAE</th>\n<th>ConvLSTM MAE</th>\n<th>Improvement</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Theft</td>\n<td>1.28</td>\n<td>1.14</td>\n<td>10.9%</td>\n</tr>\n<tr>\n<td>Burglary</td>\n<td>0.35</td>\n<td>0.32</td>\n<td>8.6%</td>\n</tr>\n<tr>\n<td>Assault</td>\n<td>0.20</td>\n<td>0.19</td>\n<td>5.0%</td>\n</tr>\n<tr>\n<td>Robbery</td>\n<td>0.04</td>\n<td>0.04</td>\n<td>2.5%</td>\n</tr>\n<tr>\n<td>Sexual</td>\n<td>0.03</td>\n<td>0.03</td>\n<td>~0%</td>\n</tr>\n<tr>\n<td>Harm</td>\n<td>0.01</td>\n<td>0.01</td>\n<td>~0%</td>\n</tr>\n<tr>\n<td><strong>All types</strong></td>\n<td><strong>0.39</strong></td>\n<td><strong>0.35</strong></td>\n<td><strong>10.3%</strong></td>\n</tr>\n</tbody>\n</table>\n<p>A 10% improvement on the aggregate MAE. Not earth-shattering, but real.</p>\n<p>Theft gets the biggest lift because there's the most signal to work with. The\nmodel genuinely learns spatial dynamics that the historical average can't\ncapture. When a cluster of cells in South Auckland trends upward over several\nmonths, ConvLSTM picks up on that momentum and adjusts its predictions\naccordingly.</p>\n<p>Burglary sees a decent improvement too, likely driven by the spatial correlation\nwith theft that we spotted in the EDA.</p>\n<p>For the sparse crime types (robbery, sexual offences, harm) ConvLSTM basically\nlearns to predict near-zero, same as the baseline. 
There simply isn't enough\nsignal at 500m monthly resolution for these types. The model is honest about\nwhat it doesn't know, which I actually respect.</p>\n<h2>Where it shines and where it doesn't</h2>\n<p>The improvement isn't uniform across the grid. ConvLSTM does best in the\ntransition zones: cells on the edges of established hotspots where crime counts\nfluctuate month to month. It learns that these boundary cells tend to follow the\ntrend of their neighbours, which is exactly the kind of spatial-temporal pattern\nit was designed to capture.</p>\n<p>In the stable hotspot cores (the CBD, Manukau) the model performs about the same\nas the baseline. Those cells are consistently high, and the historical average\nalready captures that well.</p>\n<p>Where it properly struggles is with sudden spikes in normally quiet areas. A\ncell that's been near-zero for months and then gets 5 thefts in one month: the\nmodel doesn't see that coming. Neither does any other model, to be fair. Those\nevents are closer to random noise than learnable signal.</p>\n<h2>Putting it in perspective</h2>\n<p>A 10% MAE improvement is meaningful but modest.\n<a href=\"https://arxiv.org/pdf/2502.07465v1\">Recent ConvLSTM crime prediction papers</a>\nreport larger gains, but they typically work with much more data: years of daily\nrecords across cities with higher crime density. Our setup is tougher. Monthly\nresolution limits temporal signal, Auckland is relatively low-crime by global\nstandards, and we only have four years.</p>\n<p>The model is also running on CPU with a deliberately small architecture. A\nbigger model on a GPU might squeeze out more performance. But the point of this\nproject was always to see how far you can push it with modest resources, and a\n10% beat over simple baselines feels like a real result.</p>\n<p>The question now is whether ST-ResNet's different approach to temporal modelling\ncan do better. ConvLSTM processes time as one continuous sequence. 
ST-ResNet\nbreaks it into three separate temporal scales: closeness, period, and trend.\nWith a seasonal dataset like crime, that decomposition might be exactly what's\nneeded.</p>\n","date_published":"Thu, 16 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/teaching-a-neural-network-to-watch-crime-like-video.png"},{"id":"https://jonno.nz/posts/open-source-agent-that-teaches-claude-code-your-architecture/","url":"https://jonno.nz/posts/open-source-agent-that-teaches-claude-code-your-architecture/","title":"Open-Source Agent That Teaches Claude Code Your Architecture","content_html":"<p>AI has made building software cheap. A solo founder with Claude Code or Cursor\ncan ship an MVP in a weekend that would've taken a small team a month two years\nago. I've watched this happen across the NZ startup scene. Ideas that used to\ndie in the &quot;can we afford to build it&quot; phase now get built over a long weekend.</p>\n<p>This is mostly great. Velocity is what startups need. Cost of testing an idea is\nnow close to zero, and the business prioritises speed.</p>\n<p>The catch shows up when the idea works.</p>\n<p>AI builds for <em>right now</em>. It optimises for the current prompt, the current\nfile, the current feature. It doesn't think about what happens when your billing\nservice needs to handle 10x the volume, or when your email notifications need to\nmove from inline calls to a queue. It doesn't plan for the evolutionary pressure\nyour system will face once it has users.</p>\n<p>That's the gap I've been thinking about, and it's what led me to build\n<a href=\"https://github.com/jonnonz1/domain-agents\">domain-agents</a>.</p>\n<h2>Give the tools their credit</h2>\n<p>I want to be fair to the current generation of AI coding assistants. They're not\nstupid about finding code.</p>\n<p>Claude Code runs an agentic search loop (grep, glob, file reads) iterating\nthrough your codebase to find what's relevant. 
Boris Cherny (who created Claude\nCode) <a href=\"https://x.com/bcherny/status/2017824286489383315\">has said</a> they tried\nRAG with a local vector database early on and dropped it because agentic search\noutperformed it. Cursor takes a different approach: it\n<a href=\"https://read.engineerscodex.com/p/how-cursor-indexes-codebases-fast\">chunks your codebase, generates embeddings</a>,\nand stores them for semantic search so you can find code by concept rather than\nkeyword. Copilot combines semantic indexing with LSP-powered reference tracing\nfrom VS Code.</p>\n<p>The search works. If you ask Claude Code to find your billing service, it'll\nfind it. Ask Cursor for authentication logic and the embeddings will surface it\neven if the code never uses the word &quot;authentication.&quot;</p>\n<p>None of them understand the architecture those files live in.</p>\n<p>All the information needed to understand domain relationships sits in the code:\nimport graphs, interface signatures, dependency patterns. These tools don't\nextract or structure it that way. They find files one at a time. They don't map\nout that your billing service depends on the email service, that\n<code>BillingService</code> is consumed by two other domains, or that changing its\ninterface is a cross-domain event. The information is in the codebase. Nobody's\npulling it together.</p>\n<p>And every session starts from zero. The AI learned your architecture yesterday\nand forgot it today.</p>\n<h2>Evolutionary architecture for the AI era</h2>\n<p>My thesis: cheap AI-built MVPs plus expensive scaling problems point toward\nevolutionary architecture with domain-based boundaries.</p>\n<p>The idea isn't new. The reason it matters now is.</p>\n<p>In an evolutionary architecture, you focus on clean interfaces between business\ndomains. 
Your email service exposes a contract like\n<code>sendEmail(to, subject, body)</code>, and the rest of the system calls that interface.\nBehind the interface, the implementation evolves through stages as your scaling\nneeds change:</p>\n<pre><code class=\"language-mermaid\">graph LR\n    A[&quot;Inline\\n(direct call)&quot;] --&gt; B[&quot;Async\\n(fire &amp; forget)&quot;]\n    B --&gt; C[&quot;Queued\\n(BullMQ/SQS)&quot;]\n    C --&gt; D[&quot;Separate Service&quot;]\n    D --&gt; E[&quot;Distributed&quot;]\n</code></pre>\n<p>Day one, <code>sendEmail</code> is a function that calls Resend directly. Inline,\nsynchronous, dead simple. When traffic picks up, you drop the <code>await</code> and let it\nrun in the background. Later, you introduce BullMQ or SQS. Eventually it becomes\nits own service. The interface stays put. Only the implementation behind it\nchanges.</p>\n<p>This is the kind of evolution AI coding assistants are terrible at planning for.\nThey'll inline that email call because it works <em>right now</em>. They have no\nconcept of where this domain sits on its scaling trajectory.</p>\n<h2>Where domain-agents fits in</h2>\n<p><a href=\"https://github.com/jonnonz1/domain-agents\">domain-agents</a> is a CLI tool that\nruns static analysis on TypeScript codebases, discovers business domains, and\ngenerates AI agent context files for Claude Code and Cursor.</p>\n<pre><code class=\"language-bash\">domain-agents discover .    # Analyse codebase → proposal.json\ndomain-agents init .        # Generate agents/*.md + AGENTS.md\ndomain-agents hooks claude  # Wire into Claude Code (rules + MCP server)\ndomain-agents hooks cursor  # Wire into Cursor (.mdc rules)\n</code></pre>\n<p>After setup, opening <code>src/billing/invoice.ts</code> in Claude Code loads the billing\ndomain agent into context. 
The AI now knows: billing depends on email (coupling\nscore 0.23), exposes <code>BillingService</code> consumed by 2 other domains, sits at the\n&quot;inline&quot; scaling stage with a path toward async queuing, and has 3 tracked tech\ndebt items.</p>\n<p>It plans work accordingly. The context was loaded before the first prompt, no\nsearch required.</p>\n<h2>Five signals, not one</h2>\n<p>The discovery engine runs 5 analysis passes because no single signal identifies\nbusiness domains on its own.</p>\n<p>Directory structure works for greenfield projects (<code>src/auth/</code>, <code>src/billing/</code>)\nbut fails for legacy MVC apps. Import graphs capture coupling but not business\nintent. Package dependencies hint at external integrations but miss internal\ndomains.</p>\n<pre><code class=\"language-mermaid\">graph TD\n    S[&quot;Structure Analysis&quot;] --&gt; O[&quot;Signal Orchestrator&quot;]\n    I[&quot;Import Graph\\n(TS Compiler API)&quot;] --&gt; O\n    N[&quot;Naming Patterns&quot;] --&gt; O\n    D[&quot;Dependency Mapping\\n(npm → domain hints)&quot;] --&gt; O\n    IF[&quot;Interface Detection&quot;] --&gt; O\n    O --&gt; M[&quot;Merge Pipeline&quot;]\n    M --&gt; R[&quot;Domain Proposal&quot;]\n</code></pre>\n<p><strong>Structure</strong> detects whether the codebase is feature-organised,\nlayer-organised, mixed, or flat. <strong>Import graph</strong> uses the TypeScript Compiler\nAPI to parse each <code>.ts</code> file, resolve imports, and build a directed edge graph.\nType-only imports get weighted at 0.3 because they're a weaker coupling signal\nthan value imports. <strong>Naming patterns</strong> extract domain prefixes:\n<code>auth.controller.ts</code> → &quot;auth&quot;. <strong>Dependency mapping</strong> maps npm packages to\ndomain hints (<code>stripe</code> → billing, <code>@sendgrid/mail</code> → email). 
<strong>Interface\ndetection</strong> identifies files imported across domain boundaries and calculates\ncoupling scores between domain pairs.</p>\n<p>Each pass produces weighted signals. The orchestrator combines them with\nconfidence scoring: average signal strength plus a bonus for signal count,\ncapped at 0.99. Layer-organised codebases get a 0.85 multiplier because their\ndomains are harder to discover.</p>\n<h2>Most real codebases aren't clean</h2>\n<p>Feature-organised codebases are easy. The directory structure <em>is</em> the domain.\nBut most real codebases look like this:</p>\n<pre><code>src/\n  controllers/\n    auth.controller.ts\n    billing.controller.ts\n  services/\n    auth.service.ts\n    billing.service.ts\n  models/\n    invoice.model.ts\n    user.model.ts\n</code></pre>\n<p>Here <code>auth.controller.ts</code> and <code>auth.service.ts</code> both belong to\nthe &quot;auth&quot; domain despite living in different directories. domain-agents\nuses naming pattern extraction cross-referenced with import graph cohesion to\ncluster these. The <code>auth.*</code> files form a tight import cluster, which confirms\nthe naming signal.</p>\n<h2>Merging is the hard bit</h2>\n<p>Raw signals produce too many small, overlapping clusters. The orchestrator runs\na multi-phase normalisation pipeline.</p>\n<p>Plurals merge: <code>journals</code> + <code>journal</code> → whichever has more files. Compound names\nconsolidate: <code>bank-balance</code> + <code>bank-statement</code> + <code>bank-transaction</code> →\n<code>bank-accounts</code> (the largest cluster). Small clusters merge into their strongest\nimport target, but only if they have a dominant dependency: more than 40% of\nimports from one target, and that target is at least 2x larger. 
This prevents\ncascading, where A merges into B, B gets bigger and attracts C, C pulls in D.</p>\n<p>Files that import from 3+ domains get moved to &quot;unassigned.&quot; These are coupling\nhotspots: middleware, orchestrators, shared handlers. Assigning them to one\ndomain would mislead the AI, so the tool surfaces them for a human decision.\nThat's the right call for architectural boundaries.</p>\n<p>The E2E test suite validates the complete pipeline against 3 fixture codebases\n(feature-organised, layer-organised, mixed). Current benchmark: 100% activation\naccuracy across all 3 patterns and all 3 activation levels (domain assignment,\nglob matching, MCP lookup).</p>\n<h2>Auto-activation, not search</h2>\n<p>The integration into Claude Code and Cursor uses glob-based rule activation, the\nnative mechanism both tools already support.</p>\n<p>Each domain gets a rule file with glob patterns in the frontmatter:</p>\n<pre><code class=\"language-yaml\">---\ndescription: billing domain\nglobs:\n  - 'src/billing/**'\n  - '**/billing.*'\n  - '**/billing-*'\n---\n</code></pre>\n<p>When Claude Code opens a file matching those globs, the domain context loads. No\nMCP call, no background process, zero runtime overhead.</p>\n<p>An <a href=\"https://modelcontextprotocol.io/\">MCP server</a> complements the rules with 4\non-demand tools: <code>domain_lookup(file)</code>, <code>domain_context(name)</code>,\n<code>domain_files(name)</code>, and <code>list_domains()</code>. A SessionStart hook prints a domain\nsummary at the start of every Claude Code session, so the AI has system-level\nawareness from the first prompt.</p>\n<h2>Agents as a team model</h2>\n<p>This is the bit I'm most keen on long-term.</p>\n<p>At Vend and Xero, teams owned domains. The billing team owned billing, the\nintegrations team owned integrations. Ownership meant knowing the interfaces,\nthe coupling points, the tech debt, and where things were headed. 
That knowledge\nlived in people's heads and got passed on through code reviews, architecture\nchats, and tribal memory.</p>\n<p>Domain-specific AI agents formalise that same ownership model. An email agent\nloads the email domain's interface contract, its coupling to other domains, its\ncurrent scaling stage, and its tracked tech debt. A billing agent carries the\nsame for billing. They work within their boundaries and flag when a change\ncrosses a domain line.</p>\n<p>You don't need this from day one. Early on, one agent covers multiple areas. As\nthe product grows, agents split along the same lines engineering teams split: by\nbusiness domain. The operator (that's you) resolves conflicts where agents\ndisagree, the same way an engineering manager resolves cross-team dependencies.</p>\n<p>The analogy is rough, but it captures how AI-assisted development scales past a\nsingle person staring at a single context window.</p>\n","date_published":"Wed, 15 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/open-source-agent-that-teaches-claude-code-your-architecture.png"},{"id":"https://jonno.nz/posts/openhealth-chat-with-apple-health-data/","url":"https://jonno.nz/posts/openhealth-chat-with-apple-health-data/","title":"OpenHealth – Chat with Apple Health Data, Anywhere","content_html":"<p>For years I've worn an Apple Watch and let my iPhone quietly hoover up my\nresting heart rate, HRV, sleep stages, every workout, every nutrition log.\nMillions of data points. And for most of that time, when I wanted to actually\n<em>ask</em> something about my training — &quot;am I cooked this week?&quot;, &quot;has my recovery\ngotten worse since Christmas?&quot; — I'd open ChatGPT and get an answer that was\nbasically vibes, because it couldn't see any of the data.</p>\n<p>So I built <a href=\"https://github.com/jonnonz1/openhealth\">openhealth</a>. It turns your\nApple Health export into seven short markdown files any LLM can read. 
Drop the\nzip in your browser at\n<a href=\"https://openhealth-axd.pages.dev/\">openhealth-axd.pages.dev</a>, run the CLI, or\nbeam the zip straight from your iPhone over WebRTC. Paste the output into Claude\nor ChatGPT and start asking the questions you actually wanted to ask.</p>\n<p><img src=\"https://jonno.nz/img/posts/openhealth/hero.png\" alt=\"openhealth's web app — drop the zip, get seven markdown files, nothing uploaded\"></p>\n<h2>What's US-only and why that's annoying</h2>\n<p>In January, Anthropic\n<a href=\"https://www.macrumors.com/2026/01/22/claude-ai-adds-apple-health-connectivity/\">shipped an Apple Health connector</a>\nfor Claude. OpenAI has one in ChatGPT. Both are US-only — if you're in New\nZealand like me, or the UK, EU, or Switzerland,\n<a href=\"https://context-link.ai/blog/chatgpt-connectors\">they're not available</a>. That's\na lot of people locked out of the most natural way to use this data.</p>\n<p>And even if you are in the US, you're letting Anthropic or OpenAI decide what\nthe model reads, how it's framed, and what tier unlocks it. I wanted control\nover the whole pipeline — including which LLM I feed it into.</p>\n<h2>What I built</h2>\n<p>openhealth ships three ways.</p>\n<p><strong>A static web app.</strong> Drop <code>export.zip</code>, wait five seconds, download seven\nfiles. The browser does the parse. There's no upload endpoint because there's no\nserver — the Cloudflare Pages site is static HTML plus a tiny Web Worker. Open\nDevTools, watch the Network panel, nothing goes out.</p>\n<p><strong>A Bun-compiled CLI.</strong> <code>openhealth ~/export.zip -o ./output</code> gets you seven\nmarkdown files. <code>--bundle</code> concatenates them into one. <code>--clipboard</code> pushes that\nbundle straight to your system clipboard so you can paste it into any chat\nwindow. 
Zero deps beyond <code>saxes</code> for XML and <code>fflate</code> for unzip — even the\nargument parsing is <code>node:util parseArgs</code>, not Commander. One binary, put it\nwherever.</p>\n<p><strong>A phone-to-desktop handoff over WebRTC.</strong> The desktop site renders a QR code.\nPoint your iPhone camera at it, Safari opens a tiny receiver page, pick the zip,\nand it streams directly to your desktop browser over a DataChannel. The only\nbackend in the whole stack is a ~100-line Cloudflare Worker that relays the\nWebRTC handshake — it never sees a byte of your health data.</p>\n<p><img src=\"https://jonno.nz/img/posts/openhealth/walkthrough.png\" alt=\"Getting the export off your iPhone — six taps, or scan the desktop QR\"></p>\n<h2>How the parse actually works</h2>\n<p>Apple's <code>export.xml</code> is\n<a href=\"https://www.tdda.info/in-defence-of-xml-exporting-and-analysing-apple-health-data\">properly huge</a>.\nA long-term Watch user can easily have a 500MB–4GB file with millions of rows.\nMost XML parsers build a tree in memory, which OOMs before they finish.</p>\n<p>openhealth uses <a href=\"https://github.com/lddubeau/saxes\">saxes</a> — a streaming SAX\nparser in pure TypeScript. It's isomorphic, so the same parser runs in Bun,\nNode, and the browser. I tested it against a synthetic 169MB / 1 million-record\nexport and it finished in about 5 seconds in Chrome, with the main-thread heap\nstaying around 5MB because the parse runs in a Web Worker.</p>\n<p>The rest of the core is a small pipeline: stream XML, accumulate\nper-record-type, roll up into weekly and monthly summaries, run each through a\nwriter that produces one markdown file. Every writer is snapshot-tested against\nbyte-for-byte expected output. 
85 tests, TDD throughout.</p>\n<h2>What the seven files are</h2>\n<p>Each one is deliberately small and shaped to be LLM-readable:</p>\n<ul>\n<li><code>health_profile.md</code> — baselines, data sources, long-term averages</li>\n<li><code>weekly_summary.md</code> — current week plus a 4-week rolling comparison with\nweek-over-week deltas</li>\n<li><code>workouts.md</code> — detailed log for the last 4 weeks: HR, duration, distance,\nenergy</li>\n<li><code>body_composition.md</code> — weight trend, recent readings, nutrition averages</li>\n<li><code>sleep_recovery.md</code> — nightly stages, 8-week averages, HRV, resting HR, SpO2\ntrends</li>\n<li><code>cardio_fitness.md</code> — running log, HR-zone distribution, walking-speed trends</li>\n<li><code>prompt.md</code> — a ready-to-paste system prompt that frames the other six as\ncoaching input</li>\n</ul>\n<p>Drop one file or all seven, depending on which chat model you're using.</p>\n<h2>What it's actually good at</h2>\n<p>Feeding real data to an LLM is a different experience from answering its\nquestions. When Claude can see that my resting HR has crept up 4bpm over the\nlast fortnight while my HRV has dropped and my training load stayed the same, it\ngives a real answer — &quot;you're likely undercooked on recovery this week, here's\nwhat I'd change&quot; — rather than a generic reminder to drink water.</p>\n<p>It's especially good if you've got multiple devices in the mix. I've got data\nfrom Apple Watch, the iPhone step counter, a Withings scale, and MyFitnessPal.\nThe parser picks the highest-trust source per metric — Apple Watch wins over\niPhone for steps, Watch sleep beats AutoSleep which beats Withings,\nduplicate-weight entries on the same day get deduped. You feed in one zip and\nget one coherent picture.</p>\n<p>Ask it about your recovery, your training load, what you might be doing wrong,\nhow your sleep correlates with your long runs. 
It'll tell you — and it'll be\nright more often than not.</p>\n<h2>If privacy matters, go all the way</h2>\n<p>openhealth itself never uploads your data. The web app parses in your browser\ntab. The CLI runs locally. The WebRTC handoff stays peer-to-peer — the\nCloudflare Worker that relays the handshake never sees a byte of the file. Clone\nthe repo, diff the build output, and confirm it yourself.</p>\n<p>When you paste the seven files into ChatGPT or Claude, <em>they</em> see the data.\nThat's the trade most people will take for convenience, and it's fine. But if\nyou don't want to make that trade, you don't have to — run the CLI and pipe the\nbundle into a local model:</p>\n<pre><code class=\"language-bash\">openhealth ~/export.zip --bundle -o ./out\nollama run llama3 &lt; ./out/openhealth.md\n</code></pre>\n<p>Ollama, llama.cpp, LM Studio, whatever you run. Your health data never leaves\nyour laptop. The output is just markdown — it doesn't care what reads it.</p>\n<p>That's why the shape is seven files and not an API. You pick what sees them.</p>\n<p>I'm not a doctor. Neither is the model. Use this for thinking out loud about\nyour own training, not diagnosing anything.</p>\n<p>MIT, source at\n<a href=\"https://github.com/jonnonz1/openhealth\">github.com/jonnonz1/openhealth</a>. Web\napp at <a href=\"https://openhealth-axd.pages.dev/\">openhealth-axd.pages.dev</a>. 
If you've\nbeen sitting on a 200MB <code>export.zip</code> with nothing that'll open it, have a go.</p>\n","date_published":"Mon, 13 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/openhealth-chat-with-apple-health-data.png"},{"id":"https://jonno.nz/posts/claude-code-can-now-spawn-copies-of-itself-in-isolated-vms/","url":"https://jonno.nz/posts/claude-code-can-now-spawn-copies-of-itself-in-isolated-vms/","title":"Claude Code Can Now Spawn Copies of Itself in Isolated VMs","content_html":"<p>The moment this project went from &quot;fun weekend hack&quot; to something I actually use\nevery day was when I got the MCP server working. Claude Code on my laptop sends\na prompt to the orchestrator sitting under my desk, which boots a VM, runs\nClaude Code inside it with full permissions, and streams the results back.\nClaude delegating work to Claude.</p>\n<p>It's a weird feeling watching it happen. You're in a conversation with Claude,\nit decides a task needs isolation, calls the MCP tool, and a few seconds later\nyou can see a fresh VM spinning up in the dashboard. Like having an intern who\ncan clone themselves.</p>\n<p><a href=\"https://jonno.nz/posts/claude-code-running-claude-code-in-4-second-disposable-vms/\">Part 1</a>\ncovered why I built this.\n<a href=\"https://jonno.nz/posts/29-hours-debugging-iptables-to-boot-vms-in-4-seconds/\">Part 2</a> was the\nguts of it — rootfs, networking, the guest agent. This last post is about the\ninterfaces, the streaming pipeline, and what I'd change if this needed to work\nfor more than just me.</p>\n<h2>The MCP server</h2>\n<p>The orchestrator exposes an <a href=\"https://modelcontextprotocol.io/\">MCP</a> server with\neight tools. The main one is <code>run_task</code> — give it a prompt, optional config\n(RAM, vCPUs, timeout, max turns), and it blocks until the task completes.\nReturns the task ID, status, exit code, result files, cost, and the output\ntruncated to 4000 characters.</p>\n<p>Two transport modes. 
Stdio for when Claude Code runs on the same machine:</p>\n<pre><code class=\"language-json\">{\n  &quot;mcpServers&quot;: {\n    &quot;orchestrator&quot;: {\n      &quot;command&quot;: &quot;sudo&quot;,\n      &quot;args&quot;: [&quot;/opt/firecracker/bin/orchestrator&quot;, &quot;mcp&quot;]\n    }\n  }\n}\n</code></pre>\n<p>And Streamable HTTP for network access — Claude Code on any machine on the LAN\ncan use it:</p>\n<pre><code class=\"language-json\">{\n  &quot;mcpServers&quot;: {\n    &quot;orchestrator&quot;: {\n      &quot;type&quot;: &quot;http&quot;,\n      &quot;url&quot;: &quot;http://192.168.50.44:8081/mcp&quot;\n    }\n  }\n}\n</code></pre>\n<p>The other tools are for poking around: <code>get_task_status</code>, <code>list_vms</code>,\n<code>exec_in_vm</code> (run a command in a still-running VM), <code>read_vm_file</code>,\n<code>destroy_vm</code>, <code>list_task_files</code>, and <code>get_task_file</code>. That last one is smart\nabout content types — text files come back as plain text, images come back as\nbase64 MCP image content so Claude can actually see screenshots the VM took.</p>\n<pre><code class=\"language-go\">if isImageMime(mimeType) {\n    encoded := base64.StdEncoding.EncodeToString(data)\n    return mcplib.NewToolResultImage(&quot;Screenshot from task &quot;+taskID, encoded, mimeType), nil\n}\n</code></pre>\n<h2>The migration that broke everything</h2>\n<p>This bit is worth telling because it'll save someone else the debugging time.</p>\n<p>I originally built the MCP server with\n<a href=\"https://github.com/mark3labs/mcp-go\">mcp-go</a> v0.45.0 using SSE (Server-Sent\nEvents) transport. Worked great. Then Claude Code updated to expect the newer\nStreamable HTTP transport, and everything fell over.</p>\n<p>The failure mode was confusing. 
Claude Code would try to connect, attempt OAuth\ndiscovery against the <code>/sse</code> endpoint, get a 404 (my server doesn't do OAuth),\nand fail with:</p>\n<pre><code>Error: HTTP 404: Invalid OAuth error response: SyntaxError: JSON Parse error: Unable to parse JSON string\n</code></pre>\n<p>Nothing in my code changed. The client just started speaking a different\nprotocol.</p>\n<p>The fix was small once I understood it:</p>\n<pre><code class=\"language-go\">// Before — SSE transport\nfunc (s *Server) ServeSSE(addr string) error {\n    sseServer := server.NewSSEServer(s.mcpServer,\n        server.WithBaseURL(&quot;http://&quot;+addr),\n    )\n    return sseServer.Start(addr)\n}\n\n// After — Streamable HTTP transport\nfunc (s *Server) ServeHTTP(addr string) error {\n    httpServer := server.NewStreamableHTTPServer(s.mcpServer,\n        server.WithEndpointPath(&quot;/mcp&quot;),\n        server.WithStateLess(true),\n    )\n    return httpServer.Start(addr)\n}\n</code></pre>\n<p>Bumped mcp-go from v0.45.0 to v0.46.0, swapped the server constructor, changed\nthe endpoint from <code>/sse</code> to <code>/mcp</code>, updated the client config. Done. But\ndiagnosing &quot;OAuth error on a server that doesn't do OAuth&quot; — that bit took a\nwhile.</p>\n<h2>Output streaming</h2>\n<p>When Claude Code runs inside a VM, its output needs to get from stdout inside\nthe guest all the way to a browser tab on my laptop. The path:</p>\n<pre><code class=\"language-mermaid\">flowchart LR\n    A[&quot;Claude Code stdout&quot;] --&gt; B[&quot;Guest agent\\nvsock frame&quot;]\n    B --&gt; C[&quot;Host vsock client\\nExecStream&quot;]\n    C --&gt; D[&quot;Task runner\\nOnEvent callback&quot;]\n    D --&gt; E[&quot;Stream Hub\\nring buffer + fan-out&quot;]\n    E --&gt; F[&quot;WebSocket\\nto browser&quot;]\n</code></pre>\n<p>The stream hub (<code>internal/stream/hub.go</code>) is a per-task pub/sub system. Each\ntask gets a stream with a 1000-event ring buffer. 
When a WebSocket client\nconnects, it gets all the buffered history first, then live events as they\narrive.</p>\n<p>Fan-out is non-blocking:</p>\n<pre><code class=\"language-go\">for ch := range s.subscribers {\n    select {\n    case ch &lt;- event:\n    default:\n        // Subscriber is slow, drop the event\n    }\n}\n</code></pre>\n<p>A slow WebSocket client can't block the task runner. If the browser can't keep\nup, it misses events. In practice this never happens because the bottleneck is\nalways Claude thinking, not the network.</p>\n<h2>The web dashboard</h2>\n<p>The React frontend is compiled to static files and embedded into the Go binary:</p>\n<pre><code class=\"language-go\">//go:embed all:web-dist\nvar webDistEmbed embed.FS\n</code></pre>\n<p>Single binary deployment. No nginx, no separate frontend server, no CORS\nheadaches in production. The API server falls through to <code>index.html</code> for\nunknown paths, which gives you SPA client-side routing.</p>\n<p>The most interesting page is the task detail view. Claude Code's\n<code>--output-format stream-json</code> spits out one JSON object per line — thinking\nblocks, text responses, tool calls, tool results, cost summaries. The dashboard\nparses these into coloured blocks:</p>\n<ul>\n<li>Purple for thinking (Claude's internal reasoning)</li>\n<li>Blue for text responses</li>\n<li>Orange for tool calls (shows the tool name and input)</li>\n<li>Grey for tool results (truncated to 2000 chars — some of these are enormous)</li>\n<li>Green for the final result with cost</li>\n</ul>\n<p>A <code>useWebSocket</code> hook connects when the task is running and disconnects when\nit's done. Green pulsing dot for live streaming. Auto-scroll to the bottom as\nevents arrive. Image files in the results get inline previews pointing at the\nAPI's file download endpoint — so when Claude takes a screenshot inside the VM,\nyou see it immediately.</p>\n<p>Dark theme. Orange accents. 
Obviously.</p>\n<h2>What productionising looks like</h2>\n<p>This runs on one box with no auth. It's a home lab project. But the gap between\n&quot;works for me&quot; and &quot;works for a small team&quot; isn't as big as it looks.</p>\n<p><strong>Persistence</strong> is the most obvious one. The task store is an in-memory Go map.\nOrchestrator restarts? All task history gone. VM metadata already persists to\ndisk and gets recovered on startup — tasks should too. SQLite or bbolt, a few\nhours of work. I just haven't needed it because I don't restart the process very\noften.</p>\n<p><strong>Task queue with backpressure.</strong> Right now tasks fire as goroutines with no\nconcurrency limit. Submit 20 tasks on a 30GB machine where each VM wants 2GB and\nthe last few fail because there's no memory left. A buffered channel or\nsemaphore would fix this. You could get fancier with priority queues — quick\ncode generation tasks ahead of long research tasks — but even a simple\nconcurrency cap would be enough.</p>\n<p><strong>Authentication.</strong> The REST API and MCP server accept requests from anyone who\ncan reach the port. For a team: API keys at minimum, mTLS if you're serious\nabout it. The MCP spec supports auth flows now — that'd be the right way to do\nit for the MCP endpoint.</p>\n<p><strong>The OnEvent callback race.</strong> This one's a latent bug. The task runner's\n<code>OnEvent</code> callback is stored on the runner struct, not passed per-task:</p>\n<pre><code class=\"language-go\">s.taskRunner.OnEvent = func(id string, event agent.StreamEvent) {\n    taskStream.Publish(event)\n}\ns.taskRunner.Run(context.Background(), t)\n</code></pre>\n<p>Two simultaneous tasks overwrite each other's callbacks. It works today because\nMCP tasks block (one at a time) and the API handler sets up the stream before\nthe goroutine runs. But it's the kind of thing that works until it doesn't. 
Fix\nis trivial — pass the callback into <code>Run()</code> as a parameter.</p>\n<p><strong>Graceful shutdown.</strong> There's no signal handler. Ctrl-C kills the process,\nrunning VMs become orphans. They keep running as Firecracker processes — the\n<code>recoverState()</code> function on next startup finds them and starts tracking them\nagain — but their tasks are lost. A proper signal handler would stop accepting\nnew tasks, wait for running ones to finish with a timeout, then tear everything\ndown cleanly.</p>\n<p><strong>For real multi-user</strong> you'd want result storage on S3 or R2 instead of local\ndisk. A web auth layer. Per-user credential vaults so different people's Claude\ntokens don't mix. Usage tracking and cost attribution.</p>\n<p><strong>What I wouldn't change:</strong> the single-binary deployment, vsock for host-guest\ncommunication, ephemeral VMs as the isolation model, the embedded frontend.\nThose are the right calls regardless of scale. The architecture is sound — it's\nthe operational bits around it that need work.</p>\n<p>Most of these are a weekend each. The project is about 3,200 lines of Go and 860\nof TypeScript. It's not a big codebase. Adding persistence, auth, and a task\nqueue would maybe take it to 4,500 lines. Still fits in your head.</p>\n<p>For now, it sits under my desk and boots VMs when I ask it to. Claude delegating\nto Claude, in complete isolation, on hardware I own. 
That's enough.</p>\n","date_published":"Mon, 13 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/claude-code-can-now-spawn-copies-of-itself-in-isolated-vms.png"},{"id":"https://jonno.nz/posts/llm-kills-compromised-services-at-3am/","url":"https://jonno.nz/posts/llm-kills-compromised-services-at-3am/","title":"The Future of Security Is an Open-Source Model That Detects and Acts on Threats","content_html":"<p>Anthropic just dropped <a href=\"https://www.anthropic.com/glasswing\">Project Glasswing</a>\n— a big collaborative cybersecurity initiative with a shiny new model called\nClaude Mythos Preview that can find zero-day vulnerabilities at scale. Twelve\nmajor tech companies involved. $100M in credits. Found a 27-year-old flaw in\nOpenBSD. Impressive stuff.</p>\n<p>But let's be real about what's happening here. Anthropic trained a model so\ncapable at breaking into systems that they decided it was too dangerous to\nrelease publicly. So they wrapped the release in a collaborative security\ninitiative. The security work is genuinely valuable. But it's also a smart way\nto keep control of something they know is too powerful to let loose.</p>\n<p>The part that actually matters, though, is who benefits. Glasswing is for the\nbig players. The companies with security teams, budgets, and the kind of\ninfrastructure that gets invited to sit at the table with AWS, Microsoft, and\nPalo Alto Networks. What about the rest of us? The startups, the small SaaS\nshops, the indie developers running production systems on a shoestring?</p>\n<p>The internet is a\n<a href=\"https://bigthink.com/books/how-the-dark-forest-theory-helps-us-understand-the-internet/\">dark forest</a>.\nThat's not a metaphor anymore — it's becoming the literal reality. Bots,\nscrapers, automated exploit chains, credential stuffing, AI-generated phishing.\nA server goes up and within hours it's being scanned, fingerprinted, and probed\nby systems that don't sleep. Visibility equals vulnerability. 
And AI is making\nthe attackers faster, cheaper, and more autonomous every month.</p>\n<p>The\n<a href=\"https://www.isc2.org/insights/2026/04/ai-driven-defense-and-autonomous-attacks\">ISC2 put it plainly</a>\n— both offence and defence now operate at speeds beyond human intervention. The\nthreats aren't people sitting at keyboards anymore. They're autonomous systems\nrunning campaigns end-to-end.</p>\n<p>So what do we do about it?</p>\n<h2>Offensive security — but not the kind you're thinking</h2>\n<p>When I say offensive security, I don't mean red-teaming or penetration testing.\nI mean giving your systems the ability to fight back.</p>\n<p>Picture an LLM that sits across your centralised logs — network traffic,\ndatabase queries, user interactions, access patterns — and builds an\nunderstanding of what normal looks like for your system over weeks and months.\nNot just pattern matching against known signatures. Actually understanding the\nshape of healthy behaviour.</p>\n<p>When something breaks the pattern, it doesn't just alert. It acts.</p>\n<p>Disable a compromised account. Kill a service that's behaving strangely. Block a\ndatabase connection that shouldn't exist. Create an incident with full context\nfor a human to review. 
The response is proportional and immediate — not waiting\nfor someone to check their phone at 3am.</p>\n<p>The architecture is pretty straightforward:</p>\n<pre><code class=\"language-mermaid\">graph TD\n    A[Application Logs] --&gt; D[Secure Isolated Log Store]\n    B[Network Traffic] --&gt; D\n    C[Database Queries] --&gt; D\n    D --&gt; F[Baseline Health Model]\n    E[User Activity] --&gt; D\n    F --&gt;|Anomaly Detected| G[LLM Analysis]\n    G --&gt;|Analyse &amp; Plan| H{Threat Assessment}\n    H --&gt;|Low| I[Alert &amp; Log]\n    H --&gt;|Medium| J[Restrict &amp; Escalate]\n    H --&gt;|High| K[Disable &amp; Isolate]\n    I --&gt; L[Human Review]\n    J --&gt; L\n    K --&gt; L\n</code></pre>\n<p>The key is that the logging and analysis layer has to be isolated and secured\nseparately from the systems it's watching. If an attacker can compromise the\nthing that's watching them, the whole model falls apart.</p>\n<p>In practice that means separate infrastructure with its own auth boundary.\nIngestion is write-only — your application services push logs in but can never\nread or modify what's already there. Append-only, immutable. The analysis layer\ngets scoped service accounts that can read logs, fire alerts, and pull specific\nemergency levers through a narrow API. Nothing else. If a compromised service\ntries to reach the log store directly, it hits a wall.</p>\n<p>None of this is exotic. Centralised logging, immutable storage, scoped IAM — the\nbuilding blocks exist. The hard part is wiring an LLM into that loop with the\nright constraints. Enough access to act, not enough to make things worse.</p>\n<h2>Adaptive, not rule-based</h2>\n<p>Traditional security tooling runs on signatures and static rules. Known bad\npatterns, blocklists, threshold alerts. That worked when threats were mostly\nhuman-paced. 
It doesn't work when you're up against autonomous systems that\nadapt faster than you can write rules.</p>\n<p>The alternative is a system that learns what normal looks like for <em>your</em>\nenvironment — not a generic baseline, but the actual shape of healthy behaviour\nin your specific infrastructure. Traffic patterns, query frequencies, access\ntiming, user behaviour. Weeks of observation before it starts making decisions.</p>\n<p>When something breaks the pattern, the response is proportional. A sudden spike\nin unusual API calls might trigger deeper correlation — the system widens its\nsearch, pulls in more signals, lowers its threshold for flagging related\nactivity. Repeated failed auth attempts from new IPs tighten access controls\nautomatically. A database connection that shouldn't exist gets killed.</p>\n<p>This isn't a static ruleset you configure once and hope covers everything. It's\na system that develops behavioural intuition from running in your environment,\nresponding to your traffic. The difference matters — static rules are brittle\nagainst novel attacks, while adaptive systems can catch anomalies they've never\nseen before.</p>\n<p>The baseline isn't magic. It's watching five things:</p>\n<ul>\n<li><strong>Rate</strong> — how many events per time window. A user who averages 50 API calls\nper hour suddenly making 500 is a signal.</li>\n<li><strong>Composition</strong> — what's in those events. The same user always hitting\n/api/users and /api/orders suddenly hammering /api/admin/export.</li>\n<li><strong>Cardinality</strong> — how many unique values. One IP hitting 3 endpoints is\nnormal. One IP cycling through 200 endpoints in an hour isn't.</li>\n<li><strong>Latency</strong> — how fast things happen. Legitimate users pause, think, navigate.\nBots don't.</li>\n<li><strong>Novelty</strong> — things the system has never seen. 
A new endpoint, a new\nparameter, a user agent string that doesn't match anything in the training\nwindow.</li>\n</ul>\n<p>Three layers of detection stack on top of each other. Layer one is simple\nthresholds — hard caps that trigger immediately. Layer two is statistical\ndeviation — standard deviations from the learned baseline. Layer three is\ncorrelation — looking across multiple signals simultaneously. A spike in rate\nalone might be fine. A spike in rate plus unusual composition plus new source\nIP? That's a pattern.</p>\n<h2>Learning to recognise yourself</h2>\n<p>A pure anomaly detector would go nuts during deploys. New code paths, changed\nresponse times, config reloads — all of it looks unusual. Same with cron jobs.\nYour 3am batch job that hits the database hard every night would trigger alerts\nevery night.</p>\n<p>Tolerance patterns solve this. The system learns to recognise you.</p>\n<p>Mark a deploy event, and the system creates a tolerance window — elevated\nthresholds for the next 30 minutes. Register a recurring cron job, and the\nsystem expects that exact spike at that exact time. These aren't exceptions you\nconfigure manually. They're patterns the system learns from watching.</p>\n<p>After a few weeks, it knows when your weekly cache warm-up runs, when your daily\nreports generate, when deploys happen. It stops bothering you about the things\nyou do on purpose.</p>\n<h2>The system gets cheaper over time</h2>\n<p>Calling an LLM for every anomaly would be expensive. The trick is building\nimmune memory.</p>\n<p>When the LLM analyses an anomaly and decides it's benign — say, a deploy spike\nor a legitimate traffic surge — that verdict gets stored. Next time the same\npattern appears, the system recognises it. No LLM call needed.</p>\n<p>This is how your security bill drops over the first few weeks. Early on,\neverything is novel. The LLM gets called constantly. A month in, most anomalies\nmatch patterns it's already seen. 
The LLM only gets called for genuinely new\nsituations.</p>\n<p>The more your system runs, the smarter it gets and the less it costs.</p>\n<h2>Setup without a PhD</h2>\n<p>The hardest part of any security tool is configuration. Getting thresholds\nright. Understanding your traffic patterns before you can tell the tool what's\nnormal.</p>\n<p><code>darkforest init</code> flips this. Point it at a log sample — a day's worth of\ntraffic, a week if you've got it — and Claude reads it. Not just parsing,\nactually understanding the shape of your system. It figures out what your\nendpoints are, what normal request rates look like, what user agents show up,\nwhere your traffic comes from geographically.</p>\n<p>Then it writes your config file for you.</p>\n<p>You review it, tweak anything that looks wrong, and you're running. No\nspreadsheets. No guesswork about what &quot;normal&quot; means for your specific stack.\nThe LLM that's going to watch your logs already understands them.</p>\n<h2>This has to be open</h2>\n<p>Glasswing is cool.\n<a href=\"https://github.com/aliasrobotics/CAI\">Open-source frameworks like CAI</a> are\nmaking progress — but mostly on the offensive side, using LLMs for penetration\ntesting and vulnerability research. On the defensive side, the tooling barely\nexists. There's no open-source equivalent for the kind of adaptive monitoring\nand response I'm describing here.</p>\n<p>The building blocks are around. Centralised logging is a solved problem. Open\nstandards for security event formats are maturing. Smaller open models are more\nthan capable of pattern analysis on local infrastructure. What's missing is the\nglue — a framework that takes logs in, builds a baseline, detects anomalies, and\ncan actually respond. Something a small team can deploy without a six-figure\nsecurity budget.</p>\n<p>The threats don't discriminate by company size. 
The defences shouldn't either.\nThis can't be proprietary or locked behind enterprise contracts.</p>\n<p>The dark forest doesn't care how big your company is. The bots scanning your\ninfrastructure don't check your headcount before they attack. If the threats are\ngoing to be this accessible, the defences need to be too.</p>\n<p>I'm building this. An open-source security agent — adaptive, autonomous, acts\nwhen something breaks the pattern. Small enough for a startup to run on their\nown infrastructure. Centralised logging, open LLMs, scoped response actions. The\npieces are all there. I'm wiring them together now.</p>\n<p>For v0.1, one real action working end-to-end: detect anomalous authentication\npatterns, call the LLM for analysis, and disable the compromised account via\nyour identity provider's API. Not just alerting — actually responding while\nyou're asleep. That's the proof of concept that matches the headline.</p>\n<p>I'm actively working on this and looking for early testers. If you want alpha\naccess when it's ready, or just want to follow along,\n<a href=\"https://jonno.nz/posts/llm-kills-compromised-services-at-3am/#newsletter\">drop your email below</a>. I'll reach out when there's something to\ntry.</p>\n","date_published":"Sat, 11 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/llm-kills-compromised-services-at-3am.png"},{"id":"https://jonno.nz/posts/29-hours-debugging-iptables-to-boot-vms-in-4-seconds/","url":"https://jonno.nz/posts/29-hours-debugging-iptables-to-boot-vms-in-4-seconds/","title":"I Spent 29 Hours Debugging iptables to Boot VMs in 4 Seconds","content_html":"<p>The first time I got a Firecracker VM to boot and respond to a vsock ping from\nthe host, I sat there grinning like an idiot. Typed a command on my machine, it\nreached through a kernel-level socket into a completely separate Linux system\nwith its own kernel, and got a reply. Under a second.</p>\n<p>That was about 30 hours into the project. 
The previous 29 were mostly fighting\nwith rootfs images and iptables rules.</p>\n<p><a href=\"https://jonno.nz/posts/claude-code-running-claude-code-in-4-second-disposable-vms/\">Part 1</a>\ncovered why I built this — Firecracker MicroVMs for running Claude Code in\nfull-permission isolation. This post is the actual build. Rootfs, networking,\nthe guest agent, and the streaming pipeline.</p>\n<h2>Building the rootfs</h2>\n<p>A Firecracker VM needs two things: an uncompressed Linux kernel (<code>vmlinux</code>, not\n<code>bzImage</code> — there's no bootloader) and an ext4 filesystem image to use as the\nroot disk.</p>\n<p>The kernel is straightforward — grab a prebuilt 6.1 LTS vmlinux. The rootfs took\nmore work.</p>\n<p>It's a standard ext4 image with Debian Bookworm, and it needs everything Claude\nCode might want: Node.js 24, Python 3.11, Chromium for browser automation, git,\ncurl, jq, and the full Claude Code CLI installed globally via npm. The image\nends up at about 4GB.</p>\n<p>The guest agent — the Go binary that listens for commands from the host — lives\ninside the rootfs as a systemd service:</p>\n<pre><code class=\"language-bash\">sudo mount /opt/firecracker/rootfs/base-rootfs.ext4 /mnt\nsudo cp bin/agent /mnt/usr/local/bin/agent\nsudo chmod +x /mnt/usr/local/bin/agent\n\nsudo tee /mnt/etc/systemd/system/agent.service &lt;&lt;'EOF'\n[Unit]\nDescription=Orchestrator Guest Agent\nAfter=network.target\n\n[Service]\nType=simple\nExecStart=/usr/local/bin/agent\nRestart=always\nRestartSec=1\n\n[Install]\nWantedBy=multi-user.target\nEOF\n\nsudo chroot /mnt systemctl enable agent.service\nsudo umount /mnt\n</code></pre>\n<p>That <code>RestartSec=1</code> matters. If the agent crashes for any reason, systemd has it\nback up in a second. The orchestrator polls vsock every 500ms waiting for the\nagent, so even a crash during boot is barely noticeable.</p>\n<p>You build this rootfs once, by hand. 
Every new VM gets a sparse copy of it.</p>\n<h2>VM lifecycle</h2>\n<p><code>internal/vm/manager.go</code> handles the whole lifecycle. It's sequential with\ncleanup at each step — if anything fails, it tears down what it already set up\nand returns the error.</p>\n<pre><code class=\"language-mermaid\">flowchart TD\n    A[&quot;Copy rootfs (sparse)&quot;] --&gt; B[&quot;Mount &amp; inject network config&quot;]\n    B --&gt; C[&quot;Create TAP device&quot;]\n    C --&gt; D[&quot;Add iptables rules&quot;]\n    D --&gt; E[&quot;Setup jailer chroot&quot;]\n    E --&gt; F[&quot;Write Firecracker config JSON&quot;]\n    F --&gt; G[&quot;Launch via jailer --daemonize&quot;]\n    G --&gt; H[&quot;Find PID, save metadata&quot;]\n    H --&gt; I[&quot;VM ready — poll vsock&quot;]\n</code></pre>\n<p>The sparse copy is the first thing that happens:</p>\n<pre><code class=\"language-go\">cmd := exec.Command(&quot;cp&quot;, &quot;--sparse=always&quot;, BaseRootfs, vm.RootfsPath)\n</code></pre>\n<p><code>--sparse=always</code> means zero blocks aren't allocated on disk. A 4GB image might\nonly use 2GB of actual disk space. Takes under a second on NVMe.</p>\n<p>After copying, the rootfs gets mounted and three files are injected: a\nsystemd-networkd config with a static IP, <code>/etc/resolv.conf</code> for DNS, and\n<code>/etc/hostname</code>. Then it's unmounted and copied again into the jailer chroot.</p>\n<p>Yeah, that's two copies of the rootfs per VM. The first for network injection,\nthe second because the jailer expects everything inside its chroot. I could\ncollapse this into one copy by injecting the network config directly into the\nchroot copy, but it's never been a bottleneck — sparse copy of 4GB takes less\ntime than Firecracker takes to boot. So I left it.</p>\n<h2>The jailer</h2>\n<p>Firecracker's jailer is a separate binary that creates a chroot, sets up minimal\n<code>/dev</code> entries (kvm, net/tun, urandom), and runs the Firecracker process inside\nit. 
The VM config is a JSON file:</p>\n<pre><code class=\"language-go\">vmConfig := map[string]interface{}{\n    &quot;boot-source&quot;: map[string]interface{}{\n        &quot;kernel_image_path&quot;: &quot;/vmlinux&quot;,\n        &quot;boot_args&quot;:         &quot;console=ttyS0 reboot=k panic=1 pci=off init=/sbin/init&quot;,\n    },\n    &quot;drives&quot;: []map[string]interface{}{{\n        &quot;drive_id&quot;:       &quot;rootfs&quot;,\n        &quot;path_on_host&quot;:   &quot;/rootfs.ext4&quot;,\n        &quot;is_root_device&quot;: true,\n        &quot;is_read_only&quot;:   false,\n    }},\n    &quot;machine-config&quot;: map[string]interface{}{\n        &quot;vcpu_count&quot;:  vm.VCPUs,\n        &quot;mem_size_mib&quot;: vm.RamMB,\n    },\n    &quot;network-interfaces&quot;: []map[string]interface{}{{\n        &quot;iface_id&quot;:      &quot;eth0&quot;,\n        &quot;guest_mac&quot;:     &quot;06:00:AC:10:00:02&quot;,\n        &quot;host_dev_name&quot;: netCfg.TapDev,\n    }},\n    &quot;vsock&quot;: map[string]interface{}{\n        &quot;guest_cid&quot;: vm.VsockCID,\n        &quot;uds_path&quot;:  &quot;/vsock.sock&quot;,\n    },\n}\n</code></pre>\n<p><code>pci=off</code> because Firecracker doesn't emulate PCI. Paths are relative to the\njailer chroot. 
The vsock entry creates a Unix domain socket at <code>/vsock.sock</code>\ninside the chroot — that's how the host talks to the guest.</p>\n<p>Launch looks like this:</p>\n<pre><code class=\"language-go\">cmd := exec.Command(JailerBin,\n    &quot;--id&quot;, vm.JailID,\n    &quot;--exec-file&quot;, FCBin,\n    &quot;--uid&quot;, &quot;0&quot;, &quot;--gid&quot;, &quot;0&quot;,\n    &quot;--cgroup-version&quot;, &quot;2&quot;,\n    &quot;--daemonize&quot;,\n    &quot;--&quot;,\n    &quot;--config-file&quot;, &quot;/vm-config.json&quot;,\n)\ncmd.Run()\n</code></pre>\n<p>After launch there's a 2-second sleep — Firecracker needs a moment to start —\nthen the PID is found via <code>pgrep</code> and saved to a metadata file. If the\norchestrator restarts, it reads these metadata files and picks up where it left\noff. VMs survive orchestrator crashes.</p>\n<h2>Networking</h2>\n<p>This is where I burned the most time. Not because the concepts are hard, but\nbecause of one specific bug that had me questioning reality.</p>\n<p>Each VM needs internet access for Claude Code to fetch packages, clone repos,\nand hit the Anthropic API. The approach: each VM gets a Linux TAP device on the\nhost, a dedicated <code>/24</code> subnet, and iptables rules for NAT.</p>\n<h3>IP allocation</h3>\n<p>Subnets are deterministic, derived from the VM name using FNV-1a hashing:</p>\n<pre><code class=\"language-go\">func NetSlot(name string) int {\n    h := fnv.New32a()\n    h.Write([]byte(name))\n    return int(h.Sum32()%253) + 1\n}\n</code></pre>\n<p>VM named <code>task-a3bfca80</code> might hash to slot 61, giving it subnet\n<code>172.16.61.0/24</code>, guest IP <code>172.16.61.2</code>, TAP IP <code>172.16.61.1</code>. No coordination\nneeded, no DHCP server, no IP pool to manage. The collision space is 253 slots —\nmore than enough for 12-13 concurrent VMs.</p>\n<h3>TAP devices</h3>\n<p>A TAP device is a virtual ethernet interface. 
Firecracker attaches the guest's\n<code>eth0</code> to it.</p>\n<pre><code class=\"language-go\">tap := &amp;netlink.Tuntap{\n    LinkAttrs: netlink.LinkAttrs{Name: cfg.TapDev},\n    Mode:      netlink.TUNTAP_MODE_TAP,\n}\nnetlink.LinkAdd(tap)\naddr, _ := netlink.ParseAddr(cfg.TapIP + &quot;/24&quot;)\nlink, _ := netlink.LinkByName(cfg.TapDev)\nnetlink.AddrAdd(link, addr)\nnetlink.LinkSetUp(link)\n</code></pre>\n<p>TAP names are <code>fc-&lt;vm-name&gt;</code>, truncated to 15 characters because Linux interface\nnames can't be longer. A fun constraint to discover at runtime.</p>\n<h3>The iptables rules</h3>\n<p>Three rules per VM:</p>\n<pre><code class=\"language-go\">// NAT — rewrite source IP when traffic exits the host\nipt.AppendUnique(&quot;nat&quot;, &quot;POSTROUTING&quot;,\n    &quot;-s&quot;, cfg.Subnet, &quot;-o&quot;, cfg.HostIface, &quot;-j&quot;, &quot;MASQUERADE&quot;)\n\n// FORWARD — allow outbound from TAP\nipt.Insert(&quot;filter&quot;, &quot;FORWARD&quot;, 1,\n    &quot;-i&quot;, cfg.TapDev, &quot;-o&quot;, cfg.HostIface, &quot;-j&quot;, &quot;ACCEPT&quot;)\n\n// FORWARD — allow established/related inbound\nipt.Insert(&quot;filter&quot;, &quot;FORWARD&quot;, 1,\n    &quot;-i&quot;, cfg.HostIface, &quot;-o&quot;, cfg.TapDev,\n    &quot;-m&quot;, &quot;state&quot;, &quot;--state&quot;, &quot;RELATED,ESTABLISHED&quot;, &quot;-j&quot;, &quot;ACCEPT&quot;)\n</code></pre>\n<p>See those <code>Insert</code> calls with position 1? That's the bug fix.</p>\n<h3>The UFW bug</h3>\n<p>I originally used <code>Append</code> for the FORWARD rules. Traffic from the VM would\nleave the host fine (NAT worked), but return traffic got dropped. The VM could\nresolve DNS but couldn't complete TCP handshakes. I spent an embarrassing amount\nof time staring at <code>tcpdump</code> output before I figured it out.</p>\n<p>Ubuntu's UFW adds a blanket <code>DROP</code> rule to the FORWARD chain. If you append your\nACCEPT rules, they land <em>after</em> UFW's DROP. 
They never match. The packets hit\nthe DROP rule first and get silently killed.</p>\n<p><code>Insert</code> at position 1 puts the rules before UFW's. Return traffic flows, VMs\nget internet access, everything works.</p>\n<p>The traffic path through a working VM:</p>\n<pre><code>Guest (172.16.61.2) → eth0 → TAP (fc-task-xxx) → FORWARD ACCEPT\n→ NAT MASQUERADE (rewrite src to host IP) → host interface → internet\n→ response → RELATED,ESTABLISHED → TAP → guest eth0\n</code></pre>\n<p>VMs can't reach each other. Each TAP device is point-to-point on its own <code>/24</code>.\nThere's no route between subnets.</p>\n<h2>The guest agent</h2>\n<p><code>cmd/agent/main.go</code> — 420 lines of Go. It's a static binary that starts on boot,\nlistens on vsock port 9001, and handles five request types: ping, exec,\nwrite_files, read_file, and signal.</p>\n<p>The interesting one is streaming exec.</p>\n<p>When the orchestrator wants to run Claude Code, it sends an exec request with\n<code>stream: true</code>. The agent spawns the command, reads stdout and stderr line by\nline, and sends each line back as a framed event over the vsock connection. When\nthe process exits, it sends an exit event with the exit code.</p>\n<p>Sounds straightforward. The tricky part is background processes.</p>\n<p>Claude Code can start things that outlive the main command — dev servers, file\nwatchers, whatever it decides it needs. These child processes inherit the\nstdout/stderr pipes. If the agent waits for the pipes to close (the normal\napproach), it hangs forever because the children are still holding them open.</p>\n<p>The fix has three parts:</p>\n<pre><code class=\"language-go\">// 1. Process group isolation\ncmd.SysProcAttr = &amp;syscall.SysProcAttr{Setpgid: true}\n\n// 2. Wait for the main process, not the pipes\n&lt;-waitDone\n\n// 3. 
Kill the entire process group\npgid, _ := syscall.Getpgid(cmd.Process.Pid)\nsyscall.Kill(-pgid, syscall.SIGTERM)\ntime.Sleep(500 * time.Millisecond)\nsyscall.Kill(-pgid, syscall.SIGKILL)\n</code></pre>\n<p><code>Setpgid: true</code> puts the command in its own process group. When the main process\nexits, kill the group (<code>-pgid</code> means &quot;everything in this group&quot;). SIGTERM first,\nwait half a second, then SIGKILL for anything that didn't listen.</p>\n<p>Even after killing the group, there's a 3-second timeout waiting for the\npipe-reading goroutines to drain. If they're still stuck after that, move on and\nsend the exit event anyway. Can't let a hung pipe block the entire task.</p>\n<p>The line-by-line reader uses a 256KB buffer because Claude Code's\n<code>--output-format stream-json</code> can produce enormous single lines — tool results\nthat include the full contents of files it read.</p>\n<h2>Credential injection</h2>\n<p>Before Claude Code runs, the orchestrator writes five things into the VM via\nvsock:</p>\n<p>OAuth credentials from the host's <code>~/.claude/.credentials.json</code> (mode 0600). A\nsettings file that allows all tools. An environment script that sets\n<code>CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS=true</code>. Task metadata. 
And a marker file to\ncreate the output directory.</p>\n<p>The prompt itself gets written to a temp file inside the VM to avoid shell\nescaping nightmares, then referenced in the command:</p>\n<pre><code class=\"language-go\">claudeArgs := fmt.Sprintf(\n    &quot;claude -p \\&quot;$(cat %s)\\&quot; --output-format stream-json --verbose&quot;,\n    promptFile,\n)\ncmd := []string{&quot;bash&quot;, &quot;-c&quot;,\n    &quot;source /etc/profile.d/claude.sh &amp;&amp; &quot; + claudeArgs}\n</code></pre>\n<p>When the VM is destroyed, the rootfs — containing the credentials — is deleted.\nCredentials only exist for the lifetime of the task.</p>\n<h2>Collecting results</h2>\n<p>After Claude Code finishes, the orchestrator searches for files it created:</p>\n<pre><code class=\"language-go\">// Anything in the output directory\nvsock.Exec(jailID, []string{&quot;find&quot;, outputDir, &quot;-type&quot;, &quot;f&quot;, &quot;-not&quot;, &quot;-name&quot;, &quot;.keep&quot;}, nil, &quot;/root&quot;)\n\n// Any new files under /root, created after the prompt was written\nvsock.Exec(jailID, []string{&quot;find&quot;, &quot;/root&quot;, &quot;-maxdepth&quot;, &quot;2&quot;, &quot;-type&quot;, &quot;f&quot;,\n    &quot;-newer&quot;, &quot;/tmp/claude-prompt.txt&quot;}, nil, &quot;/root&quot;)\n</code></pre>\n<p>Each file gets downloaded via <code>vsock.ReadFile</code> and saved to\n<code>/opt/firecracker/results/&lt;task-id&gt;/</code>. The runner also scans the accumulated\noutput for Claude's <code>total_cost_usd</code> field to record what the task cost in API\ncredits.</p>\n<p>Then the VM is destroyed. Firecracker process killed, TAP device removed,\niptables rules deleted, jailer chroot deleted, VM state directory deleted. Clean\nslate.</p>\n<p>The whole cycle — boot, inject, run, collect, destroy — typically takes 30-120\nseconds depending on how complex the prompt is. 
The 4-second boot and ~1-second\nteardown are rounding errors compared to the time Claude actually spends\nthinking.</p>\n<p><a href=\"https://jonno.nz/posts/claude-code-can-now-spawn-copies-of-itself-in-isolated-vms/\">Part 3</a>\ngets into the fun stuff — the MCP server that lets Claude delegate tasks to\nitself, the streaming architecture, the web dashboard, and what productionising\nthis would actually look like.</p>\n","date_published":"Fri, 10 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/29-hours-debugging-iptables-to-boot-vms-in-4-seconds.png"},{"id":"https://jonno.nz/posts/can-you-beat-last-month/","url":"https://jonno.nz/posts/can-you-beat-last-month/","title":"Can You Beat Last Month?","content_html":"<p>Every machine learning project needs a reality check.</p>\n<p>It's tempting to jump straight to the neural network. That's the exciting bit,\nright? But if you don't establish what a dead-simple model can do first, you've\ngot no idea whether your fancy architecture is actually learning anything useful\nor just being expensive.</p>\n<p>So before ConvLSTM gets anywhere near this data, we're going to throw three\ngloriously simple baselines at it and see how they do.</p>\n<h2>Persistence: next month equals this month</h2>\n<p>The dumbest possible model. To predict April, just use March's values. Every\ncell, every crime type. Carbon copy.</p>\n<p>It sounds ridiculous, but it works surprisingly well when patterns are stable.\nAnd as we saw in the EDA, Auckland's crime hotspots are remarkably persistent.\nThe CBD doesn't suddenly go quiet. 
South Auckland doesn't randomly calm down.</p>\n<p>On the six-month test set (August 2025 – January 2026):</p>\n<table>\n<thead>\n<tr>\n<th>Crime Type</th>\n<th>MAE</th>\n<th>RMSE</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Theft</td>\n<td>1.42</td>\n<td>3.18</td>\n</tr>\n<tr>\n<td>Burglary</td>\n<td>0.38</td>\n<td>0.91</td>\n</tr>\n<tr>\n<td>Assault</td>\n<td>0.22</td>\n<td>0.64</td>\n</tr>\n<tr>\n<td>Robbery</td>\n<td>0.04</td>\n<td>0.15</td>\n</tr>\n<tr>\n<td>Sexual</td>\n<td>0.03</td>\n<td>0.12</td>\n</tr>\n<tr>\n<td>Harm</td>\n<td>0.01</td>\n<td>0.04</td>\n</tr>\n</tbody>\n</table>\n<p>Those MAE numbers for theft and burglary look small until you remember that most\ncells are zero. For the active cells (the ones we actually care about) the error\nis larger. A busy CBD cell might have 35 thefts in one month and 28 the next.\nPersistence would be off by 7 there, which is a 25% miss against the actual\ncount of 28 on an important prediction.</p>\n<h2>Seasonal naive: same month last year</h2>\n<p>Instead of copying last month, copy the same month from the previous year.\nJanuary 2026 gets predicted from January 2025. This should capture seasonal\npatterns: the summer spike, the February dip.</p>\n<p>The catch? We only have four years of data. The test set months (August–January)\neach have at most three prior examples of the same month. That's not a lot of\nseasonal training data.</p>\n<table>\n<thead>\n<tr>\n<th>Crime Type</th>\n<th>MAE</th>\n<th>RMSE</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Theft</td>\n<td>1.51</td>\n<td>3.42</td>\n</tr>\n<tr>\n<td>Burglary</td>\n<td>0.41</td>\n<td>0.97</td>\n</tr>\n<tr>\n<td>Assault</td>\n<td>0.24</td>\n<td>0.68</td>\n</tr>\n<tr>\n<td>Robbery</td>\n<td>0.05</td>\n<td>0.17</td>\n</tr>\n<tr>\n<td>Sexual</td>\n<td>0.04</td>\n<td>0.13</td>\n</tr>\n<tr>\n<td>Harm</td>\n<td>0.01</td>\n<td>0.04</td>\n</tr>\n</tbody>\n</table>\n<p>Slightly worse than persistence almost across the board; only harm ties. 
That surprised me initially.\nShouldn't capturing seasonality help?</p>\n<p>The issue is that the 2023-to-2025 decline we spotted in the EDA bites hard\nhere. If you predict January 2026 from January 2025, you're using data from a\nperiod when crime was higher. The seasonal pattern is real, but the\nyear-over-year trend works against it. With more years of data, seasonal naive\nwould likely pull ahead.</p>\n<h2>Historical average: the mean of all training months</h2>\n<p>For each cell and crime type, take the average across all 36 training months.\nThis smooths out month-to-month noise and gives you a &quot;typical&quot; value for each\nlocation.</p>\n<table>\n<thead>\n<tr>\n<th>Crime Type</th>\n<th>MAE</th>\n<th>RMSE</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Theft</td>\n<td>1.28</td>\n<td>2.95</td>\n</tr>\n<tr>\n<td>Burglary</td>\n<td>0.35</td>\n<td>0.84</td>\n</tr>\n<tr>\n<td>Assault</td>\n<td>0.20</td>\n<td>0.58</td>\n</tr>\n<tr>\n<td>Robbery</td>\n<td>0.04</td>\n<td>0.14</td>\n</tr>\n<tr>\n<td>Sexual</td>\n<td>0.03</td>\n<td>0.11</td>\n</tr>\n<tr>\n<td>Harm</td>\n<td>0.01</td>\n<td>0.04</td>\n</tr>\n</tbody>\n</table>\n<p>The best baseline. By averaging over three years, it smooths out the\nmonth-to-month noise and the year-over-year trend simultaneously. It won't\ncapture seasonal peaks or sudden changes, but for the &quot;typical month&quot; prediction\nit's solid.</p>\n<h2>Why MAPE breaks down</h2>\n<p>You might wonder why I'm not reporting MAPE (Mean Absolute Percentage Error).\nIt's the standard metric in a lot of forecasting work. The reason: sparse data.</p>\n<p>MAPE divides the error by the actual value. When the actual value is zero (which\nit is for 91.7% of our tensor) you get division by zero. Even for cells with\nsmall counts (1 or 2 crimes), a prediction of 0 gives you 100% MAPE while a\nprediction of 2 gives you 0–100%. The metric becomes wildly unstable.</p>\n<p>MAE and RMSE are more honest here. 
They tell you the absolute magnitude of your\nerrors in actual crime counts, which is what we care about. A miss of 3\nvictimisations means the same thing whether the cell usually has 5 or 50.</p>\n<h2>The bar to clear</h2>\n<p>Here's the scoreboard going forward. Any deep learning model needs to beat the\nhistorical average baseline to justify its existence:</p>\n<table>\n<thead>\n<tr>\n<th>Crime Type</th>\n<th>Historical Avg MAE</th>\n<th>Historical Avg RMSE</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Theft</td>\n<td>1.28</td>\n<td>2.95</td>\n</tr>\n<tr>\n<td>Burglary</td>\n<td>0.35</td>\n<td>0.84</td>\n</tr>\n<tr>\n<td>Assault</td>\n<td>0.20</td>\n<td>0.58</td>\n</tr>\n<tr>\n<td>All types</td>\n<td>0.39</td>\n<td>0.95</td>\n</tr>\n</tbody>\n</table>\n<p>Theft is the easiest to beat because there's the most signal: high counts, clear\nspatial patterns, strong seasonality. Robbery, sexual offences, and harm are\nessentially noise at this resolution. The models will probably predict near-zero\nfor those types and be mostly correct.</p>\n<p>The real test will be the middle ground. Can ConvLSTM or ST-ResNet predict the\n<em>changes</em> in theft and burglary better than a static average? Can they catch the\nmonths where a cell spikes or dips? That's where simple baselines fall flat,\nbecause they don't model dynamics at all.</p>\n<p>If the deep learning can't meaningfully beat &quot;just use the average,&quot; then it's\nnot worth the CPU cycles. Or in my case, the many hours of a Ryzen 5 grinding\naway.</p>\n","date_published":"Thu, 09 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/can-you-beat-last-month.png"},{"id":"https://jonno.nz/posts/claude-code-running-claude-code-in-4-second-disposable-vms/","url":"https://jonno.nz/posts/claude-code-running-claude-code-in-4-second-disposable-vms/","title":"Claude Code Running Claude Code in 4-Second Disposable VMs","content_html":"<p>Running Claude Code with full permissions inside a Docker container is a\nterrible idea. 
I did it anyway for about a week, then built something better.</p>\n<p>Anthropic has an internal platform — people have been calling it\n<a href=\"https://ai.gopubby.com/anthropics-antspace-the-secret-paas-nobody-was-supposed-to-find-a79ce1e02151\">Antspace</a>\nsince it got reverse-engineered from the Claude Code source — that runs AI\ncoding tasks in isolated environments. It's part of a vertical stack they're\nbuilding internally: intent goes in, code comes out, and the agent never touches\nthe host machine.</p>\n<p>I wanted that. Not the whole platform-as-a-service thing, just the core idea:\ngive Claude Code a prompt, let it run with zero permission restrictions, stream\nthe output back, grab any files it created, and destroy everything when it's\ndone. On a single Linux box sitting in my office.</p>\n<p>The result is about 3,200 lines of Go and 860 lines of TypeScript. It boots a\nfresh Linux VM in ~4 seconds, runs Claude Code inside it, and tears it down when\nthe task finishes. Three ways to use it: a CLI, a REST API with a web dashboard,\nand an MCP server so Claude Code on other machines can delegate tasks to it.</p>\n<p>This first post is about why I built it this way.\n<a href=\"https://jonno.nz/posts/29-hours-debugging-iptables-to-boot-vms-in-4-seconds/\">Part 2</a> and\n<a href=\"https://jonno.nz/posts/claude-code-can-now-spawn-copies-of-itself-in-isolated-vms/\">Part 3</a> get\ninto the actual implementation.</p>\n<h2>The container problem</h2>\n<p><code>CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS=true</code> — that's the environment variable\nthat tells Claude Code to stop asking before it runs shell commands or writes\nfiles. It just does whatever it thinks it needs to. For autonomous tasks, you\nneed this. Claude can't ask for confirmation when there's nobody watching.</p>\n<p>The question is where you let it run.</p>\n<p>Docker is the obvious first thought. Fast startup, everyone knows it, easy to\norchestrate. But containers share the host kernel. 
Every container on the\nmachine issues syscalls to the same Linux kernel, and a kernel vulnerability is\na vulnerability in every container on the host.\n<a href=\"https://huggingface.co/blog/agentbox-master/firecracker-vs-docker-tech-boundary\">The isolation boundary is the container runtime</a>,\nnot hardware — and that surface area is big.</p>\n<p>For most workloads this is fine. Running a web server in Docker? No worries. But\nrunning an AI agent that can execute arbitrary shell commands with root-level\npermissions? That's a different threat model. A container escape gives you the\nhost. And you've just given the thing inside the container permission to try\nanything.</p>\n<p>Anthropic's own approach to\n<a href=\"https://www.anthropic.com/engineering/claude-code-sandboxing\">sandboxing Claude Code</a>\nuses OS-level primitives — bubblewrap on Linux, Seatbelt on macOS — for\nfilesystem and network isolation. They report an 84% reduction in permission\nprompts internally. That's smart for the normal use case where Claude is helping\nyou write code in your own project. But I wanted something more aggressive: full\nisolation where even a kernel exploit can't reach the host.</p>\n<h2>Why Firecracker</h2>\n<p><a href=\"https://firecracker-microvm.github.io/\">Firecracker</a> is what AWS built for\nLambda and Fargate. Each MicroVM is a real KVM-backed virtual machine with its\nown guest kernel, its own memory space, and hardware-enforced isolation via\nIntel VT-x or AMD-V. The attack surface is the KVM hypervisor — which the kernel\nteam at AWS has spent years minimising.</p>\n<p>The trade-off is boot time. Containers start in under a second. Firecracker VMs\ntake about 4 seconds on my hardware once you account for the guest kernel boot,\nsystemd init, and the agent process starting up. For tasks that typically run\n20-120 seconds, 4 seconds of overhead is nothing.</p>\n<p>Each VM also copies a 4GB rootfs image. 
Sparse copies make this fast (&lt;1\nsecond), but it does use disk. On a machine with a 1TB NVMe, I'm not losing\nsleep over it.</p>\n<p>The hardware is an AMD Ryzen 5 5600GT with 30GB of RAM. Nothing exotic. About\n$400 worth of parts sitting under my desk. Each VM gets 2GB of RAM by default,\nso I can run roughly 12-13 VMs concurrently before the host runs out of memory.</p>\n<h2>Talking to a VM without a network</h2>\n<p>This was my favourite bit to figure out.</p>\n<p>The obvious way to communicate with a process inside a VM is SSH. Set up keys,\nopen a port, connect over the network. But SSH means key management, an open\nnetwork port inside the VM, and another service to configure. If the guest's\nnetwork breaks during a task, you've lost your control channel.</p>\n<p><a href=\"https://github.com/firecracker-microvm/firecracker/blob/main/docs/vsock.md\">vsock</a>\n(AF_VSOCK, address family 40) is a kernel-level host-guest communication\nchannel. It doesn't touch the network stack. No IP addresses, no ports, no keys.\nFirecracker exposes the guest's vsock as a Unix domain socket on the host side —\nyou connect to the socket, send <code>CONNECT &lt;port&gt;\\n</code>, and you're talking directly\nto a process inside the VM.</p>\n<pre><code class=\"language-go\">func Connect(jailID string, port int) (net.Conn, error) {\n    socketPath := fmt.Sprintf(&quot;/srv/jailer/firecracker/%s/root/vsock.sock&quot;, jailID)\n    conn, err := net.Dial(&quot;unix&quot;, socketPath)\n    if err != nil {\n        return nil, err\n    }\n    if _, err := fmt.Fprintf(conn, &quot;CONNECT %d\\n&quot;, port); err != nil {\n        conn.Close()\n        return nil, err\n    }\n    // Read &quot;OK &lt;port&gt;&quot; response\n    return conn, nil\n}\n</code></pre>\n<p>On the guest side, Go's standard library doesn't support AF_VSOCK — address\nfamily 40 doesn't exist in the <code>net</code> package. 
So the guest agent uses raw\nsyscalls:</p>\n<pre><code class=\"language-go\">fd, _ := syscall.Socket(40, syscall.SOCK_STREAM, 0)  // AF_VSOCK = 40\n// Manually construct struct sockaddr_vm (16 bytes)\nsa := [16]byte{}\n*(*uint16)(unsafe.Pointer(&amp;sa[0])) = 40          // family\n*(*uint32)(unsafe.Pointer(&amp;sa[4])) = uint32(port) // port (9001)\n*(*uint32)(unsafe.Pointer(&amp;sa[8])) = 0xFFFFFFFF   // VMADDR_CID_ANY\nsyscall.RawSyscall(syscall.SYS_BIND, uintptr(fd), uintptr(unsafe.Pointer(&amp;sa[0])), 16)\nsyscall.RawSyscall(syscall.SYS_LISTEN, uintptr(fd), 5, 0)\n</code></pre>\n<p>Yeah, that's <code>unsafe.Pointer</code> and manual struct layout. Not the prettiest Go\nyou'll ever write. But it works, it's fast, and the whole vsock layer is about\n160 lines shared between both binaries.</p>\n<p>The wire protocol is dead simple — length-prefixed JSON frames:</p>\n<pre><code class=\"language-go\">func WriteFrame(w io.Writer, v interface{}) error {\n    data, err := json.Marshal(v)\n    if err != nil {\n        return err\n    }\n    // 4-byte big-endian length prefix, then the JSON payload\n    if err := binary.Write(w, binary.BigEndian, uint32(len(data))); err != nil {\n        return err\n    }\n    _, err = w.Write(data)\n    return err\n}\n</code></pre>\n<p>Each operation (ping, exec, write files, read file) opens a new connection,\nsends one request, reads the response, and closes. Connection-per-request. 
Not\nfancy, but vsock connections are local and effectively instant, so there's no\nreason to complicate things with multiplexing.</p>\n<h2>The shape of the thing</h2>\n<p>The whole system is two Go binaries — the orchestrator (runs on the host) and\nthe agent (runs inside each VM).</p>\n<pre><code class=\"language-mermaid\">graph TD\n    subgraph &quot;Host — orchestrator binary&quot;\n        API[&quot;REST API + WebSocket :8080&quot;]\n        MCP[&quot;MCP Server :8081&quot;]\n        VM[&quot;VM Manager&quot;]\n        NET[&quot;TAP + iptables&quot;]\n        TASK[&quot;Task Runner&quot;]\n        STREAM[&quot;Pub/Sub Hub&quot;]\n        VSOCK[&quot;vsock Client&quot;]\n    end\n\n    subgraph &quot;Guest — agent binary&quot;\n        AGENT[&quot;Guest Agent vsock:9001&quot;]\n        CLAUDE[&quot;Claude Code&quot;]\n    end\n\n    API --&gt; TASK\n    MCP --&gt; TASK\n    TASK --&gt; VM\n    VM --&gt; NET\n    TASK --&gt; VSOCK\n    VSOCK --&gt; AGENT\n    AGENT --&gt; CLAUDE\n    TASK --&gt; STREAM\n    STREAM --&gt; API\n</code></pre>\n<p>The orchestrator is a single 14MB binary with the React dashboard embedded via\n<code>//go:embed</code>. Copy it to a server, run it with sudo, done. Seven Go dependencies\ntotal — chi for routing, netlink for TAP devices, go-iptables for firewall\nrules, <a href=\"https://github.com/mark3labs/mcp-go\">mcp-go</a> for the MCP protocol, and a\nfew others.</p>\n<p>The agent is a 2.5MB static binary compiled with <code>CGO_ENABLED=0</code>. It ships\ninside the VM's rootfs and starts via systemd on boot. Within about a second of\nthe VM coming up, the agent is listening on vsock port 9001 and ready to accept\ncommands.</p>\n<p>They share exactly one file — <code>internal/agent/protocol.go</code> — which defines the\nwire protocol types and framing functions. Everything else is independent.</p>\n<h2>What a task looks like</h2>\n<p>You give it a prompt. 
It does the rest.</p>\n<ol>\n<li>Generate a task ID and VM name</li>\n<li>Copy the base rootfs image (sparse, &lt;1 second)</li>\n<li>Inject network config into the rootfs</li>\n<li>Create a TAP device and iptables rules for internet access</li>\n<li>Launch Firecracker via the jailer</li>\n<li>Poll vsock until the agent responds (~1 second)</li>\n<li>Inject credentials and files via vsock</li>\n<li>Run Claude Code with streaming output</li>\n<li>Collect any files Claude created</li>\n<li>Destroy the VM</li>\n</ol>\n<p>From the CLI it looks like this:</p>\n<pre><code class=\"language-bash\">sudo ./bin/orchestrator task run \\\n    --prompt &quot;Write a Python script that generates Fibonacci numbers&quot; \\\n    --ram 2048 \\\n    --vcpus 2 \\\n    --timeout 120\n</code></pre>\n<p>Output streams to your terminal in real time. When it's done:</p>\n<pre><code>=== Task Complete ===\nID:     a3bfca80\nStatus: completed\nExit:   0\nCost:   $0.0582\nFiles:  [fibonacci.py]\n</code></pre>\n<p>The VM is gone. The rootfs is deleted. The TAP device and iptables rules are\ncleaned up. All that's left is the result files in\n<code>/opt/firecracker/results/a3bfca80/</code>.</p>\n<p>Or you use the MCP server, and Claude Code on your laptop delegates the task to\na VM on the box under your desk. Claude spawning Claude. That bit is properly\ncool, and I'll get into it in Part 3.</p>\n<h2>Why Go</h2>\n<p>Quick aside on this because people always ask.</p>\n<p>Go produces static binaries. The agent needs to be a single file with zero\ndependencies that runs inside a minimal Debian guest — <code>CGO_ENABLED=0</code> makes\nthis trivial. The orchestrator needs to manage concurrent VMs, and goroutines\nare a natural fit for that. Syscall support is first-class, which matters when\nyou're doing raw vsock operations. 
And it compiles in about 2 seconds, which is\nnice when you're iterating.</p>\n<pre><code class=\"language-makefile\">build-agent:\n\tCGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/agent -ldflags=&quot;-s -w&quot; ./cmd/agent\n</code></pre>\n<p>That <code>-ldflags=&quot;-s -w&quot;</code> strips the symbol table and DWARF debug info, dropping the agent\nbinary from ~3.5MB to ~2.5MB. Every byte counts when you're baking it into a\nrootfs that gets copied for every VM.</p>\n<p><a href=\"https://jonno.nz/posts/29-hours-debugging-iptables-to-boot-vms-in-4-seconds/\">Part 2</a> gets into\nthe actual build — the rootfs, the networking (including a fun bug with Ubuntu's\nUFW that had me staring at iptables rules for an embarrassing amount of time),\nthe guest agent, and the streaming pipeline that gets Claude's output from\ninside a VM to your browser.</p>\n","date_published":"Wed, 08 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/claude-code-running-claude-code-in-4-second-disposable-vms.png"},{"id":"https://jonno.nz/posts/stealing-nanoclaw-patterns-for-webapps-and-saas/","url":"https://jonno.nz/posts/stealing-nanoclaw-patterns-for-webapps-and-saas/","title":"Stealing NanoClaw Patterns for Web Apps and SaaS","content_html":"<p>In <a href=\"https://jonno.nz/posts/nanoclaw-architecture-masterclass-in-doing-less/\">Part 1</a> I pulled\napart NanoClaw's codebase and found six patterns that make an 8,000-line AI\nassistant surprisingly robust. But NanoClaw is a single-user tool running on\nyour laptop. Surely these patterns fall apart once you've got real tenants, real\nmoney, and real scale?</p>\n<p>Nah. Four of them translate almost directly — and the ones that don't still\nteach you something useful.</p>\n<h2>The credential sidecar</h2>\n<p>NanoClaw's credential proxy — where containers get a placeholder API key and a\nlocalhost proxy injects the real one — sounds like a neat trick for a personal\ntool. 
But this exact pattern is showing up in production Kubernetes deployments\nright now.</p>\n<p>The broader version is a\n<a href=\"https://www.apistronghold.com/blog/phantom-token-pattern-production-ai-agents\">sidecar proxy that handles credential injection</a>\nfor any service that needs API keys or tokens. Your application code never\ntouches the real secret. A sidecar container intercepts outbound requests, swaps\nin credentials, and forwards them upstream.</p>\n<p>At Vend we managed a bunch of third-party integrations — payment gateways,\nshipping providers, accounting platforms. Each one had API keys that needed to\nlive somewhere. We went through the typical evolution: environment variables,\nthen a secrets manager, then a service that distributed keys at startup. Every\nstep was an improvement, but the keys still ended up <em>in the application's\nmemory</em>.</p>\n<p>The sidecar approach skips that entirely. Your app sends requests with a\nplaceholder. The proxy — which is a separate process with its own security\nboundary — does the credential swap. Even if your application gets compromised,\nthe real keys aren't there to steal.</p>\n<p>If you're running any kind of multi-service architecture where services call\nexternal APIs, this pattern is worth adopting. Your API gateway might already be\ndoing a version of it — the insight is making it explicit and consistent across\nall outbound credential flows.</p>\n<h2>Isolation as the security model</h2>\n<p>This is the one I keep thinking about.</p>\n<p>NanoClaw uses filesystem mounts to control what each container can see. No\napplication-level permission checks — the security model <em>is</em> the infrastructure\ntopology. If a container can't see a file, it can't access it. No bugs, no\nmissed checks, no escalation vulnerabilities.</p>\n<p>In SaaS, we spend enormous amounts of time writing authorisation logic. Role\nchecks, permission middleware, tenant-scoping queries. 
And it works — until\nsomeone forgets a WHERE clause.</p>\n<p>AWS's own\n<a href=\"https://docs.aws.amazon.com/whitepapers/latest/saas-architecture-fundamentals/tenant-isolation.html\">SaaS tenant isolation guidance</a>\nmakes this point explicitly: authentication and authorisation are not the same\nas isolation. The fact that a user logged in doesn't mean your system has\nachieved tenant isolation. A\n<a href=\"https://workos.com/blog/tenant-isolation-in-multi-tenant-systems\">single missed tenant filter</a>\non a database query and you've got a cross-tenant data leak.</p>\n<p>The NanoClaw-inspired approach is to push isolation down the stack. Separate\ndatabase schemas per tenant. Separate containers. Separate cloud accounts for\nyour highest-value customers. Not instead of application-level checks — but as a\nbackstop that catches the bugs your application-level checks inevitably have.</p>\n<p>At Xero, working across the integrations and app store teams, I saw first-hand\nhow multi-tenant data isolation gets complicated fast. The teams that had the\nfewest incidents were the ones where the infrastructure itself enforced\nboundaries, not just the application code.</p>\n<p>You don't need to go full NanoClaw and give every tenant their own container.\nBut you should be asking: if my application-level authorisation has a bug,\nwhat's my second line of defence? If the answer is &quot;nothing&quot; — that's the\npattern to steal.</p>\n<h2>Polling when it's the right call</h2>\n<p>NanoClaw polls SQLite every 2 seconds. No WebSockets, no event bus, no pub/sub.\nJust a loop that checks for new stuff.</p>\n<p>The instinct for most teams is to treat polling as a temporary hack you'll\nreplace with &quot;proper&quot; event-driven architecture later. 
Yan Cui wrote a\n<a href=\"https://theburningmonk.com/2025/05/understanding-push-vs-poll-in-event-driven-architectures/\">solid breakdown of push vs poll in event-driven systems</a>\nand the takeaway isn't that one is always better — it's that the right choice\ndepends on your throughput, ordering, and failure-handling requirements.</p>\n<p>For a lot of internal systems, polling is the correct permanent answer.</p>\n<p>Admin dashboards. Background job status. Internal reporting. Webhook retry\nqueues. Deployment pipelines. These systems don't need sub-second latency. They\nneed reliability and simplicity. A polling loop against your database gives you\nboth, with zero infrastructure overhead.</p>\n<p>At Xero we shipped multiple times per day, and some of the internal tooling that\nsupported continuous deployment was surprisingly simple under the hood. Cron\njobs. Polling loops. SQL queries on a timer. Not because anyone was cutting\ncorners — because the requirements genuinely didn't need anything more\nsophisticated.</p>\n<p>The trap is reaching for Kafka or RabbitMQ because you think you'll need it\neventually.\n<a href=\"https://synmek.com/saas-architecture-for-startups-2025-guide\">70% of startups fail due to premature scaling</a>.\nThe infrastructure you don't deploy is the infrastructure that never breaks.</p>\n<h2>Your database is your message queue</h2>\n<p>NanoClaw uses JSON files on the filesystem for inter-process communication.\nAtomic rename, directory-based identity, simple polling to pick up new messages.\nNo Redis. No message broker.</p>\n<p>That specific approach won't scale to a multi-tenant SaaS — but the <em>instinct</em>\nbehind it absolutely does. The instinct is: use the infrastructure you already\nhave.</p>\n<p>For most web apps, that means Postgres. 
The\n<a href=\"https://dagster.io/blog/skip-kafka-use-postgres-message-queue\">Postgres-as-queue movement</a>\nhas been gaining serious traction, and tools like\n<a href=\"https://github.com/pgmq/pgmq\">PGMQ</a> make it practical. You get ACID guarantees,\nyou don't need to manage another service, and your queue is backed by the same\ndatabase you're already monitoring and backing up.</p>\n<p>NanoClaw's\n<a href=\"https://dev.to/constanta/crash-safe-json-at-scale-atomic-writes-recovery-without-a-db-3aic\">atomic write pattern</a>\n— write to a temp file, rename into place — maps to <code>INSERT INTO queue_table</code>\nfollowed by a <code>SELECT ... FOR UPDATE SKIP LOCKED</code> consumer. Same principle: the\nmessage either exists completely or doesn't exist at all. No partial state.</p>\n<p>The &quot;just add Redis&quot; reflex is strong in our industry. Sometimes it's the right\ncall. But I've seen plenty of teams introduce a message broker for a workload\nthat Postgres could've handled without breaking a sweat — and then spend the\nnext six months debugging consumer lag and dead letter queues.</p>\n<h2>The real pattern</h2>\n<p>The specific techniques matter less than the discipline behind them.</p>\n<p>NanoClaw's developer looked at a 500,000-line framework and asked: what are my\n<em>actual</em> constraints? Single user. Local machine. One AI provider. And then\nbuilt exactly the architecture those constraints required — nothing more.</p>\n<p>Most teams don't do this. They build for imaginary scale, imaginary\nmulti-tenancy requirements, imaginary traffic spikes. They reach for Kubernetes\nbefore they've outgrown a single server. They deploy event buses before they've\noutgrown a polling loop. They write complex authorisation middleware before\nthey've considered whether infrastructure isolation would eliminate the problem\nentirely.</p>\n<p>The pattern worth stealing isn't the credential proxy or the polling loop or\nPostgres-as-queue. 
It's the habit of understanding your constraints first and\nletting them delete complexity from your architecture.</p>\n<p>Hardest pattern to adopt, though. Because it means admitting you're smaller than\nyou think.</p>\n","date_published":"Sun, 05 Apr 2026 00:00:00 GMT","image":"https://jonno.nz/og/stealing-nanoclaw-patterns-for-webapps-and-saas.png"}]}