So here's a fun story about the time I managed to completely kneecap my own Personal Data Server (...yesterday) through what can only be described as catastrophically terrible coding decisions made at 01:00. You'd think having run your own infrastructure before would teach you to be careful. You'd be wrong!
The Setup
I had just migrated from altq.net to tophhie.social – both part of the whole AT Protocol ecosystem that Bluesky (and this blog!) sits on top of. For those not familiar, think of it as hosting your own slice of the social network: your posts, your data, your rules. It's brilliant in theory. In practice, it means you're responsible when things go spectacularly wrong.
I'd also recently been getting into Teal – this decentralised music scrobbling service that lives on ATProto. Think Last.fm, but distributed and living alongside your social posts. I'd been using Last.fm for ages (we're talking 16,000+ scrobbles here, across nearly a year), and naturally, I wanted to import all that history into Teal. Because why wouldn't you?
The thing is, Teal uses the same ATProto infrastructure. Records for each song you've listened to get written to your PDS as individual entries under fm.teal.alpha.feed.play (alpha since it's early software). Simple enough, right? Just fetch your Last.fm history via their API or an exporting tool, convert it to the right format, and push it to your PDS.
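To give a sense of the shape of it, here's a minimal sketch. The XRPC call is the standard com.atproto.repo.createRecord procedure, but the play record's field names below are my own illustrative guesses, not the canonical Teal lexicon – check Teal's repo for the real schema.

```typescript
// Sketch: writing a single play record to a PDS via XRPC.
// NOTE: the record fields are illustrative guesses, not the
// canonical fm.teal.alpha.feed.play lexicon.

interface LastfmScrobble {
  artist: string;
  track: string;
  playedAt: Date; // when the scrobble happened
}

async function writePlay(
  pdsUrl: string,    // e.g. "https://tophhie.social"
  accessJwt: string, // session token from com.atproto.server.createSession
  did: string,       // your account's DID
  scrobble: LastfmScrobble,
): Promise<void> {
  const res = await fetch(`${pdsUrl}/xrpc/com.atproto.repo.createRecord`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${accessJwt}`,
    },
    body: JSON.stringify({
      repo: did,
      collection: "fm.teal.alpha.feed.play",
      record: {
        $type: "fm.teal.alpha.feed.play",
        trackName: scrobble.track,                   // assumed field name
        artistName: scrobble.artist,                 // assumed field name
        playedTime: scrobble.playedAt.toISOString(), // assumed field name
      },
    }),
  });
  if (!res.ok) throw new Error(`createRecord failed: ${res.status}`);
}
```

One of these per scrobble. Sixteen thousand scrobbles. You can see where this is going.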
Where It All Went Wrong
I wrote a script. In JavaScript. At one in the morning. These three facts should immediately raise alarm bells.
The logic seemed sound: paginate through Last.fm's export, batch the results, convert each scrobble to the Lexicon format, and fire them off to my PDS. I even added a progress indicator! Look at me being all professional and everything.
What I didn't do – and this is the crucial bit – was implement any form of rate limiting or throttling on my end, let alone anything that respected the documented rate limits.
Now, here's where it gets interesting. My PDS isn't just serving me – it's connected to the wider ATProto network through the AppView, which is how posts and records get propagated across the network. When you write a record to your PDS, it notifies the AppView, which indexes it and makes it available to everyone else.
I was essentially firehosing roughly 250 records per minute – about four every second – directly at my PDS, which was dutifully trying to process each one, notify the AppView, and handle all the associated indexing. For sixteen thousand records.
Break things first and ask questions later, on a Friday, no less.
The Consequences
Within about ten minutes, things started to get weird. My PDS became... sluggish. Then unresponsive. Then completely dead.
But here's the really embarrassing part: I didn't just DoS myself. Because my PDS was trying to notify the AppView about every single record, and because I was overwhelming it with requests, I effectively created a feedback loop that impacted anyone on the same PDS trying to use a Bluesky client.
We're talking roughly 16 hours of degraded service. Sixteen. Hours.
I only realised something was wrong when I noticed I'd effectively been "shadowbanned". The Bluesky app would show that I had created a record (a post, a like, a repost, et cetera), but then the record would silently vanish from the AppView after a refresh.
This was all happening whilst I was at college; I only noticed at around 09:30, seven and a half hours after I had run the importer and migrated my records. I was incredibly anxious, thinking that I had royally fucked up.
I even emailed both Bluesky support directly and Chris Greenacre, my PDS administrator, apologising profusely because I felt horrible about the accidental attack. I had only just migrated from my previous PDS and had made a terrible first impression – like some Viking who ransacked an Englishman's home and then decided to stay.
Chris, if you're reading this, thank you so much for being so forgiving about the headache.
What I Should Have Done
Looking back (and this is the bit where I try to salvage some technical credibility), there were at least four obvious things I should have done:
Rate limiting, obviously. A simple delay between batches would have been enough. Hell, even a sleep between individual records would have helped. But no, I just let it rip at full throttle, four records a second. (There's a sketch of the sensible version after this list.)
Batch validation. I should have tested with a small batch first – maybe 100 records – to see how the PDS handled it. You know, like a sensible person who values uptime.
Time of day considerations. Running this at 1am (my time) seemed clever because "fewer people online means less impact", except a) it's a global network, and b) I fell asleep mid-import and didn't notice things going wrong until the morning.
Actually reading the documentation. Turns out there are recommended batch sizes and timing considerations in the Bluesky docs. Who knew? (Everyone. Everyone knew. Except me, apparently.)
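For the record, here's roughly what the sensible version looks like. This is a sketch, not my actual importer, and the batch size and delay are numbers I've made up for illustration rather than anything from the Bluesky docs – go read those for the real limits.

```typescript
// Sketch of a throttled importer: small batches, sequential writes,
// and a deliberate pause between batches. The numbers are made up –
// check the ATProto/Bluesky docs for actual recommended limits.

const BATCH_SIZE = 50;        // records per batch (assumption)
const BATCH_DELAY_MS = 5_000; // pause between batches (assumption)

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function importAll<T>(
  items: T[],
  writeOne: (item: T) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < items.length; i += BATCH_SIZE) {
    const batch = items.slice(i, i + BATCH_SIZE);
    for (const item of batch) {
      await writeOne(item); // sequential, not Promise.all – be kind to the PDS
    }
    const done = Math.min(i + BATCH_SIZE, items.length);
    console.log(`Imported ${done}/${items.length}`);
    await sleep(BATCH_DELAY_MS); // the line that 01:00 me never wrote
  }
}
```

And run it against the first hundred records before committing to all sixteen thousand – something like `await importAll(scrobbles.slice(0, 100), writeOne)` – while you watch the PDS to see how it copes.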
I am literally doing Cybersecurity in my final Level 3 IT exam this coming January. This is basic shite that I forgot.
The Fallout
I did end up getting the scrobble history ported to Teal, but I immediately knew I needed to do a proper rewrite. So I did.
I also published it on Tangled alongside GitHub for anyone else who wants to import their Last.fm history without accidentally recreating a textbook DoS scenario. The TypeScript version is much better. Promise.
The wider lesson here is about running your own infrastructure. When you're on someone else's platform (e.g., Twitter), the worst you can do is post something embarrassing or hit some API rate limit that temporarily locks you out. When you're running your own PDS (or being hosted by a volunteer who just wanted a cosy corner of the internet) and writing directly to the protocol layer, you can apparently take down chunks of the network through sheer incompetence!
It's empowering! And terrifying! Mostly terrifying!
Thank Fuck for Fallbacks
One small mercy in all of this: I'd implemented fallbacks on my website that detect when the AppView is struggling and switch to pulling data directly from my PDS instead. So whilst Bluesky clients were struggling, at least my personal site kept working. Small victories, I suppose.
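The pattern is dead simple: try the AppView first, and if it errors out, read the raw records straight off the PDS. A rough sketch of the idea – the two endpoints are the standard app.bsky.feed.getAuthorFeed and com.atproto.repo.listRecords calls, but the hostnames and DID below are placeholders, and my actual site's code looks different:

```typescript
// Sketch of the AppView-with-PDS-fallback pattern.
// Hostnames and the DID are placeholders.

const APPVIEW = "https://public.api.bsky.app"; // Bluesky's public AppView
const MY_PDS = "https://tophhie.social";       // the PDS holding my repo
const MY_DID = "did:plc:example";              // placeholder DID

async function getRecentPosts(): Promise<unknown[]> {
  try {
    // Preferred path: the AppView serves hydrated, indexed posts.
    const res = await fetch(
      `${APPVIEW}/xrpc/app.bsky.feed.getAuthorFeed?actor=${MY_DID}&limit=10`,
    );
    if (!res.ok) throw new Error(`AppView returned ${res.status}`);
    const data = await res.json();
    return data.feed;
  } catch {
    // Fallback: raw records straight from the PDS. No like counts or
    // other hydrated metadata, but it works when the AppView is unhappy.
    const res = await fetch(
      `${MY_PDS}/xrpc/com.atproto.repo.listRecords?repo=${MY_DID}` +
        `&collection=app.bsky.feed.post&limit=10`,
    );
    const data = await res.json();
    return data.records;
  }
}
```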
Reflections
There's something almost poetic about the decentralised web making it easier to accidentally DoS yourself. In the old centralised model, you'd hit a rate limit and get a 429 error. Done. Service protected. But when you're running the service and the client, and you've got root access to both... well, you can do a lot of damage very quickly.
The ATProto dev community have been incredibly gracious about the whole thing (probably because I'm far from the first person to do something like this), and Chris was an absolute legend about getting everything stabilised again. The community around this stuff is genuinely wonderful, which makes it slightly more mortifying when you're the dumbarse who broke things.
The Takeaway
If you're going to run your own infrastructure – whether it's a PDS, a home server, or literally anything else – for the love of god, implement rate limiting. And monitoring. And test with small batches first. And maybe don't do it at one in the fucking morning when you're clearly not thinking straight.
Also, if you're importing data into ATProto and you've got thousands of records to migrate: pagination is your friend, sleep statements are your friend, and TypeScript's type system will save you from yourself more often than you'd think.
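On that last point, even a toy type goes a long way. A sketch, again using my guessed field names rather than the real lexicon:

```typescript
// Toy example: give the record shape a type and the compiler complains
// before the PDS ever sees a malformed record. Field names are
// illustrative guesses, not the canonical Teal lexicon.

interface PlayRecord {
  $type: "fm.teal.alpha.feed.play";
  trackName: string;
  artistName: string;
  playedTime: string; // ISO 8601 datetime
}

const play: PlayRecord = {
  $type: "fm.teal.alpha.feed.play",
  trackName: "Everything in Its Right Place",
  artistName: "Radiohead",
  playedTime: new Date().toISOString(),
  // playedTime: Date.now(), // <- a number here would be a compile error
};
```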
I've learned my lesson. Probably. At least until the next time I get a bright idea at 01:00 and decide it definitely can't wait until morning.
The music widget on my website works beautifully now, though. So there's that.