I've been meaning to write about this for a bit now, and given that people are actually using it (which is still slightly surreal to me), it feels like the right time to properly document what this thing is, why it exists, and why you might want to liberate your listening history from the proprietary platforms that currently hold it hostage.
So, Malachite. Let me tell you about it – and about the weird journey of building a tool that went from "weekend hack to escape Last.fm's walled garden" to something people genuinely rely on.
This is going to be long. I'm not going to apologize for that. If you're the kind of person who's interested in a command-line tool for importing music scrobbles to a decentralized protocol, you're probably also the kind of person who appreciates context, technical detail, and the occasional tangent about data ownership philosophy. If you're not, well, there's a back button somewhere on your browser.
Still here? Good. Let's begin.
What It Actually Does (The Basics)
Malachite is a command-line tool that takes your Last.fm and Spotify listening history and imports it to the AT Protocol network, specifically using the fm.teal.alpha.feed.play lexicon. If you've been on Bluesky and seen Teal feeds showing what people are listening to, that's what this plugs into.
Teal hasn't even officially launched yet, by the way. We're all early adopters here, building tools for a platform that's still in development. Which is either admirably forward-thinking or slightly mad, depending on your perspective. I lean toward the former, but I'm biased.
The whole thing started because I wanted to escape Last.fm's walled garden. I'd been collecting listening data since 2021 (we're talking 88,000+ tracks here), mostly through Spotify, with Last.fm added more recently as a backup. But the thought of all that data being locked away in their databases, behind their APIs, subject to their business decisions, felt increasingly wrong. Like having a library and being told you can only read the books through their approved reading glasses.
I still use Last.fm as an ultimate backup – I only started with it properly around January 2025, actually – because it's useful to have a redundant copy somewhere. And I resubmit that data to Teal on occasion, because Piper (Teal's official submission client) isn't perfect by any means. Which is fine – it's early days – but it meant there was room for something more robust.
Your listening history is a record of your life – the soundtrack to your days, the background music to your worst and best moments. It deserves better than to languish in some company's proprietary database, accessible only on their terms.
The Problem Space
Here's the thing about music tracking services: they're brilliant until they're not. Last.fm has been around since 2002, and for the better part of two decades, it's been the de facto standard for tracking what you listen to. But it's also a centralized service run by a company that's been sold multiple times, had various ownership structures, and could theoretically shut down tomorrow. Remember when they killed the radio feature? Remember when they went through that period where nobody was quite sure who owned them or what the business model was?
Spotify's even worse from a data ownership perspective. They'll give you your extended streaming history if you ask nicely (and wait 30 days for them to generate it), but it's a gift, not a right. There's no API for accessing your own listening history. No continuous export. Just periodic data dumps that you have to manually request. And they can change the terms of that at any time.
The fundamental problem is this: your listening history is valuable data. It's valuable to advertisers, to recommendation algorithms, to music industry analytics. But it's also valuable to you, personally, as a record of your own life. And yet you don't really control it. You're a tenant in someone else's database.
That's what Malachite is trying to fix. Not by replacing these services – I'm not delusional enough to think a command-line tool can compete with Spotify or Last.fm – but by giving you a way to own a copy of your data on infrastructure you control.
The Name (Because Yes, It Matters)
Right, so let me address this straight away because I know someone's going to ask: why "Malachite"?
It used to be called atproto-lastfm-importer. Functional? Absolutely. Inspiring? Not even slightly. That name told you what it did and nothing about why it mattered, and frankly it sounded like something I'd knocked together on a Sunday afternoon and forgotten about by Tuesday. Generic as fuck.
Names matter more than you'd think. A good name makes people remember your project. A bad name makes them gloss over it in search results. atproto-lastfm-importer was SEO-optimized accidentally, sure, but it had all the personality of a database schema. Nobody's going to tell their friends "oh yeah, you should check out atproto-lastfm-importer". But "Malachite"? That's a name that sticks.
Malachite is a green copper carbonate mineral associated with preservation and transformation. That's exactly what this tool does – it preserves your scrobbles and transforms them into proper fm.teal.alpha.feed.play records on the AT Protocol. The colour match isn't an accident, either. Malachite sits squarely in the teal/green range, a deliberate nod to the teal lexicon it publishes to.
There's also something satisfying about naming a tool after a mineral. Minerals are old, stable, enduring. They've been around longer than any tech company and will be around long after. Feels appropriate for a tool that's about preserving your data for the long term, not just making it accessible for the duration of some startup's funding runway.
Plus, it sounds significantly cooler than "yet another import script". Names matter. If you're going to build something people might actually use, at least give it a name that doesn't make you cringe when you say it aloud. You'll be typing it into terminal commands and GitHub URLs for months or years. Make it something you don't hate looking at.
The Journey (November 2025 to Now)
I started this project back in November 2025. The first commit was pretty bare-bones – just the skeleton of an idea, really. "Let's get Last.fm data into ATProto" was about as sophisticated as the planning got. No roadmap, no grand vision, just a developer with too much coffee and a personal itch to scratch.
The initial version was... well, it was a mess. Written in JavaScript, monolithically crammed into one file, with absolutely no concept of rate limiting or error handling. It worked, technically. But it was catastrophically slow and had a rather memorable failure mode: I accidentally DoS'd my own PDS.
Well, technically it was a Bluesky AppView rate limit that lasted a day, not a proper denial of service attack. But since Bluesky is the AT Protocol app – the one everyone actually uses – it may as well have been a DoS. My PDS was effectively unusable for 24 hours because the AppView had rate limited it after I tried to import too many records too quickly. Not ideal. Not something you'd want to admit to in polite company. Definitely not something you'd want to inflict on other people.
That incident was instructive, though. Taught me very quickly that "just blast records at the API as fast as possible" is not a sustainable import strategy. Hence the complete rewrite in TypeScript with proper rate limiting, batch management, and all the safety features that now prevent you from accidentally taking down your entire PDS.
The Iterations
Over the next couple of months, I kept iterating. Added Spotify support, because why not? If I'm going to do this, might as well do it properly.
I initially tried getting clever with CAR record injection and reimporting – the theory being that if you could just inject records directly into the repository, you could bypass the API entirely. Elegant solution, right? Except it completely failed. Turns out there are very good reasons why the API exists and why you shouldn't try to circumvent it with clever hacks. After banging my head against that wall for a while, I gave up and went with the sensible approach.
Implemented proper batch operations using com.atproto.repo.applyWrites instead of individual calls, which made things about 20 times faster. Turns out publishing 200 records in one API call is significantly more efficient than making 200 separate calls. Who knew? (Everyone. Everyone knew. But I had to learn it the hard way.)
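The batching itself is simple enough to sketch. This is illustrative rather than Malachite's actual source – the record contents and the commented-out `agent` call are placeholders – but the `#create` write shape and the 200-writes-per-call ceiling come from the protocol:

```typescript
// Sketch of batching record creates for com.atproto.repo.applyWrites.
type PlayRecord = { trackName: string; playedTime: string };

// Turn rkey→record pairs into applyWrites#create operations.
function toWrites(collection: string, records: Map<string, PlayRecord>) {
  return [...records.entries()].map(([rkey, value]) => ({
    $type: 'com.atproto.repo.applyWrites#create' as const,
    collection,
    rkey,
    value,
  }));
}

// applyWrites accepts up to 200 writes per call, so chunk accordingly.
function chunk<T>(items: T[], size = 200): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Usage (network call elided):
// for (const batch of chunk(toWrites('fm.teal.alpha.feed.play', records))) {
//   await agent.com.atproto.repo.applyWrites({ repo: did, writes: batch });
// }
```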
Built in rate limiting that actually respects the fact that your PDS has other users who'd quite like their instance to keep working, thank you very much. Bluesky's AppView has limits on PDS instances – exceed 10,000 records per day and you can get your entire PDS rate limited, affecting everyone else on it. That's the kind of thing that gets you angry messages from strangers. The tool now enforces a 7,500 records-per-day limit with proper scheduling and everything.
The Duplicate Problem
Then came the duplicate prevention system, which turned out to be more complex than I initially thought. You've got two layers here: first, deduplicate the input file itself (because sometimes your Last.fm export has the same track listed twice, for reasons that Last.fm itself probably couldn't explain), and second, check against what's already in Teal so you don't import the same scrobble twice if you re-run the tool with an updated export.
The first layer is straightforward enough. You read the CSV or JSON, build a map of unique plays based on track name, artist, and timestamp, and filter out duplicates before you even think about touching the API. Simple string comparison, maybe some normalization to catch edge cases where the same track is listed as "Song Name" versus "Song Name " (with a trailing space), that sort of thing.
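A minimal sketch of that first pass – illustrative, not the actual Malachite code:

```typescript
interface Play { track: string; artist: string; timestamp: number }

// Collapse exact duplicates in the export file: same track, same artist,
// same timestamp, after trimming whitespace and ignoring case.
function dedupeInput(plays: Play[]): Play[] {
  const seen = new Map<string, Play>();
  for (const p of plays) {
    const key = [
      p.track.trim().toLowerCase(),
      p.artist.trim().toLowerCase(),
      p.timestamp,
    ].join('\u0000'); // separator that can't appear in the fields
    if (!seen.has(key)) seen.set(key, p); // keep the first occurrence
  }
  return [...seen.values()];
}
```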
The second layer – checking against existing Teal records – that's where it gets interesting from a performance standpoint.
Initially, I just fetched all your existing records in one go. Seemed reasonable. Fire off a request to com.atproto.repo.listRecords, iterate through with cursor-based pagination, build up a Set of existing record keys, done. Works great if you have a few hundred scrobbles. Absolutely murders your import time if you've been tracking your music for a decade.
See, the AT Protocol's listRecords endpoint returns records in batches. You can request up to 100 at a time (the protocol maximum). If you've got 88,000 existing records, that's 880 sequential API calls just to build your list of what you already have. At a conservative 2 seconds per request (accounting for network latency and server processing), that's 1,760 seconds – almost 30 minutes – just to check for duplicates before you even start importing anything.
That's not acceptable. That's the kind of UX that makes people give up and go do something else.
So I implemented adaptive batch sizing. Start small – 25 records per batch – and measure how long each request takes. If the network is fast and the responses come back quickly, increase the batch size. If things slow down, decrease it. The sweet spot tends to be around 50-100 records per batch depending on your network speed and the server's current load, but it varies. Better to let the code figure it out dynamically than to hardcode some arbitrary number and hope it works for everyone.
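The adjustment logic itself is simple enough to sketch – the thresholds and growth factors here are illustrative, not Malachite's exact numbers:

```typescript
// Adjust the next page size based on how long the last request took.
function nextBatchSize(current: number, lastRequestMs: number): number {
  const MIN = 25, MAX = 100; // protocol caps listRecords at 100

  if (lastRequestMs < 800) {
    // Fast response: grow towards the cap.
    return Math.min(MAX, Math.round(current * 1.5));
  }
  if (lastRequestMs > 2500) {
    // Slow response: back off.
    return Math.max(MIN, Math.round(current / 2));
  }
  return current; // in the sweet spot: hold steady
}
```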
The result: went from taking several minutes to fetch existing records to doing it in under a minute with decent internet. Still not instant – physics exists, latency is real – but fast enough that it doesn't feel like the tool has frozen. You see progress messages ticking by, you see the batch size adjusting, you see the throughput numbers. It feels alive. That matters.
There's also the question of what constitutes a "duplicate". Same track, same artist, same timestamp down to the second? That's easy. But what about tracks that are off by a few seconds due to timestamp rounding differences between services? What about tracks where one service lists the artist as "Artist Name" and another as "Artist Name feat. Other Artist"? What about live versions versus studio versions?
I settled on a pragmatic approach: tracks are duplicates if they have the same normalized track name, the same normalized primary artist, and timestamps within five minutes of each other. Five minutes is long enough to catch different services reporting the same listen at slightly different times, but short enough that you won't accidentally merge two genuinely separate listens of the same song.
The normalization is important. Everything gets lowercased, punctuation gets stripped, extra whitespace gets collapsed. "Don't Stop Believin'" and "dont stop believin" become the same thing. "Artist Name feat. Someone Else" and "Artist Name" match on the primary artist. It's not perfect – nothing involving fuzzy string matching ever is – but it catches the vast majority of real-world duplicates without creating false positives.
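In code, the idea looks something like this – a sketch, with an illustrative delimiter list rather than Malachite's actual one:

```typescript
// Normalise a track/artist pair into a comparison key: lowercase,
// strip punctuation, collapse whitespace, keep only the primary
// artist before any feature credit.
function normalizeKey(track: string, artist: string): string {
  const clean = (s: string) =>
    s.toLowerCase()
      .replace(/[^\p{L}\p{N}\s]/gu, '') // drop punctuation, keep letters/digits
      .replace(/\s+/g, ' ')
      .trim();
  const primary = artist.split(/\s+(?:feat\.?|ft\.?|featuring|with|&)\s+/i)[0];
  return `${clean(track)}|${clean(primary)}`;
}

// Two plays match if their keys are equal and their timestamps
// (milliseconds here) fall within the five-minute window.
const withinWindow = (a: number, b: number) => Math.abs(a - b) <= 5 * 60 * 1000;
```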
Making It Actually Usable
I also added this whole interactive mode thing, which happened because I realised that asking someone to remember a bunch of command-line flags is a bit much, especially when they're just trying to import their music history. I've been using command-line tools for years and even I forget what -i and -h stand for half the time.
Now you can just run it without arguments and it walks you through the process. Do you want to import Last.fm? Spotify? Both? Should we do a dry run first? That sort of thing. It's the kind of polish that doesn't sound impressive when you describe it, but makes the difference between a tool people actually use and a tool people give up on after five minutes of reading documentation.
Combined Import (The Clever Bit)
The most recent big addition was combined import mode. Because here's the thing: some people (like me) have been using Last.fm for years but also have their Spotify extended streaming history. And those exports overlap. Significantly. You don't want duplicates, but you also don't want to lose data from either source.
This was harder than it sounds.
The obvious approach is to just concatenate both exports, sort by timestamp, and run the standard duplicate detection. But that doesn't work well in practice because the two services have fundamentally different data models and formatting conventions.
Last.fm gives you track name, artist name (with optional MusicBrainz IDs if you're lucky), album name, and a Unix timestamp. The MusicBrainz IDs are gold – they're universal identifiers that let you connect your listening history to the broader music metadata ecosystem. When Last.fm has them, you want to preserve them.
Spotify gives you track name, artist name (no MusicBrainz IDs, because of course not – they have their own proprietary IDs that are useless outside Spotify), album name, and an ISO 8601 timestamp. The timestamps are more precise than Last.fm's (millisecond precision versus second precision), and the metadata is sometimes more complete, but you lose the MusicBrainz linking.
And then there's the formatting. Spotify tends to format artist names with features inline: "Artist Name feat. Featured Artist". Last.fm is more variable – sometimes it's "feat.", sometimes it's "ft.", sometimes the featured artist is in parentheses, sometimes it's not there at all. Track names have similar variability. "Song Name (Live)" versus "Song Name - Live" versus just "Song Name" if the live designation got dropped somewhere.
So combined mode does this:
1. Parse both exports completely.
Load everything into memory (this is fine – even 100,000 tracks is only about 50MB of data) and convert it to a normalized internal representation.
2. Normalize everything for comparison.
Lowercase, strip punctuation, collapse whitespace, split artist names on common delimiters (feat., ft., featuring, with, &). Build a normalized "key" for each track that can be compared across services.
3. Sort by timestamp.
Chronological order, oldest first.
4. Walk through and merge.
For each track, check if there's already a track in the output with the same normalized key and a timestamp within 5 minutes. If yes, this is a duplicate – pick the better version and discard the other. If no, add it to the output.
5. Choose the better version intelligently.
Prefer Last.fm if it has MusicBrainz IDs (because that metadata is valuable). Otherwise, prefer Spotify (because the metadata quality is generally better, and the timestamps are more precise). Keep track of these decisions for statistics.
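Put together, the merge walk looks roughly like this – a sketch under the rules above, with illustrative field names, not the actual source:

```typescript
interface SourcePlay {
  source: 'lastfm' | 'spotify';
  key: string;       // normalized "track|artist" key
  timestamp: number; // ms since epoch
  hasMbid: boolean;  // MusicBrainz IDs present (Last.fm only)
}

const WINDOW_MS = 5 * 60 * 1000;

// Prefer Last.fm when it carries MusicBrainz IDs; otherwise Spotify.
function pickBetter(a: SourcePlay, b: SourcePlay): SourcePlay {
  if (a.source === 'lastfm' && a.hasMbid) return a;
  if (b.source === 'lastfm' && b.hasMbid) return b;
  return a.source === 'spotify' ? a : b;
}

function mergeTimelines(plays: SourcePlay[]): SourcePlay[] {
  const sorted = [...plays].sort((x, y) => x.timestamp - y.timestamp);
  const out: SourcePlay[] = [];
  for (const p of sorted) {
    // A full scan is O(n²) but keeps the sketch simple; since input is
    // chronological, only the recent tail actually needs checking.
    const dupIdx = out.findIndex(
      (q) => q.key === p.key && Math.abs(q.timestamp - p.timestamp) <= WINDOW_MS
    );
    if (dupIdx === -1) out.push(p);
    else out[dupIdx] = pickBetter(out[dupIdx], p);
  }
  return out;
}
```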
The result is a single chronological timeline with duplicates removed, taking the best bits from each source. The statistics it spits out are satisfying: "Found 15,234 Last.fm records and 8,567 Spotify records, merged to 16,959 unique plays after removing 6,842 duplicates." You can see exactly what happened.
It's one of those features that sounds simple until you actually try to implement it, at which point you discover edge cases you never imagined existed. Did you know that Spotify and Last.fm format artist names differently? Did you know that timestamp precision varies? Did you know that people listen to the same song multiple times in a row and you need to be very careful about what constitutes a "duplicate"?
I know all of these things now. Whether I wanted to or not.
The other thing that makes combined mode tricky is that you're essentially building a temporary database in memory, doing joins and deduplication, and then outputting a result set. It's not complex from a computer science perspective – this is freshman-year data structures stuff – but getting the details right so that it actually produces correct results on real-world messy data is surprisingly fiddly.
I tested it extensively on my own overlapping exports. Imported just Last.fm, recorded how many tracks. Imported just Spotify, recorded how many tracks. Imported combined, checked that the total was less than the sum (meaning duplicates were actually being removed), spot-checked a bunch of individual records to verify they had the right data from the right source. It works. But it took more iterations than I'd like to admit to get there.
The UX Details (Because They Actually Matter)
One thing that surprised me was how much the UX details mattered. Colour-coded output, progress bars, clear error messages, spinner animations whilst it's fetching data – all those little polish bits make the difference between a tool that works and a tool that people actually enjoy using.
Even if "enjoy" feels like a weird word for a command-line import script.
Green checkmarks for success, red X marks for errors, cyan arrows for progress updates, yellow warnings for things that aren't broken but might need attention. It sounds trivial, but when you're staring at terminal output for 20 minutes whilst your import runs, that visual clarity matters. You want to know at a glance whether everything's fine or whether you need to intervene.
The progress bars show you current batch, total records, estimated time remaining, and real-time throughput. The fetch operations show you how fast they're going and adjust batch sizes accordingly. The error messages actually tell you what went wrong and suggest how to fix it, rather than just dumping a stack trace and hoping you can figure it out.
This is the kind of polish that doesn't appear in feature lists, but it's what separates "technically functional" from "actually pleasant to use". And life's too short for software that's merely functional.
Architecture and Design Decisions
Since we're already this deep into the technical weeds, let's talk about how Malachite is actually structured. Because the architecture decisions matter – they're what make the difference between a tool that works once and a tool that keeps working.
The TypeScript Rewrite
After the disastrous initial JavaScript version that DoS'd my PDS, I rewrote everything in TypeScript. Not because I'm a type safety zealot (though types are nice), but because working with the ATProto SDK without them would be absolutely maddening.
The ATProto SDK returns deeply nested response objects with lots of optional fields and union types. Without TypeScript, you're either writing defensive checks everywhere ("if response.data exists and response.data.records exists and response.data.records[0] exists...") or you're living dangerously and hoping nothing blows up at runtime. TypeScript lets you express these constraints in the type system and get compile-time errors if you mess up. Much better.
The rewrite also let me properly structure the codebase. Instead of one monolithic file, everything's split into logical modules:
lib/ - Core functionality (auth, CSV parsing, publishing, merging, sync)
utils/ - Reusable utilities (logging, UI elements, rate limiting, TID generation, credential storage)
types.ts - TypeScript type definitions for everything
config.ts - Configuration constants
Each module has a single responsibility. The CSV parser doesn't know about authentication. The publisher doesn't know about file parsing. The rate limiter is its own subsystem with its own logic. This makes testing easier (you can test each piece independently), makes the code more maintainable (changes to one part don't ripple through everything), and makes it easier to add new features without breaking existing ones.
The TID Generation System
TIDs (Timestamp Identifiers) are ATProto's primary key system. They're 13-character base-32 encoded strings that embed a timestamp, allowing records to naturally sort chronologically. Getting TID generation right was critical because if you screw it up, your records end up out of order or you get collisions.
The naive approach is to just convert each track's timestamp to a TID. But this has problems:
1. Collision risk.
If two tracks have the same timestamp (down to the millisecond), they'll generate the same TID. That's a collision. One of them won't import.
2. Non-monotonic TIDs.
If you're processing tracks out of order (say, because you're batching them), you might generate TIDs that don't sort correctly.
Malachite solves this with a TID clock that guarantees monotonicity. Even if you feed it timestamps out of order, it ensures that each generated TID is strictly greater than the previous one. It maintains state across calls, detects when you're trying to generate a TID that would be too old, and adjusts accordingly. The state is persisted to disk, so even if you stop and restart the import, the monotonicity guarantee holds.
This is one of those things that's invisible when it works correctly and catastrophic when it doesn't. Getting it right meant diving deep into the ATProto TID spec, understanding the bit layout, figuring out how to maintain order guarantees, and testing it extensively with edge cases.
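For the curious, a stripped-down version of the idea. The sortable base-32 charset and the microseconds-plus-10-bit-clock-id layout follow the ATProto TID spec; the disk persistence is omitted, and this is a sketch rather than Malachite's actual implementation:

```typescript
const B32_SORTABLE = '234567abcdefghijklmnopqrstuvwxyz';

// Pack a microsecond timestamp and a 10-bit clock id into
// 13 base32-sortable characters, per the ATProto TID layout.
function encodeTid(timestampUs: bigint, clockId: bigint): string {
  let n = (timestampUs << 10n) | (clockId & 0x3ffn);
  let s = '';
  for (let i = 0; i < 13; i++) {
    s = B32_SORTABLE[Number(n & 31n)] + s;
    n >>= 5n;
  }
  return s;
}

// Guarantees each TID is strictly greater than the last, even if the
// input timestamps arrive out of order. (Persisting `last` to disk so
// the guarantee survives restarts is omitted here.)
class TidClock {
  private last = 0n;
  constructor(private clockId: bigint = BigInt(Math.floor(Math.random() * 1024))) {}
  next(unixMs: number): string {
    let us = BigInt(unixMs) * 1000n;
    if (us <= this.last) us = this.last + 1n; // enforce strict monotonicity
    this.last = us;
    return encodeTid(us, this.clockId);
  }
}
```

Because the charset is sorted and every TID is the same length, numeric order and lexicographic order coincide – which is exactly what makes the records sort chronologically.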
The Rate Limiting Subsystem
Rate limiting deserved its own subsystem because it's complicated and critical. Get it wrong and you can take down your PDS for everyone. Get it right and imports just work without anyone thinking about it.
The system works like this:
1. Calculate daily limit.
Take the PDS limit (10,000 records/day), apply safety margin (75% by default = 7,500 records/day).
2. Determine if multi-day import is needed.
If your import exceeds the daily limit, calculate how many days you'll need.
3. Calculate optimal batch parameters.
Given the daily limit and the need to spread records evenly throughout the day, calculate batch size and delay between batches.
4. Adjust dynamically during import.
Monitor success/failure rates. Speed up if things are going well (consecutive successes). Slow down and reduce batch size if you hit rate limits (exponential backoff).
5. Handle multi-day pauses gracefully.
If the import spans multiple days, pause for 24 hours between days, preserve state so it can resume.
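The planning arithmetic from steps 1–3 sketches out like this, using the numbers from above – the 100-record batch size and the 20-hour spread are illustrative defaults, not Malachite's exact values:

```typescript
const PDS_DAILY_LIMIT = 10_000; // AppView limit per PDS per day
const SAFETY_MARGIN = 0.75;     // stay well clear of it

function planImport(totalRecords: number, batchSize = 100) {
  const dailyLimit = Math.floor(PDS_DAILY_LIMIT * SAFETY_MARGIN); // 7,500/day
  const days = Math.ceil(totalRecords / dailyLimit);
  const batchesPerDay = Math.ceil(Math.min(totalRecords, dailyLimit) / batchSize);
  // Spread each day's batches across ~20 hours, leaving slack for backoff.
  const delayMs = Math.floor((20 * 60 * 60 * 1000) / batchesPerDay);
  return { dailyLimit, days, batchesPerDay, delayMs };
}
```

For an 88,000-record import that works out to a twelve-day schedule, which is why the multi-day pause-and-resume handling in step 5 matters.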
The adaptive component is important. The tool starts conservatively (100 records per batch, 2 seconds between batches) and adjusts based on what actually happens. If the server's fast and responsive, it speeds up. If you hit a rate limit, it backs off aggressively. This means the tool is never blindly hammering the API – it's responding to feedback.
The Credential Storage System
Malachite can optionally save your credentials for convenience. This needed to be done carefully because storing passwords, even encrypted, is security-sensitive.
The approach:
1. Machine-specific encryption.
The encryption key is derived from your hostname and username using PBKDF2 with 100,000 iterations. This means the encrypted credentials only work on your machine. Copy the file to another computer and it won't decrypt.
2. AES-256-GCM.
Industry-standard authenticated encryption. Not just encrypted, but authenticated – if someone tampers with the file, decryption fails.
3. Proper file permissions.
On Unix systems, the credentials file is chmod 600 (readable only by owner). On Windows, well, Windows file permissions are their own special circle of hell, but we do what we can.
4. Optional.
You never have to save credentials. You can always just type them in each time. The storage is purely for convenience.
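The core of that scheme sketches out like this in Node – illustrative rather than the actual Malachite source; the salt/IV sizes and the blob layout are choices made for this sketch:

```typescript
import { pbkdf2Sync, randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';
import { hostname, userInfo } from 'node:os';

// Derive a machine-specific key: PBKDF2 over hostname + username,
// 100,000 iterations. Copy the file elsewhere and this key differs.
function deriveKey(salt: Buffer): Buffer {
  const machineSecret = `${hostname()}:${userInfo().username}`;
  return pbkdf2Sync(machineSecret, salt, 100_000, 32, 'sha256');
}

// Blob layout: [16-byte salt | 12-byte IV | 16-byte GCM tag | ciphertext]
function encryptCredentials(plaintext: string): string {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', deriveKey(salt), iv);
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), ct]).toString('base64');
}

function decryptCredentials(blob: string): string {
  const buf = Buffer.from(blob, 'base64');
  const salt = buf.subarray(0, 16);
  const iv = buf.subarray(16, 28);
  const tag = buf.subarray(28, 44);
  const ct = buf.subarray(44);
  const decipher = createDecipheriv('aes-256-gcm', deriveKey(salt), iv);
  decipher.setAuthTag(tag); // GCM: decryption throws if the blob was tampered with
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString('utf8');
}
```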
This is one of those features where you have to think carefully about threat models. The goal isn't to protect against nation-state adversaries – if that's your threat model, you shouldn't be storing credentials locally at all. The goal is to protect against casual compromise (someone steals your laptop) while providing convenience for normal use.
Data Formats and Lexicons
The record format follows the fm.teal.alpha.feed.play lexicon. Each scrobble becomes a play record with track name, artists (as an array, because songs can have multiple artists, obviously), timestamp in ISO 8601 format, and optionally things like MusicBrainz IDs, album names, and origin URLs.
The keys are TID-based (timestamp identifiers), which means your records naturally sort chronologically. This is important if you care about seeing your listening history in the correct order, which most people do. Turns out humans are quite attached to the concept of time flowing in one direction.
MusicBrainz IDs are preserved when available from Last.fm, which is honestly one of the best things about the Last.fm export. Unique identifiers for tracks, albums, and artists mean you can actually connect your listening history to other music databases. Spotify doesn't provide these, which is a shame but not surprising given their closed ecosystem approach.
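To make the shape concrete, here's an approximation in TypeScript. Treat the field names as illustrative – check the lexicon itself for the canonical schema; only the structure (artists as an array, ISO 8601 timestamp, optional MusicBrainz/album/origin fields) is taken from the description above:

```typescript
// Approximate shape of a play record as described above — not the
// canonical fm.teal.alpha.feed.play schema, just a sketch of it.
interface PlayRecord {
  $type: 'fm.teal.alpha.feed.play';
  trackName: string;
  artistNames: string[];  // an array, because songs can have multiple artists
  playedTime: string;     // ISO 8601 timestamp
  trackMbId?: string;     // MusicBrainz IDs, when Last.fm provides them
  artistMbIds?: string[];
  releaseName?: string;   // album
  originUrl?: string;
}

const example: PlayRecord = {
  $type: 'fm.teal.alpha.feed.play',
  trackName: 'Song Name',
  artistNames: ['Artist Name'],
  playedTime: new Date(1700000000000).toISOString(),
};
```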
The Community Bit (Still Slightly Surreal)
Here's the thing that still gets me: people actually use this. Not just me tinkering with it on my own machine, but actual other people importing their actual listening history to Teal.
I made a Last.fm/Spotify-to-TealFM scrobble converter tool and people actually use it. Still feels a bit unreal, if I'm honest.
There's already Piper, which is Teal's official submission client for real-time scrobbling. But Piper's for ongoing tracking – it sits there and logs what you're listening to as you listen to it. Malachite is for the backfill problem: you've got years of historical data and you want it on ATProto. Different use cases, different tools. And frankly, even for ongoing use, having options is good. Piper isn't perfect (what is?), and sometimes you want the nuclear option of just importing a fresh export to make sure everything's there.
It's also led to some contributions. Someone fixed the PDS resolution system. Another person added missing dependencies. Small contributions, but they matter. It's nice when something you built turns out to be useful to others.
The interactive mode came from me realising that command-line flags are a terrible user experience for anyone who isn't already comfortable with CLI tools. I'd been using it with flags for ages (because that's how I work), but it became clear that a menu system would be more accessible. So I built one.
Most of the testing has been dogfooding, to be honest. I've got around 88,000 scrobbles myself – mainly from Spotify – covering the last 4.37 years. That's a decent test case for scale and edge cases. You learn a lot about your tool when you're importing nearly a half-decade's worth of your own listening history and watching it process tens of thousands of records.
The duplicate detection? Tested on my own overlapping Last.fm and Spotify exports. The rate limiting? Necessary because I didn't want to accidentally rate limit my entire PDS whilst importing 88K records. The resume functionality? Built because imports that large take time and I wanted to be able to stop and start without losing progress.
Turns out building for your own needs is a pretty good way to build something that works.
Why I Actually Built This (The Real Answer)
I think what draws me to projects like this – and to ATProto generally – is the sense that you can actually own your data in a meaningful way. Not "own" in the sense that some terms of service document says you technically retain rights to it, but own in the sense that you control it, can move it, can do what you want with it.
Your Last.fm scrobbles aren't really yours in any practical sense. They're in Last.fm's database, behind their API, subject to their terms of service and rate limits. They can change the API whenever they want. They can shut down the service (remember the 2009 apocalypse predictions?). They can decide tomorrow that exporting your data requires a premium subscription. They can pivot their entire business model and leave you with nothing. You have no recourse. Same with Spotify – your extended streaming history is a gift they deign to provide, not a right you can exercise. And they can revoke that gift at any time.
I got tired of that. Tired of building my digital life on someone else's infrastructure, subject to their business model changes and their pivot decisions. Tired of wondering if my scrobbles from 2021 would still be accessible in 2030. Tired of feeling like a tenant in someone else's building.
The Larger Pattern
This isn't just about music scrobbles. It's about a pattern that's played out across the entire social internet over the past two decades.
Remember Google Reader? Millions of people used it daily to keep up with blogs and news. Then Google killed it in 2013 because it didn't fit their strategic priorities. Didn't matter that people relied on it. Didn't matter that there was no good replacement. Google's priorities changed, so the service died. If you'd built your information consumption habits around Reader, tough luck.
Remember when Twitter had a robust API ecosystem and third-party clients were better than the official client? Then Twitter decided third-party clients were cutting into their ad revenue and strangled the API. Apps people had paid for stopped working. Workflows people had built over years broke overnight. Twitter's business model changed, so the ecosystem died.
These aren't hypotheticals. These are things that actually happened. And they happen because when you build on someone else's platform, you're subject to their decisions. They can change the rules. They can raise prices. They can shut down. And you have no recourse except to find a new platform and start over.
The promise of decentralization – the actual promise, not the blockchain buzzword version – is that you don't have to play that game. Your data lives on infrastructure you control. If a service goes away, you still have your data. If a company pivots, you're not stranded. If someone decides to start charging $8/month for a blue checkmark, you can laugh and keep using the protocol with a different client.
The ATProto Angle
ATProto is interesting because it's trying to thread a very specific needle. It's trying to be:
1. Actually decentralized.
Your data lives on your PDS, not on Bluesky's servers (unless you choose Bluesky's PDS). You can move between PDS providers without losing your identity, your followers, or your posts.
2. Actually usable.
Most decentralized systems have UX that makes you want to cry. ATProto, via Bluesky, has UX that's actually pleasant. It feels like Twitter circa 2020. That's not an accident – the team knows that if the UX is bad, nobody will use it no matter how good the protocol is.
3. Actually federated.
Not just in theory, but in practice. Multiple PDS providers exist. Multiple AppViews exist (Bluesky is the big one, but it's not the only one). The protocol is open. The implementations are open source.
This is hard to pull off. Most attempts at decentralized social media fail at one of these. Mastodon is decentralized and federated but the UX is... let's be kind and say "challenging". Blockchain-based systems are theoretically decentralized but practically unusable and often just centralized with extra steps. Corporate platforms are usable but not decentralized at all.
ATProto might actually thread the needle. It's early days – the protocol is still evolving, there are rough edges, not everything works perfectly yet. But the fundamentals are sound, and that matters.
Music Listening History as a Test Case
Music listening history is a good test case for decentralized data ownership because:
1. It's personal.
It's a record of your own behaviour, your own preferences, your own life. There's no reason anyone else should own that data.
2. It's valuable.
Not just to you personally, but to recommendation systems, to the music industry, to advertisers. There's economic value in this data, which is why companies want to own it.
3. It's portable.
Unlike social graphs (which are inherently networked and harder to move), your listening history is just a list of records. It should be trivial to export and import. The fact that it isn't is a policy choice, not a technical limitation.
4. It accumulates over time.
This isn't data you generate once. It's data you generate continuously over years. Losing access to it means losing a substantial chunk of your digital history.
If we can get music listening history right – if we can make it so your scrobbles live on infrastructure you control, can be moved between services, are actually yours in a meaningful sense – then we've proven the concept works. And if it works for music listening history, it can work for other kinds of personal data too.
That's the larger goal. Malachite is just one small piece of it. But every piece matters.
On ATProto, by contrast, that's your data on your PDS. You control it, you can query it, you can build on it, you can move it between servers if you want. And that feels important in a way that's hard to articulate without sounding preachy or like I'm selling something. I'm not selling anything – this is free and open source and always will be. I'm just saying: your data should actually be yours.
It's not just about decentralisation for its own sake (though that matters). It's about the fundamental question of who owns the record of your life. Your listening history is a kind of diary – it captures your moods, your phases, your obsessions, the music that got you through difficult times. That shouldn't be locked away in some company's database, accessible only through their interface, subject to their business decisions.
I run my own website that dynamically reflects my activity across the internet. I've run my own PDS. I've moved between different PDS instances without losing followers or data. This is the actual promise of ATProto – not theoretical portability that exists on paper but never works in practice, but actual portability that you can experience directly. Malachite is just one small piece of that puzzle, extending that ownership to your music listening history.
Also, I just really like being able to see what I was listening to on a random Tuesday in 2022. I'm a data hoarder. There's something weirdly compelling about having that complete archive of your musical journey, even if most of it is just "listened to this track 47 times because I had it on repeat whilst coding". Your phases are embarrassing in retrospect, but they're yours. That matters.
The Actual Technical Details (For Those Who Care)
The whole thing lives at github.com/ewanc26/malachite. It's TypeScript, uses the ATProto SDK, depends on a handful of libraries for CSV parsing and UI elements. The code's structured pretty sensibly (I hope), with separate modules for authentication, publishing, CSV/Spotify parsing, merging, syncing, and all the utility bits.
Current version is 0.7.2. There have been... a lot of versions. Let's just say I learnt a lot about semantic versioning along the way.
It stores everything in ~/.malachite/ following Unix conventions. Cached records, import state for resume functionality, logs when you enable file logging, optionally encrypted credentials. All nicely contained so it doesn't clutter up your project directory.
The credential storage uses AES-256-GCM encryption and is machine-specific, which is a fancy way of saying your credentials are encrypted and won't work if you copy them to another computer. It's a convenience feature more than anything – you can always just type them in manually.
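For the curious, machine-specific encryption along those lines can be sketched with Node's built-in crypto module. This is a simplified illustration, not Malachite's actual implementation – in particular, deriving the key from the hostname is my assumption about what "machine-specific" could mean, and the salt is made up for the example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";
import { hostname } from "node:os";

// Derive a 256-bit key from a machine-specific value. Using the hostname is
// an assumption for illustration; a real tool might mix in more identifiers.
const key = scryptSync(hostname(), "demo-salt", 32);

function encrypt(plaintext: string): string {
  const iv = randomBytes(12); // standard GCM nonce size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store the IV and auth tag alongside the ciphertext.
  return [iv, cipher.getAuthTag(), ciphertext].map((b) => b.toString("base64")).join(".");
}

function decrypt(blob: string): string {
  const [iv, tag, ciphertext] = blob.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM authenticates as well as encrypts
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Round-trips on this machine; a machine with a different hostname derives a
// different key, so decryption fails there.
console.log(decrypt(encrypt("app-password")) === "app-password"); // true
```

Copy the encrypted blob to another machine and the derived key no longer matches, so GCM's authentication check fails – which is exactly the "won't work elsewhere" behaviour described above.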
The Broader Ecosystem (And What Comes Next)
Malachite doesn't exist in isolation. It's one tool in a larger ecosystem of ATProto applications and services, and understanding that context matters.
Teal and the Music Listening Future
Teal (fm.teal.*) is still in development. There's no official launch, no big announcement, no marketing push. It's being built in public, gradually, by people who care about music tracking and decentralized infrastructure. Piper is the official real-time scrobbler. Malachite is the historical import tool. Other tools are being built by other people. It's an ecosystem emerging organically rather than being centrally planned.
This is both good and bad. Good because it means the ecosystem is driven by actual needs rather than corporate strategy. Bad because things are fragmented, documentation is sparse, and you sometimes have to piece together how things work from reading code and asking questions.
But that's the price of being early. When you're building on a platform that hasn't officially launched yet, you're simultaneously a user and a contributor. Every bug you find, every edge case you hit, every piece of feedback you provide – that shapes what the platform becomes.
I've tested Piper extensively and reported back on what works and what doesn't. This is part of being in an early ecosystem: you don't just use tools, you help build them.
The Personal Data Server Model
The PDS (Personal Data Server) model is central to how ATProto works, and it's worth understanding because it affects how tools like Malachite operate.
Your PDS is where your data lives. It's like your personal server, but for social data. When you post something on Bluesky, it's stored on your PDS. When you import scrobbles with Malachite, they're stored on your PDS. Your PDS is authoritative for your data.
But PDSs aren't isolated. They talk to each other and to AppViews. When someone follows you, their PDS talks to your PDS to get updates. When you post something, AppViews index it so it shows up in feeds and search. The protocol handles all the synchronization and indexing.
This model has implications:
1. You can host your own PDS.
If you're technically inclined, you can run your own server. Your data lives on infrastructure you physically control. I've done this – it's not trivial, but it's doable.
2. You can move between PDS providers.
If your current provider goes away or you don't like them anymore, you can migrate to a different provider without losing your identity or your data. Your handle stays the same, your followers stay the same, your posts stay the same. Only the underlying storage location changes.
3. Tools like Malachite write to your PDS directly.
When you import scrobbles, they go to your PDS, then get propagated to AppViews automatically. You don't have to do anything special. The protocol handles it.
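To make point 3 concrete: writing a play boils down to creating a record in a collection in your repo. The sketch below builds a record locally – the field names are illustrative guesses, not the actual fm.teal.alpha.feed.play lexicon schema – and the commented-out call shows roughly where a write via the ATProto SDK's com.atproto.repo.createRecord would happen.

```typescript
// Illustrative sketch only: these field names are assumptions, NOT the real
// fm.teal.alpha.feed.play lexicon schema. Check the lexicon before relying on it.
interface PlayRecord {
  $type: string;
  trackName: string;
  artistName: string;
  playedTime: string; // ISO 8601
}

function buildPlayRecord(track: string, artist: string, playedAt: Date): PlayRecord {
  return {
    $type: "fm.teal.alpha.feed.play",
    trackName: track,
    artistName: artist,
    playedTime: playedAt.toISOString(),
  };
}

const record = buildPlayRecord("Roygbiv", "Boards of Canada", new Date("2022-03-15T21:04:00Z"));

// With an authenticated agent from @atproto/api, the actual write would look
// roughly like this (not executed here):
//
//   await agent.com.atproto.repo.createRecord({
//     repo: agent.session!.did,
//     collection: "fm.teal.alpha.feed.play",
//     record,
//   });

console.log(record.$type); // fm.teal.alpha.feed.play
```

The record lands in your repo on your PDS; the relay and AppViews pick it up from there without the tool doing anything further.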
This is fundamentally different from how centralized platforms work. On Twitter or Last.fm, there's one database, controlled by one company, and that's where everything lives. You can't move it, you can't host it yourself, you can't even really control who has access to it. On ATProto, your data is yours in a very concrete sense.
What I'm Not Building
It's worth being explicit about what Malachite doesn't do and doesn't try to do:
It's not a real-time scrobbler. Piper does that. Malachite is for bulk imports of historical data. Different use case, different tool.
It's not a music player. It imports listening history; it doesn't play music or connect to streaming services. You still need Spotify or Apple Music or whatever for actually listening.
It's not a social platform. It doesn't have its own UI for browsing listening history or discovering music. It writes to Teal, and other applications (Teal clients, feeds, etc.) handle the social aspects.
It's not trying to replace Last.fm or Spotify. Those are services with features and ecosystems that would take years to replicate. Malachite is just trying to give you ownership of your data.
Knowing what you're not building is as important as knowing what you are building. Scope creep kills projects. Malachite does one thing – imports listening history to ATProto – and does it well. That's enough.
Possible Future Directions
That said, there are some natural extensions that might make sense down the line:
1. Better error recovery.
Right now if a batch fails, the tool logs it and moves on. It would be nice to have more sophisticated retry logic, partial batch recovery, better debugging information when things go wrong.
2. Support for other music services.
Apple Music, YouTube Music, Tidal – they all have listening history. If they provide reasonable export options, supporting them would be straightforward. But that's an "if" – most services don't make it easy to export your data.
3. Analysis and statistics.
Your listening history is a rich dataset. It would be interesting to generate statistics, visualizations, insights. But that feels like a separate project. Something that consumes the data Malachite imports rather than being part of Malachite itself.
4. Integration with other Teal tools.
As the ecosystem develops, there might be opportunities for better integration. Automatic synchronization, cross-referencing with other data sources, that sort of thing.
But these are all "maybe someday" items. The core tool is feature-complete. It does what it's supposed to do. Sometimes that's a good place to be.
In Conclusion (Sort Of)
Malachite started as a weekend hack to escape Last.fm's walled garden and get my scrobbles onto ATProto. It's turned into something significantly more polished than that – a tool with proper rate limiting, duplicate prevention, combined import modes, interactive menus, adaptive batch sizing, progress tracking, and more edge case handling than I ever wanted to implement.
It does one thing (liberates your music listening history from proprietary platforms) and does it well. Sometimes that's enough.
The name makes sense. The UX is pleasant. The technical implementation is sound. The rate limiting protects your PDS and everyone who shares it. The duplicate prevention means you can safely re-run imports without creating mess. The combined import mode handles the Spotify/Last.fm overlap problem that I genuinely thought would be unsolvable when I started.
If you've got years of Last.fm data sitting in a CSV somewhere, or Spotify extended streaming history gathering digital dust, maybe give it a try. Your scrobbles deserve to be on ATProto too. They're part of your digital life, part of your history. They should live somewhere you control, not locked away in some company's database behind their API rate limits and terms of service.
And if you do use it? Let me know how it goes. I'm always curious to hear about other people's music archives and their journey importing them. Bug reports are welcome. Feature requests too, though I make no promises about implementing them – sometimes "feature complete" is a destination worth reaching rather than a waypoint to ignore.
The repository is at github.com/ewanc26/malachite. Current version is 0.7.2. Documentation is in the README. It's licensed AGPL-3.0-only, because I believe in open source and copyleft and in not letting your work be locked up in someone else's proprietary product.
Right then. That's Malachite. Hope this made sense. Hope it's useful. Hope someone reads this and thinks "yes, that's exactly what I need" rather than "why did this person write 2,000 words about a data import tool".
But if you're in the second camp, you probably stopped reading a while ago. And that's fine too.
P.S. – Yes, I know the repository is still called atproto-lastfm-importer on Tangled. I don't know if this can be resolved. GitHub's been updated though, so at least there's that. Small victories.