Bismuth: On Refraction and Conversion

April 23, 2026

I've been publishing to the AT Protocol for a while now. Blog posts, project documentation, the odd creative thing — all of it lives as site.standard.document records on my PDS, which means all of it lives as block trees. Richtext blocks, facets with byte-slice annotations, nested lists, images as blob references. Great for a federated document store. Opaque to everything else.

The problem: if you want to take a Standard.site document and actually use it somewhere that isn't Standard.site — feed it into a static site generator, archive it, diff it, or just read it in a terminal — you need something that understands the block tree and can produce plain text. Markdown is the obvious target. It preserves the document's semantic structure without requiring a custom renderer.

The other problem: if you ever want to migrate off a platform, you need your words back in a format something else can consume. My blog currently lives as a Leaflet publication on the AT Protocol. It works. But I've been through this before. I was a Hugo user before I moved to Standard.site, and the only reason that migration was painless was that the content was already Markdown. If I ever want to move the blog back to a static site generator, or republish via Sequoia the way I do with my docsite, I need those block trees turned back into plain Markdown files first.

The facet byte-slice annotation logic in particular is fiddly enough to be worth extracting once into a tested library: facets have to be applied in descending order of span length, widest first, so that overlapping markers nest correctly.

So: Bismuth.

It converts richtext-block documents from any of Standard.site's three publishing platforms — Leaflet (pub.leaflet.*), Pckt (blog.pckt.*), and Offprint (app.offprint.*) — to Markdown. Ships as both a CLI tool and a TypeScript library. Zero runtime dependencies. ESM and CJS. Full type definitions.

Why the name?

Bismuth the element is known for its iridescent oxide surface — a single underlying structure that takes on different colours depending on how light hits it. The metal itself is silvery-white; the rainbow effect comes from a thin oxide layer that forms during cooling, producing thin-film interference the same way a soap bubble does. A pub.leaflet document is the same kind of thing: one block tree that can be rendered as a rich web UI, a terminal reader, a static site, or plain Markdown depending on what's doing the rendering. The underlying structure doesn't change — the surface does.

It also fits the monorepo's loose mineral naming convention. Malachite came first — the Last.fm/Spotify scrobble importer, named after the green copper carbonate mineral that's been ground into pigment since predynastic Egypt, which felt right for something that takes raw listening data and gives it a new surface. Jasper came next, the Instagram importer, named after the opaque red chalcedony coloured by iron oxide inclusions — fitting for something that preserves photographic memories. Bismuth continues the pattern.

Three platforms, one block tree

Standard.site isn't a single editor. It's a specification for how longform content lives on the AT Protocol — site.standard.document records, each with a content field that points to one of three platform-specific block formats. The platforms share the same DNA but have evolved separately enough that they're not interchangeable.

This wasn't always the plan. When Leaflet, Pckt, and Offprint started building, each had its own schemas for longform publishing on the AT Protocol. They were doing the same thing independently: defining how blogs, posts, and documents should be stored on a user's PDS. The collaboration that became Standard.site was the coordination layer that had been missing: a shared foundation for publications, documents, and subscriptions that all three platforms could implement, while leaving the content format (the block tree, the editor UX) to each platform. That's why the block types differ between pub.leaflet.*, blog.pckt.*, and app.offprint.*: the metadata is standardised, the content is not.

Leaflet (pub.leaflet.*) is the one I use. It's the editor built by Hyperlink Academy — a collaborative writing and social publishing tool at leaflet.pub. Content is organised into pages (linear or canvas), each page is an array of blocks with alignment metadata, and each block has a plaintext field plus an optional facets array for inline formatting. Lists use a children structure where each item has a content field and nested children. It's the most feature-rich of the three: math blocks, button blocks, iframe embeds, polls, page links, Bluesky post embeds.

Pckt (blog.pckt.*) takes a different approach. It's simpler — no pages, no alignment, no canvas layout. Content is a flat items array, or in extended mode a blob reference that needs to be fetched from the PDS separately. Lists are structured differently: a content array where each item contains text blocks and potentially nested list blocks, rather than Leaflet's children tree. Images have an attrs object with src and alt directly, rather than Leaflet's blob reference with a separate aspectRatio field.

Offprint (app.offprint.*) is the newest of the three. It shares some structural DNA with Pckt — flat items array, no pages — but adds its own quirks. Task lists with nested checkboxes. Coloured highlights, not just plain highlighting. Web embeds with siteName and preview blob references. Bluesky post embeds use StrongRef objects instead of bare URIs. Mentions can include both a did and a handle. The block type names differ slightly too: codeBlock instead of code, bulletList instead of unorderedList.

What they all share: typed blocks with a $type discriminant, facet annotations with byte-slice indices, and the same core set of block types (text, headings, blockquotes, lists, images, code, horizontal rules). The differences are in the details — field names, nesting structures, which features exist in which namespace. Bismuth normalises all of that into a single conversion pipeline.

The facet engine

This is where most of the real work lives, and it's the bit I keep having to re-implement if I don't extract it.

Facets are how Standard.site handles inline formatting — bold, italic, links, code, strikethrough, highlights, footnotes, mentions. Instead of embedding Markdown or HTML directly in the text, the plaintext field contains the raw text and the facets array stores annotations with byte offsets. A facet has an index (byteStart, byteEnd) and a features array (a single span can carry multiple features — bold and link, for instance).
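To make that shape concrete, here is a rough sketch of a facet as TypeScript types. The type names (FacetFeature, Facet, RichText) and the link's uri field are my assumptions for illustration, not Bismuth's exported types; the index, byteStart, byteEnd, features, plaintext and facets fields are the ones described above.

```typescript
// Illustrative types only: the names are assumptions, not Bismuth's exported API.
interface FacetFeature {
  $type: string; // e.g. "pub.leaflet.richtext.facet#bold"
  [key: string]: unknown; // feature-specific fields, such as a link target
}

interface Facet {
  index: { byteStart: number; byteEnd: number }; // UTF-8 byte offsets into plaintext
  features: FacetFeature[];
}

interface RichText {
  plaintext: string;
  facets?: Facet[];
}

// "Read the docs" with "docs" both bold and linked: one span, two features.
const example: RichText = {
  plaintext: "Read the docs",
  facets: [
    {
      index: { byteStart: 9, byteEnd: 13 },
      features: [
        { $type: "pub.leaflet.richtext.facet#bold" },
        // The "uri" field name is an assumption for this sketch.
        { $type: "pub.leaflet.richtext.facet#link", uri: "https://example.com" },
      ],
    },
  ],
};
```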

The byte offsets are UTF-8 byte positions, not character positions. If your text contains multi-byte characters — é, ñ, any CJK character, the em dashes I use constantly — a naive character-index approach will place the markers in the wrong position. Bismuth handles this by encoding the plaintext to a Uint8Array, walking through it byte by byte, decoding each Unicode scalar individually, and emitting Markdown markers at the correct byte positions.
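The mismatch is easy to see in a few lines. This is a minimal sketch using the standard TextEncoder and TextDecoder; it slices by bytes in one go rather than walking scalar by scalar as Bismuth does, and sliceByBytes is a hypothetical helper name, not part of the library.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Slice a string by UTF-8 byte offsets instead of UTF-16 code unit indices.
// Assumes both offsets fall on character boundaries.
function sliceByBytes(text: string, byteStart: number, byteEnd: number): string {
  return decoder.decode(encoder.encode(text).slice(byteStart, byteEnd));
}

// "café 日本 bold", written with escapes so the byte counts are unambiguous.
// "é" is 2 bytes and each CJK character is 3 bytes, so "bold" starts at byte 13.
const sample = "caf\u00e9 \u65e5\u672c bold";

const byBytes = sliceByBytes(sample, 13, 17); // "bold"
const byChars = sample.slice(13, 17); // "" because the string is only 12 UTF-16 units long
```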

The application order matters. If you have overlapping spans — say, a bold span that contains an italic span — you need the outer markers to go on first and the inner markers to close first. Bismuth sorts facets by span length in descending order: wider spans first. This means wider spans get their opening markers placed before narrower spans at the same position, and narrower spans get their closing markers placed first when walking back through the "after" markers. The result is correct nesting: **bold *and italic*** rather than ***bold and italic** or, worse, mismatched markers that break rendering.
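The ordering rule can be sketched as follows. This is my own simplified reconstruction, not Bismuth's source: MarkerSpan and applyMarkers are hypothetical names, each span carries a single marker used for both opening and closing, and spans are assumed to be properly nested (no partial overlaps) with offsets on character boundaries.

```typescript
// Hypothetical input: one Markdown marker per span, with UTF-8 byte offsets.
interface MarkerSpan {
  byteStart: number;
  byteEnd: number;
  marker: string; // "**" for bold, "*" for italic, and so on
}

function applyMarkers(plaintext: string, spans: MarkerSpan[]): string {
  const bytes = new TextEncoder().encode(plaintext);
  const decoder = new TextDecoder();
  const opens = new Map<number, string[]>(); // markers emitted before a byte position
  const closes = new Map<number, string[]>(); // markers emitted after the text ending there

  // Widest span first; the copy leaves the caller's array untouched.
  const sorted = [...spans].sort(
    (a, b) => b.byteEnd - b.byteStart - (a.byteEnd - a.byteStart),
  );
  for (const span of sorted) {
    // Outer spans open first: append in sorted order.
    opens.set(span.byteStart, [...(opens.get(span.byteStart) ?? []), span.marker]);
    // Inner spans close first: prepend, so narrower markers come out earlier.
    closes.set(span.byteEnd, [span.marker, ...(closes.get(span.byteEnd) ?? [])]);
  }

  // Every position where a marker sits, plus both ends of the text.
  const cuts = Array.from(
    new Set([0, bytes.length, ...Array.from(opens.keys()), ...Array.from(closes.keys())]),
  ).sort((a, b) => a - b);

  let out = "";
  cuts.forEach((pos, i) => {
    out += (closes.get(pos) ?? []).join("");
    out += (opens.get(pos) ?? []).join("");
    const next = cuts[i + 1];
    if (next !== undefined) out += decoder.decode(bytes.slice(pos, next));
  });
  return out;
}

// The input order is deliberately "wrong"; the sort fixes it.
const nested = applyMarkers("bold and italic", [
  { byteStart: 5, byteEnd: 15, marker: "*" },
  { byteStart: 0, byteEnd: 15, marker: "**" },
]);
// nested === "**bold *and italic***"
```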

The 0.2.3 release fixed a subtle edge case I hadn't accounted for. If a facet span starts or ends with whitespace — something like ** bold text** — the Markdown markers land on the wrong side of the space, producing invalid syntax. Most Markdown parsers will just render the asterisks literally. The fix was straightforward: spanTrimStart and spanTrimEnd helper functions that advance past leading spaces and retreat past trailing ones before placing markers. The span still covers the same logical text, but the markers are positioned so that the Markdown is valid.
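A sketch of what those two helpers might look like. The names spanTrimStart and spanTrimEnd are the real ones, but the signatures and bodies here are my assumptions; the library's versions may take different arguments. Comparing raw bytes is safe because 0x20 never appears inside a multi-byte UTF-8 sequence.

```typescript
const SPACE = 0x20; // ASCII space, always a single byte in UTF-8

// Advance byteStart past leading spaces, without crossing byteEnd.
function spanTrimStart(bytes: Uint8Array, byteStart: number, byteEnd: number): number {
  let start = byteStart;
  while (start < byteEnd && bytes[start] === SPACE) start++;
  return start;
}

// Retreat byteEnd past trailing spaces, without crossing byteStart.
function spanTrimEnd(bytes: Uint8Array, byteStart: number, byteEnd: number): number {
  let end = byteEnd;
  while (end > byteStart && bytes[end - 1] === SPACE) end--;
  return end;
}

// The stored facet covers " bold " (bytes 1 to 7) in "a bold word".
const trimSample = new TextEncoder().encode("a bold word");
const trimmedStart = spanTrimStart(trimSample, 1, 7); // 2
const trimmedEnd = spanTrimEnd(trimSample, 1, 7); // 6
```

Placing markers at 2 and 6 gives a **bold** word, where the untrimmed offsets would have produced a** bold **word.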

The normalisation layer is another piece worth mentioning. Each platform defines its own facet namespace — pub.leaflet.richtext.facet#bold, blog.pckt.richtext.facet#bold, app.offprint.richtext.facet#bold. Bismuth strips the namespace prefix and normalises to a shared internal representation before applying any markers. The dispatcher handles the differences: Offprint highlights can have a color property (emitted as <mark style="background-color:...">), Offprint mentions can include a handle (appended as a link), Offprint has webMention facets that Leaflet and Pckt don't. Leaflet has footnote facets that produce [^n] reference markers with a definition block at the end. The normalised representation means the marker application logic doesn't need to know which platform it's processing — it just handles bold, italic, link, and so on.
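The prefix stripping might look something like this. It is a minimal sketch under my own assumptions: normaliseFeatureType, Platform and NormalisedFeature are hypothetical names, and the real internal representation carries more than a platform and a kind.

```typescript
type Platform = "leaflet" | "pckt" | "offprint";

interface NormalisedFeature {
  platform: Platform;
  kind: string; // "bold", "italic", "link", ...
}

// The three facet namespaces named above, mapped to a platform tag.
const FACET_NAMESPACES = new Map<string, Platform>([
  ["pub.leaflet.richtext.facet", "leaflet"],
  ["blog.pckt.richtext.facet", "pckt"],
  ["app.offprint.richtext.facet", "offprint"],
]);

// Split "namespace#kind" and drop anything from an unknown namespace.
function normaliseFeatureType($type: string): NormalisedFeature | null {
  const hash = $type.indexOf("#");
  if (hash === -1) return null;
  const platform = FACET_NAMESPACES.get($type.slice(0, hash));
  if (platform === undefined) return null;
  return { platform, kind: $type.slice(hash + 1) };
}
```

Keeping the platform alongside the kind lets a dispatcher handle the platform-only features (coloured highlights, webMention, footnotes) while the shared kinds go straight to the marker logic.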

The block dispatcher

blockToMarkdown() is the central function. It takes any AnyBlock — the union type that covers all block types from all three platforms — and returns a { markdown, footnotes } result.

The dispatcher extracts the platform namespace and block type from the $type field using a regex that matches all three naming conventions: pub.leaflet.blocks.text, blog.pckt.block.text, app.offprint.block.text. Then it switches on the block type and handles the platform-specific field differences internally.
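A regex along these lines would cover the three conventions. The pattern and the parseBlockType name are my guess at the approach, not a copy of what Bismuth ships; the only facts taken from above are the three example $type strings.

```typescript
// Leaflet uses "blocks" (plural); Pckt and Offprint use "block".
const BLOCK_TYPE = /^(pub\.leaflet|blog\.pckt|app\.offprint)\.blocks?\.([A-Za-z]+)$/;

interface ParsedBlockType {
  namespace: string; // "pub.leaflet", "blog.pckt" or "app.offprint"
  blockType: string; // "text", "heading", "codeBlock", ...
}

function parseBlockType($type: string): ParsedBlockType | null {
  const match = BLOCK_TYPE.exec($type);
  if (match === null) return null;
  return { namespace: match[1], blockType: match[2] };
}
```

A switch on blockType can then fold the naming differences together, for example treating Leaflet's header and the other platforms' heading as the same case.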

Headings are a good example. Leaflet calls them header with an optional level. Pckt calls them heading with an optional level. Offprint also calls them heading but requires level. The dispatcher normalises all three to #-prefixed Markdown headings, clamping the level to 1–6.
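The clamping itself is tiny. In this sketch headingToMarkdown is a hypothetical helper, and falling back to level 1 when the level is missing is my assumption; the source only says the level is optional on Leaflet and Pckt.

```typescript
// Render a heading, clamping the level into Markdown's 1 to 6 range.
function headingToMarkdown(text: string, level?: number): string {
  // Assumption: a missing or non-finite level falls back to 1.
  const requested =
    typeof level === "number" && Number.isFinite(level) ? Math.trunc(level) : 1;
  const clamped = Math.min(6, Math.max(1, requested));
  return `${"#".repeat(clamped)} ${text}`;
}

const h2 = headingToMarkdown("Three platforms", 2); // "## Three platforms"
const tooDeep = headingToMarkdown("Deep", 9); // "###### Deep"
```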

Lists are where the structural differences get most visible. Leaflet uses children with content + nested children. Pckt uses content arrays where list items contain text blocks and potentially nested list blocks. Offprint uses children with content + nested children again, but the content is a single OffprintTextBlock rather than Leaflet's ListItemContent union. Each variant gets its own recursive processor, but they all produce the same Markdown output: numbered or bulleted lists with correct indentation for nesting.
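For the Leaflet-style shape (content plus nested children), the recursion can be sketched like this. ListItemSketch is a deliberately simplified stand-in: the real content field is a union of block types with facets, and the four-space indent and the inheritance of the ordered flag by nested lists are my choices for the example, not something confirmed by the library.

```typescript
// Simplified stand-in for a Leaflet list item; real items carry richer content.
interface ListItemSketch {
  content: { plaintext: string };
  children?: ListItemSketch[];
}

function listToMarkdown(items: ListItemSketch[], ordered = false, depth = 0): string {
  // Four spaces per level nests correctly under both "- " and "1. " in CommonMark.
  const indent = "    ".repeat(depth);
  const lines: string[] = [];
  items.forEach((item, i) => {
    const marker = ordered ? `${i + 1}.` : "-";
    lines.push(`${indent}${marker} ${item.content.plaintext}`);
    if (item.children && item.children.length > 0) {
      lines.push(listToMarkdown(item.children, ordered, depth + 1));
    }
  });
  return lines.join("\n");
}

const listSample: ListItemSketch[] = [
  { content: { plaintext: "one" }, children: [{ content: { plaintext: "nested" } }] },
  { content: { plaintext: "two" } },
];
const bulleted = listToMarkdown(listSample); // "- one\n    - nested\n- two"
```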

Blockquotes follow the same pattern. Leaflet has plaintext + facets directly on the block. Pckt and Offprint have a content array of child blocks. The dispatcher branches accordingly, but the output is always > -prefixed lines.
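The branch is small enough to show in full. This is a sketch with simplified types: QuoteSketch and both function names are hypothetical, facets are ignored, and child blocks are reduced to plain text joined by blank lines, which is my assumption about paragraph separation.

```typescript
// Either text directly on the block (Leaflet-style) or a content array of
// child blocks (Pckt/Offprint-style). Children are simplified to plain text.
type QuoteSketch =
  | { plaintext: string }
  | { content: { plaintext: string }[] };

// Prefix every line with ">", keeping blank lines inside the quote.
function quoteLines(markdown: string): string {
  return markdown
    .split("\n")
    .map((line) => (line === "" ? ">" : `> ${line}`))
    .join("\n");
}

function blockquoteToMarkdown(block: QuoteSketch): string {
  const inner =
    "content" in block
      ? block.content.map((child) => child.plaintext).join("\n\n")
      : block.plaintext;
  return quoteLines(inner);
}
```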

Images are the one block type that can't fully round-trip. Leaflet images are blob references with no public URL. Pckt images have attrs.src which can be a resolvable URL. Offprint images are also blob references. Bismuth emits ![alt]() for blob references — the alt text is preserved, but the image source is empty because there's no way to construct a public URL from a blob CID without AT Protocol context. This is a known limitation, and it's the right trade-off for a Markdown converter: the structure survives, the binary content doesn't.

Pckt blob resolution

Pckt has a content mode that the other two platforms don't: extended mode, where the items array is absent and instead the content is stored as a blob reference. The blob lives on the PDS and needs to be fetched via com.atproto.sync.getBlob.

Bismuth handles this through the BlobResolver interface. The default implementation (createPdsBlobResolver) constructs the blob URL from the PDS endpoint, the source DID, and the blob CID, then fetches and parses the JSON. A custom resolver can be injected for testing or for environments where direct PDS access isn't available.
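Roughly, a resolver of this kind looks like the sketch below. The com.atproto.sync.getBlob endpoint and its did and cid query parameters are the real XRPC method; everything else (FetchLike, BlobResolverSketch, blobUrl, makePdsBlobResolver) is a hypothetical shape for illustration, and Bismuth's actual BlobResolver and createPdsBlobResolver signatures may differ. The fetch function is injected so the sketch can be exercised without a network.

```typescript
// Hypothetical shapes: not Bismuth's real BlobResolver interface.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;
type BlobResolverSketch = (did: string, cid: string) => Promise<unknown>;

// Build the getBlob URL from the PDS endpoint, the source DID and the blob CID.
function blobUrl(pdsEndpoint: string, did: string, cid: string): string {
  const url = new URL("/xrpc/com.atproto.sync.getBlob", pdsEndpoint);
  url.searchParams.set("did", did);
  url.searchParams.set("cid", cid);
  return url.toString();
}

// Fetch the blob and parse it as JSON, failing loudly on a non-2xx response.
function makePdsBlobResolver(pdsEndpoint: string, fetchImpl: FetchLike): BlobResolverSketch {
  return async (did, cid) => {
    const res = await fetchImpl(blobUrl(pdsEndpoint, did, cid));
    if (!res.ok) throw new Error(`getBlob failed with HTTP ${res.status}`);
    return res.json();
  };
}
```

In a runtime with a global fetch, that function can be passed in directly; in tests, a stub returning canned JSON stands in for the PDS.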

Because blob resolution is async, pcktContentToMarkdown() is the only converter that returns a Promise<string>. The other converters are synchronous. This is a deliberate design choice: the async boundary only exists where it's structurally necessary, not as a default that leaks into the synchronous converters.

The CLI


    bismuth [options] [file]
    bismuth fetch [options]

The convert mode reads a JSON document from a file or stdin, detects the $type field, and dispatches to the appropriate converter. The --frontmatter flag prepends YAML front matter with title, publishedAt, description, tags, and path — the same format Sequoia expects when ingesting Markdown files, so the output is ready to publish back to the AT Protocol without further processing. The --page-break flag lets you customise the separator between pages in a multi-page Leaflet document. The --did flag provides the source DID for Pckt blob resolution. The --output flag writes to a file instead of stdout.
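As an illustration of the front matter step, here is a sketch that serialises those five fields. The exact layout Sequoia expects isn't spelled out here, so the formatting choices (JSON-quoted strings, inline tag list, skipping absent fields) are my assumptions, as are the FrontMatterFields and frontMatter names.

```typescript
// The five fields named above; all optional in this sketch.
interface FrontMatterFields {
  title?: string;
  publishedAt?: string;
  description?: string;
  tags?: string[];
  path?: string;
}

// JSON string literals are valid YAML double-quoted scalars, so JSON.stringify
// gives safe quoting without pulling in a YAML library.
function frontMatter(meta: FrontMatterFields): string {
  const lines: string[] = ["---"];
  if (meta.title !== undefined) lines.push(`title: ${JSON.stringify(meta.title)}`);
  if (meta.publishedAt !== undefined) lines.push(`publishedAt: ${JSON.stringify(meta.publishedAt)}`);
  if (meta.description !== undefined) lines.push(`description: ${JSON.stringify(meta.description)}`);
  if (meta.tags && meta.tags.length > 0) {
    lines.push(`tags: [${meta.tags.map((tag) => JSON.stringify(tag)).join(", ")}]`);
  }
  if (meta.path !== undefined) lines.push(`path: ${JSON.stringify(meta.path)}`);
  lines.push("---", "");
  return lines.join("\n");
}

const header = frontMatter({
  title: "Hello",
  publishedAt: "2026-04-23T00:00:00.000Z",
  tags: ["tooling"],
});
```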

The fetch subcommand goes further: given a DID and a publication rkey, it resolves the PDS endpoint via the PLC directory, lists all site.standard.document records in the repo, filters to those whose site field references the given publication, converts each one, and writes the .md files to disk. Files are named {rkey}.md. Pagination is handled automatically — the listRecords function loops through cursors until there are no more pages.
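The cursor loop is the standard one for com.atproto.repo.listRecords. A minimal sketch, assuming an injected JSON fetcher so it can run without a network: listAllRecords, belongsTo, GetJson and the record interfaces are hypothetical names, and treating the site field as a plain string to compare against is my assumption about how the publication reference is stored.

```typescript
interface RepoRecord {
  uri: string;
  cid: string;
  value: Record<string, unknown>;
}

interface ListRecordsPage {
  cursor?: string;
  records: RepoRecord[];
}

type GetJson = (url: string) => Promise<unknown>;

// Keep requesting pages until the server stops returning a cursor.
async function listAllRecords(
  pds: string,
  repo: string,
  collection: string,
  getJson: GetJson,
): Promise<RepoRecord[]> {
  const all: RepoRecord[] = [];
  let cursor: string | undefined;
  do {
    const url = new URL("/xrpc/com.atproto.repo.listRecords", pds);
    url.searchParams.set("repo", repo);
    url.searchParams.set("collection", collection);
    url.searchParams.set("limit", "100");
    if (cursor) url.searchParams.set("cursor", cursor);
    const page = (await getJson(url.toString())) as ListRecordsPage;
    all.push(...page.records);
    // Stop on an empty page even if a cursor is returned, to avoid looping forever.
    cursor = page.records.length > 0 ? page.cursor : undefined;
  } while (cursor);
  return all;
}

// Assumption: the document's site field holds the publication reference as a string.
function belongsTo(record: RepoRecord, publicationRef: string): boolean {
  return record.value["site"] === publicationRef;
}
```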

That fetch command is the one I use most. My project documentation lives on my PDS under site.standard.publication/3mfyq5mpohw25 — roughly 20 documents at this point, covering everything from Malachite to Numlang to Sigi to this thing itself. Running bismuth fetch --did did:plc:ofrbh253gwicbkc5nktqepol --rkey 3mfyq5mpohw25 pulls all of them down as Markdown, ready for a static site or a git repo.

My blog — the posts under site.standard.publication/3m3x4bgbsh22k — is what started this. Criminal by Birth, Dear Manager, Four Days with Letta Code, Self-Hosted Analytics with Umami, The Timer Problem, Dipping a Toe into the Fediverse (Again). They're all Leaflet documents. They're all things I wrote and want to keep. And now they're all one command away from being Markdown files on disk.

The exit plan

This is the real reason Bismuth exists. My blog sits on Standard.site as a Leaflet publication, and I'm happy with it for now. The editing experience is good. The output looks right. Publishing to the AT Protocol means my content is on my PDS, not some company's server, and anyone running a Standard.site-compatible reader can see it. That matters to me.

But I've been around the block enough times to know that "happy with it for now" is not the same as "locked in forever". Platforms change. Priorities change. The Hugo site I ran before this one worked fine — Markdown files in a directory, a config file, done. I moved to Standard.site because the writing experience was better and the AT Protocol integration was compelling, not because I had a problem with static site generators. Sometimes you just want to spin up Hugo again and have everything be files on disk.

With Bismuth, the migration path is a single command. bismuth fetch the publication, point a static site generator at the output directory, done. The front matter is already Sequoia-compatible, so the same Markdown files could also be republished to the AT Protocol via Sequoia, round-tripping the text and structure (blob-backed images excepted, since their sources come out empty). Content goes in as block trees, comes out as Markdown, can go back in as Markdown. That's the loop.

The docsite is already doing this. It's a SvelteKit app that renders Markdown files from src/content/documentation/, and Sequoia publishes those same files as site.standard.document records on the PDS. The Markdown is the source of truth; the AT Protocol records are a derived view. Bismuth lets me flip the direction: if the AT Protocol records are the source of truth, I can derive Markdown from them. I don't have plans to move the blog right now. But the door is open, and it cost me an afternoon to unlock it.

Timeline

Bismuth started on 24 March 2026 — initially as a Leaflet-only converter. I needed something that understood pub.leaflet.content block trees and could turn them into Markdown, and I didn't want to write the same ad-hoc conversion script every time I needed to look at a document outside the browser. The initial version handled the core Leaflet blocks — text, headings, blockquotes, code, lists, images — and the facet engine with UTF-8 byte offset handling.

The 0.2.0 release on 22 April was the big one. Pckt and Offprint support meant the library could handle documents from any Standard.site platform, not just the one I happened to use. The facet normalisation layer — stripping the platform namespace and routing to a shared marker application engine — was the key design decision there. The fetch command came at the same time, because once you can convert any document, you want to be able to pull them all at once.

0.2.1 improved error handling in the CLI: better messages for missing required options, EPIPE handling for piped output. 0.2.2 properly added publication document support to the fetch command, along with the --pds override for cases where the PLC directory resolution isn't available or you want to hit a local PDS. 0.2.3, today, fixed the whitespace-trimming edge case for inline markers, the ** bold text** problem that was more common than I'd expected, because Standard.site's block editor doesn't always trim whitespace from facet spans before storing them.

It's small. It does one thing. It's tested. The conversion logic for facet byte-slice annotations — the bit that's fiddly enough to be worth extracting — is the whole point, and now I don't have to think about it again.

AGPL-3.0-only. Available on npm as @ewanc26/bismuth. Source in the monorepo, docs at docs.ewancroft.uk/projects/bismuth.
