A local corpus for the AI you build on your own Ghost content

If you are building anything intelligent on top of your own writing — a retrieval-augmented chatbot, a Custom GPT that answers from your archive, an MCP server your agents can call, an embeddings index, an internal-linking pass, or an answer-engine play for AEO and GEO — you have probably already discovered that your Ghost blog is the wrong shape to build against. The content you want lives behind an editor and a render pipeline, and the obvious move, scraping your live site, gives you the worst possible substrate: HTML wrapped in theme markup, navigation, cookie banners, and whatever your CDN decided to inject that day. You spend more time stripping boilerplate than building the thing you actually wanted.

What you want is the prose itself, as clean files you can read, chunk, and diff. That is the gap Specter fills. It is a native macOS app that does two-way sync between a Ghost blog and a folder of plain Markdown, so your whole archive lands on disk as .md files with their frontmatter intact — title, tags, status, feature image URL, excerpt. Connect once with your Admin API key, pick a folder, and your blog becomes a directory you can point any tool at. No scraping, no HTML cleanup, no rate limits against your own production site.
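To make "a directory you can point any tool at" concrete, here is a minimal Python sketch that reads such a folder into memory. It assumes the files use the common `---` delimited frontmatter block with flat `key: value` pairs and simple lists. The exact keys and layout Specter writes are not confirmed here, and the names `split_frontmatter` and `load_corpus` are hypothetical helpers, not part of Specter.

```python
from pathlib import Path


def _unquote(value: str) -> str:
    """Trim whitespace and one pair of matching surrounding quotes."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def split_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a Markdown file into (frontmatter dict, body).

    Assumption: frontmatter is a flat block between two `---` lines.
    Only `key: value`, inline `[a, b]` lists and `- item` lists are
    handled; use a real YAML parser for anything richer.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text

    meta: dict[str, object] = {}
    current_key = None
    for raw in lines[1:end]:
        stripped = raw.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # Block-style list item belonging to the most recent key.
            items = meta.get(current_key)
            if not isinstance(items, list):
                items = []
                meta[current_key] = items
            items.append(_unquote(stripped[2:]))
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue
        current_key = key.strip()
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[current_key] = [_unquote(v) for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[current_key] = _unquote(value)

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


def load_corpus(folder: str) -> list[dict[str, object]]:
    """Read every .md file under `folder` into {path, meta, body} records."""
    posts = []
    for path in sorted(Path(folder).rglob("*.md")):
        meta, body = split_frontmatter(path.read_text(encoding="utf-8"))
        posts.append({"path": str(path), "meta": meta, "body": body})
    return posts
```

Calling `load_corpus("/path/to/your/synced/folder")` would give you one record per post, which is usually the shape an embeddings job or an upload script wants as input.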

That folder is a corpus, and a corpus is the unit almost every AI workflow actually wants. For retrieval, you have stable text to chunk and embed, with frontmatter you can carry into metadata so a query can filter by tag or status. For an MCP server or a Custom GPT, you have a directory you can mount or upload directly, rather than a crawler you have to babysit. For an answer-engine effort, you can generate an llms.txt from the files and keep it honest as the archive changes. None of this requires Specter to do anything clever — it just has to give you trustworthy local Markdown, and then get out of the way.
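As a rough illustration of carrying frontmatter into retrieval metadata, the sketch below groups a post's paragraphs into chunks and attaches the frontmatter to each one. The character budget, the `Chunk` type, and the `chunk_post` and `with_tag` helpers are all illustrative assumptions; a real pipeline would likely size chunks in tokens for its own embedding model.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict[str, object] = field(default_factory=dict)


def chunk_post(body: str, meta: dict[str, object], source: str,
               max_chars: int = 1200) -> list[Chunk]:
    """Group whole paragraphs into chunks of at most `max_chars` characters.

    Paragraphs are never split, so a single paragraph longer than
    `max_chars` becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    groups: list[list[str]] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            groups.append(current)
            current = [para]
        else:
            current.append(para)
    if current:
        groups.append(current)

    # Every chunk carries the post's frontmatter plus where it came from.
    return [
        Chunk(text="\n\n".join(group),
              metadata={**meta, "source": source, "chunk_index": i})
        for i, group in enumerate(groups)
    ]


def with_tag(chunks: list[Chunk], tag: str) -> list[Chunk]:
    """Keep chunks whose `tags` metadata (assumed to be a list) contains `tag`."""
    return [c for c in chunks if tag in (c.metadata.get("tags") or [])]
```

Because the metadata travels with each chunk, a filter such as `with_tag(chunks, "ai")` or a check on `metadata["status"]` can run before or after the vector search, depending on what your index supports.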

I want to be plain about what Specter is and is not here, because this audience can smell overclaiming. Specter does not bundle a model, run embeddings, or charge you for tokens. There is no built-in RAG, no agent, no AI feature inside the app. What it provides is the substrate (the local files) and the two-way sync that lets edits flow back to Ghost. The intelligence is yours: the Claude, ChatGPT, or Gemini subscription you already pay for, the embeddings API you already call, the MCP server you are already writing. Some of the workflows below are things you wire up yourself; Specter’s job is to make the corpus they run on clean, current, and round-trippable. If you want a sense of how that editing loop feels in practice, the guide on using AI to edit Ghost posts walks through it, and the guide on editing Ghost with Claude shows the agent side.

The two-way part is what separates a corpus from a one-time dump. An export gives you files that drift out of date the moment someone publishes. Specter keeps the folder live, so when you run an internal-linking pass (say, having an agent read every post and propose links between related pieces) you can apply the result and sync it back to Ghost, then pull again tomorrow and the corpus reflects reality. That round trip is exactly the kind of cross-archive work the Ghost editor cannot do, and it is why people reach for Specter on jobs like fixing internal links across a whole blog. Your AI proposes the change against local files; you review; Specter ships it.
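The "propose, then review" half of that loop can be sketched in plain Python. The example below applies one proposed link to a post body and renders a unified diff you can read before anything is synced. `add_link` and `review_diff` are hypothetical helpers; the link matching is deliberately simple and does not account for code fences or reference-style links.

```python
import difflib
import re

# Matches inline Markdown links and images such as [text](url) or ![alt](url).
LINK_RE = re.compile(r"!?\[[^\]]*\]\([^)]*\)")


def add_link(body: str, phrase: str, url: str) -> str:
    """Link the first occurrence of `phrase` that is not already inside a link.

    Returns the body unchanged if no such occurrence exists.
    """
    linked_spans = [m.span() for m in LINK_RE.finditer(body)]
    for match in re.finditer(re.escape(phrase), body):
        start, end = match.span()
        if any(s < end and start < e for s, e in linked_spans):
            continue  # this occurrence overlaps an existing link
        return body[:start] + f"[{phrase}]({url})" + body[end:]
    return body


def review_diff(old: str, new: str, name: str) -> str:
    """Return a unified diff of the proposed change, or '' if nothing changed."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{name} (current)",
        tofile=f"{name} (proposed)",
    ))
```

In practice the phrase and target URL would come from whatever your agent proposes; the point of the diff is that a human reads it before the file is written and synced.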

Which brings me to the honest caveat that matters most for fidelity. Markdown is a projection of a Ghost post, not the post itself. Under the hood Ghost stores content as Lexical, and some things, especially card-heavy posts with embeds, galleries, bookmarks, or custom HTML, do not round-trip losslessly back through Markdown. For reading, embedding, and retrieval this rarely matters, because you want the prose and the prose comes through clean. It matters when you write back. That is what the dry-run preview is for: before anything touches Ghost, Specter shows you exactly which posts would be created, updated, or flagged as a conflict, so you see the blast radius of an AI pass before it goes live. If a post leans heavily on cards, you will want to know how that is handled before you let an agent rewrite it; the guide on how Specter handles Ghost cards is the honest accounting of what survives the trip and what does not. Build your retrieval layer on the whole archive with confidence; treat write-backs to card-heavy posts as the place to slow down and read the preview.
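One way to decide where to slow down is a quick triage pass over the local files before an agent rewrites them. The sketch below is a heuristic only. It assumes that cards Markdown cannot express show up as raw HTML in the file, which is a guess about the export and not documented Specter behaviour, and the tag list and helper names are illustrative.

```python
import re

# Assumption: content Markdown cannot represent appears as raw HTML tags.
RISKY_TAGS = ("iframe", "figure", "script", "video", "audio", "div")
TAG_RE = re.compile(r"<\s*(" + "|".join(RISKY_TAGS) + r")\b", re.IGNORECASE)


def risky_markup(body: str) -> dict[str, int]:
    """Count opening tags that hint at embeds, galleries, or custom HTML."""
    counts: dict[str, int] = {}
    for match in TAG_RE.finditer(body):
        tag = match.group(1).lower()
        counts[tag] = counts.get(tag, 0) + 1
    return counts


def needs_careful_review(body: str, threshold: int = 1) -> bool:
    """Flag a post for manual review when it has `threshold` or more hits."""
    return sum(risky_markup(body).values()) >= threshold
```

A flagged post is not necessarily unsafe to edit; the flag is just a prompt to read the dry-run preview for that post more carefully than the rest.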

Underneath the menu bar app is a CLI, which is the detail that tends to matter to this crowd. The same sync engine runs from a script, so the corpus refresh can be a step in your own pipeline rather than a manual click — pull, run your embeddings job, regenerate your llms.txt, push any edits back. You bring the model and the orchestration; Specter keeps the files honest in both directions. The promise is narrow on purpose: a local, AI-ready Markdown corpus of your Ghost content, kept current, with a safeguard before anything you generate touches the live site.
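The steps between pull and push are yours to script. As one example of such a step, this sketch keeps a small hash manifest so an embeddings job only reprocesses posts whose files changed since the last run. It deliberately does not call the Specter CLI, since its commands are not described here; `changed_posts` and `save_manifest` are hypothetical helpers, and the manifest location is an assumption (keeping it outside the synced folder is the safer choice).

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, used to detect content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_posts(folder: str, manifest_path: str) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare the folder against the last saved manifest.

    Returns (changed_or_new, removed, current_manifest), with file names
    relative to `folder`.
    """
    root = Path(folder)
    manifest = Path(manifest_path)
    previous = json.loads(manifest.read_text(encoding="utf-8")) if manifest.exists() else {}
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*.md"))
    }
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    removed = [name for name in previous if name not in current]
    return changed, removed, current


def save_manifest(manifest_path: str, current: dict[str, str]) -> None:
    """Persist the manifest after the downstream job has succeeded."""
    Path(manifest_path).write_text(
        json.dumps(current, indent=2, sort_keys=True), encoding="utf-8"
    )
```

A typical run would pull, call `changed_posts`, re-embed only the changed files, delete index entries for the removed ones, and then call `save_manifest` so the next run starts from the new baseline.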
