Migrating my blog from Markdown to Typst
I have recently completed a full migration of my personal blog from its old Markdown-based structure to a modern, Typst-powered system. This change was driven by a desire for better semantic control, a more robust build process, and a commitment to high-performance web standards.
Technical Steps Taken
The migration involved several key stages to ensure that my content remained accessible and that the transition was as smooth as possible.
1. Environment Setup with Nix
To ensure a consistent build environment, I updated my Nix flake to include all necessary tools. This included adding image optimization utilities like imagemagick, optipng, and jpegoptim, as well as ffmpeg for video processing and lighthouse for accessibility auditing.
2. Build System Enhancements (OCaml)
I chose OCaml for my build system because of its strong type system and good runtime performance. Key features in the OCaml binary include:
- Chronological Sorting: The index page, RSS feed, and sitemap are now strictly ordered by release date (newest first).
- Metadata Extraction: The system accurately parses metadata from a comment block at the top of each .typ file, even across different line endings.
- Accessibility & W3C Compliance: I implemented a post-processor that ensures a 100/100 Lighthouse score. It injects lang="en", dynamic <title> tags, navigation landmarks (<main>), and automatically adds alt="" attributes to all images.
- Asset Processing: All images are automatically optimized (EXIF stripped and compressed) during every build.
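The metadata extraction and chronological sorting from the list above can be illustrated with a short sketch. It is written in Python rather than OCaml and is not the build tool's actual code. The `// ---` delimiter, the `key: value` line format, and the names `parse_metadata` and `sort_newest_first` are assumptions made for illustration. The 1970-01-01 fallback date mirrors the one mentioned in the prompts at the end of this post.

```python
import re
from datetime import date

# Assumed layout: a block of `//` comment lines at the top of the .typ file,
# opened and closed by a `// ---` line, with one `key: value` pair per line.
DELIMITER = "// ---"


def parse_metadata(source: str) -> dict[str, str]:
    """Return the key/value pairs from the leading comment block."""
    # splitlines() handles \n, \r\n and \r, so line endings do not matter.
    lines = source.splitlines()
    if not lines or lines[0].strip() != DELIMITER:
        return {}
    meta = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == DELIMITER:
            return meta  # closing delimiter: block complete
        match = re.match(r"//\s*([\w-]+)\s*:\s*(.*)$", stripped)
        if match:
            meta[match.group(1)] = match.group(2).strip()
    return {}  # no closing delimiter: treat as "no metadata"


def sort_newest_first(posts: list[dict[str, str]]) -> list[dict[str, str]]:
    """Order posts by ISO `date` (YYYY-MM-DD), newest first."""
    fallback = "1970-01-01"  # used when a post has no date
    return sorted(
        posts,
        key=lambda post: date.fromisoformat(post.get("date", fallback)),
        reverse=True,
    )
```

Parsing dates into `date` objects before sorting keeps the order correct even if the stored format later changes away from a string that happens to sort lexicographically.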
3. Automated Content Migration (Python)
With nearly 70 posts to migrate, I developed a Python script to automate the conversion. This script handled frontmatter detection, tracking pixel removal, and basic syntax translation.
import re

# Excerpt from the per-post loop; `content` holds the raw Markdown file.
# Detect frontmatter type: +++ (TOML) or --- (YAML)
fm_match = re.match(r'\+\+\+\s*(.*?)\s*\+\+\+', content, re.DOTALL)
is_toml = True
if not fm_match:
    fm_match = re.match(r'---\s*(.*?)\s*---', content, re.DOTALL)
    is_toml = False
if not fm_match:
    continue  # no frontmatter found: skip this file

# Split the file into its frontmatter and the Markdown body
frontmatter = fm_match.group(1)
body = content[fm_match.end():].strip()

# Strip VG WORT tracking pixels
body = re.sub(r'<img src="https?://[^"]*vgwort\.de/[^"]*"[^>]*>', '', body)
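The "basic syntax translation" step could look roughly like the following. This is a minimal sketch, not the script that was actually used: the function name `markdown_to_typst` and the choice of rules (links, bold, ATX headings) are assumptions, and it deliberately ignores images, italics, and code blocks, which a real converter would have to protect.

```python
import re


def markdown_to_typst(body: str) -> str:
    """Translate a small subset of Markdown syntax to Typst markup."""
    # Links: [text](url) -> #link("url")[text]
    # The lookbehind leaves image syntax (![alt](src)) untouched.
    body = re.sub(r'(?<!!)\[([^\]]+)\]\(([^)\s]+)\)', r'#link("\2")[\1]', body)
    # Bold: **text** -> *text*
    body = re.sub(r'\*\*(.+?)\*\*', r'*\1*', body)
    # ATX headings: "## Title" -> "== Title" (same depth, "=" instead of "#")
    body = re.sub(
        r'^(#{1,6})[ \t]+(.*)$',
        lambda m: "=" * len(m.group(1)) + " " + m.group(2),
        body,
        flags=re.MULTILINE,
    )
    return body
```

The heading rule requires whitespace after the `#` run, so the `#link(...)` calls produced by the first rule are not mistaken for headings.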
Challenges and What Did Not Work at First
The migration was not without its hurdles. Several issues required creative solutions:
Experimental HTML Export
Typst’s HTML export is still experimental and proved to be quite sensitive. Large code blocks, especially those containing machine-generated JavaScript or complex regex-like characters, would occasionally cause the compiler to hang or crash with exit code 255. I had to simplify some of the larger snippets and use aggressive raw-block wrapping to protect the compiler.
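The raw-block wrapping mentioned above can be sketched as follows. This is an illustration under assumptions, not the code used in the migration: the helper name `wrap_raw` is hypothetical, and it relies on Typst closing a raw block only at a backtick run of the same length as the opening fence.

```python
import re


def wrap_raw(code: str, lang: str = "") -> str:
    """Wrap `code` in a Typst raw block whose fence cannot occur in the code."""
    # Longest run of backticks inside the snippet (0 if there are none).
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    # At least three backticks, and always one more than the snippet contains.
    fence = "`" * max(3, longest + 1)
    return f"{fence}{lang}\n{code}\n{fence}"
```

For example, `wrap_raw("let x = 1;", "js")` yields a three-backtick block tagged `js`, while a snippet that itself contains three backticks is wrapped in four.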
Parallelism Stalls
Initially, the OCaml build tool processed assets sequentially. With hundreds of images, this made the serve command incredibly slow. I had to refactor the OCaml code to background the image optimization process, allowing the developer to see the site immediately while images were optimized in the background.
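The idea of backgrounding the optimization work can be shown with a small sketch. It uses Python threads instead of the OCaml implementation, so it only illustrates the control flow; the names `optimize_in_background` and `stop_background_work`, and the `optimize` callback, are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable


def optimize_in_background(
    paths: Iterable[str],
    optimize: Callable[[str], None],
    max_workers: int = 4,
) -> tuple[ThreadPoolExecutor, list[Future]]:
    """Queue one optimization job per image and return immediately."""
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(optimize, path) for path in paths]
    return pool, futures


def stop_background_work(pool: ThreadPoolExecutor) -> None:
    """Drop jobs that have not started and wait for the running ones."""
    pool.shutdown(wait=True, cancel_futures=True)
```

A serve command built this way would call `optimize_in_background(...)`, start the web server right away, and call `stop_background_work(pool)` from its shutdown handler so that no work outlives the server.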
Typst Syntax Conflicts
Many posts contained characters like @, _, or #, which Typst interprets as reference markers, emphasis delimiters, and the start of code expressions, respectively. I had to implement a batch-processing step to escape these characters or wrap them in raw blocks to prevent compilation warnings or broken layouts.
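A minimal sketch of the escaping step, assuming it runs only on plain prose (text that is already inside a raw block, or that is intended as Typst markup, must be skipped). The helper name `escape_typst` is hypothetical; it relies on Typst's backslash escapes for special characters.

```python
import re

# The three characters named in the post, plus the backslash itself.
_SPECIAL = re.compile(r"([\\@_#])")


def escape_typst(text: str) -> str:
    """Backslash-escape characters that Typst markup would interpret."""
    return _SPECIAL.sub(r"\\\1", text)
```

Escaping the backslash as well keeps the function safe on text that already contains literal backslashes, although running it twice on the same text would still double-escape.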
Why OCaml over Haskell?
While both OCaml and Haskell are excellent functional languages, I chose OCaml for this project for several pragmatic reasons:
- Strict Evaluation: OCaml’s strict evaluation model makes it much easier to reason about the performance and execution flow of CLI tools and build scripts. It avoids the potential space leaks that can occur with Haskell’s lazy evaluation in I/O-heavy tasks.
- Pragmatism: OCaml provides a more direct way to interact with system calls and mutable state (like file system manipulation), which makes it a more natural fit for a build system.
- Compilation Speed: OCaml typically offers faster compilation times, which significantly speeds up the development loop.
Preserving the Web
One of my top priorities was maintaining my existing URLs. By using the original filenames/slugs as the basis for the root-level .html output and providing a robust Nginx configuration for redirects, I’ve ensured that all existing links under /blog/ remain functional.
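The redirect rule can be checked against a list of old links with a small helper before touching the Nginx configuration. This is a sketch under assumptions: the old URL shapes (`/blog/<slug>`, with or without a trailing slash or `.html`) and the function name `new_path` are guesses for illustration, not taken from the real configuration.

```python
from typing import Optional


def new_path(old_path: str) -> Optional[str]:
    """Map an old /blog/<slug>[/] path to /<slug>.html, else None."""
    prefix = "/blog/"
    if not old_path.startswith(prefix):
        return None
    slug = old_path[len(prefix):].strip("/")
    if not slug or "/" in slug:
        return None  # the /blog/ index itself or a nested path
    if slug.endswith(".html"):
        slug = slug[: -len(".html")]
    return f"/{slug}.html"
```

Running every URL from the old sitemap through such a function and confirming that the target file exists in the build output is a cheap way to catch slugs that changed during the migration.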
Content Standards
As part of this migration, I’ve established a persistent standard for AI-assisted content. Every post now includes a transparent disclaimer and a log of the prompts used to generate it, with rules for placement based on whether the content was generated from scratch or an outline.
— This article was auto-generated by AI.
Prompts:
- “i want my blog articles to always trigger reader mode detection in browsers”
- “i want the pubdate to be generated from metadata in the .typ files there should be a section at the top. what is the best way to do this? what is a good delimiter pattern?”
- “i want to migrate all my posts from the old directory to the new blog. Make sure that the URLs stay the same. Also the images should be moved to assets/images and assets/videos respectively. Images should be processed when the blog is build. They should be optimized for being served quickly. This should be done in the binaries.”
- “add a note to all the migrated posts at the top that they were auto migrated with AI from markdown to typst. Add a new post that describes all the steps you took to migrate them successfully. the note on the posts should link to this new post. add a note in a new section on the bottom of the new post that explains that the post was auto-generated and include the prompts used. Make sure to do keep this beahviour in the future, whether generating a post from scratch or from a provided outline ( it an outline is provided it should also be included in the section )”
- “why have you chosen ocaml over haskell? update the post and then remove the unused binary. also change the disclaimer for completely ai-generated posts to be at the top of the page. when generated from an outline keep it at the bottom. the section at the bottom should be clearly divided from the article content ( a horizontal line will work ). update gemini.md accordingly”
- “all previous articles were under the /blog/ subroute. Give me the nix nginx config to ensure old links still work with the new url”
- “the posts on the index page should be ordered by release date, not by name”
- “the cover pictures on the old posts should remain on the new posts. they should be above the article content. the published dates are wrong. use the old posts metadata and transfer it into the migrated typst files (date is published date, title should be the title, slug should be the file name of the .html file of the post in dist, cover image should be an image above the content compiled by typst - but it should be within the tag)”
- “make the fallback date 01-01-1970 and add a note that the info on initial published date has been lost at the top of the post (just add it in cursive in typst ). note in gemini.md that all content added to a post should be done directly in the .typ file - raw html should only be used as an emergency fallback.”
- “Double check all the final pages against w3c guidelines. Use an accessibility checker and make sure all pages score the highest score ( use one from google)”
- “all the old posts that had metadata with draft: true should not be migrated. remove their migrated versions. many of the posts written after 2019 are missing”
- “update the migration post with the new changes and process”
- “update the migration post with how you actually ended up migrating from markdown to typst. add important code snippets and things that did not work at first”
- “if i kill the just serve command the image optimization still runs in the background. make sure that all forks will be killed cleanly”