Digital Conservation
I’m capping off a fairly substantial project today. When I transitioned this blog off WordPress, the old posts vanished. Part of the reason I left WordPress was how painful it was to back up posts. WordPress provides a nice XML export, but things get tough when you start worrying about artifacts like images. Unfortunately, over the years, a few of the moves between WordPress versions and hosts didn’t preserve all the data. It was a lot of work, but I’ve restored the majority of 17 years of blog posts.
Let’s start with a couple of big changes I’ve made to the site:
- Links section. Blogroll links, and making “people”-based services more discoverable, were a big deal to me. I’ve started a database of links and primed it with a few of the sites I used.
- New theme. I was shooting for no weird Web2/Web3 crud – the theme is basic CSS and not much else. I found a Hugo theme from a talented student. I’ve tweaked it a lot, but kept the minimal, no-JS approach.
A few notes on how I managed all this:
- https://archive.org/ – was crucial for browsing old versions of the site and finding several images not included in the backups I saved.
- https://github.com/lonekorean/wordpress-export-to-markdown – this tool made quick work of converting a very problematic multisite, plugin-heavy WordPress export into Markdown.
I worked through years of posts, updating images, updating metadata, and comparing against prior versions. It’s not perfect – 17 years of posts are a LOT to go through. While I used some Python code to help search and replace image references, I did the work myself. I didn’t want to risk the magic AI box randomly swapping posts around. Spelling errors, typos, factual issues: all hopefully included in this restoration. Markdown is slightly less expressive, so some posts have lost formatting and details from the earlier versions.
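For anyone attempting something similar, here is a minimal sketch of that kind of image search/replace, not the script I actually used. It assumes old images were linked as standard WordPress upload URLs (`.../wp-content/uploads/YYYY/MM/file`) and that restored copies live under a hypothetical `/images/YYYY/MM/` path in the new site; `rewrite_image_links` and `rewrite_tree` are illustrative names.

```python
import re
from pathlib import Path

# Assumed pattern: old WordPress upload URLs such as
# https://example.com/wp-content/uploads/2010/05/photo.jpg
OLD_UPLOAD_RE = re.compile(
    r"https?://[^\s)\"']+/wp-content/uploads/(\d{4})/(\d{2})/([^\s)\"']+)"
)


def rewrite_image_links(text: str, new_prefix: str = "/images") -> str:
    """Point old WordPress upload URLs at a local image directory (hypothetical layout)."""
    return OLD_UPLOAD_RE.sub(
        lambda m: f"{new_prefix}/{m.group(1)}/{m.group(2)}/{m.group(3)}", text
    )


def rewrite_tree(root: Path, new_prefix: str = "/images", dry_run: bool = True) -> list[Path]:
    """Rewrite image links in every Markdown file under root.

    Returns the files whose contents would change. With dry_run=True
    (the default) nothing is written, so each change can be reviewed by hand.
    """
    changed = []
    for path in sorted(root.rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = rewrite_image_links(original, new_prefix)
        if updated != original:
            changed.append(path)
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return changed
```

Defaulting to a dry run keeps a human in the loop: the script only lists candidate files, and you decide when to let it write.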
While old deep links might result in 404 errors, most of what was visible before can be found again. I have some ideas for next steps – there’s a lot possible now that the site is cleaned up and the data restored.
There’s an ongoing task to go through and update, repair, and build out more metadata on posts. Expect some more refinement, but I promise to keep the content itself true to what was previously published.