Sun 01 December 2024
So I inherited responsibility for a Wordpress site. It was created by the fundraising team for a project on which I was sysadmin and one of the developers. I (and others) advised them against using Wordpress at the time, but it's what they wanted and they were the ones bringing in donations to pay the rest of us, so they got what they wanted. The one concession I extracted was that it would live on a separate virtual machine, because this was a security-sensitive project and Wordpress is insecure by design, to put it mildly.
That all happened years ago. The project is, not exactly dead, but in suspended animation. It produced a lot of good open source code, at least some of which definitely has been used in other things, so it's been worth keeping the developers' git repository and wiki alive. But then there's the Wordpress VM, which has been slowly sinking back into the primordial slime. The last person maintaining it walked away years ago, which makes it an attack target. Which is bad enough on general principles, but this is a security project, so it's particularly embarrassing. What to do?
Wordpress was always a bad idea, and maintaining the bad idea forever seems like another bad idea. But the content on that site was all designed for Wordpress, and nobody has the time or is willing to donate the money to redesign the whole thing, much less maintain it.
Well, it turns out that there's been some progress. Even Wordpress, Grand-daddy of the "throw all your content into MySQL and serve it on the fly from SQL with PHP scripts" school of web site design, has figured out that sometimes static content really is better than exposing a database and a bunch of active scripts. So these days there are Wordpress plugins that will trawl your Wordpress site and export its content as a static site. A bit of the long way around the barn, but whatever works.
So the following is the story of how I put the ghost of this ancient Wordpress site to rest.
First task was to clone the zombie Wordpress VM so that I could work
on the clone without damaging the zombie. Details on how to do this
will vary with the virtualization system one is using; in this case
it's libvirt, so the process was just:
sudo virsh shutdown zombie
sudo virsh dumpxml zombie >zombie-clone.xml
sudo cp -p /path/to/zombie.qcow2 zombie-clone.qcow2
sudo virsh start zombie
after which I could just scp the two zombie-clone.* files to my
laptop and use them to create a new VM there. Working with a VM on my
laptop started out as a basic precaution and convenience (the server
hardware in question is many timezones and several undersea cables
away from me) but turned out to be useful for other reasons later.
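Creating the new VM from those two files mostly amounts to pointing the dumped XML at wherever the disk image now lives and handing it to libvirt. A minimal sketch, assuming the dump refers to the disk through a single-quoted source file attribute (which is how virsh dumpxml writes it); retarget_disk is an illustrative helper name and the paths are placeholders:

```shell
#!/bin/sh
# Rewrite the disk image path in a libvirt domain XML dump.
# Usage: retarget_disk OLD_PATH NEW_PATH < zombie-clone.xml > local.xml
retarget_disk() {
    old=$1
    new=$2
    # '|' is the sed delimiter so slashes in the paths need no escaping.
    # Dots in OLD_PATH act as regex wildcards, which is harmless here.
    sed "s|<source file='$old'|<source file='$new'|"
}

# Then, on the laptop (placeholder paths):
#   retarget_disk /path/to/zombie.qcow2 "$HOME/vm/zombie-clone.qcow2" \
#       < zombie-clone.xml > local.xml
#   virsh define local.xml
#   virsh start zombie
```

The domain keeps the name recorded in the XML, so on a different host there is no clash with the original.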
Unless otherwise specified, all of the following was performed on the clone, not the original VM.
Next task was to bring Wordpress up to date on the clone. This was a bit nasty, as Wordpress really wants an active administrator who will update it every time a new release comes out (of course, few do, which is part of the reason why Wordpress has such a terrible security reputation -- they don't even pretend that anything but the very latest release is secure, and only a fraction of the Wordpress instances out there run the latest release).
Wordpress's update process is also a bit of a joke, if you like gallows humor: there's an update button which does everything, but if it's been more than a couple of months since the last time you pressed it, they advise against using it, and recommend that you update "manually". Which means downloading all the intermediate releases between what you're running now, and installing them manually, one at a time.
The manual installation process is ridiculously primitive. In fairness, this is partly because the typical Wordpress site is on some virtual hosting service where the site owner can only update files through an FTP connection or something equally painful. When doing this on a server where one has full control, one really wants to script some of the repetitive instructions on what to delete before unpacking the next update.
Some kindly Kiwis provide a good description of the manual update process (and if you're really stuck, note that, for a price, they'll do the upgrade for you).
So the process here is twofold:
- Download the relevant updates
- Apply the updates, one at a time
But before getting started on that, one probably wants to delete any plugins that won't be needed, because the chance of a successful upgrade goes down with every plugin or theme one has installed.
But before doing that (you may be sensing a trend here), I had two other problems to deal with first:
- I needed to be able to point a web browser at the cloned VM without being immediately redirected to the real zombie VM.
- I needed to break into the Wordpress admin account.
Convincing a browser to talk to the clone even when redirected to the
real zombie Wordpress site boiled down to a DNS problem, so I handled
it by temporarily adding a local-zone: typetransparent entry to my
unbound configuration, with A and AAAA RRs pointing at the clone.
This is nasty, but also quick, easy, and temporary, so, whatever.
As to breaking into the admin account: the admin password walked
off the job with the former Wordpress administrator, and this was long
enough ago that there was little point in tracking him down to ask
whether he remembered what it was. So I needed to break in.
Fortunately, this is not difficult when one owns the VM on which
Wordpress is running. There are various ways to do this; the easiest
turned out to be whacking the Postfix configuration so that Postfix
would hold all outgoing mail rather than attempting to deliver it,
triggering a password reset request for the admin user from the
Wordpress /login page, and grabbing the confirmation request out of
Postfix's "hold" queue so that I could confirm the password change.
If that sort of thing doesn't appeal to you, you might try whacking
the Wordpress SQL database directly.
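Once the reset mail is sitting in the queue, the only part of it that matters is the confirmation link. A minimal sketch for fishing it out, assuming the link has the usual wp-login.php?action=rp&key=...&login=... shape and that the message body is not encoded in a way that mangles it; extract_reset_url is an illustrative name:

```shell
#!/bin/sh
# Pull WordPress password-reset links out of a mail message on stdin.
extract_reset_url() {
    # A URL ends at whitespace or at the angle brackets that sometimes wrap it.
    grep -Eo 'https?://[^[:space:]<>]*wp-login\.php\?action=rp[^[:space:]<>]*'
}

# Usage on the VM, as root (QUEUE_ID comes from the postqueue listing):
#   postqueue -p
#   postcat -q QUEUE_ID | extract_reset_url
```

Paste the resulting URL into the browser that is pointed at the clone, not at the real site.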
OK, back to the main plot. Now that I could log in, I could remove plugins. Since my goal was to move to a static web site, none of the handful of plugins were relevant, all of them were either useless nonsense (eg, the "Hello Dolly" plugin, seriously, Wordpress, WTF?) or were reasonable things that simply don't apply to static content. So they all went away. Your mileage may vary here.
Downloading the relevant Wordpress releases is something one can do with a lot of mouse clicking, or just do it on the command line, I chose the latter. Start with:
lynx -dump https://wordpress.org/download/releases/ | awk >release-urls '
NF == 2 &&
$0 ~ "^ *[0-9]+[.] https://wordpress[.]org/wordpress-[0-9.]+[.]tar[.]gz$" {
print $2;
}'
then edit the resulting release-urls file with your favorite text editor
to remove all but the releases you need. In my case I was starting
from Wordpress 4.7.29 and moving to the latest (6.7.1 as of this
writing), so I wanted all the intervening "major" releases (x.y) as
well as the final point release (6.7.1).
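If hand-editing the list doesn't appeal, the same selection can be roughed out mechanically. A minimal sketch, assuming GNU sort (for the -V version ordering) and the URL shape matched by the awk filter above; select_releases is an illustrative name:

```shell
#!/bin/sh
# Keep only the x.y release tarballs newer than START, oldest first.
# Usage: select_releases 4.7 < release-urls > wanted-urls
select_releases() {
    start=$1
    grep -E '/wordpress-[0-9]+\.[0-9]+\.tar\.gz$' |
        sort -u -V |
        awk -v start="$start" '
            {
                v = $0
                sub(/^.*\/wordpress-/, "", v)   # strip everything before the version
                sub(/\.tar\.gz$/, "", v)        # strip the extension
                split(v, rel, ".")
                split(start, cur, ".")
                # Compare major, then minor, numerically.
                if (rel[1] + 0 > cur[1] + 0 ||
                    (rel[1] + 0 == cur[1] + 0 && rel[2] + 0 > cur[2] + 0))
                    print
            }'
}
```

This deliberately drops every x.y.z point release, so the final one (6.7.1 here) still has to be appended to the output by hand.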
Once you have that list, you can download all of them with a single command:
wget -i release-urls
Script I used to apply one release at a time (update paths as needed):
[script listing not preserved]
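A minimal sketch of a script of this kind, following the standard manual-update recipe (throw away the old wp-admin and wp-includes, copy the new release over the top, leave wp-config.php and wp-content in place). It is a stand-in rather than the original listing; apply_release is an illustrative name, and it assumes the release tarball unpacks into a top-level wordpress/ directory:

```shell
#!/bin/sh
# Apply one WordPress release tarball to an existing installation.
# Usage: apply_release wordpress-X.Y.tar.gz /path/to/wordpress/docroot
apply_release() {
    tarball=$1
    docroot=$2
    [ -f "$docroot/wp-config.php" ] || {
        echo "no wp-config.php in $docroot" >&2
        return 1
    }

    tmp=$(mktemp -d) || return 1
    tar -xzf "$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }

    # Remove the old core code outright so stale files do not linger.
    rm -rf "$docroot/wp-admin" "$docroot/wp-includes"

    # Copy the new release over the top. wp-config.php is not part of the
    # tarball, and wp-content is merged rather than replaced, so local
    # configuration, uploads, themes and plugins stay in place.
    cp -R "$tmp/wordpress/." "$docroot/" || { rm -rf "$tmp"; return 1; }
    rm -rf "$tmp"
}
```

Because wp-content is merged, anything bundled in the tarball's own wp-content (default themes and plugins) comes along for the ride.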
But before you start using that, this might be a good time to talk
about backups: you might want to take a lot of them as you go through
this process. Since I was doing this on a VM with a .qcow2 disk,
the obvious easy way to do this was with disk snapshots. There's
probably some way to convince libvirt to snapshot the disk of a live VM,
but starting and stopping a VM on my laptop is fast enough that I
didn't bother, so the backup sequence was just:
- Shut down the VM
- qemu-img snapshot -c your-snapshot-name-here /path/to/disk.qcow2
- Start the VM
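The three steps above can be wrapped up so that taking a snapshot is a single command. A minimal sketch, assuming the VM is managed by libvirt; snapshot_vm and the VIRSH and QEMU_IMG variables are illustrative names, there so that the commands can be prefixed with sudo where needed:

```shell
#!/bin/sh
# Commands are taken from variables so they can be wrapped (e.g. "sudo virsh").
VIRSH=${VIRSH:-virsh}
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Usage: snapshot_vm DOMAIN /path/to/disk.qcow2 SNAPSHOT_NAME
snapshot_vm() {
    domain=$1
    disk=$2
    name=$3
    $VIRSH shutdown "$domain" || return 1
    # virsh shutdown only asks the guest to power off, so wait until it has.
    until [ "$($VIRSH domstate "$domain")" = "shut off" ]; do
        sleep 2
    done
    $QEMU_IMG snapshot -c "$name" "$disk" || return 1
    $VIRSH start "$domain"
}

# Rolling back (with the VM shut down):
#   qemu-img snapshot -l /path/to/disk.qcow2                  # list snapshots
#   qemu-img snapshot -a SNAPSHOT_NAME /path/to/disk.qcow2    # revert to one
```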
As it happened, I only once needed to roll back to a snapshot, but that one time made it all worthwhile. You have been warned.
Anyway, once one has the release tarballs and the script to apply them, the update sequence is straightforward. Do read the Kiwis' explanation first so you know what you're getting into, but the basic sequence is:
- Apply one update
- Log into the Wordpress admin account so it can decide whether it wants to apply a database migration
- Go back to step 1 with the next update
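Between passes it is worth confirming which version is actually on disk before moving to the next tarball. A minimal sketch, relying on the $wp_version assignment that Wordpress keeps in wp-includes/version.php; installed_wp_version is an illustrative name:

```shell
#!/bin/sh
# Print the WordPress version recorded in wp-includes/version.php.
# Usage: installed_wp_version /path/to/wordpress/docroot
installed_wp_version() {
    # Matches a line of the form: $wp_version = '6.7.1';
    sed -n "s/^[\$]wp_version = '\([^']*\)';.*/\1/p" "$1/wp-includes/version.php"
}
```

This only reports the version of the code; whether the database has been migrated to match is still something the admin login decides.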
All of which worked fine until it got to the point where it started complaining that the installed version of PHP was too old. OK, fair enough, we'd deliberately left the zombie VM with a version of PHP from the same era as its ancient Wordpress code, so once Wordpress got past that point, it was time to upgrade Ubuntu as well.
The Ubuntu upgrade process was fairly normal, not worth describing
when there's so much online documentation on how to do it, but with
one caveat: the original zombie VM was old enough that 32-bit Linux
had seemed like a good idea to whoever built it. There are still
platforms where 32-bit Linux is alive and well, but Ubuntu on
Intel/AMD "PC" hardware isn't one of them, everybody in their right
mind went to 64-bit a decade ago. So bits and pieces of the normal
Linux support infrastructure started vanishing as the Ubuntu upgrades
of the cloned VM got closer and closer to present time (eg, emacs
disappeared). This made it clear that I was going to want to rebuild
the VM as part of this process too; if I'd realized that at the start,
I might have handled the cloning process differently, but so it goes,
not that big a deal.
After upgrading Ubuntu to the point where Wordpress was happy with the PHP version, I was able to finish the cycle of Wordpress updates, giving me what, at least in theory, was an up-to-date Wordpress installation on a 32-bit Ubuntu VM.
Since by this point it was clear that I would not be keeping the
32-bit VM long term, the final step of this phase was to run
mysqldump to dump the complete up-to-date Wordpress SQL database for
installation on a new 64-bit VM.
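A minimal sketch of that dump step, reading the database name out of wp-config.php rather than typing it from memory. It assumes mysqldump can authenticate without extra options (for example via ~/.my.cnf, or socket authentication when run as root); wp_config_value, dump_wordpress_db and the MYSQLDUMP variable are illustrative names:

```shell
#!/bin/sh
MYSQLDUMP=${MYSQLDUMP:-mysqldump}

# Print the value of a define('KEY', 'value') line in wp-config.php.
wp_config_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*define([[:space:]]*['\"]$key['\"][[:space:]]*,[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$file"
}

# Usage: dump_wordpress_db /path/to/wp-config.php wordpress.sql
dump_wordpress_db() {
    db=$(wp_config_value DB_NAME "$1")
    [ -n "$db" ] || { echo "DB_NAME not found in $1" >&2; return 1; }
    # --single-transaction gives a consistent dump of InnoDB tables.
    $MYSQLDUMP --single-transaction "$db" >"$2"
}
```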
By this point I'd encountered enough little broken things in the Wordpress content (abandoned site problem, not Wordpress's fault) and done enough reading on Wordpress static site generator plugins that I suspected I might have at least a medium term use for an up-to-date Wordpress VM on my laptop that I could use with the static site generator plugins, so rather than trying to coax the 32-bit zombie clone VM along any further, I decided to take a few minutes to think through what a sane VM for this purpose would look like:
- Any DNS hackery (unbound, etc) would be strictly local to the VM.
- It would be a Desktop VM so that it could run a web browser, so that all browser access to Wordpress could also be local to the VM.
- It would be a Linux release and Desktop environment I use on a regular basis, so I wouldn't be wasting time hunting for things.
So I created a new 64-bit VM and installed Debian Bookworm with the
Gnome3 Desktop environment, Apache, Unbound, MariaDB, PHP, Firefox,
and, of course, Emacs. Since, unlike most VMs, I planned to be using
the GUI on this one, I connected to it with VNC rather than SSH or
virsh console. In retrospect, these were all good choices, except,
perhaps, the decision to go straight to Debian Bookworm without
checking for PHP whoopee cushions, but that's one of those things one
only finds out about the hard way.
To make things easy, I shut down the 32-bit VM and mounted its
.qcow2 read-only as a second virtual disk on the new 64-bit VM, so I
could just copy files from the old VM's filesystem directly to the
appropriate places on the new VM.
Which brings us to the SQL dump of the upgraded Wordpress installation
from the old VM. As mentioned earlier, this site had not been
maintained properly for many years, and links had gone stale, as they
do. The thought of attempting to fix all the known broken links via
the Wordpress GUI was almost enough to make me consider abandoning the
project entirely, but it turns out that there's a much easier way,
it's just a bit brutal. I already had all of the database content
mustered as a .sql file. Emacs has an SQL editing mode, and Emacs
has M-x query-replace. You can see where this is going, and I make
no apology for it: in ten minutes I was able to clean up hundreds of
broken links, just by searching for https?:// and doing the obvious
whenever I found a broken link. I also fixed a few old recipes that
were so badly out of date that they'd be harmful to anybody who tried
to follow them.
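To get a feel for how many links there are before diving into the editor, the dump can be asked for its list of distinct URLs. A minimal sketch, assuming GNU or BSD grep (for -o); list_urls is an illustrative name:

```shell
#!/bin/sh
# Print each distinct http(s) URL found on stdin, one per line.
# Usage: list_urls < wordpress.sql
list_urls() {
    # Stop at whitespace, quotes, angle brackets and backslashes, which is
    # where URLs embedded in HTML inside an SQL dump normally end.
    grep -Eo "https?://[^[:space:]\"'<>\\]+" | sort -u
}

# One way to flag candidates for fixing (slow, and some sites block curl):
#   list_urls < wordpress.sql | while read -r url; do
#       curl -fsS -o /dev/null --max-time 10 "$url" || echo "BROKEN $url"
#   done
```

The pattern is deliberately crude: a URL followed directly by a closing parenthesis or a full stop will pick up that trailing character, so treat the output as a to-do list rather than gospel.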
Per the previously mentioned policy of making frequent backups, I
created a local git repository for the SQL file, and checked in
intermediate versions after every large change; as with the qemu-img
snapshots, I only needed the backup once, but that one time (an
ISO-Latin-1 vs UTF-8 conversion issue) paid for it all.
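One cheap check that catches at least one class of encoding accident is to confirm the file is still valid UTF-8 before each commit. A minimal sketch, assuming the dump is meant to be UTF-8 throughout; is_valid_utf8 is an illustrative name:

```shell
#!/bin/sh
# Succeed only if FILE is valid UTF-8 from start to finish.
# Usage: is_valid_utf8 wordpress.sql && git commit -am "checkpoint"
is_valid_utf8() {
    # iconv exits non-zero when it meets a byte sequence it cannot decode.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

This catches stray Latin-1 bytes in a UTF-8 file. It will not catch text that was converted twice, since that is still valid UTF-8, just wrong.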
As with earlier messing about on the 32-bit VM, I also needed to do
something about DNS for the site name, but since I now had things set
up so that I only had to lie to the local copy of Firefox on the new
VM, this meant I could just put the following into
/etc/unbound/unbound.conf.d/local.conf:
server:
local-zone: "example.org." typetransparent
local-data: "example.org. 3600 IN A 127.0.0.1"
local-data: "example.org. 3600 IN AAAA ::1"
At this time I also removed the earlier local-zone configuration
from my network's unbound configuration.
After I finished banging on the SQL with Emacs, I loaded it into MariaDB, pointed the local copy of Firefox at the site name, and...thud. Total failure. Wordpress totally broken.
Fortunately, by this point I had spent enough time groveling around in Wordpress's config file that I knew it has a built-in debug logging system, so I enabled that, and started getting messages about missing required parameters to various PHP functions. Say what?
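For reference, turning that logging on comes down to three constants in wp-config.php, which have to be defined before the line that pulls in wp-settings.php. A minimal sketch that edits the file in place, dropping any existing WP_DEBUG definitions first so the constants are not defined twice; enable_wp_debug is an illustrative name, and by default the log lands in wp-content/debug.log, which the web server must be able to write:

```shell
#!/bin/sh
# Turn on WordPress debug logging in an existing wp-config.php.
# Usage: enable_wp_debug /path/to/wordpress/wp-config.php
enable_wp_debug() {
    config=$1
    tmp=$(mktemp) || return 1
    awk '
        # Drop existing WP_DEBUG, WP_DEBUG_LOG and WP_DEBUG_DISPLAY defines.
        /define[ \t]*\([ \t]*.WP_DEBUG/ { next }
        # Insert the new settings just before wp-settings.php is loaded.
        /wp-settings\.php/ && !inserted {
            print "define( \"WP_DEBUG\", true );"
            print "define( \"WP_DEBUG_LOG\", true );"
            print "define( \"WP_DEBUG_DISPLAY\", false );"
            inserted = 1
        }
        { print }
    ' "$config" >"$tmp" && cat "$tmp" >"$config"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Writing back with cat rather than mv keeps the file's existing ownership and permissions.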
Fortunately, these messages were detailed enough that a web search for them worked. Bottom line: because I'd installed Debian Bookworm, I was running with PHP8, which, the Net of a Million Lies says, is much fussier about things like getting function calling sequences wrong. Apparently stuff all over the world has been breaking because of this, or, rather, since PHP8 is doing something right that earlier versions of PHP were doing wrong, stuff all over the world has started exploding due to latent bugs that earlier versions of PHP had never noticed (see "insecure by design").
So at this point, I had a choice: a real hero would go through and fix all those bugs in whatever broken theme file (or whatever, but probably a theme file) was triggering these errors. Someone who just wanted to get on with it and wasn't planning to expose the PHP code to the outside world ever again might take the lazy way out and just downgrade to PHP7. You can guess which path I took.
After downgrading to PHP7, Wordpress started working properly (well, as properly as Wordpress ever does). I now had a self-contained little Wordpress VM I could use to generate static site files.
So, now that I had a self-contained up-to-date Wordpress environment with its own browser and everything, I could finally play with static site generators.
The first static site generator I found, "Simply Static", sounded great, but never ran to completion. Never did figure out why. Fortunately, there are others, and the second one I tried, "Staatic" (sic), did work.
Staatic didn't run properly at first either, but, unlike Simply Static, it gave me detailed hints on what the problems might be and how to fix them. Without (further) dragging this out, it turned out to be a permission problem in the local filesystem: I had things set up in the normal way where the Apache user doesn't have permission to modify content, because it shouldn't want to do that anyway; it turns out that in this case, Staatic had a legitimate need to write stuff somewhere, and because Wordpress, Staatic wanted to write into a corner of the Wordpress tree. If this were a production system exposed to the outside world, this would have bothered me, but on a protected VM on my laptop, yeah, whatever. So I gave it write access, and Staatic became happy.
Staatic has several methods for deploying new content; I chose the
simplest one, which is just to have Staatic create a .zip file with
the entire static content tree. There are some buttons in the config
section of the Staatic plugin's section of the control panel that are
worth playing with, particularly if you want a pre-publication version
you can try browsing on your laptop (so all relative links). For a
pre-publication view, just take the .zip file Staatic generates, do:
mkdir test
cd test
unzip ../publication-xxxxxx.zip
python3 -m http.server
and point your browser at the URL Python's http.server prints. You
can now browse the pre-publication version, and Python will log all
the HTTP requests. Assuming you did the unbound hack mentioned
above, you can also temporarily shut down Apache within the VM to make
sure that all the requests are being served by Python rather than
by Apache.
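A quicker, if cruder, check that the pre-publication tree really is all relative links is to search it for the site's own name. A minimal sketch, with example.org standing in for the real site name as elsewhere in this post; find_absolute_refs is an illustrative name:

```shell
#!/bin/sh
# List files under DIR that still mention the site's absolute URL.
# Usage: find_absolute_refs test example.org
find_absolute_refs() {
    dir=$1
    host=$2
    # -F treats the host as a fixed string, -r recurses, -l prints file names only.
    grep -rlF -e "//$host" "$dir"
}
```

No output (and a non-zero exit status) means nothing in the tree points back at the live site.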
Once you're happy with the pre-publication version, go back to
Staatic's config section, tell it you want a version with absolute
URLs, press the "publish now" button, and grab the .zip file.
You are now (finally!) ready to install the static content in its new home. It really is just a directory tree of static files, so the simplest possible server configuration on the public server should work; in our case it really was as simple as:
<VirtualHost *:443>
ServerName example.org
SSLEngine on
DocumentRoot /path/to/static/wordpress/files
</VirtualHost>
(yes, of course there's other stuff involved in enabling TLS ("SSL"), but we already had that set up as shared configuration for everything hosted on that server, no need to duplicate it here).
OK, I lied, there were two other things I needed to do in my case (your needs may vary), in both cases because I was consolidating from a separate VM to a pre-existing server that already hosted other web sites:
- I needed to shut down the old zombie VM and point its name at its new home.
- I needed to update the Let's Encrypt certificate so that https://example.org/ would work at its new home.
In order to make it easier to back out the change if something went horribly wrong, I did this by:
- Shutting down the old zombie VM;
- Adding the zombie's IPv4 and IPv6 addresses to the new server as temporary aliases;
- Updating the Let's Encrypt configuration and kicking off a certificate request cycle.
In order to avoid a prolonged outage or prolonged period during which the site would look broken, these tasks needed to be done as quickly as possible, once started, so I pre-staged all the configuration changes so that I could push the various "go" buttons in quick succession. There was probably a gap of two or three minutes during which things might have looked inconsistent, oh well, best one can do is best one can do.
As it happens, everything went well and there was no need to back out,
so once I'd confirmed that everything looked healthy, I initiated the
DNS change to the A and AAAA RRs for example.org, waited long
enough for that to propagate, and dropped the temporary address
aliases to reclaim the zombie VM's addresses.
And Bob's your uncle.