Last week I set another website loose (Voices of Transition, for the documentary filmmaker Nils Aguilar), which in part was being shunted off another domain, with some very messed-up URLs requiring quite a stack of redirection. I started using a plugin to deal with this, which then made me think about supernaut and all its incarnations, its thousands of posts and tags, hundreds of categories, tens of thousands of images, and what kind of mess almost ten years of blogging would leave.
Out of curiosity then, I installed the same plugin (in-between watching Person of Interest – which is another story) and it turns out there are over 1,000 404 Page Not Found errors per day. A lot of these (around three-quarters) are from Google’s Image Search trying to go directly from their site to the full-size image itself, bypassing the post the image lives in, which the server treats as hotlinking (due to the absence of a referrer) because of my anti-leeching rules. I’d love to be able to simply redirect those attempts to view the image to the actual post, but … prior to 2011 supernaut is a mess. Most of the remaining quarter are either weird errors from the image viewer trying to access images, which I need to deal with, or spam searches looking for things that don’t exist (on the basis of “If image ‘n’ exists at path ‘p’, then ‘y’ is installed (plugin, theme, software, etc.), which allows hack ‘x’ to be tried”). Which leaves real errors.
These are wreckage from the days long ago when I used Movable Type instead of WordPress. Back in those days, all my images were in a folder called /images/ with subdirectories like 09dec for December 2009. WordPress, on the other hand, has everything in /uploads/ with a year/month/ subdirectory structure, into which I long ago imported all the images (and which I am still very slowly dealing with). Movable Type also had a different URL structure for posts, like post_name_truncated_somethi.html. Both the old image paths and the old post URLs are buried in search engine results, the former to quite a huge degree (about 5% of traffic is looking for those old images).
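To make the old-to-new image mapping concrete, here is a rough sketch in Python of the kind of regex involved. It is only an illustration, not the rules I actually have in the plugin, and it rests on a few assumptions: that every old folder is a two-digit year plus a three-letter English month (like 09dec), that all years are 20xx, and that the filename survived the import unchanged. The names `OLD_IMAGE` and `redirect_old_image` are made up for the example.

```python
import re
from typing import Optional

# Three-letter month abbreviations as used in the old folder names (e.g. 09dec).
MONTHS = {
    "jan": "01", "feb": "02", "mar": "03", "apr": "04",
    "may": "05", "jun": "06", "jul": "07", "aug": "08",
    "sep": "09", "oct": "10", "nov": "11", "dec": "12",
}

# Matches paths like /images/09dec/photo.jpg
# (assumed layout: two-digit year, three-letter month, then one filename).
OLD_IMAGE = re.compile(
    r"^/images/(?P<yy>\d{2})(?P<mon>[a-z]{3})/(?P<file>[^/]+)$",
    re.IGNORECASE,
)


def redirect_old_image(path: str) -> Optional[str]:
    """Return the /uploads/ path for an old /images/ path, or None if it does not match."""
    match = OLD_IMAGE.match(path)
    if match is None:
        return None
    month = MONTHS.get(match.group("mon").lower())
    if month is None:
        # Three letters that are not a month abbreviation: leave it as a 404.
        return None
    # Assumption: all old folders belong to the 2000s, and filenames are unchanged.
    return f"/uploads/20{match.group('yy')}/{month}/{match.group('file')}"
```

Anything that does not fit the pattern returns `None` and stays a 404, which seems safer than guessing at a target that might not exist.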
So I wrote (and am writing) a bunch of redirects – and a lot of regex. I could do this directly in .htaccess, but not knowing in advance which URLs are the problem, and having all of this logged by the plugin, are both good reasons to do it directly in WordPress. It’s a bit messy, and sometimes redirects for spam conflict with genuine pages (like sub-pages of monthly archives), but it’s probably useful for a while if people looking for something actually stand a good chance of finding it instead of “absence…”