Spellcheck the Shell Way

I was reading this awesome book (about which I shall soon blog) and there was this moment of, “Fark! What a brilliant line!” like I actually said that ’cos it was so good, followed by, “Fark! Spelling mistake of spacecraft’s name!” And I thought wouldn’t a good way to deal with spellchecking (besides my favourite cmd-;) be to take the entire text, do something fancy command-line to it, and output all the words alphabetically by frequency. Then you could just spellcheck that file, find the weird words, go back to the original document and correct the shit out of them. So I did. Brilliant!

# take a text and output all the words alphabetically by frequency
# spaces replaced with line breaks, lowercase everything, punctuation included (apostrophe in ascii \047)
# http://unix.stackexchange.com/questions/39039/get-text-file-word-occurrence-count-of-all-words-print-output-sorted
# http://tldp.org/LDP/abs/html/textproc.html
# http://donsnotes.com/tech/charsets/ascii.html
find . -name "foo.txt" -exec cat {} \; | tr ' ' '\012' | tr A-Z a-z | tr -cd '\012[a-z][0-9]\047' | grep -v "^\s*$" | sort | uniq -c | sort -bnr

Gallery

Neo-Grotesk Crypto-Brutalist emilezile.com

End of March, right when I’m throwing finally together my design portfolio (I swear I resisted, and now love having one), Emile asked if I might want to hurl together something for him. Something Web1.0, something like we’d handcode in HTML in the late-’90s, not quite something MySpace in the days of its browser-crashing gif-frenzy inferno, but definitely something that would be in its lineage; something tuner Nissan Skyline, unassuming on the surface, but all Fast & Furious: Tokyo Drift when you pop the hood. Something Helvetica, Neo-Grotesk, what’s getting called Brutalist right now, though not traipsing behind a fashion; this is Emile and when I was looking through years of his work putting his new website together, he has a deep love and understanding of the aesthetic, and the art and philosophy underpinning it.

First things first:

Emile: I have two websites. Can we make them one?
Frances: Yes!
Emile: Can we do all these other things?
Frances: OMG Yes!

Lucky I’d just done my portfolio, cos that gave me the framework to build on without having to bodge together fifty different functions and stuff. Saves a few hours there, which we made good use of in timezone-spanning conversations on typography, aesthetics, and usability.

First off, getting all those years of blog posts and work projects into a single database / website / organism. I used the hell out of interconnect/it’s Search & Replace DB script, merging, shuffling, shifting, getting rid of old code, jobs that would take a week or more to do by hand, done in seconds. We’d pretty much sorted out structure and functionality in a couple of afternoons; for a website that looks so simple, it was most of two weeks diligent work, back-and-forth conversations, picking away at details, (stripping and rebuilding, stancing, slamming, tuning … we are very good at turning all this into hoonage, especially with 24h Le Mans in the middle).

Obviously it had to be ‘Responsive’, look hella flush hectic antiseptic no matter what device, and for me (recently taking this stuff proper serious) it had to also be ‘Accessible’. I put those words in scare-quotes cos they’re kinda bullshit.

It occurred to me as I was finishing, that for a website to be neither responsive nor accessible — for example it looks crap if the screen size is too small or not ‘right’, or you can’t navigate with keyboard or screenreader — you have to actively remove this functionality. You have to break the website and override browser default behaviour. It’s a very active process to systematically remove basic functionality that’s been in web browsers since the beginning. You also have to actively not think, not empathise, intentionally not do or not know your job. Me for probably all of my earlier websites.

The funny thing is, it’s not really any additional work to make sure basic responsive and accessible design / functionality is present; the process of testing it always, always, always brings up usability issues, things I haven’t thought of, little points that become involved discussions about expectations, interactivity, culture, philosophy. Like ‘left and down’ is back in time, and ‘right and up’ forward; 下个礼拜 / 上个礼拜. Next week / last week. Yet the character for ‘next’ is xià, down, less than, lower; and ‘last’ (in the sense of ‘previous’) is shàng,  up, more than, higher. So how to navigate between previous and next posts or projects turns into an open-ended contextual exchange on meaning.

And ‘responsive’, ‘accessible’? Basic, fundamental web design. Not something tacked on at the end.

Back to the design. System fonts! Something I’ve not done in years, being all web-font focussed these days. Another trip through the wombat warren of devices, operating systems, CSS declarations. It’s crazy impressive how deep people go in exploring this stuff. Emile Blue! A bit like International Klein Blue, and a bit like Web / HTM 4.01 Blue. But not! We worked this in with a very dark grey and very slightly off-white, bringing in and throwing out additional colours, and managing in the end to sort out all the interaction visual feedback though combinations of these three — like the white text on blue background for blockquotes. Super nice.

As usual, mad props to DreamHost for I dunno how many years of hosting (it was Emile who said to me, “Frances. Use DreamHost.”), WordPress for running Emile’s old and new sites (and all of mine), and Let’s Encrypt for awesome and free HTTPS. And to Emile for giving me the pleasure of making the website of one of my favourite artist.

Emile’s new website is here: https://emilezile.com

A Bit of Character Counting Stupidity

When I’m using WordPress’ Status Post Format, I like to keep it to 140 characters, so it’s like a Tweet. But how many characters have I typed? Cos The Visual Editor only shows Word Count. So I took a look around and saw various ways of doing it, quite a few using regex to strip ‘unwanted characters’—but a space counts as a character! So I wrote my own, based loosely on what I’d been using for counting characters in the Excerpt, and from a few different plugins and bits of code. Turned out to be surprisingly easy and uncomplicated (I say that now, of course).

So, first I need a function to call a couple of files if I happen to be editing a Post or Page:

function supernaut_character_count( $charcount ) {
  if ( 'post.php' == $charcount || 'post-new.php' == $charcount ) {
    wp_enqueue_style( 'character-count-css', get_stylesheet_directory_uri() . '/css/char-count.css', array() );
    wp_enqueue_script( 'character-count-js', get_stylesheet_directory_uri() . '/js/char-count.js', array( 'jquery' ), '20160519', true );
  }
  else return;
}
add_action( 'admin_enqueue_scripts', 'supernaut_character_count' );

Then I slap together some jQuery:

jQuery( document ).ready( function( $ ) {
  if( $( '.post-php' ).length || $( '.post-new-php' ).length ) {
    $( '#post-formats-select input' ).click( function() {
      if ( $('#post-format-status').is(':checked') ) {
        $( '#post-status-info tr:first-child' ).after( 'Character count: ' );
        $( '#content_ifr' ).ready( function () {
          setInterval( function() {
            var tinymcebody = $( '#content_ifr' ).contents().find( 'body' );
            $( '#postdivrich .character-count' ).html( tinymcebody.text().length );
          }, 500 )
        });
      } else {
        $( '#post-status-info tr:first-child' ).nextAll().remove();
      }
    });
  }
});

Then a miniscule bit of CSS:

.post-php #wp-character-count {
  font-size: 12px;
  line-height: 1;
  display: block;
  padding: 0 10px 4px;
}

It’s a little bodgy, and if I had more than the 45 minutes to a) work out how the Visual Editor can be fiddled with, and b) write something that worked and looked ok—the bare minimum really—I’d do it slightly nice and maybe consider for a minute throwing it into a plugin (where it officially belongs). But I won’t. It does what I want: live updating of how many characters I’ve written in the Visual Editor, and shows it in a line underneath the Word Count (also aided me in delaying writing a residency application on the day it’s due).

supernaut Character Count
supernaut Character Count

Slight Improvements to Accessibility & Structured Data

Coding while sick!

Recent supernaut museum photography blogging got me thinking about image metadata (both Exif for camera-applied metadata, and IPTC for image-specific person-added metadata), and how I could set up a workflow to better implement this – particularly IPTC which is something I’d need to add and can partially automate, rather than Exif, which the camera adds – and how to persuade WordPress (or straight PHP / some method of database demonology) to scrape that and output it into Schema structured data.

I started doing heavy Schema work on tiptree.org as a way of making the huge amount of data comprehensible to search engines, and by extension, to humans, and have grown to like it a lot (also microformats, which WordPress uses) for making sense of what is otherwise a string of decontextualised words and media. I’d already schema-fied Dasniya’s blog, so mostly it was copy-pasting, then combining into supernaut’s structure – where the design had taken far too many liberties with accessibility and structured data.

So, now all posts spit out at least minimally useful structured data, and posts with images, galleries, or video spit out additional Schema structured data for those objects. Once I start using IPTC for my images, I’ll look at a way of including that in the Schema as well. (Though combining all that into useful objects from experience with Tiptree can be hilariously obtuse.)

On the accessibility side, I’d realised sometime in the past I’d been seriously shitty in supernaut’s design, basically a fucking horrible website to get around for anyone using keyboard navigation, screen-readers, or other methods non-trackpad + eyeballs. Accessibility is something I’ve been increasingly enjoying coding and designing for, along with structured data, particularly for websites that are highly designed and plain fucking arty – I find it gratifying to make hugely complex sites that remain structurally coherent and accessible no matter what device or method a person uses to access it – because there’s often an inverse relationship between design ‘woo!’ and usability ‘yay!’

So, keyboard navigation now is useable and hopefully much clearer, visual styling of user interaction also. Not as good as it could be, but getting there, and vaguely aiming for all of supernaut to be at least somewhat accessible – including and especially my museum visits.

DreamHost & Let’s Encrypt & WordPress

Mid-last year, Electronic Freedom Frontier announced Let’s Encrypt: free, automated, Open Source HTTPS Certificate Authority for everyone. That’s the padlock in the address bar. Supernaut and other sites of mine have either gone the parasitic paid route, or the “whole day lost and still not working” free route.

DreamHost, who has been my webhost for close to a decade (thanks Emile), announced in December they would be providing One-Click Let’s Encrypt setup for everyone, no matter what their hosting plan. Yesterday it arrived; It really is one-Click! Ok, two clicks, one checkbox, one select. Four things you have to do in DreamHost panel for what used to cost tens to hundreds of dollars/euros a year and hours of pain if you didn’t want to pay.

Awesome. Best thing that’s happened to the internet since ages.

Anyway, this isn’t about all that, it’s about “I have a WordPress site and how do I make the green padlock appear?” Cos that’s a couple more steps. Call it 15 minutes if you’re paying attention; an hour if you’re drinking.

So, you’ve already added the certificate in DreamHost Panel. No? To the DreamHost Wiki! Done that, wait for the confirmation email (might take a couple of hours), and on to your webserver.

I start with getting wp-admin all https-ified. Open your site in whatever FTP client you’re using, open wp-config.php and add:

define( 'FORCE_SSL_ADMIN', true );

Re-login to wp-admin, and check for the padlock in the address bar. Open Web Inspector (command-alt-i), select the Console tag and check for Mixed Content errors. Unless you’re doing weird things in wp-admin, that’s that side of things sorted.

In your site root directory, you’ll see a directory called “.well-known”. That’s added by Let’s Encrypt. Probably don’t want to delete that.

Open your root .htaccess and add two chunks of code:

# force redirect http to https
# ---------------------------------------------------------------

<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{HTTPS} off
 RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</IfModule>

# https security headers
# http strict transport security (hsts)
# X-Xss-Protection
# X-Frame-Options
# X-Content-Type-Options
# ---------------------------------------------------------------

<IfModule mod_headers.c>
 Header always set Strict-Transport-Security "max-age=16070400; includeSubDomains"
 Header always set X-Xss-Protection "1; mode=block"
 Header always set X-Frame-Options "SAMEORIGIN"
 Header always set X-Content-Type-Options "nosniff"
 #Header always set Content-Security-Policy "default-src https://website.tld:443;"
</IfModule>

The first force redirects any requests for http to https. The second does some fairly obtuse header security stuff. You can read about that on securityheaders.io. (The Content Security Policy stuff takes a lot of back-and-forth to not cause chaos. Even wp-admin requires specific rules as it uses Google Fonts. Expect to lose at least an hour on that if you decide to set that up.)

Other ways of doing this are possible. It’s kinda unclear what’s canonical and what’s depricated, but WordPress Codex and elsewhere have variations on these (definite rabbithole, this stuff):

SSLOptions +StrictRequire
SSLRequireSSL
SSLRequire %{HTTP_HOST} eq "yourwebsite.tld"
ErrorDocument 403 https://yourwebsite.tld

<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{HTTPS} off
 RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

 RewriteBase /
 RewriteRule ^index\.php$ - [L]
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule . /index.php [L]
</IfModule>

Then go through the remainder of your .htaccess and change any specific instances of your old http://yourwebsite.tld to https://yourwebsite.tld.

Open robots.txt and do the same, http changed to https.

Now’s a good time to empty your cache and check your live site to see how it’s going. Everytime I’ve done this so far it’s been all green padlock and magic S. But there’s still a couple of things to do.

Off to your Theme directory, open functions.php, and add these chunks of code:

/* content over ssl
------------------------------------------------------ */
 
function themename_ssl_content( $content ) {
 if ( is_ssl() )
 $content = str_ireplace( 'http://' . $_SERVER[ 'SERVER_NAME' ], 'https://' . $_SERVER[ 'SERVER_NAME' ], $content );
 return $content;
}

add_filter( 'the_content', 'themename_ssl_content' );

/* advanced custom fields over ssl
------------------------------------------------------ */

add_filter( 'wp_get_attachment_url', 'set_url_scheme' );

The first one makes sure your old http links in Posts and Pages are spat out as https. (Can probably extend that for excerpts and other things if you’re that way inclined.) The second was for me specifically dealing with Advanced Custom Fields plugin, and is part of a larger issue that’s been bashed around on Make WordPress Core. There’s a few other bits of code floating around for issues like this and the Media Library not behaving properly over https, but the next step I think deals with that, if you want to go that far.

Before that though, go through your theme for irksome hardcoded http strings. If you really can’t use proper WordPress functions (like get_template_directory_uri() or whatever), then change them to https.

Addendum:

Turns out WordPress’ responsive images support using src-set needs its own attention:

/* responsive images src-set
------------------------------------------------------ */

function ssl_srcset( $sources ) {
 foreach ( $sources as &$source ) {
 $source['url'] = set_url_scheme( $source['url'], 'https' );
 }

 return $sources;
}

add_filter( 'wp_calculate_image_srcset', 'ssl_srcset' );

If you fully commit to the next step though, these functions aren’t required.

(End of addendum.)

Database funtime! Backup your database, it’s time to search and replace. I’ve been using interconnect/it’s brilliant Search Replace DB for ages when I need to shift localhost websites to remote. Dump it in your WordPress folder, open it in a browser, and search-replace http://yourwebsite.tld to https://yourwebsite.tld. I do a dry run first, just to get an idea of what’s going to be changed. This step isn’t really necessary, and if you end up going back to http (dunno why), you’d need to reverse this process; it’s probably just that I like everything to be all orderly.

Another browser to make sure crap hasn’t fallen everywhere, and it’s cleanup time.

In wp-admin, check all the settings are showing https where they should (can even resave Permalinks if clicking buttons feels good). If you’re using a plugin like Yoast SEO, then checking the sitemaps are correct is good.

Caching plugins also need to be checked, and caches emptied. If you’re using W3 Total Cache and are manually minifying, check the file urls are all https, for some reason this was’t changed even with the database search replace. Also under Page Cache check “Cache SSL (https) requests”.

Then it’s checking in a couple of browsers with and without caching, particularly any pages that use plugins which embed or iframe content, or otherwise interact with stuff outside your website. Most sites like Vimeo, Twitter, YouTube etc that WordPress provides embeds for are already over https, but that doesn’t mean code in a plugin is up to date.

If you’re using Google Analytics or Webmasters or similar, you’ll need to set things up there for the new https version of your site as well.

Buncha caveats: Do some of this wrong and you will break things. At least backup your database before doing this. Some/all of this might be depreciated/incorrect/incomplete/not the best way to do it. Finding definitive, single ways to do things isn’t really how code works, so try and understand what the code does so you can ask if it’s doing what you want. For me to be able to do this in 15 minutes is because I’ve spent years scruffing around in stuff and breaking things horribly, and the best I can say is, this seems to work and cover the most common issues.

DreamHost. Let’s Encrypt. Excellent!

(I swear this is much quicker than it took to read.)