Making (and using!) WWO:SDI

Recently, we published an announcement about the release of the Women Writers Online: Scrabble Discovery Interface (WWO:SDI), which was (we hope) fairly obviously an April Fools’ Day joke. For all its silliness, however, WWO:SDI demonstrates some of the much more practical tools we have for interacting with WWO. More than that, WWO:SDI itself has proved to be a remarkably effective proofing tool.

This second point may be less surprising when you note that WWO:SDI is similar to some of our existing proofing routines, which use XSLT to create HTML documents that enable us to review our data. For example, we have a proofing routine that creates a chart displaying encoded data on the page numbers and signature marks that appear in our texts, along with our idealizations of page numbers and milestones. This chart makes it much easier to see where there are mismatches between our idealized numbering and the actual contents of each page, and to catch errors such as a run of pages numbered 1, 2, 5.

Creating WWO:SDI was an interesting thought experiment for us, particularly as we considered how our markup could be used to filter out words that would not be allowed in a standard Scrabble® game. We thought of the various namelike elements right away, but hadn’t considered <speaker> until we remembered that most of the contents of <speaker> labels are proper nouns (we did have to reconcile ourselves to falsely excluding some words, such as “servant,” “duke,” or “attendants”). We also had to figure out a mechanism for excluding roman numerals, which proved trickier than we first expected, precisely because they aren’t always set apart in the encoding the way names are. And we were able to draw on some of our existing routines for regularizing original orthographies, dealing with end-of-line (“soft”) hyphens, and preferring corrections over errors.
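To give a feel for this kind of filtering, here is a minimal Python sketch. It is not the XSLT behind WWO:SDI. The list of excluded element names, the function names, and the sample fragment are all assumptions made for illustration, and the sketch ignores XML namespaces. It skips the contents of namelike elements and <speaker>, then drops any remaining word that matches a roman-numeral pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of elements whose contents are skipped entirely.
# The post mentions namelike elements and <speaker>; the exact names are assumed.
EXCLUDED_TAGS = {"persName", "placeName", "name", "speaker"}

# Strict roman-numeral pattern (1 to 3999). The lookahead rejects the empty string.
ROMAN = re.compile(
    r"(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
    re.IGNORECASE,
)

def is_roman(word):
    """True if the whole word is a well-formed roman numeral."""
    return ROMAN.fullmatch(word) is not None

def tokens(text):
    """Lowercased runs of ASCII letters; None is treated as empty text."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text or "")]

def playable_words(xml_text):
    """Collect words outside excluded elements, then drop roman numerals."""
    words = []

    def walk(elem):
        if elem.tag in EXCLUDED_TAGS:
            return  # skip the contents; the parent still handles the tail text
        words.extend(tokens(elem.text))
        for child in elem:
            walk(child)
            words.extend(tokens(child.tail))

    walk(ET.fromstring(xml_text))
    return [w for w in words if not is_roman(w)]

sample = (
    "<sp><speaker>Duke</speaker>"
    "<l>In chapter xiv the <persName>Anne</persName> read well</l></sp>"
)
print(playable_words(sample))  # ['in', 'chapter', 'the', 'read', 'well']
```

A pattern-only check like this also rejects ordinary words that happen to be valid numerals (is_roman("mix") is true here), which hints at why numerals that are not marked up in the encoding are awkward to exclude cleanly.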

Because WWO:SDI makes it easy to sort by word length, it has also helped us to catch some encoding errors in the texts we are preparing for publication. For example, the interface joins up the halves of words that are split by end-of-line hyphens, which we encode with a “soft hyphen” character that appears identical in most programs to the standard keyboard hyphen character we use for compound words (“hard hyphens,” as we often call them). Thus, WWO:SDI makes it very easy to spot incorrectly encoded soft hyphens, because these typically appear as extremely long words at the top of the lists when sorted by length.

Soft and hard hyphens: spot the difference
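The effect described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WWO:SDI code: it assumes the soft hyphen is the Unicode character U+00AD, that words are simple runs of letters (so hard-hyphenated compounds split into their parts), and the function names and sample text are invented.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: U+00AD marks an end-of-line word break

def join_soft_hyphens(text):
    """Remove each soft hyphen and any whitespace (such as a line break) after it."""
    return re.sub(SOFT_HYPHEN + r"\s*", "", text)

def longest_words(text, n=3):
    """Return the n longest distinct words, longest first, ties broken alphabetically."""
    words = set(re.findall(r"[A-Za-z]+", join_soft_hyphens(text)))
    return sorted(words, key=lambda w: (-len(w), w))[:n]

# "hap<SHY>piness" is a correct end-of-line break; the soft hyphens inside
# "never-to-be-forgotten" are deliberate encoding errors for this example.
sample = (
    "a never\u00adto\u00adbe\u00adforgotten day of hap\u00ad\n"
    "piness and well-being"
)
print(longest_words(sample))  # ['nevertobeforgotten', 'happiness', 'being']
```

The correctly encoded break rejoins into an ordinary word, while the compound whose hard hyphens were mistyped as soft hyphens collapses into one implausibly long word at the top of the list.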

Similarly, WWO:SDI is good at uncovering the kinds of missing spaces that are much less visible in the XML files themselves, usually where words are marked with phrase-level elements, such as in:

There’s a missing space between “best” and “History,” but the (in this case, artificially constructed) layers of markup make that hard to see. On the other hand, “besthistory” is much easier to spot in WWO:SDI, and we may just end up developing a version that we could use in our actual proofing processes.
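As a rough Python sketch of why flattening the markup exposes the problem: the fragment below is a hypothetical stand-in for an example of this kind, and its element names and nesting are assumptions. Once the tags are stripped, the two words run together, and a simple check for a lowercase letter followed directly by a capital flags the result. That check is an added heuristic for this sketch, not part of WWO:SDI, which relies on the run-together word standing out in a sorted list.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: two adjacent words wrapped in nested phrase-level
# elements, with no space between the closing and opening tags.
FRAGMENT = (
    '<p>the <hi rend="italic"><title>best</title></hi>'
    '<title><hi rend="italic">History</hi></title> of all</p>'
)

def flatten(xml_text):
    """Concatenate all text nodes in document order, discarding the markup."""
    return "".join(ET.fromstring(xml_text).itertext())

def run_together(text):
    """Find words containing a lowercase letter immediately followed by a capital."""
    return re.findall(r"\b\w*[a-z][A-Z]\w*\b", text)

flat = flatten(FRAGMENT)
print(flat)                # the bestHistory of all
print(run_together(flat))  # ['bestHistory']
```

A lowercased word list would show the same token as “besthistory,” which is the form that stands out when scanning for words that should not exist.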

So, hopefully you enjoyed playing with WWO:SDI—and perhaps it even sparked your interest in using tools like XSLT to work with XML-encoded documents (possibly by joining the XSLT workshop at DHSI). We certainly have a lot of fun using XSLT to explore and proof our documents, even when it isn’t April 1st!