Real Editors Ship
By Paul Ford
tl;dr: needs editing.

There's a useful dialogue making the rounds. A man named Tom Taylor wrote about shipping product (I picked that up off Waxy Links). He's shipped good work himself, and makes the point that getting stuff out the door is a noble thing. I agree with that. Here's the closing graf:
And the next time someone produces an antenna with a weak spot, or a sticky accelerator, you're more likely to feel their
pain, listen to their words and trust their actions than the braying media who have never shipped anything in their lives.
Now as for this—well, as Taylor later pointed out (and I love blogs that point out when someone disagrees with them—very classy),
this bothered his friend Bobbie Johnson. Johnson wrote:
Tom's assessment is that these people (let's call them critics) have never shipped anything and therefore can't understand
what they're talking about. I'd suggest the opposite is in fact the case: the trouble is that media ships constantly, and therefore becomes inured to the difficulties and delicacies of launching a product of any size or scale.
I agree with that even more. (Well, mostly; they may be inured to the difficulties, or maybe just not impressed by them.)
And I think the discussion between Taylor and Johnson brings up a good point. I remember when I used to write for All Things Considered, my editor there sent me a few pictures from the whiteboard they used to put together the show. It changed constantly throughout
the day; they kept a webcam trained on it (this was a few years ago; maybe they use websockets and node.js now). There were
an insane number of variables that went into creating that big hunk of nightly audio: Recordings created months ago or two
hours ago; people working together in a dozen time zones; contracts, permissions, fact-checking. It had to fit together technically;
it had to be transmitted efficiently at a high bitrate to maintain quality (but may be sped up or slowed down to the limits
of Fourier transforms); it had to be edited to match certain durations; it had to have a certain consistency and flow; and
so on. It requires the human equivalent of map-reduce to manage it. And they—meaning editors and producers—managed a release
every night, with 12 million users.
People often think that editors are there to read things and tell people "no." Saying "no" is a tiny part of the job. Editors
are first and foremost there to ship the product without getting sued. They order the raw materials—words, sounds, images—mill
them to approved tolerances, and ship. No one wrote a book called Editors: Get Real and Ship or suggested that publishers use agile; they don't live in a "culture" of shipping, any more than we live in a culture of
breathing. It's just that not shipping would kill the organism. This is not to imply that you hit every sub-deadline, that
certain projects don't fail, that things don't suck. I failed plenty, myself. It just means that you ship. If it's too hard
to ship or you don't want to deal with it, you quit or get fired.
I recently left zineland and did a bunch of freelance work and hooboy do people not know how to ship. A three-year project
that yielded only a 90-second page load; or $1.5 million down the drain with only a few microsites to show. And I've started
to find myself going, God, these projects need editors. Editors are really valuable, and, the way things are going, undervalued.
These are people who are good at process. They think about calendars, schedules, checklists, and get freaked out when schedules
slip. Their jobs are to aggregate information, parse it, restructure it, and make sure it meets standards. They are basically
QA for language and meaning.
But can they deal with character encoding issues when the parser breaks? Not really. They're often luddites of the kind that
calls the mouse a clicker, even the young ones. That said, I think there're weird content times afoot. Google just acquired Metaweb, which is not user-generated as much as user-edited content. (Cf. the Shakespeare page.) Wolfram Alpha is purely about curating data sources and then calculating atop the restructured data. Wikipedia growth
is slowing, but editing and tagging continue; the infoboxes are a wealth of semantic data. Meanwhile F——b—— and Tw—— (I can't
bear to write those words again) continue to dump forth information by the gallon, now tagging their core objects with all
manner of extra metadata. Everything is being knit together in all sorts of ways. User-generated content is still king, because
it generates page views and inculcates membership (the concept of the subscription being dead, the concept of the membership
being ascendant) but user-edited content is of increasing importance because of what I call, having just made it up, "the Barnes & Noble problem."
Until I was about 26 almost everything I wanted to read was in Barnes & Noble. Eventually they had less and less of what I
wanted. Now B&N's a place I go before a movie, and I get my books anywhere else. I'm increasingly having B&N moments with full-text search à la Google. It's just not doing the job; you have to search,
then search, then search again, often within the sites themselves. The web is just too big, and Google really only can handle
a small part of it. It's not anybody's fault. It's a hard, hard problem.
Remember when everyone was into the idea that Google is a media company, back in 2008 when YouTube was two? Google is not really a media company as much as a medium company. Google creates forms—i.e.
structured ways of representing data—and then populates them with search results. They're the best at that. Google doesn't
do the best job making it easy to edit the nodes in every case (they can when they want, though—it's easy to edit in Gmail,
or upload a video), or even particularly want you to edit much of their data. Knol being the exception that proves that Knol
is kind of eh. And I haven't checked in on Orkut (the 65th largest website in the world) in quite a while.
Now, though, they've bought ITA (a very interesting company that has had tons of weird database stuff going on for a while) and Metaweb. So clearly structured—meaning edited, meaning user-edited—data is now going to be a big part of the web. There are going
to be all kinds of new slots and tabs and links and nodes. And whether the users want this or not, it looks like they're
going to get it, and the state of NLP being what it is, not to mention NPC, humans will need to be involved. Unfortunate but
true. (Then again I've been off in the high wilderness for five years; I have no clue what people think in Mountain View.
I could just be blowing more smoke.)
The Semantic Web is basically the edited web, for some very nerdy take on editing. Which implies editors. Facebook has gone turtles all the way down. Django, Rails, and other frameworks make it possible to build custom-structured-and-semantic data acquisition tools with
very little pain; Django's admin, in particular, is optimized for exactly that sort of thing. Solr and related technologies make it possible to search through that structured information. And nearest to my heart there's
an insane glut of historical data, texts, and so forth, billions of human, historical, textual objects to come online from
the millennia before the web. Plus a gaggle of history bloggers trying to contextualize it (the history bloggers are the best bloggers out there—but
that's for a different day). Dealing with the glut—and we must deal with this glut, because what is more important than sorting
all human endeavor into folders?—will require all manner of editing, writing, commissioning, contextualizing, and searching.
(Take a look at Lapham's Quarterly to see one very successful approach, using paper and ink.) Fortunes will be made! Not mine, of course, because I lack the
qualities that money likes, but someone's. History is big business.
I see three problems with my idea. First, editors and journalists are mostly luddites, as already noted, and they don't really
hang out in places where you might think to hire them. (I think the Awl should have a jobs board; that would be perfect.) But I think this one can be solved: even my most technically mystified
editor pals could be trained to use Freebase Gridworks. Add to that the willingness to schedule the living shit out of everything, the ability to see patterns, a total dedication
to shipping, and willingness to say "no," and you start to have this very interesting source of power inside your organization,
especially given the changes coming in web content, where you need structure and connections in order to play with others.
Editors can help you play nice. And they actually do understand standards, at least conceptually. If you tell them the line
needs to end with a semicolon they will end it with a semicolon. Words into Type and ISO 8879 are of similar complexity.
Second problem: most editors want to be editing for print or broadcast, not for the web, which is still seen as slumming it.
But that said, more and more of the big-deal journalism is about aggregating data. Which means that more and more journalists are getting exposed to thinking in grids and bulk-editing and so forth. Or at
least getting interns to do it for them. Which is interesting. Also, getting fired or taking a buyout helps people gain perspective
on what they like doing; there's that.
Third problem: I've worked on various big content engagements, and I've talked to a number of people with more big-content
experience than me. And people agree that big orgs, even if they now have content problems, won't hire editors, or enough editors, to manage their content. Think: museums, non-profits, giant corporations, government. I get very sadpanda when I see someone spend $500K plus deployment,
development, and licensing costs on a Java EE-based multilingual platform incorporating a JSR-283 repository with a custom
workflow/process approval engine. Because they could build out something for about 20 percent of that (or sometimes 1/2 a
percent of that), and hire a few editors to wrangle the content. The content, were it approached strategically, could be of
far higher quality—better SEO, more durable, consistent voice, vetted for legal compliance, primed for re-use. And you can
make an end-run around workflow if you add versioning and reversion capability to your text fields (like Wikipedia), give
most users the ability to edit, and give the editor full revert and publish privileges. Most CMSes are parasitic technologies
dedicated to preserving the cultural and hierarchical status quo of their hosts no matter the cost, literally. People hear
me whine about this and they say: Our case is different; we need to have a system that sends out seven thousand "todo" emails
per day. And I grieve for the spirit of Work, killed by her evil child, Workflow.
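
A minimal sketch of that end-run, in Python, assuming nothing about any real CMS: every name here (VersionedField, the EDITORS set, the sample users) is made up for illustration. Anyone can add a revision, nothing is ever overwritten, and only an editor can revert or publish.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical: the set of users with editor privileges.
EDITORS = {"editor"}


@dataclass
class VersionedField:
    """A text field that keeps every revision instead of overwriting."""
    revisions: List[Tuple[str, str]] = field(default_factory=list)  # (author, text)
    published: Optional[int] = None  # index of the revision readers see

    def edit(self, author: str, text: str) -> int:
        """Any user may append a revision; returns its index."""
        self.revisions.append((author, text))
        return len(self.revisions) - 1

    def revert(self, user: str, index: int) -> int:
        """Editors only: re-save an old revision as the newest one."""
        self._require_editor(user)
        _, text = self.revisions[index]
        return self.edit(user, text)

    def publish(self, user: str, index: int) -> None:
        """Editors only: choose which revision goes live."""
        self._require_editor(user)
        if not 0 <= index < len(self.revisions):
            raise IndexError("no such revision")
        self.published = index

    def live_text(self) -> Optional[str]:
        """The published text, or None if nothing has shipped yet."""
        if self.published is None:
            return None
        return self.revisions[self.published][1]

    @staticmethod
    def _require_editor(user: str) -> None:
        if user not in EDITORS:
            raise PermissionError(f"{user} cannot revert or publish")
```

The revert is stored as a new revision rather than a deletion, so the history stays intact and the editor's "no" is itself on the record. That is one design choice among several; a real system would also want timestamps and diffs.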
That's it. This of course is already too long because I don't have an editor. Sadly. But to summarize: Good conversation between Taylor and Johnson. Editors ship. There's no place to hire the nerdier
ones because the Awl won't set up a job board. That's sad. The web is changing and it needs more editors. Do not dispute me. I love you. Goodbye.
Wednesday, April 21, 2010
Parka
My friend wore a green parka. She is, like I now am, self-employed, and called me this afternoon using Skype, which I can
already see, a few weeks into my new career, is going to be a problem. Behind her a cat moved, rendered as a set of small
animated blocks, like something made of Scrabble tiles. "That green parka," I said. "Let me ask you a question about it."
She waved her arm to point to herself, and to the parka. That caused a problem, a stutter in the system, and then we were
both trying to speak at once:
| "The parka?" | "—Kay—" |
| "—Ahead—" | "—Go—" |
| "—So—" | "—Yeah—" |
| "—That—" | "—Parka—" |
We were silent for a while, waiting.
"Go ahead," she said. "Ask your question."
This is the era for brief, frequent pauses. Pinwheels, little watches. FOUC. Vi.Me.O. The future arrives in five-second bundles, but then for the next ten seconds you're back in the past.
The 80s was the last truly futuristic decade. Skinny ties. Power, Corruption & Lies. Tass Times in Tonetown. Something about constant nuclear threat and Neuromancer. After that we kind of caught up with the future. Before, well, the future in the 70s was much goofier. Filtered cigarettes. R2D2. Kitchen appliances. People kept coming up with
new kinds of magnetic tape, and new ways to change vinyl records.
I wonder if, when we look back at this month of iPad, we'll think what an amazing moment to have lived through, or if it will be like some guy with sideburns telling your dad about the reel-to-reel player in his carpeted van.
. . . . .
This man I know once took me out on his sailboat and, long story, but I had to bring the boat around alongside another boat
using a rope. He said to me, as I did this: "Listen. You can't go too slow. There is no such thing as too slow. You can only
go too fast." And I thought about that for a long time. It's a nice thing to think about, on the weekends, if you have a sailboat.
. . . . .
In my novel nervous teenagers go to startup school in abandoned skyscrapers. (I like to say "In my novel..." a lot, instead
of writing. I also like to organize my text-conversion pipeline. My latest idea is to port the novel to org-mode.)
. . . . .
I want to live in a historically awesome moment. What if in the map of time this is one of the small towns? What if this is
someplace we drive through to get somewhere interesting? If right now turns out to be nowhere? Then again have you messed
with spatial search in Solr? Right now is turning out to be everywhere.
. . . . .
I met an Amish inventor once. Everything he worked on turned out buggy.
. . . . .
"My question is," I said, when the pauses settled, "is how many days in a row have you worn that parka?"
My freelancer friend thought for a moment.
"Actually," she said, "that's a very good question."
. . . . .
Internet connections mostly fail on users 50-64-years-old, March 12, 2009,
IT Facts:
All demographic groups are about equally likely to have certain devices fail them, though seniors who own cell phones are
significantly less likely than younger cell phone owners to have problems with their cell phones. Just 18% of cell phone owners
65 years old and older reported that their cell phones had failed in the past year, while 26% of 50-64 year olds, 33% of 30-49
year olds and 30% of 18-29 year olds reported cell phone problems. Seniors are not as exclusively reliant on their cell phones
as younger owners, and so they may have less wear and tear on their phones than do younger users who are more likely to experience
cell phone failure.
We got the landline back in the new apartment. I can't tell you how happy that made me. I call people on it all the time.
It's like we're in the same room. Getting older.
I'm on a Panel at SxSW
I'll be in Austin at the interactive slice of SxSW (Where screencasts come alive!) for a few days starting this Wednesday. Mostly I'll be wandering around with a churro in my hand, muttering, but I'll also
be on a panel, moderated by Jeffrey Zeldman and featuring Erin Kissane, Lisa Holton, and Mandy Brown. It's called "New Publishing and Web
Content" and it's about teaching a meerkat to drive, no, cuddling, no, releasing a super-plague, no, new publishing and web content. I'm working the door.
Here's information from Jeffrey regarding the panel, and an interview with Jeffrey about, among other things, the panel. Panel! (As a side-effect of this actual in-the-flesh attendance I'm sorry to say I'm
not doing six-word reviews this year. I have not the proper strength.)
I've never been to SxSW before. It surprises some people when I tell them that. It also surprises people when I cry or vomit,
or get into bed with them well after all the other guests have gone home. But I've never had a job where they want to spend
money to send me places to learn things. I think that's a very NYC thing; ideas and talent are supposed to come to us, preferably kneeling and begging, not the other way around. This approach is why the finance and publishing industries are
enjoying such great years.
So I bought myself a ticket on a jet, and if you see me say hello. I look like this as of a few weeks ago. (Caveat: The device I use to keep my head molded into a cube shape may not be allowed by TSA rules.)
If there are any webhatchets or resentments or awkwardnesses left over from the old days, I apologize. Let's just bury those
and be nice. I have nothing left in me for across-the-room awkward twinges but lots of room for niceness.
Elsewhere: Just Like Heaven
I wrote a Non-Expert for TheMorningNews.org, called "Just Like Heaven":
Question: Is there afterlife —Matt
Answer: If you ever need to make your own Grand Canyon, start with a river and lift up the earth. As the ground rises the river will
carry some of it away. Wait seven million years, at which point tourists will come. Some will see eons of erosion at work;
others will believe that, a mere 4,500 years back, God dragged His fingernail across the desert. Like the group of evangelical-Christian
creationists that rafted through....
And it goes on from there...
Wednesday, August 26, 2009
But melts just like a little girl
Bob Dylan plans to release a collection of familiar yuletide tunes... with proceeds of the album to benefit hunger-relief
charities... —"Sleigh, Lady, Sleigh: Bob Dylan to Release Christmas Album," Dave Itzkoff, the New York Times
- Snowin' in the Wind
- Reiny Deer Women #12 & 35
- If Not for Yule
- Can You Please Crawl Down Our Chimney?
- Just Like a Snowman
- Positively 34th Street
- Ain't No More Cane
- Gotta Serve Somebody Eggnog
Panel/Unicode table for you
So I've been out of it for a little while longer than I'd hoped. And I'm back here, like the world's worst ex-boyfriend, to
ask for a small favor. I want to ask you to go over to Jeffrey Zeldman's website to read about a panel on which I could, should all go well, appear in March at SXSW, along with some nice people.
If you're interested in it go ahead and vote for it.
Since I knew I was going to ask you for something I figured I should make you something nice. Here is a simple Unicode browser for people who like looking at characters; you can click on the number below each character to visit its Wikipedia page.
Surprisingly many symbols have their own pages.
There may already be something like it out there, but I couldn't find anything quite like it, and I keep spending time poking
around Unicode on Wikipedia and various other sites and finding it hard to get a sense of the whole range of options available.

There's a lot of good stuff up around 9,000. I think my favorite character, however, is ␙, #9241, the SYMBOL FOR END OF MEDIUM.
It's hacky--doesn't work in IE7. Otherwise it seems to roll along. It's all on one page (HTML/CSS/JavaScript) and under the
GPL/MIT license, so if you have any big ideas go to town.
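
For anyone who would rather poke at the same idea from a terminal, here is a rough sketch in Python. It is not the code behind the browser page; the function names and the range chosen are my own assumptions, and it leans only on the standard library's unicodedata module.

```python
import unicodedata


def describe(code_point):
    """One line per character: glyph, decimal number, hex code point, name."""
    ch = chr(code_point)
    name = unicodedata.name(ch, "<unnamed>")
    return f"{ch}  #{code_point}  U+{code_point:04X}  {name}"


def browse(start, stop):
    """Yield a description for every named character in [start, stop)."""
    for code_point in range(start, stop):
        if unicodedata.name(chr(code_point), None) is not None:
            yield describe(code_point)


# The neighborhood of #9241 (an arbitrary window picked for illustration).
for line in browse(9216, 9248):
    print(line)
```

Which characters show up depends on the Unicode version bundled with your Python, so the listing may differ a little from machine to machine.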
␙
Monday, February 16, 2009
Been a while
I've been working on something over at the dayjob. (Although I'm writing this at 2:36 AM from the office, so not just dayjob.) I tell you because it's fun, and it's free to use.
I went out the other day with some XML folks, old hands. We talked about ISO 8879, which I once photocopied in its entirety,
and old issues of Creative Computing, and about Ted Nelson. I said, now that I have gained experience in key web technologies Django and Solr, I feel I have the
experimental platform I need to implement a new version of Ftrain with a new kind of story, entitled “Lost Dogs, or, the Unhappy
Town.” The person I told this to, you could tell he was not buying this. He said, “I am not buying this.” There was a definite
sense of trains leaving stations, boats leaving docks, bicycles unracking, respectively blowing whistles or tooting horns
or tinkling bells. A part of me turned into birds and fluttered away, a flock heading to sea. They were dragging a whale.
I thought, well, shit, I guess I better think about that.
Wednesday, October 15, 2008
Learning to Fear the Semantic Web
By Paul Ford

Zotero is an open-sourced bibliography-management tool that runs inside Firefox-based browsers (see screencast). It helps you keep track of your research. I've enjoyed using it as I work on writing projects. From the about page:
Zotero is a production of the Center for History and New Media at George Mason University. It is generously funded by the United States Institute of Museum and Library Services, the Andrew W. Mellon Foundation, and the Alfred P. Sloan Foundation.
Nice! Except today, a good bit after the fact, I learned of a peculiar lawsuit that information and news giant Thomson Reuters Inc. filed last month against the makers of Zotero. From the website of The Chronicle of Higher Education, October 3, 2008, by Jeffrey R. Young (links added):
Thomson Reuters Inc. sued George Mason University in a Virginia court this month, arguing that a free software tool made by the university makes improper use of the company’s
EndNote citation software....
Thomson Reuters argues that the latest release of George Mason’s software, which can import files created by EndNote and turn
them into files that can be used and shared online using Zotero, “is willfully and intentionally destroying Thomson’s customer
base for the EndNote software.” The company seeks $10-million in damages for each year the university has offered the software
and to stop the university from distributing versions of Zotero that can convert EndNote files.
One person who commented on the lawsuit is Michael Feldstein, who writes a blog about online learning. He posted the following on October 5:
Apparently, the Zotero team did create their own style format and is crowd-sourcing the creation of import styles. As you
can see from this Zotero developer discussion thread, the developers considered and explicitly rejected supporting the redistribution of Thomson-supplied EndNote conversion files.
In fact, while Zotero can read EndNote style files, it specifically does not convert them into Zotero’s own format, in large
part to discourage the redistribution (deliberately or accidentally) of Thomson-created files. What the import feature does
facilitate is (a) users who have already licensed EndNote and want to migrate to Zotero can use the EndNote styles that they
have already paid for, and (b) Zotero users can take advantage of the EndNote import styles that individual journal publishers
(as opposed to Thomson itself) make available for the convenience of their subscribers. These uses strike me as totally within
bounds.
(More is available from the Disruptive Library Technology Jester blog.)
Given my biases this lawsuit seems like an anachronistic, hamfisted attempt to block competition. While as a programmer I
love being able to adapt open-source software to my particular needs, I use a mix of closed-source and open-source software
without many qualms. That said, non-standard, closed-source document formats are awful stuff that block competition between software vendors and, worse, waste god-awful amounts of my time. If you wish
to dispute me on this then come to my office tomorrow to help me, over the course of several hours, yank a magazine's-worth
of text out of Quark XPress, using a mix of applications and balky emacs macros. (Imagine if you could take back all the time
spent wrangling closed, proprietary document formats. You could finish Perl 6; you could probably write it in Arc.)
I'm not an EndNote user and I don't like to borrow trouble (which is why I've been avoiding this blog; blogging is a great
way to borrow trouble). But not only does this lawsuit invoke the dread specter of legally-enforced proprietary data formats,
it raises questions about Thomson Reuters's legal attitude towards the data produced by its other software offerings—including,
in this case, a piece of software called OpenCalais.
OpenCalais is a web-based application that consumes text and returns special Semantic Web-style metadata that you can use
to do interesting, Semantic Web-style things, like: create topic pages, improve search, or enhance local taxonomies. It has
a Facebook group and its website features both video of straight-talking bearded coders and a creatively borrowed terms of service statement:
We based these Terms of Service under those released by Automattic under a Creative Commons Sharealike license. Thanks to
Automattic and WordPress.com for sharing.
I have a quarter-million-page corpus at work and I'm looking for simple, inexpensive ways to enhance it, so I've followed the development of their platform for some time—joining
the Facebook group, signing up for an account, and using their free endpoint for testing (go ahead and give it a spin). My grand, entirely unrealized plan was to include a direct hook to OpenCalais
in our content management system. The OpenCalais team seem trustworthy, progressive, and smart, and committed to openness.
But, at least for now, the lawsuit against Zotero has scared me off using the product.
This despite, as pointed out by the Panlibus blog at Talis, in a post on OpenCalais as it relates to the Zotero lawsuit, the following statement from the OpenCalais folk:
We want to make all the world’s content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic
Web or the Giant Global Graph—we call our piece of it Calais.
So why am I overreacting? Well, that “our piece of it” bit is a little tricky, but I think I get what they mean, and the EndNote
people and the OpenCalais people are in different parts of a very large organization and working on different projects with
different goals. But the parent company is the same, and, professionally, I feel required to overreact, because in every situation—as editor, coder, designer, and so forth—I to my great regret must always concern myself with liability.
I hate that part of my job. From worrying about copyright and fair use, to questioning whether we can reuse art or prose from
our own archives, to sending out cease and desists—it all fills me with gloom and despair, the sense of being a culpable cog
in a lumbering legal machine. It's the opposite of creative, interesting work, but if you get something wrong the consequences
can be dire, so worrying about getting sued is something that has to be done, every day, even on the subway. I'm worried about
getting sued right now, sitting here, typing this. If you've had someone threaten you with a lawsuit, you know the sort of
fear and second-guessing it engenders. Even if I am certain that I have followed every ethical and legal guideline, it's an
instant panic attack to see the words “contacting a lawyer” or “liable for damages” in an email; it leads to second-guessing,
and I know there will be phone calls, meetings, and several months of followups to comply with the needs of insurers. If I
can see the shadow of a lawsuit anywhere I am obligated to shine a light upon it and freak out at least a little; otherwise
I'm not doing my job.
And that's what's going on here. This recent lawsuit against George Mason/Zotero immediately brought to mind a scenario:
Thomson Reuters maintains control over the taxonomy, the thesaurus, of terms used in OpenCalais, and they do the indexing
of content to associate that content with terms. The use pattern I was considering was as follows:
1. Create text within a content management system;
2. Send that text to OpenCalais;
3. Store the metadata it returns;
4. Over time, use aggregated metadata, integrated with our existing ~80,000 subjects, to create a local taxonomy for faceted
search and automatically-compiled topic pages, along with other interesting interfaces;
5. Share as much of the taxonomy as possible as downloadable RDF;
6. Make sure to provide links back to OpenCalais wherever possible, on their terms, as defined in their Terms of Service (TOS)
document.
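
The first four items in that list can be sketched in a few lines of Python. To be loud about the assumptions: tag_text below is a hypothetical stand-in for the remote call, not the OpenCalais API, and the terms, identifiers, and page names are invented for the example. The only point is the shape of the thing: text goes out, metadata comes back and is stored per page, and the aggregate becomes a local index of topics.

```python
from collections import defaultdict


def tag_text(text):
    """Hypothetical stand-in for the tagging service (NOT the real API).

    Returns (term, identifier) pairs for every known term found in the text.
    """
    known = {"Zotero": "guid-zotero", "EndNote": "guid-endnote"}
    return [(term, guid) for term, guid in known.items() if term in text]


def build_taxonomy(pages):
    """Tag each page, store what comes back, and aggregate by term."""
    stored = {}  # page id -> metadata returned for that page
    topics = defaultdict(lambda: {"guid": None, "pages": []})
    for page_id, text in pages.items():
        stored[page_id] = tag_text(text)
        for term, guid in stored[page_id]:
            # Keep the provider's identifier attached to the term.
            topics[term]["guid"] = guid
            topics[term]["pages"].append(page_id)
    return stored, dict(topics)


pages = {
    "a": "Zotero imports EndNote files.",
    "b": "Zotero runs in Firefox.",
}
stored, topics = build_taxonomy(pages)
```

Each entry in topics is the raw material for one topic page, and because the provider's identifier travels with every term, the provenance of each tag can be traced later.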
That's probably not a big deal. I doubt anyone would even notice. But... is it at all possible, conceivable, even a tiny bit that at some point in the future Thomson Reuters could claim that we were misusing their data in step (4), above? From the
TOS:
If you syndicate, publish or otherwise transmit any content containing, enhanced by or derived from Calais-generated metadata
you will use your best efforts to incorporate the correct Calais-provided Globally Unique Identifier [GUID] in that content.
It seems straightforward, but that “best efforts....” The truth is, I don't really know exactly what they mean there. Also from the TOS:
You will not use any metadata or GUIDs produced by Calais to create a metadata retrieval service similar to Calais.
And could they claim that we were somehow creating a derivative work without permission and distributing it in step (5)?
I would say, based on my far-from-authoritative reading of the TOS, and given the suit against George Mason University, there
is now a precedent; that is, it is within the realm of possibility that if I passed thousands of web pages through OpenCalais and decided to adapt the resultant format for my own use in a
way that Thomson Reuters disliked, I could get a fat letter from some lawyer someday demanding damages, accusing me of creating a derivative work based on their proprietary
taxonomy, in violation of their terms.
I'm not saying it's likely; I'm not saying I'm right; I'm not even saying that Thomson Reuters would be legally or ethically wrong to sue for damages. I would bet $10,000 right now against my fears coming to pass. But IANAL, which is exactly my problem here. And this is not a call to boycott anything, nor an
attempt to get personalized service out of OpenCalais, where the developers are doing some very fine Semantic Web-bootstrapping
work. I know Thomson Reuters could give a damn about me, and in that they are justified—I'm just another API key hash in their
database, and even if I upgraded to their for-pay service I'd never represent more than a balance-sheet rounding error.
My only purpose in writing today is to point out how a lawsuit can have unintended chilling effects, at least for me. We're
in a remarkable downturn, and people are being told to “get real or go home.” One way corporations get “real” is to sue the living shit out of everything that blinks. It's probably a good time to review
the terms of service for all of your critical software to make sure you're in compliance; I wonder if a lot of Web 2.0 mashup
decentralized goodwill is going to go to good-faith heaven as companies under financial strain start to look closely at their
patent portfolios and vendor agreements, and decide that printing out lawsuits is even cheaper than deploying to EC2. And
now that the “Semantic Web,” or “Web 3.0,” or the “Linked Data Web,” or the “Web of Really, That's How to Query Over an rdf:Bag?” or whatever they're calling it, is viable enough that you can't shrug off legal worries—now that the Semantic Web is no
longer just a research project, if someone owns the taxonomy you're using and changes it up on you, what rights do you have
in the matter? Who owns the GUIDs? Your honor, I just wanted to build a hierarchy of topic pages. I never meant to hurt nobody.
And so forth.
To summarize: working in web publishing, I have a healthy fear of lawsuits bordering on the insanely paranoid; and I wish
it were not so, but that is now part of the job, as the web of ideas has given way to the web of pricks; and finally, actions
speak louder than Creative Commons-licensed terms of service. You can still get handed a subpoena while you're riding the
Cluetrain.
Now that I got the fear, do I want to go to the effort to (1) educate a few people in management, none of whom would have
great interest in the subject except as a soporific, about the far-fetched risks of using externally-generated taxonomies
to organize our content; and do I (2) want to spend a number of hours in the near future educating myself about the completely
nebulous rights issues connected to taxonomies, linking, and file formats, thus taking even more time away from code and prose
to give it to the law; and do I possibly even (3) want to allocate the budget to work with a lawyer on taxonomy-related issues?
All the while knowing that I'm overreacting and that this is probably pointless?
Not really. I'd rather let other people do that and read the judges' opinions. Let deeper pockets set the precedent; what
I do want to do is to port the CMS to Django, an open-source web framework published by a foundation, get the search into Solr, also published by a foundation, and introduce hierarchy to the 80,000 subjects we already have indexed. I'm just going to
put OpenCalais away for a while and start looking at DBpedia again, then see how that whole Zotero suit works out over the next few months or decades.
In one way, this is all great because I love the Semantic Web to the point of stupidity—to the point of building a custom content management system entirely
based on alpha-level technology using RDF for storage, creating a framework even slower than Rails. So I'm grateful to Zotero
for taking the brunt of the lawsuit, because it gave me reason to take off my rose-tinted Linked Data goggles, and made me
aware that all of my planned Semantic Web taxonomy-sharing fun could come crashing down if I don't carefully track the provenance
of every one of my triples, erring always on the side of raving terror.
Know what else is great? Now, finally, ten years on, I know that the Semantic Web is real and viable, because I'm afraid I'll
get sued for using it. That's the true measure of a maturing technology—eat it, Gartner hype cycle.
I believe, as in don't-get-him-started, that taxonomy-driven interactive editorial is essential to the future of the web,
and thus to storytelling and narrative in general. Clearly a great deal of money is being spent by major companies in pursuit
of the golden triple: It appears the AP is working on taxonomy tools, and Rupert Murdoch's Dow Jones has Synaptica and publishes a cute taxonomy cookbook. A number of other companies are out there, building massive thesauri and indexing tools, hacking parsers and coding semantic
disambiguators like mad, banging their heads against pronouns. There will be many, many competitors seeking to add their own
structure to our increasingly Web-content-driven reality, and we will, if we use their services, find ourselves beholden to their
methods of indexing, with all manner of legal compliance and copyright issues as yet untested in courts. Creating good,
broad, world-describing taxonomies is extraordinarily expensive, because reality is large, and these companies will need to
strike a balance between sharing their work and protecting it, so I imagine this will be a subject I'll revisit, professionally,
many times over the next few decades (barring complete societal breakdown, or a personal spiritual awakening that allows me
to stop thinking about this sort of thing).
Such questions could keep a librarian up at night, staring at the wall, petting his or her sleek gray cat Otlet and wondering
what, for instance, a political campaign looks like when all of the news and columns are automatically classified before being
published. Competition, he or she might conclude, must be encouraged between these platforms; there must be a free, and yet
somehow regulated (perhaps by the W3C, or preferably by an organization with a more attractive website), market of taxonomies—you
can't have people claiming to own concepts conjoined to unique identifiers, can you? Can you? You probably can? Oh.
But there's likely no reason to worry; and I am just borrowing trouble; and maybe the Semantic Web won't matter that much
after all. Even if taxonomies do become increasingly important in our web of linked data, thank God we live in a society with
an enlightened understanding of intellectual property, and that we can trust the tiny handful of organizations that control
the world's supply of news, as they become software providers as well as content providers, to do the right thing when it
comes to serving the needs of a wider populace, in a culture that would rather foster dialogue, discussion, and mutually beneficial
resolutions than use the ugly, blunt tool of potentially profitable lawsuits. I'm sure—really, I am—that mine is an overreaction.
And onward, to progress.
Thursday, September 18, 2008
Fixed