NEVER sanitize your inputs!

I’ve seen this cartoon being linked-to in so many comment threads and forums. Anytime its even a little bit applicable, someone will post a link to this cartoon. It has become so pervasive that if you search Google for “327”, it’ll be the third link returned, right after the Wikipedia pages for the year and the car.

Search “328” and the next XKCD is no-where to be seen.

The lesson, according to this character and so many real people on the internet, is to sanitize your inputs. The school in the cartoon didn’t sanitize its inputs – and one of its database tables got deleted!

Ask anyone about developing websites and they will tell you the first lesson is always to sanitize your inputs. In this day and age you’d have to be crazy not to sanitize your inputs.

Trouble is, sanitizing your inputs is very bad advice.

What went wrong at the school?

A quick aside for what’s going on in this cartoon. A new student named…
      Robert'); DROP TABLE Students; --
… joins a school and the administrators dutifully add a record to their database for the new student. The software takes the new student’s name and builds an SQL instruction.

    string sqlcmd = "INSERT INTO Students (name) VALUES ('" + name + "')";
    // INSERT INTO Students (name) VALUES ('Wilhelm von Hackensplat')

With normal names, the string would be a perfectly valid SQL command which will add a new record into the table named Students. But what about our friend Bobby Tables?

   INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students; --')

Because that single-quote character wasn’t sanitized away, an extra command to drop the Students table crept in. This is what we know in the trade as an “SQL Injection” attack, as some unintentional SQL got injected in.

So let’s sanitize it?

We can’t allow people to go about running arbitrary SQL commands willy-nilly. Something must be done!

That single-quote character in the student’s name is clearly the problem, so we’ll take it out while building the SQL command. This fixes the command and you won’t find database tables disappearing. So why do I call this bad advice?

Trouble is, the single-quote character has a bit of a split personality. As well as being a quote, it’s also an apostrophe. Real people have real names with apostrophes and if you’ve ever seen a name where one has clearly been dropped, you’ve seen the mark of the sanitizer.

Perhaps this is why some Irish people prefer to spell their name using the letter Ó. After years of having their name mangled by naive software developers, they made a new letter.

So forget sanitizing your inputs. What you need to do instead is to contain your inputs.

Contain my inputs?

The error made by the programmers at the school was that they failed to contain Bobby’s name. A student’s name is just a sequence of characters, so you need to use it in a way that could only be a sequence of characters.

Lucky for us, all good SQL access libraries support parameters. Instead, you write the command but with placeholders for the values to be added in little boxes later.

   INSERT INTO Students (name) VALUES (@name)

Here, there’s a clear demarcation between what’s the SQL command and what’s the value from outside. The student’s name is inside the little box where the apostrophe is just another character. The name has been contained and that destructive command inside can’t break out.

But that’s what we mean by “sanitize”!

Then you should stop calling it that. The word “sanitize” is a common enough word and most people understand it as a word for cleaning – removing the bad stuff and keeping the good stuff.

  “Did you sanitize the kitchen worktop?”
  “Yes. I put it in that sealed box over there.”
  “That’s not sanitizing!”

  “When I use a word, it means just what I choose it to mean. Neither more nor less.”

There is a real problem with software not accepting names with apostrophes, as discussed earlier. Real software developers are listening to the advice to sanitize and interpreting it to mean they should have the bad characters removed.

Isn’t sanitization still needed with HTML?

HTML has a similar problem with injection. Say you’re building a website that can take comments from the public, like this one, you’d want to prevent people from leaving comments with bits of scripting code inside.

   "I <i>love</i> this website! <script>alert('Baron von Hackensplat Was Here');</script>"

Its fine to allow the emphasis, but if your website also publishes the script, anyone else visiting your site will end up running that script.

Unfortunately, HTML doesn’t support a nice little box from whence nothing can escape, so we need to provide that box of containment ourselves. Any HTML from the public should be parsed and rewritten as safe-HTML, where only a safe subset of tags are allowed.

You might argue that this amounts to sanitization, but it betrays a bad mental model. Okay, you’ve dealt with the big problem, but forgotten about the little problems.

Have you ever seen a comment thread where, starting part way down the page, everything is in italics? This is caused by someone opening italics but not closing them. If your mental model is to sanitize, your natural reaction would be remove the ability to use italics. If your mental model is instead to contain, you know that italics is really harmless and just needs to be closed when left open.

Cross-out Cross Site Scripting

In closing, I’d just like to appeal to the industry to drop the phrase “Cross-Site-Scripting” and call it “HTML Injection” instead.

Any scripting that you didn’t write or don’t trust, cross-site or not, is a very bad thing to have on your website. Putting “Scripting” in the name makes people think of scripting as the problem but its so much more than that.

Calling it “HTML Injection” draws an obvious parallel with “SQL Injection”. Its the same problem with the same solution.

Credits: XKCD 327 – Exploits of a mom by Randall Munroe.
“When I use a word…” is a quote from Lewis Carroll’s “Through the Looking-Glass”.
Second: sanitize the gloves by Thomas Cizauskas.
Fun with cling film by Elizabeth Gomm.

I need a good podcast catcher (and a bit of a rant)

I listen to podcasts on my daily commute. These are radio shows that can be downloaded over the internet and listened to later. However, to keep up with a weekly show, I’d have to – every week – visit the show’s website and manually download the latest episode. That would get real tedious real fast. To resolve the tedium for us all, the podcast catcher app was invented.

Podcast catchers allow me to list all the shows I want to listen to. Every day or so, it automatically checks each show on the list to see there are any new episodes for me. If it finds any, it downloads them and plays them for me.

Currently, I use Google’s ‘Listen’ app, but that service is about to be closed down with the imminent closure of Google Reader. I need to replace it. I’ve downloaded a handful of alternative apps, but they all lacked a feature I find essential. I remain a little flabbergasted that any podcast app out there does it any other way.

“She smoothes her hair with automatic hand and puts a record on the gramophone.”

My daily commute is ~45 minutes of driving each way, so for me, a good player needs an Auto-Play mode. When one show finishes, another should start playing right away. There’s very few places I could safely pull-over and having to push buttons while I’m driving is right out.

But not just any Auto-Play mode. Oh no. All the apps I tried had an Auto-Play mode, but they all did it so very badly.

Ask yourself – When a show finishes playing and Auto-Play is switched on, which show from the list of unplayed shows should your app select to play next?
   A. The one that’s been waiting in the queue longest.
   B. The one that appears next in the list when sorted by episode title.

Did you pick A or B? Sorry, they’re both wrong, and yet these were the only options available on an awful lot of podcast apps.

The right answer, is to play the one the user has queued up next. The “In the order I want” sort criteria. No really, who is actually asking for the order of play-back to be strictly enforced? Would anything else, perhaps, offend your sense of politeness?

   “You want to listen to the latest Cognitive Dissonance show? But what about this episode of Hanselminutes? It has been waiting paitiently in line and this is its turn to be played.”
   “I say! That would be jolly impolite of me. Don’t want to hurt the feelings of those audio files. Pip pip!”

“I sat upon the shore, fishing with the arid plain behind me. Shall I at least set my lands in order?”

With Google Listen, new episodes join the listening queue, but I can arrange them in the order I like. If I’m just not in the mood for the next episode in line, I’ll select another episode that I do want to listen to and bring it to the top using the ‘Move to the top of queue’ button.

Once I’m happy with my selection of the next hour or so’s worth of stuff at the top of the queue, I hit play and drive off. As the first show finishes, its taken off the queue and the next episode I had queued up starts playing, all without any interaction.

The few alternative apps I downloaded did not offer this. It seems such a simple thing and yet I can’t imagine the insanity of not being able to control the playing order.

If one, settling a pillow by her head should say, “That is not what I meant at all.”

Some people reading this, I’m sure, are thinking “He wants a playlist manager”.

To manage a playlist, you’d need to first create a playlist and give it a name. Then you’d need to add shows to the list and save it. Then once its played you’d need to delete that playlist and start a new one.

No. That’s just another level of insanity. All I want is a button on each episode labelled ‘Move to the top of the queue’. That’s it. If I have to perform some ritual every day to create a new playlist or whatever before I can get that button, I’m not going to be happy. Life is too short for pointless ritual.

Maybe if your UI is so user friendly that the ritualistic parts of your playlist manager just disappear, that’s fine but that’s not what I’ve seen out there.

“Oh, I have to chose a name for this new playlist. Why not just pick a random name for me? I’m only going to delete it in an hour’s time anyway.”

So there is my plea. Does anyone please know of a podcast app for Android phones that implements its Auto-Play mode… correctly? I will happily pay a reasonable subscription fee for good quality software.

If you’re an app developer and your podcast app does it correctly, please feel free to use this page’s comments for some free publicity. On the other hand if your app doesn’t do it right, please treat this page as a bug report.

Picture credits:
Day 30.06 Voices on the radio!” by Frerieke on Flickr.
Listening to Radio Karnali” by the BBC World Service.
The section titles were borrowed from The Waste Land and The Love-Song of J. Alfred Prufrock, both by T.S. Eliot.