May 25, 2013

Should String Be An Abstract Class?

Why are HTTP headers handled as plain strings in programming?

Is there anything in software engineering that is just a string? If not, shouldn't String be an abstract class, forcing developers to subtype and at least name datatypes?

Domain-Driven Security

Former colleague Dan Bergh Johnsson, application security expert Erlend Oftedal, and I have been evangelizing the idea of Domain-Driven Security. We truly believe proper domain and data modeling will kill many of the standard security bugs such as SQL injection and cross-site scripting.

This blog post is a case for Domain-Driven Security and a case against strings.

The addHeader() Method in Java

Let's be concrete and dive directly into programming with HTTP headers.

In Java EE's interface HttpServletResponse we find the following method (ref):

void addHeader(java.lang.String name,
               java.lang.String value)

Not a heavily debated method as far as I know. On the contrary it looks like most such interfaces do. An implementation of the interface may look like this (ref):

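public void addHeader(String name, String value) {
  if (isCommitted())
    return;     // Response already committed; headers can no longer be added

  if (included)
    return;     // Ignore any call from an included servlet

  synchronized (headers) {
    ArrayList values = (ArrayList) headers.get(name);
    if (values == null) {
      values = new ArrayList();
      headers.put(name, values);
    }
    values.add(value);   // Stored as-is, no validation of name or value
  }
}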

It shows we can really set any string as an HTTP header. And that's convenient, right?

The Ubiquitous String

java.lang.String is the ubiquitous datatype that solves all our problems. It can contain anything and nothing and of course it has its sibling in any popular programming language out there. Let's have a look at what a string is.

Java strings are sequences of UTF-16 code units and can represent over 100,000 Unicode characters. As far as I know C# and JavaScript do the same. The max length of a string is often limited by the max size of an integer, typically 2^31 - 1, which is just over 2 billion.

So, a string …
  • is anything between 0 and 2 billion in length, 
  • can contain 100,000 different characters, and 
  • can be null.
Hardly a good spec for HTTP headers.
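To make those three properties concrete, here is a minimal Java sketch. The class name StringFacts is made up for illustration; the calls themselves (length, codePointCount, Integer.MAX_VALUE) are standard library.

```java
// Illustrative only: the class name StringFacts is an assumption of this sketch.
final class StringFacts {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, closer to what a reader calls characters.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane,
        // so it needs two UTF-16 code units (a surrogate pair).
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));   // 2
        System.out.println(codePoints(clef));  // 1

        // length() returns an int, so this is the upper bound on string length.
        System.out.println(Integer.MAX_VALUE); // 2147483647

        // And null is a perfectly legal value for any String reference.
        String nothing = null;
        System.out.println(nothing == null);   // true
    }
}
```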

HTTP Headers By the Spec

RFC 2616 gives us the formal specification of how HTTP headers should look. An excerpt will suffice for our discussion.

message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )
field-content  = <the OCTETs making up the field-value
                 and consisting of either *TEXT or combinations
                 of token, separators, and quoted-string>

token          = 1*<any CHAR except CTLs or separators>

CHAR           = <any US-ASCII character (octets 0 - 127)>

CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>

separators     = "(" | ")" | "<" | ">" | "@" |
                 "," | ";" | ":" | "\" | <"> |
                 "/" | "[" | "]" | "?" | "=" |
                 "{" | "}" | SP  | HT

LWS            = [CRLF] 1*( SP | HT )

CRLF            = CR LF

OCTET          = <any 8-bit sequence of data>

TEXT           = <any OCTET except CTLs,
                        but including LWS>

Let's summarize.
  • HTTP header names can consist of ASCII chars 32-126 except 19 chars called separators.
  • Then there shall be a colon.
  • Finally the header value can consist of any ASCII chars 9, 32-126 except 19 chars called separators … or a mix of tokens, separators, and quoted strings.
  • On top of this web servers such as Apache impose length constraints on headers, somewhere around 10,000 chars.
There's clearly a huge difference between just a string and RFC 2616.

The Dangers of Unvalidated HTTP Headers

Can this go wrong? Is there any real danger in using plain strings for setting HTTP headers? Yes. Let's look at HTTP response splitting as an example.

We have built a site where an optional URL parameter tells the server which language to use.

… redirects to …

… with a custom header telling the web client to use Swedish. After all, we don't want that language parameter pestering our beautiful URL the rest of the session.

So in the redirect response we do the following:


The result is an HTTP response like this:

HTTP/1.1 302 Moved Temporarily
Date: Wed, 24 Dec 2013 12:53:28 GMT
ApqwBE!-1251019693; path=/
Custom-Language: Swedish
Connection: Close

But what if the request looks like this (%0d is carriage return, %0a is linefeed):<html>Well, hello!</html>

That would generate the following HTTP response (linefeeds included):

HTTP/1.1 302 Moved Temporarily
Date: Wed, 24 Dec 2013 15:26:41 GMT
9UsS!-1251019693; path=/
Custom-Language: foobar
Content-Length: 0

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 25

<html>Well, hello!</html>
Content-Type: text/html

… which will be interpreted as two responses by the web browser. This is an example of the security attack called HTTP response splitting (link to WASC from where I've adapted my example). And that's just one of the dangers of letting users mess with headers. Setting or deleting cookies is another. In fact, the whole header section is in danger.
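The mechanics can be reproduced without a servlet container. The sketch below is a hypothetical stand-in (NaiveResponse is not a real API) that writes header values verbatim, the way an unpatched container would. The header name Custom-Language comes from the example above; the injected value is what the server would see once %0d%0a has been URL-decoded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a container that writes headers verbatim.
final class NaiveResponse {
    private final Map<String, String> headers = new LinkedHashMap<>();

    // Accepts any string at all, just like addHeader(String, String).
    void addHeader(String name, String value) {
        headers.put(name, value);
    }

    // Serializes status line and headers as they would go on the wire.
    String serialize() {
        StringBuilder out = new StringBuilder("HTTP/1.1 302 Moved Temporarily\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            out.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        return out.append("\r\n").toString();
    }

    // The one check that would have stopped the attack: no CR or LF in a value.
    static boolean containsLineBreak(String value) {
        return value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0;
    }

    public static void main(String[] args) {
        // Language parameter after URL decoding; the attacker controls all of it.
        String lang = "foobar\r\nContent-Length: 0\r\n\r\nHTTP/1.1 200 OK";

        NaiveResponse response = new NaiveResponse();
        response.addHeader("Custom-Language", lang);

        // The output now contains a blank line followed by a second status line.
        System.out.print(response.serialize());
        System.out.println(containsLineBreak(lang)); // true
    }
}
```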

The HTTP splitting vulnerability has been fixed under the hood in at least Tomcat 6+, Glassfish 2.1.1+, Jetty 7+, JBoss 3.2.7+. (Thanks for that info, Jeff Williams.)

Should We Fix the addHeader() API?

Now we can ask ourselves two different things. The first is – should we fix the addHeader() and related APIs? Yes. They should look something like this:

void addHeader(javax.servlet.http.HttpHeaderName name,
               javax.servlet.http.HttpHeaderValue value)

… where the two domain classes HttpHeaderName and HttpHeaderValue accept strings in their constructors and validate that the strings adhere to the RFC 2616 specification. In one blow all Java developers are relieved of the burden of writing that validation code themselves and of always having to remember to run it.
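A minimal sketch of what such classes might look like follows. These are not part of any real servlet API, and the sketch keeps them in the default package rather than javax.servlet.http. The name check follows the token grammar quoted earlier. The value check is a deliberate simplification and an assumption of this sketch: it accepts only tab and printable ASCII on a single line, which is stricter than the spec, and the 10,000 character cap is taken from the rough Apache figure mentioned above.

```java
import java.util.regex.Pattern;

// Sketch only: a header name must be a token, i.e. one or more ASCII chars
// in the range 33-126 that are not separators.
final class HttpHeaderName {
    private static final Pattern TOKEN =
        Pattern.compile("[!#$%&'*+\\-.^_`|~0-9A-Za-z]+");

    private final String value;

    HttpHeaderName(String value) {
        if (value == null || !TOKEN.matcher(value).matches()) {
            throw new IllegalArgumentException("Not a valid HTTP header name");
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}

// Sketch only: stricter than the spec on purpose (no CR, no LF, no folding,
// no octets above 126), which is what rules out response splitting.
final class HttpHeaderValue {
    private static final int MAX_LENGTH = 10_000; // assumption, see lead-in

    private final String value;

    HttpHeaderValue(String value) {
        if (value == null || value.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("Header value missing or too long");
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean allowed = c == '\t' || (c >= 32 && c <= 126);
            if (!allowed) {
                throw new IllegalArgumentException("Illegal character at index " + i);
            }
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With these types, new HttpHeaderValue("Swedish") succeeds, while a value containing a carriage return or linefeed throws before it ever reaches the response.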

Should String Be An Abstract Class?

The larger question is about strings in general. Yes, they are super convenient. But we're fooling ourselves. We think the time we save by not modeling our domain, by not writing that validation code, by not narrowing down our APIs to do exactly what they're supposed to, we think that time is better spent on other activities. It's not.

I truly believe nothing is just a string. Nothing is any of 100,000 characters and anything between 0 and 2 billion in length.

Therefore String should be an abstract class, forcing us developers to subtype and think about what we're really handling.

Even better, why not have a way to declare that a class can only be used in object composition? That way programmers could choose if an "is-a" relation or a "has-a" relation is most suitable for narrowing down the String class.

May 13, 2013

Introduction to Software Security

On April 22, 2013, I successfully defended my PhD in computer science, more specifically in the area of software security [fulltext pdf]. I thought I'd share some parts of the thesis in a more digestible format and allow myself to augment our results, comment, and have opinions, things you typically don't see in academic publications.

Let's start with my introductory chapter …

The cover.

"To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem. In this sense the electronic industry has not solved a single problem, it has only created them, it has created the problem of using its products."
– Edsger W. Dijkstra, The Humble Programmer, 1972

Computer software products are among the most complex artifacts, if not the most complex artifacts, mankind has created (see Dijkstra's quote above). Securing those artifacts against intelligent attackers who try to exploit flaws in software design and construction is a great challenge too.

Our research contributes to the field of software security. Software as an artifact meant to interact with its environment including humans. Security in the sense of withstanding active intrusion attempts against benign software.

Software Vulnerabilities

Software can be intentionally malicious such as viruses (programs that replicate and spread from one computer to another and cause harm to infected ones), trojans (malicious programs that masquerade as benign) and software containing logic bombs (malicious functions set off when specified conditions are met).

However, attacks against computer systems are not limited to intentionally malicious software. Benign software can contain vulnerabilities and such vulnerabilities can be exploited to make the benign software do malicious things. A successful exploit has traditionally been the same as an intrusion. But in the era of web application vulnerabilities that term is not used as often. Nevertheless, a successful cross-site scripting attack (XSS) can be seen as executing arbitrary code inside the web application. And arbitrary code execution in a web application may very well be of high impact if the application handles sensitive information (password fields, credit card numbers etc) or is authorized to do sensitive state changes on the server (money transfers, profile updates, message posting etc). I would therefore argue that XSS is an intrusion attack.

Vulnerabilities can be responsibly reported to the public by creating a so-called CVE Identifier – a unique, common identifier for a publicly known information security vulnerability. Identifiers are created by CVE Numbering Authorities for acknowledged vulnerabilities. Larger software vendors typically handle identifiers for their own products. Some of these participating vendors are Apple, Oracle, Ubuntu Linux, Microsoft, Google, and IBM.

The National Institute of Standards and Technology (NIST) has a statistical database of reported software vulnerabilities with a publicly accessible search interface. Two types of vulnerabilities are of particular interest in the context of our research, namely buffer overflows and format string vulnerabilities in software written in the programming language C. The statistics for Buffer Errors and Format String Vulnerabilities are shown below.

Reported software vulnerabilities due to buffer errors have increased significantly since 2002. Their percentage of the total number of reported vulnerabilities has also increased from 1-4 % between 2002 and 2006 to 10-16 % between 2008 and 2012. These statistics are in stark contrast to the statistics from CERT that Wagner et al used to show that buffer overflows represented 50 % of all reported vulnerabilities in 1999 [pdf]. We have not investigated if there are significant differences in how the two statistics were produced. Still, up to 16 % of all reported vulnerabilities is a significant number.

The reported format string vulnerabilities peaked between 2007 and 2009 but have never reached 0.5 % of the total. Our experience is that format string vulnerabilities are less prevalent, easier to fix, and harder to exploit than buffer overflow vulnerabilities. Nevertheless format string vulnerabilities are still being used for exploitation such as the Corona iOS Jailbreak Tool.

Avoiding Software Intrusions

Intrusion attempts or attacks are made by malicious users or attackers against victims. A victim can be either a machine holding valuable assets or another human computer user. Securing software against intrusions calls for anti-intrusion techniques as defined by Halme and Bauer. We have taken the liberty of adapting and reproducing Halme and Bauer's figure showing anti-intrusion approaches, see below.

  1. Preempt – strike offensively against likely threat agents prior to an intrusion attempt. May affect innocents.
  2. Prevent – severely handicap the likelihood of a particular intrusion’s success.
  3. Deter – increase the necessary effort for an intrusion to succeed, increase the risk associated with an attempt, and/or devalue the perceived gain that would come with success.
  4. Deflect – lead an intruder to believe that he or she has succeeded in an intrusion attempt, whereas in fact the intrusion was redirected to where harm is minimized.
  5. Detect – discriminate intrusion attempts and intrusion preparation from normal activity and alert the operations. Detection can also be done in a post mortem analysis.
  6. Actively countermeasure – counter an intrusion as it is being attempted.

Avoiding the Vulnerabilities

There are many ways to achieve more secure software, i.e. to avoid having vulnerabilities. Microsoft's Security Development Lifecycle (SDL) defines seven phases where security enhancing activities and technologies apply:

  1. Training
  2. Requirements
  3. Design
  4. Implementation
  5. Verification
  6. Release
  7. Response

Further things can be done in an even wider scope. Programming languages can be constructed with security primitives which allow programmers to express security properties of the system they are writing – so-called security-typed languages, a part of language-based security [pdf]. Operating systems and deployment platforms can be hardened and secured both in construction and configuration.

Our research has focused on the Requirements and Implementation phases of Microsoft's SDL and on hardening of the runtime environment for software applications. Want to know what we found out? Stay tuned for upcoming posts where we dive into the details of our studies.

Jan 6, 2013

Review of The Tangled Web by Michal Zalewski

My family and I spent Christmas and New Year abroad so I got the chance to do some reading. It had been sitting on my bookshelf for far too long, so it was time to take on The Tangled Web by Michal Zalewski.

The Best Book on Web Application Security

Let me start by saying The Tangled Web is the best text I've read on web application security. Yes, that includes blog posts, articles, and papers. This is a must-read for any technical OWASPer or the like.

The book covers a lot of ground, serves nice examples, and shows why Michal is one of the most highly regarded experts in the app sec field. Additionally Chris Evans served as technical reviewer and people like Adam Barth and Tavis Ormandy are frequently referenced in the text, which means many of Google's best security people made this book what it is.

Detailed But Soon Dated

Most of The Tangled Web is spent on browser security, meaning how browsers implement web standards and de facto standards. Michal has in-depth knowledge of both the history and the current state of how browsers work and why. This means you'll get plenty of browser quirks and curious differences between browsers and browser versions as well as up-and-coming features.

The downside of this is that the book quickly gets dated. Several of the features presented as forthcoming or WebKit-only are in fact available and in use today, a year after the book was published. As a reader you're either aware of this or will have to check what has changed.

For this book to be the reference I'd love to repeat, Michal will have to update it. And if he didn't plan for that ahead I'm guessing a second edition will be a major undertaking and not as rewarding for him as the first edition.

So my advice is to read it now while it's still highly relevant. Then hope for new editions with release notes on what has changed. Otherwise The Tangled Web will become a historic exposé of web security.

More For Security Pros Than For Developers

While the subtitle of the book says "A Guide To Securing Modern Web Applications" it's more geared toward security professionals and pentesters than toward developers. And in my experience the developers are the ones who need to secure the applications where security professionals either review proposals or prove that the system is not secure enough.

I'm not saying The Tangled Web is a guide on how to break web apps. On the contrary every chapter has a "Security Engineering Cheat Sheet" and the text is mostly about how to avoid pitfalls. But developers (such as myself) always look for concrete ways to test our code, and Michal neither provides guidance on how to do that nor suggests data sets for security testing. If you read between the lines you start getting ideas of good test data yourself so the information is probably in Michal's head. I'm not looking for pen test guidance but for security unit and integration testing guidance.

To be concrete – I would have liked a description and an online reference on a data set for unit testing my URL parser(s). Michal's coverage of problems in parsing URLs is great so you are totally ready to put your code to the test after reading it.

From a developer's perspective The Tangled Web is more of an exposé of web security problems than a guide on how to secure your apps.

My Technical Takeaways

Here's a glimpse of what I underlined and took notes on in the book.

  • Newline handling quirks. Michal's coverage gives new life to header injection or response splitting, a subject not discussed too often these days. Especially worrying is the discrepancy between how Apache and IIS handle a lone CR and how browsers handle it.
  • Attacker-controlled 401 Unauthorized responses. Any resource hosted by the attacker (images etc) can in an instant be configured to respond with 401 and provoke a Basic Authentication dialog to appear in the browser. The victim(s) have no chance of seeing which origin is asking for credentials and thus will believe it's the main site.
  • HTML parsing behavior. Although people like Mario Heiderich and Gareth Heyes regularly treat me to new parsing flaws in browsers I really liked Michal's coverage of the subject.
  • Multiple parsing steps of inline JavaScript. User input in nested JavaScript such as event handlers or setTimeouts is almost impossible to get right in a complex application. Michal shows why with an example where the user input has to first be double encoded using JavaScript backslash sequences and then encoded again using HTML entities. In that exact order.
  • Cookie problems with SOP and country-level TLDs. Some countries require[country code] domains for businesses. Others allow example.[country code]. Japan allows both. This messes up the restrictions for how host can be set for cookies. They aren't allowed to be set for *.com but what about * With the idea of arbitrary generic TLDs we will head back into the dark ages with cookie leakage and overwriting.
  • The X-Content-Type-Options: nosniff response header. Should be default on all your content HTTP responses unless you actually want browsers to try to sniff content type and do all kinds of dangerous interpretation of untyped responses. As of 2011 only 0.6% of top 10,000 sites use this header.
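The encoding order from the inline JavaScript takeaway above can be sketched in Java. This is not Michal's code; NestedEncoding and its methods are hypothetical names, and both encoders are deliberately simplistic in that they escape everything except ASCII letters and digits. The point is only the order: JavaScript escaping twice (once per string parsing step), then HTML entity encoding last, because the browser undoes the HTML layer first.

```java
// Hypothetical helper showing the layering; not a production encoder.
final class NestedEncoding {

    private static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Replaces every char that is not an ASCII letter or digit with a
    // JavaScript backslash-u escape (four hex digits).
    static String jsEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (isAsciiAlnum(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Replaces every char that is not an ASCII letter or digit with a
    // hexadecimal HTML character reference.
    static String htmlEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (isAsciiAlnum(c)) {
                out.append(c);
            } else {
                out.append(String.format("&#x%x;", (int) c));
            }
        }
        return out.toString();
    }

    // For input that ends up in a JavaScript string, inside another JavaScript
    // string (e.g. a setTimeout argument), inside an HTML event handler attribute.
    static String forNestedHandler(String userInput) {
        return htmlEncode(jsEncode(jsEncode(userInput)));
    }

    public static void main(String[] args) {
        // A lone single quote, the classic string terminator.
        System.out.println(forNestedHandler("'")); // &#x5c;u005cu0027
    }
}
```

Reading the result back in the browser's order: HTML decoding restores the backslash, the outer JavaScript parse turns the escaped backslash into a literal one, and only the inner parse yields the quote, as data rather than as a string terminator.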

Things I Miss in the Book

Although the book weighs in at 268 pages, there are some things I expected to see in there but didn't.

Parameter pollution

It is mentioned but no proper coverage is provided around parameter pollution. Both HTTP and JSON are susceptible to multiple instances of parameters. HTTP parameters, HTTP cookies, and JSON padding (not JSONP) are examples. I would like a general discussion on this topic and its implications for the server-side.

DOM clobbering

Again some parts of this topic are mentioned but there is no real coverage. I would like to read Michal's take on global DOM ids and CSS classes being generated, injected, or mistakenly duplicated in mashups. He does reference Heiderich et al's "Web Application Obfuscation" though.

CSRF in-depth

Cross-site request forgeries are mentioned very briefly. I believe there's room for a few pages on the many nuances such as GET vs POST, Referer header reliance, blindness of the attack, multi-step CSRF, and how CORS eases the attack scenarios.

Script inclusion from outside the app

Working with CSP you quickly see that JavaScript gets included from several places outside the application. ISPs, hotels, and venues add their stuff via proxies. And browser plugins add JavaScript in the browser which means not even HTTPS will help you avoid it. The dangers and diversity of this situation are not covered in the book.

Handling of legacy web

Many organizations are in a legacy state with ten-year-old ASP, JSP, or PHP sites. There are techniques for moving such applications forward but the book doesn't cover them at all. On the contrary Michal specifically advises against a technique that works wonders in my experience – sandboxing same-origin legacy content in iframes to allow for CSP and a clean global footprint in new code.

Security of infrastructure and frameworks

The frontend technology stack is overflowing with frameworks and micro frameworks such as jQuery, Bootstrap, Backbone, Ext JS, and Dojo. Many of them offer flawed security controls. Even more are plagued with insecure defaults. Some are making steady progress in security whereas others bluntly ignore even reported flaws. This whole situation is very tangible on the web today but not covered in the book. A detailed guide on top frameworks is probably out of scope but given the prevalent use of frameworks something should be written on the topic.

Architectural considerations

The web has interesting aspects such as the typical mix of programming languages within the system (frontend and backend), the loose and untyped coupling between server and client, and problems of mixing code, content, and style. This has bearing on security. System-wide static analysis is unheard of for one. Versioning and validity checksums are super hard to get right. I would love to read Michal's thoughts on where we are and where we should be headed in terms of secure architecture on the web.

The Best Part – The Epilogue

While the technical parts of this book (95 % that is) are really great I cannot help but think Michal's epilogue was the best part. It's short and not in the least the kind of self-indulgent stuff you typically come across. Instead Michal challenges the whole security industry and academia. Are we really helping society with our paranoia and foil hats? Or are we a breed of IT pros about to be extinct? After all, no other part of mankind or society is "secure". It's all about trust and the tradeoff between development and risk.

I think Michal is on to something important and it makes me happy I decided long ago to go 70 % development and 30 % security. That's the app sec productivity sweet spot in my opinion.

Disclaimer: I got a free copy of the book from the publisher.

Nov 24, 2012

Is XSS Solved?

In academic research a problem is solved when it is fully understood and a solution is shown to work in a practical setting. If we define "XSS solved" as every instance of XSS eradicated from earth we will probably not see a solution in our lifetime. So, from a research perspective, is XSS solved already?

Geeks in a Castle

In early October I attended the weeklong seminar on web application security at castle Dagstuhl in southwest Germany. It was an awesome opportunity to socialize and discuss with leading experts in web appsec academia.

Group photo outside the castle (original).

"XSS Is Solved"

One of the break-out sessions was on XSS. The day before, someone had voiced the opinion that XSS is already solved. The break-out session took the claim seriously and hashed it out.

From a principled standpoint, which is the typical standpoint of academic research, a problem like XSS is solved when a) we fully understand the problem and its underpinnings, and b) we have a PoC solution that is practical enough to be rolled out and has the potential to solve the problem fully.

Do we fully understand XSS and its underpinnings?

Important Papers on XSS

Looking at recent publications we arrived at the following short list that we felt summarizes how academia understands XSS today:
  • Context-Sensitive Auto-Sanitization in Web Templating Languages Using Type Qualifiers [pdf]
  • ScriptGard: Preventing Script Injection Attacks in Legacy Web Applications with Automatic Sanitization [pdf]
  • A Symbolic Execution Framework for JavaScript [pdf]
  • Gatekeeper: Mostly Static Enforcement of Security and Reliability Policies for JavaScript Code [pdf]
  • Scriptless Attacks – Stealing the Pie Without Touching the Sill [pdf]
The conclusion was that yes, we think the understanding of XSS is fairly good. But we lack a definition of XSS that would summarize this understanding and allow new attack forms to be deemed XSS or Not XSS.

Current Definitions of XSS

Can you believe that? We still don't have a reasonable definition of XSS.

Wikipedia says "XSS enables attackers to inject client-side script into web pages viewed by other users". 

But it can easily be shot down. Do we need "web pages" to have XSS? Does an attack have to be "viewed by other users" to be XSS? More importantly the Wikipedia definition doesn't say whether the attackers' scripts have to be executed or not or in what context. With default CSP in place you can still inject the script into a page, right? With sandboxed JavaScript you can both inject and execute without causing an XSS attack. And what about these "attackers"? Can they be compromised trusted third parties, legitimate users of the system, or even clumsy business partners?

OWASP says "Cross-Site Scripting attacks are a type of injection problem, in which malicious scripts are injected into the otherwise benign and trusted web sites. Cross-site scripting (XSS) attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser side script, to a different end user."

Again "web sites" seem to be a prerequisite, but are they? Here the injected scripts have to be "malicious", but do they? And does the target web site have to be "benign and trusted"? OWASP just like Wikipedia fails to state that the injected script has to be executed. Then OWASP changes its mind and says XSS happens when an attacker "uses a web application to send malicious code". Clearly, this widens the scope beyond JavaScript. But look at that sentence and imagine Alice using to send an email to Bob containing a malicious code sample. Alice has done XSS since she used a web application to send malicious code.

I know I'm nit-picking here. Neither Wikipedia nor OWASP have proposed an academic definition of XSS. They're trying to be pedagogical and reach out to non-appsec people.

But we still need a (more) formal definition. To be clear, we need a definition of XSS that allows us to say if a certain vulnerability or attack is XSS or not. Without such a definition we cannot know if countermeasures such as CSP "solve XSS" or not.

Also, Dave Wichers brought up an interesting detail at this year's OWASP AppSec Research conference in Athens. We need to redefine reflected XSS, stored XSS, and DOM-based XSS into server-side XSS reflected and stored, and client-side XSS reflected and stored.

Current, insufficient categorization of XSS.

Proposed new categorization of XSS.

A New Candidate Definition of XSS

To get the juices flowing at the castle we came up with a candidate definition of XSS that the rest of the participants could shoot down.

Candidate definition of XSS: An XSS attack occurs when a script from an untrusted source is executed in rendering a page.

It was shot down thoroughly, in part by yours truly :). 

Terms more or less undefined in the candidate definition:
  • Script. JavaScript, any web-enabled script language, or any character sequence that sort of executes in the browser?
  • Untrusted. What does trusting and not trusting a script mean? Who expresses this trust or distrust?
  • Source. Is it a domain, a server, a legal entity such as Google, or the attacker multiple steps away in the request chain?
  • Executed. Relates to "Script" above. Does it mean running on the JavaScript engine, invoking a browser event, invoking an HTTP request, or what?
  • Rendering. Does rendering have to happen for an attack to be categorized as XSS?
  • Page. Is a page a prerequisite for XSS? Can XSS happen without a page existing?

So Is XSS Solved?

Back to the original question. The feeling at Dagstuhl was that CSP is the mechanism we're all betting on to solve XSS. Not that it's done in version 1.0, not even 1.1. But it's a work horse that we can use to beat XSS in the long run.

What we need right now is a satisfactory definition of XSS. That way we can find the gaps in current countermeasures (including CSP) and get to work on filling them. Don't be surprised if the gaps are fairly few and academic researchers start saying "XSS is solved" within a year. Hey, they need to work on application security problems of tomorrow, not the XSS plague in all the legacy web apps out there.

Please chip in by commenting below. If you can give a good definition of XSS, even better!

Nov 7, 2012

The Rugged Developer

I took part in the intense, weeklong Rugged Summit this spring. Rugged as in Rugged Software. Rugged Software as in secure and robust software. The major outcome of the summit and the homework afterwards was a strawman of The Rugged Handbook. It's free, available here (as docx).

My main contribution to the handbook was being lead author of The Rugged Developer chapter. I'd like to share it with you as a stand-alone blog post below. Hopefully you can give me and the other authors some feedback!

The Rugged Developer
As a Rugged Developer, I want my software to be secure against attacks, interference, corruption, random events, and more.

To achieve my goals I have come to value...

  • Software Quality over Security Products
  • Defensive Code over Patching
  • Ruggedizing Your Own Systems over Waiting To Be Hacked

Your Mission as a Rugged Developer
Good news – you're already Rugged ... in part. We developers do all sorts of things to ensure our code is robust and maintainable. To become a fully rugged developer you only need to add security to the quality goals you try to achieve.

The interesting part of security is that you're protecting your code against intelligent adversaries, not just random things. So in addition to being robust against chaotic users and errors in other systems, your code also has to withstand attacks from people who really know the nuts and bolts of your programming languages, your frameworks, and your deployment platform.

Being a Rugged developer means you have a key role in your project’s security story. The story should tell you what security defenses are available, when they are to be used, and how they are to be used.  Your job is to ensure these things happen. You should also strive to integrate security tests into your development life cycle, and even try to hack your own systems. Better you than a “security” guy or bad guy, right?

Ideas for Being an Effective Rugged Developer
Add Security Unit Tests. Perhaps you've been to security training or you've read a blog post on a new attack form. Make a habit of adding unit tests for attack input you come across. They will add to your negative testing and make your application more robust. A certain escape character, for instance ', may be usable in a nifty security exploit but may just as well produce numerous errors for benign users. A user registration with the name Olivia O'Hara should not fizzle your SQL statement execution. You as a developer have the deepest knowledge of how the system is designed and implemented and thus you are in the best position to implement and test security.

Model Your Data Instead of Using Strings. Almost nothing is just a string. Strings are super convenient for representing input, but they are also capable of transmitting source code, SQL statements, escape characters, null values, markup etc. Write wrappers around your strings and narrow down what they can contain. Even if you don't spend time on draconian regular expressions or checks for syntax and semantics, a simple input restriction to Unicode letters + digits + simple punctuation may prove extremely powerful against attacks. Attackers love string input. Rugged developers deny them the pleasure.
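As a rough illustration of such a wrapper, here is a minimal Java sketch. The type name PersonName, the exact punctuation set, and the 100-character cap are all assumptions made for the example, not recommendations from the handbook.

```java
import java.util.regex.Pattern;

// Hypothetical domain type: a person's name is not "just a string".
final class PersonName {
    // Unicode letters, Unicode digits, space, and simple punctuation (. , ' -),
    // between 1 and 100 characters. The limits are assumptions of this sketch.
    private static final Pattern ALLOWED =
        Pattern.compile("[\\p{L}\\p{N} .,'-]{1,100}");

    private final String value;

    PersonName(String value) {
        if (value == null || !ALLOWED.matcher(value).matches()) {
            throw new IllegalArgumentException("Not a valid person name");
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        // A benign name with an apostrophe is accepted.
        System.out.println(new PersonName("Olivia O'Hara").value());

        // Markup never makes it into the domain object.
        try {
            new PersonName("<script>alert(1)</script>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected markup");
        }
    }
}
```

Note that the apostrophe is still allowed, since real names contain it. The wrapper narrows the input; it does not replace parameterized SQL or output encoding.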

Hack Your Own Systems. Even more fun, do it with your team. If management has a problem, tell them it's better you do it than someone on the outside.

Get Educated. There are many materials available to help you learn secure coding, including websites, commercial secure coding training, and vulnerable applications like WebGoat. Also, although top lists can be lame, the OWASP Top 10 and CWE Top 25 are great places to start. As luck would have it, most of the issues in these lists are concrete and you can take action in code today. There are a lot more good materials available at both OWASP and MITRE.  

Make Sure You Patch Your Application Frameworks and Libraries. Know which frameworks and libraries you use (Struts, Spring, .NET MVC, jQuery etc) and their versions. Make sure your regression test suite allows you to upgrade frameworks quickly. Make sure you get those patch alerts. A framework with a security bug can quickly open up several or all of your applications to attacks. All the major web frameworks have been found to have severe security bugs in the last two years, so the problem is very much a reality today.

The Rugged Developer should evaluate success based on how well their code stands up to both internal and external stresses. How many weaknesses are discovered after the code is released for testing? How often are mistakes repeated? How long does it take to remediate a vulnerability? How many security-related test cases are associated with a project?

Mar 23, 2012

Rugged Summit Summary

I spent the last week in Washington DC as an invited expert to the Rugged Summit, part of the Rugged Software initiative.

The very minute I announced I'd be participating I got several messages on Twitter saying Rugged is a failure and I shouldn't go. Those messages were sent from people I like and trust. Sure, I was skeptical of a manifesto written for developers by security experts. Also, I hadn't heard much since the Rugged announcement in 2010.

But shouldn't I try to bring my view---a developer's view---to the table? Of course I should!

At the summit I got to work with some amazing people: Ken van Wyk, Joshua Corman, Nick Coblentz, Jeff Williams, Chris Wysopal, John Pavone, Gene Kim, Jason Li, and Justin Berman. Four very intense days. And still no silver bullet :).

Rugged Software In Short
My take on rugged is defensible software free from well-known bug types. A rugged application should be able to withstand a real attack as long as the attack doesn't exploit unknown bugs in the platform or unknown bug categories in the app. If the rugged application is breached the developers and operations should be able to recover gracefully.

Rugged also applies to operations and there's an ongoing Rugged DevOps initiative.

Why Should Organizations Become Rugged?
We first focused on *why* organizations should produce or require rugged software. Does software security also enhance software quality? Should we try to measure return on investment, reduced cost, reduced risk or what? What would make a CIO/CTO decide to go rugged?

Fundamentally we believe rugged software is part of engineering excellence. And we all need to do better. Software is enhancing our lives and revolutionizes almost everything mankind does. We want software to be good enough to enable further revolution.

Software security is currently in a state of vulnerability management. That's a negative approach and it hasn't made frequent breaches go away. Rugged is a more positive approach where you're not supposed to find a bunch of vulnerabilities in pentesting.

Here are three examples of motives for rugged that we worked on.

"Telling your security story" could be a competitive advantage. Look at and put it in the context of Dropbox's recent security failures. Imagine the whole chain of people involved in a system being built to chip in to produce evidence of why their product or service is secure.

Another idea is to define tests that prove that you're actually more secure after becoming rugged than before. We believe executives feel security is a black art and a pentest+patch doesn't show if the organization is 90 % done or 1 % done. HD Moore's Law could be such a test (works without Rugged too of course). How to actually test against Metasploit will have to be figured out.

Third, if buyers of software started demanding more secure software that would drive producers to adopt something like Rugged. So we worked on a Buyers' Bill of Rights and a Buyer's Guide. Buyer empowerment if you will.

The Rugged Software Table of Contents
The rest of the summit was spent on various aspects of how we think software and security can meet successfully. Our straw man results will be published further on and there will be plenty of chances to help make it the right thing.

But the table of contents may give you an impression of where we're headed:

  1. Why Rugged?
  2. This is Rugged
  3. The Rugged Tech Executive
  4. The Rugged Architect
  5. The Rugged Developer (I will write the first version of this section)
  6. The Rugged Tester
  7. The Rugged Analyst
  8. Becoming a Rugged Organization
  9. Proving Ruggedosity
  10. Telling the Rugged Story (internally and publicly)
  11. How Rugged Fits in with Existing Work
  12. Success Case Study

Feb 18, 2012

We Need a Free Browser, Not Just an Open Source Browser

The security community was shocked when Trustwave came clean and revoked a subordinate root certificate it had sold to a third party which explicitly said it would use it to inspect SSL traffic.

The news of Trustwave's severe malpractice sparked demands for removing the Trustwave root certificates from the Mozilla trust stores (Firefox, Thunderbird, SeaMonkey). The demand was filed as a bug in Bugzilla and the issue has also gotten a fair amount of attention on the Mozilla-dev-security-policy mailing list.

We experienced the same process – Bugzilla + mailing list outbursts – during the recent DigiNotar and Comodo scandals.

Kill or Not to Kill, That's the Question
According to Trustwave they had to sell the man-in-the-middle certificate since other CAs do it. That in itself is extremely worrying. These bastards who've been charging us $$$ for maintaining trust on the Internet. They've not only been negligent in their security operations but also done business selling out the trust built in by all browsers.

So, should Mozilla kill the Trustwave root because of their misconduct? Tricky question.

On the one hand I feel Trustwave's CA business deserves nothing less than the ditch. They did the wrong thing with open eyes.

On the other hand we probably have a large scale problem on our hands – CAs worldwide have been issuing subCA certs that allow employers, governments, and agencies to intercept the traffic we all thought was authenticated, encrypted, and integrity checked. Killing the Trustwave root doesn't fix that.

Think about it. The whole trust model crumbles. Can customers now claim someone else must have manipulated their buy order for the stock that later plummeted? Can payment providers who leak credit cards now claim somebody must have MITMed them? Will the increase in online shopping continue once mainstream media understands and writes about this issue?

Whichever path we take it has to lead to reestablished trust in the CA model. The alternatives such as building on DNSSEC or Moxie's excellent Convergence are nowhere near mainstream roll-out.

But you know what? Democracy and openness seem to work. Mozilla has made the right decision.

CAs world-wide have until April 27 to come clean. Mozilla says the following on its security blog:

"Earlier today we sent an email to all certificate authorities in the Mozilla root program to clarify our expectations around certificate issuance. In particular, we made it clear that the issuance of subordinate CA certificates for the purposes of SSL man-in-the-middle interception or traffic management is unacceptable. We made it clear that this practice remains unacceptable even when the intended deployment of such a certificate is restricted to a closed network.

In addition to this clarification, we have made several requests. We have requested that any such certificates be revoked, and their HSMs destroyed. We have requested the serial numbers of those certificates and fingerprints of their signing roots so that we, and other relying parties, can detect and distrust these subCA certificates if encountered. We have requested that any CAs who have issued subCA certificates fulfill these requests no later than April 27, 2012."

Where else did you see such a clear message to the CAs who have abused our trust? Mozilla makes me proud.

We Need a Free Browser, Not Only an Open Source Browser
The handling of Comodo, DigiNotar and Trustwave tells me we truly need Mozilla and Firefox. Nowhere else in the web community have I seen such openness, freedom of speech, and focus on regular users' interests. Hey, even internet trolls get their say on the mailing list :).

Sure, I love my Chrome, I know Google is a high-paying partner to Mozilla, and I know Firefox has been lagging behind in performance and developer tools. But there's something really great and important in a free alternative.

Speaking of lagging behind ... JavaScript performance and Chrome's V8 have been industry standard for a few years. But when I run the SunSpider 0.9.1 benchmark on my new MacBook Air I get:

  • Chrome v17.0.963.56: 280.8 ms
  • Firefox v10.0.2: 233.0 ms

Therefore I would like to urge the Mozilla Foundation to get us tab sandboxing and silent auto-upgrades in Firefox so I can go all-in!

We need a free browser, not just an open source browser.

Jan 6, 2012

Stateless CSRF Protection

In the era of RESTful services and rich internet applications it's important to find security solutions that don't impose unnecessary state or computation on servers. I previously wrote a post on stateless session ids. Let's have a look at how we can protect against cross-site request forgeries (CSRF) without server-side state.

CSRF Basics
Forged requests are nasty attacks. They rely on the fact that your browser automatically adds cookies to HTTP requests if it has cookies associated with the target domain and path. That includes session cookies.

Let's say you're currently authenticated to Twitter. If you visit another site on another domain, that site can issue requests to Twitter and your Twitter session cookie will be added to those requests.

How can domain B issue requests to domain A, formally doing a cross-site HTTP request? Well, there are some obvious cases – images, JavaScript, and CSS.

<img src="" />

… is allowed from any site, which means a malicious site can contain tags like …

<img src="" height=0 width=0 />

Such a tag will issue an HTTP GET to the target URL, including the victim's session cookie for the target domain should he or she be logged in. The browser doesn't know whether there's an image at that URL or not. It just fires the request. And by setting the image size to 0x0 the victim will see nothing.

Most sensitive stuff requires an HTTP POST since a GET should be idempotent and not change any state server-side. So can a malicious page issue an HTTP POST to any domain? Yes.

The CSRF code from the image above ($ is jQuery):

<form id="target" method="POST" action="...">
  <input type="text" value="I hate OWASP!" name="oneLiner"/>
  <input type="submit" value="Go"/>
</form>

<script>
  $(document).ready(function() {
    $('#target').submit();
  });
</script>

CSRF Against RESTful Services
But maybe you've left HTML forms behind and gone with rich clients, a RESTful backend and communication via JSON? Can a malicious page issue an HTTP POST targeting such services? Yes.

You can change the encoding of HTML forms to text/plain and do some tricks to produce parseable JSON in the request body. Here's an example that I got working with a Java JAXRS backend:

<form id="target" method="POST" action="..."
      enctype="text/plain">
  <input type="text"
   name='{"id": 0, "nickName": "John",
          "oneLiner": "I hate OWASP!",
          "timestamp": "20111006"}//'
   value="dummy" />
  <input type="submit" value="Go" />
</form>

Notice the enctype and that the JSON is in the input name, not the value. The above form produces a request body looking like this:

{"id": 0, "nickName": "John","oneLiner": "I hate OWASP!","timestamp": "20111006"}//=dummy

… which is accepted by for instance the Jackson parser.

CSRF Protection With Double Submit
Traditional anti-CSRF techniques use tokens issued by the server that the client has to post back. The server validates the request by comparing the incoming token with its copy. But that small word "copy" means server-side state. Not good.

Double submit is a variation of the token scheme where the client is required to submit the token both as a request parameter and as a cookie.

A malicious page on another domain cannot read the anti-CSRF cookie before its request and thus cannot include it as a request parameter.

Two Misconceptions About Double Submit
There are two common misconceptions about the double submit CSRF protection.

First, it has been suggested that the session cookie should be used for this purpose. But since you have to use JavaScript to pick up the cookie value and add it as a request parameter, the cookie cannot have the HttpOnly attribute. And you want HttpOnly on your session cookie to prevent session hijacking via cross-site scripting.

So you should not use the session cookie as the anti-CSRF cookie. Instead add a specific anti-CSRF cookie which does not have the HttpOnly attribute and keep your session cookie protected.

Second, people have stuck with server-generated, stateful anti-CSRF cookies. But double submit cookies can be generated client-side and don't have to be saved by the server at all.

Stateless CSRF Protection with Double Submit
The protective measure of double submit lies in the fact that a malicious site cannot read the cookie and include it as request parameter. That condition still holds if the cookie is generated by the client and never saved by the server.

So let the client generate the anti-CSRF value, and on the server only compare the cookie with the request parameter and check their format. Ergo, stateless CSRF protection!
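The server-side half of that check can be sketched in a few lines of Java. The class name DoubleSubmitCheck and the token format (32 lowercase hex characters) are assumptions for illustration; any fixed, narrow format would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.regex.Pattern;

/** Hypothetical stateless double submit check: no token is stored server-side. */
final class DoubleSubmitCheck {
    // Assumed token format: exactly 32 lowercase hex characters from the client.
    private static final Pattern TOKEN_FORMAT = Pattern.compile("[0-9a-f]{32}");

    /** True only if both values are present, well-formed, and identical. */
    static boolean isValid(String cookieValue, String requestParamValue) {
        if (cookieValue == null || requestParamValue == null) {
            return false;
        }
        if (!TOKEN_FORMAT.matcher(cookieValue).matches()
                || !TOKEN_FORMAT.matcher(requestParamValue).matches()) {
            return false;
        }
        // Compare without an early exit on the first differing byte.
        return MessageDigest.isEqual(
                cookieValue.getBytes(StandardCharsets.UTF_8),
                requestParamValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

A servlet filter or JAX-RS filter would call isValid with the anti-CSRF cookie and the matching request parameter and reject the request when it returns false. Nothing in the check depends on what the server remembers about earlier requests.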

Hardening the Double Submit Protection
Double submit protection breaks down if the attacker somehow can read or set the anti-CSRF value. We can harden double submit against malicious reads.

First of all we make the client change the anti-CSRF value upon every request. This is typically done by centralizing backend calls to a custom AJAX proxy, possibly inherited.

Second, we zero the anti-CSRF cookie directly after each backend call. This will allow for accurate server-side detection of forged requests. A zeroed double submit cookie is a clear signal of either a client-side bug or a forged request. With zeroed anti-CSRF cookies the attacker has to time his or her attack to exactly when the cookie is set by the client.

Drawbacks of Double Submit
You typically hear two drawbacks of the double submit protection – its reliance on JavaScript to add the cookie value as request parameter, and the possibility to read the anti-CSRF cookie via cross-site scripting.

The issue with JavaScript is diminishing as JavaScript is becoming a requirement for more and more sites anyway.

The cross-site scripting critique is invalid. If you can script the site you already own all of it and can set up your own AJAX proxy, read any tokens in the DOM etc.

Dec 17, 2011

The Anatomy of a Twitter Storm

Hi! I'm @johnwilander and yesterday I unwillingly created a Twitter storm.

#GodIsNotGreat and Twitter Trends
It was Friday afternoon in Stockholm, Sweden and I came across a tweet with the interesting tag #GodIsNotGreat. Earlier that day I had read about famous atheist and writer Christopher Hitchens passing away but there was something else going on around this tag. In the tag stream I read about Christians threatening tweeters using the tag and several claims that Twitter had pulled the tag from the trends list.

I couldn't find the tag in my trends list nor in the US or European list. Still, tweets with the tag were pouring in. In the stream I found a tweet from @HillyFoz saying:

"So Twitter, it's ok for #reasonstobeatyourgirlfriend to trend but you saw fit to put a stop to #GodIsNotGreat ?"

Apparently #ReasonsToBeatYourGirlfriend had been a trend. Now that's interesting too. So I wrote The Tweet that would become a Twitter storm:

As you can see in the screenshot, numerous people eventually retweeted this.

I could see the amplification within a minute. Suddenly around 10 people had retweeted it and new retweets were being reported faster than I could open them on the activity tab.

At this point I reviewed my tweet and saw that I had claimed something I couldn't back up with a source or a reference. Such things make an about-to-be-PhD blush a little :). Soon enough I started to get complaints. I considered deleting the tweet but decided to let it live on to see what this storm would be like.

Here Come the Trending Bots
Time for the first bot to spot me. Apparently I was trending in the UK:

While the retweets, complaints and comments were pouring in I quickly became a trend in USA and Canada too:

About 30 minutes later the spam bots had caught on and I started getting all kinds of weird stuff. Some of them I couldn't tell if they were real people or bots. For instance this (not even a reply to my tweet):

I did not respond :).

The final bot step was when the @favstar50 bot congratulated me on my first 50+ favorited tweet.

Trying to Find a Source
Some of the complaints I got were getting nasty so I thought I might be lucky enough to find a source and patch my earlier tweet. UK online paper Huffington Post gave me at least half an excuse as they wrote:

"The hash tag #GodIsNotGreat also began trending, which was followed by a storm of protests by the religious, many unaware that the hash tag was a tribute to the author's passing.

Twitter reportedly removed the topic from the trending lists following threats of violence towards the creators of the hash tag. The irony that Hitchens book, one that makes stark the link between religion and violence, had stirred the religious to then threaten violence was not lost on the twitterati."

But my antagonists quickly dismissed Huffington Post as a bad source and also pointed out that they said "reportedly" which I failed to do in my 140 chars. So much for the patchwork.

The Day After
When I got up Saturday I checked my email. Well ...

I had got quite a few new followers. Woot! But I guess most of them will unfollow when I go back to tweetin' about JavaScript and application security. Easy come, easy go, huh? Note the browser tab for Twitter with 19 new notifications that dropped in while I was taking the screenshot.

I checked my Klout score:

It looks like the revenue diagrams Uncle Scrooge has on his wall :).

Later I found out that Gizmodo had written an article about it – Shutup, Twitter Isn't Censoring Your Dumb Trends. There, in the middle of their bashing was my tweet! Luckily, Gizmodo didn't dig out the source but rather took poor Jessica K's commenting retweet as an example. Phew.

Lessons Learned
My lessons learned:
  • I should probably check the sources of every tweet, not just my tech tweets.
  • Rumors spread extremely fast on Twitter. As long as the message is interesting, people retweet.
  • Twitter trends are not based on volume, they're based on derivatives, or speed if you will. Had the increase of the #GodIsNotGreat tag been steady, it would still have been a trend. But it wasn't.

Apr 9, 2011

REST and Stateless Session IDs

Nowadays there's a general reluctance to introduce (more) server-side session state because of scalability. And there's specific reluctance to session state in RESTful web services, due to design principles.

In the stateless requirement of REST we read:

"The client–server communication is constrained by no client context being stored on the server between requests. Each request from any client contains all of the information necessary to service the request, and any session state is held in the client. The server can be stateful; this constraint merely requires that server-side state be addressable by URL as a resource." [Wikipedia]

This is a tough requirement, especially if we want features such as authentication and sessions.

So, can we have session ids without server-side session state? Yes.

The Relation Between Sessions and Authentication
There's often a tight relationship between authenticating users and holding their sessions. Anonymous sessions are not very sensitive whereas authenticated sessions have to be protected against hijacking, fixation, forging, and replay. Actually, a valid session token authenticates the session, so you're basically authenticating yourself every request. Which leads us to the first stateless session solution ...

No Sessions => Authenticate Every Request
If session ids are in fact authentication tokens we might as well use the mental model of no sessions and instead authenticate each request. The old HTTP Basic Authentication does this by storing your username and password for subsequent requests. But there are more advanced versions such as authentication in Amazon's S3 REST API.

They use a custom HTTP scheme based on a keyed HMAC (Hash-based Message Authentication Code). To authenticate a request, you first concatenate selected elements of the request to form a string. You then use a shared "AWS Secret Access Key" to calculate the HMAC of that string, i.e. you sign the request. Finally, you add this signature to the request, here in the Authorization header.

GET /photos/puppy.jpg HTTP/1.1
Date: Mon, 26 Mar 2007 19:37:58 +0000

Authorization: AWS 0PN5J17HBGZHT7JJ3X82:frJIUN8DYpKDtOLCwo//yllqDzg=

Looking deeper into how this scheme works you find the following spec:

Authorization = "AWS" + " " + AWSAccessKeyId + ":" + Signature;

Signature = Base64( HMAC-SHA1( YourSecretAccessKeyID, UTF-8-Encoding-Of( StringToSign ) ) );

StringToSign = HTTP-Verb + "\n" +
________Content-MD5 + "\n" +
________Content-Type + "\n" +
________Date + "\n" +
________CanonicalizedAmzHeaders +

Canonicalization of course requires some processing such as converting headers to lower-case and sorting them lexicographically. The date works as a timestamp and narrows the replay window.

The server then does the same signing with the shared secret associated with the AWSAccessKeyId. So we're switching from server-side session state to more cycles and latency on both client and server.
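To make the signing step concrete, here is a rough Java sketch of the HMAC-SHA1 and Base64 part of such a scheme. The class and method names are hypothetical, and building and canonicalizing the string to sign is left to the caller, so this is not a complete or official S3 client.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper for HMAC-signed requests. */
final class RequestSigner {
    /** Returns Base64(HMAC-SHA1(secretKey, UTF-8(stringToSign))). */
    static String sign(String secretKey, String stringToSign) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(
                    secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }

    /** Client side: builds the value of the Authorization header. */
    static String authorizationHeader(String accessKeyId, String secretKey,
                                      String stringToSign) {
        return "AWS " + accessKeyId + ":" + sign(secretKey, stringToSign);
    }

    /** Server side: recomputes the signature and compares it to the presented one. */
    static boolean verify(String secretKey, String stringToSign, String presented) {
        byte[] expected = sign(secretKey, stringToSign).getBytes(StandardCharsets.UTF_8);
        return presented != null
                && MessageDigest.isEqual(expected, presented.getBytes(StandardCharsets.UTF_8));
    }
}
```

The server looks up the secret for the access key id in the header, rebuilds the string to sign from the incoming request, and calls verify. No per-session state is needed, only the long-lived shared secret.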

Worth noting:
  • Signed requests are much stronger than mere session ids. Cross-site request forgeries will be mitigated with this scheme.
  • By authenticating all requests with a shared secret we don't have any time-bound sessions or timeouts. Just fire whenever you want.
  • The persistent shared secret is much more sensitive than a temporary session id. A cross-site scripting attack will steal the shared secret which is much worse than session hijacking. This means the scheme is less suitable for browsing sessions and more suitable for machine-to-machine communication.

Stateless, Hashed Session ID and Salt
The server can generate session id cookies by hashing usernames and a random global salt:

sessionIdCookie_v1 = username ":" SHA256(username + global salt)

The salt is used for all sessions but only valid for a certain timeframe, say 15 minutes. A new salt is produced every 5 minutes and incoming session ids produced with the previous but still valid salt will be exchanged for a new session id with the fresh salt. That means a session timeout of 15-5=10 minutes.

If we truly want to go stateless we cannot kill such a session since that would require a server-side table of revoked session ids. So in the stateless case an attacker will have a 15 minute replay window in which he/she will refresh the session and have endless access.
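A small Java sketch of the v1 cookie might look like this. The names are hypothetical, the hash is hex encoded as an assumption, and the list of still valid salts (the current one plus the previous ones inside the 15 minute window) is passed in by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical stateless session id: username ":" hex(SHA-256(username + salt)). */
final class HashedSessionId {
    static String issue(String username, String salt) {
        return username + ":" + sha256Hex(username + salt);
    }

    /** Valid if the cookie matches what any still valid salt would have produced. */
    static boolean isValid(String cookie, List<String> validSalts) {
        int sep = (cookie == null) ? -1 : cookie.lastIndexOf(':');
        if (sep < 1) {
            return false;
        }
        String username = cookie.substring(0, sep);
        byte[] presented = cookie.getBytes(StandardCharsets.UTF_8);
        boolean match = false;
        for (String salt : validSalts) {
            byte[] expected = issue(username, salt).getBytes(StandardCharsets.UTF_8);
            match |= MessageDigest.isEqual(expected, presented);
        }
        return match;
    }

    private static String sha256Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

When a cookie validates against an older salt, the server would answer with a fresh cookie from issue(username, newestSalt). The only server-side data is the short list of global salts, which is shared by all sessions.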

Stateless, Encrypted Session ID
By just storing a server-side symmetric crypto key we can effectively decrypt incoming session IDs and trust their contents. Imagine a cookie based on:

sessionIdCookie_v2 = AES_GCM(128 bit key, auth tag, username + timestamp)

This means we don't have to store each session ID. Instead we pay the price of decrypting incoming cookies and checking that the timestamp is within a timeframe, say 15 minutes. For all incoming session IDs older than 5 minutes we regenerate a new cookie to effectively run a 15-5=10 minute session timeout window.

Again, if we want to go stateless we cannot kill such a session since that would require a server-side table of revoked session ids. So an attacker will have a 15 minute replay window in which he/she will refresh the session and have endless access.
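As a sketch under stated assumptions, the v2 cookie could be produced and checked in Java like this. The class name, the cookie layout (random 12 byte nonce followed by ciphertext and tag, Base64 URL encoded) and the "timestamp:username" plaintext are choices made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stateless session id encrypted with AES-GCM. */
final class EncryptedSessionId {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** key must be 16 bytes for AES-128. */
    static String issue(byte[] key, String username, long issuedAtMillis) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce); // fresh nonce for every cookie
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            byte[] sealed = cipher.doFinal(
                    (issuedAtMillis + ":" + username).getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(nonce.length + sealed.length)
                    .put(nonce).put(sealed).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM is not available", e);
        }
    }

    /** Returns the username if the cookie is authentic and fresh, otherwise null. */
    static String verify(byte[] key, String cookie, long nowMillis, long maxAgeMillis) {
        if (cookie == null) {
            return null;
        }
        try {
            byte[] in = Base64.getUrlDecoder().decode(cookie);
            if (in.length <= NONCE_BYTES) {
                return null;
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, in, 0, NONCE_BYTES));
            String plain = new String(
                    cipher.doFinal(in, NONCE_BYTES, in.length - NONCE_BYTES),
                    StandardCharsets.UTF_8);
            int sep = plain.indexOf(':');
            if (sep < 0) {
                return null;
            }
            long age = nowMillis - Long.parseLong(plain.substring(0, sep));
            return (age >= 0 && age <= maxAgeMillis) ? plain.substring(sep + 1) : null;
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return null; // tampered, malformed, or encrypted under another key
        }
    }
}
```

With maxAgeMillis set to 15 minutes, the server would call verify on every request and issue a new cookie once the old one is more than 5 minutes old. The per-request cost is one AES-GCM decryption, and the only server-side secret is the key.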

There are three competing parameters to prioritize between:
  • Server CPU cycles per request
  • Server-side session state
  • The replay window
The tradeoff between CPU cycles and memory footprint will change with new technologies such as non-blocking IO in node.js. So yesterday's best practice might not be valid today.

The difference between regular session hijacking and hijacking of stateless session ids is that successful theft of a stateless session id authenticates the attacker even if the victim has logged out. Remember, the server doesn't store the session state. And even if the server stored the boolean isLoggedIn for each user, an old session id would still be valid if the user logs in again, as long as it hasn't timed out.

So ask yourselves what your tradeoff between CPU cycles and server-side state is. Then consider the replay+refresh leverage of a successful cross-site scripting attack.

Apr 8, 2011

Friday JavaScript & Web Dev Links

I'm summing up some reading tips for JavaScript and web development. Just thought you'd like 'em.


Command-Line JavaScript on Rhino
So you want to write command-line JavaScript on Rhino? Here's how you do it on Mac OS:
  1. Download Rhino 1.7R2:
  2. Unzip Rhino in for instance Applications/Utilities/Java
  3. Download JLine:
  4. Unzip JLine in for instance Applications/Utilities/Java
  5. Move jline-0.9.94.jar to /Library/Java/Extensions
  6. In a shell: cd /Applications/Utilities/Java/rhino_1_7R2
  7. In the very same shell: java
Code away!

Building Large-Scale jQuery Applications
A good read on RIA architecture and links to lib and framework choices, not only for jQuery junkies:

JavaScript Primitive Types Becoming Objects
About JavaScript's primitive types and how they become objects when their properties are used:

Scoping and Hoisting in JavaScript
If you haven't looked into scoping and variable assignments in JavaScript, read this and improve your programs:

'String'.replace() Only Replaces First Instance
String.prototype.replace, i.e. 'yourString'.replace(), only replaces the first match unless you pass a regexp with the global (g) flag. So beware. Twitter made the mistake and got vulnerable because of it. Read about it and a suggested patch:

Non-Blocking JavaScript Loading (and more) With head.js
With Head JS your scripts load like images - completely separated from page rendering, and in parallel!

Web Development

RESTful Design, Patterns and Anti-Patterns
A nice webcast on REST design. For instance, it brings up the idea of session ids with constant state on the server. But as always, I wonder when the CSRF storm is going to hit all these REST services out there?

Chrome Web Dev Extensions
Google Chrome is becoming many web developers' favorite browser. The bundled developer tools are good. But check out the extensions too, for instance the CSS reloader:

iframe Loading Techniques and How They Affect Performance
Want your iframes to stop blocking and allow onLoad to fire earlier? Check these techniques out:

Did I miss a good resource or read? Just fire away below.