Why are HTTP headers handled as plain strings in programming?
Is there anything in software engineering that is just a string? If not, shouldn't String be an abstract class, forcing developers to subtype and at least name datatypes?
This blog post is a case for Domain-Driven Security and a case against strings.
In Java EE's interface HttpServletResponse we find the following method (ref):
void addHeader(java.lang.String name,
java.lang.String value)
Not a heavily debated method as far as I know. On the contrary it looks like most such interfaces do. An implementation of the interface may look like this (ref):
public void addHeader(String name, String value) {
if (isCommitted())
return;
if (included)
return; // Ignore any call from an included servlet
synchronized (headers) {
ArrayList values = (ArrayList) headers.get(name);
if (values == null) {
values = new ArrayList();
headers.put(name, values);
}
values.add(value);
}
}
But what if the request looks like this (%0d is carriage return, %0a is linefeed):
www.example.com/?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0aContent-Length:%2019%0d%0a%0d%0a<html>Well, hello!</html>
That would generate the following HTTP response (linefeeds included):
HTTP/1.1 302 Moved Temporarily
Date: Wed, 24 Dec 2013 15:26:41 GMT
Location: http://www.example.com/
Set-Cookie:
JSESSIONID=1pwxbgHwzeaIIFyaksxqsq92Z0VULcQUcAanfK7In7IyrCST
9UsS!-1251019693; path=/
Custom-Language: foobar
Content-Length: 0
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 19
<html>Well, hello!</html>
Content-Type: text/html
… which will be interpreted as two responses by the web browser. This is an example of the security attack called HTTP response splitting (link to WASC from where I've adapted my example). And that's just one of the dangers of letting users mess with headers. Setting or deleting cookies is another. In fact, the whole header section is in danger.
The HTTP splitting vulnerability has been fixed under the hood in at least Tomcat 6+, Glassfish 2.1.1+, Jetty 7+, JBoss 3.2.7+. (Thanks for that info, Jeff Williams.)
void addHeader(javax.servlet.http.HttpHeaderName name,
javax.servlet.http.HttpHeaderValue value)
… where the two domain classes HttpHeaderName and HttpHeaderValue accept strings to their constructors and validate that the strings adhere to the RFC 2047 specification. In one blow all Java developers are relieved of the burden to write that validation code themselves and relieved of always having to remember running it.
I truly believe nothing is just a string. Nothing is any of 100,000 characters and anything between 0 and 2 billion in length.
Therefore String should be an abstract class, forcing us developers to subtype and think about what we're really handling.
Even better, why not have a way to declare that a class can only be used in object composition? That way programmers could choose if an "is-a" relation or a "has-a" relation is most suitable for narrowing down the String class.
Is there anything in software engineering that is just a string? If not, shouldn't String be an abstract class, forcing developers to subtype and at least name datatypes?
Domain-Driven Security
Former colleague Dan Bergh Johnsson, application security expert Erlend Oftedal, and I have been evangelizing the idea of Domain-Driven Security. We truly believe proper domain and data modeling will kill many of the standard security bugs such as SQL injection and cross-site scripting.This blog post is a case for Domain-Driven Security and a case against strings.
The addHeader() Method in Java
Let's be concrete and dive directly into programming with HTTP headers.In Java EE's interface HttpServletResponse we find the following method (ref):
void addHeader(java.lang.String name,
java.lang.String value)
Not a heavily debated method as far as I know. On the contrary it looks like most such interfaces do. An implementation of the interface may look like this (ref):
public void addHeader(String name, String value) {
if (isCommitted())
return;
if (included)
return; // Ignore any call from an included servlet
synchronized (headers) {
ArrayList values = (ArrayList) headers.get(name);
if (values == null) {
values = new ArrayList();
headers.put(name, values);
}
values.add(value);
}
}
It shows we can really set any string as an HTTP header. And that's convenient, right?
Java uses Unicode strings in UTF-16 code units which handle over 100,000 characters. As far as I know C# and JavaScript does the same. The max size of strings is often limited by the max size of integers, typically 2^31 - 1 which is just over 2 billion.
So, a string …
The Ubiquitous String
java.lang.String is the ubiquitous datatype that solves all our problems. It can contain anything and nothing and of course it has its sibling in any popular programming language out there. Let's have a look at what a string is.Java uses Unicode strings in UTF-16 code units which handle over 100,000 characters. As far as I know C# and JavaScript does the same. The max size of strings is often limited by the max size of integers, typically 2^31 - 1 which is just over 2 billion.
So, a string …
- is anything between 0 and 2 billion in length,
- can contain 100,000 different characters, and
- can be null.
HTTP Headers By the Spec
RFC 2047 gives us the formal specification of how HTTP headers should look. An excerpt will suffice for our discussion.
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
and consisting of either *TEXT or
combinations of token, separators, and
quoted-string>
combinations of token, separators, and
quoted-string>
token = 1*<any CHAR except CTLs or separators>
CHAR = <any US-ASCII character (octets 0 - 127)>
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
separators = "(" | ")" | "<" | ">" | "@" |
"," | ";" | ":" | "\" | <"> |
"/" | "[" | "]" | "?" | "=" |
"{" | "}" | SP | HT
LWS = [CRLF] 1*( SP | HT )
CRLF = CR LF
OCTET = <any 8-bit sequence of data>
TEXT = <any OCTET except CTLs,
but including LWS>
Let's summarize.
- HTTP header names can consist of ASCII chars 32-126 except 19 chars called separators.
- Then there shall be a colon.
- Finally the header value can consist of any ASCII chars 9, 32-126 except 19 chars called separators … or a mix of tokens, separators, and quoted strings.
- On top of this web servers such as Apache impose length constraints on headers, somewhere around 10,000 chars.
There's clearly a huge difference between just a string and RFC 2047.
The Dangers of Unvalidated HTTP Headers
Can this go wrong? Is there any real danger in using plain strings for setting HTTP headers? Yes. Let's look at HTTP response splitting as an example.
We have built a site where an optional URL parameter tells the server which language to use.
www.example.com/?lang=Swedish
… redirects to …
www.example.com/
… with a custom header telling the web client to use Swedish. After all, we don't want that language parameter pestering our beautiful URL the rest of the session.
So in the redirect response we do the following:
response.addHeader("Custom-Language",
request.getParameter("lang"));
request.getParameter("lang"));
The result is an HTTP response like this:
HTTP/1.1 302 Moved Temporarily
Date: Wed, 24 Dec 2013 12:53:28 GMT
Location: http://www.example.com/
Set-Cookie:
JSESSIONID=1pMRZOiOQzZiE6Y6iivsREg82pq9Bo1ape7h4YoHZ62RXj
ApqwBE!-1251019693; path=/
Custom-Language: Swedish
Connection: Close
But what if the request looks like this (%0d is carriage return, %0a is linefeed):
www.example.com/?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0aContent-Length:%2019%0d%0a%0d%0a<html>Well, hello!</html>
That would generate the following HTTP response (linefeeds included):
HTTP/1.1 302 Moved Temporarily
Date: Wed, 24 Dec 2013 15:26:41 GMT
Location: http://www.example.com/
Set-Cookie:
JSESSIONID=1pwxbgHwzeaIIFyaksxqsq92Z0VULcQUcAanfK7In7IyrCST
9UsS!-1251019693; path=/
Custom-Language: foobar
Content-Length: 0
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 19
<html>Well, hello!</html>
Content-Type: text/html
… which will be interpreted as two responses by the web browser. This is an example of the security attack called HTTP response splitting (link to WASC from where I've adapted my example). And that's just one of the dangers of letting users mess with headers. Setting or deleting cookies is another. In fact, the whole header section is in danger.
The HTTP splitting vulnerability has been fixed under the hood in at least Tomcat 6+, Glassfish 2.1.1+, Jetty 7+, JBoss 3.2.7+. (Thanks for that info, Jeff Williams.)
Should We Fix the addHeader() API?
Now we can ask ourselves two different things. The first is – should we fix the addHeader() and related APIs? Yes. They should look something like this:void addHeader(javax.servlet.http.HttpHeaderName name,
javax.servlet.http.HttpHeaderValue value)
… where the two domain classes HttpHeaderName and HttpHeaderValue accept strings to their constructors and validate that the strings adhere to the RFC 2047 specification. In one blow all Java developers are relieved of the burden to write that validation code themselves and relieved of always having to remember running it.
Should String Be An Abstract Class?
The larger question is about strings in general. Yes, they are super convenient. But we're fooling ourselves. We think the time we save by not modeling our domain, by not writing that validation code, by not narrowing down our APIs to do exactly what they're supposed to, we think that time is better spent on other activities. It's not.I truly believe nothing is just a string. Nothing is any of 100,000 characters and anything between 0 and 2 billion in length.
Therefore String should be an abstract class, forcing us developers to subtype and think about what we're really handling.
Even better, why not have a way to declare that a class can only be used in object composition? That way programmers could choose if an "is-a" relation or a "has-a" relation is most suitable for narrowing down the String class.