Jan 6, 2013

Review of The Tangled Web by Michal Zalewski

My family and I spent Christmas and New Year abroad so I got the chance to do some reading. Sitting in my bookshelf for far too long it was time to take on The Tangled Web by Michal Zalewski.

The Best Book on Web Application Security

Let me start by saying The Tangled Web is the best text I've read on web application security. Yes, that includes blog posts, articles, and papers. This is a must-read for any technical OWASPer or the like.

The book covers a lot of ground, serves nice examples, and shows why Michal is one of the most highly regarded experts in the app sec field. Additionally Chris Evans served as technical reviewer and people like Adam Barth and Tavis Ormandy are frequently referenced in the text which means much of Google's best security people made this book what it is.

Detailed But Soon Dated

Most of The Tangled Web is spent on browser security, meaning how browsers implement web standards and de facto standards. Michal has an in-depth knowledge in both the history and the current state of how browsers work and why. This means you'll get plenty of browser quirks and curious differences between browsers and browser versions as well as up and coming features.

The downside of this is that the book quickly gets dated. Several of the features presented as forthcoming or WebKit-only are in fact available and in use today, a year after the book was published. As a reader you're either aware of this or will have to check what has changed.

For this book to be the reference I'd love to repeat, Michal will have to update it. And if he didn't plan for that ahead I'm guessing a second edition will be a major undertaking and not as rewarding for him as the first edition.

So my advice is to read it now while it's still highly relevant. Then hope for new editions with release notes on what has changed. Otherwise The Tangled Web will become a historic exposé of web security.

More For Security Pros Than For Developers

While the subtitle of the book says "A Guide To Securing Modern Web Applications" it's more geared toward security professionals and pentesters than toward developers. And in my experience the developers are the ones who need to secure the applications where security professionals either review proposals or prove that the system is not secure enough.

I'm not saying The Tangled Web is a guide on how to break web apps. On the contrary every chapter has a "Security Engineering Cheat Sheet" and the text is mostly about how to avoid pitfalls. But developers (such as myself) always look for concrete ways to test our code and Michal does not provide guidance on how to do that nor suggests data sets for security testing. If you read between the lines you start getting ideas of good test data yourself so the information is probably in Michal's head. I'm not looking for pen test guidance but for security unit and integration testing guidance.

To be concrete – I would have liked a description and an online reference on a data set for unit testing my URL parser(s). Michal's coverage of problems in parsing URLs is great so you are totally ready to put your code to the test after reading it.

From a developer's perspective The Tangled Web is more of an exposé of web security problems than a guide on how to secure your apps.

My Technical Takeaways

Here's a glimpse of what I underlined and took notes of in the book.

  • Newline handling quirks. Michal's coverage gives new life to header injection or response splitting, a subject not discussed too often these days. Especially worrying is the discrepancy between how Apache and IIS handle a lone CR and how browsers handle it.
  • Attacker-controlled 401 Unauthorized responses. Any resource hosted by the attacker (images etc) can in an instance be configured to respond with 401 and provoke a Basic Authentication dialog to appear in the browser. The victim(s) have no chance of seeing which origin is asking for credentials and thus will believe it's the main site.
  • HTML parsing behavior. Although people like Mario Heiderich and Gareth Heyes regularly treat me to new parsing flaws in browsers I really liked Michal's coverage of the subject.
  • Multiple parsing steps of inline JavaScript. User input in nested JavaScript such as event handlers or setTimeouts is almost impossible to get right in a complex application. Michal shows why with an example where the user input has to first be double encoded using JavaScript backslash sequences and then encoded again using HTML entities. In that exact order.
  • Cookie problems with SOP and country-level TLDs. Some countries require example.co.[country code] domains for businesses. Others allow example.[country code]. Japans allows both. This messes up the restrictions for how host can be set for cookies. They aren't allowed to be set for *.com but what about *.com.pl? With the idea of arbitrary generic TLDs we will head back into the dark ages with cookie leakage and overwriting.
  • The X-Content-Type-Options: nosniff response header. Should be default on all your content HTTP responses unless you actually want browsers to try to sniff content type and do all kinds of dangerous interpretation of untyped responses. As of 2011 only 0.6% of top 10,000 sites use this header.

Things I Miss in the Book

While weighing in on 268 pages there are some things I expected to see in there but didn't.

Parameter pollution

It is mentioned but no proper coverage is provided around parameter pollution. Both HTTP and JSON are susceptible to multiple instances of parameters. HTTP parameters, HTTP cookies, and JSON padding (not JSONP) are examples. I would like a general discussion on this topic and its implications for the server-side.

DOM clobbering

Again some parts of this topic are mentioned but there is no real coverage. I would like to read Michal's take on global DOM ids and CSS classes being generated, injected, or mistakingly duplicated in mashups. He does reference Heiderich et al's "Web Application Obfuscation" though.

CSRF in-depth

Cross-site request forgeries are mentioned very briefly. I believe there's room for a few pages on the many nuances such as GET vs POST, Referer header reliance, blindness of the attack, multi-step CSRF, and how CORS eases the attack scenarios.

Script inclusion from outside the app

Working with CSP you quickly see that JavaScript gets included from several places outside the application. ISPs, hotels, and venues add their stuff via proxies. And browser plugins add JavaScript in the browser which means not even HTTPS will help you avoid it. The dangers and diversity of this situation is not covered in the book.

Handling of legacy web

Many organizations are in a legacy state with ten year old asp, jsp, or php sites. There are techniques for moving such applications forward but the book doesn't cover them at all. On the contrary Michal specifically advices against a technique that works wonders in my experience – sandboxing same-origin legacy content in iframes to allow for CSP and clean global footprint in new code.

Security of infrastructure and frameworks

The frontend technology stack is overflowing of frameworks and micro frameworks such as jQuery, Bootstrap, Backbone, Ext JS, and Dojo. Many of them offer flawed security controls. Even more are plagued with insecure defaults. Some are making steady progress in security whereas others bluntly ignore even reported flaws. This whole situation is very tangible on the web today but not covered in the book. A detailed guide on top frameworks is probably out of scope but given the prevalent use of frameworks something should be written on the topic.

Architectural considerations

The web has interesting aspects such as the typical mix of programming languages within the system (fronted and backend), the loose and untyped coupling between server and client, and problems of mixing code, content, and style. This has bearing on security. System-wide static analysis is unheard of for one. Versioning and validity checksums are super hard to get right. I would love to read Michal's thoughts on where we are and where we should be headed in terms of secure architecture on the web.

The Best Part – The Epilogue

While the technical parts of this book (95 % that is) are really great I cannot help but think Michal's epilogue was the best part. It's short and not in the least the kind of self-indulging stuff you typically come across. Instead Michal challenges the whole security industry and academia. Are we really helping society with our paranoia and foil hats? Or are we a breed of IT pros about to be extinct? After all, no other part of mankind or society is "secure". It's all about trust and the tradeoff between development and risk.

I think Michal is on to something important and it makes me happy I decided long ago to go 70 % development and 30 % security. That's the app sec productivity sweet spot in my opinion.

Disclaimer: I got a free copy of the book from the publisher.