How Web Security Will Change With HTML5

Mike Shema is the engineering lead for the Qualys web application scanning service. He has authored several books, including Hack Notes: Web Application Security, and he blogs on web security topics at the companion site for his latest book, Seven Deadliest Web Attacks.

It’s astonishing that 10 years of technological progress have produced web application behemoths like Facebook, Twitter, Yahoo! and Google, while the actual technology inside the web browser remained relatively stagnant. Companies have grown to billion-dollar valuations (realistic or not) by figuring out how to shovel HTML over HTTP in ways that make investors, advertisers, and users happy.

The emerging HTML5 standard finally breathes some fresh air into the programming possible inside a browser. Complex UIs used to be the purview of plugins like Flash and Silverlight (and decrepit, insecure ActiveX). The JavaScript renaissance seen in YUI, jQuery, and Prototype has significantly improved the browsing experience. HTML5 will bring sanity to some of the clumsiness of these libraries and provide significant extensions.

Here are some of the changes HTML5 will bring and what they mean for web security:

Cross-Origin Resource Sharing

The HTML5 feature with possibly the most potential for mistakes is Cross-Origin Resource Sharing (CORS), which relaxes the fundamental security mechanism of a browser, the Same Origin Rule. CORS isn’t an arbitrary change; it’s a step towards standardizing what developers are already trying to do in order to build higher-performance sites.

Basically, CORS defines a group of client and server headers that enable a site to define origins that are allowed to interact with another origin’s context. It also provides granularity of lifetime and request methods for this site-defined access control. The following headers show how simple this is to implement from a server’s perspective. (Obviously, we’re just showing the HTTP headers and skipping the server-side code to generate and verify these.)

Access-Control-Allow-Origin: http://domain
Access-Control-Max-Age: 86400
Access-Control-Allow-Methods: PUT, DELETE

The first one, Allow-Origin, is where the worst mistakes will happen. We’ll see who the first sites are to use * in this field — thereby allowing sharing with any domain. There’s already precedent for this in Flash crossdomain.xml file vulnerabilities.

The domain of the Origin matters, not its path, as the spec emphasizes in section 3 — Security Considerations: “… only cross-origin security is provided and that therefore using a distinct origin rather than a distinct path is vital for secure client-side web applications.” Woe to developers who implement cross-origin requests without understanding this precaution.
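As a rough illustration of the point about origins versus paths, here is a minimal Python sketch of a server-side check that only echoes an origin back when it exactly matches an entry in an allowlist, and never falls back to *. The allowlist contents, the function names, and the extra Vary header are assumptions made for this example; they do not come from the CORS spec text quoted above or from any particular framework.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; these origins are placeholders for illustration.
ALLOWED_ORIGINS = {"http://allowed.origin", "https://partner.example:8443"}

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_origin(value):
    """Return scheme://host[:port] for a well-formed origin, else None."""
    # A single origin never contains whitespace.
    if any(ch.isspace() for ch in value):
        return None
    parts = urlsplit(value)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return None
    # An origin has no path, query, fragment, or credentials.
    if parts.path or parts.query or parts.fragment:
        return None
    if parts.username or parts.password:
        return None
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    if port is None or port == DEFAULT_PORTS[parts.scheme]:
        return f"{parts.scheme}://{parts.hostname}"
    return f"{parts.scheme}://{parts.hostname}:{port}"


def cors_headers(request_origin):
    """Build CORS response headers, or an empty dict if the origin is not allowed."""
    origin = normalize_origin(request_origin or "")
    if origin is None or origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Max-Age": "86400",
        "Access-Control-Allow-Methods": "PUT, DELETE",
        # Assumption: caches should key responses on the requesting origin.
        "Vary": "Origin",
    }
```

Calling cors_headers("http://allowed.origin") returns the three headers shown earlier plus Vary, while a value that carries a path, such as http://allowed.origin/some/path, gets no CORS headers at all. Rejecting rather than trimming the path is a deliberate choice in this sketch: anything that does not look like a bare origin is treated as suspicious.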

Watch for potential “space invader” attacks in this area. Origin lists are space-delimited. For example, the following URL is crafted so that the resulting Origin header appears to include http://allowed.origin:

http://malicious.spoof/page?fake_arg=%20http://allowed.origin/

But a browser bug might turn this into:

Origin: http://malicious.spoof http://allowed.origin

Or worse, a server-side bug might turn this into an allowed destination for XHR requests if the page for some reason is building dynamic headers from the URL. In this case, the attacker would look for a weakness in the allowed.origin site that would enable CORS with the malicious spoof site. The vulnerable link might be something like this:

http://allowed.origin/page?cors=other.allowed.origin%20malicious.spoof

That produces an insecure access control header:

Access-Control-Allow-Origin: http://other.allowed.origin http://malicious.spoof

This last bit about space invaders is pretty speculative at the moment, but possibly not too far off considering the history of browser security. Browser hackers will no doubt be targeting their fuzzers to see how well browsers parse and serialize these headers. URLs may be prone to all sorts of errors, from invalid domains, to invalid ports, to IDN characters — the incorrect handling of which might lead to a buffer overflow or security bypass.
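To make the server-side variant of the space invader concrete, here is a small Python sketch. It contrasts a deliberately vulnerable handler that copies the decoded cors parameter into the header with a stricter one that accepts only a single, exact match from an allowlist. Both function names and the allowlist are hypothetical; the URL is the one from the example above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allowlist of hosts that may be granted cross-origin access.
ALLOWED_HOSTS = {"other.allowed.origin"}


def naive_header(url):
    """Vulnerable pattern: trusts whatever the 'cors' parameter decodes to."""
    value = parse_qs(urlsplit(url).query)["cors"][0]  # %20 decodes to a space
    origins = " ".join("http://" + host for host in value.split(" "))
    return "Access-Control-Allow-Origin: " + origins


def strict_header(url):
    """Safer pattern: exactly one value, and it must match the allowlist."""
    values = parse_qs(urlsplit(url).query).get("cors", [])
    if len(values) != 1:
        return None
    host = values[0]
    # Exact matching rejects embedded spaces and any unknown host.
    if host not in ALLOWED_HOSTS:
        return None
    return "Access-Control-Allow-Origin: http://" + host


url = "http://allowed.origin/page?cors=other.allowed.origin%20malicious.spoof"
print(naive_header(url))
print(strict_header(url))
```

The naive version prints the insecure header shown above, with both origins in it; the strict version prints None, because the decoded value is not an exact allowlist entry.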

Spoofed headers are a serious threat for CORS and have several possible attack vectors. Unencrypted Wi-Fi combined with HTTP is a recipe for disaster (the least of which is spoofed headers). In the past, browser plugins like Flash have been used to spoof headers in order to bypass security restrictions. Browser plugins are notorious for breaking browser assumptions and playing outside their security sandbox.

Web Storage

The push for richer browser-based functionality also brings the desire to store more data in the browser than normally handled by cookies. Cookies have been the historically clumsy method of saving stateful data. The HTML5 Web Storage specification provides a more flexible way for sites to store data in the browser using essentially a key-value database.

Like most security boundaries in the browser, web storage is based on the Same Origin Rule. As the spec itself reminds readers, this means that the more general threats of DNS-based attacks pose a risk to the security of data stored by a domain. The Same Origin Rule is an implementation of the “Vegas principle”: what happens in one domain is supposed to stay in that domain. The browser assumes that content coming from a domain name is always legitimate, but that isn’t always the case if DNS isn’t secure.

The other danger of web storage will be sites that rely too heavily on it for storing a user’s sensitive data. We’ve already seen instances of sites that don’t properly encrypt passwords in their database. Now we may see sites that store sensitive, personal information via web storage APIs. If the site has a cross-site scripting (XSS) vulnerability, then an attacker would be able to trivially extract this information.

Then there’s the threat of malware. A site might be free of XSS vulnerabilities and otherwise secure, but store lots of valuable information in the browser. Many malware payloads already scan disks for items like financial information and gaming credentials. Now they’ll start searching for data in these browser stores as well. Diligent devs will use this data storage to improve the user experience, but not at the risk of exposing sensitive information.

Speaking of XSS, HTML5 might have some unexpected consequences for validation routines. An XSS filter might be tripped up by new elements and attributes present in HTML5 that didn’t exist in HTML4. Whitelisting-based filters should be more resilient because new elements that aren’t on the list will be rejected by default. In any case, devs need to be aware that even though <audio> and <video> may be the most popular new tags, they’re not the only new ways XSS could manifest.
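The difference between the two filtering styles can be sketched in a few lines of Python using the standard library's html.parser. The tag lists below are hypothetical and far too short for real use, and the sketch only makes an accept-or-reject decision rather than sanitizing markup, but it shows why a blacklist written with HTML4 in mind can wave through an HTML5 payload that a whitelist rejects.

```python
from html.parser import HTMLParser

# Hypothetical lists, kept tiny for illustration only.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}
HTML4_BLACKLIST = {"script", "iframe", "object", "embed"}


class ElementCollector(HTMLParser):
    """Records every start tag along with the names of its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        self.elements.append((tag, [name for name, _ in attrs]))


def elements_in(markup):
    collector = ElementCollector()
    collector.feed(markup)
    collector.close()
    return collector.elements


def blacklist_accepts(markup):
    """Accept unless a known-bad tag appears (tags only, by assumption)."""
    return not any(tag in HTML4_BLACKLIST for tag, _ in elements_in(markup))


def whitelist_accepts(markup):
    """Accept only known-good tags that carry no attributes at all."""
    return all(
        tag in ALLOWED_TAGS and not attr_names
        for tag, attr_names in elements_in(markup)
    )


payload = '<video><source onerror="alert(1)"></video>'
print(blacklist_accepts(payload))
print(whitelist_accepts(payload))
```

Here the blacklist accepts the payload because neither video nor source is on its list, while the whitelist rejects it. Refusing all attributes is a blunt rule chosen to keep the example short; a production filter would need a per-tag attribute allowlist and a well-tested sanitizing library.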

Sins of the Past

The most dangerous security problems won’t be due to features of HTML5. Too many experienced people have been working on the specs to leave egregious errors in the design or in browsers’ implementation of it. The worst problems will come from developers who rush into new technologies without remembering sins of the past. It’s far too easy to fall into the trap of trusting data from the browser just because some hefty JavaScript routines have been assumed to perform all sorts of security validation on the data.

Once data leaves the browser, an attacker can modify it in any way before it reaches the server. Trusting the client to always serve well-formed, valid data is the sure path to SQL injection, XSS, and worse vulnerabilities.
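A minimal Python sketch of what server-side distrust looks like in practice follows. It re-validates a value on the server no matter what the browser's JavaScript claims to have checked, then passes it to the database through a parameterized query. The table, the username rule, and the function name are invented for this example.

```python
import re
import sqlite3

# Hypothetical rule: usernames are 1 to 32 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def find_user(conn, username):
    """Look up a user, validating the input again on the server side."""
    if not isinstance(username, str) or not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    # The placeholder keeps the value out of the SQL text entirely.
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cursor.fetchone()


# In-memory database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

print(find_user(conn, "alice"))
```

A classic injection string such as alice' OR '1'='1 fails the validation step and raises ValueError, and even if the pattern were looser, the parameterized query would still treat the whole string as a literal name.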

HTML5 doesn’t just have security implications for web developers. The browser has become a highly coveted target for malware. With each browser’s implementation of new HTML5 features will come buffer overflows and other coding mistakes that malware will seek out. As the browser’s end user, there’s little you can do on this front other than to keep your software up to date. All of the new HTML5 features will take a while before they’re securely baked into the browser. Attackers will continually look for bugs by pushing different limits in the browser: Cross-origin requests for thousands of origins, deeply nested elements, resource consumption attacks (DoS) using multitudes of Web Worker threads, and so on.

Luckily, browser developers haven’t been lazy this whole time. The last few years have seen laudable forays into better security and privacy protections. Browsers are starting to implement new headers that can protect against broad classes of attacks. For example, cross-site request forgery and clickjacking can be reliably defended against with Origin and X-Frame-Options headers. This stands in stark contrast to problems like cross-site scripting, for which no easy solution has been found.
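As a sketch of how a server might use those two headers, the Python below rejects state-changing requests whose Origin header does not match the site's own origin and adds X-Frame-Options to outgoing responses. The trusted origin, the function names, and the plain-dict representation of headers are assumptions for illustration. Treating a missing Origin header as a failure is a strict choice made here; browsers that do not send the header would need another defense such as a request token.

```python
# Hypothetical origin of the site being protected.
TRUSTED_ORIGIN = "https://app.example"

STATE_CHANGING_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


def is_request_allowed(method, headers):
    """Reject state-changing requests that do not come from our own origin."""
    if method.upper() not in STATE_CHANGING_METHODS:
        return True
    # Assumption: headers is a dict keyed by canonical header names.
    return headers.get("Origin") == TRUSTED_ORIGIN


def add_framing_protection(headers):
    """Return a copy of the response headers that forbids framing."""
    protected = dict(headers)
    protected["X-Frame-Options"] = "DENY"
    return protected


print(is_request_allowed("POST", {"Origin": "https://evil.example"}))
print(add_framing_protection({"Content-Type": "text/html"}))
```

The first call prints False because the request claims a foreign origin. A site that legitimately frames its own pages could send SAMEORIGIN instead of DENY.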

Browsers have been pushing the privacy front as well with Do Not Track headers and private browsing options. It’s important to keep perspective on the topic of privacy. While the browser can take steps to make your data protection easier, it has no control and little influence on how a web site will use and protect that data. HTML5 briefly touches on privacy issues, and security has direct consequences for privacy.

HTML5 is not a security solution. It’s a long-awaited update to the HTML spec, one that took the time to be more explicit about both security and privacy issues. The new features of HTML5 will lead to exciting, powerful applications delivered through the browser. As such, it’s important for developers to keep in mind a few basic security tenets: Validate all data from the client, prefer whitelisting approaches over blacklisting, use HTTPS wherever possible, and test your site to make sure it’s performing how you intended.
