Hi, here are six easy but interesting XSS games hosted by Google. This post records my writeup, how to patch these vulnerabilities, and CSP-based protection strategies. More importantly, I summarize the browser parsing process in terms of when each kind of decoding occurs.

The gaming area: https://xss-game.appspot.com/

Part One: Exploit

There is a bug on the xss-game.appspot.com site when advancing to the next level. Even if you trigger the alert and pass level 1, you still receive a page as follows saying that the browser's cookie is not set correctly.

I found that after the alert fires on the current page, the site sends a GET request to /level1/record and receives a response with a Set-Cookie header. The bug is in that header's Expires attribute: the time it specifies has already passed, so the browser treats the cookie as expired and drops it.

So, the straightforward fix is to rewrite the Expires attribute of the Set-Cookie response header before it reaches the browser.

Browser parsing process

The timing of URL decoding, HTML entity decoding, and JS decoding in the browser

I want to summarize this here because the browser's page parsing process is very important. Without it, the various encoding-bypass tricks are a mess, and you can't really understand why some XSS payloads are constructed the way they are and why others won't work. Here is an introduction to the browser parsing process and the timing of URL decoding, HTML entity decoding, and JS decoding.

First of all, although URL encoding is the one we hear about most often, the decoding of a request URL is not done by the client-side browser. A URL may only contain letters, digits, and a set of special characters -_.~!*'();:@&=+$,/?#[], so the browser encodes any other character before sending the request. URL decoding then happens on the server once it receives the request, before it locates the resource and reads the parameter values from the URL.
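
The round trip can be sketched with Python's standard library. This is only an illustration of percent-encoding and decoding; the exact set of characters a real browser encodes differs by URL component, and the payload string is just an example.

```python
from urllib.parse import quote, unquote

payload = "<script>alert(1)</script>"

# Sender side: percent-encode every character that is not a letter,
# a digit, or one of "_.-~" (safe="" also encodes "/").
encoded = quote(payload, safe="")
print(encoded)

# Server side: decode the value back before using the parameter.
decoded = unquote(encoded)
print(decoded == payload)
```

The encoded form contains no angle brackets at all, which is why the payload only becomes dangerous again after the server has decoded it and written it into the response.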

The parsing process begins once the browser receives the response from the server, including HTML, CSS, and JS files. The goal of parsing is to convert these files into the browser's internal representations, the DOM and CSSOM trees, which can then be rendered. The process is much like an interpreter: it needs a lexer and a parser. The former reads tokens from the input and makes sure each one is legitimate. The latter selects productions from the grammar rules and generates a parse tree.

The first step is processing the HTML markup and building the DOM tree. During the traversal of the HTML file, for each HTML tag encountered, a DOM tree node is constructed and the tag's attributes and content are added. As you can imagine, any encoding that breaks the structure of the HTML syntax during DOM tree construction is not legal. Since no decoder has been applied at this point, when an encoded character appears where the syntax expects a tag name, an attribute name, or a delimiter, the parser cannot select the correct production, and the intended node is not built (in practice the browser recovers by treating the input as text or as some other token). For example, the following cases will not be interpreted by the browser as intended.

&lt;div&gt;hello&lt;/div&gt;
<&#104;1>Main Title</h1>
<img src%3D"http://xxx">hello
<\x62\x75\x74\x74\x6f\x6e onclick="alert(1);"></button>

Once a DOM node has been successfully generated, its attribute values and content will be decoded by the HTML decoder. This makes sense, as HTML entity encoding is used to avoid special characters in the attribute values and content affecting the DOM tree structure, so these strings can be decoded as soon as the node is successfully created.

In fact, the CSS parser and JS parser are also involved in the parsing process. In the case of JS, for example, when a token is encountered that marks the start of a JavaScript context, such as a <script> tag, the HTML parser hands the content over to the JavaScript parser. There are several forms that can invoke the JS parser:

  • Script tag: <script>code</script>
  • External JavaScript file: <script src="http://xxx"></script>
  • HTML and CSS attributes that accept JavaScript, e.g. <img style="xss:expression(alert(/xss/))" /> (legacy Internet Explorer only), <a href="javascript:alert('<一>')">test</a>
  • Event handlers such as onload, onerror, onclick, etc., e.g. <img/src=x onerror=alert(1)>
  • Timers (setTimeout, setInterval), e.g. <img src=x onerror='setTimeout("ale"+"rt(1)",0)' />
  • Eval call, e.g. eval("alert(1);");

The JS parser treats the content it is handed as code, and therefore it supports JavaScript's Unicode escapes (\uXXXX), for example inside an identifier:

<a href="javascript:\u0061lert('<一>')">test</a>

Now we can understand why the following case cannot alert:

<div id="message">
   Your timer will execute in &lt;script&gt;alert(1)&lt;/script&gt; seconds.
</div>
// Although the div seems to contain a <script> tag after HTML entity decoding, the HTML parser won't invoke the JS parser to interpret it. This is because no script tag is found inside the div during parsing (it is still entity-encoded at that point). After the final HTML entity decoding, the content of the div is plain text, not a form that needs to be parsed as JS. So the JS parser is not involved at any point.
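
The same ordering can be observed with Python's html.parser module. It is not a browser parser, so this is only a rough sketch of the idea: tags are recognized first, and entities are decoded afterwards into plain text. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Records which start tags and which text the parser sees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.start_tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def parse(markup):
    collector = TagAndTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags, "".join(collector.text_parts)


# Entity-encoded input: only the div is recognized as a tag.
encoded_tags, encoded_text = parse(
    '<div id="message">in &lt;script&gt;alert(1)&lt;/script&gt; seconds.</div>'
)
print(encoded_tags)
print(encoded_text)

# Raw input: the script tag is recognized as a tag of its own.
raw_tags, raw_text = parse('<div id="message"><script>alert(1)</script></div>')
print(raw_tags)
```

In the encoded case the decoded string looks like a script tag, but it only ever exists as text, which matches the behaviour described above.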

We know that JS code embedded in HTML may modify the DOM tree that has been constructed. However, when a node in the DOM tree is added or modified, it will still be re-parsed in the same way as described above.

Here are materials about how the browser works:

  • http://taligarsiel.com/Projects/howbrowserswork1.htm
  • https://archive.org/details/thetangledwebaguidetosecuringmodernwebapplications/page/n33/mode/2up
  • https://xuelinf.github.io/2016/05/18/%E7%BC%96%E7%A0%81%E4%B8%8E%E8%A7%A3%E7%A0%81-%E6%B5%8F%E8%A7%88%E5%99%A8%E5%81%9A%E4%BA%86%E4%BB%80%E4%B9%88/
  • https://developer.mozilla.org/en-US/docs/Web/Performance/How_browsers_work#parsing
  • https://zhuanlan.zhihu.com/p/41945326

Level 1

Since the xss-game provides the source code of the website, it is very helpful for understanding what is going on in the back-end server. After the user's input is transmitted to the server via the query parameter, the server splices it directly into the response, resulting in an XSS vulnerability.

So, the following URL would be a POC.

https://xss-game.appspot.com/level1/frame?query=<script>alert(1)</script>

Level 2

Level 2 is a typical Stored XSS challenge. There is a client-side database that stores our posts and displays every post when we reload the page or add a new one. The good news is that the input is not escaped, so it is spliced into the HTML code as is, which leads to a Stored XSS vulnerability.

As you can see, I have tried several XSS payloads.

First, I tried the easiest payload, <script>alert(1);</script>, but it doesn't work. Our input is placed inside a <tr> tag, which defines a row in an HTML table, and my first guess was that a <script> inside a <tr> is not analyzed, going back to HTML4. The more likely reason is that the posts are added to the page by assigning to innerHTML, and <script> elements inserted that way are not executed.

Then I tried putting JavaScript in an attribute of an <img> tag: <img src=xss onerror:alert(1);>. I expected that the server would not serve an image at https://xss-game.appspot.com/xss and would return a 404 Not Found response, which would trigger the alert(1) defined in the onerror attribute. However, it doesn't work either. I observed that the server surprisingly returned an empty response with a 200 status code. Note, though, that the handler has to be written as onerror=alert(1) with an equals sign; written with a colon, no error handler is registered at all, and an empty 200 response would normally still fire onerror because it cannot be decoded as an image.

Finally, I used a <button> tag: <button onclick="alert(1);">XSS</button>, which alerts after I click the button.

Level 3

The logic of the program in Level 3 is to load a different image depending on which tab the user clicks. A new img tag is added each time.

Although it seems that the argument of the chooseTab handler is fixed, there is actually another way to invoke the function.

The chooseTab function is bound to window.onload, which means that every time the window (inside the frame) is loaded, chooseTab is called with part of the URL (the fragment after #) as its argument. So the num parameter of chooseTab can be set to whatever we want to inject.

However, we need to close both the opening and closing single quotes and add an attribute to the img tag. So I created this payload: ' onerror="alert(1);">', which becomes <img src="/static/level3/cloud" onerror="alert(1);"> after it is spliced into the html variable.
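
To see how the quotes line up, here is a small sketch in Python. The concatenation in build_img_tag is an assumption about how the page builds the tag (the real code is JavaScript and may differ in detail); only the payload comes from the text above.

```python
def build_img_tag(num):
    # Hypothetical reconstruction of the vulnerable string concatenation.
    return "<img src='/static/level3/cloud" + num + ".jpg' />"


# Intended use: the tab number completes the file name.
print(build_img_tag("1"))

# Attack: the first quote closes src, the rest adds a new attribute.
payload = "' onerror=\"alert(1);\">'"
injected = build_img_tag(payload)
print(injected)
```

The leftover '.jpg' /> after the closing angle bracket is rendered as harmless text, so the tag itself ends right after the injected onerror attribute.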

Level 4

Level 4 provides a timer creation functionality. After we enter the time we want to count down and click on the Create Timer button, we will get the page with the timer returned by the server. As you can see from the image below, there are two parts of the returned pages that show back what we entered, so there may be an XSS vulnerability.

Then, I tried to manipulate the timer parameter and see what we would get.

We can see that dangerous characters like <, >, ", and ' have been encoded as HTML entities on the server side. Functions such as htmlspecialchars in PHP do this job. It is a helpful way to mitigate XSS because of the order of parsing in the browser.

I recommend reading the browser parsing process part above to understand this level better.

Back to level 4: we can now conclude that the input displayed in the div tag will not work, but the input placed in the onload attribute of the img tag can lead to an XSS exploit. Attribute values are HTML-entity-decoded before the JS parser sees them, so the entity encoding does not protect us there; we just need to close the opening and closing quotes and parentheses.

https://xss-game.appspot.com/level4/frame?timer=1')%3Balert('1
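
As a rough illustration of why the encoded quotes do not help, Python's html.parser also decodes entities inside attribute values before handing them over. The markup below is an assumption about what the server returns for this payload (the startTimer name and the image path are illustrative), and the helper names are made up.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects the decoded attribute values of every start tag."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        self.attributes.update(dict(attrs))


def onload_code(markup):
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.attributes.get("onload")


# Assumed server output: the injected quotes arrive as &#39; entities.
markup = (
    "<img src=\"/static/loading.gif\" "
    "onload=\"startTimer('1&#39;);alert(&#39;1');\" />"
)
code = onload_code(markup)
print(code)
```

The string that would be handed to the JS parser already contains real quotes again, so it reads as two statements: the original call and the injected alert.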

Level 5

For level 5, we can look at the logic of its page jumps. First of all, if we click the Sign-up button, it redirects us to the /level5/frame/signup?next=confirm page. When we click the next button, it redirects us to the confirm page, which is exactly the value of the next parameter on the previous page. Since the confirm URL is placed in the href attribute of an <a> tag, we can change it to a javascript: URL and trigger an XSS attack.

So, my payload is as follows, which is really intuitive. And then, I can click the next button and trigger the alert().

/level5/frame/signup?next=javascript:alert(1);

Level 6

Level 6 provides a way to import other JavaScript files. The site intends to import resources only from its own back end, which is why it filters http/https input to prevent users from importing malicious JavaScript files from the internet.

However, there are still several ways to bypass the filter. Firstly, since the match function is case sensitive, we could use Https:// or HTTPS:// as the protocol name, which the browser still interprets as https:// when it parses the URL. I created a repository on GitHub and stored a malicious JS file containing alert(1).

HTTPS://raw.githubusercontent.com/jackfromeast/xss-file/main/alertyourwebsite.js

However, I received this Cross-Origin Read Blocking error.

The reason is that although we are requesting a JavaScript file, GitHub serves it as a text file: the Content-Type response header is text/plain. The response also carries an X-Content-Type-Options: nosniff header, which tells the browser to refuse data of the wrong MIME type. So we expect a file with the text/javascript MIME type but get text/plain, which prevents our file from being loaded. If we had our own server hosting the malicious script, we could set the response headers ourselves.

But don't be discouraged, there is another way. A data URL is another way of including a file, instead of a traditional URL request. Data URLs exist to embed small files inline in HTML and avoid establishing additional HTTP connections to request them. A data URL usually has the form data:[<mediatype>][;base64],<data>. When such a URL is loaded as a script, the data part is executed as JavaScript. So, we can try the following POC:

data:text/plain,alert('1')
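
Both bypasses come down to what a case-sensitive prefix check lets through. The sketch below assumes the level's filter is roughly a match against ^https?:// (the real check is JavaScript); the example.com URLs are placeholders.

```python
import re

# Assumed shape of the level's check: a case-sensitive prefix match.
BLOCKED_PREFIX = re.compile(r"^https?://")


def is_blocked(url):
    return BLOCKED_PREFIX.match(url) is not None


candidates = [
    "https://example.com/evil.js",   # caught by the filter
    "HTTPS://example.com/evil.js",   # upper-case scheme slips through
    "data:text/plain,alert('1')",    # no http(s) scheme at all
]
results = {url: is_blocked(url) for url in candidates}
for url, blocked in results.items():
    print(blocked, url)
```

Only the first candidate is rejected, which is why a blacklist on the scheme is not enough here.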

Part two: Patch

Level 1

In level 1, the user input is inserted into the content of a <div> tag, and since the content of a <div> is not handed to the JS parser by default, the only thing we need to do is encode the text with HTML entities. Since I am using Flask as the back-end server, I applied its escape() function.
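
A minimal sketch of this idea, using html.escape from the standard library as a stand-in for Flask's escape() (both turn the special characters into entities). The render_message function and its template string are hypothetical, not the level's actual code.

```python
from html import escape


def render_message(query):
    # Encode the user input before splicing it into the page.
    return "<div>Sorry, no results were found for <b>" + escape(query) + "</b>.</div>"


page = render_message("<script>alert(1)</script>")
print(page)
```

After escaping, the payload reaches the browser as text inside the div, so no script node is ever created.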

Level 2

In Level 2, the user input is saved into a client-side database and spliced into a <tr> tag. Since the <tr> tag does not trigger the JS parser by default, I just encode the user input with HTML entities before it is posted to the database.

Level 3

In Level 3, the user input is spliced into the src attribute of an <img> tag, and the browser then requests that image. Since the server only offers three images, the patch rejects any input that is not 1, 2, or 3 after parseInt().

Level 4

Since this level implements the function of a timer, it is sufficient to rewrite all user inputs that are not numeric to the default value at the back end.
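
One way this check could look, as a sketch only: the function name, the default value of 3, and the four-digit limit are all assumptions made for this example.

```python
import re

DEFAULT_TIMER = "3"  # assumed default, in seconds


def sanitize_timer(raw):
    # Accept only plain ASCII digits; anything else falls back to the default.
    if raw is not None and re.fullmatch(r"[0-9]{1,4}", raw):
        return raw
    return DEFAULT_TIMER


print(sanitize_timer("10"))
print(sanitize_timer("1');alert('1"))
```

Using an explicit [0-9] pattern rather than str.isdigit() avoids accepting non-ASCII digit characters.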

Level 5

In this level, the user's input is written to the href attribute of the <a> tag, so it should filter out input that can invoke the JS parser, such as javascript:. But the most effective protection against XSS is a whitelist built around the site's own logic, so my patch takes that approach.
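
A minimal sketch of such a whitelist, not necessarily the exact patch used here. The set of allowed page names and the fallback are assumptions based on the pages mentioned in the exploit section.

```python
# Hypothetical set of pages the "next" parameter may point to.
ALLOWED_NEXT = {"confirm", "welcome"}
DEFAULT_NEXT = "welcome"


def safe_next(raw):
    # Only exact, known page names are allowed into the href attribute.
    return raw if raw in ALLOWED_NEXT else DEFAULT_NEXT


print(safe_next("confirm"))
print(safe_next("javascript:alert(1);"))
```

Because the check is an exact match against known values, variations such as mixed-case schemes or encoded characters are rejected without needing a rule for each of them.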

Level 6

In level 6, we need to create a whitelist of JavaScript files in order to prevent users from including malicious ones. So I wrote a regular expression that requires every included path to start with /static/, which allows users to load JavaScript files only from the static directory of the website.
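
The rule itself can be sketched as follows. The level's real check runs in client-side JavaScript, so this Python version only illustrates the pattern; the exact character class, the .js suffix requirement, the extra ".." check, and the file name gadget.js are assumptions.

```python
import re

# Paths must live under /static/ and end in .js.
STATIC_SCRIPT = re.compile(r"/static/[A-Za-z0-9_\-./]+\.js")


def is_allowed_script(path):
    # fullmatch anchors both ends; ".." is refused to keep the path in /static/.
    return STATIC_SCRIPT.fullmatch(path) is not None and ".." not in path


for candidate in [
    "/static/gadget.js",
    "HTTPS://example.com/evil.js",
    "data:text/plain,alert('1')",
    "/static/../evil.js",
]:
    print(is_allowed_script(candidate), candidate)
```

Unlike the original blacklist on http/https, this accepts only what is known to be safe, so both bypasses from the exploit section are rejected.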

Defense with Content-Security-Policy(CSP)

Content Security Policy is an added layer of security against XSS attacks. It asks us to separate HTML and JavaScript, and to explicitly grant execute permission to each JavaScript segment.

Using a nonce is the most widespread approach, as illustrated here.

Taking Level 2 as an example, first move the JavaScript code out of all HTML tag attributes and wrap it in <script> tags. Then add a nonce attribute to each <script> tag. The nonce value must be generated dynamically for each request, so just put a placeholder here.

Next, the nonce value is generated and injected into the HTML when the template is rendered using Flask.

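import base64
import os

def getCspNonce():
  """Returns a random nonce, freshly generated for each request."""
  NONCE_LENGTH = 16  # number of random bytes before base64 encoding
  return base64.b64encode(os.urandom(NONCE_LENGTH)).decode()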

Finally, we can write the CSP header to restrict the loading source of javascript.
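
A sketch of what the header could look like, kept framework-independent. The make_nonce helper mirrors the nonce function above; the extra object-src and base-uri directives are a common hardening choice and an assumption here, not something this level requires.

```python
import base64
import os


def make_nonce(length=16):
    # Same idea as the nonce function above: random bytes, base64-encoded.
    return base64.b64encode(os.urandom(length)).decode()


def build_csp_header(nonce):
    # Only <script> tags carrying this exact nonce are allowed to run.
    return "script-src 'nonce-{}'; object-src 'none'; base-uri 'none'".format(nonce)


nonce = make_nonce()
headers = {"Content-Security-Policy": build_csp_header(nonce)}
script_tag = '<script nonce="{}">/* page code */</script>'.format(nonce)

print(headers["Content-Security-Policy"])
print(script_tag)
```

In Flask, the same value would typically be set on the response object's headers, and the nonce passed to the template so that it replaces the placeholder in each <script> tag. An injected inline script or event handler has no valid nonce, so the browser refuses to run it.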