What does it mean to “escape” data that is put into a web page?
By escape I mean formatting the data in such a way that it doesn’t interfere with any of the code or markup of a web page. For example, let’s say a user typed the following into a text input box as their first name:
<b>Mary</b>
If the web page redisplays this data after the form is submitted, and it doesn’t escape it, then it will be rendered something like
<p>First Name: <b>Mary</b></p>
which will display as
First Name: Mary
instead of as
First Name: <b>Mary</b>
That is, as a web developer you want to display to the user exactly what they provided. To do that, escape any characters that could be interpreted as part of the web page markup, like so
First Name: &lt;b&gt;Mary&lt;/b&gt;
which will render as
First Name: <b>Mary</b>
Why is it important to escape data included in a web page?
There are two main problems that can arise when data isn’t escaped in a web page. We’ve already looked at one, namely that characters that are part of the markup or programming language (as we’ll see with JavaScript later) can mess up that actual markup or program code. So for the sake of correctness, data included in the web page must be escaped.
The second problem is that failure to escape data in a web page can make possible various kinds of security attacks. Keeping with our example, suppose a user entered as their first name
Mary<script>alert("Hijacked!");</script>
Displaying this directly in a web page would result in that script being executed. And the JavaScript so injected could do lots of nefarious things, such as reading a user’s cookies.
What are the different ways of escaping data?
There are several different ways in which data is added to a web page, and they require different techniques to properly escape them. I’ll cover three cases in this blog:
- HTML data
- JavaScript strings and JSON
- URL components
For each of these I’ll show how to do the escaping in PHP (specifically when using Laravel), but the techniques apply to other languages and frameworks, most of which have similar utilities for escaping web page data.
How to escape HTML data
For the most part what we need to do is replace the following characters
- < to &lt;
- > to &gt;
- " to &quot;
- & to &amp;
- etc.
In a Laravel controller you can escape data to be put into a web page with the htmlspecialchars function. In a Blade template you can escape data by using triple curly braces, for example, {{{ $first_name }}}. (This is the Laravel 4 syntax; from Laravel 5 on, double curly braces escape by default.)
To go along with the example we started with, here’s how to write the Blade template to display the first name:
<p>First Name: {{{ $first_name }}}</p>
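To make the character replacements above concrete, here is a minimal sketch of an HTML-escaping function, written in JavaScript so it can be run outside a Laravel project. The name escapeHtml is hypothetical, not a built-in, and it only approximates what htmlspecialchars or a template engine does, so prefer the framework's own escaping in real code.

```javascript
// Hypothetical helper: a rough JavaScript analogue of PHP's htmlspecialchars.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')   // must run first, or the entities below get double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

console.log(escapeHtml('<b>Mary</b>'));
// &lt;b&gt;Mary&lt;/b&gt;
```

The ampersand is replaced first on purpose: doing it last would turn an already-produced &lt; into &amp;lt;.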
If you are dynamically generating a web page using JavaScript then you also need to be careful when creating DOM elements. Don’t assign untrusted data to .innerHTML because that won’t escape any HTML special characters. A simple way to handle data that may contain special HTML characters is to set the .textContent property of the DOM element instead.
var p = document.createElement('p');
p.textContent = '<b>Mary</b>';
// then append the p somewhere in the DOM
You can do the same in jQuery with the .text() method.
How to escape JavaScript strings and JSON data
With JavaScript strings you want to make sure that you escape quote characters in the string. If the user’s first name value is Mary" and you put this directly into a JavaScript string like so
<script> var firstName = "{{ $first_name }}"; </script>
it will get rendered as
<script> var firstName = "Mary""; </script>
which is invalid JavaScript. The double quote in the $first_name variable prematurely terminates the JavaScript string.
In PHP, the way to deal with this is to use the json_encode function. Because json_encode adds the surrounding quotes itself, don’t wrap its output in quotes of your own. You can use this for JavaScript strings:
<script> var firstName = {{ json_encode($first_name) }}; </script>
You can also use it to encode PHP data structures like arrays as JSON:
<script> var userData = {{ json_encode($userdata) }}; </script>
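The same idea can be sketched in JavaScript, for example when a Node.js server builds the page. JSON.stringify plays the role of json_encode here, and the helper name toScriptLiteral is hypothetical. One assumption worth flagging: unlike json_encode, JSON.stringify does not escape forward slashes, so this sketch additionally rewrites < as \u003c to keep a value containing </script> from closing the script element early.

```javascript
// Hypothetical helper: serialize a value for embedding inside an inline <script>.
function toScriptLiteral(value) {
  // JSON.stringify adds the surrounding quotes and escapes " and \ for us.
  // Replacing "<" with its \u003c escape keeps "</script>" out of the output.
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

const firstName = 'Mary"';
console.log('var firstName = ' + toScriptLiteral(firstName) + ';');
// var firstName = "Mary\"";
```

Since < can only occur inside string values in JSON output, swapping it for its Unicode escape leaves the parsed value unchanged.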
How to escape values placed in URLs
When placing data into a URL, care has to be taken to encode any characters that are special to URLs such as
- forward slash (/)
- ampersand (&)
- question mark (?)
- etc.
URL encoding is the process of converting these special characters using something called percent-encoding. For example, / gets encoded as %2F. Notice that this encoding format is different from the HTML encoding above, so you’ll need different functions and techniques to do percent-encoding.
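Where a value like %2F comes from can be shown with a short JavaScript sketch: a percent sign followed by the two-digit hexadecimal value of each UTF-8 byte. The function name percentEncodeAll is hypothetical, and it deliberately encodes every character to show the rule; real encoders leave letters, digits and a few punctuation marks untouched.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeAll(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeAll('/')); // %2F
console.log(percentEncodeAll('&')); // %26
console.log(percentEncodeAll('?')); // %3F
```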
Let’s say you have a $project_name variable with the value This&that project and you want to create a link with the project name in it:
<a href="/projects?name={{ $project_name }}">Go to {{{ $project_name }}}</a>
This renders as
<a href="/projects?name=This&that project">Go to This&amp;that project</a>
But this won’t work. When someone clicks on the link the server will interpret the URL’s parameters like so
- name=This is the first parameter name and value
- that project is another parameter without a value
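This misparse can be reproduced with the standard URLSearchParams class in JavaScript. It is only a stand-in for whatever query-string parser the server uses, so treat it as an illustration rather than a description of Laravel's behavior.

```javascript
// Parse the unencoded query string the way a typical query parser would.
const params = new URLSearchParams('name=This&that project');

for (const [key, value] of params) {
  console.log(JSON.stringify(key) + ' => ' + JSON.stringify(value));
}
// "name" => "This"
// "that project" => ""
```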
Instead use urlencode to properly encode the URL data.
<a href="/projects?name={{ urlencode($project_name) }}">Go to {{{ $project_name }}}</a>
This renders as follows (urlencode encodes the space as +; rawurlencode would produce %20 instead)
<a href="/projects?name=This%26that+project">Go to This&amp;that project</a>
The server will interpret these URL parameters as the single name-value pair name=This&that project.
If you are constructing URLs in JavaScript code, use encodeURIComponent to do percent-encoding on URL data.
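As a minimal sketch of the JavaScript side, the snippet below builds the project link from the earlier example. The helper name projectUrl is hypothetical; encodeURIComponent and URLSearchParams are standard. Note that encodeURIComponent encodes the space as %20.

```javascript
// Hypothetical helper: build the project link from the example above.
function projectUrl(projectName) {
  return '/projects?name=' + encodeURIComponent(projectName);
}

const url = projectUrl('This&that project');
console.log(url);
// /projects?name=This%26that%20project

// Parsing the query string back yields the original value as a single parameter.
const query = new URLSearchParams(url.split('?')[1]);
console.log(query.get('name'));
// This&that project
```

Only the parameter value is passed through encodeURIComponent. Encoding the whole URL this way would also encode the / and ? that give the URL its structure.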