Being careful with user INPUT: sanitizing

Nahuel Molina | Silvio
5 min read · Jan 14, 2022

The input that users store in an application can turn into a complex topic when a database is the next step in the processing chain.

After reading numerous posts about storing web data and its vulnerabilities, I realized how interesting and relevant these topics are. The different points of view from the developers there made me want to write my own post about it.

As I said, there are different opinions across the internet about what to do regarding security, because each project has its own specific characteristics. It’s mandatory to consider this when writing your application, and to take your own precautions.

Sanitizing or validating

When we talk about sanitizing, we mean cleaning out every character that doesn’t match what we specified. Think, for example, of password fields that don’t accept “\” or “&”: those characters get cut off. The changed data can of course “recover” its shape by encoding the output, with new characters replacing the originals, but the data itself is no longer the same.

It doesn’t make sense to cut or modify data that will be stored as a string from the form, because once it sits in the database as a plain string it has no chance to interact with its environment. Instead, we should look at the pattern of that string and accept or reject it depending on whether it matches our conditions, rather than modifying it.
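To make the data-loss point concrete, here is a minimal Python sketch of the password case. The function name `sanitize_password` and the forbidden set are my own assumptions for illustration, not taken from any particular library.

```python
import re

# Hypothetical rule: the field does not accept "\" or "&".
FORBIDDEN = re.compile(r"[\\&]")

def sanitize_password(raw: str) -> str:
    """Silently drop the forbidden characters (the lossy approach)."""
    return FORBIDDEN.sub("", raw)

original = "pa\\ss&word"            # what the user actually typed: pa\ss&word
stored = sanitize_password(original)

# Two different inputs now collapse into the same stored value,
# so the original can no longer be recovered from what was stored.
same_as_plain = stored == sanitize_password("password")
```

After sanitizing, nothing in the stored value tells you which characters were removed or where they were.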

Basically, make the input pass through a filter (an HTML parser, for example), and if it fails, just deny it. Pure validation. As an advantage, you will always know what data structure you are storing in your database. However, your responsibility increases: a strict filter can even deny a valid request.

All this discussion comes down to one question: what to do, validate, sanitize, or both? Well, as I said before, you have to decide for your own project, but the most experienced developers say to pick just one of them. Validating turns out better, since sanitizing risks irreversibly destroying your data.
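A rough sketch of pure validation in Python could look like this. The username rule and the name `is_valid_username` are hypothetical; the point is only that the input is accepted or denied, never modified.

```python
import re

# Hypothetical rule: 3 to 20 characters, letters, digits or underscore only.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")

def is_valid_username(raw: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return USERNAME_PATTERN.fullmatch(raw) is not None

accepted = is_valid_username("nahuel_92")
denied = is_valid_username('<script>console.log("hey")</script>')

# A strict filter can also deny a legitimate value:
# this surname contains an apostrophe, so it is rejected too.
also_denied = is_valid_username("O'Brien")
```

Using `fullmatch` instead of `search` matters here: the entire input has to fit the pattern, not just a part of it.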

Sanitizing HTML: filters, parsers, and more

In general, filters work on the HTML format, checking whether it is malformed or not. Once analyzed, it can be sent to the backend, where it may or may not pass through another, security-focused filter. Using two sanitizers allows us to keep our priorities organized.

The server protects us from XSS and injections, while the frontend app is only concerned with rendering documents. JavaScript, with its army of libraries, makes the choice easier: some are renderers with a low focus on security, and others are the opposite.

A common case is an editor that shows a preview in HTML format and, at the same time, sends the content to the server in its original format, where the backend processes it. At first, it would seem obvious and sufficient to apply a converter (for example md-to-html) in the frontend. However, if you care enough about your site’s security, you will surely have integrated both a converter (md-to-html) and a sanitizer on the server.

As an example, consider receiving '<script>console.log("hey")</script>' as input.

Here comes the problem: with malformed HTML you’ll get a different result on each side. The backend strips out whatever is semantically wrong once the content is converted, while the frontend ignores the inconsistency and just translates the source format to HTML.

Mozilla’s site has various examples that show how to sanitize with the browser API. It’s super easy to try out how the data splitting works, but there’s also the alternative of using an external library or a CDN build. DOMPurify, for example, stands out for being DOM-only and fast. Node.js also has the sanitize-html package on npm, which is useful when building Node.js-based React or Angular apps. For Express.js, most of the time the template engine requires its own security add-on, as Handlebars does with express-secure-handlebars.

At this point I am not going to lie: I haven’t tried a great number of them or made comparisons, so I think that should be part of your own investigation when looking for tools for your specific case.

Django protects against SQL injection and XSS by default, so you mostly don’t have to worry about them. The ORM plays a role in that, but as with most ORMs, the fact that it protects us from injections doesn’t mean it will always do so, or against all of them. I prefer to be aware of this instead of just trusting it. Django helps with SQL through query parametrization, which I will talk about next.
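To picture what a server-side sanitizer does with input like the script string above, here is a toy allowlist filter built on Python’s standard `html.parser`. Everything in it (the class name, the `ALLOWED_TAGS` set, the choice to drop all attributes) is an assumption for illustration. It is not hardened, and a real project should rely on a maintained sanitizer instead.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong"}  # hypothetical allowlist

class AllowlistSanitizer(HTMLParser):
    """Keep allowlisted tags (without attributes), drop script/style content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data))  # text is kept, but escaped

def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)

cleaned = sanitize('<p>hi</p><script>console.log("hey")</script>')
```

A frontend converter that only translates markup would pass the script element through untouched, which is exactly where the two sides start to disagree.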

SQL parametrization

Sometimes there is a misunderstanding between the terms sanitizing and parametrizing SQL data. Most of the time this confusion arises in the context of data storage.

What is being sanitized is data, but what is being parametrized isn’t data but SQL queries. Despite the confusion, it doesn’t matter as long as you understand what a dev is talking about.

Sometimes queries coming from the frontend can be malformed, just like HTML documents. Both are structured languages, each with its own semantics. When adding filters or parsers you have to take that into account.

Django’s SQL queries use parametrization: the bare query is defined separately from its parameters, which are the potentially unsafe part. The user-provided data is then escaped, which I will talk about next.
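The same idea can be sketched outside Django with Python’s built-in `sqlite3` module. The `users` table and both helper functions are made up for this example. The unsafe version is included only to show what parametrization avoids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

def find_user(conn, name):
    # The query text is fixed; the value travels separately as a parameter.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

def find_user_unsafe(conn, name):
    # Anti-pattern: the value becomes part of the SQL text itself.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()

malicious = "alice' OR '1'='1"

safe_rows = find_user(conn, malicious)           # treated as a literal name
leaked_rows = find_user_unsafe(conn, malicious)  # condition is always true
```

With the placeholder, the whole malicious string is compared against the `name` column as plain data, so no row matches. With string interpolation, the same input rewrites the `WHERE` clause and every row comes back.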

Escaping output data

Escaping output data basically means hiding (neutralizing) certain probably unwanted characters, so that they are shown as text instead of being interpreted. It can be considered a kind of encoding, with the difference that encoding translates characters instead of hiding them.

In general, we can talk about input-side and output-side encoding as the least risky way to secure our data string without destroying it, as sanitizing does. It has the advantage that the receiver (the server) can understand and decode it.

What is escaped? Any kind of data: URLs, XML, JavaScript code, and CSS, most of the time. The truth is that every structured language is vulnerable.
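As a small illustration using only Python’s standard library, here is the same kind of untrusted text escaped for two different contexts. The wrapper names `escape_for_html` and `escape_for_url` are mine. The calls underneath are `html.escape` and `urllib.parse.quote`.

```python
from html import escape, unescape
from urllib.parse import quote, unquote

def escape_for_html(text: str) -> str:
    """Escape text that will be placed inside an HTML document."""
    return escape(text)  # also escapes quotes by default

def escape_for_url(text: str) -> str:
    """Percent-encode text that will be placed inside a URL component."""
    return quote(text, safe="")

user_text = '<script>console.log("hey")</script>'

html_body = escape_for_html(user_text)
url_part = escape_for_url("a b&c=d")

# Unlike sanitizing, nothing is lost: the original can be decoded again.
round_trip_html = unescape(html_body)
round_trip_url = unquote(url_part)
```

Each context has its own rules, which is why one escaping function is not enough for every kind of output.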

When to use it?

No one knows where an attack can emerge, so this question is a little bit optimistic. Certainly, there are documents that should have as much escaping support as possible.

As expected, these are templates in general, and email templates in particular. Their renderers must be aware of the MIME format, which becomes really complex when integrating pieces of HTML, like the validation template inside the email you receive once you create an account. That is why our backend API serializers must work with trusted source data.

For PHP, zend-escaper stands out here, exposing the functionality needed for escaping. It offers methods for escaping each of HTML, HTML attributes, JavaScript, CSS, and URLs.

In the case of Django, in monolithic apps it’s possible to just use the integrated template engine, which receives the data sent from the backend and renders it. There are two main keywords that should catch your attention when working on the HTML once the data arrives: safe and escape. They are applied to user-provided data to skip or force escaping. They work in specific cases, but Django also has automatic escaping, which I should have mentioned at the beginning of this paragraph. Anyway, we can rely on it directly, or disable it globally and focus on each case individually. Needless to say, be careful when changing that setting.
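The idea behind automatic escaping with an explicit opt-out can be modeled in a few lines of plain Python. This is a toy model and not Django’s implementation: the `Safe` class and the `render` function are hypothetical names that only mimic the behavior described above.

```python
from html import escape

class Safe(str):
    """Marker type for content the developer explicitly vouches for."""

def render(template: str, **context) -> str:
    """Fill the template, escaping every value unless it is marked Safe."""
    prepared = {
        key: value if isinstance(value, Safe) else escape(str(value))
        for key, value in context.items()
    }
    return template.format(**prepared)

# User-provided data is escaped automatically.
comment_html = render("<p>{comment}</p>", comment="<script>alert(1)</script>")

# Trusted markup is passed through only when explicitly marked.
badge_html = render("<p>{badge}</p>", badge=Safe("<b>admin</b>"))
```

The default is the safe path, and skipping it takes a deliberate step, which is why marking user-provided data as safe deserves extra care.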

I find these topics very engaging, and I hope I haven’t confused you. Thanks for reading!
