Ending Injection Vulnerabilities
Injection Vulnerabilities are still common - even with Parameterised Queries, ORMs, etc.
But, there is something we can say about them:
You cannot have an Injection Vulnerability if the command (SQL, HTML, CLI, etc) does not include user data.
Which is why Libraries must receive user values separately from the sensitive string (the SQL, HTML, etc.), e.g.
$articles->limit('word_count > ?', $count); // Correct
$articles->limit('word_count > ' . $count); // Insecure (concatenation)
$template->parse('<a href="?">Link</a>', $href); // Correct
$template->parse('<a href=' . $href . '>Link</a>'); // Insecure (concatenation)
This is why Libraries need Programming Languages to:
"Distinguish strings from a trusted developer, from strings that may be attacker controlled"
An idea explained by Christoph Kern (Google Information Security Engineer) in Preventing Security Bugs through Software Design (2016); also discussed at USENIX Security 2015 and OWASP AppSec US 2021.
The usual responses are:
-
Haven't we solved this issue with Parameterised Queries?
While they allow user values to be kept separate, mistakes still happen.
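As a sketch of why the separation matters (Go, with illustrative function names; the safe form mirrors how database/sql keeps values apart from the command):

```go
package main

import "fmt"

// insecureQuery concatenates the user value into the SQL command itself,
// so attacker data becomes part of the command.
func insecureQuery(count string) string {
	return "SELECT * FROM articles WHERE word_count > " + count
}

// safeQuery keeps the command as a fixed developer string; the user value
// travels separately, ready for a parameterised call such as
// db.Query(query, args...) with database/sql.
func safeQuery(count string) (query string, args []any) {
	return "SELECT * FROM articles WHERE word_count > ?", []any{count}
}

func main() {
	count := "0 OR 1=1" // attacker-controlled input

	fmt.Println(insecureQuery(count)) // the injected clause is now part of the command

	q, a := safeQuery(count)
	fmt.Println(q, a) // command unchanged; value kept separate
}
```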
-
What about Database Abstractions, HTML Templating Engines, etc?
Libraries rely on developers using them correctly, and never making a mistake.
-
Why can't developers be trusted to escape values?
Because everyone makes mistakes with escaping; only Parameterised Queries and Libraries can handle user values consistently and safely.
-
Why can't we teach developers to never make a mistake?
Shall we keep trying this for another 20 years?
-
This approach sounds like Taint Checking, and that's flawed.
Yes, Taint Checking is flawed because it assumes escaping creates "safe" output (when context matters); drop that assumption, and you arrive at this "strings from a trusted developer" concept.
-
But what about dynamic data; like field names in SQL?
Use an Allow-List (of trusted developer strings), or have the Library escape these values (consistently).
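An Allow-List can be sketched like this (Go; the field names and function are illustrative):

```go
package main

import "fmt"

// allowedOrderFields is an Allow-List of trusted developer strings; the
// user picks an entry, but only these exact values ever reach the SQL.
var allowedOrderFields = map[string]bool{
	"created_at": true,
	"word_count": true,
	"title":      true,
}

// orderByClause returns a safe ORDER BY fragment, or an error when the
// requested field is not on the Allow-List.
func orderByClause(field string) (string, error) {
	if !allowedOrderFields[field] {
		return "", fmt.Errorf("unknown field %q", field)
	}
	return "ORDER BY " + field, nil
}

func main() {
	clause, err := orderByClause("word_count")
	fmt.Println(clause, err)

	_, err = orderByClause("title; DROP TABLE articles") // attacker value is rejected
	fmt.Println(err)
}
```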
-
How about a variable number of parameters, e.g. WHERE id IN (?,?,?)?
Yes, you should use parameters.
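The placeholders can be built dynamically while the values still travel separately (a Go sketch; the function name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// inClause builds "IN (?,?,?)" for n values: the command string only ever
// contains developer-written placeholders, never the values themselves.
func inClause(n int) string {
	placeholders := make([]string, n)
	for i := range placeholders {
		placeholders[i] = "?"
	}
	return "IN (" + strings.Join(placeholders, ",") + ")"
}

func main() {
	ids := []any{3, 5, 8} // user-supplied values, passed separately
	query := "SELECT * FROM articles WHERE id " + inClause(len(ids))
	fmt.Println(query, ids) // e.g. db.Query(query, ids...)
}
```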
-
What if developers use unsafe APIs directly; e.g. sending vulnerable SQL directly to the database?
That's part 2, where the output from Libraries gets marked as trusted for specific APIs (and some other special cases). This is how Trusted Types work in JavaScript.
-
What about parsing data; like decoding JSON, CSV files, images, etc?
This website is focused on the majority of developers who should simply use parsers/unparsers. The few who create parsers/unparsers should consider memory safe languages, and LangSec.
How can we distinguish strings from a trusted developer today?
PHP can use the literal-string type with Static Analysis (Psalm and PHPStan). As most developers do not use these tools, and type checking can get complicated, the is_literal() RFC will also help (thanks Joe Watkins, Máté Kocsis, Matthew Brown, and Ondřej Mirtes).
Go can use an "un-exported string type"; this is how Google's Safe HTML package works.
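The un-exported string type trick can be sketched like this (type and function names are illustrative; in a real library the type lives in its own package, as in Google's Safe HTML):

```go
package main

import "fmt"

// stringConstant is un-exported. From outside its defining package, only an
// untyped string constant (a compile-time literal) converts to it implicitly;
// a runtime string variable will not compile.
type stringConstant string

// Trusted wraps a developer-written string.
type Trusted struct{ s string }

// NewTrusted accepts only compile-time constants (when called from another
// package), so a Trusted value cannot contain attacker data.
func NewTrusted(s stringConstant) Trusted {
	return Trusted{string(s)}
}

func main() {
	t := NewTrusted("SELECT * FROM articles WHERE word_count > ?") // literal: OK
	fmt.Println(t.s)

	// From outside the defining package, this would NOT compile:
	//   var userInput string = request.FormValue("q")
	//   NewTrusted(userInput) // cannot use userInput (type string) as stringConstant
}
```

The enforcement comes from Go's assignability rules: only untyped constants convert implicitly to a named string type, so no variable holding user data can cross the boundary.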
C++ can use the consteval specifier (thanks Jonathan Müller).
Rust can use a "procedural macro" (thanks Geoffroy Couprie).
Java can use a @CompileTimeConstant annotation from ErrorProne.
Node can use the isTemplateObject package, or goog.string.Const in Google's Closure Library.
JavaScript will hopefully get isTemplateObject, and TrustedHTML.fromLiteral (thanks Krzysztof Kotowicz).
Programming languages need to have this concept built in (like Go and C++), because anything seen as optional (e.g. Static Analysis) will be skipped by many developers (who probably need this the most).
This approach will bring an End to Injection Vulnerabilities.