Ending Injection Vulnerabilities
Injection Vulnerabilities are still common - even with Parameterised Queries, ORMs, etc.
But, we can stop them completely, because:
You cannot have an Injection Vulnerability if the command (SQL, HTML, CLI, etc) does not include user data.
This is why Libraries must receive user values separately from the sensitive string (the SQL, HTML, etc.), e.g.
$articles->limit('word_count > ?', $count); // Correct
$articles->limit('word_count > ' . $count); // Insecure (concatenation)
$template->parse('<a href="?">Link</a>', $href); // Correct
$template->parse('<a href=' . $href . '>Link</a>'); // Insecure (concatenation)
To do this, Libraries need Programming Languages to:
"Distinguish strings from a trusted developer, from strings that may be attacker controlled"
This was explained in 2016 in the talk Preventing Security Bugs through Software Design by Christoph Kern (Google Information Security Engineer); it was also discussed at USENIX Security 2015 and OWASP AppSec US 2021.
The usual responses are:
-
Haven't we solved this issue with Parameterised Queries?
They allow user values to be kept separate, but nothing requires developers to use them, and mistakes happen.
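A sketch in Python, using the standard sqlite3 module, shows the problem: both calls below are accepted by the driver. Parameterisation keeps the user value separate from the SQL string, but nothing stops the concatenated version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, word_count INTEGER)")
conn.execute("INSERT INTO articles (word_count) VALUES (120)")

count = "100"  # imagine this came from the user

# Safe: the user value travels to the database separately from the SQL string.
safe = conn.execute(
    "SELECT id FROM articles WHERE word_count > ?", (int(count),)
).fetchall()

# Equally accepted by the driver, but injectable: nothing enforces the safe form.
unsafe = conn.execute("SELECT id FROM articles WHERE word_count > " + count).fetchall()
```

Both queries return the same rows here; the difference only appears when an attacker controls `count`.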
-
What about Database Abstractions, HTML Templating Engines, etc?
Libraries rely on developers using them correctly, and never making a mistake.
-
Why can't developers be trusted to escape values?
Because everyone makes mistakes with escaping; only Parameterised Queries and Libraries can handle user values consistently and safely.
-
Why can't we teach developers to never make a mistake?
Shall we keep trying this for another 20 years?
-
This approach sounds like Taint Checking, and that's flawed;
Yes, Taint Checking is flawed, but if we stop assuming that escaping creates "safe" output (escaping depends on context), we arrive at this "strings from a trusted developer" concept.
-
But what about dynamic data; like field names in SQL?
Use an Allow-List (of trusted developer strings), or Libraries must escape these values (consistently).
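For example, a hypothetical Allow-List in Python: the set contains only developer-written strings, so user input can choose a field but never supply one.

```python
# Allow-List of trusted developer strings; user input can only choose among them.
ALLOWED_SORT_FIELDS = {"word_count", "created_at", "title"}

def order_by_sql(field: str) -> str:
    if field not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"Unsupported sort field: {field!r}")
    return f"SELECT id FROM articles ORDER BY {field}"
```

The field name reaches the SQL string only after it has matched a trusted developer string exactly.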
-
How about a variable number of parameters, e.g. "WHERE id IN (?, ?, ?)"?
Yes, you should use parameters.
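In Python, the placeholders can be generated to match the number of user values; the values themselves still stay out of the SQL string (a sketch; `conn` would be an open database connection):

```python
ids = [3, 5, 8]  # user-supplied values

# One "?" placeholder per value; the values are passed separately.
placeholders = ", ".join(["?"] * len(ids))
sql = f"SELECT id FROM articles WHERE id IN ({placeholders})"

# Then run e.g.: conn.execute(sql, ids)
```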
-
What if developers use unsafe APIs directly; e.g. sending vulnerable SQL directly to the database?
That's part 2, where the output from Libraries gets marked as trusted for specific APIs (and some other special cases). This is how Trusted Types work in JavaScript (could use stringable value-objects).
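As a rough Python sketch of that part 2 idea (the class and function names here are hypothetical): a library returns a stringable value-object, and the unsafe API accepts only that type.

```python
class TrustedSQL:
    """Hypothetical value-object a query builder returns to mark its output as trusted."""
    def __init__(self, sql: str) -> None:
        self._sql = sql
    def __str__(self) -> str:
        return self._sql

def send_to_database(query: TrustedSQL) -> str:
    # The raw API refuses plain strings, so concatenated SQL cannot reach it.
    if not isinstance(query, TrustedSQL):
        raise TypeError("Raw strings not accepted; use the query builder")
    return str(query)
```

Only code that goes through the library can produce a `TrustedSQL`, so the unsafe API is no longer reachable with attacker-influenced strings.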
-
What about parsing data; like decoding JSON, CSV files, images, etc?
This website is focused on the majority of developers who should simply use parsers/unparsers. The few who create parsers/unparsers should consider memory safe languages, and LangSec.
How can we distinguish strings from a trusted developer today?
Python can use the LiteralString type in 3.11 (pyre example, via PEP 675; thanks to Pradeep Kumar, Graham Bleaney, and Jelle Zijlstra).
PHP can use the literal-string type with Static Analysis (Psalm and PHPStan). But most developers do not use these tools, and type checking can get complicated, which is why the LiteralString RFC will help (thanks to Joe Watkins, Máté Kocsis, Matthew Brown, and Ondřej Mirtes).
Go can use an "un-exported string type". This is how Google's Safe HTML package works.
C++ can use a "consteval annotation" (thanks Jonathan Müller).
C# can use a "ConstantExpected annotation".
Scala can use "String with Singleton" (thanks Tamer Abdulradi).
Java can use a @CompileTimeConstant annotation from ErrorProne.
Rust can use a "procedural macro" (thanks Geoffroy Couprie).
Node can use the isTemplateObject package, or goog.string.Const in Google's Closure Library.
JavaScript will hopefully get isTemplateObject, and TrustedHTML.fromLiteral (thanks Krzysztof Kotowicz).
Programming languages need to have this concept built in (like Go and C++), because anything seen as optional (e.g. Static Analysis) will be skipped by many developers (who probably need this the most).
This approach will bring an End to Injection Vulnerabilities.