# 19 August 2021 - CVSS (and other vulnerability severity systems) are hopelessly flawed

Alternate title: vulnerability severities considered harmful.

(preface: I'm discussing this in regard to vulnerabilities in libraries, rather than in products. I don't think all these arguments hold the same weight for products, but a good deal of them apply.)

I've spent a lot of time, particularly over the last year while working on the Go vulnerability database project, looking at vulnerability reports and at tooling designed to consume and surface vulnerability information. Most vulnerability reporting formats include some form of severity information as an attempt to quickly convey the impact of a security issue. If you've spent any time looking at vulnerability reports or scanning tools you'll be familiar with the LOW, MEDIUM, HIGH, and CRITICAL labels that get routinely affixed to one-line descriptions of vulnerabilities.

These descriptors are inherently flawed. Their intended purpose is simple in theory, but deceptive in practice: to convey, in a single word, the impact the vulnerability may have, and in turn allow the reader to determine whether they need to pay attention to the issue, or can safely ignore it and move on in favor of more important work.

The problem with this should be immediately clear to anyone who has had to analyze vulnerabilities or assess their impact. The impact of a vulnerability is rarely universal. In the majority of cases the impact will be tightly scoped to how the affected code is used by the program that depends on it. A crash in a parser may be a critical severity issue if the parser handles user-supplied input and the crash can be leveraged in a DoS attack, but if the parser is only ever used to parse local configuration files, calling the severity even LOW would be pretty ridiculous.

The 'premier' vulnerability reporting format, MITRE's CVE, takes severity labeling a step further. The Common Vulnerability Scoring System (CVSS) attempts to take away the subjective element of severity labeling by specifying a formula for generating a 'score' from a number of indicators, and then bucketing scores into severity levels. This formula takes metrics such as attack vector, complexity, and exploitability into account.
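As a rough sketch of how mechanical that scoring is (this only covers the scope-unchanged case, the metric weights and constants are copied from the CVSS v3.1 specification, and the function names are my own), here is the base score arithmetic in Go. Feeding it the weights for the vector quoted further down reproduces its 7.5 HIGH rating:

```go
package main

import (
	"fmt"
	"math"
)

// roundUp1 rounds up to one decimal place, as the CVSS v3.1 spec requires.
func roundUp1(x float64) float64 {
	return math.Ceil(x*10) / 10
}

// baseScore computes a CVSS v3.1 base score for the scope-unchanged case.
// The arguments are the numeric weights for each metric, not the letter values.
func baseScore(av, ac, pr, ui, c, i, a float64) float64 {
	exploitability := 8.22 * av * ac * pr * ui
	iss := 1 - (1-c)*(1-i)*(1-a)
	impact := 6.42 * iss
	if impact <= 0 {
		return 0
	}
	return roundUp1(math.Min(impact+exploitability, 10))
}

// severity buckets a score using the v3.1 qualitative severity rating scale.
func severity(score float64) string {
	switch {
	case score == 0:
		return "NONE"
	case score < 4.0:
		return "LOW"
	case score < 7.0:
		return "MEDIUM"
	case score < 9.0:
		return "HIGH"
	default:
		return "CRITICAL"
	}
}

func main() {
	// CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
	// AV:N=0.85, AC:L=0.77, PR:N=0.85, UI:N=0.85, C:N=0, I:N=0, A:H=0.56
	score := baseScore(0.85, 0.77, 0.85, 0.85, 0, 0, 0.56)
	fmt.Printf("%.1f %s\n", score, severity(score)) // 7.5 HIGH
}
```

The arithmetic is entirely deterministic, right down to the fixed buckets (0.1-3.9 LOW, 4.0-6.9 MEDIUM, 7.0-8.9 HIGH, 9.0-10.0 CRITICAL); all of the judgment is pushed into picking the input values.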
If you're following along you'll immediately realize that these components are _entirely subjective_. In our previous example, what is the attack vector for a parser bug? It could be local, or it could be remote. It could be easy to exploit or it could be impossible. The person writing the vulnerability report cannot determine this; the only person who can is the one assessing the vulnerability's impact on their own software.

The point I'm trying to get at is that severities are pointless, and are often used as a way to avoid writing useful and descriptive descriptions of vulnerabilities. If the vulnerability is CRITICAL, surely you don't need to read why, just fix it and move on! If the severity is LOW, why bother looking any deeper, you can ignore this one. This is a dark pattern that both reporters and consumers of this data fall into.

What vulnerability reports _really need_ are good descriptions. Descriptions which detail what the issue is, how it can be triggered, and what consumers should consider when determining the impact on their own software. A well written description should negate the need for a severity indicator, and provide _actual insight_ into the issue at hand. Which of these is more useful for determining the impact of a vulnerability?

> 7.5 HIGH CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

or

> The HTML parser does not properly handle "in frameset" insertion mode, and can be made to panic when operating on malformed HTML that contains `<template>` tags. If operating on user input, this may be a vector for a denial of service attack.

# 21 June 2021 - Some fuzzing thoughts around call graphs

Can call graph analysis provide a better approach to directed fuzzing than traditional block coverage counters?

Most traditional directed fuzzing techniques involve breaking a program into basic execution blocks and instrumenting those blocks with counters. These counters are then used to determine whether a mutated input expands the set of blocks which are executed, by inspecting the set of non-zero counters. Essentially this is attempting to discover new nodes in a call graph, but without any knowledge of the shape of the graph or the path through it that was taken.

Whereas naive directed fuzzing generally just does brute force mutations on its entire corpus equally in order to discover new inputs which expand coverage, most advanced fuzzing strategies are centered on focusing effort on some subset of inputs which are more likely than others to expand coverage. This is essentially a graph problem. We are attempting to visit nodes in the call graph which were previously unvisited, and we want to focus on inputs which result in paths containing nodes adjacent to large unvisited subgraphs. For most fuzzing strategies I am aware of though, this is done without any real knowledge of the call graph, other than either the visited nodes, or the exercised directed edges. For example given the call graph (a) below, for a fully covered program you only really see either (b) or (c).

      a          b        c

      1          1
     /|         /|
    2 3-4     2 3 4       -
        |         |
        5         5

This obviously reduces the fuzzer's ability to figure out how to more accurately focus its efforts. Imagine we have two inputs, one of which exercises nodes 1 and 2 and another which exercises 1 and 3. With no knowledge of the call graph we would exercise both inputs equally, since they could both feasibly lead to inputs which produce further coverage. With knowledge of the call graph we know that the input which exercises nodes 1 and 2 is likely to produce nothing else of interest, since it ends in a dead end.

It seems like a more heavyweight approach to coverage instrumentation, one that for instance provides a lightweight call graph approximation, would let the fuzzer make significantly more informed choices about how to focus its efforts. For most fuzzers this is probably somewhat complicated, since you really need insight into the compiler/runtime to construct a call graph which matches reality. But for fuzzers tightly integrated into their language, this is perhaps a more viable approach?
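To make that concrete, here is a minimal sketch of the kind of prioritization an approximate call graph would enable. Nothing here comes from a real fuzzer; the types and names are made up, and it assumes instrumentation has already produced the graph and recorded which nodes each input touched. It just scores a corpus entry by how much uncovered graph borders the nodes on its path, so the {1, 3} input from the example above outranks the {1, 2} one:

```go
package main

import "fmt"

// CallGraph is an approximate static call graph: node -> callees.
type CallGraph map[int][]int

// unvisitedReach counts how many not-yet-covered nodes are reachable from n,
// walking only through uncovered nodes.
func unvisitedReach(g CallGraph, covered map[int]bool, n int) int {
	seen := map[int]bool{}
	var walk func(node int)
	walk = func(node int) {
		for _, callee := range g[node] {
			if covered[callee] || seen[callee] {
				continue
			}
			seen[callee] = true
			walk(callee)
		}
	}
	walk(n)
	return len(seen)
}

// score ranks a corpus entry by how much uncovered call graph borders its path:
// the total number of unvisited nodes hanging off the nodes it exercised.
func score(g CallGraph, covered map[int]bool, path []int) int {
	total := 0
	for _, node := range path {
		total += unvisitedReach(g, covered, node)
	}
	return total
}

func main() {
	// The call graph (a) from above: 1 calls 2 and 3, 3 calls 4, 4 calls 5.
	g := CallGraph{1: {2, 3}, 3: {4}, 4: {5}}

	// Coverage so far: one input exercised {1, 2}, another exercised {1, 3}.
	covered := map[int]bool{1: true, 2: true, 3: true}

	fmt.Println(score(g, covered, []int{1, 2})) // 0: node 2 is a dead end
	fmt.Println(score(g, covered, []int{1, 3})) // 2: node 3 borders the unvisited 4 -> 5 chain
}
```

A real implementation would have to deal with indirect calls, graph size, and keeping the scoring cheap enough to run across the whole corpus, but the point stands: with even an approximate graph, the dead-end input can be deprioritized at all.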