# 19 August 2021 - CVSS (and other vulnerability severity systems) are hopelessly flawed

Alternate title: vulnerability severities considered harmful.

(preface: I'm discussing this in regard to vulnerabilities in libraries, rather than in products. I don't
think all these arguments hold the same weight for products, but a good deal of them apply.)

I've spent a lot of time, particularly over the last year while working on the Go vulnerability database project,
looking at vulnerability reports and tooling designed to consume and surface vulnerability information. Most
vulnerability reporting formats include some form of severity information as an attempt to quickly convey the
impact of a security issue. If you've spent any time looking at vulnerability reports or scanning tools, you'll
be familiar with the common LOW, MEDIUM, HIGH, and CRITICAL labels that get routinely affixed to one-line descriptions
of vulnerabilities.

These descriptors are inherently flawed. Their intended purpose is, in theory, simple, but in reality it is deceptive.
They are meant to convey, in a single word, the impact the vulnerability may have, and in turn allow the reader
to determine whether they need to pay attention to the issue, or can safely ignore it and move on to more
important work. The problem with this should be immediately clear to anyone who has had to analyze vulnerabilities
or assess their impact. The impact of a vulnerability is rarely universal. In the majority of cases the
impact will be tightly scoped to how the affected code is used by the program that depends on it. A crash in a parser
may be a critical severity issue if the parser handles user-supplied input and can be leveraged in a denial of service
attack, but if it is only used to parse local configuration files, calling the severity even LOW would be pretty ridiculous.
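
To make the context dependence concrete, here's a minimal sketch in Go. The package, function, and file names are
hypothetical stand-ins, not taken from any real report; the point is only that the same vulnerable parser can sit at
two very different trust boundaries:

    package example

    import "os"

    // parseDocument is a hypothetical stand-in for a parser with a bug that
    // can be made to panic on certain malformed inputs.
    func parseDocument(data []byte) error {
        // ... vulnerable parsing logic ...
        return nil
    }

    // Low impact: the input is a local file controlled by the operator, so a
    // panic here is an annoyance rather than a security issue.
    func loadLocalConfig(path string) error {
        data, err := os.ReadFile(path)
        if err != nil {
            return err
        }
        return parseDocument(data)
    }

    // High impact: the input arrives over the network, so the same panic
    // becomes a remotely triggerable denial of service.
    func handleUntrustedRequest(body []byte) error {
        return parseDocument(body)
    }

A severity label attached to the parser bug itself can't know which of these two call sites exists in your program.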

The 'premier' vulnerability reporting format, MITRE's CVE, takes severity labeling a step further. The Common
Vulnerability Scoring System (CVSS) attempts to take away the subjective element of severity labeling by specifying
a formula for generating a 'score' using a number of indicators and then bucketing scores into various severity levels.
The formula takes into account metrics such as attack vector, attack complexity, and exploitability. If you're
following along you'll immediately realize that these components are _entirely subjective_. In our previous example,
what is the attack vector for a parser bug? It could be local, or it could be remote. It could be easy to exploit or
it could be impossible. The person writing the vulnerability report cannot determine this; the only person who can is
the one assessing the vulnerability's impact on their own software.
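
For a sense of what those components actually are, here's a small sketch (just the CVSS v3.1 base-metric
abbreviations spelled out, no scoring) that expands a vector string like the one shown further down. Every one of
these fields is a judgment the reporter makes without knowing how the code is actually used:

    package example

    import (
        "fmt"
        "strings"
    )

    // metricNames maps the CVSS v3.1 base metric abbreviations to their names.
    // The single-letter values (N, L, H, U, C, ...) are likewise defined by the spec.
    var metricNames = map[string]string{
        "AV": "Attack Vector",
        "AC": "Attack Complexity",
        "PR": "Privileges Required",
        "UI": "User Interaction",
        "S":  "Scope",
        "C":  "Confidentiality Impact",
        "I":  "Integrity Impact",
        "A":  "Availability Impact",
    }

    // ExpandVector prints the base metrics encoded in a vector such as
    // "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H".
    func ExpandVector(vector string) {
        for _, part := range strings.Split(vector, "/") {
            kv := strings.SplitN(part, ":", 2)
            if len(kv) != 2 {
                continue
            }
            if name, ok := metricNames[kv[0]]; ok {
                fmt.Printf("%s = %s\n", name, kv[1])
            }
        }
    }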

The point I'm trying to get at is that severities are pointless, and are often used as a way to avoid writing useful,
descriptive descriptions of vulnerabilities. If the vulnerability is CRITICAL, surely you don't need to read why;
fix it and move on! If the severity is LOW, why bother looking any deeper? You can ignore this one. This is a dark
pattern that both reporters and consumers of this data fall into.

What vulnerability reports _really need_ are good descriptions. Descriptions which detail what the issue is, how it can
be triggered, and what consumers should consider when determining the impact on their own software. A well-written
description should negate the need for a severity indicator, and provide _actual insight_ into the issue at hand.

Which of these is more useful for determining the impact of a vulnerability?

    7.5 HIGH CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

or

    The HTML parser does not properly handle "in frameset" insertion mode, and can be made
    to panic when operating on malformed HTML that contains <template> tags. If operating
    on user input, this may be a vector for a denial of service attack.


# 21 June 2021 - Some fuzzing thoughts around call graphs

Can call graph analysis provide a better approach to directed fuzzing than traditional block coverage counters?

Most traditional directed fuzzing techniques involve breaking a program into basic blocks and instrumenting
those blocks with counters. By inspecting which counters are non-zero after a run, the fuzzer determines whether
a mutated input expands the set of blocks which are executed. Essentially this is attempting to discover
new nodes in a call graph, but without any knowledge of the shape of the graph or the path through it that
was taken.
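
As a deliberately simplified sketch of that view (the counter array and the way it gets bumped are assumptions
about how the instrumentation hooks in, not any particular fuzzer's implementation), all the fuzzer learns from a
run is which counters became non-zero:

    package example

    // counters is the per-run coverage map; assume the instrumented binary
    // bumps counters[i] each time basic block i executes.
    var counters [1 << 16]uint8

    // everSeen records which blocks any previous input has reached.
    var everSeen [1 << 16]bool

    // newCoverage reports whether the last run touched a block that no
    // earlier input had reached, and folds the run into everSeen.
    func newCoverage() bool {
        interesting := false
        for i, c := range counters {
            if c != 0 && !everSeen[i] {
                everSeen[i] = true
                interesting = true
            }
            counters[i] = 0 // reset for the next run
        }
        return interesting
    }

Note there is no notion here of which block called which, or in what order; only membership in the covered set.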

Whereas naive directed fuzzing generally just does brute-force mutations on its entire corpus equally in order to
discover new inputs which expand the coverage, most advanced fuzzing strategies are centered on attempting to focus
efforts on some subset of inputs which are more likely than others to expand coverage.
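
In sketch form (the scoring itself is left abstract here, and weighted selection is just one plausible way to bias
effort), the difference between the naive and the focused strategy comes down to how the next input to mutate is
chosen:

    package example

    import "math/rand"

    type input struct {
        data  []byte
        score float64 // how promising we believe this input is
    }

    // pickNaive treats every corpus entry equally.
    func pickNaive(corpus []input) input {
        return corpus[rand.Intn(len(corpus))]
    }

    // pickWeighted biases mutation effort towards higher-scoring inputs.
    func pickWeighted(corpus []input) input {
        total := 0.0
        for _, in := range corpus {
            total += in.score
        }
        r := rand.Float64() * total
        for _, in := range corpus {
            r -= in.score
            if r <= 0 {
                return in
            }
        }
        return corpus[len(corpus)-1]
    }

Everything interesting is hiding in how that score gets assigned, which is where the call graph comes in.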

This is essentially a graph problem. We are attempting to visit nodes in the call graph which were previously
unvisited, and we want to focus on inputs whose paths contain nodes incident to large
unvisited subgraphs. For most fuzzing strategies I am aware of though, this is done without any real knowledge
of the call graph, other than either the visited nodes or the exercised directed edges. For example, given the
call graph (a) below, for a fully covered program you only really see either (b) or (c).

    a       b       c

    1       1
   /|              /|
  2 3-4   2 3 4      -
    |               |
    5       5

This obviously reduces the fuzzer's ability to figure out how to more accurately focus its efforts. Imagine we
have two inputs, one of which exercises nodes 1 and 2 and another which exercises 1 and 3. With no
knowledge of the call graph we would spend equal effort mutating both, since they could both feasibly lead to
inputs which produce further coverage. With knowledge of the call graph we know that the input which
exercises nodes 1 and 2 is likely to produce nothing else of interest, since it ends in a dead end.

It seems like a more heavyweight approach to coverage instrumentation, one that for instance provides a
lightweight call graph approximation, would let the fuzzer make significantly more informed choices
about how to focus its efforts.
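
As a minimal sketch of what that could look like, using the toy graph above (the graph representation and the
scoring heuristic are my own assumptions, not a description of any existing fuzzer): score an input by the number
of unvisited nodes hanging off the call edges its covered nodes could take but didn't.

    package example

    // callGraph maps each function (node) in the toy graph above to its callees.
    var callGraph = map[int][]int{
        1: {2, 3},
        2: {},
        3: {4, 5},
        4: {},
        5: {},
    }

    // unvisitedReachable counts the unvisited nodes reachable from start by
    // following call edges, using seen to avoid revisiting nodes.
    func unvisitedReachable(start int, visited, seen map[int]bool) int {
        if seen[start] {
            return 0
        }
        seen[start] = true
        count := 0
        if !visited[start] {
            count++
        }
        for _, callee := range callGraph[start] {
            count += unvisitedReachable(callee, visited, seen)
        }
        return count
    }

    // score estimates how promising an input is: for every node it covered,
    // count the unvisited callees it branched past, plus everything reachable
    // beyond them.
    func score(covered []int, visited map[int]bool) int {
        seen := map[int]bool{}
        total := 0
        for _, n := range covered {
            for _, callee := range callGraph[n] {
                if !visited[callee] {
                    total += unvisitedReachable(callee, visited, seen)
                }
            }
        }
        return total
    }

With visited = {1, 2, 3}, the input covering {1, 2} scores 0 (it ends in a dead end) while the input covering
{1, 3} scores 2 (nodes 4 and 5 are still reachable from its path), which is exactly the signal the bare block
counters can't provide.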

For most fuzzers this is probably somewhat complicated, since you really need insight into the compiler/runtime
to construct a call graph which matches reality. But for fuzzers tightly integrated into languages, this is
perhaps a more viable approach?