CVE-2017-13782: CodeQL Study Note

This CVE affects the kernel of Apple’s macOS. It allows an attacker to read sensitive data from the kernel’s address space. It was found by Kevin Backhouse. 🚴🏼

The XNU kernel is part of the Darwin operating system, which underlies macOS and iOS.

Preparation

You should know a little C/C++, a little CodeQL, and something about the BSD subsystem code.

Starting Point

Kevin ran the LGTM.com analysis locally and found many alerts for dtrace.c. DTrace also runs an interpreter inside the kernel, so, based on his experience in software development and security research, he judged it a likely place to find a bug.

DTrace uses its own custom bytecode format. The main interpreter loop for the bytecode is in the function dtrace_dif_emulate. Validation is done by dtrace_difo_validate, which is meant to ensure that the bytecode does not perform any malicious actions.
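
To make the split between validator and interpreter concrete, here is a toy sketch in C. It is not DTrace’s code: the opcodes, the instruction layout, and the names toy_validate and toy_emulate are all invented for illustration. The only details taken from this note are the 8 virtual registers stored in an array named regs, and the idea that a validator runs before the interpreter loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NREGS 8  /* this note says DTrace bytecode uses 8 virtual registers */

/* Hypothetical opcodes; the real DTrace instruction set is different. */
enum toy_op { TOY_SETI, TOY_ADD, TOY_RET };

struct toy_insn {
    enum toy_op op;
    uint8_t rd, r1, r2;  /* register numbers */
    uint64_t imm;        /* immediate value, used by TOY_SETI */
};

/* Validator: accept a program only if it ends in TOY_RET and every
 * register number stays inside regs[]. Returns 1 if valid, 0 if not. */
int toy_validate(const struct toy_insn *prog, size_t len)
{
    assert(prog != NULL);
    if (len == 0 || prog[len - 1].op != TOY_RET)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (prog[i].rd >= TOY_NREGS ||
            prog[i].r1 >= TOY_NREGS ||
            prog[i].r2 >= TOY_NREGS)
            return 0;
    }
    return 1;
}

/* Interpreter loop: it trusts that toy_validate accepted the program,
 * so it indexes regs[] without any further checks. */
uint64_t toy_emulate(const struct toy_insn *prog, size_t len)
{
    uint64_t regs[TOY_NREGS] = {0};
    assert(prog != NULL);
    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];
        switch (in->op) {
        case TOY_SETI: regs[in->rd] = in->imm; break;
        case TOY_ADD:  regs[in->rd] = regs[in->r1] + regs[in->r2]; break;
        case TOY_RET:  return regs[in->rd];
        }
    }
    return 0;
}
```

The validator here only protects the register numbers. The values held in the registers still come from the program, which is why any later use of a register value as an index or pointer needs its own check.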

Key Ideas

  • root -> /dev/dtrace + ioctl
  • any user -> /dev/dtracehelper -> register DTrace helpers -> helpers are meant to let JIT compilers produce better stack traces -> the ustack feature of DTrace helpers does not work on macOS, but the interface still lets an attacker plant a malicious DTrace helper

CodeQL Query

Kevin’s CodeQL query: GitHub - DTraceUnsafeIndex.ql

A simple global data flow CodeQL query is enough to find the vulnerable code.

Part 1

/**
 * @name DTrace unsafe index
 * @description DTrace registers are user-controllable, so they must not be
 *              used to index an array without a bounds check.
 * @kind path-problem
 * @problem.severity warning
 * @id apple-xnu/cpp/dtrace-unsafe-index
 */

import cpp  // the standard CodeQL library for C/C++
import semmle.code.cpp.dataflow.DataFlow  // the data flow library
import DataFlow::PathGraph  // used by path-problem queries to report paths

class RegisterAccess extends ArrayExpr {   // a QL class that matches every register access
  RegisterAccess() {
    exists (LocalScopeVariable regs, Function emulate |   // there exist a variable and a function such that:
      regs.getName() = "regs" and           // the variable is named regs
      emulate.getName() = "dtrace_dif_emulate" and  // the function is named dtrace_dif_emulate
      regs.getFunction() = emulate and       // regs is declared inside dtrace_dif_emulate
      this.getArrayBase() = regs.getAnAccess())   // the array being subscripted is an access to regs
  }
}

A typical RegisterAccess: rval = regs[rd];

Notes:

  • ArrayExpr is the CodeQL class for a C/C++ array access expression of the form Expr[Expr]. Commonly-used library classes can be found here.

  • regs ranges over all local variables named regs. DTrace bytecode uses 8 virtual registers, which are stored in an array named regs.

  • emulate ranges over all functions named dtrace_dif_emulate, because that is DTrace’s main interpreter loop, as mentioned above.

  • regs.getFunction() = emulate ties the previous two steps together, so RegisterAccess only matches accesses to the regs array inside dtrace_dif_emulate.

  • LocalScopeVariable. A C/C++ variable with block scope. Indicates that the regs are inside the block scope of function dtrace_dif_emulate.

  • getArrayBase: Gets the array or pointer expression being subscripted. This is arr in both arr[0] and 0[arr].

  • getAnAccess: Gets an access to this variable.

  • If you are not familiar with a CodeQL definition, you can look it up with the CodeQL library search.
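
The arr[0] versus 0[arr] remark in the getArrayBase note is easy to check in plain C. The sketch below uses two made-up helper functions, read_reg and read_reg_swapped, which are not part of XNU. RegisterAccess itself would not match them, because it requires the enclosing function to be named dtrace_dif_emulate; they only show that both spellings read the same element, which is why CodeQL reports regs as the array base either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ordinary subscript: the array base is regs, the offset is rd. */
uint64_t read_reg(const uint64_t *regs, unsigned rd)
{
    assert(regs != NULL);
    return regs[rd];
}

/* Swapped subscript: legal C with the same meaning as regs[rd],
 * because a[b] is defined as *(a + b). The base is still regs. */
uint64_t read_reg_swapped(const uint64_t *regs, unsigned rd)
{
    assert(regs != NULL);
    return rd[regs];
}
```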

Part 2

Define the PointerUse class for potentially dangerous uses, such as indexing an array or dereferencing a pointer.

class PointerUse extends Expr {
  PointerUse() {
    exists (ArrayExpr ae | this = ae.getArrayOffset()) or
    exists (PointerDereferenceExpr deref | this = deref.getOperand()) or
    exists (PointerAddExpr add | this = add.getAnOperand())
  }
}

Notes:

  • getArrayOffset: Gets the expression giving the index into the array. This is 0 in both arr[0] and 0[arr].
  • PointerDereferenceExpr: An instance of the built-in unary operator * applied to a type.
  • getOperand: Gets the operand of this unary operation.
  • getAnOperand: Gets an operand of this operation. A pointer addition has two operands, and either one counts as a sink.
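
The three disjuncts of PointerUse correspond to three shapes of C code. The sketch below shows one made-up function per shape. The names table, use_as_index, use_as_pointer and use_in_pointer_add are assumptions for illustration, and the asserts are only there to keep the demo itself safe; the query looks for the same shapes where the value comes from a DTrace register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t table[4] = {10, 20, 30, 40};

/* Shape 1: the value is the offset of an array access
 * (ArrayExpr.getArrayOffset). */
uint8_t use_as_index(uint64_t v)
{
    assert(v < 4);
    return table[v];
}

/* Shape 2: the value is converted to a pointer and dereferenced
 * (PointerDereferenceExpr.getOperand). */
uint8_t use_as_pointer(uint64_t v)
{
    assert(v != 0);
    return *(const uint8_t *)(uintptr_t)v;
}

/* Shape 3: the value is an operand of a pointer addition
 * (PointerAddExpr.getAnOperand). */
uint8_t use_in_pointer_add(uint64_t v)
{
    assert(v < 4);
    const uint8_t *p = table + v;
    return *p;
}
```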

Part 3

Check whether any data flow path leads from a RegisterAccess (source) to a PointerUse (sink).

class DTraceUnsafeIndexConfig extends DataFlow::Configuration {
  DTraceUnsafeIndexConfig() {
    this = "DTraceUnsafeIndexConfig"
  }

  override predicate isSource(DataFlow::Node node) {  // Source is the RegisterAccess
    node.asExpr() instanceof RegisterAccess
  }

  override predicate isSink(DataFlow::Node node) {   // Sink is the dangerous PointerUse
    node.asExpr() instanceof PointerUse
  }
}

Notes:

  • asExpr: Gets the non-conversion expression corresponding to this node, if any. If this node strictly (in the sense of asConvertedExpr) corresponds to a Conversion, then the result is that Conversion’s non-Conversion base expression.

  • Data Flow Tracking

    • Local data flow tracks data flow within a single function.
    • Global data flow tracks data flow throughout the entire program, and is therefore more powerful than local data flow. However, global data flow is less precise than local data flow, and the analysis typically requires significantly more time and memory to perform.
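
The difference between local and global flow can be seen in a small C sketch. The function names local_flow, global_flow and fetch, and the lookup table, are invented for this note. In local_flow the register read and the array index sit in the same function. In global_flow the register value crosses a call before it is used as an index, so only an analysis that follows calls connects the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const int lookup[4] = {100, 200, 300, 400};

/* The sink lives here: idx is used as an array offset. */
int fetch(uint64_t idx)
{
    assert(idx < 4);  /* keeps the demo safe */
    return lookup[idx];
}

/* Local flow: source (regs[rd]) and sink (lookup[v]) in one function. */
int local_flow(const uint64_t *regs, unsigned rd)
{
    assert(regs != NULL);
    uint64_t v = regs[rd];
    assert(v < 4);
    return lookup[v];
}

/* Global flow: the source is here, the sink is inside fetch().
 * The value only reaches the sink through the call. */
int global_flow(const uint64_t *regs, unsigned rd)
{
    assert(regs != NULL);
    uint64_t v = regs[rd];
    return fetch(v);
}
```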

Part 4

The final query:

from DTraceUnsafeIndexConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)             // Defines a condition on the variables.
select sink, source, sink, "DTrace unsafe index"   // Defines what to report for each match with a string that explains the problem.

/*
 * This query has 16 results. The 16th result is the vulnerability: dtrace_isa.c:817
 */

The following predicates are defined in the Global Data Flow configuration:

  • isSource: defines where data may flow from
  • isSink: defines where data may flow to
  • isBarrier: optional, restricts the data flow
  • isBarrierGuard: optional, restricts the data flow
  • isAdditionalFlowStep: optional, adds additional flow steps

The data flow analysis is performed using the predicate hasFlow(DataFlow::Node source, DataFlow::Node sink):

from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()

  • hasFlowPath: Holds if data may flow from source to sink for this configuration. The corresponding paths are generated from the end-points and the graph included in the module PathGraph.

CodeQL Results

(Image from Kevin’s demo)

This query produces 16 results, one of which is this pointer dereference which does not have a bounds check. The other 15 results are uninteresting. If we wanted to, we could further refine the query to reduce the number of false positives. For example, this result is a false positive, because the call to dtrace_canstore on line 5699 is a bounds check.

The bounds check that guards this store:

case DIF_OP_STB:
         if (!dtrace_canstore(regs[rd], 1, mstate, vstate)) {    // the bounds check
            *flags |= CPU_DTRACE_BADADDR;
            *illval = regs[rd];
            break;
         }
         *((uint8_t *)(uintptr_t)regs[rd]) = (uint8_t)regs[r1];   // false positive
         break;
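
Why a check like this makes the store safe can be sketched in userspace C. The code below is a simplified stand-in, not the kernel’s logic: toy_canstore and toy_store_byte are made-up names, and the single scratch buffer is an assumption, whereas the real dtrace_canstore takes mstate and vstate and its internals are not reproduced here. The point is only that the register value is compared against an allowed range before it is used as a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [addr, addr + sz) lies inside [base, base + len).
 * Written so that addr + sz is never computed and cannot overflow. */
int toy_canstore(uintptr_t addr, size_t sz, uintptr_t base, size_t len)
{
    if (addr < base)
        return 0;
    size_t off = (size_t)(addr - base);
    return off <= len && sz <= len - off;
}

/* Mirrors the shape of the DIF_OP_STB case: check first, then store.
 * Returns 1 on success and 0 if the address was rejected. */
int toy_store_byte(uint64_t reg_value, uint8_t byte,
                   uint8_t *scratch, size_t len)
{
    assert(scratch != NULL);
    uintptr_t addr = (uintptr_t)reg_value;
    if (!toy_canstore(addr, 1, (uintptr_t)scratch, len))
        return 0;  /* the kernel sets CPU_DTRACE_BADADDR here instead */
    *(uint8_t *)addr = byte;
    return 1;
}
```

A query that does not model this kind of check reports the store anyway, which is how a result like the one above ends up as a false positive.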

This is amazing: the query has 16 results, and the 16th is the vulnerability: dtrace_isa.c:817

PoC

GitHub link: CVE-2017-13782 PoC

Even though I have learnt a little C, I find it hard to fully understand the code. I hope I can understand it some day, haha…🐰

According to his talk, after locating the vulnerable code, Kevin spent time creating a DTrace object to trigger the bug. He did not know how to debug the macOS kernel, so he spent a few days on the following:

  • Extracting the parsing code from the source code
  • Testing the generated DTrace files to check whether they are valid

But when he tried it, nothing happened… So he spent a few more days studying the kernel to find out what went wrong…

He thinks that showing privilege escalation adds more value to a CVE, and that this is part of being a true security expert.

Random Thoughts

I feel that using CodeQL is just like using any other effective tool, such as Burp Suite or nmap. The real power of a security researcher lies in experience: sensing that vulnerabilities may hide in an interpreter inside the kernel, filtering out the false positives the tool generates, and finally writing the PoC, as the reporter Mr. Backhouse did. 🥾

Always embrace new challenges. 🧐

One thing he said is really interesting: A security researcher is as good as the last CVE he found. 😆

Some resources mentioned in Kevin’s live sharing

Techniques:

  • Static analysis -> scans large codebases to find interesting places to look (for example with CodeQL); works backwards from a place where a bug could trigger to the input that triggers it; sometimes needs testing with unusual patterns
  • Manual audit -> understand why the bug happens
  • Fuzzing -> developers usually test only with valid input, so fuzzing is effective on dense file formats. It works in the opposite direction to static analysis, starting from the input and exploring the reachable paths.

On Ubuntu 18.04, you can try:

Other C++ CodeQL examples:

Reference