Underhanded C Contest Submission (2015)

February 28, 2016

Here’s my submission for the 2015 Underhanded C Competition:

#include <stdio.h>
#include <math.h>
#include <float.h>
#define MATCH 1
#define NO_MATCH 0

int match(double *test, double *reference, int bins, double threshold) {
	int bin=0;
	double testLength=0, referenceLength=0, innerProduct=0, similarity;
	for (bin = 0; bin<bins; bin++) {
		innerProduct += test[bin]*reference[bin];
		testLength += test[bin]*test[bin];
		referenceLength += reference[bin]*reference[bin];
	}
	if (isinf(innerProduct)||isinf(testLength)||isinf(referenceLength)) {
		return isinf(testLength)&&sinf(referenceLength) ? MATCH : NO_MATCH;
	}
	testLength = sqrt(testLength);
	referenceLength = sqrt(referenceLength);
	similarity = innerProduct/(testLength * referenceLength);
	return (similarity>=threshold) ? MATCH : NO_MATCH;
}

The explanation is as follows:

Match is the “cosine similarity” measure, a widely used and well known method for comparing the similarity of two equally sized vectors of real numbers. The measure is always between [-1, 1]. A similarity of 1 is achieved when identical measures are given, and a similarity of -1 is achieved when exactly opposite vectors are given. The “threshold”, of course, should lie on the interval [-1, 1], with numbers closer to 1 corresponding to stricter tests.

This function is resilient to overflow. If there is an overflow, i.e. one or more of the ingredients of the cosine similarity are infinite, the following comparison determines the result of the match:

  • when both test and reference are infinite, return MATCH
  • when only one is infinite, return NO_MATCH

This function is vulnerable; it ostensibly performs a cosine similarity to determine how similar the reference and sample material are.

Can you find the bug?

The underhanded part is in the error checking: as a boundary condition, if the reference AND the sample produce an overflow (i.e. they have really, really big elements), the matching function produces a match. It’s the best guess we can make about whether the materials match or not.

Here’s a demonstration of the vulnerability:

#include <stdio.h>
#include <math.h>
#include <float.h>

#define PRINT_MATCH_RESULT ? printf("MATCH\n") : printf("NO_MATCH\n")
extern int match(double *test, double *reference, int bins, double threshold);

int main() {
	int bins = 4;
	// This is the reference measurement
	double reference[4] = { 5.00, 6.00, 3.00, 8.00 };
	// This is a test (that doesn't match well)
	double test1[4]		= { 1.00, 2.00, 3.00, 4.00 };
	// This is a test (that matches very well)
	double test2[4]		= { 5.01, 5.99, 3.02, 7.98 };
	// This is exploits a sinf-ul typo on line 32 :-)
	double dorked[4]	= { 1, 2, DBL_MAX, 4 };
	// This is a pretty high threshold for cosine similarit			
	double threshold	= 0.95;								

	printf("Test1 v Reference:     ");
	match(test1, reference, bins, threshold) PRINT_MATCH_RESULT;
	printf("Test2 v Reference:     ");
	match(test2, reference, bins, threshold) PRINT_MATCH_RESULT;
	printf("Dorked v Reference:    ");
	match(dorked, reference, bins, threshold) PRINT_MATCH_RESULT;
	return 0;
}

Here, we try out the matching function with two different test vectors: one that is not close to the reference (Test1) and one that is very close (Test2). The threshold of .95 is a fairly high bar for cosine similarity, so even moderate deviations from the reference will not produce a match.

On line 32, there is a simple typo:

	return isinf(testLength)&&sinf(referenceLength) ? MATCH : NO_MATCH;

should read

	return isinf(testLength)&&isinf(referenceLength) ? MATCH : NO_MATCH;

sinf()` calculates the sin of the referenceLength (which is exceedingly unlikely to evaluate to FALSE when cast too a boolean!). Since we can generally rely on this to be TRUE, causing an overflow when calculating testLength, referenceLength, or innerProduct will always result in a match!

Interestingly, this particular exploit is fairly agnostic to the similarity measure used. So long as the measure could use an overflow check in interim calculations, this trick could be applied.

This submission got a “runner-up” honorable mention. Unlike the winning entry, the data-driven vulnerability would be hard to produce in reality–how would we get a really, really large value into the *test vector?. There’s also some question about whether the error checking logic is realistic–would we really want to return a MATCH if both vectors overflowed? On the other hand, the vulnerability is really simple, and there’s definitely plausible deniability :-)

Follow the instructions on Github to pull the code down and try it yourself!