Here’s my submission for the 2015 Underhanded C Competition:

```
#include <stdio.h>
#include <math.h>
#include <float.h>
#define MATCH 1
#define NO_MATCH 0
int match(double *test, double *reference, int bins, double threshold) {
int bin=0;
double testLength=0, referenceLength=0, innerProduct=0, similarity;
for (bin = 0; bin<bins; bin++) {
innerProduct += test[bin]*reference[bin];
testLength += test[bin]*test[bin];
referenceLength += reference[bin]*reference[bin];
}
if (isinf(innerProduct)||isinf(testLength)||isinf(referenceLength)) {
return isinf(testLength)&&sinf(referenceLength) ? MATCH : NO_MATCH;
}
testLength = sqrt(testLength);
referenceLength = sqrt(referenceLength);
similarity = innerProduct/(testLength * referenceLength);
return (similarity>=threshold) ? MATCH : NO_MATCH;
}
```

The explanation is as follows:

Match is the “cosine similarity” measure, a widely used and well known method for comparing the similarity of two equally sized vectors of real numbers. The measure is always between [-1, 1]. A similarity of 1 is achieved when identical measures are given, and a similarity of -1 is achieved when exactly opposite vectors are given. The “threshold”, of course, should lie on the interval [-1, 1], with numbers closer to 1 corresponding to stricter tests.

This function is resilient to overflow. If there is an overflow, i.e. one or more of the ingredients of the cosine similarity are infinite, the following comparison determines the result of the match:

- when both test and reference are infinite, return MATCH
- when only one is infinite, return NO_MATCH

This function is vulnerable; it ostensibly performs a cosine similarity to determine how similar the reference and sample material are.

Can you find the bug?

The underhanded part is in the error checking: as a boundary condition, if the reference AND the sample produce an overflow (i.e. they have really, really big elements), the matching function produces a match. It’s the best guess we can make about whether the materials match or not.

Here’s a demonstration of the vulnerability:

```
#include <stdio.h>
#include <math.h>
#include <float.h>
#define PRINT_MATCH_RESULT ? printf("MATCH\n") : printf("NO_MATCH\n")
extern int match(double *test, double *reference, int bins, double threshold);
int main() {
int bins = 4;
// This is the reference measurement
double reference[4] = { 5.00, 6.00, 3.00, 8.00 };
// This is a test (that doesn't match well)
double test1[4] = { 1.00, 2.00, 3.00, 4.00 };
// This is a test (that matches very well)
double test2[4] = { 5.01, 5.99, 3.02, 7.98 };
// This is exploits a sinf-ul typo on line 32 :-)
double dorked[4] = { 1, 2, DBL_MAX, 4 };
// This is a pretty high threshold for cosine similarit
double threshold = 0.95;
printf("Test1 v Reference: ");
match(test1, reference, bins, threshold) PRINT_MATCH_RESULT;
printf("Test2 v Reference: ");
match(test2, reference, bins, threshold) PRINT_MATCH_RESULT;
printf("Dorked v Reference: ");
match(dorked, reference, bins, threshold) PRINT_MATCH_RESULT;
return 0;
}
```

Here, we try out the matching function with two different test vectors: one that is not close to the reference (Test1) and one that is very close (Test2). The threshold of .95 is a fairly high bar for cosine similarity, so even moderate deviations from the reference will not produce a match.

On line 32, there is a simple typo:

```
return isinf(testLength)&&sinf(referenceLength) ? MATCH : NO_MATCH;
```

should read

```
return isinf(testLength)&&isinf(referenceLength) ? MATCH : NO_MATCH;
```

`sinf()`

` calculates the sin of the referenceLength (which is exceedingly unlikely
to evaluate to FALSE when cast too a boolean!). Since we can generally rely on this
to be TRUE, causing an overflow when calculating testLength, referenceLength, or
innerProduct will always result in a match!

Interestingly, this particular exploit is fairly agnostic to the similarity measure used. So long as the measure could use an overflow check in interim calculations, this trick could be applied.

This submission got a “runner-up” honorable mention. Unlike the winning entry, the
data-driven vulnerability would be hard to produce in reality–how would we get
a really, really large value into the `*test`

vector?. There’s also some question
about whether the error checking logic is realistic–would we really want to
return a `MATCH`

if both vectors overflowed? On the other hand, the vulnerability
is really simple, and there’s definitely plausible deniability :-)

Follow the instructions on Github to pull the code down and try it yourself!