This test runs code discovery over a fairly large binary to measure instruction coverage. It currently fails because of unsupported instructions, which is expected. The failures will guide which instruction semantics to implement next.
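One way the failures could feed into prioritization is to tally them by instruction and work on the most frequent ones first. The sketch below is only an illustration under stated assumptions: the `(mnemonic, address)` record shape, the `rank_unsupported` helper, and the sample data are all hypothetical and are not taken from the actual test or its output format.

```python
from collections import Counter


def rank_unsupported(failures):
    """Rank unsupported instructions by how often discovery hit them.

    `failures` is an iterable of (mnemonic, address) pairs. This record
    shape is an assumption made for the sketch, not the real test output.
    Returns a list of (mnemonic, count) pairs, most frequent first.
    """
    counts = Counter(mnemonic for mnemonic, _address in failures)
    return counts.most_common()


# Made-up sample data; the mnemonics and addresses are illustrative only.
sample_failures = [
    ("vpaddd", 0x401000),
    ("xgetbv", 0x401020),
    ("vpaddd", 0x401044),
    ("fxsave", 0x401100),
    ("xgetbv", 0x401200),
    ("vpaddd", 0x401310),
]

ranking = rank_unsupported(sample_failures)
for mnemonic, count in ranking:
    print(f"{mnemonic}: {count}")
```

Ranking by raw frequency is the simplest choice; weighting by how much code becomes reachable once an instruction is supported would be a reasonable alternative.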