Codebase for a master’s thesis on automatic change detection, change categorization, and relevance-based ranking, featuring a pipeline designed to operate directly on compiled binaries.

mxwilen/binary-change-detection

Version comparison classification

Note

Developed by Max Wilén & Jacob Ringfjord as part of our CS master's thesis. Also see thesis repo here.

A pipeline for classification of Ghidriff output data. Uses syntax matching against functions and their parents/neighbors to categorize changes.

Prerequisite

Ghidra requires a Java Development Kit (JDK) to run, so make sure JDK 17 or higher is installed. Then install the Python dependencies:

pip install -r requirements.txt

Install Ghidra

Note

Ghidra is a software reverse engineering (SRE) framework developed by the National Security Agency (NSA). It helps analyze compiled code on various platforms including Windows, macOS, and Linux.

Using Homebrew

brew install --cask ghidra

Using Chocolatey

choco install ghidra

Then verify the setup by running ghidraRun.

Setup Ghidriff

Note

ghidriff provides a command-line binary diffing capability with a fresh take on diffing workflow and results. This project, developed over the course of a year, leverages the power of Ghidra's ProgramAPI and FlatProgramAPI to find the added, deleted, and modified functions of two arbitrary binaries. It is written in Python3 using pyhidra to orchestrate Ghidra and jpype as the Python to Java interface to Ghidra. For more info, see the ghidriff repo.

ghidriff is already installed by the pip install -r requirements.txt step above.

Set paths in .env

Set the following variable in the .env file so that it points to your Ghidra installation.

GHIDRA_INSTALL_DIR=<path_to_here>/ghidra/XX.X-XXXXXXXX/ghidra_XX.X_PUBLIC/
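A wrong GHIDRA_INSTALL_DIR is an easy mistake to make, so it can help to check the value before starting the pipeline. The following is a minimal sketch and not part of this repository. The function name check_ghidra_install_dir is hypothetical, and the check only assumes that a Ghidra install directory contains the ghidraRun launcher (ghidraRun.bat on Windows).

```python
import os
from pathlib import Path


def check_ghidra_install_dir(value: str) -> list[str]:
    """Return a list of problems found with a GHIDRA_INSTALL_DIR value.

    Hypothetical helper: an empty list means the value looks plausible.
    """
    if not value:
        return ["GHIDRA_INSTALL_DIR is empty or unset"]

    root = Path(value).expanduser()
    if not root.is_dir():
        return [f"{root} is not an existing directory"]

    # Assumption: the install directory holds the ghidraRun launcher script.
    launchers = ("ghidraRun", "ghidraRun.bat")
    if not any((root / name).is_file() for name in launchers):
        return [f"no ghidraRun launcher found in {root}"]

    return []


if __name__ == "__main__":
    # Reads the variable from the process environment, not from the .env file.
    for problem in check_ghidra_install_dir(os.environ.get("GHIDRA_INSTALL_DIR", "")):
        print("Problem:", problem)
```

An empty result only means the directory looks like a Ghidra installation. It does not verify the Ghidra or JDK version.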

How to run

  1. Put the two APKs to analyze inside the folder signal-binaries
  2. Change the .env variables so that they point to these binaries:

     APK_LOC_V1=signal-binaries/Signal_Android_7.30.2.apk
     APK_LOC_V2=signal-binaries/Signal_Android_7.31.0.apk

  3. Execute the start script for your platform

    Windows:

     ./run.bat

    Linux / Mac:

    sh run.sh
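Before executing the start script, the steps above can be sanity-checked with a short script. This is a dependency-free sketch and not part of this repository. The helpers parse_env and missing_apks are hypothetical names, and the parser only handles plain KEY=VALUE lines, which may differ from how the project itself loads .env.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_apks(env: dict[str, str], base: Path = Path(".")) -> list[str]:
    """Return the variable names whose APK path is unset or not a file."""
    return [
        name
        for name in ("APK_LOC_V1", "APK_LOC_V2")
        if not env.get(name) or not (base / env[name]).is_file()
    ]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.is_file() else {}
    print("Missing or invalid:", missing_apks(env) or "none")
```

Run it from the repository root so that the relative paths in .env resolve against the signal-binaries folder.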

Structure of parsing and categorization data

classDiagram
    class GhidriffLog {
        +FuncCollection "Collection of detected functions"
    }
    class AddedFuncCollection {
        +list[FuncNode] "Detected functions"
        +FuncCollectionMetaData "Function metadata"
        +categorizationData "Categorization data"
    }
    class ModifiedFuncCollection {
        +list[FuncNode] "Detected functions"
        +FuncCollectionMetaData "Function metadata"
        +categorizationData "Categorization data"
    }
    class DeletedFuncCollection {
        +list[FuncNode] "Detected functions"
        +FuncCollectionMetaData "Function metadata"
        +categorizationData "Categorization data"
    }
    GhidriffLog --> AddedFuncCollection : has
    GhidriffLog --> ModifiedFuncCollection : has
    GhidriffLog --> DeletedFuncCollection : has
classDiagram
    class FuncCollection {
        +list[FuncNode] "Detected functions"
        +FuncCollectionMetaData "Function metadata"
        +categorizationData "Categorization data"
    }
    class FuncNode {
        +DiffData <diffData for this function>
        +variousMetadata "Meta data for this function"
    }
    class DiffData
    class FuncCollectionMetaData
    FuncCollection --> FuncNode : list of
    FuncCollection --> FuncCollectionMetaData : has
    FuncCollection --> categorizationData : has
    FuncNode --> DiffData : has
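The class diagrams above can be mirrored in Python roughly as follows. This is an illustrative sketch using dataclasses, not the repository's actual code. The class names come from the diagrams, but every field name and type (diff_text, name, count, funcs, and so on) and the add method are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DiffData:
    diff_text: str = ""  # hypothetical field: raw diff for one function


@dataclass
class FuncNode:
    name: str  # hypothetical stand-in for the per-function metadata
    diff_data: DiffData = field(default_factory=DiffData)


@dataclass
class FuncCollectionMetaData:
    count: int = 0  # hypothetical field: number of functions collected


@dataclass
class FuncCollection:
    funcs: list[FuncNode] = field(default_factory=list)
    metadata: FuncCollectionMetaData = field(default_factory=FuncCollectionMetaData)
    # Hypothetical shape: function name mapped to an assigned category.
    categorization_data: dict[str, str] = field(default_factory=dict)

    def add(self, node: FuncNode) -> None:
        """Append a function and keep the metadata count in sync."""
        self.funcs.append(node)
        self.metadata.count = len(self.funcs)


class AddedFuncCollection(FuncCollection):
    pass


class ModifiedFuncCollection(FuncCollection):
    pass


class DeletedFuncCollection(FuncCollection):
    pass


@dataclass
class GhidriffLog:
    added: AddedFuncCollection = field(default_factory=AddedFuncCollection)
    modified: ModifiedFuncCollection = field(default_factory=ModifiedFuncCollection)
    deleted: DeletedFuncCollection = field(default_factory=DeletedFuncCollection)


if __name__ == "__main__":
    log = GhidriffLog()
    log.modified.add(FuncNode(name="example_func", diff_data=DiffData("-old\n+new")))
    print(log.modified.metadata.count, len(log.added.funcs))
```

Modeling the three collections as subclasses of one FuncCollection keeps the shared fields in a single place, which matches the second diagram, where FuncCollection carries the functions, metadata, and categorization data.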