
📃 System Call Dependency Graph extractor (SemaSCDG)

This repository contains a first version of an SCDG extractor. During symbolic analysis of a binary, all system calls found and their arguments are recorded. Once a stop condition for the symbolic analysis is reached, a graph is built as follows: nodes are the recorded system calls, and edges indicate that two calls share some arguments.
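To illustrate the graph construction described above, here is a minimal sketch in Python. It is not the extractor's actual implementation: the `build_scdg` function, the trace layout, and the rule "one edge per pair of calls sharing at least one argument value" are simplifying assumptions made for illustration only.

```python
from itertools import combinations


def build_scdg(trace):
    """Build a toy SCDG from a list of (syscall_name, args) tuples.

    Hypothetical helper: nodes are indexed by position in the trace, and an
    edge (i, j) with i < j is added when calls i and j share an argument value.
    """
    nodes = {i: name for i, (name, _) in enumerate(trace)}
    edges = set()
    for (i, (_, args_i)), (j, (_, args_j)) in combinations(enumerate(trace), 2):
        if set(args_i) & set(args_j):
            edges.add((i, j))
    return nodes, edges


# Made-up trace: the file descriptor 3 and the buffer "buf" link the calls.
trace = [
    ("open", ("/etc/passwd", 0)),
    ("read", (3, "buf", 128)),
    ("write", (1, "buf", 128)),
    ("close", (3,)),
]
nodes, edges = build_scdg(trace)
print(nodes)
print(sorted(edges))
```

In this toy trace, `read` is linked to `write` (shared buffer and size) and to `close` (shared file descriptor), while `open` stays isolated.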

How to use

First, run the SCDG container with volumes like this:

docker run --rm --name="sema-scdg" -v ${PWD}/OutputFolder:/sema-scdg/application/database/SCDG -v ${PWD}/ConfigFolder:/sema-scdg/application/configs -v ${PWD}/InputFolder:/sema-scdg/application/database/Binaries -p 5001:5001 -it sema-scdg bash
In this command:

  • The first volume is the output folder where the results will be written.
  • The second volume is the folder containing the configuration files that will be passed to the container.
  • The third volume is the folder containing the binaries that will be passed to the container.
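The mapping between the three host folders and their container paths can be sketched as a small Python helper that assembles the same `docker run` arguments. The `docker_run_command` function is hypothetical and not part of the toolchain; the container paths, image name, and port are copied from the command above.

```python
from pathlib import Path

CONTAINER_ROOT = "/sema-scdg/application"


def docker_run_command(output_dir, config_dir, input_dir, port=5001):
    """Assemble the docker run argument list for the three volumes (sketch)."""
    mounts = [
        (output_dir, f"{CONTAINER_ROOT}/database/SCDG"),      # results
        (config_dir, f"{CONTAINER_ROOT}/configs"),            # configuration files
        (input_dir, f"{CONTAINER_ROOT}/database/Binaries"),   # binaries to analyse
    ]
    cmd = ["docker", "run", "--rm", "--name=sema-scdg"]
    for host, container in mounts:
        # Docker expects an absolute host path for bind mounts.
        cmd += ["-v", f"{Path(host).resolve()}:{container}"]
    cmd += ["-p", f"{port}:{port}", "-it", "sema-scdg", "bash"]
    return cmd


cmd = docker_run_command("OutputFolder", "ConfigFolder", "InputFolder")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` if you prefer launching the container from a script.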

For example, to use the files already provided, run the following from inside the sema_toolchain folder:

docker run --rm --name="sema-scdg" -v ${PWD}/database/SCDG:/sema-scdg/application/database/SCDG -v ${PWD}/sema_scdg/application/configs:/sema-scdg/application/configs -v ${PWD}/database/Binaries:/sema-scdg/application/database/Binaries -p 5001:5001 -it sema-scdg bash

If you want to be able to modify the code while the container is running, use:

docker run --rm --name="sema-scdg" -v ${PWD}/database:/sema-scdg/application/database -v ${PWD}/sema_scdg/application:/sema-scdg/application -p 5001:5001 -it sema-scdg bash

To run experiments, run the following inside the container:

python3 SemaSCDG.py configs/config.ini
Or, if you want to use pypy3:
pypy3 SemaSCDG.py configs/config.ini

Configuration files

The parameters are set in a configuration file: configs/config.ini. Feel free to modify it or to create new configuration files to run different experiments.
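As a hedged sketch of how a variant configuration could be derived for a new experiment, the snippet below uses Python's standard `configparser` module. The section name `SCDG_arg` and the `derive_config` helper are assumptions made for illustration; check the shipped configs/config.ini for the real section layout. The keys `expl_method` and `timeout` are taken from the parameter list in this document.

```python
import configparser
import io


def derive_config(base_text, overrides):
    """Return INI text equal to base_text with the given overrides applied."""
    parser = configparser.ConfigParser()
    # Keep key case as written (e.g. keep_inter_SCDG); the default lowercases keys.
    parser.optionxform = str
    parser.read_string(base_text)
    for section, values in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, str(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical base configuration; the section name is an assumption.
BASE = """
[SCDG_arg]
expl_method = CDFS
timeout = 600
"""

variant = derive_config(BASE, {"SCDG_arg": {"expl_method": "BFS", "timeout": 1200}})
print(variant)
```

Writing `variant` to a new file under configs/ would give a second configuration that can be passed to SemaSCDG.py in place of configs/config.ini.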

The SCDG output is written to database/SCDG/runs/ by default. If you are not using volumes and want to save some runs from the container to your host machine, use:

make save-scdg-runs ARGS=PATH

Parameters description

SCDG module arguments

expl_method:
  DFS                 Depth First Search
  BFS                 Breadth First Search
  CDFS                Coverage Depth-First Search Strategy (Default)
  CBFS                Coverage Breadth First Search

graph_output:
  gs                  .GS format
  json                .JSON format
  EMPTY               if left empty, build in all available formats

packing_type:
  symbion             Concolic unpacking method (linux | windows [in progress])
  unipacker           Emulation unpacking method (windows only)

SCDG exploration techniques parameters:
  jump_it              Number of iterations allowed for a symbolic loop (default : 3)
  max_in_pause_stach   Number of states allowed in pause stash (default : 200)
  max_step             Maximum number of steps allowed for a state (default : 50 000)
  max_end_state        Number of deadended states required to stop (default : 600)
  max_simul_state      Number of simultaneous states we explore with simulation manager (default : 5)

Binary parameters:
  n_args                  Number of symbolic arguments given to the binary (default : 0)
  loop_counter_concrete   How many times a loop can loop (default : 10240)
  count_block_enable      Enable the count of visited blocks and instructions
  sim_file                Create SimFile
  entry_addr              Entry address of the binary

SCDG creation parameters:
  min_size             Minimum size required for a trace to be used in SCDG (default : 3)
  disjoint_union       Do we merge traces or use disjoint union ? (default : merge)
  not_comp_args        Do we compare arguments to add new nodes when building graph ? (default : comparison enabled)
  three_edges          Do we use the three-edges strategy ? (default : False)
  not_ignore_zero      Do we ignore zero when building graph ? (default : Discard zero)
  keep_inter_SCDG      Keep intermediate SCDG in file (default : False)
  eval_time            TODO

Global parameters:
  concrete_target_is_local      Use a local GDB server instead of using cuckoo (default : False)
  print_syscall                 Print the syscall found
  csv_file                      Name of the csv to save the experiment data
  plugin_enable                 Enable the plugins set to true in the config.ini file
  approximate                   Symbolic approximation
  is_packed                     Is the binary packed ? (default : False, not yet supported)
  timeout                       Timeout in seconds before ending extraction (default : 600)
  string_resolve                Do we try to resolve string references ? (default : True)
  log_level_sema                Level of log of sema, can be INFO, DEBUG, WARNING, ERROR (default : INFO)
  log_level_angr                Level of log of angr, can be INFO, DEBUG, WARNING, ERROR (default : ERROR)
  log_level_claripy             Level of log of claripy, can be INFO, DEBUG, WARNING, ERROR (default : ERROR)
  family                        Family of the malware (default : Unknown)
  exp_dir                       Name of the directory to save SCDG extracted (default : Default)
  binary_path                   Relative path to the binary or directory (has to be in the database folder)
  fast_main                     Jump directly into the main function

Plugins:
  plugin_env_var          Enable the env_var plugin
  plugin_locale_info      Enable the locale_info plugin
  plugin_resources        Enable the resources plugin
  plugin_widechar         Enable the widechar plugin
  plugin_registry         Enable the registry plugin
  plugin_atom             Enable the atom plugin
  plugin_thread           Enable the thread plugin
  plugin_track_command    Enable the track_command plugin
  plugin_ioc_report       Enable the ioc_report plugin
  plugin_hooks            Enable the hooks plugin

The binary path has to be a relative path to a binary located in the database directory.

For details of the angr options, see the angr documentation.

The script MergeGspan.py in sema_scdg/application/helper can merge all .gs files from a directory into a single file.
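To give an idea of what merging .gs files involves, here is a minimal sketch. It is not the code of MergeGspan.py. It assumes the .gs files follow the usual gSpan text layout (a `t # <id>` header per graph, then `v` vertex lines and `e` edge lines) and that graph identifiers should be renumbered consecutively in the merged file. Both function names are hypothetical.

```python
from pathlib import Path


def merge_gs_texts(texts):
    """Concatenate gSpan-style graph texts, renumbering the 't # <id>' headers."""
    merged = []
    graph_id = 0
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("t #"):
                # Give every graph a unique, consecutive identifier.
                merged.append(f"t # {graph_id}")
                graph_id += 1
            else:
                merged.append(line)
    return "\n".join(merged) + "\n"


def merge_gs_directory(directory, output_file):
    """Merge every .gs file found in directory into output_file.

    Write output_file outside directory (or with another extension) so that
    a second run does not pick up the merged file as an input.
    """
    texts = [p.read_text() for p in sorted(Path(directory).glob("*.gs"))]
    Path(output_file).write_text(merge_gs_texts(texts))


# Two made-up single-graph inputs.
first = "t # 0\nv 0 open\nv 1 read\ne 0 1 1\n"
second = "t # 0\nv 0 write\n"
print(merge_gs_texts([first, second]))
```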

Run multiple experiments automatically

If you wish to run multiple experiments with different configuration files, the script multiple_experiments.sh is available and can be used inside the scdg container:

# To show usage
./multiple_experiments.sh -h

# Run example
./multiple_experiments.sh -m python3 -c configs/config1 configs/config2
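If you would rather drive several runs from Python, the same idea can be sketched as follows. This is not how multiple_experiments.sh is implemented: `build_commands` and `run_all` are hypothetical helpers that simply repeat the `python3 SemaSCDG.py <config>` (or `pypy3`) invocation shown earlier, once per configuration file, and are meant to be used inside the container.

```python
import subprocess


def build_commands(interpreter, config_files):
    """Return one SemaSCDG.py command (as an argument list) per config file."""
    if interpreter not in ("python3", "pypy3"):
        raise ValueError(f"unsupported interpreter: {interpreter}")
    return [[interpreter, "SemaSCDG.py", cfg] for cfg in config_files]


def run_all(commands):
    """Run the commands one after another; return (command, return code) pairs."""
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd, check=False)
        results.append((cmd, completed.returncode))
    return results


commands = build_commands("python3", ["configs/config1", "configs/config2"])
for cmd in commands:
    print(" ".join(cmd))
# Inside the container, run_all(commands) would execute them sequentially.
```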

Tests

To run the tests, run the following inside the docker container:

python3 scdg_tests.py test_data/config_test.ini

Tutorial

There is a Jupyter notebook providing a tutorial on how to use the SCDG extractor. To launch it, run the following inside the docker container:

jupyter notebook --ip=0.0.0.0 --port=5001 --no-browser --allow-root --IdentityProvider.token=''
and visit http://127.0.0.1:5001/tree in your browser. Go to /Tutorial and open the Jupyter notebook.