This repository has been archived by the owner on Jan 8, 2024. It is now read-only.
Generate CPG for multiple languages for code and threat analysis

CPG Generator

 ██████╗██████╗  ██████╗
██╔════╝██╔══██╗██╔════╝
██║     ██████╔╝██║  ███╗
██║     ██╔═══╝ ██║   ██║
╚██████╗██║     ╚██████╔╝
 ╚═════╝╚═╝      ╚═════╝

CPG Generator is a Python CLI tool that generates a Code Property Graph (CPG), a novel intermediate representation, for code and threat analysis. The generated CPG can be imported directly into Joern for analysis.

Pre-requisites

  • JDK 11 or above
  • Python 3.10
  • Docker or podman (Windows, Linux or Mac), or
  • a local installation of Atom ⚛ or Joern

Installation

cpggen is available as a single executable binary, PyPI package or as a container image.

Single executable binaries

Download the executable binary for your operating system from the releases page. These binaries bundle the following:

  • Atom ⚛
  • cpggen with Python 3.10
  • cdxgen with Node.js 18, which generates the SBoM

curl -LO https://github.com/AppThreat/cpggen/releases/latest/download/cpggen-linux-amd64
chmod +x cpggen-linux-amd64
./cpggen-linux-amd64 --help

Atom-based frontend:

curl -LO https://github.com/AppThreat/cpggen/releases/latest/download/atomgen
chmod +x atomgen
./atomgen --help

On Windows,

curl -LO https://github.com/appthreat/cpggen/releases/latest/download/cpggen.exe
.\cpggen.exe --help

NOTE: On Windows, antivirus and antimalware could prevent this single executable from functioning properly. Depending on the system, administrative privileges might also be required. Use container-based execution as a fallback.

OCI Artifacts via ORAS cli

Use the ORAS CLI to download the cpggen binary on Linux and Windows.

VERSION="1.0.0"
curl -LO "https://github.com/oras-project/oras/releases/download/v${VERSION}/oras_${VERSION}_linux_amd64.tar.gz"
mkdir -p oras-install/
tar -zxf oras_${VERSION}_*.tar.gz -C oras-install/
sudo mv oras-install/oras /usr/local/bin/
rm -rf oras_${VERSION}_*.tar.gz oras-install/
oras pull ghcr.io/appthreat/cpggen-bin:v1
chmod +x cpggen-linux-amd64
./cpggen-linux-amd64 --help

On Windows, using Command Prompt:

set VERSION=1.0.0
curl.exe -sLO "https://github.com/oras-project/oras/releases/download/v%VERSION%/oras_%VERSION%_windows_amd64.zip"
tar.exe -xvzf oras_%VERSION%_windows_amd64.zip
mkdir %USERPROFILE%\bin
copy oras.exe %USERPROFILE%\bin\
set PATH=%USERPROFILE%\bin\;%PATH%

Alternatively, download and extract the archive with PowerShell:

Invoke-WebRequest -Uri https://github.com/oras-project/oras/releases/download/v1.0.0/oras_1.0.0_windows_amd64.zip -UseBasicParsing -OutFile oras_1.0.0_windows_amd64.zip
Expand-Archive -Path oras_1.0.0_windows_amd64.zip -DestinationPath .

Then pull the Windows binary:

oras.exe pull ghcr.io/appthreat/cpggen-windows-bin:v1

PyPI package

This installs the Python CLI tool with the bundled atom distribution.

pip install cpggen

With atom, CPG can be generated for the following languages:

  • C/C++
  • Java
  • Jars
  • JavaScript/TypeScript
  • Python

Install joern and set the JOERN_HOME environment variable if you would like support for additional languages and binaries.

Bundled container image

docker pull ghcr.io/appthreat/cpggen
# podman pull ghcr.io/appthreat/cpggen

Use the AWS Public ECR mirror on days when ghcr is unavailable.

docker pull public.ecr.aws/appthreat/cpggen:latest
# podman pull public.ecr.aws/appthreat/cpggen:latest

Almalinux 9 requires a CPU that supports SSE4.2. For kvm64 VMs, use the Almalinux 8 image instead.

docker pull ghcr.io/appthreat/cpggen-alma8
# podman pull ghcr.io/appthreat/cpggen-alma8

Or use the nightly image to always get the latest joern and tools.

docker pull ghcr.io/appthreat/cpggen:nightly
# podman pull ghcr.io/appthreat/cpggen:nightly

Finally, a slimmer image based on the atom distribution is also available.

docker pull ghcr.io/appthreat/atomgen
# podman pull ghcr.io/appthreat/atomgen

Usage

To auto-detect the language from the current directory and generate a CPG:

cpggen

To specify input and output directory.

cpggen -i <src directory> -o <CPG directory or file name>

You can also pass a git URL, a package URL, or a CVE or GHSA ID as the source. CVE and GHSA lookups need a GITHUB_TOKEN with the read:packages scope.

cpggen -i https://github.com/HooliCorp/vulnerable-aws-koa-app -o /tmp/cpg
cpggen -i "pkg:maven/org.apache.commons/[email protected]" -o /tmp/cpg
export GITHUB_TOKEN=<token with read:packages scope>
cpggen -i CVE-2023-32681 -o /tmp/cpg

cpggen -i GHSA-j8r2-6x86-q33q -o /tmp/cpg
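
The -i argument therefore accepts several kinds of value. As a rough illustration of how a wrapper script might tell them apart before calling cpggen, here is a minimal Python sketch. The patterns and the classify_source helper are assumptions made for this example; they are not taken from cpggen, whose own detection logic may differ.

```python
import re

# Illustrative patterns only; cpggen's real detection logic may differ.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)
GHSA_RE = re.compile(r"^GHSA(-[0-9a-z]{4}){3}$", re.IGNORECASE)


def classify_source(src: str) -> str:
    """Return a coarse label for a value that would be passed to `cpggen -i`."""
    if CVE_RE.match(src):
        return "cve"
    if GHSA_RE.match(src):
        return "ghsa"
    if src.startswith("pkg:"):
        return "purl"
    if src.startswith(("http://", "https://", "git@")):
        return "git-url"
    # Anything else is treated as a local path.
    return "path"


if __name__ == "__main__":
    for value in ("CVE-2023-32681", "GHSA-j8r2-6x86-q33q", "/tmp/src"):
        print(value, "->", classify_source(value))
```

A wrapper could use the label to decide, for instance, whether GITHUB_TOKEN must be set before invoking cpggen.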

To specify language type.

cpggen -i <src directory> -o <CPG directory or file name> -l java

# Comma separated values are accepted for multiple languages
cpggen -i <src directory> -o <CPG directory or file name> -l java,js,python
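
When cpggen is driven from a script, the same command line can be assembled programmatically. The sketch below is one possible way to do that in Python; build_cpggen_args is a hypothetical helper, and only the -i, -o and -l flags shown above are used.

```python
import shlex


def build_cpggen_args(src, out, languages=None):
    """Build an argv list for cpggen from a source, an output and optional languages."""
    args = ["cpggen", "-i", src, "-o", out]
    if languages:
        # Multiple languages are passed as one comma-separated value.
        args += ["-l", ",".join(languages)]
    return args


if __name__ == "__main__":
    argv = build_cpggen_args("/work/app", "/tmp/cpg", ["java", "js", "python"])
    # Print the command; hand `argv` to subprocess.run(argv, check=True) to execute it.
    print(shlex.join(argv))
```

Passing a list to subprocess rather than a single string avoids shell quoting problems with paths that contain spaces.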

Container-based invocation

docker run --rm -it -v /tmp:/tmp -v "$(pwd)":/app:rw --cpus=4 --memory=16g ghcr.io/appthreat/cpggen cpggen -i <src directory> -o <CPG directory or file name>

Export graphs

Pass --export to have cpggen export the various graphs in one of several formats using joern-export.

Example to export cpg14 graphs in dot format

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --build --export --export-out-dir ~/work/sandbox/crAPI/cpg_export

To export cpg in neo4jcsv format

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --build --export --export-out-dir ~/work/sandbox/crAPI/cpg_export --export-repr cpg --export-format neo4jcsv
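
The accepted values for --export-repr and --export-format are listed in the help output under "Full list of options". A script that builds export commands could check them up front, as in this Python sketch. The export_args helper is hypothetical; the flag names and allowed values are copied from that help output.

```python
# Allowed values copied from the cpggen --help output in this README.
EXPORT_REPRS = ("ast", "cfg", "cdg", "ddg", "pdg", "cpg", "cpg14", "all")
EXPORT_FORMATS = ("neo4jcsv", "graphml", "graphson", "dot")


def export_args(out_dir, repr_=None, fmt=None):
    """Return the export-related arguments, rejecting values cpggen would not accept."""
    if repr_ is not None and repr_ not in EXPORT_REPRS:
        raise ValueError(f"unsupported representation: {repr_}")
    if fmt is not None and fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    args = ["--export", "--export-out-dir", out_dir]
    if repr_ is not None:
        args += ["--export-repr", repr_]
    if fmt is not None:
        args += ["--export-format", fmt]
    return args


if __name__ == "__main__":
    print(export_args("/tmp/cpg_export", repr_="cpg", fmt="neo4jcsv"))
```

Validating early gives a clearer error than waiting for the CLI to reject the value partway through a long batch run.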

Slicing graphs

Pass the --slice argument to extract intra-procedural slices from the CPG. By default, slices are based on Usages. Pass --slice-mode DataFlow to create a sliced CPG based on DataFlow instead.

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --slice

Creating vectors

Pass the --vectors argument to extract vector representations of code from the CPG in JSON format.

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --vectors

Artifacts produced

Upon successful completion, cpggen produces the following artifacts in the output directory (-o / --out-dir on the command line, out_dir in server mode):

  • {name}-{lang}.⚛ - Atom representation for the given language. Requires the atomgen container image or the --use-atom CLI argument
  • {name}-{lang}.cpg.bin - Code Property Graph for the given language
  • {name}-{lang}.bom.json - SBoM in CycloneDX JSON format. Requires the environment variable ENABLE_SBOM to be set to true
  • {name}-{lang}.manifest.json - A JSON file listing the generated artifacts and the invocation commands
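
The naming pattern above makes it straightforward to predict where each artifact should land. The Python sketch below computes the expected paths. The expected_artifacts helper is hypothetical, and how cpggen derives {name} from the source is not specified here, so the caller supplies it.

```python
from pathlib import Path


def expected_artifacts(out_dir, name, lang):
    """Map each artifact kind to the path implied by the {name}-{lang} pattern."""
    stem = f"{name}-{lang}"
    base = Path(out_dir)
    return {
        "atom": base / f"{stem}.⚛",  # only with atomgen or --use-atom
        "cpg": base / f"{stem}.cpg.bin",
        "sbom": base / f"{stem}.bom.json",  # only when ENABLE_SBOM is true
        "manifest": base / f"{stem}.manifest.json",
    }


if __name__ == "__main__":
    # "crAPI" is an assumed project name used purely for illustration.
    for kind, path in expected_artifacts("/tmp/cpg_out", "crAPI", "java").items():
        print(f"{kind:9} {path} exists={path.exists()}")
```

A post-run check of this kind can confirm that optional artifacts, such as the SBoM, were actually generated.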

Server mode

cpggen can run in server mode.

cpggen --server

Invoke the /cpg endpoint to generate a CPG from a local path, an HTTP URL, or a package URL. Parameters can be passed using either a GET or a POST request.

curl "http://127.0.0.1:7072/cpg?src=/Volumes/Work/sandbox/vulnerable-aws-koa-app&out_dir=/tmp/cpg_out&lang=js"
curl "http://127.0.0.1:7072/cpg?url=https://github.com/HooliCorp/vulnerable-aws-koa-app&out_dir=/tmp/cpg_out&lang=js"

Package url with slicing.

curl "http://127.0.0.1:7072/cpg?url=pkg:maven/org.apache.commons/[email protected]&out_dir=/tmp/cpg_out&slice=true"
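
Query values such as paths and URLs contain characters that are safer to percent-encode than to paste into the URL by hand. The Python sketch below builds a GET URL for the /cpg endpoint with the standard library. The cpg_endpoint helper is hypothetical; the parameter names (src, url, out_dir, lang, slice) and the default host and port are taken from the examples and tables in this README.

```python
from urllib.parse import urlencode


def cpg_endpoint(params, host="127.0.0.1", port=7072):
    """Return a GET URL for the cpggen server's /cpg endpoint with encoded parameters."""
    return f"http://{host}:{port}/cpg?{urlencode(params)}"


if __name__ == "__main__":
    url = cpg_endpoint({"src": "/tmp/app", "out_dir": "/tmp/cpg_out", "lang": "js"})
    print(url)
    # To send the request against a running server:
    #   from urllib.request import urlopen
    #   print(urlopen(url).read().decode())
```

The request itself is left commented out so the snippet runs without a server; uncomment it once cpggen --server is listening.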

Languages supported

Language     Requires build   Maturity
C            No               High
C++          No               High
Java         No (*)           Medium
Scala        Yes              High
JavaScript   No               Medium
TypeScript   No               Medium
Kotlin       No (*)           Low
Php          No               Low
Python       No               Low

(*) - Precision could be improved with dependencies

EXPERIMENTAL: Use the provided protobuf bindings to build new language frontends.

Full list of options

cpggen --help
usage: cpggen [-h] [-i SRC] [-o CPG_OUT_DIR] [-l LANGUAGE] [--use-container] [--build] [--joern-home JOERN_HOME] [--server] [--server-host SERVER_HOST] [--server-port SERVER_PORT] [--export]
              [--export-repr {ast,cfg,cdg,ddg,pdg,cpg,cpg14,all}] [--export-format {neo4jcsv,graphml,graphson,dot}] [--export-out-dir EXPORT_OUT_DIR] [--verbose] [--skip-sbom] [--slice] [--slice-mode {Usages,DataFlow}] [--use-parse]

CPG Generator

optional arguments:
  -h, --help            show this help message and exit
  -i SRC, --src SRC     Source directory or url or CVE or GHSA id
  -o CPG_OUT_DIR, --out-dir CPG_OUT_DIR
                        CPG output directory
  -l LANGUAGE, --lang LANGUAGE
                        Optional. CPG language frontend to use. Auto-detects by default.
  --use-container       Use cpggen docker image
  --build               Attempt to build the project automatically
  --joern-home JOERN_HOME
                        Joern installation directory
  --server              Run cpggen as a server
  --server-host SERVER_HOST
                        cpggen server host
  --server-port SERVER_PORT
                        cpggen server port
  --export              Export CPG as a graph
  --export-repr {ast,cfg,cdg,ddg,pdg,cpg,cpg14,all}
                        Graph representation to export
  --export-format {neo4jcsv,graphml,graphson,dot}
                        Export format
  --export-out-dir EXPORT_OUT_DIR
                        Export output directory
  --verbose             Run cpggen in verbose mode
  --skip-sbom           Do not generate SBoM
  --slice               Extract intra-procedural slices from the CPG
  --slice-mode {Usages,DataFlow}
                        Mode used for CPG slicing
  --use-atom            Use atom toolkit
  --vectors             Extract vector representations of code from CPG

Environment variables

Name                      Purpose
JOERN_HOME                Optional when using atom. Joern installation directory
CPGGEN_HOST               cpggen server host. Default 127.0.0.1
CPGGEN_PORT               cpggen server port. Default 7072
CPGGEN_CONTAINER_CPU      CPU units to use in container execution mode. Default computed
CPGGEN_CONTAINER_MEMORY   Memory units to use in container execution mode. Default computed
CPGGEN_MEMORY             Heap memory to use for frontends. Default computed
AT_DEBUG_MODE             Set to debug to enable debug logging
CPG_EXPORT                Set to true to export CPG graphs in dot format
CPG_EXPORT_REPR           Graph to export. Default all
CPG_EXPORT_FORMAT         Export format. Default dot
CPG_SLICE                 Set to true to slice CPG
CPG_SLICE_MODE            Slice mode. Default Usages
CPG_VECTORS               Set to true to generate vector representations of code from CPG
CDXGEN_ARGS               Extra arguments to pass to cdxgen
ENABLE_SBOM               Enable SBoM generation using cdxgen
JIMPLE_ANDROID_JAR        Optional when using atom. Path to android.jar for use with jimple for .apk or .dex to CPG conversion
GITHUB_TOKEN              Token with read:packages scope to analyze CVE or GitHub Advisory
USE_ATOM                  Use AppThreat atom instead of joern frontends. atomgen would default to this mode.
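
A client script can honour the same variables so that it targets whichever address the server was configured with. The Python sketch below resolves the server address from CPGGEN_HOST and CPGGEN_PORT, falling back to the documented defaults. The server_address helper is hypothetical and only mirrors the defaults listed above; it is not cpggen's own code.

```python
import os


def server_address(env=None):
    """Resolve (host, port) from CPGGEN_HOST and CPGGEN_PORT with documented defaults."""
    env = os.environ if env is None else env
    host = env.get("CPGGEN_HOST", "127.0.0.1")
    port = int(env.get("CPGGEN_PORT", "7072"))
    return host, port


if __name__ == "__main__":
    host, port = server_address()
    print(f"cpggen server expected at http://{host}:{port}/cpg")
```

Accepting an optional mapping instead of reading os.environ directly keeps the function easy to test.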

GitHub actions

Use the marketplace action to generate CPGs using GitHub Actions. Optionally, upload the generated CPGs as build artifacts using the step below.

- name: Upload cpg
  uses: actions/[email protected]
  with:
    name: cpg
    path: cpg_out

License

Apache-2.0

Developing / Contributing

git clone git@github.com:AppThreat/cpggen.git
cd cpggen

python -m pip install --upgrade pip
python -m pip install poetry
# Add poetry to the PATH environment variable
poetry install

poetry run cpggen -i <src directory>