
(NEO)DIZK

Java library for DIstributed Zero Knowledge proof systems

Java library for distributed zero knowledge proof systems. The library implements distributed polynomial evaluation/interpolation, computation of Lagrange polynomials, and multi-scalar multiplication. Using these scalable arithmetic subroutines, the library provides a distributed zkSNARK proof system that enables verifiable computations of up to billions of logical gates, far exceeding the scale of previous state-of-the-art solutions.

🚨 WARNING: This is an academic proof-of-concept prototype. This implementation is not ready for production use. It does not yet contain all the features, careful code review, tests, and integration that are needed for a deployment!

Disclaimer: This work is derived from SCIPR Lab's library DIZK, which was developed as part of the paper "DIZK: A Distributed Zero Knowledge Proof System".

Table of contents

  • Directory structure
  • Overview
  • Build guide
  • Testing
  • Run syntax checker
  • Configuring AWS and using Flintrock to manage a testing cluster
  • Setup monitoring infrastructure
  • References
  • License

Directory structure

The directory structure is as follows:

  • src: Java directory for source code and unit tests
    • main/java: Java source code, containing the following modules:
      • algebra: fields, groups, elliptic curves, FFT, multi-scalar multiplication
      • bace: batch arithmetic circuit evaluation
      • common: standard arithmetic and Spark computation utilities
      • configuration: configuration settings for the Spark cluster
      • profiler: profiling infrastructure for zero-knowledge proof systems
      • reductions: reductions between languages (used internally)
      • relations: interfaces for expressing statements (relations between instances and witnesses) as various NP-complete languages
      • zk_proof_systems: serial and distributed implementations of zero-knowledge proof systems
    • test/java: Java unit tests for the provided modules and infrastructure

Overview

This library implements a distributed zero knowledge proof system, enabling scalable proving (and verification) of the integrity of computations in zero knowledge.

A prover who knows the witness for an NP statement (i.e., a satisfying input/assignment) can produce a short proof attesting to the truth of the NP statement. This proof can then be verified by anyone, and offers the following properties.

  • Zero knowledge - the verifier learns nothing from the proof besides the truth of the statement.
  • Succinctness - the proof is small in size and cheap to verify.
  • Non-interactivity - the proof does not require back-and-forth interaction between the prover and the verifier.
  • Soundness - the proof is computationally sound (such a proof is called an argument).
  • Proof of knowledge - the proof attests not just that the NP statement is true, but also that the prover knows why.

These properties comprise a zkSNARK, which stands for Zero-Knowledge Succinct Non-interactive ARgument of Knowledge. For formal definitions and theoretical discussions about these, see [BCCT12] [BCIOP13] and the references therein.

DIZK provides Java-based implementations using Apache Spark [Apa17] for:

  1. Proof systems
    • A serial and distributed preprocessing zkSNARK for R1CS (Rank-1 Constraint Systems), an NP-complete language that resembles arithmetic circuit satisfiability. The zkSNARK is the protocol in [Gro16].
    • A distributed Merlin-Arthur proof system for evaluating arithmetic circuits on batches of inputs; see [Wil16].
  2. Scalable arithmetic
    • A serial and distributed radix-2 fast Fourier transform (FFT); see [Sze11].
    • A serial and distributed multi-scalar multiplication (MSM); see [BGMW93] [Pip76] [Pip80].
    • A serial and distributed Lagrange interpolation (Lag); see [BT04].
  3. Applications using the above zkSNARK for
    • Authenticity of photos on three transformations (crop, rotation, blur); see [NT16].
    • Integrity of machine learning models with support for linear regression and covariance matrices; see [Bis06] [Can69] [LRF97] [vW97].

Build guide

The library has the following dependencies:

More information about compilation options can be found here.

Why Java?

This library uses Apache Spark, an open-source cluster-computing framework that natively supports Java, Scala, and Python. Among these, we found Java to fit our goals because we could leverage its rich features for object-oriented programming and we could control execution in a (relatively) fine-grained way.

While other libraries for zero knowledge proof systems are written in low-level languages (e.g., libsnark is written in C++ and bellman in Rust), harnessing the speed of such languages in our setting is not straightforward. For example, we evaluated the possibility of interfacing with C (using native binding approaches like JNI and JNA), and concluded that the cost of memory management and process interfacing resulted in slower performance than pure Java execution.

Installation

Start by cloning this repository and entering the repository working directory:

git clone https://github.com/clearmatics/neodizk.git
cd neodizk
# Set up your environment
. ./setup_env.sh

Finally, compile the source code:

mvn compile

Docker-based development environment with local cluster

For development, it can be convenient to work inside a container, to isolate the development environment from the local system, and to make use of a local cluster on a virtual network.

Generate a simple docker-based cluster (1 master + 2 slaves) on a local network:

scripts/local-cluster-setup

The master and slaves are launched. Press CTRL-C in this terminal to terminate. The cluster nodes on the virtual network cluster-network are:

  • 10.5.0.2 - cluster-master
  • 10.5.0.3 - cluster-slave-1
  • 10.5.0.4 - cluster-slave-2

Generate an image and container for development (in another terminal). The current directory (this repository root) is mapped to /home/dizk in the container. The container is attached to the same virtual network with IP address 10.5.1.2:

scripts/dev-setup

Start the simple docker-based cluster:

scripts/local-cluster-start

Start the development container:

scripts/dev-start

The container terminates when the shell is exited.

From within the development container, programs can be executed on the cluster using /opt/spark/bin/spark-submit:

/opt/spark/bin/spark-submit \
    --class <mynamespace.MyClass> \
    --jars <other-classes.jar> \
    --master spark://cluster-master:7077 \
    /home/dizk/target/neodizk-0.2.0.jar <args>

Manual setup of docker-based development environment

docker build -t neodizk-base -f Dockerfile-base .
docker build -t neodizk-dev -f Dockerfile-dev .
docker run -it --name neodizk-container neodizk-dev

The repository is mounted at /home/dizk in the container.

Testing

This library comes with unit tests for each of the provided modules. Run the tests with:

mvn test

Note 1: You can build the tests without running them by using the following command:

mvn test-compile

Note 2: You can run a single test by using the following command:

mvn -Dtest=<test-class> test
# Example:
# mvn -Dtest=BNFieldsTest test

See here for more information.

Run syntax checker

Run:

mvn spotless:check

Configuring AWS and using Flintrock to manage a testing cluster

Create and configure an AWS account

  1. Create an AWS account
  2. Follow the set-up instructions here
    • Select the region
    • Create an EC2 keypair (see here for more info)
    • Create a security group

Both the security group and keypair are used to secure the EC2 instances launched, as indicated here. AWS takes care of creating a default VPC.

  3. Create the appropriate set of IAM users
    • Create an Administrator as documented here
    • Create an IAM user for programmatic use with Flintrock. This user needs to have the following permissions:
      • AmazonEC2FullAccess,
      • IAM.GetInstanceProfile and IAM.PassRole (as documented here)

Using Flintrock

Installation

python3.7 -m venv env
source env/bin/activate
pip install --upgrade pip
# Install the latest develop version of flintrock
pip install git+https://github.com/nchammas/flintrock

# Now the flintrock CLI is available
flintrock --help

Note 1: The latest stable version of Flintrock can be installed by simply running pip install flintrock. However, improvements have been added since the 1.0.0 release that are not yet packaged in a stable release. In the following, we assume that support for configurable JDKs is available in the Flintrock CLI.

Note 2: Flintrock uses boto3 which is the Python SDK for AWS.

Note 3: The flintrock launch command corresponds directly to clicking the "Launch instance" button on the EC2 dashboard. The values of its flags correspond to the values one needs to provide at the various steps of the "Launch instance" process (see here).

Example

Below is an example demonstrating how to launch a test cluster called test-cluster. Before doing so, we assume that:

  • the private key (.pem) file of the created EC2 keypair (see this step) is stored on your computer at: ~/.ssh/ec2-key.pem
  • the desired instance type is: m4.large
  • the chosen AMI is an Amazon Linux 2 or Amazon Linux AMI (see here to find an AMI). As documented here, the default username used to connect to an EC2 instance depends on the chosen AMI; for Amazon Linux (2) AMIs, this default username is ec2-user. For the sake of this example, we assume that the chosen AMI is: ami-00b882ac5193044e4
  • the region is us-east-1

Furthermore, before instantiating a cluster with Flintrock, it is necessary to configure the environment with the credentials ("access key ID" and "secret access key") of the IAM programmatic user created in the previous steps. This can be done either by configuring environment variables or by using a configuration file (as documented here).
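
For instance, the credentials can be exported as environment variables (read by boto3) before invoking Flintrock; the values below are placeholders for the IAM user's actual keys:

export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
# Alternatively, run `aws configure` to persist the credentials in ~/.aws/credentials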

Once the environment is configured, and assuming the example values above, the command to launch the cluster becomes:

flintrock launch test-cluster \
    --num-slaves 1 \
    --java-version 11 \
    --spark-version 3.0.0 \
    --ec2-instance-type m4.large \
    --ec2-region us-east-1 \
    --ec2-key-name ec2-key \
    --ec2-identity-file ~/.ssh/ec2-key.pem \
    --ec2-ami ami-00b882ac5193044e4 \
    --ec2-instance-initiated-shutdown-behavior terminate \
    --ec2-user ec2-user

TROUBLESHOOTING: For debugging purposes, it is possible to use the aws CLI directly. The CLI is available as a docker container; however, while running a command like docker run --rm -ti amazon/aws-cli <command> is equivalent to running aws <command> on the host, remember that no state is preserved across commands, because the container is removed as soon as the command finishes executing. Hence, for a more stateful interaction, it is possible to override the ENTRYPOINT of the container by doing:

docker run -it --entrypoint /bin/bash amazon/aws-cli

Then, in the container, the aws CLI can be used by running aws <command>. Note that credentials need to be configured first via aws configure. To check the configured credentials, use aws iam get-user.
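
For example, inside that container (aws configure prompts for the access key ID, secret access key, default region, and output format):

aws configure        # enter the IAM user's credentials and the chosen region
aws iam get-user     # sanity-check that the credentials are picked up correctly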

  • If access is denied, check:
    • The aws config (in ~/.aws, or your access key credentials in the container's environment)
    • The time on your machine, and adjust it to match the time of the AWS servers. On Debian-based distributions, this can be done via:
    sudo apt-get --yes install ntpdate
    sudo ntpdate 0.amazon.pool.ntp.org

Upon successful deployment of the cluster, make sure to persist the Flintrock configuration in a configuration file (with flintrock configure). The cluster can then be inspected, stopped, started, destroyed, or scaled using the Flintrock commands (e.g. flintrock describe test-cluster, flintrock destroy test-cluster, etc.), as sketched below.
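
A minimal sketch of the most common lifecycle commands, using the cluster name test-cluster from the example above:

flintrock describe test-cluster                      # show the cluster's nodes and state
flintrock stop test-cluster                          # stop the EC2 instances
flintrock start test-cluster                         # restart a stopped cluster
flintrock add-slaves test-cluster --num-slaves 1     # scale the cluster up
flintrock destroy test-cluster                       # terminate and clean up all instances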

Running an application on the cluster

Upon successful instantiation of the cluster, the steps to deploy an application are:

  1. Package your application (create a .jar):
mvn package
  2. As documented here:

    • Move the .jar to the cluster via flintrock copy-file, e.g.:
    flintrock copy-file test-cluster $DIZK/target/neodizk-0.2.0.jar /home/ec2-user/
    • Login to the cluster via flintrock login, e.g.:
    flintrock login test-cluster
    • Start the application from the master node with spark-submit, e.g.:
    # Create a location to store the logs of the application and pass it to the spark-submit command
    mkdir /tmp/spark-events
    spark-submit --class profiler.Profiler --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=/tmp/spark-events /home/ec2-user/neodizk-0.2.0.jar 2 1 8G zksnark-large 15 4

    Note: The above can also be carried out directly from the host (without logging in to the master node of the cluster) via the flintrock run-command command; see the sketch after this list.

  3. (Optional) Access SparkUI from your host machine:

    • <master-url>:8080
    • <master-url>:4040, where <master-url> can be obtained by running flintrock describe
  4. (Optional) If the spark-submit command is used along with the --conf spark.eventLog.enabled=true and --conf spark.eventLog.dir=/tmp/spark-events flags, the logs can be recovered on the host by running:

scp -i <path-to-aws-key> -r ec2-user@<master-url>:/tmp/spark-events/src/main/resources/logs/ $DIZK/out/
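
As noted in step 2, the deployment can also be driven entirely from the host via flintrock run-command. A minimal sketch, mirroring the spark-submit invocation above (it assumes spark-submit is on the PATH of non-interactive shells on the master node; otherwise use its full path):

flintrock copy-file test-cluster $DIZK/target/neodizk-0.2.0.jar /home/ec2-user/
flintrock run-command test-cluster --master-only \
    'mkdir -p /tmp/spark-events && spark-submit --class profiler.Profiler \
     --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=/tmp/spark-events \
     /home/ec2-user/neodizk-0.2.0.jar 2 1 8G zksnark-large 15 4'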

Note: Additional configuration parameters can be passed to the spark-submit command, e.g.:

--conf spark.memory.fraction
--conf spark.memory.storageFraction
...
--conf spark.rdd.compress
...
--conf spark.speculation
--conf spark.speculation.interval
--conf spark.speculation.multiplier
...

See here for more information on the configuration, and see this blog post for an introduction to speculative execution in Spark.
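
For illustration, such parameters are passed as additional --conf key=value pairs to spark-submit; the values below are arbitrary examples rather than recommendations:

spark-submit --class profiler.Profiler \
    --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=/tmp/spark-events \
    --conf spark.memory.fraction=0.7 \
    --conf spark.memory.storageFraction=0.3 \
    --conf spark.rdd.compress=true \
    --conf spark.speculation=true \
    --conf spark.speculation.interval=1s \
    --conf spark.speculation.multiplier=2 \
    /home/ec2-user/neodizk-0.2.0.jar 2 1 8G zksnark-large 15 4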

Setup monitoring infrastructure

To gather more metrics about the cluster's health and usage, monitoring tools like Ganglia can be used. This is particularly important for carrying out meaningful optimizations.

To use Ganglia with Apache Spark, Spark needs to be compiled from source due to a license mismatch (the Ganglia sink lives in an LGPL-licensed module and is excluded from default Spark builds). To install a Ganglia-compatible version of Spark on the cluster, you can modify Flintrock here as follows:

-./dev/make-distribution.sh -Phadoop-{hadoop_short_version}
+./dev/make-distribution.sh -Pspark-ganglia-lgpl -Phadoop-{hadoop_short_version}

and make sure to build Spark from a specific commit, by using: flintrock launch <your-cluster> --spark-git-commit 97340c1e34cfd84de445b6b7545cfa466a1baaf6 [other flags] (here commit 97340c1e34cfd84de445b6b7545cfa466a1baaf6 corresponds to Apache Spark version 3.1.0).

Configure the master node

Once the cluster is started:

  1. Configure the master node to run ganglia:
flintrock copy-file <your-cluster> scripts/ganglia-setup-master  /home/ec2-user/ --master-only
flintrock run-command <your-cluster> --master-only 'sudo chmod +x /home/ec2-user/ganglia-setup-master && sudo /home/ec2-user/ganglia-setup-master <your-cluster>'
  2. Make sure to configure the webserver appropriately by editing /etc/httpd/conf/httpd.conf as desired (e.g. change the default port; see the sketch after this list)
  3. Edit /etc/httpd/conf.d/ganglia.conf as desired (e.g. write the auth configuration to access the dashboard)
  4. Restart the httpd service: service httpd restart
  5. Double-check the AWS rules of the relevant security groups and make sure they align with the configuration above.
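
For example, to serve the dashboard on port 8090 instead of the default port 80 on an Amazon Linux master node (the file path and the stock Listen 80 directive are assumptions about the default httpd installation):

# Change the port Apache listens on, then restart the service
sudo sed -i 's/^Listen 80$/Listen 8090/' /etc/httpd/conf/httpd.conf
sudo service httpd restart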

Configure the worker nodes

After configuring the master node, configure the worker nodes to send their metrics to the master/reporting node. Since Flintrock only provides a --master-only flag for the copy-file and run-command commands (and no --workers-only flag), we use ssh/scp commands to achieve the same thing below:

# Copy the configuration script to each worker node
scp -i $AWS_KEYPAIR_PATH scripts/ganglia-setup-worker ec2-user@<worker-node-ip>:/home/ec2-user/
# Connect to each worker node
ssh -i $AWS_KEYPAIR_PATH ec2-user@<worker-node-ip>
# On the node execute the following commands
sudo ./ganglia-setup-worker <your-cluster> <worker-cluster-ip>

Configure Spark to use GangliaSink

Write a Spark metrics configuration file. To do so, paste the following configuration

*.sink.ganglia.class = org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host = <worker-cluster-ip>
*.sink.ganglia.port = 8649
*.sink.ganglia.period = 10
*.sink.ganglia.unit = seconds
*.sink.ganglia.ttl = 1
*.sink.ganglia.mode = unicast
*.sink.ganglia.name = Spark-name

*.sink.console.class = org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period = 10
*.sink.console.unit = seconds

master.source.jvm.class = org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class = org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class = org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class = org.apache.spark.metrics.source.JvmSource

in $SPARK_HOME/conf/metrics.properties on all nodes of the cluster (make sure to replace <worker-cluster-ip> by the actual host node IP in the cluster).
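
One way to distribute this file is with flintrock copy-file, which copies to every node by default. A minimal sketch, assuming Spark is installed under /home/ec2-user/spark on the cluster nodes (the usual Flintrock layout):

flintrock copy-file <your-cluster> metrics.properties /home/ec2-user/metrics.properties
flintrock run-command <your-cluster> 'cp /home/ec2-user/metrics.properties /home/ec2-user/spark/conf/metrics.properties'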

After these steps, one can access the Ganglia dashboard from the master/host node. Upon submission of a job to the cluster via spark-submit, the metrics of the various Spark cluster nodes can be monitored on the dashboard, in addition to the SparkUI.

References

[Apa17] Apache Spark, Apache Spark, 2017

[Bis06] Pattern recognition and machine learning, Christopher M. Bishop, Book, 2006

[BCCT12] From extractable collision resistance to succinct non-interactive arguments of knowledge, and back again, Nir Bitansky, Ran Canetti, Alessandro Chiesa, Eran Tromer, Innovations in Theoretical Computer Science (ITCS), 2012

[BCIOP13] Succinct non-interactive arguments via linear interactive proofs, Nir Bitansky, Alessandro Chiesa, Yuval Ishai, Rafail Ostrovsky, Omer Paneth, Theory of Cryptography Conference (TCC), 2013

[BGMW93] Fast exponentiation with precomputation, Ernest F. Brickell, Daniel M. Gordon, Kevin S. McCurley, and David B. Wilson, International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 1992

[BT04] Barycentric Lagrange interpolation, Jean-Paul Berrut and Lloyd N. Trefethen, SIAM Review, 2004

[Can69] A cellular computer to implement the Kalman filter algorithm, Lynn E Cannon, Doctoral Dissertation, 1969

[Gro16] On the size of pairing-based non-interactive arguments, Jens Groth, International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2016

[LRF97] Generalized cannon’s algorithm for parallel matrix multiplication, Hyuk-Jae Lee, James P. Robertson, and Jose A. B. Fortes, International Conference on Supercomputing, 1997

[NT16] Photoproof: Cryptographic image authentication for any set of permissible transformations, Assa Naveh and Eran Tromer, IEEE Symposium on Security and Privacy, 2016

[Pip76] On the evaluation of powers and related problems, Nicholas Pippenger, Symposium on Foundations of Computer Science (FOCS), 1976

[Pip80] On the evaluation of powers and monomials, Nicholas Pippenger, SIAM Journal on Computing, 1980

[Sze11] Schönhage-Strassen algorithm with MapReduce for multiplying terabit integers, Tsz-Wo Sze, International Workshop on Symbolic-Numeric Computation, 2011

[vW97] SUMMA: scalable universal matrix multiplication algorithm, Robert A. van de Geijn and Jerrell Watts, Technical Report, 1997

[Wil16] Strong ETH breaks with Merlin and Arthur: short non-interactive proofs of batch evaluation, Ryan Williams, Conference on Computational Complexity, 2016

License

MIT License