Skip to content

Commit

Permalink
Merge pull request SymbolicPathFinder#15 from yanxx297/svcomp2021
Browse files Browse the repository at this point in the history
GSoC 2022 String Supports
  • Loading branch information
sohah authored Nov 7, 2022
2 parents 246f2fe + b72992a commit ce0627c
Show file tree
Hide file tree
Showing 41 changed files with 950 additions and 232 deletions.
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
.hg
.hgignore
.idea
.run
local.properties
.version
nbproject/private/*
build/*
dist/*
tmp/*
classes/*
trace
*.class
*.orig
Expand All @@ -15,3 +18,5 @@ trace
logs/
out/
!.idea/*
*.log
src/examples/strings/CRIME_secret.txt.enc
File renamed without changes.
122 changes: 122 additions & 0 deletions doc/Z3str3-string-support.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
# Z3str3 String Solving

## Acknowledge
This work is supported by [GSoC 2022](https://summerofcode.withgoogle.com/), an international annual program focused on bringing new contributors to open source projects.

## Introduction
Java Pathfinder (JPF) is a Java bytecode analysis tool mostly used for model checking.
Symbolic Pathfinder (SPF) is a symbolic execution tool for Java bytecode based on JPF.
In SPF, preliminary Z3str3 integration has been implemented during GSoC 2021.
However, the existing implementation only support string functions explicitly implemented by SMTLib.
A full list of string methods supported by the previous work can be found in [this document](z3str3-integration.md).

We extended SPF string solving by adding support to more complicated string methods.
Due to time limit, we only added full support for 3 methods and partial support for another 3.
For the rest methods, we evaluate the difficulty of supporting them, and proposed ideas for some of them.
In addition, we also augmented current string examples and tests by fixing minor bugs in SPF string solving and run configurations.


## Contributions
All the code we have contributed can be found at [this url](https://github.com/marlinroberts21/jpf-symbc/compare/mjr/dev_init_igen...yanxx297:jpf-symbc:mjr/dev_init_igen).

With Those changes, 6 more methods are fully or partially supported, and 5 previously failed examples under src/example/strings can pass now.
More details about the current status of SPF's string solving support can be found in the first 2 tabs of [this spreadsheet](https://docs.google.com/spreadsheets/d/1c5TlmLC2TiL83K7531vKj7874NFxmZ7gC3P8TMUunJk/edit#gid=0).

### Newly supported functions
Based on the result running examples in [src/examples/strings](../src/examples/strings), we prioritized methods that are required by more examples and less complicated to implement.

`isEmpty()`, `valueOf(char)` and `valueOf(int)` are now fully supported.
`isEmpty()` is implemented based on existing support to `length()`.
`valueOf(char)` is modeled by `str.from_code`.
To model `valueOf(int)`, the input integer is converted to its absolute value, and then converted to string by str.from_int.

Note: SPF sometimes cannot tell whether the `valueOf()` function should be modeled as `valueOf(char)` or `valueOf(Int)`.
Add more pattern checking on demand.

Integer `AND`, `ParseInt()` and `trim()` are partially supported due to limited time.
We discuss what already works and what left unfinished.

Integer `AND` and all other bitwise operations for integer are not supported in existing SPF, while `AND` is required by [ChallengeTest](../src/examples/strings/ChallengeTest.java) and [IndexOfTest](../src/examples/strings/IndexOfTest.java) when converting an integer to a char.
To model integer `AND` in SMTLib, we convert the input integer to a bit vector, perform bitwise `AND` and convert the result back to integer.
Unfortunately, this implementation is too slow to use in practical.
As a workaround, we check whether the input integer is the result of `charAt()`, and whether it is converted to a char (`AND` with 65535 in java bytecode.)
If both conditions are satisfied, then we simply remove the `AND` operation.

Although `ParseInt()` is an Integer method rather than a string method, we decide to add support to it since it is required by [MasterMind](../src/examples/strings/MasterMind.java).
In SPF, `ParseInt()` translates to an internal operation named `isInteger` when added to string path conditions.
We have modeled this operation mostly using `str.to_int`.
The problem is that the output of `ParseInt()` output will appear in numeric path conditions, and special handling is required to make it fully functioning.
See [this section](#Complete-partially-supported-methods) for more discussions.

A preliminary prototype of `trim()` has been implemented using on SMTLib regex.
Current implementation only works when there is only one occurrence of trim() in one solver query (e.g. `testTrim()` in [E5](../src/examples/E5.java)).

### Run testSymString with Z3str3
[TestSymString](../src/tests/gov/nasa/jpf/symbc/strings/TestSymString.java) contains string tests based on JUnit.
As far as we can tell, those tests haven't been tested recently.
Currently, we can run them automatically by ``ant test``.
In the future, testing automation will switch to Gradle.

We run those tests and found they are inconsistent with the real Z3str3 string solving implementation.
To fix this problem, we modified both [SPF](https://github.com/yanxx297/jpf-symbc/commit/fdfcb2053d239d63deee0aa9082af57e88ea2e56) and the tests.
See [TestZ3SymString](../src/tests/gov/nasa/jpf/symbc/strings/TestZ3SymString.java) for an example porting those tests to Z3.

### Minor fixes
While running some string examples, we found that `startsWith()` are modeled as ``str.prefixof str prefix``, while the correct order should be ``str.prefixof prefix str``.
The same problem exists in endsWith().
This bug has been fixed in our work.

In addition, we augmented string examples.
[MasterMind](../src/examples/strings/MasterMind.java) previously exists in `src/examples/strings` but haven't been set up.
We created the configuration file and refactor it so that we can easily set the important inputs to symbolic.
Existing [Tricky](../src/examples/strings/Tricky.java) doesn't result in any string path condition, which is undesirable for a string example.
As an augmentation, we added a branch on a symbolic string to this example.

We also fixed minor bugs such as [6fca10e](https://github.com/yanxx297/jpf-symbc/commit/6fca10e00e7a31e32e53d5cc5dd8d681ea7f94dd) and [520398a](https://github.com/yanxx297/jpf-symbc/commit/520398a4e3fcc2c65c4a18965cc60330e81b6975).
Those small fixes can help passing more examples.

## Future Work
Below is a list of works left unfinished.
For those unfinished tasks, we discuss what we learned and propose approaches to handle them in the future.

### Complete partially supported methods

Currently, Integer `AND` can be modeled efficiently only for a special case.
The intuitive approach that converting between integers and vectors back-and-forth is too slow.
We should either model this operation in an optimized way or wait until Z3 can handle integer-vector converting more efficiently.

Since `ParseInt()` affects both numeric and string path conditions, special handling is required to make it work.
It should be handled in the same way `length()` is handled.
See how `SymbolicLengthInteger` is used to handle `length()` as an example.
To reproduce this problem, run MasterMind.

For complete support to `trim()`, SPF should create difference SMT variables for each unique occurrence of `trim()`.

### Support more string methods
`codePointAt()`, `toLowerCase()`, `toUpperCase()` and `equalsIgnoreCase()` can be supported by a combination of SMTLib methods.
`codePointAt()` can be modeled mostly accurate using `str.at` and `str.to_code`.
Note that it may covert either one or two chars to its code point, and the details should be handled carefully.
`toLowerCase()` and `toUpperCase()` can be modeled by replacing all 26 lower (upper) case letters to upper (lower) case.
`equalsIgnoreCase()` can be implemented using `equal()` and `toLowerCase()` (or `toUpperCase()`).

`lastIndexOf()` can be supported by regex. See how we [implement trim() using regex](trim.smt) as an example.

The rest methods are more difficult.
To support `matches()`, `replaceAll()` and `split()`, Java regex support is required.
Implementing Java regex in SMTLib can be a lot of engineering work, since SMTLib only support a subset of Java regex operations.
`format()`, `codePointCount()`, `compareTo()` and `valueOf(float)` are fundamentally difficult both because of their complexity and because there is no corresponding features in SMTLib.

### Z3str3 support for TestSymString
`TestSymString` currently not work with Z3str3, since it doesn't fully match the actual implementation of Z3str3 string support.

Our current solution is to create Z3str3 version of `TestSymString`.
For now only one test has been ported for Z3.
See [TestZ3SymString.java](../src/tests/gov/nasa/jpf/symbc/strings/TestZ3SymString.java) for an example.

It is more desirable if we can fix SPF and run TestSymString without modifying it.
To achieve this goal, it is necessary to copy solver solutions to the solution field of corresponding `StringSymbolic`.
The problem is that SPF doesn't maintain a list of existing `StringSymbolic` objects.
Only a list of object names are maintained, but we have no idea how to find and access the corresponding objects using their names.
As an ultimate solution, we can iterate the path condition and collect a list of `StringSymbolic` objects.


120 changes: 60 additions & 60 deletions string-solver-integration.md → doc/string-solver-integration.md
Original file line number Diff line number Diff line change
@@ -1,60 +1,60 @@
string-solver-integration.md

## String Constraint Solvers in SPF ##

These are general notes on integrating string solvers into SPF. While there are many ways this can be accomplished, these notes present a straightforward approach that can be adapted to most any solver.

### Accessing the solver from Java ###

First, the solver must be accessible through Java, ideally through a .jar file placed in the lib directory. In the case of a solver that is written in a language other than Java, this .jar file acts as a wrapper around the native language solver methods and objects using the Java Native Interface, JNI. These methods will be called by the solver translation code developed specifically for the given solver.

Example .jar files:
__com.microsoft.z3.jar__
__automaton.jar__

If a .jar file does not exist for a solver, JNI code can be placed directly in the translation class as an alternative. In both cases, the native libraries needed to support JNI for a specific platform need to be accessible in the lib directory.

Example JNI code:
__edu.ucsb.cs.vlab.DriverProxy.java__

### Translating SPF constructs to solver inputs ###

Second, a translation class needs to be created in SPF that provides the method _isSat()_ that returns the satisfiability of a given string path condition. It takes as parameters the SPF global graph and a path condition. The translation class is responsible for interpreting the path condition and global graph and translating them into a set of constructs compatible with the given solver. The translation class accesses the solver using the methods and objects provided in the .jar file, or with JNI code accessing the solver libraries directly.

If the solver returns a solution for variables in addition to satisfiability, the translation class is responsible for populating the string path condition solution property with the solution values. This allows SPF to access the solution values.

Location:
__gov.nasa.jpf.symbc.string.translate__

Examples:
__gov.nasa.jpf.symbc.string.translate.TranslateToIGEN.java__
__gov.nasa.jpf.symbc.string.translate.TranslateToABC.java__
__edu.ucsb.cs.vlab.versions.Z3String3Processor.java__

### Identifying the solver, executing the translation isSat() method

Third, the __SymbolicStringConstraintsGeneral__ class needs to be modified to include the identity and actions for the new solver. The solver identity is provided by a class property that is a unique string identifier for the solver (public static final String _identifier_). This identifier is used to specify the string solver in the .jpf file at runtime. The class instance variable solver is set to the correct identifier at the start of the _inner_isSatisfiable()_ method.

Finally, the solver needs to be added to the section in the _inner_isSatisfiable()_ method that contains the set of conditionals that execute the correct actions for a given solver. Typically this action will be setting the _decisionProcedure_ variable to the output of the translation classes _isSat()_ method.

Location:
__gov.nasa.jpf.symbc.string.SymbolicStringConstraintsGeneral__

### Alternative to JNI or .jar files ###

If a solver does not have a .jar file exposing the required functionality, it may be possible to interface the solver if it has command line functionality that can return results that a translation class can interpret. For example, a translation class can create an instance of a binary executable and redirect the input and output of the instance in order to issue commands and receive results.

### Setting solver properties in the .jpf file ###

If a solver accommodates or requires special settings, these can be specified at runtime through the .jpf file if modifications are made to SPF. Properties specified in the .jpf file are exposed to classes in SPF through static public properties of the __SymbolicInstructionFactory__ class. The static public properties need to be declared in the class, and subsequently set in the constructor method. Properties in the .jpf file are discovered by interrogating the __Config__ object passed to the constructor.

Once the __SymbolicInstructionFactory__ class has been modified to set the solver properties, they can be accessed within the solver translation code by importing the __SymbolicInstructionFactory__ class and accessing its public static properties.

Location:
__gov.nasa.jpf.symbc.SymbolicInstructionFactory__

Example in .jpf file:
__symbolic.z3str3_aggressive_length_testing=true__

Accessible in translator class as:
__SymbolicInstructionFactory.z3str3_aggressive_length_testing__
string-solver-integration.md

## String Constraint Solvers in SPF ##

These are general notes on integrating string solvers into SPF. While there are many ways this can be accomplished, these notes present a straightforward approach that can be adapted to most any solver.

### Accessing the solver from Java ###

First, the solver must be accessible through Java, ideally through a .jar file placed in the lib directory. In the case of a solver that is written in a language other than Java, this .jar file acts as a wrapper around the native language solver methods and objects using the Java Native Interface, JNI. These methods will be called by the solver translation code developed specifically for the given solver.

Example .jar files:
__com.microsoft.z3.jar__
__automaton.jar__

If a .jar file does not exist for a solver, JNI code can be placed directly in the translation class as an alternative. In both cases, the native libraries needed to support JNI for a specific platform need to be accessible in the lib directory.

Example JNI code:
__edu.ucsb.cs.vlab.DriverProxy.java__

### Translating SPF constructs to solver inputs ###

Second, a translation class needs to be created in SPF that provides the method _isSat()_ that returns the satisfiability of a given string path condition. It takes as parameters the SPF global graph and a path condition. The translation class is responsible for interpreting the path condition and global graph and translating them into a set of constructs compatible with the given solver. The translation class accesses the solver using the methods and objects provided in the .jar file, or with JNI code accessing the solver libraries directly.

If the solver returns a solution for variables in addition to satisfiability, the translation class is responsible for populating the string path condition solution property with the solution values. This allows SPF to access the solution values.

Location:
__gov.nasa.jpf.symbc.string.translate__

Examples:
__gov.nasa.jpf.symbc.string.translate.TranslateToIGEN.java__
__gov.nasa.jpf.symbc.string.translate.TranslateToABC.java__
__edu.ucsb.cs.vlab.versions.Z3String3Processor.java__

### Identifying the solver, executing the translation isSat() method

Third, the __SymbolicStringConstraintsGeneral__ class needs to be modified to include the identity and actions for the new solver. The solver identity is provided by a class property that is a unique string identifier for the solver (public static final String _identifier_). This identifier is used to specify the string solver in the .jpf file at runtime. The class instance variable solver is set to the correct identifier at the start of the _inner_isSatisfiable()_ method.

Finally, the solver needs to be added to the section in the _inner_isSatisfiable()_ method that contains the set of conditionals that execute the correct actions for a given solver. Typically this action will be setting the _decisionProcedure_ variable to the output of the translation classes _isSat()_ method.

Location:
__gov.nasa.jpf.symbc.string.SymbolicStringConstraintsGeneral__

### Alternative to JNI or .jar files ###

If a solver does not have a .jar file exposing the required functionality, it may be possible to interface the solver if it has command line functionality that can return results that a translation class can interpret. For example, a translation class can create an instance of a binary executable and redirect the input and output of the instance in order to issue commands and receive results.

### Setting solver properties in the .jpf file ###

If a solver accommodates or requires special settings, these can be specified at runtime through the .jpf file if modifications are made to SPF. Properties specified in the .jpf file are exposed to classes in SPF through static public properties of the __SymbolicInstructionFactory__ class. The static public properties need to be declared in the class, and subsequently set in the constructor method. Properties in the .jpf file are discovered by interrogating the __Config__ object passed to the constructor.

Once the __SymbolicInstructionFactory__ class has been modified to set the solver properties, they can be accessed within the solver translation code by importing the __SymbolicInstructionFactory__ class and accessing its public static properties.

Location:
__gov.nasa.jpf.symbc.SymbolicInstructionFactory__

Example in .jpf file:
__symbolic.z3str3_aggressive_length_testing=true__

Accessible in translator class as:
__SymbolicInstructionFactory.z3str3_aggressive_length_testing__
21 changes: 21 additions & 0 deletions doc/trim.smt
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
; Proof of concept SMT regex implementation of Java String.trim()

(declare-const in_str String)

(declare-const out_str String)
(declare-const ws RegLan)
(declare-const nwc RegLan)

(assert (= out_str "abcd"))
(assert (str.prefixof " " in_str ))

(assert (and (str.in_re in_str (re.++ ws (str.to_re out_str) ws))
;; in_str = whitespace + out_str + whitespace
(or (= out_str "")
(and (str.in_re out_str (re.++ nwc re.all)) (str.in_re out_str (re.++ re.all nwc))))))
;; To make sure out_str start & end with non-white char if it is not empty
(assert (= ws (re.union (re.+ (re.range (str.from_code 0) (str.from_code 32))) (str.to_re ""))))
; arbitrary length whitespace or empty string
(assert (= nwc (re.diff re.allchar (re.range (str.from_code 0) (str.from_code 32)))))
;(assert (out_str = "123"))
(check-sat)
Loading

0 comments on commit ce0627c

Please sign in to comment.