The HTTP Archive Parser stands up a server with endpoints to parse HTTP Archive (HAR) files in various ways. Its initial purpose was to detect data privacy violations in a user session. Part of that system has been broken out into a more general-purpose parser, which exposes shared strings in a user's session. This helps identify data flow between different host domains.
It looks for matching strings in cookies, headers, and query parameters.
You can specify various reports to run on the HAR file.
- Shared String
- Shared String Entity List
- Shared String Differential
Currently, reports are read from and stored in S3. You will need to fill out the .env
file with the proper credentials.
Support for the local filesystem is planned.
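The exact variable names depend on how the S3 client is configured in this repo, so the entries below are assumptions to illustrate the shape of the file; confirm the real key names against the project's configuration before use.

```
# Hypothetical .env entries -- confirm exact key names against the project's config
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
AWS_REGION=us-east-1
S3_BUCKET=your-har-report-bucket
```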
$ npm install
# development
$ npm run start
# watch mode
$ npm run start:dev
# production mode
$ npm run start:prod
# unit tests
$ npm run test
# e2e tests
$ npm run test:e2e
# test coverage
$ npm run test:cov
There are many properties you can customize for the parser. The config is located in the parser/har/parser.config.js
file.
Improper modification of these values can create unnecessary parsing conditions, which lead to long parsing times.
The FIRST_CHAR_MIN_LEN
and FIRST_CHAR_MAX_LEN
values are the most sensitive. The smaller FIRST_CHAR_MIN_LEN
is, the more strings the parser will consider in the file. This value should generally be greater than 6 or 7; most unique identifiers are longer than 7 characters, so set it higher if identifiers are what you are looking for.
{
LEVELS: [
'request',
'response'
],
ENTRY_TYPES: [
'headers',
'cookies',
'queryString'
],
FIRST_CHAR_MIN_LEN: 7,
FIRST_CHAR_MAX_LEN: 200,
REPORT_KEY_NAME_MAX_LENGTH: 60,
REPORT_URL_MAX_LENGTH: 120,
INCLUDE_INITIATOR: true,
INCLUDE_SERVER_IP: false,
MATCH_COUNT_MIN: 2,
IGNORE_LIST: [],
INCLUDE_LIST: [],
REPORT_PARAMS: [],
IGNORE_SAME_REQUESTS: true,
FILTER_SAME_HOST_URL: true,
FILTER_TIMESTAMPS: true,
FILTER_URL_VALUES: false,
}
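For example, if you only care about long unique identifiers, a stricter tweak might raise the minimum length and narrow the entry types. The values below are illustrative, not recommendations from the authors:

```js
// Illustrative overrides for parser/har/parser.config.js -- example values only
{
  ENTRY_TYPES: ['cookies', 'queryString'], // skip headers entirely
  FIRST_CHAR_MIN_LEN: 12,                  // only consider strings 12+ chars long
  MATCH_COUNT_MIN: 3,                      // require a string to appear 3+ times
}
```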
POST to {SERVER_HOST}/collection-event/parse
- There are two supported ways to pass your file to the parser
- Send the entire raw HAR contents in the request body
- Send the name of the HAR file stored in S3
Request Header Format
Headers:
- Content-Type: application/json
- mx-token: TEST-KEY-PARSER
Supported Request Body
{
"format": "json", // OPTIONAL - also accepts "csv" - default is json
"save": bool, // OPTIONAL - true or false to save to bucket - default is true
"update": bool, // OPTIONAL - true or false to overwrite existing file - default is false
"report_type": "sharedStrings", // or "differential" or "entityList"
"files": ["<S3 HAR FILE NAME>"] // if differential pass two files
// OR
"raw": [{HAR1}], // if differential pass two raw HAR files as json objects
}
Request Body
{
"report_type": "sharedStrings",
"format": "json",
"files": ["<S3 HAR FILE NAME>"]
}
Request Body
{
"report_type": "entityList",
"format": "json",
"files": ["<S3 HAR FILE NAME>"]
}
Request Body
{
"report_type": "differential",
"format": "json",
"files": ["<S3 HAR FILE NAME>", "<S3 HAR FILE TO DIFF AGAINST>"]
}
Before starting development it is important to understand the NestJS framework. There are a couple of basic concepts that will help you understand the purpose of each file. The basic structure is that each Module has a Component, Service/Repository, Data Transfer Object, & Interface. Some of these constructs are just fancy words for very simple purposes.
You can run the CreateFullModule.sh
script to have the following files autogenerated for you:
(Note that there are occasionally syntax issues when creating a module whose name is more than one word long.)
- controller
- module
- dto
- repository
- spec test file
Take a look at the CreateFullModule.sh
code to understand what is happening.
After you run the script there are still a few things that need to be manually added.
- You will need to manually enter the values into the dto file. Look at
src/interfaces/entities/<your module>
for all the fields that you will need to add. Use existing files for reference.
The alternative is to use the nest CLI
and do it all manually.
Useful resources:
- Swagger Docs
- Great docs on creating components that hook into TypeORM
- Swagger with NestJS
HTTP Archive Parser is built with the NestJS framework in TypeScript.
Nicholas Porter - https://github.com/porteron