Trace Editor

To run, use trace-editor.py in the root directory.
Please make sure the input is correct for now; the script does not do any advanced validation yet.

Before running, create 2 symlinks/folders inside this directory:
./in: contains all input files
./out: contains all output files

The scripts read every input from ./in and write every output to ./out.
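If you prefer plain folders over symlinks, the two directories can be created with a few lines of Python. This is only a convenience sketch: the helper name ensure_io_dirs is hypothetical and not part of trace-editor.py; only the directory names ./in and ./out come from this README.

```python
import os

def ensure_io_dirs(base="."):
    """Create base/in and base/out if they are missing (hypothetical helper)."""
    paths = [os.path.join(base, name) for name in ("in", "out")]
    for path in paths:
        # exist_ok=True leaves an existing folder (or a symlink to one) untouched
        os.makedirs(path, exist_ok=True)
    return paths
```

Calling ensure_io_dirs() from the root directory of the repository is enough; calling it again later does nothing.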

Please keep in mind that every trace must be preprocessed before any of the script's other functions can be used on it.

List of commands:

1. Preprocess a trace or traces inside a directory.
Type of traces:

Microsoft Server Trace

BlkReplay's blktrace

Unix's blktrace: in our case, so far it is the same as the Hadoop trace

python trace-editor.py -file <tracename> -preprocessMSTrace (-filter read/write)

python trace-editor.py -file <tracename> -preprocessBlkReplayTrace (-filter read/write)

python trace-editor.py -file <tracename> -preprocessUnixBlkTrace (-filter read/write)

It can also preprocess all traces inside a directory. Here is an example using MS-Trace:

python trace-editor.py -dir <dirname> -preprocessMSTrace (-filter read/write)

2. Modify a trace (Precondition: the trace must have been preprocessed)
Resize all request sizes by 2x and rerate all request arrival times by 0.5x:

python trace-editor.py -file <tracename> -resize 2 -rerate 0.5
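To make the two factors concrete, here is a minimal sketch of what resizing and rerating could look like on an in-memory trace. It is not the script's code: the (arrival_time, size) tuple layout is hypothetical, and it assumes that rerating multiplies each arrival time by the factor, which this README does not spell out.

```python
def resize_and_rerate(requests, resize=1.0, rerate=1.0):
    """Scale request sizes and arrival times.

    requests: list of (arrival_time, size) tuples (hypothetical layout).
    Assumption: rerate multiplies the arrival time, so 0.5 halves every gap.
    """
    return [(arrival * rerate, int(size * resize)) for arrival, size in requests]

trace = [(0.0, 4096), (10.0, 8192), (30.0, 4096)]
print(resize_and_rerate(trace, resize=2, rerate=0.5))
# [(0.0, 8192), (5.0, 16384), (15.0, 8192)]
```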

3. Combine traces (Precondition: The traces must have been preprocessed).
Make sure that the traces' names are well ordered, because the script processes the traces as they are listed and does not sort them. Well ordered means the traces are ordered from the earliest time to the latest time. You can check this condition with -ls.

python trace-editor.py -dir <dirname> -combine

4. Break to RAID-0 disks. In this example, get RAID disks from 4 disks with a stripe unit size of 65536 bytes:

python trace-editor.py -breaktoraid -file <infile> -ndisk 4 -stripe 65536
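For readers unfamiliar with RAID-0, the sketch below shows the standard striping arithmetic that maps one request onto per-disk pieces. It illustrates the general technique, not necessarily how trace-editor.py implements -breaktoraid; the function name split_raid0 and the byte-based (offset, size) request layout are assumptions.

```python
def split_raid0(offset, size, ndisk, stripe):
    """Split one request into per-disk pieces under RAID-0 striping.

    offset, size and stripe are in bytes (assumed units).
    Returns a list of (disk, disk_offset, length) tuples.
    """
    pieces = []
    end = offset + size
    while offset < end:
        stripe_index = offset // stripe      # stripe unit number across the whole array
        within = offset % stripe             # position inside that stripe unit
        length = min(stripe - within, end - offset)
        disk = stripe_index % ndisk          # stripe units rotate across the disks
        disk_offset = (stripe_index // ndisk) * stripe + within
        pieces.append((disk, disk_offset, length))
        offset += length
    return pieces

print(split_raid0(offset=60000, size=140000, ndisk=4, stripe=65536))
# [(0, 60000, 5536), (1, 0, 65536), (2, 0, 65536), (3, 0, 3392)]
```

A request that crosses a stripe unit boundary is split, so one input request can turn into several requests spread over different disks.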

5. Check IO imbalance in the RAID disks. This example uses 3 disks with a granularity of 5 minutes.

python trace-editor.py -ioimbalance -file <filename> -granularity 5

6. Check the busiest or the most loaded (in kB) time for a specific disk in a directory
Busiest = the time range with the largest number of requests
Most Loaded = the time range with the largest total request size

Notes:
duration - in minutes; in this example 60 minutes (1 hour)
top - the top n results; in this example the top 3 results

python trace-editor.py -dir <dirname> -mostLoaded -duration 60 -top 3

python trace-editor.py -dir <dirname> -busiest -duration 60 -top 3
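The difference between the two rankings can be sketched as follows. This is an illustration under stated assumptions, not the script's algorithm: it uses fixed, non-overlapping windows (the script may slide its windows differently), and the (arrival_ms, size_bytes) request layout and the name top_windows are hypothetical.

```python
from collections import defaultdict

def top_windows(requests, duration_min, top):
    """Rank fixed time windows by request count and by total size.

    requests: iterable of (arrival_ms, size_bytes) tuples (hypothetical layout).
    Returns (busiest, most_loaded), each a list of (window_index, value).
    """
    window_ms = duration_min * 60 * 1000
    count = defaultdict(int)
    load_bytes = defaultdict(int)
    for arrival_ms, size in requests:
        w = int(arrival_ms // window_ms)     # index of the window this request falls in
        count[w] += 1
        load_bytes[w] += size
    busiest = sorted(count.items(), key=lambda kv: kv[1], reverse=True)[:top]
    load_kb = [(w, total / 1024) for w, total in load_bytes.items()]
    most_loaded = sorted(load_kb, key=lambda kv: kv[1], reverse=True)[:top]
    return busiest, most_loaded

trace = [(0, 4096), (1000, 4096), (2000, 4096), (3_700_000, 1_048_576)]
print(top_windows(trace, duration_min=60, top=1))
# ([(0, 3)], [(1, 1024.0)])
```

In this toy trace the first hour has the most requests, while the second hour carries the most data, so the two rankings disagree.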

Check the largest average time. The usage is the same as for busiest and most loaded:

python trace-editor.py -dir <dirname> -busiest -duration 60 -top 3

7. Top large IO. In this example:
Top 3 large IOs with size greater than or equal to 64 kB, with a 1-hour duration

python trace-editor.py -toplargeio -file <filename> -offset 64 -devno 0 -duration 60 -top 3

8. Find the time range with the most random writes. In this example:
Find the time ranges (in minutes) that have the most random writes

python trace-editor.py -dir <dirname> -mostRandomWrite -duration 5 -devno 5 -top 3

9. Get characteristic info from a preprocessed trace (usually after you cut the original preprocessed trace, for devno reasons). In this example:
You can get whisker-plot-style info about write size, read size, and time density, plus % write, % read, and % random write

python trace-editor.py -dir <dirname> -characteristic

10. Cut a trace. In this example, keep the time range between minute 5 and minute 10:

python trace-editor.py -cuttrace -file <filename> -timerange 5 10
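Conceptually, cutting a trace is a filter on arrival time. The sketch below shows that idea only; it assumes timestamps in milliseconds, a half-open range (start included, end excluded), and no re-basing of the remaining timestamps, none of which is confirmed by this README. The name cut_trace and the tuple layout are hypothetical.

```python
def cut_trace(requests, start_min, end_min):
    """Keep requests whose arrival time lies in [start_min, end_min) minutes.

    requests: list of (arrival_ms, size_bytes) tuples (hypothetical layout).
    """
    start_ms = start_min * 60 * 1000
    end_ms = end_min * 60 * 1000
    return [(t, size) for t, size in requests if start_ms <= t < end_ms]

trace = [(0, 4096), (300_000, 8192), (450_000, 4096), (600_000, 4096)]
print(cut_trace(trace, 5, 10))
# [(300000, 8192), (450000, 4096)]
```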

ucare-uchicago/trace-edit

Folders and files

Latest commit

History

Repository files navigation

Trace Editor

List of commands:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages