To run, use trace-editor.py in the root directory.
Please provide correct input for now; the script does not yet do any advanced input validation.
Before running, create two folders (or symlinks) inside this directory:
./in: contains all input files
./out: contains all output files
The script reads every input from ./in and writes every output to ./out.
Keep in mind that every trace must be preprocessed before it can be used with any of the script's other functions.
1. Preprocess a trace or traces inside a directory.
Supported trace types:
- Microsoft Server Trace
- BlkReplay's blktrace
- Unix's blktrace: in our case, so far this is the same format as the Hadoop trace
python trace-editor.py -file <tracename> -preprocessMSTrace (-filter read/write)
python trace-editor.py -file <tracename> -preprocessBlkReplayTrace (-filter read/write)
python trace-editor.py -file <tracename> -preprocessUnixBlkTrace (-filter read/write)
It can also preprocess all traces inside a directory. Here's an example using an MS trace:
python trace-editor.py -dir <dirname> -preprocessMSTrace (-filter read/write)
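The preprocessed format is not documented here. As a minimal sketch, assuming each request has already been parsed into a record holding an arrival time in milliseconds, a device number, a byte offset, a size in bytes and a read/write flag (the Request type and the preprocess function below are hypothetical, not the script's actual code), the optional -filter step plus rebasing timestamps to zero could look like this:

```python
from typing import Iterable, List, NamedTuple, Optional


class Request(NamedTuple):
    # Hypothetical normalized record; the real preprocessed format may differ.
    time_ms: float  # arrival time in milliseconds
    devno: int      # device number
    offset: int     # byte offset
    size: int       # request size in bytes
    is_read: bool   # True for reads, False for writes


def preprocess(requests: Iterable[Request], op_filter: Optional[str] = None) -> List[Request]:
    """Sort by arrival time, rebase timestamps to 0, then optionally keep only reads or writes."""
    reqs = sorted(requests, key=lambda q: q.time_ms)
    if reqs:
        start = reqs[0].time_ms
        reqs = [r._replace(time_ms=r.time_ms - start) for r in reqs]
    if op_filter == "read":
        reqs = [r for r in reqs if r.is_read]
    elif op_filter == "write":
        reqs = [r for r in reqs if not r.is_read]
    elif op_filter is not None:
        raise ValueError("op_filter must be 'read', 'write' or None")
    return reqs


raw = [
    Request(105.0, 0, 4096, 512, False),
    Request(100.0, 0, 0, 4096, True),
    Request(110.0, 1, 8192, 1024, False),
]
print(preprocess(raw, "write"))
```

Rebasing before filtering keeps the filtered requests' times relative to the start of the whole trace; whether the real script does this is an assumption.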
2. Modify a trace (Precondition: the trace must have been preprocessed).
Resize all request sizes by 2x and rerate all request arrival times by 0.5x:
python trace-editor.py -file <tracename> -resize 2 -rerate 0.5
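A minimal sketch of what -resize 2 -rerate 0.5 might do, assuming resize multiplies each request size and rerate multiplies each arrival timestamp (so 0.5 halves the gaps between requests, doubling the arrival rate). Both interpretations and the Request record are assumptions, not taken from the script:

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    # Hypothetical record: time in ms, byte offset and size in bytes.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def resize_rerate(requests: List[Request], resize: float = 1.0, rerate: float = 1.0) -> List[Request]:
    """Scale every request size by `resize` and every arrival time by `rerate` (assumed semantics)."""
    return [
        r._replace(size=int(r.size * resize), time_ms=r.time_ms * rerate)
        for r in requests
    ]


trace = [Request(0.0, 0, 0, 4096, True), Request(10.0, 0, 4096, 512, False)]
print(resize_rerate(trace, resize=2, rerate=0.5))
```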
3. Combine traces (Precondition: The traces must have been preprocessed).
Make sure the trace file names are well ordered, because the script combines the traces in that order without sorting them.
Well ordered means the traces are ordered from the earliest time to the latest time. You can check this condition with -ls.
python trace-editor.py -dir <dirname> -combine
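Since the script does not sort the traces itself, a sanity check for the "well ordered" precondition can help. This is a minimal sketch, assuming each trace is already sorted and keeps absolute arrival times in milliseconds; combine and the Request record are hypothetical:

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    # Hypothetical record with an absolute arrival time in milliseconds.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def combine(traces: List[List[Request]]) -> List[Request]:
    """Concatenate traces in the given order, refusing traces that are not well ordered."""
    combined: List[Request] = []
    for i, trace in enumerate(traces):
        # A later trace must not start before the previous one has ended.
        if combined and trace and trace[0].time_ms < combined[-1].time_ms:
            raise ValueError(f"trace #{i} starts before the previous trace ends; check the file order")
        combined.extend(trace)
    return combined


def req(t: float) -> Request:
    return Request(t, 0, 0, 4096, True)


print([r.time_ms for r in combine([[req(0), req(5)], [req(6)]])])
```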
4. Break a trace into RAID-0 disks. This example splits the trace across 4 disks with a stripe unit size of 65536 bytes:
python trace-editor.py -breaktoraid -file <infile> -ndisk 4 -stripe 65536
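The standard RAID-0 mapping sends stripe unit k (k = offset // stripe) to disk k mod ndisk, at disk offset (k // ndisk) * stripe + offset mod stripe; a request that crosses a stripe-unit boundary is split. Below is a minimal sketch of that mapping, assuming offsets and sizes are in bytes (if the preprocessed trace uses 512-byte sectors, convert first). The split_raid0 function and Request record are hypothetical, not the script's code:

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    # Hypothetical record: offset and size in bytes.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def split_raid0(requests: List[Request], ndisk: int, stripe: int) -> List[List[Request]]:
    """Map each request onto RAID-0 member disks, splitting at stripe-unit boundaries."""
    disks: List[List[Request]] = [[] for _ in range(ndisk)]
    for r in requests:
        offset, remaining = r.offset, r.size
        while remaining > 0:
            unit = offset // stripe                 # global stripe-unit index
            within = offset % stripe                # byte position inside that unit
            chunk = min(remaining, stripe - within)  # bytes that fit in this unit
            disk = unit % ndisk
            disk_offset = (unit // ndisk) * stripe + within
            disks[disk].append(r._replace(devno=disk, offset=disk_offset, size=chunk))
            offset += chunk
            remaining -= chunk
    return disks


trace = [Request(0.0, 0, 61440, 8192, False), Request(1.0, 0, 327680, 100, True)]
for i, d in enumerate(split_raid0(trace, ndisk=4, stripe=65536)):
    print(i, [(x.offset, x.size) for x in d])
```

The first request starts 4096 bytes before the end of stripe unit 0, so it is split between disk 0 and disk 1.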
5. Check IO imbalance across the RAID disks. This example uses 3 disks with a granularity of 5 minutes.
python trace-editor.py -ioimbalance -file <filename> -granularity 5
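The imbalance metric the script reports is not described here. As a hypothetical sketch, one simple measure is, for each time window of the given granularity, the ratio of the busiest disk's request count to the mean count across disks (1.0 means perfectly balanced). The metric, io_imbalance and the Request record are all assumptions:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Request(NamedTuple):
    # Hypothetical record: arrival time in milliseconds.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def io_imbalance(disk_traces: List[List[Request]], granularity_min: float) -> Dict[int, float]:
    """Per time window, return max/mean of the per-disk request counts (1.0 = balanced)."""
    window_ms = granularity_min * 60_000
    counts: Dict[int, List[int]] = defaultdict(lambda: [0] * len(disk_traces))
    for disk, trace in enumerate(disk_traces):
        for r in trace:
            counts[int(r.time_ms // window_ms)][disk] += 1
    result = {}
    for window, per_disk in sorted(counts.items()):
        mean = sum(per_disk) / len(per_disk)  # > 0: a window exists only if it has a request
        result[window] = max(per_disk) / mean
    return result


def req(t: float, disk: int) -> Request:
    return Request(t, disk, 0, 4096, True)


disks = [[req(0, 0), req(1000, 0), req(2000, 0)], [req(500, 1)], [req(300_000, 2)]]
print(io_imbalance(disks, granularity_min=5))
```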
6. Find the busiest or the most loaded (in kB) time range for a specific disk in a directory.
Busiest = a time range with the largest number of requests
Most Loaded = a time range with the largest total requests size
Notes:
duration - length of the time range in minutes; in this example 60 minutes (1 hr)
top - number of top results to show; in this example the top 3
python trace-editor.py -dir <dirname> -mostLoaded -duration 60 -top 3
python trace-editor.py -dir <dirname> -busiest -duration 60 -top 3
To check the largest average time, the usage is the same as for busiest and most loaded:
python trace-editor.py -dir <dirname> -busiest -duration 60 -top 3
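The busiest and most loaded definitions above can be sketched as ranking fixed time windows by request count or by total size. This minimal sketch assumes non-overlapping windows, timestamps in milliseconds, sizes in bytes and 1 kB = 1024 bytes; the real script may use sliding windows or other units. top_windows and the Request record are hypothetical:

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Request(NamedTuple):
    # Hypothetical record: time in ms, size in bytes.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def top_windows(requests: List[Request], duration_min: float, top: int,
                by: str = "busiest", devno: Optional[int] = None) -> List[Tuple[float, float, float]]:
    """Rank fixed windows of duration_min minutes; return (start_min, end_min, value)."""
    if by not in ("busiest", "mostLoaded"):
        raise ValueError("by must be 'busiest' or 'mostLoaded'")
    window_ms = duration_min * 60_000
    n_requests: Counter = Counter()
    load_kb: Counter = Counter()
    for r in requests:
        if devno is not None and r.devno != devno:
            continue
        w = int(r.time_ms // window_ms)
        n_requests[w] += 1           # busiest: number of requests
        load_kb[w] += r.size / 1024  # most loaded: total request size in kB
    metric = n_requests if by == "busiest" else load_kb
    ranked = sorted(metric.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(w * duration_min, (w + 1) * duration_min, value) for w, value in ranked]


trace = [
    Request(0.0, 0, 0, 1024, True),
    Request(60_000.0, 0, 0, 1024, True),
    Request(3_600_000.0, 0, 0, 1_048_576, False),
]
print(top_windows(trace, 60, 3))
print(top_windows(trace, 60, 3, by="mostLoaded"))
```

In this example the first hour is the busiest (two requests), while the second hour is the most loaded (one 1024 kB request).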
7. Find the top large IOs. In this example:
Top 3 large IOs with size greater than or equal to 64kB on device 0, with a 1hr duration
python trace-editor.py -toplargeio -file <filename> -offset 64 -devno 0 -duration 60 -top 3
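A hypothetical sketch of the large-IO ranking, assuming -offset is the minimum request size in kB (as the example text suggests) and that time windows of the given duration are ranked by how many large requests hit the chosen device. top_large_io and the Request record are assumptions, not the script's code:

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Request(NamedTuple):
    # Hypothetical record: time in ms, size in bytes.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def top_large_io(requests: List[Request], min_kb: float, devno: int,
                 duration_min: float, top: int) -> List[Tuple[float, float, int]]:
    """Rank fixed windows by how many requests of at least min_kb kB hit one device."""
    window_ms = duration_min * 60_000
    counts = Counter(
        int(r.time_ms // window_ms)
        for r in requests
        if r.devno == devno and r.size >= min_kb * 1024
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(w * duration_min, (w + 1) * duration_min, n) for w, n in ranked]


KB = 1024
trace = [
    Request(0.0, 0, 0, 64 * KB, False),
    Request(1000.0, 0, 0, 128 * KB, True),
    Request(2000.0, 0, 0, 4 * KB, True),          # too small, ignored
    Request(3_600_000.0, 0, 0, 64 * KB, False),
    Request(3_600_000.0, 1, 0, 64 * KB, False),   # other device, ignored
]
print(top_large_io(trace, min_kb=64, devno=0, duration_min=60, top=3))
```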
8. Find the time range with the most random writes. In this example:
Find the 5-minute time ranges on device 5 with the most random writes (top 3)
python trace-editor.py -dir <dirname> -mostRandomWrite -duration 5 -devno 5 -top 3
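The script's definition of a random write is not given here. A common heuristic, used in this hypothetical sketch, treats a write as random when it does not start where the previous write on the same device ended. most_random_write, the heuristic and the Request record are assumptions:

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Request(NamedTuple):
    # Hypothetical record: time in ms, offset and size in bytes.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def most_random_write(requests: List[Request], duration_min: float, devno: int,
                      top: int) -> List[Tuple[float, float, int]]:
    """Rank fixed windows by the number of random writes on one device."""
    window_ms = duration_min * 60_000
    counts: Counter = Counter()
    next_sequential: Optional[int] = None  # offset where a sequential write would start
    for r in sorted(requests, key=lambda q: q.time_ms):
        if r.devno != devno or r.is_read:
            continue
        # The first write has no predecessor, so it is not counted as random.
        if next_sequential is not None and r.offset != next_sequential:
            counts[int(r.time_ms // window_ms)] += 1
        next_sequential = r.offset + r.size
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(w * duration_min, (w + 1) * duration_min, n) for w, n in ranked]


def write(t: float, offset: int) -> Request:
    return Request(t, 5, offset, 4096, False)


trace = [write(0, 0), write(1000, 4096), write(2000, 100_000),
         write(400_000, 0), write(401_000, 50_000)]
print(most_random_write(trace, duration_min=5, devno=5, top=3))
```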
9. Get characteristic info from a preprocessed trace (usually after you have cut the original preprocessed trace, for devno reasons). In this example:
You get whisker-plot-style statistics for write size, read size and time density, plus % write, % read and % random write.
python trace-editor.py -dir <dirname> -characteristic
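Whisker-plot info boils down to a five-number summary (min, Q1, median, Q3, max). The following is a hypothetical sketch of the characteristic report, assuming inter-arrival times stand in for "time density", % random write is measured over writes only, and a write is random when it does not start where the previous write ended. characteristic, five_number and the Request record are illustrative names, not the script's:

```python
import statistics
from typing import Dict, List, NamedTuple, Optional, Tuple


class Request(NamedTuple):
    # Hypothetical record: time in ms, offset and size in bytes.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def five_number(values: List[float]) -> Optional[Tuple[float, ...]]:
    """Min, Q1, median, Q3, max: the values a whisker (box) plot is drawn from."""
    if len(values) < 2:  # statistics.quantiles needs two points before Python 3.13
        return None
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def characteristic(requests: List[Request]) -> Dict[str, object]:
    reqs = sorted(requests, key=lambda q: q.time_ms)
    reads = [r.size for r in reqs if r.is_read]
    writes = [r for r in reqs if not r.is_read]
    random_writes, next_sequential = 0, None
    for w in writes:
        # Assumption: a write is random if it does not start where the previous write ended.
        if next_sequential is not None and w.offset != next_sequential:
            random_writes += 1
        next_sequential = w.offset + w.size
    gaps = [b.time_ms - a.time_ms for a, b in zip(reqs, reqs[1:])]  # inter-arrival times
    total = len(reqs)
    return {
        "write_size": five_number([w.size for w in writes]),
        "read_size": five_number(reads),
        "interarrival_ms": five_number(gaps),
        "pct_write": 100 * len(writes) / total if total else 0.0,
        "pct_read": 100 * len(reads) / total if total else 0.0,
        "pct_random_write": 100 * random_writes / len(writes) if writes else 0.0,
    }


trace = [
    Request(0.0, 0, 0, 4096, True),
    Request(1.0, 0, 0, 1024, False),
    Request(3.0, 0, 1024, 1024, False),  # sequential: starts where the last write ended
    Request(6.0, 0, 9999, 2048, False),  # random
    Request(10.0, 0, 0, 8192, True),
]
print(characteristic(trace))
```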
10. Cut a trace. In this example, keep only the time range between minute 5 and minute 10:
python trace-editor.py -cuttrace -file <filename> -timerange 5 10
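A minimal sketch of cutting a trace to a time range, assuming timestamps in milliseconds, a half-open range [start, end) and that the kept requests are shifted to start at zero. These choices, cut_trace and the Request record are assumptions rather than the script's documented behavior:

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    # Hypothetical record: arrival time in milliseconds.
    time_ms: float
    devno: int
    offset: int
    size: int
    is_read: bool


def cut_trace(requests: List[Request], start_min: float, end_min: float,
              rebase: bool = True) -> List[Request]:
    """Keep requests with start_min <= time < end_min (minutes); optionally shift times to start at 0."""
    lo, hi = start_min * 60_000, end_min * 60_000
    kept = [r for r in requests if lo <= r.time_ms < hi]
    if rebase:
        kept = [r._replace(time_ms=r.time_ms - lo) for r in kept]
    return kept


trace = [Request(t, 0, 0, 4096, True) for t in (0.0, 300_000.0, 450_000.0, 600_000.0)]
print([r.time_ms for r in cut_trace(trace, 5, 10)])
```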