Refactor count summary for flexibility + more tests #100

Merged: 8 commits, Apr 5, 2016

Conversation

olorin commented Apr 2, 2016

Builds on top of #99.

  • Add a "summarize" step for the row counts, so that additional transformations can be applied to the final accumulated state before it is serialized (no schema changes in this PR).
  • Add associativity/commutativity properties for the operations where those laws should hold.
  • Add a benchmark for the numeric code.
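
To illustrate the shape of the "summarize" step, here is a minimal sketch. It is not the code from this PR: `ParseState`, `CountSummary`, `updateState`, `combineState` and `summarize` are hypothetical names, and the state (a map from row width to row count) is an assumption chosen only to keep the example small. The point is that the fold accumulates one type, and a separate pure function turns the final state into the value that gets serialized.

```haskell
module Main where

import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical accumulator: number of fields in a row -> rows seen with that width.
newtype ParseState = ParseState (M.Map Int Int)
  deriving (Eq, Show)

-- Hypothetical summary derived from the final state; this is what would be serialized.
data CountSummary = CountSummary
  { totalRows      :: Int
  , distinctWidths :: Int
  } deriving (Eq, Show)

emptyState :: ParseState
emptyState = ParseState M.empty

-- Fold step: record the width of one row.
updateState :: ParseState -> [String] -> ParseState
updateState (ParseState m) row =
  ParseState (M.insertWith (+) (length row) 1 m)

-- Merge two partial states. Because (+) on Int is associative and
-- commutative, so is this merge.
combineState :: ParseState -> ParseState -> ParseState
combineState (ParseState a) (ParseState b) =
  ParseState (M.unionWith (+) a b)

-- The "summarize" step: a pure transformation of the final state,
-- applied once after all input has been folded.
summarize :: ParseState -> CountSummary
summarize (ParseState m) = CountSummary
  { totalRows      = sum (M.elems m)
  , distinctWidths = M.size m
  }

main :: IO ()
main = do
  let rows1 = [["a", "b"], ["c", "d", "e"]]
      rows2 = [["f", "g"]]
      s1    = foldl' updateState emptyState rows1
      s2    = foldl' updateState emptyState rows2
  -- Summary of the merged state.
  print (summarize (combineState s1 s2))
  -- Commutativity check on this one input.
  print (combineState s1 s2 == combineState s2 s1)
```

Keeping `summarize` separate from the fold means the accumulator can later carry extra intermediate data without changing the serialized type.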

olorin commented Apr 2, 2016

Benchmark bench: RUNNING...
benchmarking decoding/decode/conduit+attoparsec-bytestring/1000
time                 285.6 ms   (282.3 ms .. 289.9 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 291.5 ms   (289.9 ms .. 293.6 ms)
std dev              2.002 ms   (503.9 μs .. 2.607 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking field-parsing/parseField/200
time                 83.93 μs   (82.00 μs .. 85.31 μs)
                     0.993 R²   (0.988 R² .. 0.996 R²)
mean                 76.76 μs   (74.37 μs .. 79.26 μs)
std dev              8.397 μs   (7.690 μs .. 9.689 μs)
variance introduced by outliers: 85% (severely inflated)

benchmarking folds/updateSVParseState/1000
time                 154.7 ms   (146.7 ms .. 162.8 ms)
                     0.997 R²   (0.994 R² .. 1.000 R²)
mean                 143.2 ms   (136.5 ms .. 147.9 ms)
std dev              7.793 ms   (5.631 ms .. 10.56 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking folds/hashText/1000
time                 112.1 μs   (97.38 μs .. 124.4 μs)
                     0.955 R²   (0.946 R² .. 0.994 R²)
mean                 107.8 μs   (101.5 μs .. 122.3 μs)
std dev              29.12 μs   (14.35 μs .. 52.78 μs)
variance introduced by outliers: 97% (severely inflated)

benchmarking folds/updateTextCounts/1000
time                 26.25 ms   (21.11 ms .. 31.49 ms)
                     0.911 R²   (0.810 R² .. 0.973 R²)
mean                 32.98 ms   (30.82 ms .. 35.79 ms)
std dev              5.653 ms   (3.914 ms .. 7.671 ms)
variance introduced by outliers: 69% (severely inflated)

benchmarking numerics/updateNumericState/10000
time                 6.135 ms   (5.956 ms .. 6.279 ms)
                     0.992 R²   (0.987 R² .. 0.996 R²)
mean                 6.669 ms   (6.521 ms .. 6.961 ms)
std dev              592.5 μs   (469.3 μs .. 738.9 μs)
variance introduced by outliers: 52% (severely inflated)

benchmarking numerics/combineMeanDevAcc/10000
time                 391.8 μs   (375.0 μs .. 410.0 μs)
                     0.987 R²   (0.978 R² .. 0.996 R²)
mean                 365.2 μs   (358.1 μs .. 376.5 μs)
std dev              29.51 μs   (17.76 μs .. 43.37 μs)
variance introduced by outliers: 69% (severely inflated)

benchmarking numerics/combineNumericState/10000
time                 483.8 μs   (464.9 μs .. 508.1 μs)
                     0.988 R²   (0.982 R² .. 0.994 R²)
mean                 500.3 μs   (486.8 μs .. 512.8 μs)
std dev              44.17 μs   (35.98 μs .. 51.68 μs)
variance introduced by outliers: 72% (severely inflated)

Benchmark bench: FINISH

charleso commented Apr 4, 2016

I'm assuming SVParseState can't just be renamed/re-used as RowCountSummary as they will diverge at some point?

olorin commented Apr 4, 2016

Yep, see PR #102 - the numeric state will be added, and needs to be finalized before it's serialized.
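
As a sketch of why a numeric state would need finalizing: a running mean/deviation accumulator typically stores intermediate quantities (count, mean, sum of squared deviations) that merge cleanly, while the value worth serializing is derived from them at the end. The code below is an assumption about the general technique (Welford's online update plus the pairwise merge of Chan et al.), not the implementation in #102; `MeanDevAcc`, `updateAcc`, `combineAcc` and `finalizeAcc` are hypothetical names.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical accumulator: count, running mean, sum of squared deviations (M2).
data MeanDevAcc = MeanDevAcc !Int !Double !Double
  deriving (Eq, Show)

emptyAcc :: MeanDevAcc
emptyAcc = MeanDevAcc 0 0 0

-- Welford's online update for a single observation.
updateAcc :: MeanDevAcc -> Double -> MeanDevAcc
updateAcc (MeanDevAcc n mu m2) x =
  let n'    = n + 1
      delta = x - mu
      mu'   = mu + delta / fromIntegral n'
      m2'   = m2 + delta * (x - mu')
  in MeanDevAcc n' mu' m2'

-- Pairwise merge of two partial accumulators (Chan et al.).
combineAcc :: MeanDevAcc -> MeanDevAcc -> MeanDevAcc
combineAcc a@(MeanDevAcc na ma m2a) b@(MeanDevAcc nb mb m2b)
  | na == 0   = b
  | nb == 0   = a
  | otherwise =
      let n     = na + nb
          fa    = fromIntegral na
          fb    = fromIntegral nb
          fn    = fromIntegral n
          delta = mb - ma
          mu    = ma + delta * fb / fn
          m2    = m2a + m2b + delta * delta * fa * fb / fn
      in MeanDevAcc n mu m2

-- Finalization: derive (mean, population standard deviation).
-- Returns Nothing when no observations were seen.
finalizeAcc :: MeanDevAcc -> Maybe (Double, Double)
finalizeAcc (MeanDevAcc n mu m2)
  | n == 0    = Nothing
  | otherwise = Just (mu, sqrt (m2 / fromIntegral n))

main :: IO ()
main = do
  let xs       = [1 .. 10] :: [Double]
      (ys, zs) = splitAt 4 xs
      whole    = foldl' updateAcc emptyAcc xs
      merged   = combineAcc (foldl' updateAcc emptyAcc ys)
                            (foldl' updateAcc emptyAcc zs)
  -- Both routes should agree up to floating-point rounding.
  print (finalizeAcc whole)
  print (finalizeAcc merged)
```

With `Double` arithmetic the merge is associative and commutative only up to rounding error, so property tests for it would normally compare with a tolerance rather than with `==`.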

charleso commented Apr 4, 2016

👍 for everything not in #99

olorin force-pushed the topic/count-summary branch from b6c4408 to 9d683a7 (April 5, 2016 00:51)
olorin force-pushed the topic/count-summary branch from 9d683a7 to f106094 (April 5, 2016 00:57)
olorin merged commit 94216ad into master (Apr 5, 2016)
olorin deleted the topic/count-summary branch (April 5, 2016 01:22)