This repository has been archived by the owner on Jul 7, 2020. It is now read-only.

HLL++ in sparse mode can be larger than in normal mode #38

Closed
cykl opened this issue Jul 2, 2013 · 25 comments

@cykl
Contributor

cykl commented Jul 2, 2013

Today I played a little bit with HLL++ in sparse mode. I have tens of millions of HLL estimators, and most of them have a low cardinality. Using HLL or HLL++ in normal mode is not memory efficient for this use case. To be picky, I don't care that much about memory consumption; I am trying to minimize the serialized size of the estimators.

The whole idea behind the sparse mode is to not waste memory on the normal representation when we can do better for small cardinalities. It sounds reasonable to switch back to the normal representation as soon as the sparse mode consumes more memory.

However, I don't observe such behavior with HyperLogLogPlus:

HyperLogLog:

        HyperLogLog hll = new HyperLogLog(14);
        System.out.println(hll.getBytes().length);

=> 10932

HyperLogLogPlus in normal mode:

        HyperLogLogPlus hllp = new HyperLogLogPlus(14);
        System.out.println(hllp.getBytes().length);

=> 10940 

Empty HyperLogLogPlus in sparse mode:

        HyperLogLogPlus hllp = new HyperLogLogPlus(14, 14);
        System.out.println(hllp.getBytes().length);

=> 16

5K elements with HyperLogLogPlus in sparse mode:

        Random r = new Random();

        HyperLogLogPlus hllp = new HyperLogLogPlus(14, 14);

        for (int i = 0; i < 5000; i++) {
            hllp.offer(Integer.toString(r.nextInt()));
        }

        System.out.println(hllp.getBytes().length);

=> 25495

According to the source code, the sparseSetThreshold depends only on p and is set to 12288 for p = 14. It means that if the set contains 12000 elements, I'm wasting almost 40 KB compared to the normal representation.
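
For reference, my reading of the code is that the threshold is derived from p alone, roughly as below (the 0.75 factor is my inference from the 12288 value, so treat it as an assumption):

    // Hypothetical reconstruction: with m = 2^p registers,
    // 12288 for p = 14 matches 0.75 * m.
    int p = 14;
    int m = 1 << p;                            // 16384 registers
    int sparseSetThreshold = (int) (0.75 * m); // 12288 when p = 14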

Am I wrong? Is this behavior expected?

My second question would be: do we really want to create the RegisterSet even when we are in sparse mode? It defeats the goal of being memory efficient. It currently does not matter for me, since my bottleneck is the serialized size, but I find this quite surprising.

@abramsm
Contributor

abramsm commented Jul 2, 2013

I think you are right. We've struggled a bit with this threshold value. I need to re-review the paper and our implementation to see why the threshold is not working as intended. Thanks for pointing this out. I'll follow up once I've had a chance to investigate further.

Matt


@cykl
Contributor Author

cykl commented Jul 2, 2013

Unfortunately I think I will be too busy in the coming weeks to help you fix this one, and my current knowledge of this area of HLL++ is quite limited. However, I will try to give it a look.

I also noticed that the two threshold values are "added": the sort threshold is added to the sparseSetThreshold, postponing the switch to the normal representation even further.

What do you think about the lazy RegisterSet creation? Wouldn't it be better and meet the goal of the sparse mode? This one is quite easy to fix and does not require deep knowledge of the paper.

@abramsm
Contributor

abramsm commented Jul 5, 2013

I spent some time investigating this, and the issue is how we store the bytes on disk. We use a variable-length integer in the sparse set. When those variable-length integers are stored on disk, we write the number of bytes used by each varint using a normal Java int, and that has a very bad impact on the size of the serialized estimator. Using your example, if we remove the marker that tracks the length of each varint, the size on disk goes from 25K down to 8K. If I use a variable-length integer to store the length of each varint in the sparse array, the serialized size of an estimator with 5K elements is 12.4K.

12.4K is still about 1.5K bigger than a comparable normal HLLplus serialized object, but it does provide the benefit of being more precise. Do you have any other suggestions for how we might represent the sparse set on disk more efficiently? Making this change will be binary incompatible with HLLplus instances serialized with the algorithm currently in master. Not much we can do about that now, but I'm thinking of adding a version number to the serialized instance so we have more flexibility in the future.

While investigating this I also noticed that we were undercounting the sparseSet size, so the conversion to the normal representation was not happening early enough. I'll fix that issue as well when I commit this change.

Also, your point about creating the RegisterSet even when we are in sparse mode is a good one. I'll work on instantiating it on demand to save memory when it isn't in use.

Matt


@abramsm
Contributor

abramsm commented Jul 5, 2013

Alright, good news. I've modified the code so that we can read and write sparse representations with no delimiter. This means the sparse set with 5K elements is down to 8K.

Further good news: I realized that we can use the varint encoding for encoding/decoding normal representations as well. This brings the encoded size of a normal serialized HLL++ object with p=14 to 7K (down from 10K).

This will certainly break binary compatibility. I'm thinking of cutting the 2.4.0 release and then moving to a 3.x snapshot for this change.

@cykl
Contributor Author

cykl commented Jul 8, 2013

Thanks for your work. I am definitely not fluent with the sparse mode, and I will most likely say silly things; please correct me when I am wrong.

We use a variable-length integer in the sparse set. When those variable-length integers are stored on disk, we write the number of bytes used by each varint using a normal Java int, and that has a very bad impact on the size of the serialized estimator.

Doesn't it have exactly the same impact on the memory footprint? sparseSet is a list of byte[], and getBytes only writes the size of each buffer followed by its payload. Memory usage should be greater than disk usage, since the JVM has to store exactly the same payload plus the overhead of the list.

Do you have any other suggestions for how we might represent the sparse set on disk more efficiently?

I currently prefer to not be that adventurous ;)

Further good news: I realized that we can use the varint encoding for encoding/decoding normal representations as well. This brings the encoded size of a normal serialized HLL++ object with p=14 to 7K (down from 10K).

Doesn't it introduce a regression for high-cardinality or pathological streams? Don't we get a similar compression factor using Snappy or gzip? Lots of zeros in a row usually compress quite well; most likely better than varint encoding of an integer with only bits 25 to 30 set.

I think that if the ratio is the same or even slightly better, then it is better to keep the dumb version. The size of the serialized object is highly predictable, with an easy-to-check upper bound, and users are free to use the codec they want. Do they want to be fast or size efficient? Let them decide.

This will certainly break binary compatibility. I'm thinking of cutting the 2.4.0 release and then moving to a 3.x snapshot for this change.

Sounds very reasonable to me.

For 2.4.0 we could:

  • Compute a better threshold value with this serialization format
  • Detect in getBytes when the sparse mode overhead is too high and switch to the normal mode (either really switching, or only for the data returned by getBytes). It would cheaply fix the issue while remaining backward compatible.

Personally, I had no alternative but to hard-code a threshold value, since we always use the same log2m value. Either of the two previous solutions would be better. If you want, I can try to work on it.

Your suggestion regarding the sparse mode serialization in 3.0.0 sounds fine to me. I will try to see if I can find something more efficient. Is your code available somewhere?

@abramsm
Contributor

abramsm commented Jul 10, 2013

Doesn't it have exactly the same impact on the memory footprint? sparseSet is a list of byte[], and getBytes only writes the size of each buffer followed by its payload.

This has no impact on memory usage, just the number of bytes required to represent the data structure on disk. Previously, serialization of a sparse HLL would look like:

[4 bytes for length of varint][n bytes for varint]...

The 4 bytes for the length of each varint were not necessary and can be removed entirely. So in the new encoding the space savings will be 4 bytes times the number of elements in the sparse set.
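
To make the difference concrete, here is a sketch of the two write paths (method and variable names are illustrative, not the actual stream-lib code):

    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.List;

    // Old format: a 4-byte length prefix before every varint.
    static void writeSparseOld(DataOutput out, List<byte[]> sparseSet) throws IOException {
        for (byte[] varint : sparseSet) {
            out.writeInt(varint.length); // 4 bytes of pure overhead per entry
            out.write(varint);
        }
    }

    // New format: varints written back to back. They are self-delimiting,
    // so the decoder can recover entry boundaries without any prefix.
    static void writeSparseNew(DataOutput out, List<byte[]> sparseSet) throws IOException {
        for (byte[] varint : sparseSet) {
            out.write(varint);
        }
    }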

Doesn't it introduce a regression for high-cardinality or pathological streams? Don't we get a similar compression factor using Snappy or gzip? [...] I think that if the ratio is the same or even slightly better, then it is better to keep the dumb version.

Fair point; it may not be worth including the change for the normal representation.

For 2.4.0 we could: compute a better threshold value with this serialization format, or detect in getBytes when the sparse mode overhead is too high and switch to the normal mode.
I don't think we can/should include the second option in the 2.4.0 release. For some people accuracy is more important than space efficiency, so if all of a sudden their previously very accurate sparse representations became less accurate normal representations, it could cause problems.

I've thought about this a bit further, and I think we may be able to do this in a backwards-compatible way. If we include a value at the beginning of the encoding that would have been illegal in the previous encoding, then we can switch between the new and old decoding schemes depending on the type detected. The code would always use the new encoding scheme, but it would at least be able to read HLLs encoded with the older format.

Your suggestion regarding the sparse mode serialization in 3.0.0 sounds fine to me. [...] Is your code available somewhere?

My code is WIP, hope to finish it this week or next. It is available in
this branch:
https://github.com/clearspring/stream-lib/tree/serialization_improvements

Matt



@abramsm
Contributor

abramsm commented Jul 10, 2013

I've pushed one more update to the serialization branch. Changes are:

-> do not use varint for encoding normal register sets
-> add version number to the first 4 bytes of the encoded values
-> while decoding, if the version is not present, fall back to legacy decoding

The net result is that we should be able to deploy the new code in 2.4.0x without breaking existing systems. They will transparently get the new encoding format: after HLL+ objects serialized with older code are decoded and re-encoded using the 2.4.0x branch, the re-encoded size should be much smaller if the HLL+ was in SPARSE mode.

It would be great if you could test this branch and confirm performance/correctness. I'll move it to master after you've had a chance to test.

Thanks,
Matt

@cykl
Contributor Author

cykl commented Jul 10, 2013

This has no impact on memory usage, just the number of bytes required to represent the data structure on disk.

It definitely has an impact on memory usage. The memory overhead of each byte[] is 12 bytes (8 bytes of object header, 4 bytes for the array length). An int is 4 bytes, and we are trying to shrink it to 1, 2 or 3 bytes for small values. Game over: that is not doable in Java, nor in other languages once you consider memory alignment, except inside one large byte stream like the one getBytes produces.
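
Back-of-the-envelope numbers (assuming a 64-bit HotSpot JVM with compressed oops; exact figures vary by JVM):

    // Cost of one small varint stored as its own byte[] inside a List:
    //   12 bytes    byte[] header (8-byte object header + 4-byte length field)
    //  1-3 bytes    payload, padded up to an 8-byte boundary => 16 bytes total
    //    4 bytes    reference held by the list's backing array
    // => ~20 bytes per logical int, versus a flat 4 bytes per entry in an int[].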

        HyperLogLogPlus hllp = new HyperLogLogPlus(14, 14);
        for (int i = 0; i < 10_000; i++) {
            hllp.offer(i);
        }

        System.out.println("Size: " + hllp.getBytes().length);

It prints 43587, but the retained size of sparseSet is 216,440 bytes and tmpSet is 18,472 bytes. The memory footprint is 6 times greater than the serialized buffer and 21 times greater than the registerSet in normal mode.

I'm not sure I fully understand all the code, but given these facts I do believe that deltaAdd only leads to increased memory consumption. Wouldn't a plain old array of ints be more memory efficient?

I don't think we can/should include the second bullet in the 2.4.0 release.

Fine for me.

I've thought about this a bit further and I think we may be able to do this in a backwards compatible way

Good idea. We currently only use the first bit of the first byte, so there is plenty of unused space to encode a version or whatever.

My code is WIP, hope to finish it this week or next. It is available in this branch

Cool! I will try to take a look at it.

@cykl
Contributor Author

cykl commented Jul 10, 2013

It would be great if you could test this branch and confirm performance/correctness. I'll move it to master after you've had a chance to test.

I can merge your branch into mine without "polluting" master, if you prefer. I will keep you informed about the outcome of my tests (ETA Friday or Monday).

@abramsm
Contributor

abramsm commented Jul 12, 2013

I misunderstood your comment re memory; I thought you were referring just to the impact of serialization on memory footprint. I think you are probably correct. If we stick with an int array for the in-memory representation it should produce a better result. When we encode/decode the array, we can convert it to a delta-encoded varint byte stream for an efficient serialized representation. I'll take a crack at these changes soon.

@cykl
Contributor Author

cykl commented Jul 12, 2013

I'm currently playing with your branch and testing the byte[] -> int switch. It seems doable; my branch passes the test suite. It still uses a List, which is a huge memory waste due to autoboxing. Since we always add to the end of the list, using an array of ints is an option and would make the sparse mode memory efficient.

BTW: perhaps we could base this work on pull request #34 (even if we keep the IOException)?

Regarding the varint serialization, we should be able to avoid the extra byte added before each varint to indicate its size; that information is usually piggybacked. If you use the same encoding as protocol buffers, you can detect the last byte using the most significant bit of each byte: if it is set to 1 you must read the next byte, otherwise you have reached the end of the varint.
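
For illustration, a minimal protobuf-style varint codec using that continuation bit (a sketch assuming non-negative values and well-formed input, not the actual stream-lib code):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;

    // Encode: emit 7 bits per byte, setting the high bit while more bytes follow.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // continuation bit set: more to come
            value >>>= 7;
        }
        out.write(value); // final byte: continuation bit clear
    }

    // Decode: accumulate 7-bit groups until a byte with the high bit clear.
    static int readVarInt(ByteArrayInputStream in) {
        int value = 0;
        int shift = 0;
        int b;
        while (((b = in.read()) & 0x80) != 0) {
            value |= (b & 0x7F) << shift;
            shift += 7;
        }
        return value | (b << shift);
    }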

Let me know if you want me to work on any of these topics.

@abramsm
Contributor

abramsm commented Jul 12, 2013

Yes, I've been working on the same thing. I've replaced the List(s) with int arrays and moved the register set to lazy instantiation.

Now:

HyperLogLogPlus hllp = new HyperLogLogPlus(14, 14);
for (int i = 0; i < 10000; i++) {
    hllp.offer(i);
}

System.out.println("Size: " + hllp.getBytes().length);

Prints out 22316, and the retained size of the entire HyperLogLogPlus object is 42,232 bytes. The sparse set retains 29,840 and the tmpSet retains 12,312. Nothing else in the object consumes significant resources.

All tests pass locally for me, but this does introduce some more complexity when it comes to the thread safety of the class (lazy instantiation of the RegisterSet, for example).

I think I can squeeze a bit more efficiency out of the serialized representation of the sparse set by using delta encoding (currently I'm just using varints).
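
For concreteness, a sketch of that idea (assuming the sparse set is a sorted int[] filled up to size, and reusing the writeVarInt helper sketched earlier in the thread; names are illustrative):

    // Since the set is sorted, store the gap to the previous entry instead of
    // the absolute value; small gaps then fit in one or two varint bytes.
    static void writeDeltaEncoded(ByteArrayOutputStream out, int[] sortedSet, int size) {
        int previous = 0;
        for (int i = 0; i < size; i++) {
            writeVarInt(out, sortedSet[i] - previous); // gap, not absolute value
            previous = sortedSet[i];
        }
    }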

I will try to work on that and backwards compatibility over the weekend. Let me get this branch in a good place and then we can decide what to do with #34.

Matt

@cburroughs
Contributor

If we stick with an int array for the in-memory representation it should produce a better result.

Wouldn't a long[] be more efficient than int[]?

@cykl
Contributor Author

cykl commented Jul 13, 2013

If we don't need to put longs inside it, then no. An array of a primitive type has no per-item memory overhead. Using longs to store the data would make the code more complex or waste memory (since the current implementation uses a list of varints to store the delta-encoded values, I assume we only have to store ints).

@abramsm
Contributor

abramsm commented Jul 14, 2013

I've pushed another update to the https://github.com/clearspring/stream-lib/tree/serialization_improvements branch. I believe this version is still backwards compatible with the 2.4.x branches, but I could use some help testing.

These changes improve memory efficiency by using a plain int array for storing the sparse set in memory (thanks to Clément for the tip on that). I've also modified the serialization code for sparse sets to use a delta-encoded list rather than standard varints. The result is that persisting the HLLPlus instance with 10K objects now takes 13761 bytes, down from 43587 bytes with the serialization approach currently in master.

One side effect of this change is that the HLLPlus instance is even less thread-safe than it was before. To make it thread-safe we'll need to add some concurrency protection to the indexes that track our current position in the tmpSet and sparseSet as the instance is mutated.

I have some local code, not yet pushed to git, that uses lazy instantiation for the register set. This reduces the memory footprint further (and adds even more thread-safety issues). I'll probably commit that code to a different branch soon for consideration.

Clément - when you have a chance can you run this against your benchmark suite? I'm unsure of the impact of these changes on runtime performance. Also if you see any backwards compatibility issues please let me know.

@cykl
Contributor Author

cykl commented Jul 15, 2013

I should be able to do it today.

Did you look at the extra prefix byte for varint encoding, or do you want me to test it?

@abramsm
Contributor

abramsm commented Jul 15, 2013

Thanks. The branch you are testing has the extra prefix byte removed. I've added a test that checks that the code in this branch can decode instances encoded with the branch in master, but it would be nice to have further validation that there are no compatibility issues.

@cykl
Contributor Author

cykl commented Jul 15, 2013

I wrote a comment several hours ago, but it seems I forgot to send it :/

I tested your branch today. Unfortunately, I found regressions in the sparse mode running this very simple test:

hllp.offer("foo");
hllp.cardinality() == 1 // Returns 2

Attaching a debugger showed that at least the sort method returns bogus results. The sparse array should contain only one element, not two (we get [XXXXX, 0]). This is due to bad usage of array iteration and/or the length attribute: the fill index must always be used to ensure we are not reading garbage. I believe this error has been made in several places.
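
The general shape of the bug, as a hypothetical illustration (the names and the process call are made up):

    int[] sparse = new int[32]; // over-allocated backing array
    int count = 1;              // only the first `count` slots are valid
    sparse[0] = 42;

    // Wrong: iterating to sparse.length picks up the zeroed, unused slots.
    for (int i = 0; i < sparse.length; i++) {
        process(sparse[i]); // sees [42, 0, 0, ...]
    }

    // Right: stop at the fill index.
    for (int i = 0; i < count; i++) {
        process(sparse[i]); // sees only [42]
    }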

I stopped my tests here.

IMHO, we should try to move toward a more object-oriented design. It could lead to simpler and more robust code.

@abramsm
Contributor

abramsm commented Jul 15, 2013

Not sure what happened there; not a great commit. I've fixed the issue, and now all tests, including a new 'testOne', pass. Patches are welcome for changing the design. e4466d8

@cykl
Contributor Author

cykl commented Jul 16, 2013

I will give it another try today and benchmark it if it passes the test suite. Thanks.

@cykl
Contributor Author

cykl commented Jul 16, 2013

Found another regression due to this change:

     public int sizeof()
     {
-        return registerSet.size * 4;
+        switch (format)
+        {
+            case NORMAL:
+                return registerSet.size * 4;
+            case SPARSE:
+                return getSparseSetSize();
+            default:
+                throw new RuntimeException("unexpected format type: " + format);
+        }
     }

sizeof is used to decide whether two estimators are compatible.

@abramsm
Contributor

abramsm commented Jul 16, 2013

Reverted that change. It was there for experiments with lazy init of registerSet.


@cykl
Contributor Author

cykl commented Jul 17, 2013

Just ran the benchmarks, master vs. your branch. Same performance for normal mode; sparse mode is a little bit faster.

Results follow:

Master

Benchmark                                  Mode Thr    Cnt  Sec         Mean   Mean error    Units
i.u.s.b.HllpCardMicro.empty               thrpt   1      3    5 26426086.011   470387.506  ops/sec
i.u.s.b.HllpCardMicro.large               thrpt   1      3    5    12770.094       75.797  ops/sec
i.u.s.b.HllpCardMicro.medium              thrpt   1      3    5 26191455.017    80573.629  ops/sec
i.u.s.b.HllpCardMicro.scalability4        thrpt   4      1    5    47245.100          NaN  ops/sec
i.u.s.b.HllpMergeMicro.dumb               thrpt   1      3    5    61411.017     2096.195  ops/sec
i.u.s.b.HllpMergeMicro.evolving           thrpt   1      3    5    80633.800     5730.843  ops/sec
i.u.s.b.HllpOfferMicro.simple             thrpt   1      3    5  5641189.672   103528.767  ops/sec
i.u.s.b.HllpMergeMacro.merge1GB            avgt   1      3    5        9.226        0.075   sec/op
i.u.s.b.HllpMergeMacro.mergeFixedCount     avgt   1      3    5        3.553        0.115   sec/op

Branch

Benchmark                                  Mode Thr    Cnt  Sec         Mean   Mean error    Units
i.u.s.b.HllpCardMicro.empty               thrpt   1      3    5 24918286.061   236791.845  ops/sec
i.u.s.b.HllpCardMicro.large               thrpt   1      3    5    12526.267       26.491  ops/sec
i.u.s.b.HllpCardMicro.medium              thrpt   1      3    5 26038206.506   160278.228  ops/sec
i.u.s.b.HllpCardMicro.scalability4        thrpt   4      3    5    47460.822       64.950  ops/sec
i.u.s.b.HllpCardMicro.scalabilityMax      thrpt   4      3    5    46252.589       19.677  ops/sec
i.u.s.b.HllpMergeMicro.dumb               thrpt   1      3    5     2452.961       41.900  ops/sec
i.u.s.b.HllpMergeMicro.evolving           thrpt   1      3    5    79574.922      865.980  ops/sec
i.u.s.b.HllpMergeMicro.scalability4       thrpt   4      3    5   232855.978     1971.300  ops/sec
i.u.s.b.HllpOfferMicro.scalability4       thrpt   4      3    5 18184529.467   279857.791  ops/sec
i.u.s.b.HllpOfferMicro.scalabilityMax     thrpt   4      3    5 18146584.033   222648.258  ops/sec
i.u.s.b.HllpOfferMicro.simple             thrpt   1      3    5  5522740.300   337939.893  ops/sec
i.u.s.b.HllpMergeMacro.merge1GB            avgt   1      3    5        6.497        0.012   sec/op
i.u.s.b.HllpMergeMacro.mergeFixedCount     avgt   1      3    5        1.558        0.016   sec/op

You can observe that the merge macro benchmark is now 30% faster. Half of the merged HLLs should be in sparse mode (max cardinality is 80K).

I also tried your branch with one of our jobs. It seems fine, but I need more time to confirm that everything works. Perhaps somebody else could try it as well?

@abramsm
Contributor

abramsm commented Jul 17, 2013

Thanks for the feedback, and glad to see positive results. One of our engineers is going to run a similar test suite. We'll update the thread once that is done. At that point I'd be comfortable merging this into master.

Matt


@abramsm
Contributor

abramsm commented Jul 30, 2013

These changes have been merged into master. Thanks to @cykl for helping so much with this patch.

abramsm closed this as completed Jul 30, 2013