This repository has been archived by the owner on Aug 23, 2020. It is now read-only.

Create a tool that counts the size of each column family #936

Closed
2 of 4 tasks
jakubcech opened this issue Aug 13, 2018 · 6 comments · May be fixed by #1705

Comments


jakubcech commented Aug 13, 2018

Description

We want to figure out the byte size of the data in each of the database column families. The result will be a tool that performs the calculation. Run it on the latest DB.

Is there an existing implementation that we can use? KeyLoad

Motivation

This will help us understand the priority of the other issues (merging columns).

Requirements

  • Breakdown of the database by column family, in percent
  • Length of the largest entry in each column
  • Length of the smallest entry in each column
  • Any other interesting statistics earn bonus points for the assignee
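
The requirements above amount to a small reporting step once per-column byte totals have been collected. A minimal, stdlib-only Java sketch (class and method names are hypothetical, and the input sizes are made up for illustration):

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LongSummaryStatistics;
import java.util.Map;

public class ColumnFamilyReport {

    // Share of the total DB size per column family, in percent.
    public static Map<String, Double> percentages(Map<String, Long> bytesPerColumn) {
        double total = bytesPerColumn.values().stream().mapToLong(Long::longValue).sum();
        Map<String, Double> out = new LinkedHashMap<>();
        bytesPerColumn.forEach((name, bytes) -> out.put(name, bytes / total * 100.0));
        return out;
    }

    // Min/max/average entry length within one column family.
    public static LongSummaryStatistics entryStats(Collection<Long> entryLengths) {
        return entryLengths.stream().mapToLong(Long::longValue).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical per-column byte totals.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("transactionBytes", 8_462L);
        sizes.put("addressBytes", 225L);
        sizes.put("approveeBytes", 549L);
        percentages(sizes).forEach((name, pct) ->
                System.out.printf("%s: %.2f %%%n", name, pct));
    }
}
```

Feeding this the real totals from a DB scan would produce the percent breakdown and min/max entry lengths the requirements ask for.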
@jakubcech jakubcech added this to the Raanana milestone Aug 13, 2018

alon-e commented Aug 14, 2018

IOTA Related information:
https://iota.stackexchange.com/questions/798/how-to-read-the-tangle-database
https://pypi.org/project/python-rocksdb-iota/

GUI tools to manage RocksDB:

  • Keylord (v5.0+)
  • FastoNoSQL


alon-e commented Aug 16, 2018

First Results:
on a Mainnet DB: https://dbfiles.iota.org/mainnet/1.5.3/db-667848.tar

Each Column Family size in Percent

08/16 14:44:45.442 [main] INFO  c.i.i.c.TransactionViewModelTest - ----------------------------
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - transactionBytes: 84.62 % (max values: 1)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - addressBytes: 2.25 % (max values: 4657237)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - tagBytes: 2.12 %  (max values: 5377047)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - approveeBytes: 5.49 %  (max values: 44447)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - bundleBytes: 5.49 %  (max values: 44447)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - milestoneBytes: 0.01 % (max values: 1)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - stateDiffBytes: 0.02 % (max values: 792)
08/16 14:44:45.443 [main] INFO  c.i.i.c.TransactionViewModelTest - Total (uncompressed): 58.18 GB


alon-e commented Aug 16, 2018

Crude code for future use:

    @Test
    public void profileDBSectionsSize() throws Exception {
        // Assumes IRI's RocksDBPersistenceProvider, Tangle, the *ViewModel
        // classes, and an SLF4J logger named "log" are imported.

        RocksDBPersistenceProvider localRocksDBPP = new RocksDBPersistenceProvider("mainnetdb",
                "mainnetdb.log",1000);
        Tangle localTangle = new Tangle();
        localTangle.addPersistenceProvider(localRocksDBPP);
        localTangle.init();

        long counter;
        long size;
        List<Long> sizes = new LinkedList<>();

        //scan the whole DB to get the size of all the components:
        TransactionViewModel tx = TransactionViewModel.first(localTangle);
        counter = 0;
        while (tx != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} Transactions", counter);
            }
            tx = tx.next(localTangle);
        }
        // Key + Value + Metadata
        sizes.add(counter * (Hash.SIZE_IN_BYTES + Transaction.SIZE + 450));

        AddressViewModel add = AddressViewModel.first(localTangle);
        counter = 0;
        size = 0;
        while (add != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} Addresses", counter);
            }
            size += add.size();
            add = add.next(localTangle);
        }
        // Key + # of entries in each value ( + delimiter)
        sizes.add(counter * Hash.SIZE_IN_BYTES + size * (Hash.SIZE_IN_BYTES + 1));


        TagViewModel tag = TagViewModel.first(localTangle);
        counter = 0;
        size = 0;
        while (tag != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} Tags", counter);
            }
            size += tag.size();
            tag = tag.next(localTangle);
        }
        // Key + # of entries in each value ( + delimiter)
        sizes.add(counter * Hash.SIZE_IN_BYTES + size * (Hash.SIZE_IN_BYTES + 1));


        ApproveeViewModel approvee = ApproveeViewModel.first(localTangle);
        counter = 0;
        size = 0;
        while (approvee != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} Approvees", counter);
            }
            size += approvee.size();
            approvee = approvee.next(localTangle);
        }
        // Key + # of entries in each value ( + delimiter)
        sizes.add(counter * Hash.SIZE_IN_BYTES + size * (Hash.SIZE_IN_BYTES + 1));


        BundleViewModel bundle = BundleViewModel.first(localTangle);
        counter = 0;
        size = 0;
        while (bundle != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} Bundles", counter);
            }
            size += bundle.size();
            bundle = bundle.next(localTangle);
        }
        // Key + # of entries in each value ( + delimiter)
        sizes.add(counter * Hash.SIZE_IN_BYTES + size * (Hash.SIZE_IN_BYTES + 1));


        MilestoneViewModel milestone = MilestoneViewModel.first(localTangle);
        counter = 0;
        while (milestone != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} Milestones", counter);
            }
            milestone = milestone.next(localTangle);
        }
        sizes.add(counter * (Integer.BYTES + Hash.SIZE_IN_BYTES));


        MilestoneViewModel milestoneForStateDiff = MilestoneViewModel.first(localTangle);
        counter = 0;
        size = 0;
        while (milestoneForStateDiff != null) {
            if (++counter % 10000 == 0) {
                log.info("Scanned {} StateDiffs", counter);
            }
            StateDiffViewModel stateDiff = StateDiffViewModel.load(localTangle, milestoneForStateDiff.getHash());
            size += stateDiff.getDiff().size();
            milestoneForStateDiff = milestoneForStateDiff.next(localTangle);
        }

        sizes.add(counter * (Long.BYTES + Hash.SIZE_IN_BYTES) + size * (Long.BYTES + Hash.SIZE_IN_BYTES));

        double sum = sizes.stream().reduce((a,b)->a+b).get();
        int i = 0;
        log.info("----------------------------");
        log.info(String.format("transactionBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("addressBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("tagBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("approveeBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("bundleBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("milestoneBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("stateDiffBytes: %.2f",  sizes.get(i++) / sum * 100));
        log.info(String.format("Total (uncompressed): %.2f GB",  sum / 1073741824 /*GB */));

    }
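
For intuition, the "Key + # of entries in each value ( + delimiter)" estimate used for the merge columns above can be reduced to a one-line formula. A sketch, assuming IRI's `Hash.SIZE_IN_BYTES` is 49 bytes (a packed 243-trit hash) and using hypothetical counts:

```java
public class MergeColumnEstimate {
    // Assumption: IRI's Hash.SIZE_IN_BYTES is 49 bytes.
    static final int HASH_BYTES = 49;

    // keyCount keys of hashBytes each, plus every entry across all values
    // at hashBytes + a 1-byte delimiter.
    public static long estimate(long keyCount, long totalEntries) {
        return keyCount * HASH_BYTES + totalEntries * (HASH_BYTES + 1);
    }

    public static void main(String[] args) {
        // e.g. 1000 addresses referencing 5000 transactions in total:
        // 1000 * 49 + 5000 * 50 = 299000 bytes
        System.out.println(estimate(1000, 5000));
    }
}
```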


alon-e commented Aug 16, 2018

So my conclusions for deleting transactions are:

  1. Deleting the Transaction and its metadata covers the majority of the DB size gain: > 80%.
  2. The approvees can also be easily deleted, because they are indexed by the transaction's hash and all the values can be removed; approvees add ~5% to the size gain.
  3. Tags, Addresses, and Bundles (~3-5% each) require extensive work: get a specific Tx field, then read an entry, remove the Hash from the Set, and store it again. I think this would cost much more than we gain, in I/O, memory, and compute.
    3.a. We could do this lazily by marking the key as needing a clean-up and executing the clean-up only when the key is fetched, but that is too complicated to start with, imo.
  4. If we choose not to remove (3.) from the DB, then any logic that iterates over these merge columns should be reviewed to make sure no errors are introduced by hashes whose corresponding Transaction is missing.

Bottom line: I think we should start with only (1.), deleting the Transaction, while attending to (4.).
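
The lazy clean-up idea in (3.a) can be illustrated with an in-memory stand-in for a merge column. Everything here is hypothetical — a real implementation would operate on the RocksDB column families, not a `HashMap` — but it shows the shape of the idea: deletion only marks a key dirty, and the actual rewrite is deferred to the next read.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LazyCleanupIndex {
    // key -> set of referenced transaction hashes (stands in for a merge column)
    private final Map<String, Set<String>> index = new HashMap<>();
    // keys whose entries may reference deleted transactions
    private final Set<String> dirty = new HashSet<>();
    // hashes of transactions known to be deleted
    private final Set<String> deletedTxs = new HashSet<>();

    public void put(String key, String txHash) {
        index.computeIfAbsent(key, k -> new HashSet<>()).add(txHash);
    }

    // Deleting a transaction only marks the key dirty; no index rewrite yet.
    public void deleteTransaction(String key, String txHash) {
        deletedTxs.add(txHash);
        dirty.add(key);
    }

    // Clean-up happens on read: hashes of deleted transactions are dropped
    // the first time the dirty key is fetched.
    public Set<String> get(String key) {
        Set<String> entries = index.getOrDefault(key, Collections.emptySet());
        if (dirty.remove(key)) {
            entries.removeIf(deletedTxs::contains);
        }
        return entries;
    }
}
```

This keeps deletes cheap at the cost of stale entries lingering until the next read — which is exactly the trade-off (and the extra complexity) the comment above weighs against simply leaving the merge columns untouched.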

@GalRogozinski commented:

Any objections to closing this?

@jakubcech (Author) commented:

None. It was left open because @alon-e wanted to put the tool for this into a repo, but I don't think it's a priority anytime soon.
