
Optional optimised format #52

Open
Xless-Qu-dev opened this issue Mar 4, 2022 · 7 comments

Comments

@Xless-Qu-dev

Feature

An optional, optimised version of the normal .vox format that would use less disk space.

How

If I understand correctly, every voxel of a model is saved individually in the file. This is an easy way of doing it, but not the most space-efficient.
I would suggest a "group/chunk" saving technique: instead of saving every voxel, it would save chunks of voxels that share the same property.

Example: a cube where one side of the voxels uses palette index 1 and the other side uses index 2.

Old: every voxel would be saved.

New: 2 chunks would be saved.

Format(xs, ys, zs, xe, ye, ze, index)
The first 6 values represent the start/end of the chunk bounds.
The last value represents the palette index of the voxels inside that chunk.
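As a minimal sketch, the proposed 7-value record could be packed into 7 bytes like this (class and method names are illustrative, not part of any official .vox specification):

```java
public class CubeRecord {
    // xs..ze are the inclusive start/end bounds, index is the palette index
    public static byte[] encode(int xs, int ys, int zs,
                                int xe, int ye, int ze, int index) {
        return new byte[] {
            (byte) xs, (byte) ys, (byte) zs,
            (byte) xe, (byte) ye, (byte) ze,
            (byte) index
        };
    }

    // number of voxels a single record replaces
    public static int voxelCount(byte[] rec) {
        int dx = (rec[3] & 0xFF) - (rec[0] & 0xFF) + 1;
        int dy = (rec[4] & 0xFF) - (rec[1] & 0xFF) + 1;
        int dz = (rec[5] & 0xFF) - (rec[2] & 0xFF) + 1;
        return dx * dy * dz;
    }
}
```

A 2×2×2 chunk would then cost 7 bytes instead of 8 × 4 bytes in XYZI.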

@JelleBouma

JelleBouma commented Aug 8, 2022

This is exactly what I was thinking about.
An important note is that this can be additional to the XYZI saving (if a voxel is part of a cube, store it in the CUBE chunk; otherwise, in the XYZI chunk).

Pros/cons:
- It would drastically reduce file size (especially for bigger models).
- Encoding time would increase (we need to find cubes), while decoding time would decrease (an entire cube is decoded at once).
- Old files are fully compatible with new encoders/decoders (just make the CUBE chunk optional), but new files are not compatible with old encoders/decoders.
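The "decode an entire cube at once" point above can be sketched as one loop per record that paints the cube's bounds straight into a voxel matrix (the record layout and names here are assumptions, mirroring the 7-byte format suggested earlier in the thread):

```java
public class CubeDecoder {
    // rec: xs, ys, zs, xe, ye, ze, paletteIndex (all unsigned bytes)
    public static void decode(int[][][] matrix, byte[] rec) {
        int xs = rec[0] & 0xFF, ys = rec[1] & 0xFF, zs = rec[2] & 0xFF;
        int xe = rec[3] & 0xFF, ye = rec[4] & 0xFF, ze = rec[5] & 0xFF;
        int index = rec[6] & 0xFF;
        for (int x = xs; x <= xe; x++)          // fill the whole bounding box
            for (int y = ys; y <= ye; y++)
                for (int z = zs; z <= ze; z++)
                    matrix[x][y][z] = index;
    }
}
```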

@JelleBouma

JelleBouma commented Aug 15, 2022

I actually had a go at this, encoding some voxels in the XYZI+CUBE format.
Example voxels that come bundled with MagicaVoxel:
2.71 MB (2,842,706 bytes) in the XYZI format
401 KB (410,931 bytes) in the XYZI+CUBE format

For my own voxels, which don't use a lot of different colours, the size difference is even greater (about 1/24):
5.12 MB (5,377,820 bytes) in the XYZI format
221 KB (226,475 bytes) in the XYZI+CUBE format

I wrote a cube-finding algorithm that's pretty fast (I can read the file, find cubes, and write the file, all in under 200 ms for any of the models), so performance shouldn't be much of an issue.

```java
for (int xx = 0; xx < size.x; xx++)
    for (int yy = 0; yy < size.y; yy++)
        for (int zz = 0; zz < size.z; zz++) { // for every possible voxel location
            int i = matrix[xx][yy][zz];
            if (i != 0) { // if there is a voxel here, start making a cube
                XYZ low = new XYZ(xx, yy, zz); // lowest point of the cube
                XYZ high = new XYZ(xx, yy, zz); // highest point of the cube
                boolean[] cubeExpanded = {true, true, true};
                while (cubeExpanded[0] || cubeExpanded[1] || cubeExpanded[2]) { // while the cube can be expanded
                    if (cubeExpanded[0]) // if the x side is being expanded
                        for (int yy2 = low.y; yy2 <= high.y; yy2++) // try to find a voxel that stops the x side being expanded
                            for (int zz2 = low.z; zz2 <= high.z; zz2++)
                                cubeExpanded[0] &= (high.x + 1 != size.x) && matrix[high.x + 1][yy2][zz2] == i;
                    if (cubeExpanded[0]) // if the x side can be expanded, do so
                        high.x++;
                    if (cubeExpanded[1]) // if the y side is being expanded
                        for (int xx2 = low.x; xx2 <= high.x; xx2++) // try to find a voxel that stops the y side being expanded
                            for (int zz2 = low.z; zz2 <= high.z; zz2++)
                                cubeExpanded[1] &= (high.y + 1 != size.y) && matrix[xx2][high.y + 1][zz2] == i;
                    if (cubeExpanded[1]) // if the y side can be expanded, do so
                        high.y++;
                    if (cubeExpanded[2]) // if the z side is being expanded
                        for (int xx2 = low.x; xx2 <= high.x; xx2++) // try to find a voxel that stops the z side being expanded
                            for (int yy2 = low.y; yy2 <= high.y; yy2++)
                                cubeExpanded[2] &= (high.z + 1 != size.z) && matrix[xx2][yy2][high.z + 1] == i;
                    if (cubeExpanded[2]) // if the z side can be expanded, do so
                        high.z++;
                }
                add(new Cube(low, high, i)); // create the cube
                for (int xx2 = low.x; xx2 <= high.x; xx2++) // clear the cube's voxels from the matrix, so we don't check them again
                    for (int yy2 = low.y; yy2 <= high.y; yy2++)
                        for (int zz2 = low.z; zz2 <= high.z; zz2++)
                            matrix[xx2][yy2][zz2] = 0;
            }
        }
```

For the algorithm to work, you first need to put the colour indexes in a matrix.
With a matrix it's much quicker to access adjacent voxels, because you don't have to traverse a list each time.

@Xless-Qu-dev

Xless-Qu-dev commented Aug 25, 2022

I also had an idea to make it even smaller, but at the cost of even slower decoding and encoding.

The simplest example is a checkerboard, which the current algorithm wouldn't be able to do much with. To fix this, we could implement a hierarchy system for the chunks, which would essentially allow chunks to overlap.

Taking our board, we would take one of the colours (normally the one with the most chunks) and make it fill (in our specific case) the entire model. This leaves us with the other voxels (which can also be air), which we would simply save at their normal locations, but as children of the big chunk, so that the decoder knows what to overlap with what.

This would halve the size of the checkerboard, which is a win, and the gain would be even bigger for some other models. But it comes at the cost of encoding/decoding speed, so it could be optional.
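The overlap idea above can be sketched as a recursive paint: decode the parent chunk first, then let each child overwrite it (a child with index 0 would carve out air). The chunk layout and names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapChunk {
    final int xs, ys, zs, xe, ye, ze, index;
    final List<OverlapChunk> children = new ArrayList<>();

    OverlapChunk(int xs, int ys, int zs, int xe, int ye, int ze, int index) {
        this.xs = xs; this.ys = ys; this.zs = zs;
        this.xe = xe; this.ye = ye; this.ze = ze;
        this.index = index;
    }

    // Paint this chunk's bounds, then let children overwrite (0 = air).
    void paint(int[][][] matrix) {
        for (int x = xs; x <= xe; x++)
            for (int y = ys; y <= ye; y++)
                for (int z = zs; z <= ze; z++)
                    matrix[x][y][z] = index;
        for (OverlapChunk child : children)
            child.paint(matrix);
    }
}
```

For a 2×2 checkerboard this stores one parent fill plus two small children instead of four separate voxels.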

@JelleBouma

JelleBouma commented Aug 27, 2022

Having air overwrite voxels will in a lot of cases increase size (compared to the XYZI+CUBE format), because now we would have to store air too.
This is problematic: if the hierarchy scheme only reduces size in some cases, then for it to work well you would have to encode in all the schemes and compare the results to pick the best one.

I think a better solution to the checkerboard problem would be:
find 45-degree lines (from the voxels remaining after finding cubes) and store them in a new chunk, LINE.
Any line would take only 6 bytes (starting point, direction, length, colour) and can't increase file size by a significant amount (chunks have only a 12-byte overhead).
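A 6-byte LINE record could be decoded like this; the direction encoding (2 bits per axis: 0 = no step, 1 = +1, 2 = -1) is made up for illustration and is not part of any proposal in this thread:

```java
public class LineChunk {
    // rec: x, y, z, dir, length, colour (all unsigned bytes).
    // dir packs 2 bits per axis, x in bits 0-1, y in bits 2-3, z in bits 4-5.
    public static void decode(int[][][] matrix, byte[] rec) {
        int x = rec[0] & 0xFF, y = rec[1] & 0xFF, z = rec[2] & 0xFF;
        int dir = rec[3] & 0xFF, len = rec[4] & 0xFF, colour = rec[5] & 0xFF;
        int dx = step(dir & 3), dy = step((dir >> 2) & 3), dz = step((dir >> 4) & 3);
        for (int i = 0; i < len; i++) { // walk along the line, one voxel per step
            matrix[x][y][z] = colour;
            x += dx; y += dy; z += dz;
        }
    }

    static int step(int bits) { return bits == 2 ? -1 : bits; }
}
```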

Alternatively (not as widely applicable, but even better at this specific problem):
find checkerboards (from the voxels remaining after finding cubes) and store them in a new optional chunk, CHECK.
This would reduce size significantly when there are big checkerboards (any checkerboard would only need 8 bytes), with no real size increase if there aren't (m)any.

These could even be combined (finding cubes first, then checkerboards, and then 45-degree lines).
Of course there is a speed cost here too, but I think it's less significant (compared to the hierarchy method).

@JelleBouma

JelleBouma commented Aug 27, 2022

In conclusion, I am not sure the checkerboard problem warrants fixing. It is the worst-case scenario, where we effectively only store XYZI and not CUBE, the same as the current MagicaVoxel format. I do not expect this to be a common type of model.

I might run some tests to see whether storing 45-degree lines can achieve a significant size reduction on the test sets mentioned earlier.
But I expect the performance loss to far outweigh any compression gains.

@JelleBouma

JelleBouma commented Sep 4, 2022

I thought about this some more and figured out that there is even more to gain by storing the colour mapping more efficiently.

If we sort our voxels/cubes by colour, we can then create an extra chunk that maps the colours to our voxels/cubes.
In this chunk, colour mapping costs 4 bytes per colour, instead of the current 1 byte per voxel/cube.
Assuming many more voxels/cubes than colours, this reduces size significantly, and there cannot be more colours than cubes/voxels anyway.

The way we do this: we create a new chunk VMAP (for voxels) and store in it 4 bytes for each colour actually used.
These 4 bytes are: the colour index (byte 0) and the number of voxels with that colour (bytes 1-3).
In the voxel chunk we store the voxels in the same order as the VMAP, so if the VMAP starts with 90 white voxels, the first 90 voxels are white by default and we can store each voxel as location only (3 bytes per voxel instead of 4).
For cubes we can do exactly the same thing in a separate chunk, CMAP.
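A minimal sketch of the VMAP encoding described above (class name, count byte order, and the in-memory voxel representation are assumptions):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VmapEncoder {
    // voxels: each int[] is {x, y, z, colourIndex}.
    // Returns {VMAP bytes, XYZ bytes}: 4 bytes per colour, then 3 bytes per voxel.
    public static byte[][] encode(List<int[]> voxels) {
        TreeMap<Integer, List<int[]>> byColour = new TreeMap<>();
        for (int[] v : voxels) // group (and thereby sort) voxels by colour
            byColour.computeIfAbsent(v[3], k -> new ArrayList<>()).add(v);
        ByteArrayOutputStream vmap = new ByteArrayOutputStream();
        ByteArrayOutputStream xyz = new ByteArrayOutputStream();
        for (Map.Entry<Integer, List<int[]>> e : byColour.entrySet()) {
            int count = e.getValue().size();
            vmap.write(e.getKey());            // colour index (byte 0)
            vmap.write(count & 0xFF);          // voxel count (bytes 1-3, little-endian)
            vmap.write((count >> 8) & 0xFF);
            vmap.write((count >> 16) & 0xFF);
            for (int[] v : e.getValue()) {     // location only, colour is implied by order
                xyz.write(v[0]); xyz.write(v[1]); xyz.write(v[2]);
            }
        }
        return new byte[][] { vmap.toByteArray(), xyz.toByteArray() };
    }
}
```

Three voxels in two colours thus cost 8 bytes of VMAP plus 9 bytes of positions, versus 12 bytes in plain XYZI; the saving grows with the voxel-to-colour ratio.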

Additional compression using this technique:

Example voxels that come bundled with MagicaVoxel:
2.71 MB (2,842,706 bytes) in the XYZI format
401 KB (410,931 bytes) in the XYZI+CUBE format
346 KB (354,654 bytes) in the VMAP+XYZ+CMAP+CUBE format

My own voxels (which don't use a lot of colours):
5.12 MB (5,377,820 bytes) in the XYZI format
221 KB (226,475 bytes) in the XYZI+CUBE format
193 KB (197,934 bytes) in the VMAP+XYZ+CMAP+CUBE format

Sorting the voxels/cubes by colour is pretty fast, so the performance loss is negligible.

PS: Note that with this method we could also easily remove the need for the RGBA palette chunk: instead of using indexes, just write the full colour in the maps (4 bytes to represent a colour instead of 1). This would remove the 255-colour limitation.

@JelleBouma

@Xless-Qu-dev @ephtracy I have made a GitHub repository describing this optimised file format in more detail: https://github.com/JelleBouma/cvox
