
Trying to get an idea about where the algorithm is flexible #2

Open

hohMiyazawa opened this issue Jan 20, 2021 · 7 comments

@hohMiyazawa (Contributor)

If you don't mind me poking around a bit more :)

This is mostly based on your description, though I've checked out some relevant parts of the code.

Clustering/grid

This seems to me the most interesting part. Typically, compression algorithms aim to partition the input into chunks of uncorrelated data, so that a set of parameters can be tailored to each individual chunk.

In this regard, this looks much more sophisticated than what's usually done, namely slicing the image up into a grid (which you also do as the next step). Would it be possible to get some sort of estimate of how much storage the clustering metadata demands? (An x-by-y grid, for instance, only has two numbers' worth of overhead.)

From what I understand, your approach is not limited to a specific algorithm for deciding on the clusters for unique decodability. This should leave room for asymmetric encoding: the decoder stays computationally cheap, since it only has to decode the single layout you ended up with, while the encoder is free to try multiple different approaches.

Calculating Lists

Every cluster has a minimum pixel value, taken as the minimum of the R, G and B values in that chunk. The final pixel value is the sum of this chunk minimum and the encoded pixel value.

It seems to me you are using a very simple predictive coding approach here: the "predictor" is a single value for each chunk/cluster.
This gives you very few of the benefits the cluster decorrelation could potentially provide. Zstd should be able to exploit much of the redundancy between neighbouring pixels in your list, but the 2D correlation is largely wasted.
While I'm not sure what the geometry of your "wiggly lines" is, an analogous technique for scanline traversal would be to store each pixel value as the difference from the pixel above, which would otherwise be far away in the list and out of reach of the entropy coder.
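
To make sure I've read the current scheme right, here is roughly what I think it amounts to, as a minimal Rust sketch. The flat interleaved-RGB layout and a single shared minimum per chunk are my assumptions, not your actual code:

```rust
// My reading of the current scheme (a sketch, not the repository's code):
// the "predictor" is one value, the chunk-wide minimum, and every byte
// is stored as its offset from that minimum.

/// Encode a chunk of interleaved RGB bytes as (minimum, offsets).
fn encode_chunk(rgb: &[u8]) -> (u8, Vec<u8>) {
    let min = rgb.iter().copied().min().unwrap_or(0);
    // min <= v for every byte, so the subtraction cannot underflow.
    (min, rgb.iter().map(|&v| v - min).collect())
}

/// Decode by adding the chunk minimum back onto every offset.
fn decode_chunk(min: u8, offsets: &[u8]) -> Vec<u8> {
    offsets.iter().map(|&o| o + min).collect()
}
```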

@hohMiyazawa (Contributor, Author) commented Jan 20, 2021

To illustrate my point, here is a very simple scheme:

  1. Read image as raw RGB bytes.
  2. Traverse the image in scanline order, storing each R, G and B value as the difference from the line above.
  3. Compress this with zstd.

(as an extreme example)
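
In Rust, the transform in steps 1 and 2 could look roughly like this. It's a sketch under my own assumptions (row-major interleaved RGB, wrapping arithmetic to stay in u8); my actual test program is C++:

```rust
// Store each byte as the difference from the byte one row above,
// wrapping mod 256 so the result stays in u8 and is exactly invertible.

fn vertical_delta(rgb: &[u8], width: usize) -> Vec<u8> {
    let stride = width * 3; // bytes per scanline for interleaved RGB
    rgb.iter()
        .enumerate()
        .map(|(i, &v)| {
            if i < stride {
                v // first row: no row above, store verbatim
            } else {
                v.wrapping_sub(rgb[i - stride])
            }
        })
        .collect()
}

fn undo_vertical_delta(deltas: &[u8], width: usize) -> Vec<u8> {
    let stride = width * 3;
    let mut rgb = Vec::with_capacity(deltas.len());
    for (i, &d) in deltas.iter().enumerate() {
        let v = if i < stride {
            d
        } else {
            d.wrapping_add(rgb[i - stride]) // row above is already decoded
        };
        rgb.push(v);
    }
    rgb
}

// Step 3 would then be any general-purpose compressor over the deltas,
// e.g. zstd::encode_all(&deltas[..], 19) with the zstd crate.
```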

Using a best/worst file-size graph again, this very simple predictor performs considerably better than using zstd directly.

My suggestion: you should look into using a spatial predictor, even a very simple one, in your "calculating lists" step.
With your advanced partitioning, you will likely also see gains from selecting a different predictor for each cluster (a rough sketch follows the graph below).

[graph attachment: toppdiff2 — best/worst file size comparison]
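
To sketch the per-cluster predictor selection (all names hypothetical, and the cost function is only a cheap proxy for the entropy-coded size):

```rust
// Try a few candidate predictors on a cluster's byte list and keep the
// one whose residuals look cheapest. The winner's id must be stored so
// the decoder knows which transform to invert.

#[derive(Clone, Copy)]
enum Predictor {
    None,  // store raw values
    Left,  // diff from the previous byte of the same channel
    Above, // diff from the byte one scanline up
}

fn residuals(rgb: &[u8], stride: usize, p: Predictor) -> Vec<u8> {
    rgb.iter()
        .enumerate()
        .map(|(i, &v)| match p {
            Predictor::Left if i >= 3 => v.wrapping_sub(rgb[i - 3]),
            Predictor::Above if i >= stride => v.wrapping_sub(rgb[i - stride]),
            _ => v, // Predictor::None, or no neighbour available yet
        })
        .collect()
}

/// Magnitude of a wrapped residual: 255 means -1, so it is "almost zero".
fn cost(res: &[u8]) -> u64 {
    res.iter().map(|&r| (r as i8).unsigned_abs() as u64).sum()
}

fn pick_predictor(rgb: &[u8], stride: usize) -> (Predictor, Vec<u8>) {
    [Predictor::None, Predictor::Left, Predictor::Above]
        .into_iter()
        .map(|p| (p, residuals(rgb, stride, p)))
        .min_by_key(|(_, res)| cost(res))
        .unwrap() // the candidate array is non-empty
}
```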

@umgefahren (Owner)

Have you already implemented this method or what's the source of your graph?

@umgefahren (Owner)

This idea of yours is pretty good, and I'm confident it will improve the system significantly. However, it will take me a long time to implement. Here is why:
As you have seen, Rust is statically typed, and that's the issue. Your proposal would require signed 8-bit integers instead of the unsigned 8-bit integers I'm currently using. This matters because if we calculate lists relative to the previous line, one or more of the RGB values can decrease, and a negative difference requires signed integers. That in itself is probably not a problem; the refactor is just immense.

@hohMiyazawa (Contributor, Author)

Yes, I implemented this transform to make the graph.

You can probably get away without using signed integers by storing the diffs with an offset of 128:

RED1 - RED2 nominally has a range of -255 to 255. However, since only the difference modulo 256 needs to be stored, a diff of 128 is equivalent to a diff of -128, a diff of 129 to -127, and so on.

The effective range is then -128 to 127. If you store that value +128, you get the range 0-255, which fits in an unsigned 8-bit integer.
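
In Rust this can stay entirely in u8 with wrapping arithmetic, so no signed refactor should be needed; a sketch of the round trip:

```rust
// Offset-128 diff trick: wrapping subtraction already folds the nominal
// -255..255 range into 0..255 (mod 256); adding 128 just centres small
// diffs around 128 instead of letting them wrap around 0 and 255.

fn encode_diff(current: u8, above: u8) -> u8 {
    current.wrapping_sub(above).wrapping_add(128)
}

fn decode_diff(stored: u8, above: u8) -> u8 {
    stored.wrapping_sub(128).wrapping_add(above)
}

#[test]
fn roundtrip_is_exact() {
    for current in 0..=255u8 {
        for above in 0..=255u8 {
            assert_eq!(decode_diff(encode_diff(current, above), above), current);
        }
    }
}
```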

@umgefahren (Owner)

Could you upload the implementation to your fork and make a pull request? If it passes all tests we can use it directly.
Otherwise, I will implement it.

@hohMiyazawa (Contributor, Author)

I wrote this as a separate C++ program, sadly.
It's probably also highly dependent on how your "wiggly lines" work, as I just used a simple scanline approach.

@umgefahren (Owner)

I'm working on it, but I can't make any promises. I've been trying to implement it for the last hour. The whole encoder and decoder is pretty badly organized and coded. It's also very hard to debug, because you never know where the error is; it could be in the encoder or the decoder.
How did you write your C++ code? Did you rewrite the whole algorithm? Maybe I don't quite get your plan yet.

I'm graduating from high school in several weeks, so I have two weeks at most to bring the code into a reasonable state.

Nevertheless, I highly appreciate your contributions.
