Collaboration for a Python library #5
Also, let me explain a bit more about the specifics of the pyFileFixity project:
My plan is not only to port your algorithm but also to implement this "variable scheme" by changing the number of symbols on the fly (as I did with Reed-Solomon codes). Do you think this is possible?
I'm close to publishing 0.1. The decoder is already developed and checked; it just needs the new codebase and will be pushed immediately after the 0.1 release. Sorry, I spent too much time improving the 0.1 docs :) 0.2 can decode data, but the final API will require collaborative work. I don't yet have a finished vision of what the API should be, so we will start with working encoding/decoding procedures behind a somewhat awkward API and then try to make it more convenient. As for a Python implementation - yes, I will help you understand the math behind it. Once you know Reed-Solomon codes, it's a pretty straightforward study. Did you read ALL the .md files in the project? In particular, Overview.md tells you which parts of the math you need to learn first. Also take into account #4 as well as the lhc-rs library - these libraries may be a better choice for you, so to start you need to learn the pros and cons of all these libraries. Right now Wirehair is the best choice since it's already working and well-tested. I suggest you subscribe to #1 as well. About porting it to Python - if it makes sense for you, it's possible. The code isn't that large - ~500 LOC. It has a "performance core", which is multiplication in GF(p) of an array by a scalar, as well as a few more similar operations. Unfortunately, scalar algebra libraries probably don't support operations over GF(p), so you will need to find such libs or make a new one (or bind existing libs that perform that - I know about NTL and FLINT, and there is even a GPU implementation of the entire fast NTT procedure!)
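For context, the array-by-scalar GF(p) operation described above is easy to prototype in Python with numpy. A minimal sketch, assuming a generic NTT-friendly prime rather than FastECC's actual field parameters - this is an illustration, not FastECC code:

```python
# Illustrative sketch only. Multiply an array of GF(p) elements by a scalar,
# vectorized by numpy's C loops.
import numpy as np

P = 998244353  # a well-known NTT-friendly prime (119 * 2^23 + 1)

def gf_scale(arr, scalar, p=P):
    """Elementwise (arr * scalar) mod p; products stay below 2^60, so uint64 is safe."""
    return (arr.astype(np.uint64) * np.uint64(scalar)) % p

block = np.arange(8, dtype=np.uint64)
print(gf_scale(block, 12345))  # [0, 12345, 24690, ...]
```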
I've read your docs but couldn't find it - what do you mean by an "on-the-fly" change of protection parameters? FastECC allows different protection configs in different calls - e.g. you can compute 128 parity blocks from 1024 256-byte source blocks in one call and 1024 parity blocks from 256 600-byte source blocks in the next one. No data needs to be precomputed.
Sorry for the late reply, I was reading some documentation (including yours!) to catch up on FFT and NTT encoding. Yes, you nailed it: by "on-the-fly" I meant changing the length of the parity blocks without having to precompute again. I am really excited by your project, and I'm eagerly awaiting the decoder so that I can link your library into my project :-) I am also trying to port your NTT encoding to Python, but it's not as straightforward as it seems. I have NTT and INTT functions, and I can compute the derivative of a polynomial. I also have functions for pretty much any mathematical operation in GF. However, I cannot wrap my head around how to properly implement encoding - can you help me with an outline of the algorithm? I will of course add you as a co-author of both the Reed-Solomon library and the pyFileFixity project (if that's worth anything!).
About the other projects, yes, I checked them out, but Wirehair is LDPC and I would prefer an MDS code such as Reed-Solomon because I understand it a lot better and it's deterministic; and the others are O(n²), which is just like my library (of course they are faster since they are written in C or C++, but asymptotically this does not make a huge difference). There is only Sian-Jheng Lin's proof-of-concept library that might be interesting to port, but it supports only erasure decoding, not errors, so its usefulness is limited (I'm pretty sure it's easily adaptable to errors, but I have neither the math skills nor the time to study it in order to implement this). So your library is the most promising to me; it will finally make it practical to protect huge volumes of data (until now I was limited to hundreds of megabytes; with your library it should work for hundreds of gigabytes!). BTW, will your decoder implement error decoding, or only erasures for the moment?
Very nice paper BTW, very nice and concise, congrats 👍 The math is a little bit above my head, but I'm going to study it. And I read that you mention the error locator polynomial is very similar to the formal derivative :-) So I guess once I understand how to plug in the NTT to replace the polynomial multiplication/division, it will be straightforward to implement in all the encoding/decoding routines!
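Since the formal derivative keeps coming up: over GF(p) it is a purely mechanical operation. A minimal sketch (my own illustration, not code from either library):

```python
# Formal derivative over GF(p): for P(x) = sum a_i x^i,
# P'(x) = sum (i * a_i) x^(i-1), with arithmetic done mod p.
P = 998244353  # example NTT-friendly prime

def formal_derivative(coeffs, p=P):
    """coeffs[i] is the coefficient of x^i; returns the coefficients of P'(x)."""
    return [(i * a) % p for i, a in enumerate(coeffs)][1:]

print(formal_derivative([5, 3, 7]))  # 5 + 3x + 7x^2  ->  3 + 14x
```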
So, the current state: as you can see in the README, I've reinvented the earlier Lin algorithm that works in GF(p). Their latest paper implements the same high-level algorithm but in GF(2^n) fields. Their own library is research-grade and somewhat slower than mine, but in June Christopher Taylor developed a highly optimized version of this algorithm and published it as https://github.com/catid/leopard - note that it has a fully working decoder and is faster than my library. So now you have one more choice 😸 and the outcome greatly depends on whether you want to use an existing library or learn the math behind it and implement it yourself. I can help you with the math behind my lib, Christopher can probably help you with his earlier libraries, and you can reimplement Leopard in Python by mere translation of the C++ code. If you want to learn how my lib works, you need to know:
GF(p) arithmetic is simple and covered in Wikipedia. FFT/iFFT is there too, but I suggest learning it from a book; in particular FxtBook is great and free. The essential FFT algorithms are also covered in FxtBook; you can find more in Wikipedia again, but they are better organized at http://www.engineeringproductivitytools.com/stuff/T0001/index.html (although it doesn't cover the important MFA algorithm). Finally, various views on RS are covered here and there (e.g. in Wikipedia), but I can help you with that. I plan to write my own tutorial, similar to the famous Plank one, but describing my lib's implementation, which will cover all these topics except for FFT algorithms.
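For readers wanting a concrete reference point before opening FxtBook: the classic recursive radix-2 Cooley-Tukey decomposition carries over to GF(p) unchanged. A minimal sketch (illustrative only, not FastECC's optimized code):

```python
P = 998244353  # NTT-friendly prime: 119 * 2^23 + 1, primitive root 3

def ntt(a, root, p=P):
    """Recursive radix-2 Cooley-Tukey NTT. len(a) must be a power of two
    and `root` a primitive len(a)-th root of unity mod p."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], root * root % p, p)   # transform of even-index terms
    odd = ntt(a[1::2], root * root % p, p)    # transform of odd-index terms
    out, w = [0] * n, 1
    for i in range(n // 2):                   # butterfly combining step
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p
        out[i + n // 2] = (even[i] - t) % p
        w = w * root % p
    return out

root8 = pow(3, (P - 1) // 8, P)               # primitive 8th root of unity
print(ntt([1, 2, 3, 4, 5, 6, 7, 8], root8))
```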
Do you mean my README? I never published any papers 😃
I still don't understand what you mean by precomputing. The encoder and decoder functions are single-shot - they get all input blocks and return all output blocks for an entire shard in a single call. Of course, different shards may have entirely different configurations.
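To illustrate those single-shot semantics with running code - using a deliberately toy, hypothetical encoder, not FastECC's actual API:

```python
# Toy stateless "encoder" (NOT a real MDS code): each call receives all
# source blocks plus the per-call configuration and returns all parity blocks.
from functools import reduce

def encode(blocks, n_parity):
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)
    return [parity] * n_parity  # placeholder parity, repeated

shard_a = encode([bytes(256)] * 1024, n_parity=128)   # one configuration...
shard_b = encode([bytes(600)] * 256, n_parity=1024)   # ...another; nothing cached between calls
```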
https://www.google.com/search?q=fast+polynomial+multiplication - that's how I've googled several CS courses on these topics, and I can point you to them if you need. Except for multiplication, these are somewhat complex algorithms which are NOT used by my library. But if you want to implement error decoding, you may need to learn these topics. Download the archive - it's very large, but you can find there everything I dug up on the topic 😄
I have never learned how error recovery works, nor do I plan to 😄. The only fast error decoder I remember is libnoe, mentioned in this msg - essentially that msg lists all the fast ECC libs we know. You may also find the other topics of the PAR math subforum extremely useful, as we discussed the architecture of a possible PAR3 implementation there. I also developed a program similar to PAR (although I haven't published it), so my opinion:
Ah sorry, I thought you were a co-author of "An Efficient (n,k) Information Dispersal Algorithm based on Fermat Number Transforms" by Sian-Jheng Lin and Wei-Ho Chung. But anyway, yes, your documentation is very good, thank you very much for taking the time to write it :-) About error recovery, it is useful in case your ECC symbols can be tampered with too (and not just the data). In that case the CRC can also be tampered with, and you lose all ability to erasure-decode, but not to error-decode. I will find a way to enable error decoding if at least I can find a way to encode using NTT :-) About the LDPC algorithm's limit of 65536, this can be alleviated by chunking, so that's not a big deal; same for non-round configs, which can be fixed by padding - these are limits similar to usual Reed-Solomon's. But that's a totally different algorithm; LDPCs are VERY tricky to get right, so porting one to Python is bound to be very complicated :-( Thank you very much for the links and documents 👍 I will study them. However, about NTT for polynomial division (RS encoding), I could not find much; this is something very, very new - most other papers and tutorials talk about FFT for fast polynomial division (which I am trying to implement, but I am getting mixed results for now). Do you have any pointers for NTT in RS encoding besides the paper by Sian-Jheng et al.?
Do you understand the paper? If so, you are smarter than me, because I got it only when I rediscovered their algorithm myself.
Do you understand O(N^2) error decoding algorithms? I don't, and they are more complex than erasure algorithms. And except for noe, I don't remember any fast error decoding algorithms being published.
I think it can be managed by an improved ECC file structure:
This way, when you lose a chunk of control data, you usually also lose a chunk of regular data, and since you have 5x more protection for control data, it will be extremely rare to have enough regular data to error-decode but not enough ECC data to recover. That is how my own PAR-like utility works.
I prefer to use 4 KB data blocks to match modern media. Use of multiple shards (e.g. 16 shards of 64K blocks instead of a single shard of 1M blocks) makes the overall code essentially non-MDS and decreases recovery chances. I prefer to use as many blocks in a single shard as can be loaded into RAM simultaneously, and with 4 KB blocks and modern computers this is ~1M blocks.
O(N^2) RS doesn't need padding and can easily support GF(2^32) operations, but using more than 64K blocks will be extremely slow anyway. With padding to 2^n blocks, going from 32768 to 32769 blocks immediately doubles memory usage and halves speed. So a solution that can increase the number of blocks in smaller steps is extremely important when we want to encode data in shards as large as can fit in RAM.
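To make the padding cliff concrete, a toy computation (illustration only):

```python
def padded_size(n_blocks):
    """Next power of two >= n_blocks, as required by a radix-2 transform."""
    size = 1
    while size < n_blocks:
        size *= 2
    return size

print(padded_size(32768))  # 32768 -- fits exactly
print(padded_size(32769))  # 65536 -- one extra block doubles memory and work
```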
In BulatECC.zip, these are in the directories Par3 (collecting code/papers that were considered for a PAR3 implementation by the MultiPar author) and NTT_RS, plus mateer-thesis.pdf. Note, however, that the approach I used is IMHO the simplest and fastest among all the algorithms proposed in these papers. So if you understand this paper, you don't need to look any further, except out of curiosity.
As I described here (and Lin et al. in the paper), we don't need polynomial division at all, and even less so for encoding. NTT is shorthand for a DFT performed in a Galois field. FFT is any fast algorithm computing a DFT. So any FFT algorithm is usable for computing an NTT. Papers/courses talk about FFT just because it's the generic notion - these algorithms apply both to polynomials over C and to polynomials over GF(p^n). The encoding algorithm is described here. Do you need more info to understand it?
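To make the interpolate-then-evaluate idea concrete, here is a self-contained toy over GF(257) (a Fermat prime, so power-of-two roots of unity up to order 256 exist). This is my own sketch of the general approach with naive O(n²) transforms, not FastECC's code; a real implementation uses a fast NTT and a much larger prime:

```python
p, g = 257, 3   # field modulus and a primitive root mod 257 (order 256)

def ntt(vec, root):
    """Naive DFT over GF(p): evaluate the polynomial with coefficients `vec`
    at root^0, root^1, ..., root^(n-1)."""
    n = len(vec)
    return [sum(c * pow(root, i * j, p) for j, c in enumerate(vec)) % p
            for i in range(n)]

def intt(vec, root):
    """Inverse transform: interpolate values back into coefficients."""
    n = len(vec)
    inv_n = pow(n, p - 2, p)                  # n^{-1} mod p via Fermat
    return [x * inv_n % p for x in ntt(vec, pow(root, p - 2, p))]

def encode(message, n):
    """Treat the k source symbols as evaluations of a degree-<k polynomial,
    interpolate (iNTT of order k), then re-evaluate at n > k points (NTT of
    order n). Any k of the n outputs determine the polynomial, so the code
    is MDS; with n = 2k the source symbols reappear at even-indexed outputs."""
    k = len(message)
    coeffs = intt(message, pow(g, 256 // k, p))   # order-k root of unity
    coeffs += [0] * (n - k)                       # zero-pad to length n
    return ntt(coeffs, pow(g, 256 // n, p))       # order-n root of unity

codeword = encode([10, 20, 30, 40], 8)   # 4 source symbols -> 8 codeword symbols
print(codeword[0::2])                    # [10, 20, 30, 40] at even positions
```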
Hey Bulat, thank you very much, it's an amazing database of papers you've got, thanks for sharing! Unluckily, no, I don't understand the whole paper yet; I'm no mathematician either, I'm just a bit familiar with good old Reed-Solomon procedures :-) About ECC protection, it's not really necessary to protect the ECC bytes because they protect themselves; I'm not sure that adding another ECC on top of ECC is actually better, signal-to-noise-wise, than just increasing the number of ECC bytes. But the file structure of the ECC file/PAR file can actually be protected, and this is what I did with pyFileFixity's index files :-) They protect the metadata. About error detection, I posted the Python code and the related books on SO. The best one is "Algebraic Codes for Data Transmission", Richard E. Blahut, 2003, Cambridge University Press. It's really the best book about Reed-Solomon and errata decoding - it describes almost everything, and I'm pretty sure it would work with your implementation, since it simply gives you the error locator polynomial; you can then use your decoding procedure with it just as if it were the erasure locator polynomial. But that's just for your information, I understand you don't have the time to implement it :-) About the NTT encoding, only the "Faster" section describes it, plus your code. I tried to infer from the code what the algorithm does; can you help me fill in the missing pieces?

```python
def encode(message, N=255):  # N is the Galois field size, right?
    coefs = INVERSE_NTT(message, N)
    generator_polynomial = generate_poly(...)
    for g in generator_polynomial:
        ???
    return NTT(encoded_message, N)
```

Usually, instead of ??? I just do a polynomial division by the generator polynomial.
Also, if you want to support different parameters for your RS encoder (such as the US FAA ADS-B UAT RS FEC standard), I think you can change in RS.cpp:
and: With these two changes, you add two parameters: generator (by default 2, usually called alpha in papers) and fcr (by default 0), which allow compatibility with any other RS codec. The resulting codeword will be different, but at decoding you should have exactly the same decoding power (the same number of erasures/errors recoverable). The US standard then uses these parameters:
Of course, at decoding you also have to apply these changes: use the generator number to generate the generator polynomial, and use fcr every time you have to loop over the generator polynomial (more specifically: in syndrome calculation, and in the Forney algorithm to correct errata).
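For readers following along, a sketch of how these two parameters enter the generator polynomial - written in the classic GF(2^8) setting common to generic RS codecs rather than FastECC's GF(p), with all names being my own illustrations rather than RS.cpp code:

```python
# Minimal GF(2^8) arithmetic (primitive polynomial 0x11d) plus a
# parametrized RS generator polynomial.
exp = [0] * 512   # antilog table, doubled to skip a "% 255" in gf_mul
log = [0] * 256
x = 1
for i in range(255):
    exp[i], log[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    exp[i] = exp[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else exp[log[a] + log[b]]

def gf_pow(a, n):
    return exp[log[a] * n % 255]

def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= gf_mul(pi, qj)  # addition in GF(2^8) is XOR
    return r

def rs_generator_poly(nsym, fcr=0, generator=2):
    """g(x) = prod_{i=0}^{nsym-1} (x - generator^(fcr+i)): the roots are
    consecutive powers of `generator`, starting at exponent `fcr`."""
    g = [1]
    for i in range(nsym):
        g = gf_poly_mul(g, [1, gf_pow(generator, fcr + i)])
    return g

print(rs_generator_poly(4))                        # defaults: generator=2, fcr=0
print(rs_generator_poly(4, fcr=120, generator=3))  # alternative parameters
```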
And I do the same. Now let's see:
Does it make sense now?
Ok, I see, but there are two differences between my implementation and the goals:
I will now look into block encoding, as symbol encoding is clearly a dead end in terms of speed (maximum ~10 MB/s, whatever the programming language). I will then come back to implementing NTT :-)
Coming back after a lot of reading. The most useful resource I could find was by the famous J. Plank: http://web.eecs.utk.edu/~plank/plank/papers/2013-02-11-FAST-Tutorial.pdf So there is no magic block coding: the matrix multiplication approach requires XOR additions anyway, which are not available in Python's linear algebra libraries, so reimplementing them manually will be slow. What I miss most for a fast enough codec, I think, is SIMD; I will look into that, but I don't hope for too much, because it seems difficult to access these kinds of accelerated instructions from Python... Now about the scheme:
So, having said all that, I don't think using a horizontal scheme makes much of a difference performance-wise, or am I mistaken? If it does not make such a big difference, I will just use your lib when you finish the decoder, as it is near impossible to implement SIMD optimizations in Python :-(
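One note on the XOR point above: numpy does expose elementwise XOR (though not GF(2^w) matrix products), and since it runs in vectorized C it is much faster than pure-Python loops. A quick illustrative sketch:

```python
import numpy as np

# XOR two 1 MB blocks; numpy's bitwise_xor runs as a vectorized C loop,
# so it approaches memory bandwidth even when called from Python.
a = np.random.randint(0, 256, size=1 << 20, dtype=np.uint8)
b = np.random.randint(0, 256, size=1 << 20, dtype=np.uint8)
parity = np.bitwise_xor(a, b)  # equivalently: a ^ b
```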
Hey there,
I came across this post of yours on StackOverflow (thank you very much!). Your project is very interesting, and I would be very interested in including it in my pyFileFixity project (a sort of MultiPar, but aimed at maximum robustness and MIT licensed), and potentially porting it to Python (you would be co-author, of course), as your README explanations and code comments are very clear.
Of course, the Python port would be slower, but I think it can still retain most of the speed by using specialized linear algebra libraries.
For the port, I will probably need to ask you some math questions while I'm implementing. Would you be interested in this collaboration? I would start after you finish implementing decoding and the public API.
In any case, I'm really looking forward to the development of this project; it is VERY interesting!