Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading block based log format #249

Merged
merged 25 commits into from
Aug 2, 2022

Conversation

LykxSassinator
Copy link
Contributor

PR Description

For closing #246, this pr is set. This pr is used for building basics for the new DIO feature, including:

  • [New] DataLayout, representing the arrangement on data (records), in alignment or not.
  • Refactor LogFileFormat / LogFileContext.
  • Add extra support in stress tool for the new alignment-mode reading.

This commit is used for building basics for the new DIO feature, including:
* [New] DataLayout, representing the arrangement on data (records), in alignment or not.
* Refactor LogFileFormat / LogFileContext.

Signed-off-by: Lucasliang <[email protected]>
…`data_layout` in `stress`.

Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
@codecov
Copy link

codecov bot commented Jul 21, 2022

Codecov Report

Merging #249 (4287b8d) into master (70d27ee) will increase coverage by 0.21%.
The diff coverage is 99.64%.

@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
+ Coverage   96.99%   97.21%   +0.21%     
==========================================
  Files          30       30              
  Lines       10491    10951     +460     
==========================================
+ Hits        10176    10646     +470     
+ Misses        315      305      -10     
Impacted Files Coverage Δ
src/config.rs 97.50% <ø> (ø)
src/lib.rs 100.00% <ø> (ø)
src/filter.rs 82.63% <75.00%> (ø)
src/file_pipe_log/format.rs 98.92% <99.40%> (+3.47%) ⬆️
src/engine.rs 96.90% <100.00%> (+<0.01%) ⬆️
src/file_pipe_log/log_file.rs 99.56% <100.00%> (+0.20%) ⬆️
src/file_pipe_log/mod.rs 97.58% <100.00%> (ø)
src/file_pipe_log/pipe.rs 99.16% <100.00%> (+0.06%) ⬆️
src/file_pipe_log/pipe_builder.rs 96.24% <100.00%> (+0.05%) ⬆️
src/file_pipe_log/reader.rs 94.01% <100.00%> (+4.01%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70d27ee...4287b8d. Read the comment docs.

@LykxSassinator
Copy link
Contributor Author

More uts for covering the abnormal code paths wait to be supplemented.

@LykxSassinator
Copy link
Contributor Author

Please hold on.
This pr considers the format of aligned records and raw records without alignment. However, it still lack the compatibility to fragmented records, which should be re-considered.

@LykxSassinator
Copy link
Contributor Author

Please hold on. This pr considers the format of aligned records and raw records without alignment. However, it still lack the compatibility to fragmented records, which should be re-considered.

Done for it.

src/config.rs Outdated Show resolved Hide resolved
Signed-off-by: Lucasliang <[email protected]>
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a bit more on this and I think RocksDB's format is not very efficient (many memcpy-s). And we require accessing raw entry data via a file block handle, so the data of one log batch should stay contiguous.

So there's no need to handling the Type as RocksDB does. The way I see it, the only work is to add a padding detection to the existing reader.

When reading a log batch, first peek(offset, end_of_current_block) and construct header. If this slice turns out to be padded with zeros, then go to the next block.

After the header is ready, we get the rest of the data just like before.

@LykxSassinator
Copy link
Contributor Author

LykxSassinator commented Jul 25, 2022

Agree!
So I refactor the reading strategy and remove the over-designed structure in the previous commit.

so the data of one log batch should stay contiguous.

About the padding detection, u menthioned before:

the only work is to add a padding detection to the existing reader.
When reading a log batch, first peek(offset, end_of_current_block) and construct header. If this slice turns out to be padded with zeros, then go to the next block.

Thanks for your suggestion, making me find an existed bug in the preivous detection strategy. And this bug has been fixed in this commit.

src/config.rs Show resolved Hide resolved
src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/log_file.rs Outdated Show resolved Hide resolved
src/file_pipe_log/log_file.rs Outdated Show resolved Hide resolved
src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
@tabokie tabokie changed the title Basics for dio Support reading block based log format Jul 26, 2022
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to test the decoding properly. Since the write flow is not implemented, you need to manually assemble a block-based log file content. Some cases need to be covered:

  • [ batch padding ]
  • [ batch1 batch2_header ][ batch2_middle ][ batch2_rest ]
  • [ batch1 batch2_header_part1 ][ batch2_header_part2 ] (this should fail)
  • [ batch non_zero_padding ] (fail too)

src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
@LykxSassinator
Copy link
Contributor Author

You also need to test the decoding properly. Since the write flow is not implemented, you need to manually assemble a block-based log file content. Some cases need to be covered:

  • [ batch padding ]
  • [ batch1 batch2_header ][ batch2_middle ][ batch2_rest ]
  • [ batch1 batch2_header_part1 ][ batch2_header_part2 ] (this should fail)
  • [ batch non_zero_padding ] (fail too)

Got it.

  • The first two case have been tested in the failpoints/test_engine.rs: test_build_engine_with_aligned_datalayout in the previous commit.
  • The third abnormal cornor case waited to be supplemented.
  • About the last corner case, I think it should be passed. Is necessary to check the padding whether it is filled with zeros? non-zero paddings can also be skipped as mentioned above.

src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/log_file.rs Outdated Show resolved Hide resolved
src/file_pipe_log/pipe.rs Outdated Show resolved Hide resolved
src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
@LykxSassinator
Copy link
Contributor Author

Corner cases have been supplemented in this commit.
Thanks for reviewing.

src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/pipe_builder.rs Outdated Show resolved Hide resolved
src/file_pipe_log/log_file.rs Outdated Show resolved Hide resolved
src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/reader.rs Outdated Show resolved Hide resolved
src/file_pipe_log/pipe.rs Outdated Show resolved Hide resolved
src/file_pipe_log/pipe.rs Outdated Show resolved Hide resolved
Signed-off-by: Lucasliang <[email protected]>
Cargo.toml Outdated Show resolved Hide resolved
src/file_pipe_log/log_file.rs Outdated Show resolved Hide resolved
src/file_pipe_log/pipe_builder.rs Outdated Show resolved Hide resolved
src/util.rs Outdated Show resolved Hide resolved
src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/format.rs Outdated Show resolved Hide resolved
src/file_pipe_log/format.rs Show resolved Hide resolved
tests/failpoints/test_engine.rs Outdated Show resolved Hide resolved
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LG

src/pipe_log.rs Show resolved Hide resolved
src/file_pipe_log/pipe_builder.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants