
Support secondary directory for log writing. #261

Closed
wants to merge 31 commits

Conversation

LykxSassinator (Contributor)

Brief Introduction

This PR supports the secondary directory feature, as proposed in issue #257.
It contains:

  • An extra configuration option, sub-dir, to enable this feature. It defaults to None; if set to Some(...), this directory is used once the main directory (specified by the dir option) is full.
  • An extra struct, named StorageInfo, in pipe.rs, to keep this feature compatible with other existing features, i.e. log recycling, recovery and so on.
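As a rough illustration of the option described above, the configuration might look like the following sketch. The field names mirror the PR description (`dir` and `sub-dir`); the `Config` shape and the `pick_dir` helper are assumptions for illustration, not the crate's real API.

```rust
// Hypothetical sketch of the configuration described above. `dir` and
// `sub_dir` mirror the option names in this PR's description; the real
// `Config` in src/config.rs has many more fields.
#[derive(Debug, Clone)]
pub struct Config {
    /// Main directory for log files.
    pub dir: String,
    /// Optional secondary directory, used once `dir` is full.
    /// `None` (the default) disables the feature.
    pub sub_dir: Option<String>,
}

impl Config {
    /// Picks the directory a new log file should be created in.
    pub fn pick_dir(&self, main_dir_full: bool) -> &str {
        match (&self.sub_dir, main_dir_full) {
            (Some(sub), true) => sub,
            _ => &self.dir,
        }
    }
}
```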

This commit builds a prototype to support secondary dir configuration.

Signed-off-by: Lucasliang <[email protected]>
…full but the secondary dir was free to flush data.

Signed-off-by: Lucasliang <[email protected]>
@LykxSassinator (Contributor Author) commented Aug 15, 2022

This is a prototype for closing #257, and all reasonable suggestions are welcome.

Please hold on for now; I want to refine it and make it more coherent.


codecov bot commented Aug 15, 2022

Codecov Report

Base: 97.64% // Head: 97.60% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (6a276e5) compared to base (5f718cf).
Patch coverage: 95.79% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #261      +/-   ##
==========================================
- Coverage   97.64%   97.60%   -0.04%     
==========================================
  Files          30       30              
  Lines       10655    11250     +595     
==========================================
+ Hits        10404    10981     +577     
- Misses        251      269      +18     
Impacted Files Coverage Δ
src/env/default.rs 90.42% <37.50%> (-1.84%) ⬇️
src/log_batch.rs 97.51% <78.12%> (-0.47%) ⬇️
src/file_pipe_log/pipe_builder.rs 95.96% <93.57%> (-0.25%) ⬇️
src/file_pipe_log/pipe.rs 98.37% <94.62%> (-1.13%) ⬇️
src/config.rs 97.10% <96.49%> (-0.24%) ⬇️
src/engine.rs 97.94% <100.00%> (+0.02%) ⬆️
src/util.rs 89.58% <100.00%> (+0.18%) ⬆️
tests/failpoints/test_engine.rs 99.88% <100.00%> (+0.03%) ⬆️
tests/failpoints/test_io_error.rs 100.00% <100.00%> (ø)
... and 3 more


@LykxSassinator (Contributor Author)

@tabokie Please take a look. This is a prototype for implementing the secondary dir. If the design is a bit confusing, we can have a brief offline discussion later.

@LykxSassinator (Contributor Author)

Please hold on; the design should be refined and polished to simplify the whole structure.

src/engine.rs Outdated
writer.finish()?
// Retry if `writer.finish()` returns a special err, remarking there still
// exists free space for this `LogBatch`.
let ret = writer.finish();
@tabokie (Member) Aug 17, 2022

This is not enough: an fsync failure will panic. You should propagate the maybe_sync error as well.

@tabokie (Member) commented Aug 17, 2022

I think letting the leader do all the retrying is simpler. When a write fails with NOSPC, the leader closes the current active file (which internally truncates the un-synced parts), then creates a new file and retries the whole write group.
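A minimal sketch of that leader-side flow, under invented names: `write` stands in for appending one batch to the active file, and `rotate` for closing it and creating a new one. This is not the crate's real API.

```rust
// Illustrative only: `write` stands in for appending one batch to the
// active file, `rotate` for closing it (truncating un-synced parts) and
// creating a new one. Not the crate's real API.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    NoSpace, // the NOSPC case discussed above
    Other,
}

/// Leader-driven retry: if any batch in the group hits NOSPC, rotate to
/// a fresh file and replay the whole write group once.
pub fn write_group_with_rotate(
    group: &[&str],
    write: &mut dyn FnMut(&str) -> Result<(), WriteError>,
    rotate: &mut dyn FnMut(),
) -> Result<(), WriteError> {
    for batch in group {
        match write(batch) {
            Ok(()) => {}
            Err(WriteError::NoSpace) => {
                rotate();
                // Replay the entire group on the new file; a second
                // failure is propagated instead of retried.
                for b in group {
                    write(b)?;
                }
                return Ok(());
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```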

@LykxSassinator (Contributor Author)

I hold a different view on it.
Since I don't expect the current write group to hang on one LogBatch because of an NOSPC error, I think the current leader should only be responsible for the successfully written LogBatches, while the followers that failed with NOSPC should put their LogBatches into the next write group and wait for the next leader to do the append ops. That's the core reason why I implemented the retry strategy.
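Sketched in the same spirit, this is what re-enqueueing into the next write group could look like. All names here are illustrative, and strings stand in for LogBatches.

```rust
// Illustrative sketch of the alternative above: batches that fail with
// NOSPC are moved into the next write group instead of blocking the
// current one. Strings stand in for `LogBatch`es.
#[derive(Debug, PartialEq)]
pub enum AppendResult {
    Ok,
    NoSpace,
}

/// Flushes the current write group; returns the batches to be retried
/// by the next group's leader.
pub fn flush_group(
    group: Vec<String>,
    try_append: &mut dyn FnMut(&str) -> AppendResult,
) -> Vec<String> {
    let mut next_group = Vec::new();
    for batch in group {
        if try_append(&batch) == AppendResult::NoSpace {
            // Defer this batch to the next write group.
            next_group.push(batch);
        }
    }
    next_group
}
```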

Signed-off-by: Lucasliang <[email protected]>
@@ -99,108 +101,201 @@ impl<F: FileSystem> DualPipesBuilder<F> {
/// Scans for all log files under the working directory. The directory will
/// be created if not exists.
pub fn scan(&mut self) -> Result<()> {
Member

Why change so much in this function? Couldn't it work by just doing two fs::read_dir(path)?.for_each?

Contributor Author
@LykxSassinator Aug 22, 2022

The modifications just tidy the code and split the procedure of scan into three parts. It's not appropriate to implement it with two fs::read_dir(path)?.for_each calls, as the definition of cfg.secondary-dir is Option<String>.

Member

Let's not change the main code, because the updated version isn't more readable IMO. You can change the original scan to scan_dir, inside it you insert the file handle into a vector. After two scan_dir, you can sort and validate the vector.

Member

The change is still too big and I don't see a clear purpose. You can keep the scan_dir simple and short as before, and do the file handle initialization early, then sort the FileToRecover list afterwards.

Contributor Author
@LykxSassinator Sep 8, 2022

In my view, it's clearer than before.

scan has been split into the following steps:

  1. Scan dirs to get the file_seq_range of append logs and rewrite logs:
     • scan_dir in the main dir;
     • scan_dir in the secondary dir, if it has been specified by cfg.secondary-dir.
  2. Check and clear the stale metadata of logs:
     • clear_stale_metadata in the main dir;
     • clear_stale_metadata in the secondary dir, if it has been specified by cfg.secondary-dir.
  3. Build the file_list vector.

Still confusing?
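Steps 1 and 3 of this outline can be sketched as one pure function (the signature is invented for illustration; step 2, the stale-metadata cleanup, is omitted):

```rust
// Hypothetical sketch of steps 1 and 3 above: merge the file sequence
// numbers scanned from the main dir and the optional secondary dir,
// then sort and reject duplicates. Step 2 (stale metadata) is omitted.
pub fn merge_scanned(main: &[u64], secondary: Option<&[u64]>) -> Result<Vec<u64>, String> {
    let mut files: Vec<u64> = main.to_vec();
    if let Some(s) = secondary {
        files.extend_from_slice(s);
    }
    files.sort_unstable();
    for w in files.windows(2) {
        if w[0] == w[1] {
            // The same file seq must not exist in both directories.
            return Err(format!("duplicate file seq {}", w[0]));
        }
    }
    Ok(files)
}
```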

Member

There're two things here. One is to keep the diff as short as possible. I refactor all the time, but my refactors are usually centered around a change of abstraction (i.e. the interaction between different modules, usually involving changes to function interfaces and data types). I don't often refactor the internal implementation unless it significantly improves readability.

Then let's go to the code itself: is the readability improved? IMO it is not. The LOC increased by 100%, and you introduced several types for the sake of refactoring (adding glue types usually indicates a bad abstraction). In particular, I don't think the min_id/max_id approach is suitable anymore.

src/config.rs (outdated, resolved)
src/engine.rs (outdated, resolved ×3)
@LykxSassinator (Contributor Author)

PTAL cc @tabokie; several modifications have been made according to the previous suggestions.

src/engine.rs (outdated, resolved ×2)
src/log_batch.rs (outdated, resolved ×2)
src/file_pipe_log/pipe_builder.rs (outdated, resolved)
@LykxSassinator (Contributor Author)

After merging #269, this PR is active again.

Ping @tabokie .

src/log_batch.rs Outdated
}

/// Prepare the `rewrite` by reseting the `signature` in the `LogBatch`.
pub(crate) fn prepare_rewrite(buf: &mut Vec<u8>, signature: Option<u32>) -> Result<()> {
@tabokie (Member) Sep 7, 2022

On second thought, let's move the checksum-ing of the footer from LogItemBatch to LogBatch. This way we don't need to create two functions in LogItemBatch, and all the details stay together in LogBatch.

src/env/default.rs (outdated, resolved ×2)

/// Represents the info of storage dirs, including `main dir` and
/// `secondary dir`.
struct StorageInfo {
Member

StorageManager, and use path_id to refer to path instead of storage_type.

Member

path_id as in an integer. You can make StorageManager manage a vector of directories, and you can also let it take care of the directory lock (pass it to Pipe::open).
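A minimal sketch of that suggestion follows. The names mirror the comment (`StorageManager`, `path_id`), but the shape is invented for illustration, and the directory-lock handling is omitted.

```rust
// Illustrative sketch of the `path_id` suggestion above: directories are
// stored in a vector and referred to by index, instead of a
// `storage_type` enum. Directory locking is omitted here.
use std::path::{Path, PathBuf};

pub type PathId = usize;

pub struct StorageManager {
    paths: Vec<PathBuf>,
}

impl StorageManager {
    /// Index 0 is always the main dir; the secondary dir, if any, is 1.
    pub fn new(main: PathBuf, secondary: Option<PathBuf>) -> Self {
        let mut paths = vec![main];
        paths.extend(secondary);
        StorageManager { paths }
    }

    pub fn path(&self, id: PathId) -> &Path {
        &self.paths[id]
    }

    pub fn len(&self) -> usize {
        self.paths.len()
    }
}
```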

src/file_pipe_log/pipe.rs (outdated, resolved)
src/engine.rs (outdated, resolved ×2)
src/engine.rs Outdated
let res = self
.pipe_log
.append(LogQueue::Append, log_batch, force_rotate);
// If we found that there is no spare space for the next LogBatch in the
Member

This is not what I meant by "rotate inside pipe_log". The decision to rotate should be put inside pipe_log.

fn append() {
  let mut writer = self.writer.lock();
  let r = writer.write(bytes);
  if r.errorno == NOSPC {
    self.rotate_imp(&mut writer)?;
    return Err(TryAgain);
  }
  r
}

src/util.rs (outdated, resolved ×2)
let (min_id, max_id) = (files[0].seq, files[files.len() - 1].seq);
debug_assert!(min_id > 0);
let mut cleared = 0_u64;
for seq in (0..min_id).rev() {
Member

Your new code won't work. Assuming it fails to clean up the metadata for file N, then on the next startup it will not attempt to clean up the metadata for any file n < N.

fs::create_dir(dir)?;
self.dir_lock = Some(lock_dir(dir)?);
return Ok(());
// Scan main `dir` and `secondary-dir`, if `secondary-dir` is valid.
@tabokie (Member) Sep 22, 2022

Let me be more verbose this time:

struct PipeBuilder {
  // Only available after a successful `scan`.
  dir_manager: Option<Arc<DirectoryManager>>,
}
struct DirectoryManager {
  paths: Vec<PathBuf>,
  locks: Vec<File>,
}

fn scan(&self) {
  // setup directories 
  let mut dirs = DirectoryManager::new();
  dirs.add(self.cfg.dir)?;
  dirs.add(self.second_dir)?;
  for path_id in 0..dirs.len() {
    self.scan_dir(path_id, dirs[path_id]);
  }
  self.rewrite_files.sort();
  self.append_files.sort();
  for queue, files in [] {
    if files.is_empty() { continue; }
    // check consecutiveness
    let mut invalid_files = 0;
    let mut current_seq = files[0].seq;
    for (i, f) in files.enumerate() {
      if f.seq > current_seq {
        warn!("hole");
        current_seq = f.seq + 1;
        invalid_files = i;
      } else if f.seq < current_seq {
        return Error::InvalidArgument("Duplicate file");
      }
    }
    files.drain(..invalid_files);
    if files.is_empty() { continue; }
    // cleanup metadata
    let delete_start = {...}
    'cleanup: for seq in delete_start..files[0].seq {
      for path_id in 0..dirs.len() {
        if self.file_system.exists_metadata(dirs.path(path_id)) {
          if let Err(e) = self.file_system.delete_metadata() { ...; break 'cleanup; }
        }
      }
    }
  }

  self.dir_manager = Arc::new(dirs);
}

fn scan_dir(&self, path_id: u64, dir: &Path) {
  for f in fs::read_dir(dir) {
    self.rewrite_files.push(...);
  }
}

fn build(self) {
  SinglePipe::new(self.dir_manager.clone(), ...);
}

@LykxSassinator (Contributor Author)

This PR is based on a master branch that is too stale and needs a rebase.

So I've reviewed the previous comments and created another PR to tackle this issue.

@tabokie (Member) left a comment

(Some pending comments from months ago)

@@ -551,10 +529,10 @@ enum BufState {
/// state only briefly exists between encoding and writing, user operation
/// will panic under this state.
/// # Content
/// (header_offset, entries_len)
/// (header_offset, entries_len, signature)
Member

signature -> original checksum

Contributor Author

This change to BufState::Sealed() has been refactored in the new PR.

sign_checksum(&mut self.buf, Some(old ^ new))?;
}
(Some(old), None) => {
sign_checksum(&mut self.buf, Some(old))?;
Member

Lack coverage.

Contributor Author

Same as above.

return Ok(());
}
if !path.is_dir() {
return Err(box_err!("Not directory: {}", dir));
Member

Lack coverage.

Contributor Author
@LykxSassinator Feb 16, 2023

Has been refactored here.

// As per trait protocol, this error should be retriable. But we panic anyway to
// save the trouble of propagating it to other group members.
// As per trait protocol, this error should be retriable. But we panic
// anyway to save the trouble of propagating it to
Member

Why is the comment line break changed?

Contributor Author

Tackled in the new PR.

@@ -178,7 +189,27 @@ where
debug_assert_eq!(writer.perf_context_diff.write_wait_duration, Duration::ZERO);
perf_context += &writer.perf_context_diff;
set_perf_context(perf_context);
writer.finish()?
// Retry if `writer.finish()` returns a special 'Error::Other', remarking that
Member

Add another error type TryAgain.
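The suggested variant might look like the following sketch. This is only an illustration of the control flow: the crate's real Error type lives in src/errors.rs, and `finish_with_retry` is an invented helper.

```rust
// Hypothetical sketch of the `TryAgain` suggestion: a dedicated error
// variant lets the caller distinguish "rotation happened, retry" from
// genuine failures. Not the crate's real `Error` type.
#[derive(Debug, PartialEq)]
pub enum Error {
    TryAgain(String),
    Other(String),
}

/// Retries `finish` while it returns `Error::TryAgain`, up to
/// `max_retries` extra attempts.
pub fn finish_with_retry(
    finish: &mut dyn FnMut() -> Result<usize, Error>,
    max_retries: usize,
) -> Result<usize, Error> {
    let mut attempts = 0;
    loop {
        match finish() {
            Err(Error::TryAgain(_)) if attempts < max_retries => attempts += 1,
            other => return other,
        }
    }
}
```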

Contributor Author

Done.
