Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: when deserialization of module from filesystem cache fails, remove from cache #138

Open
wants to merge 1 commit into
base: feat/rm-serialized-module-cache
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion crates/common/src/result.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,10 @@ pub enum WasmErrorInner {
/// Wasmer failed to build a Module from wasm byte code.
/// With the feature `wasmer_sys` enabled, building a Module includes compiling the wasm.
ModuleBuild(String),
/// Wasmer failed to serialize a Module to bytes.
ModuleSerialize(String),
/// Wasmer failed to deserialize a Module from bytes.
ModuleDeserialize(String),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional error types for clarity

/// The host failed to call a function in the guest.
CallError(String),
}
Expand All @@ -49,8 +53,10 @@ impl WasmErrorInner {
| Self::ErrorWhileError
// Bad memory is bad memory.
| Self::Memory
// Failing to build the wasmer Module means we cannot use it.
// Failing to build, serialize, or deserialize the wasmer Module means we cannot use it.
| Self::ModuleBuild(_)
| Self::ModuleSerialize(_)
| Self::ModuleDeserialize(_)
// This is ambiguous so best to treat as potentially corrupt.
| Self::CallError(_)
=> true,
Expand Down
32 changes: 26 additions & 6 deletions crates/host/src/module.rs
Original file line number Diff line number Diff line change
Expand Up @@ -197,10 +197,21 @@ impl ModuleCache {
match self.get_from_filesystem_cache(key) {
// Filesystem cache hit, deserialize and save to deserialized cache
Ok(Some(serialized_module)) => {
let module = Arc::new(
unsafe { Module::deserialize(&self.runtime_engine, serialized_module.clone()) }
.map_err(|e| wasm_error!(WasmErrorInner::ModuleBuild(e.to_string())))?,
);
let module_result =
unsafe { Module::deserialize(&self.runtime_engine, serialized_module.clone()) };

// If deserialization fails, we assume the file is corrupt,
// so it is removed from the filesystem cache.
let module = match module_result {
Ok(d) => Ok(Arc::new(d)),
Err(e) => {
let _ = self.remove_from_filesystem_cache(key);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from cache, but ignore any errors deleting the file (i.e. if it doesn't exist or user doesn't have permissions). Not sure if the error should be passed up.

Err(wasm_error!(WasmErrorInner::ModuleDeserialize(
e.to_string()
)))
}
}?;

self.add_to_deserialized_cache(key, module.clone());

Ok(module)
Expand Down Expand Up @@ -234,8 +245,9 @@ impl ModuleCache {
.serialize()
.map_err(|e| wasm_error!(WasmErrorInner::ModuleBuild(e.to_string())))?;
let module = Arc::new(unsafe {
Module::deserialize(&self.runtime_engine, serialized_module.clone())
.map_err(|e| wasm_error!(WasmErrorInner::ModuleBuild(e.to_string())))?
Module::deserialize(&self.runtime_engine, serialized_module.clone()).map_err(
|e| wasm_error!(WasmErrorInner::ModuleDeserialize(e.to_string())),
)?
});

// Save serialized module to filesystem cache
Expand Down Expand Up @@ -311,6 +323,14 @@ impl ModuleCache {
Ok(())
}

// Remove serialized module from filesystem cache
fn remove_from_filesystem_cache(&self, key: CacheKey) -> Result<(), std::io::Error> {
if let Some(fs_path) = self.filesystem_cache_module_path(key) {
std::fs::remove_file(fs_path)?;
}

Ok(())
}
/// Check deserialized cache for module
fn get_from_deserialized_cache(&self, key: CacheKey) -> Option<Arc<Module>> {
let mut deserialized_cache = self.deserialized_module_cache.write();
Expand Down
67 changes: 67 additions & 0 deletions crates/host/src/module/wasmer_sys.rs
Original file line number Diff line number Diff line change
Expand Up @@ -220,4 +220,71 @@ mod tests {

std::fs::remove_dir_all(tmp_fs_cache_dir).unwrap();
}

#[test]
fn cache_get_from_fs_corrupt() {
// simple example wasm taken from wasmer docs
// https://docs.rs/wasmer/latest/wasmer/struct.Module.html#example
let wasm: Vec<u8> = vec![
0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x06, 0x01, 0x60, 0x01, 0x7f,
0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x0b, 0x01, 0x07, 0x61, 0x64, 0x64, 0x5f,
0x6f, 0x6e, 0x65, 0x00, 0x00, 0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x41, 0x01,
0x6a, 0x0b, 0x00, 0x1a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x01, 0x0a, 0x01, 0x00, 0x07,
0x61, 0x64, 0x64, 0x5f, 0x6f, 0x6e, 0x65, 0x02, 0x07, 0x01, 0x00, 0x01, 0x00, 0x02,
0x70, 0x30,
];

// Bad serialized_wasm
let bad_serialized_wasm = vec![0x00];

let tmp_fs_cache_dir = tempdir().unwrap().into_path();
let module_cache = ModuleCache::new(make_engine, Some(tmp_fs_cache_dir.clone()));
let key: CacheKey = [0u8; 32];

// Build module, serialize, save directly to filesystem
let serialized_module_path = tmp_fs_cache_dir.clone().join(hex::encode(key));
let mut file = std::fs::OpenOptions::new()
.write(true)
.create_new(true)
.open(&serialized_module_path)
.unwrap();
file.write_all(&bad_serialized_wasm).unwrap();

// Module cannot be retrieved from fs cache
let res = module_cache.get(key, &wasm);
assert!(res.is_err());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting a corrupt file from the filesystem cache errors.


let compiler_engine = make_engine();
let module =
std::sync::Arc::new(Module::from_binary(&compiler_engine, wasm.as_slice()).unwrap());
let serialized_module = module.serialize().unwrap();

// 2nd time, module can be retrieved from cache
let res = module_cache.get(key, &wasm).unwrap();
assert_eq!(*res.serialize().unwrap(), *serialized_module);
Copy link
Member Author

@mattyg mattyg Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After getting a corrupt file from the filesystem cache, it is removed from the filesystem, so the next get succeeds.

I'm not sure if a better approach would be to (1) remove it from the filesystem then (2) retry the get once, so the caller does not need to call get twice.


// make sure module is stored in deserialized cache
{
let deserialized_cached_module = module_cache
.deserialized_module_cache
.write()
.get_item(&key)
.unwrap();
assert_eq!(
*deserialized_cached_module.serialize().unwrap(),
*module.serialize().unwrap()
);
}

// make sure module has been stored in serialized filesystem cache
{
let serialized_module_path = module_cache
.serialized_filesystem_cache_path
.unwrap()
.join(hex::encode(key));
assert!(std::fs::metadata(serialized_module_path).is_ok());
}

std::fs::remove_dir_all(tmp_fs_cache_dir).unwrap();
}
}
Loading