
Filtering, Max Files and Supporting URLs #26

Open · jaruba opened this issue Feb 13, 2024 · 8 comments

jaruba (Contributor) commented Feb 13, 2024

Hi! We've been playing with your module for the last few days and were hoping to include it in Stremio's local streaming server.

I wanted to share our experience with it. 😄

Specifically, this code gave us a bit of trouble:

// Walk the archive sequentially: parse each file header, record a chunk
// covering that file's data, then jump ahead to the next header.
while (fileOffset < rarFile.length - TerminatorHeaderParser.HEADER_SIZE) {
  const fileHead = await parseHeader(FileHeaderParser, rarFile, fileOffset);
  if (fileHead.type !== 116) {
    // 116 (0x74) marks a file header; anything else ends the file list.
    break;
  }
  if (fileHead.method !== 0x30) {
    // 0x30 is the "store" method (no compression).
    throw new Error("Decompression is not implemented");
  }
  fileOffset += fileHead.headSize; // move past the header to the file data
  fileChunks.push({
    name: fileHead.name,
    fileHead,
    chunk: new RarFileChunk(
      rarFile,
      fileOffset,
      fileOffset + fileHead.size - 1 // inclusive end of this file's data
    ),
  });
  fileOffset += fileHead.size; // advance to the next header
}

This code traverses the archive to read the file headers and prepare the file chunks.

We stress-tested it with a number of cases, one being a 175 GB archive containing many files, where the code above took tens of seconds to finish parsing the file list.

We thus added some options:

{
  "fileIdx": 1,
  "fileMustInclude": ["hello world", "/dexter/i"]
}

These were added so we can stop parsing as soon as we find the file we need, and to skip adding file chunks for all the files we do not need (you can see the changes here: https://github.com/Stremio/rar-stream/blob/ee358ffe86b7060e554aa326a14d15e6f0cb739d/src/rar-files-package.ts#L55-L95 )

As far as we can tell, the traversal that builds the file list has to be sequential; there is no shortcut around it.

We were thinking that one way we could continue using this module (rather than a fork of it) would be to simplify these options so they are useful to more users, for example:

{
  "maxFiles": 1,
  "filter": (filename, fileIdx) => {
    return filename.includes('hello world')
  }
}
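
For illustration, a minimal sketch of how such options might hook into the traversal loop above (maxFiles, filter, and the opts object are just the proposed names, not an existing rar-stream API):

// Hypothetical sketch: early exit and filtering inside the header loop.
// `opts` holds the proposed options from the example above.
let fileIdx = 0;
while (fileOffset < rarFile.length - TerminatorHeaderParser.HEADER_SIZE) {
  const fileHead = await parseHeader(FileHeaderParser, rarFile, fileOffset);
  if (fileHead.type !== 116) {
    break;
  }
  if (fileHead.method !== 0x30) {
    throw new Error("Decompression is not implemented");
  }
  fileOffset += fileHead.headSize;
  // Only materialize chunks for files the caller asked for.
  if (!opts.filter || opts.filter(fileHead.name, fileIdx)) {
    fileChunks.push({
      name: fileHead.name,
      fileHead,
      chunk: new RarFileChunk(rarFile, fileOffset, fileOffset + fileHead.size - 1),
    });
    // Stop traversing as soon as we have everything we need.
    if (opts.maxFiles != null && fileChunks.length >= opts.maxFiles) {
      break;
    }
  }
  fileOffset += fileHead.size; // skip the file data either way
  fileIdx++;
}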

Otherwise, you did absolutely amazing work! I kept stumbling across your module over the years and kept disregarding it because I couldn't wrap my mind around what missing decompression means for RAR archives, and it only recently dawned on me that we don't need decompression at all for our use case.

Our experiments led to making this module: https://github.com/stremio/rar-http

It adds support for URLs and uses an HTTP API. I believe that support for URLs could also be added directly to your module, but if we want to keep the spirit of a self-contained (no dependencies) module, it may be a hassle to use the http / https modules directly for this task.
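
For reference, a rough sketch of what a dependency-free, URL-backed file source could look like using node's built-in https module (urlFileMedia is a hypothetical helper shaped after the FileMedia interface from the readme; error handling omitted):

const https = require('https');
const { PassThrough } = require('stream');

// Hypothetical helper: a FileMedia backed by an HTTP(S) URL, fetching only
// the requested byte range of the remote archive via a Range header.
const urlFileMedia = (url, name, length) => ({
  name,
  length,
  createReadStream: ({ start, end }) => {
    const out = new PassThrough();
    https.get(url, { headers: { Range: `bytes=${start}-${end}` } }, (res) => {
      res.pipe(out);
    });
    return out; // stays synchronous, as the current interface expects
  },
});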

We're mostly curious about your opinion on all of this, and whether you would even accept PRs for such changes.

Thank you for building rar-stream!

jaruba (Contributor, Author) commented Feb 14, 2024

I also found the readme of unzip-stream interesting: https://www.npmjs.com/package/unzip-stream#parse-zip-file-contents

It likewise gives the user of the module control over the traversal of the archive's file list.
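
The pattern from that readme looks roughly like this (paraphrased, so treat the details as approximate): each entry is handed to the consumer, who either pipes it somewhere or drains it.

const fs = require('fs');
const unzip = require('unzip-stream');

fs.createReadStream('archive.zip')
  .pipe(unzip.Parse())
  .on('entry', (entry) => {
    if (entry.path === 'file/we/want.mp4') {
      entry.pipe(fs.createWriteStream('out.mp4')); // consume this entry
    } else {
      entry.autodrain(); // skip everything else
    }
  });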

1313 (Collaborator) commented Feb 14, 2024

Hello 👋 so nice to hear that you've found a use for this library!

Hehe, I made the same discovery that most media-related cases use RAR just for file splitting. As a retired warez scener I know the reasons for splitting files like this, but it makes less sense from the outside. Anyhow, it's nice to be able to stream things directly without having to have all the content locally.

I am a bit busy at the moment, but your suggestions are reasonable and sound. I will assess things more closely tomorrow.

Happy to receive PRs.

Cheers and thanks for the kind words!

1313 (Collaborator) commented Feb 19, 2024

Come to think of it, I believe we can optimize this further and calculate the chunks with some arithmetic: we know the file size, and the intermediate chunks should have a constant size for the file heads. A sketch of the idea is below.

I'll do some digging. Filtering is not a bad idea either, especially with a lot of smaller files.
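
Roughly the idea as a sketch, assuming a file split across volumes where each intermediate volume carries a constant-size data chunk (chunkRangeInVolume is just an illustration, not existing code):

// Hypothetical sketch: if each intermediate volume holds a constant-size
// data chunk, the byte range covered by volume `i` can be computed rather
// than discovered by parsing every header sequentially.
const chunkRangeInVolume = (i, chunkSize, totalSize) => {
  const start = i * chunkSize;
  const end = Math.min(start + chunkSize, totalSize) - 1; // inclusive
  return { start, end };
};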

jaruba (Contributor, Author) commented Feb 21, 2024

@1313 Sorry for the late reply, I have been hunting for a new apartment lately.

When I think of it I think we can optimize it further and calculate the chunks with some arithmetics.

That would be great. I have many test cases if you need anything tested, and we are also willing to assist with development to improve the module. Should we wait to see what your current attempts lead to, or do you have details on how you think these features would best be implemented?

1313 (Collaborator) commented Mar 2, 2024

Haha, when revisiting the code I see that the optimisation is already there.

Sorry for the late reply here as well; I'm short on free time with two toddlers to run after. I hope the apartment hunt bears fruit.

Anyhow, I think a filtering solution would be great. If you want to give it a shot, I'm happy to review a PR and we can take the discussion from there.

Cheers

jaruba (Contributor, Author) commented Mar 4, 2024

@1313 tell me what you think: #30

jaruba (Contributor, Author) commented Mar 4, 2024

Although off-topic, another interesting case I ran into is the way the FileMedia interface works: https://github.com/doom-fish/rar-stream?tab=readme-ov-file#innerfile-api

It expects createReadStream() to be synchronous, which only becomes an issue when trying to use node-fetch instead of needle, as the expected way of using node-fetch would be:

const createReadStream = async () => {
  const fetch = require('node-fetch');
  const resp = await fetch(url, opts);
  return resp.body; // this is a stream
}

So node-fetch can only be used from an async function; awaiting file.createReadStream() would work for both the sync and async cases, since awaiting a non-promise value simply passes it through.
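
A minimal sketch of that suggestion (assuming the interval-style createReadStream from the readme):

// `await` on a non-promise value is a no-op, so a single call site can
// support both a synchronous and an async (fetch-based) createReadStream:
const stream = await file.createReadStream({ start: 0, end: 1023 });
stream.pipe(process.stdout);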

This is outside of our needs; I just thought using node-fetch would be interesting because it can easily be swapped for the browser's own HTML5 fetch, and there is a good chance rar-stream could be browserified to unpack and play a video straight in the browser when combined with something like: https://www.npmjs.com/package/videostream

Just something to think about 😄

There is now a PR for this change here: #29

1313 (Collaborator) commented Mar 5, 2024

I'll look into making the lib browser-ready as well. I suspect that the binary dependency used for reading the headers requires Node.
