
Filtering, Max Files and Supporting URLs #26

Open · jaruba opened this issue Feb 13, 2024 · 8 comments

jaruba (Contributor) commented Feb 13, 2024

Hi! We've been playing with your module for the last few days and were hoping to include it in Stremio's local streaming server.

I wanted to share our experience with it. 😄

Specifically, this code gave us a bit of trouble:

// Walk the archive sequentially: parse each file header, record a chunk
// covering that file's data, then jump ahead to the next header.
while (fileOffset < rarFile.length - TerminatorHeaderParser.HEADER_SIZE) {
  const fileHead = await parseHeader(FileHeaderParser, rarFile, fileOffset);
  if (fileHead.type !== 116) {
    // 116 (0x74) marks a file header; anything else ends the file list.
    break;
  }
  if (fileHead.method !== 0x30) {
    // 0x30 is the "store" method (no compression).
    throw new Error("Decompression is not implemented");
  }
  fileOffset += fileHead.headSize; // move past the header to the file data
  fileChunks.push({
    name: fileHead.name,
    fileHead,
    chunk: new RarFileChunk(
      rarFile,
      fileOffset,
      fileOffset + fileHead.size - 1 // inclusive end of this file's data
    ),
  });
  fileOffset += fileHead.size; // advance to the next header
}

This code traverses the archive to read the file headers and prepare the file chunks.

We stress-tested it with a number of cases, one being a 175 GB archive containing many files, where the code above took tens of seconds to finish parsing the file list.

We thus added some options:

{
  "fileIdx": 1,
  "fileMustInclude": ["hello world", "/dexter/i"]
}

These were added so we can stop parsing as soon as we find the file we need, and to skip adding file chunks for all the files we do not need (you can see the changes here: https://github.com/Stremio/rar-stream/blob/ee358ffe86b7060e554aa326a14d15e6f0cb739d/src/rar-files-package.ts#L55-L95 )

As far as we can tell, the traversal that builds the file list has to be sequential; there is no shortcut around it.

We were thinking that one way we could continue using this module (rather than a fork of it) would be to simplify these options so they are useful to more users, for example:

{
  "maxFiles": 1,
  "filter": (filename, fileIdx) => {
    return filename.includes('hello world')
  }
}
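
For illustration, a minimal sketch of how such options might hook into the traversal loop above (maxFiles, filter, and the opts object are just the proposed names, not an existing rar-stream API):

// Hypothetical sketch: early exit and filtering inside the header loop.
// `opts` holds the proposed options from the example above.
let fileIdx = 0;
while (fileOffset < rarFile.length - TerminatorHeaderParser.HEADER_SIZE) {
  const fileHead = await parseHeader(FileHeaderParser, rarFile, fileOffset);
  if (fileHead.type !== 116) {
    break;
  }
  if (fileHead.method !== 0x30) {
    throw new Error("Decompression is not implemented");
  }
  fileOffset += fileHead.headSize;
  // Only materialize chunks for files the caller asked for.
  if (!opts.filter || opts.filter(fileHead.name, fileIdx)) {
    fileChunks.push({
      name: fileHead.name,
      fileHead,
      chunk: new RarFileChunk(rarFile, fileOffset, fileOffset + fileHead.size - 1),
    });
    // Stop traversing as soon as we have everything we need.
    if (opts.maxFiles != null && fileChunks.length >= opts.maxFiles) {
      break;
    }
  }
  fileOffset += fileHead.size; // skip the file data either way
  fileIdx++;
}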

Otherwise, you did absolutely amazing work! I kept stumbling across your module over the years and kept disregarding it because I couldn't wrap my mind around what missing decompression means for RAR archives, and it only recently dawned on me that we don't need decompression at all for our use case.

Our experiments led to making this module: https://github.com/stremio/rar-http

It adds support for URLs and uses an HTTP API. I believe that support for URLs could also be added directly to your module, but if we want to keep the spirit of a self-contained (no dependencies) module, it may be a hassle to use the http / https modules directly for this task.
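
For reference, a rough sketch of what a dependency-free, URL-backed file source could look like using node's built-in https module (urlFileMedia is a hypothetical helper shaped after the FileMedia interface from the readme; error handling omitted):

const https = require('https');
const { PassThrough } = require('stream');

// Hypothetical helper: a FileMedia backed by an HTTP(S) URL, fetching only
// the requested byte range of the remote archive via a Range header.
const urlFileMedia = (url, name, length) => ({
  name,
  length,
  createReadStream: ({ start, end }) => {
    const out = new PassThrough();
    https.get(url, { headers: { Range: `bytes=${start}-${end}` } }, (res) => {
      res.pipe(out);
    });
    return out; // stays synchronous, as the current interface expects
  },
});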

We're mostly curious about your opinion on all of this, and whether you would even accept PRs for such changes.

Thank you for building rar-stream!

jaruba (Contributor, Author) commented Feb 14, 2024

I also found the readme of unzip-stream interesting: https://www.npmjs.com/package/unzip-stream#parse-zip-file-contents

It likewise gives the user of the module control over the traversal of the archive's file list.
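
The pattern from that readme looks roughly like this (paraphrased, so treat the details as approximate): each entry is handed to the consumer, who either pipes it somewhere or drains it.

const fs = require('fs');
const unzip = require('unzip-stream');

fs.createReadStream('archive.zip')
  .pipe(unzip.Parse())
  .on('entry', (entry) => {
    if (entry.path === 'file/we/want.mp4') {
      entry.pipe(fs.createWriteStream('out.mp4')); // consume this entry
    } else {
      entry.autodrain(); // skip everything else
    }
  });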

1313 (Collaborator) commented Feb 14, 2024

Hello 👋 so nice to hear that you've found a use for this library!

Hehe, I made the same discovery that most media-related cases use RAR just for file splitting. As a retired warez scener I know the reasons for splitting files like this, but it makes less sense from the outside. Anyhow, it's nice to be able to stream things directly without having to have all the content locally.

I am a bit busy at the moment, but your suggestions are reasonable and sound. I will assess things more closely tomorrow.

Happy to receive PRs.

Cheers and thanks for the kind words!

1313 (Collaborator) commented Feb 19, 2024

Come to think of it, I believe we can optimize this further and calculate the chunks with some arithmetic: we know the file size, and the intermediate chunks should have a constant size for the file heads. A sketch of the idea is below.

I'll do some digging. Filtering is not a bad idea either, especially with a lot of smaller files.
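
Roughly the idea as a sketch, assuming a file split across volumes where each intermediate volume carries a constant-size data chunk (chunkRangeInVolume is just an illustration, not existing code):

// Hypothetical sketch: if each intermediate volume holds a constant-size
// data chunk, the byte range covered by volume `i` can be computed rather
// than discovered by parsing every header sequentially.
const chunkRangeInVolume = (i, chunkSize, totalSize) => {
  const start = i * chunkSize;
  const end = Math.min(start + chunkSize, totalSize) - 1; // inclusive
  return { start, end };
};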

jaruba (Contributor, Author) commented Feb 21, 2024

@1313 Sorry for the late reply, I have been hunting for a new apartment lately.

When I think of it I think we can optimize it further and calculate the chunks with some arithmetics.

That would be great. I have many test cases if you need anything tested, and we are also willing to assist with development to improve the module. Should we wait to see what your current attempts lead to, or do you have details on how you think these features would best be implemented?

1313 (Collaborator) commented Mar 2, 2024

Haha, when revisiting the code I see that the optimisation is already there.

Sorry for the late reply here as well; I'm short on free time with two toddlers to run after. I hope the apartment hunt bears fruit.

Anyhow, I think a filtering solution would be great. If you want to give it a shot, I'm happy to review a PR and we can take the discussion from there.

Cheers

jaruba (Contributor, Author) commented Mar 4, 2024

@1313 tell me what you think: #30

jaruba (Contributor, Author) commented Mar 4, 2024

Although off-topic, another interesting case I ran into is the way the FileMedia interface works: https://github.com/doom-fish/rar-stream?tab=readme-ov-file#innerfile-api

It expects createReadStream() to be synchronous, which only becomes an issue when trying to use node-fetch instead of needle, as the expected way of using node-fetch would be:

const createReadStream = async () => {
  const fetch = require('node-fetch');
  const resp = await fetch(url, opts);
  return resp.body; // this is a stream
}

So node-fetch can only be used from an async function; awaiting file.createReadStream() would work for both the sync and async cases, since awaiting a non-promise value simply passes it through.
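
A minimal sketch of that suggestion (assuming the interval-style createReadStream from the readme):

// `await` on a non-promise value is a no-op, so a single call site can
// support both a synchronous and an async (fetch-based) createReadStream:
const stream = await file.createReadStream({ start: 0, end: 1023 });
stream.pipe(process.stdout);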

This is outside of our needs; I just thought using node-fetch would be interesting because it can easily be swapped for the browser's own HTML5 fetch, and there is a good chance rar-stream could be browserified to unpack and play a video straight in the browser when combined with something like: https://www.npmjs.com/package/videostream

Just something to think about 😄

There is now a PR for this change here: #29

1313 (Collaborator) commented Mar 5, 2024

I'll look into making the lib browser-ready as well. I suspect that the binary dependency used for reading the headers requires Node.
