Is it possible to return raw file offsets from within the tar? #162

Open
jeroen opened this issue Jan 4, 2024 · 2 comments

jeroen commented Jan 4, 2024

I would like to generate an index of a tar file with the start and end offset of each file in the tarball, such that I can mmap or extract a single file later on. Is this possible with tar-stream?

The documentation of headers only mentions the size of each file, but I would also need the offset within the tar.

From hacking around, it looks like the internal property extract._buffer.shifted contains what I need, but this is mostly a guess. It would be nice if the header passed to the entry callback could include an offset property for each entry.
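
For context on where such an offset comes from: a tar archive is a sequence of 512-byte blocks, and each entry is one header block followed by its data, padded to a multiple of 512 bytes. A minimal sketch that walks those headers directly in an uncompressed buffer, without tar-stream, could look like the following. The function name `indexTarBuffer` is hypothetical, and the sketch assumes plain ustar headers with octal size fields; pax and GNU extension records are not interpreted and would show up as ordinary entries.

```javascript
// Sketch only: index an uncompressed tar archive held in a Buffer.
// Assumes plain ustar headers with octal size fields.
const BLOCK = 512;

function indexTarBuffer(buf) {
  const entries = [];
  let pos = 0;
  while (pos + BLOCK <= buf.length) {
    const header = buf.subarray(pos, pos + BLOCK);
    // An all-zero block marks the end of the archive.
    if (header.every((b) => b === 0)) break;

    // Name: bytes 0..99, NUL-terminated.
    const nameField = header.subarray(0, 100);
    const nul = nameField.indexOf(0);
    const name = nameField.subarray(0, nul === -1 ? 100 : nul).toString('utf8');

    // Size: bytes 124..135, octal digits padded with NULs or spaces.
    const sizeField = header.subarray(124, 136).toString('ascii').replace(/[\0 ]/g, '');
    const size = parseInt(sizeField, 8) || 0;

    const offset = pos + BLOCK; // data starts right after the header block
    entries.push({ name, offset, size });

    // Skip the data, which is padded up to a whole number of blocks.
    pos = offset + Math.ceil(size / BLOCK) * BLOCK;
  }
  return entries;
}
```

The end offset of each entry is then `offset + size`.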

jeroen commented Jan 4, 2024

Here is what I pieced together. Does this seem right? Is there a more efficient way to do this?

const fs = require('fs')
const tar = require('tar-stream')
const gunzip = require('gunzip-maybe')

function tar_index(path) {
  const input = fs.createReadStream(path)
  const output = []
  return new Promise(function (resolve, reject) {
    // Create the extractor once, inside the promise, so the entry handler can see it.
    const extract = tar.extract({ allowUnknownFormat: true })

    function process_entry(header, stream, next) {
      // NOTE: _buffer.shifted is an undocumented internal of tar-stream. It appears
      // to count the bytes consumed so far, i.e. where this entry's data starts.
      const offset = extract._buffer.shifted
      output.push({
        name: header.name,
        offset: offset,
        size: header.size
      })
      stream.on('end', function () {
        next() // ready for the next entry
      })
      stream.on('error', reject)
      stream.resume() // drain the entry body; only the header is needed
    }

    function finish_stream() {
      resolve(output)
    }

    extract.on('entry', process_entry).on('finish', finish_stream).on('error', reject)
    input.on('error', reject) // e.g. the file does not exist
    input.pipe(gunzip()).pipe(extract)
  }).finally(function () {
    input.destroy()
  })
}

tar_index('myfile.tar.gz').then(console.log)
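
To illustrate how such an index might be used later, here is a hedged sketch that reads one entry back by offset and size. Because the input above is piped through gunzip before reaching the extractor, the recorded offsets refer to the decompressed tar stream, so this only works on an uncompressed .tar file. The helper name `readEntry` and the `{ name, offset, size }` record shape are assumptions taken from the snippet above, not part of tar-stream.

```javascript
const fs = require('fs');

// Sketch only: read one entry's bytes from an UNCOMPRESSED tar file.
// `entry` is assumed to look like { name, offset, size }, with `offset`
// counted from the start of the uncompressed archive.
function readEntry(tarPath, entry) {
  const fd = fs.openSync(tarPath, 'r');
  try {
    const data = Buffer.alloc(entry.size);
    let done = 0;
    // readSync may return fewer bytes than requested, so loop until complete.
    while (done < entry.size) {
      const n = fs.readSync(fd, data, done, entry.size - done, entry.offset + done);
      if (n === 0) {
        throw new Error('unexpected end of file while reading ' + entry.name);
      }
      done += n;
    }
    return data;
  } finally {
    fs.closeSync(fd);
  }
}
```

Reading by absolute position avoids streaming through the whole archive, which is the point of building the index in the first place.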

mafintosh (Owner) commented

Should be easy to add, yeah. Feel free to open a PR for that.
