Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support generator #28

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

madeindjs
Copy link

It might improve the performances to use a Generator when reading big files.

I just added an overide function which allow this

export default function parseGitDiff(
  diff: Generator<string, any, unknown>,
  options?: GitDiffOptions
): Generator<AnyFileChange, any, unknown>;

@yeonjuan
Copy link
Owner

yeonjuan commented Jan 9, 2024

Hi @madeindjs Thanks for the contribution.
I'm wondering if it slows down a lot when using large files as input, have you had any cases like that?

@madeindjs
Copy link
Author

So the thing is I tried with a large diff (25Mo) and the performances are still fine. I just tried to experimente some stuff to improve the performances: The execution time was identical. But it surely improve memory consumption. It's might also be cool if user wants to iterate on each changes without storing them in memory.

It's up to you to decide if this PR is good to have or not.

BTW, I tried to go further and use AsyncGenerator when reading the file (here) like this

async function* getGenerator() {
  const stream = createInterface({ input: createReadStream(FILE) });
  for await (const line of stream) yield line;
}

for await (const change of parseGitDiffAsync(getGenerator())) console.log(change);

...but my conslusion was the execution time become worse (aroud 2 times slower, probably because it requires that I add async everywhere). But it can still be intersting for large file which doesn't fit in memory.

@yeonjuan
Copy link
Owner

yeonjuan commented Jan 9, 2024

But it surely improve memory consumption. It's might also be cool if user wants to iterate on each changes without storing them in memory.

@madeindjs Agree! Thanks for the explanation.
What do you think about the supporting streams instead of using generators?
After seeing the above example, I realized that it would be more usable if we support streams.

@madeindjs
Copy link
Author

Yes I agree, I'll update the PR with the support of stream instead of generator.

My only concern is I duplicated parse-git-diff.ts as parse-git-diff-async.ts to be able to introduce AsyncGenerator. I didn't find a way to factorize things because everything need to be async.

@madeindjs
Copy link
Author

madeindjs commented Jan 9, 2024

@yeonjuan , I added the support of Interface from Node.js readline package. It allow a sexy syntax like:

import { createInterface } from 'node:readline';
import { createReadStream } from 'node:fs';
import parseGitDiffAsync from './parse-git-diff-async.js';

const FILE = '/home/alexandre/Downloads/v3_2_0...v3_3_0.diff';
const stream = createInterface({ input: createReadStream(FILE) });
for await (const change of parseGitDiffAsync(stream)) i++;

The cool thing is readline might allow user to introduce CLI tools which allow pipe operator from the shell like git diff | script

Regarding performances , I noted thoses performances using my 25mb diff

method time
parseGitDiff(diff: string) 0.30s
parseGitDiff(diff: Generator) 0.25s
parseGitDiffAsync(diff: AsyncGenerator) 0.47s

I let you take a look at the PR and comment. As I said, I'm not that happy of the code because I copy/pasted many stuff. There is surely a way to factorize it better.

@yeonjuan
Copy link
Owner

yeonjuan commented Jan 10, 2024

Hi @madeindjs Thanks for the work!
I agree that the code needs to be cleaned up, and I'm thinking about separating the public interfaces.
In the end, I'd like to have something like this.

import parseGitDiff, { parseGitDiffs, parseGitDiffsGenerator, parseGitDiffsStream } from 'parse-git-diff';

parseGitDiff; // deprecated

parseGitDiffs; // synchronized version
parseGitDiffsGenerator; // for supporting generator
parseGitDiffsStream; // for supporting stream

And it would be nice to change the output format for consistency.

  • AS-IS
{
   type:"GitDiff",
   files: [...]
}
  • TO-BE (remove root)
[
   { diff }, { diff }, { diff }
]

This might take a little longer. What do you think about what it would be like if I worked on this PR base?

@madeindjs
Copy link
Author

Yes I agree that the ouput needs to be changed, but I didn't do it since it's a breaking change for the API.

I personally prefer the overides way because it looks better in term of developper experience. But it's your librairy, do as you like you prefer, don't worry 😉

Of course, I'm fine that you work on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants