Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(findExports): use acorn tokenizer to filter false positive exports #56

Merged
merged 14 commits into from Aug 3, 2022

Conversation

hubvue
Copy link
Contributor

@hubvue hubvue commented Jun 29, 2022

This PR is designed to address this issue #34

example:

// export { foo } from 'foo1';
// exports default 'foo';
// export { useB, _useC as useC };
// export function useA () { return 'a' }
// export { default } from "./other"
// export async function foo () {}
// export { foo as default }
//export * from "./other"
//export * as foo from "./other"

/**
 * export const a = 123
 * export { foo } from 'foo1';
 * exports default 'foo'
 * export function useA () { return 'a' }
 * export { useB, _useC as useC };
 *export { default } from "./other"
 *export async function foo () {}
 * export { foo as default }
 * export * from "./other"
 export * as foo from "./other"
 */

export { bar } from 'foo2';
export { foobar } from 'foo2';

The expected length of matches should be "2", but instead we get "17".

'\b' does not solve such problems, its semantics is the junction of words and non-words. Apparently the following code is also semantically correct.

// (\b is here)export { foo } from 'foo1';

Solutions:filter out comments before matching to avoid them affecting the matching process.

src/analyze.ts Outdated Show resolved Hide resolved
@pi0
Copy link
Member

pi0 commented Jun 29, 2022

We can also use strip-literal by @antfu. However, downside is that it involves full AST parsing and sloweer.

I think this is last tested version based on regex.

@antfu Can you please explain finally what were limitations regex couldn't support? Did you migrate unimport for a specific reason or avoiding possible edge cases in the future?

@pi0 pi0 closed this Jun 29, 2022
@pi0 pi0 reopened this Jun 29, 2022
@antfu
Copy link
Member

antfu commented Jun 30, 2022

Here are the cases that regexp can't possibly handle correctly (either incompletely removed, or overkill with quotes unmatched.

Context:

In addition:

  • strip-literal only does tokenizing instead of full parsing. It's probably the most efficient way to do this correctly.
  • While tokenizing requires the input to be valid JavaScript, I also introduced stripLiteralRegex as a fallback (implementation inherited from vite and unimport), you can either call it directly, or use stripLiteral() to do it with auto fallback.

@hubvue
Copy link
Contributor Author

hubvue commented Jun 30, 2022

Using strip-literal to filter noise is indeed a good option, but a problem arose during my testing.
example:

export { foobar } from 'foo2';

After strip-literal processing

export { foobar } from '     ';

Expected namedExports:

[{
    type: 'named',
    exports: ' bar ',
    specifier: 'foo2',
    code: "export { bar } from 'foo2';",
    start: 121,
    end: 148,
    names: [ 'bar' ]
}]

Received namedExports:

[{
    type: 'named',
    exports: ' bar ',
    specifier: undefined,
    code: 'export { bar }',
    start: 794,
    end: 808,
    names: [ 'bar' ]
}]

specifier got undefined

@hubvue
Copy link
Contributor Author

hubvue commented Jun 30, 2022

I have found that the presence of an export statement in the string also affects the result.

example:

const test1 = "export { ba1 } from 'foo2'"
const test2 = "testexport { bar2 } from 'foo2'"
const test3 = "test export { bar3 } from 'foo2'"
const test4 = "export { bar4 } from 'foo2' test"
const test5 = \`
  test1
  export { bar4 } from 'foo2' test
  test2
`

expected 0 got 4.

regexp matching is too cumbersome and perhaps AST is an option (but only if performance is accepted).

@pi0
Copy link
Member

pi0 commented Jun 30, 2022

Thanks for information @antfu. I guess performance would be reasonable and we can introduce two versions of find* utils for a regex-only extractor when performance/bundle is matter.

Can we maybe support a filter function from strip-literal? We want to avoid stripping strings that do not include an import for mlly case.

@antfu
Copy link
Member

antfu commented Jul 1, 2022

Can we maybe support a filter function from strip-literal?

While we could do it, I am not sure if there is a good way to detect whether a string should be kept or striped. Since strip-literal does not change the position of content, I would suggest maybe detecting the exports using the striped code, while reading the content using the original.

@hubvue
Copy link
Contributor Author

hubvue commented Jul 1, 2022

I would suggest maybe detecting the exports using the striped code, while reading the content using the original.

@antfu This is a great idea, I've been trying to figure out how to parse and filter out the export statements in the strings all morning and still haven't had a good result.

@hubvue hubvue requested a review from pi0 July 1, 2022 09:38
Copy link
Member

@pi0 pi0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes!

src/analyze.ts Outdated Show resolved Hide resolved
src/analyze.ts Outdated Show resolved Hide resolved
src/analyze.ts Outdated Show resolved Hide resolved
src/analyze.ts Outdated Show resolved Hide resolved
@hubvue hubvue requested a review from pi0 July 1, 2022 14:48
@hubvue
Copy link
Contributor Author

hubvue commented Jul 11, 2022

@pi0 Is there any progress on this PR?

@codecov
Copy link

codecov bot commented Jul 22, 2022

Codecov Report

Merging #56 (0a61617) into main (83502b4) will increase coverage by 0.36%.
The diff coverage is 76.00%.

@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   52.52%   52.89%   +0.36%     
==========================================
  Files          13       13              
  Lines        2115     2159      +44     
  Branches      171      179       +8     
==========================================
+ Hits         1111     1142      +31     
- Misses        842      847       +5     
- Partials      162      170       +8     
Impacted Files Coverage Δ
src/analyze.ts 80.68% <76.00%> (-3.41%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@pi0 pi0 changed the title fix: filter out commented code to eliminate the effect of regular mat… feat(findExports): use acorn tokenizer Aug 3, 2022
@pi0 pi0 changed the title feat(findExports): use acorn tokenizer feat(findExports): use acorn tokenizer to filter false positive exports Aug 3, 2022
@pi0
Copy link
Member

pi0 commented Aug 3, 2022

Thanks for working on this PR @hubvue <3

@pi0 pi0 merged commit 7039f54 into unjs:main Aug 3, 2022
@hubvue
Copy link
Contributor Author

hubvue commented Aug 3, 2022

@pi0 I feel very happy to be able to contribute to an open source project. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants