Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR: 7.0 - ESM - Only version #330

Closed
16 of 24 tasks
frank-dspeed opened this issue Mar 2, 2021 · 13 comments · Fixed by #362
Closed
16 of 24 tasks

PR: 7.0 - ESM - Only version #330

frank-dspeed opened this issue Mar 2, 2021 · 13 comments · Fixed by #362

Comments

@frank-dspeed
Copy link

frank-dspeed commented Mar 2, 2021

As NodeJS LTS Now Reached the state where ESM Works and we got a 100% Way to shim it for older NodeJS Versions it
is time to Refactor this to ESM Only which would directly run Inside Nodejs & WebBrowsers

Issus to Tackle in this Major

Old Issue that got noisy and out of topic discussions

TODO

  • Update Documentation
  • Drop use of class, this, null
  • Switch to Functional Composition and drop Mixins
  • Get TypeScript Happy with JSDOC Comments.
  • Create new Tests that are Working in NodeJS and the Brwoser.
  • Current Extension Model Replacement
    • Static Analyzeable Classes that do clear extend each other for all 3 situations Raw/with error/with sourcecode info
    • Functional Composition
    • Event Emitter Pattern

5th iteration conclusions

Findings:
The Result of new Parser() should get wrapped when extending the parser it is not so easy to follow the inharitance of objects with
the self written Mixin Class system that offers a later install method which will patch a eventual existing _bootstrap method that later
gets eventual called when using Parser.parse or Parser.parseFragment.

All extensions should be written as proxy the treeAdapter should be named NodeAdapter is it is not forming the tree out of browser view
it is a Node api so we should express it like that to make more clear for other developers that there is the place to manipulate Node
Creation and handling. Proxy support starts with Nodejs 6.4 shim https://github.com/GoogleChrome/proxy-polyfill/blob/master/src/proxy.js

  • v8 JSONParse Optimization https://v8.dev/blog/cost-of-javascript-2019
  • treeAdapter should get replaced by a new Node() hireOrderFunction.
  • ErrorProxyHandler.applyl(target,thisArg, args) where target is Parser,Tokenizer,OpenElementStack, Preprocessor).
  • srcCodeLocationInfo.apply(...) where target is Tokenizer and Preprocessor
  • debuggingProxy as new feature that logs propertys.
  • Object.create(null) replaced by {}
  • Optional Propertys should not get null assigned
  • While Class is a nice form of representation out of coder view it needs to get avoided because of security problems with this
    • The Clossest Solution would be a hireOrderFunction that gets a shared state passed so that both sides of the code can operate
      on the same sharedState Object that got passed to the hireOrderFunction
  • toImprove loading speed it would be preferable to apply the v8 JSONParse Optimization at last in production
  • the entietieMap data should get transformed from .json to .js to be import able in the browser and in nodejs
  • add additional types.js file that exports the JSDOC Comments so can be treeshaken. and TypeScript config.
    • generate .d.ts files if possible maybe (It is not needed but needs rethinking as there is not standard) TypeScript will find the import and it gets treeShaken away or typescript looksup the .d.ts file by its algo and the file does not get bundled
      I Think .js JSDOC import is better as it supports to even bundle the TypeDefintions)
  • drop mixin implementation first version should be directly a higherOrderFunction that takes sharedState so drop class we start with preprocessor
  • drop option-merge we got Object.assign and ...{}

8ef28cd
Lexer implementation started. 0.6.1 … master v6.0.1 Ivan Nikulin committed on 15 Mar 2013

Why ESM Only?

my final Way to go will be a ESM only version this can then get transpiled via babel to cjs or with anything else.

Then the ESM version gets released as new Major version and can get used with current NodeJS via import() inside CJS

the README will show instructions to create a CJS version if needed. And it will point out that this new Major Version is Only ESM and to use it you will need NodeJS Version > 9.7.x or the npm esm package.

What do you think about it?

Browser ESM Support

  • All Modern Browsers support it google "can i use esm" for details
  • shim to directly use esm https://stealjs.com
  • package build it with webpack or other bundlers for diffrent loaders

NodeJS ESM Support

  • NodeJS 13.2+ .use import or import()
  • NodeJS 9.7+ use --experimental-modules then use import or import()
  • NodeJS 6+ use "npm install esm" then use import or import()
  • With TypeScript or with node-ts you can use it inside .ts files.
  • Most other Environments like Current Browsers and Deno or GraalJS will also work out of the box

NodeJS & BrowserProxy Support

@frank-dspeed frank-dspeed changed the title PR: ESM - Only version PR: 7.0 - ESM - Only version Mar 11, 2021
@fb55
Copy link
Collaborator

fb55 commented Jun 6, 2021

@frank-dspeed Is there anything that can be done to move this work forward? I am about to fork the htmlparser2 tree adapter to resolve #327, and would love it if this could be resolved in the main module instead.

@frank-dspeed
Copy link
Author

frank-dspeed commented Jun 6, 2021

@fb55 this is a big project that contains a lot of small bugs and needs complet rewrite i will release it as parse6 and will also make sure that html6 is handled correct it makes no sense to move this codebase forward sorry to say that.

I have rewriten the whole script here 8 times i know exactly what i am saying.

Some Problems with the existing Codebase

  • Mixin & and Functional Class inharitance.
  • reassignment of function arrguments
  • Adding propertys to handle state transitions
  • Coupled tests that modify the proototypes.
  • The treeAdapter Concept is also broken and bad maintain able
    • going for treeConverts would be a cleaner way to extend.

so overall refactoring would end in a diffrent API so introduce so much changes that a new name is needed or all maintainers would need to agree on same principles

The biggest problem is really also the test framework as i recoded this i also needed to refactor the whole test Harness.

@fb55
Copy link
Collaborator

fb55 commented Jun 8, 2021

@frank-dspeed Thanks for the update! Looking forward to parse6, and curious to see how the new code will be structured.

Regarding parser adapters: I do think they provide a lot of value, and the biggest issue of the current parse5 project is testing at the wrong abstraction level. htmlparser2 also provides support for tree adapters, and the test suite only tests if the correct callbacks are being called. That way, tree implementations can be tested separately (see the domhandler module), and can be independent of the main module. parse5 users use their own adapters as well, eg. see https://github.com/jsdom/jsdom/blob/0024630fc62d1abb567cd531f9396b9a811424aa/lib/jsdom/browser/parser/html.js#L48.

To make things a bit easier, I'm happy to provide an interface for your parser from the domhandler module itself.

Please let me know if there is anything I can help with!

@43081j
Copy link
Collaborator

43081j commented Jul 11, 2021

see this #351

tests pass, seems to be stable.

i don't think you need all the extra "nice to have" refactorings to at least move to ESM. in that PR it is already done and was fairly straight forward.

the complexities here are more around refactoring legacy logic IMO, things that now can be written in a nicer way with features which didn't exist at the time parse5 was written.

i also don't think it makes sense by name to be pushing a 'parse6'. by all means throw whatever fork together you want, but it isn't version 6 of parse5, its a different project by a different maintainer. the 5 also refers to HTML5 (of which there isn't a 6th yet, and when there is a parse6 would make sense, but only then).

i've written most of the types for parse5 and dealt with it a fair amount, so if you are open to feedback before going off and throwing together a fork, i'd be happy to help.

@frank-dspeed
Copy link
Author

@43081j sure it is a diffrent project with a diffrent maintainer it has total diffrent goals i only call it parse 6 because i also wanted something that is editable to add upcoming html6 spec

@43081j
Copy link
Collaborator

43081j commented Jul 11, 2021

to be clear, im not opposed to a fork. inikulin hasn't seemed so active these days so it may be the right path to go down.

but i definitely wouldn't claim the "parse6" name. it seems like bikeshedding but it implies it is a progression from parse5, and we really should save the parse6 name for a HTML6-compatible parse5-like library. if your aim was to only support html6 (when it exists), it'd make some sense. but it isn't right now, its a fork of parse5, still a parser for HTML5, not a new version of parse5.

other than the name, a fork may be a good idea to clean up a fairly inactive project.

its likely ill also do a typescript build follow up from my ESM PR, but im happy to throw it all away if your fork progresses (under a new name) to a useable state. it'll still be useful to give you feedback from my own findings

@frank-dspeed
Copy link
Author

@43081j i see parse6 logical even today as parse 5 is ES5 and we did progress to ES6+ what do you think? and then there will be also html6

@43081j
Copy link
Collaborator

43081j commented Jul 11, 2021

it is named parse5 because it parses HTML5. i would assume parse6 is a parser for HTML6.

any fork anyone makes also isn't a newer version of parse5, it is a fork. i just think it would be better off under its own name to avoid confusion, not to lead people to believe it is by the same maintainers etc.

as for contributing back to the repo, IMO inikulin should really be open to a breaking change PR to move this project forward. but if he doesn't have the time or interest in this project anymore, a fork would make more sense

@fb55
Copy link
Collaborator

fb55 commented Aug 2, 2021

@frank-dspeed @43081j Awesome to see you two looking to move this project forward! @inikulin seems open to adding contributors, and has offered to do so in #327. I have just followed up with him, to see if this can happen soon.

I agree that breaking changes are very much doable within the parse5 project. A semver-major release should allow to eg. drop older engines. And HTML spec-compatibility is an absolute no-brainer imho, no matter what the current version number of HTML is.

I actually couldn't find anything about a WHATWG HTML6 — @frank-dspeed would you happen to have a link?

@43081j
Copy link
Collaborator

43081j commented Aug 2, 2021

fwiw my original plan before i realised others were doing similar work:

  • move the codebase to be ESM-only (not difficult, already mostly done in Move entire codebase to ESM #351)
  • move the codebase to be typescript (somewhat difficult, many dynamic scopes/contexts need refactoring)
  • refactor any poorly typed parts of the codebase which can't be strongly typed easily as-is (tree adapters etc)

i don't believe the right thing to do is fork it and rename it if inikulin has already said he is looking for contributors. these could easily be done in this existing repo and @inikulin could add trusted contributors with write access. no need for another org, or another package.

@frank-dspeed
Copy link
Author

frank-dspeed commented Aug 3, 2021

@43081j its all to late and fb55 did not accept fast so i will tell you what currently gets done. I Fork Cheerio, JSDOM, Parse5 and all the related Modules of them and create 1 clean codebase.

The most hard part was to decouple all the Stuff as it is connected in places where you would not expect that they are connected.

The most big change out of Advanced user view will be the EventEmitter Pattern i have a minimal event emitter pattern included that works on the web and inside Nodejs. I hate when some one overwrites existing functions on a Object so i Implemented the option to register callbacks via on for the relevant parts.

Also i refactored the stream stuff to iterators that can get used via a 1 liner to create a nodeJs stream.

sure i am doing at present more but that is what the user will see.

@43081j oh and i looked into your fork deeply the only thing that survived from it was the change to the "tdd ui"

but that was already a massiv help Thanks for that.

Update

#327 (comment) fb55 accepted and wants to merge this already into cheerio so you see as i want to have also cheerio and JSDOM in Clean ESM without Build but with TypeScript support without using .ts extension.

It is the right way to create a new Project with cheerio and parse5 as also JSDOM

@fb55
Copy link
Collaborator

fb55 commented Aug 3, 2021

To clarify: I don't want to merge parse5 into cheerio. I would like to move the project to an org, which will make it easier to manage contributors, create repos for sub-projects and potentially look for funding. The @cheeriojs org already has a lot of this set up and feels like a natural home for parse5 to me. If @inikulin doesn't feel the same way, having a dedicated org is a great option as well.

Regarding a fork: parse5 already has a big user base, and any fork would have to convince existing users to switch. Releasing a major version with some breaking changes has a much higher likelihood of finding adoption; most users will only use the parse* methods anyhow. If we do have to wait a bit longer to do this, I think it is worth it.

@43081j @frank-dspeed In the meantime, we should decide if we are going to work on this together. If so, let's try to have a joint vision of where this library should go.

@frank-dspeed
Copy link
Author

frank-dspeed commented Aug 3, 2021

@fb55 I am with you as long as

  • the package gets Maintained in ESM with typescript support so using the .js extension with package type module
  • No Usage of the "this" keyword as it makes it impossible for me to run that on environments like Cloudflare workers or other security-constrained context environments.
  • Working with Higher Order Functions so apply discipline on Assigning Variables and State (Functional Programming)
  • No NodeJS Only stuff like NodeJS Streams in the core.
  • Do not create own packages for single js files like the sax parser
  • move the test suite into its own package/
  • move the nodejs stuff into a own package/

When you agree on all that we are on the same page if you disagree with any point for any reason I need to fork it independently.

In general, I am really happy with my new version I found a lot of issues and learned a lot about converting a huge class-based NodeJS Project to ESM.

@fb55 fb55 linked a pull request Jan 13, 2022 that will close this issue
4 tasks
@fb55 fb55 closed this as completed in #362 Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants