
Add API to (de)serialize parsed Node objects #26871

Closed
4 tasks done
timocov opened this issue Sep 4, 2018 · 11 comments
Labels
Needs More Info The issue still hasn't been fully clarified

Comments

@timocov
Contributor

timocov commented Sep 4, 2018

Search Terms

Serialization, deserialization, dumping, dump source file, compiler cache

Related issues

#25658, TypeStrong/ts-loader#825

Suggestion

I'd like to suggest adding an API to (de-)serialize any parsed Node, or making the parsed AST easily (de-)serializable via, for example, JSON.stringify().

Use Cases

It could be used to cache parsed source files between compilations, so unchanged source files don't need to be parsed again.

For example, our project currently has the following compilation times:

$ ./node_modules/.bin/tsc --diagnostics --skipLibCheck
Files:          1767
Lines:        226078
Nodes:       1036945
Identifiers:  315470
Symbols:      380158
Types:        137886
Memory used: 497393K
I/O read:      0.63s
I/O write:     2.09s
Parse time:   10.79s
Bind time:     2.10s
Check time:   12.47s
Emit time:    10.72s
Total time:   36.09s
$ ./node_modules/.bin/tsc --diagnostics --skipLibCheck --noEmit
Files:          1767
Lines:        226078
Nodes:       1036945
Identifiers:  315470
Symbols:      380158
Types:        137886
Memory used: 500125K
I/O read:      0.83s
I/O write:     0.00s
Parse time:   15.20s
Bind time:     3.13s
Check time:   17.59s
Emit time:     0.00s
Total time:   35.92s

(I have no idea why --noEmit does not reduce "Total time" by "Emit time" - it is strange)

I hope that having such an API, or even a built-in cache in tsc, could reduce compilation time, especially parse time.

Also, in this case, the TypeScript package could be published with pre-parsed lib files, which could reduce compilation time even without enabling the --skipLibCheck option.

Examples

For example, we could override the getSourceFile method of CompilerHost and serve files from a cache when they haven't changed since the last compilation. This could be done via provided methods like ts.serializeNode(node: ts.Node): string | ByteArray / ts.deserializeNode(nodeData: string | ByteArray): ts.Node.
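A minimal sketch of what such a cache-backed getSourceFile could look like. This is purely illustrative: serializeNode/deserializeNode below stand in for the proposed (not yet existing) ts.serializeNode/ts.deserializeNode, a plain FakeNode object stands in for ts.Node, and JSON is only a placeholder wire format:

```typescript
// Hypothetical sketch: a cache of serialized ASTs, keyed by file name and
// content. All names here (FakeNode, serializeNode, deserializeNode, parse)
// are invented for illustration; they are not part of the TypeScript API.
interface FakeNode {
  kind: string;
  text?: string;
  children: FakeNode[];
}

// Stand-ins for the proposed ts.serializeNode / ts.deserializeNode.
function serializeNode(node: FakeNode): string {
  return JSON.stringify(node);
}
function deserializeNode(data: string): FakeNode {
  return JSON.parse(data) as FakeNode;
}

// On disk this would be persisted between compilations; a Map keeps the
// sketch self-contained. The key uses the raw content for brevity; a real
// cache would use a content hash or mtime.
const astCache = new Map<string, string>();

function parse(fileName: string, content: string): FakeNode {
  // Pretend this is the expensive ts.createSourceFile call.
  return { kind: "SourceFile", text: content, children: [] };
}

function getSourceFileCached(fileName: string, content: string): FakeNode {
  const key = `${fileName}:${content}`;
  const cached = astCache.get(key);
  if (cached !== undefined) {
    return deserializeNode(cached); // skip re-parsing unchanged files
  }
  const node = parse(fileName, content);
  astCache.set(key, serializeNode(node));
  return node;
}
```

Whether this wins anything in practice depends on deserialization being meaningfully cheaper than parsing, which is exactly what the discussion below questions.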

This could also be added to tsc as a flag like --use-cache (or something like that) to do it by default (but I believe in this case we'd also need a new option like cacheDir or --force).

A similar cache can be found in Bazel's rules_typescript, but it is an in-memory cache.
Our internal tool has a very similar in-memory cache too (it speeds up multiple compilations of the same files).

Checklist

My suggestion meets these guidelines:

  • This wouldn't be a breaking change in existing TypeScript / JavaScript code
  • This wouldn't change the runtime behavior of existing JavaScript code
  • This could be implemented without emitting different JS based on the types of the expressions
  • This isn't a runtime feature (e.g. new expression-level syntax)
@Nipheris

Nipheris commented Sep 4, 2018

Canonical S-expressions could be a candidate format.
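For reference, canonical S-expressions (csexp) encode atoms with a length prefix and nest lists in parentheses. A toy encoder for an already-flattened node might look like the sketch below (purely hypothetical, not anything TypeScript ships; a real implementation would use byte lengths rather than string lengths, which coincide here because the atoms are ASCII):

```typescript
// Canonical S-expression encoding: atoms become "<len>:<bytes>",
// lists are the concatenation of their encoded items in parentheses.
type Sexp = string | Sexp[];

function encodeCsexp(s: Sexp): string {
  if (typeof s === "string") {
    return `${s.length}:${s}`; // length-prefixed atom
  }
  return `(${s.map(encodeCsexp).join("")})`;
}

// A parsed node could be flattened into nested lists before encoding:
const node: Sexp = ["Identifier", "foo"];
const encoded = encodeCsexp(node);
// encoded === "(10:Identifier3:foo)"
```

The length prefixes make the format unambiguous to parse without escaping, which is what makes csexp attractive as a compact serialization target.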

@timocov timocov changed the title Add API to (de)derialize parsed Node objects Add API to (de)serialize parsed Node objects Sep 4, 2018
@AlCalzone
Contributor

AlCalzone commented Sep 5, 2018

Not sure if I misunderstand this issue, but is the watch mode not an option? I'm using that in my project https://github.com/AlCalzone/virtual-tsc (with a barebones language server host and an in-memory FS) where compilation times drop from a few seconds on the first compilation to a few milliseconds when only single files get changed.

@timocov
Contributor Author

timocov commented Sep 5, 2018

Not sure if I misunderstand this issue, but is the watch mode not an option?

Not exactly. tsc's watch mode is already fast (a 1-2 second delay is fine in watch mode for me); this feature request isn't about that.

This could also be added to tsc as a flag like --use-cache (or something like that) to do it by default (but I believe in this case we'd also need a new option like cacheDir or --force).

For example, you work on your feature branch and then need to switch to the master/dev branch. You stop tsc --watch (to avoid conflicts with your VCS), switch to master/dev, and run tsc --watch again, which now takes some time (for me, up to a minute).

Or, for example, caches could even be shared between co-workers, CI builds, etc., so files that haven't changed in a while don't need to be re-parsed.

@RyanCavanaugh
Member

Is this an XY problem? i.e. is the request here to make compilation faster in general, or do you specifically need a serialized parse tree for some other reason?

I will say that 10-15s to parse 226k lines seems quite high and may indicate a real bug somewhere. Our self-hosted compilation of the compiler parses 83k lines in 0.71s; your parse time "should" only be 3-4s.

@RyanCavanaugh RyanCavanaugh added the Needs More Info The issue still hasn't been fully clarified label Sep 5, 2018
@timocov
Contributor Author

timocov commented Sep 5, 2018

@RyanCavanaugh

Is this an XY problem? i.e. is the request here to make compilation faster in general

The thing that inspired me to create this issue is compilation performance.

or do you specifically need a serialized parse tree for some other reason?

It would be nice to have such an API, but if compilation/parse time were fast, I expect nobody would need to store the parsed tree in a file (because parsing would take about as long as reading and restoring that tree from another file).

your parse time "should" only be 3-4s.

Wow, that looks good. Still, I'm not sure whether it would be faster than restoring the AST from a file; we'd need to check either way.

10-15s to parse 226k lines seems quite high and may indicate a real bug somewhere

Hm. I'm happy to help track down that problem. What can I provide to help find the bug, if there is one?

@timocov
Contributor Author

timocov commented Sep 5, 2018

(I have no idea why --noEmit does not reduce "Total time" by "Emit time" - it is strange)

It would also be nice to understand what's happening here and find the cause. Please let me know if I can provide anything to help understand the problem.

@ajafff
Contributor

ajafff commented Sep 5, 2018

(I have no idea why --noEmit does not reduce "Total time" by "Emit time" - it is strange)

Are you using declaration: true? In that case declaration emit will still happen under the hood to collect declaration diagnostics and therefore still needs quite some time.

@timocov
Contributor Author

timocov commented Sep 5, 2018

Are you using declaration: true?

No, at the moment I deliberately disable this flag to avoid extra checking/emitting. With declaration: true enabled, the total time is about 45-60s (but I don't have diagnostics for those runs; if you need them, I can provide them tomorrow).

@timocov
Contributor Author

timocov commented Sep 6, 2018

@ajafff it seems it was some lag on my side. I just ran the compilations again and the noEmit flag now makes a difference to the total time. But parse/total time is still slow.

noEmit: true:

$ ./node_modules/.bin/tsc --diagnostics --skipLibCheck --noEmit
Files:          1767
Lines:        226078
Nodes:       1036945
Identifiers:  315470
Symbols:      380158
Types:        137886
Memory used: 495875K
I/O read:      0.70s
I/O write:     0.00s
Parse time:   12.41s
Bind time:     2.14s
Check time:   13.04s
Emit time:     0.00s
Total time:   27.59s
$ ./node_modules/.bin/tsc --diagnostics --skipLibCheck --noEmit --declaration true
Files:          1767
Lines:        226078
Nodes:       1036945
Identifiers:  315470
Symbols:      380158
Types:        137886
Memory used: 491434K
I/O read:      0.67s
I/O write:     0.00s
Parse time:   11.59s
Bind time:     2.19s
Check time:   13.24s
Emit time:     0.00s
Total time:   27.03s

noEmit: false:

$ ./node_modules/.bin/tsc --diagnostics --skipLibCheck
Files:          1767
Lines:        226078
Nodes:       1036945
Identifiers:  315470
Symbols:      380158
Types:        137886
Memory used: 498003K
I/O read:      0.62s
I/O write:     4.53s
Parse time:   12.21s
Bind time:     2.10s
Check time:   12.84s
Emit time:    14.89s
Total time:   42.03s
$ ./node_modules/.bin/tsc --diagnostics --skipLibCheck --declaration true
Files:          1767
Lines:        226078
Nodes:       1036945
Identifiers:  315470
Symbols:      380158
Types:        137886
Memory used: 543786K
I/O read:      0.65s
I/O write:     8.64s
Parse time:   11.57s
Bind time:     2.05s
Check time:   12.84s
Emit time:    26.95s
Total time:   53.40s

@timocov
Contributor Author

timocov commented Sep 11, 2018

@RyanCavanaugh ping

@timocov
Contributor Author

timocov commented Sep 12, 2018

Okay, I just ran with --extendedDiagnostics and now we have Parse time along with Program time:

Parse time:      2.25s
Program time:   10.83s

It seems that parse time is ok now.
