Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: joist-doc #711

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

feat: joist-doc #711

wants to merge 12 commits into from

Conversation

brudil
Copy link
Collaborator

@brudil brudil commented Jun 26, 2023

Getting up a WIP version of this to see if you're happy to progress with.

Overview

Implemented as a joist-doc package, which run within codegen after file emit, if "docGen": "markdown" is set in joist-config.json. Is a seperate optional package which would need to be part of the user's dependencies.

joist-doc depends on joist-codegen and is given the Config and EntityDbMetadata. Using the EntityDbMetadata object, it runs against the known file types, (Entities [todo], EntityCodegen, Enums [todo]).

The files are parsed with babel. Originally this was supposed to ideally use the user's typescript but the typescript parser doesn't have the best support for comments within the AST, relying on the complier's linting infrastructure.

From there, each of the file types can traverse the AST finding specific nodes that might need comments. When a commentable node is found, the CommentStore is called to optionally provide a comment. If the CommentStore returns a comment, it is added back to the AST to be printed back out and written to the file.

This PR implements a default MarkdownCommentStore implementation, which uses remark to parse the markdown for easy manipulation.

MarkdownCommentStore opts for loose-ish heriustics on determining comments, allowing for greater flexability in how users format their markdown files.

The goal was to support both these files:

Something structured, more detailed, nested like

# User
## Overview
A User entity is...

## Fields
### email
Something about their email

### Computed Fields
#### isActive
If a login has occured within the last 30 days

And something looser like

# User
A User entity is...

## email
Something about their email

## isActive
If a login has occured within the last 30 days

Thus the implemented herustics are as follows:

  • for Fields: we find a matching heading (of any level) of the field name
  • for Root descriptions (entities): we use the following rules:
    • paragraphs from the top
    • paragraphs from the following the first heading
    • paragraphs from the second heading if it preceeds the first and is called "Overview"

All content is collected until it hits a non-paragraph element. This allows for a neat trick of using a horizontal rule within your doc to stop what is included within the comments, but allowing for more depth within the markdown file.

We obviously might want to tweak these or improve them.


There's a little more to go and some code to clean up but wanted to see this direction fits in with what we discussed. Testing, too, is a little bit of an unknown, as the integration tests don't care about the actual text within the codegen'd files. Snapshot testing could be used put will be a pain to maintain for other codegen changes.

@brudil brudil linked an issue Jun 26, 2023 that may be closed by this pull request
@stephenh
Copy link
Collaborator

This looks amazing! Yep, definitely looks like a great approach/direction.

In terms of babel, that sgtm; I had seen ts-morph before, i.e. https://ts-morph.com/setup/ast-viewers, which I think could be a candidate for this? I was trying to see if it used babel or not, but I don't see it pulling in any babel/similar dependencies, so my current guess is that it uses the tsc compiler API, so then it probably/I assume has the same limitation you'd found wrt comments.

Fwiw I'm pretty happy to not use the tsc API b/c I think it will also be not-the-fastest.

Granted, anything we write (in JS) will probably need caching; i.e. if FooCodegen.ts / Foo.ts / Foo.markdown haven't changed, then don't pay the cost of re-parsing + re-stitching + re-formatting the AST. Definitely something we can add later, but will probably be required for our internal project to adopt this.

(When Joist used prettier for formatting, I'd started doing some caching to avoid the cost of formatting every *.ts file on every yarn joist-codegen b/c it was taking 10s of seconds just to format the entire generated output each time, but then switching to the Deno project's Rust-based dprint formatter made formatting so fast, I was able to just take the caching out.)

as the integration tests don't care about the actual text within the codegen'd files.

True; fwiw the rationale for checking the generated files into git is to allow admittedly-manual regression testing as PR reviewers can see exactly what changed in the generated files.

Granted, that's not as great as true red/green unit tests, but 🤷 I'd be fine with it to start with, and add fancier/automated assertions later/if needed.

This is really neat!

@stephenh
Copy link
Collaborator

Wdyt about adding a hint to the comments that "this comment comes from Author.md, so don't edit it".

Granted, for AuthorCodegen.ts I think it's kinda implicit the user should not edit the comments, but if they're in Author.ts and editing the class tsdoc, I could see that being confusing that they really need to go edit the Author.md...

@brudil
Copy link
Collaborator Author

brudil commented Jun 26, 2023

A source hint sounds like a great idea, I'll implement that for the Entity files.

On perf, this sounds good. Does joist-codegen not emit on non-changed files? Just trying to figure if this is caching "this doesn't need to be touched" or in terms of "this file hasn't changed (hash or something), and we've cached the previous output"

@brudil
Copy link
Collaborator Author

brudil commented Jun 26, 2023

@stephenh Out of curiosity how many "doc-able" structures (Entities, enums) do you have internally the project? I've kept the current deps light, but it might be worth pulling in p-limit or the like as currently every transformation is executed within a single Promise.all.

@stephenh
Copy link
Collaborator

does joist-codegen not emit on non-changed files?

I had some infra in place to do that, with the focus on "do not emit or format unchanged files", but ended up taking it out b/c with dprint (instead of prettier) it was basically just as fast to format + emit all files every time.

Here's the PR that removed the caching:

#493

It was a little tricky given the goal was to cache based on the pre-formatted content. If we add back caching, it hopefully can be simpler, b/c in theory we wouldn't need the "pre-format" aspect of the prior approach.

do you have internally the project

We have about 200 tables, ~150 entity tables and ~50 enum tables. So nothing that has hit the ~1000s of tables mark, but enough that prettier-ing ~200-or-so files become noticeably slow.

@brudil
Copy link
Collaborator Author

brudil commented Jun 26, 2023

Implemented caching with file-system-cache (https://www.npmjs.com/package/file-system-cache) which is used by Storybook, persisting to a .cache directory. Let me know if you have another preference for this though.

With no changes/fully cached joist-doc spends 7ms running in the integration dir, so this should be scalable to 200 entities.

Should get time to tackle enums and entity files themselves tomorrow.

@brudil
Copy link
Collaborator Author

brudil commented Jun 26, 2023

Entity files and enum supported added.

Now we've got caching, I've added dprint to get the files back to where they were before joist-doc ran.

I've taken the config directly from ts-poet - it is improved, but still not as if only the comments where added. I'm not sure if this is because I've used @dprint/formatter and not the dprint-node that ts-poet uses - perhaps some changes between versions? Or if those lines are lines that are just new lines from codegen which are lost within babel.

Let me know if you have any thoughts.

Otherwise, I think this is pretty much there in terms of functionality.

@brudil brudil marked this pull request as ready for review June 27, 2023 08:55
@stephenh
Copy link
Collaborator

Sweet! Haven't had a chance to take a look yet, but yeah in terms of @dprint/formatter vs. dprint-node, afaiu they are basically separate projects for two of the dprint core contributors, where @dprint/formatter is "more official" but has a ~kinda-weird imo approach where it downloads plugins + runs them in web assembly, vs. dprint-node which is just a stock node native library.

I briefly played with @dprint/formatter in ts-poet, but the web assembly approach incurs a mini-compilation on each run, so dprint-node ended up being noticeably faster.

Is it possible to keep using dprint-node?

@brudil
Copy link
Collaborator Author

brudil commented Jun 27, 2023

Sure thing, I'll get that moved over.

@brudil
Copy link
Collaborator Author

brudil commented Jun 27, 2023

Moved over just fine.

Copy link
Collaborator

@stephenh stephenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for taking a few days to get to this, but looks great @brudil ! I haven't ran it on our internal app yep, but looking forward to trying it out!

packages/doc/src/MarkdownCommentStore.ts Show resolved Hide resolved
packages/doc/src/MarkdownCommentStore.ts Outdated Show resolved Hide resolved
*
* Focused around Joist concepts (Entities, Fields etc) over the files themselves.
*/
export abstract class CommentStore {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we have no shared/base impls, wdyt of making this an interface?

Copy link
Collaborator Author

@brudil brudil Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing we can't do with an interface is define the constructor signature right? Can change if that's a willing trade off.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that's true; I'd assumed plugins would want the freedom of "just implement this interface", but if they extend the class, you're right it does document the constructor for things like accepting config. Otherwise accepting the config would have to be like an init method or implicit. Sounds good to keep it then!

entity: EntityDbMetadata,
fieldName: string,
source: FieldSourceType,
generated: boolean,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, maybe call this isCodegen or isCodegenFile? I was thinking "well everything that we're outputting is generated", but we meant like the base-vs-codegen file.

I was thinking maybe the integration could inject this, instead of the comment store, but I guess the comment store is the one that knows the file name.

Wdyt of maybe always including the "generated from ..." suffix? Like even if we're writing to a codegen file, I could see someone reading the getter comments and seeing the same "ah I go to Foo.md" hint being somewhat useful. Dunno, unless you think it'd be too cluttered and we should taken the opportunity of "this is a codegen file that is obvious the comment is generated from somewhere" to skip the boilerplate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah. I get the name issue. We could flip it to isUserland or the like.

CommentStore has been built to not assume file backing; if someone (no judgement on if this is a terrible idea) wanted to build something like NotionCommentStore.

I'll remove this from the CommentStore API, and just always incude it within MarkdownCommentStore. We can always add it back in if it does get cluttered.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought on this one; if we wanted it to be handled by the integration, we could have the CommentStore API return something like

type CommentResult = { comment: string, source?: string }

So we can easily build the string of Generated from {source} if given.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the spam; one final thought. I wondered if there's a relevant TSDoc tag to use for this. The one that perhaps best fits is @privateRemarks or alternatively a custom tag such as @source

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed a custom tag like @source seems neat!

I like the { comment: string; source?: string } API and then, yeah, let the integration decide whether it injects the source (or maybe just always inject the @source, even for comments in AuthorCodegen.ts).

.then((contents) => {
return remark.parse(contents);
})
.catch(() => null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do a console.warn here just so that users can know that we skipped their file due to an error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially catches the file not existing cases, which might be quite noisy to log? It's perhaps a badly placed catch as it's really for the readFile, not the remark.parse which I can't imagine throws that often.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, agreed the file missing shouldn't log...

I guess you're probably right that remark.parse doesn't throw very often/if ever given how lenient markdown is... I was thinking of like parsing ASTs and things, where parse errors are common, but, right the worst thing that happens with Markdown is an misinterpretation of your wip content, and not a true parse error.

Makes sense!

packages/integration-tests/src/entities/UserCodegen.ts Outdated Show resolved Hide resolved
* line comment with
*
* line breaks and also to show that we just use headings, and not the level, allowing for more customisation in users docs.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if tsdoc wants/expects comments on both the getter and the setter? I would have guessed when like viewing this in vscode, there is only one logical ipAddress property, so it would just like pick one, say the getter, and treat that as where comments should go.

Or maybe it like merges the getter+setter comments? Not sure...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did a bit of a deep dive looking in to this before starting. I believe this is a bit messy at the moment:

  • set & get are treated independently at a TSDoc level
  • They are merged at usage. There's an open bug for this though (Duplicate jsdoc tags from getters and setters microsoft/TypeScript#48801), so I didn't want to model how this works around what is marked as a bug. With the fieldSourceType property, it would be possible for CommentStores to tweak this, by deciding to only provide comments for getters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, does the "merged at usage" apply to tsdoc output (not just internally)?

Kinda wondering if we could just annotate the getter and assume readers are either just looking at the getter, or using vscode / something that merges the getter+setter since typically the getter+setter is exposed as a single logical property.

Does that seem reasonable, or does the doc output you're using really look best with the comment included in both the getter + setter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to move to just getters, it's an easy thing to change if there is ever movement on the above issue (which I think essentially refers to how the TS LSP exposes this stuff) and as it's touching code-generated files.

You can see the behavior here, hoving over the usages at the bottom. https://www.typescriptlang.org/play?#code/MYGwhgzhAEAKYCcCmA7ALtA3gKGtX0A9AFTEHTHQCS0A5kmmgJYq3RgC270KnSAdOWKEC9DLw5IAFAEos2AgF8FePARJlVlGhAbNW7LmB59BWkXl3i+s+UpWr1pIdUPc0ACyTQIpoRboGaCYAE1RmAE9bHHtyfDwNFxpOdy8fP3MCK2Cw9CYouRi8ZWVsFCQAdzhEcNl+CSQFcqr4ZHQ60PD8oA

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, thanks for the link! I thought "merged" was like "appended", but if they're the same, it's deduped, so then, yeah, seems like not a big deal to put them on either getters-only or setters-only or what not... Probably fine to leave as-is if/until we have a reason to do something different.

packages/doc/src/Cache.ts Outdated Show resolved Hide resolved
packages/doc/src/index.ts Outdated Show resolved Hide resolved
packages/doc/src/index.ts Show resolved Hide resolved
@brudil
Copy link
Collaborator Author

brudil commented Jun 29, 2023

Couple of discussion points left, resolved the others.

@@ -46,6 +46,8 @@ export interface Config {
/** Your application's request-level `Context` type. */
contextType?: string;

docGen?: string;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda picky/ocd, but maybe just "docs" instead of docGen?

);
readonly allPublisherAuthorNames: AsyncProperty<Author, string | undefined> = hasReactiveAsyncProperty({
publisher: { authors: "firstName" },
}, (author) => author.publisher.get?.authors.get.flatMap((a) => a.firstName).join());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, this reformatting might be problematic, b/c I think it's undoing whatever prettier/dprint would do...

Hm, I wonder if this is just a legitimate difference between prettier & dprint--like dprint is "prettier-ish", but not 100% the same, and we generally keep them separated out, like *Codegen.ts is done by dprint (fast b/c we regularly format all files) & *.ts is done by prettier (not fast, but joist-codegen only touches them once on file creation, so it doesn't matter).

But, I bet if we more regularly "ran dprint on prettier-controlled files", or "ru nprettier on dprint-controlled files", we'd get these sorts of things...

Makes me wonder if each integration needs to know "is my source file usually formatted by dprint or by prettier" and then just use the appropriate formatter...

Kinda annoying, but given you'd added caching for the doc-gen piece, we should be good to use prettier I think...everyone's very first yarn joist-codegen run will re-prettier the entity files to get their .cache populated, but that shouldn't be too bad...

Wdyt of like conditional dprint-vs-prettier formatting of the output?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned it above, I'll try out recast because I'm thinking if we can get perfect comment insertion without touching the rest of the line, we should be able to avoid all this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Fwiw a little curious to see how recast handles it, b/c even if it parses/outputs the existing code with 100% accuracy (which will be great), we'll still be inserting net-new/changed code, which will probably? have to go through a formatter? Maybe?

Or maybe if recast keeps everything basically as-is, and only our ~few lines of new comments are what "need formatting", then it won't matter as much whether we use prettier or dprint? Not sure...

@brudil
Copy link
Collaborator Author

brudil commented Aug 3, 2023

Hey, just to say I haven't forgotten this. Been busy elsewhere, but hope to find time to get this over the line once I'm back off holiday.

@stephenh
Copy link
Collaborator

stephenh commented Aug 5, 2023

Great to hear from you @brudil ! Definitely understood getting busy; all of Joist has been built in fits & spurts over the years :-), so whenever you can pick this back up again would be great/np.

I am curious, how's your Objection iirc to Joist migration going? Anything running in production yet? Understood switching ORMs is a huge endeavor for a codebase, so definitely np if you're still prototyping/trying to find time/etc., but would enjoy hearing if you did actually get Joist into production. 🤞 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The unfathomable bridge of tsdoc & code-generation
2 participants