feat: joist-doc #711

brudil · 2023-06-26T12:22:25Z

Getting up a WIP version of this to see if you're happy to progress with.

Overview

Implemented as a joist-doc package, which run within codegen after file emit, if "docGen": "markdown" is set in joist-config.json. Is a seperate optional package which would need to be part of the user's dependencies.

joist-doc depends on joist-codegen and is given the Config and EntityDbMetadata. Using the EntityDbMetadata object, it runs against the known file types, (Entities [todo], EntityCodegen, Enums [todo]).

The files are parsed with babel. Originally this was supposed to ideally use the user's typescript but the typescript parser doesn't have the best support for comments within the AST, relying on the complier's linting infrastructure.

From there, each of the file types can traverse the AST finding specific nodes that might need comments. When a commentable node is found, the CommentStore is called to optionally provide a comment. If the CommentStore returns a comment, it is added back to the AST to be printed back out and written to the file.

This PR implements a default MarkdownCommentStore implementation, which uses remark to parse the markdown for easy manipulation.

MarkdownCommentStore opts for loose-ish heriustics on determining comments, allowing for greater flexability in how users format their markdown files.

The goal was to support both these files:

Something structured, more detailed, nested like

# User
## Overview
A User entity is...

## Fields
### email
Something about their email

### Computed Fields
#### isActive
If a login has occured within the last 30 days

And something looser like

# User
A User entity is...

## email
Something about their email

## isActive
If a login has occured within the last 30 days

Thus the implemented herustics are as follows:

for Fields: we find a matching heading (of any level) of the field name
for Root descriptions (entities): we use the following rules:
- paragraphs from the top
- paragraphs from the following the first heading
- paragraphs from the second heading if it preceeds the first and is called "Overview"

All content is collected until it hits a non-paragraph element. This allows for a neat trick of using a horizontal rule within your doc to stop what is included within the comments, but allowing for more depth within the markdown file.

We obviously might want to tweak these or improve them.

There's a little more to go and some code to clean up but wanted to see this direction fits in with what we discussed. Testing, too, is a little bit of an unknown, as the integration tests don't care about the actual text within the codegen'd files. Snapshot testing could be used put will be a pain to maintain for other codegen changes.

stephenh · 2023-06-26T14:09:49Z

This looks amazing! Yep, definitely looks like a great approach/direction.

In terms of babel, that sgtm; I had seen ts-morph before, i.e. https://ts-morph.com/setup/ast-viewers, which I think could be a candidate for this? I was trying to see if it used babel or not, but I don't see it pulling in any babel/similar dependencies, so my current guess is that it uses the tsc compiler API, so then it probably/I assume has the same limitation you'd found wrt comments.

Fwiw I'm pretty happy to not use the tsc API b/c I think it will also be not-the-fastest.

Granted, anything we write (in JS) will probably need caching; i.e. if FooCodegen.ts / Foo.ts / Foo.markdown haven't changed, then don't pay the cost of re-parsing + re-stitching + re-formatting the AST. Definitely something we can add later, but will probably be required for our internal project to adopt this.

(When Joist used prettier for formatting, I'd started doing some caching to avoid the cost of formatting every *.ts file on every yarn joist-codegen b/c it was taking 10s of seconds just to format the entire generated output each time, but then switching to the Deno project's Rust-based dprint formatter made formatting so fast, I was able to just take the caching out.)

as the integration tests don't care about the actual text within the codegen'd files.

True; fwiw the rationale for checking the generated files into git is to allow admittedly-manual regression testing as PR reviewers can see exactly what changed in the generated files.

Granted, that's not as great as true red/green unit tests, but 🤷 I'd be fine with it to start with, and add fancier/automated assertions later/if needed.

This is really neat!

stephenh · 2023-06-26T14:13:25Z

Wdyt about adding a hint to the comments that "this comment comes from Author.md, so don't edit it".

Granted, for AuthorCodegen.ts I think it's kinda implicit the user should not edit the comments, but if they're in Author.ts and editing the class tsdoc, I could see that being confusing that they really need to go edit the Author.md...

brudil · 2023-06-26T14:54:57Z

A source hint sounds like a great idea, I'll implement that for the Entity files.

On perf, this sounds good. Does joist-codegen not emit on non-changed files? Just trying to figure if this is caching "this doesn't need to be touched" or in terms of "this file hasn't changed (hash or something), and we've cached the previous output"

brudil · 2023-06-26T15:43:23Z

@stephenh Out of curiosity how many "doc-able" structures (Entities, enums) do you have internally the project? I've kept the current deps light, but it might be worth pulling in p-limit or the like as currently every transformation is executed within a single Promise.all.

stephenh · 2023-06-26T15:56:57Z

does joist-codegen not emit on non-changed files?

I had some infra in place to do that, with the focus on "do not emit or format unchanged files", but ended up taking it out b/c with dprint (instead of prettier) it was basically just as fast to format + emit all files every time.

Here's the PR that removed the caching:

#493

It was a little tricky given the goal was to cache based on the pre-formatted content. If we add back caching, it hopefully can be simpler, b/c in theory we wouldn't need the "pre-format" aspect of the prior approach.

do you have internally the project

We have about 200 tables, ~150 entity tables and ~50 enum tables. So nothing that has hit the ~1000s of tables mark, but enough that prettier-ing ~200-or-so files become noticeably slow.

brudil · 2023-06-26T18:39:02Z

Implemented caching with file-system-cache (https://www.npmjs.com/package/file-system-cache) which is used by Storybook, persisting to a .cache directory. Let me know if you have another preference for this though.

With no changes/fully cached joist-doc spends 7ms running in the integration dir, so this should be scalable to 200 entities.

Should get time to tackle enums and entity files themselves tomorrow.

brudil · 2023-06-26T22:06:31Z

Entity files and enum supported added.

Now we've got caching, I've added dprint to get the files back to where they were before joist-doc ran.

I've taken the config directly from ts-poet - it is improved, but still not as if only the comments where added. I'm not sure if this is because I've used @dprint/formatter and not the dprint-node that ts-poet uses - perhaps some changes between versions? Or if those lines are lines that are just new lines from codegen which are lost within babel.

Let me know if you have any thoughts.

Otherwise, I think this is pretty much there in terms of functionality.

stephenh · 2023-06-27T16:17:55Z

Sweet! Haven't had a chance to take a look yet, but yeah in terms of @dprint/formatter vs. dprint-node, afaiu they are basically separate projects for two of the dprint core contributors, where @dprint/formatter is "more official" but has a ~kinda-weird imo approach where it downloads plugins + runs them in web assembly, vs. dprint-node which is just a stock node native library.

I briefly played with @dprint/formatter in ts-poet, but the web assembly approach incurs a mini-compilation on each run, so dprint-node ended up being noticeably faster.

Is it possible to keep using dprint-node?

brudil · 2023-06-27T16:22:15Z

Sure thing, I'll get that moved over.

brudil · 2023-06-27T21:22:23Z

Moved over just fine.

stephenh

Apologies for taking a few days to get to this, but looks great @brudil ! I haven't ran it on our internal app yep, but looking forward to trying it out!

packages/doc/src/MarkdownCommentStore.ts

stephenh · 2023-06-29T01:16:03Z

packages/doc/src/CommentStore.ts

+ *
+ * Focused around Joist concepts (Entities, Fields etc) over the files themselves.
+ */
+export abstract class CommentStore {


Given we have no shared/base impls, wdyt of making this an interface?

The only thing we can't do with an interface is define the constructor signature right? Can change if that's a willing trade off.

Ah yeah, that's true; I'd assumed plugins would want the freedom of "just implement this interface", but if they extend the class, you're right it does document the constructor for things like accepting config. Otherwise accepting the config would have to be like an init method or implicit. Sounds good to keep it then!

stephenh · 2023-06-29T01:22:25Z

packages/doc/src/CommentStore.ts

+    entity: EntityDbMetadata,
+    fieldName: string,
+    source: FieldSourceType,
+    generated: boolean,


Ah I see, maybe call this isCodegen or isCodegenFile? I was thinking "well everything that we're outputting is generated", but we meant like the base-vs-codegen file.

I was thinking maybe the integration could inject this, instead of the comment store, but I guess the comment store is the one that knows the file name.

Wdyt of maybe always including the "generated from ..." suffix? Like even if we're writing to a codegen file, I could see someone reading the getter comments and seeing the same "ah I go to Foo.md" hint being somewhat useful. Dunno, unless you think it'd be too cluttered and we should taken the opportunity of "this is a codegen file that is obvious the comment is generated from somewhere" to skip the boilerplate.

Hmm, yeah. I get the name issue. We could flip it to isUserland or the like.

CommentStore has been built to not assume file backing; if someone (no judgement on if this is a terrible idea) wanted to build something like NotionCommentStore.

I'll remove this from the CommentStore API, and just always incude it within MarkdownCommentStore. We can always add it back in if it does get cluttered.

Another thought on this one; if we wanted it to be handled by the integration, we could have the CommentStore API return something like

type CommentResult = { comment: string, source?: string }

So we can easily build the string of Generated from {source} if given.

Sorry for the spam; one final thought. I wondered if there's a relevant TSDoc tag to use for this. The one that perhaps best fits is @privateRemarks or alternatively a custom tag such as @source

Agreed a custom tag like @source seems neat!

I like the { comment: string; source?: string } API and then, yeah, let the integration decide whether it injects the source (or maybe just always inject the @source, even for comments in AuthorCodegen.ts).

stephenh · 2023-06-29T01:24:01Z

packages/doc/src/MarkdownCommentStore.ts

+      .then((contents) => {
+        return remark.parse(contents);
+      })
+      .catch(() => null);


Maybe do a console.warn here just so that users can know that we skipped their file due to an error?

This essentially catches the file not existing cases, which might be quite noisy to log? It's perhaps a badly placed catch as it's really for the readFile, not the remark.parse which I can't imagine throws that often.

Ah I see, agreed the file missing shouldn't log...

I guess you're probably right that remark.parse doesn't throw very often/if ever given how lenient markdown is... I was thinking of like parsing ASTs and things, where parse errors are common, but, right the worst thing that happens with Markdown is an misinterpretation of your wip content, and not a true parse error.

Makes sense!

packages/integration-tests/src/entities/UserCodegen.ts

stephenh · 2023-06-29T01:40:55Z

packages/integration-tests/src/entities/UserCodegen.ts

+   * line comment with
+   *
+   * line breaks and also to show that we just use headings, and not the level, allowing for more customisation in users docs.
+   */


Do you know if tsdoc wants/expects comments on both the getter and the setter? I would have guessed when like viewing this in vscode, there is only one logical ipAddress property, so it would just like pick one, say the getter, and treat that as where comments should go.

Or maybe it like merges the getter+setter comments? Not sure...

Yeah, I did a bit of a deep dive looking in to this before starting. I believe this is a bit messy at the moment:

set & get are treated independently at a TSDoc level

They are merged at usage. There's an open bug for this though (Duplicate jsdoc tags from getters and setters microsoft/TypeScript#48801), so I didn't want to model how this works around what is marked as a bug. With the fieldSourceType property, it would be possible for CommentStores to tweak this, by deciding to only provide comments for getters.

Hm, does the "merged at usage" apply to tsdoc output (not just internally)?

Kinda wondering if we could just annotate the getter and assume readers are either just looking at the getter, or using vscode / something that merges the getter+setter since typically the getter+setter is exposed as a single logical property.

Does that seem reasonable, or does the doc output you're using really look best with the comment included in both the getter + setter?

I'm happy to move to just getters, it's an easy thing to change if there is ever movement on the above issue (which I think essentially refers to how the TS LSP exposes this stuff) and as it's touching code-generated files.

You can see the behavior here, hoving over the usages at the bottom. https://www.typescriptlang.org/play?#code/MYGwhgzhAEAKYCcCmA7ALtA3gKGtX0A9AFTEHTHQCS0A5kmmgJYq3RgC270KnSAdOWKEC9DLw5IAFAEos2AgF8FePARJlVlGhAbNW7LmB59BWkXl3i+s+UpWr1pIdUPc0ACyTQIpoRboGaCYAE1RmAE9bHHtyfDwNFxpOdy8fP3MCK2Cw9CYouRi8ZWVsFCQAdzhEcNl+CSQFcqr4ZHQ60PD8oA

Oh, I see, thanks for the link! I thought "merged" was like "appended", but if they're the same, it's deduped, so then, yeah, seems like not a big deal to put them on either getters-only or setters-only or what not... Probably fine to leave as-is if/until we have a reason to do something different.

packages/doc/src/Cache.ts

packages/doc/src/index.ts

brudil · 2023-06-29T13:20:51Z

Couple of discussion points left, resolved the others.

stephenh · 2023-07-01T20:14:46Z

packages/codegen/src/config.ts

@@ -46,6 +46,8 @@ export interface Config {
  /** Your application's request-level `Context` type. */
  contextType?: string;

+  docGen?: string;


Kinda picky/ocd, but maybe just "docs" instead of docGen?

stephenh · 2023-07-01T20:26:38Z

packages/integration-tests/src/entities/Author.ts

-  );
+  readonly allPublisherAuthorNames: AsyncProperty<Author, string | undefined> = hasReactiveAsyncProperty({
+    publisher: { authors: "firstName" },
+  }, (author) => author.publisher.get?.authors.get.flatMap((a) => a.firstName).join());


Hrm, this reformatting might be problematic, b/c I think it's undoing whatever prettier/dprint would do...

Hm, I wonder if this is just a legitimate difference between prettier & dprint--like dprint is "prettier-ish", but not 100% the same, and we generally keep them separated out, like *Codegen.ts is done by dprint (fast b/c we regularly format all files) & *.ts is done by prettier (not fast, but joist-codegen only touches them once on file creation, so it doesn't matter).

But, I bet if we more regularly "ran dprint on prettier-controlled files", or "ru nprettier on dprint-controlled files", we'd get these sorts of things...

Makes me wonder if each integration needs to know "is my source file usually formatted by dprint or by prettier" and then just use the appropriate formatter...

Kinda annoying, but given you'd added caching for the doc-gen piece, we should be good to use prettier I think...everyone's very first yarn joist-codegen run will re-prettier the entity files to get their .cache populated, but that shouldn't be too bad...

Wdyt of like conditional dprint-vs-prettier formatting of the output?

Mentioned it above, I'll try out recast because I'm thinking if we can get perfect comment insertion without touching the rest of the line, we should be able to avoid all this.

Cool! Fwiw a little curious to see how recast handles it, b/c even if it parses/outputs the existing code with 100% accuracy (which will be great), we'll still be inserting net-new/changed code, which will probably? have to go through a formatter? Maybe?

Or maybe if recast keeps everything basically as-is, and only our ~few lines of new comments are what "need formatting", then it won't matter as much whether we use prettier or dprint? Not sure...

brudil · 2023-08-03T12:57:01Z

Hey, just to say I haven't forgotten this. Been busy elsewhere, but hope to find time to get this over the line once I'm back off holiday.

stephenh · 2023-08-05T21:55:39Z

Great to hear from you @brudil ! Definitely understood getting busy; all of Joist has been built in fits & spurts over the years :-), so whenever you can pick this back up again would be great/np.

I am curious, how's your Objection iirc to Joist migration going? Anything running in production yet? Understood switching ORMs is a huge endeavor for a codebase, so definitely np if you're still prototyping/trying to find time/etc., but would enjoy hearing if you did actually get Joist into production. 🤞 :-)

brudil added 2 commits June 26, 2023 12:45

feat: wip joist-doc

b2f8599

revert lockfile after dedupe

441408a

brudil linked an issue Jun 26, 2023 that may be closed by this pull request

The unfathomable bridge of tsdoc & code-generation #706

Open

feat: add caching and refactor handlers

97e52da

Add dprint + enum & entity file integrators

4289d6c

brudil added 3 commits June 27, 2023 08:12

chore: format

ada7108

doc: some more tsdoc comments

72f5af1

chore: tidy up comments

cd0183a

brudil marked this pull request as ready for review June 27, 2023 08:55

use dprint-node

7b18eb4

stephenh approved these changes Jun 29, 2023

View reviewed changes

brudil added 4 commits June 29, 2023 14:04

use undefined over void

650b893

use actual file path for cache key, remove hashing

911bf07

rename file variable

2c9022d

Improve formatting for single line comments

7927f0c

stephenh reviewed Jul 1, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: joist-doc #711

feat: joist-doc #711

brudil commented Jun 26, 2023 •

edited

stephenh commented Jun 26, 2023

stephenh commented Jun 26, 2023

brudil commented Jun 26, 2023

brudil commented Jun 26, 2023 •

edited

stephenh commented Jun 26, 2023

brudil commented Jun 26, 2023 •

edited

brudil commented Jun 26, 2023 •

edited

stephenh commented Jun 27, 2023

brudil commented Jun 27, 2023

brudil commented Jun 27, 2023

stephenh left a comment

stephenh Jun 29, 2023

brudil Jun 29, 2023 •

edited

stephenh Jul 1, 2023

stephenh Jun 29, 2023

brudil Jun 29, 2023

brudil Jun 29, 2023

brudil Jun 29, 2023

stephenh Jul 1, 2023

stephenh Jun 29, 2023

brudil Jun 29, 2023

stephenh Jul 1, 2023

stephenh Jun 29, 2023

brudil Jun 29, 2023

stephenh Jul 1, 2023

brudil Jul 1, 2023

stephenh Jul 2, 2023

brudil commented Jun 29, 2023

stephenh Jul 1, 2023

stephenh Jul 1, 2023

brudil Jul 1, 2023

stephenh Jul 2, 2023

brudil commented Aug 3, 2023

stephenh commented Aug 5, 2023

feat: joist-doc #711

Are you sure you want to change the base?

feat: joist-doc #711

Conversation

brudil commented Jun 26, 2023 • edited

Overview

stephenh commented Jun 26, 2023

stephenh commented Jun 26, 2023

brudil commented Jun 26, 2023

brudil commented Jun 26, 2023 • edited

stephenh commented Jun 26, 2023

brudil commented Jun 26, 2023 • edited

brudil commented Jun 26, 2023 • edited

stephenh commented Jun 27, 2023

brudil commented Jun 27, 2023

brudil commented Jun 27, 2023

stephenh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brudil Jun 29, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brudil commented Jun 29, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brudil commented Aug 3, 2023

stephenh commented Aug 5, 2023

brudil commented Jun 26, 2023 •

edited

brudil commented Jun 26, 2023 •

edited

brudil commented Jun 26, 2023 •

edited

brudil commented Jun 26, 2023 •

edited

brudil Jun 29, 2023 •

edited