New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: joist-doc #711
base: main
Are you sure you want to change the base?
feat: joist-doc #711
Conversation
This looks amazing! Yep, definitely looks like a great approach/direction. In terms of babel, that sgtm; I had seen ts-morph before, i.e. https://ts-morph.com/setup/ast-viewers, which I think could be a candidate for this? I was trying to see if it used babel or not, but I don't see it pulling in any babel/similar dependencies, so my current guess is that it uses the tsc compiler API, so then it probably/I assume has the same limitation you'd found wrt comments. Fwiw I'm pretty happy to not use the tsc API b/c I think it will also be not-the-fastest. Granted, anything we write (in JS) will probably need caching; i.e. if (When Joist used prettier for formatting, I'd started doing some caching to avoid the cost of formatting every
True; fwiw the rationale for checking the generated files into git is to allow admittedly-manual regression testing as PR reviewers can see exactly what changed in the generated files. Granted, that's not as great as true red/green unit tests, but 🤷 I'd be fine with it to start with, and add fancier/automated assertions later/if needed. This is really neat! |
Wdyt about adding a hint to the comments that "this comment comes from Granted, for |
A source hint sounds like a great idea, I'll implement that for the Entity files. On perf, this sounds good. Does joist-codegen not emit on non-changed files? Just trying to figure if this is caching "this doesn't need to be touched" or in terms of "this file hasn't changed (hash or something), and we've cached the previous output" |
@stephenh Out of curiosity how many "doc-able" structures (Entities, enums) do you have internally the project? I've kept the current deps light, but it might be worth pulling in |
I had some infra in place to do that, with the focus on "do not emit or format unchanged files", but ended up taking it out b/c with dprint (instead of prettier) it was basically just as fast to format + emit all files every time. Here's the PR that removed the caching: It was a little tricky given the goal was to cache based on the pre-formatted content. If we add back caching, it hopefully can be simpler, b/c in theory we wouldn't need the "pre-format" aspect of the prior approach.
We have about 200 tables, ~150 entity tables and ~50 enum tables. So nothing that has hit the ~1000s of tables mark, but enough that prettier-ing ~200-or-so files become noticeably slow. |
Implemented caching with With no changes/fully cached joist-doc spends 7ms running in the integration dir, so this should be scalable to 200 entities. Should get time to tackle enums and entity files themselves tomorrow. |
Entity files and enum supported added. Now we've got caching, I've added dprint to get the files back to where they were before joist-doc ran. I've taken the config directly from ts-poet - it is improved, but still not as if only the comments where added. I'm not sure if this is because I've used Let me know if you have any thoughts. Otherwise, I think this is pretty much there in terms of functionality. |
Sweet! Haven't had a chance to take a look yet, but yeah in terms of I briefly played with Is it possible to keep using |
Sure thing, I'll get that moved over. |
Moved over just fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for taking a few days to get to this, but looks great @brudil ! I haven't ran it on our internal app yep, but looking forward to trying it out!
* | ||
* Focused around Joist concepts (Entities, Fields etc) over the files themselves. | ||
*/ | ||
export abstract class CommentStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we have no shared/base impls, wdyt of making this an interface
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing we can't do with an interface is define the constructor signature right? Can change if that's a willing trade off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, that's true; I'd assumed plugins would want the freedom of "just implement this interface", but if they extend the class, you're right it does document the constructor for things like accepting config
. Otherwise accepting the config
would have to be like an init
method or implicit. Sounds good to keep it then!
entity: EntityDbMetadata, | ||
fieldName: string, | ||
source: FieldSourceType, | ||
generated: boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, maybe call this isCodegen
or isCodegenFile
? I was thinking "well everything that we're outputting is generated", but we meant like the base-vs-codegen file.
I was thinking maybe the integration could inject this, instead of the comment store, but I guess the comment store is the one that knows the file name.
Wdyt of maybe always including the "generated from ..." suffix? Like even if we're writing to a codegen file, I could see someone reading the getter comments and seeing the same "ah I go to Foo.md" hint being somewhat useful. Dunno, unless you think it'd be too cluttered and we should taken the opportunity of "this is a codegen file that is obvious the comment is generated from somewhere" to skip the boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah. I get the name issue. We could flip it to isUserland or the like.
CommentStore has been built to not assume file backing; if someone (no judgement on if this is a terrible idea) wanted to build something like NotionCommentStore
.
I'll remove this from the CommentStore API, and just always incude it within MarkdownCommentStore. We can always add it back in if it does get cluttered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought on this one; if we wanted it to be handled by the integration, we could have the CommentStore API return something like
type CommentResult = { comment: string, source?: string }
So we can easily build the string of Generated from {source}
if given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the spam; one final thought. I wondered if there's a relevant TSDoc tag to use for this. The one that perhaps best fits is @privateRemarks
or alternatively a custom tag such as @source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed a custom tag like @source
seems neat!
I like the { comment: string; source?: string }
API and then, yeah, let the integration decide whether it injects the source (or maybe just always inject the @source
, even for comments in AuthorCodegen.ts
).
.then((contents) => { | ||
return remark.parse(contents); | ||
}) | ||
.catch(() => null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe do a console.warn
here just so that users can know that we skipped their file due to an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This essentially catches the file not existing cases, which might be quite noisy to log? It's perhaps a badly placed catch as it's really for the readFile, not the remark.parse which I can't imagine throws that often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, agreed the file missing shouldn't log...
I guess you're probably right that remark.parse
doesn't throw very often/if ever given how lenient markdown is... I was thinking of like parsing ASTs and things, where parse errors are common, but, right the worst thing that happens with Markdown is an misinterpretation of your wip content, and not a true parse error.
Makes sense!
* line comment with | ||
* | ||
* line breaks and also to show that we just use headings, and not the level, allowing for more customisation in users docs. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if tsdoc wants/expects comments on both the getter and the setter? I would have guessed when like viewing this in vscode, there is only one logical ipAddress
property, so it would just like pick one, say the getter, and treat that as where comments should go.
Or maybe it like merges the getter+setter comments? Not sure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did a bit of a deep dive looking in to this before starting. I believe this is a bit messy at the moment:
- set & get are treated independently at a TSDoc level
- They are merged at usage. There's an open bug for this though (Duplicate jsdoc tags from getters and setters microsoft/TypeScript#48801), so I didn't want to model how this works around what is marked as a bug. With the
fieldSourceType
property, it would be possible for CommentStores to tweak this, by deciding to only provide comments for getters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, does the "merged at usage" apply to tsdoc output (not just internally)?
Kinda wondering if we could just annotate the getter and assume readers are either just looking at the getter, or using vscode / something that merges the getter+setter since typically the getter+setter is exposed as a single logical property.
Does that seem reasonable, or does the doc output you're using really look best with the comment included in both the getter + setter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to move to just getters, it's an easy thing to change if there is ever movement on the above issue (which I think essentially refers to how the TS LSP exposes this stuff) and as it's touching code-generated files.
You can see the behavior here, hoving over the usages at the bottom. https://www.typescriptlang.org/play?#code/MYGwhgzhAEAKYCcCmA7ALtA3gKGtX0A9AFTEHTHQCS0A5kmmgJYq3RgC270KnSAdOWKEC9DLw5IAFAEos2AgF8FePARJlVlGhAbNW7LmB59BWkXl3i+s+UpWr1pIdUPc0ACyTQIpoRboGaCYAE1RmAE9bHHtyfDwNFxpOdy8fP3MCK2Cw9CYouRi8ZWVsFCQAdzhEcNl+CSQFcqr4ZHQ60PD8oA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, thanks for the link! I thought "merged" was like "appended", but if they're the same, it's deduped, so then, yeah, seems like not a big deal to put them on either getters-only or setters-only or what not... Probably fine to leave as-is if/until we have a reason to do something different.
Couple of discussion points left, resolved the others. |
@@ -46,6 +46,8 @@ export interface Config { | |||
/** Your application's request-level `Context` type. */ | |||
contextType?: string; | |||
|
|||
docGen?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda picky/ocd, but maybe just "docs" instead of docGen?
); | ||
readonly allPublisherAuthorNames: AsyncProperty<Author, string | undefined> = hasReactiveAsyncProperty({ | ||
publisher: { authors: "firstName" }, | ||
}, (author) => author.publisher.get?.authors.get.flatMap((a) => a.firstName).join()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, this reformatting might be problematic, b/c I think it's undoing whatever prettier/dprint would do...
Hm, I wonder if this is just a legitimate difference between prettier & dprint--like dprint is "prettier-ish", but not 100% the same, and we generally keep them separated out, like *Codegen.ts
is done by dprint (fast b/c we regularly format all files) & *.ts
is done by prettier (not fast, but joist-codegen only touches them once on file creation, so it doesn't matter).
But, I bet if we more regularly "ran dprint on prettier-controlled files", or "ru nprettier on dprint-controlled files", we'd get these sorts of things...
Makes me wonder if each integration needs to know "is my source file usually formatted by dprint or by prettier" and then just use the appropriate formatter...
Kinda annoying, but given you'd added caching for the doc-gen piece, we should be good to use prettier I think...everyone's very first yarn joist-codegen
run will re-prettier the entity files to get their .cache
populated, but that shouldn't be too bad...
Wdyt of like conditional dprint-vs-prettier formatting of the output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned it above, I'll try out recast because I'm thinking if we can get perfect comment insertion without touching the rest of the line, we should be able to avoid all this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Fwiw a little curious to see how recast handles it, b/c even if it parses/outputs the existing code with 100% accuracy (which will be great), we'll still be inserting net-new/changed code, which will probably? have to go through a formatter? Maybe?
Or maybe if recast keeps everything basically as-is, and only our ~few lines of new comments are what "need formatting", then it won't matter as much whether we use prettier or dprint? Not sure...
Hey, just to say I haven't forgotten this. Been busy elsewhere, but hope to find time to get this over the line once I'm back off holiday. |
Great to hear from you @brudil ! Definitely understood getting busy; all of Joist has been built in fits & spurts over the years :-), so whenever you can pick this back up again would be great/np. I am curious, how's your Objection iirc to Joist migration going? Anything running in production yet? Understood switching ORMs is a huge endeavor for a codebase, so definitely np if you're still prototyping/trying to find time/etc., but would enjoy hearing if you did actually get Joist into production. 🤞 :-) |
Getting up a WIP version of this to see if you're happy to progress with.
Overview
Implemented as a
joist-doc
package, which run within codegen after file emit, if"docGen": "markdown"
is set injoist-config.json
. Is a seperate optional package which would need to be part of the user's dependencies.joist-doc
depends onjoist-codegen
and is given theConfig
andEntityDbMetadata
. Using theEntityDbMetadata
object, it runs against the known file types, (Entities [todo], EntityCodegen, Enums [todo]).The files are parsed with babel. Originally this was supposed to ideally use the user's
typescript
but the typescript parser doesn't have the best support for comments within the AST, relying on the complier's linting infrastructure.From there, each of the file types can traverse the AST finding specific nodes that might need comments. When a commentable node is found, the
CommentStore
is called to optionally provide a comment. If the CommentStore returns a comment, it is added back to the AST to be printed back out and written to the file.This PR implements a default
MarkdownCommentStore
implementation, which usesremark
to parse the markdown for easy manipulation.MarkdownCommentStore
opts for loose-ish heriustics on determining comments, allowing for greater flexability in how users format their markdown files.The goal was to support both these files:
Something structured, more detailed, nested like
And something looser like
Thus the implemented herustics are as follows:
All content is collected until it hits a non-paragraph element. This allows for a neat trick of using a horizontal rule within your doc to stop what is included within the comments, but allowing for more depth within the markdown file.
We obviously might want to tweak these or improve them.
There's a little more to go and some code to clean up but wanted to see this direction fits in with what we discussed. Testing, too, is a little bit of an unknown, as the integration tests don't care about the actual text within the codegen'd files. Snapshot testing could be used put will be a pain to maintain for other codegen changes.