Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve MDX optimize with sibling nodes #10887

Merged
merged 4 commits into from May 1, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/smart-rats-mate.md
@@ -0,0 +1,5 @@
---
"@astrojs/mdx": patch
---

Updates the `optimize` option to group static sibling nodes as a `<Fragment />`. This reduces the number of AST nodes and simplifies runtime rendering of MDX pages.
5 changes: 5 additions & 0 deletions .changeset/violet-snails-call.md
@@ -0,0 +1,5 @@
---
"@astrojs/mdx": patch
---

Fixes `export const components` keys detection for the `optimize` option
38 changes: 27 additions & 11 deletions packages/integrations/mdx/src/README.md
Expand Up @@ -30,12 +30,7 @@ After:

```jsx
function _createMdxContent() {
return (
<>
<h1>My MDX Content</h1>
<pre set:html="<code class=...</code>"></pre>
</>
);
return <Fragment set:html="<h1>My MDX Content</h1>\n<code class=...</code>" />;
}
```

Expand All @@ -49,15 +44,20 @@ The next section explains the algorithm, which you can follow along by pairing w

### How it works

Two variables:
The flow can be divided into a "scan phase" and a "mutation phase". The scan phase searches for nodes that can be optimized, and the mutation phase applies the optimization on the `hast` nodes.

#### Scan phase

Variables:

- `allPossibleElements`: A set of subtree roots where we can add a new `set:html` property with its children as value.
- `elementStack`: The stack of elements (that could be subtree roots) while traversing the `hast` (node ancestors).
- `elementMetadatas`: A weak map to store the metadata used only by the mutation phase later.

Flow:

1. Walk the `hast` tree.
2. For each `node` we enter, if the `node` is static (`type` is `element` or `mdxJsxFlowElement`), record in `allPossibleElements` and push to `elementStack`.
2. For each `node` we enter, if the `node` is static (`type` is `element` or starts with `mdx`), record in `allPossibleElements` and push to `elementStack`. We also record additional metadata in `elementMetadatas` for the mutation phase later.
- Q: Why do we record `mdxJsxFlowElement`, it's MDX? <br>
A: Because we're looking for nodes whose children are static. The node itself doesn't need to be static.
- Q: Are we sure this is the subtree root node in `allPossibleElements`? <br>
Expand All @@ -71,8 +71,24 @@ Flow:
- Q: Why before step 2's `node` enter handling? <br>
A: If we find a non-static `node`, the `node` should still be considered in `allPossibleElements` as its children could be static.
5. Walk done. This leaves us with `allPossibleElements` containing only subtree roots that can be optimized.
6. Add the `set:html` property to the `hast` node, and remove its children.
7. 馃帀 The rest of the MDX pipeline will do its thing and generate the desired JSX like above.
6. Proceed to the mutation phase.

#### Mutation phase

Inputs:

- `allPossibleElements` from the scan phase.
- `elementMetadatas` from the scan phase.

Flow:

1. Before we mutate the `hast` tree, each element in `allPossibleElements` may have siblings that can be optimized together. Sibling elements are grouped with the `findElementGroups()` function, which returns an array of element groups (new variable `elementGroups`) and mutates `allPossibleElements` to remove elements that are already part of a group.
- Q: How does `findElementGroups()` work? <br>
A: For each elements in `allPossibleElements` that are non-static, we're able to take the element metadata from `elementMetadatas` and guess the next sibling node. If the next sibling node is static and is an element in `allPossibleElements`, we group them together for optimization. It continues to guess until it hits a non-static node or an element not in `allPossibleElements`, which it'll finalize the group as part of the returned result.

2. For each elements in `allPossibleElements`, we serailize them as HTML and add it to the `set:html` property of the `hast` node, and remove its children.
3. For each element group in `elementGroups`, we serialize the group children as HTML and add it to a new `<Fragment set:html="..." />` node, and replace the group children with the new `<Fragment />` node.
4. 馃帀 The rest of the MDX pipeline will do its thing and generate the desired JSX like above.

### Extra

Expand All @@ -82,7 +98,7 @@ Astro's MDX implementation supports specifying `export const components` in the

#### Further optimizations

In [How it works](#how-it-works) step 4,
In [Scan phase](#scan-phase) step 4,

> we remove all the elements in `elementStack` from `allPossibleElements`

Expand Down
123 changes: 115 additions & 8 deletions packages/integrations/mdx/src/rehype-optimize-static.ts
Expand Up @@ -8,6 +8,11 @@ export interface OptimizeOptions {
ignoreComponentNames?: string[];
}

interface ElementMetadata {
parent: Node;
index: number;
}

const exportConstComponentsRe = /export\s+const\s+components\s*=/;

/**
Expand All @@ -29,10 +34,11 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
for (const child of tree.children) {
if (child.type === 'mdxjsEsm' && exportConstComponentsRe.test(child.value)) {
// Try to loosely get the object property nodes
const objectPropertyNodes = child.data.estree.body[0]?.declarations?.[0]?.init?.properties;
const objectPropertyNodes =
child.data.estree?.body[0]?.declaration?.declarations?.[0]?.init?.properties;
Comment on lines -32 to +38
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous access was incorrect and never actually worked 馃槄 I should look into properly typing this in the future.

if (objectPropertyNodes) {
for (const objectPropertyNode of objectPropertyNodes) {
const componentName = objectPropertyNode.key?.name ?? objectPropertyNode.key?.value;
const componentName = objectPropertyNode.key?.name;
if (componentName) {
ignoreComponentNames.add(componentName);
}
Expand All @@ -45,18 +51,22 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
const allPossibleElements = new Set<Node>();
// The current collapsible element stack while traversing the tree
const elementStack: Node[] = [];
// Metadata used by `findElementGroups` later
const elementMetadatas = new WeakMap<Node, ElementMetadata>();

const isNodeNonStatic = (node: Node) => {
return node.type.startsWith('mdx') || ignoreComponentNames.has(node.tagName);
};

visit(tree, {
enter(node, key) {
enter(node, key, index, parents) {
// `estree-util-visit` may traverse in MDX `attributes`, we don't want that. Only continue
// if it's traversing the root, or the `children` key.
if (key != null && key !== 'children') return SKIP;

// @ts-expect-error read tagName naively
const isNodeIgnored = node.tagName && ignoreComponentNames.has(node.tagName);
// For nodes that can't be optimized, eliminate all elements in the
// `elementStack` from the `allPossibleElements` set.
if (node.type.startsWith('mdx') || isNodeIgnored) {
// For nodes that are not static, eliminate all elements in the `elementStack` from the
// `allPossibleElements` set.
if (isNodeNonStatic(node)) {
for (const el of elementStack) {
allPossibleElements.delete(el);
}
Expand All @@ -72,6 +82,12 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
if (node.type === 'element' || isMdxComponentNode(node)) {
elementStack.push(node);
allPossibleElements.add(node);

// @ts-expect-error MDX types for `.type` is not enhanced because MDX isn't used directly
if (index != null && node.type === 'element') {
// Record metadata for element node to be used for grouping analysis later
elementMetadatas.set(node, { parent: parents[parents.length - 1], index });
}
}
},
leave(node, key, _, parents) {
Expand All @@ -97,6 +113,11 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
},
});

// Within `allPossibleElements`, element nodes are often siblings and instead of setting `set:html`
// on each of the element node, we can create a `<Fragment set:html="...">` element that includes
// all element nodes instead, simplifying the output.
const elementGroups = findElementGroups(allPossibleElements, elementMetadatas, isNodeNonStatic);

// For all possible subtree roots, collapse them into `set:html` and
// strip of their children
for (const el of allPossibleElements) {
Expand All @@ -114,9 +135,95 @@ export function rehypeOptimizeStatic(options?: OptimizeOptions) {
}
el.children = [];
}

// For each element group, we create a new `<Fragment />` MDX node with `set:html` of the children
// serialized as HTML. We insert this new fragment, replacing all the group children nodes.
// We iterate in reverse to avoid changing the index of groups of the same parent.
for (let i = elementGroups.length - 1; i >= 0; i--) {
const group = elementGroups[i];
const fragmentNode = {
type: 'mdxJsxFlowElement',
name: 'Fragment',
attributes: [
{
type: 'mdxJsxAttribute',
name: 'set:html',
value: toHtml(group.children),
},
],
children: [],
};
group.parent.children.splice(group.startIndex, group.children.length, fragmentNode);
}
};
}

interface ElementGroup {
parent: Node;
startIndex: number;
children: Node[];
}

/**
* Iterate through `allPossibleElements` and find elements that are siblings, and return them. `allPossibleElements`
* will be mutated to exclude these grouped elements.
*/
function findElementGroups(
allPossibleElements: Set<Node>,
elementMetadatas: WeakMap<Node, ElementMetadata>,
isNodeNonStatic: (node: Node) => boolean
): ElementGroup[] {
const elementGroups: ElementGroup[] = [];

for (const el of allPossibleElements) {
// Non-static nodes can't be grouped. It can only optimize its static children.
if (isNodeNonStatic(el)) continue;

// Get the metadata for the element node, this should always exist
const metadata = elementMetadatas.get(el);
if (!metadata) {
throw new Error(
'Internal MDX error: rehype-optimize-static should have metadata for element node'
);
}

// For this element, iterate through the next siblings and add them to this array
// if they are text nodes or elements that are in `allPossibleElements` (optimizable).
// If one of the next siblings don't match the criteria, break the loop as others are no longer siblings.
const groupableElements = [el];
for (let i = metadata.index + 1; i < metadata.parent.children.length; i++) {
const node = metadata.parent.children[i];

// If the node is non-static, we can't group it with the current element
if (isNodeNonStatic(node)) break;

if (node.type === 'element') {
// This node is now (presumably) part of a group, remove it from `allPossibleElements`
const existed = allPossibleElements.delete(node);
// If this node didn't exist in `allPossibleElements`, it's likely that one of its children
// are non-static, hence this node can also not be grouped. So we break out here.
if (!existed) break;
}

groupableElements.push(node);
}

// If group elements are more than one, add them to the `elementGroups`.
// Grouping is most effective if there's multiple elements in it.
if (groupableElements.length > 1) {
elementGroups.push({
parent: metadata.parent,
startIndex: metadata.index,
children: groupableElements,
});
// The `el` is also now part of a group, remove it from `allPossibleElements`
allPossibleElements.delete(el);
}
}

return elementGroups;
}

function isMdxComponentNode(node: any) {
return node.type === 'mdxJsxFlowElement' || node.type === 'mdxJsxTextElement';
}