Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(utils): handle CRLF when parsing MDX imports #8606

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
20 changes: 20 additions & 0 deletions packages/docusaurus-utils/src/__tests__/markdownUtils.test.ts
Expand Up @@ -130,6 +130,26 @@ describe('createExcerpt', () => {
);
});

it('creates excerpt for content with imports/exports declarations, with CRLF line endings', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this test supposed to fail if I remove your change in createExcerpt ? Because it seems to pass before/after in all cases so not sure the test is testing anything 😅

Copy link
Contributor Author

@scastiel scastiel Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is two ways to fix:

  • handling \r in createExcerpt (what I did here, which is the “clean” fix)
  • replacing fileLine === '' with !fileLine.trim() as .trim() removes \r characters (what I did here, which is not necessary but makes the code more consistent)

Only one of these changes is necessary. To me it’s more clear if both are made, but I can remove one of them :)

The test should fail if you remove both changes ;)

Copy link
Collaborator

@slorber slorber Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, but you said one of the fix was not required for the fix 😅 but if I don't revert both of them, the test will pass in both cases. So the 2nd change is required in the end? not sure to understand why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn’t clear 😅 , let me try to rephrase:

The problem is that on Windows, line endings are \r\n by default (CRLF). In createExcerpt, this caused an issue if the file contained import lines, as imports are supposed to end with an empty line. But the condition fileLine === '' couldn’t be true here, as empty lines actually contained \r.

  1. First way to fix this problem: changing the initial line split, by splitting on \r?\n instead of \n.

  2. Another way to fix the issue is doing it later, when handling the import lines, by checking for line.trim() instead of just line, which would remove the extra \r. This fix is not needed if we apply the fix 1. already. But as we already check line.trim() later in the function for other purpose, I think it makes the check more consistent.

Only one of these fixes is necessary to handle CRLF properly in createExcerpt. To me the fix 1 is the cleanest, but it doesn’t hurt to keep the fix 2. too.

If you revert one of these fixes, the test should still pass. If you revert both, the test should fail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see thanks

Didn't know trim actually removed new lines too 😄 so we have a double solution here, that's why I was a bit confused

expect(
createExcerpt(
dedent`
import Component from '@site/src/components/Component';

export function ItemCol(props) {
return <Item {...props} className={'col col--6 margin-bottom--lg'}/>
}

Lorem **ipsum** dolor sit \`amet\`[^1], consectetur _adipiscing_ elit. [**Vestibulum**](https://wiktionary.org/wiki/vestibulum) ex urna[^note], ~~molestie~~ et sagittis ut, varius ac justo :wink:.

Nunc porttitor libero nec vulputate venenatis. Nam nec rhoncus mauris. Morbi tempus est et nibh maximus, tempus venenatis arcu lobortis.
`.replace(/\n/g, '\r\n'),
),
).toBe(
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ex urna, molestie et sagittis ut, varius ac justo.',
);
});

it('creates excerpt for heading specified with anchor-id syntax', () => {
expect(
createExcerpt(dedent`
Expand Down
8 changes: 5 additions & 3 deletions packages/docusaurus-utils/src/markdownUtils.ts
Expand Up @@ -55,16 +55,18 @@ export function createExcerpt(fileString: string): string | undefined {
const fileLines = fileString
.trimStart()
// Remove Markdown alternate title
.replace(/^[^\n]*\n[=]+/g, '')
.split('\n');
.replace(/^[^\r\n]*\r?\n[=]+/g, '')
.split(/\r?\n/);
Comment on lines +58 to +59
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix. So far only CR was handled, not CRLF line endings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean "only LF was handled" 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry 😉

let inCode = false;
let inImport = false;
let lastCodeFence = '';

for (const fileLine of fileLines) {
if (fileLine === '' && inImport) {
// An empty line marks the end of imports
if (!fileLine.trim() && inImport) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required for the fix, but I think it makes the code cleaner, and more consistent with the next if block.

inImport = false;
}

// Skip empty line.
if (!fileLine.trim()) {
continue;
Expand Down