fix: Refactor table tokens #2166

Merged: 7 commits merged on Aug 16, 2021

Contributor

@calculuschild calculuschild commented Aug 8, 2021

Pretty small change to address #2165. `header` and `cells` are now formatted as "sub-tokens" (similar to list items in a list) rather than being split between the top-level `header`/`cells` properties and `tokens.header`/`tokens.cells`. This also means child tokens now live in a `tokens` property, following the convention of the other lexer tokens.

This changes the table token signature, so it is a breaking change and should go into v3.0.

Previous token signature:
{
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: ['a', 'b'],
  cells: [['1', '2']],
  tokens: {
    header: [
      [{ type: 'text', raw: 'a', text: 'a' }],
      [{ type: 'text', raw: 'b', text: 'b' }]
    ],
    cells: [[
      [{ type: 'text', raw: '1', text: '1' }],
      [{ type: 'text', raw: '2', text: '2' }]
    ]]
  }
}
New token signature:
{
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: {
    text: ['a', 'b'],
    tokens: [
      [{ type: 'text', raw: 'a', text: 'a' }],
      [{ type: 'text', raw: 'b', text: 'b' }]
    ]
  },
  cells: {
    text: [['1', '2']],
    tokens: [[
      [{ type: 'text', raw: '1', text: '1' }],
      [{ type: 'text', raw: '2', text: '2' }]
    ]]
  }
}


vercel bot commented Aug 8, 2021

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/38ErXbTFie3utmH93Wg33w7gpLRQ
✅ Preview: https://markedjs-git-fork-calculuschild-refactortablece-f3c032-markedjs.vercel.app

This edge case is already handled within the splitCells function
@calculuschild
Contributor Author

Bonus! CodeQL found a Regex vulnerability for an edge case that we already handled in #2126

Member

@UziTech UziTech left a comment

LGTM

src/Tokenizer.js Outdated
Comment on lines 389 to 390
item.header.tokens[j] = [];
this.lexer.inlineTokens(item.header.text[j], item.header.tokens[j]);
Member

@UziTech UziTech Aug 10, 2021

Now that I think about it, the `tokens` property here is an array of token arrays rather than a flat array of tokens.

Maybe instead we should do:

{
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: {
    items: [
      {
        text: 'a',
        tokens: [{ type: 'text', raw: 'a', text: 'a' }],
      },
      {
        text: 'b',
        tokens: [{ type: 'text', raw: 'b', text: 'b' }],
      }
    ]
  },
  rows: [ // replace `cells` with `rows`
    { // row 1
      items: [
        {
          text: '1',
          tokens: [{ type: 'text', raw: '1', text: '1' }],
        },
        {
          text: '2',
          tokens: [{ type: 'text', raw: '2', text: '2' }],
        }
      ]
    }
  ]
}

That seems to be more consistent with the rest of the tokens (including lists)
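
One way to read the concern above, sketched under assumptions: code that walks tokens generically tends to treat every `tokens` property as a flat array of token objects. The `collectTypes` walker below is a hypothetical helper written for this illustration, not part of marked. It handles the per-cell shape cleanly but silently mishandles an array of token arrays.

```javascript
// Hypothetical generic walker (not part of marked). It assumes every
// `tokens` property is a flat array of token objects.
function collectTypes(tokens, out = []) {
  for (const token of tokens) {
    out.push(token.type);
    if (Array.isArray(token.tokens)) {
      collectTypes(token.tokens, out);
    }
  }
  return out;
}

// Per-cell shape suggested above: `tokens` is a flat array of tokens.
const cell = { text: 'a', tokens: [{ type: 'text', raw: 'a', text: 'a' }] };
const flatTypes = collectTypes(cell.tokens);

// Shape from the first proposal: `header.tokens` is an array of token arrays,
// so the walker meets arrays where it expects tokens and finds no `type`.
const header = {
  text: ['a', 'b'],
  tokens: [
    [{ type: 'text', raw: 'a', text: 'a' }],
    [{ type: 'text', raw: 'b', text: 'b' }]
  ]
};
const nestedTypes = collectTypes(header.tokens);

console.log(flatTypes);   // one entry per real token
console.log(nestedTypes); // entries are undefined: the inner arrays have no `type`
```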

Contributor Author

Hm. Good point. I have a thought on this then:

Is it worth having `header` and each row as objects with a single property, `items`? An object with a single property seems redundant, and the extra property access may slow things down slightly. What if `header` and `rows` were just arrays, the same way `items` is an array? So a List has Items and a Table has Headers and Rows, but they all use the same format: an array of "sub-tokens" (with `rows` being an array of arrays of sub-tokens):

{
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: [
    {
      text: 'a',
      tokens: [{ type: 'text', raw: 'a', text: 'a' }],
    },
    {
      text: 'b',
      tokens: [{ type: 'text', raw: 'b', text: 'b' }],
    }
  ],
  rows: [ // replace `cells` with `rows`
    [ // row 1
      {
        text: '1',
        tokens: [{ type: 'text', raw: '1', text: '1' }],
      },
      {
        text: '2',
        tokens: [{ type: 'text', raw: '2', text: '2' }],
      }
    ]
  ]
}

TLDR:

Instead of

  • Table
    • which has a property `header`
      • which has a sub-token array `items` (this seems one level too deep),

we could have:

  • Table
    • which has some sub-tokens in `header`.
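
As a rough illustration of the array-based shape proposed above, the sketch below flattens a table token into a matrix of strings. `cellText` and `tableToMatrix` are hypothetical names made up for this example and are not part of marked. The token literal is copied from the proposal.

```javascript
// Hypothetical helper: rebuilds a cell's source text from its inline tokens.
// This works because `cell.tokens` is a flat array of token objects.
function cellText(cell) {
  return cell.tokens.map((token) => token.raw).join('');
}

// Hypothetical helper: the header is one array of cells, and rows is an
// array of such arrays, so both can be mapped the same way.
function tableToMatrix(table) {
  const headerRow = table.header.map(cellText);
  const bodyRows = table.rows.map((row) => row.map(cellText));
  return [headerRow, ...bodyRows];
}

const table = {
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: [
    { text: 'a', tokens: [{ type: 'text', raw: 'a', text: 'a' }] },
    { text: 'b', tokens: [{ type: 'text', raw: 'b', text: 'b' }] }
  ],
  rows: [
    [
      { text: '1', tokens: [{ type: 'text', raw: '1', text: '1' }] },
      { text: '2', tokens: [{ type: 'text', raw: '2', text: '2' }] }
    ]
  ]
};

const matrix = tableToMatrix(table);
console.log(matrix); // [ [ 'a', 'b' ], [ '1', '2' ] ]
```

Header cells and body cells share one shape here, so a single helper covers both without special-casing the header.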

Member

That sounds good

Contributor Author

@UziTech This is now changed. Want another look at it?

@calculuschild calculuschild changed the base branch from master to dependabot/npm_and_yarn/babel/preset-env-7.15.0 August 14, 2021 04:19
@calculuschild calculuschild changed the base branch from dependabot/npm_and_yarn/babel/preset-env-7.15.0 to master August 14, 2021 04:19
@UziTech UziTech changed the title from "Refactor tables so tokens property is an array of tokens" to "fix: Refactor table tokens" Aug 16, 2021
@UziTech UziTech merged commit bc400ac into markedjs:master Aug 16, 2021
github-actions bot pushed a commit that referenced this pull request Aug 16, 2021
# [3.0.0](v2.1.3...v3.0.0) (2021-08-16)

### Bug Fixes

* Add module field to package.json ([#2143](#2143)) ([edc2e6d](edc2e6d))
* drop node 10 support ([#2157](#2157)) ([433b16f](433b16f))
* Full Commonmark compliance for Lists ([#2112](#2112)) ([eb33d3b](eb33d3b))
* Refactor table tokens ([#2166](#2166)) ([bc400ac](bc400ac))

### BREAKING CHANGES

* `table` tokens: the `header` property now contains an array of objects, one per header cell, each with `text` and `tokens` properties.
* `table` tokens: the `cells` property is renamed to `rows` and is an array of rows, where each row is an array of objects, one per cell, each with `text` and `tokens` properties.

v2:

```json
{
  "type": "table",
  "align": [null, null],
  "raw": "| a | b |\n|---|---|\n| 1 | 2 |\n",
  "header": ["a", "b"],
  "cells": [["1", "2"]],
  "tokens": {
    "header": [
      [{ "type": "text", "raw": "a", "text": "a" }],
      [{ "type": "text", "raw": "b", "text": "b" }]
    ],
    "cells": [[
      [{ "type": "text", "raw": "1", "text": "1" }],
      [{ "type": "text", "raw": "2", "text": "2" }]
    ]]
  }
}
```

v3:

```json
{
  "type": "table",
  "align": [null, null],
  "raw": "| a | b |\n|---|---|\n| 1 | 2 |\n",
  "header": [
    {
      "text": "a",
      "tokens": [{ "type": "text", "raw": "a", "text": "a" }]
    },
    {
      "text": "b",
      "tokens": [{ "type": "text", "raw": "b", "text": "b" }]
    }
  ],
  "rows": [
    [
      {
        "text": "1",
        "tokens": [{ "type": "text", "raw": "1", "text": "1" }]
      },
      {
        "text": "2",
        "tokens": [{ "type": "text", "raw": "2", "text": "2" }]
      }
    ]
  ]
}
```
* Add module field to package.json
* drop node 10 support
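
For code that still holds v2-shaped table tokens, the conversion can be sketched roughly as below. It follows the description of the breaking change above, where each row is an array of cell objects. `upgradeTableToken` is a hypothetical helper name for illustration only. marked does not ship such a function, and the sketch assumes the v2 token has matching `header`, `cells`, and `tokens` entries.

```javascript
// Hypothetical migration helper (not part of marked): converts a v2-shaped
// table token into the v3 shape described above.
function upgradeTableToken(v2) {
  // Pull out the three properties that change; keep type, align, raw as-is.
  const { header, cells, tokens, ...rest } = v2;
  return {
    ...rest,
    // v3 header: one { text, tokens } object per header cell.
    header: header.map((text, i) => ({ text, tokens: tokens.header[i] })),
    // v3 rows: an array of rows, each an array of { text, tokens } objects.
    rows: cells.map((row, r) =>
      row.map((text, c) => ({ text, tokens: tokens.cells[r][c] }))
    )
  };
}

const v2Token = {
  type: 'table',
  align: [null, null],
  raw: '| a | b |\n|---|---|\n| 1 | 2 |\n',
  header: ['a', 'b'],
  cells: [['1', '2']],
  tokens: {
    header: [
      [{ type: 'text', raw: 'a', text: 'a' }],
      [{ type: 'text', raw: 'b', text: 'b' }]
    ],
    cells: [[
      [{ type: 'text', raw: '1', text: '1' }],
      [{ type: 'text', raw: '2', text: '2' }]
    ]]
  }
};

const v3Token = upgradeTableToken(v2Token);
console.log(JSON.stringify(v3Token, null, 2));
```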
@github-actions

🎉 This PR is included in version 3.0.0 🎉

Your semantic-release bot 📦🚀

@calculuschild calculuschild deleted the RefactorTableCellTokens branch September 8, 2021 20:24