This repository has been archived by the owner on Dec 22, 2020. It is now read-only.

Standardize data structure definition #6

Open
septs opened this issue Nov 21, 2020 · 30 comments

Comments

@septs
Member

septs commented Nov 21, 2020

  1. Redesign the field names
  2. Write a JSON Schema file
  3. Confirm the data set license?
  4. New repo name? (for the @tc39 edition)

CC @codehag, any ideas?

@codehag

codehag commented Nov 22, 2020

Thanks for the ping @septs. I will think about this a bit. Currently on holiday until Nov. 30th -- will try to answer after I am back.

@septs
Member Author

septs commented Nov 30, 2020

@codehag, I need you to redesign the data structure definition. Thanks.

@codehag

codehag commented Dec 1, 2020

Hi @septs, thanks for the ping again. I did indeed forget during the vacation -- if there are other places that need my attention please ping me there too.

Ok. Data structures. This is mirrored in a gist: https://gist.github.com/codehag/677fab08889190124851b9b93490915b

Proposals repo data structure

Current data structure

[
  {
    "tags": ["ECMA-262", "proposal"],
    "stage": 1,
    "name": "My fantastic title",
    "link": "https://github.com/tc39/proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "forks_count": 7,
    "open_issues_count": 10,
    "stargazers_count": 107,
    "created_at": "2020-09-21T14:20:14.000Z",
    "pushed_at": "2020-11-24T14:13:57.000Z"
  }
]

Proposed Data structure

This is an ideal, not necessarily what we will achieve at first. The links are placeholders.

Some notes:

  • We can generate the link from the proposal id, which takes the format proposal-<name>. From this we can also generate the spec URL, which is tc39.es/proposal-<name>. This will save us from needing to process the URL.
  • The tags should represent information that can't be captured otherwise. I think ECMA-262 and ECMA-402 are both useful. However, co-champion and long-form specification names like draft are already captured in other fields and are not necessary.
  • I don't know if we have an immediate use for github stars and forks. Is there a use case you have in mind?
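A minimal sketch of the first note, deriving both URLs from the id. The helper name linksFromId and the ProposalLinks shape are hypothetical, and it assumes every proposal repo lives under the tc39 organization, which may not hold for all proposals.

```typescript
// Hypothetical shape for the two derived URLs.
interface ProposalLinks {
  repo: string;
  spec: string;
}

// Derive the repository and spec URLs from an id such as "proposal-oh-so-great".
// Assumption: the repo lives under github.com/tc39 and the spec under tc39.es.
function linksFromId(id: string): ProposalLinks {
  // The id is assumed to always take the form proposal-<name>.
  if (!/^proposal-[a-z0-9-]+$/i.test(id)) {
    throw new Error(`Unexpected proposal id: ${id}`);
  }
  return {
    repo: `https://github.com/tc39/${id}`,
    spec: `https://tc39.es/${id}/`,
  };
}
```

With this, the "link" field of the current structure becomes redundant, since linksFromId("proposal-oh-so-great").repo reproduces it.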

Here is the proposed schema in JSON:

  {
    "tags": [string], // required: "ECMA-262" or "ECMA-402"; optional: "inactive" or "withdrawn"
    "stage": number, // valid values: 0, 1, 2, 3, 4
    "name": string,
    "id": string,
    "authors": [string],
    "champions": [string],
    "notes": [
      {
        "date": string, // date in ISO 8601 format
        "url": string
      }
    ],
    "has-specification": bool,
    "tests": [string]
  }
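The same shape can be sketched as TypeScript types with a small runtime check. This is only an illustration under stated assumptions: the names Proposal, MeetingNote and checkProposal are hypothetical, and the date pattern accepts just a common subset of ISO 8601 rather than the full standard.

```typescript
type Stage = 0 | 1 | 2 | 3 | 4;

interface MeetingNote {
  date: string; // ISO 8601
  url: string;
}

interface Proposal {
  tags: string[];
  stage: Stage;
  name: string;
  id: string;
  authors: string[];
  champions: string[];
  notes: MeetingNote[];
  "has-specification": boolean;
  tests: string[];
}

const SPEC_TAGS = ["ECMA-262", "ECMA-402"];
const STATUS_TAGS = ["inactive", "withdrawn"];

// Assumption: only this subset of ISO 8601 is accepted (date, optional time, optional offset).
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

// Returns a list of problems; an empty list means the record passed these checks.
function checkProposal(p: Proposal): string[] {
  const problems: string[] = [];
  if (!p.tags.some((t) => SPEC_TAGS.indexOf(t) !== -1)) {
    problems.push("tags must include ECMA-262 or ECMA-402");
  }
  for (const t of p.tags) {
    if (SPEC_TAGS.indexOf(t) === -1 && STATUS_TAGS.indexOf(t) === -1) {
      problems.push(`unknown tag: ${t}`);
    }
  }
  // Redundant for typed values, but useful when the record came from parsed JSON.
  if ([0, 1, 2, 3, 4].indexOf(p.stage) === -1) {
    problems.push(`invalid stage: ${p.stage}`);
  }
  for (const n of p.notes) {
    if (!ISO_DATE.test(n.date)) {
      problems.push(`note date is not ISO 8601: ${n.date}`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one is a design choice: a maintainer editing the data by hand would see every issue in one pass.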

And an example of what I have in mind:

[
  // Stage 0 proposal which has been presented but not advanced
  {
    "tags": ["ECMA-262"],
    "stage": 0,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      }
    ],
    "has-specification": false,
    "tests": [],
  },
  // Stage 1 proposal 
  {
    "tags": ["ECMA-262"],
    "stage": 1,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      }
    ],
    "has-specification": false,
    "tests": [],    
  },
  // Stage 2 proposal 
  {
    "tags": ["ECMA-262"],
    "stage": 2,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      },
      {
        "date": "01-02-2020",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      }
    ],
    "has-specification": true,
    "tests": [],
  },
  // Stage 3 proposal 
  {
    "tags": ["ECMA-262"],
    "stage": 3,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      }
    ],
    "has-specification": true,
    "tests": ["https://github.com/tc39/test262/issues/2909", "https://github.com/tc39/test262/issues/2908"]
  },
  // Stage 4 proposal
  {
    "tags": ["ECMA-262"],
    "stage": 4,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      }
    ],
    "has-specification": true,
    "tests": ["https://github.com/tc39/test262/issues/2909", "https://github.com/tc39/test262/issues/2908"]
  },
  // Other types of states for proposals
  // Stage 3 proposal -- inactive
  {
    "tags": ["ECMA-262", "inactive"],
    "stage": 3,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      }
    ],
    "has-specification": true,
    "tests": ["https://github.com/tc39/test262/issues/2909", "https://github.com/tc39/test262/issues/2908"]
  },
  // Stage 3 proposal -- withdrawn
  {
    "tags": ["ECMA-262", "withdrawn"],
    "stage": 3,
    "name": "My fantastic title",
    "id": "proposal-oh-so-great",
    "authors": ["Yulia Startsev", "Septs"],
    "champions": ["Septs"],
    "notes": [
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      },
      {
        "date": "2019-09-07T15:50-04:00",
        "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
      }
    ],
    "has-specification": true,
    "tests": ["https://github.com/tc39/test262/issues/2909", "https://github.com/tc39/test262/issues/2908"]
  },
]

Individual Proposal repo data structure

This is what I currently imagine. It will roughly correspond to the above but will include extra data such as example and description, which are not necessary for all aggregators but are for the website. We may also add localization fields for the title and description. What do you think?

I've made this also a list instead of an object, as some proposals merge. (see class fields proposal)

[{
  "tags": ["ECMA-262"],
  "stage": 3,
  "name": "My fantastic title",
  "id": "proposal-oh-so-great",
  "authors": ["Yulia Startsev", "Septs"],
  "champions": ["Septs"],
  "notes": [
    {
      "date": "2019-09-07T15:50-04:00",
      "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#internationalization"
    },
    {
      "date": "2019-09-07T15:50-04:00",
      "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
    },
    {
      "date": "2019-09-07T15:50-04:00",
      "url": "https://github.com/tc39/notes/blob/master/meetings/2012-05/may-21.md#something"
    }
  ],
  "has-specification": true,
  "tests": ["https://github.com/tc39/test262/issues/2909", "https://github.com/tc39/test262/issues/2908"],
  "example": "function foo() { 'hello' }",
  "description": "This is the description",
}]

@codehag

codehag commented Dec 1, 2020

One thing I didn't touch on was the member categories. The delegates in TC39 can be pulled from our GitHub list, so I don't know if we need a separate data structure for that. The members are also held by Ecma, and I am not sure how important it will be for us to duplicate that info. Do you have a use case in mind? It might make sense, I just don't know yet how we will use this information.

@codehag

codehag commented Dec 1, 2020

Also, an additional bonus for the proposals list would be "implementations" (ie, "spidermonkey", "jsc", "v8", "xs" etc) but I don't know what the best way of getting that programmatically would be. On my end I keep a list: https://github.com/codehag/proposals/ which updates 2 weeks after plenary.

@septs
Member Author

septs commented Dec 1, 2020

About the example field, I recommend:

    "snippet-paths": {
        "use-case-one": "./snippet-use-case-one.js",
        "use-case-two": "./snippet-use-case-two.js",
        // etc
    }

@septs
Member Author

septs commented Dec 1, 2020

For authors and champions, I recommend using https://github.com/tc39/notes/blob/master/delegates.txt to restrict the allowed values.

authors: need not use a delegate code (but a TC39 member should use theirs)
champions: must use a delegate code

@septs
Member Author

septs commented Dec 1, 2020

additional bonus for the proposals list would be "implementations" (ie, "spidermonkey", "jsc", "v8", "xs" etc)

polyfill (workaround only) + implementations (engine only)

   "polyfills": [
       "url-1",
       "url-2",
       // etc
   ],
   "implementations": [
       "v8",
       "engine262",
       // etc
    ],

@septs
Member Author

septs commented Dec 1, 2020

The inactive stage has no number. Should we standardize it (e.g. stage: -1)?

{
  // ...
  "tags": ["ECMA-262", "inactive"],
  "stage": "inactive",
  // ...
}

Something like this?

@septs
Member Author

septs commented Dec 1, 2020

I think we should standardize this idea as a proposal.

The name could be tc39/proposal-schema or tc39/schema.

@codehag

codehag commented Dec 1, 2020

The inactive stage has no number. Should we standardize it (e.g. stage: -1)?

{
  // ...
  "tags": ["ECMA-262", "inactive"],
  "stage": "inactive",
  // ...
}

Something like this?

It may be useful to have information about the stage at which something became inactive, in case it starts being worked on again. The stage a proposal reaches can be useful information. We could have an additional field to identify withdrawn and inactive and have those as booleans?

@codehag

codehag commented Dec 1, 2020

I like the suggestions around implementations and polyfills -- those will likely require data-entry work from maintainers. But we can figure that out, I currently do that for Firefox.

I agree that we should verify champions against the delegates list. Authors shouldn't be verified against the delegates list, as authors can be from outside of the committee.
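A rough sketch of that rule, assuming the delegates list has already been parsed into an array of names. The helper name unknownChampions is hypothetical, and parsing delegates.txt is left out because its format is not described in this thread.

```typescript
// Hypothetical helper: report champions that are missing from the delegates list.
// `delegates` stands in for names already extracted from delegates.txt.
function unknownChampions(
  proposal: { champions: string[]; authors: string[] },
  delegates: ReadonlyArray<string>
): string[] {
  // Authors are deliberately not checked: they may come from outside the committee.
  return proposal.champions.filter((name) => delegates.indexOf(name) === -1);
}
```

An empty result means every champion was found in the list; anything returned would need a maintainer to look at it.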

@codehag

codehag commented Dec 1, 2020

I think we should standardize this idea as a proposal.

The name could be tc39/proposal-schema or tc39/schema.

I didn't quite understand this idea, can you explain a bit more?

@septs
Member Author

septs commented Dec 1, 2020

We could have an additional field to identify withdrawn and inactive and have those as booleans?

  1. Is the stage field the latest stage number?
  2. Use tags? ("tags": ["ECMA-262", "inactive"])

@codehag

codehag commented Dec 1, 2020

We could have an additional field to identify withdrawn and inactive and have those as booleans?

  1. Is the stage field the latest stage number?
  2. Use tags? ("tags": ["ECMA-262", "inactive"])

I like option 2 there -- "use tags"
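With tags carrying the status, the stage field can keep the last stage a proposal reached. A minimal sketch of how a consumer might read this; the helper name statusOf and the choice that "withdrawn" wins over "inactive" when both are present are assumptions, not something decided in this thread.

```typescript
type Status = "active" | "inactive" | "withdrawn";

// Hypothetical helper: the stage keeps the last stage reached; the tags carry the status.
// Assumption: "withdrawn" takes precedence if both status tags are present.
function statusOf(proposal: { tags: string[]; stage: number }): Status {
  if (proposal.tags.indexOf("withdrawn") !== -1) return "withdrawn";
  if (proposal.tags.indexOf("inactive") !== -1) return "inactive";
  return "active";
}
```

A record such as { "tags": ["ECMA-262", "inactive"], "stage": 3 } then reads as "inactive, last seen at stage 3", which keeps the information needed if work resumes.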

@septs
Member Author

septs commented Dec 1, 2020

I think we should standardize this idea as a proposal.
The name could be tc39/proposal-schema or tc39/schema.

I didn't quite understand this idea, can you explain a bit more?

Provide a data schema catalog in TC39, like https://schema.org or https://www.schemastore.org/json

Provide a JSON Schema file and a specification file.

@septs
Member Author

septs commented Dec 1, 2020

I like the suggestions around implementations and polyfills -- those will likely require data-entry work from maintainers. But we can figure that out, I currently do that for Firefox.

Polyfills are carefully selected by the author (as a reference implementation).

@codehag

codehag commented Dec 9, 2020

I am still not really sure what a data schema catalogue would mean. Does this mean that we would publish it externally? One worry I have is that it will tie us to a certain schema and make it harder for us to change it to address our needs. Since we are just starting this work, I would like to make sure that our data structure is right for our needs before providing support. At least, this is what I understood from the comments, let me know if you had something else in mind.

With an eye to moving this to TC39: We don't necessarily need to update anything here before transferring it. That is, so long as the resources you use will still be able to use it. Then we can evolve the project. What do you think?

@septs
Member Author

septs commented Dec 9, 2020

What do you think?

Start a new project, OK?

@codehag

codehag commented Dec 9, 2020

Works for me, does "dataset" make sense as a name? So TC39/dataset?

@codehag

codehag commented Dec 9, 2020

Would it be alright if I copy the contents of this repository for now?

@septs
Member Author

septs commented Dec 9, 2020

I want to rewrite this project (using the new data structure).

@septs
Member Author

septs commented Dec 9, 2020

@codehag, could you create a tc39/dataset repo and add me as an admin?

@codehag

codehag commented Dec 9, 2020

Ok, on it

@septs
Member Author

septs commented Dec 16, 2020

Ok, on it

I do not have permission to create a repo on @tc39, thanks.

@codehag

codehag commented Dec 16, 2020

Yep, I am aware. I need to get an ok from the chairs. I got one so far, will check again.

@codehag

codehag commented Dec 22, 2020

Do you have access to https://github.com/tc39/dataset/ now?

@septs
Member Author

septs commented Dec 22, 2020

Do you have access to https://github.com/tc39/dataset/ now?

OK

@septs
Member Author

septs commented Dec 22, 2020

@codehag

The JSON Schema version of the data structure definition:
https://github.com/tc39/dataset/blob/gh-pages/schema/bundle.json

@septs
Member Author

septs commented Dec 22, 2020

The JSON Schema user experience in VS Code:

[three screenshots]

@septs septs mentioned this issue Dec 22, 2020