Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rush] JSON Selector Expressions (with "json:" selector parser) #4271

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

elliot-nelson
Copy link
Collaborator

@elliot-nelson elliot-nelson commented Aug 6, 2023

Summary

  • Create JSON-based selector expressions, allowing for complex selection of projects.
  • Implement an API-level Project Selector, which can process these expressions.
  • Add a new selector parser, json:, which loads a JSON file containing an expression and selects it.

Details

This PR is sliced off of the prototype in #4250, based on discussion from August Rush Hour West.

Some key highlights/changes from the Rush Hour demo:

  • All "string-based" expressions and parsers are removed, so reviewers can focus on the JSON syntax.
  • The JSON syntax has been significantly reworked. Previously, it was kind of an "internal" syntax that was frustrating for humans to write and understand; now it has been modified to be human-focused first, with ease of crafting and understanding expressions prioritized.
  • "Filters" have been renamed back to "Parameters" to map better to existing CLI concepts.
  • Existing selector parsers have been tweaked so that instead of writing to Terminal in most cases, they'll throw a new SelectorError, which can be used identify and react by API users.

These changes introduce some cool new knock-on effects:

  • Before, there was a concern that it was not very obvious when a selector expression was unsafe. Now, a selector expression can only be inserted into a parameter's value, so a parameter still wins: e.g. rush list --to json:expression.json vs rush list --only json:expression.json. (Using an expression exactly as defined can be done with --only, which makes clear there's no guarantee it is safe.)
  • Since selector expressions are loaded via a selector parser, they can actually refer to other selection expression JSON files recursively! This allows the maintainer to create a hierarchy of detailed, complex JSON selections for use in various contexts.

Selector Expressions

This table represents a quick, at-a-glance introduction to selector expressions, which can be saved in a JSON file or used directly in API calls from a node script:

Description Example
Simple named project "@microsoft/rush"
Simple scoped selector "tag:app"
Detailed selector { "scope": "tag", value: "app" }
Parameter { "--to": "@microsoft/rush" }
Operator { "intersect": ["git:origin/main", "tag:app"] }
Complex { "subtract": [{ "--to": "@microsoft/rush" }, "tag:sdk"] }

All of the above are examples of valid expressions, which means any of them can be inserted into any of the places above where expressions are valid.

Using selector expression files

Simple example:

rush build --only json:stuff.json

By default, all JSON files are always loaded relative to the rush.json root folder. This allows you to save selector files in a common place, e.g. common/selectors/*.json, and they can refer to each other with json:common/selectors/xyz.json.

If you want to load a JSON file relative to the invocation of Rush, you can use ./:

rush build --only json:./stuff.json

Finally, you can pipe a JSON query into stdin from command-line using the standard filename "-":

cat <<-EOF | jq '{ or: .projects }' | rush build --to json:-
${{ github.inputs.build_matrix }}
EOF

How it was tested

  • Some simple unit tests
  • Testing on CLI

Impacted documentation

  • Document new selector expressions

@elliot-nelson elliot-nelson changed the title [rush] Selector Expressions ("json:" parser) [rush] JSON Selector Expressions (with "json:" selector parser) Aug 7, 2023
@iclanton iclanton added this to In Progress in Bug Triage Aug 7, 2023
Copy link
Contributor

@dmichon-msft dmichon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, but it'd be better to leave all the selections as Set<RushConfigurationProject> (or ReadonlySet<RushConfigurationProject>).

libraries/rush-lib/src/api/RushProjectSelector.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/RushProjectSelector.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/RushProjectSelector.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/RushProjectSelector.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/IRushProjectSelector.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/SelectorExpressions.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/RushProjectSelector.ts Outdated Show resolved Hide resolved
libraries/rush-lib/src/api/RushProjectSelector.ts Outdated Show resolved Hide resolved
@elliot-nelson
Copy link
Collaborator Author

@dmichon-msft Thanks for the review! I've taken care of all feedback except the suggestion on operator names. I wanted to loop @octogonz in here as well before I dive into changing them everywhere.

My suggestion is "logical" (query) operators. In infix form, (a AND b) OR (c AND NOT d).

David's recommendation is "set" operators, so, (a INTERSECT b) UNION (c SUBTRACT d).

Is that everyone's preference? I guess I can't argue that it's explicit.

My original reasoning was that AND and OR are quite clear in SQL and JQL, so people should be familiar with them. I think the advantage those contexts have is that the expression looks like this: name = A and name = B (clearly both can't be true), whereas A and B is a little more ambiguous.

@elliot-nelson
Copy link
Collaborator Author

elliot-nelson commented Sep 2, 2023

I have updated the types and json schema to use set operators (UNION, INTERSECT, SUBTRACT).

A complex expression might now be expressed like this (example.json):

{
  "intersect": [
    "--impacted-by": {
      "intersect": [
        { "--impacted-by": "git:origin/main" },
        "tag:terraform"
      ]
    },
    "tag:app"
  ]
}

In English, you could describe this as: take all the projects touched in this pull request, then select only the impacted projects that are terraform components, then from that list find all the application projects that could be impacted.

Then, this set of application projects can be used on the command line, like so:

rush install

# A "safe" build up to and including the selected projects
rush build --to json:example.json

# An "technically unsafe" custom operation, designed to be used this way
rush deploy-previews --only json:example.json

"additionalProperties": false,
"requiredProperties": ["scope", "value"],
"properties": {
"scope": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I've used the name "scope" internally so far, we might want a better name for public API consumption. It might be better to use provider or similar?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://rushjs.io/pages/developer/selecting_subsets/, we say that in "--to X", the "X" is a selector, and that there are different types of selectors.

We then describe a tag: selector, a git: selector, etc.

Perhaps words like "scope" and "provider" are just wishful thinking, and ways to avoid "selectorType" or simply "selector": "git"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elliot-nelson this terminology arises from the grammar. Your original prototype looked like this:

rush build --select 'to(tag:sdk and git:release/v3.0.0) and example-project'

...with the idea that Rush plugins might introduce new notations like approver:team1.

In this grammar, operators like to() and and use a simple standard tokenizer. Whereas release/v3.0.0 and sdk have arbitrary notations. When designing them, we need to be very careful that the syntax does not conflict with the grammar. For example, release/v3.0.0) is a perfectly legal Git branch name, but is inexpressible in your notation, because the ) will be misinterpreted as part of the to() function. (Yes yes, we can handle that by introducing escapes, or by banning unpleasant branch name choices, but awareness of this problem is my point.)

Beyond this parsing annoyance, release/v3.0.0 is also a learning curve for end users: Imagine a person doesn't know what Git is. They might wonder whether / is part of Rush's query language, or if it is part of the git: operator. In order to know, they need to learn the specialized syntax for Git branch names.

So, if you look at the docs, the word "selector" was really introduced to mean "AST leaf nodes with their own weird syntaxes you have to learn." Selectors are the weird tokens waiting to break our parser if we aren't careful.

--to selects a set of projects, but it is not a "selector" because --to has standard well-behaved syntax. It is a grammatically normal friendly selection "operator" or "function", being represented as as a CLI "parameter".

elliot-nelson and others added 7 commits September 8, 2023 14:38
Co-authored-by: David Michon <dmichon@microsoft.com>
Co-authored-by: David Michon <dmichon@microsoft.com>
Co-authored-by: David Michon <dmichon@microsoft.com>
Co-authored-by: David Michon <dmichon@microsoft.com>
Co-authored-by: David Michon <dmichon@microsoft.com>
Co-authored-by: David Michon <dmichon@microsoft.com>
parameterName: string;

/**
* A short, human-readable description of the context where this selector was encountered.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is context always a noun phrase?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to insert this text string into an English sentence, maybe the docs should give a little more guidance to ensure a grammatically correct result.

let expr: SelectorExpression | undefined;

if (unscopedSelector === '-') {
const stdinAsString: string = fs.readFileSync(process.stdin.fd, 'utf8');

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not async?

export type ExpressionSelector = string | IExpressionDetailedSelector;

export interface IExpressionDetailedSelector {
scope: string;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is scope? Give example strings


export interface IExpressionDetailedSelector {
scope: string;
value: string;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is value? Give example strings

}

export interface IExpressionOperatorSubtract {
subtract: SelectorExpression[];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is multi-subtraction overcomplicated?

For example, subtract({A,C}, {B,C}, {C}):

  • is that ({A,C} - {B,C}) - {C} = {A}?
  • Or is it {A,C} - ({B,C} - {C}) = {A,C}?

This order-of-operations concern does not arise with union or intersection. Multi-subtraction can easily be non-ambiguously implemented as subtract(A, union(B,C)).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I guess the only pushback I have is where to enforce such a restriction.

I could attempt to use a FixedLengthArray type for this everywhere, and enforce a min/max length in the JSON schema, and then throw Error in the function if it's not exactly 2 elements, but it seems like a lot of extra work to prevent the user from providing multiple elements. Whereas if you allow it, and all your examples only show 2 elements, I think most people will typically just subtract two elements.

(For what it's worth, your first result seems like the clear answer, it is the 1st operand minus all the other operands.)

P.S. I definitely don't want to change to not be an array e.g. { minuend: , subtrahend: } - I think there's value in all operators taking an operands array of at least 2 elements.

includeExternalDependencies: true,
enableFiltering: false
}
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2023-09-13 at 11 22 56 PM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a "selector"? Looking at these docs, a "selector" is an engine that chooses a list of Rush projects, kind of like how a CSS selector chooses DOM elements.

You have created a class called RushProjectSelector whose state is a rush.json configuration plus some options controlling how the Git selector behaves. If we use SQL terminology, this class is like a SQL table plus some global options that affect query behavior. It has the ability to execute queries, but the class is not a query -- instead the actual query is passed into projectSelector.selectExpression().

Is a "selector" that query? Well, the query's type is SelectorExpression (not to be confused with ExpressionSelector, let's ignore that for now).

What is a "selector expression"? Well apparently it can be a selector:

const projects: ReadonlySet<RushConfigurationProject> = await projectSelector.selectExpression({
  '--to': 'project1'
});

Is it then the case that --to X and X are both selectors? Is everything a "selector"? Is a "selector" just any AST node for this query language?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructive suggestions:

  1. Per my other comment, let's return to the idea that a "selector" is a leaf node like git:origin/main or tag:release.
  2. An expression like to(tag:release and git:origin/main) should get some other name, like selection expression. to() itself is really a "selection function" but its JSON notation is an expression involving that function.

--to is the CLI parameter representation of to(). I feel like the JSON design went off the rails a bit by putting --to in a JSON key. JSON keys are JavaScript variable names. We don't write let --to: string;. Of course we do want the JSON syntax to be concise to write, but it also needs to be an easy API to use. The API scenarios are things like:

  • a program that generates a selection expression
  • a program that optimizes a selection expression by reducing it to a more efficient form

Such programs aren't very easy to write if we can't even do:

switch (node.kind) {
  case 'union': . . .
  case 'intersect': . . .
  . . .
}

and instead must do:

    if (isUnion(expr)) {
      return this._evaluateUnion(expr, context);
    } else if (isIntersect(expr)) {
      return this._evaluateIntersect(expr, context);
    } else if (isSubtract(expr)) {
      return this._evaluateSubtract(expr, context);
    } else if (isParameter(expr)) {
      return this._evaluateParameter(expr, context);
    } else if (isDetailedSelector(expr)) {
  1. So I recommend going back to a more conventional AST design.
  2. Rename RushProjectSelector.selectExpression to something like ProjectQueryContext.select()

filters?: Record<string, string>;
}

export type ExpressionParameter = Record<`--${string}`, string>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be allowed:

Screenshot 2023-09-14 at 1 06 08 AM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Bug Triage
  
In Progress
Development

Successfully merging this pull request may close these issues.

None yet

3 participants