Define CST elements #107

Open
wants to merge 2 commits into
base: master

gibson042

Fixes gh-41

Based on #41 (comment), with slight modifications.

@forivall
Contributor

To make it easier to tell which kind of source element you are looking at, it would probably be helpful to include a discriminator on each of the new types: type: "ChildReference", type: "Token", or type: "NonToken".

@gibson042
Author

I thought about that, but type would collide with the Node property. Honestly, though, there's no relevant ECMAScript-level distinction. The key detail in this representation is reference vs. value, which is obvious on any given instance.
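
A minimal sketch of that reference-vs-value distinction, assuming the element shapes used in the examples later in this thread (`reference` on child references, `element` plus `value` on everything else). The helper name `kindOf` and the sample data are hypothetical, not part of the proposal.

```js
// Hypothetical helper: classify a source element by its shape alone,
// without a dedicated `type` property.
const NON_TOKEN = /^(?:WhiteSpace|LineTerminator|Comment)/;

function kindOf(input) {
    // A child reference points at a node property instead of carrying text
    if (typeof input.reference === "string") return "ChildReference";
    // Everything else carries text; non-tokens are whitespace, line
    // terminators, and comments
    return NON_TOKEN.test(input.element) ? "NonToken" : "Token";
}

const samples = [
    { reference: "arguments#next" },
    { element: "Punctuator", value: "," },
    { element: "WhiteSpace", value: " " },
];
console.log(samples.map(kindOf));
```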


```js
interface ConcreteNode <: Node {
    sourceElements: [ ChildReference | Token | Nontoken ];
```

Contributor

Can ConcreteNode show up in sourceElements? If not, should it?

Author

No and no, because ConcreteNode extends Node (which corresponds to a syntactic production) while sourceElements contains lexical input elements, which are not nodes.

Author

I did it this way because I don't think adding sourceElements: [ ChildReference | Token | Nontoken ] | null to Node would be valid syntax, but I see that I missed noting that everything else should extend not Node but ConcreteNode. I'll update.
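
To make the split between nodes and lexical input elements concrete, here is a rough sketch of how a `ConcreteNode` might be turned back into source text. It assumes a reference is either `"prop"` (a single child) or `"prop#next"` (the next unvisited entry of the array `node[prop]`); only the `#next` form appears in this thread, so the single-child form, the `"IdentifierName"` element name, and the sample tree are assumptions for illustration.

```js
// Sketch only: render a node by concatenating its source elements,
// recursing into children wherever a reference appears.
function render(node) {
    const cursors = {}; // next index to consume, per list-valued property
    return node.sourceElements.map(function (input) {
        // Tokens and non-tokens carry their text directly
        if (input.reference === undefined) return input.value;

        const [prop, mode] = input.reference.split("#");
        if (mode === "next") {
            cursors[prop] = cursors[prop] || 0;
            return render(node[prop][cursors[prop]++]);
        }
        // Assumed form for single-valued children, e.g. "callee"
        return render(node[prop]);
    }).join("");
}

// Hypothetical leaf builder for the example
const ident = name => ({
    type: "Identifier",
    name,
    sourceElements: [{ element: "IdentifierName", value: name }],
});

const call = {
    type: "CallExpression",
    callee: ident("f"),
    arguments: [ident("a"), ident("b")],
    sourceElements: [
        { reference: "callee" },
        { element: "Punctuator", value: "(" },
        { reference: "arguments#next" },
        { element: "Punctuator", value: "," },
        { element: "WhiteSpace", value: " " },
        { reference: "arguments#next" },
        { element: "Punctuator", value: ")" },
    ],
};

console.log(render(call)); // f(a, b)
```

Note that the child nodes appear only in `callee` and `arguments`; `sourceElements` holds references to them plus the surrounding punctuation and whitespace.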

@gibson042
Author

Thanks for the use cases, @nzakas; they're great! I'll give them a first-pass attempt, but understand that some sharp edges are explicitly avoided, and others I'm sure I just missed.

Given a node, determine if it is surrounded by parentheses.

This one has some interesting properties; I'll come back to it.

Insert a new argument between two existing arguments of a CallExpression.

/* Insert `newArg` between `a` and `b` in `node.arguments`, where `b` is at `argIndex` */
node.arguments.splice(argIndex, 0, newArg);
let srcIndex;
let fill = node.sourceElements.reduce(function(fill, input, i) {
    // Found an argument?
    if ( input.reference === "arguments#next" ) {
        // Record it by decrementing `argIndex`, and set `srcIndex` if we found `b`
        if ( argIndex-- === 0 ) srcIndex = i;

    // Copy everything between `a` and `b` except comments
    } else if ( argIndex === 0 && input.element.slice(0, 7) !== "Comment" ) {
        fill.push( Object.assign({}, input) );
    }

    return fill;
}, []);
node.sourceElements.splice(srcIndex, 0, { reference: "arguments#next" }, ...fill);
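
The counting logic above can be factored into the `indexOfRef` helper that the following snippets call. This is one possible shape, not the author's definition: the third parameter `property` is an addition for illustration (the calls in this thread pass only two arguments, so it defaults to `"body"`), and the sample data is hypothetical.

```js
// Sketch: find the position in `sourceElements` of the reference to the
// `index`-th entry of a list-valued property. Returns -1 if absent.
function indexOfRef(sourceElements, index, property = "body") {
    const wanted = property + "#next";
    let remaining = index;
    for (let i = 0; i < sourceElements.length; i++) {
        // Only matching references count down toward the target
        if (sourceElements[i].reference === wanted && remaining-- === 0) {
            return i;
        }
    }
    return -1;
}

// Source elements for something like `f(a, b)`
const elements = [
    { reference: "callee" },
    { element: "Punctuator", value: "(" },
    { reference: "arguments#next" },
    { element: "Punctuator", value: "," },
    { element: "WhiteSpace", value: " " },
    { reference: "arguments#next" },
    { element: "Punctuator", value: ")" },
];

console.log(indexOfRef(elements, 1, "arguments")); // 5
```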

Comment out a function.

I'll assume this is a function declaration, since replacing a function expression with a comment requires more information about the context (but is otherwise similar).

/* Replace the function declaration at index `i` in `node.body` with a comment */
srcIndex = indexOfRef(node.sourceElements, i); // uses the logic from above
node.sourceElements.splice(srcIndex, 1,
    createBlockCommentHead(),
    createCommentBody(
        // Escape `*/` sequences without touching existing escapes
        render(node.body[i]).replace(/\\([\w\W])|(\*)(\/)/g, "$2\\$1$3")
    ),
    createBlockCommentTail()
);
node.body.splice(i, 1);
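
The escaping step is the least obvious part of the snippet above, so here it is in isolation. The regular expression and replacement string are copied from the snippet; the wrapper name `escapeCommentClose` is hypothetical.

```js
// Sketch: escape `*/` so the text can sit inside a block comment, while
// leaving any existing backslash escape as it was.
function escapeCommentClose(text) {
    // Alternative 1 matches a backslash plus the next character ($1);
    // alternative 2 matches `*` ($2) followed by `/` ($3).
    // Unmatched groups substitute as empty strings, so an existing escape
    // becomes `\` + $1 (unchanged) and `*/` becomes `*` + `\` + `/`.
    return text.replace(/\\([\w\W])|(\*)(\/)/g, "$2\\$1$3");
}

console.log(escapeCommentClose("a */ b")); // a *\/ b
console.log(escapeCommentClose("x\\ny"));  // x\ny (unchanged)
```

One case to watch with this pattern: an input that already contains `\*/` is consumed as an existing escape (`\*`) followed by `/`, so it passes through unchanged and still contains `*/`. A stricter variant may be needed if such input is possible.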

Combine multiple var statements into a single var statement.

For simplicity, I'll avoid adding indentation and assume the same kind for all declarations.

/* Roll variable declarations into `decl`, at index `i` in `node.body` */
let next;
let declIndex = indexOfRef(node.sourceElements, i); // uses the logic from above
do {
    // Break out if there is no further declaration.
    // `i` stays on `decl`; each merged `next` is removed from `i + 1` below.
    next = node.body[i + 1];
    if ( !next || next.type !== "VariableDeclaration" ) break;

    // Ensure the necessary comma: find the trailing `;` or the last declarator
    let input, terminatorIndex = decl.sourceElements.length;
    while ( (input = decl.sourceElements[--terminatorIndex]) &&
        input.value !== ";" && !input.reference );
    if ( input.reference ) {
        // No terminator, so add a new punctuator after the last declarator
        decl.sourceElements.splice(terminatorIndex + 1, 0,
            (input = { element: "Punctuator" }));
    }
    input.value = ",";

    // Claim intervening source elements
    let nextIndex = declIndex;
    while ( (input = node.sourceElements[++nextIndex]) &&
        input.reference !== "body#next" );
    decl.sourceElements.push(
        // Pluck (but do not import) the reference to `next`.
        // The second argument of `splice` is a count, not an end index.
        ...node.sourceElements.splice(declIndex + 1, nextIndex - declIndex).slice(0, -1)
    );

    // Claim _subordinate_ source elements (including declarator references)
    decl.sourceElements.push(
        // ...but excluding the `var`/`let`/`const` keyword
        ...next.sourceElements.filter( input => input.element !== "Keyword" )
    );

    // Update the AST, removing `next` and moving its declarators to `decl`
    node.body.splice(i + 1, 1);
    decl.declarations.push(...next.declarations);
} while ( true );

Ok, back to the first:

Given a node, determine if it is surrounded by parentheses.

I think this requires access to parent nodes, but we absolutely prohibit the multiple (and in fact circular) references necessary to get them directly from the ESTree data structure. However, programs processing trees are free to do whatever they need on input/output, be they attaching ids like my POC, using on-the-side (Weak)Maps, or even introducing outright cycles like CST. So, assuming some helper functions,

let reNonToken = /^WhiteSpace|^LineTerminator|^Comment/;
function isParenthetical( node ) {
    // Check inwards if the first token is an open parenthesis
    let inwardsParenthetical = node.sourceElements.reduce(function(answer, input) {
        if ( answer != null ) return answer;
        if ( reNonToken.test(input.element) ) return answer;
        if ( input.value === "(" ) return true;
        return false;
    }, null);
    if ( inwardsParenthetical ) return true;

    // Check upwards
    sides: for ( let dir of [-1, 1] ) {
        let expectedValue = dir === -1 ? "(" : ")";
        let categorize = dir === -1 ? $categorizeOpenParen : $categorizeCloseParen;
        let nut = node, parent; // "node under test"

        while ( (parent = $parentNode(nut)) ) {
            // Find the first preceding/following token
            let input, nutIndex = $sourceIndexOf(nut, parent);
            while ( (input = parent.sourceElements[nutIndex += dir]) &&
                reNonToken.test(input.element) );
            if ( input ) {
                // If it's not a grouping parenthesis, we know all we need to
                if ( input.value !== expectedValue ||
                    categorize(input, parent) !== "ParenthesizedExpression" ) {

                    return false;
                }

                // Otherwise, we may need to check the other side.
                // The label targets the outer `for`, not this `while`.
                continue sides;
            }
            nut = parent;
        }

        // If we run out of "up", `node` is not parenthesized
        return false;
    }

    // We `continue`d twice, so `node` _is_ parenthesized
    return true;
}
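
The helpers assumed above (`$parentNode` and `$sourceIndexOf`) could be sketched with an on-the-side `WeakMap`, as suggested earlier. This is an illustration under assumptions rather than a fixed API: `indexParents`, `isNode`, the `"prop"` reference form for single-valued children, and the sample tree are all hypothetical, and `$categorizeOpenParen`/`$categorizeCloseParen` are left out because their behavior is not specified here.

```js
// Sketch: record each node's parent on the side, leaving the tree acyclic.
const parents = new WeakMap();

function isNode(value) {
    return value !== null && typeof value === "object" &&
        typeof value.type === "string";
}

// Walk the tree once on input, registering a parent for every child node
function indexParents(node) {
    for (const key of Object.keys(node)) {
        if (key === "sourceElements") continue; // lexical elements, not nodes
        const value = node[key];
        for (const child of Array.isArray(value) ? value : [value]) {
            if (isNode(child)) {
                parents.set(child, node);
                indexParents(child);
            }
        }
    }
}

function $parentNode(node) {
    return parents.get(node); // undefined for the root
}

// Index in `parent.sourceElements` of the reference that resolves to `node`
function $sourceIndexOf(node, parent) {
    const cursors = {}; // next index per list-valued property
    return parent.sourceElements.findIndex(function (input) {
        if (input.reference === undefined) return false;
        const [prop, mode] = input.reference.split("#");
        if (mode !== "next") return parent[prop] === node;
        cursors[prop] = cursors[prop] || 0;
        return parent[prop][cursors[prop]++] === node;
    });
}

// Hypothetical tree for something like `f(a, b)`
const leaf = name => ({ type: "Identifier", name, sourceElements: [] });
const a = leaf("a"), b = leaf("b");
const call = {
    type: "CallExpression",
    callee: leaf("f"),
    arguments: [a, b],
    sourceElements: [
        { reference: "callee" },
        { element: "Punctuator", value: "(" },
        { reference: "arguments#next" },
        { element: "Punctuator", value: "," },
        { reference: "arguments#next" },
        { element: "Punctuator", value: ")" },
    ],
};
indexParents(call);
console.log($parentNode(b) === call, $sourceIndexOf(b, call)); // true 4
```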

@forivall
Contributor

forivall commented Jan 7, 2016

rough implementation at https://npmjs.com/cstify (repo); test at http://forivall.com/astexplorer/


3 participants