Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script tags & AST Strructure / Introducing Breaking Changes #170

Open
ichiriac opened this issue Aug 5, 2018 · 11 comments
Open

Script tags & AST Strructure / Introducing Breaking Changes #170

ichiriac opened this issue Aug 5, 2018 · 11 comments

Comments

@ichiriac
Copy link
Member

ichiriac commented Aug 5, 2018

The A from AST stands for Abstract, meaning it does not matters about source code formatting. This can result to removing/loosing some extra informations, and that makes impossible to transform AST back to the original source.

The reason why AST is like that is because it's purpose is to be an intermediate machine language/structure, it's used by a VM or JIT machine in order to generate atomic machine instructions.

But as php-parser is mainly used to edit PHP sources, and the next major release introduce anyway breaking changes that the good time to decide the AST orientations.

If I choose to keep on AST the same structure as the document, the script tag nodes may also be included as an AST node in order to be able to scan something like this :

<?php /** HELLO **/ ?>
<?php /** WORLD **/ ?>

@czosel & @evilebottnawi - do you see any other missing use case where the formatting is lost ?

@czosel
Copy link
Collaborator

czosel commented Aug 5, 2018

@ichiriac I've been working on this exact issue yesterday in the prettier plugin - my approach is to parse the tokens and add new just comment nodes to the AST (prettier/plugin-php#513). For the plugin, it would of course be great if we wouldn't have to do this anymore! I'm not sure in what other contexts this PHP parser is used, and if introducing the new AST nodes could case problems in these use-cases?

There is one more case for which we're looking at the AST tokens right now:

<?php echo 1; ?>
<?php echo 2; ?>
// same AST as
<?php echo 1; ?><?php echo 2; ?>

(see prettier/plugin-php#503)
If you're thinking of optimizing the AST output for re-printing the source code (like we do in the prettier plugin), an extra inline node would certainly help the prettier plugin as well.

@ichiriac
Copy link
Member Author

ichiriac commented Aug 5, 2018

Here the problem comes from the way interprets newlines after closing tag (another example of AST oriented for execution). The first "\n" is stripped from the output, a sort of authors choice in order to break html output with extra new lines.

Actually php-parser is used to :

  • retrieve informations about the code structure
  • update / generate code from the AST structure
  • reformat the code
  • transpile to JS by converting AST nodes

In principle sticking to the document structure should not be a problem. The reason PHP engine avoids extra information is because the have only one use case, but it's not the case here.

What I plan to do is to introduce a script node (with children array nodes) and an output node with raw html output.

Here some use cases :

<?php
  function foo() {
     $var  = true;
     ?>OUTPUT <?
     for(;;) { 
       ?> HTML<%
    }
  }
?>
Another output <?= foo(); ?>

As nesting script nodes breaks AST struction, only open_tag at the document root would be represented :

Program {
   children: [
      Script : {
        short: false,
        children: [
            Function: {
               children: [
                  Assing: { ... },
                  Output: { ... nextShortTag: true },
                  For: { .. Output { nextAspTag: true } .. },
               ]
            }
        ]
      },
      Output: {},
      EchoScript: {
         expr: { Call foo() }
      }
   ]
}

Do you see any other kind of missing informations ? (/cc @chris-l)

@czosel
Copy link
Collaborator

czosel commented Aug 6, 2018

This looks really good - I think we'd have it much easier to format files with lots of inline HTML.
/cc @mgrip @evilebottnawi

@alexander-akait
Copy link
Collaborator

Looks good 👍

@alexander-akait
Copy link
Collaborator

@ichiriac I thoroughly studied your assumption, it seems to me that it will not solve the problem. New kind of node can introduce new problems.

I prefer add option pseudoInlineNode: true (maybe better name) and doesn't eat ?>\n<?php. Also inline node can contains information about tag type (type: asp, standard, etc). Why? We already known what each inline node should be wrapper in php tags (?>Content of inline node<?php) and that's enough.

I think it's not difficult to implement.

@ichiriac
Copy link
Member Author

I'm agree that creating new nodes does not solves directly your problem, and just fixing the behavior of the inline node would be enough, but here I'm looking for being more close to the document structure, and without a new script tag it would be analog to reducing & simplifying the structure (and in some cases loosing information).

So in principle no more implied syntax choices, every structure should have it's own node - keeping the document semantics clean.

If the ouput corresponds to the raw HTML output, so the script should have its own node and include inside its childs as it adds a new tree level. In principle at the document root you should only have script and/or output nodes, and the output node (that's just the inline node renamed) could now have a raw '\n' and a value (that could be empty in your example).

Here we speak about the script tags but as I will digg into nodes it could maybe then be applied to another structures (I don't see any other right now)

I think that's the right moment to introduce this kind of change as the parser is not yet released and I do not want to end up next with a v4 because of changing tags and breaking backwards compatibility.

@ichiriac
Copy link
Member Author

#171 related : removal of parenthesis node (replaced with parenthesizedExpression property)

@alexander-akait
Copy link
Collaborator

@ichiriac maybe it is good idea introduce PHPExpressionContainer (or maybe better name) when we found <?php ?>/<?php (looks same for JSX https://astexplorer.net/#/gist/31725f76a8a3a6c33da02d098f4aa3ac/fe432a1b03023d56693a2e26e8f3edaaca15b8e9). It is allow to detect when we should print <?php and ?>, use group for all content and detect indent for this.

Now we print ?> before inline and <?php after inline (it is very dirty solution and doesn't work as expected in some cases), Also please look on innerComments, it is allow to print comments in empty expression:

<?php
// Comment
?>

@ichiriac ichiriac added the high-pri High Priority bug label Oct 9, 2018
@ichiriac ichiriac added this to the 3.0.0-prerelease.9 milestone Jan 7, 2019
@alexander-akait
Copy link
Collaborator

@ichiriac it think right now it is not high priority, better stabilize parser, then implement better ast for inline

@ichiriac ichiriac added feature and removed high-pri High Priority bug labels Jan 22, 2019
@ichiriac
Copy link
Member Author

ichiriac commented Feb 16, 2019

related to prettier/plugin-php#517

<!DOCTYPE html>
<html lang="de">
<head>
    <?php /* ups */
    if (1==1)
    {
        echo 'FOO';
    } ?>
</head>
<body>
</body>
</html>

NB : The /* ups */ comment would be attached on the script node instead of the inline - I'll try a proposal implementation soon as it's a blocking point for my release

@alexander-akait
Copy link
Collaborator

alexander-akait commented Feb 18, 2019

@ichiriac we use own algorithm to attach comment, it is simple comment location should fit in new script node ast location

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants