Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQL] use inheritance to support different dialects #3046

Open
wants to merge 271 commits into
base: master
Choose a base branch
from

Conversation

keith-hall
Copy link
Collaborator

@keith-hall keith-hall commented Sep 16, 2021

Initial work on supporting different SQL dialects using inheritance. Defaults to MySQL like before, but users can easily override the SQL.sublime-syntax to point to their preferred dialect. Idea originally proposed by SublimeHQ on the forum.

Fixes #1742, #1764, #2899, #3072, #3072, closes #2631

Note: breaking change:

  • the scope keyword.other.DML has been lowercased to keyword.other.dml, so color schemes may need a small change if they target this scope specifically

@jrappen
Copy link
Contributor

jrappen commented Sep 18, 2021

Thanks for working on this. I noticed a few things:

  • The indentation for context: types: match: ... is inconsistent with the rest of this PR.
  • You use
    captures:
      0: ...
      1: ...
    while some others use
    scope: ...
    captures:
      1: ...
    which naturally both works but I thought you might want to use one over the other for readability. But you knew that.
  • While requiring word borders via regex for correctly being able to scope words I believe the word scope should be only applied to the word without the borders.
  • AFAIK *.sublime-syntax does not define a comment key, although PackageDev marks them as such. Maybe use standard YAML comments?

@michaelblyons
Copy link
Collaborator

  • RE captures: 0:: Someone (not me) is very determined to convert all of these to scope:.

  • RE comment: mapping: I am pretty sure this is a product of the <comment> tag in tmLanguage that people used to use. To avoid dropping the comments, the sublime-syntax converter brought them over, and PD was kind enough to mark them as comments.

    I think we may as well convert to #-comments in the future as I mentioned in [SQL] Add support for build-in MariaDB/MySQL/PostgreSQL functions #2631, but the comment:-style ones were already there in the original.

Thanks for working on this.

Big +1.

@michaelblyons
Copy link
Collaborator

michaelblyons commented Sep 19, 2021

INSERT INTO my_table (foo, bar)
VALUES (1, 'two')

The table is scoped as a function.


I'm finding that the biggest thing I miss from my janky T-SQL definition is the scoping for identifiers within (eg) SELECT statements. It's not at all perfect, and tailored to conventions I use (vs. the more-complete, but less specifically-scoped identifier context here). But it has been very nice to have table-name/alias marked a different color than column-name.

@keith-hall
Copy link
Collaborator Author

keith-hall commented Sep 19, 2021

INSERT INTO my_table (foo, bar)
VALUES (1, 'two')

The table is scoped as a function.

I'm finding that the biggest thing I miss from my janky T-SQL definition is the scoping for identifiers within (eg) SELECT statements. It's not at all perfect, and tailored to conventions I use (vs. the more-complete, but less specifically-scoped identifier context here). But it has been very nice to have table-name/alias marked a different color than column-name.

Indeed, I am aware of this (though thanks for pointing it out) and am trying to decide what approach is best to use - I'm thinking of specific, small contexts which will allow me to match column names where column names are expected, table names where table names are expected etc, while allowing fully/partially qualified names. I think the alternative is to just match anything left which is not a keyword as an identifier - it could still support qualified names, but without being context aware, it won't know if it is a table or column name when unqualified (especially if the schema is not dbo), which is why I prefer the former approach, despite it being more complicated and potentially brittle if I don't think of/cover all cases/syntax... So I had the idea to work on syntax test coverage first, to make that easier to implement in terms of knowing when it is correct.

@keith-hall
Copy link
Collaborator Author

* While requiring word borders via regex for correctly being able to scope words I believe the word scope should be only applied to the word without the borders.

can you clarify what you mean here with an example please @jrappen ?

I believe I addressed your other comments :)

@jrappen
Copy link
Contributor

jrappen commented Sep 19, 2021

Will do when I'm back home.

@michaelblyons
Copy link
Collaborator

Regression: Having a function in the SELECT breaks downstream highlighting.

SELECT  foo, COUNT(*) AS tally
FROM    bar
WHERE   1 = 1
GROUP BY foo

@keith-hall
Copy link
Collaborator Author

Regression: Having a function in the SELECT breaks downstream highlighting.

Sorry about that, it seems every change I make breaks something 😅 but at least each time we get some more syntax test assertions to prevent it in future :D so thanks for the testing and reporting of issues! It is fixed now :)

@jrappen
Copy link
Contributor

jrappen commented Sep 22, 2021

I did some "comment"s research.

The built-in converter had originally brought over the comment tags from tmLanguage, when I did a PR to convert all files to sublime-syntax. I haven't tested it, but I assume the built-in converter works unchanged.

The docs do not specify a comment key, though. Whether by ommission or design, only Jon or one of the others could say.


With regards to word borders, I meant something like this random example from your code:

  expressions:
    - meta_append: true
    - match: (?i)\b(CONCATENATE|CONVERT|LOWER|SUBSTRING|TRANSLATE|TRIM|UPPER)\b
      scope: support.function.string.sql

should've probably been this, yes? :

  expressions:
    - meta_append: true
    - match: (?i)\b(CONCATENATE|CONVERT|LOWER|SUBSTRING|TRANSLATE|TRIM|UPPER)\b
      captures:
        1: support.function.string.sql

There still some pop: true left. These should be pop: 1 when using v2 syntax definitions.

@deathaxe
Copy link
Collaborator

RE captures: 0:: Someone (not me) is very determined to convert all of these to scope:.

Note: captures: 0 may be somewhat faster than pairing captures with scope.

Copy link
Collaborator

@michaelblyons michaelblyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scope-everything-as-a-constant for now is a little wild, but I can edit-color-scheme my way out of it.

Two notes on T-SQL types:

  • You're missing text (intentional?)
  • You can wrap them in [] in some places, including the somewhat odd [VARCHAR](1) style.

SQL/SQL (basic).sublime-syntax Outdated Show resolved Hide resolved
SQL/TSQL.sublime-syntax Outdated Show resolved Hide resolved
SQL/SQL (basic).sublime-syntax Outdated Show resolved Hide resolved
@michaelblyons
Copy link
Collaborator

michaelblyons commented Sep 24, 2021

Looking very good. I will probably now run this at work, rather than my own def.

The current weird stuff:

  • I can't seem to UPDATE after a CTE.
  • FOR XML RAW 🙄 Not a big priority. The def just pretends each word is a column and moves on.
  • UNPIVOT
  • INSERT rather than INSERT INTO: I'm not exactly sure what this does, but the highlighting doesn't recognize it.
  • I'm not sure the intention of constant.language.with. The places I'm seeing it are not constants. They're naming columns in a CTE. Is that what you're trying to do with it?

@keith-hall
Copy link
Collaborator Author

keith-hall commented Sep 25, 2021

  • I'm not sure the intention of constant.language.with. The places I'm seeing it are not constants. They're naming columns in a CTE. Is that what you're trying to do with it?

Hmm, I thought I had covered named CTE columns with syntax tests and scoped them as meta.column-name.

-- ^^^^^^^^^^^^^ meta.column-name

The WITH keyword is too overloaded in TSQL imo - we may need to get smarter about which scenarios we handle it in different ways. It can be used after RAISERROR "WITH NOWAIT", it can be used with parens after a FROM tablename or a JOIN tablename (or subquery?) (and before or after a table alias with an optional AS) to give table-level hints like (NOLOCK) - maybe I should take these known hints from your syntax def instead of scoping all words inside parens as constant.language.with (and perhaps changing the scope to mention "table hint") and finally at the beginning of a statement to declare a CTE. (there may be more cases I'm missing).

I am thinking that because semi-colons to end statements are mostly optional, it is going to be too brittle to try to accurately detect if it is the beginning of a (top level?) statement or not. But I may be able to improve it some more. Do you have any concrete examples you can share where it doesn't work please?

I will work on the other items, thanks for notifying about them. I may just add them into the bottom of the syntax test file without assertions to start with (like I've done for merge) to remind myself what still needs to be done. I also need to cover creating sql logins etc. and more types like image and co-ordinate related ones.

@keith-hall
Copy link
Collaborator Author

I found a case where a CTE wasn't scoped/identified correctly and fixed it. And I found another overload of WITH and added syntax tests for it. I believe I fixed FOR XML RAW and INSERT without INTO. PIVOT and UNPIVOT is still a WIP, ran out of time for now. I found a few other things which need fixing too, and added comments for them in the syntax tests file.

@michaelblyons
Copy link
Collaborator

Hmm, I thought I had covered named CTE columns with syntax tests and scoped them as meta.column-name.

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo FROM bar) AS t2(id)
--                                              ^^ constant.language.with
                    ON t2.ID = t1.ID

which I think is equivalent to

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo AS id FROM bar) AS t2
                    ON t2.ID = t1.ID

@keith-hall
Copy link
Collaborator Author

Hmm, I thought I had covered named CTE columns with syntax tests and scoped them as meta.column-name.

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo FROM bar) AS t2(id)
--                                              ^^ constant.language.with
                    ON t2.ID = t1.ID

which I think is equivalent to

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo AS id FROM bar) AS t2
                    ON t2.ID = t1.ID

Thanks, I hadn't heard of that before. I will try to implement everything from https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-ver15, which indeed includes this syntax.

@michaelblyons
Copy link
Collaborator

Thanks, I hadn't heard of that before.

I've never used it, either. 😅 I'm just clicking around in our SQL files looking for stuff that seems odd.

I will try to implement everything from [T-SQL v15]

You're doing great. If you need to take a break, or if you want me to stop finding bugs just tell me to stop for a bit. 😁

@keith-hall
Copy link
Collaborator Author

You're doing great. If you need to take a break, or if you want me to stop finding bugs just tell me to stop for a bit. 😁

Thanks! Pray, please continue - though I probably should also try to concentrate on other dialects (like MySql) to be sure the base syntax is as useful/correct as it should be, rather than just concentrating on T-SQL 😅
but I find T-SQL most relatable and personally useful for me, and still lots to do/fix... so, if someone else wants to contribute a SQL dialect they are familiar with, it'd be most welcome.

This was referenced Sep 30, 2021
deathaxe and others added 23 commits October 31, 2023 22:07
This commit renames column reference contexts to `column-name-list` and arranges
their section alphabetically within expressions.
The `+` operator pattern in MySQL is redundant as it is already part of `expressions`.
Embedding syntaxes may want or require to disable invalid stray highlighting
to avoid false positives after interrupted expressions or interpolation.
This commit prevents globally reserved keywords to be matched as identifiers
(column names, table names, ...).
Literal constants belong to reserved words and thus can be matched globally.
This commit scopes variables such as `@var` in MySQL.
for more customizable highlighting between the two
ADD SEQUENCE isn't valid inside an ALTER TABLE context, so `sequence` here needs to be treated as a column name
deathaxe added a commit to deathaxe/sublime-packages that referenced this pull request Mar 15, 2024
This commit
1. renames CSS.sublime-syntax to CSS (Plain).sublime-syntax
2. creates an inherit CSS from CSS (Plain)

Goal
----

Support different CSS dialects (Basic, PostCSS, Tailwind CSS, ...) in
embedded code scenarios including templating support.

Motivation
----------

More and more syntax packages inherit CSS.sublime-syntax to add support
for interpolation or embed template tags (e.g.: ASP, JSP, PHP, Liquid,
Vue, ...).

Other packages might provide certain dialects of CSS, such as PostCSS
or Tailwind CSS.

It is currently not easily possible to embed Tailwind CSS in PHP or
Liquid without rewriting each of those syntaxes, because all of them
rely on default CSS.sublime-syntax due to inheritance depending on
file names rather than scopes.

The motivation is to enable exactly this (again), to benefit from both
inheritance and flexibility from 3rd-party packages.

Strategy
--------

The idea is to turn CSS.sublime-syntax into an interface, which is
inherit from CSS (Plain) and can be extended by and embedded into
syntax definitions such as PHP, while being able to easily create an
override with `extends` value replaced by desired syntax dialect
(e.g.: Tailwind CSS).

       CSS (PHP)
          |
         CSS
          |  \
          |  Tailwind CSS
          |   |
          |  PostCSS
          |  /
         CSS (Plain)

Note: It follows the idea being discussed in SQL (sublimehq#3046).
@michaelblyons
Copy link
Collaborator

In terms of number 2, it's a sad fact that almost everything in SQL is a keyword that doesn't fit into any other existing subscope.

This may not be the place for it, but I think it's worth considering a new second-level scope on keyword, especially since languages (like C#) have SQL-like expressions that could use it. I don't know what the level should be called (keyword.relational? keyword.query?), but I think it'd be more useful than keword.other.dml.

Also, I've been using this branch for years. It'd be nice to have its improvements available for everyone without hunting for it.

@deathaxe
Copy link
Collaborator

Besides finalizing support for various statements and dialects (e.g. Postgre) scope naming in general is still one of the major todos. I agree with keyword.other.[dml|ddl|...] not really fitting well into the general scope naming scheme.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SQL] MySQL and Goto Symbol