Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure SQL Parsing for SQL Server to use Microsoft.SqlServer.DacFx #4

Open
AdrianJSClark opened this issue Oct 8, 2019 · 12 comments · May be fixed by DbUp/DbUp#455
Open

Restructure SQL Parsing for SQL Server to use Microsoft.SqlServer.DacFx #4

AdrianJSClark opened this issue Oct 8, 2019 · 12 comments · May be fixed by DbUp/DbUp#455
Assignees
Labels
enhancement New feature or request

Comments

@AdrianJSClark
Copy link
Member

DbUp parses the text of each SQL script for two purposes:

  1. To identify replacement tokens for our variable replacement
  2. In order to split statements into batches (in SQL Server that's splitting on "GO" statements)

Currently these two purposes are handled using the same parsing logic. This issue was opened to split these two purposes into their own code paths so that each can be handled by appropriate code.

Initially the Variable Substitution parsing will keep the existing SqlParser functionality.

To split the SQL Server T-SQL statements, the parsing will be achieved using a Microsoft-supplied package called "Microsoft.SqlServer.DacFx" which includes a T-SQL parser. This offloads the responsibility for writing a performant, resiliant, compliant parser.

A simple LINQPad-based proof-of-concept to split SQL into batches using Microsoft.SqlServer.DacFx which you can see here: DacFx Parsing LINQPad POC

Edited from original comment posted by @AdrianJSClark in DbUp/DbUp#439 (comment)

@AdrianJSClark AdrianJSClark self-assigned this Oct 8, 2019
@AdrianJSClark AdrianJSClark added the enhancement New feature or request label Oct 8, 2019
@AdrianJSClark
Copy link
Member Author

While doing any code refactoring around variable parsing/substitution as part of this issue #1 should be kept in mind.

Likely sqlcmd -style variable support would be an alternative to the current variable support if both those things were able to be plugged in/out. While that is outside the scope of this issue it should be considered during implementation.

@AdrianJSClark
Copy link
Member Author

AdrianJSClark commented Oct 22, 2019

An initial, rough implementation is available via the draft pull request DbUp/DbUp#455.

A potential issue with this solution is it pulls in a large new dependency (i.e. Microsoft.SqlServer.DacFx) which does a heap more than just parse SQL. The libraries which do the parsing could (licensing permitted) be included separately in a lib folder or via some non-Microsoft NuGet package.

We also now need to handle errors from the parser, which is much more strict at checking syntax than our existing custom built one.

What are everyone's feelings on continuing down this road?

(ping @droyad @dazinator @sungam3r )

@dazinator
Copy link

dazinator commented Oct 22, 2019

My preference would be - firstly to keep our native go splitter in place without these additional dependencies. It's a simple and lightweight solution -- we dont care about fully parsing or validating the t-sql language- we only need go splitting. The splitter we have has occasional new edge cases found (and fixed) to enable that but it does that job in a comparatively lightweight manner.

Then secondly - we could put this new dacfx based splitter in a new and entirely optionally adopted package / project (DbUp.SqlServer.DacFx?). I personally wouldn't adopt this package myself just for its go splitter (because I find the dbup native one sufficient) but perhaps over time it will get other compelling features that leverage the more capable dacfx parser such as sqlcmd variable support etc - in which case if I needed those features I'd adopt this package and accept the cost of additional dependencies. .

@sungam3r
Copy link
Member

Yes, the package is really big - 10 mb, even if you try to get only one required library from it - Microsoft.SqlServer.ScriptDom, it will be 4.5 mb. On the other hand, I see a benefit in removing all parsing code from this project. DbUp is not about parsing or it should not be about it.

@AdrianJSClark
Copy link
Member Author

@sungam3r Unfortunately you can't remove the parsing code because then many scripts do not work. In SQL Server's case at least, the "GO" statement is only respected by the client. If you send "GO" to the server in the text of a SqlCommand object then it will fail as invalid SQL.

We have to split scripts on "GO" statements somehow. I was hoping we could use Microsoft's code to do that reliably.

@AdrianJSClark
Copy link
Member Author

@dazinator Not sure I agree that our "GO" splitting code is simple or particularly lightweight, however you are making some good points there. The size of the package is making me have second thoughts.

@sungam3r
Copy link
Member

 If you send "GO" to the server in the text of a SqlCommand object then it will fail as invalid SQL.

Are you sure?

@dazinator
Copy link

@sungam3r yes. "Go" is not valid t-sql. Tooling (such as sql management studio) use it as a batch seperator within script files. https://docs.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-go?view=sql-server-ver15

@dazinator
Copy link

dazinator commented Oct 23, 2019

@AdrianJSClark - yep fair enough. By lightweight I meant in terms of dependencies - I think it only leverages very basic low level stuff like stream and char etc. By simple - I guess it depends on your outlook :-) I certainly found it simplistic well.. simpler code than what a full sql parser would look like - it's not a recursive descent parser, its not tokenising or using a grammar etc. Perhaps I should have used the word "primitive" instead of simple :-D

@sungam3r
Copy link
Member

@dazinator I know that GO is just a special keyword for tooling and not a part of TSQL. I think whether the new parser supports specifying specialized delimiters like GO.

@dazinator
Copy link

dazinator commented Oct 23, 2019

@sungam3r not sure what you mean. Just to avoid confusion and for my own sanity I'd like to make the distinction that GO is not really a delimeter (or specialised delimeter) it's officially a batch seperator. Delimeters are valid parts of t-sql statements and are submitted to the database with the commands. The delimiter in t-sql is a ; we dont have to worry about delimeters because they are valid to submit with commands to the Db engine. Therefore delimeters are a non issue. The purpose of the GO splitting logic in dbup is to split scripts on a batch seperator (GO in the case of sql server) into separate commands that can be submitted to the database - the batch seperator is removed in the process. You probably know all this. The new dacfx parser will detect GO for sql server scripts because it can only parse valid sql server scripts (that is scripts that are valid in sql management studio). It wont help you parse scripts for other databases that may use different batch seperators (not GO). In contrast the current GO splitter that dbup has is reused for other database providers - where they can specify what there batch seperator is, and the logic largely remains the same.. In this sense we have more flexible support right now for different batch seperators than we would if we replaced it with dacfx. In fact we'd have to leave the legacy splitter in place for those non sql server db providers that wouldnt be able to use dacfx, and the dacfx based go splitter would just be used for sql server. I dont see much value in this personally but like I have said before if this was an opt-in package rather than a replacement of the existing go splitter then I'd prefer that as it's quite a large dependency to bring in, so I wouldnt personally want to adopt it unless there was a more compelling usage of the powers of the dacfx parser that I really needed (like sqlcmd support)

@sungam3r
Copy link
Member

The new dacfx parser will detect GO for sql server scripts because it can only parse valid sql server scripts (that is scripts that are valid in sql management studio). It wont help you parse scripts for other databases that may use different batch seperators (not GO). 

Right. I am talking only about SqlServer. Ok. It doesn’t matter, and it’s already clear that the parser is heavy.

@droyad droyad transferred this issue from DbUp/DbUp Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Wishlist
Development

Successfully merging a pull request may close this issue.

3 participants