Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverse engineering complex databases - automatically generate storage definition and data structures #450

Open
prospectumdev opened this issue Jan 11, 2020 · 86 comments

Comments

@prospectumdev
Copy link

prospectumdev commented Jan 11, 2020

I have databases with many complex tables (lots of columns, foreign keys, range checks etc). Manually creating the storage definition and all the structs would take maybe a week. It is also likely that errors are made in the process.

Is there some kind of parser that generates the structs and the storage definition from an existing SQLite db?

If not, I am tempted to write a SQLite3 -> sqlite_orm definition parser that reads sqlite_master and outputs a .h file with all you need in order to use the db with sqlite_orm.

@fnc12
Copy link
Owner

fnc12 commented Jan 11, 2020

Hi. Thanks for using the lib. How many tables do you have? I am just curious.
I thought about codegen from SQL query. I want to make a website where you can type an SQL query and press generate and you would get C++ code written with sqlite_orm. E.g. you type:

CREATE TABLE contacts (
    contact_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE,
    phone TEXT NOT NULL UNIQUE
);

will generate

struct Contact {
    int id = 0;
    std::string firstName;
    std::string lastName;
    std::string email;
    std::string phone;
};
auto storage = make_storage("db.sqlite",
                            make_table("contacts",
                                       make_column("contact_id", &Contact::id, primary_key()),
                                       make_column("first_name", &Contact::firstName),
                                       make_column("last_name", &Contact::lastName),
                                       make_column("email", &Contact::email, unique()),
                                       make_column("phone", &Contact::phone, unique())));

or if you type

SELECT first_name, salary 
FROM emp_master;

you will get

auto rows = storage.select(columns(&EmpMaster::firstName, &EmpMaster::salary));

What do you think?

@prospectumdev
Copy link
Author

There should be somewhere around 70 tables (2 different schema definitions). Many of them have several dozens of columns.

The results you describe are exactly what is needed (although it seems that NOT NULL is ignored in the example). That would make sqlite_orm even more efficient to use and attractive!

I suppose that when it comes to details (e.g. the determining the right order of the make_table entities in the presence of foreign keys), one needs to be careful.

May I ask how you are planning to implement this transformation?

@fnc12
Copy link
Owner

fnc12 commented Jan 12, 2020

  1. parse a query with sqlite vanilla parser
  2. analyze model obtained by parsing
  3. generate code or show error with text "this feature is not supported right now"

@prospectumdev
Copy link
Author

Sounds good! What priority does this code generation thing have for you?

@fnc12
Copy link
Owner

fnc12 commented Jan 12, 2020

  1. Bugs
  2. Minor features
  3. Codegen
  4. Major features

@prospectumdev
Copy link
Author

I will need to have the tables converted in about 2 weeks. Do you think the code generator might be ready then? If not, no problem, then I'll write a simple object oriented parser just for 'create table'.

@fnc12
Copy link
Owner

fnc12 commented Jan 13, 2020

Probably it will be ready for create table

@prospectumdev
Copy link
Author

Awesome, thank you so much!

@prospectumdev
Copy link
Author

Quick question: How are types translated into sqlite_orm, e.g. UNSIGNED INT(4), CHARACTER(1), VARCHAR(64) etc.?
In the case of UNSIGNED INT(4) => int, information is lost so that the databases created by sqlite_orm are not identical to the original ones.

@prospectumdev
Copy link
Author

And another Question:
here...

struct User{
	...
	std::unique_ptr<std::string> imageUrl;
	...
};

...unique_ptr means that imageUrl = nullptr is allowed. Correct?

@fnc12
Copy link
Owner

fnc12 commented Jan 13, 2020

  1. https://github.com/fnc12/sqlite_orm/blob/master/include/sqlite_orm/sqlite_orm.h#L164
  2. yes, nullable types skip NOT NULL constraint during syncing a schema. Nullable types are std::unique_ptr, std::shared_ptr and std::optional. You can specify any other type as nullable in your application

@prospectumdev
Copy link
Author

Thanks!
In the meantime I have written a quick and dirty code generator based on the SQLite C/C++ API.
It can already create the structs. Next, I'll do the make_storage part.
I can make the code available on github if you like.

@fnc12
Copy link
Owner

fnc12 commented Jan 13, 2020

This is cool! But it is not required cause I have already started making injections in sqlite source to make available statement analyzing. I thought about creating custom parser but native parser is always better cause it will parse 100% of queries but non-native will parse <100% queries and must be regularly supported. Anyway parser creation is a good exercise to get amazing experience!

@prospectumdev
Copy link
Author

prospectumdev commented Jan 13, 2020

Generated Code:

The compiler says I must use the command line option /bigobj in order to get it to compile.
I am using /bigobj but it is still not compiling (Error C1128) :-(

@prospectumdev
Copy link
Author

Ok, I put /bigobj into the linker command line options instead of the compiler. Obviously I need a break but it seems that I am almost there.
Takes long to compile though.

@prospectumdev
Copy link
Author

Ok, I can create an sqlite3 db using sqlite_orm and the generated definition (103 KB of code, I am happy that I did not have to write this manually). Now I need to have a look at the foreign keys.
But maybe I should call it day for now :-)

@prospectumdev
Copy link
Author

prospectumdev commented Jan 13, 2020

I am stuck with the data type mapping. Seems like it is not possible to create a db definition for sqlite_orm without losing the detailed data type specification of the SQLite db to be modelled.
That might be a big dealbreaker because then the Host software is most likely going to refuse to read dbs created with sqlite_orm...

I need to exactly recreate the existing SQLite db. Is this possible?

@prospectumdev
Copy link
Author

How does sqlite_orm deal with circular foreign key references? I just discovered that the schema I am currently working with has 7 circular references...

@fnc12
Copy link
Owner

fnc12 commented Jan 14, 2020

@prospectumdev what will be lost?
sqlite_orm cannot handle circular references. This is a reason to improve sync_schema call

@prospectumdev
Copy link
Author

prospectumdev commented Jan 14, 2020

what will be lost?

The data type definitions as specified in the original databases. For instance, 'UNSIGNED INT(4)' is mapped to 'INT', so that databases created with sqlite_orm will always differ from original ones. However, contrary to my expectations it seems that the host software I am using does not care. So it is not a big problem.

sqlite_orm cannot handle circular references

In my tables there are self references and circular references. is it possible / are you planning to support these references? As for timing, is this considered a minor or a major feature?

Plus it seems that it is not possibly to specify validity checks via the check constraint, like
CHECK(CustomDisplay1_PosPixels_Y=0 OR CustomDisplay1_PosPixels_Y BETWEEN 0 AND 8192).
The databases I am using heavily rely on checks. Again, the host software does not seem to care but I know from experience that there is a good reason for all those checks to be there.

From my perspective, my assessment of these three issues is the following:

  1. Circular references: Important feature.
  2. Checks: Important feature.
  3. Non-SQLite data type names: Seems to be more of a nice-to-have feature.

This is a reason to improve sync_schema call

So you are planning to add support for circular references and self references? That is great news!

What about check constraints? If sqlite_orm is supposed to do the checking itself, this might be a lot of work. However, why not simply add the check constraints to the database and have SQLite itself do the checking? In that case it should be much easier to add this feature.

As for the data type names: It seems that SQLite allows for data type name aliases to be stored. While I am not sure if it is a good practice to use those at all, SQLite does it and so, for completeness, it might be nice to represent that, too. Or sqlite_orm could explicitly refuse to use data types other than those specified by the 5 storage classes and rely on checks in order to ensure validity. While this is a plausible approach, my experience from product development tells me that for psychological reasons this is likely to have a (small) detrimental effect on the user acceptance of sqlite_orm.

@fnc12
Copy link
Owner

fnc12 commented Jan 14, 2020

  1. If you have a database with a column with type UNSIGNED INT(4) and a member mapped to this column with type int the column will never be altered during sync_schema cause UNSIGNED INT(4) maps to int
  2. circular references are delayed until someone asks for it. Looks like it is time to implement this feature
  3. CHECK constraint is also delayed until someone asks
    I don't see the problem in different types' names cause names are only aliases.
    I think I shall implement CHECK constraint first, circular references next and SQL parse after that. I am afraid I can not make it in two weeks so you should use you own parser probably.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 14, 2020

Thanks a lot, I'll find a way to get something going that I can work with at the moment. If the host software accepts databases without foreign keys and checks, that's ok for now.

I think you are doing an excellent job with sqlite_orm, and I think it can become very popular once the features are there and it is convenient to use (example: code generation).
May I ask you about your plans for sqlite_orm? Is it a private or professional project for you? Are you planning to develop/support it long term?

@prospectumdev
Copy link
Author

Feedback: It kind of works: I can now read an existing database, generate the sqlite_orm header file (no foreign keys, no check() constraints) and make sqlite_orm produce a working database. 'working means' that I can insert all the data of the preexisting database into the new one and the new one is accepted by the host software!
Unfortunately, when I try to open an existing database with sqlite_orm, it is completely overwritten. The preserve flag (sync_schema(true)) does not help.
So it seems that I will have to transfer the data manually at the moment.

@fnc12
Copy link
Owner

fnc12 commented Jan 14, 2020

Data is lost cause something in schema differs. Please show me the results of sync_schema call

@prospectumdev
Copy link
Author

My quick and dirty code: https://github.com/prospectumdev/200111_sqlite_orm_codegen

I'll post the results of sync_schema shortly.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 14, 2020

sync_schema() returns "old table dropped and recreated" for all tables
sync_schema(true) also returns "old table dropped and recreated" for all tables

I found the reason: It is happening because of the data type names.
For testing, I modified a small table, setting data type names to SQLite standards, while leaving all other tables alone. Now, for the modified table, the output is "table and storage is already in sync."

Maybe custom data type name support isn't a 'nice to have' feature at all...

@prospectumdev
Copy link
Author

prospectumdev commented Jan 14, 2020

Optional data type aliases could solve the problem. Then

make_column("CustomDisplay1_PosPixels_X", &Coupler::CustomDisplay1_PosPixels_X, default_value(0))

turns into

make_column("CustomDisplay1_PosPixels_X", &Coupler::CustomDisplay1_PosPixels_X, "UNSIGNED INT(6)", default_value(0))

If available, the aliases can be used to set table_info[columnid].type when tables are created.

@fnc12
Copy link
Owner

fnc12 commented Jan 15, 2020

what type has Coupler::CustomDisplay1_PosPixels_X?

@prospectumdev
Copy link
Author

prospectumdev commented Jan 15, 2020

original db : UNSIGNED INT(6)
new db created with sqlite_orm: INTEGER
struct created by the code generator: unique_ptr<unsigned int>

@fnc12
Copy link
Owner

fnc12 commented Jan 15, 2020

what C++ type has Coupler::CustomDisplay1_PosPixels_X or what type you expect it to have? int, std::string or what?

@fnc12
Copy link
Owner

fnc12 commented Jan 24, 2020

if all tables are bound with each other with foreign keys than the only way is to drop foreign keys. If some tables can be separated from others that you can split your storage without dropping anything.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 25, 2020

The problem is the complexity of make_storage.
Is it (theoretically) possible to modify sqlite_orm so that make_storage is broken up?

So that
make_storage(path,make_table(...), make_table(...), ...)

turns into

auto storage = init_storage(path);
storage.add_table(...);
...

After all, the larger the schema, the more useful orm gets and thus the more desirable it is to use it.

Another problem is compilation time and memory usage. Do you think extern template declarations could help decrease compilation time?

@fnc12
Copy link
Owner

fnc12 commented Jan 25, 2020

Yeah, I know this. There is no way to divide make_storage into two parts: storage init and adding tables cause API will be dropped in that case. This problem also exists in C++ std::tuple. Extern templated will not fix it cause there was someone with such a problem already who tried extern templates and they did not work. The only way for now is to split storage. Later probably C++ will obtain non-recursive tuple technique and large storage will not have this issue.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 27, 2020

Assuming that a schema is given, what is the type returned by make_storage? I mean, if you can not use decltype but need to specify the type explicitly, what can you do? I am thinking about preparing libraries and declare the return these types in include headers.

@fnc12
Copy link
Owner

fnc12 commented Jan 27, 2020

@prospectumdev
Copy link
Author

Thanks! When I remove all the using ...= decltype(...("")); definitions, I can compile the whole thing (both schemas together)! So it seems that what I need to do is define those types explicitly. I'll see if that can done by a generator.

@prospectumdev
Copy link
Author

Almost there!
In the FAQ example, you show one constraint(primary key):
Column<Responsible, decltype(Responsible::id), constraints::primary_key_t<>>,

Could you please tell me all the constraints that can be put after decltype(...), ? Do I have to use them in a specific order? Thanks!

@fnc12
Copy link
Owner

fnc12 commented Jan 27, 2020

constraints::unique_t for unique, others can be found here https://github.com/fnc12/sqlite_orm/blob/dev/dev/constraints.h . The order must be the same as in your make_column declaration.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 27, 2020

This works when the primary key definitions are removed...

struct __ValidFixedCodes_ReverbTailTruncationMode
{
	unsigned int Code;
	string Description;
};
template<class O, class T, class ...Op>
using Column = internal::column_t<O, T, const T & (O::*)() const, void (O::*)(T), Op...>;

using HW5CODMDB = internal::storage_t <
	internal::table_t < __ValidFixedCodes_ReverbTailTruncationMode,
	Column< __ValidFixedCodes_ReverbTailTruncationMode, decltype(__ValidFixedCodes_ReverbTailTruncationMode::Code)/*, constraints::primary_key_t<>*/, constraints::unique_t>,
	Column< __ValidFixedCodes_ReverbTailTruncationMode, decltype(__ValidFixedCodes_ReverbTailTruncationMode::Description)>
	>
> ;

HW5CODMDB initHW5CODMDB(string& path)
{
	return make_storage(path,
		make_table
		(
			"__ValidFixedCodes_ReverbTailTruncationMode",
			make_column("Code", &__ValidFixedCodes_ReverbTailTruncationMode::Code, unique()),
			make_column("Description", &__ValidFixedCodes_ReverbTailTruncationMode::Description)
			//, primary_key(&__ValidFixedCodes_ReverbTailTruncationMode::Code)
		)
	);
}

but does not when they are used. How can I represent a primary key that is defined as in make_storage in the storage type? Thanks!

@prospectumdev
Copy link
Author

prospectumdev commented Jan 27, 2020

This combination works...

Column< __ValidFixedCodes_ReverbTailTruncationMode, decltype(__ValidFixedCodes_ReverbTailTruncationMode::Code), constraints::primary_key_t<>, constraints::unique_t>,
make_column("Code", &__ValidFixedCodes_ReverbTailTruncationMode::Code, primary_key(), unique()),

but, in order to support multi column primary keys, I want something like

sqlite_orm::constraints::primary_key_t<decltype(__ValidFixedCodes_ReverbTailTruncationMode::Code)>
primary_key(&__ValidFixedCodes_ReverbTailTruncationMode::Code)

but that does not work.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 27, 2020

I just figured it out

constraints::primary_key_t<decltype(&__ValidFixedCodes_ReverbTailTruncationMode::Code)>

@prospectumdev
Copy link
Author

Unfortunately, the combination

Column< _General, decltype(_General::CustomDisplay_EnableStdPages_Controls), constraints::default_t<string>>,

make_column("CustomDisplay_EnableStdPages_Controls", &_General::CustomDisplay_EnableStdPages_Controls, default_value<string>("Y")),

causes an error in sqlite_orm.h, line 9134:
default value of column [CustomDisplay_EnableStdPages_Controls] is not constant

@fnc12
Copy link
Owner

fnc12 commented Jan 27, 2020

Column< _General, decltype(_General::CustomDisplay_EnableStdPages_Controls), constraints::default_t<const char *>>,

make_column("CustomDisplay_EnableStdPages_Controls", &_General::CustomDisplay_EnableStdPages_Controls, default_value("Y")),

@prospectumdev
Copy link
Author

prospectumdev commented Jan 27, 2020

But _General::CustomDisplay_EnableStdPages_Controls is string, and without added, the output type of make_storage did not match the explicitely defined storage type...

I'll try it.

@fnc12
Copy link
Owner

fnc12 commented Jan 27, 2020

field can be std::string and default value can be const char *. It is ok cause both std::string and const char * map as TEXT to SQLite. Actually you can even make default_value with int and it will work cause SQLite has weak column types

@prospectumdev
Copy link
Author

So for an unsigned int it is constraints::default_t<unsigned int>> and default_value<unsigned int>(...) but for a string it is constraints::default_t<const char *>> and default_value("..."))?

Then, just for completeness, how is it done for real and blob types? Thanks!

@fnc12
Copy link
Owner

fnc12 commented Jan 27, 2020

No.
It is const char * for default_value("Y") statement cause type of "Y" argument is const char *.

struct User{
    int id = 0;
    float floatValue = 0;
};
auto storage = make_storage({},
                            make_table("users",
                                       make_column("id", &User::id, primary_key()),
                                       make_column("value", &User::floatValue, default_value(0))));
// or 
auto storage = make_storage({},
                            make_table("users",
                                       make_column("id", &User::id, primary_key()),
                                       make_column("value", &User::floatValue, default_value(0.0))));
// or 
auto storage = make_storage({},
                            make_table("users",
                                       make_column("id", &User::id, primary_key()),
                                       make_column("value", &User::floatValue, default_value(0.0f))));
// or 
auto storage = make_storage({},
                            make_table("users",
                                       make_column("id", &User::id, primary_key()),
                                       make_column("value", &User::floatValue, default_value(0L))));
// or 
auto storage = make_storage({},
                            make_table("users",
                                       make_column("id", &User::id, primary_key()),
                                       make_column("value", &User::floatValue, default_value("0"))));

everything will work cause all this just is serialized to SQL queries which are compiled by SQLite3 engine. Next is SQLite's responsibility to assign 0.0 to int column and it will be assigned well cause SQLite has weak types. You can insert string into int column and reverse. Also you can omit columns' types during table creation:

CREATE TABLE users (id, name)

@prospectumdev
Copy link
Author

Thanks, after ~12 hours it is finally working. 353 kB of generated code... Still takes long to compile but it works!
I am thinking about stuffing it all into a library and creating a facade for it so that my program does not have to deal with any of the orm stuff.

Anyway, thank you very much again! I'm done for today, 12 hours of coding should be enough ;-)

@fnc12
Copy link
Owner

fnc12 commented Jan 27, 2020

I am glad that everything works. Your idea about a dedicated library is the best solution. Also please beware that changing your schema requires storage type changing also.

@prospectumdev
Copy link
Author

prospectumdev commented Jan 28, 2020

My working code generator: https://github.com/prospectumdev/200111_sqlite_orm_codegen
Todo: Checks and foreign keys.

Thank you so much for your help!

@fnc12
Copy link
Owner

fnc12 commented Feb 2, 2020

CHECK is done #456 . Now sqlite_orm supports all types of table constraints. Thank you for your help

@fnc12
Copy link
Owner

fnc12 commented Mar 3, 2020

is the issue actual?

@prospectumdev
Copy link
Author

I'll check tomorrow

@prospectumdev
Copy link
Author

Before I update my generator: Is there any 'official' functionality now for generating storage definitions from SQLite dbs? If not: This is worth considering becauses such a generator increases the efficiency of use of sqlite_orm a lot, especially for more complex databases. If updated, my generator code could be a starting point for that.

@prospectumdev
Copy link
Author

Ok, it was easier than I thought to update my generator (https://github.com/prospectumdev/200111_sqlite_orm_codegen). Still: There should be an 'official' generator, I think.

@fnc12
Copy link
Owner

fnc12 commented Aug 18, 2022

hi @prospectumdev . There is no ready generator. I was working on it but it takes a lot of time to make an universal SQLite parser. You can help me if you give me some useful test cases which I can start with. Thanks

@prospectumdev
Copy link
Author

Sure - I have already sent you a database by email.
And as I was saying, if my generator code is useful for you in any way, go ahead and use it any way you want.

@fnc12
Copy link
Owner

fnc12 commented Aug 24, 2022

I tried to develop a parser of SQL by myself to make a code generator like you want. By I faced an issue that it is not a trivial task. E.g. does your generator support DEFAULT and CHECK constraints with random expression?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants