
Feature Request: Access to DB connection inside migration #119

Open
sbrocket opened this issue Jul 13, 2020 · 19 comments
Labels
enhancement New feature or request

Comments

@sbrocket

I've been searching around for some way to manage my database migrations in Rust. I'm considering using Diesel, but Diesel only supports defining migrations in SQL. I anticipate wanting to make migrations which would be difficult or impractical to represent in SQL alone - modifying blobs, making complex format changes, etc. - so that is a non-starter for me.

Refinery's README says that Refinery's design is based on Flyway and it supports defining migrations in Rust code instead of SQL, which looked promising. Flyway's document for Java-based migrations says explicitly that this kind of use case is what they're targeted at:

Java-based migrations are a great fit for all changes that can not easily be expressed using SQL.

These would typically be things like

  • BLOB & CLOB changes
  • Advanced bulk data changes (Recalculations, advanced format changes, …)

However, it doesn't seem Refinery's "migrations in a Rust module" support gets me what I need. For example, see Flyway's getting started tutorial for Java-based migrations: https://flywaydb.org/getstarted/java. Implementing this tutorial in the same way does not seem to be possible in Refinery, as the migration would need access to the database connection. (The specific modification in the example is trivial to implement in SQL, but imagine calculating the updated name was an operation that was impractical in SQL.)

Are there any plans to support more complex, manually coded migrations in Refinery?

@jxs
Member

jxs commented Jul 14, 2020

Hi, thanks for your interest. Yeah, the comparison with Flyway is more in the sense of not having undo migrations.
There are currently no plans to support that feature, but it could be a nice inclusion!
Do you want to try and implement it? I can offer guidance, though it does not seem trivial to implement. Off the top of my head, these need to be done:

  • Update the migration function signature to return a new structure (could it be Migration?) that has a method fn with_connection<F: FnOnce(U), U: Query<T>, T> which accepts a closure that receives the connection (breaking change?)
  • Check if existing structures and traits can be re-used to accommodate this feature
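The first bullet could be sketched roughly as follows. This is a hypothetical mock-up, not refinery's actual API: `Connection`, `Migration`, and `apply` are stand-ins invented for illustration, and the closure-carrying variant is the proposed addition:

```rust
// Stand-in for a real database connection.
struct Connection;

impl Connection {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        println!("executing: {sql}");
        Ok(())
    }
}

// What a Rust migration could return instead of a plain String:
// either SQL for the runner to execute, or a closure that gets
// direct access to the connection.
enum Migration {
    Sql(String),
    WithConnection(Box<dyn FnOnce(&mut Connection) -> Result<(), String>>),
}

impl Migration {
    fn with_connection<F>(f: F) -> Self
    where
        F: FnOnce(&mut Connection) -> Result<(), String> + 'static,
    {
        Migration::WithConnection(Box::new(f))
    }
}

// How the runner could apply either kind of migration.
fn apply(migration: Migration, conn: &mut Connection) -> Result<(), String> {
    match migration {
        Migration::Sql(sql) => conn.execute(&sql),
        Migration::WithConnection(f) => f(conn),
    }
}

fn main() -> Result<(), String> {
    let mut conn = Connection;
    // SQL-only migration, as today:
    apply(Migration::Sql("CREATE TABLE t (id INTEGER)".into()), &mut conn)?;
    // Code migration that works through the connection, e.g. to read
    // blobs, transform them in Rust, and write the results back:
    apply(
        Migration::with_connection(|conn| conn.execute("UPDATE t SET id = id + 1")),
        &mut conn,
    )
}
```

The enum keeps plain-SQL migrations unchanged while giving code migrations the connection access discussed above.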

@jxs jxs added the enhancement New feature or request label Jul 14, 2020
@dimitri-lenkov

dimitri-lenkov commented May 12, 2021

UPDATE: DO NOT USE THIS METHOD! IT IS BUGGY

See: #119 (comment)


I'm kind of new to Rust and this use-case for the library. Sharing in hopes that this helps someone out there.

My current workaround is to open another connection to the database within the migration itself. The trick is to check yourself that the migration hasn't already run, because refinery always runs the migrate() -> String functions.

I did this by checking that the version number doesn't already exist in the database (in the should_run function below). The migration code below is fairly specific to my code-base, but it should be enough to get the general idea.

I've sprinkled in some comments that I hope guide the reader. The code is crude (lol, unwrap city), but at the end of the day it's just a migration, right?

I'm also well aware that the SQL migration below could be run without resorting to this hack, but it's easier to grok than my actual use-case of migrating a bunch of blob/binary data.

Without further ado:

use async_std::task;
use std::path::Path;

/// Should we run this migration file?
///
/// Checks that this migration file's version number does not yet
/// exist in the `refinery_schema_history` table.
///
/// While this is specific to sqlx / postgres, you can easily
/// modify this to support your database driver.
///
async fn should_run(pool: &sqlx::Pool<sqlx::Postgres>) -> bool {
    // - - -
    // hack to pull out this file's version with the file! macro.
    // @todo: move *this* `should_run` function into its own macro.
    let file_name = Path::new(std::file!())
        .file_name()
        .unwrap()
        .to_str()
        .unwrap();

    let version = file_name
        .split("__")
        .next()
        .unwrap()
        .trim_start_matches("V");

    let version = version.parse::<i32>().unwrap();

    let sql = "SELECT version FROM refinery_schema_history WHERE version = $1";

    // This query will error if refinery has never run, e.g.
    //
    // --> 'relation "refinery_schema_history" does not exist'
    //
    match sqlx::query(sql).bind(version).fetch_optional(pool).await {
        Ok(row) => row.is_none(),
        Err(_e) => true, // @todo: maybe check correct error type? eh.
    }
}

/// Refinery migration --- this is called EVERY TIME refinery is invoked.
pub fn migration() -> String {
    // The SQL that we wish to run, however, it is NOT going to
    // be handled by refinery's database connection...
    let sql = "CREATE TABLE message (
        -- an auto-incrementing id for this message. (rust type: i32)
        id INTEGER GENERATED BY DEFAULT AS IDENTITY,

        -- What public_key is this session for?
        text TEXT,

        -- When this message was created
        created_at TIMESTAMPTZ NOT NULL,

        PRIMARY KEY(id)
    )";

    // Instead, it is handled by OUR APPLICATION's database connection
    // logic (this will be specific to YOUR application & how you acquire
    // database connections and what-not)
    task::block_on(async {
        // How my app gets a sqlx postgresql connection...
        let info = crate::postgres::ConnectionInfo::from_env().unwrap();
        let pool = info.connect().await.unwrap();

        // Only run the SQL if this migration hasn't been recorded yet.
        if should_run(&pool).await {
            sqlx::query(sql).execute(&pool).await.unwrap();
        }
    });

    // Because refinery isn't handling this SQL logic, we can return
    // an empty string to keep the compiler happy. Refinery accepts
    // this value and records it in the `refinery_schema_history` table.
    "".to_string()
}

glhf.


EDIT: Here is a macro you can shove in the root of your crate (or wherever you'd like) that replaces the should_run function and can be used like this (in the above example):

        if crate::should_run!(&pool).await {
            // . . .
        }

#[macro_export]
macro_rules! should_run {
    ($pool:expr) => {
        async {
            let file_name = std::path::Path::new(std::file!())
                .file_name()
                .unwrap()
                .to_str()
                .unwrap();

            let version = file_name
                .split("__")
                .next()
                .unwrap()
                .trim_start_matches("V");

            let version = version.parse::<i32>().unwrap();

            let sql = "SELECT version FROM refinery_schema_history WHERE version = $1";

            // This query will error if refinery has never run, e.g.
            //
            // --> 'relation "refinery_schema_history" does not exist'
            //
            match sqlx::query(sql).bind(version).fetch_optional($pool).await {
                Ok(row) => row.is_none(),
                Err(_e) => true, // @todo: maybe check correct error type? eh.
            }
        }
    };
}

This will save you from copy/pasting that logic into every migration where you have to get dirty.

@jxs
Member

jxs commented May 12, 2021

Hi, thanks for your example!
One other possible solution would be implementing Iterator for Runner's migrations, which could iterate with (Connection, Migration)
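A rough sketch of that iterator idea, using stand-in types invented for illustration (none of these are refinery's real structs): the iterator holds the connection and yields each migration as it is applied, so caller code can run between steps.

```rust
// Stand-in for a real database connection.
struct Connection;

struct Migration {
    version: u32,
    sql: &'static str,
}

// Holds the connection together with the remaining migrations.
struct MigrationIter<'a> {
    conn: &'a mut Connection,
    pending: std::vec::IntoIter<Migration>,
}

impl<'a> MigrationIter<'a> {
    // Applies the next migration and returns it, or None when done.
    fn next_applied(&mut self) -> Option<Migration> {
        let m = self.pending.next()?;
        // Here the real runner would execute m.sql through self.conn.
        let _ = &self.conn;
        Some(m)
    }
}

fn main() {
    let mut conn = Connection;
    let mut iter = MigrationIter {
        conn: &mut conn,
        pending: vec![
            Migration { version: 1, sql: "CREATE TABLE t (id INTEGER)" },
            Migration { version: 2, sql: "ALTER TABLE t ADD COLUMN x TEXT" },
        ]
        .into_iter(),
    };
    while let Some(m) = iter.next_applied() {
        // Custom Rust logic can run here, between applied migrations.
        println!("applied V{}: {}", m.version, m.sql);
    }
}
```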

@dimitri-lenkov

dimitri-lenkov commented May 13, 2021

Hi @jxs (and the internet!) --- The solution I came up with is buggy and doesn't work when starting from a fresh database (e.g. in test cases).

I thought it worked fine because I was migrating an existing database, but it does NOT work as you'd expect on a fresh one...

It seems like my second-connection hack eagerly evaluates the SQL before refinery actually runs the migrations that come before it.

To further illustrate this, imagine that you're starting on a fresh database and have these two migrations:

  • V1__init.rs (Normal: Returns a SQL String that refinery expects)
  • V2__complex_blob_migration.rs (The hack I wrote that opens a second connection)

It seems like V2__complex_blob_migration.rs (the stuff I wrote out above with the second connection) runs before V1.

I'm coming to this conclusion because my V1 script creates tables, and the "complex migration" V2 script manipulates those tables, yet I get this error:

`relation "table_name" does not exist`

Which leads me to believe that something interesting is going on and I shouldn't take this approach...


Hi, thanks for your example!
One other possible solution would be implementing iterator for
Runner's migrations which could iterate with (Connection, Migration

I'll spend tomorrow on this and ping here with how far I get. Thank you so much for the guidance!

. . .

EDIT: I think I understand why my approach is buggy: it's running the migrations during this library's 'gather phase', i.e.

/// Get the gathered migrations.
pub fn get_migrations(&self) -> &Vec<Migration> {
    &self.migrations
}

https://docs.rs/refinery-core/0.5.1/src/refinery_core/runner.rs.html#262-264

when what I want is for them to be run in the 'run phase' --- 😄

@dimitri-lenkov

@jxs just to be clear, you're suggesting that I implement Iterator for the refinery struct Runner?

@jxs
Member

jxs commented May 13, 2021

I was considering it as a possibility to address this issue, comparing it with the other suggestion #119 (comment) to provide a callback to Runner. I think it might be a cleaner API for the end user, but it is definitely harder to implement: I would need to change the way migrations currently run, and migrate_async would need to be a Stream instead.

@dimitri-lenkov

Oh ok. This may be out of reach for me. Thank you for your time.

@jxs
Member

jxs commented May 14, 2021

Would you like to try to tackle this as a provided callback function to Runner instead?

@dimitri-lenkov

dimitri-lenkov commented May 15, 2021

Hey @jxs --- I found a library called schemamama that has a PostgreSQL driver. It requires a little more code to set up, but it does what I need at the moment.

I think I'm really looking for an "abstract migration" tool similar to node-migrate, not anything tied to SQL-specific use cases, i.e. something that keeps track of functions that have run, with or without database modifications.

Forking schemamama (or implementing my own trait?) will be easier for me to accomplish, as it's a much smaller codebase (a single lib.rs file under 300 LOC).

It seems like the majority of the migration tools on crates.io are SQL-specific. Now I'm thinking that refinery shouldn't worry about this use case, stay purely SQL, and defer this sort of "data migration" to another tool.

🤔
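For what it's worth, the "abstract migration" idea described above could look something like this minimal sketch. The trait and registry names are made up for illustration; this is not schemamama's or node-migrate's actual API:

```rust
use std::collections::BTreeSet;

// A migration is just a versioned function; no SQL is implied.
trait AbstractMigration {
    fn version(&self) -> i64;
    fn up(&mut self) -> Result<(), String>;
}

// Tracks which versions have run. In a real tool, `applied` would
// live in durable storage (a table, a file, etc.), not in memory.
struct Registry {
    applied: BTreeSet<i64>,
}

impl Registry {
    // Runs the migration if its version hasn't been applied yet.
    // Returns Ok(true) if it ran, Ok(false) if it was skipped.
    fn run(&mut self, m: &mut dyn AbstractMigration) -> Result<bool, String> {
        if self.applied.contains(&m.version()) {
            return Ok(false);
        }
        m.up()?;
        self.applied.insert(m.version());
        Ok(true)
    }
}

// Example migration doing arbitrary Rust work (transform blobs,
// touch files on disk, etc.) with no SQL involved.
struct RecalcBlobs;
impl AbstractMigration for RecalcBlobs {
    fn version(&self) -> i64 { 2 }
    fn up(&mut self) -> Result<(), String> { Ok(()) }
}

fn main() {
    let mut reg = Registry { applied: BTreeSet::new() };
    let mut m = RecalcBlobs;
    assert!(reg.run(&mut m).unwrap());  // runs the first time
    assert!(!reg.run(&mut m).unwrap()); // skipped the second time
}
```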

@nuttycom

nuttycom commented Aug 1, 2022

I'd like to chime in on this issue by asking: how do barrel migrations access the database connection that is used by refinery? It's not clear to me from the examples what's going on here. Ideally I'd also like to be able to access the database connection directly, to perform transformations on data in Rust that require querying the database first and then performing computations that need other external context, before writing the result.

Related, is there any way (apart from global variables) to pass additional context data (such as cryptographic keys that are never stored in the database, but which are used to construct public keys that are stored) to a migration?

@jxs
Member

jxs commented Aug 2, 2022

Hi!

I'd like to chime in on this issue by asking: how do barrel migrations access the database connection that is used by refinery?

They don't; barrel migrations are converted to a String, which is what refinery expects.

It's not clear to me from the examples what's going on here; ideally I'd also like to be able to access the database connection directly (to be able to perform transformations on data in Rust that require querying the database first and then performing computations that require other external context, before writing the result.)

Related, is there any way (apart from global variables) to pass additional context data (such as cryptographic keys that are never stored in the database, but which are used to construct public keys that are stored) to a migration?

can you give a use case?

@nuttycom

nuttycom commented Aug 13, 2022

Related, is there any way (apart from global variables) to pass additional context data (such as cryptographic keys that are never stored in the database, but which are used to construct public keys that are stored) to a migration?

can you give a use case?

I've actually opted to go with schemer instead because it supports my uses and also allows migrations to form a DAG, but here's an example of the use case:
zcash/librustzcash@2f71dee#diff-54ec5e42408ac28d55641fbe89818c7a8ea21c666b6e4b79b059776661c4bce9R173-R282

This performs a couple operations in Rust in the process of the migration:

  • Updates the derivation of cryptographic keys, using checks against the existing data to ensure that they're derived from the same private key material
  • Since a few different versions of the previous database structure existed "in the wild" before migrations were added to the stack, introspects the existing schema and uses the results to choose how to execute part of the migration.

@superstator
Contributor

Is this still something you'd be interested in a PR for?

Here's my use case: my system uses sqlite to catalog a large number of binary files on disk, indexing some basic attributes like time or content type. During a migration I may want to update that sqlite database and simultaneously update or transform those files in some way. I want to be able to do this stepwise with migrations, and ideally let those updates raise errors that could abort the migration process.

I've been playing around with it a little in a fork, and it seems like the simplest approach is to allow for adding pre/post migration hooks (closures) to run with each migration. The closure would have access to a reference to the current migration, and could include whatever other context it wanted, including a database connection. The only obvious downside to me is that the closure wouldn't have access to the actual migration transaction, but at least for us that's not a showstopper.

let mut connection = Connection::open(&db_path).unwrap();
let mut runner = crate::embedded::migrations::runner();

let hook_connection = Connection::open(&db_path).unwrap();
runner.add_post_migration_hook(move |migration| {
    // the hook's own connection, independent of the migration transaction
    hook_connection
        .execute("select foo from bar", rusqlite::params![])
        .map_err(|_| anyhow::anyhow!("oh noe"))?;
    warn!("we just ran this migration: {migration:?}");
    Ok(())
});

runner.run(&mut connection)?;

@jxs
Member

jxs commented Nov 7, 2023

Hi, yeah, I am definitely interested if this is something that gives users more control. Curious: why do you prefer closures over something like a .next() function (we could even impl Iterator) that returns an applied migration?

@superstator
Contributor

I could go either way. The advantage of closures, I think, is just the scope of the changes; it's relatively easy to supply some Fns and run them at the appropriate moment in the migration process, especially if we're not worried about blocking during async migrations. An iterator could be more flexible, though it makes a pre-migration hook a little trickier. But that might be a YAGNI thing anyway.

Another thought I had as I played with this is that it would be nice if there were an enum generated along with the migrations, so that in a hook/enumeration block I could match against that rather than a string name:

for migration in runner {
    match migration.variant {
        Migrations::V9__cleanup => do_cleanup(),
        _ => () // do nothing
    }
}

@superstator
Contributor

Actually, I think the iterator approach isn't so much harder after all. But in going through the various migrate methods I've noticed that the migrate_grouped implementations in sync.rs and async.rs include a check for Target::Fake or Target::FakeVersion(_):

if let Target::Version(input_target) | Target::FakeVersion(input_target) = target {

If supplied, the migrations table is updated but the actual migration query isn't executed. The regular un-batched migrate implementations don't honor that flag and only look for Target::Version(_):

if let Target::Version(input_target) = target {

Is that intentional? It feels like a bug, but I thought I'd check before I just fixed it unilaterally.
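The unification being suggested could look like this, with `Target` as a stand-in enum mirroring the shape of the quoted snippets (not refinery's actual definition):

```rust
// Stand-in mirroring the variants visible in the quoted checks.
enum Target {
    Latest,
    Version(u32),
    Fake,
    FakeVersion(u32),
}

// Returns the version bound, if any, honoring both real and fake
// version targets — the behavior migrate_grouped already has, and
// which the un-batched path arguably should share.
fn version_bound(target: &Target) -> Option<u32> {
    match target {
        Target::Version(v) | Target::FakeVersion(v) => Some(*v),
        Target::Latest | Target::Fake => None,
    }
}

fn main() {
    assert_eq!(version_bound(&Target::FakeVersion(3)), Some(3));
    assert_eq!(version_bound(&Target::Version(2)), Some(2));
    assert_eq!(version_bound(&Target::Latest), None);
}
```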

@superstator
Contributor

I have iteration working, but: the iterator needs to hold onto the database connection. So either you need a non-standard iter() method like runner.iter(&mut conn), or you need to supply the database connection to the runner when it's instantiated, not when it's run, like crate::embedded::migrations::runner(&mut conn). One is ugly and non-idiomatic; the other is a breaking change. I'd vote for the breaking change since it seems relatively minor, but what would you prefer?

@jxs
Member

jxs commented Nov 23, 2023

Hi, and sorry for the delay! The problem with the first is that it's not only breaking, it also breaks the current way the macros work, as they don't have access to the connection when they instantiate the Runner.
Curious, why do you say having a Runner::iter(self, connection) that consumes self (or not) and returns a new struct that impls Iterator is non-idiomatic and ugly?

@superstator
Contributor

iter methods by convention are implementations of IntoIterator, which does not allow for any extra parameters, just self. Especially if you want to be able to do something like

for migration in runner {
    ...
}

you need the Runner to have a mutable reference to the db connection before you start iterating.
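A minimal illustration of that constraint, with stand-in types (not refinery's): `for m in runner` desugars to `IntoIterator::into_iter(runner)`, which takes only `self`, so the connection must already live inside the `Runner`:

```rust
// Stand-in for a real database connection.
struct Connection;
struct Migration(u32);

// The connection is captured at construction time, because
// IntoIterator::into_iter takes no extra arguments.
struct Runner<'a> {
    conn: &'a mut Connection,
    versions: Vec<u32>,
}

impl<'a> IntoIterator for Runner<'a> {
    type Item = Migration;
    type IntoIter = RunnerIter<'a>;
    fn into_iter(self) -> RunnerIter<'a> {
        RunnerIter {
            conn: self.conn,
            versions: self.versions.into_iter(),
        }
    }
}

struct RunnerIter<'a> {
    conn: &'a mut Connection,
    versions: std::vec::IntoIter<u32>,
}

impl<'a> Iterator for RunnerIter<'a> {
    type Item = Migration;
    fn next(&mut self) -> Option<Migration> {
        let v = self.versions.next()?;
        // Real code would apply migration v through self.conn here.
        let _ = &self.conn;
        Some(Migration(v))
    }
}

fn main() {
    let mut conn = Connection;
    let runner = Runner { conn: &mut conn, versions: vec![1, 2, 3] };
    // `for migration in runner { ... }` now works, because the
    // connection was supplied before iteration began.
    let applied: Vec<u32> = runner.into_iter().map(|m| m.0).collect();
    assert_eq!(applied, vec![1, 2, 3]);
}
```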

None of which is a big deal, and I have this all working in my branch. Certainly not the only way to do it, but it felt the most natural to me. Maybe it will be clearer if I just send you the PR and we can discuss there.
