
Feature Request: Access to DB connection inside migration #119

Open
sbrocket opened this issue Jul 13, 2020 · 19 comments
Labels
enhancement New feature or request

Comments

@sbrocket

I've been searching around for some way to manage my database migrations in Rust. I'm considering using Diesel, but Diesel only supports defining migrations in SQL. I anticipate wanting to make migrations which would be difficult or impractical to represent in SQL alone - modifying blobs, making complex format changes, etc. - so that is a non-starter for me.

Refinery's README says that Refinery's design is based on Flyway and it supports defining migrations in Rust code instead of SQL, which looked promising. Flyway's document for Java-based migrations says explicitly that this kind of use case is what they're targeted at:

Java-based migrations are a great fit for all changes that can not easily be expressed using SQL.

These would typically be things like

  • BLOB & CLOB changes
  • Advanced bulk data changes (Recalculations, advanced format changes, …)

However, it doesn't seem Refinery's "migrations in a Rust module" support gets me what I need. For example, see Flyway's getting started tutorial for Java-based migrations: https://flywaydb.org/getstarted/java. Implementing this tutorial in the same way does not seem to be possible in Refinery, as the migration would need access to the database connection. (The specific modification in the example is trivial to implement in SQL, but imagine calculating the updated name was an operation that was impractical in SQL.)

Are there any plans to support more complex, manually coded migrations in Refinery?

@jxs
Member

jxs commented Jul 14, 2020

Hi, thanks for your interest. Yeah, the comparison with Flyway is more in the sense of not having undo migrations.
There are currently no plans to support that feature, but it could be a nice inclusion!
Do you want to try and implement it? I can offer guidance, though it does not seem trivial to implement. Off the top of my head, these need to be done:

  • Update the migration function signature to return a new structure (could it be Migration?) that has a method fn with_connection<F: FnOnce(U), U: Query<T>, T> which accepts a closure that receives the connection (breaking change?)
  • Check if existing structures and traits can be re-used to accommodate this feature
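The first bullet could be sketched roughly as follows. This is a hypothetical mock-up, not refinery's actual API: `Connection`, `Migration`, and `apply` are stand-ins invented for illustration, and the closure-carrying variant is the proposed addition:

```rust
// Stand-in for a real database connection.
struct Connection;

impl Connection {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        println!("executing: {sql}");
        Ok(())
    }
}

// What a Rust migration could return instead of a plain String:
// either SQL for the runner to execute, or a closure that gets
// direct access to the connection.
enum Migration {
    Sql(String),
    WithConnection(Box<dyn FnOnce(&mut Connection) -> Result<(), String>>),
}

impl Migration {
    fn with_connection<F>(f: F) -> Self
    where
        F: FnOnce(&mut Connection) -> Result<(), String> + 'static,
    {
        Migration::WithConnection(Box::new(f))
    }
}

// How the runner could apply either kind of migration.
fn apply(migration: Migration, conn: &mut Connection) -> Result<(), String> {
    match migration {
        Migration::Sql(sql) => conn.execute(&sql),
        Migration::WithConnection(f) => f(conn),
    }
}

fn main() -> Result<(), String> {
    let mut conn = Connection;
    // SQL-only migration, as today:
    apply(Migration::Sql("CREATE TABLE t (id INTEGER)".into()), &mut conn)?;
    // Code migration that works through the connection, e.g. to read
    // blobs, transform them in Rust, and write the results back:
    apply(
        Migration::with_connection(|conn| conn.execute("UPDATE t SET id = id + 1")),
        &mut conn,
    )
}
```

The enum keeps plain-SQL migrations unchanged while giving code migrations the connection access discussed above.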

@jxs jxs added the enhancement New feature or request label Jul 14, 2020
@dimitri-lenkov

dimitri-lenkov commented May 12, 2021

UPDATE: DO NOT USE THIS METHOD! IT IS BUGGY

See: #119 (comment)


I'm kind of new to Rust and this use-case for the library. Sharing in hopes that this helps someone out there.

My current workaround is to open another connection to the database within the migration itself. The trick is to check yourself that the migration hasn't already run, because refinery always runs the migrate() -> String functions.

I did this by checking that the version number doesn't already exist in the database (in the should_run function below). The migration code below is fairly specific to my code-base, but it should be enough to get the general idea.

I've sprinkled in some comments that I hope guide the reader. The code is crude (lol, unwrap city), but at the end of the day it's just a migration, right?

I'm also well aware that the SQL migration below could be run without resorting to this hack, but it's easier to grok than my actual use-case of migrating a bunch of blob/binary data.

Without further ado:

use async_std::task;
use std::path::Path;

/// Should we run this migration file?
///
/// Checks that this migration file's version number does not yet
/// exist in the `refinery_schema_history` table.
///
/// While this is specific to sqlx / postgres, you can easily
/// modify this to support your database driver.
///
async fn should_run(pool: &sqlx::Pool<sqlx::Postgres>) -> bool {
    // - - -
    // hack to pull out this file's version with the file! macro.
    // @todo: move *this* `should_run` function into its own macro.
    let file_name = Path::new(std::file!())
        .file_name()
        .unwrap()
        .to_str()
        .unwrap();

    let version = file_name
        .split("__")
        .next()
        .unwrap()
        .trim_start_matches("V");

    let version = version.parse::<i32>().unwrap();

    let sql = "SELECT version FROM refinery_schema_history WHERE version = $1";

    // This query will error if refinery has never run, e.g.
    //
    // --> 'relation "refinery_schema_history" does not exist'
    //
    match sqlx::query(sql).bind(version).fetch_optional(pool).await {
        Ok(row) => row.is_none(),
        Err(_e) => true, // @todo: maybe check correct error type? eh.
    }
}

/// Refinery migration --- this is called EVERY TIME refinery is invoked.
pub fn migration() -> String {
    // The SQL that we wish to run, however, it is NOT going to
    // be handled by refinery's database connection...
    let sql = "CREATE TABLE message (
        -- an auto-incrementing id for this message. (rust type: i32)
        id INTEGER GENERATED BY DEFAULT AS IDENTITY,

        -- What public_key is this session for?
        text TEXT,

        -- When this message was created
        created_at TIMESTAMPTZ NOT NULL,

        PRIMARY KEY(id)
    )";

    // Instead, it is handled by OUR APPLICATION's database connection
    // logic (this will be specific to YOUR application & how you acquire
    // database connections and what-not)
    task::block_on(async {
        // How my app gets a sqlx postgresql connection...
        let info = crate::postgres::ConnectionInfo::from_env().unwrap();
        let pool = info.connect().await.unwrap();

        // Only run the SQL if this migration hasn't been recorded yet.
        if should_run(&pool).await {
            sqlx::query(sql).execute(&pool).await.unwrap();
        }
    });

    // Because refinery isn't handling this SQL logic, we can return
    // an empty string to keep the compiler happy. Refinery accepts
    // this value and records it in the `refinery_schema_history` table.
    "".to_string()
}

glhf.


EDIT: Here is a macro you can shove in the root of your crate (or wherever you'd like) that replaces the should_run function and can be used like this (in the above example):

        if crate::should_run!(&pool).await {
            // . . .
        }

#[macro_export]
macro_rules! should_run {
    ($pool:expr) => {
        async {
            let file_name = std::path::Path::new(std::file!())
                .file_name()
                .unwrap()
                .to_str()
                .unwrap();

            let version = file_name
                .split("__")
                .next()
                .unwrap()
                .trim_start_matches("V");

            let version = version.parse::<i32>().unwrap();

            let sql = "SELECT version FROM refinery_schema_history WHERE version = $1";

            // This query will error if refinery has never run, e.g.
            //
            // --> 'relation "refinery_schema_history" does not exist'
            //
            match sqlx::query(sql).bind(version).fetch_optional($pool).await {
                Ok(row) => row.is_none(),
                Err(_e) => true, // @todo: maybe check correct error type? eh.
            }
        }
    };
}

This will save you from copy/pasting that logic into every migration where you have to get dirty.

@jxs
Member

jxs commented May 12, 2021

Hi, thanks for your example!
One other possible solution would be implementing Iterator for Runner's migrations, which could iterate with (Connection, Migration)
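A rough sketch of that iterator idea, using stand-in types invented for illustration (none of these are refinery's real structs): the iterator holds the connection and yields each migration as it is applied, so caller code can run between steps.

```rust
// Stand-in for a real database connection.
struct Connection;

struct Migration {
    version: u32,
    sql: &'static str,
}

// Holds the connection together with the remaining migrations.
struct MigrationIter<'a> {
    conn: &'a mut Connection,
    pending: std::vec::IntoIter<Migration>,
}

impl<'a> MigrationIter<'a> {
    // Applies the next migration and returns it, or None when done.
    fn next_applied(&mut self) -> Option<Migration> {
        let m = self.pending.next()?;
        // Here the real runner would execute m.sql through self.conn.
        let _ = &self.conn;
        Some(m)
    }
}

fn main() {
    let mut conn = Connection;
    let mut iter = MigrationIter {
        conn: &mut conn,
        pending: vec![
            Migration { version: 1, sql: "CREATE TABLE t (id INTEGER)" },
            Migration { version: 2, sql: "ALTER TABLE t ADD COLUMN x TEXT" },
        ]
        .into_iter(),
    };
    while let Some(m) = iter.next_applied() {
        // Custom Rust logic can run here, between applied migrations.
        println!("applied V{}: {}", m.version, m.sql);
    }
}
```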

@dimitri-lenkov

dimitri-lenkov commented May 13, 2021

Hi @jxs (and the internet!) --- The solution I came up with is buggy and doesn't work when starting from a fresh database (e.g. in test cases).

I thought it worked fine because I was migrating an existing database, but it does NOT work as you'd expect on a fresh one...

It seems like my second-connection hack eagerly evaluates the SQL before refinery actually runs the migrations that come before it.

To further illustrate this, imagine that you're starting on a fresh database and have these two migrations:

  • V1__init.rs (Normal: Returns a SQL String that refinery expects)
  • V2__complex_blob_migration.rs (The hack I wrote that opens a second connection)

It seems like V2__complex_blob_migration.rs (the stuff I wrote out above with the second connection) runs before V1.

I'm coming to this conclusion because my V1 script creates tables, and the "complex migration" V2 script manipulates those tables, yet I get this error:

`relation "table_name" does not exist`

Which leads me to believe that something interesting is going on and I shouldn't take this approach...


Hi, thanks for your example!
One other possible solution would be implementing iterator for
Runner's migrations which could iterate with (Connection, Migration

I'll spend tomorrow on this and ping here with how far I get. Thank you so much for the guidance!

. . .

EDIT: I think I understand why my approach is buggy: it's running the migrations during this library's 'gather phase', i.e.

/// Get the gathered migrations.
pub fn get_migrations(&self) -> &Vec<Migration> {
    &self.migrations
}

https://docs.rs/refinery-core/0.5.1/src/refinery_core/runner.rs.html#262-264

when what I want is for them to be run in the 'run phase' --- 😄

@dimitri-lenkov

@jxs just to be clear, you're suggesting that I implement Iterator for the refinery struct Runner?

@jxs
Member

jxs commented May 13, 2021

I was considering it as a possibility to address this issue, comparing it with the other suggestion #119 (comment) to provide a callback to Runner. I think it might be a cleaner API for the end user, but it is definitely harder to implement: I would need to change the way migrations currently run, and migrate_async would need to be a Stream instead.

@dimitri-lenkov

Oh ok. This may be out of reach for me. Thank you for your time.

@jxs
Member

jxs commented May 14, 2021

Would you like to try to tackle this as a provided callback function to Runner instead?

@dimitri-lenkov

dimitri-lenkov commented May 15, 2021

Hey @jxs --- I found a library called schemamama that has a PostgreSQL driver. It requires a little more code to set up, but it does what I need at the moment.

I think I'm really looking for an "abstract migration" tool similar to node-migrate, not anything tied to SQL-specific use cases, i.e. something that keeps track of functions that have run, with or without database modifications.

Forking schemamama (or implementing my own trait?) will be easier for me to accomplish, as it's a much smaller codebase (a single lib.rs file under 300 LOC).

It seems like the majority of the migration tools on crates.io are SQL-specific. Now I'm thinking that refinery shouldn't worry about this use case, stay purely SQL, and defer this sort of "data migration" to another tool.

🤔
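For what it's worth, the "abstract migration" idea described above could look something like this minimal sketch. The trait and registry names are made up for illustration; this is not schemamama's or node-migrate's actual API:

```rust
use std::collections::BTreeSet;

// A migration is just a versioned function; no SQL is implied.
trait AbstractMigration {
    fn version(&self) -> i64;
    fn up(&mut self) -> Result<(), String>;
}

// Tracks which versions have run. In a real tool, `applied` would
// live in durable storage (a table, a file, etc.), not in memory.
struct Registry {
    applied: BTreeSet<i64>,
}

impl Registry {
    // Runs the migration if its version hasn't been applied yet.
    // Returns Ok(true) if it ran, Ok(false) if it was skipped.
    fn run(&mut self, m: &mut dyn AbstractMigration) -> Result<bool, String> {
        if self.applied.contains(&m.version()) {
            return Ok(false);
        }
        m.up()?;
        self.applied.insert(m.version());
        Ok(true)
    }
}

// Example migration doing arbitrary Rust work (transform blobs,
// touch files on disk, etc.) with no SQL involved.
struct RecalcBlobs;
impl AbstractMigration for RecalcBlobs {
    fn version(&self) -> i64 { 2 }
    fn up(&mut self) -> Result<(), String> { Ok(()) }
}

fn main() {
    let mut reg = Registry { applied: BTreeSet::new() };
    let mut m = RecalcBlobs;
    assert!(reg.run(&mut m).unwrap());  // runs the first time
    assert!(!reg.run(&mut m).unwrap()); // skipped the second time
}
```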

@nuttycom

nuttycom commented Aug 1, 2022

I'd like to chime in on this issue by asking: how do barrel migrations access the database connection that is used by refinery? It's not clear to me from the examples what's going on here. Ideally I'd also like to be able to access the database connection directly, to perform transformations on data in Rust that require querying the database first and then performing computations that need other external context, before writing the result.

Related, is there any way (apart from global variables) to pass additional context data (such as cryptographic keys that are never stored in the database, but which are used to construct public keys that are stored) to a migration?

@jxs
Member

jxs commented Aug 2, 2022

Hi!

I'd like to chime in on this issue by asking: how do barrel migrations access the database connection that is used by refinery?

They don't; barrel migrations are converted to a String, which is what refinery expects.

It's not clear to me from the examples what's going on here; ideally I'd also like to be able to access the database connection directly (to be able to perform transformations on data in Rust that require querying the database first and then performing computations that require other external context, before writing the result.)

Related, is there any way (apart from global variables) to pass additional context data (such as cryptographic keys that are never stored in the database, but which are used to construct public keys that are stored) to a migration?

can you give a use case?

@nuttycom

nuttycom commented Aug 13, 2022

Related, is there any way (apart from global variables) to pass additional context data (such as cryptographic keys that are never stored in the database, but which are used to construct public keys that are stored) to a migration?

can you give a use case?

I've actually opted to go with schemer instead because it supports my uses and also allows migrations to form a DAG, but here's an example of the use case:
zcash/librustzcash@2f71dee#diff-54ec5e42408ac28d55641fbe89818c7a8ea21c666b6e4b79b059776661c4bce9R173-R282

This performs a couple operations in Rust in the process of the migration:

  • Updates the derivation of cryptographic keys, using checks against the existing data to ensure that they're derived from the same private key material
  • Since a few different versions of the previous database structure existed "in the wild" before migrations were added to the stack, introspects the existing schema and uses the results to choose how to execute part of the migration.

@superstator
Contributor

Is this still something you'd be interested in a PR for?

Here's my use case: my system uses sqlite to catalog a large number of binary files on disk, indexing some basic attributes like time or content type. During a migration I may want to update that sqlite database and simultaneously update or transform those files in some way. I want to be able to do this stepwise with migrations, and ideally let those updates raise errors that could abort the migration process.

I've been playing around with it a little in a fork, and it seems like the simplest approach is to allow for adding pre/post migration hooks (closures) to run with each migration. The closure would have access to a reference to the current migration, and could include whatever other context it wanted, including a database connection. The only obvious downside to me is that the closure wouldn't have access to the actual migration transaction, but at least for us that's not a showstopper.

let mut connection = Connection::open(&db_path).unwrap();
let mut runner = crate::embedded::migrations::runner();

let hook_connection = Connection::open(&db_path).unwrap();
runner.add_post_migration_hook(move |migration| {
    // the hook's own connection, independent of the migration transaction
    hook_connection
        .execute("select foo from bar", rusqlite::params![])
        .map_err(|_| anyhow::anyhow!("oh noe"))?;
    warn!("we just ran this migration: {migration:?}");
    Ok(())
});

runner.run(&mut connection)?;

@jxs
Member

jxs commented Nov 7, 2023

Hi, yeah, I am definitely interested if this is something that gives users more control. Curious: why do you prefer closures over something like a .next() function (we could even impl Iterator) that returns an applied migration?

@superstator
Contributor

I could go either way. The advantage of closures, I think, is just the scope of the changes; it's relatively easy to supply some Fns and run them at the appropriate moment in the migration process, especially if we're not worried about blocking during async migrations. An iterator could be more flexible, though it makes a pre-migration hook a little trickier. But that might be a YAGNI thing anyway.

Another thought I had as I played with this is that it would be nice if there were an enum generated along with the migrations, so that in a hook/enumeration block I could match against that rather than a string name:

for migration in runner {
    match migration.variant {
        Migrations::V9__cleanup => do_cleanup(),
        _ => () // do nothing
    }
}

@superstator
Contributor

Actually, I think the iterator approach isn't so much harder after all. But in going through the various migrate methods I've noticed that the migrate_grouped implementations in sync.rs and async.rs include a check for Target::Fake or Target::FakeVersion(_):

if let Target::Version(input_target) | Target::FakeVersion(input_target) = target {

If supplied, the migrations table is updated but the actual migration query isn't executed. The regular un-batched migrate implementations don't honor that flag and only look for Target::Version(_):

if let Target::Version(input_target) = target {

Is that intentional? It feels like a bug, but I thought I'd check before I just fixed it unilaterally.
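The unification being suggested could look like this, with `Target` as a stand-in enum mirroring the shape of the quoted snippets (not refinery's actual definition):

```rust
// Stand-in mirroring the variants visible in the quoted checks.
enum Target {
    Latest,
    Version(u32),
    Fake,
    FakeVersion(u32),
}

// Returns the version bound, if any, honoring both real and fake
// version targets — the behavior migrate_grouped already has, and
// which the un-batched path arguably should share.
fn version_bound(target: &Target) -> Option<u32> {
    match target {
        Target::Version(v) | Target::FakeVersion(v) => Some(*v),
        Target::Latest | Target::Fake => None,
    }
}

fn main() {
    assert_eq!(version_bound(&Target::FakeVersion(3)), Some(3));
    assert_eq!(version_bound(&Target::Version(2)), Some(2));
    assert_eq!(version_bound(&Target::Latest), None);
}
```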

@superstator
Contributor

I have iteration working, but: the iterator needs to hold onto the database connection. So either you need a non-standard iter() method like runner.iter(&mut conn), or you need to supply the database connection to the runner when it's instantiated, not when it's run, like crate::embedded::migrations::runner(&mut conn). One is ugly and non-idiomatic; the other is a breaking change. I'd vote for the breaking change since it seems relatively minor, but what would you prefer?

@jxs
Member

jxs commented Nov 23, 2023

Hi, and sorry for the delay! The problem with the first is that it's not only breaking, it also breaks the current way the macros work, as they don't have access to the connection when they instantiate the Runner.
Curious, why do you say having a Runner::iter(self, connection) that consumes self (or not) and returns a new struct that impls Iterator is non-idiomatic and ugly?

@superstator
Contributor

iter methods by convention are implementations of IntoIterator, which does not allow for any extra parameters, just self. Especially if you want to be able to do something like

for migration in runner {
    ...
}

you need the Runner to have a mutable reference to the db connection before you start iterating.
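A minimal illustration of that constraint, with stand-in types (not refinery's): `for m in runner` desugars to `IntoIterator::into_iter(runner)`, which takes only `self`, so the connection must already live inside the `Runner`:

```rust
// Stand-in for a real database connection.
struct Connection;
struct Migration(u32);

// The connection is captured at construction time, because
// IntoIterator::into_iter takes no extra arguments.
struct Runner<'a> {
    conn: &'a mut Connection,
    versions: Vec<u32>,
}

impl<'a> IntoIterator for Runner<'a> {
    type Item = Migration;
    type IntoIter = RunnerIter<'a>;
    fn into_iter(self) -> RunnerIter<'a> {
        RunnerIter {
            conn: self.conn,
            versions: self.versions.into_iter(),
        }
    }
}

struct RunnerIter<'a> {
    conn: &'a mut Connection,
    versions: std::vec::IntoIter<u32>,
}

impl<'a> Iterator for RunnerIter<'a> {
    type Item = Migration;
    fn next(&mut self) -> Option<Migration> {
        let v = self.versions.next()?;
        // Real code would apply migration v through self.conn here.
        let _ = &self.conn;
        Some(Migration(v))
    }
}

fn main() {
    let mut conn = Connection;
    let runner = Runner { conn: &mut conn, versions: vec![1, 2, 3] };
    // `for migration in runner { ... }` now works, because the
    // connection was supplied before iteration began.
    let applied: Vec<u32> = runner.into_iter().map(|m| m.0).collect();
    assert_eq!(applied, vec![1, 2, 3]);
}
```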

None of which is a big deal, and I have this all working in my branch. Certainly not the only way to do it, but it felt the most natural to me. Maybe it will be clearer if I just send you the PR and we can discuss there.
