Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Compile-time checked constraints #3166

Open
abonander opened this issue Apr 2, 2024 · 0 comments
Open

RFC: Compile-time checked constraints #3166

abonander opened this issue Apr 2, 2024 · 0 comments
Labels
enhancement New feature or request macros proposal

Comments

@abonander
Copy link
Collaborator

abonander commented Apr 2, 2024

This concept came about from a discussion between myself and @iamjpotts.

Motivation

Handling of constraint errors is a pretty typical task in any database application.

As a simple example, a platform with user registration will likely want to ensure that usernames are unique so they can unambiguously refer to one user. This is pretty easy to enforce in SQL with a UNIQUE constraint, which usually has the benefit of automatically creating an index over the column as well:

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    -- Of course, you'd likely want this to be case-insensitive, or just always fold to lowercase.
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
);

A naive implementation may perform a SELECT query before attempting to insert a user record, to check if the username is already taken:

let username_taken: bool = sqlx::query_scalar("SELECT EXISTS(SELECT 1 FROM users WHERE username ILIKE ?)")
    .bind(&username)
    .fetch_one(&pool)
    .await?;

if username_taken {
     return Err("username taken");
}

let user_id: Uuid = sqlx::query_scalar("INSERT INTO users (username, password_hash) VALUES (?, ?) RETURNING user_id")
    .bind(&username)
    .bind(&password_hash)
    .fetch_one(&pool)
    .await?;

However, this takes two round-trips to the database, increasing response latency. The astute reader may have also noticed that the above code has a time-of-check to time-of-use (TOCTOU) bug because it does not perform the two queries in a transaction; another call to this function may race this one and create a user record with the same username, and so the insert may fail with an error which is likely not going to be surfaced to the user in a useful manner (e.g. a 500 Internal Server Error instead of a 400 Bad Request) since it wasn't anticipated by the developer.

You could fix the bug by performing the check and the insert in a transaction, but you can also eliminate a round-trip by just attempting the insert and using the unique constraint (which would have fired a violation error anyway in the above situation) to detect the collision. SQLx supports and encourages this use-case; however, this is not currently very ergonomic:

let user_id: Uuid = sqlx::query_scalar("INSERT INTO users (username, password_hash) VALUES (?, ?) RETURNING user_id")
    .bind(&username)
    .bind(&password_hash)
    .fetch_one(&pool)
    .await
    .map_err(|e| match e {
        sqlx::Error::DatabaseError(dbe) if dbe.constraint() == Some("users_username_key") => {
            MyError::from("username taken")
        },
        other => other.into(),
    })?;

In real Launchbadge projects, we've used an extension trait to make this more ergonomic, similar to what's demonstrated in realworld-axum-sqlx: https://github.com/launchbadge/realworld-axum-sqlx/blob/main/src/http/error.rs#L176-L184

However, this still leads to matching on magic strings most of the time, and can lead to forgetting to cover newly added constraints.

Design

I propose to add what I'm calling a "checked derive", which takes an enum as input and checks at compile-time that it exhaustively represents the constraints for a given table:

#[derive(sqlx::ConstraintEnum)]
#[sqlx(table_name = "users")]
pub enum UsersConstraints {
    /// The `UNIQUE` constraint of the `username` column was violated.
    #[sqlx(constraint = "users_username_key")]
    UsernameUnique,
}

Like any derive the primary function would be to generate trait impls. This would generate a FromStr impl that converts the constraint name string to a given variant, as well as an impl of the ConstraintEnum trait for type safety:

pub trait ConstraintEnum: FromStr {
    /// Get the name of the constraint for the given enum variant.
    fn name(&self) -> &'static str;
}

This impl, would then interact with an extension trait and intermediate error type:

pub trait ResultExt<T> {
    /// If this error represents one of the constraints covered by the given `ConstraintEnum`, map it to another error.
    fn on_constraint<C: ConstraintEnum, E>(self, on_constraint: FnOnce(C) -> E) -> Result<T, MaybeConstraintError<E>>;
}

// Implements at least: `std::error::Error`, `Debug`
pub enum MaybeConstraintError<E> {
    ConstraintError(E),
    Other(sqlx::Error),
}

// I'm *hoping* coherence allows at least some form of this, but I'm not currently certain.
impl<E1, E2> From<MaybeConstraintError<E1>> for E2 where E1: Into<E2>, sqlx::Error: Into<E2> {
    // ...
}

impl<T> ResultExt for Result<T, sqlx::Error> {
    fn on_constraint<C: ConstraintEnum, E>(self, on_constraint: FnOnce(C) -> E) -> Result<T, MaybeConstraintError<E>> {
        // ..
    }
}

This would then allow the above INSERT to be modeled like so:

let user_id: Uuid = sqlx::query_scalar("INSERT INTO users (username, password_hash) VALUES (?, ?) RETURNING user_id")
    .bind(&username)
    .bind(&password_hash)
    .fetch_one(&pool)
    .await
    // The above `From` impl should allow the `?` here to work, assuming `MyError` implements `From<sqlx::Error>`
    .on_constraint(|c: UserConstraints| match c {
        UserConstraints::UsernameUnique => MyError::new(StatusCode::BAD_REQUEST, "username taken"),
    })?;

Compile-Time Checking

Similar to the query macros, if DATABASE_URL is set and offline-mode is not enabled, the derive can check the constraints on the given table at compile-time (e.g. using INFORMATION_SCHEMA.TABLE_CONSTRAINTS or the equivalent in pg_catalog) and ensure that the enum exhaustively lists the constraints.

This can help ensure, for example, that a new constraint being added is covered by any code it might affect:

-- This is effectively equivalent to changing `username`'s type to `varchar(63)`,
-- however that raises a different error code than a constraint violation.
ALTER TABLE users ADD CONSTRAINT user_username_length CHECK (LENGTH(username) < 64));
-- Enforcing a minimum length is probably better done in your application code where you can change it more easily
-- without having to modify existing user records. However, checking that it's not empty is a good sanity check.
ALTER TABLE users ADD CONSTRAINT user_username_nonempty CHECK (LENGTH(username) > 0);

If the above were executed as a new migration, for example, the ConstraintEnum derive would then emit a compiler error calling out the uncovered constraints and force the developer to add coverage for them.

For convenience, I would maybe add a subcommand to sqlx-cli to automatically generate a ConstraintEnum definition for a given table, which can use the same logic to look up the constraints so the user doesn't have to manually. This would especially be useful if adapting SQLx to an existing database.

I would make this functionality optional so that users who don't wish to use compile-time checks can still benefit from the API improvement. Perhaps as a checked-derive feature that extends the derive feature added in #3113 (cc @saiintbrisson).

Without DATABASE_URL set and with the checked-derive feature disabled, the ConstraintEnum would work as a simple dumb derive. This would also make it possible to still use the query macros, but only check constraint enums, e.g., in CI or a pre-commit hook, by not enabling the feature in normal development (though that's likely to be flaky given Cargo's feature unification; a dependency could enable the feature non-optionally which would force it on everywhere). It should also obey SQLX_OFFLINE.

Additionally, I would make it possible to explicitly mark constraints as ignored, either as a temporary workaround or to ignore constraints that are not expected to fire in normal operation:

#[derive(sqlx::ConstraintEnum)]
#[sqlx(table_name = "users")]
// For example, these should realistically be checked before even touching the database.
#[sqlx(ignore("user_username_length", "user_username_nonempty")]
pub enum UsersConstraints {
    /// The `UNIQUE` constraint of the `username` column was violated.
    #[sqlx(constraint = "users_username_key")]
    UsernameUnique,
}

Or to mark a constraint enum as a non-exhaustive subset (only check that the declared constraints actually exist):

#[derive(sqlx::ConstraintEnum)]
#[sqlx(table_name = "users")]
#[sqlx(non_exhaustive)]
pub enum UsersConstraints {
    /// The `UNIQUE` constraint of the `username` column was violated.
    #[sqlx(constraint = "users_username_key")]
    UsernameUnique,
}

Bikeshedding: this should be semantically distinct from #[non_exhaustive] and should probably not use the same identifier to avoid confusion. But what should it be instead? check_declared_only?

Alternatives

Automatically generated enum

Instead, SQLx could automatically generate the full enum:

sqlx::constraint_enum!(pub UserConstraints, "users");

However, this has a number of disadvantages:

  • It would require either a compile-time connection to the database, or use the same offline data mechanism as the query macros.
  • The user cannot reason about the output of the macro without knowing what constraints exist on the given table.
    • The query macros generate predictable output from their input (assuming output columns are explicitly declared)
      and properly formatted query macro invocations should be perfectly readable and easily reasoned about on their own
      without an IDE expanding the macro.
  • Controlling the output of the macro would require a dedicated DSL for:
    • Ignoring constraints you don't care about;
    • Renaming variants for clarity;
      • notice that the UNIQUE constraint on users.username was automatically named users_username_key, which doesn't tell you a lot at a glance
    • Adding documentation comments to explain what each constraint violation means;
    • Adding other derives, such as thiserror::Error to turn the ConstraintEnum into an error type in its own right.
      • Supporting this use-case in this format would be more complex than just writing out the enum definition!
  • Go-to-definition would likely not be very useful, and IDE support in general would be very spotty and results would be slow.
@abonander abonander added enhancement New feature or request macros proposal labels Apr 2, 2024
@abonander abonander changed the title Compile-time checked constraints RFC: Compile-time checked constraints Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request macros proposal
Projects
None yet
Development

No branches or pull requests

1 participant