Skip to content

A configurable stand-alone Git hook to reject pushes of binary data

Notifications You must be signed in to change notification settings

avar/pre-receive-reject-binaries

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 

Repository files navigation

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptions);

GetOptions(
    'help|?'      => \my $help,
    'man'         => \my $man,
    'dry-run=i'   => \(my $dry_run = 1),
    'log-level=s' => \(my $OUTPUT_LOG_LEVEL = 'NOTICE'),
    'always-fail' => \my $always_fail,
) or die "Error parsing options";

if ($help or $man) {
    require Pod::Usage;
    Pod::Usage::pod2usage(-exitstatus => 0, -verbose => 1) if $help;
    Pod::Usage::pod2usage(-exitstatus => 0, -verbose => 2) if $man;
}

my $NAME = 'pre-receive-reject-binaries';

=encoding utf8

=head1 NAME

pre-receive-reject-binaries - A configurable Git hook to intelligently reject binary pushes

=head1 SYNOPSIS

    pre-receive-reject-binaries --help
    pre-receive-reject-binaries --man

    # To actually make this hook do anything wrap it in a shellscript
    # that does this
    pre-receive-reject-binaries --dry-run=0

    # Do whatever we would have done but fail in cases where we'd
    # succeed, for testing purposes
    pre-receive-reject-binaries --dry-run=0 --always-fail

    # By default we only emit NOTICE level output on STDERR, that can
    # be made a lot more verbose with DEBUG or TRACE. We can also pipe
    # DEBUG or TRACE output to a log command, see
    # hook.pre-receive-reject-binaries.log-command.
    pre-receive-reject-binaries --dry-run=0 --log-level=TRACE

=head1 DESCRIPTION

This is a Git hook meant to be set up as a C<pre-receive> hook (see
C<githooks(5)>) that'll reject the addition of binary data to a
repository, either all binary additions if they on a per-commit basis
go above a given size.

The general strategy of this hook is that when we get a push for a
given "master" branch we'll do a C<log --stat> of C<$branch..$to> and
find all the commits that add binary data, and how much they add.

Each commit in the push is then given a quota of how much binary data
is allowed, if any commit goes above that quota the entire push is
rejected. Depending on the configuration (see below) the user is
allowed to force the push to go through by amending the commit message
to include some string saying they forced it through.

To entirely reject binary pushes you can set the size limit to 0 and
don't define an override message to allow users to push those changes
manually.

You can also allow some amount of binary data in the repository
per-commit, e.g. to allow committing small icons but not giant images.

Of course someone could be clever and commit a bunch of huge
Base64-encoded data that wouldn't be detected by Git as binary, or
manually split up huge binary data into multiple commits, each of whom
don't go above the configured limit.

This hook is not meant to stop a dedicated attacker from enlarging
your Git history, it's meant to stop someone who doesn't know better
("what do you mean people have to download my binary data on every
checkout, forever?!") from accidentally messing up the history.

We only care about updates to the "master" branch for two reasons, one
is that if you're e.g. doing some temporary work and committing some
binaries to a custom branch temporarily that's fine as long as that
branch doesn't make it to "master", and eventually gets deleted from
$whatever.git.

The other is that if you we were to reject pushes to other branches,
especially newly created ones we'd have to deal with the special case
of deciding what to use as the merge-base for that branch. Do we use
the "master" branch? do we validate the entire history? Easier to just
not deal with it. Want the hook to do something smart about that?
Patches welcome.

Keep in mind though that this is deliberately written to be fast on
humongous repositories, so anything that breaks that promise would
have to at least be optional.

We do handle the initial push to the "master" branch itself by just
validating the entire history being pushed.

=head1 INSTALLATION

Our only dependencies are a working perl interpreter. We only depend
on modules that have shipped with perl itself forever, so this hook
should just work out of the box on any *nix-like OS that has perl
installed.

To enable it for a given bare repository you want to push to just
create a F<hook/pre-receive> with something like:

    #!/bin/sh
    /path/to/where/you/cloned-pre-receive-reject-binaries/pre-receive-reject-binaries --dry-run=0

See L</CONFIGURATION> below for how to configure it. We shell out to
C<git config> so you can enable this configuration per-repository, or
globally (via e.g. F</etc/gitconfig>) or any combination of the two.

=head1 CONFIGURATION

Here's an example of what a typical configuration for this hook might
look like:

    [hook "pre-receive-reject-binaries"]
        master-branch-name = master
        max-per-commit-size-increase = 1024
        commit-override-message = "YES I WANT TO FOREVER INCREASE THE SIZE OF core.git BY ADDING {BYTES} BYTES OF BINARY DATA TO IT!"
        support-contact = "git@lists.example.com or the 'support' Jabber channel"
        log-command = /gitroot/contrib/hooks/pre-receive-reject-binaries/log-command
        log-command-level = TRACE
        blocked-push-command = /gitroot/contrib/hooks/pre-receive-reject-binaries/blocked-push-command
        unblocked-push-command = /gitroot/contrib/hooks/pre-receive-reject-binaries/unblocked-push-command

For example commands for the C<*-command> switches see the
F<examples/> directory in the F<pre-receive-reject-binaries> Git
distribution for an example of what the author uses this hook for.

Detailed documentation about each option below:

=head2 hook.pre-receive-reject-binaries.master-branch-name

The C<master> branch name you want to not allow binary pushes
to. Usually C<master>, can also be C<trunk> or whatever.

=head2 hook.pre-receive-reject-binaries.max-per-commit-size-increase

If a given commit adds files whose cumulative file size is above this
we reject it. In bytes.

=head2 hook.pre-receive-reject-binaries.commit-override-message

If there actually is a good reason to go above the configured limits
they can be overridden.

To do this the commit message of any offending commit needs to be
changed to include this string. Occurrences of C<{BYTES}> in the will
be replaced by however many bytes are being added. E.g. this can be
set to:

    YES I WANT TO FOREVER INCREASE THE SIZE OF $whatever.git BY ADDING %BYTES% BYTES OF BINARY TO IT!

If that string is present in the commit message we'll ignore that
commit for the purposes of computing the size of the commit. To not
allow any overrides just don't set this.

=head2 hook.pre-receive-reject-binaries.support-contact

A string we'll use when we reject the push as the support contact,
e.g. an E-Mail address or Jabber channel.

=head2 hook.pre-receive-reject-binaries.log-command

An executable script that we'll pipe all our output into. Useful for
e.g. spewing all output users see into a log.

=head2 hook.pre-receive-reject-binaries.log-command-level

The log level we'll use for the output given to the
L</hook.pre-receive-reject-binaries.log-command>. Can be C<NOTICE>,
C<DEBUG> or C<TRACE>. The C<NOTICE> messages are the messages we show
to users, anything above that is verbose output mainly intended to
debug this script.

=head2 hook.pre-receive-reject-binaries.blocked-push-command

A command we'll run whenever this hook blocks a push, it'll get all
the output we emit piped into it. Useful for e.g. sending a mail
whenever we have a blocked push.

=head2 hook.pre-receive-reject-binaries.unblocked-push-command

A command we'll run if we have a push that we would have rejected, but
was unblocked through via the facility described in
L</hook.pre-receive-reject-binaries.commit-override-message>. If
that's not enabled we won't be calling this.

=head1 AUTHOR

Ævar Arnfjörð Bjarmason <avar@cpan.org>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Ævar Arnfjörð Bjarmason <avar@cpan.org>

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut

# Can't believe I'm implementing my own logging framework...
my %LOGLEVELS;
BEGIN {
    %LOGLEVELS = (
        NOTICE => 1,
        DEBUG  => 2,
        TRACE  => 3,
    );
}
use constant \%LOGLEVELS;

message(DEBUG, ("=" x 79));

# Get our configuration from the Git repository
my $config_master_branch_name;
my $config_max_per_commit_size_increase;
my $config_commit_override_message;
my $config_support_contact;
my $config_log_command;
my $config_log_command_level;
my $config_blocked_push_command;
my $config_unblocked_push_command;
{
    my $config_variable;

    $config_variable = 'hook.pre-receive-reject-binaries.master-branch-name';
    chomp($config_master_branch_name = qx[git config $config_variable]);
    error("PANIC: $0 has not had <$config_variable> configured for this repository. That configuration is mandatory, see the documentation for the hook") unless $config_master_branch_name;

    $config_variable = 'hook.pre-receive-reject-binaries.max-per-commit-size-increase';
    chomp($config_max_per_commit_size_increase = qx[git config $config_variable]);
    error("PANIC: $0 has not had <$config_variable> configured for this repository. That configuration is mandatory, see the documentation for the hook") unless $config_max_per_commit_size_increase;

    chomp($config_commit_override_message = qx[git config hook.pre-receive-reject-binaries.commit-override-message]);
    chomp($config_support_contact = qx[git config hook.pre-receive-reject-binaries.support-contact]);
    chomp($config_log_command = qx[git config hook.pre-receive-reject-binaries.log-command]);
    chomp($config_log_command_level = qx[git config hook.pre-receive-reject-binaries.log-command-level]);
    $config_log_command_level ||= 'NOTICE';
    chomp($config_blocked_push_command = qx[git config hook.pre-receive-reject-binaries.blocked-push-command]);
    chomp($config_unblocked_push_command = qx[git config hook.pre-receive-reject-binaries.unblocked-push-command]);
}

# We'll include this state in any detailed error messages
my (@updates, @updates_to_main_branch, $update);

while (my $line = <STDIN>) {
    chomp $line;
    my ($from, $to, $raw_ref) = split / /, $line;
    message(TRACE, "Parsed a line <$line> from STDIN as <$from> <$to> <$raw_ref>");
    error("PANIC: We should get '<old-value> SP <new-value> SP <ref-name> LF' here. Got '$line' instead")
        unless $from and $to and $raw_ref;

    my ($ref_type, $ref_name)= $raw_ref =~ m[^refs/(heads|tags)/(.+)$]s
        or error("PANIC: Unable to parse the ref name <$raw_ref>");

    push @updates => {
        from     => $from,
        to       => $to,
        raw_ref  => $raw_ref,
        ref_type => $ref_type,
        ref_name => $ref_name,
    };
}

@updates_to_main_branch = grep { $_->{ref_name} eq $config_master_branch_name } @updates;
# No updates to the branch we care about? Cool.
_exit(0, 1) unless @updates_to_main_branch;

# There's only one!
$update = $updates_to_main_branch[0];

# If you push to a new branch we'll just validate the entire history
# of your push.
my $null_ref = '0000000000000000000000000000000000000000';
my $over_nine_thousand = '9001,9001';
if ($update->{to} eq $null_ref) {
    message(DEBUG, "You're deleting the <$config_master_branch_name> branch (pushing to <$null_ref>). Allright then, not our job to stop you");
    _exit(0, 1);
} elsif ($update->{from} eq $null_ref) {
    message(DEBUG, "You are doing an initial push to <$config_master_branch_name> (pushing from <$null_ref>). Validating all the history");
    chomp(my @log = qx[git log --pretty=format:%H -M100% --stat=$over_nine_thousand $update->{to}]);
    $update->{log} = \@log;
    chomp(my @rev_list = qx[git rev-list $update->{to}]);
    $update->{rev_list} = \@rev_list;
} else {
    message(DEBUG, "You are doing a push to <$config_master_branch_name> from <$update->{from}> to <$update->{to}>");
    chomp(my @log = qx[git log --pretty=format:%H -M100% --stat=$over_nine_thousand $update->{from}..$update->{to}]);
    $update->{log} = \@log;
    chomp(my @rev_list = qx[git rev-list $update->{from}..$update->{to}]);
    $update->{rev_list} = \@rev_list;
}
@{$update->{rev_list_hash}}{@{$update->{rev_list}}} = ();

# Parse out the log for the push we're getting. State machine galore.
my $current_commit;
my $seen_commits = 0;
LINE: for my $line (@{$update->{log}}) {
    if (!$current_commit) {
        error("PANIC: The first line we get should be a commit hash, it's <$line>") unless exists $update->{rev_list_hash}->{$line};
        $current_commit = $line;
        $seen_commits++;
        next LINE;
    } elsif (exists $update->{rev_list_hash}->{$line}) {
        $current_commit = $line;
        $seen_commits++;
        next LINE;
    }

    # The empty line at the end of log messages.
    next LINE if $line eq '';
    # Skip "X files changed, Y insertions...", and also non-binary
    # changes.
    next LINE unless $line =~ /\|\s+Bin /;

    # Match the filename and the size change, try to deal with files
    # with whitespace in their name by using the greedy match. Why not
    # use --numstat? Because it doesn't tell us about the size of
    # binary files!
    my ($file, $from_bytes, $to_bytes) = $line =~ /^ (.*?)\|\s+Bin ([0-9]+) -> ([0-9]+) bytes$/s;
    $file =~ s/\s*$//; # Remove whitespace leading up to the "| Bin"
    $line =~ s/^ //;   # All --stat lines are prefixed by whitespace

    push @{ $update->{parsed_binary_log}->{$current_commit} } => {
        file => $file,
        from => $from_bytes,
        to   => $to_bytes,
    };
}
error(" PANIC: We should have seen <" . @{$update->{rev_list}} . "> commits in the log we just parsed, but saw <$seen_commits> instead. Our parser is broken!")
    unless @{$update->{rev_list}} == $seen_commits;

unless (exists $update->{parsed_binary_log}) {
    # We have no binary files in this push, just let the whole thing
    # through, nothing more to do here.
    message(DEBUG, "Parsed <" . @{$update->{rev_list}} . "> commits in this push and found no binary files. Letting it through");
    _exit(0);
}

# Figure out which commits are naughty, and which are nice.
for my $parsed_commit (keys %{$update->{parsed_binary_log}}) {
    my $commit_size = 0;
    PARSED_FILE: for my $parsed_file (@{$update->{parsed_binary_log}->{$parsed_commit}}) {
        my ($file, $from, $to) = @$parsed_file{qw(file from to)};

        # It's very important that we don't block the deletions of big
        # files, they'll still be in the history but at least they
        # won't be in the checked out tree anymore.
        if ($to == 0) {
            message(TRACE, "Skipping <$parsed_commit>'s <$file>. Was <$from> bytes but now deleted (or empty) at <$to> bytes");
            next PARSED_FILE;
        }

        $commit_size += $to;
    }

    if ($commit_size > $config_max_per_commit_size_increase){
        message(TRACE, "The commit <$parsed_commit>'s adds <$commit_size> bytes of binary data. This is above our limit of <$config_max_per_commit_size_increase>");
        $update->{bad_commits}->{$parsed_commit} = $commit_size;
    }
}

my $blocked_push;
my $unblocked_push;
if (exists $update->{bad_commits}) {
    message(NOTICE, ("=" x 79));
    message(NOTICE, "You are trying to push <$update->{from}..$update->{to}> to the <$config_master_branch_name> branch");
    message(NOTICE, "but have gone above the configured size quota for binary files.");
    message(NOTICE, "These are the commits that we found to contain more than <$config_max_per_commit_size_increase> bytes of binary additions:");
    message(NOTICE, ("=" x 79));

    # We're looping through the rev-list so we'll emit the commits in
    # the order that they're being pushed
    my @bad_commit_messages;
    COMMIT: for my $commit (reverse @{$update->{rev_list}}) {
        next COMMIT unless my $added_n_bytes = $update->{bad_commits}->{$commit};
        chomp(my $commit_message = qx[git show --no-abbrev-commit --stat $commit]);

        my $exempted = '';
        if ($config_commit_override_message) {
            my $cp = $config_commit_override_message;
            $cp =~ s/\{BYTES\}/$added_n_bytes/g;
            if ($commit_message =~ /\Q$cp\E/) {
                $update->{unblocked_commits}->{$commit} = undef;
                $exempted = ', BUT exempted!';
            }
        }
        $commit_message =~ s/^/(" " x 4)/meg;
        $commit_message =~ s/\bcommit (.{40})/commit $1 (added $added_n_bytes bytes$exempted)/;

        push @bad_commit_messages => $commit_message;
    }
    message(NOTICE, join(
        "\n" . ("=" x 79) . "\n",
        @bad_commit_messages,
    ));
    message(NOTICE, ("=" x 79));
    message(NOTICE, "");

    my $rejected = 1;
    if (not exists $update->{unblocked_commits}) {
        message(NOTICE, "You've tried to push big binary data to this repository so we're rejecting your push,");
        message(NOTICE, "binary data does *NOT* belong in a Git repository intended for source control.");
        message(NOTICE, "Whatever you push to the <$config_master_branch_name> branch will *forever* live in *every* checkout of the repository!");
        message(NOTICE, "");
        if ($config_support_contact) {
            message(NOTICE, "If you have questions about this please contact $config_support_contact,");
            message(NOTICE, "Make sure to paste the entire output of this push up to and including");
            message(NOTICE, "the 'git push' command you used.");
            message(NOTICE, "");
        }
        if ($config_commit_override_message) {
            message(NOTICE, "If for some reason you think this data is important enough to live in this repository");
            message(NOTICE, "you can force your push to be accepted by changing the commit message of offending");
            message(NOTICE, "commits to include this literal string, without quotation marks:");
            message(NOTICE, "");
            message(NOTICE, "    '$config_commit_override_message'");
            message(NOTICE, "");
            message(NOTICE, "You need to change any occurrence of {BYTES} in that message to be the");
            message(NOTICE, "actual amount of bytes that commit is adding, we've computed that on a per-commit");
            message(NOTICE, "basis in the output above, look for '(added {BYTES} bytes)'.");
            message(NOTICE, "");
            message(NOTICE, "If you want to do that or remove the offending commits from being pushed but don't know");
            message(NOTICE, "how read the 'INTERACTIVE MODE' section of 'git help rebase'.");
            message(NOTICE, "");
        }
    } elsif (exists $update->{unblocked_commits}) {
        if (keys %{$update->{unblocked_commits}} < keys %{$update->{bad_commits}}) {
            message(NOTICE, "You've unblocked some commits to get past the filters for binary data");
            message(NOTICE, "but you still have bad commits. These are the commits you're unblocking:");
            message(NOTICE, "");
            COMMIT: for my $commit (reverse @{$update->{rev_list}}) {
                message(NOTICE, "* $commit (adds $update->{bad_commits}->{$commit} bytes)") if exists $update->{unblocked_commits}->{$commit};
            }
            message(NOTICE, "");
            message(NOTICE, "And these are the commits that still violate the binary size policy of this repository:");
            message(NOTICE, "");
            COMMIT: for my $commit (reverse @{$update->{rev_list}}) {
                message(NOTICE, "* $commit (adds $update->{bad_commits}->{$commit} bytes)")
                    if exists $update->{bad_commits}->{$commit} and not exists $update->{unblocked_commits}->{$commit};
            }
            message(NOTICE, "");
        } elsif (keys %{$update->{unblocked_commits}} == keys %{$update->{bad_commits}}) {
            message(NOTICE, "You've decided to unblock all of the commits we detected as bad through:");
            message(NOTICE, "");
            COMMIT: for my $commit (reverse @{$update->{rev_list}}) {
                message(NOTICE, "* $commit (adds $update->{bad_commits}->{$commit} bytes)") if exists $update->{unblocked_commits}->{$commit};
            }
            message(NOTICE, "");
            message(NOTICE, "This push will be reported");
            $unblocked_push = 1;
            $rejected = 0;
        }
    }

    if ($rejected) {
        message(NOTICE, ("=" x 79));
        message(NOTICE, "This push is being rejected. Please don't add binary data to this repository!");
        message(NOTICE, ("=" x 79));
        $blocked_push = 1;
        _exit(1);
    } else {
        message(NOTICE, ("=" x 79));
    }
}

_exit(0);

my @messages;

sub _exit {
    my ($code, $really) = @_;

    my $pre_exit = sub {
        # We're not checking exit values here because we don't want
        # the push to fail due to some log hook failing
        if ($config_log_command and @messages) {
            if (open my $fh, "|-", $config_log_command) {
                print $fh $_, "\n" for map { message_filter($_->[0], $config_log_command_level, $_->[1]) } @messages;
                close $fh or warn "Couldn't close($config_log_command): <$!>";
            } else {
                warn "Couldn't open($config_log_command): <$!>";
            }
        }

        if ($config_blocked_push_command and $blocked_push) {
            if (open my $fh, "|-", $config_blocked_push_command) {
                print $fh $_, "\n" for map { message_filter($_->[0], 'NOTICE', $_->[1]) } @messages;
                close $fh or warn "Couldn't close($config_blocked_push_command): <$!>";
            } else {
                warn "Couldn't open($config_blocked_push_command): <$!>";
            }
        }

        if ($config_unblocked_push_command and $unblocked_push) {
            if (open my $fh, "|-", $config_unblocked_push_command) {
                print $fh $_, "\n" for map { message_filter($_->[0], 'NOTICE', $_->[1]) } @messages;
                close $fh or warn "Couldn't close($config_unblocked_push_command): <$!>";
            } else {
                warn "Couldn't open($config_unblocked_push_command): <$!>";
            }
        }

        return;
    };

    if (!$really and !$code and $always_fail) {
        message(NOTICE, "We would have exited successfully with <$code> but we're set to always fail for testing purposes");
        $pre_exit->();
        exit 1;
    }

    if ($code and $dry_run) {
        message(NOTICE, "We would have rejected this push but we're set to --dry-run=1 for testing purposes");
        $pre_exit->();
        exit 0;
    }


    $pre_exit->();
    exit $code;
}

sub message_filter {
    my ($message_log_level, $log_level, $message) = @_;

    # This is so very ugly
    if ($message_log_level == NOTICE) {
        return $message;
    } elsif ($message_log_level == DEBUG) {
        return $message if $LOGLEVELS{$log_level} == DEBUG or $LOGLEVELS{$log_level} == TRACE;
    } elsif ($message_log_level == TRACE) {
        return $message if $LOGLEVELS{$log_level} == TRACE;
    }
    return;
}

sub message {
    my ($message_log_level, $what) = @_;

    $what =~ s/^/$NAME: /mg;
    $what =~ s/ $//mg;

    if (my $filtered_what = message_filter($message_log_level, $OUTPUT_LOG_LEVEL, $what)) {
        print STDERR $filtered_what, "\n";
    }

    push @messages => [$message_log_level => $what];

    return;
}

sub error {
    my ($what) = @_;

    $what =~ s/^/$NAME: /mg;
    $what =~ s/ $//mg;

    message(NOTICE, "ERROR: $what");
    _exit(1);
}

About

A configurable stand-alone Git hook to reject pushes of binary data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages