ARCHITECTURE.md

The purpose of delta is to transform input received from git, diff, git blame, grep, etc. into visually appealing output, including by syntax-highlighting code.

Initialization

Delta reads user options from [delta] sections in .gitconfig, and from the command line.

Input

Delta reads from stdin, so one can run e.g. git diff | delta. Note that when git's stdout is sent to a pipe, (a) git does not emit ANSI color escape sequences unless --color=always is passed, and (b) git does not start its own pager process.

Users typically configure git to use delta as its pager. In that case, git sends its stdout to delta behind the scenes (with ANSI color escape sequences), without the user needing to pipe it explicitly.

Parsing the input

Delta parses input using a state machine in which the states correspond to semantically distinct sections of the input (e.g. HunkMinus means that we are in a removed line in a diff hunk). The core dispatching loop is shown below.

pub fn delta<I>(lines: ByteLines<I>, writer: &mut dyn Write, config: &Config) -> std::io::Result<()>
where
    I: BufRead,
{
    StateMachine::new(writer, config).consume(lines)
}

pub enum State {
    DiffHeader(DiffType),
    HunkHeader(DiffType, ParsedHunkHeader, String, String),
    HunkZero(DiffType, Option<String>),
    HunkMinus(DiffType, Option<String>),
    HunkPlus(DiffType, Option<String>),
    Unknown,
}


impl<'a> StateMachine<'a> {
    fn consume<I>(&mut self, mut lines: ByteLines<I>) -> std::io::Result<()>
    where
        I: BufRead,
    {
        while let Some(Ok(raw_line_bytes)) = lines.next() {
            self.ingest_line(raw_line_bytes);

            // Every method named handle_* must return std::io::Result<bool>.
            // The bool indicates whether the line has been handled by that
            // method (in which case no subsequent handlers are permitted to
            // handle it).
            let _ = self.handle_commit_meta_header_line()?
                || self.handle_diff_stat_line()?
                || self.handle_hunk_header_line()?
                || self.handle_hunk_line()?
                || self.emit_line_unchanged()?;
        }
        self.painter.paint_buffered_minus_and_plus_lines();
        Ok(())
    }
}
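
The dispatch idiom in the excerpt above (handlers returning std::io::Result<bool>, chained with || so that the first handler that accepts a line stops the chain) can be tried in isolation. The following is a minimal sketch, not delta's code: the Machine type, its three handlers, and the "[minus]"/"[plus]" output tags are hypothetical simplifications introduced only for illustration.

```rust
use std::io::{self, Write};

// Hypothetical, reduced set of states; delta's real State enum is richer.
#[derive(Debug, PartialEq)]
enum State {
    HunkMinus,
    HunkPlus,
    Unknown,
}

struct Machine<W: Write> {
    state: State,
    line: String,
    out: W,
}

impl<W: Write> Machine<W> {
    fn new(out: W) -> Self {
        Machine { state: State::Unknown, line: String::new(), out }
    }

    // Each handler returns Ok(true) if it has handled the current line.
    fn handle_minus_line(&mut self) -> io::Result<bool> {
        if !self.line.starts_with('-') {
            return Ok(false);
        }
        self.state = State::HunkMinus;
        writeln!(self.out, "[minus] {}", &self.line[1..])?;
        Ok(true)
    }

    fn handle_plus_line(&mut self) -> io::Result<bool> {
        if !self.line.starts_with('+') {
            return Ok(false);
        }
        self.state = State::HunkPlus;
        writeln!(self.out, "[plus] {}", &self.line[1..])?;
        Ok(true)
    }

    // Fallback: always handles the line, so it must come last in the chain.
    fn emit_line_unchanged(&mut self) -> io::Result<bool> {
        self.state = State::Unknown;
        writeln!(self.out, "{}", self.line)?;
        Ok(true)
    }

    fn consume<'a>(&mut self, lines: impl Iterator<Item = &'a str>) -> io::Result<()> {
        for line in lines {
            self.line = line.to_string();
            // `||` short-circuits: later handlers never see a handled line,
            // and `?` propagates any I/O error immediately.
            let _ = self.handle_minus_line()?
                || self.handle_plus_line()?
                || self.emit_line_unchanged()?;
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut machine = Machine::new(Vec::new());
    machine.consume("-old\n+new\ncontext".lines())?;
    print!("{}", String::from_utf8_lossy(&machine.out));
    Ok(())
}
```

The order of the chain matters: a catch-all handler placed earlier would shadow every handler after it.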

Output

Delta creates a child pager process (less) and writes its output to the stdin of the pager process. Delta's navigate feature is implemented by constructing an appropriate regex and passing it as an argument to less.
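
The parent/child arrangement described above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than delta's implementation: pager_args and write_through_pager are hypothetical helpers, the example regex is made up, and while -R and --pattern are real less options, passing exactly these is an assumption of the sketch. The demo in main uses cat as a stand-in pager so that it runs non-interactively.

```rust
use std::io::{self, Write};
use std::process::{Command, ExitStatus, Stdio};

/// Build the pager's argument list (hypothetical helper).
/// `-R` makes less pass ANSI color sequences through; `--pattern` makes it
/// start with a search pattern already set.
fn pager_args(navigate_regex: Option<&str>) -> Vec<String> {
    let mut args = vec!["-R".to_string()];
    if let Some(regex) = navigate_regex {
        args.push(format!("--pattern={}", regex));
    }
    args
}

/// Spawn `pager`, write `text` to its stdin, and wait for it to exit.
fn write_through_pager(pager: &str, args: &[String], text: &str) -> io::Result<ExitStatus> {
    let mut child = Command::new(pager)
        .args(args)
        .stdin(Stdio::piped())
        .spawn()?;
    // Taking stdin out of the child lets us drop it, which closes the pipe
    // so that the pager sees end-of-input.
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let write_result = stdin.write_all(text.as_bytes());
    drop(stdin);
    // Wait first so the child is reaped even if the write failed, e.g.
    // because the user quit the pager before all output was written.
    let status = child.wait()?;
    write_result?;
    Ok(status)
}

fn main() {
    // Made-up regex, for illustration only.
    let args = pager_args(Some("^(commit|diff) "));
    println!("would run: less {}", args.join(" "));

    // `cat` stands in for `less` here so the sketch does not block on a tty.
    match write_through_pager("cat", &[], "hello from the parent process\n") {
        Ok(status) => println!("pager exited with {}", status),
        Err(err) => eprintln!("could not run pager: {}", err),
    }
}
```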

Core utility data structures

  • config::Config This is a struct with many fields corresponding to all user options and miscellaneous other useful things. It might be possible to store it globally, but currently the code passes references to it around the call stack.

  • paint::Painter This struct holds the syntax highlighter, and a writable output stream (connected to the stdin of the child less process). It also holds two line buffers: one to store all the removed ("minus") lines encountered in a single diff hunk, and one to hold the added ("plus") lines.

Handling diff hunk lines

Here we will follow one code path in detail: handling diff hunk lines (removed/unchanged/added). This is the most important, and most complex, code path.

Recall that git diff output contains multiple diff "hunks". A hunk is a sequence of diff lines describing the changes among some lines of code that are close together in the same file. A git diff may have many hunks, from multiple files (and therefore multiple languages). Within a hunk, there are sequences of consecutive removed and/or added lines ("subhunks"), separated by unchanged lines. (The term "hunk" is standard; the term "subhunk" is specific to delta.)

The handler function that is called when delta processes a hunk line is handle_hunk_line. This function stores the line in a buffer (one buffer for minus lines and one for plus lines): the processing work is not done until we get to the end of the subhunk.

Now, we are at the end of a subhunk, and we have a sequence of minus lines, and a sequence of plus lines.
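
The buffering behavior can be sketched as follows. This is a simplified stand-in, not delta's Painter: the subhunks field and the flush method are hypothetical, and completed subhunks are merely recorded rather than painted. The sketch treats any line that is neither minus nor plus as the end of the current subhunk.

```rust
#[derive(Default)]
struct Painter {
    minus_lines: Vec<String>,
    plus_lines: Vec<String>,
    /// Completed subhunks, recorded here instead of being painted.
    subhunks: Vec<(Vec<String>, Vec<String>)>,
}

impl Painter {
    /// Hand the buffered lines over as one subhunk and clear the buffers.
    fn flush(&mut self) {
        if self.minus_lines.is_empty() && self.plus_lines.is_empty() {
            return;
        }
        let minus = std::mem::take(&mut self.minus_lines);
        let plus = std::mem::take(&mut self.plus_lines);
        self.subhunks.push((minus, plus));
    }

    /// Buffer minus/plus lines; an unchanged ("zero") line ends the subhunk.
    fn handle_hunk_line(&mut self, line: &str) {
        match line.chars().next() {
            Some('-') => self.minus_lines.push(line[1..].to_string()),
            Some('+') => self.plus_lines.push(line[1..].to_string()),
            _ => self.flush(),
        }
    }
}

fn main() {
    let mut painter = Painter::default();
    for line in ["-a", "-b", "+c", " ctx", "+d", " ctx"] {
        painter.handle_hunk_line(line);
    }
    painter.flush(); // the end of the hunk also ends any open subhunk
    println!("{:?}", painter.subhunks);
}
```

Deferring the work until the subhunk is complete is what makes it possible to compare minus lines against plus lines before deciding how to style either.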

Delta processes a subhunk (paint_minus_and_plus_lines) as follows:

  1. Compute syntax (foreground) styles for the subhunk

    We call the syntect library to compute syntax highlighting styles for each of the minus lines, and each of the plus lines, if the minus/plus styles specify syntax highlighting. The language used for syntax-highlighting is determined by the filename in the diff. For a single line, the result is an array of (style, substring) pairs. Each pair specifies the foreground (text) color to be applied to a substring of the line (for example, a language keyword, or a string literal).

  2. Compute diff (background) styles for the subhunk

    Again, this yields, for each line, an array of (style, substring) pairs. Each pair represents foreground and background colors to be applied to a substring of the line, as specified by delta's *-style options.

    In order to compute the array of style sections, delta has to (1) infer the correct alignment of minus and plus lines, and (2) for each such "homologous pair", infer the edit operations that transformed the minus line into the plus line (see the "Within-line diff algorithm" section).

    For example, for a minus line, we may have inferred that the line has a homologous plus line, and that a word has been deleted. By default, delta applies a bright red background color to such a word and lets the foreground color be determined by the terminal emulator default foreground color (minus-emph-style = normal "#901011"). On the other hand, for an added word, delta by default applies a bright green background color, and specifies that the foreground color should come from the syntax highlighting styles (plus-emph-style = syntax "#006000").

  3. Process subhunk lines for side-by-side or unified output

    At this point we have a collection of lines corresponding to a subhunk and, for each line, a specification of how syntax styles and diff styles are applied to substrings of the line. These data structures are processed differently according to whether unified or side-by-side diff display has been requested.

  4. Superimpose syntax and diff styles for a line

    Before we can output a line of code we need to take the two arrays of (style, substring) pairs and compute a single output array of (style, substring) pairs, such that the output array represents the diff styles, but with foreground colors taken from the syntax highlighting where appropriate. This is done by superimpose_style_sections.

  5. Output a line with styles converted to ANSI color escape sequences

    The style structs that delta uses are implemented by the ansi_term library. Individual substrings are painted with their assigned style, and concatenated to form a UTF-8 string containing ANSI color escape sequences.
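
Steps 4 and 5 can be illustrated with a small self-contained sketch. It is written under several assumptions and is not delta's code: the Style struct, its use_syntax_fg flag, and the superimpose and paint functions are hypothetical simplifications, colors are plain 256-color indices, and the escape sequences are written by hand rather than through ansi_term. Both input arrays are assumed to cover the same line exactly.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Style {
    fg: Option<u8>,      // 256-color index; None means terminal default
    bg: Option<u8>,
    use_syntax_fg: bool, // diff styles only: take fg from the syntax style
}

/// Convert (style, substring) sections into (end byte offset, style) pairs.
fn ends(sections: &[(Style, &str)]) -> Vec<(usize, Style)> {
    let mut end = 0;
    sections
        .iter()
        .map(|(style, text)| {
            end += text.len();
            (end, *style)
        })
        .collect()
}

/// Merge two sectionings of `line` into one, cutting at every boundary
/// that appears in either input.
fn superimpose<'a>(
    line: &'a str,
    syntax: &[(Style, &str)],
    diff: &[(Style, &str)],
) -> Vec<(Style, &'a str)> {
    let (syn, dif) = (ends(syntax), ends(diff));
    let (mut i, mut j, mut start) = (0, 0, 0);
    let mut out = Vec::new();
    while i < syn.len() && j < dif.len() {
        let end = syn[i].0.min(dif[j].0);
        if end > start {
            let (syntax_style, diff_style) = (syn[i].1, dif[j].1);
            // Background always comes from the diff style; foreground comes
            // from the syntax style only if the diff style asks for it.
            let fg = if diff_style.use_syntax_fg { syntax_style.fg } else { diff_style.fg };
            out.push((Style { fg, bg: diff_style.bg, use_syntax_fg: false }, &line[start..end]));
        }
        start = end;
        if syn[i].0 == end { i += 1; }
        if dif[j].0 == end { j += 1; }
    }
    out
}

/// Render sections as one string with ANSI SGR escape sequences.
fn paint(sections: &[(Style, &str)]) -> String {
    let mut painted = String::new();
    for (style, text) in sections {
        let mut codes = Vec::new();
        if let Some(fg) = style.fg { codes.push(format!("38;5;{}", fg)); }
        if let Some(bg) = style.bg { codes.push(format!("48;5;{}", bg)); }
        if codes.is_empty() {
            painted.push_str(text);
        } else {
            painted.push_str(&format!("\x1b[{}m{}\x1b[0m", codes.join(";"), text));
        }
    }
    painted
}

fn main() {
    let plain = Style { fg: None, bg: None, use_syntax_fg: false };
    let keyword = Style { fg: Some(4), ..plain };
    let number = Style { fg: Some(3), ..plain };
    let plus = Style { fg: None, bg: Some(22), use_syntax_fg: true };
    let plus_emph = Style { bg: Some(28), ..plus };

    let line = "let x = 1;";
    let syntax = [(keyword, "let"), (plain, " x = "), (number, "1"), (plain, ";")];
    let diff = [(plus, "let x = "), (plus_emph, "1"), (plus, ";")];
    println!("{}", paint(&superimpose(line, &syntax, &diff)));
}
```

In this sketch the output has a section boundary wherever either input has one, so the keyword keeps its syntax color while the changed literal receives the emphasized background.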

Within-line diff algorithm

There is currently only one within-line diff algorithm implemented. For a given line, it considers all possible pairings and, for each one, computes the minimum number of edit operations between the candidate pair. The inferred pairing is the one with the smallest edit distance. (The number of comparisons is constrained by the possible interleavings, and furthermore a greedy heuristic is used, so that the number of comparisons is not quadratic.)
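
The idea of pairing lines by edit distance can be sketched as below. This is a hypothetical simplification and not delta's algorithm: edit_distance is a plain character-level Levenshtein distance, infer_pairs and its max_relative_distance threshold are invented for illustration, and the greedy rule used here (commit to the best remaining plus line for each minus line, and never look back past it) is only one way to keep pairings from crossing.

```rust
/// Levenshtein distance between two strings, counted in chars.
fn edit_distance(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Pair each minus line with at most one plus line.
/// Returns (minus index, plus index) pairs in increasing order.
fn infer_pairs(minus: &[&str], plus: &[&str], max_relative_distance: f64) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    let mut next_plus = 0; // plus lines before this index are no longer candidates
    for (i, m) in minus.iter().enumerate() {
        // Best remaining candidate; ties go to the earliest plus line.
        let best = (next_plus..plus.len())
            .map(|j| (edit_distance(m, plus[j]), j))
            .min();
        if let Some((distance, j)) = best {
            let longest = m.chars().count().max(plus[j].chars().count()).max(1);
            if distance as f64 / longest as f64 <= max_relative_distance {
                pairs.push((i, j));
                next_plus = j + 1; // greedy: pairings never cross
            }
        }
    }
    pairs
}

fn main() {
    let minus = ["let x = 1;", "foo();"];
    let plus = ["let x = 2;", "bar(baz);", "foo(1);"];
    for (i, j) in infer_pairs(&minus, &plus, 0.5) {
        println!("{:?} <-> {:?}", minus[i], plus[j]);
    }
}
```

Lines left unpaired (here, "bar(baz);") would simply be treated as wholly removed or wholly added.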

Features

Delta features such as line-numbers, side-by-side, diff-so-fancy, etc. can be considered to consist of (a) some feature-specific implementation code, and (b) a collection of key-value pairs specifying the values that certain delta options should take if that feature is enabled. Accordingly, each such "feature" is implemented by a separate module under src/features/. Each of these modules must export a function named make_feature whose job is to return key-value pairs for updating the user options.
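
The "feature as a bundle of option values" idea can be sketched as follows. Everything here is illustrative: the example_feature module, the string-to-string option map, the option values, and the apply_feature helper are hypothetical, and the real make_feature signature may differ. The sketch also assumes that values the user set explicitly take precedence over values supplied by a feature.

```rust
use std::collections::HashMap;

type Options = HashMap<String, String>;

/// Hypothetical feature module; real features live under src/features/.
mod example_feature {
    /// Key-value pairs that this feature would like to set.
    /// The values are made up for illustration.
    pub fn make_feature() -> Vec<(&'static str, &'static str)> {
        vec![
            ("minus-emph-style", "bold red"),
            ("plus-emph-style", "bold green"),
        ]
    }
}

/// Merge a feature's pairs into the user's options.
/// Assumption of this sketch: explicit user settings win over the feature.
fn apply_feature(user_options: &Options, feature: Vec<(&str, &str)>) -> Options {
    let mut merged = user_options.clone();
    for (key, value) in feature {
        merged
            .entry(key.to_string())
            .or_insert_with(|| value.to_string());
    }
    merged
}

fn main() {
    let mut user = Options::new();
    user.insert("plus-emph-style".to_string(), "syntax \"#006000\"".to_string());

    let merged = apply_feature(&user, example_feature::make_feature());
    let mut keys: Vec<_> = merged.keys().collect();
    keys.sort();
    for key in keys {
        println!("{} = {}", key, merged[key]);
    }
}
```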

Common terms used in the code

minus           a removed line in a diff hunk (i.e. the lines starting with -)
zero            an unchanged line in a diff hunk
plus            an added line in a diff hunk (i.e. the lines starting with +)
style           a struct specifying foreground colors, background colors, and other attributes such as boldness, derived from ANSI sequences
style_sections  an array of (style, section) tuples
paint           to take a string without ANSI color sequences and return a new one with ANSI color sequences
hunk            a diff hunk
subhunk         a consecutive sequence of minus and/or plus lines, without any zero line