Eddie Aftandilian edited this page Apr 19, 2019 · 1 revision

FAQ

Principles and goals

So formatter output is considered valid Google Style by definition?

Technically, no. It is merely a tool, which makes a best effort to follow the style guide. That said, there are no known bugs where formatting code will introduce a style violation. If you see one, please report it!

And of course, many style rules concern issues the formatter has nothing to do with, such as naming.

I just need to configure it a bit differently. How?

google-java-format exposes extremely few configuration options that govern formatting behavior. Most options are used only to switch entire behaviors on or off. (The primary exception is --aosp, which allows the tool to be used with AOSP code, but inside Google this option is not surfaced through any of the various integrations.)

The explicit goals of this tool are to bring consistency to the codebase, and to free developers from arguments over style choices. It is an explicit non-goal to support developers' individual preferences, and in fact this would work directly against our primary goals.

Also, the lack of configuration enables us to deliver a much more thoroughly tested, high-quality tool. (As the number of configurable parameters rises, the complexity and difficulty of testing rise exponentially.)

Finally, perhaps your preferred formatting is not merely a matter of taste, but one that just may be objectively superior to google-java-format's! In that case, please see Reporting issues. Maybe we can all benefit!

Why didn't it leave my perfectly valid code alone?

There are at least two kinds of source code formatter one could build. One kind finds violations and fixes them (an auto-correcting linter). The other kind ignores the existing formatting choices in the file, seeing only a stream of tokens, and chooses a formatting for those tokens following a consistent set of rules.

google-java-format (like clang-format before it) is of the second kind (with various exceptions). Its goal is not just to correct mistakes; it is to free developers from having to make formatting decisions in the first place, bringing greater consistency to your codebase. Every time the formatter decides to preserve an existing formatting choice, it works directly against that goal.
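
As a toy illustration of the second kind of formatter, consider the sketch below. It assumes a deliberately crude lexer (the TokenStreamSketch class and its regular expression are hypothetical and are not how google-java-format is implemented). Because the output is computed from the tokens alone, two inputs that differ only in whitespace come out identical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenStreamSketch {
    // Toy lexer (an assumption for illustration): a token is either a run of
    // word characters or a single non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Toy "formatter": the existing whitespace is discarded, so the output
    // depends only on the token stream.
    static String format(String source) {
        return String.join(" ", tokens(source));
    }

    public static void main(String[] args) {
        String a = "int   x=1 ;";
        String b = "int\n\tx = 1;";
        System.out.println(format(a));
        System.out.println(format(a).equals(format(b)));
    }
}
```

Running main prints `int x = 1 ;` followed by `true`. The spacing rule here is far too simple to be useful; the point is only that neither input's original layout influences the result.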

Users have also found it to be very liberating to not have to care what initial formatting they choose. Said one early adopter joyfully, "I just write code like a four-year-old doing finger paints!"

We have no plans to build the other kind of formatter.

But in this case, shouldn't it preserve my existing choice?

There are exceptions. For example, the formatter preserves your choices of interior blank lines inside a method implementation. Before opting to preserve an existing formatting choice, we check whether all three of these criteria are met:

  • The formatting choice has an important effect on readability.
  • The choice depends on the nature of the code in question, not just on varying personal preferences.
  • We consider it infeasible for the formatter to figure out an acceptable choice on its own.

Notice, for example, that these criteria are clearly met in the case of interior blank lines, and that's why the formatter preserves them.

Why did the formatter change X to Y?

The important thing to understand is that nearly every aspect of how the existing code was formatted is intentionally ignored. So, trying to understand its behavior in terms of why it chose certain changes will lead to confusion and difficulty communicating. For the most part, it sees only tokens, and it makes formatting decisions in total unawareness of the previous state.

What if I don't like what it did?

So you run the formatter, and you read the code it produced, and you spot something you strongly dislike. Yes, this will happen. No matter how much we improve the formatter, it will never be as smart or tasteful as a human like you. When this happens, please report an issue if you suspect the formatter could have made a better global decision.

But then, do you grit your teeth and keep google-java-format's "bad" formatting in your current change? Or hand-edit your code to "fix" the problem?

You're absolutely entitled to hand-fix your code, but there are costs to doing so. You won't want to run the formatter again for the current CL, lest your changes get changed back, so you'll want to delay making those layout changes until your code has stabilized. And later, the next time those same lines of code are edited, the formatter may undo those formatting changes again. It's up to the author or reviewer to decide whether to manually carry forward the old style or not.

And of course, the decision to hand-format the code isn't a one-time decision that sticks; it may recur over and over.

Why can't I use magic comments to disable formatting for certain sections of code?

We realize this situation will occasionally be annoying, but is it annoying enough, often enough, to justify peppering our codebase with special // format:off comments?

So far, it seems the answer is no. The annoyance you feel is real, but it's usually transient, while those formatting comments would persist, visibly, in the code for everyone to see forever. They would reduce the signal-to-noise ratio and hurt readability, which is counter to the project goals.

Formatting can be disabled in javadoc comments using the <pre> tag.
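
For instance, a hand-aligned table can be kept inside <pre>. The class and method below are made up for the example; per the statement above, the lines between <pre> and </pre> are expected to be left as written.

```java
final class PreTagExample {
    /**
     * Returns the sign of {@code x}.
     *
     * <pre>
     *   input      result
     *   negative   -1
     *   zero        0
     *   positive    1
     * </pre>
     */
    static int sign(int x) {
        return Integer.signum(x);
    }

    public static void main(String[] args) {
        System.out.println(sign(-7));
    }
}
```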

Troubleshooting

It actually introduced a 100-column violation!

Please report an issue.

I ran it in change-only-modified-lines mode, but it reformatted lines I never touched.

The formatter can't reliably format arbitrary fragments of code in isolation: if nearby untouched code wasn't already formatted exactly as the formatter would have formatted it, the result would be mangled. To avoid this, it expands each range it is given until the range covers a complete region of code that it knows how to format.

This means that it typically reformats entire statements, as well as method and class signatures.
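
The idea of expanding a range can be sketched roughly as follows. This is a line-based stand-in, not the formatter's actual logic: the endsRegion heuristic (a line ending in `;`, `{`, or `}`) and the RangeExpansionSketch class are assumptions made for illustration, whereas a real formatter would work from the syntax tree.

```java
import java.util.Arrays;
import java.util.List;

final class RangeExpansionSketch {
    // Crude stand-in for "a complete region": a line that ends a statement or block.
    static boolean endsRegion(String line) {
        String trimmed = line.trim();
        return trimmed.endsWith(";") || trimmed.endsWith("{") || trimmed.endsWith("}");
    }

    // Expands the inclusive, zero-based line range [first, last] until it
    // covers whole statements. Returns {first, last}.
    static int[] expand(List<String> lines, int first, int last) {
        while (first > 0 && !endsRegion(lines.get(first - 1))) {
            first--;
        }
        while (last < lines.size() - 1 && !endsRegion(lines.get(last))) {
            last++;
        }
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        List<String> lines =
            Arrays.asList(
                "int a = 1;",
                "int b = foo(",
                "    a,",
                "    2);",
                "int c = 3;");
        // Only line 2 ("    a,") was modified.
        System.out.println(Arrays.toString(expand(lines, 2, 2)));
    }
}
```

Here a change to the single line `a,` grows to `[1, 3]`, the whole `int b = foo(...)` statement, while the neighboring statements stay out of the range.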

If you're seeing far more modifications than necessary, please report an issue.

Formatting choices

On what principles does it base its formatting decisions?

In rough priority order (most to least important):

  1. The physical layout of code should reflect the syntactic structure of the code, making the code easier to understand. See the style guide and the Rectangle Rule.
  2. Future code changes should have a small "blast radius"; a future change to code on one line should ideally not force formatting changes to surrounding lines. (For example, this explains why it doesn't use horizontal alignment.)
  3. Stylistic choices should be consistent with what most Google source code is already doing.
  4. Stylistic choices should be consistent with other languages at Google (particularly JavaScript and C++).
  5. Vertical space should be conserved. (While this is last in the list, it definitely still matters when all else is nearly equal.)

Why are its layout rules so hard and fast? Why doesn't it consider lots more options, then just pick the "best" one?

Internally, google-java-format uses a deterministic layout engine that runs in linear time. Some formatters take more time, so that they can, for instance, generate all possible layouts for some given statement and pick the “best” one (e.g., the one with the fewest lines, then the one with the lowest standard deviation of the line lengths). Unfortunately, because it runs in linear time, google-java-format cannot and thus does not do this. It might be a nice option, but it would require much more work.

Of course, even if google-java-format considered lots more options, we would still need a mechanical rule to determine which one is "best." There are fewer magical panaceas than one might like.
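
To make the "mechanical rule" point concrete, here is a minimal sketch of the kind of scoring such a formatter might use: fewest lines first, then lowest standard deviation of line lengths. This is not something google-java-format does, and the BestLayoutSketch class and its candidate strings are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class BestLayoutSketch {
    static int lineCount(String layout) {
        return layout.split("\n", -1).length;
    }

    // Population standard deviation of the line lengths of one layout.
    static double lengthStdDev(String layout) {
        String[] lines = layout.split("\n", -1);
        double mean = Arrays.stream(lines).mapToInt(String::length).average().orElse(0);
        double variance =
            Arrays.stream(lines)
                .mapToDouble(line -> (line.length() - mean) * (line.length() - mean))
                .average()
                .orElse(0);
        return Math.sqrt(variance);
    }

    // Picks the candidate with the fewest lines, breaking ties by the most
    // even line lengths.
    static String pickBest(List<String> candidates) {
        return candidates.stream()
            .min(
                Comparator.<String>comparingInt(BestLayoutSketch::lineCount)
                    .thenComparingDouble(BestLayoutSketch::lengthStdDev))
            .orElseThrow(() -> new IllegalArgumentException("no candidate layouts"));
    }

    public static void main(String[] args) {
        List<String> candidates =
            Arrays.asList(
                "foo(\n    a,\n    b,\n    c,\n    d)",
                "foo(a,\n    b, c, d)",
                "foo(a, b,\n    c, d)");
        System.out.println(pickBest(candidates));
    }
}
```

Even in this tiny case the winner is decided entirely by the scoring rule, not by any notion of taste, and generating the candidates in the first place is the expensive part that is left out here.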

Why did it choose such an elaborate, overly nested layout when it could have fit it in 3 lines?

As covered above, the top priority for the formatter's formatting decisions is to make the physical code structure reflect the logical code structure. An interesting corollary is that when the logical code is highly complicated, the physical layout will appear highly complicated as well! This is Working As Intended.

Our advice is usually to extract some temporary variables and helper methods!
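
A small, invented example of that advice (the method names and the arithmetic are placeholders): both methods compute the same value, but the second gives the formatter several shallow statements instead of one deep expression.

```java
final class ExtractionSketch {
    // One deeply nested expression: any layout that mirrors its structure is
    // necessarily nested too.
    static double nested(double a, double b, double c) {
        return Math.max(
            Math.abs((a + b) / c), Math.min(Math.sqrt(a * a + b * b), Math.hypot(b, c)));
    }

    // The same computation with named intermediate values: each statement is shallow.
    static double extracted(double a, double b, double c) {
        double ratio = Math.abs((a + b) / c);
        double norm = Math.sqrt(a * a + b * b);
        double cappedNorm = Math.min(norm, Math.hypot(b, c));
        return Math.max(ratio, cappedNorm);
    }

    public static void main(String[] args) {
        System.out.println(nested(3, 4, 2) == extracted(3, 4, 2));
    }
}
```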

Why doesn't it format according to what the code means, not just what it says?

google-java-format already tries to do this, in some cases, when possible.

For example, without special treatment, the input expression 1 - 2 + 3 - 4 + 5 would be grouped like <<<<1 ^- 2> ^+ 3> ^- 4> ^+ 5> (in an internal markup language where < and > mark groupings and ^ marks optional breaks). This would produce surprising layouts like:

1 - 2 + 3
        - 4
    + 5

This follows from the break-from-the-top rule. To avoid showing the user quite so much left-associative structure, google-java-format regroups this expression as <1 ^- 2 ^+ 3 ^- 4 ^+ 5>, letting these breaks break independently at the same level, as opposed to breaking from the top. This allows the improved layout:

1 - 2 + 3
    - 4 + 5

(As shown here, google-java-format will even merge different binary operators if they have the same precedence. Although one might worry that that last layout could be misread as meaning … - (4 + 5), if one were in the mood to worry so, this is not a problem in practice.)
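
The effect of letting same-level breaks break independently can be approximated with a greedy fill. The sketch below is an assumption about the general technique, not the formatter's real code; the FillSketch class, the 12-column limit, and the 4-space continuation indent are chosen only so the output matches the layout above.

```java
import java.util.Arrays;
import java.util.List;

final class FillSketch {
    // Places pieces on the current line while they fit; otherwise starts a
    // new line at the continuation indent. Each break is decided on its own.
    static String fill(List<String> pieces, int maxWidth, int continuationIndent) {
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (int i = 0; i < pieces.size(); i++) {
            String piece = pieces.get(i);
            if (i == 0) {
                out.append(piece);
                column = piece.length();
            } else if (column + 1 + piece.length() <= maxWidth) {
                out.append(' ').append(piece);
                column += 1 + piece.length();
            } else {
                out.append('\n');
                for (int s = 0; s < continuationIndent; s++) {
                    out.append(' ');
                }
                out.append(piece);
                column = continuationIndent + piece.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // <1 ^- 2 ^+ 3 ^- 4 ^+ 5>: one group, four optional breaks at the same level.
        List<String> pieces = Arrays.asList("1", "- 2", "+ 3", "- 4", "+ 5");
        System.out.println(fill(pieces, 12, 4));
    }
}
```

With a 12-column limit this prints `1 - 2 + 3` on the first line and `    - 4 + 5` on the second, because only the break before `- 4` needed to be taken.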

This special treatment is an improvement, and it would be great if we could extend this sort of improvement elsewhere. For example, since Sets.union is associative, we might wish to lay out Sets.union(s1, Sets.union(s2, Sets.union(s3, Sets.union(s4, s5)))) to better reflect its associativity. Unfortunately, google-java-format is not a compiler.

What do you mean, it's not a compiler?

For methods like Sets.union, above, it would be great if we could annotate our method definitions with rules on how their calls should be formatted. Associativity is one example; similarly, perhaps certain chained calls should be broken in special ways, and so on.

Unfortunately, google-java-format is not a compiler. When it formats a file, it doesn't resolve and read its imports and parse them and remember them. Doing this, and doing it well, would need much more work.

google-java-format does implement a few special formatting rules for method calls that depend only on the syntax at the call. For example, it can guess that a method call contains pairs of arguments and lay out each pair together, but such a rule has to be weak to avoid false positives. Also, in specific cases (currently just for calls that look like logging methods), its formatting depends on the names of the methods.

Why won't it remove extraneous semicolons or insert braces around while statement bodies?

We still hope it might in the future.

Why are some indentations so weird?

We do our best, but unusual cases sometimes arise.

As described above, the layout of the code follows its structure. Let's look at one example in detail.

In its internal markup language (as shown above), google-java-format notes what indentations should occur when breaks are broken. The statement currentEstimate = (currentEstimate + argument / currentEstimate) / 2.0f; is internally marked up as <+4currentEstimate = ^<+4(currentEstimate ^+ <+4argument ^/ currentEstimate>) ^/ 2.0f>>; . Here, breaks in each grouping are indented +4 spaces from their enclosing groupings, allowing layouts like:

currentEstimate = (currentEstimate + argument / currentEstimate) / 2.0f;

or:

currentEstimate =
    (currentEstimate + argument / currentEstimate) / 2.0f;

or:

currentEstimate =
    (currentEstimate + argument / currentEstimate)
        / 2.0f;

or:

currentEstimate =
    (currentEstimate
            + argument / currentEstimate)
        / 2.0f;

or:

currentEstimate =
    (currentEstimate
            + argument
                / currentEstimate)
        / 2.0f;

These layouts reflect the structure of the code, according to the Rectangle Rule. They may not always be what you would have generated by hand, but we believe they are readable and predictable.
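
The way nested groups and cumulative indents lead to these layouts can be sketched with a very small layout engine. Everything here is a simplification made for illustration: the Doc class, the rule that a group breaks all of its breaks when it does not fit, and in particular the extra group around the parenthesized expression are assumptions, not a description of google-java-format's internals.

```java
import java.util.Arrays;
import java.util.List;

final class BreakFromTheTopSketch {
    // A text leaf, an optional break, or a group of nested docs.
    static final class Doc {
        final String text;     // non-null only for text leaves
        final int plusIndent;  // extra indent a group adds to everything inside it
        final List<Doc> items; // non-null only for groups

        private Doc(String text, int plusIndent, List<Doc> items) {
            this.text = text;
            this.plusIndent = plusIndent;
            this.items = items;
        }

        static Doc text(String s) { return new Doc(s, 0, null); }
        static Doc brk() { return new Doc(null, 0, null); }
        static Doc group(int plusIndent, Doc... items) {
            return new Doc(null, plusIndent, Arrays.asList(items));
        }
        boolean isBreak() { return text == null && items == null; }
    }

    static int flatWidth(Doc d) {
        if (d.text != null) return d.text.length();
        if (d.isBreak()) return 1; // an unbroken break prints as one space
        int width = 0;
        for (Doc item : d.items) width += flatWidth(item);
        return width;
    }

    static String layout(Doc root, int maxWidth) {
        StringBuilder out = new StringBuilder();
        print(root, 0, true, maxWidth, out, new int[] {0});
        return out.toString();
    }

    private static void print(
            Doc d, int indent, boolean enclosingBroken, int maxWidth, StringBuilder out, int[] column) {
        if (d.text != null) {
            out.append(d.text);
            column[0] += d.text.length();
        } else if (d.isBreak()) {
            if (enclosingBroken) {
                out.append('\n');
                for (int i = 0; i < indent; i++) out.append(' ');
                column[0] = indent;
            } else {
                out.append(' ');
                column[0]++;
            }
        } else {
            // A group breaks only if its parent broke and it does not fit on the rest of the line.
            boolean broken = enclosingBroken && column[0] + flatWidth(d) > maxWidth;
            for (Doc item : d.items) {
                print(item, indent + d.plusIndent, broken, maxWidth, out, column);
            }
        }
    }

    // The statement from the text, with an assumed extra group for the parentheses.
    static Doc example() {
        Doc inner = Doc.group(4, Doc.text("argument"), Doc.brk(), Doc.text("/ currentEstimate"));
        Doc parens =
            Doc.group(4, Doc.text("(currentEstimate"), Doc.brk(), Doc.text("+ "), inner, Doc.text(")"));
        Doc division = Doc.group(4, parens, Doc.brk(), Doc.text("/ 2.0f;"));
        return Doc.group(4, Doc.text("currentEstimate ="), Doc.brk(), division);
    }

    public static void main(String[] args) {
        for (int width : new int[] {80, 60, 55, 45, 35}) {
            System.out.println(layout(example(), width));
            System.out.println();
        }
    }
}
```

Under these assumptions, the widths 80, 60, 55, 45, and 35 reproduce the five layouts shown above in order: each narrower limit forces one more group to break, and each broken group indents its breaks 4 spaces deeper than the group enclosing it.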

This internal layout language is simple but powerful. We test the layout rules (which map from the parse tree to the layout language) against special test cases, and against google3, and tweak them when they produce unexpected results. (If you find an unexpected result, you can file a bug.)

Some cases are especially hard to test, like breaks following line comments. Consider the input:

currentEstimate = ( // This is a line comment.
    currentEstimate + argument / currentEstimate) / 2.0f;

google-java-format currently lays it out as:

currentEstimate =
    ( // This is a line comment.
        currentEstimate + argument / currentEstimate)
        / 2.0f;

This indentation is a little ugly, but the alternatives are ugly too. We haven't seen enough unusual forced breaks like this to form a good idea of how best to lay them out.

Why does it reformat more of the input file than it has to?

When google-java-format is invoked, it is optionally told which lines of its input file have been modified and should be reformatted. It first reformats the whole file, then merges the input and that intermediate output so that only the modified lines have been reformatted, plus a possibly non-empty “blast radius” on each. The merging step combines whole lines from the input and the intermediate whole-file output, such that the token stream in the merged output matches the input.

Once the blast radii have been chosen, we still must adjust indentation. If the input contains the lines:

> Function<T1, T2> function = new Function<>() { T2 apply(T1 x) {
>   otherFunction(
      x); // Long comment.
    }
  }

where only the marked lines have been modified, the whole-file output might be:

> Function<T1, T2> function =
>     new Function<>() {
>       T2 apply(T1 x) {
>           otherFunction(
                x); // Long comment.
        }
      }

(where the markings have been carried over). Merging these two blindly would give:

> Function<T1, T2> function =
>     new Function<>() {
>       T2 apply(T1 x) {
>           otherFunction(
      x); // Long comment.
    }
  }

This result is far from perfect. We should really adjust the indentation further, but it's not always clear how to. (Worse, we believe that implementing optimal reindentation might require multiple reformatting passes.)

To avoid hard problems, google-java-format sometimes extends the blast radii a little too far, causing too many lines to be reformatted. We imagine it might be possible to avoid the worst misindentations by extending the blast radii more intelligently, but how best to do this is an open question.

Is there anything else funny about google-java-format?

You bet!

For example, and as you know, the maximum line length for Java code in Google Style is 100 columns. We briefly considered setting the maximum trimmed line length (ignoring leading and trailing spaces) to be somewhat less, since we've heard that very long lines can be hard to read. We rejected this formatting option because, we imagined, it would surprise and annoy many users if the formatter avoided using the full 100 characters of width to which they might feel entitled.

Not annoying users is a great goal, but a tricky one. In several cases, we've had to balance abstract improvements against not annoying too many users. For example, the code fragment:

public Customer createCustomerLink(
    Long externalEntityId,
    String externalEntityIdStr,
    @Required EntityNamespaceSubtype initiator,
    @Required EntityNamespaceSubtype externalEntitySubtype,
    Customer.ChangeEvent creationEvent,
    Customer.ChangeEvent lastModificationEvent)
    throws ApiException;

was once laid out as:

public Customer createCustomerLink(
      Long externalEntityId,
      String externalEntityIdStr,
      @Required EntityNamespaceSubtype initiator,
      @Required EntityNamespaceSubtype externalEntitySubtype,
      Customer.ChangeEvent creationEvent,
      Customer.ChangeEvent lastModificationEvent)
    throws ApiException;

The earlier layout visibly separates the formals from the throws clause, but in the end we decided that the possible annoyance outweighed the possible benefit.

google-java-format always follows the rectangle rule, right? No exceptions?

Well, not quite. In a small number of cases we found that rigid adherence to the rectangle rule produced results that were surprising and unpleasant. So there are a few special cases.

  • Methods shaped like Mockito.when are formatted as:

    when(remoteApi.findOrCreate(
            FOO_METADATA,
            Optional.<TheProto>absent(),
            AssignReserved.YES,
            AttachData.YES))
        .thenReturn(OPERATION);

    Using indentation to distinguish syntactic levels and always breaking from the top would produce:

    when(
            remoteApi.findOrCreate(
                FOO_METADATA,
                Optional.<TheProto>absent(),
                AssignReserved.YES,
                AttachData.YES))
        .thenReturn(OPERATION);

Javadoc formatting choices

Why is it changing my two spaces after a period to one?

If there's one thing we can all agree on, it's that we will never all agree on the great one-or-two-space debate. There's no consensus. We believe neither choice is inherently right or wrong.

However, a formatter doesn't have the option of being neutral: when it rewraps a paragraph, such that a period that was formerly at the end of the line is now in the middle, it must choose a number of spaces to put after it. And because it doesn't know whether the period ends a sentence or not, one space is the only reasonable choice. Using one space there while leaving two spaces elsewhere would increase inconsistency, so we decided to standardize to one space between tokens in all cases throughout your documentation.
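
The rewrapping problem can be seen in a small sketch. It assumes a plain greedy word wrap over whitespace-separated words (the RewrapSketch class is hypothetical, and real Javadoc formatting handles tags and markup that this ignores). Once the words are split out, the original line breaks and space counts are gone, so a single space is what gets written back.

```java
final class RewrapSketch {
    // Splits the paragraph into words and refills lines up to maxWidth,
    // writing exactly one space between words on the same line.
    static String rewrap(String paragraph, int maxWidth) {
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (String word : paragraph.trim().split("\\s+")) {
            if (column == 0) {
                out.append(word);
                column = word.length();
            } else if (column + 1 + word.length() <= maxWidth) {
                out.append(' ').append(word);
                column += 1 + word.length();
            } else {
                out.append('\n').append(word);
                column = word.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A period at the end of a line, and a period followed by two spaces.
        System.out.println(rewrap("One.  Two words.\nThree.", 80));
    }
}
```

This prints `One. Two words. Three.`: the period that used to end a line and the period that was followed by two spaces are treated the same way.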

Why does the formatter create one-liner Javadoc?

In other words, why does the formatter rewrite the following...

/**
 * Tests for {@link Foo}.
 */

...to the following...?

/** Tests for {@link Foo}. */

First, recall that the formatter ignores most existing formatting. Given that policy, we have two choices: Standardize to one line, or standardize to three.

One-liners are shorter. But converting between one-liners and three-liners can be tedious, so some users prefer to stick with three. The formatter can give us the best of both worlds: It converts between one and three lines automatically, reformatting as necessary.
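
A rough sketch of such an automatic conversion follows. The rule used here, emit a one-liner whenever it fits within the column limit and otherwise fall back to three lines, is an assumption made for illustration, as is the JavadocShapeSketch class; it also does not wrap long text.

```java
final class JavadocShapeSketch {
    // Renders a single-sentence Javadoc comment at the given indent, as one
    // line if it fits within maxWidth and as three lines otherwise.
    static String render(String text, int indent, int maxWidth) {
        StringBuilder padBuilder = new StringBuilder();
        for (int i = 0; i < indent; i++) {
            padBuilder.append(' ');
        }
        String pad = padBuilder.toString();
        String oneLine = pad + "/** " + text + " */";
        if (oneLine.length() <= maxWidth) {
            return oneLine;
        }
        return pad + "/**\n" + pad + " * " + text + "\n" + pad + " */";
    }

    public static void main(String[] args) {
        System.out.println(render("Tests for {@link Foo}.", 0, 100));
        System.out.println(render("Tests for {@link Foo}.", 0, 20));
    }
}
```

The first call yields the one-liner shown above. The second, with an artificially small limit of 20 columns, yields the three-line form.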

Why doesn't the formatter touch <pre> or <table>?

The formatter does not touch <pre> sections because, in general, <pre> means something like "display this exactly as I have formatted it."

Of course, Javadoc <pre> sections often contain sample code. We may someday format such code automatically.

The formatter also does not touch <table> sections. It might someday, but for now, this seemed too difficult.

Why does the formatter break inside {@code} and {@link}?

While such breaks are not universally beloved, they are legal, commonly used, and occasionally necessary to stay under 100 columns.

How do I report a bug?

File bugs here.

It's nice to peruse this FAQ and the known issues first, but we don't yell at people for filing duplicates. Closing out lots of dups gives us a pleasant illusion of accomplishment.

If possible, please provide the specific code in question (ideally a link to the entire file). Please do this in a comment even if you found your bug filed already; more test cases are often useful.

Be aware: there will always be situations where you as a human being could make a better formatting choice than a robot. These don't all make for good issue reports. Sometimes there is simply no reasonable way we can expect a robot formatter, which has to serve the entire codebase, to have known what to do in your particular case. But when in doubt, file it!

Background info

Why is it named that?

What distinguishes it from other formatters is that it produces code in Google Style.
