Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #9116: Error source highlighting in presence of tabs #9118

Open
wants to merge 9 commits into
base: trunk
Choose a base branch
from

Conversation

Armael
Copy link
Member

@Armael Armael commented Nov 13, 2019

This PR fixes error messages with source highlight in batch mode in presence of tabulations, that is, messages of the form:

File ..., line ..., characters ...:
23 | <TAB><TAB>...error...
                  ^^^^^

Before the PR, the ^ characters are aligned using spaces. After the PR, the spikes (^) are aligned using tabs whenever a tab is used in the source code (and spaces otherwise). (I added a test in a first commit, then fixed the test in the next commit)

Note that this PR does not addresses situations where tabs are used in the middle of the error (how many spikes for each tab?) or multi-line errors. It's not so clear what to do in those cases, but fortunately I don't think they would come up too much in practice.

Fixes #9116 .

@Armael
Copy link
Member Author

Armael commented Nov 13, 2019

Actually, after discussing with @Octachron it might not be too hard to address multi-line errors as well. I'll add a commit for that.

parsing/location.ml Outdated Show resolved Hide resolved
parsing/location.ml Outdated Show resolved Hide resolved
@Armael
Copy link
Member Author

Armael commented Nov 17, 2019

I just added three commits that propose to address the issue for multiline errors.

Before the PR, a multiline error is displayed as follows. If the source code is of the form:

   foobar error
     error error foobar

where "foobar" is not part of the error, then the error location is displayed as:

..........error
     error error.......

i.e. every character that is not part of the error but shares a line with it is replaced by a dot. This is problematic in presence of tabs, since replacing a tab with a dot will break the alignment.

To fix the issue, I propose to slightly change the way multiline errors are displayed, and only replace with a dot non-space characters outside the error. On the source code above, the new format shows:

   ...... error
     error error ......

If alignment is performed using tabs, then alignment is now preserved.

Again, the PR is best reviewed commit by commit.

@gasche
Copy link
Member

gasche commented Nov 17, 2019

Reading your description of the approach, I thought it was a good idea, but in fact I'm not fond of the look of testsuite examples with irregular dot placement:

Lines 3-4, characters 4-10:
3 |   . Int (x : a)
4 |   | Bool x .. .

Could we consider instead just displaying spaces for characters other than tab? This would give the following

Lines 3-4, characters 4-10:
3 |     Int (x : a)
4 |   | Bool x     

(The simplest implementation implies having trailing spaces on the last line, which I think is perfectly fine.)

Another approach would be to use fixed ellipsis markers (eg. [..] or ...) right before and after the location, but this sounds trickier from an alignment point of view. One possible approach would be to show the ellipsis markers, on either side, only when there is space to do so (when the distance between the line begin/end and the location is larger than the marker size).

@gasche
Copy link
Member

gasche commented Nov 17, 2019

(In fact one should consider the distance between the location and the closest tab character (excluded), to avoid precisely this alignment issue.)

@Armael
Copy link
Member Author

Armael commented Nov 17, 2019

Right, I wasn't fond of the irregular dot placement myself, and I like both ideas of replacing all dots by spaces and inserting ellipsis markers. I'll see whether the second option is not too cumbersome to implement; and if it is, I'll go with the first one.

@gasche
Copy link
Member

gasche commented Nov 17, 2019

In this example, spaces look as follows

Lines 3-4, characters 4-10:
3 |     Int (x : a)
4 |   | Bool x     

and the ellipsis markers would look as follows

Lines 3-4, characters 4-10:
3 | ... Int (x : a)
4 |   | Bool x ...

Lines 3-4, characters 4-10:
3 |[..] Int (x : a)
4 |   | Bool x [..]

It's clear to me that [..] looks bad, but I don't have a strong preference between spaces and ....

@Armael
Copy link
Member Author

Armael commented Nov 17, 2019

I just pushed a first implementation of the "..." approach (in separate commits that we can rebase later if we decide to keep them).

@gasche
Copy link
Member

gasche commented Nov 17, 2019

This looks nice; nitpick: we should print ellipses if there are non-whitespace characters before (or after) on the line.

@gasche
Copy link
Member

gasche commented Nov 18, 2019

After a whiteboard discussion with Armaël, we propose the following approach:

  • for each line, there may be several intervals of locations we want to signal; we consider the set of non-interesting locations (one, possibly empty, before the first location; one, possibly empty, at the end; zero or more in the middle); let's call those "elided zones"
  • we want to keep the tabulation at the exact same column as in the original line, without any assumption on their visual size
  • elided zone with only whitespace should remain as only whitespace
  • for the first elided zone, before the location, we want to print a right-justified ellipsis -- only if the zone contains non-whitespace; this is possible if the distance between the last tabulation and the end of the zone allows it
  • for the last elided zone, after the location, we want a left-justified ellipsis (same reasoning)
  • for the elided zones in the middle, we want to print both a left- and a right-justified ellipsis using the same reasoning. If the space does not allow it (if the ellipsis would be consecutive or overlapping), we only print the left-justified one.

Using ... as the ellipsis marker, the odd cases are elided zones in the middle with size 8 or less (the ellipsis marker plus a space, doubled), and the display would look as foo ...    bar -- where foo, bar are interesting locations.

@kit-ty-kate
Copy link
Member

kit-ty-kate commented Aug 3, 2021

I encountered the issue that this PR is trying to fix, on OCaml 4.13 while browsing the build logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error messages with wrong alignment due to tab characters
4 participants