New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
small regex changes for better generic log support #2851
Conversation
I figured this would cause issues with any tests haha, let me know which changes make sense and I can fix the problems. |
Thank you for the PR @dkanada! Lots of good ideas in here. I'll go through your proposed changes one by one. I will only comment on whether these changes should be made not how they were/will be made. (Comments on the implementation will follow.) In general, please try to share examples if you can. There is no one log format, so I have no idea what the logs you try to highlight look like.
Accepted.
From my testing, Java stack traces looked ok but I haven't tested C# exceptions. Could you please provide a few stack traces (just all in one file) that don't look good?
Accepted.
Accepted.
Accepted.
File paths should be tokenized as such. IMO we should rather tokenize UUIDs inside file paths.
Strings are commonly used to store this kind of data. I don't really see why would want to not highlight strings. Do you have an example?
Yes, that is intentional. The problem is that some log formats use tabs or spaces to separate arguments. The separator between a path and an error message might be a single space. A trade-off between false positives and false negatives had to be made and I decided on false negatives. |
Regarding the string issues, I have no problems with matching strings at all. However, it seemed like the string matches were blocking any further matches for UUID or other more specific tokens. I don't personally want to highlight strings in my logs, but I definitely want to use a special color for IP addresses. I thought the UUID was overriding paths but I can't reproduce that one now so you can ignore that comment.
|
It actually might not be an issue with strings but simply the special token matching - perhaps quotes aren't getting counted as word boundaries. I feel like wrapping variables in quotes is a relatively common practice though, so this doesn't seem unreasonable. One more item I wanted to bring up was the properties, which match a trailing colon. Was that implemented to improve YAML and key-value output? Perhaps it would be better to match the colons as operators and remove them from the property rule with a non-capturing group. |
This is something I wanted to include in my first comment but forgot to. The solution to this is to highlight IP-addresses, UUIDs, etc inside strings. Prism has the
Good idea. Go for it! (But please use a lookahead.)
Yes, many logs contain key-value pairs using this pattern. But I actually never tested it on yaml files. |
I decided against the property colon change in the end. One last issue is that the hash rule is designed in a way that currently conflicts with the new GUID changes. How do you want that resolved? Also, it seems the build CI is failing since the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I expected to update that manually
Yes. Run npm run build
please.
the hash rule is designed in a way that currently conflicts with the new GUID changes
I haven't thought of that... This is definitely a problem but how are we going to resolve it?
The string 01234567890123456789012345678901
(32 hex digits) could be a hash (e.g. an MD5 hash) or a GUID without hyphens. What is it?
I say we drop unhyphenated GUIDs. UUIDs and GUIDs are supposed to be hyphenated anyway. Let's require the hyphens again.
I'm poking at the |
Yes, you need to duplicate the patterns you want inside strings. Luckily, there are variables :) In the case of log files, it should probably be done kinda like this: var uuid = { pattern: ..., ... }; // the UUID pattern
var url = /.../; // the URL pattern
Prism.languages.log = {
'string': {
// Single-quoted strings must not be confused with plain text. E.g. Can't isn't Susan's Chris' toy
pattern: /"(?:[^"\\\r\n]|\\.)*"|'(?![st] | \w)(?:[^'\\\r\n]|\\.)*'/,
greedy: true,
inside: {
'url': url,
'uuid': uuid
}
},
...,
'url': url,
...,
'uuid': uuid,
...
}; Please don't forget to wrap the whole thing into an IIFE or else we pollute the global namespace. |
After updating the tests for the exception rule, I noticed URLs and other tokens inside the exception were no longer getting matched, which might not be desirable. Since I have somewhat lost my spare time recently, I'd rather get the current changes merged than let the PR grow stale while I consider what to do about the inner matching. |
Perfectly understandable. Thank you for contributing! |
I encountered a few small issues while trying out the new generic log support.
Levels: Serilog uses three letter acronyms for their log levels
Exception: Many programming languages would benefit from highlighted exceptions (Java and C# for example)
UUID: Microsoft uses GUIDs which can be valid without dashes
Date: Serilog and Python output datetimes without a
T
character by defaultTime: RFC 3339 allows
+00:00
for the timezone portionThere were a few larger problems I encountered that I'd like to get some input on before changing the order and existence of the current rules.
file-path
rule. The logfile I was testing used strings to avoid path ambiguity (further exacerbating the string problem) but it doesn't seem like that rule accounts for this situation.