Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avisynth lang definition. #3071

Merged
merged 14 commits into from Sep 15, 2021
Merged

Avisynth lang definition. #3071

merged 14 commits into from Sep 15, 2021

Conversation

Zinfidel
Copy link
Contributor

@Zinfidel Zinfidel commented Sep 12, 2021

I will not be offended if you recoil in disgust from this PR when you see the corpus that has been programmed into it. AviSynth is a scripting language that acts as a video frame server, so that you can programmatically edit video.

Why are there that many words?

AviSynth poses an interesting challenge for syntax highlighting, as it can be argued that its built-in functions and filters represent the core of its functionality. The very goal of the language itself, which is to edit video, is achieved by these built-in filters. Indeed, many AviSynth scripts (maybe even most?) are just a small collection of its included filters, such as Trim, AddBorders, etc.

This is the reason why I decided early on to include the full list of internal functions/filters as highlighted elements in this definition. Without considering the internal functions/filters as part of the core language features, there is actually very little to highlight overall. Whether this choice was the right one, I do not truly know.

Questions to Answer

  • Internal functions/filters/properties use the builtin token, which in the default theme is the same color as strings. This seems... wrong, especially since many of these functions take string arguments. Is the builtin token really the wrong token for these? Perhaps it should be function instead?
  • I have also included highlighting for user-defined functions, and external filters using the function token. It may be the case that the internal functions/filters/properties should be using this token instead. If that is the case, though, then user-defined/external filters probably don't have an appropriate token and will remain unhighlighted.
  • I made a really contentious choice to highlight some predefined, special constants as they appear in strings. You can see a couple at the top of my linked github.io example. AviSynth has very few constants, and these string constants only ever appear in those exact functions to enable special (but common) functionality. This isn't a must have.
  • This definition becomes unusably slow for editing for large scripts. The QTGMC script takes nearly a full second to update for each entered character, for example.
  • The last variable is quite special in AviSynth, and I have assigned it the variable token. However, it can show up a LOT in scripts, and may contribute to a "color soup" problem as a result. Highlighting this word as the only variable type may be a mistake.

@github-actions
Copy link

github-actions bot commented Sep 12, 2021

JS File Size Changes (gzipped)

A total of 3 files have changed, with a combined diff of +2.64 KB (+56.6%).

file master pull size diff % diff
components/prism-avisynth.min.js 0 Bytes 2.62 KB +2.62 KB +100.0%
plugins/autoloader/prism-autoloader.min.js 2.25 KB 2.26 KB +7 B +0.3%
plugins/show-language/prism-show-language.min.js 2.42 KB 2.43 KB +14 B +0.6%

Generated by 🚫 dangerJS against b3d6a99

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this PR @Zinfidel!

I have a few suggestions, so please take a look at my comments.

I will not be offended if you recoil in disgust from this PR when you see the corpus that has been programmed into it.

lol. Don't worry, Avisynth isn't the worst by far.

  • Internal functions/filters/properties use the builtin token, which in the default theme is the same color as strings. This seems... wrong, especially since many of these functions take string arguments. Is the builtin token really the wrong token for these? Perhaps it should be function instead?

Honestly, I can't even tell you what builtin is supposed to be used for. Some languages use it for built-in functions, other built-in classes, other built-in constants, and so on.

function is probably a better choice, yes.

  • I have also included highlighting for user-defined functions, and external filters using the function token. It may be the case that the internal functions/filters/properties should be using this token instead. If that is the case, though, then user-defined/external filters probably don't have an appropriate token and will remain unhighlighted.

You can use 'builtin-function': { pattern: /something/, alias: 'function' } for built-in functions instead. This will give user functions and built-in functions the same highlighting by default and users can still customize the highlighting if so desired.

  • I made a really contentious choice to highlight some predefined, special constants as they appear in strings. You can see a couple at the top of my linked github.io example. AviSynth has very few constants, and these string constants only ever appear in those exact functions to enable special (but common) functionality. This isn't a must have.

I think it makes the code more readable, so let's leave it in for now.

  • This definition becomes unusably slow for editing for large scripts. The QTGMC script takes nearly a full second to update for each entered character, for example.

That's the browser updating the whole DOM.

Here's a performance record created using Chrome.

image

The actual highlighting only takes 30ms. Parsing the generated HTML, updating the DOM, and recalculating layout are where the browser spends most of its time. There isn't really much we can do about it, unfortunately. We have a plugin in the works that will split the highlighted code into lines. Using this plugin we could do differential updates that's still far in the future.

  • The last variable is quite special in AviSynth, and I have assigned it the variable token. However, it can show up a LOT in scripts, and may contribute to a "color soup" problem as a result. Highlighting this word as the only variable type may be a mistake.

This might not necessarily be a problem. With built-in functions and user functions being highlighted the same way, the "color soup" problem might go away on its own.

components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
@Zinfidel
Copy link
Contributor Author

@RunDevelopment

lol. Don't worry, Avisynth isn't the worst by far.

Oh wow, I had no idea...

I have moved all of the internal stuff over to 'builtin-function' as suggested. Unfortunately, @vadosnaprimer revealed to me last night that functions that take no arguments (or rather, take only the implicit first clip argument) do not actually need the parentheses to qualify as a statement. So,

KillAudio == KillAudio() == last.KillAudio == last.KillAudio()

This means that external/user-functions are indistinguishable from variables in some cases without semantic(?) parsing. I could still try to catch them by looking for identifiers that proceed a . OR precede a ( but it would only guarantee partial coverage of all the cases. I see three possible moves here:

  1. Remove highlighting of user/external functions
  2. Ignore this yucky use case
  3. Regex for the above mentioned pattern and only match user/external functions that can be guaranteed

I'm not sure if partial matching is acceptable at all. If the highlighting is removed, function definitions could still get hightlighted at least.

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functions that take no arguments (or rather, take only the implicit first clip argument) do not actually need the parentheses to qualify as a statement.

That makes things more difficult... I say we still try to highlight functions. A few false negatives here and there are acceptable.


I have a suggestion for a new token: argument-label. I noticed that variables assignments are statements and not expressions in AviSynth. This means that we know for certain that a=b if followed by ( or , is an argument name.

		'argument': { ... },
		'argument-label': {
			pattern: /([,(][\s\\]*)\w+\s*=(?!=)/,
			lookbehind: true,
			inside: {
				'argument-name': {
					pattern: /^\w+/,
					alias: 'punctuation'
				},
				'punctuation': /=$/
			}
		},

(The alias: 'punctuation' is debatable...)

This can make long function calls a bit more readable:

With
image

Without
image

components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
Zinfidel and others added 2 commits September 13, 2021 18:56
Co-authored-by: Michael Schmidt <mitchi5000.ms@googlemail.com>
Internal functions now match any internal word bounded by word boundaries. User/External functions now match identifiers after a ., before a (, or both.
@Zinfidel
Copy link
Contributor Author

I have a suggestion for a new token: argument-label.

I really like it! That helps the QTGMC script immensely and looks natural to me, so I'm pretty sure I've seen this type of styling elsewhere. Punctuation does seem iffy, but I think that punctuation is probably universally styled to be downplayed in themes, which is the intent here so I think it works.

That makes things more difficult... I say we still try to highlight functions. A few false negatives here and there are acceptable.

I did some testing, and discovered that clip properties are actually no different from other functions and filters (including being useable without parenthesis) and so I was able to collapse the definition for all of these tokens down into one area.

Unfortunately, that led me to the discovery of yet another problem: AviSynth allows variable shadowing. This means that you can define variables with the same name as built-in functions/filters/properties and it's valid. I noticed it in a script that I personally use quite a bit. Here's a few more test cases:

You'll notice in encode.avs, there is a string variable called time that gets defined, which shadows the internal function time. This causes it to get highlighted as a function. I added a negative lookahead for the pattern identifier = identifier so that when the name is on the left-hand side this ambiguity is resolved, but right-hand side situations remains highlighted.

So now, this leads to a case with occasional false negatives and occasional false positives. The fact that those scripts have variables named this way is probably a bad thing and should be corrected, and I don't think those of who worked on it had any idea the shadowing was occurring.

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AviSynth allows variable shadowing.

There's no way for us to resolve that shadowing. To implement the shadowing rules we would need a parser and we don't have one...

We now have 2 options:

  1. Try to prevent false positives. (You already implemented this with the (?!\s*=).)
  2. Assume no shadowing.

While option 1 is more correct, it is also less consistent. Option 2 will get some variables wrong (false positive) but it will do so consistently.

I think we should go for consistency. It's easier to implement on our side and we use the same strategy in other languages as well.

components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
components/prism-avisynth.js Outdated Show resolved Hide resolved
@vadosnaprimer
Copy link

Highlighting terms that are reused for variables helps the user to know they are actually internal functions/variables and to stop using them. Does it still work fine in avs if you use the internal filter and also a variable that has its name in the same script?

Co-authored-by: Michael Schmidt <mitchi5000.ms@googlemail.com>
@Zinfidel
Copy link
Contributor Author

I tested this abomination:

int = 5
bool = yes
string = "test"
float = 0.5
# global = 10 # error

ColorBars()
Subtitle(string(float))

global time = "not the time"

function test(clip c, float float, int "global") {
    c = c.Subtitle(string(float), align=1)
    c = c.Subtitle(string(global), align=9)
    c = c.Subtitle(time, align=5)
    return c.Subtitle(string(time("%X")), align=3)
}

test(0.5, global=int)

SwapUV = "don't do this"

#SwapUV # <--- returns a string
SwapUV()

... and it works. 😥 The int, bool, string, and float variables all get subtitled "correctly" in that function. The arguments with type/keyword names work too. In the QTGMC script I actually noticed this - the MAnalyze function uses a "global" optional parameter. It looks like filters that are shadowed by variables will resolve to the variables unless followed by parenthesis, which then gets the function instead.

@vadosnaprimer As you can see, you can mix your variables and filters all in the same script. I agree with you that consistently highlighting the internal functions/filters/properties is a good way to let users know that something is amiss, and I think @RunDevelopment agrees that consistency is the way to go here even if it means more false positives.

Also remove positive lookahead on builtin-function for consistency.
@RunDevelopment
Copy link
Member

I tested this abomination:

Wow. I can understand shadowing built-in functions and variables but keywords? Damn.

test(0.5, global=int)

Beautiful.


Yeah, I think we can all kinda agree that using keywords/types as identifiers is a bad idea. I don't think we need to support this because the overwhelming majority of people won't do this. Highlighting 98% of inputs correctly is good enough.

- Simplify property tests to account for more permissive actual usage.
- Simplify keyword tests to account for more permissive matching.
- Add missing - operator to operators test.
- Add missing [] punctuation to punctuation test.
- Add argument-label test.
@Zinfidel
Copy link
Contributor Author

Yeah, I think we can all kinda agree that using keywords/types as identifiers is a bad idea. I don't think we need to support this because the overwhelming majority of people won't do this. Highlighting 98% of inputs correctly is good enough.

Agreed. I think "incorrectly" highlighting instances of keywords where they are used as variables/parameters probably has the benefit of indicating to users that they are doing something they really should not be doing anyway.

I was more concerned with handling edge cases initially because I didn't realize just how badly you could abuse the language before the testing I've done now.

@RunDevelopment RunDevelopment merged commit 746a4b1 into PrismJS:master Sep 15, 2021
@RunDevelopment
Copy link
Member

Thank you for contributing @Zinfidel!

@Zinfidel
Copy link
Contributor Author

And thank you for all of the help improving the definition @RunDevelopment !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants