Some languages, like PL/SQL or the new T-SQL (#4390), are case-insensitive. Tokenizing handles this correctly, i.e. the lexers are agnostic to casing: JavaCC has a grammar option for this, and ANTLR has had one as well since 4.10.
However, when we convert the original tokens into CPD TokenEntries, we don't seem to use the token kind; instead we use the original token text, which preserves the original casing. It's therefore very easy to evade duplication detection for these languages by just changing the casing:
echo 'select a, b, c, d, e, f from table where x = 1 and y = 2;' > file1.plsql
cp file1.plsql file2.plsql
echo 'sEleCt a, b, c, d, e, f frOm table where x = 1 and y = 2;' > file3.plsql
run.sh cpd --minimum-tokens 20 --language plsql --dir file1.plsql file2.plsql
results correctly in:
Found a 1 line (23 tokens) duplication in the following files:
Starting at line 1 of /home/andreas/temp/plsql/file1.plsql
Starting at line 1 of /home/andreas/temp/plsql/file2.plsql
select a, b, c, d, e, f from table where x = 1 and y = 2;
since file1.plsql and file2.plsql are identical.
However, comparing file1.plsql and file3.plsql, which differ only in casing, reports no duplication.
Affects PMD Version: 6.x
Description:
I think this problem affects both JavaCC- and ANTLR-based languages.
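To make the idea concrete, here is a minimal sketch of one possible direction: normalize the token text before it is used for comparison when the language is case-insensitive. This is not PMD's actual code. The class `TokenImageSketch`, its methods, and the `caseInsensitiveLanguage` flag are all hypothetical names made up for illustration, and the token lists are a shortened stand-in for the files above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: none of these names are PMD's actual API.
final class TokenImageSketch {

    // Returns the text that would be used to compare two tokens.
    // Assumption: the language reports whether it is case-insensitive.
    static String comparisonImage(String tokenText, boolean caseInsensitiveLanguage) {
        // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
        return caseInsensitiveLanguage ? tokenText.toLowerCase(Locale.ROOT) : tokenText;
    }

    // Applies comparisonImage to every token of a file.
    static List<String> comparisonImages(List<String> tokenTexts, boolean caseInsensitiveLanguage) {
        List<String> images = new ArrayList<>();
        for (String text : tokenTexts) {
            images.add(comparisonImage(text, caseInsensitiveLanguage));
        }
        return images;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("select", "a", "from", "table");
        List<String> file3 = Arrays.asList("sEleCt", "a", "frOm", "table");

        // Raw token text, as described in this issue: the sequences differ.
        System.out.println(comparisonImages(file1, false).equals(comparisonImages(file3, false))); // false

        // Normalized token text: the sequences match.
        System.out.println(comparisonImages(file1, true).equals(comparisonImages(file3, true))); // true
    }
}
```

A real fix would probably need more care than this. For example, string literals may have to keep their casing, which is one reason why taking the token kind into account could be preferable to blindly lowercasing every token.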