[Feature] Supporting field transformers in filtering language #1789

Closed
6 tasks done
jasondellaluce opened this issue Apr 10, 2024 · 9 comments

@jasondellaluce
Contributor

jasondellaluce commented Apr 10, 2024

Action plan

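We plan to execute the changes as follows:

  • Preparing the groundwork on the libsinsp "filter checks" data structure, which evaluates filter comparisons at runtime
  • Updating the filter grammar and AST (Abstract Syntax Tree) definitions
  • Supporting the new feature in the "sinsp filter compiler", which compiles filter ASTs into the filtercheck-based executable form evaluated at runtime
  • Supporting the new feature in the sinsp output formatters, which are used to format Falco rules output and to print information about event payloads and data fields
  • Documenting all the features on falco.org

All steps will require in-depth tests.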

Motivation

Over time, we have collected many requests concerning the filtering language of libsinsp. This small DSL (domain-specific language) is the basis on which Falco rules are developed and executed, and it also serves other use cases across the adopters of the Falco libs. Feedback from adopters has consistently indicated that the language is simple and expressive, but we acknowledge that it also has some limitations. To list a few:

  • Small modifications to existing fields mandate adding new fields
  • Minor changes in the semantics of a comparison operator mandate adding new operators (e.g. istartswith and iglob, which are just case-insensitive versions of already-existing operators)
  • Field-to-field comparisons are not possible
  • Interpolation, composition, or small runtime transformations of existing types are not possible

Here's a non-comprehensive collection of issues from our repositories related to the topic:

The general feeling is that changing the nature of the language, or making it more complex, would defeat the simplicity principles that made the rules language widely adopted and easy to learn. Moreover, the grammar of the filtering language is quite fragile and leaves little room for edits without the risk of introducing large-scale breaking changes.

However, we also argue that there are minor, feasible changes that could make the language far more expressive and powerful.

Feature

I want to share an R&D project that @Andreagit97 and I have spent some time on over the past weeks.

Our proposal is to update the filtering language with the notion of field transformers. Transformers are declarative transformations that can be applied to filter fields (e.g. proc.name) with the purpose of supporting new detection scenarios and filtering capabilities.

The proposed syntax is as follows (all fields and scenarios are random simple examples):

  • fd.name startswith "/etc": Traditional use case, which will be supported as usual
  • tolower(fd.name) startswith "/etc": Lower case conversion for string field types
  • toupper(fd.name) startswith "/ETC": Upper case conversion for string field types
  • b64(evt.buffer) bcontains deadbeef: base64 decoding for string and bytebuf field types
  • proc.name != val(proc.pname): field-to-field comparisons
  • tolower(proc.name) != tolower(proc.pname): field-to-field comparisons, with transformers
  • toupper(b64(fd.name)) = TESTFILE: composition of transformers (base64 decoding followed by upper case conversion)

Here are some properties of field transformers:

  • Implemented as an additional feature of the language, thus not introducing any breaking change with respect to the current state of things
  • Have strong typing, thus non-ambiguous
  • Easy to implement new ones for future use cases, making them future proof
  • Are composable (e.g. toupper(b64(fd.name)))
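
To make the composition and typing properties concrete, here is a minimal Python sketch that models transformers as typed functions over already-extracted field values. It is only an illustration of the intended semantics, not the libsinsp implementation: the names TRANSFORMERS and apply_transformers, the event dictionary, and the choice that b64 returns the same kind of value it receives are all assumptions made for this example.

```python
import base64

# Hypothetical model: each transformer maps an extracted field value to a
# new value and rejects input types it does not support.
def _tolower(value):
    if not isinstance(value, str):
        raise TypeError("tolower expects a string field")
    return value.lower()

def _toupper(value):
    if not isinstance(value, str):
        raise TypeError("toupper expects a string field")
    return value.upper()

def _b64(value):
    # Assumption: string input yields a decoded string, byte-buffer input
    # yields decoded bytes.
    if isinstance(value, str):
        return base64.b64decode(value).decode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        return base64.b64decode(value)
    raise TypeError("b64 expects a string or bytebuf field")

TRANSFORMERS = {"tolower": _tolower, "toupper": _toupper, "b64": _b64}

def apply_transformers(names, value):
    """Apply transformers innermost-first. Names are listed outermost-first,
    so ["toupper", "b64"] models toupper(b64(<field>))."""
    for name in reversed(names):
        value = TRANSFORMERS[name](value)
    return value

# Made-up event data for illustration only.
event = {"fd.name": "dGVzdGZpbGU=", "proc.name": "Bash", "proc.pname": "bash"}

# Models the check: toupper(b64(fd.name)) = TESTFILE
print(apply_transformers(["toupper", "b64"], event["fd.name"]))  # prints TESTFILE
```

In this model, a type mismatch such as tolower applied to a byte buffer fails immediately, which is one way to read the "strong typing" property above.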

The grammar of the filtering language (current state: "Context-free Grammar for Sinsp Filters") will evolve in the following backward-compatible and non-ambiguous way:

Productions (EBNF Syntax):
    Expr                    ::= OrExpr
    OrExpr                  ::= AndExpr ('or' OrExprTail)*
    OrExprTail              ::= ' ' AndExpr
                                | '(' Expr ')'
    AndExpr                 ::= NotExpr ('and' AndExprTail)*
    AndExprTail             ::= ' ' NotExpr
                                | '(' Expr ')'
    NotExpr                 ::= ('not ')* NotExprTail
    NotExprTail             ::= 'not(' Expr ')'
                                | Check
    Check                   ::= Field Condition
                                | FieldTransformer Condition 
                                | Identifier
                                | '(' Expr ')'
    FieldTransformer        ::= FieldTransformerType FieldTransformerTail
    FieldTransformerTail    ::= FieldTransformerArg ')'
    FieldTransformerArg     ::= FieldTransformer
                                | Field
    FieldTransformerOrVal   ::= FieldTransformer
                                | FieldTransformerVal Field ')'
    Condition               ::= UnaryOperator
                                | NumOperator (NumValue | FieldTransformerOrVal)
                                | StrOperator (StrValue | FieldTransformerOrVal)
                                | ListOperator (ListValue | FieldTransformerOrVal)
    ListValue               ::= '(' (StrValue (',' StrValue)*)* ')'
                                | Identifier
    Field                   ::= FieldName('[' FieldArg ']')?
    FieldArg                ::= QuotedStr | FieldArgBareStr 
    NumValue                ::= HexNumber | Number
    StrValue                ::= QuotedStr | BareStr

Supported Check Operators (EBNF Syntax):
    UnaryOperator           ::= 'exists'
    NumOperator             ::= '<=' | '<' | '>=' | '>' 
    StrOperator             ::= '==' | '=' | '!='
                                | 'glob ' | 'iglob '
                                | 'contains ' | 'icontains ' | 'bcontains '
                                | 'startswith ' | 'bstartswith ' | 'endswith '
    ListOperator            ::= 'intersects' | 'in' | 'pmatch' 
    FieldTransformerVal     ::= 'val('
    FieldTransformerType    ::= 'tolower(' | 'toupper(' | 'b64('

Tokens (Regular Expressions):
    Identifier              ::= [a-zA-Z]+[a-zA-Z0-9_]*
    FieldName               ::= [a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+
    FieldArgBareStr         ::= [^ \b\t\n\r\[\]"']+
    HexNumber               ::= 0[xX][0-9a-zA-Z]+
    Number                  ::= [+\-]?[0-9]+[\.]?[0-9]*([eE][+\-][0-9]+)?
    QuotedStr               ::= "(?:\\"|.)*?"|'(?:\\'|.)*?'
    BareStr                 ::= [^ \b\t\n\r\(\),="']+
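
As a rough sketch of how the new FieldTransformer and FieldTransformerArg productions nest, the following Python recursive-descent fragment parses a transformer expression into a small tree. It is a simplified illustration under stated assumptions, not the libsinsp parser: the function name parse_transformer_arg and the tuple-based AST shape are hypothetical, and the optional '[' FieldArg ']' suffix of Field is left out for brevity.

```python
import re

# FieldName token from the grammar above (field arguments are omitted here).
FIELD_NAME_RE = re.compile(r"[a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+")
# FieldTransformerType alternatives; note the opening parenthesis is part of the token.
TRANSFORMER_TYPES = ("tolower(", "toupper(", "b64(")

def parse_transformer_arg(text, pos=0):
    """Parse FieldTransformerArg ::= FieldTransformer | Field.

    Returns (ast, next_pos), where ast is a nested tuple (hypothetical shape).
    """
    for token in TRANSFORMER_TYPES:
        if text.startswith(token, pos):
            # FieldTransformer ::= FieldTransformerType FieldTransformerArg ')'
            inner, pos = parse_transformer_arg(text, pos + len(token))
            if not text.startswith(")", pos):
                raise SyntaxError(f"expected ')' at offset {pos}")
            return ("transformer", token[:-1], inner), pos + 1
    match = FIELD_NAME_RE.match(text, pos)
    if match is None:
        raise SyntaxError(f"expected field or transformer at offset {pos}")
    return ("field", match.group(0)), match.end()

ast, end = parse_transformer_arg("toupper(b64(fd.name))")
print(ast)
```

Because every transformer token ends with an opening parenthesis, a plain field name can never be mistaken for a transformer, which is what keeps existing filters parsing exactly as before.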

Additional context

The val(<field>) transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity:

  • proc.name = proc.pname: Evaluates to true for processes whose comm is the literal string proc.pname; it is equivalent to proc.name = "proc.pname"
  • proc.name = val(proc.pname): Evaluates to true for processes whose comm is the same as their parent's comm
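
The difference between the two cases can be sketched in Python as follows. This is a toy model of the right-hand side resolution, assuming a flat dictionary of extracted field values; the helper name resolve_rhs and the event contents are hypothetical and do not reflect libsinsp internals.

```python
def resolve_rhs(rhs, event):
    """Return the value a check compares against.

    Only val(<field>) is treated as a field reference; anything else,
    quoted or bare, is a raw string value.
    """
    if rhs.startswith("val(") and rhs.endswith(")"):
        return event[rhs[len("val("):-1]]
    if len(rhs) >= 2 and rhs[0] == rhs[-1] and rhs[0] in "\"'":
        return rhs[1:-1]
    return rhs

# Made-up event: a shell spawned by another shell.
event = {"proc.name": "bash", "proc.pname": "bash"}

print(event["proc.name"] == resolve_rhs("proc.pname", event))       # prints False
print(event["proc.name"] == resolve_rhs("val(proc.pname)", event))  # prints True
```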

/milestone 0.17.0

@jasondellaluce jasondellaluce added the kind/feature New feature or request label Apr 10, 2024
@jasondellaluce jasondellaluce self-assigned this Apr 10, 2024
@poiana
Contributor

poiana commented Apr 10, 2024

@jasondellaluce: The provided milestone is not valid for this repository. Milestones in this repository: [0.16.0, 0.17.0, TBD, next-driver]

Use /milestone clear to clear the milestone.

In response to this:

** Action plan **

We plan to execute the changes as follows:

  • Preparing the groundwork on the libsinsp "filter checks" data structure, which evaluates filter comparisons at runtime
  • Updating the filter grammar and AST (Abstract Syntax Tree) definitions
  • Supporting the new feature in the "sinsp filter compiler", which compiles filter ASTs into the filtercheck-based executable form evaluated at runtime
  • Supporting the new feature in the sinsp output formatters, which are used to format Falco rules output and to print information about event payloads and data fields
  • Documenting all the features on falco.org

All steps will require in-depth tests.

/milestone 0.38.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@leogr
Member

leogr commented Apr 10, 2024

/milestone 0.38.0

I guess you wanted to select the last libs milestone before Falco 0.38, if so:
/milestone 0.17.0

@poiana poiana added this to the 0.17.0 milestone Apr 10, 2024
@leogr
Member

leogr commented Apr 12, 2024

Additional proposal (can be implemented later):

  • basename(), replicating the basename behavior, useful in combination with fields holding file paths (for example, basename(proc.exepath), whose result may differ from proc.exe) cc @loresuso
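
For reference, the behavior being proposed could look roughly like the Python sketch below. It is an assumption about what "the basename behavior" would mean for a path-valued field: trailing slashes are stripped before taking the last component, and the function name and sample path are illustrative only.

```python
import posixpath

def basename(path):
    """Return the last component of a POSIX-style path."""
    stripped = path.rstrip("/")
    if not stripped:
        # A path made only of slashes is the root; empty input stays empty
        # (an arbitrary choice for this sketch).
        return "/" if path else ""
    return posixpath.basename(stripped)

print(basename("/usr/bin/python3.11"))  # prints python3.11
```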

@leogr
Member

leogr commented Apr 12, 2024

Additional proposal (can be implemented later):

  • join(<list>, <sep>) concatenates the elements of the given <list> to create a single string (with <sep> placed between elements), especially useful in output: for printing a list with a custom separator
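
In Python terms, the proposed semantics would be close to the following sketch. The function name mirrors the proposal, while the sample list is made up and the conversion of elements to strings is an assumption of this example.

```python
def join(items, sep):
    """Concatenate list elements into a single string with sep between them."""
    return sep.join(str(item) for item in items)

print(join(["a", "b", "c"], ", "))  # prints a, b, c
```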

@incertum
Contributor

The val(<field>) transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity:
proc.name = proc.pname: Evaluates to true for processes whose comm is the literal string proc.pname; it is equivalent to proc.name = "proc.pname"
proc.name = val(proc.pname): Evaluates to true for processes whose comm is the same as their parent's comm

Understood, it will likely cause a bit of confusion and we need to document it very clearly. If we can think of alternatives that do not require val() we should consider them as well.

@jasondellaluce
Contributor Author

Understood, it will likely cause a bit of confusion and we need to document it very clearly. If we can think of alternatives that do not require val() we should consider them as well.

I agree with this. As part of this work, the plan is also to make the sinsp compiler emit warnings for potential mistakes in this regard. Unfortunately, we explored many options and found no better grammar construct that would avoid potential breaking changes in the filtering language and in the Falco rulesets out there. Although ugly-ish, this should guarantee complete backward compatibility with the status quo.
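
One possible shape for such a warning is sketched below in Python. The thread does not describe the actual heuristic, so this is purely an assumption: the helper name, the message text, and the idea of checking a bare right-hand value against a set of known field names are all hypothetical.

```python
import re

# FieldName token from the grammar in the issue description.
FIELD_NAME_RE = re.compile(r"[a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+")

def warn_on_field_like_value(rhs, known_fields):
    """Return a warning string when a raw string value looks like a field
    reference that was probably meant to be wrapped in val(), else None."""
    if FIELD_NAME_RE.fullmatch(rhs) and rhs in known_fields:
        return f"string '{rhs}' looks like a field name; did you mean val({rhs})?"
    return None

known = {"proc.name", "proc.pname", "fd.name"}
print(warn_on_field_like_value("proc.pname", known))
```

A check like this only warns and never changes how the filter is evaluated, so existing rules that intentionally compare against such strings would keep working.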

@LucaGuerra
Contributor

Corresponding documentation PR: falcosecurity/falco-website#1319

@FedeDP
Contributor

FedeDP commented May 27, 2024

Considering that the docs PR is open and that the 0.17.0 libs tag is out, I think we can close this one.
/close

@poiana poiana closed this as completed May 27, 2024
@poiana
Contributor

poiana commented May 27, 2024

@FedeDP: Closing this issue.

