Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR][DAC] Add Support for Custom Schemas #3679

Open
wants to merge 18 commits into
base: DAC-feature
Choose a base branch
from

Conversation

eric-forte-elastic
Copy link
Collaborator

@eric-forte-elastic eric-forte-elastic commented May 15, 2024

Issues

#3618

Summary

This PR adds the ability to specify custom schema files (or directories) via stack-schema-map.yaml

e.g.

8.14.0:
  beats: 8.12.2
  ecs: 8.11.0
  endgame: 8.4.0
  custom: schemas/forte-schema.json
Details

{
    "eric-index*": {
      "process.NewEricValue": "keyword",
      "process.AnotherEricValue": "keyword"
    }
}
  

Example Rule using eric-index

[metadata]
creation_date = "2023/08/23"
integration = ["endpoint"]
maturity = "production"
min_stack_comments = "New fields added: required_fields, related_integrations, setup"
min_stack_version = "8.3.0"
updated_date = "2024/03/08"

[rule]
author = ["Elastic"]
description = """
Test Rule
"""
from = "now-9m"
index = ["logs-endpoint.events.*", "eric-index*"]
language = "eql"
license = "Elastic License v2"
name = "TEST Potential Protocol Tunneling via Chisel Client"
note = """Test Note"""
references = [
    "https://blog.bitsadmin.com/living-off-the-foreign-land-windows-as-offensive-platform",
    "https://book.hacktricks.xyz/generic-methodologies-and-resources/tunneling-and-port-forwarding"
    ]
risk_score = 47
rule_id = "e8e3af2a-11b8-4ab7-9ca1-c6db621ea89d"
setup = """Test Setup"""
severity = "medium"
tags = [
        "Domain: Endpoint",
        "OS: Linux",
        "Use Case: Threat Detection",
        "Tactic: Command and Control",
        "Data Source: Elastic Defend"
        ]
type = "eql"
query = '''
process where host.os.type == "linux" and process.NewEricValue == "GoodValue"
'''
timestamp_override = "event.ingested"

[[rule.threat]]
framework = "MITRE ATT&CK"

[[rule.threat.technique]]
id = "T1572"
name = "Protocol Tunneling"
reference = "https://attack.mitre.org/techniques/T1572/"

[rule.threat.tactic]
id = "TA0011"
name = "Command and Control"
reference = "https://attack.mitre.org/tactics/TA0011/"

Basic Testing

At a minimum one should test being able to run view-rule against both an EQL and a KQL rule using custom index. Additionally, different min stack versions should be used to test that the custom schema is being applied to the correct stack version.

Example

BYOS_test

Additional Testing

With your custom schema and rules perform the following tests.

Note, these may not all pass, as I am not sure if we have tried them all on the DAC-feature branch to catch any other bugs, but before merging we should either remove extraneous tests for this list or address why they are failing.

  • Run view-rule on several different integrations and try with multiple min_stack versions.
    • E.g. python -m detection_rules view-rule rules/integrations/google_workspace/collection_google_drive_ownership_transferred_via_google_workspace.toml
    • E.g. python -m detection_rules view-rule rules/integrations/aws/collection_cloudtrail_logging_created.toml
    • E.g. python -m detection_rules view-rule rules/integrations/azure/initial_access_consent_grant_attack_via_azure_registered_application.toml
    • E.g. python -m detection_rules view-rule rules/apm/apm_403_response_to_a_post.toml
  • Run show-latest-compatible to verify command shows latest compatible version.
    • E.g. python -m detection_rules dev integrations show-latest-compatible -p apm -s 8.2.0
  • Run the build-schemas and build-manifest commands to test rebuilding the schemas and manifest
    • E.g. python -m detection_rules dev integrations build-schemas -o
    • E.g. python -m detection_rules dev integrations build-manifests -o
  • Run make test to ensure all unit tests pass.

@eric-forte-elastic eric-forte-elastic linked an issue May 15, 2024 that may be closed by this pull request
@eric-forte-elastic eric-forte-elastic self-assigned this May 15, 2024
@eric-forte-elastic eric-forte-elastic changed the title [FR][DAC} Add Support for Custom Schemas [FR][DAC] Add Support for Custom Schemas May 15, 2024
@eric-forte-elastic eric-forte-elastic changed the base branch from main to DAC-feature May 16, 2024 03:10
@eric-forte-elastic eric-forte-elastic added enhancement New feature or request python Internal python for the repository Area: DED Team: TRADE detections-as-code labels May 16, 2024
@eric-forte-elastic eric-forte-elastic marked this pull request as ready for review May 16, 2024 03:11
@botelastic botelastic bot added the schema label May 16, 2024
@Mikaayenson
Copy link
Collaborator

We should assess / consider a few things:

Copy link
Collaborator

@brokensound77 brokensound77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a good solution for the data view consideration (which only now becomes relevant since built in rules will never use them, only indexes).

Within QueryRuleData, we can add a new property index_or_dataview -> list[str] which returns whichever is set. The reason for making it a list even though dataview is a string is because we always iterate, so it saves us from doing a type check every time.

Then, we will need to search for every use of data.index (or rule.contents.data.index and all variations) and replace them with this new property where it makes sense.

Additionally, in #3510, it should have also included a validation check to ensure only index or dataview are set and not both

    @validates_schema
    def validates_query_data(self, data, **kwargs):
        ...
        # add check here

This should probably be done in main and backported to this branch.

If needed, you can convert this comment to an issue to track.

detection_rules/utils.py Outdated Show resolved Hide resolved
detection_rules/ecs.py Outdated Show resolved Hide resolved
detection_rules/ecs.py Outdated Show resolved Hide resolved
if schema_path.is_file():
custom_schema_dump.update(eql.utils.load_dump(str(schema_path)))
elif schema_path.is_dir():
custom_schema_dump.update(load_schemas_from_dir(schema_path))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enforce only schema files vs a dir of schemas (requiring explicit paths for all). Unless you have an example where this would be used

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I like the idea of being able to specify a directory of schemas as I think it could be more convenient for CI.

For example, if a customer has schema files that have dynamic names, e.g. my_schmea_v1.1, that could change to my_schema_1.2, they would not have to update their config file each time the name changed. Instead they could leave the config file the same and just specify a directory where my_schema_* would be located instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but with the way it is written, won't it merge all the schemas together?

def load_schemas_from_dir(schema_dir: Path) -> dict:
    """Load all schemas from a directory."""
    schemas_dump = {}
    for file_path in schema_dir.iterdir():
        if file_path.is_file() and file_path.suffix in [".json", ".yaml", ".yml"]:
            schemas_dump.update(eql.utils.load_dump(str(file_path)))

    return schemas_dump

so if I had a dir with random schemas, it would merge them all into one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it would have that behavior, merging them all into one. Maybe I am misunderstanding. Originally, I thought the question you were proposing was comparing someone being able to write:

8.14.0:
  beats: 8.12.2
  ecs: 8.11.0
  endgame: 8.4.0
  custom_one: schemas/custom/forte-schema_1.json
  custom_two: schemas/custom/forte-schema_2.json
  ...

vs

8.14.0:
  beats: 8.12.2
  ecs: 8.11.0
  endgame: 8.4.0
  custom: schemas/custom

In which case I would think we would want to support either? Otherwise I am not sure I follow what you are asking?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For flexibility, I like the ability to specify independent schemas or a folder of schemas. Definitely helps limit modifications to the config.

@eric-forte-elastic
Copy link
Collaborator Author

I think I have a good solution for the data view consideration (which only now becomes relevant since built in rules will never use them, only indexes).

Within QueryRuleData, we can add a new property index_or_dataview -> list[str] which returns whichever is set. The reason for making it a list even though dataview is a string is because we always iterate, so it saves us from doing a type check every time.

Then, we will need to search for every use of data.index (or rule.contents.data.index and all variations) and replace them with this new property where it makes sense.

Additionally, in #3510, it should have also included a validation check to ensure only index or dataview are set and not both

    @validates_schema
    def validates_query_data(self, data, **kwargs):
        ...
        # add check here

This should probably be done in main and backported to this branch.

If needed, you can convert this comment to an issue to track.

Made an issue to track this independently 👍 #3697

stack_schema_map = RULES_CONFIG.stack_schema_map[stack_version]

for schema, value in stack_schema_map.items():
if schema not in ["beats", "ecs", "endgame"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if it is one of these three?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should ignore the schema if it is one of those, as those are reserved "schema words" that are by definition not custom. Do you agree?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think that is the smart approach, but wont this return an empty dict in this case? It would be better to raise an error, forbidding using reserved words. Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good idea, we will need to have it function such that it will allow the words once, as we would want them to be able to specify beats, etc. or to not use those if desired. But we would not want them to use multiple as the result would be confusing 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further testing and reflection, I think the test for additional beats, ecs, or endgame schemas should not be in the custom_schemas.py as it might be confusing to have the custom schema loader be validating non-custom schemas. I think this check would be considered increased validation for the schema map in general. As such, I think this could target main, as the check is not DAC specific.

@eric-forte-elastic
Copy link
Collaborator Author

eric-forte-elastic commented May 23, 2024

Will be adding documentation updates to show BYOS features.

detection_rules/custom_schemas.py Outdated Show resolved Hide resolved
detection_rules/custom_schemas.py Outdated Show resolved Hide resolved
Co-authored-by: Mika Ayenson <Mikaayenson@users.noreply.github.com>
docs/custom-rules.md Outdated Show resolved Hide resolved
Mikaayenson
Mikaayenson previously approved these changes May 23, 2024
Copy link
Collaborator

@Mikaayenson Mikaayenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: DED detections-as-code enhancement New feature or request python Internal python for the repository schema Team: TRADE
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR][DAC] add support for custom-schemas (BYOS)
4 participants