Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logstash detectors module #327

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Logstash detectors module #327

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented Sep 23, 2021

@xp-1000 This is the replacement PR for PR 302 that I did in the past. This is using the generator instead. Please review.

Copy link
Contributor

@xp-1000 xp-1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @chungktran

Thanks for your contribution.

Some remarks as comments in the PR can be applied to all detectors:

  • I would probably change all transformation to true (or replace mean by min and delete all lasting functions)
  • I would delete all description which do not bring more information than the detector name (you wan put too high/ too low in the detector name if you want).

most of the detectors in this module do not rely on a "plug and play" approach as we try to do in general. instead, they need a specific threshold depending on each environment the user needs to set, for that you can:

  • disable them by default to let the user ability to enable it if they want (and know how to configure them)
  • let the threshold undefined to force the user to do it
  • trying to use more relative and generic approach like the queue size approach I proposed in the PR
  • mix all of these

this is not a problem but notice the default rollup for cumulative metrics are delta so it should not be necessary to define it explicitly.

rollup: delta
rules:
major:
description: is high
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would let the description undefined while it does not bring more valuable information than the default, this can apply to all other similar descriptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave the description b/c the default says too high for both critical and major which to me is not right. When it's major I want it say high and when it's critical I want it to say too high which make perfect sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but no matter their severity both rules are too high compared to their respective threshold and that is true to everybody, not a personal meaning preference.

I would prefer to keep modules homogeneous to give a consistent behavior to the users.
That said we could open a request feature to create variables to give the user the ability to customize the rule description as he want.

modules/smart-agent_logstash/conf/01-events_in_high.yaml Outdated Show resolved Hide resolved
rules:
major:
description: is high
threshold: 25000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this detector thresholds highly depends on each environment, I would let threshold undefined to force the user to customize this according to his needs.

this can applied to all other detectors except may be the critical too low ones (equals to 0).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to the module to be as useful out of the box as much as possible without too much configuration. With your suggestion the end user will get error when doing terraform plan/apply and that's very un-friendly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the goal to make all modules useful out the box as much as possible and this is why I suggest this in addition to my other suggestion to prefer relative based detectors (like percentage).

There are 2 possibilities:

  • either the thresholds you choose have an universal or common logic so you can keep it but you just need to explain them (as Notes in readme, or as tip on detector, or mix them). e.g. on aws rds, the max connection is calculated depending on the instance type so we could specify a default value in this case which should match and work for any user who did not set a custom limit on their RDS.
  • or these thresholds have been choose from your environment, with your requirements, your needs and your criteria so they have no sens or legitimacy for others and so they should not have been set as default in a generic module

the modules should be as plug and play as possible, meaning a user can deploy detectors without to customize anything and it works, users are used to this.

if you let default values for thresholds which must always be customized this is misleading for the user and a trap because they will not notice the problem until the moment they do not get alert when they want to.

if you do not set default values, so these variables will be automatically added to the module example usage in the readme to inform the user these variables are mandatory and yes if they forget them they will get an explicit and understandable error to force them to set, so no bad surprise in future because they deployed detectors which will never work in their environment (in my opinion this is more un-friendly)

modules/smart-agent_logstash/conf/01-events_in_high.yaml Outdated Show resolved Hide resolved
Comment on lines 8 to 12
disk:
metric: node.stats.pipelines.queue.queue_size_in_bytes
rollup: delta
signal:
formula: (disk / 1000000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of applying a static threshold check to the queue which, again, fully depends on each environment may by can you try to make this check more generic by making a comparison between node.stats.pipelines.queue.max_queue_size_in_bytes and node.stats.pipelines.queue.queue_size_in_bytes.

Suggested change
disk:
metric: node.stats.pipelines.queue.queue_size_in_bytes
rollup: delta
signal:
formula: (disk / 1000000)
size:
metric: node.stats.pipelines.queue.queue_size_in_bytes
max:
metric: node.stats.pipelines.queue.max_queue_size_in_bytes
signal:
formula: (size/max).scale(100).fill(value=0).publish('signal')

from this way you can define relative percentage based threshold which could fit to everybody.

also, it seems redundant to the previous one event count detector. you can keep it but you should probably use disabled: true to disable it by default. this allow interested users to keep a hard queue limit count for who know it but don't enable alert by default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Monitoring disk size in Logstash queue is different than the queued events in the pipeline. The disk size is cumulative of all pipelines so it is not quite the same. I'd like to keep this as static threshold.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok it was only a suggestion to make this detector more generic. while most of your detectors use absolute thresholds they are not usable and useful out of the box for other users as seen above.

modules/smart-agent_logstash/conf/readme.yaml Outdated Show resolved Hide resolved
@ghost
Copy link
Author

ghost commented Sep 27, 2021

@xp-1000 Thanks for all your comments and suggestions. I will make the changes accordingly.

@ghost ghost marked this pull request as draft September 27, 2021 16:41
@ghost ghost marked this pull request as ready for review September 28, 2021 18:00
@ghost
Copy link
Author

ghost commented Sep 28, 2021

@xp-1000 I've made changes to some of the suggested changes. Please review. Thanks.

Copy link
Contributor

@xp-1000 xp-1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to take the time to make the changes and to answer to my remarks.

I answered too, homogeneity and generic purpose of the modules in this repository make them constraining for sure but I believe we must find a cursor between the simplicity of usage and the relevance of the detectors for everybody.

@ghost ghost marked this pull request as draft October 4, 2021 12:44
@pdecat pdecat added the detectors About nex or existing detectors label Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
detectors About nex or existing detectors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants