Logstash detectors module #327

ghost · 2021-09-23T16:41:11Z

@xp-1000 This is the replacement PR for PR 302 that I did in the past. This is using the generator instead. Please review.

xp-1000

Hello @chungktran

Thanks for your contribution.

Some remarks as comments in the PR can be applied to all detectors:

I would probably change all transformation to true (or replace mean by min and delete all lasting functions)
I would delete all description which do not bring more information than the detector name (you wan put too high/ too low in the detector name if you want).

most of the detectors in this module do not rely on a "plug and play" approach as we try to do in general. instead, they need a specific threshold depending on each environment the user needs to set, for that you can:

disable them by default to let the user ability to enable it if they want (and know how to configure them)
let the threshold undefined to force the user to do it
trying to use more relative and generic approach like the queue size approach I proposed in the PR
mix all of these

this is not a problem but notice the default rollup for cumulative metrics are delta so it should not be necessary to define it explicitly.

xp-1000 · 2021-09-27T07:56:55Z

modules/smart-agent_logstash/conf/01-events_in_high.yaml

+    rollup: delta
+rules:
+  major:
+    description: is high


I would let the description undefined while it does not bring more valuable information than the default, this can apply to all other similar descriptions.

I'd like to leave the description b/c the default says too high for both critical and major which to me is not right. When it's major I want it say high and when it's critical I want it to say too high which make perfect sense to me.

but no matter their severity both rules are too high compared to their respective threshold and that is true to everybody, not a personal meaning preference.

I would prefer to keep modules homogeneous to give a consistent behavior to the users.
That said we could open a request feature to create variables to give the user the ability to customize the rule description as he want.

modules/smart-agent_logstash/conf/01-events_in_high.yaml

xp-1000 · 2021-09-27T08:03:03Z

modules/smart-agent_logstash/conf/01-events_in_high.yaml

+rules:
+  major:
+    description: is high
+    threshold: 25000


as this detector thresholds highly depends on each environment, I would let threshold undefined to force the user to customize this according to his needs.

this can applied to all other detectors except may be the critical too low ones (equals to 0).

I want to the module to be as useful out of the box as much as possible without too much configuration. With your suggestion the end user will get error when doing terraform plan/apply and that's very un-friendly.

I agree about the goal to make all modules useful out the box as much as possible and this is why I suggest this in addition to my other suggestion to prefer relative based detectors (like percentage).

There are 2 possibilities:

either the thresholds you choose have an universal or common logic so you can keep it but you just need to explain them (as Notes in readme, or as tip on detector, or mix them). e.g. on aws rds, the max connection is calculated depending on the instance type so we could specify a default value in this case which should match and work for any user who did not set a custom limit on their RDS.

or these thresholds have been choose from your environment, with your requirements, your needs and your criteria so they have no sens or legitimacy for others and so they should not have been set as default in a generic module

the modules should be as plug and play as possible, meaning a user can deploy detectors without to customize anything and it works, users are used to this.

if you let default values for thresholds which must always be customized this is misleading for the user and a trap because they will not notice the problem until the moment they do not get alert when they want to.

if you do not set default values, so these variables will be automatically added to the module example usage in the readme to inform the user these variables are mandatory and yes if they forget them they will get an explicit and understandable error to force them to set, so no bad surprise in future because they deployed detectors which will never work in their environment (in my opinion this is more un-friendly)

modules/smart-agent_logstash/conf/01-events_in_high.yaml

xp-1000 · 2021-09-27T08:20:11Z

modules/smart-agent_logstash/conf/07-queued_disk.yaml

+  disk:
+    metric: node.stats.pipelines.queue.queue_size_in_bytes
+    rollup: delta
+  signal:
+    formula: (disk / 1000000)


instead of applying a static threshold check to the queue which, again, fully depends on each environment may by can you try to make this check more generic by making a comparison between node.stats.pipelines.queue.max_queue_size_in_bytes and node.stats.pipelines.queue.queue_size_in_bytes.

Suggested change

disk:

metric: node.stats.pipelines.queue.queue_size_in_bytes

rollup: delta

signal:

formula: (disk / 1000000)

size:

metric: node.stats.pipelines.queue.queue_size_in_bytes

max:

metric: node.stats.pipelines.queue.max_queue_size_in_bytes

signal:

formula: (size/max).scale(100).fill(value=0).publish('signal')

from this way you can define relative percentage based threshold which could fit to everybody.

also, it seems redundant to the previous one event count detector. you can keep it but you should probably use disabled: true to disable it by default. this allow interested users to keep a hard queue limit count for who know it but don't enable alert by default.

Monitoring disk size in Logstash queue is different than the queued events in the pipeline. The disk size is cumulative of all pipelines so it is not quite the same. I'd like to keep this as static threshold.

ok it was only a suggestion to make this detector more generic. while most of your detectors use absolute thresholds they are not usable and useful out of the box for other users as seen above.

modules/smart-agent_logstash/conf/readme.yaml

ghost · 2021-09-27T16:41:02Z

@xp-1000 Thanks for all your comments and suggestions. I will make the changes accordingly.

ghost · 2021-09-28T18:01:03Z

@xp-1000 I've made changes to some of the suggested changes. Please review. Thanks.

xp-1000

Thanks to take the time to make the changes and to answer to my remarks.

I answered too, homogeneity and generic purpose of the modules in this repository make them constraining for sure but I believe we must find a cursor between the simplicity of usage and the relevance of the detectors for everybody.

ghost requested review from BzSpi, cvauvarin, Shr3ps, tchernomax and xp-1000 as code owners September 23, 2021 16:41

xp-1000 suggested changes Sep 27, 2021

View reviewed changes

ghost marked this pull request as draft September 27, 2021 16:41

ghost marked this pull request as ready for review September 28, 2021 18:00

xp-1000 reviewed Sep 29, 2021

View reviewed changes

ghost marked this pull request as draft October 4, 2021 12:44

Logstash detectors module.

b6ca5ba

pdecat added the detectors About nex or existing detectors label Aug 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logstash detectors module #327

Logstash detectors module #327

ghost commented Sep 23, 2021

xp-1000 left a comment •

edited

xp-1000 Sep 27, 2021

ghost Sep 28, 2021

xp-1000 Sep 29, 2021

xp-1000 Sep 27, 2021

ghost Sep 28, 2021

xp-1000 Sep 29, 2021

xp-1000 Sep 27, 2021

ghost Sep 28, 2021

xp-1000 Sep 29, 2021

ghost commented Sep 27, 2021

ghost commented Sep 28, 2021

xp-1000 left a comment

Logstash detectors module #327

Are you sure you want to change the base?

Logstash detectors module #327

Conversation

ghost commented Sep 23, 2021

xp-1000 left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost commented Sep 27, 2021

ghost commented Sep 28, 2021

xp-1000 left a comment

Choose a reason for hiding this comment

xp-1000 left a comment •

edited