add configurable alarms for Ec2Monitoring #515

Open
gossandr opened this issue May 6, 2024 · 0 comments
Labels
feature-request New feature

gossandr commented May 6, 2024

Feature scope

EC2

Describe your suggested feature

Currently, Ec2Monitoring does not support adding any alarms, and I was honestly a bit surprised at how I ultimately had to implement the alarms I wanted.

I'd like to potentially take this on in a PR, but wanted to get some feedback here first (if possible).

At first, I tried to use the Ec2Monitoring class to get access to its metrics for use in monitorCustom. This did not work, in part because the metrics there are typed as IMetric and not MetricWithAlarmSupport. The class does not support the StatusCheckFailed metric at all, and the other metrics are all exposed as the "wrong" type for what I need.
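
To illustrate the type mismatch described above, here is a minimal, self-contained sketch. All names ending in `Like` are local stand-ins invented for this example, not the real `aws-cdk-lib` or `cdk-monitoring-constructs` types, and the assumption that the alarm-capable type is a union of concrete metric classes should be checked against the library source.

```typescript
// Stand-in for an IMetric-style interface: anything that can be rendered on a dashboard.
interface IMetricLike {
  toMetricConfig(): { label: string };
}

// Stand-in for a concrete metric.
class MetricLike implements IMetricLike {
  constructor(readonly metricName: string) {}
  toMetricConfig() {
    return { label: this.metricName };
  }
}

// Stand-in for a math expression over other metrics.
class MathExpressionLike implements IMetricLike {
  constructor(readonly expression: string) {}
  toMetricConfig() {
    return { label: this.expression };
  }
}

// Assumed shape of the alarm-capable union (hypothetical name).
type MetricWithAlarmSupportLike = MetricLike | MathExpressionLike;

// Narrow an interface-typed metric to the alarm-capable union, or return undefined.
function toAlarmable(metric: IMetricLike): MetricWithAlarmSupportLike | undefined {
  if (metric instanceof MetricLike || metric instanceof MathExpressionLike) {
    return metric;
  }
  return undefined;
}

// A concrete metric can be narrowed; an opaque interface-only object cannot.
const concrete: IMetricLike = new MetricLike('CPUUtilization');
const opaque: IMetricLike = { toMetricConfig: () => ({ label: 'search result' }) };
```

The point of the sketch is that a value typed only as the interface cannot be passed where the union is required without a runtime check, which is roughly the situation with the metrics exposed by Ec2Monitoring.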

Ultimately I got something working, but I am not sure that this is the best way:

Using MonitoringFacade, I create the metric factory:

    this.monitoring = new MonitoringFacade(this, 'MonitoringFacade', {
      alarmFactoryDefaults: {
        actionsEnabled: true,
        alarmNamePrefix: `${props.applicationName}-${props.stageName}`,
        action: new SnsAlarmActionStrategy({
          onAlarmTopic: monitoringTopic,
        }),
        datapointsToAlarm: 1,
      },
      metricFactoryDefaults: {
        namespace: `${props.applicationQualifier}`,
      },
      dashboardFactory: new DefaultDashboardFactory(this, 'DashboardFactory', {
        dashboardNamePrefix: `${props.applicationName}-${props.stageName}`,
        createDashboard: true,
        createSummaryDashboard: false,
        createAlarmDashboard: true,
        renderingPreference: DashboardRenderingPreference.INTERACTIVE_ONLY,
      }),
    });
    // initialize metric factory
    this.metricFactory = this.monitoring.createMetricFactory();
    // initialize dimensions map for ec2 InstanceId
    const ec2DimensionsMap: DimensionsMap = { InstanceId: props.ec2InstanceId };

I am using Ec2Monitoring, but I can't do much with it:

    // create the monitoring widget for the summary dashboard
    this.monitoring.monitorEC2Instances({
      ...monitorEc2Props,
    });

To then set up alarms, I use .monitorCustom():

    this.monitoring.monitorCustom({
      metricGroups: [
        /**
         * MetricGroup for the inference instance
         */
        {
          title: 'Inference Instance Health',
          metrics: [
            /**
             * CPU Utilization Metric with Alarm
             * Will alarm when CPU breaches the threshold
             */
            {
              metric: this.getEC2InstanceMetric('CPUUtilization', MetricStatistic.AVERAGE, ec2DimensionsMap),
              alarmFriendlyName: 'inference-instance-cpu-utilization',
              addAlarm: {
                Critical: {
                  threshold: 80,
                  comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
                  actionsEnabled: true,
                  datapointsToAlarm: 3,
                  evaluationPeriods: 3,
                  // missing data indicates that the instance is potentially down
                  // treatMissingDataOverride: TreatMissingData.BREACHING,
                },
              },
            },
            /**
             * StatusCheckFailed Metric with Alarm
             * Will alarm when either instance status check fails
             * (maximum statistic >= 1.0) for 2 consecutive data points
             */
            {
              metric: this.getEC2InstanceMetric('StatusCheckFailed', MetricStatistic.MAX, ec2DimensionsMap),
              alarmFriendlyName: 'inference-instance-status-check-failed',
              addAlarm: {
                Critical: {
                  threshold: 1.0,
                  comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
                  actionsEnabled: true,
                  datapointsToAlarm: 2,
                  evaluationPeriods: 2,
                },
              },
            },
          ],
        },
      ],
      addToAlarmDashboard: true,
      addToSummaryDashboard: false,
      alarmFriendlyName: 'inference-instance',
    });
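
As a rough illustration of how the `threshold`, `comparisonOperator`, `datapointsToAlarm`, and `evaluationPeriods` values above interact, here is a simplified "M out of N" evaluation sketch. The `AlarmRule` type and `isInAlarm` function are hypothetical helpers written for this example. It deliberately ignores missing-data handling, which CloudWatch treats in a more involved way, so it is an approximation and not the service's actual algorithm.

```typescript
type Comparison = 'GreaterThanThreshold' | 'GreaterThanOrEqualToThreshold';

// Hypothetical, simplified view of the alarm settings used above.
interface AlarmRule {
  threshold: number;
  comparison: Comparison;
  datapointsToAlarm: number;
  evaluationPeriods: number;
}

// True when a single datapoint breaches the threshold.
function breaches(value: number, rule: AlarmRule): boolean {
  return rule.comparison === 'GreaterThanThreshold'
    ? value > rule.threshold
    : value >= rule.threshold;
}

// True when at least `datapointsToAlarm` of the most recent
// `evaluationPeriods` datapoints breach the threshold.
function isInAlarm(datapoints: number[], rule: AlarmRule): boolean {
  const recent = datapoints.slice(-rule.evaluationPeriods);
  const breaching = recent.filter((value) => breaches(value, rule)).length;
  return breaching >= rule.datapointsToAlarm;
}

// Mirrors the CPU alarm: 3 of 3 datapoints above 80.
const cpuRule: AlarmRule = {
  threshold: 80,
  comparison: 'GreaterThanThreshold',
  datapointsToAlarm: 3,
  evaluationPeriods: 3,
};

// Mirrors the status check alarm: 2 of 2 datapoints at or above 1.
const statusRule: AlarmRule = {
  threshold: 1,
  comparison: 'GreaterThanOrEqualToThreshold',
  datapointsToAlarm: 2,
  evaluationPeriods: 2,
};
```

With these rules, a single CPU spike or a single failed status check would not alarm on its own; the breach has to persist for the whole evaluation window.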

I use a private method to create the metric using the metric factory:

  // Build an AWS/EC2 metric for the given instance dimensions using the metric factory
  private getEC2InstanceMetric(metricName: string, statistic: MetricStatistic, dimension: DimensionsMap) {
    return this.metricFactory.createMetric(
      metricName,
      statistic,
      undefined, // label: use the default
      dimension,
      undefined, // color: use the default
      'AWS/EC2', // namespace: override the custom default set in metricFactoryDefaults
    );
  }
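
As a starting point for discussion, the requested feature could be sketched roughly as follows. Everything here is hypothetical: `ThresholdSketch`, `Ec2AlarmPropsSketch`, `PlannedAlarm`, `planAlarms`, and the prop names `addCpuUtilizationAlarm` and `addStatusCheckFailedAlarm` do not exist in the library today. The sketch only computes plain alarm descriptions per instance and does not call any CDK API.

```typescript
// Hypothetical threshold shape; not an existing library type.
interface ThresholdSketch {
  threshold: number;
  datapointsToAlarm?: number;
  evaluationPeriods?: number;
}

// Hypothetical props for an alarm-capable Ec2Monitoring, keyed by alarm disambiguator.
interface Ec2AlarmPropsSketch {
  instanceIds: string[];
  addCpuUtilizationAlarm?: Record<string, ThresholdSketch>;
  addStatusCheckFailedAlarm?: Record<string, ThresholdSketch>;
}

type StatisticSketch = 'Average' | 'Maximum';

// Plain description of one alarm that would be created.
interface PlannedAlarm {
  alarmName: string;
  metricName: string;
  statistic: StatisticSketch;
  dimensions: { InstanceId: string };
  threshold: number;
  datapointsToAlarm: number;
  evaluationPeriods: number;
}

// Expand the per-metric alarm props into one planned alarm per instance and disambiguator.
function planAlarms(props: Ec2AlarmPropsSketch): PlannedAlarm[] {
  const specs: Array<[string, StatisticSketch, Record<string, ThresholdSketch> | undefined]> = [
    ['CPUUtilization', 'Average', props.addCpuUtilizationAlarm],
    ['StatusCheckFailed', 'Maximum', props.addStatusCheckFailedAlarm],
  ];
  const planned: PlannedAlarm[] = [];
  for (const instanceId of props.instanceIds) {
    for (const [metricName, statistic, thresholds] of specs) {
      for (const [disambiguator, t] of Object.entries(thresholds ?? {})) {
        // Default to a single datapoint, and to a window as long as datapointsToAlarm.
        const datapointsToAlarm = t.datapointsToAlarm ?? 1;
        planned.push({
          alarmName: `${instanceId}-${metricName}-${disambiguator}`,
          metricName,
          statistic,
          dimensions: { InstanceId: instanceId },
          threshold: t.threshold,
          datapointsToAlarm,
          evaluationPeriods: t.evaluationPeriods ?? datapointsToAlarm,
        });
      }
    }
  }
  return planned;
}
```

Keying the thresholds by a disambiguator such as `Critical` mirrors the `addAlarm` shape used with monitorCustom above, so several severities could be configured for the same metric.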

Any feedback appreciated

@gossandr gossandr added the feature-request New feature label May 6, 2024