
This template is for Zabbix version: 7.4

Also available for: 7.2 7.0 6.4 6.2 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/cloud/AWS/aws_http?at=release/7.4

https://www.zabbix.com/integrations/aws

AWS by HTTP

Overview

This template is designed for the effortless deployment of AWS monitoring by Zabbix via HTTP and doesn't require any external scripts.

Currently, the template supports the discovery of EC2 and RDS instances, ECS clusters, ELB, Lambda, S3 buckets, and backup vaults.

Included Monitoring Templates

AWS EC2 by HTTP
AWS ECS Cluster by HTTP
AWS ECS Serverless Cluster by HTTP
AWS ELB Application Load Balancer by HTTP
AWS ELB Network Load Balancer by HTTP
AWS Lambda by HTTP
AWS RDS instance by HTTP
AWS S3 bucket by HTTP
AWS Cost Explorer by HTTP
AWS Backup Vault by HTTP

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS by HTTP

Configuration

Setup

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions",
                "backup:ListBackupVaults",
                "backup:ListBackupJobs",
                "backup:ListCopyJobs",
                "backup:ListRestoreJobs"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
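Before attaching the policy, it can help to confirm that a policy document actually grants every action listed above. The following is a minimal sketch, not part of the template: the helper names `allowed_actions` and `missing_actions` are hypothetical, and the check compares exact action names only (it does not expand wildcards such as `ec2:Describe*`).

```python
import json

# Actions copied from the required-permissions policy above.
REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "ec2:DescribeInstances",
    "ec2:DescribeVolumes",
    "ec2:DescribeRegions",
    "rds:DescribeEvents",
    "rds:DescribeDBInstances",
    "ecs:DescribeClusters",
    "ecs:ListServices",
    "ecs:ListTasks",
    "ecs:ListClusters",
    "s3:ListAllMyBuckets",
    "s3:GetBucketLocation",
    "s3:GetMetricsConfiguration",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeTargetGroups",
    "ec2:DescribeSecurityGroups",
    "lambda:ListFunctions",
    "backup:ListBackupVaults",
    "backup:ListBackupJobs",
    "backup:ListCopyJobs",
    "backup:ListRestoreJobs",
}


def allowed_actions(policy):
    """Collect the action names of every Allow statement (no wildcard expansion)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy):
    """Return the required actions that the policy does not list, sorted by name."""
    return sorted(REQUIRED_ACTIONS - allowed_actions(policy))


if __name__ == "__main__":
    # Hypothetical, deliberately incomplete policy used only for illustration.
    sample = json.loads(
        '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow",'
        ' "Action": ["ec2:DescribeInstances"], "Resource": "*"}]}'
    )
    print(missing_actions(sample))
```

An empty list means every action from the policy above is present by name.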

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.
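For context on how these two values are used: AWS HTTP APIs generally expect requests signed with AWS Signature Version 4, where the secret access key is never transmitted and instead seeds a chain of HMAC-SHA256 operations. The template's own signing script is not shown in this document, so the sketch below illustrates only the general key-derivation step as described in the AWS documentation. The function name and the sample date, Region, and service values are assumptions, and the secret is the placeholder key used in AWS documentation examples.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of a text message under a bytes key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_access_key, date_stamp, region, service):
    """Derive a Signature Version 4 signing key.

    date_stamp is the request date as YYYYMMDD; region and service scope the key.
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    # Placeholder secret from AWS documentation examples; never hard-code a real key.
    key = derive_signing_key(
        "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "20240101", "us-east-1", "monitoring"
    )
    print(key.hex())
```

Because the derived key is scoped to a date, Region, and service, a key computed for one Region cannot sign requests for another.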

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions",
                "backup:ListBackupVaults",
                "backup:ListBackupJobs",
                "backup:ListCopyJobs",
                "backup:ListRestoreJobs"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
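The `{Account}` and `{UserName}` placeholders in the trust policy above must be replaced with real values. A minimal sketch of rendering the same document programmatically follows; the function name `build_trust_policy` and the sample account ID and user name are hypothetical.

```python
import json


def build_trust_policy(account_id, user_name):
    """Build the trust policy shown above for one IAM user principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::{}:user/{}".format(account_id, user_name)
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and user name, for illustration only.
    print(json.dumps(build_trust_policy("123456789012", "zabbix-monitoring"), indent=2))
```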

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Note: If you set the {$AWS.ASSUME.ROLE.AUTH.METADATA} macro to true and set the macros {$AWS.STS.REGION} and {$AWS.ASSUME.ROLE.ARN}, the Zabbix server or proxy will attempt to retrieve the role credentials from the instance metadata service. This means that the Zabbix server or proxy must be running on an AWS EC2 instance with an IAM role assigned that has the necessary permissions. This approach is recommended when running Zabbix inside an AWS EC2 instance with an IAM role assigned, as it simplifies credential management.

Role-Based Authorization

If you are using role-based authorization, add the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions",
                "backup:ListBackupVaults",
                "backup:ListBackupJobs",
                "backup:ListCopyJobs",
                "backup:ListRestoreJobs"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

To gather Request metrics, enable Requests metrics on your Amazon S3 buckets from the AWS console.

Set the macro {$AWS.AUTH_TYPE}. Possible values: access_key, assume_role, role_base.
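The macro requirements of the three authorization methods described above can be summarized as a small check. This is a sketch under the assumptions stated in this section (role-based authorization needs no credential macros; the metadata variant of assume role needs only the STS Region and role ARN). The helper name `missing_macros` is hypothetical and is not part of Zabbix.

```python
# Macros each authorization method needs, as described in the Setup section.
REQUIRED_MACROS = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    "role_base": [],
}

# With {$AWS.ASSUME.ROLE.AUTH.METADATA} set to true, credentials come from
# the instance metadata service, so only these two macros are needed.
METADATA_ASSUME_ROLE_MACROS = ["{$AWS.STS.REGION}", "{$AWS.ASSUME.ROLE.ARN}"]


def missing_macros(macros):
    """Return the macros that are unset or empty for the chosen authorization method."""
    auth_type = macros.get("{$AWS.AUTH_TYPE}", "access_key")
    if auth_type not in REQUIRED_MACROS:
        raise ValueError("Unknown authorization method: {}".format(auth_type))
    required = REQUIRED_MACROS[auth_type]
    uses_metadata = macros.get("{$AWS.ASSUME.ROLE.AUTH.METADATA}", "false") == "true"
    if auth_type == "assume_role" and uses_metadata:
        required = METADATA_ASSUME_ROLE_MACROS
    return [name for name in required if not macros.get(name)]


if __name__ == "__main__":
    # Hypothetical host macro values, for illustration only.
    print(missing_macros({"{$AWS.AUTH_TYPE}": "assume_role", "{$AWS.STS.REGION}": "us-east-1"}))
```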

For more information about managing access keys, see the official documentation.

Refer to the Macros section for a list of macros used for LLD filters.

Additional information about the metrics and used API methods:

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Set to `true` when using the `assume_role` authorization method with credentials obtained from instance metadata or the environment. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; set when using the `assume_role` authorization method.
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.REQUEST.REGION}	Region used in GET request `ListBuckets`.	`us-east-1`
{$AWS.DESCRIBE.REGION}	Region used in POST request `DescribeRegions`.	`us-east-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.DATA.TIMEOUT}	Response timeout for API requests.	`60s`
{$AWS.EC2.LLD.FILTER.NAME.MATCHES}	Filter of discoverable EC2 instances by namespace.	`.*`
{$AWS.EC2.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered EC2 instances by namespace.	`CHANGE_IF_NEEDED`
{$AWS.EC2.LLD.FILTER.REGION.MATCHES}	Filter of discoverable EC2 instances by region.	`.*`
{$AWS.EC2.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered EC2 instances by region.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.NAME.MATCHES}	Filter of discoverable ECS clusters by name.	`.*`
{$AWS.ECS.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered ECS clusters by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.STATUS.MATCHES}	Filter of discoverable ECS clusters by status.	`ACTIVE`
{$AWS.ECS.LLD.FILTER.STATUS.NOT_MATCHES}	Filter to exclude discovered ECS clusters by status.	`CHANGE_IF_NEEDED`
{$AWS.S3.LLD.FILTER.NAME.MATCHES}	Filter of discoverable S3 buckets by namespace.	`.*`
{$AWS.S3.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered S3 buckets by namespace.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.NAME.MATCHES}	Filter of discoverable RDS instances by namespace.	`.*`
{$AWS.RDS.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered RDS instances by namespace.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.REGION.MATCHES}	Filter of discoverable RDS instances by region.	`.*`
{$AWS.RDS.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered RDS instances by region.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.REGION.MATCHES}	Filter of discoverable ECS clusters by region.	`.*`
{$AWS.ECS.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered ECS clusters by region.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.NAME.MATCHES}	Filter of discoverable ELB load balancers by name.	`.*`
{$AWS.ELB.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered ELB load balancers by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.REGION.MATCHES}	Filter of discoverable ELB load balancers by region.	`.*`
{$AWS.ELB.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered ELB load balancers by region.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.STATE.MATCHES}	Filter of discoverable ELB load balancers by status.	`active`
{$AWS.ELB.LLD.FILTER.STATE.NOT_MATCHES}	Filter to exclude discovered ELB load balancer by status.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.REGION.MATCHES}	Filter of discoverable Lambda functions by region.	`.*`
{$AWS.LAMBDA.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered Lambda functions by region.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.RUNTIME.MATCHES}	Filter of discoverable Lambda functions by Runtime.	`.*`
{$AWS.LAMBDA.LLD.FILTER.RUNTIME.NOT_MATCHES}	Filter to exclude discovered Lambda functions by Runtime.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.NAME.MATCHES}	Filter of discoverable Lambda functions by name.	`.*`
{$AWS.LAMBDA.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered Lambda functions by name.	`CHANGE_IF_NEEDED`
{$AWS.BACKUP_VAULT.LLD.FILTER.NAME.MATCHES}	Filter of discoverable backup vaults by name.	`.*`
{$AWS.BACKUP_VAULT.LLD.FILTER.NAME.NOT_MATCHES}	Filter to exclude discovered backup vaults by name.	`CHANGE_IF_NEEDED`
{$AWS.BACKUP_VAULT.LLD.FILTER.REGION.MATCHES}	Filter of discoverable backup vaults by region.	`.*`
{$AWS.BACKUP_VAULT.LLD.FILTER.REGION.NOT_MATCHES}	Filter to exclude discovered backup vaults by region.	`CHANGE_IF_NEEDED`
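
Each pair of `MATCHES` / `NOT_MATCHES` macros above acts as an include/exclude regular-expression filter: a discovered entity is kept only if it matches the first pattern and does not match the second. The sketch below illustrates that logic with Python's `re` module, which only approximates the PCRE engine Zabbix uses. The function names and the sample resource names are hypothetical.

```python
import re


def passes_filter(value, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Keep a value that matches the include pattern and not the exclude pattern."""
    included = re.search(matches, value) is not None
    excluded = re.search(not_matches, value) is not None
    return included and not excluded


def filter_discovered(names, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Apply the include/exclude filter to a list of discovered names."""
    return [name for name in names if passes_filter(name, matches, not_matches)]


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    discovered = ["prod-web", "prod-db", "test-web"]
    print(filter_discovered(discovered))  # defaults keep everything
    print(filter_discovered(discovered, matches="^prod-", not_matches="db"))
```

The default exclude value `CHANGE_IF_NEEDED` is a placeholder that is not expected to match any real name, so by default nothing is excluded.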

LLD rule S3 buckets discovery

Name	Description	Type	Key and additional info
S3 buckets discovery	Get S3 bucket instances.	Script	aws.s3.discovery

LLD rule EC2 instances discovery

Name	Description	Type	Key and additional info
EC2 instances discovery	Get EC2 instances.	Script	aws.ec2.discovery

LLD rule RDS instances discovery

Name	Description	Type	Key and additional info
RDS instances discovery	Get RDS instances.	Script	aws.rds.discovery

LLD rule ECS clusters discovery

Name	Description	Type	Key and additional info
ECS clusters discovery	Get ECS clusters.	Script	aws.ecs.discovery

LLD rule ELB load balancers discovery

Name	Description	Type	Key and additional info
ELB load balancers discovery	Get ELB load balancers.	Script	aws.elb.discovery

LLD rule Lambda discovery

Name	Description	Type	Key and additional info
Lambda discovery	Get Lambda functions.	Script	aws.lambda.discovery

LLD rule Backup vault discovery

Name	Description	Type	Key and additional info
Backup vault discovery	Get backup vaults.	Script	aws.backup_vault.discovery

AWS EC2 by HTTP

Overview

This template monitors AWS EC2 instances and attached AWS EBS volumes by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Note: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and used API methods:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS EC2 by HTTP

Configuration

Setup

The template gets metrics for AWS EC2 instances and attached AWS EBS volumes, using script items to make HTTP requests to the CloudWatch API. Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon EC2 metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:DescribeVolumes",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

For more information, see the EC2 policies on the AWS website.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.EC2.INSTANCE.ID}.

For more information about managing access keys, see the official documentation.

Also, see the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Set to `true` when using the `assume_role` authorization method with credentials obtained from instance metadata or the environment. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; set when using the `assume_role` authorization method.
{$AWS.REGION}	Amazon EC2 Region code.	`us-west-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.EC2.INSTANCE.ID}	EC2 instance ID.
{$AWS.EC2.LLD.FILTER.VOLUME_TYPE.MATCHES}	Filter of discoverable volumes by type.	`.*`
{$AWS.EC2.LLD.FILTER.VOLUME_TYPE.NOT_MATCHES}	Filter to exclude discovered volumes by type.	`CHANGE_IF_NEEDED`
{$AWS.EC2.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.EC2.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.EC2.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.EC2.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.EC2.CPU.UTIL.WARN.MAX}	The warning threshold of the CPU utilization expressed in %.	`85`
{$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN}	Minimum number of free earned CPU credits for trigger expression.	`50`
{$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}	Maximum number of spent CPU Surplus credits for trigger expression.	`100`
{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of I/O credits remaining for trigger expression.	`20`
{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Byte credits remaining for trigger expression.	`20`
{$AWS.EBS.BURST.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Burst credits remaining for trigger expression.	`20`

Items

Name	Description	Type	Key and additional info
Get metrics data	Get instance metrics. Full metrics list related to EC2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html	Script	aws.ec2.get_metrics; Preprocessing: Check for not supported value: `any error` (⛔️ Custom on fail: Discard value)
Get instance alarms data	DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.ec2.get_alarms; Preprocessing: Check for not supported value: `any error` (⛔️ Custom on fail: Discard value)
Get volumes data	Get volumes attached to instance. DescribeVolumes API method: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeVolumes.html	Script	aws.ec2.get_volumes; Preprocessing: Check for not supported value: `any error` (⛔️ Custom on fail: Discard value)
Get metrics check	Checks whether the instance metric data was retrieved correctly.	Dependent item	aws.ec2.metrics.check; Preprocessing: JSON Path: `$.error` (⛔️ Custom on fail: Set value to an empty string); Discard unchanged with heartbeat: `3h`
Get alarms check	Checks whether the alarm data was retrieved correctly.	Dependent item	aws.ec2.alarms.check; Preprocessing: JSON Path: `$.error` (⛔️ Custom on fail: Set value to an empty string); Discard unchanged with heartbeat: `3h`
Get volumes info check	Checks whether the volume information was retrieved correctly.	Dependent item	aws.ec2.volumes.check; Preprocessing: JSON Path: `$.error` (⛔️ Custom on fail: Set value to an empty string); Discard unchanged with heartbeat: `3h`
Credit CPU: Balance	The number of earned CPU credits that an instance has accrued since it was launched or started. For T2 Standard, the CPUCreditBalance also includes the number of launch credits that have been accrued. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. The credit balance has a maximum limit, determined by the instance size. After the limit is reached, any new credits that are earned are discarded. For T2 Standard, launch credits do not count towards the limit. The credits in the CPUCreditBalance are available for the instance to spend to burst beyond its baseline CPU utilization. When an instance is running, credits in the CPUCreditBalance do not expire. When a T3 or T3a instance stops, the CPUCreditBalance value persists for seven days. Thereafter, all accrued credits are lost. When a T2 instance stops, the CPUCreditBalance value does not persist, and all accrued credits are lost.	Dependent item	aws.ec2.cpu.credit_balance; Preprocessing: JSON Path: `$.[?(@.Label == "CPUCreditBalance")].Values.first().first()` (⛔️ Custom on fail: Discard value)
Credit CPU: Usage	The number of CPU credits spent by the instance for CPU utilization. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes).	Dependent item	aws.ec2.cpu.credit_usage; Preprocessing: JSON Path: `$.[?(@.Label == "CPUCreditUsage")].Values.first().first()` (⛔️ Custom on fail: Discard value)
Credit CPU: Surplus balance	The number of surplus credits that have been spent by an unlimited instance when its CPUCreditBalance value is zero. The CPUSurplusCreditBalance value is paid down by earned CPU credits. If the number of surplus credits exceeds the maximum number of credits that the instance can earn in a 24-hour period, the spent surplus credits above the maximum incur an additional charge.	Dependent item	aws.ec2.cpu.surplus_credit_balance; Preprocessing: JSON Path: `The text is too long. Please see the template.` (⛔️ Custom on fail: Discard value)
Credit CPU: Surplus charged	The number of spent surplus credits that are not paid down by earned CPU credits, and which thus incur an additional charge. Spent surplus credits are charged when any of the following occurs: the spent surplus credits exceed the maximum number of credits that the instance can earn in a 24-hour period (spent surplus credits above the maximum are charged at the end of the hour); the instance is stopped or terminated; the instance is switched from unlimited to standard.	Dependent item	aws.ec2.cpu.surplus_credit_charged; Preprocessing: JSON Path: `The text is too long. Please see the template.` (⛔️ Custom on fail: Discard value)
CPU: Utilization	The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance. Depending on the instance type, tools in your operating system can show a lower percentage than CloudWatch when the instance is not allocated a full processor core.	Dependent item	aws.ec2.cpu_utilization; Preprocessing: JSON Path: `$.[?(@.Label == "CPUUtilization")].Values.first().first()` (⛔️ Custom on fail: Discard value)
Disk: Read bytes, rate	Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.disk.read_bytes.rate; Preprocessing: JSON Path: `$.[?(@.Label == "DiskReadBytes")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Disk: Read, rate	Completed read operations from all instance store volumes available to the instance in a specified period of time. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.disk.read_ops.rate; Preprocessing: JSON Path: `$.[?(@.Label == "DiskReadOps")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Disk: Write bytes, rate	Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.disk_write_bytes.rate; Preprocessing: JSON Path: `$.[?(@.Label == "DiskWriteBytes")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Disk: Write ops, rate	Completed write operations to all instance store volumes available to the instance in a specified period of time. If there are no instance store volumes, either the value is 0 or the metric is not reported.	Dependent item	aws.ec2.disk_write_ops.rate; Preprocessing: JSON Path: `$.[?(@.Label == "DiskWriteOps")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
EBS: Byte balance	Percentage of throughput credits remaining in the burst bucket for Nitro-based instances.	Dependent item	aws.ec2.ebs.byte_balance; Preprocessing: JSON Path: `$.[?(@.Label == "EBSByteBalance%")].Values.first().first()` (⛔️ Custom on fail: Discard value)
EBS: IO balance	Percentage of I/O credits remaining in the burst bucket for Nitro-based instances.	Dependent item	aws.ec2.ebs.io_balance; Preprocessing: JSON Path: `$.[?(@.Label == "EBSIOBalance%")].Values.first().first()` (⛔️ Custom on fail: Discard value)
EBS: Read bytes, rate	Bytes read from all EBS volumes attached to the instance for Nitro-based instances.	Dependent item	aws.ec2.ebs.read_bytes.rate; Preprocessing: JSON Path: `$.[?(@.Label == "EBSReadBytes")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
EBS: Read, rate	Completed read operations from all Amazon EBS volumes attached to the instance for Nitro-based instances.	Dependent item	aws.ec2.ebs.read_ops.rate; Preprocessing: JSON Path: `$.[?(@.Label == "EBSReadOps")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
EBS: Write bytes, rate	Bytes written to all EBS volumes attached to the instance for Nitro-based instances.	Dependent item	aws.ec2.ebs.write_bytes.rate; Preprocessing: JSON Path: `$.[?(@.Label == "EBSWriteBytes")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
EBS: Write, rate	Completed write operations to all EBS volumes attached to the instance in a specified period of time.	Dependent item	aws.ec2.ebs.write_ops.rate; Preprocessing: JSON Path: `$.[?(@.Label == "EBSWriteOps")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Metadata: No token	The number of times the instance metadata service was successfully accessed using a method that does not use a token. This metric is used to determine if there are any processes accessing instance metadata that are using Instance Metadata Service Version 1, which does not use a token. If all requests use token-backed sessions, i.e., Instance Metadata Service Version 2, the value is 0.	Dependent item	aws.ec2.metadata.no_token; Preprocessing: JSON Path: `$.[?(@.Label == "MetadataNoToken")].Values.first().first()` (⛔️ Custom on fail: Discard value)
Network: Bytes in, rate	The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.	Dependent item	aws.ec2.network_in.rate; Preprocessing: JSON Path: `$.[?(@.Label == "NetworkIn")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Network: Bytes out, rate	The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.	Dependent item	aws.ec2.network_out.rate; Preprocessing: JSON Path: `$.[?(@.Label == "NetworkOut")].Values.first().first()`; JavaScript: `The text is too long. Please see the template.`
Network: Packets in, rate	The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.	Dependent item	aws.ec2.packets_in.rate; Preprocessing: JSON Path: `$.[?(@.Label == "NetworkPacketsIn")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Network: Packets out, rate	The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.	Dependent item	aws.ec2.packets_out.rate; Preprocessing: JSON Path: `$.[?(@.Label == "NetworkPacketsOut")].Values.first().first()` (⛔️ Custom on fail: Discard value); JavaScript: `The text is too long. Please see the template.`
Status: Check failed	Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed).	Dependent item	aws.ec2.status_check_failed; Preprocessing: JSON Path: `$.[?(@.Label == "StatusCheckFailed")].Values.first().first()` (⛔️ Custom on fail: Discard value)
Status: Check failed, instance	Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed).	Dependent item	aws.ec2.status_check_failed_instance; Preprocessing: JSON Path: `The text is too long. Please see the template.` (⛔️ Custom on fail: Discard value)
Status: Check failed, system	Reports whether the instance has passed the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed).	Dependent item	aws.ec2.status_check_failed_system; Preprocessing: JSON Path: `The text is too long. Please see the template.` (⛔️ Custom on fail: Discard value)
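
Most dependent items above extract their value with a JSON Path of the form `$.[?(@.Label == "...")].Values.first().first()`: select the entries whose `Label` matches, take their `Values` arrays, and return the first element of the first array. The sketch below reproduces that selection in Python. The shape of the sample data (a list of objects with `Label` and `Values`) is inferred from the expression itself and is an assumption, as are the function name and sample numbers.

```python
def first_value_for_label(metric_results, label):
    """Return the first value of the first entry with the given Label, else None."""
    values_lists = [
        entry["Values"]
        for entry in metric_results
        if entry.get("Label") == label and "Values" in entry
    ]
    if not values_lists or not values_lists[0]:
        return None  # corresponds to the preprocessing step failing
    return values_lists[0][0]


if __name__ == "__main__":
    # Hypothetical master item data, for illustration only.
    sample = [
        {"Label": "CPUUtilization", "Values": [12.5, 11.0]},
        {"Label": "NetworkIn", "Values": []},
    ]
    print(first_value_for_label(sample, "CPUUtilization"))
    print(first_value_for_label(sample, "NetworkIn"))
```

Where the sketch returns `None`, the corresponding item in the table relies on its `Custom on fail: Discard value` setting, so a metric with no data points is skipped rather than marked unsupported.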

Triggers

Name	Description	Expression	Severity
AWS EC2: Failed to get metrics data	Failed to get CloudWatch metrics for EC2.	`length(last(/AWS EC2 by HTTP/aws.ec2.metrics.check))>0`	Warning
AWS EC2: Failed to get alarms data	Failed to get CloudWatch alarms for EC2.	`length(last(/AWS EC2 by HTTP/aws.ec2.alarms.check))>0`	Warning
AWS EC2: Failed to get volumes info	Failed to get CloudWatch volumes for EC2.	`length(last(/AWS EC2 by HTTP/aws.ec2.volumes.check))>0`	Warning
AWS EC2: Instance CPU Credit balance is too low	The number of earned CPU credits has been less than {$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes.	`max(/AWS EC2 by HTTP/aws.ec2.cpu.credit_balance,5m)<{$AWS.EC2.CPU.CREDIT.BALANCE.MIN.WARN}`	Warning
AWS EC2: Instance has spent too many CPU surplus credits	The number of spent surplus credits that are not paid down and which thus incur an additional charge is over {$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}.	`last(/AWS EC2 by HTTP/aws.ec2.cpu.surplus_credit_charged)>{$AWS.EC2.CPU.CREDIT.SURPLUS.BALANCE.MAX.WARN}`	Warning
AWS EC2: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS EC2 by HTTP/aws.ec2.cpu_utilization,15m)>{$AWS.EC2.CPU.UTIL.WARN.MAX}`	Warning
AWS EC2: Byte Credit balance is too low		`max(/AWS EC2 by HTTP/aws.ec2.ebs.byte_balance,5m)<{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}`	Warning
AWS EC2: I/O Credit balance is too low		`max(/AWS EC2 by HTTP/aws.ec2.ebs.io_balance,5m)<{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}`	Warning
AWS EC2: Instance status check failed	These checks detect problems that require your involvement to repair. The following are examples of problems that can cause instance status checks to fail: failed system status checks, incorrect networking or startup configuration, exhausted memory, corrupted file system, incompatible kernel.	`last(/AWS EC2 by HTTP/aws.ec2.status_check_failed_instance)=1`	Average
AWS EC2: System status check failed	These checks detect underlying problems with your instance that require AWS involvement to repair. The following are examples of problems that can cause system status checks to fail: loss of network connectivity, loss of system power, software issues on the physical host, hardware issues on the physical host that impact network reachability.	`last(/AWS EC2 by HTTP/aws.ec2.status_check_failed_system)=1`	Average
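
The window functions in the expressions above can be read as "every sample in the window breaches the threshold". The following is a minimal Python sketch of that logic, not Zabbix code; the function names and the sample values are hypothetical, and the thresholds stand in for the macros used in the triggers.

```python
from typing import Sequence


def high_cpu(samples: Sequence[float], warn_max: float) -> bool:
    """Mirror min(item,15m) > threshold: true only if every sample exceeds it."""
    return bool(samples) and min(samples) > warn_max


def credit_balance_low(samples: Sequence[float], warn_min: float) -> bool:
    """Mirror max(item,5m) < threshold: true only if every sample is below it."""
    return bool(samples) and max(samples) < warn_min


if __name__ == "__main__":
    # Illustrative values only; real thresholds come from the template macros.
    print(high_cpu([91.0, 88.0, 95.0], 85.0))
    print(high_cpu([91.0, 60.0, 95.0], 85.0))
```

Using `min` for an upper threshold (and `max` for a lower one) means a single sample back inside the normal range keeps the trigger from firing, which suppresses alerts on short spikes.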

LLD rule Instance Alarms discovery

Name	Description	Type	Key and additional info
Instance Alarms discovery	Discovers alarms for the instance and its attached EBS volumes.	Dependent item	aws.ec2.alarms.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Instance Alarms discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: Get metrics	Get alarm metrics about the state and its reason.	Dependent item	aws.ec2.alarm.get_metrics["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`⛔️Custom on fail: Discard value
[{#ALARM_NAME}]: State reason	An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.ec2.alarm.state_reason["{#ALARM_NAME}"]Preprocessing JSON Path: `$.StateReason`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.ec2.alarm.state["{#ALARM_NAME}"]Preprocessing JSON Path: `$.StateValue`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`

Trigger prototypes for Instance Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS EC2: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS EC2 by HTTP/aws.ec2.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS EC2 by HTTP/aws.ec2.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS EC2: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS EC2 by HTTP/aws.ec2.alarm.state["{#ALARM_NAME}"])=1`	Info
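
The JavaScript step behind the State item is not shown here, so the following is only a sketch of the mapping described in the item description: `StateValue` is turned into 0, 1, or 2, and 3 is used when the value cannot be read (the "Custom on fail" setting). The function names and the input shape (one alarm object with `StateValue` and `StateReason`) are assumptions made for illustration.

```python
from typing import Any, Mapping

# Codes as listed in the item description; 3 is the "Custom on fail" fallback.
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3


def alarm_state_code(alarm: Mapping[str, Any]) -> int:
    """Map the alarm's StateValue to the numeric code stored by the item."""
    value = alarm.get("StateValue")
    if not isinstance(value, str):
        return UNKNOWN_STATE
    return STATE_CODES.get(value, UNKNOWN_STATE)


def alarm_trigger_fires(alarm: Mapping[str, Any]) -> bool:
    """Mirror the 'Alarm' trigger prototype: state is 2 and a reason is present."""
    reason = alarm.get("StateReason") or ""
    return alarm_state_code(alarm) == 2 and len(reason) > 0
```

For example, `alarm_state_code({"StateValue": "ALARM"})` evaluates to 2, while an object without `StateValue` yields the fallback code 3.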

LLD rule Instance Volumes discovery

Name	Description	Type	Key and additional info
Instance Volumes discovery	Discovers attached EBS volumes.	Dependent item	aws.ec2.volumes.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Instance Volumes discovery

Name	Description	Type	Key and additional info
[{#VOLUME_ID}]: Get volume data	Get data of the "{#VOLUME_ID}" volume.	Dependent item	aws.ec2.ebs.get_volume["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.volumeId == "{#VOLUME_ID}")].first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Create time	The time stamp when volume creation was initiated.	Dependent item	aws.ec2.ebs.create_time["{#VOLUME_ID}"]Preprocessing JSON Path: `$.createTime`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Status	The state of the volume. Possible values: 0 (creating), 1 (available), 2 (in-use), 3 (deleting), 4 (deleted), 5 (error).	Dependent item	aws.ec2.ebs.status["{#VOLUME_ID}"]Preprocessing JSON Path: `$.status`⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Attachment state	The attachment state of the volume. Possible values: 0 (attaching), 1 (attached), 2 (detaching).	Dependent item	aws.ec2.ebs.attachment_status["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Attachment time	The time stamp when the attachment was initiated.	Dependent item	aws.ec2.ebs.attachment_time["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Device	The device name specified in the block device mapping (for example, /dev/sda1).	Dependent item	aws.ec2.ebs.device["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#VOLUME_ID}]: Get metrics	Get metrics of EBS volume. Full metrics list related to EBS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cloudwatch_ebs.html	Script	aws.ec2.get_ebs_metrics["{#VOLUME_ID}"]Preprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Read, bytes	Provides information on the read operations in a specified period of time. The average size of each read operation during the period, except on volumes attached to a Nitro-based instance, where the average represents the average over the specified period. For Xen instances, data is reported only when there is read activity on the volume.	Dependent item	aws.ec2.ebs.volume.read_bytes["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "VolumeReadBytes")].Values.first().first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Write, bytes	Provides information on the write operations in a specified period of time. The average size of each write operation during the period, except on volumes attached to a Nitro-based instance, where the average represents the average over the specified period. For Xen instances, data is reported only when there is write activity on the volume.	Dependent item	aws.ec2.ebs.volume.write_bytes["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "VolumeWriteBytes")].Values.first().first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Write, ops	The total number of write operations in a specified period of time. Note: write operations are counted on completion.	Dependent item	aws.ec2.ebs.volume.write_ops["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "VolumeWriteOps")].Values.first().first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Read, ops	The total number of read operations in a specified period of time. Note: read operations are counted on completion.	Dependent item	aws.ec2.ebs.volume.read_ops["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "VolumeReadOps")].Values.first().first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Read time, total	This metric is not supported with Multi-Attach enabled volumes. The total number of seconds spent by all read operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 1 minute (60 seconds): if 150 operations completed during that period, and each operation took 1 second, the value would be 150 seconds. For Xen instances, data is reported only when there is read activity on the volume.	Dependent item	aws.ec2.ebs.volume.total_read_time["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Write time, total	This metric is not supported with Multi-Attach enabled volumes. The total number of seconds spent by all write operations that completed in a specified period of time. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, for a period of 1 minute (60 seconds): if 150 operations completed during that period, and each operation took 1 second, the value would be 150 seconds. For Xen instances, data is reported only when there is write activity on the volume.	Dependent item	aws.ec2.ebs.volume.total_write_time["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Idle time	This metric is not supported with Multi-Attach enabled volumes. The total number of seconds in a specified period of time when no read or write operations were submitted.	Dependent item	aws.ec2.ebs.volume.idle_time["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "VolumeIdleTime")].Values.first().first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Queue length	The number of read and write operation requests waiting to be completed in a specified period of time.	Dependent item	aws.ec2.ebs.volume.queue_length["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "VolumeQueueLength")].Values.first().first()`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Throughput, pct	This metric is not supported with Multi-Attach enabled volumes. Used with Provisioned IOPS SSD volumes only. The percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume. Provisioned IOPS SSD volumes deliver their provisioned performance 99.9 percent of the time. During a write, if there are no other pending I/O requests in a minute, the metric value will be 100 percent. Also, a volume's I/O performance may become degraded temporarily due to an action you have taken (for example, creating a snapshot of a volume during peak usage, running the volume on a non-EBS-optimized instance, or accessing data on the volume for the first time).	Dependent item	aws.ec2.ebs.volume.throughput_percentage["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Consumed Read/Write, ops	Used with Provisioned IOPS SSD volumes only. The total amount of read and write operations (normalized to 256K capacity units) consumed in a specified period of time. I/O operations that are smaller than 256K each count as 1 consumed IOPS. I/O operations that are larger than 256K are counted in 256K capacity units. For example, a 1024K I/O would count as 4 consumed IOPS.	Dependent item	aws.ec2.ebs.volume.consumed_read_write_ops["{#VOLUME_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#VOLUME_ID}]: Burst balance	Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1), and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket. Data is reported to CloudWatch only when the volume is active. If the volume is not attached, no data is reported.	Dependent item	aws.ec2.ebs.volume.burst_balance["{#VOLUME_ID}"]Preprocessing JSON Path: `$.[?(@.Label == "BurstBalance")].Values.first().first()`⛔️Custom on fail: Discard value
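
Most dependent items above share the same JSON Path pattern: pick the result whose `Label` matches, then take the first entry of its `Values`. A rough Python equivalent is sketched below. The input shape (a list of objects with `Label` and `Values`) is inferred from the JSON Path expressions, and the function name is hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence


def first_metric_value(
    results: Sequence[Mapping[str, Any]], label: str
) -> Optional[float]:
    """Return the first value of the first result whose Label matches, else None."""
    for entry in results:
        if entry.get("Label") == label:
            values = entry.get("Values") or []
            # An empty Values list corresponds to the JSON Path step failing,
            # which the template handles with "Custom on fail: Discard value".
            return values[0] if values else None
    return None
```

Returning `None` here plays the role of "Discard value": when a metric has no datapoints for the period, the item simply receives nothing instead of becoming unsupported.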

Trigger prototypes for Instance Volumes discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS EC2: Volume [{#VOLUME_ID}] has 'error' state		`last(/AWS EC2 by HTTP/aws.ec2.ebs.status["{#VOLUME_ID}"])=5`	Warning
AWS EC2: Burst balance is too low		`max(/AWS EC2 by HTTP/aws.ec2.ebs.volume.burst_balance["{#VOLUME_ID}"],5m)<{$AWS.EBS.BURST.CREDIT.BALANCE.MIN.WARN}`	Warning
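
The Status item and the 'error' trigger prototype depend on two steps: selecting one volume by `volumeId`, then converting its `status` string into the numeric code listed in the item description. The template's own JavaScript is not shown, so the sketch below only illustrates that mapping; the function names and sample volume IDs are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence

# Codes as listed in the "[{#VOLUME_ID}]: Status" item description.
VOLUME_STATUS_CODES = {
    "creating": 0,
    "available": 1,
    "in-use": 2,
    "deleting": 3,
    "deleted": 4,
    "error": 5,
}


def select_volume(
    volumes: Sequence[Mapping[str, Any]], volume_id: str
) -> Optional[Mapping[str, Any]]:
    """Return the first volume whose volumeId matches, or None."""
    return next((v for v in volumes if v.get("volumeId") == volume_id), None)


def volume_status_code(volume: Mapping[str, Any]) -> Optional[int]:
    """Translate the status string into its numeric code, or None if unknown."""
    return VOLUME_STATUS_CODES.get(volume.get("status"))


def volume_in_error(volume: Mapping[str, Any]) -> bool:
    """Mirror the trigger prototype: last(status) = 5."""
    return volume_status_code(volume) == 5
```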

AWS RDS instance by HTTP

Overview

This template monitors an AWS RDS instance over HTTP with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Note: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and used API methods:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS RDS instance by HTTP

Configuration

Setup

The template gets AWS RDS instance metrics and uses the script item to make HTTP requests to the CloudWatch API. Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon RDS metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
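
Before attaching a policy, it can help to confirm that it actually grants the four actions listed above. The following Python sketch does a plain string comparison on the policy document. It is an illustration only: the function name is hypothetical, and it does not expand wildcards such as `rds:*` or evaluate `Resource` and `Condition` elements the way IAM does.

```python
import json
from typing import Set

# Actions taken from the required-permissions policy above.
REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "rds:DescribeEvents",
    "rds:DescribeDBInstances",
}


def missing_actions(policy_json: str) -> Set[str]:
    """Return required actions that no Allow statement lists explicitly."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    allowed: Set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed
```

An empty result means every required action appears verbatim in an `Allow` statement.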

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
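
The `{Account}` and `{UserName}` placeholders in the trust relationship must be replaced with real values. As a small sketch, the document can be generated rather than edited by hand. The function name and the sample account ID and user name below are hypothetical.

```python
import json
from typing import Any, Dict


def build_trust_policy(account_id: str, user_name: str) -> Dict[str, Any]:
    """Fill the {Account} and {UserName} placeholders of the trust relationship."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:user/{user_name}"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration; substitute your own account and user.
    print(json.dumps(build_trust_policy("123456789012", "zabbix-monitoring"), indent=2))
```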

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.RDS.INSTANCE.ID}.

For more information about managing access keys, see the official documentation.

Also, see the Macros section for a list of macros used for LLD filters.
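
The setup steps above imply a different set of mandatory macros for each authorization method. The sketch below checks a macro dictionary against that reading of the instructions. The function name is hypothetical, the empty list for `role_base` is an assumption (the text names no credential macros for it), and the `{$AWS.ASSUME.ROLE.AUTH.METADATA}` variant is not modeled.

```python
from typing import Dict, List

# Macros every setup variant asks for.
COMMON = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.RDS.INSTANCE.ID}"]

# Additional macros per authorization method, as read from the setup steps.
PER_AUTH_TYPE = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    "role_base": [],  # assumption: no credential macros are named for this method
}


def missing_macros(macros: Dict[str, str]) -> List[str]:
    """List required macros that are absent or empty for the chosen auth type."""
    auth_type = macros.get("{$AWS.AUTH_TYPE}", "")
    if auth_type not in PER_AUTH_TYPE:
        raise ValueError(f"unknown authorization method: {auth_type!r}")
    required = COMMON + PER_AUTH_TYPE[auth_type]
    return [name for name in required if not macros.get(name)]
```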

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` through instance metadata or environment authorization method. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.REGION}	Amazon RDS Region code.	`us-west-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.RDS.INSTANCE.ID}	RDS DB Instance identifier.
{$AWS.RDS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.RDS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.RDS.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.EVENT_CATEGORY.MATCHES}	Filter of discoverable events by category.	`.*`
{$AWS.RDS.LLD.FILTER.EVENT_CATEGORY.NOT_MATCHES}	Filter to exclude discovered events by category.	`CHANGE_IF_NEEDED`
{$AWS.RDS.LLD.FILTER.EVENT_SOURCE_TYPE.MATCHES}	Filter of discoverable events by source type.	`.*`
{$AWS.RDS.LLD.FILTER.EVENT_SOURCE_TYPE.NOT_MATCHES}	Filter to exclude discovered events by source type.	`CHANGE_IF_NEEDED`
{$AWS.RDS.CPU.UTIL.WARN.MAX}	The warning threshold of the CPU utilization expressed in %.	`85`
{$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN}	Minimum number of free earned CPU credits for trigger expression.	`50`
{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of I/O credits remaining for trigger expression.	`20`
{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of Byte credits remaining for trigger expression.	`20`
{$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN}	Minimum percentage of burst credits remaining for trigger expression.	`20`
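
Each LLD filter is a pair of regular expressions: an entity is kept when it matches the `MATCHES` macro and does not match the `NOT_MATCHES` macro. The sketch below approximates that behavior with Python's `re` module, which is an assumption, since Zabbix evaluates these patterns with its own regex engine. The function name is hypothetical; the defaults are the ones from the table above.

```python
import re


def passes_lld_filter(
    value: str,
    matches: str = r".*",
    not_matches: str = "CHANGE_IF_NEEDED",
) -> bool:
    """Keep a value that matches `matches` and does not match `not_matches`."""
    return (
        re.search(matches, value) is not None
        and re.search(not_matches, value) is None
    )
```

With the defaults everything is discovered, because `.*` matches any value and the literal `CHANGE_IF_NEEDED` is unlikely to appear in a real name. To exclude alarms from one namespace, for example, `{$AWS.RDS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}` could be set to a pattern such as `^AWS/EBS$`.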

Items

Name	Description	Type	Key and additional info
Get metrics data	Get instance metrics. Full metrics list related to RDS: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-metrics.html Full metrics list related to Amazon Aurora: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html#Aurora.AuroraMySQL.Monitoring.Metrics.instances	Script	aws.rds.get_metricsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get instance info	Get instance info. DescribeDBInstances API method: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeDBInstances.html	Script	aws.rds.get_instance_infoPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get instance alarms data	DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.rds.get_alarmsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get instance events data	DescribeEvents API method: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeEvents.html	Script	aws.rds.get_eventsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.rds.metrics.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get instance info check	Data collection check.	Dependent item	aws.rds.instance_info.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.rds.alarms.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get events check	Data collection check.	Dependent item	aws.rds.events.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Class	Contains the name of the compute and memory capacity class of the DB instance.	Dependent item	aws.rds.classPreprocessing JSON Path: `$[*].DBInstanceClass.first()` Discard unchanged with heartbeat: `3h`
Engine	Database engine.	Dependent item	aws.rds.enginePreprocessing JSON Path: `$..Engine.first()` Discard unchanged with heartbeat: `3h`
Engine version	Indicates the database engine version.	Dependent item	aws.rds.engine.versionPreprocessing JSON Path: `$[*].EngineVersion.first()` Discard unchanged with heartbeat: `3h`
Status	Specifies the current state of this database. All possible status values and their description: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/accessing-monitoring.html#Overview.DBInstance.Status	Dependent item	aws.rds.statusPreprocessing JSON Path: `$..DBInstanceStatus.first()` Discard unchanged with heartbeat: `3h`
Storage type	Specifies the storage type associated with DB instance.	Dependent item	aws.rds.storage_typePreprocessing JSON Path: `$[*].StorageType.first()` Discard unchanged with heartbeat: `3h`
Create time	Provides the date and time the DB instance was created.	Dependent item	aws.rds.create_timePreprocessing JSON Path: `$..InstanceCreateTime.first()`
Storage: Allocated	The allocated storage size in gibibytes (GiB).	Dependent item	aws.rds.storage.allocatedPreprocessing JSON Path: `$[*].AllocatedStorage.first()` Discard unchanged with heartbeat: `3h`
Storage: Max allocated	The upper limit in gibibytes (GiB) to which Amazon RDS can automatically scale the storage of the DB instance. Returns -1 if no limit is specified.	Dependent item	aws.rds.storage.max_allocatedPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`
Read replica: State	The status of a read replica. If the instance isn't a read replica, this is blank. Boolean value that is true if the instance is operating normally, or false if the instance is in an error state.	Dependent item	aws.rds.read_replica_statePreprocessing JSON Path: `$..StatusInfos..Normal.first()`⛔️Custom on fail: Discard value Boolean to decimal Discard unchanged with heartbeat: `3h`
Read replica: Status	The status of a read replica. If the instance isn't a read replica, this is blank. Status of the DB instance. For a StatusType of read replica, the values can be replicating, replication stop point set, replication stop point reached, error, stopped, or terminated.	Dependent item	aws.rds.read_replica_statusPreprocessing JSON Path: `$..StatusInfos..Status.first()`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
Swap usage	The amount of swap space used. This metric is available for the Aurora PostgreSQL DB instance classes db.t3.medium, db.t3.large, db.r4.large, db.r4.xlarge, db.r5.large, db.r5.xlarge, db.r6g.large, and db.r6g.xlarge. For Aurora MySQL, this metric applies only to db.t* DB instance classes. This metric is not available for SQL Server.	Dependent item	aws.rds.swap_usagePreprocessing JSON Path: `$.[?(@.Label == "SwapUsage")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Write IOPS	The number of write records generated per second. This is more or less the number of log records generated by the database. These do not correspond to 8K page writes, and do not correspond to network packets sent.	Dependent item	aws.rds.write_iops.ratePreprocessing JSON Path: `$.[?(@.Label == "WriteIOPS")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Write latency	The average amount of time taken per disk I/O operation.	Dependent item	aws.rds.write_latencyPreprocessing JSON Path: `$.[?(@.Label == "WriteLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Write throughput	The average number of bytes written to persistent storage every second.	Dependent item	aws.rds.write_throughput.ratePreprocessing JSON Path: `$.[?(@.Label == "WriteThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Network: Receive throughput	The incoming (Receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.	Dependent item	aws.rds.network_receive_throughput.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Burst balance	The percent of General Purpose SSD (gp2) burst-bucket I/O credits available.	Dependent item	aws.rds.burst_balancePreprocessing JSON Path: `$.[?(@.Label == "BurstBalance")].Values.first().first()`⛔️Custom on fail: Discard value
CPU: Utilization	The percentage of CPU utilization.	Dependent item	aws.rds.cpu.utilizationPreprocessing JSON Path: `$.[?(@.Label == "CPUUtilization")].Values.first().first()`⛔️Custom on fail: Discard value
Credit CPU: Balance	The number of CPU credits that an instance has accumulated, reported at 5-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate. When an instance is running, credits in the CPUCreditBalance don't expire. When the instance stops, the CPUCreditBalance does not persist, and all accrued credits are lost. This metric applies only to db.t2.small and db.t2.medium instances for Aurora MySQL, and to db.t3 instances for Aurora PostgreSQL.	Dependent item	aws.rds.cpu.credit_balancePreprocessing JSON Path: `$.[?(@.Label == "CPUCreditBalance")].Values.first().first()`⛔️Custom on fail: Discard value
Credit CPU: Usage	The number of CPU credits consumed during the specified period, reported at 5-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance. This metric applies only to db.t2.small and db.t2.medium instances for Aurora MySQL, and to db.t3 instances for Aurora PostgreSQL.	Dependent item	aws.rds.cpu.credit_usagePreprocessing JSON Path: `$.[?(@.Label == "CPUCreditUsage")].Values.first().first()`⛔️Custom on fail: Discard value
Connections	The number of client network connections to the database instance. The number of database sessions can be higher than the metric value because the metric value doesn't include the following: - Sessions that no longer have a network connection but which the database hasn't cleaned up - Sessions created by the database engine for its own purposes - Sessions created by the database engine's parallel execution capabilities - Sessions created by the database engine job scheduler - Amazon Aurora/RDS connections	Dependent item	aws.rds.database_connectionsPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Queue depth	The number of outstanding read/write requests waiting to access the disk.	Dependent item	aws.rds.disk_queue_depthPreprocessing JSON Path: `$.[?(@.Label == "DiskQueueDepth")].Values.first().first()`⛔️Custom on fail: Discard value
EBS: Byte balance	The percentage of throughput credits remaining in the burst bucket of your RDS database. This metric is available for basic monitoring only. To find the instance sizes that support this metric, see the instance sizes with an asterisk (*) in the EBS optimized by default table (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current) in Amazon RDS User Guide for Linux Instances.	Dependent item	aws.rds.ebs_byte_balancePreprocessing JSON Path: `$.[?(@.Label == "EBSByteBalance%")].Values.first().first()`⛔️Custom on fail: Discard value
EBS: IO balance	The percentage of I/O credits remaining in the burst bucket of your RDS database. This metric is available for basic monitoring only. To find the instance sizes that support this metric, see the instance sizes with an asterisk (*) in the EBS optimized by default table (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#current) in Amazon RDS User Guide for Linux Instances.	Dependent item	aws.rds.ebs_io_balancePreprocessing JSON Path: `$.[?(@.Label == "EBSIOBalance%")].Values.first().first()`⛔️Custom on fail: Discard value
Memory, freeable	The amount of available random access memory. For MariaDB, MySQL, Oracle, and PostgreSQL DB instances, this metric reports the value of the MemAvailable field of /proc/meminfo.	Dependent item	aws.rds.freeable_memoryPreprocessing JSON Path: `$.[?(@.Label == "FreeableMemory")].Values.first().first()`⛔️Custom on fail: Discard value
Storage: Local free	The amount of local storage available, in bytes. Unlike for other DB engines, for Aurora DB instances this metric reports the amount of storage available to each DB instance. This value depends on the DB instance class. You can increase the amount of free storage space for an instance by choosing a larger DB instance class for your instance. (This doesn't apply to Aurora Serverless v2.)	Dependent item	aws.rds.free_local_storagePreprocessing JSON Path: `$.[?(@.Label == "FreeLocalStorage")].Values.first().first()`⛔️Custom on fail: Discard value
Network: Receive throughput	The incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. For Amazon Aurora: The amount of network throughput received from the Aurora storage subsystem by each instance in the DB cluster.	Dependent item	aws.rds.storage_network_receive_throughputPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Network: Transmit throughput	The outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. For Amazon Aurora: The amount of network throughput sent to the Aurora storage subsystem by each instance in the Aurora MySQL DB cluster.	Dependent item	aws.rds.storage_network_transmit_throughputPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Read IOPS	The average number of disk I/O operations per second. Aurora PostgreSQL-Compatible Edition reports read and write IOPS separately, in 1-minute intervals.	Dependent item	aws.rds.read_iops.ratePreprocessing JSON Path: `$.[?(@.Label == "ReadIOPS")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Read latency	The average amount of time taken per disk I/O operation.	Dependent item	aws.rds.read_latencyPreprocessing JSON Path: `$.[?(@.Label == "ReadLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Read throughput	The average number of bytes read from disk per second.	Dependent item	aws.rds.read_throughput.ratePreprocessing JSON Path: `$.[?(@.Label == "ReadThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Network: Transmit throughput	The outgoing (Transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.	Dependent item	aws.rds.network_transmit_throughput.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Network: Throughput	The amount of network throughput both received from and transmitted to clients by each instance in the Aurora MySQL DB cluster, in bytes per second. This throughput doesn't include network traffic between instances in the DB cluster and the cluster volume.	Dependent item	aws.rds.network_throughput.ratePreprocessing JSON Path: `$.[?(@.Label == "NetworkThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Storage: Space free	The amount of available storage space.	Dependent item	aws.rds.free_storage_spacePreprocessing JSON Path: `$.[?(@.Label == "FreeStorageSpace")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Read IOPS, local storage	The average number of disk read I/O operations to local storage per second. Only applies to Multi-AZ DB clusters.	Dependent item	aws.rds.read_iops_local_storage.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Read latency, local storage	The average amount of time taken per disk I/O operation for local storage. Only applies to Multi-AZ DB clusters.	Dependent item	aws.rds.read_latency_local_storagePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Read throughput, local storage	The average number of bytes read from disk per second for local storage. Only applies to Multi-AZ DB clusters.	Dependent item	aws.rds.read_throughput_local_storage.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Replication: Lag	The amount of time a read replica DB instance lags behind the source DB instance. Applies to MySQL, MariaDB, Oracle, PostgreSQL, and SQL Server read replicas.	Dependent item	aws.rds.replica_lagPreprocessing JSON Path: `$.[?(@.Label == "ReplicaLag")].Values.first().first()`⛔️Custom on fail: Discard value
Disk: Write IOPS, local storage	The average number of disk write I/O operations per second on local storage in a Multi-AZ DB cluster.	Dependent item	aws.rds.write_iops_local_storage.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Write latency, local storage	The average amount of time taken per disk I/O operation on local storage in a Multi-AZ DB cluster.	Dependent item	aws.rds.write_latency_local_storagePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Write throughput, local storage	The average number of bytes written to disk per second for local storage.	Dependent item	aws.rds.write_throughput_local_storage.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
SQLServer: Failed agent jobs	The number of failed Microsoft SQL Server Agent jobs during the last minute.	Dependent item	aws.rds.failed_sql_server_agent_jobs_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Disk: Binlog Usage	The amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas.	Dependent item	aws.rds.bin_log_disk_usagePreprocessing JSON Path: `$.[?(@.Label == "BinLogDiskUsage")].Values.first().first()`⛔️Custom on fail: Discard value
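
Most dependent items in the table above share one preprocessing pattern: a JSON Path filter selects the metric by its `Label`, then takes the first entry of its `Values` array. As a rough illustration only (this is not the template's code), the same selection can be sketched in Python. The payload shape, a list of objects with `Label` and `Values` keys, is an assumption inferred from the JSON Path expressions, and `first_metric_value` and the sample numbers are hypothetical.

```python
from typing import Any, Optional


def first_metric_value(results: list[dict[str, Any]], label: str) -> Optional[float]:
    """Return the first value of the first result whose Label matches, else None."""
    for result in results:
        if result.get("Label") == label:
            values = result.get("Values") or []
            # An empty Values list has no first element. None stands in here
            # for the "Custom on fail: Discard value" step.
            return values[0] if values else None
    # No result carries this label, so there is nothing to store.
    return None


# Hypothetical sample payload, for illustration only.
sample = [
    {"Label": "FreeableMemory", "Values": [1523712000.0, 1519000000.0]},
    {"Label": "ReplicaLag", "Values": []},
]

print(first_metric_value(sample, "FreeableMemory"))
print(first_metric_value(sample, "ReplicaLag"))
print(first_metric_value(sample, "ReadIOPS"))
```

A metric that CloudWatch does not report for a given engine simply produces no value, which is why unsupported items stay empty instead of raising errors.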

Triggers

Name	Description	Expression	Severity
AWS RDS: Failed to get metrics data	Failed to get CloudWatch metrics for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.metrics.check))>0`	Warning
AWS RDS: Failed to get instance data	Failed to get CloudWatch instance info for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.instance_info.check))>0`	Warning
AWS RDS: Failed to get alarms data	Failed to get CloudWatch alarms for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.alarms.check))>0`	Warning
AWS RDS: Failed to get events data	Failed to get CloudWatch events for RDS.	`length(last(/AWS RDS instance by HTTP/aws.rds.events.check))>0`	Warning
AWS RDS: Read replica in error state	The status of a read replica. False if the instance is in an error state.	`last(/AWS RDS instance by HTTP/aws.rds.read_replica_state)=0`	Average
AWS RDS: Burst balance is too low		`max(/AWS RDS instance by HTTP/aws.rds.burst_balance,5m)<{$AWS.RDS.BURST.CREDIT.BALANCE.MIN.WARN}`	Warning
AWS RDS: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS RDS instance by HTTP/aws.rds.cpu.utilization,15m)>{$AWS.RDS.CPU.UTIL.WARN.MAX}`	Warning
AWS RDS: Instance CPU Credit balance is too low	The number of earned CPU credits has been less than {$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN} in the last 5 minutes.	`max(/AWS RDS instance by HTTP/aws.rds.cpu.credit_balance,5m)<{$AWS.RDS.CPU.CREDIT.BALANCE.MIN.WARN}`	Warning
AWS RDS: Byte Credit balance is too low		`max(/AWS RDS instance by HTTP/aws.rds.ebs_byte_balance,5m)<{$AWS.EBS.BYTE.CREDIT.BALANCE.MIN.WARN}`	Warning
AWS RDS: I/O Credit balance is too low		`max(/AWS RDS instance by HTTP/aws.rds.ebs_io_balance,5m)<{$AWS.EBS.IO.CREDIT.BALANCE.MIN.WARN}`	Warning
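
The balance and CPU triggers above compare an aggregate over a time window with a macro threshold: `max(...,5m)<X` holds only when every sample in the window is below `X`, and `min(...,15m)>X` only when every sample is above it. A minimal Python sketch of that logic follows. The function names, sample values, and thresholds are hypothetical, and the handling of an empty window is an assumption of this sketch, not a statement about how Zabbix evaluates missing data.

```python
def below_for_whole_window(samples: list[float], threshold: float) -> bool:
    """Mirror max(window) < threshold: true only if all samples are below it."""
    # Assumption: an empty window never fires.
    return bool(samples) and max(samples) < threshold


def above_for_whole_window(samples: list[float], threshold: float) -> bool:
    """Mirror min(window) > threshold: true only if all samples are above it."""
    return bool(samples) and min(samples) > threshold


# Hypothetical samples collected during one window.
credit_balance = [42.0, 38.5, 35.0]
cpu_utilization = [91.0, 97.5, 88.0]

print(below_for_whole_window(credit_balance, 50.0))
print(below_for_whole_window([42.0, 61.0, 35.0], 50.0))
print(above_for_whole_window(cpu_utilization, 85.0))
```

Using the window maximum for "too low" checks means a single recovered sample is enough to keep the trigger from firing, which reduces flapping on short dips.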

LLD rule Instance Alarms discovery

Name	Description	Type	Key and additional info
Instance Alarms discovery	Discovery of instance alarms.	Dependent item	aws.rds.alarms.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Instance Alarms discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: State reason	An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.rds.alarm.state_reason["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].StateReason.first()`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.rds.alarm.state["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].StateValue.first()`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`

Trigger prototypes for Instance Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS RDS: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS RDS instance by HTTP/aws.rds.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS RDS instance by HTTP/aws.rds.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS RDS: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS RDS instance by HTTP/aws.rds.alarm.state["{#ALARM_NAME}"])=1`	Info
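
The alarm items and trigger prototypes above rely on a numeric state: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM), with `3` set when the JSON Path step fails. The template's JavaScript step is not shown on this page, so the Python below is only a sketch of how such a mapping could work. The alarm list shape (`AlarmName`, `StateValue`) follows the JSON Path expressions; mapping unrecognized state strings to `3` and all function names are assumptions.

```python
from typing import Any

STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3  # value set by "Custom on fail: Set value to: 3"


def alarm_state(alarms: list[dict[str, Any]], alarm_name: str) -> int:
    """Return the numeric state of the named alarm, or 3 if it cannot be read."""
    for alarm in alarms:
        if alarm.get("AlarmName") == alarm_name:
            # Assumption: an unrecognized state string is also reported as 3.
            return STATE_CODES.get(alarm.get("StateValue"), UNKNOWN_STATE)
    return UNKNOWN_STATE


def alarm_trigger_fires(state: int, state_reason: str) -> bool:
    """Mirror the 'Alarm' trigger prototype: state = 2 and a non-empty reason."""
    return state == 2 and len(state_reason) > 0


# Hypothetical alarm data, for illustration only.
alarms = [
    {"AlarmName": "db-cpu-high", "StateValue": "ALARM", "StateReason": "Threshold crossed"},
    {"AlarmName": "db-storage-low", "StateValue": "OK", "StateReason": ""},
]

state = alarm_state(alarms, "db-cpu-high")
print(state, alarm_trigger_fires(state, alarms[0]["StateReason"]))
print(alarm_state(alarms, "missing-alarm"))
```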

LLD rule Aurora metrics discovery

Name	Description	Type	Key and additional info
Aurora metrics discovery	Discovery of Amazon Aurora metrics. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html#Aurora.AuroraMySQL.Monitoring.Metrics.instances	Dependent item	aws.rds.aurora.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`

Item prototypes for Aurora metrics discovery

Name	Description	Type	Key and additional info
Row lock time	The total time spent acquiring row locks for InnoDB tables.	Dependent item	aws.rds.row_locktime[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "RowLockTime")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Select throughput	The average number of select queries per second.	Dependent item	aws.rds.select_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "SelectThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Select latency	The amount of latency for select queries.	Dependent item	aws.rds.select_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "SelectLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Replication: Lag, max	The maximum amount of lag between the primary instance and each Aurora DB instance in the DB cluster.	Dependent item	aws.rds.aurora_replica_lag.max[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Replication: Lag, min	The minimum amount of lag between the primary instance and each Aurora DB instance in the DB cluster.	Dependent item	aws.rds.aurora_replica_lag.min[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Replication: Lag	For an Aurora replica, the amount of lag when replicating updates from the primary instance.	Dependent item	aws.rds.aurora_replica_lag[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "AuroraReplicaLag")].Values.first().first()`⛔️Custom on fail: Discard value
Buffer Cache hit ratio	The percentage of requests that are served by the buffer cache.	Dependent item	aws.rds.buffer_cache_hit_ratio[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Operations: Commit latency	The amount of latency for commit operations.	Dependent item	aws.rds.commit_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "CommitLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Commit throughput	The average number of commit operations per second.	Dependent item	aws.rds.commit_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "CommitThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Deadlocks, rate	The average number of deadlocks in the database per second.	Dependent item	aws.rds.deadlocks.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "Deadlocks")].Values.first().first()`⛔️Custom on fail: Discard value
Engine uptime	The amount of time that the instance has been running.	Dependent item	aws.rds.engine_uptime[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "EngineUptime")].Values.first().first()`⛔️Custom on fail: Discard value
Rollback segment history list length	The undo logs that record committed transactions with delete-marked records. These records are scheduled to be processed by the InnoDB purge operation.	Dependent item	aws.rds.rollback_segment_history_list_length[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Network: Throughput	The amount of network throughput received from and sent to the Aurora storage subsystem by each instance in the Aurora MySQL DB cluster.	Dependent item	aws.rds.storage_network_throughput[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value

LLD rule Aurora MySQL metrics discovery

Name	Description	Type	Key and additional info
Aurora MySQL metrics discovery	Discovery of Aurora MySQL metrics. Storage types: `aurora` (for MySQL 5.6-compatible Aurora), `aurora-mysql` (for MySQL 5.7-compatible and MySQL 8.0-compatible Aurora).	Dependent item	aws.rds.postgresql.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`

Item prototypes for Aurora MySQL metrics discovery

Name	Description	Type	Key and additional info
Operations: Delete latency	The amount of latency for delete queries.	Dependent item	aws.rds.delete_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "DeleteLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Delete throughput	The average number of delete queries per second.	Dependent item	aws.rds.delete_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "DeleteThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
DML: Latency	The amount of latency for inserts, updates, and deletes.	Dependent item	aws.rds.dml_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "DMLLatency")].Values.first().first()`⛔️Custom on fail: Discard value
DML: Throughput	The average number of inserts, updates, and deletes per second.	Dependent item	aws.rds.dml_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "DMLThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
DDL: Latency	The amount of latency for data definition language (DDL) requests - for example, create, alter, and drop requests.	Dependent item	aws.rds.ddl_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "DDLLatency")].Values.first().first()`⛔️Custom on fail: Discard value
DDL: Throughput	The average number of DDL requests per second.	Dependent item	aws.rds.ddl_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "DDLThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Backtrack: Window, actual	The difference between the target backtrack window and the actual backtrack window.	Dependent item	aws.rds.backtrack_window_actual[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Backtrack: Window, alert	The number of times that the actual backtrack window is smaller than the target backtrack window for a given period of time.	Dependent item	aws.rds.backtrack_window_alert[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Transactions: Blocked, rate	The average number of transactions in the database that are blocked per second.	Dependent item	aws.rds.blocked_transactions.rate[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Replication: Binlog lag	The amount of time that a binary log replica DB cluster running on Aurora MySQL-Compatible Edition lags behind the binary log replication source. A lag means that the source is generating records faster than the replica can apply them. The metric value indicates the following: A high value: The replica is lagging the replication source. 0 or a value close to 0: The replica process is active and current. -1: Aurora can't determine the lag, which can happen during replica setup or when the replica is in an error state	Dependent item	aws.rds.aurora_replication_binlog_lag[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Transactions: Active, rate	The average number of current transactions executing on an Aurora database instance per second. By default, Aurora doesn't enable this metric. To begin measuring this value, set innodb_monitor_enable='all' in the DB parameter group for a specific DB instance.	Dependent item	aws.rds.aurora_transactions_active.rate[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Connections: Aborted	The number of client connections that have not been closed properly.	Dependent item	aws.rds.aurora_clients_aborted[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "AbortedClients")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Insert latency	The amount of latency for insert queries, in milliseconds.	Dependent item	aws.rds.insert_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "InsertLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Insert throughput	The average number of insert queries per second.	Dependent item	aws.rds.insert_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "InsertThroughput")].Values.first().first()`⛔️Custom on fail: Discard value
Login failures, rate	The average number of failed login attempts per second.	Dependent item	aws.rds.login_failures.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "LoginFailures")].Values.first().first()`⛔️Custom on fail: Discard value
Queries, rate	The average number of queries executed per second.	Dependent item	aws.rds.queries.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "Queries")].Values.first().first()`⛔️Custom on fail: Discard value
Resultset cache hit ratio	The percentage of requests that are served by the Resultset cache.	Dependent item	aws.rds.result_set_cache_hit_ratio[{#SINGLETON}]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Binary log files, number	The number of binlog files generated.	Dependent item	aws.rds.num_binary_log_files[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "NumBinaryLogFiles")].Values.first().first()`⛔️Custom on fail: Discard value
Binary log files, size	The total size of the binlog files.	Dependent item	aws.rds.sum_binary_log_files[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "SumBinaryLogSize")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Update latency	The amount of latency for update queries.	Dependent item	aws.rds.update_latency[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "UpdateLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Operations: Update throughput	The average number of update queries per second.	Dependent item	aws.rds.update_throughput.rate[{#SINGLETON}]Preprocessing JSON Path: `$.[?(@.Label == "UpdateThroughput")].Values.first().first()`⛔️Custom on fail: Discard value

LLD rule Instance Events discovery

Name	Description	Type	Key and additional info
Instance Events discovery	Discovery of instance events.	Dependent item	aws.rds.events.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Instance Events discovery

Name	Description	Type	Key and additional info
[{#EVENT_CATEGORY}]: {#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}: Message	Provides the text of this event.	Dependent item	aws.rds.event_message["{#EVENT_CATEGORY}/{#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value JSON Path: `$[-1]` Discard unchanged with heartbeat: `3h`
[{#EVENT_CATEGORY}]: {#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID} : Date	Provides the date and time of this event.	Dependent item	aws.rds.event_date["{#EVENT_CATEGORY}/{#EVENT_SOURCE_TYPE}/{#EVENT_SOURCE_ID}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value JSON Path: `$[-1]` Discard unchanged with heartbeat: `3h`

AWS S3 bucket by HTTP

Overview

This template monitors an AWS S3 bucket via HTTP and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Note: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

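Additional information about the metrics and the API methods used:

Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html
DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html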

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS S3 bucket by HTTP

Configuration

Setup

The template gets AWS S3 metrics and uses the script item to make HTTP requests to the CloudWatch API. Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon S3 metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "s3:GetMetricsConfiguration"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

If you are using assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "s3:GetMetricsConfiguration"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
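
The trust relationship document above contains the `{Account}` and `{UserName}` placeholders. As a small convenience sketch (not part of the template), the Python below fills them in and emits valid JSON that can be pasted into the role's trust relationships. The function name, the account ID `123456789012`, and the user name `zabbix-monitor` are hypothetical values chosen for illustration.

```python
import json


def trust_policy(account_id: str, user_name: str) -> str:
    """Render the assume-role trust policy for one IAM user as a JSON string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # Same ARN pattern as in the document above.
                    "AWS": f"arn:aws:iam::{account_id}:user/{user_name}"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical account ID and user name.
rendered = trust_policy("123456789012", "zabbix-monitor")
print(rendered)
```

Generating the document this way avoids hand-editing mistakes such as a leftover brace from the placeholder syntax.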

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "s3:GetMetricsConfiguration",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

To gather Request metrics, enable Requests metrics on your Amazon S3 buckets from the AWS console.

You can also define a filter for the Request metrics using a shared prefix, object tag, or access point.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.S3.BUCKET.NAME}.

For more information about managing access keys, see the official documentation.

Also, see the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Use with the `assume_role` authorization method when credentials come from instance metadata or the environment. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; add when using the `assume_role` authorization method.
{$AWS.REQUEST.REGION}	Region used in GET request `ListBuckets`.	`us-east-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.S3.BUCKET.NAME}	S3 bucket name.
{$AWS.S3.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.S3.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.S3.LLD.FILTER.ID.NAME.MATCHES}	Filter of discoverable request metrics by filter ID name.	`.*`
{$AWS.S3.LLD.FILTER.ID.NAME.NOT_MATCHES}	Filter to exclude discovered request metrics by filter ID name.	`CHANGE_IF_NEEDED`
{$AWS.S3.UPDATE.INTERVAL}	Interval in seconds for getting request metrics. Used in the metric configuration and in the JavaScript API query. Must be between 1 and 86400 seconds.	`1800`
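
The `MATCHES` / `NOT_MATCHES` macro pairs above work together: an entity is discovered when its name matches the first expression and does not match the second. The Python sketch below illustrates that combination along with the documented range of `{$AWS.S3.UPDATE.INTERVAL}`. It approximates Zabbix regular expression matching with Python's `re.search`, which is an assumption, and the function names and sample alarm names are hypothetical.

```python
import re


def passes_lld_filter(name: str,
                      matches: str = ".*",
                      not_matches: str = "CHANGE_IF_NEEDED") -> bool:
    """Keep a name that matches `matches` and does not match `not_matches`."""
    included = re.search(matches, name) is not None
    excluded = re.search(not_matches, name) is not None
    return included and not excluded


def valid_update_interval(seconds: int) -> bool:
    """Check the documented range: between 1 and 86400 seconds."""
    return 1 <= seconds <= 86400


# With the default macro values every name is kept.
print(passes_lld_filter("bucket-5xx-errors"))
# Hypothetical override: exclude alarms whose names start with "test-".
print(passes_lld_filter("test-alarm", not_matches="^test-"))
print(valid_update_interval(1800))
```

The default `CHANGE_IF_NEEDED` value is a placeholder that is unlikely to match a real name, so nothing is excluded until the macro is changed.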

Items

Name	Description	Type	Key and additional info
Get metrics data	Get bucket metrics. Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html	Script	aws.s3.get_metricsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get alarms data	Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.s3.get_alarmsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.s3.metrics.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.s3.alarms.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Bucket Size	This is a daily metric for the bucket. The amount of data in bytes stored in a bucket in the STANDARD storage class, INTELLIGENT_TIERING storage class, Standard-Infrequent Access (STANDARD_IA) storage class, OneZone-Infrequent Access (ONEZONE_IA), Reduced Redundancy Storage (RRS) class, S3 Glacier Instant Retrieval storage class, Deep Archive Storage (S3 Glacier Deep Archive) class, or S3 Glacier Flexible Retrieval (GLACIER) storage class. This value is calculated by summing the size of all objects and metadata in the bucket (both current and noncurrent objects), including the size of all parts for all incomplete multipart uploads to the bucket.	Dependent item	aws.s3.bucket_size_bytesPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Number of objects	This is a daily metric for the bucket. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket.	Dependent item	aws.s3.number_of_objectsPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS S3: Failed to get metrics data	Failed to get CloudWatch metrics for S3 bucket.	`length(last(/AWS S3 bucket by HTTP/aws.s3.metrics.check))>0`	Warning
AWS S3: Failed to get alarms data	Failed to get CloudWatch alarms for S3 bucket.	`length(last(/AWS S3 bucket by HTTP/aws.s3.alarms.check))>0`	Warning
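
Both triggers above fire when the corresponding check item holds a non-empty string: the check item extracts `$.error` from the raw response and falls back to a set value when that path is missing. A rough Python equivalent is sketched below. Treating the fallback as an empty string is an assumption (the page shows "Set value to" with no visible value), and the function names and sample responses are hypothetical.

```python
import json


def collection_error(raw_response: str) -> str:
    """Return the `error` field of the response, or "" when there is none."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        # The JSON Path step cannot be applied; use the fallback value.
        return ""
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    return ""


def failed_to_get_data(raw_response: str) -> bool:
    """Mirror the trigger expression length(last(...)) > 0."""
    return len(collection_error(raw_response)) > 0


# Hypothetical responses, for illustration only.
print(failed_to_get_data('{"error": "AccessDenied"}'))
print(failed_to_get_data('[{"Label": "BucketSizeBytes", "Values": [1024.0]}]'))
```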

LLD rule Bucket Alarms discovery

Name	Description	Type	Key and additional info
Bucket Alarms discovery	Discovery of bucket alarms.	Dependent item	aws.s3.alarms.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Bucket Alarms discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: State reason	An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.s3.alarm.state_reason["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].StateReason.first()`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.s3.alarm.state["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].StateValue.first()`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`
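
The State item maps the textual CloudWatch alarm state to a number. The template's own JavaScript step is not reproduced on this page, so the following is only a minimal sketch of the documented mapping (0, 1, 2, with 3 as the fallback value). The function name, the sample alarms, and the choice to map unrecognized strings to 3 are assumptions made for illustration.

```python
# Numeric codes follow the item description: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM).
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3  # mirrors "Custom on fail: Set value to: 3"


def alarm_state(alarms, alarm_name):
    """Return the numeric state for alarm_name, or 3 if it cannot be resolved."""
    for alarm in alarms:
        if alarm.get("AlarmName") == alarm_name:
            # Assumption: an unrecognized state string is also reported as 3.
            return STATE_CODES.get(alarm.get("StateValue"), UNKNOWN_STATE)
    return UNKNOWN_STATE


# Hypothetical alarm data shaped like a DescribeAlarms result.
alarms = [
    {"AlarmName": "bucket-4xx", "StateValue": "ALARM", "StateReason": "Threshold crossed"},
    {"AlarmName": "bucket-size", "StateValue": "OK", "StateReason": "Within limits"},
]
print(alarm_state(alarms, "bucket-4xx"))  # 2
print(alarm_state(alarms, "missing"))     # 3
```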

Trigger prototypes for Bucket Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS S3: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS S3 bucket by HTTP/aws.s3.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS S3 bucket by HTTP/aws.s3.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS S3: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS S3 bucket by HTTP/aws.s3.alarm.state["{#ALARM_NAME}"])=1`	Info

LLD rule Request Metrics discovery

Name	Description	Type	Key and additional info
Request Metrics discovery	Discovery of request metrics.	Dependent item	aws.s3.configuration.discoveryPreprocessing JSON Path: `$.filter_id` Discard unchanged with heartbeat: `3h`
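
The "Discard unchanged with heartbeat: 3h" step keeps discovery data from being stored again while it stays the same. A rough sketch of that behavior follows, assuming a value is kept when it is the first one seen, when it differs from the last kept value, or when the heartbeat interval has elapsed. The class name and the exact boundary handling are illustrative assumptions, not Zabbix internals.

```python
class HeartbeatThrottle:
    """Keep a value only if it changed or the heartbeat interval has passed."""

    def __init__(self, heartbeat_seconds):
        self.heartbeat = heartbeat_seconds
        self.last_value = None
        self.last_time = None

    def accept(self, value, now):
        """Return True if the value should be stored, False if it is discarded."""
        first = self.last_time is None
        changed = value != self.last_value
        expired = (not first) and (now - self.last_time >= self.heartbeat)
        if first or changed or expired:
            self.last_value = value
            self.last_time = now
            return True
        return False


throttle = HeartbeatThrottle(3 * 3600)  # 3h heartbeat, as in the table above
print(throttle.accept('["EntireBucket"]', now=0))    # True: first value
print(throttle.accept('["EntireBucket"]', now=60))   # False: unchanged
```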

Item prototypes for Request Metrics discovery

Name	Description	Type	Key and additional info
Filter [{#AWS.S3.FILTER.ID.NAME}]: Get request metrics	Get bucket request metrics filter: '{#AWS.S3.FILTER.ID.NAME}'. Full metrics list related to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html	Script	aws.s3.get_metrics["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: All	The total number of HTTP requests made to an Amazon S3 bucket, regardless of type. If you're using a metrics configuration with a filter, then this metric only returns the HTTP requests that meet the filter's requirements.	Dependent item	aws.s3.all_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "AllRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Get	The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations. Paginated list-oriented requests, like List Multipart Uploads, List Parts, Get Bucket Object versions, and others, are not included in this metric.	Dependent item	aws.s3.get_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "GetRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Put	The number of HTTP PUT requests made for objects in an Amazon S3 bucket.	Dependent item	aws.s3.put_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "PutRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Delete	The number of HTTP DELETE requests made for objects in an Amazon S3 bucket. This also includes Delete Multiple Objects requests. This metric shows the number of requests, not the number of objects deleted.	Dependent item	aws.s3.delete_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "DeleteRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Head	The number of HTTP HEAD requests made to an Amazon S3 bucket.	Dependent item	aws.s3.head_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "HeadRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Post	The number of HTTP POST requests made to an Amazon S3 bucket. Delete Multiple Objects and SELECT Object Content requests are not included in this metric.	Dependent item	aws.s3.post_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "PostRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select	The number of Amazon S3 SELECT Object Content requests made for objects in an Amazon S3 bucket.	Dependent item	aws.s3.select_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "SelectRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select, bytes scanned	The number of bytes of data scanned with Amazon S3 SELECT Object Content requests in an Amazon S3 bucket. Statistic: Average (bytes per request).	Dependent item	aws.s3.select_bytes_scanned["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Select, bytes returned	The number of bytes of data returned with Amazon S3 SELECT Object Content requests in an Amazon S3 bucket. Statistic: Average (bytes per request).	Dependent item	aws.s3.select_bytes_returned["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: List	The number of HTTP requests that list the contents of a bucket.	Dependent item	aws.s3.list_requests["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "ListRequests")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Bytes downloaded	The number of bytes downloaded for requests made to an Amazon S3 bucket, where the response includes a body. Statistic: Average (bytes per request).	Dependent item	aws.s3.bytes_downloaded["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "BytesDownloaded")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Bytes uploaded	The number of bytes uploaded that contain a request body, made to an Amazon S3 bucket. Statistic: Average (bytes per request).	Dependent item	aws.s3.bytes_uploaded["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "BytesUploaded")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Errors, 4xx	The number of HTTP 4xx client error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. The average statistic shows the error rate, and the sum statistic shows the count of that type of error, during each period. Statistic: Average (reports per request).	Dependent item	aws.s3.4xx_errors["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "4xxErrors")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Requests: Errors, 5xx	The number of HTTP 5xx server error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. The average statistic shows the error rate, and the sum statistic shows the count of that type of error, during each period. Statistic: Average (reports per request).	Dependent item	aws.s3.5xx_errors["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "5xxErrors")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: First byte latency, avg	The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. Statistic: Average.	Dependent item	aws.s3.first_byte_latency.avg["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "FirstByteLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: First byte latency, p90	The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. Statistic: 90th percentile.	Dependent item	aws.s3.first_byte_latency.p90["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "FirstByteLatency")].Values.first().first()`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Total request latency, avg	The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. Statistic: Average.	Dependent item	aws.s3.total_request_latency.avg["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Total request latency, p90	The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. Statistic: 90th percentile.	Dependent item	aws.s3.total_request_latency.p90["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Latency	The maximum number of seconds by which the replication destination Region is behind the source Region for a given replication rule.	Dependent item	aws.s3.replication_latency["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Bytes pending	The total number of bytes of objects pending replication for a given replication rule.	Dependent item	aws.s3.bytes_pending_replication["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Filter [{#AWS.S3.FILTER.ID.NAME}]: Replication: Operations pending	The number of operations pending replication for a given replication rule.	Dependent item	aws.s3.operations_pending_replication["{#AWS.S3.FILTER.ID.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
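
Most dependent items above use a JSON Path of the form `$.[?(@.Label == "AllRequests")].Values.first().first()`. The sketch below shows the equivalent lookup, assuming the master item returns a list of metric series that each carry a `Label` and a `Values` array. The function name and sample data are hypothetical, and returning `None` stands in for the "Discard value" outcome.

```python
def first_metric_value(results, label):
    """Return the first data point of the first series with the given Label.

    Returns None when no series matches or the series has no data points,
    which corresponds to the value being discarded.
    """
    for series in results:
        if series.get("Label") == label:
            values = series.get("Values") or []
            return values[0] if values else None
    return None


# Hypothetical master item payload with two metric series.
results = [
    {"Label": "AllRequests", "Values": [128.0, 97.0]},
    {"Label": "GetRequests", "Values": []},
]
print(first_metric_value(results, "AllRequests"))  # 128.0
print(first_metric_value(results, "GetRequests"))  # None
```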

AWS ECS Serverless Cluster by HTTP

Overview

This template monitors an AWS ECS Serverless Cluster by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Note: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about the metrics and used API methods:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS ECS Cluster by HTTP

Configuration

Setup

The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API. Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
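
Before attaching a policy, it can help to check that it grants every action listed above. The following is a simplified sketch: it only collects literal action names from `Allow` statements and ignores wildcards, `Deny` statements, conditions, and resource restrictions. The function name `missing_actions` is hypothetical.

```python
import json

# Actions listed in the required permissions for this template.
REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "ecs:ListServices",
}


def missing_actions(policy_json, required=REQUIRED_ACTIONS):
    """Return the required actions not literally allowed by the policy."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a string
            actions = [actions]
        allowed.update(actions)
    return sorted(set(required) - allowed)


policy_text = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["cloudwatch:DescribeAlarms", "cloudwatch:GetMetricData"],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
"""
print(missing_actions(policy_text))  # ['ecs:ListServices']
```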

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
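
To show how the `{$AWS.STS.REGION}` and `{$AWS.ASSUME.ROLE.ARN}` macros relate to the assume role request, the sketch below builds the query string of an STS `AssumeRole` call against a regional endpoint. It is not the template's own script: the request is not signed (a real call needs Signature Version 4 with the access key pair), and the session name and account number are placeholders chosen for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def assume_role_url(region, role_arn, session_name="zabbix-monitoring"):
    """Build an unsigned STS AssumeRole query URL for the given region."""
    params = {
        "Action": "AssumeRole",
        "Version": "2011-06-15",          # STS query API version
        "RoleArn": role_arn,              # value of {$AWS.ASSUME.ROLE.ARN}
        "RoleSessionName": session_name,  # hypothetical session name
    }
    return f"https://sts.{region}.amazonaws.com/?{urlencode(params)}"


# Placeholder region and ARN standing in for the macro values.
url = assume_role_url("us-east-1", "arn:aws:iam::123456789012:role/zabbix")
query = parse_qs(urlparse(url).query)
print(urlparse(url).netloc)   # sts.us-east-1.amazonaws.com
print(query["RoleArn"][0])    # arn:aws:iam::123456789012:role/zabbix
```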

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the following macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.ECS.CLUSTER.NAME}.

For more information about managing access keys, see the official documentation.

Refer to the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Set to `true` when the `assume_role` method should authenticate through instance metadata or the environment. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; set when using the `assume_role` authorization method.
{$AWS.REGION}	Amazon ECS Region code.	`us-west-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ECS.CLUSTER.NAME}	ECS cluster name.
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable services by name.	`.*`
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered services by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}	The warning threshold of the cluster CPU utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}	The warning threshold of the cluster memory utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}	The warning threshold of the cluster service CPU utilization expressed in %.	`80`
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}	The warning threshold of the cluster service memory utilization expressed in %.	`80`
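
The `MATCHES` and `NOT_MATCHES` macros work as a pair: an entity is discovered only if its name matches the first pattern and does not match the second. A minimal sketch of that logic, using Python's `re` module as an approximation of the regular expressions Zabbix evaluates; the function name and sample service names are hypothetical.

```python
import re


def passes_filter(name, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """Return True if name matches `matches` and does not match `not_matches`.

    The defaults mirror the macro defaults in the table above.
    """
    included = re.search(matches, name) is not None
    excluded = re.search(not_matches, name) is not None
    return included and not excluded


print(passes_filter("web-api"))                       # True: defaults keep everything
print(passes_filter("web-api", not_matches=r"^web-")) # False: explicitly excluded
print(passes_filter("batch", matches=r"^web-"))       # False: not included
```

With the defaults, nothing is excluded unless an entity name literally contains `CHANGE_IF_NEEDED`, which is why that placeholder is meant to be replaced only when an exclusion is needed.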

Items

Name	Description	Type	Key and additional info
Get cluster metrics	Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.get_metricsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get cluster services	Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.get_cluster_servicesPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get alarms data	Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.ecs.get_alarmsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.ecs.metrics.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.ecs.alarms.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Container Instance Count	The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.	Dependent item	aws.ecs.container_instance_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Task Count	The number of tasks running in the cluster.	Dependent item	aws.ecs.task_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Service Count	The number of services in the cluster.	Dependent item	aws.ecs.service_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
CPU Utilization	Cluster CPU utilization.	Dependent item	aws.ecs.cpu_utilizationPreprocessing JSON Path: `$.CPUUtilization`⛔️Custom on fail: Discard value
Memory Utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.memory_utilizationPreprocessing JSON Path: `$.MemoryUtilization`⛔️Custom on fail: Discard value
Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.rxPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.txPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Ephemeral Storage Reserved	The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.ephemeral.storage.reservedPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
Ephemeral Storage Utilized	The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.ephemeral.storage.utilizedPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
Ephemeral Storage Utilization	The calculated Disk Utilization.	Dependent item	aws.ecs.disk.utilizationPreprocessing JSON Path: `$.DiskUtilization`⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity
AWS ECS Serverless: Failed to get metrics data	Failed to get CloudWatch metrics for ECS Cluster.	`length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.metrics.check))>0`	Warning
AWS ECS Serverless: Failed to get alarms data	Failed to get CloudWatch alarms for ECS Cluster.	`length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarms.check))>0`	Warning
AWS ECS Serverless: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}`	Warning
AWS ECS Serverless: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}`	Warning
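
The utilization triggers use `min(...,15m)>threshold`, which means every value collected in the last 15 minutes must exceed the threshold, so a single spike does not raise a problem. A small sketch of that evaluation follows; the function name and sample data are made up for illustration, and the handling of the window boundary is an assumption.

```python
def min_exceeds(samples, now, window_seconds, threshold):
    """Return True if the minimum value within the window is above threshold.

    samples is a list of (timestamp_seconds, value) pairs.
    """
    recent = [value for ts, value in samples if now - window_seconds < ts <= now]
    return bool(recent) and min(recent) > threshold


# Hypothetical CPU utilization samples, one every 5 minutes.
cpu = [(0, 90.0), (300, 85.0), (600, 75.0), (900, 95.0)]
print(min_exceeds(cpu, now=900, window_seconds=900, threshold=70))  # True
print(min_exceeds(cpu, now=900, window_seconds=900, threshold=80))  # False
```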

LLD rule Cluster Alarms discovery

Name	Description	Type	Key and additional info
Cluster Alarms discovery	Discovery of cluster alarms.	Dependent item	aws.ecs.alarms.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cluster Alarms discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: Get metrics	Get alarm metrics about the state and its reason.	Dependent item	aws.ecs.alarm.get_metrics["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`⛔️Custom on fail: Discard value
[{#ALARM_NAME}]: State reason	An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.ecs.alarm.state_reason["{#ALARM_NAME}"]Preprocessing JSON Path: `$.StateReason`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.ecs.alarm.state["{#ALARM_NAME}"]Preprocessing JSON Path: `$.StateValue`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`

Trigger prototypes for Cluster Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Serverless: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2}	`last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS ECS Serverless: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ECS Serverless Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1`	Info

LLD rule Cluster Services discovery

Name	Description	Type	Key and additional info
Cluster Services discovery	Discovery of {$AWS.ECS.CLUSTER.NAME} services.	Dependent item	aws.ecs.services.discoveryPreprocessing Discard unchanged with heartbeat: `3h`

Item prototypes for Cluster Services discovery

Name	Description	Type	Key and additional info
[{#AWS.ECS.SERVICE.NAME}]: Running Task	The number of tasks currently in the `running` state.	Dependent item	aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Pending Task	The number of tasks currently in the `pending` state.	Dependent item	aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Desired Task	The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Task Set	The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: CPU Reserved	The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: CPU Utilization	The number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory utilized	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Memory utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory reserved	The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Ephemeral storage reserved	The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.services.ephemeral.storage.reserved["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
[{#AWS.ECS.SERVICE.NAME}]: Ephemeral storage utilized	The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later.	Dependent item	aws.ecs.services.ephemeral.storage.utilized["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1073741824`
[{#AWS.ECS.SERVICE.NAME}]: Storage read bytes	The number of bytes read from storage in the resource that is specified by the dimensions that you're using.	Dependent item	aws.ecs.services.storage.read.bytes["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Storage write bytes	The number of bytes written to storage in the resource that is specified by the dimensions that you're using.	Dependent item	aws.ecs.services.storage.write.bytes["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Get metrics	Get metrics of ECS services. Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html	Script	aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"]Preprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value

Trigger prototypes for Cluster Services discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Serverless: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}`	Warning
AWS ECS Serverless: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Serverless Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`	Warning

AWS ECS Cluster by HTTP

Overview

This template monitors an AWS ECS cluster by HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Note: This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
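Because GetMetricData requests are billed, it helps to see how several metrics can share one request. The following is a minimal sketch of a request body for cluster-level metrics, not the code used by the template. The function name, the 5-minute period, the `Average` statistic, and the two-period time range are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_get_metric_data_request(cluster_name, metric_names, period=300, now=None):
    """Build one GetMetricData request body covering several ECS metrics.

    Hypothetical helper: the period, statistic, and time range are
    illustrative assumptions, not values taken from the template.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(seconds=period * 2)
    queries = []
    for index, name in enumerate(metric_names):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ECS",
                    "MetricName": name,
                    "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
                },
                "Period": period,
                "Stat": "Average",
            },
        })
    return {
        "StartTime": int(start.timestamp()),
        "EndTime": int(end.timestamp()),
        "MetricDataQueries": queries,
    }


if __name__ == "__main__":
    import json
    body = build_get_metric_data_request("demo-cluster", ["CPUUtilization", "MemoryUtilization"])
    print(json.dumps(body, indent=2))
```

Bundling the metrics into a single `MetricDataQueries` list is what keeps the number of API calls low.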

Additional information about the metrics and used API methods:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS ECS Cluster by HTTP

Configuration

Setup

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "ecs:ListServices"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
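The `{Account}` and `{UserName}` placeholders in the policies above must be replaced with real values before the policy is saved. As a minimal sketch, the trust relationship document can be generated from those two values. The function name and the sample account ID and user name are hypothetical.

```python
import json


def trust_policy(account_id, user_name):
    """Return the trust relationship document for the given IAM user.

    Hypothetical helper: it only fills in the {Account} and {UserName}
    placeholders of the trust policy shown in this section.
    """
    principal_arn = f"arn:aws:iam::{account_id}:user/{user_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Sample values only; substitute your own account ID and user name.
    print(json.dumps(trust_policy("123456789012", "zabbix"), indent=2))
```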

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the following macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, {$AWS.ECS.CLUSTER.NAME}.

For more information about managing access keys, see the official AWS documentation.

Refer to the Macros section for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` through instance metadata or environment authorization method. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.REGION}	Amazon ECS Region code.	`us-west-1`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty then no proxy is used.
{$AWS.ECS.CLUSTER.NAME}	ECS cluster name.
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable services by name.	`.*`
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered services by name.	`CHANGE_IF_NEEDED`
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}	The warning threshold of the cluster CPU utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}	The warning threshold of the cluster memory utilization expressed in %.	`70`
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}	The warning threshold of the cluster service CPU utilization expressed in %.	`80`
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}	The warning threshold of the cluster service memory utilization expressed in %.	`80`
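Which macros must be filled in depends on the chosen authorization method. The sketch below checks a set of host macros against the requirements described in the Setup section, applying the defaults from the table above. The helper name and the data layout are assumptions, and the `{$AWS.ASSUME.ROLE.AUTH.METADATA}` variant of `assume_role` is deliberately not handled.

```python
# Defaults taken from the macro table; all other macros default to empty.
DEFAULTS = {
    "{$AWS.AUTH_TYPE}": "access_key",
    "{$AWS.REGION}": "us-west-1",
    "{$AWS.STS.REGION}": "us-east-1",
}

REQUIRED_ALWAYS = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}"]

REQUIRED_BY_AUTH = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    "role_base": [],
}


def missing_macros(host_macros):
    """Return the required macros that are still empty (hypothetical helper)."""
    merged = {**DEFAULTS, **host_macros}
    auth_type = merged["{$AWS.AUTH_TYPE}"]
    if auth_type not in REQUIRED_BY_AUTH:
        raise ValueError(f"unknown authorization method: {auth_type}")
    required = REQUIRED_ALWAYS + REQUIRED_BY_AUTH[auth_type]
    return [name for name in required if not merged.get(name)]


if __name__ == "__main__":
    print(missing_macros({"{$AWS.ECS.CLUSTER.NAME}": "prod"}))
```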

Items

Name	Description	Type	Key and additional info
Get cluster metrics	Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.get_metricsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get cluster services	Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html	Script	aws.ecs.get_cluster_servicesPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get alarms data	Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.ecs.get_alarmsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get metrics check	Data collection check.	Dependent item	aws.ecs.metrics.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Data collection check.	Dependent item	aws.ecs.alarms.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Container Instance Count	The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.	Dependent item	aws.ecs.container_instance_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Task Count	The number of tasks running in the cluster.	Dependent item	aws.ecs.task_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Service Count	The number of services in the cluster.	Dependent item	aws.ecs.service_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
CPU Reserved	A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.cpu_reservedPreprocessing JSON Path: `$.[?(@.Label == "CpuReserved")].Values.first().first()`⛔️Custom on fail: Discard value
CPU Utilization	Cluster CPU utilization	Dependent item	aws.ecs.cpu_utilizationPreprocessing JSON Path: `$.CPUUtilization`⛔️Custom on fail: Discard value
Memory Utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.memory_utilizationPreprocessing JSON Path: `$.MemoryUtilization`⛔️Custom on fail: Discard value
Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.rxPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.network.txPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
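Several dependent items above use a JSON Path of the form `$.[?(@.Label == "CpuReserved")].Values.first().first()`. The following sketch shows what that expression selects, written as plain Python. It assumes the master item returns a list of objects with `Label` and `Values` fields, as in GetMetricData results; the sample data and the function name are made up for illustration.

```python
def first_value_for_label(metric_results, label):
    """Mimic `$.[?(@.Label == "<label>")].Values.first().first()`.

    Takes the first result whose Label matches and returns the first entry
    of its Values list. Raises LookupError when nothing can be selected,
    which corresponds to the "Custom on fail: Discard value" case.
    """
    for result in metric_results:
        if result.get("Label") == label:
            values = result.get("Values") or []
            if not values:
                raise LookupError(f"no values for label {label!r}")
            return values[0]
    raise LookupError(f"label {label!r} not found")


if __name__ == "__main__":
    # Invented sample shaped like GetMetricData results.
    sample = [
        {"Id": "m0", "Label": "CpuReserved", "Values": [512.0, 256.0]},
        {"Id": "m1", "Label": "TaskCount", "Values": []},
    ]
    print(first_value_for_label(sample, "CpuReserved"))
```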

Triggers

Name	Description	Expression	Severity
AWS ECS Cluster: Failed to get metrics data	Failed to get CloudWatch metrics for ECS Cluster.	`length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0`	Warning
AWS ECS Cluster: Failed to get alarms data	Failed to get CloudWatch alarms for ECS Cluster.	`length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0`	Warning
AWS ECS Cluster: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN}`	Warning
AWS ECS Cluster: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN}`	Warning
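The utilization triggers use `min(...,15m)>threshold`, which fires only when every value collected in the last 15 minutes is above the threshold, so a single spike does not raise a problem. A minimal sketch of that logic follows. The function name and the `(timestamp, value)` sample layout are assumptions, and treating an empty window as "no problem" is a simplification of how Zabbix handles missing data.

```python
def high_utilization(samples, threshold, now, window_seconds=900):
    """Return True when the minimum value in the window exceeds the threshold.

    `samples` is a list of (unix_timestamp, value) pairs. An empty window
    returns False here, which is a simplifying assumption.
    """
    recent = [value for ts, value in samples if now - window_seconds < ts <= now]
    return bool(recent) and min(recent) > threshold


if __name__ == "__main__":
    history = [(0, 75.0), (300, 82.0), (600, 91.0), (900, 88.0)]
    print(high_utilization(history, threshold=70, now=900))
    print(high_utilization(history, threshold=85, now=900))
```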

LLD rule Cluster Alarms discovery

Name	Description	Type	Key and additional info
Cluster Alarms discovery	Discovery instance alarms.	Dependent item	aws.ecs.alarms.discoveryPreprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cluster Alarms discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: Get metrics	Get alarm metrics about the state and its reason.	Dependent item	aws.ecs.alarm.get_metrics["{#ALARM_NAME}"]Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`⛔️Custom on fail: Discard value
[{#ALARM_NAME}]: State reason	An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.ecs.alarm.state_reason["{#ALARM_NAME}"]Preprocessing JSON Path: `$.StateReason`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION}	Dependent item	aws.ecs.alarm.state["{#ALARM_NAME}"]Preprocessing JSON Path: `$.StateValue`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`
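The JavaScript preprocessing step for the State item is not reproduced in this document. Based only on the item description (0 for OK, 1 for INSUFFICIENT_DATA, 2 for ALARM) and the fallback value of 3, the conversion might look like the following sketch. The function name is hypothetical, and the real template script may differ.

```python
# Mapping inferred from the item description; 3 is the documented fallback.
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
FALLBACK_CODE = 3


def alarm_state_code(alarm):
    """Convert an alarm's StateValue string to its numeric code.

    Returns the fallback code when StateValue is missing or unrecognized.
    """
    return STATE_CODES.get(alarm.get("StateValue"), FALLBACK_CODE)


if __name__ == "__main__":
    print(alarm_state_code({"AlarmName": "cpu-high", "StateValue": "ALARM"}))
```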

Trigger prototypes for Cluster Alarms discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Cluster: [{#ALARM_NAME}] has 'Alarm' state	Alarm "{#ALARM_NAME}" has `Alarm` state. Reason: {ITEM.LASTVALUE2}	`last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS ECS Cluster: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1`	Info

LLD rule Cluster Services discovery

Name	Description	Type	Key and additional info
Cluster Services discovery	Discovery {$AWS.ECS.CLUSTER.NAME} services.	Dependent item	aws.ecs.services.discoveryPreprocessing Discard unchanged with heartbeat: `3h`

Item prototypes for Cluster Services discovery

Name	Description	Type	Key and additional info
[{#AWS.ECS.SERVICE.NAME}]: Running Task	The number of tasks currently in the `running` state.	Dependent item	aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Pending Task	The number of tasks currently in the `pending` state.	Dependent item	aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Desired Task	The desired number of tasks for an {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: Task Set	The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.	Dependent item	aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#AWS.ECS.SERVICE.NAME}]: CPU Reserved	A number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: CPU Utilization	A number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition.	Dependent item	aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory utilized	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Memory utilization	The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Memory reserved	The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition.	Dependent item	aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value Custom multiplier: `1048576`
[{#AWS.ECS.SERVICE.NAME}]: Network rx bytes	The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Network tx bytes	The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes.	Dependent item	aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ECS.SERVICE.NAME}]: Get metrics	Get metrics of ECS services. Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html	Script	aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"]Preprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value

Trigger prototypes for Cluster Services discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ECS Cluster: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization	The CPU utilization is too high. The system might be slow to respond.	`min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN}`	Warning
AWS ECS Cluster: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization	The system is running out of free memory.	`min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN}`	Warning

AWS ELB Application Load Balancer by HTTP

Overview

Please scroll down for AWS ELB Network Load Balancer by HTTP.

The template is designed to monitor AWS ELB Application Load Balancer by HTTP via Zabbix, and it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS ELB Application Load Balancer with Target Groups by HTTP

Configuration

Setup

The template gets AWS ELB Application Load Balancer metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the ELB policies page on the AWS website.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect AWS ELB Application Load Balancer metrics.

{
    "Version":"2012-10-17",
    "Statement":[
        {
          "Action":[
              "cloudwatch:DescribeAlarms",
              "cloudwatch:GetMetricData",
              "elasticloadbalancing:DescribeTargetGroups"
          ],
          "Effect":"Allow",
          "Resource":"*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume role authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.ELB.ARN}.

For more information about managing access keys, see official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` through instance metadata or environment authorization method. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN assume role; add when using the `assume_role` authorization method.
{$AWS.REGION}	AWS Application Load Balancer region code.	`us-west-1`
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ELB.ARN}	Amazon Resource Names (ARN) of the load balancer.
{$AWS.HTTP.4XX.FAIL.MAX.WARN}	Maximum number of HTTP request failures for a trigger expression.	`5`
{$AWS.HTTP.5XX.FAIL.MAX.WARN}	Maximum number of HTTP request failures for a trigger expression.	`5`
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.MATCHES}	Filter of discoverable target groups by name.	`.*`
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.NOT_MATCHES}	Filter to exclude discovered target groups by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ELB.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
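Each LLD filter pair combines a `MATCHES` and a `NOT_MATCHES` regular expression: an entity is discovered only if its name matches the first and does not match the second. With the defaults (`.*` and `CHANGE_IF_NEEDED`), everything is kept except names containing the literal placeholder text. The sketch below illustrates this with Python's `re` module as a stand-in for the regular expression engine used by Zabbix. The function name and the sample names are hypothetical.

```python
import re


def passes_lld_filter(name, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """Return True when `name` would be kept by a MATCHES/NOT_MATCHES pair.

    Uses re.search (partial match) for both patterns; Python's `re` is only
    an approximation of the engine Zabbix uses.
    """
    kept = re.search(matches, name) is not None
    excluded = re.search(not_matches, name) is not None
    return kept and not excluded


if __name__ == "__main__":
    for candidate in ["web-tg", "api-internal", "CHANGE_IF_NEEDED"]:
        print(candidate, passes_lld_filter(candidate))
```

To restrict discovery to a subset, narrow the `MATCHES` pattern (for example `^api-`) or set `NOT_MATCHES` to the names that should be skipped.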

Items

Name	Description	Type	Key and additional info
Get metrics data	Get ELB Application Load Balancer metrics. Full metrics list related to Application Load Balancer: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html	Script	aws.elb.alb.get_metricsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get target groups	Get ELB target group. `DescribeTargetGroups` API method: https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_DescribeTargetGroups.html	Script	aws.elb.alb.get_target_groupsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get ELB ALB alarms data	`DescribeAlarms` API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.elb.alb.get_alarmsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get metrics check	Check that the Application Load Balancer metrics data has been received correctly.	Dependent item	aws.elb.alb.metrics.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Get alarms check	Check that the alarm data has been received correctly.	Dependent item	aws.elb.alb.alarms.checkPreprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to Discard unchanged with heartbeat: `3h`
Active Connection Count	The total number of active concurrent TCP connections from clients to the load balancer and from the load balancer to targets.	Dependent item	aws.elb.alb.active_connection_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
New Connection Count	The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets.	Dependent item	aws.elb.alb.new_connection_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Rejected Connection Count	The number of connections that were rejected because the load balancer had reached its maximum number of connections.	Dependent item	aws.elb.alb.rejected_connection_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Requests Count	The number of requests processed over IPv4 and IPv6. This metric is only incremented for requests where the load balancer node was able to choose a target. Requests that are rejected before a target is chosen are not reflected in this metric.	Dependent item	aws.elb.alb.requests_countPreprocessing JSON Path: `$.[?(@.Label == "RequestCount")].Values.first().first()`⛔️Custom on fail: Discard value
Target Response Time	The time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received. This is equivalent to the `target_processing_time` field in the access logs.	Dependent item	aws.elb.alb.target_response_timePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
HTTP Fixed Response Count	The number of fixed-response actions that were successful.	Dependent item	aws.elb.alb.http_fixed_response_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Rule Evaluations	The number of rules processed by the load balancer given a request rate averaged over an hour.	Dependent item	aws.elb.alb.rule_evaluationsPreprocessing JSON Path: `$.[?(@.Label == "RuleEvaluations")].Values.first().first()`⛔️Custom on fail: Discard value
Client TLS Negotiation Error Count	The number of TLS connections initiated by the client that did not establish a session with the load balancer due to a TLS error. Possible causes include a mismatch of ciphers or protocols or the client failing to verify the server certificate and closing the connection.	Dependent item	aws.elb.alb.client_tls_negotiation_error_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Target TLS Negotiation Error Count	The number of TLS connections initiated by the load balancer that did not establish a session with the target. Possible causes include a mismatch of ciphers or protocols. This metric does not apply if the target is a Lambda function.	Dependent item	aws.elb.alb.target_tls_negotiation_error_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Target Connection Error Count	The number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function.	Dependent item	aws.elb.alb.target_connection_error_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Consumed LCUs	The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.alb.capacity_unitsPreprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs")].Values.first().first()`⛔️Custom on fail: Discard value
Processed Bytes	The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload). This count includes traffic to and from clients and Lambda functions, and traffic from an Identity Provider (IdP) if user authentication is enabled.	Dependent item	aws.elb.alb.processed_bytesPreprocessing JSON Path: `$.[?(@.Label == "ProcessedBytes")].Values.first().first()`⛔️Custom on fail: Discard value
Desync Mitigation Mode Non Compliant Request Count	The number of requests that fail to comply with HTTP protocols.	Dependent item	aws.elb.alb.non_compliant_request_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
HTTP Redirect Count	The number of redirect actions that were successful.	Dependent item	aws.elb.alb.http_redirect_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
HTTP Redirect Url Limit Exceeded Count	The number of redirect actions that could not be completed because the URL in the response location header is larger than 8K bytes.	Dependent item	aws.elb.alb.http_redirect_url_limit_exceeded_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 3XX Count	The number of HTTP 3XX redirection codes that originate from the load balancer. This count does not include response codes generated by targets.	Dependent item	aws.elb.alb.http_3xx_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 4XX Count	The number of HTTP 4XX client error codes that originate from the load balancer. Client errors are generated when requests are malformed or incomplete. These requests were not received by the target, other than in the case where the load balancer returns an HTTP 460 error code. This count does not include any response codes generated by the targets.	Dependent item	aws.elb.alb.http_4xx_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 5XX Count	The number of HTTP 5XX server error codes that originate from the load balancer. This count does not include any response codes generated by the targets.	Dependent item	aws.elb.alb.http_5xx_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 500 Count	The number of HTTP 500 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http_500_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 502 Count	The number of HTTP 502 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http_502_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 503 Count	The number of HTTP 503 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http_503_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB HTTP 504 Count	The number of HTTP 504 error codes that originate from the load balancer.	Dependent item	aws.elb.alb.http_504_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB Auth Error	The number of user authentications that could not be completed because an authenticate action was misconfigured, the load balancer could not establish a connection with the IdP, or the load balancer could not complete the authentication flow due to an internal error.	Dependent item	aws.elb.alb.auth_errorPreprocessing JSON Path: `$.[?(@.Label == "ELBAuthError")].Values.first().first()`⛔️Custom on fail: Discard value
ELB Auth Failure	The number of user authentications that could not be completed because the IdP denied access to the user or an authorization code was used more than once.	Dependent item	aws.elb.alb.auth_failurePreprocessing JSON Path: `$.[?(@.Label == "ELBAuthFailure")].Values.first().first()`⛔️Custom on fail: Discard value
ELB Auth User Claims Size Exceeded	The number of times that a configured IdP returned user claims that exceeded 11K bytes in size.	Dependent item	aws.elb.alb.auth_user_claims_size_exceededPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
ELB Auth Latency	The time elapsed, in milliseconds, to query the IdP for the ID token and user info. If one or more of these operations fail, this is the time to failure.	Dependent item	aws.elb.alb.auth_latencyPreprocessing JSON Path: `$.[?(@.Label == "ELBAuthLatency")].Values.first().first()`⛔️Custom on fail: Discard value
ELB Auth Success	The number of authenticate actions that were successful. This metric is incremented at the end of the authentication workflow, after the load balancer has retrieved the user claims from the IdP.	Dependent item	aws.elb.alb.auth_successPreprocessing JSON Path: `$.[?(@.Label == "ELBAuthSuccess")].Values.first().first()`⛔️Custom on fail: Discard value
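
Most dependent items above extract a single number from the master item with a JSON Path such as `$.[?(@.Label == "ConsumedLCUs")].Values.first().first()`. The following is a minimal Python sketch of what that expression selects, assuming the master item returns a list of objects with `Label` and `Values` fields. The sample payload and the helper name `first_value_for_label` are hypothetical and are not taken from the template.

```python
from typing import Any, Optional


def first_value_for_label(results: list[dict[str, Any]], label: str) -> Optional[float]:
    """Approximate `$.[?(@.Label == "<label>")].Values.first().first()`."""
    for entry in results:
        if entry.get("Label") == label:
            values = entry.get("Values") or []
            # An empty Values list corresponds to the JSON Path step failing,
            # which the template handles with "Custom on fail: Discard value".
            return values[0] if values else None
    # No entry carries the requested label: nothing to store.
    return None


# Hypothetical, trimmed-down payload shaped like CloudWatch metric data results.
sample = [
    {"Label": "ConsumedLCUs", "Values": [0.42, 0.40]},
    {"Label": "ProcessedBytes", "Values": [1024.0]},
    {"Label": "ELBAuthError", "Values": []},
]

print(first_value_for_label(sample, "ConsumedLCUs"))
print(first_value_for_label(sample, "ELBAuthError"))
```

Returning `None` here stands in for a discarded value: the item simply receives no new data point for that poll.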

Triggers

Name	Description	Expression	Severity
AWS ELB ALB: Failed to get metrics data	Failed to get CloudWatch metrics for Application Load Balancer.	`length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.metrics.check))>0`	Warning
AWS ELB ALB: Failed to get alarms data	Failed to get CloudWatch alarms for Application Load Balancer.	`length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarms.check))>0`	Warning
AWS ELB ALB: Too many HTTP 4XX error codes	Too many requests failed with HTTP 4XX code.	`min(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.http_4xx_count,5m)>{$AWS.HTTP.4XX.FAIL.MAX.WARN}`	Warning
AWS ELB ALB: Too many HTTP 5XX error codes	Too many requests failed with HTTP 5XX code.	`min(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.http_5xx_count,5m)>{$AWS.HTTP.5XX.FAIL.MAX.WARN}`	Warning
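
The two error-code triggers use `min(...,5m)>{$MACRO}`, which fires only when every value collected in the last five minutes is above the threshold. The sketch below illustrates that logic in Python. The `Sample` type, the function names, and the threshold of `10` are assumptions for illustration; the actual macro defaults are not listed in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    clock: int    # Unix timestamp of the collected value
    value: float  # e.g. aws.elb.alb.http_5xx_count


def min_over_window(history: list[Sample], now: int, window_seconds: int) -> Optional[float]:
    """Lowest value among samples collected within the window ending at `now`."""
    recent = [s.value for s in history if now - window_seconds < s.clock <= now]
    return min(recent) if recent else None


def too_many_errors(history: list[Sample], now: int, threshold: float,
                    window_seconds: int = 300) -> bool:
    """True when even the lowest value in the window exceeds the threshold."""
    lowest = min_over_window(history, now, window_seconds)
    return lowest is not None and lowest > threshold


history = [Sample(100, 12), Sample(160, 15), Sample(220, 11)]
print(too_many_errors(history, now=300, threshold=10))
```

Using the minimum rather than the last value means a single spike does not raise the problem; the error count has to stay high for the whole window.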

LLD rule Load Balancer alarm discovery

Name	Description	Type	Key and additional info
Load Balancer alarm discovery	Used for the discovery of load balancer alarms.	Dependent item	aws.elb.alb.alarms.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Load Balancer alarm discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: Get metrics	Get metrics about the alarm state and its reason.	Dependent item	aws.elb.alb.alarm.get_metrics["{#ALARM_NAME}"] Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`⛔️Custom on fail: Discard value
[{#ALARM_NAME}]: State reason	An explanation for the alarm state reason in text format. Alarm description: `{#ALARM_DESCRIPTION}`	Dependent item	aws.elb.alb.alarm.state_reason["{#ALARM_NAME}"] Preprocessing JSON Path: `$.StateReason`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The value of the alarm state. Possible values: 0 - OK; 1 - INSUFFICIENT_DATA; 2 - ALARM. Alarm description: `{#ALARM_DESCRIPTION}`	Dependent item	aws.elb.alb.alarm.state["{#ALARM_NAME}"] Preprocessing JSON Path: `$.StateValue`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`
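
The State item turns the CloudWatch `StateValue` string into the numeric codes listed above, and the 'Alarm' trigger prototype combines that code with a non-empty state reason. The template's own JavaScript is not reproduced on this page, so the following Python sketch only illustrates the documented mapping. Treating any unrecognized string as `3` is an assumption modelled on the "Custom on fail: Set value to: 3" step.

```python
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3  # fallback value, mirrors "Custom on fail: Set value to: 3"


def alarm_state_code(alarm: dict) -> int:
    """Map a DescribeAlarms-style record to the numeric state code."""
    # Assumption: a missing or unrecognized StateValue maps to 3.
    return STATE_CODES.get(alarm.get("StateValue"), UNKNOWN_STATE)


def alarm_trigger_fires(state_code: int, state_reason: str) -> bool:
    """Mirror of: state = 2 and length(state_reason) > 0."""
    return state_code == 2 and len(state_reason) > 0


# Hypothetical alarm record for illustration only.
alarm = {"AlarmName": "example-alarm", "StateValue": "ALARM",
         "StateReason": "Threshold Crossed"}
code = alarm_state_code(alarm)
print(code, alarm_trigger_fires(code, alarm["StateReason"]))
```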

Trigger prototypes for Load Balancer alarm discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB ALB: [{#ALARM_NAME}] has 'Alarm' state	The alarm `{#ALARM_NAME}` is in the ALARM state. Reason: `{ITEM.LASTVALUE2}`	`last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS ELB ALB: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ELB Application Load Balancer by HTTP/aws.elb.alb.alarm.state["{#ALARM_NAME}"])=1`	Info

LLD rule Target groups discovery

Name	Description	Type	Key and additional info
Target groups discovery	Used for the discovery of {$AWS.ELB.TARGET.GROUP.NAME} target groups.	Dependent item	aws.elb.alb.target_groups.discovery Preprocessing Discard unchanged with heartbeat: `3h`
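
The "Discard unchanged with heartbeat: 3h" step keeps a value only when it differs from the previously kept one or when the heartbeat period has elapsed. This keeps discovery data from being reprocessed on every poll. A minimal Python sketch of that behaviour follows; the class name `HeartbeatThrottle` is hypothetical.

```python
class HeartbeatThrottle:
    """Keep a value if it changed, or if the heartbeat period has elapsed."""

    def __init__(self, heartbeat_seconds: int):
        self.heartbeat = heartbeat_seconds
        self.last_value = None
        self.last_kept_at = None

    def keep(self, value, now: int) -> bool:
        first = self.last_kept_at is None
        changed = first or value != self.last_value
        expired = (not first) and (now - self.last_kept_at >= self.heartbeat)
        if changed or expired:
            # Remember what was kept and when, then pass the value on.
            self.last_value = value
            self.last_kept_at = now
            return True
        # Unchanged and still inside the heartbeat period: discard.
        return False


throttle = HeartbeatThrottle(heartbeat_seconds=3 * 3600)
for timestamp, payload in [(0, "[]"), (3600, "[]"), (10800, "[]"), (10900, "[1]")]:
    print(timestamp, throttle.keep(payload, timestamp))
```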

Item prototypes for Target groups discovery

Name	Description	Type	Key and additional info
[{#AWS.ELB.TARGET.GROUP.NAME}]: Get metrics	Get the metrics of the ELB target group `{#AWS.ELB.TARGET.GROUP.NAME}`. Full list of metrics related to AWS ELB here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html#user-authentication-metric-table	Script	aws.elb.alb.target_groups.get_metrics["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 2XX Count	The number of HTTP response 2XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.target_groups.http_2xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 3XX Count	The number of HTTP response 3XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.target_groups.http_3xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 4XX Count	The number of HTTP response 4XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.target_groups.http_4xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: HTTP Code Target 5XX Count	The number of HTTP response 5XX codes generated by the targets. This does not include any response codes generated by the load balancer.	Dependent item	aws.elb.alb.target_groups.http_5xx_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy Host Count	The number of targets that are considered healthy.	Dependent item	aws.elb.alb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "HealthyHostCount")].Values.first().first()`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Host Count	The number of targets that are considered unhealthy.	Dependent item	aws.elb.alb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy State Routing	The number of zones that meet the routing healthy state requirements.	Dependent item	aws.elb.alb.target_groups.healthy_state_routing["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy State Routing	The number of zones that do not meet the routing healthy state requirements, and therefore the load balancer distributes traffic to all targets in the zone, including the unhealthy targets.	Dependent item	aws.elb.alb.target_groups.unhealthy_state_routing["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Request Count Per Target	The average request count per target, in a target group. You must specify the target group using the TargetGroup dimension.	Dependent item	aws.elb.alb.target_groups.request["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Routing Request Count	The number of requests that are routed using the routing failover action (fail open).	Dependent item	aws.elb.alb.target_groups.unhealthy_routing_request_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Mitigated Host Count	The number of targets under mitigation.	Dependent item	aws.elb.alb.target_groups.mitigated_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Anomalous Host Count	The number of hosts detected with anomalies.	Dependent item	aws.elb.alb.target_groups.anomalous_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy State DNS	The number of zones that meet the DNS healthy state requirements.	Dependent item	aws.elb.alb.target_groups.healthy_state_dns["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "HealthyStateDNS")].Values.first().first()`⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy State DNS	The number of zones that do not meet the DNS healthy state requirements and therefore were marked unhealthy in DNS.	Dependent item	aws.elb.alb.target_groups.unhealthy_state_dns["{#AWS.ELB.TARGET.GROUP.NAME}"]Preprocessing JSON Path: `$.[?(@.Label == "UnhealthyStateDNS")].Values.first().first()`⛔️Custom on fail: Discard value

AWS ELB Network Load Balancer by HTTP

Overview

The template is designed to monitor AWS ELB Network Load Balancer by HTTP via Zabbix, and it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS ELB Network Load Balancer with Target Groups by HTTP

Configuration

Setup

The template gets AWS ELB Network Load Balancer metrics and uses the script item to make HTTP requests to the CloudWatch API.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect AWS ELB Network Load Balancer metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
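
Before attaching the policy, it can help to confirm that the document really lists the three actions this template needs. The sketch below is a simplified check, not an IAM evaluator: it only compares literal action names in `Allow` statements and ignores wildcards, `Deny` statements, conditions, and resource restrictions. The function name `missing_actions` is hypothetical.

```python
import json

# Actions named in the required-permissions policy for this template.
REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "elasticloadbalancing:DescribeTargetGroups",
}


def missing_actions(policy_json: str) -> set[str]:
    """Return required actions not literally listed in any Allow statement."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    granted: set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted


policy_text = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["cloudwatch:DescribeAlarms", "cloudwatch:GetMetricData"],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
"""
print(sorted(missing_actions(policy_text)))
```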

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.ELB.ARN}.
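
The macro requirements described above differ by authorization method, so a quick pre-flight check can catch an incomplete host configuration. The sketch below encodes the lists from this section. Treating `role_base` as needing no credential macros is an assumption, and the helper name `unset_macros` is hypothetical; the check also ignores the `{$AWS.ASSUME.ROLE.AUTH.METADATA}` variant.

```python
# Macros that every authorization method needs, per the setup notes.
COMMON = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ELB.ARN}"]

PER_AUTH_TYPE = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    # Assumption: role-based authorization relies on the instance role only.
    "role_base": [],
}


def unset_macros(macros: dict[str, str]) -> list[str]:
    """List required macros that are missing or empty for the chosen method."""
    # `access_key` is the documented default for {$AWS.AUTH_TYPE}.
    effective = {"{$AWS.AUTH_TYPE}": "access_key", **macros}
    auth_type = effective["{$AWS.AUTH_TYPE}"]
    if auth_type not in PER_AUTH_TYPE:
        raise ValueError(f"unknown authorization method: {auth_type}")
    required = COMMON + PER_AUTH_TYPE[auth_type]
    return [name for name in required if not effective.get(name)]


# Hypothetical host macros with placeholder values.
host_macros = {
    "{$AWS.AUTH_TYPE}": "assume_role",
    "{$AWS.REGION}": "us-west-1",
    "{$AWS.ELB.ARN}": "example-arn",
    "{$AWS.ACCESS.KEY.ID}": "example-key-id",
    "{$AWS.SECRET.ACCESS.KEY}": "example-secret",
    "{$AWS.STS.REGION}": "us-east-1",
}
print(unset_macros(host_macros))
```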

For more information about managing access keys, see the official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` authorization method through instance metadata or environment. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; add when using the `assume_role` authorization method.
{$AWS.REGION}	AWS Network Load Balancer region code.	`us-west-1`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.ELB.ARN}	Amazon Resource Name (ARN) of the load balancer.
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.MATCHES}	Filter of discoverable target groups by name.	`.*`
{$AWS.ELB.LLD.FILTER.TARGET.GROUP.NOT_MATCHES}	Filter to exclude discovered target groups by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.ELB.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.ELB.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.ELB.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
{$AWS.ELB.UNHEALTHY.HOST.MAX}	Maximum number of unhealthy hosts for a trigger expression.	`0`
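
The `MATCHES` / `NOT_MATCHES` macro pairs act as include and exclude regular expressions for low-level discovery: an entity is kept when its name matches the first pattern and does not match the second. A small Python sketch of that rule is shown below, using the defaults from the table. Python's `re` module stands in for the regular expression engine Zabbix uses, so edge-case syntax may differ, and the target group names are made up.

```python
import re


def is_discovered(name: str,
                  matches: str = r".*",
                  not_matches: str = r"CHANGE_IF_NEEDED") -> bool:
    """Keep `name` if it matches the include pattern and not the exclude one."""
    # Unanchored search: a pattern may match anywhere in the name
    # unless it uses ^ or $ explicitly.
    included = re.search(matches, name) is not None
    excluded = re.search(not_matches, name) is not None
    return included and not excluded


# Hypothetical target group names.
for group in ["prod-web", "prod-web-canary", "staging-api"]:
    print(group, is_discovered(group, matches=r"^prod-", not_matches=r"-canary$"))
```

With the defaults, everything is discovered, because `.*` matches any name and `CHANGE_IF_NEEDED` is unlikely to appear in a real one.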

Items

Name	Description	Type	Key and additional info
Get metrics data	Get ELB Network Load Balancer metrics. Full metrics list related to Network Load Balancer: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html	Script	aws.elb.nlb.get_metricsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get target groups	Get ELB target group. `DescribeTargetGroups` API method: https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_DescribeTargetGroups.html	Script	aws.elb.nlb.get_target_groupsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get ELB NLB alarms data	`DescribeAlarms` API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.elb.nlb.get_alarmsPreprocessing Check for not supported value: `any error`⛔️Custom on fail: Discard value
Get metrics check	Check that the Network Load Balancer metrics data has been received correctly.	Dependent item	aws.elb.nlb.metrics.check Preprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to: (empty string) Discard unchanged with heartbeat: `3h`
Get alarms check	Check that the alarm data has been received correctly.	Dependent item	aws.elb.nlb.alarms.check Preprocessing JSON Path: `$.error`⛔️Custom on fail: Set value to: (empty string) Discard unchanged with heartbeat: `3h`
Active Flow Count	The total number of concurrent flows (or connections) from clients to targets. This metric includes connections in the `SYN_SENT` and `ESTABLISHED` states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow.	Dependent item	aws.elb.nlb.active_flow_countPreprocessing JSON Path: `$.[?(@.Label == "ActiveFlowCount")].Values.first().first()`⛔️Custom on fail: Discard value
Active Flow Count TCP	The total number of concurrent TCP flows (or connections) from clients to targets. This metric includes connections in the `SYN_SENT` and `ESTABLISHED` states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow.	Dependent item	aws.elb.nlb.active_flow_count_tcpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Active Flow Count TLS	The total number of concurrent TLS flows (or connections) from clients to targets. This metric includes connections in the `SYN_SENT` and `ESTABLISHED` states.	Dependent item	aws.elb.nlb.active_flow_count_tlsPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Active Flow Count UDP	The total number of concurrent UDP flows (or connections) from clients to targets.	Dependent item	aws.elb.nlb.active_flow_count_udpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Client TLS Negotiation Error Count	The total number of TLS handshakes that failed during negotiation between a client and a TLS listener.	Dependent item	aws.elb.nlb.client_tls_negotiation_error_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Consumed LCUs	The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacity_unitsPreprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs")].Values.first().first()`⛔️Custom on fail: Discard value
Consumed LCUs TCP	The number of load balancer capacity units (LCU) used by your load balancer for TCP. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacity_units_tcpPreprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs_TCP")].Values.first().first()`⛔️Custom on fail: Discard value
Consumed LCUs TLS	The number of load balancer capacity units (LCU) used by your load balancer for TLS. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacity_units_tlsPreprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs_TLS")].Values.first().first()`⛔️Custom on fail: Discard value
Consumed LCUs UDP	The number of load balancer capacity units (LCU) used by your load balancer for UDP. You pay for the number of LCUs that you use per hour. More information on Elastic Load Balancing pricing here: https://aws.amazon.com/elasticloadbalancing/pricing/	Dependent item	aws.elb.nlb.capacity_units_udpPreprocessing JSON Path: `$.[?(@.Label == "ConsumedLCUs_UDP")].Values.first().first()`⛔️Custom on fail: Discard value
New Flow Count	The total number of new flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.new_flow_countPreprocessing JSON Path: `$.[?(@.Label == "NewFlowCount")].Values.first().first()`⛔️Custom on fail: Discard value
New Flow Count TCP	The total number of new TCP flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.new_flow_count_tcpPreprocessing JSON Path: `$.[?(@.Label == "NewFlowCount_TCP")].Values.first().first()`⛔️Custom on fail: Discard value
New Flow Count TLS	The total number of new TLS flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.new_flow_count_tlsPreprocessing JSON Path: `$.[?(@.Label == "NewFlowCount_TLS")].Values.first().first()`⛔️Custom on fail: Discard value
New Flow Count UDP	The total number of new UDP flows (or connections) established from clients to targets in the specified time period.	Dependent item	aws.elb.nlb.new_flow_count_udpPreprocessing JSON Path: `$.[?(@.Label == "NewFlowCount_UDP")].Values.first().first()`⛔️Custom on fail: Discard value
Peak Packets per second	Highest average packet rate (packets processed per second), calculated every 10 seconds during the sampling window. This metric includes health check traffic.	Dependent item	aws.elb.nlb.peak_packets.ratePreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Port Allocation Error Count	The total number of ephemeral port allocation errors during a client IP translation operation. A non-zero value indicates dropped client connections. Note: Network Load Balancers support 55,000 simultaneous connections or about 55,000 connections per minute to each unique target (IP address and port) when performing client address translation. To fix port allocation errors, add more targets to the target group.	Dependent item	aws.elb.nlb.port_allocation_error_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Processed Bytes	The total number of bytes processed by the load balancer, including TCP/IP headers. This count includes traffic to and from targets, minus health check traffic.	Dependent item	aws.elb.nlb.processed_bytesPreprocessing JSON Path: `$.[?(@.Label == "ProcessedBytes")].Values.first().first()`⛔️Custom on fail: Discard value
Processed Bytes TCP	The total number of bytes processed by TCP listeners.	Dependent item	aws.elb.nlb.processed_bytes_tcpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Processed Bytes TLS	The total number of bytes processed by TLS listeners.	Dependent item	aws.elb.nlb.processed_bytes_tlsPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Processed Bytes UDP	The total number of bytes processed by UDP listeners.	Dependent item	aws.elb.nlb.processed_bytes_udpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Processed Packets	The total number of packets processed by the load balancer. This count includes traffic to and from targets, including health check traffic.	Dependent item	aws.elb.nlb.processed_packetsPreprocessing JSON Path: `$.[?(@.Label == "ProcessedPackets")].Values.first().first()`⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Inbound ICMP	The number of new ICMP messages rejected by the inbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sg_blocked_inbound_icmpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Inbound TCP	The number of new TCP flows rejected by the inbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sg_blocked_inbound_tcpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Inbound UDP	The number of new UDP flows rejected by the inbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sg_blocked_inbound_udpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Outbound ICMP	The number of new ICMP messages rejected by the outbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sg_blocked_outbound_icmpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Outbound TCP	The number of new TCP flows rejected by the outbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sg_blocked_outbound_tcpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Security Group Blocked Flow Count Outbound UDP	The number of new UDP flows rejected by the outbound rules of the load balancer security groups.	Dependent item	aws.elb.nlb.sg_blocked_outbound_udpPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Target TLS Negotiation Error Count	The total number of TLS handshakes that failed during negotiation between a TLS listener and a target.	Dependent item	aws.elb.nlb.target_tls_negotiation_error_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
TCP Client Reset Count	The total number of reset (RST) packets sent from a client to a target. These resets are generated by the client and forwarded by the load balancer.	Dependent item	aws.elb.nlb.tcp_client_reset_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
TCP ELB Reset Count	The total number of reset (RST) packets generated by the load balancer. For more information, see: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#elb-reset-count-metric	Dependent item	aws.elb.nlb.tcp_elb_reset_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
TCP Target Reset Count	The total number of reset (RST) packets sent from a target to a client. These resets are generated by the target and forwarded by the load balancer.	Dependent item	aws.elb.nlb.tcp_target_reset_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value
Unhealthy Routing Flow Count	The number of flows (or connections) that are routed using the routing failover action (fail open).	Dependent item	aws.elb.nlb.unhealthy_routing_flow_countPreprocessing JSON Path: `The text is too long. Please see the template.`⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB NLB: Failed to get metrics data	Failed to get CloudWatch metrics for Network Load Balancer.	`length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.metrics.check))>0`	Warning
AWS ELB NLB: Failed to get alarms data	Failed to get CloudWatch alarms for Network Load Balancer.	`length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarms.check))>0`	Warning
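
Both triggers rely on the "check" items, which pull `$.error` out of the master item's output; the trigger fires when that text is non-empty. The Python sketch below illustrates the pairing. It assumes that a failed JSON Path lookup leaves the check item with an empty string, which is what the `length(last(...))>0` expression implies, and the payloads shown are hypothetical.

```python
import json


def check_value(master_item_output: str) -> str:
    """Approximate JSON Path `$.error` with an empty string on failure."""
    try:
        payload = json.loads(master_item_output)
    except json.JSONDecodeError:
        return ""
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    # No `error` field: assumed to resolve to an empty value.
    return ""


def trigger_fires(last_check_value: str) -> bool:
    """Mirror of: length(last(/.../aws.elb.nlb.metrics.check)) > 0."""
    return len(last_check_value) > 0


# Hypothetical master item outputs.
ok_output = '[{"Label": "ActiveFlowCount", "Values": [12]}]'
failed_output = '{"error": "Request failed with status code 403"}'

print(trigger_fires(check_value(ok_output)))
print(trigger_fires(check_value(failed_output)))
```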

LLD rule Load Balancer alarm discovery

Name	Description	Type	Key and additional info
Load Balancer alarm discovery	Used for the discovery of load balancer alarms.	Dependent item	aws.elb.nlb.alarms.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Load Balancer alarm discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: Get metrics	Get metrics about the alarm state and its reason.	Dependent item	aws.elb.nlb.alarm.get_metrics["{#ALARM_NAME}"] Preprocessing JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()`⛔️Custom on fail: Discard value
[{#ALARM_NAME}]: State reason	An explanation for the alarm state reason in text format. Alarm description: `{#ALARM_DESCRIPTION}`	Dependent item	aws.elb.nlb.alarm.state_reason["{#ALARM_NAME}"] Preprocessing JSON Path: `$.StateReason`⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The value of the alarm state. Possible values: 0 - OK; 1 - INSUFFICIENT_DATA; 2 - ALARM. Alarm description: `{#ALARM_DESCRIPTION}`	Dependent item	aws.elb.nlb.alarm.state["{#ALARM_NAME}"] Preprocessing JSON Path: `$.StateValue`⛔️Custom on fail: Set value to: `3` JavaScript: `The text is too long. Please see the template.`

Trigger prototypes for Load Balancer alarm discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB NLB: [{#ALARM_NAME}] has 'Alarm' state	The alarm `{#ALARM_NAME}` is in the ALARM state. Reason: `{ITEM.LASTVALUE2}`	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS ELB NLB: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.alarm.state["{#ALARM_NAME}"])=1`	Info
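The State item prototype turns the CloudWatch `StateValue` string into a number. The template's own JavaScript step is not reproduced in this document, so the following Python sketch only illustrates the mapping listed in the item description. The function name `alarm_state_code` is hypothetical, and treating every missing or unrecognised state as `3` is an assumption based on the "Custom on fail: Set value to: 3" step.

```python
# Numeric codes as listed in the item description.
ALARM_STATES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
FALLBACK = 3  # value used when $.StateValue cannot be extracted


def alarm_state_code(alarm: dict) -> int:
    """Return the numeric state for one alarm object (hypothetical helper)."""
    state = alarm.get("StateValue")
    # Assumption: a missing or unrecognised state falls back to 3.
    return ALARM_STATES.get(state, FALLBACK)


print(alarm_state_code({"AlarmName": "demo", "StateValue": "ALARM"}))
```

With this mapping, the trigger prototypes above compare the item value against `2` (ALARM) and `1` (INSUFFICIENT_DATA).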

LLD rule Target groups discovery

Name	Description	Type	Key and additional info
Target groups discovery	Used for the discovery of `{$AWS.ELB.TARGET.GROUP.NAME}` target groups.	Dependent item	aws.elb.nlb.target_groups.discovery Preprocessing: Discard unchanged with heartbeat: `3h`

Item prototypes for Target groups discovery

Name	Description	Type	Key and additional info
[{#AWS.ELB.TARGET.GROUP.NAME}]: Get metrics	Get the metrics of the ELB target group `{#AWS.ELB.TARGET.GROUP.NAME}`. Full list of metrics related to AWS ELB here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-cloudwatch-metrics.html#user-authentication-metric-table	Script	aws.elb.nlb.target_groups.get_metrics["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing: Check for not supported value: `any error` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Healthy Host Count	The number of targets that are considered healthy.	Dependent item	aws.elb.nlb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing: JSON Path: `$.[?(@.Label == "HealthyHostCount")].Values.first().first()` ⛔️Custom on fail: Discard value
[{#AWS.ELB.TARGET.GROUP.NAME}]: Unhealthy Host Count	The number of targets that are considered unhealthy.	Dependent item	aws.elb.nlb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"] Preprocessing: JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

Trigger prototypes for Target groups discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS ELB NLB: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have become unhealthy	This trigger helps in identifying when your targets have become unhealthy.	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.target_groups.healthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]) = 0`	Average
AWS ELB NLB: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have unhealthy host	This trigger allows you to become aware when there are no more registered targets.	`last(/AWS ELB Network Load Balancer by HTTP/aws.elb.nlb.target_groups.unhealthy_host_count["{#AWS.ELB.TARGET.GROUP.NAME}"]) > {$AWS.ELB.UNHEALTHY.HOST.MAX}`	Warning	Depends on: AWS ELB NLB: [{#AWS.ELB.TARGET.GROUP.NAME}]: Target have become unhealthy
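The two target group trigger prototypes interact through the dependency: while the trigger on a healthy host count of `0` is in the problem state, the dependent trigger on the unhealthy host count does not raise its own problem. The Python sketch below is only an illustration of that evaluation order; the function name and the `unhealthy_max` parameter (standing in for `{$AWS.ELB.UNHEALTHY.HOST.MAX}`) are hypothetical, and Zabbix itself evaluates the expressions.

```python
from typing import Optional


def target_group_problem(healthy: int, unhealthy: int, unhealthy_max: int) -> Optional[str]:
    """Return the severity of the problem that would be raised, or None."""
    # Expression of the first prototype: healthy host count equals 0.
    if healthy == 0:
        return "Average"
    # The dependent prototype is only considered when the first one is not firing.
    if unhealthy > unhealthy_max:
        return "Warning"
    return None


print(target_group_problem(healthy=0, unhealthy=4, unhealthy_max=0))
```

In the example call both conditions hold, and only the Average problem is reported.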

AWS Lambda by HTTP

Overview

This template uses the GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS Lambda by HTTP

Configuration

Setup

The template gets AWS Lambda metrics and uses the script item to make HTTP requests to the CloudWatch API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account. For more information, visit the Lambda permissions page on the AWS website.

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect AWS Lambda metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.
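The `{Account}` and `{UserName}` placeholders in the trust relationship have to be replaced with real values before the policy is saved. As a minimal sketch, the hypothetical helper below renders the trust policy shown above for one IAM user; the account ID and user name in the example call are made-up placeholder values.

```python
import json


def assume_role_trust_policy(account_id: str, user_name: str) -> str:
    """Render the trust policy shown above for one IAM user (illustrative helper)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:user/{user_name}"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# 123456789012 and zabbix-monitor are placeholder values, not real identifiers.
print(assume_role_trust_policy("123456789012", "zabbix-monitor"))
```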

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.LAMBDA.ARN}.
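Which macros need a value depends on the chosen authorization method. The Python sketch below is one reading of the setup steps above, not part of the template: the helper name `missing_macros` is hypothetical, the per-method lists are taken from the sections above, and treating `{$AWS.AUTH_TYPE}`, `{$AWS.REGION}`, and `{$AWS.LAMBDA.ARN}` as required for every method is an assumption.

```python
# Defaults taken from the "Macros used" table.
DEFAULTS = {
    "{$AWS.AUTH_TYPE}": "access_key",
    "{$AWS.REGION}": "us-west-1",
    "{$AWS.STS.REGION}": "us-east-1",
}
# Assumption: these three are needed regardless of the authorization method.
COMMON = ["{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.LAMBDA.ARN}"]
PER_AUTH_TYPE = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    "role_base": [],
}


def missing_macros(user_macros: dict) -> list:
    """List the macros that are still empty for the selected method."""
    macros = {**DEFAULTS, **user_macros}
    auth_type = macros["{$AWS.AUTH_TYPE}"]
    if auth_type not in PER_AUTH_TYPE:
        raise ValueError(f"unknown authorization method: {auth_type}")
    required = COMMON + PER_AUTH_TYPE[auth_type]
    return [name for name in required if not macros.get(name)]


print(missing_macros({"{$AWS.AUTH_TYPE}": "assume_role"}))
```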

For more information about managing access keys, see the official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` through instance metadata or environment authorization method. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; add when using the `assume_role` authorization method.
{$AWS.REGION}	AWS Lambda function region code.	`us-west-1`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.LAMBDA.ARN}	The Amazon Resource Name (ARN) of the Lambda function.
{$AWS.LAMBDA.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES}	Filter of discoverable alarms by namespace.	`.*`
{$AWS.LAMBDA.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES}	Filter to exclude discovered alarms by namespace.	`CHANGE_IF_NEEDED`
{$AWS.LAMBDA.LLD.FILTER.ALARM_NAME.MATCHES}	Filter of discoverable alarms by name.	`.*`
{$AWS.LAMBDA.LLD.FILTER.ALARM_NAME.NOT_MATCHES}	Filter to exclude discovered alarms by name.	`CHANGE_IF_NEEDED`
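The four LLD filter macros work in pairs: an alarm is kept when its namespace and name match the `MATCHES` expressions and do not match the `NOT_MATCHES` expressions. The following Python sketch illustrates that logic with the default values from the table. It is an approximation only: Zabbix applies its own regular expression engine, and the function name and dictionary keys are hypothetical.

```python
import re

# Default values of the four filter macros, keyed by hypothetical short names.
DEFAULT_FILTERS = {
    "namespace_matches": ".*",
    "namespace_not_matches": "CHANGE_IF_NEEDED",
    "name_matches": ".*",
    "name_not_matches": "CHANGE_IF_NEEDED",
}


def alarm_is_discovered(name: str, namespace: str, filters: dict = DEFAULT_FILTERS) -> bool:
    """Return True when the alarm passes all four filters."""
    return (
        re.search(filters["namespace_matches"], namespace) is not None
        and re.search(filters["namespace_not_matches"], namespace) is None
        and re.search(filters["name_matches"], name) is not None
        and re.search(filters["name_not_matches"], name) is None
    )


print(alarm_is_discovered("lambda-errors", "AWS/Lambda"))
```

With the defaults, every alarm passes unless its name or namespace happens to contain the literal text `CHANGE_IF_NEEDED`.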

Items

Name	Description	Type	Key and additional info
Get metrics data	Get Lambda function metrics. Full metrics list related to the Lambda function: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html	Script	aws.lambda.get_metrics Preprocessing: Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get Lambda alarms data	`DescribeAlarms` API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html	Script	aws.lambda.get_alarms Preprocessing: Check for not supported value: `any error` ⛔️Custom on fail: Discard value
Get metrics check	Check that the Lambda function metrics data has been received correctly.	Dependent item	aws.lambda.metrics.check Preprocessing: JSON Path: `$.error` ⛔️Custom on fail: Set value to an empty string; Discard unchanged with heartbeat: `3h`
Get alarms check	Check that the alarm data has been received correctly.	Dependent item	aws.lambda.alarms.check Preprocessing: JSON Path: `$.error` ⛔️Custom on fail: Set value to an empty string; Discard unchanged with heartbeat: `3h`
Async events received sum	The number of events that Lambda successfully queues for processing. This metric provides insight into the number of events that a Lambda function receives.	Dependent item	aws.lambda.async_events_received.sum Preprocessing: JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Async event age average	The time between when Lambda successfully queues the event and when the function is invoked. The value of this metric increases when events are being retried due to invocation failures or throttling.	Dependent item	aws.lambda.async_event_age.avg Preprocessing: JSON Path: `$.[?(@.Label == "AsyncEventAge")].Values.first().first()` ⛔️Custom on fail: Discard value; Custom multiplier: `0.001`
Async events dropped sum	The number of events that are dropped without successfully executing the function. If you configure a dead-letter queue (DLQ) or an `OnFailure` destination, events are sent there before they're dropped.	Dependent item	aws.lambda.async_events_dropped.sum Preprocessing: JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Total concurrent executions	The number of function instances that are processing events. If this number reaches your concurrent executions quota for the Region or the reserved concurrency limit on the function, then Lambda will throttle additional invocation requests.	Dependent item	aws.lambda.concurrent_executions.max Preprocessing: JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Unreserved concurrent executions maximum	For a Region, the number of events that functions without reserved concurrency are processing.	Dependent item	aws.lambda.unreserved_concurrent_executions.max Preprocessing: JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
Invocations sum	The number of times that your function code is invoked, including successful invocations and invocations that result in a function error. Invocations aren't recorded if the invocation request is throttled or otherwise results in an invocation error. The value of `Invocations` equals the number of requests billed.	Dependent item	aws.lambda.invocations.sum Preprocessing: JSON Path: `$.[?(@.Label == "Invocations")].Values.first().first()` ⛔️Custom on fail: Discard value
Errors sum	The number of invocations that result in a function error. Function errors include exceptions that your code throws and exceptions that the Lambda runtime throws. The runtime returns errors for issues such as timeouts and configuration errors.	Dependent item	aws.lambda.errors.sum Preprocessing: JSON Path: `$.[?(@.Label == "Errors")].Values.first().first()` ⛔️Custom on fail: Discard value
Dead letter errors sum	For asynchronous invocation, the number of times that Lambda attempts to send an event to a dead-letter queue (DLQ) but fails. Dead-letter errors can occur due to misconfigured resources or size limits.	Dependent item	aws.lambda.dead_letter_errors.sum Preprocessing: JSON Path: `$.[?(@.Label == "DeadLetterErrors")].Values.first().first()` ⛔️Custom on fail: Discard value
Throttles sum	The number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a `TooManyRequestsException` error.	Dependent item	aws.lambda.throttles.sum Preprocessing: JSON Path: `$.[?(@.Label == "Throttles")].Values.first().first()` ⛔️Custom on fail: Discard value
Duration average	The amount of time that your function code spends processing an event. The billed duration for an invocation is the value of `Duration` rounded up to the nearest millisecond. Duration does not include cold start time.	Dependent item	aws.lambda.duration.avg Preprocessing: JSON Path: `$.[?(@.Label == "Duration")].Values.first().first()` ⛔️Custom on fail: Discard value; Custom multiplier: `0.001`
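Most dependent items in this table share one pattern: select the metric with a given `Label`, take the first element of its `Values` array, and optionally apply a multiplier. The Python sketch below mirrors the JSON Path `$.[?(@.Label == "...")].Values.first().first()`. It assumes the script item returns a list of objects with `Label` and `Values` keys, which is inferred from the JSON Path expressions rather than stated in this document; the function name is hypothetical.

```python
from typing import Optional


def first_metric_value(metrics: list, label: str, multiplier: float = 1.0) -> Optional[float]:
    """Return the first value of the first metric with the given label."""
    matches = [m.get("Values", []) for m in metrics if m.get("Label") == label]
    # No matching metric or an empty Values array: the JSON Path step fails,
    # which the items handle with "Custom on fail: Discard value".
    if not matches or not matches[0]:
        return None
    return matches[0][0] * multiplier


sample = [
    {"Label": "Invocations", "Values": [42.0, 40.0]},
    {"Label": "Duration", "Values": [250.0]},
]
print(first_metric_value(sample, "Invocations"))
# The 0.001 multiplier converts the Duration value from milliseconds to seconds.
print(first_metric_value(sample, "Duration", multiplier=0.001))
```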

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS Lambda: Failed to get metrics data	Failed to get CloudWatch metrics for the Lambda function.	`length(last(/AWS Lambda by HTTP/aws.lambda.metrics.check))>0`	Warning
AWS Lambda: Failed to get alarms data	Failed to get CloudWatch alarms for the Lambda function.	`length(last(/AWS Lambda by HTTP/aws.lambda.alarms.check))>0`	Warning

LLD rule Lambda alarm discovery

Name	Description	Type	Key and additional info
Lambda alarm discovery	Used for the discovery of Lambda function alarms.	Dependent item	aws.lambda.discovery Preprocessing: JavaScript: `The text is too long. Please see the template.`; Discard unchanged with heartbeat: `3h`

Item prototypes for Lambda alarm discovery

Name	Description	Type	Key and additional info
[{#ALARM_NAME}]: Get metrics	Get metrics about the alarm state and its reason.	Dependent item	aws.lambda.alarm.get_metrics["{#ALARM_NAME}"] Preprocessing: JSON Path: `$.[?(@.AlarmName == "{#ALARM_NAME}")].first()` ⛔️Custom on fail: Discard value
[{#ALARM_NAME}]: State reason	An explanation for the alarm state reason in text format. Alarm description: `{#ALARM_DESCRIPTION}`	Dependent item	aws.lambda.alarm.state_reason["{#ALARM_NAME}"] Preprocessing: JSON Path: `$.StateReason` ⛔️Custom on fail: Discard value; Discard unchanged with heartbeat: `3h`
[{#ALARM_NAME}]: State	The value of the alarm state. Possible values: 0 - OK; 1 - INSUFFICIENT_DATA; 2 - ALARM. Alarm description: `{#ALARM_DESCRIPTION}`	Dependent item	aws.lambda.alarm.state["{#ALARM_NAME}"] Preprocessing: JSON Path: `$.StateValue` ⛔️Custom on fail: Set value to: `3`; JavaScript: `The text is too long. Please see the template.`

Trigger prototypes for Lambda alarm discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS Lambda: [{#ALARM_NAME}] has 'Alarm' state	The alarm `{#ALARM_NAME}` is in the ALARM state. Reason: `{ITEM.LASTVALUE2}`	`last(/AWS Lambda by HTTP/aws.lambda.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS Lambda by HTTP/aws.lambda.alarm.state_reason["{#ALARM_NAME}"]))>0`	Average
AWS Lambda: [{#ALARM_NAME}] has 'Insufficient data' state	Either the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.	`last(/AWS Lambda by HTTP/aws.lambda.alarm.state["{#ALARM_NAME}"])=1`	Info
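Each discovered alarm gets its own chain of dependent items: the alarm object is selected by name from the alarms data, and the 'Alarm' trigger prototype then combines the numeric state with the state reason. The Python sketch below illustrates both steps under the assumption that the alarms data is a list of objects with `AlarmName`, `StateValue`, and `StateReason` keys, as the JSON Path expressions suggest. Both function names are hypothetical.

```python
from typing import Optional


def select_alarm(alarms: list, name: str) -> Optional[dict]:
    """Mimic $.[?(@.AlarmName == name)].first(): the first alarm with that name."""
    for alarm in alarms:
        if alarm.get("AlarmName") == name:
            return alarm
    return None  # corresponds to "Custom on fail: Discard value"


def alarm_trigger_fires(state: int, state_reason: str) -> bool:
    """Condition of the 'Alarm' state trigger prototype."""
    return state == 2 and len(state_reason) > 0


alarms = [
    {"AlarmName": "errors-high", "StateValue": "ALARM", "StateReason": "Threshold crossed"},
    {"AlarmName": "throttles", "StateValue": "OK", "StateReason": ""},
]
selected = select_alarm(alarms, "errors-high")
print(alarm_trigger_fires(2, selected["StateReason"]))
```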

AWS Backup Vault by HTTP

Overview

This template uses AWS Backup API calls to list and retrieve metrics. For more information, please refer to the AWS Backup API page.

Additional information about metrics and API methods used in the template:

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS Backup Vault service

Configuration

Setup

The template gets AWS Backup vault metrics and uses the script item to make HTTP requests to the AWS Backup API.

Before using the template, you need to create an IAM policy with the necessary permissions for the Zabbix role in your AWS account.

Required permissions

Add the following required permissions to your Zabbix IAM policy in order to collect data on AWS Backup vaults and jobs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "backup:ListBackupVaults",
                "backup:ListBackupJobs",
                "backup:ListCopyJobs",
                "backup:ListRestoreJobs"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and a secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and a secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "backup:ListBackupVaults",
                "backup:ListBackupJobs",
                "backup:ListCopyJobs",
                "backup:ListRestoreJobs"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "backup:ListBackupVaults",
                "backup:ListBackupJobs",
                "backup:ListCopyJobs",
                "backup:ListRestoreJobs"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "backup.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macros: {$AWS.AUTH_TYPE}, {$AWS.REGION}, and {$AWS.BACKUP_VAULT.NAME}.

For more information about managing access keys, see the official AWS documentation.

See the section below for a list of macros used for LLD filters.

Macros used

Name	Description	Default
{$AWS.DATA.TIMEOUT}	API response timeout.	`60s`
{$AWS.PROXY}	Sets the HTTP proxy value. If this macro is empty, no proxy is used.
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.REGION}	AWS backup vault region code.	`us-west-1`
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` through instance metadata or environment authorization method. Possible values: `false`, `true`.	`false`
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; add when using the `assume_role` authorization method.
{$AWS.BACKUP_VAULT.NAME}	AWS backup vault name.
{$AWS.BACKUP_JOB.STATE.MATCHES}	Filter of discoverable jobs by state.	`.*`
{$AWS.BACKUP_JOB.STATE.NOT_MATCHES}	Filter to exclude discovered jobs by state.	`CHANGE_IF_NEEDED`
{$AWS.BACKUP_JOB.RESOURCE_TYPE.MATCHES}	Filter of discoverable jobs by resource type.	`.*`
{$AWS.BACKUP_JOB.RESOURCE_TYPE.NOT_MATCHES}	Filter to exclude discovered jobs by resource type.	`CHANGE_IF_NEEDED`
{$AWS.BACKUP_JOB.RESOURCE_NAME.MATCHES}	Filter of discoverable jobs by resource name.	`.*`
{$AWS.BACKUP_JOB.RESOURCE_NAME.NOT_MATCHES}	Filter to exclude discovered jobs by resource name.	`CHANGE_IF_NEEDED`
{$AWS.BACKUP_JOB.PERIOD}	The number of days over which to retrieve backup jobs.	`7`
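`{$AWS.BACKUP_JOB.PERIOD}` limits job collection to the most recent days. As a rough illustration of such a window, the Python sketch below keeps only the jobs created within the last `period_days` days. The `created` key and the function name are hypothetical, and the template's script item may apply the period differently, for example directly in the API request.

```python
from datetime import datetime, timedelta, timezone


def jobs_in_period(jobs: list, period_days: int, now: datetime) -> list:
    """Keep the jobs created within the last `period_days` days."""
    cutoff = now - timedelta(days=period_days)
    return [job for job in jobs if job["created"] >= cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
jobs = [
    {"id": "a", "created": datetime(2024, 1, 9, tzinfo=timezone.utc)},
    {"id": "b", "created": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]
# 7 is the default value of {$AWS.BACKUP_JOB.PERIOD}.
print([job["id"] for job in jobs_in_period(jobs, 7, now)])
```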

Items

Name	Description	Type	Key and additional info
Get jobs	Get a list of jobs in the vault.	Script	aws.backup_vault.job.get
Get data	Retrieve AWS backup vault metrics. More information here: https://docs.aws.amazon.com/aws-backup/latest/devguide/API_BackupVaultListMember.html	Script	aws.backup_vault.data.get
Recovery points	The total number of recovery points in the backup vault.	Dependent item	aws.backup_vault.recovery_points Preprocessing: JSON Path: `$.NumberOfRecoveryPoints`; Does not match regular expression: `null` ⛔️Custom on fail: Discard value
Age	The age of the vault.	Dependent item	aws.backup_vault.age Preprocessing: JSON Path: `$.CreationDate`; JavaScript: `return Date.now() / 1000 - value`
Retention period, min	The minimum retention period that the vault retains its recovery points.	Dependent item	aws.backup_vault.retention.min Preprocessing: JSON Path: `$.MinRetentionDays`; Does not match regular expression: `null` ⛔️Custom on fail: Set error to: `The vault does not have the minimum retention period set.`
Retention period, max	The maximum retention period that the vault retains its recovery points.	Dependent item	aws.backup_vault.retention.max Preprocessing: JSON Path: `$.MaxRetentionDays`; Does not match regular expression: `null` ⛔️Custom on fail: Set error to: `The vault does not have the maximum retention period set.`
Lock status	Indicates whether AWS Backup Vault Lock is applied to the selected backup vault. When the vault is locked, delete and update operations on recovery points in that vault are prevented.	Dependent item	aws.backup_vault.lock.status Preprocessing: JSON Path: `$.Locked`; Replace: `false -> 0`; Replace: `true -> 1`
Lock time remain	The remaining time before AWS Backup Vault Lock configuration becomes immutable, meaning it cannot be changed or deleted.	Dependent item	aws.backup_vault.lock.time_left Preprocessing: JSON Path: `$.LockDate`; Does not match regular expression: `null` ⛔️Custom on fail: Set error to: `Either the vault is not locked, or the lock date is not specified.`; JavaScript: `The text is too long. Please see the template.`
Lock date	The date and time when AWS Backup Vault Lock configuration becomes immutable, meaning it cannot be changed or deleted.	Dependent item	aws.backup_vault.lock.date Preprocessing: JSON Path: `$.LockDate`; Does not match regular expression: `null` ⛔️Custom on fail: Set error to: `Either the vault is not locked, or the lock date is not specified.`
State	The current state of the backup vault. Possible values are: Unknown, Creating, Available, Failed.	Dependent item	aws.backup_vault.state Preprocessing: JSON Path: `$.VaultState`; JavaScript: `The text is too long. Please see the template.`
Jobs: Size, avg	The average size, in bytes, of a backup (recovery point). This value can render differently depending on the resource type as AWS Backup pulls in data information from other AWS services. For example, the value returned may show a value of `0`, which may differ from the anticipated value.	Dependent item	aws.backup_vault.job.size.avg Preprocessing: JSON Path: `$[?(@.job_size > 0)].job_size.avg()` ⛔️Custom on fail: Discard value; Discard unchanged with heartbeat: `1h`
Jobs: Size, max	The maximum size, in bytes, of a backup (recovery point). This value can render differently depending on the resource type as AWS Backup pulls in data information from other AWS services. For example, the value returned may show a value of `0`, which may differ from the anticipated value.	Dependent item	aws.backup_vault.job.size.max Preprocessing: JSON Path: `$[?(@.job_size > 0)].job_size.max()` ⛔️Custom on fail: Discard value; Discard unchanged with heartbeat: `1h`
Jobs: Size, min	The minimum size, in bytes, of a backup (recovery point). This value can render differently depending on the resource type as AWS Backup pulls in data information from other AWS services. For example, the value returned may show a value of `0`, which may differ from the anticipated value.	Dependent item	aws.backup_vault.job.size.min Preprocessing: JSON Path: `$[?(@.job_size > 0)].job_size.min()` ⛔️Custom on fail: Discard value; Discard unchanged with heartbeat: `1h`
Jobs: Backup	The number of backup jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.backup.count Preprocessing: JSON Path: `$.[?(@.job_type == "backup-job")].length()`; Discard unchanged with heartbeat: `1h`
Jobs: Restore	The number of restore jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.restore.count Preprocessing: JSON Path: `$.[?(@.job_type == "restore-job")].length()`; Discard unchanged with heartbeat: `1h`
Jobs: Copy	The number of copy jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.copy.count Preprocessing: JSON Path: `$.[?(@.job_type == "copy-job")].length()`; Discard unchanged with heartbeat: `1h`
Jobs: Total	The total number of jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.total.count Preprocessing: JSON Path: `$.length()`; Discard unchanged with heartbeat: `1h`
Jobs: Failed backup	The number of failed backup jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.backup.failed.count Preprocessing: JSON Path: `The text is too long. Please see the template.`; Discard unchanged with heartbeat: `1h`
Jobs: Failed restore	The number of failed restore jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.restore.failed.count Preprocessing: JSON Path: `The text is too long. Please see the template.`; Discard unchanged with heartbeat: `1h`
Jobs: Failed copy	The number of failed copy jobs in the vault over the last `{$AWS.BACKUP_JOB.PERIOD}` day(s).	Dependent item	aws.backup_vault.job.copy.failed.count Preprocessing: JSON Path: `The text is too long. Please see the template.`; Discard unchanged with heartbeat: `1h`
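The job items are aggregates over the list returned by the `Get jobs` item. The Python sketch below reproduces two of them outside Zabbix: the size statistics, which ignore jobs with a size of `0` just like the filter `$[?(@.job_size > 0)]`, and the per-type counts. The `job_size` and `job_type` keys come from the JSON Path expressions in the table; the function names and sample data are hypothetical.

```python
from typing import Optional


def job_size_stats(jobs: list) -> Optional[dict]:
    """Average, maximum, and minimum size over jobs with job_size > 0."""
    sizes = [job["job_size"] for job in jobs if (job.get("job_size") or 0) > 0]
    if not sizes:
        return None  # the JSON Path step fails and the value is discarded
    return {"avg": sum(sizes) / len(sizes), "max": max(sizes), "min": min(sizes)}


def job_counts(jobs: list) -> dict:
    """Count jobs per type, plus the total number of jobs."""
    counts = {"backup-job": 0, "restore-job": 0, "copy-job": 0}
    for job in jobs:
        job_type = job.get("job_type")
        if job_type in counts:
            counts[job_type] += 1
    counts["total"] = len(jobs)
    return counts


sample_jobs = [
    {"job_type": "backup-job", "job_size": 100},
    {"job_type": "backup-job", "job_size": 0},
    {"job_type": "copy-job", "job_size": 300},
]
print(job_size_stats(sample_jobs))
print(job_counts(sample_jobs))
```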

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
AWS Backup vault: Restore job has appeared	New restore job has appeared.	`change(/AWS Backup Vault by HTTP/aws.backup_vault.job.restore.count)>0`	Average	Manual close: Yes
AWS Backup vault: Copy job has appeared	New copy job has appeared.	`change(/AWS Backup Vault by HTTP/aws.backup_vault.job.copy.count)>0`	Warning	Manual close: Yes

LLD rule AWS Backup job discovery

Name	Description	Type	Key and additional info
AWS Backup job discovery	AWS Backup job discovery.	Dependent item	aws.backup_vault.job.discovery Preprocessing: Discard unchanged with heartbeat: `1h`

Item prototypes for AWS Backup job discovery

Name	Description	Type	Key and additional info
Job state [{#AWS.BACKUP_JOB.RESOURCE_NAME}][{#AWS.BACKUP_JOB.ID}]	The state of the job. Possible values are: Unknown, Created, Pending, Running, Aborting, Aborted, Completed, Failed, Expired, Partial.	Dependent item	aws.backup_vault.job.state["{#AWS.BACKUP_JOB.ID}"] Preprocessing: JSON Path: `$.[?(@.job_id == "{#AWS.BACKUP_JOB.ID}")].job_state.first()` ⛔️Custom on fail: Discard value; JavaScript: `The text is too long. Please see the template.`; Discard unchanged with heartbeat: `1h`

Trigger prototypes for AWS Backup job discovery

Name	Description	Expression	Severity	Dependencies and additional info
AWS Backup vault: Job failed [{#AWS.BACKUP_JOB.ID}]	Job has failed.	`last(/AWS Backup Vault by HTTP/aws.backup_vault.job.state["{#AWS.BACKUP_JOB.ID}"])=7`	High	Manual close: Yes
AWS Backup vault: Job has been aborted [{#AWS.BACKUP_JOB.ID}]	Job has been aborted.	`last(/AWS Backup Vault by HTTP/aws.backup_vault.job.state["{#AWS.BACKUP_JOB.ID}"])=5`	Average	Manual close: Yes
AWS Backup vault: Job has expired [{#AWS.BACKUP_JOB.ID}]	Job expired.	`last(/AWS Backup Vault by HTTP/aws.backup_vault.job.state["{#AWS.BACKUP_JOB.ID}"])=8`	Warning	Manual close: Yes
AWS Backup vault: Job is in an unknown state [{#AWS.BACKUP_JOB.ID}]	Job is in unknown state.	`last(/AWS Backup Vault by HTTP/aws.backup_vault.job.state["{#AWS.BACKUP_JOB.ID}"])=0`	Warning	Manual close: Yes
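The trigger prototypes compare the job state against numeric codes. The template's JavaScript step that produces these codes is not reproduced in this document, so the Python sketch below assumes the codes follow the order of the possible values listed for the job state item. That assumption is consistent with the trigger expressions (Unknown = 0, Aborted = 5, Failed = 7, Expired = 8). The case-insensitive comparison and the fallback to `0` for unrecognised names are further assumptions, and the function name is hypothetical.

```python
# Order taken from the list of possible values of the job state item.
JOB_STATES = [
    "Unknown", "Created", "Pending", "Running", "Aborting",
    "Aborted", "Completed", "Failed", "Expired", "Partial",
]
# Severities of the trigger prototypes, keyed by the state code they test.
TRIGGER_SEVERITY = {7: "High", 5: "Average", 8: "Warning", 0: "Warning"}


def job_state_code(state_name: str) -> int:
    """Map a job state name to its assumed numeric code."""
    normalized = state_name.strip().lower()
    for code, name in enumerate(JOB_STATES):
        if name.lower() == normalized:
            return code
    return 0  # assumption: unrecognised names are treated as Unknown


code = job_state_code("FAILED")
print(code, TRIGGER_SEVERITY.get(code))
```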

AWS Cost Explorer by HTTP

Overview

This template monitors AWS Cost Explorer via HTTP and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Note: This template uses the Cost Explorer API calls to list and retrieve metrics.

For more information, please refer to the Cost Explorer pricing page.

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

AWS by HTTP

Configuration

Setup

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

IAM policies for AWS Cost Management

Required Permissions

Add the following required permissions to your Zabbix IAM policy in order to collect metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ce:GetDimensionValues",
                "ce:GetCostAndUsage"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Access Key Authorization

If you are using access key authorization, you need to generate an access key and secret key for an IAM user with the necessary permissions:

Create an IAM user with programmatic access.
Attach the required policy to the IAM user.
Generate an access key and secret key.
Use the generated credentials in the macros {$AWS.ACCESS.KEY.ID} and {$AWS.SECRET.ACCESS.KEY}.

Assume Role Authorization

To use assume role authorization, add the appropriate permissions to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ce:GetDimensionValues",
                "ce:GetCostAndUsage"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Assume Role Authorization

Next, add a principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Set the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.SECRET.ACCESS.KEY}, {$AWS.STS.REGION}, {$AWS.ASSUME.ROLE.ARN}.

Role-Based Authorization

If you are using role-based authorization, add the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ce:GetDimensionValues",
                "ce:GetCostAndUsage",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Trust Relationships for Role-Based Authorization

Next, add a principal to the trust relationships of the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

Note: Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

Set the macro {$AWS.AUTH_TYPE}. Possible values: access_key, assume_role, role_base.
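
The three authorization methods need different macros. The sketch below checks a set of host macros against the requirements stated in this section. The mapping for `role_base` is an assumption (the text lists no credential macros for it), the sketch ignores {$AWS.ASSUME.ROLE.AUTH.METADATA}, and the helper name `missing_macros` is hypothetical.

```python
# Macros each authorization method needs, as listed in this section.
REQUIRED_MACROS = {
    "access_key": ["{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}"],
    "assume_role": [
        "{$AWS.ACCESS.KEY.ID}",
        "{$AWS.SECRET.ACCESS.KEY}",
        "{$AWS.STS.REGION}",
        "{$AWS.ASSUME.ROLE.ARN}",
    ],
    # Assumption: role-based authorization relies on the instance role,
    # so no credential macros are required.
    "role_base": [],
}


def missing_macros(macros: dict) -> list:
    """Return the required macros that are absent or empty for the chosen method."""
    auth_type = macros.get("{$AWS.AUTH_TYPE}", "access_key")  # template default
    if auth_type not in REQUIRED_MACROS:
        raise ValueError("unsupported authorization method: " + repr(auth_type))
    return [name for name in REQUIRED_MACROS[auth_type] if not macros.get(name)]


if __name__ == "__main__":
    host_macros = {
        "{$AWS.AUTH_TYPE}": "assume_role",
        "{$AWS.ACCESS.KEY.ID}": "AKIAEXAMPLE",
        "{$AWS.STS.REGION}": "us-east-1",
    }
    print(missing_macros(host_macros))
```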

For more information about managing access keys, see the official documentation.

Also, see the Macros section for a list of macros used in LLD filters.

Additional information about metrics and used API methods:

Description of AWS Cost Explorer API actions

Macros used

Name	Description	Default
{$AWS.AUTH_TYPE}	Authorization method. Possible values: `access_key`, `assume_role`, `role_base`.	`access_key`
{$AWS.ASSUME.ROLE.AUTH.METADATA}	Add when using the `assume_role` authorization method through instance metadata or the environment. Possible values: `false`, `true`.	`false`
{$AWS.ACCESS.KEY.ID}	Access key ID.
{$AWS.SECRET.ACCESS.KEY}	Secret access key.
{$AWS.ASSUME.ROLE.ARN}	ARN of the role to assume; add when using the `assume_role` authorization method.
{$AWS.PROXY}	Sets HTTP proxy value. If this macro is empty, then no proxy is used.
{$AWS.STS.REGION}	Region used in assume role request.	`us-east-1`
{$AWS.BILLING.REGION}	Amazon Billing region code.	`us-east-1`
{$AWS.BILLING.MONTH}	Number of months of historical data to get from the AWS Cost Explorer API; no more than 12.	`11`
{$AWS.BILLING.LLD.FILTER.SERVICE.MATCHES}	Filter of discoverable billing services by name.	`.*`
{$AWS.BILLING.LLD.FILTER.SERVICE.NOT_MATCHES}	Filter to exclude discovered billing services by name.	`CHANGE_IF_NEEDED`
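
The two LLD filter macros work as a pair: a billing service is discovered only if its name matches the first expression and does not match the second. The sketch below approximates that behaviour with Python's `re` module. Zabbix evaluates these filters with its own regular expression engine, so treat this as an illustration of the logic rather than an exact reproduction; the function name `is_discovered` and the service names are examples only.

```python
import re


def is_discovered(service_name: str,
                  matches: str = r".*",
                  not_matches: str = r"CHANGE_IF_NEEDED") -> bool:
    """Apply the MATCHES / NOT_MATCHES pair to one billing service name."""
    included = re.search(matches, service_name) is not None
    excluded = re.search(not_matches, service_name) is not None
    return included and not excluded


if __name__ == "__main__":
    # With the default macro values every service is discovered.
    print(is_discovered("Amazon Simple Storage Service"))
    # Excluding a service by name through the NOT_MATCHES filter.
    print(is_discovered("AWS Lambda", not_matches=r"Lambda"))
```

The default `CHANGE_IF_NEEDED` value is simply a string that no service name is expected to contain, so nothing is excluded until it is changed.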

Items

Name	Description	Type	Key and additional info
Get monthly costs	Get raw data on the monthly costs by service.	Script	aws.get.monthly.costs; Preprocessing: Check for not supported value: any error; ⛔️ Custom on fail: Discard value
Get daily costs	Get raw data on the daily costs by service.	Script	aws.get.daily.costs; Preprocessing: Check for not supported value: any error; ⛔️ Custom on fail: Discard value

LLD rule AWS daily costs by services discovery

Name	Description	Type	Key and additional info
AWS daily costs by services discovery	Discovery of daily blended costs by services.	Dependent item	aws.daily.services.costs.discovery; Preprocessing: JSON Path: $..Groups.first()
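
The preprocessing step `$..Groups.first()` searches the whole document for `Groups` members and keeps the first one found. The sketch below imitates that on a made-up response shaped like a Cost Explorer `GetCostAndUsage` result. It assumes the raw item returns data of this shape; the sample values and the helper names `find_all` and `first_groups` are hypothetical.

```python
def find_all(node, key):
    """Collect every value stored under `key`, searching nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


def first_groups(document):
    """Rough equivalent of the JSON Path $..Groups.first()."""
    matches = find_all(document, "Groups")
    return matches[0] if matches else None


# Made-up data for illustration only.
SAMPLE_RESPONSE = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2024-01-01", "End": "2024-01-02"},
            "Groups": [
                {"Keys": ["AWS Lambda"],
                 "Metrics": {"BlendedCost": {"Amount": "0.12", "Unit": "USD"}}},
                {"Keys": ["Amazon Simple Storage Service"],
                 "Metrics": {"BlendedCost": {"Amount": "1.50", "Unit": "USD"}}},
            ],
        }
    ]
}

if __name__ == "__main__":
    for group in first_groups(SAMPLE_RESPONSE):
        print(group["Keys"][0], group["Metrics"]["BlendedCost"]["Amount"])
```

Each element of the resulting array describes one service, which is what the discovery rule needs in order to create one set of item prototypes per service.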

Item prototypes for AWS daily costs by services discovery

Name	Description	Type	Key and additional info
Service [{#AWS.BILLING.SERVICE.NAME}]: Blended daily cost	The daily blended cost of the {#AWS.BILLING.SERVICE.NAME} service for the previous day.	Dependent item	aws.daily.service.cost["{#AWS.BILLING.SERVICE.NAME}"]; Preprocessing: JSON Path: The text is too long. Please see the template. ⛔️ Custom on fail: Discard value

LLD rule AWS monthly costs by services discovery

Name	Description	Type	Key and additional info
AWS monthly costs by services discovery	Discovery of monthly costs by services.	Dependent item	aws.cost.service.monthly.discovery; Preprocessing: JSON Path: $.monthly_service_costs

Item prototypes for AWS monthly costs by services discovery

Name	Description	Type	Key and additional info
[{#AWS.BILLING.SERVICE.NAME}]: Month [{#AWS.BILLING.MONTH}] Blended cost	The monthly cost by service {#AWS.BILLING.SERVICE.NAME}.	Dependent item	aws.monthly.service.cost["{#AWS.BILLING.SERVICE.NAME}", "{#AWS.BILLING.MONTH}"]; Preprocessing: JSON Path: The text is too long. Please see the template. ⛔️ Custom on fail: Discard value; JSON Path: The text is too long. Please see the template. ⛔️ Custom on fail: Discard value

LLD rule AWS monthly costs discovery

Name	Description	Type	Key and additional info
AWS monthly costs discovery	Discovery of monthly costs.	Dependent item	aws.monthly.cost.discovery; Preprocessing: JSON Path: $.monthly_costs

Item prototypes for AWS monthly costs discovery

Name	Description	Type	Key and additional info
[{#AWS.BILLING.MONTH}]: Blended cost per month	The blended cost by month {#AWS.BILLING.MONTH}.	Dependent item	aws.monthly.cost["{#AWS.BILLING.MONTH}"]; Preprocessing: JSON Path: The text is too long. Please see the template. ⛔️ Custom on fail: Discard value; Discard unchanged with heartbeat: 3h
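
The "Discard unchanged with heartbeat: 3h" step drops a value that equals the previously stored one unless the heartbeat period has elapsed since that value was stored. The class below is a minimal sketch of that rule as it is generally described for Zabbix throttling, not the server's actual implementation; the class name and the use of plain seconds for timestamps are assumptions.

```python
class DiscardUnchangedWithHeartbeat:
    """Store a value if it changed, or if the heartbeat period has elapsed."""

    def __init__(self, heartbeat_seconds: int):
        self.heartbeat = heartbeat_seconds
        self.last_value = None
        self.last_stored_at = None

    def accept(self, value, now: int) -> bool:
        """Return True if the value should be stored, False if it is discarded."""
        unchanged = self.last_stored_at is not None and value == self.last_value
        if unchanged and (now - self.last_stored_at) < self.heartbeat:
            return False
        # Changed value, first value, or heartbeat elapsed: store it.
        self.last_value = value
        self.last_stored_at = now
        return True


if __name__ == "__main__":
    step = DiscardUnchangedWithHeartbeat(3 * 3600)  # 3h, as in the item prototype
    for timestamp, cost in [(0, "10.50"), (3600, "10.50"), (7200, "11.00")]:
        print(timestamp, cost, step.accept(cost, timestamp))
```

For monthly costs this matters because the figure for a closed month rarely changes, so most polls would otherwise store identical values.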

Original article by 奋斗. If you repost it, please credit the source: https://blog.ytso.com/tech/aiops/319434.html
