Host Health Monitor

Finbar Ryan edited this page Nov 15, 2023 · 17 revisions

The Host Health Monitor feature of the Functions Runtime monitors several performance counters whose limits are imposed by the VM sandbox. Its goal is to temporarily stop the host from taking on more work when any counter is about to exceed its threshold. This lets the host avoid hitting hard sandbox limits, which could cause a hard shutdown, and lets it gracefully complete in-progress work while waiting for the counters to return to normal levels. The performance counters currently monitored are:

  • Connections: Number of outbound connections (limit is 600 active, 1200 total). For information on handling connection limits, see Managing Connections.
  • Threads: Number of threads (limit is 512).
  • Processes: Number of child processes (limit is 32).
  • NamedPipes: Number of named pipes (limit is 128).
  • Sections: Related to file create operations. The underlying resource is Named Shared Memory sections created by CreateFileMapping calls (limit is 256).

Note that the limits above are the hard limits enforced by the sandbox. The actual thresholds used by the monitor are a fraction of these maximums (default is 0.80). When one or more counters exceed their thresholds, the host is stopped until the counter values return to normal. The Web App continues to run, but internally the host has been stopped and no new functions are run. If the Function App is scaled out to multiple instances, the other instances continue to run and pick up the workload. Once the counter values return to normal, the host automatically starts processing work again. If the counter values do not recover and the health check keeps failing (see the healthCheckWindow and healthCheckThreshold settings below), the App Domain is recycled in an attempt to recover.
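As a rough illustration of the arithmetic, each counter can be compared against its hard limit multiplied by the configured fraction. The sketch below is not the host's implementation: the function name, the use of the 600 active-connection limit for Connections, and the strict "greater than" comparison are all assumptions made for the example.

```python
# Hard sandbox limits from the list above. For Connections, the "active"
# limit (600) is used; that choice is an assumption of this sketch.
SANDBOX_LIMITS = {
    "Connections": 600,
    "Threads": 512,
    "Processes": 32,
    "NamedPipes": 128,
    "Sections": 256,
}


def exceeded_counters(current, counter_threshold=0.80, limits=SANDBOX_LIMITS):
    """Return the names of counters whose usage ratio is above the threshold.

    `current` maps counter names to their current values; counters missing
    from it are treated as zero. Whether the real host compares with > or >=
    is not stated on this page, so > is assumed here.
    """
    return [
        name
        for name, limit in limits.items()
        if current.get(name, 0) / limit > counter_threshold
    ]
```

Under these assumptions and the default of 0.80, 500 active connections (500/600 ≈ 0.83) would be flagged, while 450 (0.75) would not.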

If your Function App is hitting these thresholds, you'll see errors like "Host thresholds exceeded: [Connections]" in the logs, where the brackets list the counters that were exceeded. If this happens often, examine the offending function(s) to ensure that they use resources appropriately and are throttled correctly. For example, is your function code opening a large or unbounded number of outgoing connections?
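One common way to keep outgoing work bounded is to cap the number of calls in flight at any moment. The following is a minimal sketch of that idea using a semaphore; run_bounded and fake_request are hypothetical names, and fake_request merely stands in for a real outbound call. The right cap depends on your workload and is not prescribed by the runtime.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency=10):
    """Run `worker(item)` for every item with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as `items`.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_request(item):
    """Hypothetical stand-in for an outbound call (HTTP, database, and so on)."""
    await asyncio.sleep(0.01)
    return item * 2


if __name__ == "__main__":
    results = asyncio.run(run_bounded(range(100), fake_request, max_concurrency=10))
    print(len(results))
```

Capping concurrency limits how many connections the function needs at once, and it works best together with reusing a single client instance rather than creating a new one per call.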

You may see the Sections threshold exceeded if the Snapshot Debugger is enabled on a Function App on the Windows Consumption plan. The Snapshot Debugger is enabled through the SnapshotDebugger_EXTENSION_VERSION setting and is not supported on the Windows Consumption plan. For details, see Enable Snapshot Debugger for .NET and .NET Core apps in Azure Functions.

The feature is currently active only on the Consumption plan, where these sandbox limits exist. It is enabled by default, but can be disabled or configured via the healthMonitor section of host.json, for example:

{
    "healthMonitor": {
        "enabled": true,
        "healthCheckInterval": "00:00:10",
        "healthCheckWindow": "00:02:00",
        "healthCheckThreshold": 6,
        "counterThreshold": 0.80
    }
}

Description of settings:

  • enabled: Whether the feature is enabled. Default is true.
  • healthCheckInterval: The time interval between the periodic background health checks. Default is 10 seconds.
  • healthCheckWindow: A sliding time window used in conjunction with the healthCheckThreshold setting (see below). Default is 2 minutes.
  • healthCheckThreshold: Maximum number of times the health check can fail within the healthCheckWindow before a host recycle is initiated. Default is 6.
  • counterThreshold: The threshold at which a performance counter will be considered unhealthy. Default is 0.80.
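To make the interplay between healthCheckInterval, healthCheckWindow, and healthCheckThreshold more concrete, here is a small sketch of a sliding-window failure counter. It is an illustration only, not the host's code: the HealthCheckWindow class is hypothetical, and treating "more than healthCheckThreshold failures inside the window" as the recycle condition is an assumption.

```python
from collections import deque
from datetime import datetime, timedelta


class HealthCheckWindow:
    """Tracks failed health checks inside a sliding time window."""

    def __init__(self, window=timedelta(minutes=2), threshold=6):
        self.window = window
        self.threshold = threshold
        self._failures = deque()  # timestamps of failed checks, oldest first

    def record(self, healthy, now):
        """Record one health check result; return True if a recycle is due."""
        if not healthy:
            self._failures.append(now)
        # Drop failures that have slid out of the window.
        while self._failures and now - self._failures[0] > self.window:
            self._failures.popleft()
        # Assumption: a recycle is due once failures exceed the threshold.
        return len(self._failures) > self.threshold


if __name__ == "__main__":
    start = datetime(2023, 11, 15)
    monitor = HealthCheckWindow()
    # Simulate a failing check every 10 seconds (the default interval).
    for i in range(7):
        due = monitor.record(False, start + timedelta(seconds=10 * i))
        print(i, due)
```

In this sketch, with checks every 10 seconds and the example values of 2 minutes and 6, a recycle would become due on the seventh consecutive failed check, while occasional failures spread further apart than the window would never accumulate.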
