Getting visibility into your Veeam VSA High Availability (HA) Cluster is critical for ensuring backup continuity. Veeam ONE provides the dashboards, alarms, and reporting you need to stay ahead of issues before they impact your recovery objectives. This post walks through configuring Veeam ONE to effectively monitor your VSA HA Cluster — from initial connection through tuning alarm thresholds and automating the whole thing with PowerShell via the Veeam ONE REST API.
What Is the Veeam VSA HA Cluster?#
The Veeam VSA HA Cluster provides redundancy for your Veeam Backup & Replication environment by pairing a primary and secondary node. If the primary node fails, the secondary automatically takes over, maintaining backup and replication operations without manual intervention. While the cluster handles the failover, Veeam ONE is what tells you something happened — or better yet, that something is about to go wrong.
The cluster relies on several components that need continuous monitoring:
- Primary and secondary VBR nodes (virtual appliances)
- Shared PostgreSQL configuration database
- Internode heartbeat and replication traffic
- Underlying vSphere resources hosting the appliances
Architecture Overview#
The following diagram shows how Veeam ONE sits alongside your HA cluster, collecting data from both nodes and the underlying vSphere infrastructure simultaneously.
graph TD
subgraph vSphere["vSphere Infrastructure"]
ESX1["ESXi Host 1"]
ESX2["ESXi Host 2"]
vC["vCenter Server"]
end
subgraph VSA_HA["Veeam VSA HA Cluster"]
PRI["Primary VBR Node\n(Active)"]
SEC["Secondary VBR Node\n(Standby)"]
PG[("PostgreSQL\nConfig DB")]
VIP["Cluster VIP"]
PRI -- "Heartbeat / Sync" --> SEC
PRI -- "DB Replication" --> PG
VIP --> PRI
end
subgraph VONE["Veeam ONE Server"]
MON["Monitoring\nService"]
REP["Reporting\nService"]
API["REST API\n:1239"]
MON --> REP
MON --> API
end
vC --> MON
ESX1 --> MON
ESX2 --> MON
PRI -- "Port 9392" --> MON
SEC -- "Port 9392" --> MON
API --> ALERTS["Alert\nNotifications\n(Email / Script / Webhook)"]
Veeam ONE connects to both VBR nodes independently on port 9392, and to vCenter for VM-level metrics. This means you retain full visibility regardless of which node is currently active.
Connecting Veeam ONE to the HA Cluster#
Before alarms can fire, Veeam ONE needs to know about the cluster. Add both nodes individually so you have full visibility regardless of which is active.
- Open Veeam ONE Monitor and go to Veeam Backup Servers in the left panel
- Right-click and choose Add Veeam Backup Server
- Enter the FQDN/IP of the primary node, credentials, and connection port (default 9392)
- Repeat for the secondary node
- Optionally, add the cluster virtual IP (VIP) as a third entry if your setup uses one
Tip: Use a dedicated service account with the Veeam ONE Read-Only Operator role rather than your admin credentials. This follows least-privilege principles and avoids lockout issues during node failovers.
Once connected, verify both nodes appear under Infrastructure > Veeam Backup Servers and that job data is populating correctly.
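If a node fails to appear, a quick reachability check on the connection port helps separate network problems from credential problems. The following is a minimal sketch, not part of Veeam ONE: `Test-TcpPort` is a hypothetical helper name, and the node FQDNs in the commented usage are placeholders.

```powershell
# Hypothetical helper: returns $true if a TCP connection to the given
# host and port succeeds within the timeout, otherwise $false.
function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($ComputerName, $Port)
        # Wait() returns $false on timeout and throws if the connection is refused
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false
    } finally {
        $client.Close()
    }
}

# Example usage (placeholder node names):
# 'vbr-node1.yourdomain.local', 'vbr-node2.yourdomain.local' | ForEach-Object {
#     '{0}:9392 reachable = {1}' -f $_, (Test-TcpPort -ComputerName $_ -Port 9392)
# }
```

Run it from the Veeam ONE server itself so the result reflects the same network path the monitoring service uses.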
Alarm Lifecycle#
Before diving into specific alarms, it helps to understand how Veeam ONE processes an alarm from detection through to resolution. Over 300 predefined alarms ship out of the box, all with adjustable thresholds.
sequenceDiagram
participant INF as Infrastructure<br/>(VBR Node / VM / vSphere)
participant COLL as Veeam ONE<br/>Collection Service
participant ENG as Alarm Engine
participant NOTIF as Notification<br/>Rule
participant OPS as Operator
INF->>COLL: Metric / event data (polling interval)
COLL->>ENG: Evaluate against alarm thresholds
alt Threshold breached
ENG->>ENG: Set severity (Warning / Error / Resolved)
ENG->>NOTIF: Match alarm to notification rules
NOTIF->>OPS: Email / Script / Webhook fired
OPS->>INF: Remediate issue
INF->>COLL: Metric returns to normal
COLL->>ENG: Re-evaluate
ENG->>ENG: Alarm → Resolved
ENG->>NOTIF: Resolution notification (optional)
NOTIF->>OPS: Resolved notification sent
else No breach
ENG->>ENG: No alarm — continue polling
end
Key Alarms to Configure#
Veeam ONE ships with a solid set of built-in alarms, but the defaults aren’t always tuned for an HA environment. Navigate to Veeam ONE Monitor > Alarm Management to review and customize the following.
Backup Server Health Alarms#
These fire when something goes wrong with the VBR service itself:
| Alarm | Recommended Threshold | Severity |
|---|---|---|
| Backup Server Unreachable | 1 occurrence | Critical |
| Backup Server Database Connection | Immediate | Critical |
| Veeam Services Status | Immediate | Critical |
| License Expiration | 30 days = Warning / 7 days = Critical | Warning → Critical |
HA Cluster Node Alarms#
For the cluster specifically, configure these in Alarm Management > Veeam Backup:
- HA Cluster Node Failover Detected — Set to Critical with immediate notification. A failover means something failed on the primary; you need to know now so you can investigate the root cause and restore HA redundancy.
- HA Cluster Sync Status — Set to Warning if sync lag exceeds 30 seconds, Critical if it exceeds 5 minutes. A lagging secondary means your failover target is out of date.
- HA Cluster Node Role Change — Informational alarm that logs when the active node designation changes. Useful for audit trails.
VM Resource Alarms#
Since the VSA nodes run as virtual machines on your vSphere infrastructure, wire up these VM-level alarms and scope them to the HA cluster VMs specifically:
- VM CPU Usage — Warning at 75%, Critical at 90%
- VM Memory Usage — Warning at 80%, Critical at 95%
- VM Disk Space — Warning at 75%, Critical at 85% (especially on the config DB volume)
- VM Snapshot Detected — Immediate warning; snapshots on VBR appliances can cause serious performance and consistency problems
- VM Network Connectivity Loss — Immediate Critical; loss of network on either node breaks internode communication
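Keeping these percentages in one table makes them easier to review and reuse in your own health-check scripts. The sketch below only mirrors the thresholds listed above and does not read or change anything in Veeam ONE. `$VmThresholds` and `Get-VmMetricSeverity` are hypothetical names.

```powershell
# Thresholds copied from the list above (percent used).
$VmThresholds = @{
    CPU    = @{ Warning = 75; Critical = 90 }
    Memory = @{ Warning = 80; Critical = 95 }
    Disk   = @{ Warning = 75; Critical = 85 }
}

# Hypothetical helper: maps a metric reading to OK / Warning / Critical.
function Get-VmMetricSeverity {
    param(
        [Parameter(Mandatory)][ValidateSet('CPU', 'Memory', 'Disk')][string]$Metric,
        [Parameter(Mandatory)][double]$PercentUsed,
        [hashtable]$Thresholds = $VmThresholds
    )
    $limits = $Thresholds[$Metric]
    if ($PercentUsed -ge $limits.Critical) { return 'Critical' }
    if ($PercentUsed -ge $limits.Warning)  { return 'Warning' }
    return 'OK'
}

# Example: a config DB volume at 86% used
Get-VmMetricSeverity -Metric Disk -PercentUsed 86
```

Because the table is passed as a parameter, a quarterly threshold review only needs to touch the hashtable, not the logic.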
Backup Job Alarms#
Failovers don’t always mean jobs resume cleanly. Add these to catch post-failover drift:
- Backup Job Failed — Critical, notify immediately
- Backup Job Warning — Warning, notify within 15 minutes
- No Backup in Last N Days — Critical if a VM hasn’t been backed up within RPO window
- Backup Window Exceeded — Warning when jobs run longer than expected, which can indicate resource contention on the newly active node
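The "No Backup in Last N Days" check comes down to comparing the age of the last restore point with your RPO. Here is a minimal sketch of that comparison for use in your own post-failover checks. `Test-RpoBreached` is a hypothetical helper, and you would supply the last backup time yourself, for example from a job session report.

```powershell
# Hypothetical helper: $true when the last backup is older than the RPO window.
function Test-RpoBreached {
    param(
        [Parameter(Mandatory)][datetime]$LastBackup,
        [Parameter(Mandatory)][timespan]$Rpo,
        [datetime]$Now = (Get-Date)
    )
    return (($Now - $LastBackup) -gt $Rpo)
}

# Example: a VM last backed up 25 hours ago against a 24-hour RPO
$now = Get-Date
Test-RpoBreached -LastBackup $now.AddHours(-25) -Rpo ([timespan]::FromHours(24)) -Now $now
```

Passing `-Now` explicitly keeps the check deterministic when you evaluate many VMs in one run.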
Alarm Severity Flow#
The diagram below maps alarm states against recommended actions, keeping your response process consistent across shifts.
flowchart LR
A([Alarm Triggered]) --> B{Severity?}
B -->|Warning| W["🟡 Warning\nLog ticket\nMonitor 15 min\nCheck trend"]
B -->|Error / Critical| E["🔴 Critical\nPage on-call\nBegin RCA\nCheck HA sync"]
B -->|Resolved| R["🟢 Resolved\nClose ticket\nDocument outcome"]
W --> W2{Still Warning\nafter 15 min?}
W2 -->|Yes| E
W2 -->|No| R
E --> E2["Attempt\nRemediation"]
E2 --> E3{Resolved?}
E3 -->|Yes| R
E3 -->|No| E4["Escalate to\nSenior Engineer /\nVeeam Support"]
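If you script your on-call runbook, the same decision flow can be expressed as a small function. This is only a sketch of the flowchart above. `Get-AlarmAction` and its parameters are hypothetical, and the returned strings are labels for your own tooling rather than anything Veeam ONE consumes.

```powershell
# Hypothetical helper: returns the recommended action for an alarm state,
# following the Warning -> Critical -> Escalate flow described above.
function Get-AlarmAction {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Warning', 'Error', 'Critical', 'Resolved')]
        [string]$Severity,
        [int]$MinutesInState = 0,
        [bool]$RemediationFailed = $false
    )
    switch ($Severity) {
        'Resolved' { return 'Close ticket and document outcome' }
        'Warning' {
            # A Warning that persists for 15 minutes is treated as Critical
            if ($MinutesInState -ge 15) { return 'Page on-call and begin RCA' }
            return 'Log ticket and monitor trend'
        }
        default {
            # Error and Critical share the same path
            if ($RemediationFailed) { return 'Escalate to senior engineer or Veeam Support' }
            return 'Page on-call and begin RCA'
        }
    }
}

# Example: a Warning that has been open for 20 minutes
Get-AlarmAction -Severity Warning -MinutesInState 20
```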
Scoping Alarms to the Cluster#
Rather than applying alarms globally, scope them directly to the HA cluster objects to reduce noise:
- In Alarm Management, select the alarm and click Edit
- Under Scope, choose Specific Objects
- Add both VBR node entries and any associated VMs
- Save — the alarm will now only fire for those objects
This prevents false positives from other backup servers in your environment triggering the same HA-specific alarms.
Setting Up Notification Rules#
Alarms are only useful if someone sees them. Go to Settings > Notification Rules and create a dedicated rule for HA cluster alarms:
- Rule Name: VSA HA Cluster - Critical Alerts
- Trigger: Any alarm scoped to HA cluster objects at Warning or Critical severity
- Recipients: your-team@company.com (or Teams/Slack webhook if integrated)
- Schedule: 24/7, with no suppression window for HA alarms
For non-critical informational alarms (like role change events), create a separate rule with a digest schedule so you’re not flooded during a planned failover test.
Automating Alarm Configuration with PowerShell#
Veeam ONE does not ship with a native PowerShell module, but its REST API
(port 1239) exposes full alarm management capabilities. The snippets below use
the Veeam ONE v13 REST API to authenticate, query alarm templates, retrieve active
alarms, and wire up a script action — all repeatable across environments.
Step 1 — Authenticate and Get a Token#
#Requires -Version 5.1
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
$voneServer = "veeamone.yourdomain.local"
$vonePort = 1239
$baseUri = "https://${voneServer}:${vonePort}"
$cred = Get-Credential -Message "Enter Veeam ONE credentials"
# Ignore self-signed cert in lab — remove in production (use a trusted cert)
if (-not ([System.Management.Automation.PSTypeName]'TrustAll').Type) {
Add-Type @"
using System.Net; using System.Security.Cryptography.X509Certificates;
public class TrustAll : ICertificatePolicy {
public bool CheckValidationResult(ServicePoint sp, X509Certificate cert,
WebRequest req, int prob) { return true; }
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAll
}
$tokenBody = @{
grant_type = "password"
username = $cred.UserName
password = $cred.GetNetworkCredential().Password
}
$tokenResponse = Invoke-RestMethod `
-Method POST `
-Uri "$baseUri/api/token" `
-Body $tokenBody `
-ContentType "application/x-www-form-urlencoded"
$token = $tokenResponse.access_token
$headers = @{
"Authorization" = "Bearer $token"
"Accept" = "application/json"
"Content-Type" = "application/json"
}
Write-Host "✅ Connected to Veeam ONE at $voneServer" -ForegroundColor Green
Step 2 — List All Alarm Templates (Find HA Alarm IDs)#
Use this to discover the alarmTemplateId values for HA cluster alarms so you
can reference them in subsequent calls.
$alarmTemplates = Invoke-RestMethod `
-Method GET `
-Uri "$baseUri/api/v2.2/alarms/templates" `
-Headers $headers
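# Filter for HA and Backup Server related alarms.
# \bHA\b matches "HA" only as a whole word; a bare "HA" is case-insensitive
# and would also match unrelated names such as "Exchange" or "Hardware".
$haAlarms = $alarmTemplates.items | Where-Object {
    $_.name -match "\bHA\b|Backup Server|Cluster|Failover|Sync"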
}
$haAlarms | Select-Object alarmTemplateId, name, isEnabled, isPredefined |
Format-Table -AutoSize
Example output:
alarmTemplateId name                              isEnabled isPredefined
214             HA Cluster Node Failover Detected True      True
215             HA Cluster Sync Status            True      True
216             HA Cluster Node Role Change       True      True
42              Backup Server Unreachable         True      True
43              Backup Server Database Connection True      True
Step 3 — Query Currently Triggered Alarms#
Pull all active alarms scoped to your HA cluster nodes and filter by severity. Run this on a schedule or wire it into your ticketing system.
$triggeredAlarms = Invoke-RestMethod `
-Method GET `
-Uri "$baseUri/api/v2.2/alarms/triggered?limit=200" `
-Headers $headers
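# Show only Warning / Error alarms on HA cluster objects.
# @(...) forces an array so .Count is reliable even when exactly one alarm matches.
# \bHA\b matches "HA" only as a whole word, not inside names such as "Exchange".
$activeHaAlarms = @($triggeredAlarms.items | Where-Object {
    $_.status -in @("Warning", "Error") -and
    $_.name -match "\bHA\b|Backup Server|Cluster"
})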
if ($activeHaAlarms.Count -eq 0) {
Write-Host "✅ No active HA cluster alarms." -ForegroundColor Green
} else {
Write-Warning "⚠️ $($activeHaAlarms.Count) active HA alarm(s) found!"
$activeHaAlarms | Select-Object `
triggeredAlarmId,
name,
status,
triggeredTime,
@{N="Object"; E={$_.alarmAssignment.objectName}},
description |
Format-Table -AutoSize -Wrap
}
Step 4 — Resolve an Alarm Programmatically#
After automated or manual remediation, acknowledge and resolve an alarm by its ID.
function Resolve-VeeamOneAlarm {
param(
[Parameter(Mandatory)][int] $TriggeredAlarmId,
[Parameter(Mandatory)][string] $Comment
)
$body = @{ comment = $Comment } | ConvertTo-Json
Invoke-RestMethod `
-Method POST `
-Uri "$baseUri/api/v2.2/alarms/triggered/$TriggeredAlarmId/resolve" `
-Headers $headers `
-Body $body | Out-Null
Write-Host "✅ Alarm ID $TriggeredAlarmId resolved." -ForegroundColor Green
}
# Example usage — resolve alarm after confirming HA sync restored
Resolve-VeeamOneAlarm -TriggeredAlarmId 42 -Comment "HA sync lag resolved after primary node restart at $(Get-Date -Format 'yyyy-MM-dd HH:mm')"
Step 5 — Configure an Alarm Script Action (Run on Trigger)#
Veeam ONE allows a Run Script action on any alarm. This is how you fire a custom PowerShell handler — for example, to create a ServiceNow/Jira ticket or send a Teams webhook — when an HA alarm triggers. Configure this in the alarm’s Actions tab in the GUI, or template it with the snippet below as your notification script.
# save as: C:\Scripts\VeeamONE-HAAlarm-Handler.ps1
# Veeam ONE passes: %1=AlarmName %2=ObjectName %3=AlarmStatus %4=Description %5=DateTime
# Parameters bind positionally in declaration order (%1 -> $AlarmName ... %5 -> $TriggeredAt)
param(
    [string]$AlarmName,
    [string]$ObjectName,
    [string]$AlarmStatus,
    [string]$Description,
    [string]$TriggeredAt
)
$logPath = "C:\Logs\VeeamONE-HAAlarms.log"
$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
# Log the alarm locally
"[$timestamp] ALARM: $AlarmName | Object: $ObjectName | Status: $AlarmStatus | $Description" |
Out-File -FilePath $logPath -Append -Encoding UTF8
# Send a Teams webhook (replace URI with your connector URL)
$teamsWebhook = "https://outlook.office.com/webhook/YOUR-WEBHOOK-URI"
$teamsBody = @{
"@type" = "MessageCard"
"@context" = "http://schema.org/extensions"
"summary" = "Veeam ONE HA Alarm"
"themeColor" = if ($AlarmStatus -eq "Error") { "FF0000" } else { "FFA500" }
"title" = "🔔 Veeam ONE: $AlarmName"
"sections" = @(@{
"facts" = @(
@{ "name" = "Object"; "value" = $ObjectName }
@{ "name" = "Status"; "value" = $AlarmStatus }
@{ "name" = "Time"; "value" = $TriggeredAt }
@{ "name" = "Details"; "value" = $Description }
)
})
} | ConvertTo-Json -Depth 5
try {
Invoke-RestMethod -Uri $teamsWebhook -Method POST -Body $teamsBody -ContentType "application/json"
"[$timestamp] Teams notification sent." | Out-File -FilePath $logPath -Append -Encoding UTF8
} catch {
"[$timestamp] ERROR sending Teams notification: $_" | Out-File -FilePath $logPath -Append -Encoding UTF8
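}
In the alarm’s Actions tab, set the Run script action value to the line below. Double quotes keep values that contain spaces, such as the alarm name, together as a single argument:
powershell.exe -ExecutionPolicy Bypass -File "C:\Scripts…" "%1" "%2" "%3" "%4" "%5"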
Step 6 — Full Alarm Audit Report (Scheduled)#
Run this weekly via Windows Task Scheduler to produce a CSV audit of all HA alarm activity for your change management records.
$reportPath = "C:\Reports\VeeamONE-HAAlarm-Audit-$(Get-Date -Format 'yyyyMMdd').csv"
$allTriggered = Invoke-RestMethod `
-Method GET `
-Uri "$baseUri/api/v2.2/alarms/triggered?limit=500" `
-Headers $headers
$report = $allTriggered.items |
Where-Object { $_.name -match "\bHA\b|Backup Server|Cluster|Failover|Sync" } |
Select-Object `
triggeredAlarmId,
name,
status,
triggeredTime,
@{N="ObjectName"; E={$_.alarmAssignment.objectName}},
@{N="ObjectType"; E={$_.alarmAssignment.objectType}},
repeatCount,
description
$report | Export-Csv -Path $reportPath -NoTypeInformation -Encoding UTF8
Write-Host "📄 Report saved to: $reportPath" -ForegroundColor Cyan
Dashboard Configuration#
Veeam ONE Monitor’s Business View is ideal for building an HA cluster dashboard.
- Navigate to Business View and create a new Business Group called VSA HA Cluster
- Add both backup server objects and their underlying VMs to the group
- Pin the following widgets to the group dashboard:
- Protected VMs — confirms all VMs remain protected post-failover
- Backup Infrastructure Health — overall status of both nodes
- Active Alarms — filtered to the cluster group
- Job Session History (Last 24h) — quickly spot failed or missed jobs
- Save and set the dashboard as a favorite for quick access
In Veeam ONE Reporter, schedule a weekly Infrastructure Assessment report scoped to the cluster group and email it to stakeholders. This gives a historical record of any failover events, alarm activity, and resource utilization trends.
Testing Your Monitoring Setup#
Before you rely on these alarms in production, validate them with a controlled test:
- Simulate a failover using the Veeam Backup & Replication console (or by gracefully shutting down the primary node in a maintenance window)
- Confirm the HA Cluster Node Failover Detected alarm fires within the expected timeframe
- Verify notification emails/webhooks and Teams messages are delivered
- Check that the secondary node now shows as Active in Veeam ONE
- Restore the primary node and confirm HA Cluster Sync Status returns to green once replication catches up
Document the alarm-to-notification latency you observe — this becomes your baseline SLA for incident response.
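To record that baseline consistently, note three timestamps during each test: when you triggered the failover, when the alarm fired in Veeam ONE, and when the notification arrived. The sketch below turns them into latency figures. `Measure-AlarmLatency` is a hypothetical helper and the timestamps are entered by hand, not read from Veeam ONE.

```powershell
# Hypothetical helper: computes detection, notification, and end-to-end
# latency in seconds from three manually recorded timestamps.
function Measure-AlarmLatency {
    param(
        [Parameter(Mandatory)][datetime]$FaultInjectedAt,
        [Parameter(Mandatory)][datetime]$AlarmTriggeredAt,
        [Parameter(Mandatory)][datetime]$NotificationReceivedAt
    )
    [pscustomobject]@{
        DetectionSeconds    = ($AlarmTriggeredAt - $FaultInjectedAt).TotalSeconds
        NotificationSeconds = ($NotificationReceivedAt - $AlarmTriggeredAt).TotalSeconds
        EndToEndSeconds     = ($NotificationReceivedAt - $FaultInjectedAt).TotalSeconds
    }
}

# Example with illustrative timestamps from a failover test
Measure-AlarmLatency `
    -FaultInjectedAt        ([datetime]::new(2025, 1, 1, 10, 0, 0)) `
    -AlarmTriggeredAt       ([datetime]::new(2025, 1, 1, 10, 0, 45)) `
    -NotificationReceivedAt ([datetime]::new(2025, 1, 1, 10, 1, 5))
```

Keeping the output of each test run lets you spot drift in detection time after upgrades or threshold changes.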
Best Practices#
- Never rely solely on Veeam ONE for HA cluster monitoring; integrate it with your broader monitoring stack (Zabbix, PRTG, Azure Monitor) via SNMP traps or syslog forwarding
- Review alarm thresholds quarterly — resource baselines shift as your environment grows, and stale thresholds lead to alarm fatigue
- Keep Veeam ONE updated alongside VBR — newer versions introduce improved HA cluster awareness and alarm coverage
- Test failover scenarios at least twice a year and validate that monitoring correctly reflects the post-failover state
- Store the REST API token securely — use Windows Credential Manager or a secrets manager rather than hardcoding credentials in scripts
- Use Business View grouping consistently so reports and dashboards remain accurate even as you add nodes or migrate VMs
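As a lighter-weight alternative to the credential stores recommended above, scheduled scripts can at least avoid plaintext passwords by persisting a `PSCredential` with `Export-Clixml`. This is a different technique from Windows Credential Manager or a secrets manager. On Windows the password is encrypted with DPAPI, so only the same user on the same machine can decrypt it; on other platforms it is not encrypted. The function names and file path below are hypothetical.

```powershell
# Hypothetical helpers: persist and reload a PSCredential via CLIXML.
# On Windows the password is DPAPI-protected for the current user and machine.
function Save-VoneCredential {
    param(
        [Parameter(Mandatory)][pscredential]$Credential,
        [Parameter(Mandatory)][string]$Path
    )
    $Credential | Export-Clixml -Path $Path
}

function Get-VoneCredential {
    param([Parameter(Mandatory)][string]$Path)
    Import-Clixml -Path $Path
}

# Example usage (run once interactively as the account that runs the scheduled task):
# Save-VoneCredential -Credential (Get-Credential) -Path 'C:\Secure\vone-cred.xml'
# $cred = Get-VoneCredential -Path 'C:\Secure\vone-cred.xml'
```

The scheduled task must run under the same Windows account that saved the file, otherwise the import fails to decrypt the password.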