Monitoring Your Veeam VSA HA Cluster with Veeam ONE

Table of Contents

Getting visibility into your Veeam VSA High Availability (HA) Cluster is critical for ensuring backup continuity. Veeam ONE provides the dashboards, alarms, and reporting you need to stay ahead of issues before they impact your recovery objectives. This post walks through configuring Veeam ONE to effectively monitor your VSA HA Cluster — from initial connection through tuning alarm thresholds and automating the whole thing with PowerShell via the Veeam ONE REST API.

What Is the Veeam VSA HA Cluster?
#

The Veeam VSA HA Cluster provides redundancy for your Veeam Backup & Replication environment by pairing a primary and secondary node. If the primary node fails, the secondary automatically takes over, maintaining backup and replication operations without manual intervention. While the cluster handles the failover, Veeam ONE is what tells you something happened — or better yet, that something is about to go wrong.

The cluster relies on several components that need continuous monitoring:

Primary and secondary VBR nodes (virtual appliances)
Shared PostgreSQL configuration database
Internode heartbeat and replication traffic
Underlying vSphere resources hosting the appliances

Architecture Overview
#

The following diagram shows how Veeam ONE sits alongside your HA cluster, collecting data from both nodes and the underlying vSphere infrastructure simultaneously.

graph TD
    subgraph vSphere["vSphere Infrastructure"]
        ESX1["ESXi Host 1"]
        ESX2["ESXi Host 2"]
        vC["vCenter Server"]
    end

    subgraph VSA_HA["Veeam VSA HA Cluster"]
        PRI["Primary VBR Node\n(Active)"]
        SEC["Secondary VBR Node\n(Standby)"]
        PG[("PostgreSQL\nConfig DB")]
        VIP["Cluster VIP"]
        PRI -- "Heartbeat / Sync" --> SEC
        PRI -- "DB Replication" --> PG
        VIP --> PRI
    end

    subgraph VONE["Veeam ONE Server"]
        MON["Monitoring\nService"]
        REP["Reporting\nService"]
        API["REST API\n:1239"]
        MON --> REP
        MON --> API
    end

    vC --> MON
    ESX1 --> MON
    ESX2 --> MON
    PRI -- "Port 9392" --> MON
    SEC -- "Port 9392" --> MON
    API --> ALERTS["Alert\nNotifications\n(Email / Script / Webhook)"]

🔍 Click to enlarge

Veeam ONE connects to both VBR nodes independently on port 9392, and to vCenter for VM-level metrics. This means you retain full visibility regardless of which node is currently active.

Connecting Veeam ONE to the HA Cluster
#

Before alarms can fire, Veeam ONE needs to know about the cluster. Add both nodes individually so you have full visibility regardless of which is active.

Open Veeam ONE Monitor and go to Veeam Backup Servers in the left panel
Right-click and choose Add Veeam Backup Server
Enter the FQDN/IP of the primary node, credentials, and connection port (default 9392)
Repeat for the secondary node
Optionally, add the cluster virtual IP (VIP) as a third entry if your setup uses one

Tip: Use a dedicated service account with the Veeam ONE Read-Only Operator role rather than your admin credentials. This follows least-privilege principles and avoids lockout issues during node failovers.

Once connected, verify both nodes appear under Infrastructure > Veeam Backup Servers and that job data is populating correctly.

Alarm Lifecycle
#

Before diving into specific alarms, it helps to understand how Veeam ONE processes an alarm from detection through to resolution. Over 300 predefined alarms ship out of the box, all with adjustable thresholds.

sequenceDiagram
    participant INF as Infrastructure<br/>(VBR Node / VM / vSphere)
    participant COLL as Veeam ONE<br/>Collection Service
    participant ENG as Alarm Engine
    participant NOTIF as Notification<br/>Rule
    participant OPS as Operator

    INF->>COLL: Metric / event data (polling interval)
    COLL->>ENG: Evaluate against alarm thresholds
    alt Threshold breached
        ENG->>ENG: Set severity (Warning / Error / Resolved)
        ENG->>NOTIF: Match alarm to notification rules
        NOTIF->>OPS: Email / Script / Webhook fired
        OPS->>INF: Remediate issue
        INF->>COLL: Metric returns to normal
        COLL->>ENG: Re-evaluate
        ENG->>ENG: Alarm → Resolved
        ENG->>NOTIF: Resolution notification (optional)
        NOTIF->>OPS: Resolved notification sent
    else No breach
        ENG->>ENG: No alarm — continue polling
    end

🔍 Click to enlarge

Key Alarms to Configure
#

Veeam ONE ships with a solid set of built-in alarms, but the defaults aren’t always tuned for an HA environment. Navigate to Veeam ONE Monitor > Alarm Management to review and customize the following.

Backup Server Health Alarms
#

These fire when something goes wrong with the VBR service itself:

Alarm	Recommended Threshold	Severity
Backup Server Unreachable	1 occurrence	Critical
Backup Server Database Connection	Immediate	Critical
Veeam Services Status	Immediate	Critical
License Expiration	30 days = Warning / 7 days = Critical	Warning → Critical

HA Cluster Node Alarms
#

For the cluster specifically, configure these in Alarm Management > Veeam Backup:

HA Cluster Node Failover Detected — Set to Critical with immediate notification. A failover means something failed on the primary; you need to know now so you can investigate the root cause and restore HA redundancy.
HA Cluster Sync Status — Set to Warning if sync lag exceeds 30 seconds, Critical if it exceeds 5 minutes. A lagging secondary means your failover target is out of date.
HA Cluster Node Role Change — Informational alarm that logs when the active node designation changes. Useful for audit trails.

VM Resource Alarms
#

Since the VSA nodes run as virtual machines on your vSphere infrastructure, wire up these VM-level alarms and scope them to the HA cluster VMs specifically:

VM CPU Usage — Warning at 75%, Critical at 90%
VM Memory Usage — Warning at 80%, Critical at 95%
VM Disk Space — Warning at 75%, Critical at 85% (especially on the config DB volume)
VM Snapshot Detected — Immediate warning; snapshots on VBR appliances can cause serious performance and consistency problems
VM Network Connectivity Loss — Immediate Critical; loss of network on either node breaks internode communication

Backup Job Alarms
#

Failovers don’t always mean jobs resume cleanly. Add these to catch post-failover drift:

Backup Job Failed — Critical, notify immediately
Backup Job Warning — Warning, notify within 15 minutes
No Backup in Last N Days — Critical if a VM hasn’t been backed up within RPO window
Backup Window Exceeded — Warning when jobs run longer than expected, which can indicate resource contention on the newly active node

Alarm Severity Flow
#

The diagram below maps alarm states against recommended actions, keeping your response process consistent across shifts.

flowchart LR
    A([Alarm Triggered]) --> B{Severity?}

    B -->|Warning| W["🟡 Warning\nLog ticket\nMonitor 15 min\nCheck trend"]
    B -->|Error / Critical| E["🔴 Critical\nPage on-call\nBegin RCA\nCheck HA sync"]
    B -->|Resolved| R["🟢 Resolved\nClose ticket\nDocument outcome"]

    W --> W2{Still Warning\nafter 15 min?}
    W2 -->|Yes| E
    W2 -->|No| R

    E --> E2["Attempt\nRemediation"]
    E2 --> E3{Resolved?}
    E3 -->|Yes| R
    E3 -->|No| E4["Escalate to\nSenior Engineer /\nVeeam Support"]

🔍 Click to enlarge

Scoping Alarms to the Cluster
#

Rather than applying alarms globally, scope them directly to the HA cluster objects to reduce noise:

In Alarm Management, select the alarm and click Edit
Under Scope, choose Specific Objects
Add both VBR node entries and any associated VMs
Save — the alarm will now only fire for those objects

This prevents false positives from other backup servers in your environment triggering the same HA-specific alarms.

Setting Up Notification Rules
#

Alarms are only useful if someone sees them. Go to Settings > Notification Rules and create a dedicated rule for HA cluster alarms:

Rule Name: VSA HA Cluster - Critical Alerts Trigger: Any alarm scoped to HA cluster objects at Warning or Critical severity Recipients: your-team@company.com (or Teams/Slack webhook if integrated) Schedule: 24/7 — no suppression window for HA alarms

For non-critical informational alarms (like role change events), create a separate rule with a digest schedule so you’re not flooded during a planned failover test.

Automating Alarm Configuration with PowerShell
#

Veeam ONE does not ship with a native PowerShell module, but its REST API (port 1239) exposes full alarm management capabilities. The snippets below use the Veeam ONE v13 REST API to authenticate, query alarm templates, retrieve active alarms, and wire up a script action — all repeatable across environments.

Step 1 — Authenticate and Get a Token
#

#Requires -Version 5.1
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

$voneServer  = "veeamone.yourdomain.local"
$vonePort    = 1239
$baseUri     = "https://${voneServer}:${vonePort}"
$cred        = Get-Credential -Message "Enter Veeam ONE credentials"

# Ignore self-signed cert in lab — remove in production (use a trusted cert)
if (-not ([System.Management.Automation.PSTypeName]'TrustAll').Type) {
    Add-Type @"
    using System.Net; using System.Security.Cryptography.X509Certificates;
    public class TrustAll : ICertificatePolicy {
        public bool CheckValidationResult(ServicePoint sp, X509Certificate cert,
            WebRequest req, int prob) { return true; }
    }
"@
    [System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAll
}

$tokenBody = @{
    grant_type = "password"
    username   = $cred.UserName
    password   = $cred.GetNetworkCredential().Password
}

$tokenResponse = Invoke-RestMethod `
    -Method  POST `
    -Uri     "$baseUri/api/token" `
    -Body    $tokenBody `
    -ContentType "application/x-www-form-urlencoded"

$token   = $tokenResponse.access_token
$headers = @{
    "Authorization" = "Bearer $token"
    "Accept"        = "application/json"
    "Content-Type"  = "application/json"
}

Write-Host "✅ Connected to Veeam ONE at $voneServer" -ForegroundColor Green

Step 2 — List All Alarm Templates (Find HA Alarm IDs)
#

Use this to discover the alarmTemplateId values for HA cluster alarms so you can reference them in subsequent calls.

$alarmTemplates = Invoke-RestMethod `
    -Method  GET `
    -Uri     "$baseUri/api/v2.2/alarms/templates" `
    -Headers $headers

# Filter for HA and Backup Server related alarms
$haAlarms = $alarmTemplates.items | Where-Object {
    $_.name -match "HA|Backup Server|Cluster|Failover|Sync"
}

$haAlarms | Select-Object alarmTemplateId, name, isEnabled, isPredefined |
    Format-Table -AutoSize

Example output:

alarmTemplateId name isEnabled isPredefined

214 HA Cluster Node Failover Detected True True 215 HA Cluster Sync Status True True 216 HA Cluster Node Role Change True True 42 Backup Server Unreachable True True 43 Backup Server Database Connection True True

Step 3 — Query Currently Triggered Alarms
#

Pull all active alarms scoped to your HA cluster nodes and filter by severity. Run this on a schedule or wire it into your ticketing system.

$triggeredAlarms = Invoke-RestMethod `
    -Method  GET `
    -Uri     "$baseUri/api/v2.2/alarms/triggered?limit=200" `
    -Headers $headers

# Show only Warning / Error alarms on HA cluster objects
$activeHaAlarms = $triggeredAlarms.items | Where-Object {
    $_.status -in @("Warning", "Error") -and
    $_.name   -match "HA|Backup Server|Cluster"
}

if ($activeHaAlarms.Count -eq 0) {
    Write-Host "✅ No active HA cluster alarms." -ForegroundColor Green
} else {
    Write-Warning "⚠️  $($activeHaAlarms.Count) active HA alarm(s) found!"
    $activeHaAlarms | Select-Object `
        triggeredAlarmId,
        name,
        status,
        triggeredTime,
        @{N="Object"; E={$_.alarmAssignment.objectName}},
        description |
        Format-Table -AutoSize -Wrap
}

Step 4 — Resolve an Alarm Programmatically
#

After automated or manual remediation, acknowledge and resolve an alarm by its ID.

function Resolve-VeeamOneAlarm {
    param(
        [Parameter(Mandatory)][int]    $TriggeredAlarmId,
        [Parameter(Mandatory)][string] $Comment
    )

    $body = @{ comment = $Comment } | ConvertTo-Json

    Invoke-RestMethod `
        -Method  POST `
        -Uri     "$baseUri/api/v2.2/alarms/triggered/$TriggeredAlarmId/resolve" `
        -Headers $headers `
        -Body    $body | Out-Null

    Write-Host "✅ Alarm ID $TriggeredAlarmId resolved." -ForegroundColor Green
}

# Example usage — resolve alarm after confirming HA sync restored
Resolve-VeeamOneAlarm -TriggeredAlarmId 42 -Comment "HA sync lag resolved after primary node restart at $(Get-Date -Format 'yyyy-MM-dd HH:mm')"

Step 5 — Configure an Alarm Script Action (Run on Trigger)
#

Veeam ONE allows a Run Script action on any alarm. This is how you fire a custom PowerShell handler — for example, to create a ServiceNow/Jira ticket or send a Teams webhook — when an HA alarm triggers. Configure this in the alarm’s Actions tab in the GUI, or template it with the snippet below as your notification script.

# save as: C:\Scripts\VeeamONE-HAAlarm-Handler.ps1
# Veeam ONE passes: %1=AlarmName %2=ObjectName %3=AlarmStatus %4=Description %5=DateTime
param(
    [string]$AlarmName    = $args,
    [string]$ObjectName   = $args,[1]
    [string]$AlarmStatus  = $args,[2]
    [string]$Description  = $args,[3]
    [string]$TriggeredAt  = $args[4]
)

$logPath   = "C:\Logs\VeeamONE-HAAlarms.log"
$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

# Log the alarm locally
"[$timestamp] ALARM: $AlarmName | Object: $ObjectName | Status: $AlarmStatus | $Description" |
    Out-File -FilePath $logPath -Append -Encoding UTF8

# Send a Teams webhook (replace URI with your connector URL)
$teamsWebhook = "https://outlook.office.com/webhook/YOUR-WEBHOOK-URI"

$teamsBody = @{
    "@type"      = "MessageCard"
    "@context"   = "http://schema.org/extensions"
    "summary"    = "Veeam ONE HA Alarm"
    "themeColor" = if ($AlarmStatus -eq "Error") { "FF0000" } else { "FFA500" }
    "title"      = "🔔 Veeam ONE: $AlarmName"
    "sections"   = @(@{
        "facts" = @(
            @{ "name" = "Object";    "value" = $ObjectName   }
            @{ "name" = "Status";    "value" = $AlarmStatus  }
            @{ "name" = "Time";      "value" = $TriggeredAt  }
            @{ "name" = "Details";   "value" = $Description  }
        )
    })
} | ConvertTo-Json -Depth 5

try {
    Invoke-RestMethod -Uri $teamsWebhook -Method POST -Body $teamsBody -ContentType "application/json"
    "[$timestamp] Teams notification sent." | Out-File -FilePath $logPath -Append -Encoding UTF8
} catch {
    "[$timestamp] ERROR sending Teams notification: $_" | Out-File -FilePath $logPath -Append -Encoding UTF8
}

In the alarm’s Actions tab, set the Run script action value to:

powershell.exe -ExecutionPolicy Bypass -File “C:\Scripts…” ‘%1’ ‘%2’ ‘%3’ ‘%4’ ‘%5’

Step 6 — Full Alarm Audit Report (Scheduled)
#

Run this weekly via Windows Task Scheduler to produce a CSV audit of all HA alarm activity for your change management records.

$reportPath = "C:\Reports\VeeamONE-HAAlarm-Audit-$(Get-Date -Format 'yyyyMMdd').csv"

$allTriggered = Invoke-RestMethod `
    -Method  GET `
    -Uri     "$baseUri/api/v2.2/alarms/triggered?limit=500" `
    -Headers $headers

$report = $allTriggered.items |
    Where-Object { $_.name -match "HA|Backup Server|Cluster|Failover|Sync" } |
    Select-Object `
        triggeredAlarmId,
        name,
        status,
        triggeredTime,
        @{N="ObjectName"; E={$_.alarmAssignment.objectName}},
        @{N="ObjectType"; E={$_.alarmAssignment.objectType}},
        repeatCount,
        description

$report | Export-Csv -Path $reportPath -NoTypeInformation -Encoding UTF8
Write-Host "📄 Report saved to: $reportPath" -ForegroundColor Cyan

Dashboard Configuration
#

Veeam ONE Monitor’s Business View is ideal for building an HA cluster dashboard.

Navigate to Business View and create a new Business Group called VSA HA Cluster
Add both backup server objects and their underlying VMs to the group
Pin the following widgets to the group dashboard:
- Protected VMs — confirms all VMs remain protected post-failover
- Backup Infrastructure Health — overall status of both nodes
- Active Alarms — filtered to the cluster group
- Job Session History (Last 24h) — quickly spot failed or missed jobs
Save and set the dashboard as a favorite for quick access

In Veeam ONE Reporter, schedule a weekly Infrastructure Assessment report scoped to the cluster group and email it to stakeholders. This gives a historical record of any failover events, alarm activity, and resource utilization trends.

Testing Your Monitoring Setup
#

Before you rely on these alarms in production, validate them with a controlled test:

Simulate a failover using the Veeam Backup & Replication console (or by gracefully shutting down the primary node in a maintenance window)
Confirm the HA Cluster Node Failover Detected alarm fires within the expected timeframe
Verify notification emails/webhooks and Teams messages are delivered
Check that the secondary node now shows as Active in Veeam ONE
Restore the primary node and confirm HA Cluster Sync Status returns to green once replication catches up

Document the alarm-to-notification latency you observe — this becomes your baseline SLA for incident response.

Best Practices
#

Never rely solely on Veeam ONE for HA cluster monitoring; integrate it with your broader monitoring stack (Zabbix, PRTG, Azure Monitor) via SNMP traps or syslog forwarding
Review alarm thresholds quarterly — resource baselines shift as your environment grows, and stale thresholds lead to alarm fatigue
Keep Veeam ONE updated alongside VBR — newer versions introduce improved HA cluster awareness and alarm coverage
Test failover scenarios at least twice a year and validate that monitoring correctly reflects the post-failover state
Store the REST API token securely — use Windows Credential Manager or a secrets manager rather than hardcoding credentials in scripts
Use Business View grouping consistently so reports and dashboards remain accurate even as you add nodes or migrate VMs

What Is the Veeam VSA HA Cluster?#

Architecture Overview#

Connecting Veeam ONE to the HA Cluster#

Alarm Lifecycle#

Key Alarms to Configure#

Backup Server Health Alarms#

HA Cluster Node Alarms#

VM Resource Alarms#

Backup Job Alarms#

Alarm Severity Flow#

Scoping Alarms to the Cluster#

Setting Up Notification Rules#

Automating Alarm Configuration with PowerShell#

Step 1 — Authenticate and Get a Token#

Step 2 — List All Alarm Templates (Find HA Alarm IDs)#

Step 3 — Query Currently Triggered Alarms#

Step 4 — Resolve an Alarm Programmatically#

Step 5 — Configure an Alarm Script Action (Run on Trigger)#

Step 6 — Full Alarm Audit Report (Scheduled)#

Dashboard Configuration#

Testing Your Monitoring Setup#

Best Practices#

What Is the Veeam VSA HA Cluster?
#

Architecture Overview
#

Connecting Veeam ONE to the HA Cluster
#

Alarm Lifecycle
#

Key Alarms to Configure
#

Backup Server Health Alarms
#

HA Cluster Node Alarms
#

VM Resource Alarms
#

Backup Job Alarms
#

Alarm Severity Flow
#

Scoping Alarms to the Cluster
#

Setting Up Notification Rules
#

Automating Alarm Configuration with PowerShell
#

Step 1 — Authenticate and Get a Token
#

Step 2 — List All Alarm Templates (Find HA Alarm IDs)
#

Step 3 — Query Currently Triggered Alarms
#

Step 4 — Resolve an Alarm Programmatically
#

Step 5 — Configure an Alarm Script Action (Run on Trigger)
#

Step 6 — Full Alarm Audit Report (Scheduled)
#

Dashboard Configuration
#

Testing Your Monitoring Setup
#

Best Practices
#