
Monitor Alarms and Events

When something of interest happens on individual devices in the overlay network, the devices create events and forward them to the vManage NMS. The vManage NMS filters the event notifications, correlates related events, and consolidates them into alarms. Alarms report only major or critical event notifications. They ignore less important events.

You can view alarms from the vManage Dashboard, by clicking the Alarm Bell icon in the top bar. In the Alarm Bell, the alarms are grouped into Active or Cleared. You can also view alarms from the vManage Monitor ► Alarms screen. You can view the individual events from the vManage Monitor ► Events screen.

You can also collect alarm information using the vManage REST APIs.

The following table explains the differences between alarms and events:

Attribute | Alarms | Events
Device source | vManage NMS | vEdge routers and vSmart controllers
Type of information | Global and device-specific | Per-device
Ways of extracting information | REST and bulk APIs | Netconf, REST bulk APIs, SNMP traps, syslog messages

It is recommended that you use alarms rather than events and notifications. The vManage NMS aggressively filters and aggregates events and notifications to provide you with a comprehensive and clear view of the events that are occurring. Manually gathering, filtering, and aggregating events on an individual device provides information only about that device.

This article describes each of the alarms generated by the vManage NMS and shows the REST and bulk API calls to use to collect alarm information from a vManage NMS.

Alarm States

vManage alarms are assigned a state based on their severity:

  • Critical (red)—Serious events that impair or shut down the operation of an overlay network function.
  • Major (yellow)—Serious events that affect, but do not shut down, the operation of a network function.
  • Medium (blue)—Events that might impair the performance of a network function.
  • Minor (green)—Events that might diminish the performance of a network function.

The alarms listed as Active generally have a severity of either critical or major.

When the notification events that the vManage NMS receives indicate that the alarm condition has passed, most alarms clear themselves automatically. The vManage NMS then lists the alarm as Cleared, and the alarm state generally changes to medium or minor.
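
The Active and Cleared grouping can be reproduced on alarm records you have already collected. The following is a minimal sketch, assuming each record is a dictionary that carries the active, severity_number, and rule_name_display fields described under Alarm Fields. The helper name group_alarms and the sample records are hypothetical and are not part of the vManage software.

```python
def group_alarms(alarms):
    """Split alarm records into active and cleared lists, most severe first."""
    # severity_number runs from 1 (critical) to 4 (minor), so an
    # ascending sort puts the most severe alarms at the front.
    by_severity = sorted(alarms, key=lambda a: a["severity_number"])
    active = [a for a in by_severity if a["active"]]
    cleared = [a for a in by_severity if not a["active"]]
    return active, cleared


# Made-up sample records; real records contain more fields.
sample = [
    {"rule_name_display": "OSPF router down", "severity_number": 3, "active": False},
    {"rule_name_display": "BFD TLOC down", "severity_number": 2, "active": True},
    {"rule_name_display": "BFD node down", "severity_number": 1, "active": True},
]

active, cleared = group_alarms(sample)
print([a["rule_name_display"] for a in active])
print([a["rule_name_display"] for a in cleared])
```

Sorting once before splitting keeps both lists in severity order without a second pass.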

Permanent Alarms

Permanent alarms are alarms that are permanently included in the vManage software. In addition to the alarm fields listed for each alarm in the table, all alarms show their start time.

Alarm | Severity | Cleared Automatically | Alarm Fields | Description
BFD node down | Critical | Yes | hostname, site-id, system-ip | A particular node has lost all its BFD sessions. It no longer has data connectivity to any other nodes in the network.
BFD site down | Critical | Yes | hostname, site-id, system-ip | All BFD sessions for all nodes in a particular site are down. Within the site, nodes may have BFD tunnels with each other, but they have no data connectivity with the rest of the network.
BFD TLOC down | Major | Yes | color, hostname, site-id, system-ip | All BFD sessions for a particular TLOC (transport interface) on a vEdge router with multiple TLOCs are down.
BGP router down | Critical | Yes | hostname, router-id, system-ip, vpn-id | All BGP peering sessions from a particular BGP router to all its BGP peers on other vEdge routers are down.
Control all vSmarts down | Critical | Yes | hostname, site-id, system-ip | All control connections for all vSmart controllers in the network are down. This is a catastrophic event, because the entire control plane is down.
Control node down (vEdge) | Critical | Yes | hostname, site-id, system-ip | A vEdge router has lost all its control connections and no longer has control plane connectivity to any other node in the network.
Control site down | Critical | Yes | hostname, site-id, system-ip | All control connections for all nodes in a particular site are down. Within the site, nodes may have control connections with each other, but they have no control plane connectivity with the rest of the network.
Control TLOC down | Major | Yes | color, hostname, site-id, system-ip | All control connections for a particular TLOC (transport interface) on a vEdge router with multiple TLOCs are down.
Control vSmart down | Critical | Yes | hostname, site-id, system-ip | A vSmart controller has lost all its control connections and no longer has control plane connectivity to any other node in the network.
OMP all vSmarts down | Critical | Yes | hostname, site-id, system-ip | All OMP sessions for all vSmart controllers in the network are down. This is a catastrophic event, because the entire control plane is down. You will receive no information from OMP.
OMP node down | Critical | Yes | hostname, site-id, system-ip | A vEdge router or vSmart controller has lost all its control connections and no longer has control plane connectivity to any other node in the network, and it will receive no information from OMP.
OMP site down | Critical | Yes | hostname, site-id, system-ip | All control connections for all nodes in a particular site are down. Within the site, nodes may have control connections with each other, but they have no control plane connectivity with the rest of the network, and they will receive no information from OMP.
OSPF router down | Critical | Yes | hostname, router-id, system-ip, vpn-id | All OSPF peering sessions from a particular OSPF router to all its OSPF peers on other vEdge routers are down.
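
The difference between the TLOC-level and node-level BFD alarms in the table comes down to how many of a router's sessions are down. The sketch below is a toy illustration of that distinction only. It does not reflect how the vManage NMS actually correlates events, and the function name classify_bfd and the input shape (a mapping from TLOC color to a list of session states, with True meaning up) are assumptions made for the example.

```python
def classify_bfd(sessions_by_color):
    """Return the table's alarm names that would describe one router.

    sessions_by_color maps a TLOC color to a list of booleans,
    one per BFD session on that TLOC (True = session up).
    """
    # Only TLOCs that actually have sessions can be judged up or down.
    with_sessions = [c for c, states in sessions_by_color.items() if states]
    down_colors = [c for c in with_sessions if not any(sessions_by_color[c])]

    # Every session on every TLOC is down: the node has lost all BFD sessions.
    if with_sessions and len(down_colors) == len(with_sessions):
        return ["BFD node down"]
    # Some, but not all, TLOCs on a multi-TLOC router are fully down.
    if len(with_sessions) > 1 and down_colors:
        return ["BFD TLOC down (%s)" % c for c in sorted(down_colors)]
    return []


print(classify_bfd({"mpls": [False, False], "biz-internet": [True, False]}))
print(classify_bfd({"mpls": [False], "biz-internet": [False, False]}))
```

A TLOC with one session still up is not reported, which matches the table's wording that all sessions for the TLOC must be down.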

Experimental Alarms

Experimental alarms are alarms that the vManage server might display, but that are likely to be removed in a future release. Following is a list of the experimental alarms as of Release 17.1. All alarms are self-explanatory except as indicated.

  • aaa-admin-pwd-change
  • control-vbond-state-change
  • cpu-load
  • device-activation-failed
  • device-upgrade-failed
  • dhcp-server-state-change
  • disk-usage—Disk usage on the device has exceeded the predefined threshold of 60 percent
  • domain-id-change
  • interface-state-change
  • mem-usage
  • omp-state-change
  • org-name-change
  • ospf-interface-state-change
  • pim-interface-state-change
  • process-restart
  • pseudo-commit-status
  • security-clear-installed-certificate
  • security-new-csr-generated
  • security-root-cert-chain-installed
  • security-root-cert-chain-uninstalled
  • security-vedge-serial-file-uploaded
  • security-vsmart-serial-file-uploaded
  • site-id-change
  • system-ip-change
  • system-ip-reuse
  • system-reboot-issued
  • template-rollback
  • ztp-upgrade-failed

Alarm Fields

Alarms can contain the following fields:

Field | Description
acknowledged | Whether the alarm has been viewed and acknowledged. This field allows the vManage NMS to distinguish between alarms that have already been reported and those that have not yet been addressed. To acknowledge an alarm, use the following API POST call: https://vmanage-ip-address:8443/dataservice/alarms/markviewed. Specify the data as: {"uuid": [<uuids of alarms to acknowledge>]}
active | Whether the alarm is still active. For alarms that are automatically cleared, when a network element recovers, the alarm is marked as "active":false.
cleared_time | Time when the alarm was cleared. This field is present only for alarms whose "active" field is false.
devices | List of system IP addresses or router IDs of the affected devices.
entry_time | Time when the alarm was raised, in milliseconds, expressed in UNIX time.
message | Short message that describes the alarm.
possible_causes | Possible causes for the event.
rule_name_display | Name of the alarm. Use this name when querying for alarms of a particular type.
severity | Severity of the alarm: critical, major, medium, minor.
severity_number | Integer value for the severity: 1 (critical), 2 (major), 3 (medium), 4 (minor).
uuid | Unique identifier for the alarm.
values | Set of values for all the affected devices. These values, which are different for each alarm, are in addition to those shown in the "devices" field.
values_short_display | Subset of the values field that provides a summary of the affected network devices.
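
As an illustration of the acknowledged and entry_time fields, the following Python sketch builds the POST request for the markviewed call and converts an entry_time value to a readable timestamp. It uses only the URL and payload shape given in the acknowledged field description. The host name, the UUID value, and the Content-Type header are assumptions, and authentication is not shown, so the request is constructed but deliberately not sent.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_ack_request(vmanage_host, alarm_uuids):
    """Build (but do not send) the POST request that marks alarms as viewed."""
    url = "https://%s:8443/dataservice/alarms/markviewed" % vmanage_host
    body = json.dumps({"uuid": list(alarm_uuids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the API expects a JSON body; the article does not
        # state which headers are required.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def entry_time_to_datetime(entry_time_ms):
    """Convert entry_time (UNIX time in milliseconds) to a UTC datetime."""
    return datetime.fromtimestamp(entry_time_ms / 1000, tz=timezone.utc)


# Hypothetical host name and UUID, for illustration only.
req = build_ack_request("vmanage.example.com", ["hypothetical-uuid-1"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
print(entry_time_to_datetime(1483228800000).isoformat())
```

Dividing by 1000 matters here: passing the raw millisecond value to a function that expects seconds yields a date far in the future or an out-of-range error.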

Revision History

Introduced in vManage NMS in Release 16.2.
In Release 17.1, updated the list of permanent and experimental alarms.
