Cisco SD-WAN controller troubleshooting

There are many things to configure when building a Cisco SD-WAN lab. This note provides an overview of everything to check when you are configuring your vManage, vSmart, and vBond controllers.

For vEdge troubleshooting, take a look at:

Cisco SD-WAN vEdge Troubleshooting

The controllers in this example have these IP addresses:

  • vManage: 20.1.0.1
  • vBond: 20.1.0.2
  • vSmart: 20.1.0.3

VPN 0

Because so many things can go wrong in a nested virtualization setup (running EVE-NG on a hypervisor like ESX or Proxmox), make sure the controllers can reach each other:

vManage1# ping 20.1.0.2 count 3
Ping in VPN 0
PING 20.1.0.2 (20.1.0.2) 56(84) bytes of data.
64 bytes from 20.1.0.2: icmp_seq=1 ttl=64 time=24.7 ms
64 bytes from 20.1.0.2: icmp_seq=2 ttl=64 time=24.2 ms
64 bytes from 20.1.0.2: icmp_seq=3 ttl=64 time=5.42 ms

vManage1# ping 20.1.0.3 count 3
Ping in VPN 0
PING 20.1.0.3 (20.1.0.3) 56(84) bytes of data.
64 bytes from 20.1.0.3: icmp_seq=1 ttl=64 time=1.64 ms
64 bytes from 20.1.0.3: icmp_seq=2 ttl=64 time=1.30 ms
64 bytes from 20.1.0.3: icmp_seq=3 ttl=64 time=1.26 ms

If the pings fail, check the VPN 0 settings on each controller:

vManage1# show run vpn 0
vpn 0
 interface eth0
  ip address 20.1.0.1/24
  ipv6 dhcp-client
  tunnel-interface
   allow-service all
   allow-service dhcp
   allow-service dns
   allow-service icmp
   no allow-service sshd
   no allow-service netconf
   no allow-service ntp
   no allow-service stun
   allow-service https
  !
  no shutdown

vBond1# show run vpn 0
vpn 0
 interface ge0/0
  ip address 20.1.0.2/24
  ipv6 dhcp-client
  tunnel-interface
   encapsulation ipsec
   allow-service all
   no allow-service bgp
   allow-service dhcp
   allow-service dns
   allow-service icmp
   no allow-service sshd
   no allow-service netconf
   no allow-service ntp
   no allow-service ospf
   no allow-service stun
   allow-service https
  !
  no shutdown

vSmart1# show run vpn 0
vpn 0
 interface eth0
  ip address 20.1.0.3/24
  ipv6 dhcp-client
  tunnel-interface
   allow-service all
   allow-service dhcp
   allow-service dns
   allow-service icmp
   no allow-service sshd
   no allow-service netconf
   no allow-service ntp
   no allow-service stun
  !
  no shutdown
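
In this lab all three controllers appear to sit on the same segment, so a quick offline sanity check is to confirm that the VPN 0 addresses share one subnet and that no address is used twice. The following is a minimal Python sketch of that idea, not a Cisco tool; the dictionary and the function names are hypothetical, and the addresses are copied by hand from the `show run vpn 0` output above.

```python
import ipaddress

# VPN 0 interface addresses, copied by hand from the controller configs (assumption:
# all controllers are expected to be on one directly connected segment).
vpn0_addresses = {
    "vManage1": "20.1.0.1/24",
    "vBond1": "20.1.0.2/24",
    "vSmart1": "20.1.0.3/24",
}


def find_subnet_mismatches(addresses):
    """Return hostnames whose subnet differs from the first controller's subnet."""
    interfaces = {name: ipaddress.ip_interface(addr) for name, addr in addresses.items()}
    reference = next(iter(interfaces.values())).network
    return sorted(name for name, iface in interfaces.items() if iface.network != reference)


def find_duplicate_ips(addresses):
    """Return a dict of IP address -> hostnames for addresses used more than once."""
    seen = {}
    for name, addr in addresses.items():
        ip = str(ipaddress.ip_interface(addr).ip)
        seen.setdefault(ip, []).append(name)
    return {ip: names for ip, names in seen.items() if len(names) > 1}


print(find_subnet_mismatches(vpn0_addresses))  # empty list when all share one subnet
print(find_duplicate_ips(vpn0_addresses))      # empty dict when every IP is unique
```

A wrong prefix length or a copy-pasted address is easy to miss when reading three configs side by side, which is the only reason to script something this small.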

Time

The clock has to match on all controllers, because certificate validity checks depend on the current time. See:

Cisco SD-WAN clock command

System Configuration

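These items have to be set correctly on every controller. The organization name and vBond address must be identical everywhere, while each controller needs its own unique system IP: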

  • System-IP
  • Site-ID
  • Organization-Name
  • vBond address

This is how you can verify them.

Running Configuration

You can limit the running configuration to the system settings:

vManage1# show run system
system
 host-name vManage1
 system-ip 172.16.1.101
 site-id 1
 admin-tech-on-failure
 sp-organization-name nwl-sdwan-lab
 organization-name nwl-sdwan-lab
 vbond 20.1.0.2

vBond1# show run system
system
 host-name vBond1
 system-ip 172.16.1.102
 site-id 1
 admin-tech-on-failure
 no route-consistency-check
 organization-name nwl-sdwan-lab
 vbond 20.1.0.2 local

vSmart1# show run system
system
 host-name vSmart1
 system-ip 172.16.1.103
 site-id 1
 admin-tech-on-failure
 organization-name nwl-sdwan-lab
 vbond 20.1.0.2
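
Comparing three system blocks by eye gets tedious, so here is a rough Python sketch that does it on pasted `show run system` output. It assumes one keyword per line, as the CLI prints it. The function names and the exact rules (identical organization-name and vbond address, unique system-ip, site-id present) are my own reading of the checklist above, not an official validation.

```python
def parse_system(output):
    """Map each keyword in 'show run system' output to its list of arguments."""
    settings = {}
    for line in output.splitlines():
        words = line.split()
        # Skip blank lines, the 'system' header, and '!' separators.
        if not words or words == ["system"] or words[0] == "!":
            continue
        settings[words[0]] = words[1:]
    return settings


def first_arg(settings, key):
    """Return the first argument of a keyword, or None if it is absent."""
    args = settings.get(key)
    return args[0] if args else None


def check_system_settings(configs):
    """configs maps hostname -> parsed settings. Returns a list of problem strings."""
    problems = []
    for key in ("organization-name", "vbond"):
        values = {first_arg(s, key) for s in configs.values()}
        if len(values) != 1 or None in values:
            problems.append(f"{key} differs or is missing: {sorted(map(str, values))}")
    system_ips = [first_arg(s, "system-ip") for s in configs.values()]
    if None in system_ips or len(set(system_ips)) != len(system_ips):
        problems.append(f"system-ip missing or duplicated: {system_ips}")
    for name, s in configs.items():
        if first_arg(s, "site-id") is None:
            problems.append(f"{name}: site-id is not set")
    return problems


# Sample input: the relevant lines from the three controllers in this lab.
VMANAGE = """system
 host-name vManage1
 system-ip 172.16.1.101
 site-id 1
 organization-name nwl-sdwan-lab
 vbond 20.1.0.2
"""
VBOND = """system
 host-name vBond1
 system-ip 172.16.1.102
 site-id 1
 organization-name nwl-sdwan-lab
 vbond 20.1.0.2 local
"""
VSMART = """system
 host-name vSmart1
 system-ip 172.16.1.103
 site-id 1
 organization-name nwl-sdwan-lab
 vbond 20.1.0.2
"""

configs = {
    "vManage1": parse_system(VMANAGE),
    "vBond1": parse_system(VBOND),
    "vSmart1": parse_system(VSMART),
}
print(check_system_settings(configs))  # an empty list means no problems were found
```

Only the first argument of each keyword is compared, so the `local` flag on the vBond's own `vbond` line does not count as a mismatch.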

Show control local-properties

You can also use show control local-properties, which gives a bit more information:

vManage1# show control local-properties
personality                   vmanage
sp-organization-name          nwl-sdwan-lab
organization-name             nwl-sdwan-lab
root-ca-chain-status          Installed

certificate-status            Installed
certificate-validity          Valid
certificate-not-valid-before  Aug 11 12:09:05 2021 GMT
certificate-not-valid-after   Aug 11 12:09:05 2026 GMT

dns-name                      20.1.0.2
site-id                       1
domain-id                     0
protocol                      dtls
tls-port                      23456
system-ip                     172.16.1.101
chassis-num/unique-id         5cc3dc32-4eeb-42ff-a519-3c93bce77b07
serial-num                    E465F7187613ECAA
subject-serial-num            N/A
cloud-hosted                  no
token                         -NA-
retry-interval                0:00:00:15
no-activity-exp-interval      0:00:00:20
dns-cache-ttl                 0:00:02:00
port-hopped                   FALSE
time-since-last-port-hop      0:00:00:00
cdb-locked                    false

number-vbond-peers            1

INDEX   IP                                      PORT
-----------------------------------------------------
0       20.1.0.2                                12346

number-active-wan-interfaces  4

                     PUBLIC    PUBLIC  PRIVATE   PRIVATE  PRIVATE                         LAST
INSTANCE  INTERFACE  IPv4      PORT    IPv4      IPv6     PORT     VS/VM  COLOR    STATE  CONNECTION
----------------------------------------------------------------------------------------------------
0         eth0       20.1.0.1  12346   20.1.0.1  ::       12346    1/0    default  up     0:00:00:09
1         eth0       20.1.0.1  12446   20.1.0.1  ::       12446    0/0    default  up     0:00:00:09
2         eth0       20.1.0.1  12546   20.1.0.1  ::       12546    0/0    default  up     0:00:00:10
3         eth0       20.1.0.1  12646   20.1.0.1  ::       12646    0/0    default  up     0:00:00:10
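
If you collect this output from several controllers, a few lines of Python can point out the fields that are not in the state you want. This is only a sketch: the `EXPECTED` table and the function name are hypothetical, and the expected values are simply the ones shown in the healthy output above.

```python
# Fields to check and the value each one shows on a healthy controller in this lab.
EXPECTED = {
    "root-ca-chain-status": "Installed",
    "certificate-status": "Installed",
    "certificate-validity": "Valid",
}


def find_unexpected(output, expected=EXPECTED):
    """Return {field: actual value} for fields that differ from the expected value.

    A field that does not appear in the output at all is reported with None.
    """
    found = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2 and parts[0] in expected:
            found[parts[0]] = parts[1].strip()
    return {key: found.get(key) for key in expected if found.get(key) != expected[key]}


# Sample input: the first lines of the vManage output from this lab.
sample = """personality                   vmanage
organization-name             nwl-sdwan-lab
root-ca-chain-status          Installed
certificate-status            Installed
certificate-validity          Valid
"""
print(find_unexpected(sample))  # an empty dict means every checked field matched
```

The tables at the end of the output are ignored on purpose, because only the keyword lines listed in `EXPECTED` are inspected.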

Certificates

Make sure each controller has the required certificates:

  • Installed root CA certificate
  • Installed server certificate

You can use the show certificate command to see a lot of details:

vManage1# show certificate ?
Possible completions:
  installed              Display installed certificate
  reverse-proxy          Display installed reverse-proxy certificate
  root-ca-cert           Display installed root certificate
  serial                 Display installed certificate serial
  signing-request        Display certificate signing request
  umbrella-root-ca-cert  Display installed Umbrella root certificate
  validity               Display installed certificate validity
  vmanage-root-ca-cert   Display installed root certificate

Root certificate

You can see whether this is installed using this command:

vManage1# show control local-properties root-ca-chain-status
control local-properties root-ca-chain-status Installed

For a more detailed look:

vManage1# show certificate root-ca-cert | more
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            18:da:d1:9e:26:7d:e8:bb:4a:21:58:cd:cc:6b:3b:4a
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, O=VeriSign, Inc., OU=VeriSign Trust Network, OU=(c) 2006 VeriSign, Inc. - For authorized use only, CN=VeriSign Class 3 Public Primary Certification Authority - G5
        Validity
            Not Before: Nov 8 00:00:00 2006 GMT
            Not After : Jul 16 23:59:59 2036 GMT
        Subject: C=US, O=VeriSign, Inc., OU=VeriSign Trust Network, OU=(c) 2006 VeriSign, Inc. - For authorized use only, CN=VeriSign Class 3 Public Primary Certification Authority - G5
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:af:24:08:08:29:7a:35:9e:60:0c:aa:e7:4b:3b:
                    4e:dc:7c:bc:3c:45:1c:bb:2b:e0:fe:29:02:f9:57:
                    08:a3:64:85:15:27:f5:f1:ad:c8:31:89:5d:22:e8:
                    2a:aa:a6:42:b3:8f:f8:b9:55:b7:b1:b7:4b:b3:fe:
                    8f:7e:07:57:ec:ef:43:db:66:62:15:61:cf:60:0d:
                    a4:d8:de:f8:e0:c3:62:08:3d:54:13:eb:49:ca:59:
                    54:85:26:e5:2b:8f:1b:9f:eb:f5:a1:91:c2:33:49:
                    d8:43:63:6a:52:4b:d2:8f:e8:70:51:4d:d1:89:69:
                    7b:c7:70:f6:b3:dc:12:74:db:7b:5d:4b:56:d3:96:
                    bf:15:77:a1:b0:f4:a2:25:f2:af:1c:92:67:18:e5:
                    f4:06:04:ef:90:b9:e4:00:e4:dd:3a:b5:19:ff:02:
                    ba:f4:3c:ee:e0:8b:eb:37:8b:ec:f4:d7:ac:f2:f6:
                    f0:3d:af:dd:75:91:33:19:1d:1c:40:cb:74:24:19:
                    21:93:d9:14:fe:ac:2a:52:c7:8f:d5:04:49:e4:8d:
                    63:47:88:3c:69:83:cb:fe:47:bd:2b:7e:4f:c5:95:
                    ae:0e:9d:d4:d1:43:c0:67:73:e3:14:08:7e:e5:3f:
                    9f:73:b8:33:0a:cf:5d:3f:34:87:96:8a:ee:53:e8:
                    25:15
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign
[output omitted]

Server Certificate

To display the installed server certificate, use this command:

vManage1# show certificate installed | more
Server certificate
------------------
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            e4:65:f7:18:76:13:ec:aa
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=NL, ST=NL, O=nwl-lab-sdwan, CN=vmanage1.lab.nwl.ai
        Validity
            Not Before: Aug 11 12:09:05 2021 GMT
            Not After : Aug 11 12:09:05 2026 GMT
        Subject: C=US, ST=California, L=San Jose, OU=nwl-sdwan-lab, O=Viptela LLC, CN=vmanage-5cc3dc32-4eeb-42ff-a519-3c93bce77b07-0.viptela.com/emailAddress=support@viptela.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)

A quick way to check the server certificate is:

vManage1# show control local-properties | include certificate
certificate-status            Installed
certificate-validity          Valid
certificate-not-valid-before  Aug 11 12:09:05 2021 GMT
certificate-not-valid-after   Aug 11 12:09:05 2026 GMT

vBond1# show control local-properties | include certificate
certificate-status            Installed
certificate-validity          Valid
certificate-not-valid-before  Aug 11 12:25:44 2021 GMT
certificate-not-valid-after   Aug 11 12:25:44 2026 GMT

vSmart1# show control local-properties | include certificate
certificate-status            Installed
certificate-validity          Valid
certificate-not-valid-before  Aug 11 19:28:37 2021 GMT
certificate-not-valid-after   Aug 11 19:28:37 2026 GMT
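
The controllers already report `certificate-validity`, but in a long-running lab it can be handy to know how many days are left. Below is a small Python sketch that reads the `certificate-not-valid-after` line from pasted output. The function names are hypothetical, and the timestamp format is assumed to be exactly the one shown above (`Aug 11 12:09:05 2026 GMT`).

```python
from datetime import datetime, timezone


def parse_cert_time(value):
    """Parse a timestamp such as 'Aug 11 12:09:05 2026 GMT' into an aware datetime."""
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def days_until_expiry(output, now=None):
    """Return whole days until 'certificate-not-valid-after' (negative once expired)."""
    now = now or datetime.now(timezone.utc)
    for line in output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key == "certificate-not-valid-after":
            return (parse_cert_time(value.strip()) - now).days
    raise ValueError("certificate-not-valid-after not found in output")


# Sample input: the vManage lines from this lab.
sample = """certificate-status            Installed
certificate-validity          Valid
certificate-not-valid-before  Aug 11 12:09:05 2021 GMT
certificate-not-valid-after   Aug 11 12:09:05 2026 GMT
"""
print(days_until_expiry(sample))  # the result depends on today's date
```

The optional `now` argument exists so you can evaluate the certificate against a controller's own clock instead of the clock of the machine running the script.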

Connection History

Last but not least, you can check the history of connection attempts between the controllers:

vSmart1# show control connections-history

Legend for Errors
ACSRREJ     - Challenge rejected by peer.
NOVMCFG     - No cfg in vmanage for device.
BDSGVERFL   - Board ID Signature Verify Failure.
NOZTPEN     - No/Bad chassis-number entry in ZTP.
BIDNTPR     - Board ID not Initialized.
OPERDOWN    - Interface went oper down.
BIDNTVRFD   - Peer Board ID Cert not verified.
ORPTMO      - Server's peer timed out.
BIDSIG      - Board ID signing failure.
RMGSPR      - Remove Global saved peer.
CERTEXPRD   - Certificate Expired
RXTRDWN     - Received Teardown.
CRTREJSER   - Challenge response rejected by peer.
RDSIGFBD    - Read Signature from Board ID failed.
CRTVERFL    - Fail to verify Peer Certificate.
SERNTPRES   - Serial Number not present.
CTORGNMMIS  - Certificate Org name mismatch.
SSLNFAIL    - Failure to create new SSL context.
DCONFAIL    - DTLS connection failure.
STNMODETD   - Teardown extra vBond in STUN server mode.
DEVALC      - Device memory Alloc failures.
SYSIPCHNG   - System-IP changed.
DHSTMO      - DTLS HandShake Timeout.
SYSPRCH     - System property changed
DISCVBD     - Disconnect vBond after register reply.
TMRALC      - Timer Object Memory Failure.
DISTLOC     - TLOC Disabled.
TUNALC      - Tunnel Object Memory Failure.
DUPCLHELO   - Recd a Dup Client Hello, Reset Gl Peer.
TXCHTOBD    - Failed to send challenge to BoardID.
DUPSER      - Duplicate Serial Number.
UNMSGBDRG   - Unknown Message type or Bad Register msg.
DUPSYSIPDEL - Duplicate System IP.
UNAUTHEL    - Recd Hello from Unauthenticated peer.
HAFAIL      - SSL Handshake failure.
VBDEST      - vDaemon process terminated.
IP_TOS      - Socket Options failure.
VECRTREV    - vEdge Certification revoked.
LISFD       - Listener Socket FD Error.
VSCRTREV    - vSmart Certificate revoked.
MGRTBLCKD   - Migration blocked. Wait for local TMO.
VB_TMO      - Peer vBond Timed out.
MEMALCFL    - Memory Allocation Failure.
VM_TMO      - Peer vManage Timed out.
NOACTVB     - No Active vBond found to connect.
VP_TMO      - Peer vEdge Timed out.
NOERR       - No Error.
VS_TMO      - Peer vSmart Timed out.
NOSLPRCRT   - Unable to get peer's certificate.
XTVMTRDN    - Teardown extra vManage.
NEWVBNOVMNG - New vBond with no vMng connections.
XTVSTRDN    - Teardown extra vSmart.
NTPRVMINT   - Not preferred interface to vManage.
STENTRY     - Delete same tloc stale entry.
HWCERTREN   - Hardware vEdge Enterprise Cert Renewed
HWCERTREV   - Hardware vEdge Enterprise Cert Revoked.
EMBARGOFAIL - Embargo check failed

                                                                 PEER                 PEER
          PEER   PEER      PEER        SITE  DOMAIN  PEER        PRIVATE  PEER        PUBLIC                              LOCAL     REMOTE  REPEAT
INSTANCE  TYPE   PROTOCOL  SYSTEM IP   ID    ID      PRIVATE IP  PORT     PUBLIC IP   PORT    REMOTE COLOR     STATE      ERROR     ERROR   COUNT   DOWNTIME
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0         vbond  dtls      0.0.0.0     0     0       20.1.0.2    12346    20.1.0.2    12346   default          connect    DCONFAIL  NOERR   1       2024-02-07T08:24:58+0000
1         vedge  dtls      172.16.1.1  2     1       20.1.0.101  12346    20.1.0.101  12346   public-internet  tear_down  VP_TMO    NOERR   0       2024-02-07T08:30:35+0000
1         vbond  dtls      0.0.0.0     0     0       20.1.0.2    12346    20.1.0.2    12346   default          connect    DCONFAIL  NOERR   1       2024-02-07T08:24:59+0000
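
When the history grows long, it helps to count which errors occur against which peer type. The Python sketch below does that on pasted output. It assumes every data row has the 16 whitespace-separated fields shown above and starts with the instance number; the function name is hypothetical and the sample rows are the three from this lab.

```python
from collections import Counter


def summarize_errors(output):
    """Count (peer type, local error) pairs in 'show control connections-history' output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Legend, header, and separator lines are skipped: only data rows have
        # 16 fields and begin with a numeric instance.
        if len(fields) == 16 and fields[0].isdigit():
            peer_type, local_error = fields[1], fields[12]
            counts[(peer_type, local_error)] += 1
    return counts


# Sample input: the three history rows from vSmart1.
sample = """
0 vbond dtls 0.0.0.0 0 0 20.1.0.2 12346 20.1.0.2 12346 default connect DCONFAIL NOERR 1 2024-02-07T08:24:58+0000
1 vedge dtls 172.16.1.1 2 1 20.1.0.101 12346 20.1.0.101 12346 public-internet tear_down VP_TMO NOERR 0 2024-02-07T08:30:35+0000
1 vbond dtls 0.0.0.0 0 0 20.1.0.2 12346 20.1.0.2 12346 default connect DCONFAIL NOERR 1 2024-02-07T08:24:59+0000
"""

for (peer_type, error), count in summarize_errors(sample).most_common():
    print(f"{peer_type:6} {error:10} {count}")
```

The sketch counts rows only and ignores the REPEAT COUNT column, so a row that repeated many times still counts once.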

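The output starts with a legend of all possible error codes, followed by the recorded connection attempts. In this example, vSmart1 logged DCONFAIL (DTLS connection failure) towards the vBond and VP_TMO (peer vEdge timed out) for a vEdge. You can find a detailed explanation of all errors here: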

https://community.cisco.com/t5/networking-knowledge-base/sd-wan-routers-troubleshoot-control-connections/ta-p/3813237#toc-hId-559959951