Troubleshooting OSPF Neighbor Adjacency

mark.postma@versa-networks.com

Troubleshooting OSPF Neighbor Adjacencies

by Radu Pavaloiu

When an OSPF neighbor adjacency does not reach the FULL state, one of the following is true:

There’s no OSPF neighbor at all
It’s stuck in ATTEMPT
It’s stuck in INIT
It’s stuck in 2-WAY
It’s stuck in EXSTART/EXCHANGE
It’s stuck in LOADING
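
As a quick triage aid, the states listed above can be mapped to the causes walked through in the numbered scenarios of this article. The sketch below is only a lookup table built from those scenarios. The keys follow the state codes printed by show ospf neighbor; the key "none" (no neighbor listed) and the function name likely_causes are assumptions made for illustration, and the table is not exhaustive (ATTEMPT, for example, is not covered by any scenario here).

```python
from typing import Dict, List

# Lookup table built only from the scenarios in this article (not exhaustive).
# "none" is a hypothetical key meaning that no neighbor is listed at all.
LIKELY_CAUSES: Dict[str, List[str]] = {
    "none": [
        "OSPF not enabled on the interface (scenario 1)",
        "passive interface (scenario 3)",
        "subnet mask mismatch on a multi-access interface (scenario 6)",
    ],
    "down": [
        "Layer 2 down (scenario 2)",
        "area mismatch (scenario 8)",
        "authentication mismatch (scenario 9)",
    ],
    "init": ["OSPF multicast filtered on the peer (scenario 4)"],
    "2-way": ["network type mismatch (scenario 5)"],
    "exst": ["IP MTU mismatch (scenario 10)"],
    "load": ["corrupted or bad LS updates (scenario 11)"],
}


def likely_causes(state: str) -> List[str]:
    """Return the causes this article associates with a neighbor state code."""
    return LIKELY_CAUSES.get(state.strip().lower(), ["not covered in this article"])


if __name__ == "__main__":
    for cause in likely_causes("init"):
        print(cause)
```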

1. OSPF is not enabled on the interface

Let’s start with our first scenario. We have two OSPF routers, one of them is a Versa appliance:

Unfortunately, they do not become neighbors:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR
routing-instance Tenant1-LAN-VR
[ok][2020-06-26 01:28:32]

As you can see, the Versa appliance does not report the other router as a neighbor. So what could be wrong? Let’s look at the OSPF interfaces on the Versa appliance and on the peer:

admin@branch6-cli> show ospf interface brief
Codes for operation state (State):
BDR - backup designated router; DR - designated router;
ODR - other designated router; PTP - point to point
dwn - down; lbk - loopback; wtg - waiting
Interface Area DR ID BDR ID State
--------- ---- ----- ------ -----

IOU1#show ip ospf 1 interface brief | include Et0/1.10
Et0/1.10 1 0.0.0.0 192.168.254.10/30 10 P2P 0/0

The Versa appliance lists no OSPF interfaces, so OSPF is not enabled on vni-0/4.10, while it is running on the peer’s Et0/1.10. After OSPF is enabled on vni-0/4.10, the neighborship is established without any problem:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up
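
Every scenario in this article starts from the same neighbor table, so it can be handy to pick out the neighbors that are not in the full state automatically. The following is a minimal sketch that assumes the six-column row layout exactly as printed above (address, interface, state, neighbor ID, priority, op); the function names are hypothetical and the parsing is not based on any documented output format.

```python
import ipaddress
from typing import Dict, List

COLUMNS = ("address", "interface", "state", "neighbor_id", "priority", "op")


def parse_neighbors(output: str) -> List[Dict[str, str]]:
    """Extract neighbor rows from 'show ospf neighbor ... brief' text."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # legend, header, prompt or blank line
        try:
            ipaddress.ip_address(fields[0])  # data rows start with an IP address
        except ValueError:
            continue  # e.g. the dashed separator line
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def not_full(output: str) -> List[Dict[str, str]]:
    """Return only the neighbors whose state is something other than full."""
    return [row for row in parse_neighbors(output) if row["state"] != "full"]


SAMPLE = """\
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 init 1.1.1.1 1 up
"""

if __name__ == "__main__":
    for row in not_full(SAMPLE):
        print(row["interface"], row["neighbor_id"], row["state"])
```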

2. Layer 2 is down

OSPF adjacency is reported as down:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR
routing-instance Tenant1-LAN-VR
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 down 1.1.1.1 1 down

Checking the interface operational state and the OSPF operational state, we can see that both are reported as down:

admin@branch6-cli> show ospf interface extensive 192.168.254.9
Codes for operation state (State):
BDR - backup designated router; DR - designated router;
ODR - other designated router; PTP - point to point
dwn - down; lbk - loopback; wtg - waiting
Interface Area DR ID BDR ID State
--------- ---- ----- ------ -----
vni-0/4.10 0.0.0.0 none none dwn
Interface ID: 12
Admin state: enabled; MTU: 1500
Interface op state: down; OSPF op state: down <<<<<<
Link IP address: 192.168.254.9; Mask: 255.255.255.252
Area: 0.0.0.0; Router ID: 6.6.6.6; Network type: point-to-point
Instance ID: 3014; Cost: 1; Priority: 1
Number of OSPF interface state changes or error: 2
LSA count: 0; Checksum: 0x00000000
Timer intervals configured
Hello: 10 secs; Dead: 40 secs
Transit delay: 1 sec; Retransmit: 5 secs

Troubleshooting should start with checking unicast connectivity from the peer router:

IOU1#ping vrf Tenant1 192.168.254.9
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.254.9, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

After the L2 issue is fixed, unicast reachability between the peers is restored and the OSPF adjacency comes up:

IOU1#ping vrf Tenant1 192.168.254.9
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.254.9, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/2/3 ms

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR
routing-instance Tenant1-LAN-VR
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up

3. OSPF Passive Interface

Let’s look at the next issue: the same two routers, but a different problem:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
% No entries found.

As you can see, there is no OSPF neighbor. Checking whether OSPF is enabled on the interface shows that everything looks OK:

admin@branch6-cli> show ospf interface brief
Codes for operation state (State):
BDR - backup designated router; DR - designated router;
ODR - other designated router; PTP - point to point
dwn - down; lbk - loopback; wtg - waiting
Interface Area DR ID BDR ID State
--------- ---- ----- ------ -----
vni-0/4.10 0.0.0.0 none none PTP

However, if OSPF traffic is sniffed, we can see that only one router (the peer, 192.168.254.10) is sending OSPF hellos:

admin@branch6-cli> tcpdump vni-0/4 filter "host 224.0.0.5" | grep "vlan 10"
03:54:30.992335 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56
03:54:40.020335 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56
03:54:49.844133 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56
03:54:58.968134 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56
03:55:08.124232 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56

If an interface is configured as passive, its network is still advertised, but the interface does not send any OSPF hello packets, so no OSPF neighbor adjacency can form over it. After the issue is fixed, we can see OSPF hellos sent by both routers and the adjacency coming up:

04:07:20.624153 0c:e9:f6:f0:7f:05 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 86: vlan 10, p 0, ethertype IPv4, 192.168.254.9 > 224.0.0.5: OSPFv2, Hello, length 48
04:07:22.452133 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 98: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 60

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up
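
The one-sided capture in this scenario can also be checked mechanically. The sketch below counts which source addresses send OSPFv2 hellos to 224.0.0.5 in tcpdump text of the form shown above and reports the expected routers that stay silent. The regular expression is tailored to those sample lines, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches the "<source> > 224.0.0.5: OSPFv2, Hello" part of a tcpdump line.
_HELLO = re.compile(r"(\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.5: OSPFv2, Hello")


def hello_senders(capture: str) -> Counter:
    """Count OSPFv2 hellos per source IP address in tcpdump text output."""
    return Counter(_HELLO.findall(capture))


def silent_routers(capture: str, expected: Iterable[str]) -> List[str]:
    """Return the expected addresses that sent no hello in the capture."""
    seen = hello_senders(capture)
    return sorted(addr for addr in set(expected) if addr not in seen)


CAPTURE = (
    "03:54:30.992335 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), "
    "length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56\n"
    "03:54:40.020335 aa:bb:cc:00:01:10 > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), "
    "length 94: vlan 10, p 0, ethertype IPv4, 192.168.254.10 > 224.0.0.5: OSPFv2, Hello, length 56\n"
)

if __name__ == "__main__":
    # A router that should be sending hellos but is not may be passive.
    print(silent_routers(CAPTURE, ["192.168.254.9", "192.168.254.10"]))
```

A silent router is only a hint: a passive interface is one possible reason, and OSPF not being enabled on the interface (scenario 1) produces the same symptom.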

4. OSPF Multicast Filtering

In the next scenario we have the same two routers but a different issue. Again, the two routers do not establish the neighborship:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 init 1.1.1.1 1 up

IOU1#show ip ospf 1 neighbor

Interesting: the Versa device shows the OSPF neighbor in the INIT state, while its peer shows nothing. Since the Versa appliance shows the INIT state, we can conclude that it is receiving hellos from its peer. The peer does not show anything, so it is probably not receiving anything. OSPF uses hello packets to establish the neighbor adjacency, and these are sent to the 224.0.0.5 multicast address. Let’s check on the Versa device whether this address is reachable over vni-0/4.10:

admin@branch6-cli> ping 224.0.0.5 routing-instance Tenant1-LAN-VR interface vni-0/4.10 count 1
PING 224.0.0.5 (224.0.0.5) from 192.168.254.9 vni-0_4.10: 56(84) bytes of data.

--- 224.0.0.5 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

However, the peer is reachable via its unicast address:

admin@branch6-cli> ping 192.168.254.10 routing-instance Tenant1-LAN-VR count 1
PING 192.168.254.10 (192.168.254.10) 56(84) bytes of data.
64 bytes from 192.168.254.10: icmp_seq=1 ttl=255 time=1.58 ms

--- 192.168.254.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.582/1.582/1.582/0.000 ms

Checking the configuration of the Versa appliance’s peer, we can see that an inbound access list filters out packets sent to the OSPF multicast address:

interface Ethernet0/1.10
encapsulation dot1Q 10
ip vrf forwarding Tenant1
ip address 192.168.254.10 255.255.255.252
ip access-group 100 in <<<<<<
ip ospf network point-to-point
ip ospf 1 area 0.0.0.0

IOU1#show access-lists 100
Extended IP access list 100
10 deny ip any host 224.0.0.5 log-input (556 matches) <<<<<<
20 permit ip any any (6230 matches)
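
The reason the hellos are dropped while unicast pings succeed is that the access list is evaluated in sequence order and the first matching entry decides. The sketch below models that first-match behavior in a deliberately simplified way: it looks only at the destination address, and the AclEntry type and first_match function are hypothetical names, not part of any router API.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class AclEntry(NamedTuple):
    seq: int                          # sequence number of the entry
    action: str                       # "permit" or "deny"
    dst: ipaddress.IPv4Network        # destination the entry matches


def first_match(acl: List[AclEntry], dst_ip: str) -> Optional[AclEntry]:
    """Return the first entry (lowest sequence number) matching dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    for entry in sorted(acl, key=lambda e: e.seq):
        if addr in entry.dst:
            return entry
    return None  # nothing matched; a real ACL ends with an implicit deny


# Simplified model of access list 100 from the output above.
ACL_100 = [
    AclEntry(10, "deny", ipaddress.ip_network("224.0.0.5/32")),
    AclEntry(20, "permit", ipaddress.ip_network("0.0.0.0/0")),
]

if __name__ == "__main__":
    for dst in ("224.0.0.5", "192.168.254.10"):
        entry = first_match(ACL_100, dst)
        print(dst, entry.seq, entry.action)
```

With this model, traffic to 224.0.0.5 hits entry 10 and is denied, while the unicast ping to 192.168.254.10 falls through to entry 20 and is permitted, which matches the two ping results seen on the Versa appliance.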

After the filtering issue is solved, the OSPF multicast address is reachable from the Versa appliance and the neighborship comes up:

admin@branch6-cli> ping 224.0.0.5 routing-instance Tenant1-LAN-VR interface vni-0/4.10 count 1
PING 224.0.0.5 (224.0.0.5) from 192.168.254.9 vni-0_4.10: 56(84) bytes of data.
64 bytes from 192.168.254.10: icmp_seq=1 ttl=255 time=1.77 ms

--- 224.0.0.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.778/1.778/1.778/0.000 ms
admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up

5. Mismatching OSPF network type

This time the Versa appliance interface is configured with the OSPF network type broadcast, while the peer router has its interface configured as point-to-point. The OSPF state is now reported as 2-WAY, which means that the peers see each other. The 2-WAY state is reached when the router receiving a hello packet sees its own Router ID in the neighbor field of that packet. In this state, a router decides whether to become adjacent with the neighbor. This decision depends on the network type:

Broadcast (become adjacent only with the DR and BDR)
Point-to-Point or Point-to-Multipoint (become adjacent with all the connected routers)

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 2-way 1.1.1.1 1 up

The issue is easy to spot by checking the OSPF network type on both routers. On the Versa side, the interface network type is broadcast and a DR and a BDR were elected:

admin@branch6-cli> show ospf interface detail 192.168.254.9
Codes for operation state (State):
BDR - backup designated router; DR - designated router;
ODR - other designated router; PTP - point to point
dwn - down; lbk - loopback; wtg - waiting
Interface Area DR ID BDR ID State
--------- ---- ----- ------ -----
vni-0/4.10 0.0.0.0 192.168.254.9 192.168.254.10 DR
Interface ID: 12
Admin state: enabled; MTU: 1500
Interface op state: up; OSPF op state: designated router
Link IP address: 192.168.254.9; Mask: 255.255.255.252
Area: 0.0.0.0; Router ID: 6.6.6.6; Network type: broadcast
Instance ID: 3014; Cost: 1; Priority: 1

On the peer side the network type is point-to-point:

IOU1#show ip ospf 1 interface brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Et1/3.10 1 0.0.0.0 30.30.30.1/24 10 DR 0/0
Et0/2.10 1 0.0.0.0 192.168.254.14/30 10 P2P 0/1
Et0/1.10 1 0.0.0.0 192.168.254.10/30 10 P2P 1/1 <<<<<<

After the network type mismatch is fixed, the adjacency comes up fully. As a note, seeing neighbors in the 2-WAY state is normal on a broadcast network between routers that are neither the DR nor the BDR.
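
The adjacency decision described in this scenario can be written down as a small rule. The sketch below covers only the network types mentioned in this article, and the function name expect_full and its parameters are hypothetical; it is meant to help decide whether a 2-WAY neighbor is a problem or the expected steady state.

```python
def expect_full(network_type: str,
                local_is_dr_or_bdr: bool = False,
                neighbor_is_dr_or_bdr: bool = False) -> bool:
    """Return True if a 2-WAY neighbor is expected to progress to full."""
    if network_type in ("point-to-point", "point-to-multipoint"):
        # Adjacent with every connected router.
        return True
    if network_type == "broadcast":
        # Adjacent only when one side of the pair is the DR or the BDR.
        return local_is_dr_or_bdr or neighbor_is_dr_or_bdr
    raise ValueError(f"network type not covered by this sketch: {network_type}")


if __name__ == "__main__":
    # Two routers that are neither DR nor BDR on a broadcast segment:
    # staying in 2-WAY is normal.
    print(expect_full("broadcast"))
    # The Versa interface in this scenario is the DR, so 2-WAY is not normal.
    print(expect_full("broadcast", local_is_dr_or_bdr=True))
```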

6. OSPF Subnet Mask mismatch

Once more the OSPF adjacency is having issues:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR
routing-instance Tenant1-LAN-VR

The two peers check that their subnet masks match by means of the OSPF hello packets. To troubleshoot this issue, enable OSPF debugging for hello packets:

admin@branch6-cli> show configuration routing-instances Tenant1-LAN-VR protocols ospf 3014 debug | display set
set routing-instances Tenant1-LAN-VR protocols ospf 3014 debug flags [ hello ]
set routing-instances Tenant1-LAN-VR protocols ospf 3014 debug level debug

The log events are saved in the following file:

/var/log/versa/rtdtrc.log

In the log we can see the “network mask” mismatch and also that the peer OSPF hello packets are discarded.

20-06-26 07:09:32.00: qonmhllo.c 0075(Tenant1-LAN-VR): hello buf 0x7f478f785fb0, if_cb 0x7f4787d57030, router_id 0x7ffeccd3fc20
20-06-26 07:09:32.00: qon2hllo.c 0067(Tenant1-LAN-VR): qonm_verify_hello_packet
20-06-26 07:09:32.00: qon2hllo.c 0094(Tenant1-LAN-VR): hello interval 10 expected 10
20-06-26 07:09:32.00: qon2hllo.c 0180(Tenant1-LAN-VR): Network masks do not match
20-06-26 07:09:32.00: qon2hllo.c 0300(Tenant1-LAN-VR): Invalid packet received - log problem 3
20-06-26 07:09:32.00: qonmhllo.c 0350(Tenant1-LAN-VR): Mismatched network masks
20-06-26 07:09:32.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 07:09:32.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 7
20-06-26 07:09:32.00: qonmhllo.c 0145(Tenant1-LAN-VR): Bad packet - discarding

As a note, the subnet mask is checked only on multi-access interfaces and is ignored on point-to-point links. The source of this seemingly strange behavior is Section 10.5 of RFC 2328, which says: “However, there is one exception to the above rule: on point-to-point networks and on virtual links, the Network Mask in the received Hello Packet should be ignored.” Versa OS conforms strictly to the RFC and allows OSPF neighbors to form an adjacency over a point-to-point link even when the subnet masks do not match.

After the “network mask” mismatch is fixed, the OSPF adjacency comes up.
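
The mask rule and its point-to-point exception can be illustrated with a few lines of Python. This is a sketch of the rule as quoted from RFC 2328, not of the Versa implementation; the function name and the network type strings are assumptions.

```python
import ipaddress


def hello_mask_acceptable(local_interface: str,
                          hello_network_mask: str,
                          network_type: str) -> bool:
    """Apply the Network Mask check to a received hello.

    local_interface is the receiving interface in address/prefix form,
    hello_network_mask is the Network Mask field of the received hello.
    """
    if network_type in ("point-to-point", "virtual-link"):
        # Exception from RFC 2328, Section 10.5: the mask is ignored.
        return True
    local_mask = ipaddress.ip_interface(local_interface).netmask
    return local_mask == ipaddress.ip_address(hello_network_mask)


if __name__ == "__main__":
    # Multi-access interface, peer advertises a /24 mask: hello is rejected.
    print(hello_mask_acceptable("192.168.254.9/30", "255.255.255.0", "broadcast"))
    # Same mismatch on a point-to-point link: hello is accepted.
    print(hello_mask_acceptable("192.168.254.9/30", "255.255.255.0", "point-to-point"))
```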

7. OSPF Hello & Dead Interval mismatch

The hello and dead interval timers are also checked for a match through the exchange of OSPF hello packets. On the peer of the Versa appliance, the default values of the OSPF hello and dead timers were changed:

interface Ethernet0/1.10
encapsulation dot1Q 10
ip vrf forwarding Tenant1
ip address 192.168.254.10 255.255.255.252
ip ospf network point-to-point
ip ospf dead-interval 44 <<< default value is 40
ip ospf hello-interval 11 <<< default value is 10
ip ospf 1 area 0.0.0.0
end

Checking the OSPF debug log on the Versa appliance, we see that the received OSPF hello packet is discarded because its hello interval does not match the local one. The dead interval is not even checked, because the decision to discard has already been made; it is checked only when the hello intervals of the peers match.

20-06-26 07:37:07.00: qon2hllo.c 0094(Tenant1-LAN-VR): hello interval 11 expected 10
20-06-26 07:37:07.00: qon2hllo.c 0107(Tenant1-LAN-VR): Hello intervals do not match
20-06-26 07:37:07.00: qon2hllo.c 0300(Tenant1-LAN-VR): Invalid packet received - log problem 1
20-06-26 07:37:07.00: qonmhllo.c 0300(Tenant1-LAN-VR): Mismatched hello intervals
20-06-26 07:37:07.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 07:37:07.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 8
20-06-26 07:37:07.00: qonmhllo.c 0145(Tenant1-LAN-VR): Bad packet - discarding
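
The short-circuit behavior seen in this log (hello interval first, dead interval only if the hello intervals match) can be sketched as follows. The check order is taken from the description in this scenario; the function name and the returned strings are hypothetical and are not Versa log messages.

```python
from typing import Optional


def timer_mismatch(local_hello: int, local_dead: int,
                   rx_hello: int, rx_dead: int) -> Optional[str]:
    """Return the first timer mismatch found in a received hello, or None."""
    if rx_hello != local_hello:
        # The hello is discarded here; the dead interval is never examined.
        return "hello-interval-mismatch"
    if rx_dead != local_dead:
        return "dead-interval-mismatch"
    return None


if __name__ == "__main__":
    # Values from this scenario: local 10/40, peer configured with 11/44.
    print(timer_mismatch(10, 40, 11, 44))
```

Because the first failing check wins, fixing only the hello interval on the peer would surface the dead interval mismatch next, so both timers should be corrected together.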

After the hello and dead intervals are fixed, the OSPF adjacency comes up:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up

8. OSPF Area mismatch

One more time the adjacency is not coming up:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 down 1.1.1.1 1 up

In this case the Versa peer is part of OSPF area 0.0.0.1:

interface Ethernet0/1.10
encapsulation dot1Q 10
ip vrf forwarding Tenant1
ip address 192.168.254.10 255.255.255.252
ip ospf network point-to-point
ip ospf 1 area 0.0.0.1 <<<<<<
end

The Versa log is not very explicit this time; however, we can see that a packet is dropped roughly every 10 seconds (the default OSPF hello interval):

20-06-26 07:56:23.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 07:56:23.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 2
20-06-26 07:56:32.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 07:56:32.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 2
20-06-26 07:56:42.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 07:56:42.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 2
20-06-26 07:56:51.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 07:56:51.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 2

The OSPF adjacency comes up after the area mismatch is fixed on the Versa peer:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up

9. OSPF authentication mismatch

The OSPF adjacency between the Versa appliance and its peer is in the down state because the OSPF authentication keys differ:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR
routing-instance Tenant1-LAN-VR
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 down 1.1.1.1 1 up

In the Versa log we see the same kind of output as for the previous issue; however, the reason code for dropping the OSPF hello packets is different (Reason code: 6 instead of Reason code: 2):

20-06-26 08:20:03.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 08:20:03.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 6
20-06-26 08:20:13.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 08:20:13.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 6
20-06-26 08:20:22.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 08:20:22.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 6

After the OSPF authentication is fixed, the adjacency comes up:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up
[ok][2020-06-26 08:33:09]

The same troubleshooting method can be used if only one peer has authentication configured or if the authentication methods differ (clear text vs. MD5).
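
Since several scenarios are told apart only by the reason code in the debug log, a small summarizer can save some scrolling. The sketch below counts the Reason code lines in log text and labels them with the meanings observed in this article (2, 6, 7 and 8). This mapping is inferred from the logs shown here and is not an official list of Versa reason codes; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Meanings inferred from the scenarios in this article only.
OBSERVED_REASONS = {
    2: "area mismatch",
    6: "authentication mismatch",
    7: "network mask mismatch",
    8: "hello interval mismatch",
}

_REASON = re.compile(r"Reason code: (\d+)")


def summarize_drops(log_text: str) -> Dict[int, Tuple[str, int]]:
    """Map each reason code found in the log to (meaning, number of drops)."""
    counts = Counter(int(code) for code in _REASON.findall(log_text))
    return {
        code: (OBSERVED_REASONS.get(code, "not covered in this article"), n)
        for code, n in counts.items()
    }


SAMPLE_LOG = """\
20-06-26 08:20:03.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 08:20:03.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 6
20-06-26 08:20:13.00: qonmhllo.c 0503(Tenant1-LAN-VR): qonm_packet_dropped_event
20-06-26 08:20:13.00: qonmhllo.c 0522(Tenant1-LAN-VR): Reason code: 6
"""

if __name__ == "__main__":
    for code, (meaning, drops) in summarize_drops(SAMPLE_LOG).items():
        print(code, meaning, drops)
```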

10. IP MTU mismatch on the interface connecting the peers

In this situation the OSPF state is reported as EXSTART:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 exst 1.1.1.1 1 up

Troubleshooting should start by checking the local interface MTU and then sending ICMP echo requests (ping) whose payload size equals the interface MTU minus the 28 bytes of IPv4 and ICMP headers (1500 - 28 = 1472 in this example). The DF bit should also be set to prevent IP fragmentation.

admin@branch6-cli> show interfaces detail vni-0/4.10 | grep MTU
MTU : 1500

admin@branch6-cli> ping 192.168.254.10 routing-instance Tenant1-LAN-VR count 1 packet-size 1472 df-bit enable
PING 192.168.254.10 (192.168.254.10) 1472(1500) bytes of data.

--- 192.168.254.10 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
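
The packet size used in this test follows from simple header arithmetic, as the 1472(1500) figure in the ping output shows. A minimal sketch, assuming an IPv4 header without options (20 bytes) and the 8-byte ICMP echo header; the function name is hypothetical:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(ip_mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    print(max_ping_payload(1500))  # 1472, the packet-size used above
```

If the ping with this payload and the DF bit set fails while smaller payloads succeed, some hop or the peer interface has a smaller MTU than the local interface.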

The interface MTU check between two peers happens during the Link State Database synchronization process. The OSPF Database Description message carries an Interface MTU field, which specifies the largest unfragmented packet that can be sent from the originating interface. If the MTUs do not match, the peers do not become adjacent; otherwise, one neighbor might send messages larger than the other neighbor can receive. Troubleshooting should continue by enabling tracing of OSPF database description packets:

set routing-instances Tenant1-LAN-VR protocols ospf 3014 debug flags [ database-description ]
set routing-instances Tenant1-LAN-VR protocols ospf 3014 debug level debug

Checking the log, we see that the OSPF state does not progress beyond EXSTART:

20-06-29 02:05:45.00: qoamddsc.c 0106(Tenant1-LAN-VR): qoam_rcv_db_description
20-06-29 02:05:45.00: qoamddsc.c 0309(Tenant1-LAN-VR): Neighbor is in state 5
20-06-29 02:05:45.00: qoamddsc.c 0362(Tenant1-LAN-VR): INIT or EXCHANGE_START
20-06-29 02:05:45.00: qoamddsc.c 0422(Tenant1-LAN-VR): State is EXCHANGE_START
20-06-29 02:05:45.00: qoamddsc.c 0457(Tenant1-LAN-VR): dd_flags 0X07, dd data_len 8
20-06-29 02:05:45.00: qoamddsc.c 0459(Tenant1-LAN-VR): nbr rtr ID 0X01010101 our rtr ID 0X06060606
20-06-29 02:05:45.00: qoamddsc.c 0461(Tenant1-LAN-VR): dd seq no 8554 nbr dd seq no 1613411
20-06-29 02:05:45.00: qoamddsc.c 0478(Tenant1-LAN-VR): packet empty is True
20-06-29 02:05:45.00: qoamddsc.c 0525(Tenant1-LAN-VR): Remaining in EXCHANGE_START, flags 0X00000000
20-06-29 02:05:45.00: qoamddsc.c 0839(Tenant1-LAN-VR): NBR NM flags now 0X00000000^C

We can also see the MTU mismatch between the peers (local MTU = 1500 bytes, 05 dc in hex; remote MTU = 1400 bytes, 05 78 in hex):

20-06-29 02:05:43.00: qonmuser.c 0511(Tenant1-LAN-VR): qonm_user_pkt_sent
20-06-29 02:05:43.00: qonmuser.c 0549(Tenant1-LAN-VR): DATABASE DESCRIPTION packet
20-06-29 02:05:43.00: qonmuser.c 0630(Tenant1-LAN-VR): Area id 0.0.0.0
20-06-29 02:05:43.00: qonmuser.c 0633(Tenant1-LAN-VR): Neighbor router ID 1.1.1.1
20-06-29 02:05:43.00: qonmuser.c 0636(Tenant1-LAN-VR): Destination IP Address 224.0.0.5
20-06-29 02:05:43.00: qonmuser.c 0642(Tenant1-LAN-VR): Interface index 0X0000000C
20-06-29 02:05:43.00: qonmuser.c 0644(Tenant1-LAN-VR): Interface IP address 192.168.254.9
20-06-29 02:05:43.00: qonmscks.c 1395(Tenant1-LAN-VR): 45 c0 00 34 00 00 00 00 01 59 00 00 c0 a8 fe 09 e0 00 00 05

20-06-29 02:05:43.00: qonmscks.c 1395(Tenant1-LAN-VR): 02 02 00 20 06 06 06 06 00 00 00 00 0b 73 00 00 00 00 00 00

20-06-29 02:05:43.00: qonmscks.c 1395(Tenant1-LAN-VR): 00 00 00 00 05 dc 42 07 00 18 9e 63

20-06-29 02:05:45.00: qonmuser.c 0310(Tenant1-LAN-VR): qonm_user_pkt_received
20-06-29 02:05:45.00: qonmuser.c 0348(Tenant1-LAN-VR): DATABASE DESCRIPTION packet
20-06-29 02:05:45.00: qonmuser.c 0425(Tenant1-LAN-VR): Area id 0.0.0.0
20-06-29 02:05:45.00: qonmuser.c 0428(Tenant1-LAN-VR): Neighbor router ID 1.1.1.1
20-06-29 02:05:45.00: qonmuser.c 0431(Tenant1-LAN-VR): Destination IP Address 192.168.254.10
20-06-29 02:05:45.00: qonmuser.c 0437(Tenant1-LAN-VR): Interface index 0X0000000C
20-06-29 02:05:45.00: qonmuser.c 0439(Tenant1-LAN-VR): Interface IP address 192.168.254.9
20-06-29 02:05:45.00: qonmscks.c 1395(Tenant1-LAN-VR): 45 c0 00 40 84 69 00 00 01 59 b6 d6 c0 a8 fe 0a c0 a8 fe 09

20-06-29 02:05:45.00: qonmscks.c 1395(Tenant1-LAN-VR): 02 02 00 20 01 01 01 01 00 00 00 00 82 f2 00 00 00 00 00 00

20-06-29 02:05:45.00: qonmscks.c 1395(Tenant1-LAN-VR): 00 00 00 00 05 78 52 07 00 00 21 6a ff f6 00 03 00 01 00 04

20-06-29 02:05:45.00: qonmscks.c 1395(Tenant1-LAN-VR): 00 00 00 01
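
Reading the MTU out of the hex dump by eye is error-prone, so here is a small decoder for the two packets above. It is a sketch that assumes the dump starts with the IPv4 header, as it does in this log, followed by the 24-byte OSPFv2 header and the Database Description fields (Interface MTU, options, flags, DD sequence number). The function name decode_dd is hypothetical.

```python
import struct

OSPF_HEADER_LEN = 24  # fixed OSPFv2 packet header length


def decode_dd(hex_dump: str) -> dict:
    """Decode the leading DD fields from an IPv4 + OSPFv2 hex dump."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    ip_header_len = (raw[0] & 0x0F) * 4        # IHL field, in 32-bit words
    ospf = raw[ip_header_len:]
    version, packet_type = ospf[0], ospf[1]
    if version != 2 or packet_type != 2:       # type 2 = Database Description
        raise ValueError("not an OSPFv2 Database Description packet")
    mtu, options, flags, seq = struct.unpack_from("!HBBI", ospf, OSPF_HEADER_LEN)
    return {"interface_mtu": mtu, "options": options,
            "flags": flags, "dd_sequence": seq}


# Packet sent by the Versa appliance (hex lines copied from the log above).
SENT = """
45 c0 00 34 00 00 00 00 01 59 00 00 c0 a8 fe 09 e0 00 00 05
02 02 00 20 06 06 06 06 00 00 00 00 0b 73 00 00 00 00 00 00
00 00 00 00 05 dc 42 07 00 18 9e 63
"""

# Packet received from the peer.
RECEIVED = """
45 c0 00 40 84 69 00 00 01 59 b6 d6 c0 a8 fe 0a c0 a8 fe 09
02 02 00 20 01 01 01 01 00 00 00 00 82 f2 00 00 00 00 00 00
00 00 00 00 05 78 52 07 00 00 21 6a ff f6 00 03 00 01 00 04
00 00 00 01
"""

if __name__ == "__main__":
    print(decode_dd(SENT)["interface_mtu"])      # 1500
    print(decode_dd(RECEIVED)["interface_mtu"])  # 1400
```

The decoded sequence numbers (1613411 sent, 8554 received) are the same two values that appear in the "dd seq no" line of the EXCHANGE_START trace, which is a useful sanity check that the offsets are right.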

After the MTU mismatch is fixed, the adjacency comes up:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR brief
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up
[ok][2020-06-29 02:18:07]

11. OSPF Adjacency is Stuck in Loading

This is a fairly rare problem in OSPF neighbor relationships. It can occur when an LS request is made and the neighbor answers with a bad packet, or when memory corruption exists. The LS update can also become corrupted in transit if there are issues with the transmission links. In the output below, the peer of the Versa appliance reports the adjacency in the OSPF LOADING state:

IOU1#show ip ospf 1 neighbor

Neighbor ID Pri State Dead Time Address Interface
7.7.7.7 0 FULL/ - 00:00:36 192.168.254.13 Ethernet0/2.10
6.6.6.6 0 LOADING/ - 00:00:38 192.168.254.9 Ethernet0/1.10

Versa device reports a full OSPF adjacency:

admin@branch6-cli> show ospf neighbor org Tenant1 routing-instance Tenant1-LAN-VR extensive
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.10 vni-0/4.10 full 1.1.1.1 1 up
Area: 0.0.0.0 DR: none BDR: none
Admin state: up
Relationship state with neighbor: full
Neighbor priority: 1; Options: 0x52; Re-transmission queue length: 5 <<<<<<
Number of neighbor relationship state changes or error: 144
Permanence: dynamic Hellos suppressed: no Requested LSAs: 0
Dead timer due in: 00:00:22 (hrs:mins:secs:n/a)
Hitless restart status: not helping
Remaining hitless restart interval: none
Hitless restart result: none

However, something is wrong: the retransmission queue never drains, so there are always OSPF updates waiting to be retransmitted. This is most likely the result of these updates being corrupted on the wire, or of corruption when the LSAs are stored in the LS database (memory corruption).
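
A single non-zero queue length is not conclusive by itself, so it helps to sample the extensive neighbor output several times. The sketch below extracts the Re-transmission queue length value from each captured sample and reports whether the queue was non-empty in every one of them. The function names are hypothetical, and how the samples are collected is left open.

```python
import re
from typing import Iterable, List

_RETX = re.compile(r"Re-transmission queue length: (\d+)")


def retransmit_queue_lengths(samples: Iterable[str]) -> List[int]:
    """Extract the retransmission queue length from each output sample."""
    return [int(m.group(1)) for text in samples for m in _RETX.finditer(text)]


def queue_never_drains(samples: Iterable[str]) -> bool:
    """True if at least one sample was parsed and every sample was non-zero."""
    lengths = retransmit_queue_lengths(samples)
    return bool(lengths) and all(n > 0 for n in lengths)


if __name__ == "__main__":
    line = "Neighbor priority: 1; Options: 0x52; Re-transmission queue length: 5"
    print(queue_never_drains([line, line, line]))
```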
