Using OSPF as a Routing Protocol on the Local LAN for a Dual Attachment HA Active-Active SD-WAN Site

By Radu Pavaloiu
In this article we’ll look at a dual-homed SD-WAN HA Active-Active site where OSPF is the IGP chosen to run on the local network. The customer already has OSPF configured and provides VRF/tenant separation using a VRF-lite construct. Mutual redistribution between MP-BGP and OSPF is configured on the Versa appliances, and in this kind of scenario special care must be taken to avoid routing loops and to obtain optimal routing. We’ll see how Versa OS helps prevent routing loops, how to get optimal routing in a simple way, and a possible problem together with a way to solve it.

Our lab topology is as follows. On the single-CPE site (Branch-250) the local LAN (50.50.50.0/24) is redistributed into MP-BGP. On the dual-homed HA Active-Active site OSPF runs between the Versa CPE devices (Branch-6 and Branch-7) and a local router, IOU1. All active OSPF interfaces are configured as point-to-point, so there is no DR election, and they run in the backbone area (Area 0).

IOU1 is a VRF-aware device, so it supports multiple tenants/VRFs and provides connectivity to the local LAN in different VRFs. Versa is able to integrate into this existing network and provide SD-WAN for every tenant/VRF by running a separate IGP per VRF on the local network. We’ll focus the discussion on one specific VRF, Tenant1.

On IOU1 (OSPF router ID 1.1.1.1), Lo10 (100.100.100.100/32) is redistributed into OSPF (LSA type 5) and the local LAN (30.30.30.0/24) is configured as an OSPF passive interface (LSA type 1):
admin@branch6-cli> show ospf database external | match 100.100.100.100
External 100.100.100.100 1.1.1.1 0x80000005 1963 0x0000B74F

admin@branch6-cli> show ospf database router adv-router 1.1.1.1 detail | match 30.30.30.0
1. Link ID: 30.30.30.0; Link data: 255.255.255.0; Type: Stub
Now let’s check the view from the single-CPE site. On the dual-homed HA Active-Active site a workflow was used to stage the CPEs. When a workflow is used to deploy an HA Active-Active SD-WAN site, by default a different RD (route distinguisher) is chosen for each CPE. As a result, the SD-WAN controller sees two different VPNv4 prefixes coming from the dual-homed site and reflects both to the other SD-WAN CPEs (BGP route reflector clients).
admin@branch250-cli> show route table l3vpn.ipv4.unicast receive-protocol bgp 100.100.100.100
…
Routes for Routing instance : Tenant1-Control-VR AFI: ipv4 SAFI: unicast
Routing entry for 100.100.100.100/32
Peer Address : 10.0.192.1 <<< Controller/RR
Route Distinguisher: 2L:2
Next-hop : 10.0.192.106 <<
So, for prefixes coming from the dual-homed site, two routes per prefix are available in the RIB and installed in the forwarding table:
admin@branch250-cli> show route routing-instance Tenant1-LAN-VR 100.100.100.100/32

Routes for Routing instance : Tenant1-LAN-VR AFI: ipv4 SAFI: unicast
[+] - Active Route

Routing entry for 100.100.100.100 (mask 255.255.255.255) [+]
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.106 00:00:07 ago
Routing Descriptor Blocks:
* 10.0.192.106 , via Indirect 00:00:07 ago

Routing entry for 100.100.100.100 (mask 255.255.255.255) [+]
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.107 00:00:07 ago
Routing Descriptor Blocks:
* 10.0.192.107 , via Indirect 00:00:07 ago

admin@branch250-cli> show route routing-instance Tenant1-LAN-VR 30.30.30.0/24

Routes for Routing instance : Tenant1-LAN-VR AFI: ipv4 SAFI: unicast
[+] - Active Route

Routing entry for 30.30.30.0 (mask 255.255.255.0) [+]
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.106 00:02:17 ago
Routing Descriptor Blocks:
* 10.0.192.106 , via Indirect 00:02:17 ago

Routing entry for 30.30.30.0 (mask 255.255.255.0) [+]
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.107 00:02:17 ago
Routing Descriptor Blocks:
* 10.0.192.107 , via Indirect 00:02:17 ago

admin@branch250-cli> show forwarding-table Tenant1-LAN-VR inet ipv4 brief 100.100.100.100/32
NEXT
ADDRESS NEXTHOP INTERFACE VRF AGE LABEL
-----------------------------------------------------------------------
100.100.100.100/32 10.0.192.106 dtvi-0/77 00:08:49 [ 24705 ]
100.100.100.100/32 10.0.192.107 dtvi-0/105 00:08:49 [ 24705 ]

admin@branch250-cli> show forwarding-table Tenant1-LAN-VR inet ipv4 brief 30.30.30.0/24
NEXT
ADDRESS NEXTHOP INTERFACE VRF AGE LABEL
--------------------------------------------------------------------
30.30.30.0/24 10.0.192.106 dtvi-0/77 00:08:51 [ 24705 ]
30.30.30.0/24 10.0.192.107 dtvi-0/105 00:08:51 [ 24705 ]
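The effect of the per-CPE RD can be illustrated with a small Python sketch. It is a simplified model, not Versa or controller code: it assumes a route reflector that keeps a single path per (RD, prefix) key and ignores the rest of BGP best-path selection. The RD labels are hypothetical placeholders; the next hops are the ones shown in the outputs above.

```python
from collections import defaultdict

def reflect_best_paths(advertisements):
    """Keep one path per (RD, prefix) key, as a plain route reflector would.

    advertisements: iterable of (rd, prefix, next_hop) tuples.
    Returns {prefix: sorted list of next hops reflected to the clients}.
    """
    best = {}
    for rd, prefix, next_hop in advertisements:
        # First path seen wins here; real BGP best-path selection is richer.
        best.setdefault((rd, prefix), next_hop)
    reflected = defaultdict(list)
    for (_rd, prefix), next_hop in best.items():
        reflected[prefix].append(next_hop)
    return {prefix: sorted(hops) for prefix, hops in reflected.items()}

PREFIX = "100.100.100.100/32"
# RD labels below are hypothetical placeholders, not values from the lab.
same_rd = [("RD-X", PREFIX, "10.0.192.106"), ("RD-X", PREFIX, "10.0.192.107")]
distinct_rd = [("RD-A", PREFIX, "10.0.192.106"), ("RD-B", PREFIX, "10.0.192.107")]

print(reflect_best_paths(same_rd))      # one next hop reaches the client
print(reflect_best_paths(distinct_rd))  # both next hops reach the client
```

With a shared RD the two advertisements collapse into one VPNv4 prefix, so only one path is reflected. With distinct RDs both survive, which is what allows Branch-250 to install two next hops.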
Now we move to the interesting site, the dual-homed one, starting with a short description of the Down Bit and the Domain Tag. The interaction between OSPF and BGP when these two protocols are used in an MPLS overlay network (and similarly in SD-WAN) is described in RFC 4577, OSPF as the Provider/Customer Edge Protocol for BGP/MPLS IP Virtual Private Networks (VPNs).

In summary, the Down Bit is a bit set in the Options field of an OSPF type 3 LSA. It indicates the direction in which the route has been advertised. If the OSPF route has been advertised from a PE router (Versa appliance) into an OSPF area, the Down Bit is set. Another PE router (Versa appliance) in the same area does not redistribute this route back into the iBGP overlay network if this bit is set. The PE router does not even include the route in the SPF calculation. In this way a possible routing loop is avoided if the site is multihomed. The Domain Tag serves the same purpose as the Down Bit, but for external routes (LSA type 5).

In the Versa overlay solution, OSPF attributes such as areas and LSA types are not carried by MP-BGP at this moment, so all prefixes redistributed from MP-BGP into OSPF are external (LSA type 5). To prevent routing loops, both the OSPF Down Bit and the OSPF Domain VPN Tag are set by default for all redistributed prefixes. This applies to the 50.50.50.0/24 prefix (coming from Branch-250) when it is redistributed from MP-BGP into OSPF on the Branch-6 and Branch-7 routers.

These features are very helpful for preventing routing loops, but a problem arises when the Versa CPEs are not directly connected to the local LAN and another router provides tenant-based LAN connectivity and forms OSPF adjacencies with the Versa CPEs. In our topology that router is IOU1. As shown below, IOU1 has Branch-6 and Branch-7 as OSPF neighbors in VRF Tenant1:
router ospf 1 vrf Tenant1
router-id 1.1.1.1
redistribute connected subnets route-map C2OSPF
passive-interface Ethernet1/3.10

IOU1#show ip ospf 1 neighbor

Neighbor ID Pri State Dead Time Address Interface
7.7.7.7 0 FULL/ - 00:00:38 192.168.254.13 Ethernet0/2.10
6.6.6.6 0 FULL/ - 00:00:30 192.168.254.9 Ethernet0/1.10
Because the Down Bit is set in the type 5 LSA for 50.50.50.0/24 and the existing router is already VRF aware, the LSA is not even taken into consideration for the SPF calculation and, as a result, the prefix is not inserted into the IOU1 RIB. This ultimately breaks connectivity from the dual-homed site to the remote locations. 50.50.50.0/24 is not present in the RIB:
IOU1#show ip route vrf Tenant1 50.50.50.0

Routing Table: Tenant1
% Network not in table
The recommended solution is to check whether the intermediary router (IOU1) is able to ignore the Down Bit set in the LSA Options field, use the LSA in the SPF algorithm, and insert the prefix into the RIB. This feature is called the vrf-lite capability. Configuring it on IOU1 inserts the remote prefix (50.50.50.0/24) into the RIB, with ECMP to Branch-6 and Branch-7:
router ospf 1 vrf Tenant1
router-id 1.1.1.1
capability vrf-lite <<<<<<
redistribute connected subnets route-map C2OSPF
passive-interface Ethernet1/3.10

IOU1#show ip route vrf Tenant1 50.50.50.0

Routing Table: Tenant1
Routing entry for 50.50.50.0/24
Known via "ospf 1", distance 110, metric 100
Tag 2, type extern 2, forward metric 10
Last update from 192.168.254.9 on Ethernet0/1.10, 00:00:02 ago
Routing Descriptor Blocks:
* 192.168.254.13, from 7.7.7.7, 00:00:12 ago, via Ethernet0/2.10
Route metric is 100, traffic share count is 1
Route tag 2
192.168.254.9, from 6.6.6.6, 00:00:02 ago, via Ethernet0/1.10
Route metric is 100, traffic share count is 1
Route tag 2
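The behaviour seen on IOU1 with and without the vrf-lite capability can be summarised in a short Python sketch. It is a simplified model of what is described above, not the router's implementation; the class and function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

@dataclass
class ExternalLsa:
    prefix: str
    adv_router: str
    dn_bit: bool  # Down Bit set by the redistributing PE (Versa appliance)

def usable_in_spf(lsa, vrf_aware, vrf_lite_capability):
    """Return True if the receiving router considers the LSA in SPF.

    Assumption: a VRF-aware router acts like a PE and skips LSAs with the
    Down Bit set unless the vrf-lite capability tells it to ignore the bit.
    """
    if lsa.dn_bit and vrf_aware and not vrf_lite_capability:
        return False
    return True

# 50.50.50.0/24 as redistributed from MP-BGP by Branch-6 (router ID 6.6.6.6).
lsa = ExternalLsa(prefix="50.50.50.0/24", adv_router="6.6.6.6", dn_bit=True)

print(usable_in_spf(lsa, vrf_aware=True, vrf_lite_capability=False))
print(usable_in_spf(lsa, vrf_aware=True, vrf_lite_capability=True))
```

The first call models IOU1 before the change, where the prefix never reaches the RIB. The second models IOU1 with capability vrf-lite configured.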
If the intermediary router does not support the vrf-lite capability, setting the Down Bit should be disabled on the Versa appliances. We’ll see in a different post how routing loops are prevented and how optimal routing is achieved when the Versa appliances are configured not to set the Down Bit and Domain VPN Tag.

The Down Bit and VPN Domain Tag help prevent routing loops, but there is still one more problem: achieving hot-potato (optimal) routing. A potential problem appears for locally originated type 5 LSAs (external prefixes) when a workflow deploys the HA Active-Active site. By default, the workflow configures mutual redistribution between MP-BGP and OSPF without altering any attribute.

Let’s now see what the issue is and which events can trigger it. IOU1 advertises the 100.100.100.100 prefix as external. Checking the RIB on Branch-6, everything looks fine. The route learned directly from OSPF is preferred over the one coming from Branch-7 because the OSPF administrative distance is lower than the iBGP one (110 vs 200):
admin@branch6-cli> show route routing-instance Tenant1-LAN-VR 100.100.100.100

Routes for Routing instance : Tenant1-LAN-VR AFI: ipv4 SAFI: unicast
[+] - Active Route

Routing entry for 100.100.100.100 (mask 255.255.255.255)
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.107 00:09:10 ago
Routing Descriptor Blocks:
* 10.0.192.107 , via Indirect 00:09:10 ago <<
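The administrative distance comparison behind this choice can be written as a minimal Python sketch. It only models the rule "lowest distance wins" with the two values quoted above. The dictionary layout and the "learned_from" labels are assumptions made for illustration.

```python
def best_by_distance(candidates):
    """Pick the candidate route with the lowest administrative distance."""
    return min(candidates, key=lambda route: route["distance"])

# Candidate routes for 100.100.100.100/32 as seen by Branch-6.
candidates = [
    {"protocol": "ospf", "distance": 110, "learned_from": "IOU1"},
    {"protocol": "bgp", "distance": 200, "learned_from": "10.0.192.107"},
]

print(best_by_distance(candidates)["protocol"])
```

This rule only helps when the OSPF route is offered to the RIB at all.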
Looking at the same prefix (100.100.100.100/32) on Branch-7, we find that it prefers the BGP route from Branch-6 instead of the external OSPF route learned directly from IOU1. The OSPF route is not even in the routing table, even though the OSPF adjacency with IOU1 is up:
admin@branch7-cli> show route routing-instance Tenant1-LAN-VR 100.100.100.100

Routes for Routing instance : Tenant1-LAN-VR AFI: ipv4 SAFI: unicast
[+] - Active Route

Routing entry for 100.100.100.100 (mask 255.255.255.255) [+]
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.106 00:27:36 ago
Routing Descriptor Blocks:
* 10.0.192.106 , via Indirect 00:27:36 ago

admin@branch7-cli> show ospf neighbor org Tenant1
Org name: Tenant1
routing-instance Tenant1-LAN-VR
State codes: atmpt - attempt, exchg - exchange, exst - exchange start,
load - loading, 2-way - two-way, full - full
Op codes: gdown - going down, gup - going up

Intf address Interface State Neighbor ID Pri Op
------------ --------- ----- ----------- --- --
192.168.254.14 vni-0/4.10 full 1.1.1.1 1 up
Troubleshooting should start by checking the LSA database on Branch-7 for the external prefix. The metric of the redistributed prefix is lower than the metric of the prefix learned directly via OSPF from IOU1 (1 vs 20). As a result, the OSPF-learned prefix does not get into the RIB.

The problem appears if the MP-BGP update arrives on Branch-7 before the external LSA from IOU1. The prefix is then redistributed into OSPF with metric 1, so the locally generated external LSA is preferred over the one from the originator (IOU1). As a result, Branch-7 points to Branch-6 to reach 100.100.100.100 over the SD-WAN instead of using the local OSPF route, even though OSPF has a better administrative distance. An OSPF adjacency flap can therefore trigger the non-optimal routing state.

The simplest solution, one that does not involve, for example, filtering based on BGP community attributes, is to assign a high metric to the prefixes redistributed from BGP into OSPF (the type 5 LSA has a 24-bit metric field). This should be done on both appliances in the HA pair, as shown in the config snippet below.
admin@branch7-cli> show configuration routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF | display set
set routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF term T2-BGP match protocol bgp
set routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF term T2-BGP action accept
set routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF term T2-BGP action set-origin igp
set routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF term T2-BGP action set-ospf-tag 2
set routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF term T2-BGP action metric 100
set routing-instances Tenant1-LAN-VR policy-options redistribution-policy Default-Policy-To-OSPF term T2-BGP action metric-conversion set
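Why the higher metric helps can be sketched in a few lines of Python. This is a simplified model that follows the description above, where the external LSA with the lowest metric is preferred. It ignores the forward metric and any implementation-specific handling of self-originated LSAs. The metrics 1, 20 and 100 come from the text and config above; the function names are hypothetical.

```python
def preferred_external(lsas):
    """Return the type 5 LSA with the lowest external metric."""
    return min(lsas, key=lambda lsa: lsa["metric"])

# External LSA for 100.100.100.100/32 originated by IOU1.
IOU1_LSA = {"origin": "IOU1 (1.1.1.1)", "metric": 20}

def winner(redistributed_metric):
    """Origin of the preferred LSA for a given BGP-to-OSPF metric on Branch-7."""
    local = {
        "origin": "Branch-7 (redistributed from MP-BGP)",
        "metric": redistributed_metric,
    }
    return preferred_external([local, IOU1_LSA])["origin"]

print(winner(1))    # default redistribution metric: the local LSA wins
print(winner(100))  # metric from the policy above: IOU1's LSA wins
```

Any redistribution metric above 20 would give the same result in this lab. A larger value leaves headroom in case the originator's metric changes.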
After the metric change, optimal routing is achieved: the prefix installed in the RIB is the one learned via the OSPF neighbor that originates the respective type 5 LSA.
admin@branch7-cli> show route routing-instance Tenant1-LAN-VR 100.100.100.100
Routes for Routing instance : Tenant1-LAN-VR AFI: ipv4 SAFI: unicast
[+] - Active Route

Routing entry for 100.100.100.100 (mask 255.255.255.255)
Known via 'BGP', distance 200,
Redistributing via BGP
Last update from 10.0.192.106 01:45:04 ago
Routing Descriptor Blocks:
* 10.0.192.106 , via Indirect 01:45:04 ago

Routing entry for 100.100.100.100 (mask 255.255.255.255) [+] <<