Network Automation AIOps BGP Python December 2025

Network Automation Lab — AIOps, LLDP Topology Validation, and Switch Replacement Automation

A full enterprise spine-leaf data center running on a free Oracle Cloud VM. This lab combines real eBGP routing, automated topology validation using LLDP, switch replacement config generation, and a 24/7 AIOps pipeline powered by Gemini AI and n8n. Every component is connected and every workflow runs automatically.

13 Containerlab nodes · 8 BGP autonomous systems · 19 cable links in NetBox · 15-minute AIOps check interval

Spine-leaf topology and BGP design

The lab runs a 13-node enterprise spine-leaf topology in Containerlab on a single Oracle Cloud ARM VM (4 OCPUs, 24 GB RAM, free tier). The routing protocol is eBGP with an AS-per-rack design, the approach described in RFC 7938 (Use of BGP for Routing in Large-Scale Data Centers) and common in hyperscale data center fabrics. Each rack's leaf switch gets its own autonomous system number, the spines share AS65001, and the edge router, which represents an upstream ISP or WAN connection, uses AS65000.

Edge router: router1, AS65000, mgmt 172.20.20.2. eBGP uplinks to the spines on eth1 = 10.0.1.x, eth2 = 10.0.2.x, eth3 = 10.0.3.x.
Spines (shared AS65001): SP1 (mgmt .11), SP2 (.12), SP3 (.13).
Leaf / ToR (unique AS per rack, eBGP over a /30 per spine-leaf link): SP4 AS65004 (.14), SP5 AS65005 (.15), SP6 AS65006 (.16), SP7 AS65007 (.17).
Servers (access layer): SRV1 (.21), SRV2 (.22), SRV3 (.23), SRV4 (.24).

IP addressing

Link | Side A | Side B | Subnet
--- | --- | --- | ---
router1 → SP1 | router1:eth1 = 10.0.1.1 | SP1:eth1 = 10.0.1.2 | 10.0.1.0/30
router1 → SP2 | router1:eth2 = 10.0.2.1 | SP2:eth1 = 10.0.2.2 | 10.0.2.0/30
router1 → SP3 | router1:eth3 = 10.0.3.1 | SP3:eth1 = 10.0.3.2 | 10.0.3.0/30
SP1 → SP4 | SP1:eth2 = 10.1.1.1 | SP4:eth1 = 10.1.1.2 | 10.1.1.0/30
SP1 → SP5 | SP1:eth3 = 10.1.2.1 | SP5:eth1 = 10.1.2.2 | 10.1.2.0/30
SP2 → SP4 | SP2:eth2 = 10.2.1.1 | SP4:eth2 = 10.2.1.2 | 10.2.1.0/30
SP3 → SP4 | SP3:eth2 = 10.3.1.1 | SP4:eth3 = 10.3.1.2 | 10.3.1.0/30
SP4 → SRV1 | SP4:eth4 = 10.4.1.1 | SRV1:eth1 | 10.4.1.0/30
SP7 → SRV4 | SP7:eth4 = 10.4.4.1 | SRV4:eth1 | 10.4.4.0/30
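Every row in the table follows the same convention: side A of a /30 takes the first usable address and side B the second. As a minimal sketch (p2p_addresses is a hypothetical helper, not one of the lab's scripts), Python's standard ipaddress module can derive both ends from the subnet alone:

```python
import ipaddress


def p2p_addresses(subnet):
    """Return (side_a, side_b) host addresses for a point-to-point /30.

    Hypothetical helper. Convention taken from the addressing table:
    side A gets the first usable host (.1), side B the second (.2).
    """
    net = ipaddress.ip_network(subnet)
    if net.prefixlen != 30:
        raise ValueError(f"{subnet} is not a /30")
    side_a, side_b = net.hosts()  # a /30 has exactly two usable hosts
    return str(side_a), str(side_b)


# Example: the SP1 -> SP4 link from the table
a, b = p2p_addresses("10.1.1.0/30")
print(f"SP1:eth2 = {a}/30, SP4:eth1 = {b}/30")
```

Deriving addresses from the subnet rather than typing both ends by hand keeps the table, NetBox, and the generated configs from drifting apart.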

All 19 cable connections are stored in NetBox as the source of truth. The add_netbox_cables.py script reads the topology definition and creates every cable automatically through the NetBox dcim.cables API. Interface types are set to 1000base-t because NetBox does not allow cables to terminate on virtual interfaces.
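The cable-creation step might look roughly like the sketch below. The LINKS list format and the helper names are hypothetical (the real add_netbox_cables.py input is not shown); the termination payload follows the a_terminations / b_terminations format NetBox has used since v3.3, and `nb` is assumed to be a pynetbox.api(...) instance.

```python
"""Sketch: create NetBox cables from a topology link list (hypothetical input format)."""

# Hypothetical topology definition: (device A, interface A, device B, interface B).
LINKS = [
    ("router1", "eth1", "SP1", "eth1"),
    ("router1", "eth2", "SP2", "eth1"),
    ("SP1", "eth2", "SP4", "eth1"),
]


def termination(interface_id):
    """One cable end in NetBox's generic-termination format."""
    return [{"object_type": "dcim.interface", "object_id": interface_id}]


def cable_payload(a_id, b_id):
    """Body for POST /api/dcim/cables/ linking two interfaces by ID."""
    return {
        "a_terminations": termination(a_id),
        "b_terminations": termination(b_id),
        "status": "connected",
    }


def create_cables(nb, links=LINKS):
    """Create one cable per link; `nb` is assumed to be a pynetbox.api(...) instance."""
    created = 0
    for a_dev, a_if, b_dev, b_if in links:
        a = nb.dcim.interfaces.get(device=a_dev, name=a_if)
        b = nb.dcim.interfaces.get(device=b_dev, name=b_if)
        if a is None or b is None:
            print(f"skip {a_dev}:{a_if} <-> {b_dev}:{b_if}: interface not in NetBox")
            continue
        if a.cable or b.cable:
            continue  # already cabled: skipping keeps reruns idempotent
        # Cables cannot terminate on virtual interfaces, so force a physical type.
        for iface in (a, b):
            iface.update({"type": "1000base-t"})
        nb.dcim.cables.create(**cable_payload(a.id, b.id))
        created += 1
    return created
```

Calling create_cables(nb) a second time skips interfaces that already have a cable, so the script can be rerun safely.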


BGP configuration with BIRD2

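Each node runs BIRD2 as its routing daemon. BIRD's kernel protocol does not import the kernel's connected routes by default, so each device declares its connected subnets as static blackhole routes. These land in BIRD's master4 routing table, and because every BGP session uses export all, BGP advertises them to all peers. Without the static routes, BGP sessions establish but no prefixes are exchanged. The blackhole next hop is never used for forwarding: BIRD installs its routes into the kernel with metric 32 by default, so the kernel's own connected route (metric 0) for the same /30 still wins.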

/etc/bird.conf — SP1 (spine AS65001)
router id 10.0.0.1;

protocol kernel { ipv4 { export all; import all; }; }
protocol device { scan time 10; }

# Advertise connected subnets into BGP via static blackhole routes
protocol static {
    ipv4;
    route 10.1.1.0/30 blackhole;  # SP1-SP4 link
    route 10.1.2.0/30 blackhole;  # SP1-SP5 link
    route 10.1.3.0/30 blackhole;  # SP1-SP6 link
    route 10.1.4.0/30 blackhole;  # SP1-SP7 link
}

# eBGP session to edge router (upstream)
protocol bgp upstream {
    local 10.0.1.2 as 65001;
    neighbor 10.0.1.1 as 65000;
    ipv4 { import all; export all; };
}

# eBGP sessions to each leaf switch (AS per rack)
protocol bgp sp4 {
    local 10.1.1.1 as 65001;
    neighbor 10.1.1.2 as 65004;
    ipv4 { import all; export all; };
}
protocol bgp sp5 { local 10.1.2.1 as 65001; neighbor 10.1.2.2 as 65005; ipv4 { import all; export all; }; }
protocol bgp sp6 { local 10.1.3.1 as 65001; neighbor 10.1.3.2 as 65006; ipv4 { import all; export all; }; }
protocol bgp sp7 { local 10.1.4.1 as 65001; neighbor 10.1.4.2 as 65007; ipv4 { import all; export all; }; }

The fix_bird_configs.py Python script writes this configuration directly into each running container by piping it through docker exec -i ... tee /etc/bird.conf, then restarts BIRD. This is the key automation step that makes the lab recoverable: after a container restart or a Containerlab redeploy, rerunning the script restores every node's routing configuration.

fix_bird_configs.py — writing config into container
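import subprocess

def write_config(device, config):
    """Pipe the rendered config into /etc/bird.conf inside the node's container."""
    container = f"clab-enterprise-spine-leaf-{device}"  # Containerlab naming: clab-<lab>-<node>
    result = subprocess.run(
        ["sudo", "docker", "exec", "-i", container,  # -i attaches stdin so tee receives the config
         "sh", "-c", "tee /etc/bird.conf > /dev/null"],
        input=config, text=True, capture_output=True
    )
    return result.returncode == 0

def restart_bird(device):
    """Stop any running BIRD and start it again with the new config."""
    container = f"clab-enterprise-spine-leaf-{device}"
    result = subprocess.run(
        ["sudo", "docker", "exec", container,
         "sh", "-c", "pkill bird 2>/dev/null; sleep 1; bird -c /etc/bird.conf"],
        capture_output=True, text=True
    )
    # Exit status is that of `bird -c`, which fails on a config parse error.
    return result.returncode == 0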
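fix_bird_configs.py needs one config string per node before write_config() can push it. How the lab builds those strings is not shown; a minimal sketch, assuming a hypothetical render_bird_config() that takes the router ID, local AS, originated /30s, and peer list, produces the same statements as the SP1 config above (without its comments):

```python
"""Hypothetical renderer for BIRD2 configs shaped like the SP1 example."""


def render_bird_config(router_id, local_as, networks, peers):
    """Build /etc/bird.conf text.

    networks: connected /30s to originate as static blackhole routes.
    peers: list of (protocol_name, local_ip, neighbor_ip, neighbor_as) tuples.
    """
    lines = [
        f"router id {router_id};",
        "",
        "protocol kernel { ipv4 { export all; import all; }; }",
        "protocol device { scan time 10; }",
        "",
        "protocol static {",
        "    ipv4;",
    ]
    lines += [f"    route {net} blackhole;" for net in networks]
    lines.append("}")
    # One eBGP session per neighbor; {{ and }} are literal braces in f-strings.
    for name, local_ip, neighbor_ip, neighbor_as in peers:
        lines += [
            "",
            f"protocol bgp {name} {{",
            f"    local {local_ip} as {local_as};",
            f"    neighbor {neighbor_ip} as {neighbor_as};",
            "    ipv4 { import all; export all; };",
            "}",
        ]
    return "\n".join(lines) + "\n"


sp1 = render_bird_config(
    router_id="10.0.0.1",
    local_as=65001,
    networks=["10.1.1.0/30", "10.1.2.0/30", "10.1.3.0/30", "10.1.4.0/30"],
    peers=[
        ("upstream", "10.0.1.2", "10.0.1.1", 65000),
        ("sp4", "10.1.1.1", "10.1.1.2", 65004),
        ("sp5", "10.1.2.1", "10.1.2.2", 65005),
        ("sp6", "10.1.3.1", "10.1.3.2", 65006),
        ("sp7", "10.1.4.1", "10.1.4.2", 65007),
    ],
)
```

The returned text is what write_config("SP1", sp1) would pipe into the container, followed by restart_bird("SP1"). Keeping the peer list as data means adding a leaf is one more tuple rather than another hand-edited config block.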

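End-to-end verification: a ping from SP4 (AS65004, 10.4.1.0/30) to 10.4.4.1, SP7's address on its server-facing subnet, only succeeds if that subnet has been advertised by SP7 (AS65007) to the spine layer (AS65001), re-advertised to SP4, and installed in SP4's routing table, and if the reply finds its way back the same way. This confirms that eBGP is carrying prefixes leaf to spine to leaf across the fabric.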
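A ping proves one path. To check every session on a node, a sketch like the following could run birdc show protocols inside the container and flag BGP protocols that are not Established. It assumes birdc is installed alongside bird in each image and that the output uses BIRD 2's usual columns (name, protocol, table, state, since, info); bgp_sessions and check_node are hypothetical names.

```python
import subprocess


def bgp_sessions(show_protocols_output):
    """Map BGP protocol name -> True if Established, from `birdc show protocols` text."""
    sessions = {}
    for line in show_protocols_output.splitlines():
        fields = line.split()
        # Data rows: name, proto, table, state, since..., info. Header and banner
        # lines never have "BGP" in the second column, so they are skipped.
        if len(fields) >= 4 and fields[1] == "BGP":
            sessions[fields[0]] = "Established" in fields[4:]
    return sessions


def check_node(device):
    """Return the names of BGP sessions on `device` that are not Established."""
    container = f"clab-enterprise-spine-leaf-{device}"
    out = subprocess.run(
        ["sudo", "docker", "exec", container, "birdc", "show", "protocols"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [name for name, up in bgp_sessions(out).items() if not up]
```

For example, check_node("SP1") would return ["sp6"] if SP1's session toward SP6 were the only one down, which pinpoints the failure faster than a ping that simply times out.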


LLDP validation against NetBox

LLDP (Link Layer Discovery Protocol, IEEE 802.1AB) is a vendor-neutral Layer 2 protocol in which each device periodically advertises its identity and the local port to its directly attached neighbors. By collecting this neighbor data from every device (with lldpd / lldpcli in this lab) and comparing it against the 19 cables stored in NetBox, the lab can detect any mismatch between the documented topology and the actual running state.

This is how enterprise network teams catch cabling mistakes after a hardware replacement, spot unauthorized devices that speak LLDP, and verify that their CMDB (NetBox) accurately reflects the physical infrastructure.
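Collecting LLDP neighbors (for example from lldpcli on each node) and reading cables from NetBox are not shown here. Once both sides are in the same shape, the comparison itself is a set difference; a minimal sketch, assuming each link is expressed as a pair of (device, interface) tuples (diff_topology is a hypothetical name):

```python
def normalize(link):
    """Turn ((dev_a, if_a), (dev_b, if_b)) into an order-independent key."""
    return frozenset(link)


def diff_topology(expected_links, observed_links):
    """Compare NetBox cables (expected) against LLDP adjacencies (observed).

    Returns (missing, unexpected) as sorted lists of sorted endpoint pairs.
    """
    expected = {normalize(link) for link in expected_links}
    observed = {normalize(link) for link in observed_links}
    missing = expected - observed     # documented but not seen: cabling fault or link down
    unexpected = observed - expected  # seen but not documented: miscabling or new device
    return sorted(map(sorted, missing)), sorted(map(sorted, unexpected))


# Hypothetical data: SP2's cable goes to SP5 instead of SP4.
netbox = [(("SP1", "eth2"), ("SP4", "eth1")), (("SP2", "eth2"), ("SP4", "eth2"))]
lldp = [(("SP4", "eth1"), ("SP1", "eth2")), (("SP2", "eth2"), ("SP5", "eth1"))]
missing, unexpected = diff_topology(netbox, lldp)
print("missing:", missing)
print("unexpected:", unexpected)
```

Because LLDP reports every link from both ends, storing each link as a frozenset makes SP1:eth2 to SP4:eth1 and SP4:eth1 to SP1:eth2 the same key, so a healthy fabric produces two empty lists.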


Automated switch config generation and replacement planning

Switch replacement is one of the most error-prone operations in network engineering. Generating a new switch config manually means transcribing VLANs, IP addresses, BGP neighbors, interface descriptions, and port channels from the old device to the new one. A single missed VLAN or wrong subnet can cause a production outage.

This automation pipeline pulls the complete device record from NetBox (interfaces, IPs, roles, connected devices), generates the replacement configuration using Python and Jinja2 templates, validates it against the live topology using Netmiko, and prepares a cutover plan. The GitHub Actions pipeline can then deploy it automatically after human approval.

generate_host_vars.py — pulling device data from NetBox
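import os

import pynetbox
import yaml

# Token comes from the environment rather than being hard-coded in the script.
nb = pynetbox.api("https://netbox.networkforai.com", token=os.environ["NETBOX_TOKEN"])
os.makedirs("host_vars", exist_ok=True)

for device in nb.dcim.devices.all():
    interfaces = list(nb.dcim.interfaces.filter(device=device.name))
    ips        = list(nb.ipam.ip_addresses.filter(device=device.name))
    cabled     = [i for i in interfaces if i.cable]  # input for bgp_peers

    # NetBox assigns IPs to interfaces, not the reverse, so index them by interface ID.
    ip_by_intf = {ip.assigned_object_id: str(ip.address) for ip in ips}

    host_vars = {
        "hostname":   device.name,
        "role":       str(device.role),
        "interfaces": [{"name": i.name, "ip": ip_by_intf[i.id]}
                       for i in interfaces if i.id in ip_by_intf],
        "bgp_peers":  [],  # derived from cable endpoints (derivation elided in this excerpt)
    }

    path = f"host_vars/{device.name}.yml"
    with open(path, "w") as f:
        yaml.safe_dump(host_vars, f, sort_keys=False)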
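The host_vars files are the input to the Jinja2 step. The lab's actual templates are not shown; a minimal sketch, assuming a hypothetical template that emits Linux ip addr add lines from the hostname, role, and interfaces keys written above:

```python
from jinja2 import Environment, StrictUndefined

# Hypothetical template: renders Linux interface commands from a host_vars dict.
TEMPLATE = """\
# {{ hostname }} ({{ role }})
{% for intf in interfaces %}
ip addr add {{ intf.ip }} dev {{ intf.name }}
{% endfor %}
"""

# StrictUndefined raises on a missing variable instead of rendering an empty string,
# so an incomplete host_vars file fails loudly rather than producing a broken config.
# trim_blocks/lstrip_blocks keep the {% %} tags from leaving blank lines behind.
env = Environment(undefined=StrictUndefined, trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE)


def render(host_vars):
    """Render one device's config from its host_vars mapping."""
    return template.render(**host_vars)


# Example values in the same shape generate_host_vars.py writes.
example = {
    "hostname": "SP4",
    "role": "leaf",
    "interfaces": [
        {"name": "eth1", "ip": "10.1.1.2/30"},
        {"name": "eth4", "ip": "10.4.1.1/30"},
    ],
}
print(render(example))
```

Failing at render time is the cheap place to catch a missing value; the same gap discovered during a cutover window is an outage.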

The startup.py master script runs configure_devices.py followed by fix_bird_configs.py on every deployment. This means the entire lab can be fully reconfigured from source in about 60 seconds after any container restart or redeploy.
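startup.py itself is not shown. A minimal sketch of the sequencing it describes, assuming the two scripts live under a hypothetical scripts/ directory: run each step in order, stop at the first non-zero exit, and report how long each took.

```python
import subprocess
import sys
import time

# Order taken from the text; the scripts/ paths are a hypothetical layout.
STEPS = [
    [sys.executable, "scripts/configure_devices.py"],
    [sys.executable, "scripts/fix_bird_configs.py"],
]


def run_steps(steps=STEPS):
    """Run each step in order, stop at the first failure, and print timings."""
    start = time.monotonic()
    for cmd in steps:
        t0 = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - t0
        print(f"{' '.join(cmd[1:])}: exit {result.returncode} in {elapsed:.1f}s")
        if result.returncode != 0:
            return False  # BGP configs assume interfaces are up, so don't continue
    print(f"total {time.monotonic() - start:.1f}s")
    return True
```

startup.py could end with sys.exit(0 if run_steps() else 1) so that a failed step also fails whatever invoked it, such as a CI job.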


n8n workflow — automated monitoring, AI analysis, and email alerting

n8n is a self-hosted workflow automation platform, similar to Zapier but source-available under a fair-code license, and it runs on the same Oracle VM. The workflow chains a schedule trigger, the Flask anomaly API, Gemini AI, and Gmail into an automated monitoring loop. Nobody has to watch a dashboard: the pipeline runs on its own every 15 minutes and only sends an email when something is actually wrong.
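The "only email when something is wrong" behavior comes down to a single boolean the workflow can branch on. The Flask API's real response format is not shown; as a hypothetical sketch, a summary like this would let an n8n IF node test $json.healthy and pass only failing runs on to the analysis and email steps. The field names and inputs are assumptions.

```python
def evaluate(bgp_down, lldp_missing, lldp_unexpected):
    """Summarize one monitoring run (hypothetical response shape).

    bgp_down: names of BGP sessions not Established.
    lldp_missing / lldp_unexpected: link mismatches between NetBox and LLDP.
    """
    issues = []
    issues += [f"BGP session down: {name}" for name in bgp_down]
    issues += [f"cable in NetBox not seen by LLDP: {link}" for link in lldp_missing]
    issues += [f"LLDP neighbor not in NetBox: {link}" for link in lldp_unexpected]
    # An empty issue list means healthy, so the workflow sends no email.
    return {"healthy": not issues, "issues": issues}
```

Returning the human-readable issue list alongside the boolean gives the AI step concrete findings to analyze rather than raw command output.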


GitHub Actions CI/CD pipeline

A self-hosted GitHub Actions runner runs on the Oracle VM as a systemd service. The runner makes outbound HTTPS connections to GitHub and polls for queued jobs, so no inbound port has to be opened on the VM. The workflow triggers on every push to the main branch, and each run reconfigures the entire lab and then runs a connectivity test.

.github/workflows/deploy.yml
name: Network Config Pipeline

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: self-hosted

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure device interfaces
        run: |
          source /home/ubuntu/venv/bin/activate
          python3 /home/ubuntu/network-automation/scripts/configure_devices.py

      - name: Write BGP configs with static routes
        run: |
          source /home/ubuntu/venv/bin/activate
          python3 /home/ubuntu/network-automation/scripts/fix_bird_configs.py

      - name: Verify end-to-end connectivity
        run: |
          sleep 30
          sudo docker exec clab-enterprise-spine-leaf-SP4 \
            ping -c 3 10.4.4.1  # SP4 to SP7's server-facing address, across the spine layer

The runner needs passwordless sudo for docker, both for the docker exec calls made by fix_bird_configs.py and for the ping test step. This is configured in /etc/sudoers.d/github-runner on the Oracle VM so CI runs never stall on a password prompt.
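The sleep 30 before the ping is a fixed guess at BGP convergence time: too short and the job fails spuriously, too long and every run wastes time. A hypothetical alternative is a small polling helper that retries a single ping until it succeeds or a timeout expires; the timeout and interval defaults below are assumptions, not measured values.

```python
import subprocess
import time


def wait_for_ping(container, target, timeout=90, interval=5, runner=subprocess.run):
    """Retry a one-packet ping from inside `container` until success or timeout.

    Hypothetical helper. `runner` is injectable so the retry logic can be tested
    without Docker.
    """
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        attempt += 1
        result = runner(
            ["sudo", "docker", "exec", container, "ping", "-c", "1", target],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            print(f"{target} reachable from {container} after {attempt} attempt(s)")
            return True
        if time.monotonic() >= deadline:
            print(f"{target} still unreachable from {container} after {timeout}s")
            return False
        time.sleep(interval)
```

A workflow step could call wait_for_ping("clab-enterprise-spine-leaf-SP4", "10.4.4.1") and exit non-zero when it returns False, so the job finishes as soon as routes converge.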


Full tech stack

Oracle Cloud ARM (free tier) · Ubuntu 22.04 · Containerlab · BIRD2 BGP daemon · nicolaka/netshoot · Python 3.12 · pynetbox · lldpd / lldpcli · NetBox v4.5 · Flask + Gunicorn · n8n · Google Gemini 2.5 Flash · GitHub Actions · Nginx + Let's Encrypt SSL · Netmiko · Jinja2