Network Automation & Config Management

Foundation

What is network config management?

Config management is the practice of treating your network device configurations the same way software engineers treat code. Every configuration is version-controlled, reviewed, tested, and deployed through a repeatable automated pipeline rather than typed by hand into a terminal. The goal is to eliminate configuration drift, reduce human error, and make every change auditable and reversible.

In a manual world, 10 engineers who can each change any of 200 switches produce 2,000 engineer-to-device combinations. Over 12 months of changes, every one of those combinations is a chance for configuration drift. In an automated world, all 200 switches get the same validated config from the same source of truth every time.
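Drift is easiest to see as a set difference between the intended config and what is actually running. The sketch below is minimal and assumes both are available as plain text, for example a rendered template and a captured `show running-config`. The names `normalize` and `find_drift` are illustrative. Because it flattens indentation, identical child lines under different parent sections collapse together, so treat it as a first approximation.

```python
def normalize(config: str) -> set[str]:
    """Strip whitespace and drop blank lines and '!' comment lines."""
    lines = set()
    for line in config.splitlines():
        line = line.strip()
        if line and not line.startswith("!"):
            lines.add(line)
    return lines


def find_drift(intended: str, running: str) -> dict[str, list[str]]:
    """Return lines missing from the device and lines nobody declared."""
    want, have = normalize(intended), normalize(running)
    return {
        "missing": sorted(want - have),     # in the source of truth, not on the device
        "unexpected": sorted(have - want),  # on the device, not in the source of truth
    }


intended = """
hostname leaf01
ntp server 10.0.0.1
!
"""
running = """
hostname leaf01
ntp server 10.0.0.99
"""
print(find_drift(intended, running))
# {'missing': ['ntp server 10.0.0.1'], 'unexpected': ['ntp server 10.0.0.99']}
```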

Source of truth

A single authoritative system (NetBox, Nautobot, or YAML files in Git) that defines what every device should look like. No config exists outside this system.
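Every rendered config inherits whatever the source of truth says, so it pays to reject incomplete records before rendering. The sketch below is minimal and assumes device records arrive as dictionaries, as they would when loaded from YAML files in Git. The required field names and the function name are hypothetical and should match your own data model.

```python
import ipaddress

# Hypothetical schema: fields every device record must carry before rendering
REQUIRED_FIELDS = ("hostname", "role", "local_asn", "router_id")


def validate_device(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is usable."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    asn = record.get("local_asn")
    # 4-byte ASNs range from 1 to 4294967295
    if asn is not None and not (isinstance(asn, int) and 1 <= asn <= 4_294_967_295):
        problems.append(f"invalid ASN {asn!r}")
    rid = record.get("router_id")
    if rid:
        try:
            ipaddress.IPv4Address(rid)
        except ValueError:
            problems.append(f"router_id {rid!r} is not an IPv4 address")
    return problems


print(validate_device({"hostname": "leaf01", "role": "leaf",
                       "local_asn": 65001, "router_id": "10.0.0.11"}))
# []
print(validate_device({"hostname": "leaf02", "role": "leaf",
                       "local_asn": 65001, "router_id": "10.0.0.300"}))
# ["router_id '10.0.0.300' is not an IPv4 address"]
```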

Template rendering

Jinja2 templates define config structure. Variables from the source of truth are injected to render device-specific configs programmatically.

Automated deployment

Python tooling using Netmiko, NAPALM, or Ansible pushes rendered configs to devices, validates the result, and rolls back on failure.
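The push, validate and roll back loop does not depend on which transport library you use. The sketch below is minimal and shows only that control flow. Device operations are passed in as plain functions so the logic can be tested without a device. `push`, `verify` and `rollback` are hypothetical callables that you would back with Netmiko, NAPALM or Ansible.

```python
from typing import Callable


def deploy_with_rollback(
    device: str,
    config: str,
    push: Callable[[str, str], None],
    verify: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> str:
    """Push a config, verify the result, and roll back on any failure."""
    try:
        push(device, config)
        healthy = verify(device)
    except Exception:
        healthy = False  # a failed push or a crashing check both count as failure
    if healthy:
        return "committed"
    rollback(device)
    return "rolled back"


# Usage with stand-in functions that record what happened
events = []
result = deploy_with_rollback(
    "leaf01",
    "ntp server 10.0.0.1",
    push=lambda d, c: events.append(("push", d)),
    verify=lambda d: False,
    rollback=lambda d: events.append(("rollback", d)),
)
print(result, events)
# rolled back [('push', 'leaf01'), ('rollback', 'leaf01')]
```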

The workflow

End-to-end automation pipeline

This is the full pipeline from intent to production. Every stage has a clear input, output, and owner. Nothing touches a device without passing through every stage first.

NetBox (source of truth) › Python (data extraction) › Jinja2 (config rendering) › Batfish (pre-deploy testing) › Git PR (peer review) › Netmiko (push to device) › Telemetry (validation)

Each stage is a gate. If a stage fails, the pipeline stops and the engineer is notified before any device is touched. The pipeline runs in CI/CD (GitHub Actions or GitLab CI) and every run produces a complete audit log.
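The gate semantics described above take only a few lines: stop at the first failed stage, and record every run. The sketch below is minimal and assumes each stage is a function that returns True on success. The stage names mirror the pipeline; the function name and the audit record format are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[dict]:
    """Run stages in order, stop at the first failure, and return an audit log."""
    audit = []
    for name, stage in stages:
        try:
            ok = bool(stage())
        except Exception as exc:  # a crashing stage is a failed gate
            ok, note = False, repr(exc)
        else:
            note = ""
        audit.append({
            "stage": name,
            "passed": ok,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not ok:
            break  # later stages, including the device push, never run
    return audit


log = run_pipeline([
    ("extract", lambda: True),
    ("render", lambda: True),
    ("batfish", lambda: False),  # simulated validation failure
    ("push", lambda: True),
])
print(json.dumps([(e["stage"], e["passed"]) for e in log]))
# [["extract", true], ["render", true], ["batfish", false]]
```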

Stage 1

Extract from NetBox

Python pulls device inventory, IP addressing, VLAN assignments, BGP peer data, and interface roles from NetBox via its REST API. This becomes the data model for config rendering.
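pynetbox, used in Step 1, follows NetBox's paginated list responses for you, but the underlying REST contract is simple:

- Each list response carries a `results` array and a `next` URL.
- Classic API tokens are sent in an `Authorization: Token <key>` header.

The sketch below is a minimal, standard-library version of that loop. The fetch function is injectable so the pagination logic can be exercised without a server. The helper names are hypothetical, and the hostname is the placeholder used elsewhere on this page.

```python
import json
import os
import urllib.request
from typing import Callable, Iterator

NETBOX_URL = "https://netbox.company.internal"


def http_get_json(url: str) -> dict:
    """GET a NetBox API URL with token auth and decode the JSON body."""
    req = urllib.request.Request(url, headers={
        "Authorization": f"Token {os.environ['NETBOX_TOKEN']}",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_results(url: str, fetch: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield every object from a paginated NetBox list endpoint by following 'next'."""
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # None on the last page


def list_devices() -> Iterator[dict]:
    """Example endpoint usage (needs a reachable NetBox and NETBOX_TOKEN)."""
    return iter_results(f"{NETBOX_URL}/api/dcim/devices/?limit=100")


# Offline check with two fake pages
pages = {
    "p1": {"results": [{"name": "leaf01"}], "next": "p2"},
    "p2": {"results": [{"name": "leaf02"}], "next": None},
}
print([d["name"] for d in iter_results("p1", fetch=pages.__getitem__)])
# ['leaf01', 'leaf02']
```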

Stage 2

Render with Jinja2

Templates define the config structure for each device role (spine, leaf, PE router, border router). Variables from NetBox are injected to produce device-specific configs.
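Choosing templates by device role is a lookup that should fail loudly rather than fall back silently. The sketch below is minimal. The role slugs follow the roles named above, and the file names come from the repository layout later on this page. Which fragments each role renders, and the `ROLE_TEMPLATES` and `templates_for` names, are hypothetical.

```python
# Hypothetical mapping: each role renders an ordered list of template fragments
ROLE_TEMPLATES = {
    "spine": ["bgp_neighbors.j2", "prefix_lists.j2"],
    "leaf": ["vlan_config.j2", "bgp_neighbors.j2", "evpn_leaf.j2"],
    "pe-router": ["ospf_area.j2", "bgp_neighbors.j2", "prefix_lists.j2"],
    "border-router": ["bgp_neighbors.j2", "prefix_lists.j2"],
}


def templates_for(role: str) -> list[str]:
    """Return the template fragments for a role, refusing unknown roles."""
    try:
        return list(ROLE_TEMPLATES[role])
    except KeyError:
        raise ValueError(
            f"No templates defined for role {role!r}; add it to ROLE_TEMPLATES"
        ) from None


print(templates_for("leaf"))
# ['vlan_config.j2', 'bgp_neighbors.j2', 'evpn_leaf.j2']
```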

Stage 3

Validate with Batfish

Batfish performs offline network modelling against the rendered configs before they touch any device. It checks for routing loops, reachability violations, and policy mismatches.
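Batfish answers these questions from a full model of the network. You query it through pybatfish against a running Batfish service; its `undefinedReferences` question, for instance, flags structures that are used but never defined.

The sketch below is a deliberately crude, dependency-free stand-in for that one kind of check, not a replacement for Batfish. It scans rendered IOS-style text for route-maps that neighbors reference but the config never defines. The function name is hypothetical.

```python
import re

# "neighbor X route-map NAME in|out" references a route-map by name
ROUTE_MAP_REF = re.compile(r"^\s*neighbor \S+ route-map (\S+) (?:in|out)\s*$")
# "route-map NAME permit|deny SEQ" defines (part of) a route-map
ROUTE_MAP_DEF = re.compile(r"^route-map (\S+)")


def undefined_route_maps(config: str) -> set[str]:
    """Return route-map names that neighbors reference but the config never defines."""
    referenced, defined = set(), set()
    for line in config.splitlines():
        if m := ROUTE_MAP_REF.match(line):
            referenced.add(m.group(1))
        elif m := ROUTE_MAP_DEF.match(line):
            defined.add(m.group(1))
    return referenced - defined


rendered = """
route-map RM-DEFAULT-IN permit 10
router bgp 65001
  address-family ipv4 unicast
    neighbor 10.0.0.1 route-map RM-DEFAULT-IN in
    neighbor 10.0.0.1 route-map RM-DEFAULT-OUT out
"""
print(undefined_route_maps(rendered))
# {'RM-DEFAULT-OUT'}
```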

Stage 4

Deploy and verify

Netmiko pushes configs in a controlled rollout. Post-deploy verification checks BGP sessions, OSPF adjacencies, and interface states before marking the change as successful.
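A controlled rollout usually means a small canary batch first, then progressively larger waves, halting at the first failure. The sketch below is minimal and covers only the scheduling. The wave sizes are illustrative. `deploy_one` is a hypothetical callable that pushes and verifies a single device and returns True on success.

```python
from typing import Callable, Sequence


def plan_waves(devices: list[str], sizes: Sequence[int] = (1, 5, 20)) -> list[list[str]]:
    """Canary first, then growing waves; leftovers use the last wave size."""
    batches, i = [], 0
    for size in sizes:
        batches.append(devices[i:i + size])
        i += size
    last = sizes[-1]
    while i < len(devices):
        batches.append(devices[i:i + last])
        i += last
    return [b for b in batches if b]  # drop empty waves when the fleet is small


def rollout(devices: list[str], deploy_one: Callable[[str], bool],
            sizes: Sequence[int] = (1, 5, 20)) -> dict:
    """Deploy wave by wave; stop before the next wave if any device fails."""
    attempted = []
    for wave in plan_waves(devices, sizes):
        results = {d: deploy_one(d) for d in wave}
        attempted.extend(wave)
        failed = [d for d, ok in results.items() if not ok]
        if failed:
            return {"status": "halted", "failed": failed, "attempted": attempted}
    return {"status": "complete", "failed": [], "attempted": attempted}


fleet = [f"leaf{n:02d}" for n in range(1, 9)]
print(plan_waves(fleet, sizes=(1, 3)))
# [['leaf01'], ['leaf02', 'leaf03', 'leaf04'], ['leaf05', 'leaf06', 'leaf07'], ['leaf08']]
print(rollout(fleet, deploy_one=lambda d: d != "leaf03", sizes=(1, 3)))
# {'status': 'halted', 'failed': ['leaf03'], 'attempted': ['leaf01', 'leaf02', 'leaf03', 'leaf04']}
```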

Real code

Python tooling and Jinja2 templates

These are the actual patterns used in production automation. Not pseudocode. Real structure you can adapt to your environment.

Step 1: Pull device data from NetBox

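Python: netbox_client.py

import os

import pynetbox

nb = pynetbox.api(
    "https://netbox.company.internal",
    token=os.environ["NETBOX_TOKEN"],
)


def strip_prefix(ip_record):
    """NetBox stores addresses in CIDR form (10.0.0.1/32); configs need the bare IP."""
    return ip_record.address.split("/")[0]


def get_bgp_peers(device):
    """Pull all BGP sessions for a device from the netbox-bgp plugin."""
    # netbox-bgp serves sessions at /api/plugins/bgp/session/; the endpoint name
    # can differ between plugin versions, so check your API root if this 404s.
    peers = nb.plugins.bgp.session.filter(device_id=device.id)
    return [{
        "neighbor_ip":   strip_prefix(p.remote_address),
        "remote_asn":    p.remote_as.asn,
        "description":   p.description,
        "local_ip":      strip_prefix(p.local_address),
        # pynetbox exposes custom fields through the custom_fields dict;
        # the cf_ prefix only applies to filter arguments.
        "route_map_in":  p.custom_fields.get("route_map_in") or "RM-DEFAULT-IN",
        "route_map_out": p.custom_fields.get("route_map_out") or "RM-DEFAULT-OUT",
    } for p in peers]


def build_device_context(device_name):
    """Build the full data context for template rendering."""
    device = nb.dcim.devices.get(name=device_name)
    if device is None:
        raise LookupError(f"Device {device_name!r} not found in NetBox")
    return {
        "hostname":  device.name,
        "local_asn": device.custom_fields["bgp_asn"],
        "router_id": strip_prefix(device.primary_ip),
        "bgp_peers": get_bgp_peers(device),
        "vlans":     [v.vid for v in nb.ipam.vlans.filter(site_id=device.site.id)],
        # NetBox 4.x renamed this attribute to device.role
        "role":      device.device_role.slug,
    }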

Step 2: Jinja2 BGP neighbor template

Jinja2: templates/bgp_neighbors.j2

! Generated by NetworkForAI automation pipeline
! Device: {{ hostname }} | ASN: {{ local_asn }}
! DO NOT EDIT MANUALLY

router bgp {{ local_asn }}
  bgp router-id {{ router_id }}
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
{% for peer in bgp_peers %}
  ! Peer: {{ peer.description }}
  neighbor {{ peer.neighbor_ip }} remote-as {{ peer.remote_asn }}
  neighbor {{ peer.neighbor_ip }} description {{ peer.description }}
  neighbor {{ peer.neighbor_ip }} update-source Loopback0
{# Never fall back to a default password; omit the line if none is defined #}
{% if peer.md5_password %}
  neighbor {{ peer.neighbor_ip }} password {{ peer.md5_password }}
{% endif %}
  neighbor {{ peer.neighbor_ip }} timers 10 30
{% endfor %}
  !
{# Open the address family once and activate every peer inside it #}
  address-family ipv4 unicast
{% for peer in bgp_peers %}
    neighbor {{ peer.neighbor_ip }} activate
    neighbor {{ peer.neighbor_ip }} soft-reconfiguration inbound
    neighbor {{ peer.neighbor_ip }} route-map {{ peer.route_map_in }} in
    neighbor {{ peer.neighbor_ip }} route-map {{ peer.route_map_out }} out
    neighbor {{ peer.neighbor_ip }} maximum-prefix 10000 80
{% endfor %}
  exit-address-family
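Template unit tests, such as `tests/test_bgp_template.py` in the repository layout, can assert structural invariants on the rendered text rather than compare whole files. The sketch below shows minimal helper functions a pytest test could call, assuming the rendered config is available as a string. They check two things:

- Every neighbor declared with `remote-as` is also activated.
- No placeholder password such as `changeme` reaches a device.

The helper names and the sample text are hypothetical.

```python
import re

NEIGHBOR_DECL = re.compile(r"^\s*neighbor (\S+) remote-as \d+", re.M)
NEIGHBOR_ACTIVATE = re.compile(r"^\s*neighbor (\S+) activate\s*$", re.M)


def unactivated_neighbors(config: str) -> set[str]:
    """Neighbors declared with remote-as but never activated in an address family."""
    return set(NEIGHBOR_DECL.findall(config)) - set(NEIGHBOR_ACTIVATE.findall(config))


def placeholder_passwords(config: str, banned=("changeme",)) -> list[str]:
    """Return password lines that contain a known placeholder value."""
    return [
        line.strip()
        for line in config.splitlines()
        if " password " in line and any(word in line.split() for word in banned)
    ]


# Sample rendered output with a deliberate bug: 10.0.0.2 is never activated
RENDERED = """
router bgp 65001
  neighbor 10.0.0.1 remote-as 65000
  neighbor 10.0.0.2 remote-as 65000
  address-family ipv4 unicast
    neighbor 10.0.0.1 activate
  exit-address-family
"""
print(unactivated_neighbors(RENDERED))
# {'10.0.0.2'}
```

In a real test, render the template first and then `assert unactivated_neighbors(rendered) == set()` and `assert placeholder_passwords(rendered) == []`.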

Step 3: Render and push to device

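Python: deploy.py

import logging
import os
import re

from jinja2 import Environment, FileSystemLoader
from netmiko import ConnectHandler

from netbox_client import build_device_context

log = logging.getLogger("deploy")

# In IOS "show ip bgp summary", neighbor rows start with the peer address and the
# last column (State/PfxRcd) is a prefix count only when the session is Established;
# otherwise it shows a state name such as Idle or Active.
NEIGHBOR_ROW = re.compile(r"^\d+\.\d+\.\d+\.\d+\s")


def render_config(device_name, template_name):
    # trim_blocks/lstrip_blocks keep {% ... %} tags from leaving blank lines behind
    env = Environment(
        loader=FileSystemLoader("templates/"),
        trim_blocks=True,
        lstrip_blocks=True,
    )
    template = env.get_template(template_name)
    context = build_device_context(device_name)
    return template.render(context)


def deploy_to_device(device_name, mgmt_ip, config, expected_peers=None, dry_run=True):
    """Push config to a device. dry_run=True returns the config without connecting."""
    config_lines = config.strip().splitlines()

    if dry_run:
        log.info(f"DRY RUN for {device_name}: config ready, not pushing")
        return config

    device = {
        "device_type": "cisco_ios",
        "host":        mgmt_ip,
        "username":    os.environ["NET_USER"],
        "password":    os.environ["NET_PASS"],
        "secret":      os.environ["NET_ENABLE"],
    }

    with ConnectHandler(**device) as conn:
        conn.enable()
        output = conn.send_config_set(config_lines)
        log.info(f"Config pushed to {device_name}")
        # Verify on the same session: verify_bgp needs an open connection
        if expected_peers is not None:
            verify_bgp(conn, expected_peers)
        return output


def count_established(summary):
    """Count neighbor rows whose State/PfxRcd column is a prefix count."""
    return sum(
        1 for line in summary.splitlines()
        if NEIGHBOR_ROW.match(line) and line.split()[-1].isdigit()
    )


def verify_bgp(conn, expected_peer_count):
    """Post-deploy check: verify BGP sessions are established."""
    # IOS does not print the word "Established" in this table, so parse the rows.
    # Sessions can take a while to come up after a push; poll rather than check once.
    output = conn.send_command("show ip bgp summary")
    established = count_established(output)
    if established < expected_peer_count:
        raise ValueError(
            f"Expected {expected_peer_count} BGP sessions, got {established}"
        )
    log.info(f"BGP verification passed: {established} sessions up")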
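BGP sessions rarely come up the instant a push completes, so a single check like `verify_bgp()` can fail a healthy change. The sketch below is a minimal polling loop. It assumes a `count_sessions` callable that returns the current established-session count, for example a wrapper around the device's BGP summary. The function name and the timeout and interval defaults are illustrative. `sleep` is injectable so the loop can be tested instantly.

```python
import time
from typing import Callable


def wait_for_bgp(
    count_sessions: Callable[[], int],
    expected: int,
    timeout: float = 120.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until at least `expected` sessions are up; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        established = count_sessions()
        if established >= expected:
            return established
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Expected {expected} BGP sessions, got {established} after {timeout}s"
            )
        sleep(interval)


# Usage with a fake counter that converges on the third poll
readings = iter([0, 2, 4])
print(wait_for_bgp(lambda: next(readings), expected=4, interval=0, sleep=lambda s: None))
# 4
```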

Step 4: EVPN/VXLAN leaf config template

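Jinja2: templates/evpn_leaf.j2

! EVPN/VXLAN Leaf Config | {{ hostname }}
! Platform: Cisco NX-OS syntax (Netmiko device_type "cisco_nxos")
{# vxlan_vnis (items with id and vlan) and peer.type are not produced by
   build_device_context() in Step 1; add them to the context first #}

feature bgp
feature nv overlay
feature vn-segment-vlan-based
nv overlay evpn

{% for vni in vxlan_vnis %}
vlan {{ vni.vlan }}
  vn-segment {{ vni.id }}
{% endfor %}

interface nve1
  no shutdown
  source-interface loopback0
  host-reachability protocol bgp
{% for vni in vxlan_vnis %}
  member vni {{ vni.id }}
    ingress-replication protocol bgp
{% endfor %}

router bgp {{ local_asn }}
{% for peer in bgp_peers if peer.type == "spine" %}
  neighbor {{ peer.neighbor_ip }}
    remote-as {{ peer.remote_asn }}
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
{% endfor %}

{# On NX-OS the per-VNI EVPN settings live under the global evpn block #}
evpn
{% for vni in vxlan_vnis %}
  vni {{ vni.id }} l2
    rd auto
    route-target import auto
    route-target export auto
{% endfor %}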
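The EVPN template expects a `vxlan_vnis` list that `build_device_context()` in Step 1 does not produce. The sketch below derives one from the VLAN IDs that the context already returns. It assumes a hypothetical numbering convention of VNI = 10000 + VLAN ID; use whatever allocation scheme your fabric actually follows. The function name is illustrative.

```python
VNI_BASE = 10_000     # hypothetical convention: VNI = VNI_BASE + VLAN ID
MAX_VNI = 16_777_215  # VNIs are 24-bit values


def build_vxlan_vnis(vlans: list[int], base: int = VNI_BASE) -> list[dict]:
    """Map VLAN IDs to VNI entries shaped for the evpn_leaf.j2 loops."""
    vnis = []
    for vid in sorted(set(vlans)):
        if not 1 <= vid <= 4094:
            raise ValueError(f"VLAN {vid} is outside 1-4094")
        vni = base + vid
        if vni > MAX_VNI:
            raise ValueError(f"VNI {vni} exceeds the 24-bit VNI space")
        vnis.append({"id": vni, "vlan": vid})
    return vnis


print(build_vxlan_vnis([20, 10, 30]))
# [{'id': 10010, 'vlan': 10}, {'id': 10020, 'vlan': 20}, {'id': 10030, 'vlan': 30}]
```

In the render step, this would be wired in as `context["vxlan_vnis"] = build_vxlan_vnis(context["vlans"])`.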

Infrastructure as code

Treating network config like software

Every config change goes through the same workflow as a software code change. This creates accountability, reversibility, and a complete history of every decision ever made on the network.

Git repository structure

Structure: network-configs/

network-configs/
  templates/
    bgp_neighbors.j2
    evpn_leaf.j2
    ospf_area.j2
    prefix_lists.j2
    vlan_config.j2
  inventory/
    hosts.yaml
    group_vars/
      spine.yaml
      leaf.yaml
      pe_routers.yaml
  scripts/
    netbox_client.py
    render_all.py
    deploy.py
    verify.py
    rollback.py
    diff.py
  tests/
    batfish_check.py
    test_bgp_template.py
    test_evpn_template.py
  .github/
    workflows/
      validate.yaml
      deploy.yaml
  README.md

GitHub Actions CI/CD pipeline

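YAML: .github/workflows/validate.yaml

name: Validate network configs

on:
  pull_request:
    branches: [main]

permissions:
  contents: read
  pull-requests: write  # lets diff.py post the PR comment

jobs:
  validate:
    # NetBox is internal, so this may need a self-hosted runner instead
    runs-on: ubuntu-latest
    services:
      batfish:  # pybatfish needs a running Batfish server
        image: batfish/allinone
        ports:
          - 9996:9996
          - 9997:9997
    env:
      NETBOX_TOKEN: ${{ secrets.NETBOX_TOKEN }}
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies  # assumes a requirements.txt in the repo root
        run: pip install -r requirements.txt

      - name: Render templates
        run: python scripts/render_all.py

      - name: Run Batfish analysis
        run: python tests/batfish_check.py

      - name: Template unit tests
        run: pytest tests/ -v

      - name: Post diff as PR comment
        run: python scripts/diff.py --comment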

When an engineer opens a pull request, the CI pipeline automatically renders all templates, runs Batfish network analysis, executes unit tests, and posts a human-readable config diff directly in the PR comment. Reviewers see exactly what will change on each device before approving.
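The per-device diff that `scripts/diff.py --comment` posts can be produced with the standard library alone. The sketch below is minimal. It assumes the previous and newly rendered configs are available as strings keyed by hostname, for example from the main branch and the PR branch. Posting the result to the PR is left out, and the function name is hypothetical.

```python
import difflib

FENCE = "`" * 3  # Markdown code fence, built here to keep this listing readable


def render_pr_diff(before: dict[str, str], after: dict[str, str]) -> str:
    """Build a Markdown comment with one unified diff per changed device."""
    sections = []
    for host in sorted(before.keys() | after.keys()):
        old = before.get(host, "").splitlines()
        new = after.get(host, "").splitlines()
        diff = list(difflib.unified_diff(
            old, new, fromfile=f"{host} (main)", tofile=f"{host} (PR)", lineterm=""
        ))
        if diff:  # unchanged devices are left out of the comment
            sections.append(f"### {host}\n{FENCE}diff\n" + "\n".join(diff) + f"\n{FENCE}")
    return "\n\n".join(sections) or "No rendered config changes."


print(render_pr_diff(
    {"leaf01": "hostname leaf01\nntp server 10.0.0.1\n"},
    {"leaf01": "hostname leaf01\nntp server 10.0.0.2\n"},
))
```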

Observability

Telemetry pipeline

Streaming telemetry replaces SNMP polling for modern network observability. Devices stream operational data in real time using gNMI, which gets collected, processed, and stored for dashboards and alerting.

Device (gNMI stream) › Telegraf (collection) › InfluxDB (time-series store) › Grafana (dashboards) › Alerting (PagerDuty)
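The last hop, from stored telemetry to a page, needs a rule that fires on changes rather than on every sample. The sketch below is minimal and makes these assumptions:

- Samples arrive as (neighbor, session-state) pairs using OpenConfig BGP state names, where an up session reports `ESTABLISHED`.
- The rule pages once when a neighbor leaves `ESTABLISHED` and once when it recovers.
- A neighbor first seen in a down state does not page.
- The `page` callable is a hypothetical stand-in for a PagerDuty integration, and the class name is illustrative.

```python
from typing import Callable


class BgpStateAlerter:
    """Page once when a neighbor leaves ESTABLISHED, and once when it recovers."""

    def __init__(self, page: Callable[[str], None]):
        self.page = page
        self.last_state: dict[str, str] = {}

    def observe(self, neighbor: str, state: str) -> None:
        previous = self.last_state.get(neighbor)
        self.last_state[neighbor] = state
        if previous == "ESTABLISHED" and state != "ESTABLISHED":
            self.page(f"BGP neighbor {neighbor} went {state}")
        elif previous not in (None, "ESTABLISHED") and state == "ESTABLISHED":
            self.page(f"BGP neighbor {neighbor} recovered")


alerts = []
alerter = BgpStateAlerter(page=alerts.append)
for state in ["ESTABLISHED", "ESTABLISHED", "IDLE", "ACTIVE", "ESTABLISHED"]:
    alerter.observe("10.0.0.1", state)
print(alerts)
# ['BGP neighbor 10.0.0.1 went IDLE', 'BGP neighbor 10.0.0.1 recovered']
```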

gNMI subscription via Python

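Python: telemetry/gnmi_collector.py

import logging
import os

from pygnmi.client import gNMIclient

log = logging.getLogger("telemetry")

# OpenConfig path support varies by platform and OS version (many devices expose
# BGP only under network-instances); check the models your devices advertise.
TARGET_PATHS = [
    "openconfig-bgp:bgp/neighbors",
    "openconfig-interfaces:interfaces/interface/state/counters",
    "openconfig-network-instance:network-instances",
]

SAMPLE_INTERVAL_NS = 10_000_000_000  # gNMI sample intervals are in nanoseconds (10 s)


def stream_telemetry(host, port=57400):
    subscribe = {
        "subscription": [
            {"path": p, "mode": "sample", "sample_interval": SAMPLE_INTERVAL_NS}
            for p in TARGET_PATHS
        ],
        "mode": "stream",
    }
    with gNMIclient(
        target=(host, port),
        username=os.environ["NET_USER"],
        password=os.environ["NET_PASS"],
        insecure=True,  # plaintext gRPC: lab use only, enable TLS in production
    ) as gc:
        for update in gc.subscribe_stream(subscribe=subscribe):
            process_update(update)


def process_update(update):
    """Parse one pygnmi stream message and write each value to InfluxDB."""
    # Data messages look like {"update": {"update": [{"path": ..., "val": ...}], ...}};
    # sync_response messages carry no values and are skipped.
    for notification in update.get("update", {}).get("update", []):
        write_to_influx(notification["path"], notification["val"])


def write_to_influx(path, val):
    """Placeholder: replace with a real InfluxDB write."""
    log.info("%s = %s", path, val)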
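The collector above hands each path and value to `write_to_influx()`. One way to fill that in is to format InfluxDB line protocol (`measurement,tag=value field=value timestamp`) and send it through a client library or the HTTP write endpoint. The sketch below is minimal and covers only the formatting step, including the escaping rules:

- Measurements escape commas and spaces.
- Tag keys, tag values and field keys also escape equals signs.
- String field values are quoted, with backslashes and quotes escaped.
- Integers carry an `i` suffix.

The measurement name, tag keys and function names are hypothetical.

```python
def _esc_tag(s: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys/values and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _esc_measurement(s: str) -> str:
    """Escape commas and spaces in a measurement name."""
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _field_value(v) -> str:
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"
    if isinstance(v, float):
        return repr(v)
    text = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns=None) -> str:
    """Format one point; empty tag values are skipped because Influx rejects them."""
    head = _esc_measurement(measurement) + "".join(
        f",{_esc_tag(k)}={_esc_tag(str(v))}" for k, v in sorted(tags.items()) if str(v)
    )
    body = ",".join(f"{_esc_tag(k)}={_field_value(v)}" for k, v in fields.items())
    return f"{head} {body}" + (f" {ts_ns}" if ts_ns is not None else "")


print(to_line_protocol(
    "gnmi",
    {"device": "leaf01", "path": "interfaces/interface/state/counters"},
    {"in_octets": 1234, "oper_status": "UP"},
    ts_ns=1700000000000000000,
))
# gnmi,device=leaf01,path=interfaces/interface/state/counters in_octets=1234i,oper_status="UP" 1700000000000000000
```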

Team collaboration

How the team works together

Automation only works if the whole team trusts it and follows the same process. These are the collaboration patterns that make production automation sustainable.

Role	Responsibility in the pipeline	Tools used
Network engineer	Defines intent in NetBox, opens PRs for config changes, reviews diffs before approving	NetBox, Git, Python
Automation engineer	Maintains templates and scripts, owns the CI/CD pipeline, handles tooling bugs	Jinja2, GitHub Actions, Batfish
Network architect	Reviews topology changes, approves new template patterns, signs off on major rollouts	Batfish reports, topology diagrams
NOC team	Monitors post-deploy telemetry, raises issues if verification fails, handles rollbacks	Grafana, PagerDuty, runbooks

Questions to ask before starting an automation project

What is the source of truth and who owns it? If NetBox data is wrong, every rendered config will be wrong. Establish who is responsible for keeping device data accurate before writing a single line of automation.

What is the rollback plan if a deploy fails? Every deploy script needs a corresponding rollback script. Define the failure threshold before starting: if X BGP sessions go down, automatically roll back and page the on-call engineer.

Which devices are in scope for the first rollout? Start with non-production or low-risk devices. Automate one device role at a time. Running automation across your entire network on day one is how you cause a major incident.

How do engineers who prefer CLI still fit into the workflow? Not every engineer will embrace automation immediately. Design the process so manual changes are still possible but require a NetBox update first. This preserves the source of truth even when people go off-script.

What does approval look like for production changes? Require at least two engineer approvals on any PR that touches production devices. The CI pipeline catches technical errors but human review catches intent errors.
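Two of the answers above are precise enough to encode as policy that the pipeline enforces: the rollback threshold and the two-approval rule. The sketch below is minimal, and it leaves out fetching session counts or PR reviews from your tooling. The function names and the threshold values are hypothetical. It assumes that non-production changes need one independent approval, and that the author's own approval never counts.

```python
def should_roll_back(established_before: int, established_after: int, threshold: int) -> bool:
    """Roll back once `threshold` or more previously established sessions are down."""
    return established_before - established_after >= threshold


def may_merge(approvers: set[str], author: str, touches_production: bool,
              required: int = 2) -> bool:
    """Production changes need `required` approvals from people other than the author."""
    independent = approvers - {author}
    # Assumption: non-production changes need a single independent approval
    return len(independent) >= (required if touches_production else 1)


print(should_roll_back(12, 11, threshold=2))  # False: one session lost
print(should_roll_back(12, 9, threshold=2))   # True: three sessions lost
print(may_merge({"alice", "bob"}, author="alice", touches_production=True))  # False
```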

Production-grade network automation, built right