// Network automation and config management

Production-grade network automation, built right

A full walkthrough of how to build, test, and ship network config management and automation pipelines to production. Real code, real workflow, real team collaboration patterns.

Foundation

What is network config management?

Config management is the practice of treating your network device configurations the same way software engineers treat code. Every configuration is version-controlled, reviewed, tested, and deployed through a repeatable automated pipeline rather than typed by hand into a terminal. The goal is to eliminate configuration drift, reduce human error, and make every change auditable and reversible.

In a manual world, 10 engineers touching 200 switches over 12 months means up to 2,000 engineer-switch pairings (10 × 200), each one an opportunity for configuration drift. In an automated world, all 200 switches get the same validated config from the same source of truth every time.
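
Drift is easy to state and easy to check. The sketch below is a minimal illustration, assuming the intended config comes from the source of truth and the running config has already been fetched as text. The names `normalize` and `find_drift` and the sample configs are hypothetical, not part of any library.

```python
def normalize(config_text):
    """Return the set of meaningful config lines, ignoring blanks and '!' comments."""
    lines = set()
    for raw in config_text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.strip().startswith("!"):
            continue
        lines.add(line)
    return lines


def find_drift(intended, running):
    """Compare intended and running configs line by line (order-insensitive)."""
    want, have = normalize(intended), normalize(running)
    return {
        "missing": sorted(want - have),     # in the source of truth, not on the device
        "unexpected": sorted(have - want),  # on the device, not in the source of truth
    }


intended = """\
hostname leaf01
ntp server 10.0.0.1
ntp server 10.0.0.2
"""
running = """\
hostname leaf01
ntp server 10.0.0.1
ntp server 192.0.2.99
"""

drift = find_drift(intended, running)
print(drift)
```

A set comparison like this ignores which parent block a line sits under, so it suits flat global settings better than nested interface or routing stanzas.
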
01

Source of truth

A single authoritative system (NetBox, Nautobot, or YAML files in Git) that defines what every device should look like. No config exists outside this system.

02

Template rendering

Jinja2 templates define config structure. Variables from the source of truth are injected to render device-specific configs programmatically.

03

Automated deployment

Python tooling using Netmiko, NAPALM, or Ansible pushes rendered configs to devices, validates the result, and rolls back on failure.

The workflow

End-to-end automation pipeline

This is the full pipeline from intent to production. Every stage has a clear input, output, and owner. Nothing touches a device without passing through every stage first.

NetBox (source of truth) → Python (data extraction) → Jinja2 (config rendering) → Batfish (pre-deploy testing) → Git PR (peer review) → Netmiko (push to device) → Telemetry (validation)

Each stage is a gate. If a stage fails, the pipeline stops and the engineer is notified before any device is touched. The pipeline runs in CI/CD (GitHub Actions or GitLab CI) and every run produces a complete audit log.
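
The gate behaviour can be sketched in a few lines. This is a simplified illustration, not the pipeline's actual runner: `run_pipeline` and the four stage functions are hypothetical stand-ins, and the validation stage is hard-coded to fail so the stop is visible.

```python
def run_pipeline(stages, context):
    """Run (name, callable) stages in order and stop at the first failure.

    Each callable takes the context dict and returns the updated context,
    or raises an exception to fail the gate. Returns (ok, audit_log).
    """
    audit_log = []
    for name, stage in stages:
        try:
            context = stage(context)
        except Exception as exc:
            audit_log.append({"stage": name, "status": "failed", "detail": str(exc)})
            return False, audit_log  # later stages never run
        audit_log.append({"stage": name, "status": "passed", "detail": ""})
    return True, audit_log


pushed = []  # stands in for real devices


def extract(ctx):
    return {**ctx, "devices": ["leaf01", "leaf02"]}


def render(ctx):
    return {**ctx, "configs": {d: f"hostname {d}" for d in ctx["devices"]}}


def validate(ctx):
    raise ValueError("routing loop detected")  # simulated pre-deploy failure


def deploy(ctx):
    pushed.extend(ctx["configs"])
    return ctx


ok, audit = run_pipeline(
    [("extract", extract), ("render", render), ("validate", validate), ("deploy", deploy)],
    {},
)
for entry in audit:
    print(entry)
```

Because the deploy stage only runs after every earlier stage returns, `pushed` stays empty when validation fails, and the audit log records exactly where the run stopped.
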
Stage 1

Extract from NetBox

Python pulls device inventory, IP addressing, VLAN assignments, BGP peer data, and interface roles from NetBox via its REST API. This becomes the data model for config rendering.

Stage 2

Render with Jinja2

Templates define the config structure for each device role (spine, leaf, PE router, border router). Variables from NetBox are injected to produce device-specific configs.

Stage 3

Validate with Batfish

Batfish performs offline network modelling against the rendered configs before they touch any device. It checks for routing loops, reachability violations, and policy mismatches.
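
A Batfish gate can look roughly like the sketch below. It assumes pybatfish is installed, a Batfish service is reachable on localhost, and the rendered configs sit in a `configs/` folder inside the snapshot directory. The function names, the network name, and the snapshot name are illustrative choices.

```python
def evaluate_findings(loop_rows, session_rows):
    """Turn Batfish answer rows into a list of human-readable failures.

    loop_rows:    rows from the detectLoops question (any row is a failure)
    session_rows: rows from the bgpSessionStatus question, as dicts
    """
    failures = []
    if loop_rows:
        failures.append(f"{len(loop_rows)} forwarding loop(s) detected")
    for row in session_rows:
        if row["Established_Status"] != "ESTABLISHED":
            failures.append(
                f"BGP session {row['Node']} -> {row['Remote_IP']} "
                f"is {row['Established_Status']}"
            )
    return failures


def run_batfish_checks(snapshot_dir, host="localhost"):
    """Load a snapshot into Batfish and return the list of failures."""
    # Imported here so the logic above can be tested without pybatfish.
    from pybatfish.client.session import Session

    bf = Session(host=host)
    bf.set_network("candidate-network")  # illustrative network name
    bf.init_snapshot(snapshot_dir, name="candidate", overwrite=True)

    loops = bf.q.detectLoops().answer().frame()
    sessions = bf.q.bgpSessionStatus().answer().frame()
    return evaluate_findings(loops.to_dict("records"), sessions.to_dict("records"))


# Exercise the decision logic with hand-written rows instead of a live service.
sample_sessions = [
    {"Node": "leaf01", "Remote_IP": "10.0.0.1", "Established_Status": "ESTABLISHED"},
    {"Node": "leaf01", "Remote_IP": "10.0.0.2", "Established_Status": "NOT_ESTABLISHED"},
]
failures = evaluate_findings([], sample_sessions)
print(failures)
```

In CI, a non-empty list would fail the job. Sessions to peers that are not part of the snapshot may need to be filtered out before the check, since Batfish can only model devices it has configs for.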

Stage 4

Deploy and verify

Netmiko pushes configs in a controlled rollout. Post-deploy verification checks BGP sessions, OSPF adjacencies, and interface states before marking the change as successful.
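
One way to read "controlled rollout" is a canary device first, then small batches, stopping as soon as a batch fails verification. The sketch below assumes that interpretation. `rollout`, `push`, and `verify` are hypothetical names, with the real push and verify steps supplied by the caller.

```python
def rollout(devices, push, verify, batch_size=2):
    """Deploy to one canary device, then in batches of batch_size.

    push(device) applies the config; verify(device) returns True when the
    device is healthy. Returns (done, unhealthy): devices in fully verified
    batches, and the devices that failed verification (empty on success).
    """
    batches = [devices[:1]] + [
        devices[i:i + batch_size] for i in range(1, len(devices), batch_size)
    ]
    done = []
    for batch in batches:
        for device in batch:
            push(device)
        unhealthy = [d for d in batch if not verify(d)]
        if unhealthy:
            return done, unhealthy  # stop: later batches are never touched
        done.extend(batch)
    return done, []


pushed = []
devices = ["leaf01", "leaf02", "leaf03", "leaf04", "leaf05", "leaf06"]

done, unhealthy = rollout(
    devices,
    push=pushed.append,
    verify=lambda d: d != "leaf03",  # simulate a failed check on leaf03
)
print("verified:", done)
print("failed:", unhealthy)
print("touched:", pushed)
```

Here the canary passes, the second batch fails on leaf03, and leaf04 through leaf06 are never touched.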

Real code

Python tooling and Jinja2 templates

These are the actual patterns used in production automation. Not pseudocode. Real structure you can adapt to your environment.

Step 1: Pull device data from NetBox

Python netbox_client.py
import pynetbox
import os

nb = pynetbox.api(
    "https://netbox.company.internal",
    token=os.environ["NETBOX_TOKEN"]
)

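def get_bgp_peers(device_name):
    """Pull all BGP peers for a device from NetBox."""
    device = nb.dcim.devices.get(name=device_name)
    if device is None:
        raise LookupError(f"Device {device_name!r} not found in NetBox")
    # Assumes the netbox-bgp plugin, which serves sessions at
    # /api/plugins/bgp/session/. Adjust if your plugin version differs.
    peers = nb.plugins.bgp.session.filter(
        device_id=device.id
    )
    # pynetbox exposes custom fields through the custom_fields dict,
    # not as cf_* attributes.
    return [{
        "neighbor_ip": p.remote_address.address.split("/")[0],
        "remote_asn":   p.remote_as.asn,
        "description":  p.description,
        "local_ip":    p.local_address.address.split("/")[0],
        "route_map_in": p.custom_fields.get("route_map_in") or "RM-DEFAULT-IN",
        "route_map_out": p.custom_fields.get("route_map_out") or "RM-DEFAULT-OUT"
    } for p in peers]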

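def build_device_context(device_name):
    """Build the full data context for template rendering."""
    device = nb.dcim.devices.get(name=device_name)
    if device is None:
        raise LookupError(f"Device {device_name!r} not found in NetBox")
    if device.primary_ip is None:
        raise ValueError(f"Device {device_name!r} has no primary IP in NetBox")
    # NetBox 3.6+ names the field "role"; older releases use "device_role".
    role = getattr(device, "role", None) or device.device_role
    return {
        "hostname":   device.name,
        # Custom fields live in the custom_fields dict, not cf_* attributes.
        "local_asn":  device.custom_fields.get("bgp_asn"),
        "router_id":  device.primary_ip.address.split("/")[0],
        "bgp_peers":  get_bgp_peers(device_name),
        "vlans":      [v.vid for v in nb.ipam.vlans.filter(site_id=device.site.id)],
        "role":       role.slug
    }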
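
If the NetBox data is wrong, the rendered config is wrong, so it helps to check the context before it reaches a template. The sketch below is one hedged way to do that with the standard library only. `validate_context` and its helpers are hypothetical names, and the rules shown (valid ASN range, parseable IPs) are a starting point rather than a complete policy.

```python
import ipaddress

MAX_ASN = 4294967295  # largest 4-byte ASN


def _valid_asn(value):
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= MAX_ASN


def _valid_ip(value):
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def validate_context(ctx):
    """Return a list of problems in a render context; empty means it is usable."""
    problems = []
    if not ctx.get("hostname"):
        problems.append("missing hostname")
    if not _valid_asn(ctx.get("local_asn")):
        problems.append("invalid local_asn")
    if not _valid_ip(ctx.get("router_id")):
        problems.append("invalid router_id")
    for peer in ctx.get("bgp_peers", []):
        label = peer.get("neighbor_ip", "<unknown>")
        if not _valid_ip(peer.get("neighbor_ip")):
            problems.append(f"peer {label}: invalid neighbor_ip")
        if not _valid_asn(peer.get("remote_asn")):
            problems.append(f"peer {label}: invalid remote_asn")
    return problems


ctx = {
    "hostname": "leaf01",
    "local_asn": 65001,
    "router_id": "10.0.0.11",
    "bgp_peers": [
        {"neighbor_ip": "10.0.0.1", "remote_asn": 65000},
        {"neighbor_ip": "10.0.0.300", "remote_asn": None},  # deliberately broken
    ],
}
problems = validate_context(ctx)
print(problems)
```

Calling this on the output of `build_device_context` before rendering turns a bad NetBox record into a failed pipeline run instead of a bad config.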

Step 2: Jinja2 BGP neighbor template

Jinja2 templates/bgp_neighbors.j2
! Generated by NetworkForAI automation pipeline
! Device: {{ hostname }} | ASN: {{ local_asn }}
! DO NOT EDIT MANUALLY

router bgp {{ local_asn }}
  bgp router-id {{ router_id }}
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
{% for peer in bgp_peers %}

  ! Peer: {{ peer.description }}
  neighbor {{ peer.neighbor_ip }} remote-as {{ peer.remote_asn }}
  neighbor {{ peer.neighbor_ip }} description {{ peer.description }}
  neighbor {{ peer.neighbor_ip }} update-source Loopback0
{% if peer.md5_password %}  neighbor {{ peer.neighbor_ip }} password {{ peer.md5_password }}{% endif %}
  neighbor {{ peer.neighbor_ip }} timers 10 30

  address-family ipv4 unicast
    neighbor {{ peer.neighbor_ip }} activate
    neighbor {{ peer.neighbor_ip }} soft-reconfiguration inbound
    neighbor {{ peer.neighbor_ip }} route-map {{ peer.route_map_in }} in
    neighbor {{ peer.neighbor_ip }} route-map {{ peer.route_map_out }} out
    neighbor {{ peer.neighbor_ip }} maximum-prefix 10000 80
  exit-address-family
{% endfor %}

Step 3: Render and push to device

Python deploy.py
from jinja2 import Environment, FileSystemLoader
from netmiko import ConnectHandler
from netbox_client import build_device_context
import logging
import os

log = logging.getLogger("deploy")

def render_config(device_name, template_name):
    env = Environment(loader=FileSystemLoader("templates/"))
    template = env.get_template(template_name)
    context = build_device_context(device_name)
    return template.render(context)

def deploy_to_device(device_name, mgmt_ip, config, dry_run=True):
    """Push config to device. dry_run=True logs and returns the rendered config without connecting."""
    config_lines = config.strip().splitlines()

    if dry_run:
        log.info(f"DRY RUN for {device_name}: config ready, not pushing")
        return config

    device = {
        "device_type": "cisco_ios",
        "host":        mgmt_ip,
        "username":    os.environ["NET_USER"],
        "password":    os.environ["NET_PASS"],
        "secret":      os.environ["NET_ENABLE"]
    }

    with ConnectHandler(**device) as conn:
        conn.enable()
        output = conn.send_config_set(config_lines)
        log.info(f"Config pushed to {device_name}")
        return output

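def verify_bgp(conn, expected_peer_count):
    """Post-deploy check: verify BGP sessions are established."""
    # The IOS summary never prints the word "Established". Its last column
    # (State/PfxRcd) shows a prefix count for an established session and a
    # state name such as Idle or Active otherwise.
    output = conn.send_command("show ip bgp summary")
    established = 0
    for line in output.splitlines():
        fields = line.split()
        # Neighbor rows start with an IPv4 address and have 10 columns.
        is_neighbor_row = len(fields) >= 10 and fields[0].count(".") == 3
        if is_neighbor_row and fields[-1].isdigit():
            established += 1
    if established < expected_peer_count:
        raise ValueError(
            f"Expected {expected_peer_count} BGP sessions, got {established}"
        )
    log.info(f"BGP verification passed: {established} sessions up")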

Step 4: EVPN/VXLAN leaf config template

Jinja2 templates/evpn_leaf.j2
! EVPN/VXLAN Leaf Config | {{ hostname }}

vlan 10,20,30

interface nve1
  no shutdown
  source-interface loopback0
  host-reachability protocol bgp
{% for vni in vxlan_vnis %}
  member vni {{ vni.id }}
    ingress-replication protocol bgp
{% endfor %}

{# NX-OS syntax: EVPN is enabled per neighbor, and L2 VNIs sit under the top-level evpn block #}
router bgp {{ local_asn }}
{% for peer in bgp_peers if peer.type == "spine" %}
  neighbor {{ peer.neighbor_ip }}
    address-family l2vpn evpn
      send-community extended
{% endfor %}

evpn
{% for vni in vxlan_vnis %}
  vni {{ vni.id }} l2
    rd auto
    route-target import auto
    route-target export auto
{% endfor %}
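
The EVPN template loops over `vxlan_vnis`, which the `build_device_context` function shown earlier does not supply. A minimal sketch of one way to derive it from the VLAN list follows. The numbering rule (VNI = 10000 + VLAN ID) and the names `VNI_BASE` and `build_vxlan_vnis` are assumptions for illustration. Use whatever convention your fabric already follows.

```python
VNI_BASE = 10000  # hypothetical convention: VNI = 10000 + VLAN ID


def build_vxlan_vnis(vlan_ids, base=VNI_BASE):
    """Map VLAN IDs to L2 VNIs in the shape the template loops over."""
    vnis = []
    for vid in sorted(set(vlan_ids)):
        if not 1 <= vid <= 4094:
            raise ValueError(f"VLAN ID out of range: {vid}")
        vnis.append({"vlan": vid, "id": base + vid})
    return vnis


vxlan_vnis = build_vxlan_vnis([30, 10, 20, 10])
print(vxlan_vnis)
```

The result can be added to the render context under the `vxlan_vnis` key. The template's `peer.type` filter needs a similar addition to each peer dict.
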
Infrastructure as code

Treating network config like software

Every config change goes through the same workflow as a software code change. This creates accountability, reversibility, and a complete history of every decision ever made on the network.

Git repository structure

Structure network-configs/
network-configs/
  templates/
    bgp_neighbors.j2
    evpn_leaf.j2
    ospf_area.j2
    prefix_lists.j2
    vlan_config.j2
  inventory/
    hosts.yaml
    group_vars/
      spine.yaml
      leaf.yaml
      pe_routers.yaml
  scripts/
    render_all.py
    deploy.py
    verify.py
    rollback.py
    diff.py
  tests/
    batfish_check.py
    test_bgp_template.py
    test_evpn_template.py
  .github/
    workflows/
      validate.yaml
      deploy.yaml
  README.md

GitHub Actions CI/CD pipeline

YAML .github/workflows/validate.yaml
name: Validate network configs

on:
  pull_request:
    branches: [main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
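      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      # Assumes dependencies are listed in requirements.txt at the repo root
      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Render templates
        run: python scripts/render_all.py

      # Requires a Batfish service reachable from the runner
      - name: Run Batfish analysis
        run: python tests/batfish_check.py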

      - name: Template unit tests
        run: pytest tests/ -v

      - name: Post diff as PR comment
        run: python scripts/diff.py --comment

When an engineer opens a pull request, the CI pipeline automatically renders all templates, runs Batfish network analysis, executes the unit tests, and posts a human-readable config diff as a PR comment. Reviewers see exactly what will change on each device before approving.
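
The diff step can be approximated with the standard library. This is a sketch of what a script like `scripts/diff.py` might do, not its actual contents: `render_diff` and `build_comment` are hypothetical names, and posting the text to the PR is left out.

```python
import difflib


def render_diff(device, before, after):
    """Return a unified diff for one device, or '' when nothing changed."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{device} (current)",
        tofile=f"{device} (proposed)",
        lineterm="",
    )
    return "\n".join(diff)


def build_comment(changes):
    """Build comment text from a mapping of device -> (before, after) configs."""
    sections = []
    for device in sorted(changes):
        before, after = changes[device]
        diff = render_diff(device, before, after)
        if diff:  # unchanged devices are left out of the comment
            sections.append(f"{device}\n{diff}")
    if not sections:
        return "No config changes."
    return "\n\n".join(sections)


changes = {
    "leaf01": (
        "hostname leaf01\nntp server 10.0.0.1\n",
        "hostname leaf01\nntp server 10.0.0.2\n",
    ),
    "leaf02": ("hostname leaf02\n", "hostname leaf02\n"),
}
comment = build_comment(changes)
print(comment)
```

Only leaf01 appears in the output, with the removed line prefixed by `-` and the added line by `+`, which is the view a reviewer needs.
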
Observability

Telemetry pipeline

Streaming telemetry replaces SNMP polling for modern network observability. Devices stream operational data in real time using gNMI, which gets collected, processed, and stored for dashboards and alerting.

Device (gNMI stream) → Telegraf (collection) → InfluxDB (time-series store) → Grafana (dashboards) → PagerDuty (alerting)

gNMI subscription via Python

Python telemetry/gnmi_collector.py
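from pygnmi.client import gNMIclient
import logging
import os

log = logging.getLogger("telemetry")

TARGET_PATHS = [
    "openconfig-bgp:bgp/neighbors",
    "openconfig-interfaces:interfaces/interface/state/counters",
    "openconfig-network-instance:network-instances"
]

def stream_telemetry(host, port=57400):
    with gNMIclient(
        target=(host, port),
        username=os.environ["NET_USER"],
        password=os.environ["NET_PASS"],
        insecure=True  # no TLS: credentials travel in clear text, lab use only
    ) as gc:
        # subscribe2 (recent pygnmi releases) yields each message as a parsed dict
        for update in gc.subscribe2(
            subscribe={
                # sample_interval is in nanoseconds: 10_000_000_000 = 10 s
                "subscription": [{"path": p, "mode": "sample", "sample_interval": 10_000_000_000}
                               for p in TARGET_PATHS],
                "mode": "stream"
            }
        ):
            process_update(update)

def process_update(update):
    """Parse update and write to InfluxDB."""
    # Data arrives as {"update": {"update": [{"path": ..., "val": ...}]}}.
    # Other messages, such as sync_response, carry no values.
    for item in update.get("update", {}).get("update", []):
        write_to_influx(item["path"], item.get("val"))

def write_to_influx(path, val):
    """Placeholder: replace with a real InfluxDB write."""
    log.info("%s = %s", path, val)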
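
The collector hands each value to `write_to_influx`, whose real implementation is not shown. One building block it would likely need is a formatter for InfluxDB line protocol. The sketch below covers that piece only, using the standard library. The function names, measurement, and tag names are illustrative assumptions.

```python
TAG_SPECIALS = ",= "         # escaped in tag keys, tag values, and field keys
MEASUREMENT_SPECIALS = ", "  # escaped in measurement names


def _escape(value, specials):
    """Backslash-escape each special character in value."""
    text = str(value)
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Format a field value as boolean, integer (i suffix), float, or quoted string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        "," + _escape(k, TAG_SPECIALS) + "=" + _escape(v, TAG_SPECIALS)
        for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        _escape(k, TAG_SPECIALS) + "=" + _format_field(v)
        for k, v in sorted(fields.items())
    )
    return f"{_escape(measurement, MEASUREMENT_SPECIALS)}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "bgp_neighbor",
    {"host": "leaf01", "peer": "10.0.0.1"},
    {"state": "ESTABLISHED", "prefixes": 120},
    1700000000000000000,
)
print(line)
```

Keeping the device name and peer as tags and the measured values as fields means Grafana queries can filter on the tags.
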
Team collaboration

How the team works together

Automation only works if the whole team trusts it and follows the same process. These are the collaboration patterns that make production automation sustainable.

| Role | Responsibility in the pipeline | Tools used |
| --- | --- | --- |
| Network engineer | Defines intent in NetBox, opens PRs for config changes, reviews diffs before approving | NetBox, Git, Python |
| Automation engineer | Maintains templates and scripts, owns the CI/CD pipeline, handles tooling bugs | Jinja2, GitHub Actions, Batfish |
| Network architect | Reviews topology changes, approves new template patterns, signs off on major rollouts | Batfish reports, topology diagrams |
| NOC team | Monitors post-deploy telemetry, raises issues if verification fails, handles rollbacks | Grafana, PagerDuty, runbooks |

Questions to ask before starting an automation project

What is the source of truth and who owns it? If NetBox data is wrong, every rendered config will be wrong. Establish who is responsible for keeping device data accurate before writing a single line of automation.
What is the rollback plan if a deploy fails? Every deploy script needs a corresponding rollback script. Define the failure threshold before starting: if X BGP sessions go down, automatically roll back and page the on-call engineer.
Which devices are in scope for the first rollout? Start with non-production or low-risk devices. Automate one device role at a time. Running automation across your entire network on day one is how you cause a major incident.
How do engineers who prefer CLI still fit into the workflow? Not every engineer will embrace automation immediately. Design the process so manual changes are still possible but require a NetBox update first. This preserves the source of truth even when people go off-script.
What does approval look like for production changes? Require at least two engineer approvals on any PR that touches production devices. The CI pipeline catches technical errors but human review catches intent errors.
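
The rollback question above asks for a failure threshold defined in advance. A minimal sketch of that decision follows, assuming the established BGP peers are captured as sets of IPs before and after the deploy. `decide_rollback`, `enforce`, and the `rollback` and `page` callables are hypothetical names for whatever your tooling provides.

```python
def decide_rollback(sessions_before, sessions_after, max_lost=0):
    """Return (rollback, lost) given established BGP peer IPs before and after."""
    lost = sorted(set(sessions_before) - set(sessions_after))
    return len(lost) > max_lost, lost


def enforce(sessions_before, sessions_after, rollback, page, max_lost=0):
    """Roll back and page on-call when more than max_lost sessions disappear."""
    should_roll_back, lost = decide_rollback(sessions_before, sessions_after, max_lost)
    if should_roll_back:
        rollback()
        page(f"Rolled back: lost BGP sessions to {', '.join(lost)}")
    return should_roll_back


actions = []
before = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
after = {"10.0.0.1"}

rolled_back = enforce(
    before,
    after,
    rollback=lambda: actions.append("rollback"),
    page=actions.append,
    max_lost=1,  # tolerate one lost session, act on two or more
)
print(rolled_back, actions)
```

Agreeing on `max_lost` per device role before the first deploy keeps the rollback decision out of the middle of an incident.
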
Interactive tool

Config generator

Fill in the parameters below and get a production-ready BGP neighbor config for Cisco IOS. This demonstrates the same templating logic used in the full pipeline above.
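
The same generator logic can be sketched in Python with Jinja2. This is an illustration under assumptions, not the tool's actual code: the four parameters, the inline template, and the name `generate_bgp_neighbor` are hypothetical, and the output should be reviewed before it goes anywhere near a device.

```python
import ipaddress

from jinja2 import Environment, StrictUndefined

TEMPLATE = """\
router bgp {{ local_asn }}
 neighbor {{ neighbor_ip }} remote-as {{ remote_asn }}
 neighbor {{ neighbor_ip }} description {{ description }}
 address-family ipv4 unicast
  neighbor {{ neighbor_ip }} activate
 exit-address-family
"""


def generate_bgp_neighbor(local_asn, neighbor_ip, remote_asn, description):
    """Render a Cisco IOS style BGP neighbor stanza from four parameters."""
    ipaddress.ip_address(neighbor_ip)  # raises ValueError on a malformed address
    for asn in (local_asn, remote_asn):
        if not 1 <= int(asn) <= 4294967295:
            raise ValueError(f"ASN out of range: {asn}")
    # StrictUndefined makes a missing variable an error instead of an empty string.
    env = Environment(undefined=StrictUndefined, keep_trailing_newline=True)
    return env.from_string(TEMPLATE).render(
        local_asn=local_asn,
        neighbor_ip=neighbor_ip,
        remote_asn=remote_asn,
        description=description,
    )


config = generate_bgp_neighbor(65001, "10.0.0.1", 65000, "spine01")
print(config)
```

Validating inputs before rendering mirrors the pipeline's approach: bad data is rejected up front rather than discovered on the device.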

What is next

More automation tools coming

This page will grow as new tooling gets built and tested. The goal is to make every section interactive with real working tools, not just documentation.

Coming soon

Config compliance checker

Paste a running config and get a best-practice audit report. Flags missing authentication, unsafe defaults, and policy gaps.

Coming soon

Jinja2 live renderer

Paste a template and a JSON variable file, get the rendered output instantly. Useful for testing templates before committing.

Planned

Rollback script generator

Paste a config diff and get an automatically generated rollback config to undo the change cleanly and safely.