It’s Friday 5 p.m. and you’re getting ready to wrap up for the day and join your friends for a relaxing evening. Suddenly you see an email with ‘Urgent’ in the subject line. An application that needs to go live tonight is not functioning properly, and the team suspects the firewall is the culprit. The app developers need your help ASAP: you have to troubleshoot with them and, if changes are needed, reverse engineer the architecture on the fly and enable the business quickly without compromising the appropriate security controls. Sounds familiar?
Every day, network teams walk this tightrope between operational stability and zero-trust security, and doing it manually is no longer sustainable.
Manual network operations expose us to the following problems:
- Security risks due to delays in rolling out security updates
- Operational risks from human errors that cause outages
- Productivity drag from having to prove network innocence to application owners
Manual changes are also operationally expensive and cause delays in the delivery of new initiatives.
You’re probably wondering: what drives network changes? There are a few reasons:
- Zero-trust security requires all network traffic to be routed through NGFWs with granular access policies. As a side effect, there is a large number of firewall policies across multiple devices, and every policy change requires careful analysis, which increases the MTTR for each firewall change. From March 2020 to March 2021, we made 1,774 firewall changes (~34 per week).
- Network devices have to be upgraded due to security vulnerabilities or operational considerations (bugs/new features)
- Growing business needs require expanding the network infrastructure
The need of the hour is a modern network management framework that provides increased agility while ensuring adherence to zero-trust security controls.
At Palo Alto Networks, IT adopted a DevOps-based methodology for network configuration management. A CI/CD pipeline built on Jenkins and GitHub serves as the conduit for making firewall and other network device changes.
This infrastructure provides the following advantages:
- Each change is approved by the rule owners (stored in a dynamic database) based on the actual configuration rather than the intent of the change
- Each change is evaluated for configuration fidelity on lab infrastructure
- Each change is subjected to predetermined automated checks that reject bad changes outright (such as ‘any’ ‘any’ rules)
- All changes are logged (single source of truth)
- It provides a foundation for building advanced intent-management capabilities
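As an illustration of the kind of predetermined check described above, here is a minimal Python sketch that a CI job could run against a proposed rulebase. The `SecurityRule` shape, its field names, and the `check_change` function are hypothetical and not our actual schema; we also assume an ‘any’ ‘any’ rule means an allow rule with `any` as both source and destination.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityRule:
    """Hypothetical, simplified firewall security rule (not a real PAN-OS schema)."""
    name: str
    source: list[str]
    destination: list[str]
    application: list[str] = field(default_factory=lambda: ["any"])
    action: str = "allow"


def is_any_any(rule: SecurityRule) -> bool:
    """True for an allow rule whose source and destination both include 'any'."""
    return rule.action == "allow" and "any" in rule.source and "any" in rule.destination


def check_change(rules: list[SecurityRule]) -> list[str]:
    """Run predetermined checks; an empty result means the change may proceed."""
    violations = []
    for rule in rules:
        if not rule.source or not rule.destination:
            violations.append(f"{rule.name}: source and destination must not be empty")
        elif is_any_any(rule):
            violations.append(f"{rule.name}: 'any' to 'any' allow rules are rejected")
    return violations


# Hypothetical proposed change: one scoped rule, one overly broad rule, one cleanup deny.
proposed = [
    SecurityRule("allow-dns", ["corp-clients"], ["dns-servers"], ["dns"]),
    SecurityRule("temp-debug", ["any"], ["any"]),
    SecurityRule("cleanup-deny", ["any"], ["any"], action="deny"),
]
for problem in check_change(proposed):
    print(problem)
# prints: temp-debug: 'any' to 'any' allow rules are rejected
```

A Jenkins stage could call a check like this on the rules parsed from a pull request and fail the build whenever the returned list is non-empty. Deny rules are exempt in this sketch so that a conventional catch-all deny rule still passes.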
For firewall management specifically, we defined a logical device group (DG) hierarchy:
Shared -> Global Egress -> Individual DGs
Rules were structured in tiers:
- ‘Core infrastructure rules’ (DNS, NTP, AD, monitoring, etc.) were placed at the top, in the Shared device group, to be inherited by every location.
- Global Egress policies were standardized for user access to the internet.
- Local policies were defined to protect specific resources at a given location.
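The tiered inheritance above can be sketched in Python as a chain of device groups, where a location’s effective rulebase is every ancestor’s rules followed by its own. The `DeviceGroup` class, the site name, and the rule names are hypothetical; the sketch assumes every rule is a pre-rule evaluated from the top of the hierarchy down, and it ignores post-rules and default rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceGroup:
    """Hypothetical model of a device group with an optional parent."""
    name: str
    parent: Optional["DeviceGroup"] = None
    rules: list[str] = field(default_factory=list)

    def lineage(self) -> list["DeviceGroup"]:
        """Return the chain of groups from the root (Shared) down to this one."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def effective_rules(self) -> list[str]:
        """Inherited rules first (Shared, then Global Egress), local rules last."""
        return [rule for dg in self.lineage() for rule in dg.rules]


# Hypothetical hierarchy: Shared -> Global Egress -> one site-level DG.
shared = DeviceGroup("Shared", rules=["allow-dns", "allow-ntp", "allow-ad", "allow-monitoring"])
egress = DeviceGroup("Global Egress", parent=shared, rules=["allow-user-internet"])
site = DeviceGroup("DG-SiteA", parent=egress, rules=["allow-app-to-db-siteA"])

print(site.effective_rules())
```

Because core infrastructure rules live only in Shared, updating one of them there changes the effective rulebase of every location without touching any site-level group.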
Rules were built on object groups, which provides flexibility: changing what a rule covers only requires updating group membership, not modifying the core construct of the firewall rule.
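To illustrate why object groups add flexibility, here is a minimal Python sketch in which a rule references address groups by name and membership is resolved when traffic is evaluated. The group names, addresses, `RULE`, and the `matches` helper are hypothetical and use only the standard `ipaddress` module; a real firewall resolves groups itself.

```python
import ipaddress

# Hypothetical address groups: group name -> member networks.
address_groups: dict[str, set[str]] = {
    "ntp-servers": {"10.0.0.10/32", "10.0.0.11/32"},
    "corp-clients": {"10.20.0.0/16"},
}

# Hypothetical rule that references groups instead of literal addresses.
RULE = {"name": "allow-ntp", "source": "corp-clients", "destination": "ntp-servers"}


def matches(rule: dict, src: str, dst: str) -> bool:
    """Resolve the rule's groups at evaluation time and test both addresses."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    src_ok = any(src_ip in ipaddress.ip_network(n) for n in address_groups[rule["source"]])
    dst_ok = any(dst_ip in ipaddress.ip_network(n) for n in address_groups[rule["destination"]])
    return src_ok and dst_ok


# Onboarding a new NTP server changes only the group, never the rule.
before = matches(RULE, "10.20.1.5", "10.0.0.12")
address_groups["ntp-servers"].add("10.0.0.12/32")
after = matches(RULE, "10.20.1.5", "10.0.0.12")
print(before, after)  # False True
```

The rule itself, and therefore the construct its owners approved, stays byte-for-byte identical; only the group membership changes.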
Over the last two years, we have seen reduced MTTR on network tickets even as network requirements continued to grow amid phenomenal company growth.
We have also enabled self-service network troubleshooting for application owners and have seen healthy adoption of these services, with a high volume of deflected workload.
A code-first approach is imperative for an effective zero-trust implementation that also maintains operational efficiency. We continue to build services on this foundational infrastructure, such as configuration compliance and automated rule management. I look forward to sharing more details in subsequent posts.