Prisma Cloud CSPM Automation Using Policy-as-Code

By Matthew Kwong

Cloud Security Posture Management (CSPM) is one of the most popular tools in cybersecurity. Gartner estimated that up to 95% of cloud breaches are due to misconfigurations, and CSPM can help prevent them. But CSPM generally comes with a tsunami of alerts, causing headaches and fatigue for cloud security teams. In the early days of our Prisma Cloud deployment, I used to see hundreds of new alerts every morning.

I want to share what we have done to deploy Prisma Cloud in our environment while accelerating alert remediation and suppressing false positives. While it’s possible to operate solely in the Prisma Cloud UI, Palo Alto Networks IT requires all configuration and automation to be in code. I’ll also cover how we manage the policies in Prisma Cloud through its extensive REST APIs and GitHub.

 

Incoming! Alerts!

Our main cloud vendor is Google Cloud Platform (GCP). To enable CSPM, we only had to upload a service account key in JSON format to Prisma Cloud. The backend then began polling resources and logs from GCP and soon produced more than 25,000 CSPM alerts, over 95% of them related to configurations.

There are three independent methods to reduce the number of alerts:

  1. Enable only a subset of GCP projects
  2. Evaluate which policies can be disabled or have their severity adjusted, as shown in figure 1
  3. Set up alert rules to limit which alerts are created, e.g., only for high-severity findings or for specific policies

Figure 1. Prisma Cloud UI listing all the GCP policies that users can enable/disable individually
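
Methods 2 and 3 interact: a policy has to be enabled before an alert rule can match it. The Go sketch below models that interaction. The types and names (Policy, AlertRule, ShouldAlert, the example policy names) are hypothetical and are not the Prisma Cloud data model or API; it only illustrates the filtering logic.

```go
package main

import "fmt"

// Severity is ordered so that levels can be compared numerically.
type Severity int

const (
	Low Severity = iota
	Medium
	High
)

// Policy is a hypothetical, simplified view of a CSPM policy.
type Policy struct {
	Name     string
	Severity Severity
	Enabled  bool
}

// AlertRule is a hypothetical filter: a finding becomes an alert when the
// policy severity reaches MinSeverity or the policy is listed explicitly.
type AlertRule struct {
	MinSeverity Severity
	Policies    map[string]bool
}

// ShouldAlert reports whether a finding for policy p becomes an alert under r.
func ShouldAlert(p Policy, r AlertRule) bool {
	if !p.Enabled { // method 2: a disabled policy never produces alerts
		return false
	}
	// method 3: the alert rule narrows what is left
	return p.Severity >= r.MinSeverity || r.Policies[p.Name]
}

func main() {
	rule := AlertRule{
		MinSeverity: High,
		Policies:    map[string]bool{"example-open-firewall": true},
	}
	policies := []Policy{
		{Name: "example-public-bucket", Severity: High, Enabled: true},
		{Name: "example-open-firewall", Severity: Medium, Enabled: true},
		{Name: "example-missing-label", Severity: Low, Enabled: true},
		{Name: "example-disabled-check", Severity: High, Enabled: false},
	}
	for _, p := range policies {
		fmt.Printf("%s alert=%t\n", p.Name, ShouldAlert(p, rule))
	}
}
```

In this model, lowering a policy's severity or disabling it takes effect before any alert rule is evaluated, which is why we spent most of our time on method 2.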

Alert Reduction

We want full visibility into all projects and into alerts of every severity, so we spent the majority of our time on method number two. We had about 130 GCP built-in policies to look at. Since Palo Alto Networks IT has different teams for various domains, e.g., SRE, Network, IAM and Compute, we labeled CSPM policies and worked with these teams to decide which CSPM policies should be enabled and how we could address them, as shown in figure 2.

Figure 2. Tracking and Labeling CSPM policies with different stakeholders

Within 2 months, we resolved ~94% of the high severity alerts. For about 30% to 40% of these high severity alerts, we had to change our Terraform code in addition to tuning CSPM policies in Prisma Cloud. Fixing the root cause in code, not just tuning policies, is the key to avoiding alert fatigue!

One example of changing our Terraform code after tuning CSPM policies is disabling project-wide SSH keys across VMs, which many alerts recommended. This required in-depth changes to our cloud build pipelines, in the following steps:

  1. Making changes to cloud automation (Terraform) for new builds after notifying developers.
  2. Disabling project-wide SSH keys for existing projects.
  3. Working with infosec teams to change org level policies to disable project-wide SSH keys.
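
To track progress on step 2, it helps to know which existing VMs still accept project-wide keys. On Compute Engine, an instance opts out by setting the metadata key block-project-ssh-keys to true. The Go sketch below assumes the instance metadata has already been fetched into a hypothetical Instance struct; the struct, the function names and the sample instances are illustrative and not part of our pipeline.

```go
package main

import (
	"fmt"
	"strings"
)

// Instance is a hypothetical, minimal view of a VM: its name and metadata.
type Instance struct {
	Name     string
	Metadata map[string]string
}

// BlocksProjectSSHKeys reports whether the instance sets the Compute Engine
// metadata key "block-project-ssh-keys" to true (case-insensitive).
func BlocksProjectSSHKeys(i Instance) bool {
	return strings.EqualFold(i.Metadata["block-project-ssh-keys"], "true")
}

// NonCompliant returns the names of instances that still accept
// project-wide SSH keys, in input order.
func NonCompliant(instances []Instance) []string {
	var names []string
	for _, i := range instances {
		if !BlocksProjectSSHKeys(i) {
			names = append(names, i.Name)
		}
	}
	return names
}

func main() {
	instances := []Instance{
		{Name: "vm-a", Metadata: map[string]string{"block-project-ssh-keys": "TRUE"}},
		{Name: "vm-b", Metadata: map[string]string{}},
		{Name: "vm-c", Metadata: map[string]string{"block-project-ssh-keys": "false"}},
	}
	fmt.Println(NonCompliant(instances))
}
```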

Alerts as ServiceNow Incidents

We are in the process of enabling the ServiceNow integration to generate incidents for alerts. Prisma Cloud can move the ServiceNow ticket to the resolved state automatically once the corresponding alert is resolved. This is great because we can focus our energy on tuning CSPM policies and changing our Terraform code.

Alert rules also let you send alerts to different domain groups according to policy, which is very important in our RACI model. We’re sending these alerts to a cloud operations team that has SMEs with Network, IAM, Compute and SRE skills. The overall flow is as follows:

Figure 3. Tracking and Labeling CSPM policies with different stakeholders
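
The routing idea can be sketched in a few lines of Go. This is a minimal illustration, assuming each alert carries the labels we attached to its policy. The Alert type, the Route function and the queue names are hypothetical; the real routing is configured through Prisma Cloud alert rules and the ServiceNow integration, not through code like this.

```go
package main

import "fmt"

// Alert is a hypothetical, simplified alert carrying its policy's labels.
type Alert struct {
	Policy string
	Labels []string
}

// Route returns the queue for the first label that has a route configured,
// or fallback when no label matches.
func Route(a Alert, routes map[string]string, fallback string) string {
	for _, label := range a.Labels {
		if queue, ok := routes[label]; ok {
			return queue
		}
	}
	return fallback
}

func main() {
	// Hypothetical label-to-queue mapping for the domains named above.
	routes := map[string]string{
		"network": "cloudops-network",
		"iam":     "cloudops-iam",
		"compute": "cloudops-compute",
		"sre":     "cloudops-sre",
	}
	alerts := []Alert{
		{Policy: "example-open-firewall", Labels: []string{"network"}},
		{Policy: "example-broad-role", Labels: []string{"iam"}},
		{Policy: "example-unlabeled", Labels: nil},
	}
	for _, a := range alerts {
		fmt.Printf("%s -> %s\n", a.Policy, Route(a, routes, "cloudops-triage"))
	}
}
```

A fallback queue keeps unlabeled policies from being dropped silently, which matters when the vendor adds new built-in policies.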

 

Policy-as-Code

Policy-as-code is critical for applying proper controls and audit capabilities to CSPM policies. The main idea is to have GitHub record changes to the policies in Prisma Cloud through pull requests approved by SRE admins. Policy changes come either from the Prisma Cloud UI (including built-in policy updates from the Prisma Cloud team) or from DevOps engineers editing the YAML backups of the policies in GitHub. I wrote a Go CLI program that provides two commands for these interactions:

  1. export – back up all the policies from Prisma Cloud to local YAML files
  2. sync – update the policies in Prisma Cloud from the local YAML files

Figure 4. Flow diagram of the interaction between GitHub and Prisma Cloud
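
The core of a sync command is deciding which policies actually differ. The Go sketch below is a minimal illustration of that comparison, not the code of the CLI described here: the Policy fields, the Plan function and the handling of unknown IDs are assumptions, and YAML parsing and the API calls are left out so that the example runs with the standard library only.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy is a hypothetical subset of the fields kept in each YAML file.
type Policy struct {
	ID       string
	Name     string
	Severity string
	Enabled  bool
}

// Plan compares the local (Git) policies with the remote (Prisma Cloud) ones,
// both keyed by policy ID. It returns the IDs whose local definition differs
// from the remote one, and the local IDs that do not exist remotely.
func Plan(local, remote map[string]Policy) (changed, unknown []string) {
	for id, l := range local {
		r, ok := remote[id]
		switch {
		case !ok:
			unknown = append(unknown, id)
		case l != r: // struct comparison covers every field
			changed = append(changed, id)
		}
	}
	// Map iteration order is random; sort for stable output in CI logs.
	sort.Strings(changed)
	sort.Strings(unknown)
	return changed, unknown
}

func main() {
	remote := map[string]Policy{
		"p-1": {ID: "p-1", Name: "example-public-bucket", Severity: "high", Enabled: true},
		"p-2": {ID: "p-2", Name: "example-open-firewall", Severity: "high", Enabled: true},
	}
	local := map[string]Policy{
		"p-1": {ID: "p-1", Name: "example-public-bucket", Severity: "high", Enabled: true},
		"p-2": {ID: "p-2", Name: "example-open-firewall", Severity: "medium", Enabled: true},
		"p-3": {ID: "p-3", Name: "example-custom-check", Severity: "low", Enabled: true},
	}
	changed, unknown := Plan(local, remote)
	fmt.Println("changed:", changed)
	fmt.Println("unknown:", unknown)
}
```

Sending only the changed policies keeps each CI run small and makes the pipeline log readable next to the pull request diff.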

As the flow chart above shows, SMEs from the SRE/DevOps/InfoSec teams can review CSPM policy changes through pull requests (PRs) before a change is made to the policies. We use Drone CI internally to push the change right after the PR is merged. Our goal is to have all changes to CSPM policies made through GitHub. At some point in the future, we’d like to disable policy changes through the UI and use only the GitHub route.

 

Resty HTTP Library in Go is Neat

The CSPM policy-as-code project is written in Go. I have been writing Go for about 2 years, and this is certainly not the first time I have made an API call over HTTP in Go. In the past, I just used the default HTTP client from the standard library. However, I kept asking myself: while almost everyone in Python uses the famous python-requests, is there a similar HTTP client library in Go that could replace my ~30 lines of boilerplate? I found resty! Figure 5 shows the difference it made:

Figure 5. The output of the “git diff” after using resty

So, what are the results?

  1. Headers, URL prefixes, the HTTP timeout and HTTP return code checking (Response.IsError()) are all pre-configured in c.client, so you don’t need to repeat them for every request
  2. Error handling (err) was reduced to only once instead of five times
  3. A debug option can be easily enabled to output detailed logs about your HTTP requests

With less code you make fewer mistakes, which is great. A full feature list is on their GitHub repository. You should definitely check it out if you are writing Go with many HTTP requests.
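
To make the "configure once" idea concrete without pulling in a third-party module, here is a minimal standard-library sketch of the pattern that resty packages up for you. It is not the code from figure 5 and does not use resty itself. APIClient, NewAPIClient, GetJSON and newFakeServer are hypothetical names, and the x-redlock-auth header and the /policy path are assumptions about the Prisma Cloud API used only for illustration against a local fake server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// APIClient holds everything that is configured once: base URL, default
// headers and a client with a timeout.
type APIClient struct {
	BaseURL string
	Headers map[string]string
	HTTP    *http.Client
}

// NewAPIClient builds a client; the auth header name is an assumption.
func NewAPIClient(baseURL, token string) *APIClient {
	return &APIClient{
		BaseURL: baseURL,
		Headers: map[string]string{"Accept": "application/json", "x-redlock-auth": token},
		HTTP:    &http.Client{Timeout: 10 * time.Second},
	}
}

// GetJSON sends a GET request, treats any status >= 400 as an error and
// decodes the JSON response body into out.
func (c *APIClient) GetJSON(path string, out interface{}) error {
	req, err := http.NewRequest(http.MethodGet, c.BaseURL+path, nil)
	if err != nil {
		return err
	}
	for k, v := range c.Headers {
		req.Header.Set(k, v)
	}
	resp, err := c.HTTP.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("GET %s: unexpected status %s", path, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// newFakeServer stands in for the real API so the sketch runs offline.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("x-redlock-auth") == "" {
			http.Error(w, "missing token", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `[{"name":"example-policy","severity":"high"}]`)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	var policies []map[string]string
	client := NewAPIClient(srv.URL, "dummy-token")
	if err := client.GetJSON("/policy", &policies); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(policies[0]["name"], policies[0]["severity"])
}
```

Every call site shrinks to one line and one error check, which is the same effect resty gave us with far less code of our own to maintain.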

 

Proper Framework

CSPM is not a one-time exercise, but a constant feedback loop, where things improve in subsequent iterations if the proper framework is in place for:

  1. Identifying owners for relevant CSPM policies
  2. Enforcing policy-as-code for CSPM policies
  3. Establishing automated alert management mechanisms

In this deployment, we found Prisma Cloud impressive at reliably finding most of the configuration problems. Combining policies and alert rules is also a smart way to reduce the number of alerts and notifications. Most of the problems came from our technical debt, especially from resources built before we had our Terraform/Ansible automation.

I hope this helps you in managing the alerts when deploying Prisma Cloud or evaluating any other CSPM products.