3 Insights from Our Cloud Security Posture Management (CSPM) Journey

By Yousuf Hasan

Palo Alto Networks is always working on improving its cloud security posture given the expansion in our cloud footprint. Think of it as a never-ending and exciting journey. That’s because the cloud security landscape is constantly evolving. 

For example, user-managed service account (SA) keys in GCP were considered acceptable a year ago, but they now trigger a flood of cloud security policy violations and alerts. As a result, best practices have evolved, and several remediation options are now available.

In continuation of a previously published post about Prisma Cloud CSPM automation using policy-as-code to automate security policy management in our CI/CD pipeline, I’d like to share some insights from our Prisma Cloud deployment which I hope you’ll find useful. 

 

Prisma Cloud CSPM Deployment

We enabled Prisma Cloud CSPM in late April 2021 and started improving our cloud security posture by reducing high severity alerts within 2 months. Let me walk you through a timeline of what we did and what we achieved. 

  1. Connect Prisma Cloud to IT GCP projects (~90 minutes): This is obviously the first step and very easy. Initially, we got 30,000 alerts based on 150 policy violations for 600+ GCP projects. The good: we were able to see all the security holes. The bad: it was too much information.
  2. Label CSPM policies based on IT domains, disable irrelevant policies, and adjust policy/alert severity (4 to 5 weeks): This was a critical step in building an ownership model for Network, Compute, SRE and IAM teams. You will have to spend some time on this part and make sure all stakeholders participate.
  3. Fix high severity policies and alerts (8 weeks): We engaged with IT domain owners and fixed ~25 policies, which brought high severity alerts down by 93%, from 546 to 37, and eventually to zero. This was a major success! Compliance metrics tracked by our InfoSec teams also improved for CMMC v1.02, CIS v1.2.0, and NIST 800-53 rev 5. All high severity alerts/policies were related to GCP configuration.
  4. Build a CSPM policy-as-code tool (2 weeks): This allowed us to manage all policies in GitHub as code, giving us an automated, version-controlled and auditable framework.
  5. Fix medium severity policies and alerts (12 weeks): The next logical step. Medium severity alerts will always be in the tens of thousands. We were able to address ~4,000 alerts by working on 4 policies related to enabling Vault/Key Management System (KMS) based keys with key rotation enabled, and by addressing excessive permissions. For details, keep on reading.
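To make step 4 a little more concrete, here is a minimal sketch of the core idea behind a policy-as-code workflow: compare the policy definitions kept in Git with what is currently deployed, and plan the changes. This is not our actual tool and it does not call the Prisma Cloud API. The `PolicyDef` schema and the `plan_changes` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDef:
    """One policy as it might be stored in Git (hypothetical schema)."""
    name: str
    severity: str   # "high", "medium", or "low"
    enabled: bool
    labels: tuple   # owning IT domains, e.g. ("IAM",)


def plan_changes(desired, deployed):
    """Compare desired (Git) and deployed policies, matched by name.

    Returns the names of policies to create, policies to update, and
    deployed policies that are not tracked in Git ("unmanaged").
    """
    desired_by_name = {p.name: p for p in desired}
    deployed_by_name = {p.name: p for p in deployed}

    create = sorted(desired_by_name.keys() - deployed_by_name.keys())
    unmanaged = sorted(deployed_by_name.keys() - desired_by_name.keys())
    update = sorted(
        name
        for name in desired_by_name.keys() & deployed_by_name.keys()
        if desired_by_name[name] != deployed_by_name[name]
    )
    return {"create": create, "update": update, "unmanaged": unmanaged}
```

Because every change to a definition goes through a commit, the plan doubles as an audit trail: a reviewer can see in the pull request which policies will be created or updated before anything is applied.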

In the span of 6 months, we went from zero cloud security posture visibility to 100% visibility with an ownership-driven model. Throughout this entire process, we learned a ton, including the following three insights:

 

1: Assign Ownership to IT Domains

Prisma Cloud provides great information about cloud security posture, but without policy/alert ownership, it’s hard to move the needle forward. 

Labeling policies by IT domain – Network, Compute, SRE, IAM – allows IT domain leaders to take care of the work cut out for them. Without this step, everyone has to look at all policies and alerts, which means … gridlock. Assigning ownership also lets us reward successful IT domains and hold everyone accountable.
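As a rough illustration of what an ownership label buys you, the sketch below routes open alerts to the domain that owns the violated policy. The alert fields (`policy`, `resource`) and the `policy_owner` mapping are assumptions made for this example, not the Prisma Cloud data model.

```python
from collections import defaultdict


def alerts_by_domain(alerts, policy_owner):
    """Group alerted resources by the IT domain that owns the violated policy.

    alerts:       iterable of dicts with "policy" and "resource" keys (assumed shape)
    policy_owner: dict mapping a policy name to its owning domain label
    Alerts for policies without an owner land under "UNASSIGNED".
    """
    grouped = defaultdict(list)
    for alert in alerts:
        owner = policy_owner.get(alert["policy"], "UNASSIGNED")
        grouped[owner].append(alert["resource"])
    return dict(grouped)
```

Anything that ends up under "UNASSIGNED" is a signal that the labeling step is not finished yet.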

 

2: Think in Terms of Policies 

Prisma Cloud CSPM alerts indicate cloud resources violating policies. While this is useful for looking at the security posture of every cloud resource, it can be overwhelming to think in terms of tens of thousands of alerts. Instead, it makes sense to think in terms of policies: there are few enough of them to manage, and each one accounts for a large number of alerts.

Based on our own CSPM deployment, 66 policies were responsible for 13,000+ CSPM alerts as shown in the table below. Fixing one high severity policy fixed 36 high severity alerts. Now that is some impact!

CSPM policy to alert mapping
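A policy-first view like this can be approximated from any alert export. The sketch below counts alerts per policy and sorts the result so that high severity policies, and within them the noisiest ones, come first. The `policy` and `severity` field names are assumptions for illustration.

```python
from collections import Counter

# Lower number sorts first, so high severity policies lead the list.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def rank_policies(alerts):
    """Return [((policy, severity), alert_count), ...] ordered by
    severity first, then by alert count (descending), then by name."""
    counts = Counter((a["policy"], a["severity"]) for a in alerts)
    return sorted(
        counts.items(),
        key=lambda item: (SEVERITY_ORDER[item[0][1]], -item[1], item[0][0]),
    )
```

Working down this list from the top is what "thinking in terms of policies" looks like in practice: each fix at the policy level clears every alert counted against it.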

 

3: Don’t Expect Quick Fixes

Based on our experience, only ~25% of policy/alert work was a quick fix. We disabled ~21% of policies because they were not relevant, and we modified the RQL for another ~4% of policies.

CSPM policy analysis

The remaining ~75% aren’t quick fixes. These are design-related projects that require design and configuration changes to the cloud deployment in order to improve security posture.

For example, 4 medium severity policies, and the ~4,000 medium severity alerts they generated, were related to user-managed service account keys and excessive permissions in GCP. We addressed these policies and alerts by:

  • Enabling HashiCorp Vault to manage new GCP service account keys, migrating existing service account keys to Vault, and building an automated pipeline for Vault integration to manage keys.
  • Removing GCP primitive role assignments: owner/editor/viewer
  • Creating custom permission sets for different personas
  • Reviewing/managing roles assigned to users and service accounts through AD group membership
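To give a feel for the primitive role cleanup, here is a small sketch that lists every member still bound to owner, editor, or viewer in a project IAM policy. It assumes the policy has the JSON shape returned by `gcloud projects get-iam-policy PROJECT_ID --format=json` (a top-level `bindings` list whose entries have `role` and `members`). The helper name is made up for this example, and it is not the script we used.

```python
# The three primitive roles named in the list above.
PRIMITIVE_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def primitive_role_bindings(iam_policy):
    """Return sorted (role, member) pairs for every primitive role binding.

    iam_policy: dict parsed from a project IAM policy in JSON form.
    A policy with no "bindings" key yields an empty list.
    """
    findings = []
    for binding in iam_policy.get("bindings", []):
        role = binding.get("role")
        if role in PRIMITIVE_ROLES:
            for member in binding.get("members", []):
                findings.append((role, member))
    return sorted(findings)
```

Each pair in the output is a candidate for replacement with a narrower predefined role or one of the custom permission sets, ideally granted through a group rather than to an individual.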

 

What’s Next?

If your organization hasn’t done so already, it may be time to build a cloud operations team, so that new alerts and policy fixes can be handled by a dedicated team with Network, Compute, SRE and IAM skills. This will keep future CSPM alerts and policy violations under control and help your compliance metrics. We’re in the process of building this team to fix CSPM policies. We’re also looking at using Prisma Cloud’s integration with ServiceNow for incident management. Our CSPM journey has been fruitful so far, and I hope you can use our learnings to make your own journey successful as well.