From e8fe1d1dd3dce520c3f73b08439ca3b3281a41bd Mon Sep 17 00:00:00 2001
From: Dan Barr
Date: Tue, 11 Feb 2025 14:32:21 -0500
Subject: [PATCH] Add PII redaction

---
 docs/about/changelog.md             |  5 ++
 docs/features/dependency-risk.md    |  3 +-
 docs/features/muxing.md             |  1 -
 docs/features/secrets-encryption.md | 82 +++++++++++++++++++----------
 docs/features/security-reviews.md   |  1 -
 docs/features/workspaces.mdx        |  1 -
 docs/index.md                       |  7 +--
 7 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/docs/about/changelog.md b/docs/about/changelog.md
index 1a02306..67b1293 100644
--- a/docs/about/changelog.md
+++ b/docs/about/changelog.md
@@ -13,6 +13,11 @@ Major features and changes are noted here. To review all updates, see the
 
 Related: [Upgrade CodeGate](../how-to/install.md#upgrade-codegate)
 
+- **PII redaction** - 10 Feb, 2025\
+  Starting with v0.1.18, CodeGate now redacts personally identifiable
+  information (PII) found in LLM prompts and context. See the
+  [feature page](../features/secrets-encryption.md) to learn more.
+
 - **Model muxing** - 7 Feb, 2025\
   With CodeGate v0.1.17 you can use the new `/v1/mux` endpoint to configure
   model selection based on your workspace! Learn more in the
diff --git a/docs/features/dependency-risk.md b/docs/features/dependency-risk.md
index 1a61e51..0f6d507 100644
--- a/docs/features/dependency-risk.md
+++ b/docs/features/dependency-risk.md
@@ -1,7 +1,6 @@
 ---
 title: Dependency risk awareness
 description: Protection from malicious or vulnerable dependencies
-sidebar_position: 20
 ---
 
 ## What's the risk?
@@ -9,7 +8,7 @@ sidebar_position: 20
 The large language models (LLMs) that drive AI coding assistants are incredibly
 costly and time-consuming to train. That's why each one has a "knowledge cutoff
 date" which is often months or even years in the past. For example, GPT-4o's
-training cutoff was October 2023\.
+training cutoff was October 2023.
 
 But the open source software ecosystem moves quickly, and so do malicious actors
 seeking to exploit the software supply chain. LLMs often suggest outdated,
diff --git a/docs/features/muxing.md b/docs/features/muxing.md
index 33bbc0a..eced281 100644
--- a/docs/features/muxing.md
+++ b/docs/features/muxing.md
@@ -1,7 +1,6 @@
 ---
 title: Model muxing
 description: Configure a per-workspace LLM
-sidebar_position: 35
 ---
 
 ## Overview
diff --git a/docs/features/secrets-encryption.md b/docs/features/secrets-encryption.md
index d102302..f9b508b 100644
--- a/docs/features/secrets-encryption.md
+++ b/docs/features/secrets-encryption.md
@@ -1,34 +1,34 @@
 ---
-title: Secrets encryption
+title: Secrets encryption and PII redaction
 description: Keep your secrets a secret
-sidebar_position: 10
 ---
 
 ## What's the risk?
 
-As you interact with an AI coding assistant, sensitive data like passwords and
-access tokens can be unintentionally exposed to third-party providers through
-the code snippets and files you share as context. These secrets may become part
-of the training data used to improve the AI model and potentially be exposed to
-other users.
+As you interact with an AI coding assistant, sensitive data like passwords,
+access tokens, and even personally identifiable information (PII) can be
+unintentionally exposed to third-party providers through the code and files you
+share as context. Besides the privacy and regulatory implications of exposing
+this information, it may become part of the AI model's training data and
+potentially be exposed to future users.
 
 ## How CodeGate helps
 
 CodeGate helps you protect sensitive information from being accidentally exposed
 to AI models and third-party AI provider systems by redacting detected secrets
-from your prompts using encryption.
+and PII found in your prompts.
 
 ## How it works
 
-CodeGate automatically scans all prompts for secrets such as:
+CodeGate automatically scans all prompts for secrets and PII. This happens
+transparently without requiring a specific prompt. Without interrupting your
+development flow, CodeGate protects your data by encrypting secrets and
+anonymizing PII. These changes are made before the prompt is sent to the LLM and
+are restored when the result is returned to your machine.
 
-- API keys and tokens
-- Private keys and certificates
-- Database credentials
-- SSH keys
-- Cloud provider credentials
-
-This scan happens transparently without requiring a specific prompt.
+When a secret or PII is detected, CodeGate adds a message to the LLM's output
+and an alert is recorded in the [dashboard](../how-to/dashboard.md) (PII alerts
+in the dashboard are coming soon).
 
 :::info
 
@@ -36,27 +36,55 @@ Since CodeGate runs locally, your secrets never leave your system unprotected.
 
 :::
 
-CodeGate transparently encrypts secrets before sending the prompt to the LLM.
-This way, CodeGate protects your sensitive data without blocking your
-development flow. This is performed on the fly using AES256-GCM encryption with
-a temporary per-session key that is securely erased from memory after the
-response is delivered to your plugin.
-
 ```mermaid
 sequenceDiagram
     participant Client as AI coding<br>assistant
     participant CodeGate as CodeGate<br>(local)
     participant LLM as AI model<br>(remote)
 
-    Client ->> CodeGate: Prompt with<br>plaintext secrets
+    Client ->> CodeGate: Prompt with<br>plaintext secrets/PII
     activate CodeGate
-    CodeGate ->> LLM: Prompt with<br>encrypted secrets
+    CodeGate ->> LLM: Prompt with<br>redacted secrets/PII
     deactivate CodeGate
     activate LLM
-    note right of LLM: LLM only sees<br>encrypted values
-    LLM -->> CodeGate: Response with<br>encrypted secrets
+    note right of LLM: LLM only sees<br>redacted values
+    LLM -->> CodeGate: Response with<br>redacted data
     deactivate LLM
     activate CodeGate
-    CodeGate -->> Client: Response with<br>plaintext secrets
+    CodeGate -->> Client: Response with<br>original data
     deactivate CodeGate
 ```
+
+### Secrets encryption
+
+CodeGate uses pattern matching to detect secrets such as:
+
+- API keys and tokens
+- Private keys and certificates
+- Database credentials
+- SSH keys
+- Cloud provider credentials
+- ...and more - see the
+  [signatures file](https://github.com/stacklok/codegate/blob/main/signatures.yaml)
+  in the project repo
+
+CodeGate transparently encrypts secrets before sending the prompt to the LLM.
+This is performed on the fly using AES256-GCM encryption with a temporary
+per-session key. When the LLM returns a response, CodeGate decrypts the secret
+before delivering it to your coding assistant, then securely erases the
+temporary key from memory.
+
+### PII redaction
+
+CodeGate scans for common types of PII like:
+
+- Email addresses
+- Phone numbers
+- Government identification numbers
+- Credit card numbers
+- Bank accounts and crypto wallet IDs
+
+CodeGate anonymizes PII by replacing each string with a unique identifier before
+sending the prompt to the LLM. This way, CodeGate protects your sensitive data
+without blocking your development flow. When the LLM returns a response,
+CodeGate matches up the identifier and replaces it with the original value.
diff --git a/docs/features/security-reviews.md b/docs/features/security-reviews.md
index 2c2d1ca..8c90d80 100644
--- a/docs/features/security-reviews.md
+++ b/docs/features/security-reviews.md
@@ -1,7 +1,6 @@
 ---
 title: Security reviews
 description: Enhanced secure coding guidance
-sidebar_position: 30
 ---
 
 ## What's the risk?
diff --git a/docs/features/workspaces.mdx b/docs/features/workspaces.mdx
index 08dc34f..6ea63b2 100644
--- a/docs/features/workspaces.mdx
+++ b/docs/features/workspaces.mdx
@@ -1,7 +1,6 @@
 ---
 title: Workspaces
 description: Organize and customize your project environments
-sidebar_position: 40
 ---
 
 import useBaseUrl from '@docusaurus/useBaseUrl';
diff --git a/docs/index.md b/docs/index.md
index 81beeca..4ecd995 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -36,8 +36,9 @@ sequenceDiagram
 CodeGate includes several key features for privacy, security, and coding
 efficiency, including:
 
-- [Secrets encryption](./features/secrets-encryption.md) to protect your
-  sensitive credentials
+- [Secrets encryption and PII redaction](./features/secrets-encryption.md) to
+  protect your sensitive credentials and anonymize personally identifiable
+  information
 - [Dependency risk awareness](./features/dependency-risk.md) to update the LLM's
   knowledge of malicious or deprecated open source packages
 - [Model muxing](./features/muxing.md) to quickly select the best LLM
@@ -101,7 +102,7 @@ Review the [installation instructions](./how-to/install.md).
 
 Learn more about CodeGate's features:
 
-- [Secrets encryption](./features/secrets-encryption.md)
+- [Secrets encryption and PII redaction](./features/secrets-encryption.md)
 - [Dependency risk awareness](./features/dependency-risk.md)
 - [Security reviews](./features/security-reviews.md)
 - [Workspaces](./features/workspaces.mdx)
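
Reviewer note: the PII redaction flow that the `secrets-encryption.md` changes describe (replace each detected string with a unique identifier before the prompt leaves the machine, then swap the original back into the response) could be sketched roughly as below. This is an illustration only, not CodeGate's implementation: the `redact` and `restore` helpers, the `<pii-...>` placeholder format, and the deliberately simple email-only regex are all assumptions made for the example.

```python
import re
import uuid

# Hypothetical detector: a simplified email pattern, used here only to
# illustrate the flow. Real PII detection covers many more types.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each detected email with a unique placeholder.

    Returns the redacted prompt and a placeholder -> original mapping
    that stays on the local machine.
    """
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    def _swap(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            # Assumed placeholder format; any unique token would do.
            placeholder = f"<pii-{uuid.uuid4().hex}>"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_RE.sub(_swap, prompt), mapping


def restore(response: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response


if __name__ == "__main__":
    redacted, mapping = redact("Send the report to alice@example.com today.")
    print(redacted)                    # email replaced by a placeholder
    print(restore(redacted, mapping))  # original text recovered
```

Reusing one placeholder per distinct value keeps repeated mentions consistent from the model's point of view, and the mapping itself is never sent anywhere, which matches the "restored when the result is returned to your machine" behavior described in the patch.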