diff --git a/docs/about/changelog.md b/docs/about/changelog.md
index 1a02306..67b1293 100644
--- a/docs/about/changelog.md
+++ b/docs/about/changelog.md
@@ -13,6 +13,11 @@ Major features and changes are noted here. To review all updates, see the
Related: [Upgrade CodeGate](../how-to/install.md#upgrade-codegate)
+- **PII redaction** - 10 Feb, 2025\
+ Starting with v0.1.18, CodeGate redacts personally identifiable information
+ (PII) found in LLM prompts and context. See the
+ [feature page](../features/secrets-encryption.md) to learn more.
+
- **Model muxing** - 7 Feb, 2025\
With CodeGate v0.1.17 you can use the new `/v1/mux` endpoint to configure
model selection based on your workspace! Learn more in the
diff --git a/docs/features/dependency-risk.md b/docs/features/dependency-risk.md
index 1a61e51..0f6d507 100644
--- a/docs/features/dependency-risk.md
+++ b/docs/features/dependency-risk.md
@@ -1,7 +1,6 @@
---
title: Dependency risk awareness
description: Protection from malicious or vulnerable dependencies
-sidebar_position: 20
---
## What's the risk?
@@ -9,7 +8,7 @@ sidebar_position: 20
The large language models (LLMs) that drive AI coding assistants are incredibly
costly and time-consuming to train. That's why each one has a "knowledge cutoff
date" which is often months or even years in the past. For example, GPT-4o's
-training cutoff was October 2023\.
+training cutoff was October 2023.
But the open source software ecosystem moves quickly, and so do malicious actors
seeking to exploit the software supply chain. LLMs often suggest outdated,
diff --git a/docs/features/muxing.md b/docs/features/muxing.md
index 33bbc0a..eced281 100644
--- a/docs/features/muxing.md
+++ b/docs/features/muxing.md
@@ -1,7 +1,6 @@
---
title: Model muxing
description: Configure a per-workspace LLM
-sidebar_position: 35
---
## Overview
diff --git a/docs/features/secrets-encryption.md b/docs/features/secrets-encryption.md
index d102302..f9b508b 100644
--- a/docs/features/secrets-encryption.md
+++ b/docs/features/secrets-encryption.md
@@ -1,34 +1,34 @@
---
-title: Secrets encryption
+title: Secrets encryption and PII redaction
description: Keep your secrets a secret
-sidebar_position: 10
---
## What's the risk?
-As you interact with an AI coding assistant, sensitive data like passwords and
-access tokens can be unintentionally exposed to third-party providers through
-the code snippets and files you share as context. These secrets may become part
-of the training data used to improve the AI model and potentially be exposed to
-other users.
+As you interact with an AI coding assistant, sensitive data like passwords,
+access tokens, and even personally identifiable information (PII) can be
+unintentionally exposed to third-party providers through the code and files you
+share as context. Besides the privacy and regulatory implications of exposing
+this information, it may become part of the AI model's training data and
+potentially be exposed to future users.
## How CodeGate helps
CodeGate helps you protect sensitive information from being accidentally exposed
to AI models and third-party AI provider systems by redacting detected secrets
-from your prompts using encryption.
+and PII found in your prompts.
## How it works
-CodeGate automatically scans all prompts for secrets such as:
+CodeGate automatically scans all prompts for secrets and PII. This happens
+transparently without requiring a specific prompt. Without interrupting your
+development flow, CodeGate protects your data by encrypting secrets and
+anonymizing PII before the prompt is sent to the LLM, then restores the
+original values when the response is returned to your machine.
-- API keys and tokens
-- Private keys and certificates
-- Database credentials
-- SSH keys
-- Cloud provider credentials
-
-This scan happens transparently without requiring a specific prompt.
+When a secret or PII is detected, CodeGate adds a message to the LLM's output
+and records an alert in the [dashboard](../how-to/dashboard.md) (PII alerts in
+the dashboard are coming soon).
:::info
@@ -36,27 +36,55 @@ Since CodeGate runs locally, your secrets never leave your system unprotected.
:::
-CodeGate transparently encrypts secrets before sending the prompt to the LLM.
-This way, CodeGate protects your sensitive data without blocking your
-development flow. This is performed on the fly using AES256-GCM encryption with
-a temporary per-session key that is securely erased from memory after the
-response is delivered to your plugin.
-
```mermaid
sequenceDiagram
participant Client as AI coding<br>assistant
participant CodeGate as CodeGate<br>(local)
participant LLM as AI model<br>(remote)
- Client ->> CodeGate: Prompt with<br>plaintext secrets
+ Client ->> CodeGate: Prompt with<br>plaintext secrets/PII
activate CodeGate
- CodeGate ->> LLM: Prompt with<br>encrypted secrets
+ CodeGate ->> LLM: Prompt with<br>redacted secrets/PII
deactivate CodeGate
activate LLM
- note right of LLM: LLM only sees<br>encrypted values
- LLM -->> CodeGate: Response with<br>encrypted secrets
+ note right of LLM: LLM only sees<br>redacted values
+ LLM -->> CodeGate: Response with<br>redacted data
deactivate LLM
activate CodeGate
- CodeGate -->> Client: Response with<br>plaintext secrets
+ CodeGate -->> Client: Response with<br>original data
deactivate CodeGate
```
+
+### Secrets encryption
+
+CodeGate uses pattern matching to detect secrets such as:
+
+- API keys and tokens
+- Private keys and certificates
+- Database credentials
+- SSH keys
+- Cloud provider credentials
+- ...and more - see the
+ [signatures file](https://github.com/stacklok/codegate/blob/main/signatures.yaml)
+ in the project repo
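
As a rough illustration of pattern-based detection, the sketch below scans text with two regular expressions. It is not CodeGate's implementation: the `SIGNATURES` table and the `find_secrets` helper are hypothetical names made up for this example, and the real patterns live in the signatures file linked above.

```python
import re

# Hypothetical signatures for illustration only; CodeGate's real patterns
# are defined in its signatures file and are not reproduced here.
SIGNATURES = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (signature name, matched text) pairs found in the text."""
    findings = []
    for name, pattern in SIGNATURES.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings
```

Calling `find_secrets` on a snippet that contains a key-shaped string returns the signature name together with the matched value, which is the information a redaction step would need to locate and replace it.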
+
+CodeGate transparently encrypts secrets before sending the prompt to the LLM.
+This is performed on the fly using AES-256-GCM encryption with a temporary
+per-session key. When the LLM returns a response, CodeGate decrypts the secrets
+before delivering the response to your coding assistant, then securely erases
+the temporary key from memory.
+
+### PII redaction
+
+CodeGate scans for common types of PII such as:
+
+- Email addresses
+- Phone numbers
+- Government identification numbers
+- Credit card numbers
+- Bank accounts and crypto wallet IDs
+
+CodeGate anonymizes PII by replacing each detected value with a unique
+identifier before sending the prompt to the LLM. This way, CodeGate protects
+your sensitive data without blocking your development flow. When the LLM returns
+a response, CodeGate looks up each identifier and restores the original value.
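
The replace-and-restore round trip can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CodeGate's code: it handles only email addresses with a simplified pattern, and the `anonymize` and `restore` helpers and the `<pii-...>` placeholder format are hypothetical.

```python
import re
import uuid

# Simplified email pattern for illustration; real PII detection covers many
# more types and is considerably more robust.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each email address with a unique placeholder.

    Returns the redacted prompt and a placeholder-to-original mapping.
    """
    mapping: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        # Hypothetical placeholder format; any unique token would work.
        placeholder = f"<pii-{uuid.uuid4()}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    return EMAIL.sub(substitute, prompt), mapping


def restore(response: str, mapping: dict[str, str]) -> str:
    """Swap placeholders in the response back to the original values."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response
```

In this sketch the mapping stays on the local machine, so the remote model only ever sees the placeholders, and any placeholder it echoes back in its response is swapped for the original value before the response reaches the client.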
diff --git a/docs/features/security-reviews.md b/docs/features/security-reviews.md
index 2c2d1ca..8c90d80 100644
--- a/docs/features/security-reviews.md
+++ b/docs/features/security-reviews.md
@@ -1,7 +1,6 @@
---
title: Security reviews
description: Enhanced secure coding guidance
-sidebar_position: 30
---
## What's the risk?
diff --git a/docs/features/workspaces.mdx b/docs/features/workspaces.mdx
index 08dc34f..6ea63b2 100644
--- a/docs/features/workspaces.mdx
+++ b/docs/features/workspaces.mdx
@@ -1,7 +1,6 @@
---
title: Workspaces
description: Organize and customize your project environments
-sidebar_position: 40
---
import useBaseUrl from '@docusaurus/useBaseUrl';
diff --git a/docs/index.md b/docs/index.md
index 81beeca..4ecd995 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -36,8 +36,9 @@ sequenceDiagram
CodeGate includes several key features for privacy, security, and coding
efficiency, including:
-- [Secrets encryption](./features/secrets-encryption.md) to protect your
- sensitive credentials
+- [Secrets encryption and PII redaction](./features/secrets-encryption.md) to
+ protect your sensitive credentials and anonymize personally identifiable
+ information
- [Dependency risk awareness](./features/dependency-risk.md) to update the LLM's
knowledge of malicious or deprecated open source packages
- [Model muxing](./features/muxing.md) to quickly select the best LLM
@@ -101,7 +102,7 @@ Review the [installation instructions](./how-to/install.md).
Learn more about CodeGate's features:
-- [Secrets encryption](./features/secrets-encryption.md)
+- [Secrets encryption and PII redaction](./features/secrets-encryption.md)
- [Dependency risk awareness](./features/dependency-risk.md)
- [Security reviews](./features/security-reviews.md)
- [Workspaces](./features/workspaces.mdx)