This project is a data pipeline and user-friendly application for processing the Secure Controls Framework (SCF). It downloads the latest SCF data, cleans it, and exports it as normalized CSV files ready for import into any database or data analysis tool.
- Automated Data Pipeline: Robust data pipeline that automates downloading, cleaning, and normalization of SCF data.
- Graphical User Interface (GUI): Simple Tkinter application provides a user-friendly way to run the data pipeline and select output directories.
- Framework Selection: Choose specific frameworks (e.g., NIST, ISO, PCI) or export all 258 available frameworks through the GUI.
- Multiple Export Formats: Export data as CSV, JSON, or both formats with a single click.
- MongoDB-Optimized Output: Generates a denormalized JSON structure optimized for MongoDB with embedded relationships.
- Buildable Executable: Can be bundled into a single executable file using PyInstaller for easy distribution on Windows.
- Clean CSV Output: Generates normalized, database-ready CSV files with standardized column names and clean data.
- Relationship Mapping: Automatically creates relationship tables for many-to-many mappings between controls and frameworks.
- Configurable Cleaning: Data cleaning process is controlled by a
column_register.csvfile, allowing easy customization without modifying source code. - Modern Tooling: Uses
uvfor fast and reproducible dependency management.
- Python 3.10+
uv(Python package installer and manager)
-
Clone the repository:
git clone https://github.com/dwilbourn-git/scf_csv_json_extractor.git cd scf_csv_json_extractor -
Create a virtual environment:
uv venv
-
Install dependencies:
uv pip sync requirements.lock
Or to regenerate the lock file from requirements.txt:
uv pip compile requirements.txt -o requirements.lock uv pip sync requirements.lock
To run the GUI application, execute the app.py script:
python app.pyThe GUI will allow you to:
- Select an output directory for the cleaned data files
- Choose output format: CSV, JSON, or both
- Select specific frameworks or export all 258 frameworks
- Run the full pipeline with a single click
- View progress and status messages
The project can also be controlled via the command line using the main.py script:
-
Run the full data pipeline:
python main.py run
This will:
- Download the latest SCF Excel file from GitHub
- Split the workbook into individual CSV files
- Clean and normalize the data
- Generate relationship mapping files
-
Run with JSON output:
python main.py run --format json
Or export both CSV and JSON:
python main.py run --format both
-
Check SCF version:
python main.py version
-
Validate data integrity:
python main.py validate
This checks for broken foreign key relationships.
-
Clean up generated files:
python main.py clean
-
For a full list of commands:
python main.py --help
To build a standalone Windows executable:
python build.pyThe executable will be created in the dist directory as Dewis SCF Extractor.exe.
When the pipeline runs, it creates the following directory structure:
output_directory/
├── scf_full/ # Downloaded Excel file and metadata
├── csv/ # Raw CSV files split from Excel
├── csv_cleaned/ # Cleaned and normalized CSV files
├── scf_relationships/ # SCF-to-entity relationship mappings
├── framework_relationships/ # Framework-to-control relationship mappings
└── json_output/ # JSON export files (when JSON format selected)
├── scf_mongodb.json # MongoDB-optimized denormalized structure
└── *.json # Individual entity JSON files
- SCF.csv: Main controls file with all intrinsic control attributes
- SCF_Domains_Principles.csv: Domain and principle definitions
- Assessment_Objectives.csv: Assessment objective mappings
- Risk_Catalog.csv: Risk definitions
- Threat_Catalog.csv: Threat definitions
Relationship files use a two-column format for easy database import:
- SCF relationships: Link controls to SCF entities (domains, risks, threats, etc.)
- Framework relationships: Link controls to external frameworks (NIST, ISO, etc.)
The scf_mongodb.json file provides a denormalized, MongoDB-optimized structure where:
- Each control is a complete document with embedded relationships
- Domains, assessment objectives, threats, risks, and evidence requests are embedded directly
- Framework mappings are included as sub-documents
- Blank fields are omitted to reduce document size and follow MongoDB best practices
- Ready for direct import with
mongoimportor MongoDB drivers
Example structure:
{
"_id": "GOV-01",
"control_id": "GOV-01",
"title": "Statutory, Regulatory & Contractual Compliance",
"domain": {
"identifier": "GOV",
"name": "Governance, Risk Management & Compliance"
},
"assessment_objectives": [...],
"threats": [...],
"framework_mappings": {
"nist_800_53_rev5": ["PM-1"],
"iso_iec_27001_2022": ["5.1"]
}
}All credit for the Secure Controls Framework, its content and its relationship mappings goes to the SCF team: https://github.com/securecontrolsframework