Detect and prevent SQL injection attacks using pattern analysis and machine learning
- Overview
- Features
- Attack Demonstrations
- How It Works
- Tech Stack
- Installation
- Usage
- Detection Techniques
- Model Performance
- Security Best Practices
- Project Structure
- Contributing
- License
A comprehensive security-focused project demonstrating:
β οΈ Real-world SQL injection vulnerabilities and attack vectors- π‘οΈ Machine learning-based detection using pattern analysis
- β Secure coding practices and mitigation strategies
- π Educational demonstrations for cybersecurity learning
SQL Injection remains in the OWASP Top 10 web application security risks. This project provides:
- π Educational value - Learn attack patterns and defenses
- π Detection capabilities - ML-powered threat identification
- π‘ Best practices - Secure query implementation examples
- π οΈ Practical tools - Ready-to-use detection system
- β Authentication bypass examples
- β Union-based SQL injection
- β Blind SQL injection techniques
- β Time-based SQL injection
- β Error-based exploitation
- β Pattern-based detection engine
- β SVM classifier for query analysis
- β Real-time threat assessment
- β Confidence scoring system
- β Query sanitization examples
- β Prepared statement demonstrations
- β Query pattern visualization
- β Attack vector identification
- β Feature extraction pipeline
- β Performance metrics dashboard
β οΈ WARNING: These examples are for educational purposes only. Never use on systems without authorization.
Vulnerable Code:
# β VULNERABLE - Never do this!
username = request.form['username']
password = request.form['password']
query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
cursor.execute(query)Attack:
Username: admin' OR '1'='1
Password: anything
Resulting Query:
SELECT * FROM users WHERE username='admin' OR '1'='1' AND password='anything'
-- This always returns true!Secure Alternative:
# β
SECURE - Use parameterized queries
query = "SELECT * FROM users WHERE username=%s AND password=%s"
cursor.execute(query, (username, password))Attack:
' UNION SELECT username, password FROM admin_users--
Extracting data from other tablesAttack:
' AND 1=1-- # Returns true
' AND 1=2-- # Returns false
# Extracting data one bit at a timeβββββββββββββββ
β Input Query β
ββββββββ¬βββββββ
β
βΌ
ββββββββββββββββββββ
β Feature β
β Extraction β
β β’ Keywords β
β β’ Patterns β
β β’ Structure β
ββββββββββ¬ββββββββββ
β
βΌ
ββββββββββββββββββββ
β ML Classifier β
β (SVM) β
β β’ Pattern Match β
β β’ Anomaly Score β
ββββββββββ¬ββββββββββ
β
βΌ
ββββββββββββββββββββ
β Threat β
β Assessment β
β β’ Safe/Malicious β
β β’ Confidence % β
β β’ Attack Type β
ββββββββββββββββββββ
def extract_features(query):
features = {
# Suspicious keywords
'has_or': 'OR' in query.upper(),
'has_union': 'UNION' in query.upper(),
'has_comment': '--' in query or '/*' in query,
'has_semicolon': ';' in query,
# Pattern analysis
'num_quotes': query.count("'") + query.count('"'),
'num_equals': query.count('='),
'num_dashes': query.count('-'),
# Structure analysis
'query_length': len(query),
'has_always_true': check_tautology(query),
'suspicious_encoding': check_encoding(query),
# Advanced patterns
'has_sleep': 'SLEEP' in query.upper(),
'has_benchmark': 'BENCHMARK' in query.upper(),
'has_information_schema': 'INFORMATION_SCHEMA' in query.upper()
}
return features- Python 3.8+
- MySQL Server 8.0+
- Basic understanding of SQL
# 1. Clone repository
git clone https://github.com/ares-coding/sql-injection-attack-detection.git
cd sql-injection-attack-detection
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Setup MySQL database
mysql -u root -p < database/setup.sql
# 5. Configure database connection
cp config.example.py config.py
# Edit config.py with your MySQL credentials# Train SVM classifier on labeled SQL queries
python train_model.py --dataset data/sql_queries.csv --output models/detector.pkl
# Output:
# Training accuracy: 96.8%
# Test accuracy: 94.2%
# Model saved to models/detector.pkl# Run vulnerable application (for educational purposes)
python vulnerable_app.py
# Test SQL injection attacks
python test_attacks.pyfrom sql_detector import SQLInjectionDetector
# Initialize detector
detector = SQLInjectionDetector(model_path='models/detector.pkl')
# Analyze a query
query = "SELECT * FROM users WHERE id = '1' OR '1'='1'"
result = detector.predict(query)
print(f"Is Malicious: {result['is_malicious']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Attack Type: {result['attack_type']}")
print(f"Risk Level: {result['risk_level']}")Output:
Is Malicious: True
Confidence: 98.7%
Attack Type: Authentication Bypass (Tautology)
Risk Level: CRITICAL
# Start Flask application
python app.py
# Visit http://localhost:5000# Example 1: Parameterized Queries
def get_user_safe(user_id):
query = "SELECT * FROM users WHERE id = %s"
cursor.execute(query, (user_id,))
return cursor.fetchone()
# Example 2: Input Validation
def validate_input(user_input):
# Whitelist allowed characters
allowed = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")
if not set(user_input).issubset(allowed):
raise ValueError("Invalid characters detected")
return user_input
# Example 3: ORM Usage (SQLAlchemy)
from sqlalchemy import select
stmt = select(User).where(User.id == user_id)
result = session.execute(stmt)INJECTION_PATTERNS = [
r"(\bOR\b|\bAND\b)\s+['\"]?\w+['\"]?\s*=\s*['\"]?\w+['\"]?", # Tautology
r"\bUNION\b.*\bSELECT\b", # Union injection
r";\s*DROP\b", # Drop table
r"--", # SQL comments
r"/\*.*\*/", # Block comments
r"\bEXEC\b|\bEXECUTE\b", # Command execution
r"\bSLEEP\b\s*\(", # Time-based
r"\bBENCHMARK\b\s*\(", # Performance-based
r"information_schema", # Schema extraction
]| Feature Category | Examples |
|---|---|
| Keyword-based | OR, UNION, DROP, EXEC, SLEEP |
| Symbol-based | Quotes, dashes, semicolons, comments |
| Structural | Query length, nesting level, clause count |
| Semantic | Tautologies, always-true conditions |
| Encoding | Hex encoding, URL encoding, Unicode |
| Metric | Value |
|---|---|
| Accuracy | 94.2% |
| Precision | 93.8% |
| Recall | 94.7% |
| F1-Score | 94.2% |
| False Positive Rate | 3.2% |
| False Negative Rate | 2.8% |
Predicted
Benign Malicious
Actual Benign 1,842 58
Malicious 52 2,048
| Attack Type | Detection Rate |
|---|---|
| Tautology (OR 1=1) | 98.5% |
| Union-based | 96.3% |
| Blind SQL | 92.7% |
| Time-based | 94.1% |
| Comment-based | 97.8% |
- Use Parameterized Queries
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))- Use ORM Frameworks
User.query.filter_by(id=user_id).first()- Input Validation
if not user_input.isalnum():
raise ValueError("Invalid input")- Principle of Least Privilege
GRANT SELECT ON database.users TO 'app_user'@'localhost';- Escape Special Characters
escaped = db.escape_string(user_input)- Never concatenate user input
# β BAD
query = f"SELECT * FROM users WHERE id = '{user_id}'"- Don't trust any user input
# β BAD - Even "safe-looking" input can be malicious- Don't show detailed errors to users
# β BAD - Reveals database structure
except Exception as e:
return str(e)- Don't use dynamic table/column names from user input
# β BAD
query = f"SELECT * FROM {user_table}"sql-injection-detection/
βββ π data/
β βββ sql_queries.csv # Training dataset
β βββ attack_samples.txt # Attack examples
β βββ benign_samples.txt # Legitimate queries
βββ π database/
β βββ setup.sql # Database schema
β βββ sample_data.sql # Test data
β βββ vulnerable_tables.sql # Demo vulnerable setup
βββ π models/
β βββ svm_detector.pkl # Trained SVM model
β βββ feature_extractor.pkl # Feature scaler
βββ π src/
β βββ feature_extraction.py # Feature engineering
β βββ pattern_detection.py # Regex-based detection
β βββ ml_classifier.py # ML model training
β βββ utils.py # Utility functions
βββ π demos/
β βββ vulnerable_app.py # Intentionally vulnerable app
β βββ secure_app.py # Secured version
β βββ attack_scenarios.py # Attack demonstrations
βββ π notebooks/
β βββ 01_data_analysis.ipynb
β βββ 02_pattern_study.ipynb
β βββ 03_model_evaluation.ipynb
βββ π web/
β βββ app.py # Flask web interface
β βββ templates/
β βββ static/
βββ train_model.py # Model training script
βββ test_detector.py # Testing script
βββ requirements.txt
βββ README.md
# Run all tests
python -m pytest tests/
# Run specific test suite
python -m pytest tests/test_detection.py -v
# Run with coverage
python -m pytest --cov=src tests/# Test all attack types
python test_attacks.py --all
# Test specific attack
python test_attacks.py --type tautology
# Generate test report
python test_attacks.py --report output/test_results.html- Understand SQL Basics β W3Schools SQL Tutorial
- Learn SQL Injection β OWASP SQL Injection Guide
- Practice Safely β WebGoat, DVWA
- Study This Project β Run demos, read code
- Build Defenses β Implement secure code
- π OWASP Top 10
- π SQL Injection Cheat Sheet
- π Parameterized Queries Guide
- π― HackTheBox
- π― TryHackMe
- π― PentesterLab
β οΈ IMPORTANT DISCLAIMERThis project is for educational and research purposes only.
- β Use on systems you own or have explicit permission to test
- β Never use on production systems without authorization
- β Unauthorized access to computer systems is illegal
The author assumes no liability for misuse of this software.
We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch (
git checkout -b feature/AmazingFeature) - Add tests for new functionality
- Ensure all tests pass (
pytest) - Commit changes (
git commit -m 'Add AmazingFeature') - Push to branch (
git push origin feature/AmazingFeature) - Open a Pull Request
- π Bug fixes and improvements
- π Documentation enhancements
- π§ͺ Additional test cases
- π¨ Web interface improvements
- π New detection patterns
- π Multi-language support
This project is licensed under the MIT License - see the LICENSE file for details.
Au Amores - Full Stack Developer & Cybersecurity Enthusiast
- OWASP Foundation for security resources
- MySQL documentation and community
- Scikit-learn contributors
- Security researchers and ethical hackers
@software{sql_injection_detection,
author = {Amores, Au},
title = {SQL Injection Attack Detection using Machine Learning},
year = {2025},
url = {https://github.com/ares-coding/sql-injection-attack-detection}
}β Star this repo if it helped you learn about SQL injection!
π Stay secure, code safely!
Made with π‘οΈ and β by Ares