Skip to content

ares-coding/sql-injection-attack-detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›‘οΈ SQL Injection Attack Detection System

Machine Learning-Based Security System for SQL Injection Prevention

Python MySQL Scikit-learn License

Project Banner

Detect and prevent SQL injection attacks using pattern analysis and machine learning


πŸ“‹ Table of Contents


🎯 Overview

A comprehensive security-focused project demonstrating:

  1. ⚠️ Real-world SQL injection vulnerabilities and attack vectors
  2. πŸ›‘οΈ Machine learning-based detection using pattern analysis
  3. βœ… Secure coding practices and mitigation strategies
  4. πŸ“š Educational demonstrations for cybersecurity learning

Why This Matters

SQL Injection remains in the OWASP Top 10 web application security risks. This project provides:

  • πŸŽ“ Educational value - Learn attack patterns and defenses
  • πŸ” Detection capabilities - ML-powered threat identification
  • πŸ’‘ Best practices - Secure query implementation examples
  • πŸ› οΈ Practical tools - Ready-to-use detection system

✨ Features

πŸ”΄ Vulnerability Demonstrations

  • βœ… Authentication bypass examples
  • βœ… Union-based SQL injection
  • βœ… Blind SQL injection techniques
  • βœ… Time-based SQL injection
  • βœ… Error-based exploitation

🟒 Detection & Prevention

  • βœ… Pattern-based detection engine
  • βœ… SVM classifier for query analysis
  • βœ… Real-time threat assessment
  • βœ… Confidence scoring system
  • βœ… Query sanitization examples
  • βœ… Prepared statement demonstrations

πŸ“Š Analysis Tools

  • βœ… Query pattern visualization
  • βœ… Attack vector identification
  • βœ… Feature extraction pipeline
  • βœ… Performance metrics dashboard

⚠️ Attack Demonstrations

⚠️ WARNING: These examples are for educational purposes only. Never use on systems without authorization.

1. Authentication Bypass

Vulnerable Code:

# ❌ VULNERABLE - Never do this!
username = request.form['username']
password = request.form['password']

query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
cursor.execute(query)

Attack:

Username: admin' OR '1'='1
Password: anything

Resulting Query:
SELECT * FROM users WHERE username='admin' OR '1'='1' AND password='anything'
-- This always returns true!

Secure Alternative:

# βœ… SECURE - Use parameterized queries
query = "SELECT * FROM users WHERE username=%s AND password=%s"
cursor.execute(query, (username, password))

2. Union-Based Injection

Attack:

' UNION SELECT username, password FROM admin_users--

Extracting data from other tables

3. Blind SQL Injection

Attack:

' AND 1=1--  # Returns true
' AND 1=2--  # Returns false

# Extracting data one bit at a time

πŸ”¬ How It Works

Detection Pipeline

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Input Query β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Feature          β”‚
β”‚ Extraction       β”‚
β”‚ β€’ Keywords       β”‚
β”‚ β€’ Patterns       β”‚
β”‚ β€’ Structure      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ML Classifier    β”‚
β”‚ (SVM)            β”‚
β”‚ β€’ Pattern Match  β”‚
β”‚ β€’ Anomaly Score  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Threat           β”‚
β”‚ Assessment       β”‚
β”‚ β€’ Safe/Malicious β”‚
β”‚ β€’ Confidence %   β”‚
β”‚ β€’ Attack Type    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Feature Extraction

def extract_features(query):
    features = {
        # Suspicious keywords
        'has_or': 'OR' in query.upper(),
        'has_union': 'UNION' in query.upper(),
        'has_comment': '--' in query or '/*' in query,
        'has_semicolon': ';' in query,
        
        # Pattern analysis
        'num_quotes': query.count("'") + query.count('"'),
        'num_equals': query.count('='),
        'num_dashes': query.count('-'),
        
        # Structure analysis
        'query_length': len(query),
        'has_always_true': check_tautology(query),
        'suspicious_encoding': check_encoding(query),
        
        # Advanced patterns
        'has_sleep': 'SLEEP' in query.upper(),
        'has_benchmark': 'BENCHMARK' in query.upper(),
        'has_information_schema': 'INFORMATION_SCHEMA' in query.upper()
    }
    return features

πŸ› οΈ Tech Stack

Core Technologies

Python MySQL Scikit-learn

Additional Tools

Pandas NumPy Flask


πŸ“₯ Installation

Prerequisites

  • Python 3.8+
  • MySQL Server 8.0+
  • Basic understanding of SQL

Setup

# 1. Clone repository
git clone https://github.com/ares-coding/sql-injection-attack-detection.git
cd sql-injection-attack-detection

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Setup MySQL database
mysql -u root -p < database/setup.sql

# 5. Configure database connection
cp config.example.py config.py
# Edit config.py with your MySQL credentials

πŸš€ Usage

1. Train the Detection Model

# Train SVM classifier on labeled SQL queries
python train_model.py --dataset data/sql_queries.csv --output models/detector.pkl

# Output:
# Training accuracy: 96.8%
# Test accuracy: 94.2%
# Model saved to models/detector.pkl

2. Test Vulnerability Demonstrations

# Run vulnerable application (for educational purposes)
python vulnerable_app.py

# Test SQL injection attacks
python test_attacks.py

3. Use the Detection System

from sql_detector import SQLInjectionDetector

# Initialize detector
detector = SQLInjectionDetector(model_path='models/detector.pkl')

# Analyze a query
query = "SELECT * FROM users WHERE id = '1' OR '1'='1'"
result = detector.predict(query)

print(f"Is Malicious: {result['is_malicious']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Attack Type: {result['attack_type']}")
print(f"Risk Level: {result['risk_level']}")

Output:

Is Malicious: True
Confidence: 98.7%
Attack Type: Authentication Bypass (Tautology)
Risk Level: CRITICAL

4. Run the Web Interface

# Start Flask application
python app.py

# Visit http://localhost:5000

5. Secure Implementation Examples

# Example 1: Parameterized Queries
def get_user_safe(user_id):
    query = "SELECT * FROM users WHERE id = %s"
    cursor.execute(query, (user_id,))
    return cursor.fetchone()

# Example 2: Input Validation
def validate_input(user_input):
    # Whitelist allowed characters
    allowed = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")
    if not set(user_input).issubset(allowed):
        raise ValueError("Invalid characters detected")
    return user_input

# Example 3: ORM Usage (SQLAlchemy)
from sqlalchemy import select
stmt = select(User).where(User.id == user_id)
result = session.execute(stmt)

πŸ” Detection Techniques

Pattern-Based Detection

INJECTION_PATTERNS = [
    r"(\bOR\b|\bAND\b)\s+['\"]?\w+['\"]?\s*=\s*['\"]?\w+['\"]?",  # Tautology
    r"\bUNION\b.*\bSELECT\b",                                      # Union injection
    r";\s*DROP\b",                                                 # Drop table
    r"--",                                                         # SQL comments
    r"/\*.*\*/",                                                   # Block comments
    r"\bEXEC\b|\bEXECUTE\b",                                      # Command execution
    r"\bSLEEP\b\s*\(",                                            # Time-based
    r"\bBENCHMARK\b\s*\(",                                        # Performance-based
    r"information_schema",                                         # Schema extraction
]

Machine Learning Features

Feature Category Examples
Keyword-based OR, UNION, DROP, EXEC, SLEEP
Symbol-based Quotes, dashes, semicolons, comments
Structural Query length, nesting level, clause count
Semantic Tautologies, always-true conditions
Encoding Hex encoding, URL encoding, Unicode

πŸ“Š Model Performance

Classification Results

Metric Value
Accuracy 94.2%
Precision 93.8%
Recall 94.7%
F1-Score 94.2%
False Positive Rate 3.2%
False Negative Rate 2.8%

Confusion Matrix

                Predicted
              Benign  Malicious
Actual Benign   1,842      58
     Malicious    52   2,048

Attack Type Detection

Attack Type Detection Rate
Tautology (OR 1=1) 98.5%
Union-based 96.3%
Blind SQL 92.7%
Time-based 94.1%
Comment-based 97.8%

πŸ” Security Best Practices

βœ… DO's

  1. Use Parameterized Queries
   cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
  1. Use ORM Frameworks
   User.query.filter_by(id=user_id).first()
  1. Input Validation
   if not user_input.isalnum():
       raise ValueError("Invalid input")
  1. Principle of Least Privilege
   GRANT SELECT ON database.users TO 'app_user'@'localhost';
  1. Escape Special Characters
   escaped = db.escape_string(user_input)

❌ DON'Ts

  1. Never concatenate user input
   # ❌ BAD
   query = f"SELECT * FROM users WHERE id = '{user_id}'"
  1. Don't trust any user input
   # ❌ BAD - Even "safe-looking" input can be malicious
  1. Don't show detailed errors to users
   # ❌ BAD - Reveals database structure
   except Exception as e:
       return str(e)
  1. Don't use dynamic table/column names from user input
   # ❌ BAD
   query = f"SELECT * FROM {user_table}"

πŸ“ Project Structure

sql-injection-detection/
β”œβ”€β”€ πŸ“ data/
β”‚   β”œβ”€β”€ sql_queries.csv          # Training dataset
β”‚   β”œβ”€β”€ attack_samples.txt       # Attack examples
β”‚   └── benign_samples.txt       # Legitimate queries
β”œβ”€β”€ πŸ“ database/
β”‚   β”œβ”€β”€ setup.sql                # Database schema
β”‚   β”œβ”€β”€ sample_data.sql          # Test data
β”‚   └── vulnerable_tables.sql    # Demo vulnerable setup
β”œβ”€β”€ πŸ“ models/
β”‚   β”œβ”€β”€ svm_detector.pkl         # Trained SVM model
β”‚   └── feature_extractor.pkl    # Feature scaler
β”œβ”€β”€ πŸ“ src/
β”‚   β”œβ”€β”€ feature_extraction.py    # Feature engineering
β”‚   β”œβ”€β”€ pattern_detection.py     # Regex-based detection
β”‚   β”œβ”€β”€ ml_classifier.py         # ML model training
β”‚   └── utils.py                 # Utility functions
β”œβ”€β”€ πŸ“ demos/
β”‚   β”œβ”€β”€ vulnerable_app.py        # Intentionally vulnerable app
β”‚   β”œβ”€β”€ secure_app.py            # Secured version
β”‚   └── attack_scenarios.py      # Attack demonstrations
β”œβ”€β”€ πŸ“ notebooks/
β”‚   β”œβ”€β”€ 01_data_analysis.ipynb
β”‚   β”œβ”€β”€ 02_pattern_study.ipynb
β”‚   └── 03_model_evaluation.ipynb
β”œβ”€β”€ πŸ“ web/
β”‚   β”œβ”€β”€ app.py                   # Flask web interface
β”‚   β”œβ”€β”€ templates/
β”‚   └── static/
β”œβ”€β”€ train_model.py               # Model training script
β”œβ”€β”€ test_detector.py             # Testing script
β”œβ”€β”€ requirements.txt
└── README.md

πŸ§ͺ Testing

Run Unit Tests

# Run all tests
python -m pytest tests/

# Run specific test suite
python -m pytest tests/test_detection.py -v

# Run with coverage
python -m pytest --cov=src tests/

Test Attack Scenarios

# Test all attack types
python test_attacks.py --all

# Test specific attack
python test_attacks.py --type tautology

# Generate test report
python test_attacks.py --report output/test_results.html

πŸ“š Educational Resources

Learning Path

  1. Understand SQL Basics β†’ W3Schools SQL Tutorial
  2. Learn SQL Injection β†’ OWASP SQL Injection Guide
  3. Practice Safely β†’ WebGoat, DVWA
  4. Study This Project β†’ Run demos, read code
  5. Build Defenses β†’ Implement secure code

Recommended Reading

Practice Platforms


βš–οΈ Legal & Ethical Notice

⚠️ IMPORTANT DISCLAIMER

This project is for educational and research purposes only.

  • βœ… Use on systems you own or have explicit permission to test
  • ❌ Never use on production systems without authorization
  • ❌ Unauthorized access to computer systems is illegal

The author assumes no liability for misuse of this software.


🀝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Add tests for new functionality
  4. Ensure all tests pass (pytest)
  5. Commit changes (git commit -m 'Add AmazingFeature')
  6. Push to branch (git push origin feature/AmazingFeature)
  7. Open a Pull Request

Contribution Ideas

  • πŸ› Bug fixes and improvements
  • πŸ“ Documentation enhancements
  • πŸ§ͺ Additional test cases
  • 🎨 Web interface improvements
  • πŸ” New detection patterns
  • 🌐 Multi-language support

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ‘€ Author

Au Amores - Full Stack Developer & Cybersecurity Enthusiast

LinkedIn GitHub Email


πŸ™ Acknowledgments

  • OWASP Foundation for security resources
  • MySQL documentation and community
  • Scikit-learn contributors
  • Security researchers and ethical hackers

πŸ“š Citation

@software{sql_injection_detection,
  author = {Amores, Au},
  title = {SQL Injection Attack Detection using Machine Learning},
  year = {2025},
  url = {https://github.com/ares-coding/sql-injection-attack-detection}
}

⭐ Star this repo if it helped you learn about SQL injection!

πŸ”’ Stay secure, code safely!

Made with πŸ›‘οΈ and β˜• by Ares