SHAURYA: Scalable High-frequency Architecture for Ultra-low Response Yield Access

Shaurya is a high-frequency trading (HFT) market data feed handler engineered for sub-microsecond minimum latency. It combines zero-copy parsing, lock-free concurrency, and stack-based memory management to avoid the heap allocation, copying, and locking costs of conventional designs, so that per-message processing time stays predictable.

⚡ Performance Impact & Comparison

Shaurya was benchmarked on Windows using the high-resolution performance counter (QueryPerformanceCounter).

Implementation Approach	Average Latency	Min Latency	Why It Is Slow or Fast
Python Script	~45.0 µs	~30.0 µs	Interpreter overhead & Garbage Collection pauses.
Standard C++ (`std::string`)	~5.0 µs	~3.5 µs	Frequent Heap Allocations (`malloc`) & deep memory copying.
SHAURYA (Zero-Copy)	1.88 µs*	0.3 µs	Zero-Copy pointer arithmetic & Lock-Free queues.

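The Result: Shaurya achieves a minimum internal reaction time of 300 nanoseconds. Against the Python figures in the table, that is roughly 100x faster on minimum latency (~30.0 µs vs 0.3 µs) and roughly 24x faster on average latency (~45.0 µs vs 1.88 µs).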

*Measured in a pure mock environment.

🌍 Real-World Validation: The "Fragmented Liquidity" Test

Shaurya was subjected to a 30-minute stress test aggregating live ticks from Binance, Coinbase, and Bitstamp simultaneously.

Test Duration: 30 Minutes
Total Messages: 21,862 (Live Volatility Bursts)
Outcome: The engine normalized the fragmented liquidity streams in real time. Average latency increased under OS scheduler load (the test ran on non-isolated cores), but minimum latency held at 0.3 µs, which suggests that the core parsing path itself is unaffected by bursts of crypto market volatility.
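The gap between minimum and average latency described above is easier to see with numbers. The following is a minimal sketch that summarizes a list of latency samples by minimum, mean, and 99th percentile. The sample values are invented for illustration and are not measurements from Shaurya, and `summarize_latencies` is a hypothetical helper name, not part of the package.

```python
import statistics
from typing import Dict, List


def summarize_latencies(samples_us: List[float]) -> Dict[str, float]:
    """Return min, mean, and nearest-rank p99 of latency samples (microseconds)."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    n = len(ordered)
    # Nearest-rank percentile: ceil(0.99 * n), computed with integers only.
    rank = max(1, (99 * n + 99) // 100)
    return {
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
    }


# Illustrative data only: 98 fast ticks plus two ticks delayed by the scheduler.
samples = [0.3] * 98 + [40.0, 120.0]
stats = summarize_latencies(samples)
print(stats)
```

With data shaped like this, a handful of delayed ticks moves the mean and the tail while the minimum stays put, which is why the minimum alone says little about jitter.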

🏗 Key Technical Innovations

1. Zero-Copy Architecture

Instead of copying network packets into new std::string objects (which typically triggers a heap allocation and a byte-for-byte copy per message), Shaurya uses a custom StringViewLite class. This creates a lightweight "view", essentially a pointer and a length, over the raw socket buffer, allowing the engine to parse prices without copying the payload.
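The same idea can be sketched in Python with the built-in `memoryview`, which plays a role comparable to a string view: slicing it yields another view over the same buffer rather than a copy. This is an analogue for illustration, not the StringViewLite implementation; the comma-separated packet layout and the helper names `split_fields` and `parse_fixed_point` are assumptions.

```python
from typing import List

COMMA, DOT, ZERO, NINE = ord(","), ord("."), ord("0"), ord("9")


def split_fields(buf: memoryview) -> List[memoryview]:
    """Split a comma-separated buffer into views; no field bytes are copied."""
    fields = []
    start = 0
    for i, byte in enumerate(buf):
        if byte == COMMA:
            fields.append(buf[start:i])  # slicing a memoryview gives a view
            start = i + 1
    fields.append(buf[start:])
    return fields


def parse_fixed_point(field: memoryview, decimals: int = 2) -> int:
    """Parse ASCII digits such as b'43250.75' into an integer scaled by 10**decimals."""
    value = 0
    frac_digits = 0
    seen_dot = False
    for byte in field:
        if byte == DOT:
            seen_dot = True
        elif ZERO <= byte <= NINE:
            if seen_dot:
                if frac_digits == decimals:
                    continue  # drop digits beyond the requested precision
                frac_digits += 1
            value = value * 10 + (byte - ZERO)
        else:
            raise ValueError(f"unexpected byte {byte!r} in numeric field")
    return value * 10 ** (decimals - frac_digits)


# Hypothetical packet layout: symbol,price,quantity
packet = bytearray(b"BTCUSD,43250.75,0.5")
symbol, price, quantity = split_fields(memoryview(packet))
print(symbol.tobytes(), parse_fixed_point(price), parse_fixed_point(quantity))
```

Because `price` is a view, overwriting a byte in `packet` changes what `parse_fixed_point(price)` returns. The same property means a view must not outlive the buffer it points into, which is the main hazard of this technique in C++.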

2. Lock-Free Concurrency (SPSC)

Traditional systems use mutex locks (std::mutex) to share data between threads. Under contention, a thread that fails to acquire the lock can be descheduled by the OS, and the resulting context switch is expensive. Shaurya implements a Single-Producer Single-Consumer (SPSC) ring buffer using std::atomic operations. This allows the network thread to push data and the strategy thread to read data simultaneously without either one blocking.
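The index protocol behind an SPSC ring buffer can be modeled in a few lines. The sketch below is a single-threaded Python model of the head and tail bookkeeping only: Python has no equivalent of std::atomic, so the memory-ordering guarantees a real C++ implementation needs are described in comments rather than enforced. The class and method names are hypothetical.

```python
from typing import Any, List


class SpscRing:
    """Bounded FIFO where the producer owns `tail` and the consumer owns `head`.

    try_pop returns None when empty, so None itself cannot be stored as an item.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots: List[Any] = [None] * capacity
        self._mask = capacity - 1  # index wrap via bitwise AND instead of modulo
        self._head = 0  # count of items consumed; written only by the consumer
        self._tail = 0  # count of items produced; written only by the producer

    def try_push(self, item: Any) -> bool:
        if self._tail - self._head == len(self._slots):
            return False  # full: report failure instead of blocking
        self._slots[self._tail & self._mask] = item
        # In C++ this increment would be a release store, so the consumer
        # never observes the new tail before the slot contents.
        self._tail += 1
        return True

    def try_pop(self) -> Any:
        if self._head == self._tail:
            return None  # empty: report failure instead of blocking
        item = self._slots[self._head & self._mask]
        # In C++ this increment would be a release store, handing the slot
        # back to the producer only after it has been read.
        self._head += 1
        return item


ring = SpscRing(4)
pushed = [ring.try_push(tick) for tick in range(5)]  # fifth push hits a full ring
print(pushed, ring.try_pop())
```

Since each counter has exactly one writer, neither side needs a lock; each only needs to see a consistent view of the other side's counter.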

3. CPU Cache Optimization

Critical data structures are aligned to 64-byte cache lines (alignas(64)). This prevents false sharing, where threads writing to different variables that happen to share one cache line keep invalidating each other's cached copy, drastically reducing performance on multi-core systems.
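The effect of the alignment is simple arithmetic, sketched below in Python. Given field sizes, the code computes byte offsets with and without padding each field to a full line and reports which cache line each field lands on. The 64-byte line size matches the figure in the text; the two 8-byte counters and the helper names are assumptions for illustration.

```python
from typing import List

CACHE_LINE_BYTES = 64  # line size used in the text


def padded_size(size: int, line: int = CACHE_LINE_BYTES) -> int:
    """Round size up to the next multiple of the cache line size."""
    return -(-size // line) * line


def field_offsets(sizes: List[int], pad_to_line: bool) -> List[int]:
    """Byte offset of each field when laid out back to back."""
    offsets = []
    cursor = 0
    for size in sizes:
        offsets.append(cursor)
        cursor += padded_size(size) if pad_to_line else size
    return offsets


def cache_lines(offsets: List[int], line: int = CACHE_LINE_BYTES) -> List[int]:
    """Index of the cache line that each offset falls on."""
    return [offset // line for offset in offsets]


# Two 8-byte counters, e.g. a producer index and a consumer index.
counters = [8, 8]
packed = cache_lines(field_offsets(counters, pad_to_line=False))
padded = cache_lines(field_offsets(counters, pad_to_line=True))
print("packed:", packed, "padded:", padded)
```

In the packed layout both counters sit on the same line, so a write to one invalidates the other thread's copy of both. In the padded layout each counter has a line to itself, at the cost of 56 unused bytes per counter.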

🚀 Quick Start

🐍 Python Package Usage

After installing via pip, you can integrate Shaurya directly into your Python trading strategies or research notebooks to access C++ performance with Python simplicity.

Installation

pip install hft.shaurya

Create a Python script (e.g., main.py) to initialize the engine and listen for market data.
Note: Ensure you are running the MultiSourceUDP.py simulator (or have a real UDP feed active) before starting the engine.

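import shaurya_hft
import time

# Placeholder feed address (an assumption, not a project default): replace
# with the multicast group and port of your simulator or live feed.
FEED_IP = "239.0.0.1"
FEED_PORT = 5000

def main():
    print("Initializing Shaurya HFT Engine...")
    engine = shaurya_hft.Engine()      # set up C++ structures and ring buffers
    engine.start(FEED_IP, FEED_PORT)   # spawn the C++ network thread
    print("🚀 Engine Started. Listening for live ticks...")
    try:
        while True:
            # Latency of the most recent packet, in microseconds.
            latency = engine.get_latency()
            if latency > 0:
                print(f"⚡ Tick Processed | Latency: {latency:.4f} μs")
            time.sleep(0.5)  # poll twice per second

    except KeyboardInterrupt:
        print("\nStopping Engine...")
        engine.stop()  # signal the C++ thread to exit and release the socket
        print("Engine Shutdown Complete.")

if __name__ == "__main__":
    main()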

Function	Description
`engine = shaurya_hft.Engine()`	Initializes the C++ memory structures and lock-free ring buffers.
`engine.start(ip, port)`	Spawns the high-performance C++ network thread to listen on the specified UDP multicast group and releases the Python GIL.
`engine.get_latency()`	Returns the processing latency (in microseconds) of the most recent packet; thread-safe and lock-free.
`engine.stop()`	Safely signals the C++ thread to terminate and cleans up socket resources.

Resources

If you are new to High-Frequency Trading systems, these concepts explain the "Why" behind Shaurya's architecture:

Latency vs. Jitter: Understand why "Average Speed" is useless in HFT.
Zero-Copy Networking: How avoiding memory copies saves microseconds.
Lock-Free Programming: An introduction to Atomics and Ring Buffers.
False Sharing: The hidden killer of multi-threaded performance.

Developed by yours truly 🛩️!

Repository: harshitsinghcode/shaurya
