Shaurya is a high-frequency trading (HFT) market data feed handler engineered for sub-microsecond latency. By leveraging Zero-Copy parsing, Lock-Free concurrency, and Stack-based memory management, it bypasses the performance bottlenecks of standard software architectures to process financial data with deterministic speed.
Shaurya was benchmarked using high-resolution hardware timers (QueryPerformanceCounter).
| Implementation Approach | Average Latency | Min Latency | Why it's Slow/Fast? |
|---|---|---|---|
| Python Script | ~45.0 µs | ~30.0 µs | Interpreter overhead & Garbage Collection pauses. |
| Standard C++ (std::string) | ~5.0 µs | ~3.5 µs | Frequent Heap Allocations (malloc) & deep memory copying. |
| SHAURYA (Zero-Copy) | 1.88 µs* | 0.3 µs | Zero-Copy pointer arithmetic & Lock-Free queues. |
The Result: Shaurya achieves a minimum internal reaction time of 300 nanoseconds, roughly 100x lower than the Python script's ~30 µs minimum. On average latency (1.88 µs vs. ~45 µs) the gap is about 24x.
*Measured in Pure Mock Environment
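The minimum and average figures above come from timing individual operations. The following is a minimal sketch of that kind of measurement, not Shaurya's actual benchmark harness: `measure_latency` and `LatencyStats` are hypothetical names, and `std::chrono::steady_clock` is used as a portable stand-in for QueryPerformanceCounter.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>

// Per-call latency summary, in microseconds.
struct LatencyStats {
    double min_us;
    double avg_us;
};

// Times each of `iterations` calls to `work` and reports the minimum and mean.
// Hypothetical helper for illustration; not part of Shaurya's API.
template <typename Fn>
LatencyStats measure_latency(Fn&& work, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;  // assumption: stand-in for QueryPerformanceCounter
    if (iterations == 0) return {0.0, 0.0};

    double min_us = std::numeric_limits<double>::max();
    double total_us = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        work();
        const auto stop = Clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(stop - start).count();
        min_us = std::min(min_us, us);
        total_us += us;
    }
    return {min_us, total_us / static_cast<double>(iterations)};
}
```

Passing the parsing routine as `work` and a large iteration count yields a min/average pair comparable in shape to the table rows. The timer's own overhead is included in every sample, so very small minimums should be read with that in mind.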
Shaurya was subjected to a 30-minute stress test aggregating live ticks from Binance, Coinbase, and Bitstamp simultaneously.
- Test Duration: 30 Minutes
- Total Messages: 21,862 (Live Volatility Bursts)
- Outcome: The engine successfully normalized fragmented liquidity streams in real time. Average latency increased under OS scheduler load (the cores were not isolated), but the minimum latency remained at 0.3 µs, which suggests the core engine's best-case path stays efficient during crypto market volatility.
Instead of copying network packets into new std::string objects (which costs a heap allocation and a memory copy per packet), Shaurya uses a custom StringViewLite class. This creates a lightweight "view" (a pointer plus a length) over the raw socket buffer, allowing the engine to parse prices without copying the payload.
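The view-based parsing described above can be sketched as follows. This is an illustration only: it uses the standard `std::string_view` in place of Shaurya's StringViewLite, and the comma-separated tick layout, the `Tick` struct, and the function names are all assumptions rather than the engine's real wire format or API.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical tick layout for illustration: "SYMBOL,PRICE_IN_CENTS,QUANTITY".
struct Tick {
    std::string_view symbol;  // points into the caller's buffer; nothing is copied
    std::int64_t price_cents;
    std::int64_t quantity;
};

// Returns the text before the next comma and advances `rest` past it.
inline std::string_view next_field(std::string_view& rest) {
    const std::size_t comma = rest.find(',');
    const std::string_view field = rest.substr(0, comma);
    rest = (comma == std::string_view::npos) ? std::string_view{}
                                             : rest.substr(comma + 1);
    return field;
}

// Parses an entire field as a base-10 integer without allocating.
inline std::optional<std::int64_t> parse_int(std::string_view field) {
    std::int64_t value = 0;
    const char* const end = field.data() + field.size();
    const auto [ptr, ec] = std::from_chars(field.data(), end, value);
    if (ec != std::errc{} || ptr != end) return std::nullopt;
    return value;
}

// Parses one packet in place; returns nullopt on malformed input.
inline std::optional<Tick> parse_tick(std::string_view packet) {
    std::string_view rest = packet;
    const std::string_view symbol = next_field(rest);
    const auto price = parse_int(next_field(rest));
    const auto quantity = parse_int(next_field(rest));
    if (symbol.empty() || !price || !quantity) return std::nullopt;
    return Tick{symbol, *price, *quantity};
}
```

Because `Tick::symbol` is a view, it is only valid while the receive buffer it points into is alive and unmodified. That lifetime constraint is the usual price of zero-copy parsing.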
Traditional systems use mutex locks (std::mutex) to share data between threads. Under contention a waiting thread can be descheduled, and the resulting context switch is expensive. Shaurya implements a Single-Producer Single-Consumer Ring Buffer using std::atomic operations. This allows the Network Thread to push data while the Strategy Thread reads it, without either thread ever blocking.
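A minimal sketch of a single-producer single-consumer ring buffer built on `std::atomic` is shown below. It illustrates the general technique, not Shaurya's actual implementation: the class name `SpscRing`, the power-of-two capacity requirement, and the `try_push`/`try_pop` interface are assumptions made for this example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Illustrative SPSC ring buffer; T must be default-constructible and copyable.
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false instead of blocking when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns nullopt instead of blocking when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

The release store on one index pairs with the acquire load on the other side, so the consumer never reads a slot before the producer has finished writing it. Neither call waits: a full or empty ring is reported to the caller, who decides whether to retry or drop.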
Critical data structures are aligned to 64-byte cache lines (alignas(64)). This prevents False Sharing, a phenomenon where threads updating different variables that happen to sit on the same CPU cache line keep invalidating each other's cached copy, drastically reducing performance on multi-core systems.
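The effect of `alignas(64)` on layout can be checked directly. The sketch below assumes a 64-byte cache line (typical on x86-64, but hardware-dependent); the struct and helper names are hypothetical and not taken from Shaurya's source.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumption: typical x86-64 line size

// Unpadded: the two counters are adjacent and will usually share a cache line.
struct PackedCounters {
    std::atomic<std::uint64_t> produced{0};
    std::atomic<std::uint64_t> consumed{0};
};

// Padded: each counter starts on its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::uint64_t> produced{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> consumed{0};
};

static_assert(alignof(PaddedCounters) == kCacheLine,
              "struct inherits the strictest member alignment");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLine,
              "each counter occupies a full line");

// True when both addresses fall inside the same kCacheLine-sized block.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}
```

The padded layout trades memory (128 bytes instead of 16 here) for isolation: a write to `produced` by one thread no longer invalidates the line holding `consumed` for the other.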
- After installing via pip, you can integrate Shaurya directly into your Python trading strategies or research notebooks to access C++ performance with Python simplicity.

```
pip install hft.shaurya
```
- Create a Python script (e.g., main.py) to initialize the engine and listen for market data.
- Note: Ensure you are running the MultiSourceUDP.py simulator (or have a real UDP feed active) before starting the engine.
```python
import shaurya_hft
import time

# Placeholder values (assumptions): replace with your feed's multicast group and port.
FEED_IP = "239.0.0.1"
FEED_PORT = 5000


def main():
    print("Initializing Shaurya HFT Engine...")
    engine = shaurya_hft.Engine()
    engine.start(FEED_IP, FEED_PORT)
    print("🚀 Engine Started. Listening for live ticks...")
    try:
        while True:
            # Poll the latency of the most recent packet twice per second.
            latency = engine.get_latency()
            if latency > 0:
                print(f"⚡ Tick Processed | Latency: {latency:.4f} μs")
            time.sleep(0.5)
    except KeyboardInterrupt:
        print("\nStopping Engine...")
        engine.stop()
        print("Engine Shutdown Complete.")


if __name__ == "__main__":
    main()
```

| Function | Description |
|---|---|
| engine = shaurya_hft.Engine() | Initializes the C++ memory structures and lock-free ring buffers. |
| engine.start(ip, port) | Spawns the high-performance C++ network thread to listen on the specified UDP multicast group and releases the Python GIL. |
| engine.get_latency() | Returns the processing latency (in microseconds) of the most recent packet; thread-safe and lock-free. |
| engine.stop() | Safely signals the C++ thread to terminate and cleans up socket resources. |
If you are new to High-Frequency Trading systems, these concepts explain the "Why" behind Shaurya's architecture:
- Latency vs. Jitter: Understand why "Average Speed" is useless in HFT.
- Zero-Copy Networking: How avoiding memory copies saves microseconds.
- Lock-Free Programming: An introduction to Atomics and Ring Buffers.
- False Sharing: The hidden killer of multi-threaded performance.
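To make the first point concrete, the sketch below compares the mean of a latency sample set with its percentiles. The helper names are hypothetical, and the nearest-rank percentile definition is one common choice among several.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the samples, in microseconds (0 for an empty set).
inline double mean_us(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    const double total = std::accumulate(samples.begin(), samples.end(), 0.0);
    return total / static_cast<double>(samples.size());
}

// Nearest-rank percentile for p in [1, 100]. Takes a copy so the caller's
// sample order is preserved.
inline double percentile_us(std::vector<double> samples, unsigned p) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    std::size_t rank = (samples.size() * p + 99) / 100;  // ceil(n * p / 100)
    rank = std::clamp<std::size_t>(rank, 1, samples.size());
    return samples[rank - 1];
}
```

With a synthetic set of 98 samples at 1.0 µs and 2 samples at 100 µs (made-up numbers, not measurements), the mean is 2.98 µs, the median is 1.0 µs, and the 99th percentile is 100 µs. The average describes neither the typical tick nor the slow one, which is why tail percentiles are the more useful figure.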
Developed by yours truly 🛩️!
