This project simulates acoustic emission (AE) hits and stores them as Parquet files for later analysis. Each hit is a short, damped sinusoidal signal with noise, timestamped and assigned to a sensor.
The main entry point is the CLI script `ae-simulate`.
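The hit model described above can be sketched in a few lines of NumPy. This is an illustrative approximation only, not the simulator's actual implementation; the tone frequency, decay rate, noise level, and amplitude defaults are assumptions:

```python
import numpy as np

def simulate_hit(
    sampling_frequency_hertz: float = 1_000_000.0,
    duration_seconds: float = 1e-3,
    tone_frequency_hertz: float = 100_000.0,   # assumed burst frequency
    decay_rate_per_second: float = 5_000.0,    # assumed exponential decay constant
    noise_std_millivolts: float = 0.05,        # assumed noise level
) -> np.ndarray:
    """Generate a short, exponentially damped sinusoid with additive Gaussian noise."""
    n_samples = int(round(duration_seconds * sampling_frequency_hertz))
    t = np.arange(n_samples) / sampling_frequency_hertz
    burst = np.exp(-decay_rate_per_second * t) * np.sin(2.0 * np.pi * tone_frequency_hertz * t)
    noise = np.random.normal(0.0, noise_std_millivolts, size=n_samples)
    return burst + noise

samples = simulate_hit()  # 1000 samples: 1 ms at 1 MHz
```

The real simulator additionally timestamps each hit and assigns it to a sensor before writing it out.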
Build the image from the repository root:
```shell
docker build -t ae-simulate .
```

Show CLI help via Docker:

```shell
docker run --rm ae-simulate --help
```

Run a finite simulation, storing hits in a host directory.
On Windows PowerShell:

```shell
mkdir simulated_hits
docker run --rm `
  -v "${PWD}/simulated_hits:/app/simulated_hits" `
  ae-simulate `
  --n-sensors 10 `
  --n-hits 100 `
  --realtime
```

On Linux/macOS:

```shell
mkdir -p simulated_hits
docker run --rm \
  -v "$(pwd)/simulated_hits:/app/simulated_hits" \
  ae-simulate \
  --n-sensors 10 \
  --n-hits 100 \
  --realtime
```

Run an infinite, real-time simulation and stop it with Ctrl+C or a SIGTERM (the same volume mapping applies, only the arguments change), e.g. on Linux/macOS:
```shell
docker run --rm \
  -v "$(pwd)/simulated_hits:/app/simulated_hits" \
  ae-simulate
```

The container's entrypoint is `ae-simulate`, so any additional arguments after the image name are passed directly to the simulator.
This project uses Poetry.

- Make sure Poetry is installed.
- From the repository root, install dependencies:

  ```shell
  poetry install
  ```

- Prefix commands with `poetry run` to execute them in the project environment:

  ```shell
  poetry run ae-simulate --help
  ```
If you prefer not to use Poetry, you can instead create a virtual environment of your choice and install the project with pip from the repository root (for example on Windows PowerShell):

```shell
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install .
ae-simulate --help
```

The CLI is exposed via the `ae-simulate` script (configured in `[tool.poetry.scripts]` in `pyproject.toml`).
Basic help:

```shell
poetry run ae-simulate --help
```
- `--directory PATH` – Directory where simulated hit Parquet files are written. Default: `./simulated_hits` (created if it does not exist).
- `--sampling-frequency-hertz FLOAT` – Sampling frequency of the generated waveforms in Hz. Default: `1000000.0` (1 MHz).
- `--n-sensors INT` – Number of sensors to simulate. Hits are randomly assigned to sensors with IDs in `[1, n-sensors]`. Default: `20`.
- `--n-hits INT` – Total number of hits to generate. Default: `None` → run indefinitely until interrupted with Ctrl+C.
- `--realtime` – If provided, the simulator does not wait between hits (i.e. runs as fast as possible). If omitted, real-time behavior is simulated: the process sleeps between hits according to a random inter-arrival time model.
- `--log-level LEVEL` – Logging verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR`. Default: `INFO`.
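The inter-arrival distribution used when `--realtime` is omitted is not specified here. A common choice for such a model is a Poisson process with exponentially distributed gaps, which could be sketched as follows (an assumption for illustration, not the simulator's actual implementation; `wait_for_next_hit` is a hypothetical helper):

```python
import random
import time

def wait_for_next_hit(mean_interval_seconds: float = 1.0) -> float:
    """Sleep for an exponentially distributed inter-arrival time (Poisson process)."""
    delay = random.expovariate(1.0 / mean_interval_seconds)
    time.sleep(delay)
    return delay

# Example: wait out one short, random gap between simulated hits.
delay = wait_for_next_hit(mean_interval_seconds=0.001)
```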
Simulate 100 hits for 10 sensors as fast as possible into the default directory:

```shell
poetry run ae-simulate --n-sensors 10 --n-hits 100 --realtime
```

Simulate hits indefinitely for 20 sensors, with real-time delays between hits, into a custom directory:

```shell
poetry run ae-simulate --directory data/hits
```

Simulate a shorter waveform (lower sampling frequency) for 50 sensors:

```shell
poetry run ae-simulate \
  --n-sensors 50 \
  --sampling-frequency-hertz 200000.0 \
  --n-hits 1000 --realtime
```

Stop an infinite simulation with Ctrl+C.
Each simulated hit is saved as a separate Parquet file in the target directory. Filenames follow the pattern:

```
sensor_{sensor_id}_{YYYYMMDD}T{HHMMSS}_{uuid}.parquet
```

Each file contains a single-row table with the following columns:

- `Timestamp` – `pandas.Timestamp` (UTC) when the hit occurred.
- `SamplingFrequencyHertz` – float, the sampling frequency used.
- `Signal` – list of floats (millivolts), the waveform samples.
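Filenames following the pattern above can be parsed back into their components, for example when grouping files by sensor. This is a sketch assuming the exact pattern shown; `parse_hit_filename` is a hypothetical helper, not part of the project:

```python
import re
from datetime import datetime, timezone

# Matches sensor_{sensor_id}_{YYYYMMDD}T{HHMMSS}_{uuid}.parquet
FILENAME_RE = re.compile(
    r"sensor_(?P<sensor_id>\d+)_(?P<ts>\d{8}T\d{6})_(?P<uuid>[0-9a-f-]+)\.parquet"
)

def parse_hit_filename(name: str) -> dict:
    """Extract sensor id, UTC timestamp, and uuid from a simulated-hit filename."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    return {
        "sensor_id": int(m["sensor_id"]),
        "timestamp": datetime.strptime(m["ts"], "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc),
        "uuid": m["uuid"],
    }

info = parse_hit_filename("sensor_3_20260127T120000_0b1c2d3e-4f56-7890-abcd-ef0123456789.parquet")
```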
You can load and analyze the data with pandas:

```python
import pandas as pd

df = pd.read_parquet("simulated_hits/sensor_1_20260127T120000_....parquet")
signal = df.loc[0, "Signal"]
timestamp = df.loc[0, "Timestamp"]
fs = df.loc[0, "SamplingFrequencyHertz"]
```

For downstream analytics or stream-processing pipelines, you can
summarise each waveform into a small set of features using `extract_features` from `mkp/ae/simulate/features.py`:
```python
from mkp.ae.simulate.features import extract_features

features = extract_features(signal, sampling_frequency_hertz=fs)
```

The returned dictionary contains peak amplitude, energy and dominant frequency, and is suitable for feeding into streaming/ML pipelines or online monitoring dashboards.
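To illustrate the kind of summary involved, a minimal stand-in for such a feature extractor might look like this. This is a sketch only: the real `extract_features` may differ in its exact formulas and key names, which are assumed here:

```python
import numpy as np

def extract_features_sketch(signal, sampling_frequency_hertz: float) -> dict:
    """Illustrative feature summary: peak amplitude, energy, dominant frequency."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sampling_frequency_hertz)
    return {
        "peak_amplitude": float(np.max(np.abs(x))),
        "energy": float(np.sum(x**2) / sampling_frequency_hertz),
        "dominant_frequency_hertz": float(freqs[np.argmax(spectrum)]),
    }

# A pure 100 kHz tone sampled at 1 MHz should have its dominant frequency near 100 kHz.
fs = 1_000_000.0
t = np.arange(1000) / fs
features = extract_features_sketch(np.sin(2.0 * np.pi * 100_000.0 * t), fs)
```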
- Format/lint checks are configured via Ruff.
- Tests are run with pytest:

  ```shell
  poetry run pytest
  ```

Adjust or extend the simulator logic in `mkp/ae/simulate/simulate.py` and the CLI in `mkp/ae/simulate/main.py`.