Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, and Philipp Koehn
STAR is a research project targeting low-latency speech translation and transcription. It uses a segmenter module learned from the cross-attention feedback of a Transformer encoder-decoder model.
The project uses Poetry for streamlined dependency management and reproducible environments. To set up the environment, clone our codebase and run `poetry install`, or set up your own environment with conda or another tool by referring to our `pyproject.toml` config.
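As a quick start, installation typically looks like the following (the repository URL is a placeholder, and the final `poetry run` command is only an illustrative way to execute code inside the managed environment, assuming the training scripts expose a standard `--help`):

```bash
# Clone the codebase (replace <repo-url> with the actual repository location)
git clone <repo-url> STAR
cd STAR

# Install all dependencies declared in pyproject.toml into a Poetry-managed environment
poetry install

# Run any entry point inside that environment, e.g. the non-streaming trainer
poetry run python src/train_w2v.py --help
```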
In this codebase, we refer to the module as “nugget” rather than “star.” This naming reflects our use of the cross-attention feedback mechanism first proposed in Nugget: Neural Agglomerative Embeddings of Text, now extended to speech-to-text tasks.
STAR/
├── src/ # Core source code
│ ├── train_simul.py # Simultaneous S2T
│ ├── train_w2v.py # Non-Streaming S2T
│ ├── lightning/ # PyTorch Lightning trainer modules
│ │ ├── simul_trainer.py
│ │ └── wav2vec_trainer.py
│ ├── data_utils/ # Data loading and preprocessing
│ │ ├── data_module.py
│ │ └── preprocess_{dataset}.py # Dataset preprocessing scripts
│ ├── models/ # Customized Transformer and CTC model code
│ │ ├── my_transformer.py
│ │ └── my_wav2vec.py
│
├── scripts/ # Example run scripts
│ ├── simul_s2t.sh # Simultaneous S2T Entry
│ └── non_streaming.sh # Non-streaming S2T Entry
│
├── pyproject.toml # Poetry configuration
├── README.md # Project overview
└── LICENSE # License information

In Section 3 of our paper, we present non-streaming experiments. The training script can be found in scripts/non_streaming.sh, and the argument --nugget_compress_rate controls the compression rate. For details of segmenter training, please track how scorer_logits are updated in my_transformer.py.
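As a sketch, a non-streaming run could be launched like this (the compression-rate value and the direct invocation of `train_w2v.py` are illustrative; data paths and other arguments depend on your setup):

```bash
# Launch non-streaming S2T training via the provided example script
bash scripts/non_streaming.sh

# Alternatively, call the entry point directly; the compression rate of 8 is
# only an example value, and further data/model arguments will be needed
poetry run python src/train_w2v.py --nugget_compress_rate 8
```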
The training script is provided in scripts/simul_s2t.sh. Training is largely the same as non-streaming compression, except that we regularize the segmenter scores following CIF so that the number of activations stays close to the number of target tokens. Infinite lookback can be enabled by setting --use_ilk to True.
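As a sketch, a simultaneous run with infinite lookback enabled could look like this (only `--use_ilk` is taken from the description above; all other details of the invocation depend on your configuration):

```bash
# Launch simultaneous S2T training via the provided example script
bash scripts/simul_s2t.sh

# Alternatively, call the entry point directly with infinite lookback enabled;
# data and model arguments are omitted here and must be supplied for a real run
poetry run python src/train_simul.py --use_ilk True
```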
@misc{tan2024streaming,
title={Streaming Sequence Transduction through Dynamic Compression},
author={Weiting Tan and Yunmo Chen and Tongfei Chen and Guanghui Qin and Haoran Xu and Heidi C. Zhang and Benjamin Van Durme and Philipp Koehn},
year={2024},
eprint={2402.01172},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

