
Centurion Project Overview

A technical summary and ecosystem positioning of the Centurion foundation‑model plugin for the Julius decoder.

GitHub: https://github.com/halspeech/julius-speech-foundation-model

1. Abstract Description of the Centurion Project

Architecture

Pipeline (from the architecture diagram): Audio source → adintool (from Julius) → Centurion plugin (wav2vec2 / WavLM / HuBERT / Whisper; frame encoder → logits vector) → vecnet (TCP) or HTK export → Julius decoder (HMM, lexicon + LM) → output

Centurion is a lightweight integration layer, packaged as a plugin, that lets the classical Julius speech decoder operate with modern foundation-model acoustic front‑ends such as wav2vec 2.0, HuBERT, WavLM, Whisper, and Data2Vec. Instead of modifying Julius’ C source code, Centurion injects frame‑level logits or posterior probabilities into Julius through existing mechanisms: vecnet streaming or HTK-compatible offline feature files. This design preserves Julius’ deterministic, high‑speed Viterbi decoding while improving acoustic modeling through recent pretrained encoders, most of them trained with self-supervised learning (SSL).
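The offline path described above can be illustrated with a minimal sketch that turns per-frame logits into log posteriors and stores them in the standard HTK parameter-file layout (a 12-byte big-endian header followed by 32-bit floats). This is not Centurion's actual exporter: the function names, the 20 ms frame shift, and the choice of the USER parameter kind are assumptions made for illustration, and the values Julius expects in practice depend on its configuration.

```python
import math
import struct

HTK_USER = 9  # HTK parameter-kind code for user-defined vectors


def log_softmax(logits):
    """Convert one frame of raw logits into log posterior probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_norm for v in logits]


def write_htk(path, frames, frame_shift_s=0.02):
    """Write equal-length float vectors as an HTK parameter file.

    frame_shift_s is an assumed encoder stride (20 ms), not a Centurion setting.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    samp_period = int(round(frame_shift_s * 1e7))  # HTK counts in 100 ns units
    with open(path, "wb") as fh:
        # Header: nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16)
        fh.write(struct.pack(">iihh", len(frames), samp_period, 4 * dim, HTK_USER))
        for frame in frames:
            fh.write(struct.pack(">%df" % dim, *frame))


def read_htk(path):
    """Read the file back; returns (n_frames, samp_period, samp_size, kind, frames)."""
    with open(path, "rb") as fh:
        n, period, size, kind = struct.unpack(">iihh", fh.read(12))
        dim = size // 4
        frames = [list(struct.unpack(">%df" % dim, fh.read(size))) for _ in range(n)]
    return n, period, size, kind, frames
```

In a real pipeline the logits would come from the encoder's output layer, one vector per frame; here any list of equal-length float lists works, for example `write_htk("utt.htk", [log_softmax(f) for f in logits])`.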

Centurion effectively acts as a bridge between classical HMM-based decoding and modern neural acoustic modeling. Its purpose is not to replace Julius, but to extend its lifespan by enabling it to leverage the representational power of large-scale neural encoders, reducing the need to retrain decoder-side models and maintaining compatibility with legacy toolchains.

2. Positioning in the Speech Recognition Ecosystem

To understand Centurion’s role, it is useful to compare it with existing ASR toolkits and industry systems:

2.1 Academic & Open‑Source Speech Recognition Toolkits

2.2 Industrial ASR Systems

3. How Centurion Differs

Centurion is unique in that it provides:

In contrast to Kaldi, ESPNet, and industrial APIs, Centurion prioritizes interoperability, low-latency decoding, and ease of integration with legacy pipelines.

4. Speech Recognition Fundamentals

ASR traditionally follows four conceptual components:

Centurion preserves this classical framework while modernizing the acoustic modeling step.
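To make the division of labor concrete, the toy sketch below shows how per-frame acoustic scores (the part a neural encoder would supply) combine with HMM transition scores in a Viterbi search (the part a decoder such as Julius performs). It is a generic textbook Viterbi over a tiny hand-written HMM, not Julius' implementation; the state count, probabilities, and function name are all illustrative assumptions.

```python
import math

NEG_INF = float("-inf")


def viterbi(frame_log_scores, log_trans, log_init):
    """Return (best_log_score, best_state_path) for a small HMM.

    frame_log_scores[t][s]: acoustic log score of state s at frame t
    log_trans[p][s]:        log probability of moving from state p to state s
    log_init[s]:            log probability of starting in state s
    """
    n_states = len(log_init)
    # delta[s]: best log score of any path ending in state s at the current frame
    delta = [log_init[s] + frame_log_scores[0][s] for s in range(n_states)]
    backptr = []
    for scores in frame_log_scores[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[best_prev] + log_trans[best_prev][s] + scores[s])
            ptr.append(best_prev)
        backptr.append(ptr)
        delta = new_delta
    # Trace the best final state back to the first frame
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Two-state left-to-right HMM: must start in state 0, state 1 is absorbing.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
frames = [[math.log(0.9), math.log(0.1)],
          [math.log(0.6), math.log(0.4)],
          [math.log(0.2), math.log(0.8)]]
score, path = viterbi(frames, log_trans, log_init)
print(path)  # [0, 1, 1]
```

Swapping the acoustic front-end changes only the contents of `frame_log_scores`; the transition structure and the search itself stay the same, which is the property the classical framework relies on.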