Ph.D.
Tenure-track Researcher
National Institute of Information and Communications Technology (NICT)
└──The Universal Communication Research Institute (UCRI)
└──The Advanced Speech Translation Research and Development Promotion Center (ASTREC)
└──Advanced Speech Technology Laboratory (ASTL)
E-mail: sheng.li [at] nict.go.jp / lisheng.cs [at] gmail.com / sheng.li [at] ieee.org
Address: 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
Current Research:
-
Next-generation speech recognition/translation/synthesis technology
-
Research students and visiting researchers are welcome for both long- and short-term visits.
I look forward to collaborating with bright minds from all over the world.
Education:
-
2006.7 B.S. in Computer Science, Nanjing University
-
2009.7 M.E. in Software and Embedded Systems, Nanjing University
(Joint program with the Chinese Academy of Sciences and the Chinese University of Hong Kong)
-
2016.3 Ph.D. in Informatics, Kyoto University
(supported by the Japanese Government MEXT scholarship, not CSC)
Work Experiences:
-
Jul.2009 - Apr.2012: Chinese Academy of Sciences [Guangdong, China]
1. Mispronunciation detection for second language learners (NSFC60772165)
2. Phone-level articulatory 3D animation in pronunciation training (NSFC61135003)
-
Apr.2012 - Sep.2012: Sogou/Sohu Pinyin IME [Beijing, China]
Speech input for the Pinyin IME (discriminative training of acoustic models)
-
Oct.2012 - Jan.2017: Kyoto University, Speech and Audio Processing Lab.
1. DNN-based speech recognition and spoken lecture transcription (MEXT PhD, Oct.2012 - Mar.2016)
2. Speaker adaptation on DNN model (NICT cooperative research)
3. English ASR and online decoding for humanoid robot (ERATO researcher, Apr.2016 - Dec.2016)
-
2017 - Present: National Institute of Information & Communications Technology (NICT)
1. Researcher on multilingual speech technology for the Tokyo 2020 Olympics project (2017-2019)
2. Next generation speech technology (Tenure-track Researcher collaborating with Kyoto Univ., 2020 - Present)
-
Apr.2019 - May.2019: Visiting researcher, Department of Computer Science, University of Oxford
Trustworthy AI and explainable deep neural networks
-
Dec.2021 - Present: Master's course advisor, Department of Informatics, Kyoto University
Security-aware speech processing
-
Apr.2022 - Present: Guest researcher, Nanyang Technological University, Singapore
Speech recognition
Awards:
-
NICT
2022 Co-1st place in the Main/OOD tracks of the VoiceMOS Challenge (INTERSPEECH2022 special session)
2021 3rd/4th place in the constrained/unconstrained-resource multilingual ASR tracks of the OLR2021 challenge
2021 Supervised student (Soky Kak) received a best student paper nomination at O-COCOSDA2021
2021 NICT R3 (FY2021) Award: Excellence Award for Outstanding Performance (Group)
2020 Two supervised students (Hao Shi and Yuqin Lin) received ISCA grants for INTERSPEECH2020
2020 Supervised student (Yaowei Han) received a best student paper nomination at IEEE-ICME2020
-
Kyoto University
2018 IEEE Signal Processing Society Japan Student Journal Paper Award
2016 Paper selected as cover paper of IEEE/ACM Trans. Audio, Speech and Language Process.
2012-2016 Full exemption of Kyoto Univ. admission and tuition fees
2012 MEXT scholarship from the Japanese Government (recommended by Kyoto Univ., not CSC)
-
Joint Lab CAS and CUHK
2012 Travel grant from IBM Research for INTERSPEECH2012 in Portland, USA
2011 Best Creative Project Award in Young Entrepreneur Program 2011, HK
2011 Excellent Staff Award of Chinese Academy of Sciences
-
Nanjing University
2004 Encouragement Scholarship of Nanjing University
2002 Chen Yinchuan Scholarship (Hong Kong) for Excellent University New Students
2002 Chemistry and Biology Olympiad award for high school students (Jiangsu Province, China)
Funding and Grants:
-
NICT international funding (PI): 2022-2024 (ongoing)
Bridging Eurasia from the Sea -- Multilingual Speech Recognition for the Maritime Silk Road
-
Grant-in-Aid for Young Scientists (PI): 2021-2023 (ongoing)
Phantom in the Opera -- The Vulnerabilities of Speech Interfaces for Robotic Dialogue Systems
-
NICT tenure-track start-up funding (PI): 2020-2023 (ongoing)
Advanced Multilingual End-to-End Speech Recognition
-
NICT international funding (PI): 2020-2022
Bridging Eurasia -- Multilingual Speech Recognition for the Silk Road
-
NII Open Collaborative Research (collaborator): 2020-2021
Speaker De-identification with Provable Privacy in Speech Data Release
-
Grant-in-Aid for Research Activity Start-up (PI): 2019-2021
Next generation multilingual End-to-End speech recognition (from G30 to G200)
Publications:
GoogleScholar | ResearchGate | DBLP | Researchmap | SemanticScholar | ORCID | Scopus | research-er | IEEE | Aminer | Publons
(13 ICASSP, 22 INTERSPEECH, 3 Odyssey, 1 ASRU, 1 SIGDIAL, 2 SLT, 1 ICME, 2 TASLP, 1 SPEECH COMMUN, 1 EURASIP JASMP)
Ph.D. Thesis:
-
Sheng Li (supervised by Prof. Tatsuya Kawahara).
Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training.
Ph.D. Thesis, Kyoto University, Feb. 2016.
Book Chapter:
-
S. Li, Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language, ISBN**, 2022.
-
S. Li, Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems, ISBN978-4-904020, 2022.
-
X. Lu, S. Li, M. Fujimoto, Speech-to-Speech Translation, pp. 21--38, Springer Singapore, 2020.
S. Li, Chapter: From Shallow to Deep and Very Deep.
S. Li, Chapter: End-to-End and CTC models.
Invited Talks:
-
Phoneme-level articulatory animation in pronunciation training using EMA data,
2012, Speech Synthesis Lab., Tsinghua University, host: Prof. Zhiyong Wu.
-
Lightly-supervised training and confidence estimation by using CRF classifiers,
2014, Speech and Cognition Lab., Tianjin University, host: Prof. Jianwu Dang and Prof. Kiyoshi Honda.
-
End-to-End Speech Recognition,
2019, University of Tokyo, host: Dr. Jinze Yu.
Journals (Peer reviewed):
-
Kak Soky, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, S. Li, Chenchen Ding, Sethserey Sam.
TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies.
International Journal of Asian Language Processing, 2022. (invited paper following the best student paper nomination at O-COCOSDA2021)
-
C. Fan, H. Zhang, J. Yi, Z. Lv, J. Tao, T. Li, G. Pei, X. Wu, S. Li.
SpecMNet: Spectrum Mend Network for Monaural Speech Enhancement.
Applied Acoustics, Volume 194, 15 June 2022, 108792.
-
S. Shimizu, C. Chu, S. Li, S. Kurohashi.
Cross-Lingual Transfer Learning for End-to-End Speech Translation.
Journal of Natural Language Processing (JNLP), Vol. 29, No. 2, June 2022.
-
X. Chen, H. Huang, and S. Li,
Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview.
Applied Sciences, Special Issue on Machine Speech Communication, 2021. (Peer reviewed, invited survey paper)
-
S. Qin, L. Wang, S. Li (corresponding), J. Dang and L. Pan.
Improving Low-resource Tibetan End-to-end ASR by Multilingual and Multi-level Unit Modeling.
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP JASMP), No. 2, 2022.
-
P. Shen, X. Lu, S. Li (patent co-inventor), H. Kawai.
Knowledge Distillation-based Representation Learning for Short-Utterance Spoken Language Identification.
IEEE Trans. Audio, Speech & Language Process. (TASLP), vol. 28, pp. 2674--2683, 2020.
-
S. Li, Y.Akita, and T.Kawahara.
Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems' hypotheses.
IEEE Trans. Audio, Speech & Language Process. (TASLP), Vol.24, No.9, pp.1520--1530, 2016.
(Cover Paper, IEEE Signal Processing Society Japan Student Journal Paper Award)
-
S. Li, Y.Akita, and T.Kawahara.
Automatic lecture transcription based on discriminative data selection for lightly supervised acoustic model training.
IEICE Trans., Vol.E98-D, No.8, pp.1545--1552, 2015.
-
L. Wang, H. Chen, S. Li, and H. Meng.
Phoneme-level articulatory animation in pronunciation training,
Speech Communication (SPEECH COMMUN), Vol. 54, Issue 7, Sept. pp. 845--856, 2012.
International Conferences (Peer reviewed):
2022 (Tenure-track @NICT)
-
Zhuo Gong, Saito Daisuke, S. Li, Hisashi Kawai, Minematsu Nobuaki,
Can We Train a Language Model Inside an End-to-End ASR Model? - Investigating Effective Implicit Language Modeling.
in Proc. COLING2022 Workshop, 2022. (accepted)
-
Kak Soky, Zhuo Gong, S. Li,
NICT-Tib1: A Public Speech Corpus of Lhasa Dialect for Benchmarking Tibetan Language Speech Recognition Systems.
in Proc. O-COCOSDA, 2022. (accepted)
-
S. Li, Jiyi Li, Qianying Liu, Zhuo Gong,
An End-to-End Chinese and Japanese Bilingual Speech Recognition Systems with Shared Character Decomposition.
in Proc. ICONIP, 2022. (accepted)
-
Xiaojiao Chen, Hao Huang, S. Li,
GVec: Extracting Speaker Embedding from End-to-End Speech Recognition Model.
in Proc. ICONIP, 2022. (accepted)
-
Guangxing Li, Wangjin Zhou, S. Li, Yi Zhao, Jichen Yang, Hao Huang,
Investigating Effective Domain Adaptation Method for Speaker Verification Task.
in Proc. ICONIP, 2022. (accepted)
-
Hao Shi, Longbiao Wang, S. Li, Jianwu Dang, Tatsuya Kawahara,
Subband-Based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches.
in Proc. APSIPA ASC, 2022. (accepted)
-
Longfei Yang, Jiyi Li, S. Li and Takahiro Shinozaki,
Multi-Domain Dialogue State Tracking with Top-k Slot Self Attention.
in Proc. SIGdial Meeting Discourse & Dialogue, 2022. (accepted)
-
Kak Soky, S. Li, Masato Mimura, Chenhui Chu and Tatsuya Kawahara,
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
in Proc. INTERSPEECH, 2022. (accepted)
-
Longfei Yang, Wenqing Wei, S. Li, Jiyi Li and Takahiro Shinozaki,
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
in Proc. INTERSPEECH, 2022. (accepted)
-
Kai Li, S. Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang and Masashi Unoki,
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
in Proc. INTERSPEECH, 2022. (accepted)
-
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, S. Li, Raj Dabre, Raphael Rubino and Yi Zhao,
Fusion of Self-supervised Learned Models for MOS Prediction.
in Proc. INTERSPEECH, 2022. (accepted)
-
Siqing Qin, Longbiao Wang, S. Li, Yuqin Lin and Jianwu Dang,
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
in Proc. INTERSPEECH, 2022. (accepted)
-
Hao Shi, Longbiao Wang, S. Li, Jianwu Dang and Tatsuya Kawahara,
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
in Proc. INTERSPEECH, 2022. (accepted)
-
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, S. Li and Jianwu Dang,
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
in Proc. INTERSPEECH, 2022. (accepted)
-
Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, S. Li, Masashi Unoki,
Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.
In Proc. EUSIPCO (European Signal Processing Conference), 2022. (accepted)
-
Zhuo Gong, Daisuke Saito, Longfei Yang, Takahiro Shinozaki, S. Li, Hisashi Kawai and Nobuaki Minematsu,
Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.
In Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), 2022. (accepted)
-
S. Li, J. Li, Q. Liu, Z. Gong,
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.
in Proc. LREC (Language Resources and Evaluation Conference), 2022. (accepted)
-
Y. Lv, L. Wang, M. Ge, S. Li (corresponding), C. Ding, L. Pan, Y. Wang, J. Dang, K. Honda,
Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation.
in Proc. IEEE-ICASSP, pp. 7992--7996, 2022.
-
K. Wang, Y. Peng, H. Huang, Y. Hu, and S. Li,
Mining Hard Samples Locally and Globally for Improved Speech Separation.
in Proc. IEEE-ICASSP, pp. 6037--6041, 2022.
2021 (Tenure-track @NICT)
-
D. Liu, L. Wang, S. Li, H. Li, C. Ding, J. Zhang and J. Dang.
Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS.
In Proc. ICONIP, 2021. (accepted)
-
L. Qiang, H. Shi, M. Ge, H. Yin, N. Li, L. Wang, S. Li and J. Dang.
Speech Dereverberation Based on Scale-aware Mean Square Error Loss.
In Proc. ICONIP, 2021. (accepted)
-
H. Yin, L. Qiang, H. Shi, L. Wang, S. Li, M. Ge, G. Zhang and J. Dang.
Simultaneous Progressive Filtering-based Monaural Speech Enhancement.
In Proc. ICONIP, 2021. (accepted)
-
K. Soky, M. Mimura, T. Kawahara, S. Li, C. Ding, C. Chu, and S. Sam.
Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).
In Proc. O-COCOSDA, pp. 122--127, 2021. (Best student paper nomination; invited as a fast-track journal paper)
-
H. Shi, L. Wang, S. Li, C. Fan, J. Dang, and T. Kawahara.
Spectrograms Fusion-based End-to-End Robust Automatic Speech Recognition.
In Proc. APSIPA ASC, pp. 438--442, 2021.
-
Y. Peng, J. Zhang, H. Zhang, H. Xu, H. Huang, S. Li, and E.S. Chng.
Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework.
In Proc. APSIPA ASC, pp. 1043--1048, 2021.
-
K. Soky, S. Li, M. Mimura, C. Chu, and T. Kawahara.
On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora.
In Proc. APSIPA ASC, pp. 433--437, 2021.
-
D. Wang, S. Ye, X. Hu, S. Li, and X. Xu,
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.
in Proc. INTERSPEECH, pp. 3266--3270, 2021.
-
K. Wang, H. Huang, Y. Hu, Z. Huang, and S. Li,
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
in Proc. INTERSPEECH, pp. 3046--3050, 2021.
-
N. Li, L. Wang, M. Unoki, S. Li, R. Wang, M. Ge, and J. Dang,
Robust voice activity detection using a masked auditory encoder based convolutional neural network.
in Proc. IEEE-ICASSP, pp. 6828--6832, 2021.
-
S. Chen, X. Hu, S. Li, and X. Xu,
An investigation of using hybrid modeling units for improving End-to-End speech recognition systems.
in Proc. IEEE-ICASSP, pp. 6743--6747, 2021.
-
H. Huang, K. Wang, Y. Hu, and S. Li,
Encoder-Decoder based pitch tracking and joint model training for Mandarin tone classification.
in Proc. IEEE-ICASSP, pp. 6943--6947, 2021.
2020 (Tenure-track @NICT)
-
A. Thida, N. Han, S. Oo, S. Li, and C. Ding.
VOIS: The First Speech Therapy App in the World for Myanmar Hearing-Impaired Children.
In Proc. O-COCOSDA, pp. 151--154, 2020.
-
S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, and K. Honda.
Investigation of Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data.
In Proc. ICONIP, pp. 36--47, 2020.
-
Y. Lin, L. Wang, S. Li, J. Dang, and C. Ding.
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
In Proc. INTERSPEECH, pp. 4791--4795, 2020 (Travel Granted by ISCA).
-
H. Shi, L. Wang, S. Li, C. Ding, M. Ge, N. Li, J. Dang, and H. Seki.
Singing Voice Extraction with Attention based Spectrograms Fusion.
In Proc. INTERSPEECH, pp. 2412--2416, 2020 (Travel Granted by ISCA).
-
S. Li, X. Lu, R. Dabre, P. Shen, and H. Kawai.
Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.
In Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), pp. 385--390, 2020.
-
P. Shen, X. Lu, K. Sugiura, S. Li, and H. Kawai.
Compensation on x-vector for short utterance spoken language identification.
In Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), pp. 47--52, 2020.
-
Y. Han, S. Li, Y. Cao, Q. Ma, and M. Yoshikawa.
Voice-Indistinguishability: Protecting Voiceprint in Privacy Preserving Speech Data Release.
In Proc. IEEE-ICME, pp. 1--6, 2020. (Best student paper nomination; selected as a fast-track journal paper for IEEE Trans. Multimedia (TMM))
-
Y. Lin, L. Wang, J. Dang, S. Li, and C. Ding.
End-To-End Articulatory Modeling for Dysarthria Articulatory Attribute Detection.
In Proc. IEEE-ICASSP, pp. 7349--7353, 2020.
-
H. Shi, L. Wang, M. Ge, S. Li, and J. Dang.
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.
In Proc. IEEE-ICASSP, pp. 7544--7548, 2020.
2019 (Post-doc @NICT)
-
K.Soky, S. Li, T.Kawahara, and S.Seng.
Multi-lingual transformer training for Khmer automatic speech recognition.
In Proc. APSIPA ASC, pp. 1893--1896, 2019.
-
L. Pan, S. Li, L. Wang, and J. Dang.
Effective training End-to-End ASR systems for low-resource Lhasa dialect of Tibetan language.
In Proc. APSIPA ASC, pp. 1152--1156, 2019.
-
X. Lu, P. Shen, S. Li, Y. Tsao, and H. Kawai.
Class-wise Centroid Distance Metric Learning for Acoustic Event Detection.
In Proc. INTERSPEECH, pp. 3614--3618, 2019.
-
S. Li, X. Lu, C. Ding, P. Shen, T. Kawahara, and H. Kawai.
Investigating Radical-based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
In Proc. INTERSPEECH, pp. 2200--2204, 2019.
-
S. Li, C. Ding, X. Lu, P. Shen, T. Kawahara, and H. Kawai.
End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition.
In Proc. INTERSPEECH, pp. 2145--2149, 2019.
-
S. Li, R. Dabre, X. Lu, P. Shen, T. Kawahara, and H. Kawai.
Improving Transformer-based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
In Proc. INTERSPEECH, pp. 4400--4404, 2019.
-
P.Shen, X.Lu, S. Li, and H.Kawai.
Interactive learning of teacher-student model for short utterance spoken language identification.
In Proc. IEEE-ICASSP, pp. 5981--5985, 2019.
-
R.Takashima, S. Li, and H.Kawai.
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
In Proc. IEEE-ICASSP, pp. 6156--6160, 2019.
2018 (Post-doc @NICT)
-
S. Li, X.Lu, R.Takashima, P.Shen, T.Kawahara, and H.Kawai.
Improving very deep time-delay neural network with vertical-attention for effectively training CTC-based ASR systems.
In Proc. IEEE Spoken Language Technology Workshop (IEEE-SLT), pp. 77--83, 2018.
-
S. Li, X.Lu, R.Takashima, P.Shen, T.Kawahara, and H.Kawai.
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.
In Proc. INTERSPEECH, pp. 3708--3712, 2018.
-
P.Shen, X.Lu, S. Li, and H.Kawai.
Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification.
In Proc. INTERSPEECH, pp. 1813--1817, 2018.
-
X.Lu, P.Shen, S. Li, Y.Tsao, and H.Kawai.
Temporal Attentive Pooling for Acoustic Event Detection.
In Proc. INTERSPEECH, pp. 1354--1357, 2018.
-
R.Takashima, S. Li, and H.Kawai.
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models.
In Proc. IEEE-ICASSP, pp. 5809--5813, 2018.
-
R.Takashima, S. Li, and H.Kawai.
CTC Loss Function with a Unit-level Ambiguity Penalty.
In Proc. IEEE-ICASSP, pp. 5909--5913, 2018.
2017 (Post-doc @NICT)
-
S. Li, X.Lu, P.Shen, R.Takashima, T.Kawahara, and H.Kawai.
Incremental training and constructing the very deep convolutional residual network acoustic models.
In Proc. IEEE Workshop Automatic Speech Recognition & Understanding (IEEE-ASRU), pp. 222--227, 2017.
-
P. Shen, X. Lu, S. Li, and H. Kawai.
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.
In Proc. INTERSPEECH, pp. 2814--2818, 2017.
-
S. Li, X.Lu, S.Sakai, M.Mimura, and T.Kawahara.
Semi-supervised ensemble DNN acoustic model training.
In Proc. IEEE-ICASSP, pp. 5270--5274, 2017.
before 2017 (Kyoto Univ.)
-
S. Li, X.Lu, S.Mori, Y.Akita, and T.Kawahara.
Confidence Estimation for Speech Recognition Systems using Conditional Random Fields Trained with Partially Annotated Data.
In Proc. Int'l Sympo. Chinese Spoken Language Processing (ISCSLP), pp. 1--5, 2016.
-
S. Li, Y.Akita, and T.Kawahara.
Data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training.
In Proc. IEEE-ICASSP, pp. 5875--5879, 2016.
-
S. Li, Y.Akita, and T.Kawahara.
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts.
In Proc. INTERSPEECH, pp. 3526--3530, 2015. (oral)
-
S. Li, X.Lu, Y.Akita, and T.Kawahara.
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.
In Proc. INTERSPEECH, pp. 2892--2896, 2015.
-
S. Li, Y.Akita, and T.Kawahara.
Corpus and transcription system of Chinese Lecture Room.
In Proc. Int'l Sympo. Chinese Spoken Language Processing (ISCSLP), pp. 442--445, 2014.
before 2013 (Joint Lab. CAS & CUHK)
-
S. Li and L. Wang.
Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data,
In Proc. INTERSPEECH, pp. 903--906, 2012. (Travel granted by IBM research)
-
S. Li, L. Wang, and E. Qi.
The Phoneme-level Articulator Dynamics for Pronunciation Animation,
In Proc. IALP, pp. 283--286, 2011.
-
J. Chen, L. Wang, C. Li, J. Hu, and S. Li.
IELS: A Computer-aided Pronunciation Training System for Undergraduate Students,
In Proc. ICETC, Vol. 1, pp. 338--342, 2010.
Challenges and Demos:
-
X. Chen, G. Li, H. Huang, W. Zhou, S. Li, Y. Cao, Y. Zhao.
VoicePrivacy Challenge: System description.
In VoicePrivacy 2022 Challenge Workshop (INTERSPEECH2022), 2022.
-
Guangxing Li, Wangjin Zhou, S. Li, Yi Zhao, Hao Huang, Jichen Yang.
System Description for the CN-Celeb Speaker Recognition Challenge 2022.
In CNSRC (the CN-Celeb Speaker Recognition Challenge), Speaker Odyssey 2022.
-
S. Li, R. Dabre, R. Rubino, W. Zhou, Z. Yang, C. Chu, Y. Zhao.
The System Description for VoiceMOS Challenge 2022 (KK team, main/ood tasks).
In VoiceMOS Challenge 2022 (INTERSPEECH2022 special session), 2022.
-
D. Wang, S. Ye, X. Hu, S. Li.
The RoyalFlush-NICT System Description for AP21-OLR Challenge (Silk-road team, full tasks).
In OLR2021 (oriental language recognition challenge), 2021.
-
W. Wei, R. Wong, S. Li, Y. Guo and H. Huang.
System description of Alzheimer's disease early detection (Silk-road team, short speech track).
In special session of NCMMSC2021 (Alzheimer's disease detection challenge), 2021.
-
Y. Han, S. Li, Y. Cao, and M. Yoshikawa,
System Description for Voice Privacy Challenge (Kyoto Team).
In special session of INTERSPEECH2020 (VoicePrivacy challenge 2020), 2020.
-
Y. Han, Y. Cao, S. Li, Q. Ma, and M. Yoshikawa.
Voice-Indistinguishability: Protecting Voiceprint with Differential Privacy under an Untrusted Server.
In Proc. ACM Conference on Computer and Communications Security (CCS), demo session, pp. 2125--2127, 2020.
-
H. Zhang, S. Li, X. Ma, Y. Zhao, Y. Cao, and T. Kawahara,
Phantom in the Opera: Effective Adversarial Music Attack on Keyword Spotting Systems.
in Proc. IEEE-SLT, 2021 (demo session, introduction).
Patents:
-
Inventors: Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai; Applicant/Patentee: National Institute of Information and Communications Technology (NICT).
Learning Method. Patent application No. 2017-236626; publication No. 2019-105899; patent No. 6979203.
Filed Dec. 11, 2017; published Jun. 21, 2019; registered Nov. 17, 2021.
-
Inventors: Ryoichi Takashima, Sheng Li, Hisashi Kawai; Applicant/Patentee: National Institute of Information and Communications Technology (NICT).
Learning System, Method, and Neural Network Model for Time-series Information. Patent application No. 2018-044134; publication No. 2019-159654; patent No. 7070894.
Filed Mar. 12, 2018; published Mar. 12, 2018; registered May 10, 2022.
-
Inventors: Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Hisashi Kawai; Applicant/Patentee: National Institute of Information and Communications Technology (NICT).
Speech Recognition System, Speech Recognition Method, and Trained Model. Patent application No. 2018-044491; publication No. 2019-159058; patent No. 7070894.
Filed Mar. 12, 2018; published Mar. 12, 2018; registered Jul. 22, 2022.
Technical Reports:
-
K. Soky, S. Li, C. Chu, and T. Kawahara,
Domain and Language Adaptation of Large-scale Pretrained Model for Speech Recognition of Low-resource Language,
IEICE-SP, 2022.
-
Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, S. Li, Masashi Unoki,
Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.
IEICE Tech. Rep.(SIP), 2022-08. (oral)
-
Kak Soky, S. Li, Masato Mimura, Chenhui Chu and Tatsuya Kawahara,
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
The 55th Kansai Joint Speech Seminar (関西合同音声ゼミ), 2022.
-
Hao Shi, Longbiao Wang, S. Li, Jianwu Dang and Tatsuya Kawahara,
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
The 55th Kansai Joint Speech Seminar (関西合同音声ゼミ), 2022.
-
X. Chen, H. Huang, and S. Li,
Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview.
in Proc. National Conference on Man-Machine Speech Communication (NCMMSC), 2021.
(selected for publication in Applied Sciences, Special Issue on Machine Speech Communication)
-
Y. Han, S. Li, Y. Cao, Q. Ma, and M. Yoshikawa.
Voice-Indistinguishability: Protecting Voiceprint with Differential Privacy under an Untrusted Server,
The 15th Kyoto University ICT Innovation, Feb. 17, 2021.
-
K. Soky, S. Li, M. Mimura, C. Chu, and T. Kawahara,
Comparison of End-to-End Models for Joint Speaker and Speech Recognition,
IEICE-SP, 2021.
-
S. Shimizu, C. Chu, S. Li, and S. Kurohashi,
End-to-End Speech Translation with Cross-lingual Transfer Learning,
NLP, 2021.
-
Y. Han, S. Li, Y. Cao, Q. Ma, and M. Yoshikawa,
Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release,
INTERSPEECH 2020 Satellite Workshop (SLIMTS2020) (invited report).
-
H. Zhang, S. Ueno, M. Mimura, S. Li, W. Zhang, and T. Kawahara,
A Mixture of Character and Word End-to-End System for Keyword Spotting,
INTERSPEECH 2020 Satellite Workshop (SLIMTS2020)(full paper).
-
S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, and K. Honda,
Investigation of Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data and mask embedding,
INTERSPEECH 2020 Satellite Workshop (SLIMTS2020).
-
K. Soky, S. Li, T. Kawahara, and S. Seng,
Multi-lingual transformer training for Khmer automatic speech recognition,
INTERSPEECH 2020 Satellite Workshop (SLIMTS2020).
-
S. Li, C. Ding, X. Lu, P. Shen, and H. Kawai,
End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition,
Acoustical Society of Japan, spring, 2020.
-
S. Li, X. Lu, R. Dabre, P. Shen, and H. Kawai,
Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes,
Acoustical Society of Japan, spring, 2020.
-
P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai,
Improvement of x-vector for short utterance spoken language identification,
Acoustical Society of Japan, spring, 2020.
-
Sheng Li,
Research on End-to-End Speech Recognition Technology,
Information and Communications Fair 2019 (情報通信フェア2019), 2019.
-
P. Shen, X. Lu, S. Li, and H. Kawai,
Investigation of multi-domain training for speech recognition,
Acoustical Society of Japan, spring, 2019.
-
Ryoichi Takashima, Sheng Li, Hisashi Kawai,
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models,
IPSJ SIG-SLP-124-1, 2018.
-
S. Li, X. Lu, R.Takashima, P. Shen, and H. Kawai,
An Empirical Comparison of Sequence Training Methods for the Very Deep Time-delay Neural Network,
Acoustical Society of Japan, autumn, 2018.
-
P. Shen, X. Lu, S. Li, and H. Kawai,
Short utterance-based spoken language identification,
Acoustical Society of Japan, autumn, 2018.
-
S. Li, X. Lu, R.Takashima, P. Shen, and H. Kawai,
Improving CTC-based acoustic model with very deep residual neural network,
Acoustical Society of Japan, spring, 2018.
-
R.Takashima, S. Li, and H. Kawai,
Investigation of Knowledge Distillation Methods for CTC Acoustic Models,
Acoustical Society of Japan, spring, 2018.
-
S. Li, X. Lu, P. Shen, and H. Kawai,
Very deep convolutional residual network acoustic models for Japanese lecture transcription,
Acoustical Society of Japan, autumn, 2017.
-
P. Shen, X. Lu, S. Li, and H. Kawai,
cGAN-classifier: Conditional Generative Adversarial Nets for Classification,
Acoustical Society of Japan, autumn, 2017.
-
S. Li, X. Lu, S. Sakai, and T. Kawahara,
Diversity-driven Semi-supervised Ensemble DNN Acoustic Model Training,
Acoustical Society of Japan, autumn, 2016.
-
S. Li, X. Lu, S. Sakai, and T. Kawahara,
Diversity-driven Semi-supervised Ensemble DNN Acoustic Model Training,
IPSJ IEICE-SP2016-40, 2016. (oral)
-
S. Li, Y. Akita, and T. Kawahara,
Discriminative data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training,
IPSJ SIG-SLP-109-8, 2015. (oral)
-
S. Li, Y. Akita, and T. Kawahara,
Effective combination of multiple ASR hypotheses with CRF-based classifiers,
Acoustical Society of Japan, autumn, 2015.
-
S. Li, Y. Akita, and T. Kawahara,
Incorporating divergences from hypotheses of multiple ASR systems to improve unsupervised acoustic model training,
Acoustical Society of Japan, spring, 2015.
-
S. Li, Y. Akita, and T. Kawahara,
Unsupervised Training of Deep Neural Network Acoustic Models for Lecture Transcriptions,
Acoustical Society of Japan, autumn, 2014.
-
S. Li, Y. Akita, and T. Kawahara,
Classifier-based data selection for lightly-supervised training of acoustic model for lecture transcription,
IPSJ SIG-SLP-102-4, 2014. (oral)
-
S. Li, Y. Akita, and T. Kawahara,
Data Selection Assisted by Caption to Improve Acoustic Modeling for Lecture Transcription,
Acoustical Society of Japan, spring, 2014. (oral)
-
S. Li,
DNN-based Acoustic Modeling and Decoding for Chinese Spontaneous Speech Recognition,
Technical report, 2014.
-
S. Li, M. Mimura, and T. Kawahara,
Automatic Transcription of Chinese Spoken Lectures,
Acoustical Society of Japan, autumn. 2013.
-
S. Li,
Vocal Tract Length Normalization for Chinese Spontaneous Speech Recognition,
Technical report, 2013.
-
S. Li, K. Luo, and L. Wang,
The Phoneme-level Articulator Dynamics for 3D Pronunciation Animation for Chinese,
Bulletin of Advanced Technology Research, Vol. 5, No. 10, Oct. 2011, pp. 5--7.
-
S. Li, and C. Li,
Application of the RFID based audio service in regional navigation system,
Bulletin of Advanced Technology Research, Vol. 3, No. 2, Feb. 2009, pp. 44--47.
Software and Recipes:
-
VoiceTra: Multilingual Speech Translation Application (our team's work; highly recommended to try)
-
Julius decoder ── support Kaldi DNN-HMM acoustic model
└──── support Kaldi feature extractor
└──── support WFST graph optimization
└──── support CRF-based confidence estimation
└──── support EESEN CTC acoustic model
-
Very deep residual time-delay neural network (TDNN) with LF-MMI objective, implemented with MS-CNTK
-
Online speech recognition module for the humanoid robot Erica
Data Releases:
-
S.Li, Y.Akita, and T.Kawahara.
Corpus and transcription system of Chinese Lecture Room.
In Proc. Int'l Sympo. Chinese Spoken Language Processing (ISCSLP), pp.442--445, 2014.
(Copyright belongs to Kawahara Lab., Kyoto Univ.)
-
K. Soky, M. Mimura, T. Kawahara, S. Li, C. Ding, C. Chu and S. Sam.
Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).
In Proc. O-COCOSDA, 2021.
(Copyright belongs to Kawahara Lab., Kyoto Univ., NICT, and CADT)
-
Kak Soky, Zhuo Gong, S. Li,
NICT-Tib1: A Public Speech Corpus of Lhasa Dialect for Benchmarking Tibetan Language Speech Recognition Systems.
in Proc. O-COCOSDA, 2022. (Copyright belongs to NICT)
-
S. Li, J. Li, Q. Liu, Z. Gong,
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.
in Proc. LREC (Language Resources and Evaluation Conference), 2022. (Copyright belongs to NICT)
Academic Services:
Academic Membership:
IEEE/IEEE-SPS (Signal Processing Society),
ISCA (International Speech Communication Association),
ASJ (Acoustical Society of Japan),
SIG-CSLP (Chinese Spoken Language Processing),
APSIPA (Asia Pacific Signal and Information Processing Association)
Chairing and organizing:
[1] INTERSPEECH2020: Session Chair for the session "Topics in ASR I"
[2] Co-organizing INTERSPEECH2020 SLIMTS (Spoken Language Interaction for Mobile Transportation System) workshop
[3] Speaker Odyssey2022: Session Chair for Evaluation and Benchmarking (EB)
[4] Co-organizing COLING2022 workshop
Reviewer/Program committee:
Journal:
[1] IEEE/ACM Trans. Audio, Speech & Language Process.
[2] Computer Speech and Language
[3] Speech Communication
[4] IEICE Transactions and Letters
[5] APSIPA Transactions
[6] Applied Acoustics
[7] Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
[8] Digital Signal Processing
[9] Behaviour & Information Technology
Conferences:
[1] ICASSP-2021/2022/2023, INTERSPEECH-2015/2018/2019/2020/2021/2022, SLT-2022
[2] APSIPA-2019/2020/2021/2022
[3] BC_VCC-2020 (Blizzard Challenge and Voice Conversion Challenge 2020)
[4] ACL-2017/2018/2020, EACL-2020/2022, NAACL-HLT-2016/2018/2019/2021
[5] IJCNLP-2017, EMNLP-IJCNLP-2019, EMNLP-2020/2021/2022, AACL-IJCNLP-2020/2022, COLING-2018/2022
[6] NLP-2022
[7] AAAI-2019, ICLR-2021, NeurIPS-2022
[8] IROS-2019, Ubiquitous Robots (UR)-2020
[9] ICME-2020/2021/2022, ACM Multimedia 2021/2022, ACM Multimedia Asia 2021/2022
Alumni:
Nanjing University
NJUer at Kansai
Info. OB of Kyoto University
Lilybbs
Last update: 2022-10-15