NICT 2023 Top 2 in one track of the ICASSP2024 ICMC-ASR (In-Car Multi-Channel Automatic Speech Recognition) Challenge
2023 1st place in one track of the ASRU2023 special session: VoiceMOS Challenge
2022 1st place in 6 of 16 metrics across the Main/OOD tracks of the INTERSPEECH2022 special session: VoiceMOS Challenge
2021 3rd/4th place in the constrained/unconstrained resource multilingual ASR tracks of the OLR2021 challenge
2021 NICT FY2021 Commendation: Excellence Award for Outstanding Achievement (group)
Kyoto University 2018 The 34th Telecommunications Advancement Foundation Telecom System Technology Student Award
2016 Paper selected as cover of IEEE/ACM Trans. Audio, Speech and Language Processing
2012-2016 Full exemption from Kyoto University admission and tuition fees
2012 Japanese Government (MEXT) Scholarship; admitted to Kyoto University under the special placement for university-recommended government-sponsored international students
Joint Lab CAS and CUHK 2012 IBM travel grant for the INTERSPEECH conference, Portland, USA
2011 Creative Planning Award, Hong Kong Young Entrepreneurs Program
2011 Outstanding Staff Award
Nanjing University and before 2004 Nanjing University Lizhi (Encouragement) Scholarship
2002 Hong Kong 陳蔭川 Foundation scholarship for outstanding university freshmen
2002 Second Prize, Jiangsu Province (China) Chemistry Olympiad (wife won the corresponding prize in the Physics Olympiad); Third Prize, Biology Olympiad
Teaching / Supervision
2023 Supervised Ph.D. student at Kyoto Univ. (Wangjin Zhou), who got 1st place in one track of the ASRU2023 special session: VoiceMOS Challenge
2023 Supervised Ph.D. student at Kyoto Univ. (Qianying Liu), who received an IEEE-SPS grant for an IEEE-ICASSP2023 oral presentation
2022 Supervised Ph.D. student at Univ. of Tokyo (Zhuo Gong), who successfully graduated; I supervised all of his papers
2022 Supervised Master's students at Kyoto Univ. (Wangjin Zhou and Zhengdong Yang), who got co-1st place in the VoiceMOS Challenge 2022
2021 Supervised Ph.D. student at Kyoto Univ. (Soky Kak), who received a best student paper nomination at O-COCOSDA2021
2020 Supervised Master's student at Kyoto Univ. (Yaowei Han), who received a best student paper nomination at IEEE-ICME2020
Grant-in-Aid for Scientific Research (B) (Co-I): 2023-2028 (ongoing)
Creation of fundamental technologies for speech dialogue translation that accurately conveys the speaker's intent
Grant-in-Aid for Scientific Research (C) (PI): 2023-2026 (ongoing)
M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition
Grant-in-Aid for Research Activity Start-up (PI): 2019-2021
Next generation multilingual End-to-End speech recognition (from G30 to G200)
Sheng Li (李 勝).
Speech Recognition Enhanced by Lightly-supervised and Semi-supervised Acoustic Model Training.
Ph.D. Thesis, Kyoto University, 2016.
Dissertation (Japanese title): Speech recognition by lightly-supervised and semi-supervised training of acoustic models; advisor: Prof. Tatsuya Kawahara, Feb. 2016.
Books
S. Li, Bridging Eurasia: Multilingual Speech Recognition for Silkroad, ISBN: 978-4-904020-29-6, 2023.
S. Li, Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language, ISBN: 978-4-904020-28-9, 2022.
S. Li, Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems, ISBN: 978-4-904020-26-5, 2022.
X. Lu, S. Li, M. Fujimoto, Speech-to-Speech Translation, pp. 21-38, Springer Singapore, 2020.
S. Li, Chapter: From Shallow to Deep and Very Deep.
S. Li, Chapter: End-to-End and CTC models.
Invited Talks
Phoneme-level articulatory animation in pronunciation training using EMA data, 2012, Speech Synthesis Lab., Tsinghua University, host: Prof. Zhiyong Wu.
Lightly-supervised training and confidence estimation using CRF classifiers, 2014, Speech and Cognition Lab., Tianjin University, host: Prof. Jianwu Dang and Prof. Kiyoshi Honda.
End-to-End Speech Recognition, 2019, University of Tokyo.
Towards Security-aware Speech Recognition System, 2023, NECTEC-NICT joint seminar, Thailand.
Self-Supervised Learning MOS Prediction with Listener Enhancement, 2023, VoiceMOS mini workshop, NII, Tokyo.
Journal Papers (Peer-reviewed)
S. Li, J. Li, C. Chu.
Voices of the Himalayas: Benchmarking Speech Recognition Systems for the Tibetan Language.
International Journal of Asian Language Processing, Vol. 34, No. 1, Art. 2450001, 2024.
S. Li, J. Li and Y. Cao,
Phantom in the Opera: Adversarial Music Attack for Robot Dialogue System.
Frontiers in Computer Science, section Human-Media Interaction, Vol. 6, 2024. (invited paper)
Z. Yang, S. Shimizu, C. Chu, S. Li, S. Kurohashi,
End-to-end Japanese-English Speech-to-text Translation with Spoken-to-Written Style Conversion.
Journal of Natural Language Processing, Vol. 31, No. 3, 2024.
N. Li, L. Wang, M. Ge, M. Unoki, S. Li and J. Dang,
Robust Voice Activity Detection Using an Auditory-Inspired Masked Modulation Encoder Based Convolutional Attention Network.
Speech Communication (SPEECH COMMUN), Vol. 157, Art. 103024, 2024.
Y. Lin, L. Wang, J. Dang, S. Li and C. Ding,
Disordered Speech Recognition Considering Low Resources and Abnormal Articulation.
Speech Communication (SPEECH COMMUN), Vol. 155, Art. 103002, 2023.
K. Soky, S. Li, C. Chu, T. Kawahara.
Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings.
International Journal of Asian Language Processing, Vol. 33, No. 4, Art. 2350024, 2023.
K. Soky, M. Mimura, C. Chu, T. Kawahara, S. Li, C. Ding, S. Sam.
TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies.
International Journal of Asian Language Processing, Vol. 31, No. 3&4, Art. 2250007, 2022. (invited paper for the best student paper nominee at O-COCOSDA2021)
C. Fan, H. Zhang, J. Yi, Z. Lv, J. Tao, T. Li, G. Pei, X. Wu, S. Li.
SpecMNet: Spectrum Mend Network for Monaural Speech Enhancement.
Applied Acoustics, Vol. 194, Art. 108792, 2022.
S. Shimizu, C. Chu, S. Li, S. Kurohashi.
Cross-Lingual Transfer Learning for End-to-End Speech Translation.
Journal of Natural Language Processing (JNLP), Vol. 29, No. 2, 2022.
S. Qin, L. Wang, S. Li (corresponding), J. Dang and L. Pan.
Improving Low-resource Tibetan End-to-end ASR by Multilingual and Multi-level Unit Modeling.
EURASIP Journal on Audio, Speech and Music Processing (EURASIP JASMP), No. 2, 2022.
X. Chen, H. Huang, and S. Li,
Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview.
Applied Sciences, Special Issue on Machine Speech Communication, Vol. 11, No. 18, Art. 8450, 2021. (Peer-reviewed, invited survey paper)
P. Shen, X. Lu, S. Li (patent co-inventor), H. Kawai.
Knowledge Distillation-based Representation Learning for Short-Utterance Spoken Language Identification.
IEEE Trans. Audio, Speech \& Language Process. (TASLP), Vol. 28, pp. 2674--2683, 2020.
S. Li, Y. Akita, and T. Kawahara.
Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems' hypotheses.
IEEE Trans. Audio, Speech \& Language Process. (TASLP), Vol.24, No.9, pp.1520--1530, 2016.
(Cover Paper, IEEE Signal Processing Society Japan Student Journal Paper Award)
S. Li, Y. Akita, and T. Kawahara.
Automatic lecture transcription based on discriminative data selection for lightly supervised acoustic model training.
IEICE Trans., Vol.E98-D, No.8, pp.1545--1552, 2015.
L. Wang, H. Chen, S. Li, and H. Meng.
Phoneme-level articulatory animation in pronunciation training,
Speech Communication (SPEECH COMMUN), Vol. 54, No. 7, pp. 845--856, 2012.
International Conference Papers (Peer-reviewed)
2024@NICT
J. Chen, C. Chu, S. Li, T. Kawahara,
Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition,
in Proc. APSIPA ASC (accepted for presentation), 2024.
S. Li, Y. Ko, A. Ito,
LLM as decoder: Investigating Lattice-based Speech Recognition Hypotheses Rescoring Using LLM,
in Proc. APSIPA ASC (accepted for presentation), 2024.
C. Kwok, S. Li, J. Yip, E. Chng,
Low-resource Language Adaptation with Ensemble of PEFT Approaches,
in Proc. APSIPA ASC (accepted for presentation), 2024.
C. Tan, S. Li, Y. Cao, Z. Ren, T. Schultz,
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition,
in Proc. ACM Multimedia Asia (accepted for presentation), 2024.
S. Li, C. Chen, C. Kwok, C. Chu, E. Chng, H. Kawai,
Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses.
in Proc. INTERSPEECH, pp. 1315--1319, 2024.
S. Li, J. Li, Y. Cao,
Automatic Post-Editing of Speech Recognition System Output Using Large Language Models.
in Proc. International Conference on Database Systems for Advanced Applications (DASFAA) Workshop, pp. ***, 2024.
Y. Wu, Y. Nakashima, N. Garcia, S. Li, Z. Zeng,
Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis.
in Proc. ACM ICMR (reproducibility paper), pp. 1228--1231, 2024.
L. Zheng, Y. Cao, R. Jiang, K. Taura, Y. Shen, S. Li, M. Yoshikawa,
Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks.
in Proc. International Conference on Database Systems for Advanced Applications (DASFAA), pp. 457--473, 2024.
Y. Zhao, C. Qiang, H. Li, Y. Hu, W. Zhou, S. Li,
Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.
in Proc. IEEE-ICASSP, pp. 8341--8345, 2024.
W. Zhou, Z. Yang, C. Chu, S. Li, R. Dabre, Y. Zhao, T. Kawahara,
MOS-FAD: Improving fake audio detection via automatic mean opinion score prediction,
in Proc. IEEE-ICASSP, pp. 876--880, 2024.
2023@NICT
W. Zhou, Z. Yang, S. Li, C. Chu,
KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis.
in Proc. ACM Multimedia Asia Workshop, pp. 7:1-7:3, 2023.
X. Chen, S. Li, J. Li, H. Huang, Y. Cao, L. He,
Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization.
in Proc. ACM Multimedia Asia, pp. 93:1-93:5, 2023.
X. Chen, S. Li, J. Li, Y. Cao, H. Huang, L. He,
GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System.
in Proc. ACM Multimedia Asia, pp. 94:1-94:5, 2023.
W. Wei, Z. Yang, G. Yuan, J. Li, C. Chu, S. Okada, S. Li (corresponding),
FedCPC: an Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer’s Speech Detection.
in Proc. IEEE Workshop Automatic Speech Recognition \& Understanding (IEEE-ASRU), pp. 1-6, 2023.
Z. Qi, X. Hu, W. Zhou, S. Li, H. Wu, J. Lu, X. Xu,
LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement.
in Proc. IEEE Workshop Automatic Speech Recognition \& Understanding (IEEE-ASRU), pp. 1-6, 2023.
S. Li, J. Li,
Correction while Recognition: Combining Pretrained Language Model for Taiwan-accented Speech Recognition.
in Proc. International Conference on Artificial Neural Networks (ICANN), pp. 389--400, 2023.
Z. Yang, S. Shimizu, W. Zhou, S. Li, C. Chu,
The Kyoto Speech-to-Speech Translation System for IWSLT 2023.
in Proc. International Conference on Spoken Language Translation (IWSLT), pp. 357--362, 2023.
L. Yang, J. Li, S. Li, T. Shinozaki,
Dialogue State Tracking with Sparse Local Slot Attention.
in Proc. ACL, Workshop on NLP for Conversational AI, pp. 39--46, 2023.
L. Yang, J. Li, S. Li, T. Shinozaki,
Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention.
in Proc. ACL, (findings), pp. 4928--4938, 2023.
S. Shimizu, C. Chu, S. Li, S. Kurohashi,
Towards Speech Dialogue Translation Mediating Speakers of Different Languages.
in Proc. ACL, (findings), pp. 1122--1134, 2023.
C. Tan, Y. Cao, S. Li, M. Yoshikawa,
General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.
in Proc. IEEE-ICASSP, pp. 1-5, 2023.
K. Wang, Y. Yang, H. Huang, Y. Hu, S. Li,
SpeakerAugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.
in Proc. IEEE-ICASSP, pp. 1-5, 2023.
Y. Yang, H. Xu, H. Huang, E.S. Chng, S. Li,
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
in Proc. IEEE-ICASSP, pp. 1-5, 2023.
K. Soky, S. Li, C. Chu, T. Kawahara,
Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.
in Proc. IEEE-ICASSP, pp. 1-5, 2023.
Q. Liu, Z. Gong, Z. Yang, Y. Yang, S. Li, C. Ding, N. Minematsu, H. Huang, F. Cheng, C. Chu, S. Kurohashi,
Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.
in Proc. IEEE-ICASSP, pp. 1-5, 2023. (Travel Granted by IEEE-SPS)
2022@NICT
L. Yang, J. Li, S. Li and T. Shinozaki,
Multi-Domain Dialogue State Tracking with Top-k Slot Self Attention.
in Proc. SIGdial Meeting Discourse \& Dialogue, pp. 231--236, 2022.
K. Soky, S. Li, M. Mimura, C. Chu and T. Kawahara,
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
in Proc. INTERSPEECH, pp. 1362--1366, 2022.
L. Yang, W. Wei, S. Li, J. Li and T. Shinozaki,
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
in Proc. INTERSPEECH, pp. 541--545, 2022.
K. Li, S. Li, X. Lu, M. Akagi, M. Liu, L. Zhang, C. Zeng, L. Wang, J. Dang and M. Unoki,
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
in Proc. INTERSPEECH, pp. 664--668, 2022.
Z. Yang, W. Zhou, C. Chu, S. Li, R. Dabre, R. Rubino and Y. Zhao,
Fusion of Self-supervised Learned Models for MOS Prediction.
in Proc. INTERSPEECH, pp. 5443--5447, 2022.
S. Qin, L. Wang, S. Li, Y. Lin and J. Dang,
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
in Proc. INTERSPEECH, pp. 2133--2137, 2022.
H. Shi, L. Wang, S. Li, J. Dang and T. Kawahara,
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
in Proc. INTERSPEECH, pp. 221--225, 2022.
N. Li, M. Ge, L. Wang, M. Unoki, S. Li and J. Dang,
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
in Proc. INTERSPEECH, pp. 361--365, 2022.
Z. Gong, D. Saito, L. Yang, T. Shinozaki, S. Li, H. Kawai and N. Minematsu,
Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.
In Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), pp. 415--420, 2022.
S. Li, J. Li, Q. Liu, Z. Gong,
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.
in Proc. LREC (Language Resources and Evaluation Conference), pp. 7291--7297, 2022.
Y. Lv, L. Wang, M. Ge, S. Li (corresponding), C. Ding, L. Pan, Y. Wang, J. Dang, K. Honda,
Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation.
in Proc. IEEE-ICASSP, pp. 7992--7996, 2022.
K. Wang, Y. Peng, H. Huang, Y. Hu, and S. Li,
Mining Hard Samples Locally and Globally for Improved Speech Separation.
in Proc. IEEE-ICASSP, pp. 6037--6041, 2022.
2021@NICT
K. Soky, M. Mimura, T. Kawahara, S. Li, C. Ding, C. Chu, and S. Sam.
Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).
In Proc. O-COCOSDA, pp. 122--127, 2021. (Best student paper nomination; invited as fast-track journal paper)
D. Wang, S. Ye, X. Hu, S. Li, and X. Xu,
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.
in Proc. INTERSPEECH, pp. 3266--3270, 2021.
K. Wang, H. Huang, Y. Hu, Z. Huang, and S. Li,
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
in Proc. INTERSPEECH, pp. 3046--3050, 2021.
N. Li, L. Wang, M. Unoki, S. Li, R. Wang, M. Ge, and J. Dang,
Robust voice activity detection using a masked auditory encoder based convolutional neural network.
in Proc. IEEE-ICASSP, pp. 6828--6832, 2021.
S. Chen, X. Hu, S. Li, and X. Xu,
An investigation of using hybrid modeling units for improving End-to-End speech recognition systems.
in Proc. IEEE-ICASSP, pp. 6743--6747, 2021.
H. Huang, K. Wang, Y. Hu, and S. Li,
Encoder-Decoder based pitch tracking and joint model training for Mandarin tone classification.
in Proc. IEEE-ICASSP, pp. 6943--6947, 2021.
2020@NICT
Y. Lin, L. Wang, S. Li, J. Dang, and C. Ding.
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
In Proc. INTERSPEECH, pp. 4791--4795, 2020 (Travel Granted by ISCA).
H. Shi, L. Wang, S. Li, C. Ding, M. Ge, N. Li, J. Dang, and H. Seki.
Singing Voice Extraction with Attention based Spectrograms Fusion.
In Proc. INTERSPEECH, pp. 2412--2416, 2020 (Travel Granted by ISCA).
S. Li, X. Lu, R. Dabre, P. Shen, and H. Kawai.
Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.
In Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), pp. 385--390, 2020.
P. Shen, X. Lu, K. Sugiura, S. Li, and H. Kawai.
Compensation on x-vector for short utterance spoken language identification.
In Proc. ISCA-Odyssey (The Speaker and Language Recognition Workshop), pp. 47--52, 2020.
Y. Han, S. Li, Y. Cao, Q. Ma, and M. Yoshikawa.
Voice-Indistinguishability: Protecting Voiceprint in Privacy Preserving Speech Data Release.
In Proc. IEEE-ICME, pp. 1--6, 2020. (Best student paper nomination; selected as fast-track journal paper for IEEE Trans. Multimedia (TMM))
Y. Lin, L. Wang, J. Dang, S. Li, and C. Ding.
End-To-End Articulatory Modeling for Dysarthria Articulatory Attribute Detection.
In Proc. IEEE-ICASSP, pp. 7349--7353, 2020.
H. Shi, L. Wang, M. Ge, S. Li, and J. Dang.
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.
In Proc. IEEE-ICASSP, pp. 7544--7548, 2020.
2019@NICT
X. Lu, P. Shen, S. Li, Y. Tsao, and H. Kawai.
Class-wise Centroid Distance Metric Learning for Acoustic Event Detection.
In Proc. INTERSPEECH, pp. 3614--3618, 2019.
S. Li, X. Lu, C. Ding, P. Shen, T. Kawahara, and H. Kawai.
Investigating Radical-based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
In Proc. INTERSPEECH, pp. 2200--2204, 2019.
S. Li, C. Ding, X. Lu, P. Shen, T. Kawahara, and H. Kawai.
End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition.
In Proc. INTERSPEECH, pp. 2145--2149, 2019.
S. Li, R. Dabre, X. Lu, P. Shen, T. Kawahara, and H. Kawai.
Improving Transformer-based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
In Proc. INTERSPEECH, pp. 4400--4404, 2019.
P. Shen, X. Lu, S. Li, and H. Kawai.
Interactive learning of teacher-student model for short utterance spoken language identification.
In Proc. IEEE-ICASSP, pp. 5981--5985, 2019.
R. Takashima, S. Li, and H. Kawai.
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
In Proc. IEEE-ICASSP, pp. 6156--6160, 2019.
2018@NICT
S. Li, X. Lu, R. Takashima, P. Shen, T. Kawahara, and H. Kawai.
Improving very deep time-delay neural network with vertical-attention for effectively training CTC-based ASR systems.
In Proc. IEEE Spoken Language Technology Workshop (IEEE-SLT), pp. 77--83, 2018.
S. Li, X. Lu, R. Takashima, P. Shen, T. Kawahara, and H. Kawai.
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.
In Proc. INTERSPEECH, pp. 3708--3712, 2018.
P. Shen, X. Lu, S. Li, and H. Kawai.
Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification.
In Proc. INTERSPEECH, pp. 1813--1817, 2018.
X. Lu, P. Shen, S. Li, Y. Tsao, and H. Kawai.
Temporal Attentive Pooling for Acoustic Event Detection.
In Proc. INTERSPEECH, pp. 1354--1357, 2018.
R. Takashima, S. Li, and H. Kawai.
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models.
In Proc. IEEE-ICASSP, pp. 5809--5813, 2018.
R. Takashima, S. Li, and H. Kawai.
CTC Loss Function with a Unit-level Ambiguity Penalty.
In Proc. IEEE-ICASSP, pp. 5909--5913, 2018.
2017@NICT
S. Li, X. Lu, P. Shen, R. Takashima, T. Kawahara, and H. Kawai.
Incremental training and constructing the very deep convolutional residual network acoustic models.
In Proc. IEEE Workshop Automatic Speech Recognition \& Understanding (IEEE-ASRU), pp. 222--227, 2017.
P. Shen, X. Lu, S. Li, and H. Kawai.
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.
In Proc. INTERSPEECH, pp. 2814--2818, 2017.
before 2017 (Kyoto Univ.)
S. Li, X. Lu, S. Sakai, M. Mimura, and T. Kawahara.
Semi-supervised ensemble DNN acoustic model training.
In Proc. IEEE-ICASSP, pp. 5270--5274, 2017.
S. Li, Y. Akita, and T. Kawahara.
Data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training.
In Proc. IEEE-ICASSP, pp. 5875--5879, 2016.
S. Li, Y. Akita, and T. Kawahara.
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts.
In Proc. INTERSPEECH, pp. 3526--3530, 2015. (oral)
S. Li, X. Lu, Y. Akita, and T. Kawahara.
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.
In Proc. INTERSPEECH, pp. 2892--2896, 2015.
before 2013 (Joint Lab. CAS & CUHK)
S. Li and L. Wang.
Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data,
In Proc. INTERSPEECH, pp. 903--906, 2012. (Travel granted by IBM research)
Challenges / Demos (selected):
ICASSP2024 ICMC-ASR (In-Car Multi-Channel Automatic Speech Recognition) Challenge. (top 2 in one track)
The System Description for VoiceMOS Challenge 2023. (top 1 in one track)
S. Li, R. Dabre, R. Raphael, W. Zhou, Z. Yang, C. Chu, Y. Zhao.
The System Description for VoiceMOS Challenge 2022 (KK team, main/ood tasks). (1st place in 6 metrics)
D. Wang, S. Ye, X. Hu, S. Li.
The RoyalFlush-NICT System Description for AP21-OLR Challenge (Silk-road team, full tasks).
In OLR2021 (Oriental Language Recognition challenge), 2021. (top 3)
Y. Han, Y. Cao, S. Li, Q. Ma, and M. Yoshikawa.
Voice-Indistinguishability: Protecting Voiceprint with Differential Privacy under an Untrusted Server.
ACM conference on Computer and Communications Security (CCS), demo, pp. 2125--2127, 2020.
H. Zhang, S. Li, X. Ma, Y. Zhao, Y. Cao, and T. Kawahara,
Phantom in the Opera: Effective Adversarial Music Attack on Keyword Spotting Systems.
in Proc. IEEE-SLT, 2021 (demo session, introduction).
Memberships
IEEE/IEEE-SPS (Signal Processing Society),
ISCA (International Speech Communication Association),
ASJ (Acoustical Society of Japan),
SIG-CSLP (Chinese Spoken Language Processing),
APSIPA (Asia Pacific Signal and Information Processing Association),
ACM (Association for Computing Machinery),
APNNS (Asia Pacific Neural Network Society)
Conference Organization
[1] Session Chair of INTERSPEECH2020 session: Topics of ASR I
[2] Co-organizing INTERSPEECH2020 workshop: Spoken Language Interaction for Mobile Transportation System (SLIMTS)
[3] Session Chair of Speaker Odyssey2022 session: Evaluation and Benchmarking (EB)
[4] Co-organizing Coling2022 workshop: when creative ai meets conversational ai (cai + cai = cai^2)
[5] Co-organizing ACM Multimedia Asia 2023 workshop: M3Oriental (https://sites.google.com/view/m3oriental)
[6] Area Chair of APSIPA 2023
[7] Area Chair of EMNLP 2023
[8] Session Chair of ICANN 2023
[9] APSIPA Speech, Language, and Audio (SLA) Technical Committee (till 2026)
[10] Session Chair of ICASSP 2024
[11] Publicity Chair of ACM Multimedia Asia 2024
[12] Session Chair of DASFAA 2024
Reviewer / Committee Service: Journals
[1] IEEE/ACM Trans. Audio, Speech \& Language Process.
[2] Computer Speech and Language
[3] Speech Communication
[4] IEICE Transactions and Letters
[5] APSIPA Transactions
[6] Applied Acoustics
[7] Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
[8] Digital Signal Processing
[9] Behaviour & Information Technology
[10] EURASIP Journal on Audio, Speech, and Music Processing