An analog-AI chip for energy-efficient speech recognition and transcription.


Journal

Nature
ISSN: 1476-4687
Abbreviated title: Nature
Country: England
NLM ID: 0410462

Publication information

Publication date:
Aug 2023
History:
received: 13 12 2022
accepted: 16 06 2023
medline: 25 8 2023
pubmed: 24 8 2023
entrez: 23 8 2023
Status: ppublish

Abstract

Models of artificial intelligence (AI) that have billions of parameters can achieve high accuracy across a range of tasks

Identifiers

pubmed: 37612392
doi: 10.1038/s41586-023-06337-5
pii: 10.1038/s41586-023-06337-5
pmc: PMC10447234

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

768-775

Copyright information

© 2023. The Author(s).

References

Vaswani, A. et al. Attention is all you need. In NIPS17: Proc. 31st Conference on Neural Information Processing Systems (eds. von Luxburg, U. et al.) 6000–6010 (Curran Associates, 2017).
Chan, W. et al. SpeechStew: simply mix all available speech recognition data to train one large neural network. Preprint at https://arxiv.org/abs/2104.02133 (2021).
Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60–67 (2018).
doi: 10.1038/s41586-018-0180-5 pubmed: 29875487
Narayanan, P. et al. Fully on-chip MAC at 14 nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format. IEEE Trans. Electron. Devices 68, 6629–6636 (2021).
Khaddam-Aljameh, R. et al. HERMES-core—a 1.59-TOPS/mm
doi: 10.1109/JSSC.2022.3140414
Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
doi: 10.1038/s41586-020-1942-4 pubmed: 31996818
Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
doi: 10.1038/s41586-022-04992-8 pubmed: 35978128 pmcid: 9385482
Better Machine Learning for Everyone. ML Commons https://mlcommons.org (2023).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
doi: 10.1038/nature14539 pubmed: 26017442
Dahl, G. E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42 (2011).
doi: 10.1109/TASL.2011.2134090
Graves, A., Fernández, S., Gomez, F. & Schmidhuber, J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML ’06: Proc. 23rd International Conference on Machine Learning (eds Cohen, W. & Moore, A.) 369–376 (ACM, 2006).
Graves, A. Sequence transduction with recurrent neural networks. Preprint at https://arxiv.org/abs/1211.3711 (2012).
Graves, A., Mohamed, A.-R. & Hinton, G. Speech recognition with deep recurrent neural networks. In Proc. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing 6645–6649 (IEEE, 2013).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).
Hsu, W.-N. et al. HuBERT: self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Trans. Audio Speech Lang. Process. 29, 3451–3460 (2021).
doi: 10.1109/TASLP.2021.3122291
Gulati, A. et al. Conformer: convolution-augmented transformer for speech recognition. Preprint at https://arxiv.org/abs/2005.08100 (2020).
Panayotov, V., Chen, G., Povey, D. & Khudanpur, S. Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5206–5210 (IEEE, 2015).
Godfrey, J., Holliman, E. & McDaniel, J. SWITCHBOARD: telephone speech corpus for research and development. In ICASSP-92: Proc. International Conference on Acoustics, Speech and Signal Processing 517–520 (IEEE, 1992).
Gholami, A., Yao, Z., Kim, S., Mahoney, M. W. & Keutzer, K. AI and memory wall. RiseLab Medium https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8 (2021).
Jain, S. et al. A heterogeneous and programmable compute-in-memory accelerator architecture for analog-AI using dense 2-D mesh. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 31, 114–127 (2023).
Chen, G., Parada, C. & Heigold, G. Small-footprint keyword spotting using deep neural networks. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4087–4091 (2014).
Zhang, Y., Suda, N., Lai, L. & Chandra, V. Hello edge: keyword spotting on microcontrollers. Preprint at https://arxiv.org/abs/1711.07128 (2018).
Gokmen, T., Rasch, M. J. & Haensch, W. The marriage of training and inference for scaled deep learning analog hardware. In 2019 IEEE International Electron Devices Meeting (IEDM) 22.3.1–22.3.4 (2019).
Spoon, K. et al. Toward software-equivalent accuracy on transformer-based deep neural networks with analog memory devices. Front. Comput. Neurosci. 15, 675741 (2021).
doi: 10.3389/fncom.2021.675741 pubmed: 34290595 pmcid: 8287521
Kariyappa, S. et al. Noise-resilient DNN: tolerating noise in PCM-based AI accelerators via noise-aware training. IEEE Trans. Electron Devices 68, 4356–4362 (2021).
doi: 10.1109/TED.2021.3089987
Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).
doi: 10.1038/s41467-020-16108-9 pubmed: 32424184 pmcid: 7235046
Macoskey, J., Strimel, G. P., Su, J. & Rastrow, A. Amortized neural networks for low-latency speech recognition. Preprint at https://arxiv.org/abs/2108.01553 (2021).
Fasoli, A. et al. Accelerating inference and language model fusion of recurrent neural network transducers via end-to-end 4-bit quantization. In Proc. Interspeech 2022 2038–2042 (2022).
Ding, S. et al. 4-bit conformer with native quantization aware training for speech recognition. Proc. Interspeech 2022 1711–1715 (2022).
Sun, X. et al. Ultra-low precision 4-bit training of deep neural networks. Adv. Neural Inf. Process. Syst. 33, 1796–1807 (2020).
Lavizzari, S., Ielmini, D., Sharma, D. & Lacaita, A. L. Reliability impact of chalcogenide-structure relaxation in phase-change memory (PCM) cells—part II: physics-based modeling. IEEE Trans. Electron Devices 56, 1078–1085 (2009).
doi: 10.1109/TED.2009.2016398
Biswas, A. & Chandrakasan, A. P. Conv-RAM: an energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications. In Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC) 488–490 (IEEE, 2018).
Chang, H.-Y. et al. AI hardware acceleration with analog memory: microarchitectures for low energy at high speed. IBM J. Res. Dev. 63, 8:1–8:14 (2019).
doi: 10.1147/JRD.2019.2934050
Jiang, H., Li, W., Huang, S. & Yu, S. A 40nm analog-input ADC-free compute-in-memory RRAM macro with pulse-width modulation between sub-arrays. In 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 266–267 (IEEE, 2022).
Jia, H. et al. A programmable neural-network inference accelerator based on scalable in-memory computing. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 236–238 (IEEE, 2021).
Dong, Q. et al. A 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications. In 2020 IEEE International Solid-State Circuits Conference (ISSCC) 242–244 (IEEE, 2020).
Chih, Y.-D. et al. An 89TOPS/W and 16.3TOPS/mm
Su, J.-W. et al. A 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 250–252 (IEEE, 2021).
Yoon, J.-H. et al. A 40nm 64Kb 56.67TOPS/W read-disturb-tolerant compute-in-memory/digital RRAM macro with active-feedback-based read and in-situ write verification. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 404–406 (IEEE, 2021).
Xue, C.-X. et al. A 22nm 4Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7TOPS/w for tiny AI edge devices. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 245–247 (IEEE, 2021).
Marinella, M. J. et al. Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator. IEEE J. Emerg. Select. Topics Circuits Syst. 8, 86–101 (2018).
doi: 10.1109/JETCAS.2018.2796379

Authors

S Ambrogio (S)

IBM Research - Almaden, San Jose, CA, USA. stefano.ambrogio@ibm.com.

P Narayanan (P)

IBM Research - Almaden, San Jose, CA, USA.

A Okazaki (A)

IBM Research - Tokyo, Kawasaki, Japan.

A Fasoli (A)

IBM Research - Almaden, San Jose, CA, USA.

C Mackin (C)

IBM Research - Almaden, San Jose, CA, USA.

K Hosokawa (K)

IBM Research - Tokyo, Kawasaki, Japan.

A Nomura (A)

IBM Research - Tokyo, Kawasaki, Japan.

T Yasuda (T)

IBM Research - Tokyo, Kawasaki, Japan.

A Chen (A)

IBM Research - Almaden, San Jose, CA, USA.

A Friz (A)

IBM Research - Almaden, San Jose, CA, USA.

M Ishii (M)

IBM Research - Tokyo, Kawasaki, Japan.

J Luquin (J)

IBM Research - Almaden, San Jose, CA, USA.

Y Kohda (Y)

IBM Research - Tokyo, Kawasaki, Japan.

N Saulnier (N)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

K Brew (K)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

S Choi (S)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

I Ok (I)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

T Philip (T)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

V Chan (V)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

C Silvestre (C)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

I Ahsan (I)

IBM Research - Albany NanoTech Center, Albany, NY, USA.

V Narayanan (V)

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.

H Tsai (H)

IBM Research - Almaden, San Jose, CA, USA.

G W Burr (GW)

IBM Research - Almaden, San Jose, CA, USA.

MeSH classifications