Analysis of the specifics of using microcontroller resources for speech recognition

Рижова, А. Р.; Оникієнко, Ю. О.

dc.contributor.author	Рижова, А. Р.
dc.contributor.author	Оникієнко, Ю. О.
dc.date.accessioned	2022-12-07T09:38:06Z
dc.date.available	2022-12-07T09:38:06Z
dc.date.issued	2022
dc.description.abstracten	The use of neural networks for information recognition, in particular voice recognition, expands the functional capabilities of embedded systems based on microcontrollers. However, the limitations of microcontroller resources must be taken into account. The purpose of the work is to analyze the impact of voice processing parameters and neural network architecture on the degree of microcontroller resource usage. To do this, a database of keyword samples, samples of other words and voices, and noise samples is created; the probability of recognizing the keyword among other words and noises is evaluated; and the dependence of the amount of microcontroller memory used and of the decision-making time is established both on the number of MFC coefficients and on the type of convolutional neural network. During the experiment, the Arduino Nano 33 BLE Sense development board was used. The neural network model was built and trained on the Edge Impulse software platform. To conduct the experiment, three groups of data named "hello", "unknown", and "noise" were created. The "hello" group contains 94 examples of the English word "hello" spoken by a female voice. The "unknown" group contains 167 examples of other words pronounced by both female and male voices. The "noise" group contains 166 samples of noise and random sounds. Following Edge Impulse's recommendation, 80% of the samples from each data group were used to train the neural network model, and 20% were used for testing. Analysis of the results shows that with an increase in the number of MFC coefficients and, accordingly, in the accuracy of keyword recognition, the amount of program memory occupied by the code increases by 480 bytes (less than 1%). For the nRF52840 microcontroller, this is not a significant increase. 
The amount of RAM used during the experiment did not change. Although the time needed to calculate the keyword detection accuracy increased by only 14 ms (less than 5%) as the number of MFC coefficients grew, the calculation itself is quite long (approximately 0.3 s) compared to the sound sample length of 1 s. This can be a limitation when processing a sound signal with 32-bit microcontrollers. To analyze phrases or sentences, it is necessary to use more powerful microcontrollers or microprocessors. Based on the results of the experimental research, it can be stated that the computing resources of 32-bit microcontrollers are sufficient for recognizing voice commands with digital preprocessing of the sound signal, in particular with the use of mel-frequency cepstral coefficients. The choice of the number of coefficients does not significantly affect the amount of FLASH and RAM memory used on the nRF52840 microcontroller. The comparison results show the superiority of the 2D network in keyword detection accuracy for both 12 and 13 MFC coefficients. The use of a one-dimensional convolutional neural network for voice sample recognition in the conducted experiment provides memory savings of approximately 5%. The quality of keyword recognition with 12 MFC coefficients is approximately 0.7. For 17 MFC coefficients, the recognition quality is already 0.97. The amount of RAM used in the case of the 2D network has decreased slightly. Voice sample processing time is practically the same for both types of networks. Thus, 1D convolutional neural networks have certain advantages in microcontroller applications for voice processing and recognition. 
The limitation of voice recognition on the microcontroller is the fairly long processing time of the sound sample (approximately 0.3 s) relative to the 1 s duration of the sample itself. This can be explained by the fairly low clock frequency of 64 MHz. Increasing the clock frequency will reduce the calculation time.	uk
dc.description.abstractuk	The paper analyzes the use of microcontroller computing resources for machine learning and voice recognition. An experiment was conducted to determine how the keyword recognition time, the amount of RAM used, and the amount of program memory used depend on the number of mel-frequency cepstral coefficients and on the type of convolutional neural network. The Arduino Nano 33 BLE Sense development board was used for the experiment. The neural network model was built and trained on the Edge Impulse software platform. The analysis established that the memory of a 32-bit microcontroller is sufficient for the computations and for running the neural network. However, keyword classification takes approximately 0.3 s, so recognizing long phrases may take several seconds, which is not always acceptable.	uk
dc.format.pagerange	P. 265406-1-265406-7	uk
dc.identifier.citation	Рижова, А. Р. Analysis of the specifics of using microcontroller resources for speech recognition / Рижова А. Р., Оникієнко Ю. О. // Microsystems, Electronics and Acoustics : scientific and technical journal. – 2022. – Vol. 27, No. 2(121). – P. 265406-1-265406-7. – Bibliogr.: 24 titles.	uk
dc.identifier.doi	https://doi.org/10.20535/2523-4455.mea.265406
dc.identifier.orcid	0000-0003-3278-8448	uk
dc.identifier.orcid	0000-0001-7508-8391	uk
dc.identifier.uri	https://ela.kpi.ua/handle/123456789/51272
dc.language.iso	uk	uk
dc.publisher	Igor Sikorsky Kyiv Polytechnic Institute	uk
dc.publisher.place	Kyiv	uk
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.source	Microsystems, Electronics and Acoustics : scientific and technical journal, 2022, Vol. 27, No. 2(121)	uk
dc.subject	microcontrollers	uk
dc.subject	mel-frequency cepstral coefficients	uk
dc.subject	convolutional neural networks	uk
dc.subject	voice recognition	uk
dc.subject.udc	621.382	uk
dc.title	Analysis of the specifics of using microcontroller resources for speech recognition	uk
dc.type	Article	uk
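
The arithmetic behind the figures quoted in the abstract can be checked with a short sketch. It is not taken from the paper: the function names are hypothetical, the rounding rule for the 80/20 split is an assumption (the abstract gives only the percentages), and the clock-scaling estimate assumes that processing time is inversely proportional to clock frequency, which the abstract suggests only qualitatively.

```python
# Sample counts per data group, as stated in the abstract.
SAMPLES = {"hello": 94, "unknown": 167, "noise": 166}


def split_counts(total, train_percent=80):
    """Return (train, test) counts for one data group.

    Assumption: the training share is rounded to the nearest integer;
    the abstract states only the 80% / 20% proportions.
    """
    train = round(total * train_percent / 100)
    return train, total - train


def real_time_factor(processing_s, window_s):
    """Ratio of processing time to audio window length (below 1 keeps up with the input)."""
    return processing_s / window_s


def scaled_time(processing_s, clock_hz, new_clock_hz):
    """Rough estimate, assuming time scales inversely with clock frequency."""
    return processing_s * clock_hz / new_clock_hz


if __name__ == "__main__":
    for name, total in SAMPLES.items():
        train, test = split_counts(total)
        print(f"{name}: {train} training samples, {test} test samples")
    # About 0.3 s of processing for a 1 s sample, per the abstract.
    print("real-time factor:", real_time_factor(0.3, 1.0))
    # Hypothetical doubling of the 64 MHz clock, under the assumption above.
    print("estimated time at 128 MHz:", scaled_time(0.3, 64e6, 128e6), "s")
```

A real-time factor of about 0.3 means one-second keyword windows are processed faster than they arrive, which matches the abstract's conclusion that single commands are feasible while longer phrases may take several seconds.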
