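Voice-Controlled Lamp Based on the MAX78000 & PikaPython Port
1. Use the MAX78000 for speech recognition and pass the recognition result to an ESP32 over a serial port; the ESP32 then sends an MQTT message to switch a small lamp on or off. 2. Port PikaPython to the MAX78000 so that Python scripts can run on the microcontroller.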
Tags
Embedded systems
ESP32
MAX78000
MQTT
PikaPython
vic
Updated 2023-01-31

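1. Project overview

This project consists of two parts:

1. Use the MAX78000 for speech recognition. When the keyword ON or OFF is detected, a designated end device is told over MQTT to switch the lamp on or off.

2. Port PikaPython so that Python scripts can run on the MAX78000.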

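2. Design approach

The project is made up of four main parts, as shown in the figure below:

  • MAX78000 evaluation board (performs the voice keyword recognition)
  • ESP32 development board (converts the keywords recognized by the MAX78000 into LED control commands and sends them over MQTT)
  • MQTT server (the broker integrated in HomeAssistant is used here to relay the data)
  • ESP8266 development board (uses its on-board LED to simulate a desk lamp; receives the MQTT control commands and carries them out)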

[Figure: system block diagram of the four components]

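2.1. MAX78000 design

Following the material in the official MAX78000 documentation, the collected data is used to train the kws20_v3 model, which is then quantized to generate the model files used on the microcontroller. The model recognizes the keywords GO, STOP, ON and OFF, and each result is sent out over UART3.

The UART3 transmit code is shown below: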

static void uart3_init(void)
{
	int error = 0;

	// Initialize the UART
	if ((error = MXC_UART_Init(MXC_UART_GET_UART(3), 115200, MXC_UART_APB_CLK)) != E_NO_ERROR) {
		printf("-->Error initializing UART: %d\n", error);
		printf("-->Example Failed\n");
		while (1) {}
	} else {
		printf("UART3 initializing success\n");
	}

	return;
}

static void uart3_send_str(uint8_t *buf, uint32_t len)
{
    mxc_uart_req_t write_req = { 0 };
    int error = 0;

    write_req.uart = MXC_UART_GET_UART(3);
    write_req.txData = buf;
    write_req.txLen = len;
    write_req.rxLen = 0;
    write_req.callback = NULL;

    error = MXC_UART_Transaction(&write_req);
    if (error != E_NO_ERROR) {
        printf("-->Error starting sync write: %d\n", error);
        printf("-->Example Failed\n");
        while (1) {}
    }
}

static void uart3_send_msg(const char *keyword, double probability)
{
	char tmp_buf[32] = { 0 };

	// Build the message to transmit: "<keyword>:<probability>%\n"
	snprintf(tmp_buf, sizeof(tmp_buf), "%s:%0.1f%%\n", keyword, probability);

	// Send it over UART3
	uart3_send_str((uint8_t *)tmp_buf, strlen(tmp_buf));

	return;
}
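
For reference, the frame that uart3_send_msg puts on the wire is a single text line of the form KEYWORD:PROBABILITY%. The Python sketch below mirrors that formatting on the PC side, including the limit imposed by the 32-byte buffer. The helper name format_frame is hypothetical and is not part of the firmware.

```python
def format_frame(keyword: str, probability: float, buf_size: int = 32) -> str:
    # Same format string as the C code: "%s:%0.1f%%\n"
    frame = "%s:%0.1f%%\n" % (keyword, probability)
    # snprintf stores at most buf_size - 1 characters plus the terminating NUL
    return frame[: buf_size - 1]


if __name__ == "__main__":
    # Show the exact bytes a receiver should expect, newline included
    print(repr(format_frame("ON", 97.4)))
```

A receiver can therefore split on ':' and stop at '%', which is what the ESP32 parser later in this write-up does.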

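The keyword recognition logic works as follows: audio is captured by the on-board microphone, and once the signal level exceeds a threshold, the samples are collected and passed to the deep-learning accelerator (CNN) for inference, which returns the result. The main flow is shown below: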

    while (1) {
#ifndef ENABLE_MIC_PROCESSING

        /* end of test vectors */
        if (sampleCounter >= sizeof(voiceVector) / sizeof(voiceVector[0])) {
            PR_DEBUG("End of test Vector\n");
            break;
        }

#endif

        /* Read from Mic driver to get CHUNK worth of samples, otherwise next sample*/
        if (MicReadChunk(pChunkBuff, &avg) == 0) {
#ifdef WUT_ENABLE
#ifdef ENERGY

            // keep LED on for about 10sec for energy measurement
            if (tot_usec > 10 * 1000 * 1000) {
                LED_Off(LED1);
                tot_usec = -10000000; // wait for 10sec before measuring again
            } else if (tot_usec > 0) {
                LED_On(LED1);
            }

#endif
#endif

#if SLEEP_MODE == 1
            __WFI();
#elif SLEEP_MODE == 2
#ifdef WUT_ENABLE
            MXC_LP_ClearWakeStatus();
            SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; // SLEEPDEEP=1
            __WFI();
#endif
#endif // #if SLEEP_MODE == 1
            continue;
        }

        sampleCounter += CHUNK;

#ifdef ENABLE_SILENCE_DETECTION // disable to start collecting data immediately.

        /* copy the preamble data*/
        /* add the new chunk to the end of circular buffer*/
        memcpy(&pPreambleCircBuffer[preambleCounter], pChunkBuff, sizeof(uint8_t) * CHUNK);

        /* increment circular buffer pointer*/
        preambleCounter = (preambleCounter + CHUNK) % (PREAMBLE_SIZE);

        /* Display average envelope as a bar */
#ifdef ENABLE_PRINT_ENVELOPE
        PR_DEBUG("%.6d|", sampleCounter);

        for (int i = 0; i < avg / 10; i++) {
            PR_DEBUG("=");
        }

        if (avg >= thresholdHigh) {
            PR_DEBUG("*");
        }

        PR_DEBUG("[%d]\n", avg);
#endif

        /* if we have not detected voice, check the average*/
        if (procState == SILENCE) {
            /* compute average, proceed if greater than threshold */
            if (avg >= thresholdHigh) {
                /* switch to keyword data collection*/
                procState = KEYWORD;
                PR_DEBUG("%.6d Word starts from index: %d, avg:%d > %d \n", sampleCounter,
                         sampleCounter - PREAMBLE_SIZE - CHUNK, avg, thresholdHigh);

                /* reorder circular buffer according to time at the beginning of pAI85Buffer */
                if (preambleCounter == 0) {
                    /* copy latest samples afterwards */
                    if (AddTranspose(&pPreambleCircBuffer[0], pAI85Buffer, PREAMBLE_SIZE,
                                     SAMPLE_SIZE, TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                } else {
                    /* copy oldest samples to the beginning*/
                    if (AddTranspose(&pPreambleCircBuffer[preambleCounter], pAI85Buffer,
                                     PREAMBLE_SIZE - preambleCounter, SAMPLE_SIZE,
                                     TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }

                    /* copy latest samples afterwards */
                    if (AddTranspose(&pPreambleCircBuffer[0], pAI85Buffer, preambleCounter,
                                     SAMPLE_SIZE, TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                }

                /* preamble is copied and state is changed, start adding keyword samples next run */
                ai85Counter += PREAMBLE_SIZE;
                continue;
            }
        }
        /* if it is in data collection, add samples to buffer*/
        else if (procState == KEYWORD)
#endif //#ifdef ENABLE_SILENCE_DETECTION
        {
            uint8_t ret = 0;

            /* add sample, rearrange buffer */
            ret = AddTranspose(pChunkBuff, pAI85Buffer, CHUNK, SAMPLE_SIZE, TRANSPOSE_WIDTH);

            /* increment number of stored samples */
            ai85Counter += CHUNK;

            /* if there is silence after at least 1/3 of samples passed, increment number of times back to back silence to find end of keyword */
            if ((avg < thresholdLow) && (ai85Counter >= SAMPLE_SIZE / 3)) {
                avgSilenceCounter++;
            } else {
                avgSilenceCounter = 0;
            }

            /* if this is the last sample and there are not enough samples to
             * feed to CNN, or if it is long silence after keyword,  append with zero (for reading file)
             */
#ifndef ENABLE_MIC_PROCESSING

            if (((ai85Counter < SAMPLE_SIZE) &&
                 (sampleCounter >= sizeof(voiceVector) / sizeof(voiceVector[0]) - 1)) ||
                (avgSilenceCounter > SILENCE_COUNTER_THRESHOLD))
#else
            if (avgSilenceCounter > SILENCE_COUNTER_THRESHOLD)
#endif
            {
                memset(pChunkBuff, 0, CHUNK);
                PR_DEBUG("%.6d: Word ends, Appends %d zeros \n", sampleCounter,
                         SAMPLE_SIZE - ai85Counter);
                ret = 0;

                while (!ret) {
                    ret =
                        AddTranspose(pChunkBuff, pAI85Buffer, CHUNK, SAMPLE_SIZE, TRANSPOSE_WIDTH);
                    ai85Counter += CHUNK;
                }
            }

            /* if enough samples are collected, start CNN */
            if (ai85Counter >= SAMPLE_SIZE) {
                int16_t out_class = -1;
                double probability = 0;

                /* reset counters */
                ai85Counter = 0;
                avgSilenceCounter = 0;

                /* new word */
                wordCounter++;

                /* change state to silence */
                procState = SILENCE;

                /* sanity check, last transpose should have returned 1, as enough samples should have already been added */
                if (ret != 1) {
                    PR_DEBUG("ERROR: Transpose incomplete!\n");
                    fail();
                }

                //----------------------------------  : invoke AI85 CNN
                PR_DEBUG("%.6d: Starts CNN: %d\n", sampleCounter, wordCounter);
                /* enable CNN clock */
                MXC_SYS_ClockEnable(MXC_SYS_PERIPH_CLOCK_CNN);

                /* load to CNN */
                if (!cnn_load_data(pAI85Buffer)) {
                    PR_DEBUG("ERROR: Loading data to CNN! \n");
                    fail();
                }

                /* Start CNN */
                if (!cnn_start()) {
                    PR_DEBUG("ERROR: Starting CNN! \n");
                    fail();
                }

#if SLEEP_MODE == 0

                /* Wait for CNN  to complete */
                while (cnn_time == 0) {
                    __WFI();
                }

#elif SLEEP_MODE == 1

                while (cnn_time == 0) {
                    __WFI();
                }

#elif SLEEP_MODE == 2
                SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; // SLEEPDEEP=1

                while (cnn_time == 0) {
#ifdef WUT_ENABLE
                    MXC_LP_ClearWakeStatus();
                    __WFI();
#endif
                }

#endif // #if SLEEP_MODE==0

                /* Read CNN result */
                cnn_unload((uint32_t *)ml_data);
                /* Stop CNN */
                cnn_stop();
                /* Disable CNN clock to save power */
                MXC_SYS_ClockDisable(MXC_SYS_PERIPH_CLOCK_CNN);
                /* Get time */
                MXC_TMR_GetTime(MXC_TMR0, cnn_time, (void *)&cnn_time, &units);
                PR_DEBUG("%.6d: Completes CNN: %d\n", sampleCounter, wordCounter);

                switch (units) {
                case TMR_UNIT_NANOSEC:
                    cnn_time /= 1000;
                    break;

                case TMR_UNIT_MILLISEC:
                    cnn_time *= 1000;
                    break;

                case TMR_UNIT_SEC:
                    cnn_time *= 1000000;
                    break;

                default:
                    break;
                }

                PR_DEBUG("CNN Time: %d us\n", cnn_time);

                /* run softmax */
                softmax_q17p14_q15((const q31_t *)ml_data, NUM_OUTPUTS, ml_softmax);

#ifdef ENABLE_CLASSIFICATION_DISPLAY
                PR_DEBUG("\nClassification results:\n");

                for (int i = 0; i < NUM_OUTPUTS; i++) {
                    int digs = (1000 * ml_softmax[i] + 0x4000) >> 15;
                    int tens = digs % 10;
                    digs = digs / 10;

                    PR_DEBUG("[%+.7d] -> Class %.2d %8s: %d.%d%%\n", ml_data[i], i, keywords[i],
                             digs, tens);
                }

#endif
                /* find detected class with max probability */
                ret = check_inference(ml_softmax, ml_data, &out_class, &probability);

                PR_DEBUG("----------------------------------------- \n");
                /* Treat low confidence detections as unknown*/
                if (!ret || out_class == 20) {
                    PR_DEBUG("Detected word: %s", "Unknown");
                } else {
                    PR_DEBUG("Detected word: %s (%0.1f%%)", keywords[out_class], probability);
                    uart3_send_msg(keywords[out_class], probability);
                }
                PR_DEBUG("\n----------------------------------------- \n");

                Max = 0;
                Min = 0;
                //------------------------------------------------------------

#ifdef SEND_MIC_OUT_SDCARD
                /**
                 *
                 *  - Blink Green led if a keyword is detected
                 *  - Blink Yellow if detection is low confidence or unknown
                 *  - Solid Red if there is error with SD card interface
                 *
                 **/
                LED_Off(LED_GREEN);
                if (!ret || out_class == 20) {
                    // Low Confidence or unknown
                    LED_On(LED_GREEN);
                    LED_On(LED_RED);
                }

                int i = 0;
                for (i = 0; i < SAMPLE_SIZE; i++) {
                    // printf("%d\n",serialMicBuff[(serialMicBufIndex+i)%SAMPLE_SIZE]);
                    snippet[i] = serialMicBuff[(serialMicBufIndex + i) % SAMPLE_SIZE];
                }
                if (ret && out_class != 20) {
                    // Word detected with high confidence
                    snprintf(fileName, sizeof(fileName), "%04d_%s", fileCount, keywords[out_class]);
                } else {
                    // Unknown or Low confidence: add "L" at the end of file name
                    snprintf(fileName, sizeof(fileName), "%04d_%s_L", fileCount, "Unknown");
                }
                if (writeSoundSnippet((char *)fileName, snippetLength, &snippet[0]) != E_NO_ERROR) {
                    printf("*** !!!SD ERROR!!! ***\n");
                    LED_Off(LED_GREEN);
                    LED_On(LED_RED); // Permanent Red Led
                    while (1) {}
                }
                fileCount++;
                LED_Off(LED_RED);
                LED_On(LED_GREEN);
#endif
            }
        }

        /* Stop demo if PB1 is pushed */
        if (PB_Get(0)) {
            PR_INFO("Stop! \r\n");
            procState = STOP;
            break;
        }
    }
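
The classification display in the loop above turns each Q15 softmax output into a percentage with one decimal place using integer arithmetic only. The following Python sketch reproduces that arithmetic so the rounding can be checked on a PC. The function name q15_to_percent is hypothetical; the expression itself is the one from the C code.

```python
def q15_to_percent(value: int) -> str:
    """Convert a Q15 softmax output (0..32767) to a percentage string."""
    # Scale to tenths of a percent and round: (1000 * x + 0x4000) >> 15
    tenths = (1000 * value + 0x4000) >> 15
    # Integer part and first decimal, as printed by "%d.%d%%"
    return "%d.%d%%" % (tenths // 10, tenths % 10)


if __name__ == "__main__":
    for raw in (0, 16384, 32767):
        print(raw, "->", q15_to_percent(raw))
```

Adding 0x4000 before the shift rounds to the nearest tenth of a percent instead of truncating.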

 

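2.2. ESP32 design logic

The MAX78000 has no networking peripheral of its own, so another device with network connectivity is needed to put it online. An inexpensive ESP32 development board is used here for the data transfer.

UART0 of the ESP32 board is connected to UART3 of the MAX78000. When the MAX78000 detects a keyword, it sends the keyword and its confidence out over the serial port. The ESP32 listens on that serial port and parses the data as soon as it arrives.

When the keyword GO is detected, the ESP32 enters control-trigger mode. If ON or OFF is recognized right after that, the corresponding LED control command is published over MQTT to a topic named esp_test_sub.

While the ESP32 is in control-trigger mode, receiving the keyword STOP makes it leave that mode.

While the ESP32 is not in control-trigger mode, received ON and OFF keywords are ignored, which reduces false triggers.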
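
The trigger rules above amount to a small two-state machine. The Python sketch below models it on a PC so the behaviour can be tested without hardware. The names parse_frame and TriggerGate are hypothetical, and the payloads "1" and "0" are taken from the ESP32 sketch in this section.

```python
from typing import Optional, Tuple


def parse_frame(line: str) -> Tuple[str, float]:
    """Split a 'KEYWORD:PROB%' line into (keyword, probability)."""
    keyword, _, rest = line.strip().partition(":")
    return keyword, float(rest.rstrip("%") or 0.0)


class TriggerGate:
    """GO arms the gate, STOP disarms it, ON/OFF only pass while armed."""

    def __init__(self) -> None:
        self.armed = False

    def feed(self, keyword: str) -> Optional[str]:
        """Return the MQTT payload to publish ('1' or '0'), or None."""
        if keyword == "GO":
            self.armed = True
        elif keyword == "STOP":
            self.armed = False
        elif keyword in ("ON", "OFF") and self.armed:
            self.armed = False  # one command per wake word
            return "1" if keyword == "ON" else "0"
        return None


if __name__ == "__main__":
    gate = TriggerGate()
    for frame in ("ON:90.1%\n", "GO:100.0%\n", "ON:97.4%\n", "OFF:95.0%\n"):
        word, prob = parse_frame(frame)
        print(word, prob, gate.feed(word))
```

As in the ESP32 code, an unrelated keyword received while armed leaves the gate armed; only STOP or a delivered ON/OFF command disarms it.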

The ESP32 code is shown below:

#include <Arduino.h>
#include <WiFi.h>
#include <PubSubClient.h>

#define WIFI_SSID   "xxxx"
#define WIFI_PASSWD "xxxxx"

#define MQTT_SERVER "xxxx"
#define MQTT_PORT xxxx

WiFiClient esp_wifi;
PubSubClient client(esp_wifi);

static void setup_wifi(void) {
  // We start by connecting to a WiFi network
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(WIFI_SSID);

  WiFi.begin(WIFI_SSID, WIFI_PASSWD);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}

static void msg_parse(String &msg, String &keyword, double &probability)
{
  char tmp1[16] = { 0 };
  char tmp2[16] = { 0 };
  uint8_t count = 0;
  char *target = tmp1;

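  for (unsigned int i = 0; i < msg.length(); i++) {
    char c = msg.c_str()[i];

    if (':' == c) {
      // Everything after ':' is the probability field
      count = 0;
      target = tmp2;
    } else if ('%' == c) {
      break;
    } else if (count < sizeof(tmp1) - 1) {
      // Both buffers are 16 bytes; keep one byte for the terminating NUL
      target[count++] = c;
    }
  }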

  keyword = tmp1;
  probability = atof(tmp2);
}

static void reconnect() {
  // Loop until we're reconnected
  while (!client.connected()) {
    Serial.print("Attempting MQTT connection...");
    // Attempt to connect
    if (client.connect("ESP32Client11111111")) {
      Serial.println("connected");
      // Once connected, publish an announcement...
      //client.publish("esp_test_pub", "hello world");
      // ... and resubscribe
      client.subscribe("esp_test_sub");
    } else {
      Serial.print("failed, rc=");
      Serial.print(client.state());
      Serial.println(" try again in 5 seconds");
      // Wait 5 seconds before retrying
      delay(5000);
    }
  }
}

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  Serial.println("init success");

  setup_wifi();

  client.setServer(MQTT_SERVER, MQTT_PORT);
}

void loop() {
  String str = "";

  while (Serial.available() > 0) {
      str += char(Serial.read());  
      delay(10);  
  }

  if (str.length() > 0) {
      String keyword = "";
      double probability = 0.0;
      int status = -1;
      static int trigger = 0;

      msg_parse(str, keyword, probability);
      Serial.print("keyword: ");
      Serial.println(keyword);
      Serial.print("probability: ");
      Serial.println(probability);

      if (keyword == "ON") {
        status = 1;
      } else if (keyword == "OFF") {
        status = 0;
      } else if (keyword == "GO") {
        trigger = 1;
      } else if (keyword == "STOP") {
        trigger = 0;
      } else {
        status = -1;
      } 

      if (status != -1 && trigger == 1)
      {
        char msg[8];
        snprintf(msg, sizeof(msg), "%d", status);
        Serial.print("send msg: ");
        Serial.println(status);
        client.publish("esp_test_sub", msg);
        trigger = 0;
      }
  }

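  // Re-establish the MQTT session if it dropped, then let PubSubClient
  // service keep-alives; without loop() the broker times the client out
  reconnect();
  client.loop();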
}

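2.3. ESP8266 lamp code

In this test, an ESP8266 with an RGB LED is used to simulate a desk lamp.

The ESP8266 connects to the MQTT server over Wi-Fi and subscribes to the esp_test_sub topic. When it receives a control message published by the ESP32, it drives the RGB LED according to the command.

The ESP8266 code is shown below: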

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <PubSubClient.h>

#define WIFI_SSID   "xxx"
#define WIFI_PASSWD "xxx"

#define MQTT_SERVER "xxxx"
#define MQTT_PORT xxx

#define LED_RGB_B 13
#define LED_RGB_R 15
#define LED_RGB_G 12
#define BUTTON_K1 4


WiFiClient esp_wifi;
PubSubClient client(esp_wifi);

static void setup_wifi(void) {
  // We start by connecting to a WiFi network
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(WIFI_SSID);

  WiFi.begin(WIFI_SSID, WIFI_PASSWD);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}

static void mqtt_callback(char* topic, byte* payload, unsigned int length) {
  Serial.print("Message arrived [");
  Serial.print(topic);
  Serial.print("] ");
  for (int i = 0; i < length; i++) {
    Serial.print((char)payload[i]);
  }
  Serial.println();

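  if (length == 0) {
    return;  // empty payload: nothing to act on
  }

  Serial.print((char)payload[0]);

  // '0' switches the LED off; any other payload switches it on
  if (payload[0] == '0') {
    digitalWrite(LED_RGB_B, LOW);
  } else {
    digitalWrite(LED_RGB_B, HIGH);
  }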
}

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  Serial.println("init success");

  pinMode(LED_RGB_R, OUTPUT);
  pinMode(LED_RGB_G, OUTPUT);
  pinMode(LED_RGB_B, OUTPUT);

  digitalWrite(LED_RGB_R, LOW);
  digitalWrite(LED_RGB_G, LOW);
  digitalWrite(LED_RGB_B, LOW);

  setup_wifi();

  client.setServer(MQTT_SERVER, MQTT_PORT);
  client.setCallback(mqtt_callback);
}

void reconnect() {
  // Loop until we're reconnected
  while (!client.connected()) {
    Serial.print("Attempting MQTT connection...");
    // Attempt to connect
    if (client.connect("ESP8266Client11111111")) {
      Serial.println("connected");
      // Once connected, publish an announcement...
      //client.publish("esp_test_pub", "hello world");
      // ... and resubscribe
      client.subscribe("esp_test_sub");
    } else {
      Serial.print("failed, rc=");
      Serial.print(client.state());
      Serial.println(" try again in 5 seconds");
      // Wait 5 seconds before retrying
      delay(5000);
    }
  }
}

long lastMsg = 0;
char msg[50];
int value = 0;

void loop() {
  reconnect();
  client.loop();
  delay(100);
}

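3. Collecting the training material

The voice data for each keyword was recorded on a PC, cut into individual clips with an audio trimming tool, and then batch-converted to the following format:

  • Sample rate: 16000 Hz
  • Encoding: 16-bit little-endian PCM
  • Channels: mono
  • File format: WAV

This is a rather clumsy workflow. Ideally the trimming would be scripted in Python; doing it by hand is slow and time-consuming, so please don't follow my example...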
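
As a rough idea of what a scripted version could look like, the sketch below uses only Python's standard wave module to check that a clip matches the format listed above and to cut a time span out of a longer recording. It is not what was used in this project; the function names and the start/end times are hypothetical, and finding the word boundaries automatically is left out.

```python
import wave


def is_training_wav(path: str) -> bool:
    """Return True if the file is 16 kHz, 16-bit, mono, uncompressed WAV."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getsampwidth() == 2      # 2 bytes = 16 bits
            and wav.getnchannels() == 1
            and wav.getcomptype() == "NONE"  # plain PCM
        )


def trim_wav(src: str, dst: str, start_s: float, end_s: float) -> None:
    """Copy the [start_s, end_s) span of src into dst, keeping the format."""
    with wave.open(src, "rb") as fin:
        rate = fin.getframerate()
        params = fin.getparams()
        fin.setpos(int(start_s * rate))
        frames = fin.readframes(int((end_s - start_s) * rate))
    with wave.open(dst, "wb") as fout:
        fout.setparams(params)
        fout.writeframes(frames)  # the header frame count is fixed up on close
```

For example, trim_wav("recording.wav", "on_0001.wav", 1.2, 2.2) would write a one-second clip, where both file names are placeholders.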

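4. Pre-training process

4.1. Preparing the training server

Training works best with CUDA support, so a rented GPU cloud server is used here. Its configuration is shown in the figure below; at 1.5 per hour it is quite good value. Note the environment to select: PyTorch 1.8.1, CUDA 11.1, Python 3.8.10.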

[Figure: GPU cloud server configuration]

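4.2. Setting up the training environment

4.2.1. Update the system

Before anything else, update the system's packages with the following commands.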

apt update
apt upgrade -y

4.2.2. Install dependencies

Install the following dependencies, as required by the Maxim GitHub repository.

apt install -y make build-essential libssl-dev zlib1g-dev \
  libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
  libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev \
  libsndfile-dev portaudio19-dev

4.2.3. Git configuration

git config --global user.email "xxx@xxx"
git config --global user.name "xxx"

4.2.4. Clone the Maxim ai8x-xxxx repositories

git clone --recursive https://github.com/MaximIntegratedAI/ai8x-training.git
git clone --recursive https://github.com/MaximIntegratedAI/ai8x-synthesis.git

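4.2.5. Training

Create the training environment (run these inside the ai8x-training directory, where requirements-cu11.txt lives):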

conda create --name max78000-training python=3.8
conda activate max78000-training
pip3 install -U pip setuptools
pip3 install -r requirements-cu11.txt

Check that the training environment can use CUDA. Output like the following means CUDA is working.

(max78000-training) root@I10203ab1090050122f:/hy-tmp/ai8x-training# ./check_cuda.py 
System:            linux
Python version:    3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
PyTorch version:   1.8.1+cu111
CUDA acceleration: available in PyTorch

Start training with the following command:

(max78000-training) root@I10203ab1090050122f:/hy-tmp/ai8x-training# ./scripts/train_kws20_v3.sh 

The training progress is shown below. On a 3090 it takes roughly 3 hours, which is quite long...

[Figure: training progress output]

4.2.6. Quantization

Environment setup

conda create --name max78000-synthesis python=3.8

conda activate max78000-synthesis

pip3 install -U pip setuptools

pip3 install -r requirements.txt

Copy the training result file into the trained directory of the quantization code

root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis/trained# cp /hy-tmp/ai8x-training/logs/2023.01.14-005709/qat_checkpoint.pth.tar ai85-kws20_v3-qat8.pth.tar -fv

Run the quantization; the output is shown below

root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis/trained# cd ..
root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis# conda activate max78000-synthesis
(max78000-synthesis) root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis# cat ./scripts/quantize_kws20_v3.sh 
#!/bin/sh
python quantize.py trained/ai85-kws20_v3-qat8.pth.tar trained/ai85-kws20_v3-qat8-q.pth.tar --device MAX78000 -v "$@"
(max78000-synthesis) root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis# 
(max78000-synthesis) root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis# ./scripts/quantize_kws20_v3.sh 
Configuring device: MAX78000
Converting checkpoint file trained/ai85-kws20_v3-qat8.pth.tar to trained/ai85-kws20_v3-qat8-q.pth.tar

Model keys (state_dict):
voice_conv1.output_shift, voice_conv1.weight_bits, voice_conv1.bias_bits, voice_conv1.quantize_activation, voice_conv1.adjust_output_shift, voice_conv1.shift_quantile, voice_conv1.op.weight, voice_conv2.output_shift, voice_conv2.weight_bits, voice_conv2.bias_bits, voice_conv2.quantize_activation, voice_conv2.adjust_output_shift, voice_conv2.shift_quantile, voice_conv2.op.weight, voice_conv3.output_shift, voice_conv3.weight_bits, voice_conv3.bias_bits, voice_conv3.quantize_activation, voice_conv3.adjust_output_shift, voice_conv3.shift_quantile, voice_conv3.op.weight, voice_conv4.output_shift, voice_conv4.weight_bits, voice_conv4.bias_bits, voice_conv4.quantize_activation, voice_conv4.adjust_output_shift, voice_conv4.shift_quantile, voice_conv4.op.weight, kws_conv1.output_shift, kws_conv1.weight_bits, kws_conv1.bias_bits, kws_conv1.quantize_activation, kws_conv1.adjust_output_shift, kws_conv1.shift_quantile, kws_conv1.op.weight, kws_conv2.output_shift, kws_conv2.weight_bits, kws_conv2.bias_bits, kws_conv2.quantize_activation, kws_conv2.adjust_output_shift, kws_conv2.shift_quantile, kws_conv2.op.weight, kws_conv3.output_shift, kws_conv3.weight_bits, kws_conv3.bias_bits, kws_conv3.quantize_activation, kws_conv3.adjust_output_shift, kws_conv3.shift_quantile, kws_conv3.op.weight, kws_conv4.output_shift, kws_conv4.weight_bits, kws_conv4.bias_bits, kws_conv4.quantize_activation, kws_conv4.adjust_output_shift, kws_conv4.shift_quantile, kws_conv4.op.weight, fc.output_shift, fc.weight_bits, fc.bias_bits, fc.quantize_activation, fc.adjust_output_shift, fc.shift_quantile, fc.op.weight
voice_conv1.op.weight avg_max: 34.97 max: 71.0 mean: -0.0203125 factor: [1.] bits: 8
voice_conv2.op.weight avg_max: 33.458332 max: 66.0 mean: -1.1409723 factor: [1.] bits: 8
voice_conv3.op.weight avg_max: 45.84375 max: 109.0 mean: -2.6535916 factor: [1.] bits: 8
voice_conv4.op.weight avg_max: 60.854168 max: 122.0 mean: -2.7325304 factor: [1.] bits: 8
kws_conv1.op.weight avg_max: 35.65625 max: 84.0 mean: -1.8712022 factor: [1.] bits: 8
kws_conv2.op.weight avg_max: 37.083332 max: 103.0 mean: -1.8656142 factor: [1.] bits: 8
kws_conv3.op.weight avg_max: 48.8 max: 91.0 mean: -2.7921875 factor: [1.] bits: 8
kws_conv4.op.weight avg_max: 62.546875 max: 101.0 mean: 1.183125 factor: [1.] bits: 8
fc.op.weight avg_max: 85.52381 max: 116.0 mean: -6.631696 factor: [1.] bits: 8

Generate the kws20_v3 example program

(max78000-synthesis) root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis# rm -rf sdk/Examples/MAX78000/CNN/kws20_v3/
(max78000-synthesis) root@I10203ab1090050122f:/hy-tmp/ai8x-synthesis# ./scripts/gen_kws20_v3_max78000.sh 
Configuring device: MAX78000
Reading networks/kws20-v3-hwc.yaml to configure network...
Reading trained/ai85-kws20_v3-qat8-q.pth.tar to configure network weights...
Checkpoint for epoch 192, model ai85kws20netv3 - weight and bias data:
 InCh OutCh  Weights         Quant Shift  Min  Max    Size Key                                       Bias       Quant  Min  Max Size Key
  128   100  (12800, 1)          8     7  -71   62   12800 voice_conv1.op.weight                     N/A            0    0    0    0 N/A                      
  100    96  (9600, 3)           8     7  -66   41   28800 voice_conv2.op.weight                     N/A            0    0    0    0 N/A                      
   96    64  (6144, 3)           8     7 -109   41   18432 voice_conv3.op.weight                     N/A            0    0    0    0 N/A                      
   64    48  (3072, 3)           8     7 -122   54    9216 voice_conv4.op.weight                     N/A            0    0    0    0 N/A                      
   48    64  (3072, 3)           8     7  -84   44    9216 kws_conv1.op.weight                       N/A            0    0    0    0 N/A                      
   64    96  (6144, 3)           8     7 -103   42   18432 kws_conv2.op.weight                       N/A            0    0    0    0 N/A                      
   96   100  (9600, 3)           8     7  -91   37   28800 kws_conv3.op.weight                       N/A            0    0    0    0 N/A                      
  100    64  (6400, 6)           8     7 -100  101   38400 kws_conv4.op.weight                       N/A            0    0    0    0 N/A                      
  256    21  (1, 21, 256)        8     7 -116   45    5376 fc.op.weight                              N/A            0    0    0    0 N/A                      
TOTAL: 9 parameter layers, 169,472 parameters, 169,472 bytes
Configuring data set: KWS_20.
kws20_v3...
Arranging weights... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
Storing weights...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
Creating network...  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
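
The Size column and the TOTAL line in the table above can be cross-checked by hand: with 8-bit weights, each layer takes input channels × output channels × kernel elements bytes. The short Python check below redoes that arithmetic using only the numbers printed in the log; the variable names are chosen just for this example.

```python
# (input channels, output channels, kernel elements) per layer, from the log
LAYERS = {
    "voice_conv1": (128, 100, 1),
    "voice_conv2": (100, 96, 3),
    "voice_conv3": (96, 64, 3),
    "voice_conv4": (64, 48, 3),
    "kws_conv1": (48, 64, 3),
    "kws_conv2": (64, 96, 3),
    "kws_conv3": (96, 100, 3),
    "kws_conv4": (100, 64, 6),
    "fc": (256, 21, 1),
}


def layer_size(in_ch: int, out_ch: int, kernel: int) -> int:
    """Number of 8-bit weights (and therefore bytes) in one layer."""
    return in_ch * out_ch * kernel


sizes = {name: layer_size(*shape) for name, shape in LAYERS.items()}
total = sum(sizes.values())

if __name__ == "__main__":
    for name, size in sizes.items():
        print("%-12s %6d" % (name, size))
    print("TOTAL        %6d" % total)  # 169472, matching the log
```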

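Copy the cnn.c, cnn.h and weights.h files from the generated example into the kws20_demo directory, replacing the existing ones. The run output is as follows: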

ANALOG DEVICES 
Keyword Spotting Demo
Ver. 3.2.0 (11/28/22) 

***** Init *****
pChunkBuff: 128
pPreambleCircBuffer: 3840
pAI85Buffer: 16384

*** I2S & Mic Init ***

*** READY ***


ANALOG DEVICES 
Keyword Spotting Demo
Ver. 3.2.0 (11/28/22) 

***** Init *****
pChunkBuff: 128
pPreambleCircBuffer: 3840
pAI85Buffer: 16384

*** I2S & Mic Init ***

*** READY ***
022016 Word starts from index: 18048, avg:372 > 350 
034560: Starts CNN: 1
034560: Completes CNN: 1
CNN Time: 1850 us
Min: -128,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
052096 Word starts from index: 48128, avg:350 > 350 
064640: Starts CNN: 2
064640: Completes CNN: 2
CNN Time: 1849 us
Min: -127,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
218880 Word starts from index: 214912, avg:352 > 350 
223104: Word ends, Appends 8320 zeros 
223104: Starts CNN: 3
223104: Completes CNN: 3
CNN Time: 1849 us
Min: -106,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
335360 Word starts from index: 331392, avg:480 > 350 
347904: Starts CNN: 4
347904: Completes CNN: 4
CNN Time: 1849 us
Min: -54,   Max: 55 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
439296 Word starts from index: 435328, avg:398 > 350 
450560: Word ends, Appends 1280 zeros 
450560: Starts CNN: 5
450560: Completes CNN: 5
CNN Time: 1849 us
Min: -73,   Max: 68 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
489088 Word starts from index: 485120, avg:408 > 350 
501632: Starts CNN: 6
501632: Completes CNN: 6
CNN Time: 1849 us
Min: -128,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
547584 Word starts from index: 543616, avg:399 > 350 
560128: Starts CNN: 7
560128: Completes CNN: 7
CNN Time: 1849 us
Min: -128,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
593152 Word starts from index: 589184, avg:452 > 350 
605696: Starts CNN: 8
605696: Completes CNN: 8
CNN Time: 1849 us
Min: -128,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
645376 Word starts from index: 641408, avg:404 > 350 
657280: Word ends, Appends 640 zeros 
657280: Starts CNN: 9
657280: Completes CNN: 9
CNN Time: 1849 us
Min: -63,   Max: 79 
----------------------------------------- 
Detected word: AWAKE (97.4%)
----------------------------------------- 
668800 Word starts from index: 664832, avg:470 > 350 
678016: Word ends, Appends 3328 zeros 
678016: Starts CNN: 10
678016: Completes CNN: 10
CNN Time: 1849 us
Min: -52,   Max: 42 
----------------------------------------- 
Detected word: AWAKE (93.8%)
----------------------------------------- 
725888 Word starts from index: 721920, avg:373 > 350 
737152: Word ends, Appends 1280 zeros 
737152: Starts CNN: 11
737152: Completes CNN: 11
CNN Time: 1849 us
Min: -128,   Max: 127 
----------------------------------------- 
Detected word: Unknown
----------------------------------------- 
832000 Word starts from index: 828032, avg:396 > 350 
842752: Word ends, Appends 1792 zeros 
842752: Starts CNN: 12
842752: Completes CNN: 12
CNN Time: 1849 us
Min: -127,   Max: 126 
----------------------------------------- 
Detected word: BLINK (96.5%)
----------------------------------------- 
883328 Word starts from index: 879360, avg:451 > 350 
893952: Word ends, Appends 1920 zeros 
893952: Starts CNN: 13
893952: Completes CNN: 13
CNN Time: 1849 us
Min: -128,   Max: 127 
----------------------------------------- 
Detected word: TWO (93.2%)

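4.2.7. Adding data

Your own data must be copied into the ai8x-training/data/KWS/raw directory. Each subdirectory name there is a label, and the files inside it are the samples for that label.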

root@I10203ab1090050122f:/hy-tmp# cd ai8x-training/data/KWS/raw/
root@I10203ab1090050122f:/hy-tmp/ai8x-training/data/KWS/raw# cp -rfv /hy-tmp/wav/ . 
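
After copying, it can be worth confirming that each label directory really contains the clips you expect. The Python sketch below lists every subdirectory of a raw folder together with its number of .wav files. The function name count_samples is hypothetical; only the directory-per-label layout comes from the description above.

```python
from pathlib import Path
from typing import Dict


def count_samples(raw_dir: str) -> Dict[str, int]:
    """Map each label directory under raw_dir to its number of .wav files."""
    return {
        d.name: sum(1 for _ in d.glob("*.wav"))
        for d in sorted(Path(raw_dir).iterdir())
        if d.is_dir()
    }


if __name__ == "__main__":
    # Path taken from the commands above; adjust it to your own checkout
    raw = Path("ai8x-training/data/KWS/raw")
    if raw.is_dir():
        for label, n in count_samples(str(raw)).items():
            print("%-12s %d" % (label, n))
```

A label that shows up with zero files, or a directory name that is not meant to be a label, is easiest to catch at this point, before the dataset is processed.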


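Adding the data is not enough for it to take effect; the script ai8x-training/datasets/kws20.py also has to be adjusted. Two parts need to change, as shown below. What is recorded here is the 6-keyword entry; in my actual setup I made the same kind of change to the 20-keyword output entry.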

    # Add the new labels to the class dictionary
    # class_dict = {'backward': 0, 'bed': 1, 'bird': 2, 'cat': 3, 'dog': 4, 'down': 5,
    #               'eight': 6, 'five': 7, 'follow': 8, 'forward': 9, 'four': 10, 'go': 11,
    #               'happy': 12, 'house': 13, 'learn': 14, 'left': 15, 'librispeech': 16,
    #               'marvin': 17, 'nine': 18, 'no': 19, 'off': 20, 'on': 21, 'one': 22,
    #               'right': 23, 'seven': 24, 'sheila': 25, 'six': 26, 'stop': 27,
    #               'three': 28, 'tree': 29, 'two': 30, 'up': 31, 'visual': 32, 'wow': 33,
    #               'yes': 34, 'zero': 35}
    class_dict = {'backward': 0, 'bed': 1, 'bird': 2, 'cat': 3, 'dog': 4, 'down': 5,
                  'eight': 6, 'five': 7, 'follow': 8, 'forward': 9, 'four': 10, 'go': 11,
                  'happy': 12, 'house': 13, 'learn': 14, 'left': 15, 'librispeech': 16,
                  'marvin': 17, 'nine': 18, 'no': 19, 'off': 20, 'on': 21, 'one': 22,
                  'right': 23, 'seven': 24, 'sheila': 25, 'six': 26, 'stop': 27,
                  'three': 28, 'tree': 29, 'two': 30, 'up': 31, 'visual': 32, 'wow': 33,
                  'yes': 34, 'zero': 35, 'awake': 36, 'blink': 37, 'breathing': 38,
                  'close': 39, 'open': 40}


# Change the output labels to be recognized
datasets = [
    # {
    #     'name': 'KWS',  # 6 keywords
    #     'input': (512, 64),
    #     'output': ('up', 'down', 'left', 'right', 'stop', 'go', 'UNKNOWN'),
    #     'weight': (1, 1, 1, 1, 1, 1, 0.06),
    #     'loader': KWS_get_datasets,
    # },
    {
        'name': 'KWS',  # 6 keywords
        'input': (512, 64),
        'output': ('awake', 'blink', 'breathing', 'close', 'open', 'go', 'UNKNOWN'),
        'weight': (1, 1, 1, 1, 1, 1, 0.06),
        'loader': KWS_get_datasets,
    },
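
The two changes have to agree with each other: every label listed in 'output' needs an entry in class_dict, and 'weight' needs one value per output. The sketch below checks this on a copy of the values from the listing above. The helper names missing_labels and weights_match are hypothetical, and treating 'UNKNOWN' as a catch-all class without its own class_dict entry is an assumption based on the listing.

```python
def missing_labels(class_dict, output):
    """Return the output labels that have no entry in class_dict."""
    # Assumption: 'UNKNOWN' is a catch-all class and needs no entry.
    return [name for name in output if name != "UNKNOWN" and name not in class_dict]


def weights_match(output, weight):
    """Return True when there is exactly one weight per output label."""
    return len(output) == len(weight)


# Values copied from the edited kws20.py excerpt (class_dict shortened to the labels used).
class_dict = {"go": 11, "awake": 36, "blink": 37, "breathing": 38, "close": 39, "open": 40}
output = ("awake", "blink", "breathing", "close", "open", "go", "UNKNOWN")
weight = (1, 1, 1, 1, 1, 1, 0.06)

print(missing_labels(class_dict, output))
print(weights_match(output, weight))
```

An empty list from missing_labels means every requested keyword can be resolved to a directory label.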

5. Issues encountered

5.1. Training script reports that distiller is missing

Running train.py produces the following error:

root@If40ff448100801c1e:/hy-tmp/ai8x-training# ./scripts/train_kws20_v3.sh
Traceback (most recent call last):
  File "train.py", line 96, in <module>
    import distiller
ModuleNotFoundError: No module named 'distiller'

The fix is as follows:

cd distiller
pip3 install .

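5.2. Newly added voice data has no effect after retraining

A training run creates the preprocessed file dataset2.pt. If this file already exists, the next run uses it directly, so newly added audio files are not included in training.

To fix this, delete the generated dataset2.pt as shown below and let it be regenerated. The new data is then processed and included in training.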

(max78000-training) root@I10203ab1090050122f:/hy-tmp/ai8x-training/data/KWS# cd processed/
(max78000-training) root@I10203ab1090050122f:/hy-tmp/ai8x-training/data/KWS/processed# ls
dataset2.pt
(max78000-training) root@I10203ab1090050122f:/hy-tmp/ai8x-training/data/KWS/processed# rm -rf *
(max78000-training) root@I10203ab1090050122f:/hy-tmp/ai8x-training/data/KWS/processed# cd ..
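
The same cleanup can be scripted so it runs before every retraining. This is a minimal sketch under the assumption that the cache is the single file processed/dataset2.pt seen in the listing above; the helper name clear_processed_cache is hypothetical.

```python
from pathlib import Path


def clear_processed_cache(kws_dir, name="dataset2.pt"):
    """Delete the cached dataset file; return True if a file was removed."""
    cache = Path(kws_dir) / "processed" / name
    if cache.exists():
        cache.unlink()
        return True
    return False


if __name__ == "__main__":
    # Path taken from the shell session above; adjust it to your checkout.
    removed = clear_processed_cache("ai8x-training/data/KWS")
    print("cache removed" if removed else "no cache found")
```

Unlike rm -rf *, this removes only the named file, so anything else placed in processed is left alone.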

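5.3. False detections keep toggling the LED

In practice, the keywords ON and OFF were triggered too easily, so the ESP8266 kept switching the lamp on and off, which made the setup impractical.

Solution:

Add GO as a wake word. Detecting GO enters the awake state, and an LED control command is sent only if the next keyword is ON or OFF; otherwise no command is sent.

To make leaving the awake state easy to test, the keyword STOP was added as well. If STOP is detected while awake, the awake state is exited and ON/OFF no longer have any effect.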
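
The gating logic described above can be sketched as a small state machine. This is an illustration, not the ESP32 code from the attachment. The class name WakeGate is hypothetical, the payloads "1" and "0" are taken from the "send msg" lines in the ESP32 log in section 6, and two behaviours are assumptions: a sent command ends the awake state, and keywords other than GO, STOP, ON and OFF leave the state unchanged.

```python
class WakeGate:
    """Forward ON/OFF only when they follow the wake word GO."""

    # Payloads as seen in the ESP32 log ("send msg: 1" / "send msg: 0").
    COMMANDS = {"ON": "1", "OFF": "0"}

    def __init__(self):
        self.awake = False

    def feed(self, keyword):
        """Return the payload to publish, or None if nothing should be sent."""
        if keyword == "GO":
            self.awake = True
            return None
        if keyword == "STOP":
            self.awake = False
            return None
        if self.awake and keyword in self.COMMANDS:
            self.awake = False  # assumption: one command per wake-up
            return self.COMMANDS[keyword]
        return None  # not awake, or an unrelated keyword


gate = WakeGate()
for word in ["ON", "GO", "ON", "OFF", "GO", "STOP", "OFF"]:
    print(word, "->", gate.feed(word))
```

With this gate, a stray ON or OFF is dropped unless GO was heard first, which is the property that stops the lamp from toggling on false detections.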

6. Results

6.1. Turning the light on

MAX78000 recognition output:

67363584 Word starts from index: 67359616, avg:361 > 350 
67369088: Word ends, Appends 7040 zeros 
67369088: Starts CNN: 389
67369088: Completes CNN: 389
CNN Time: 1849 us
Min: -65,   Max: 76 
----------------------------------------- 
Detected word: GO (100.0%)
----------------------------------------- 
67380608 Word starts from index: 67376640, avg:466 > 350 
67386496: Word ends, Appends 6656 zeros 
67386496: Starts CNN: 390
67386496: Completes CNN: 390
CNN Time: 1849 us
Min: -47,   Max: 46 
----------------------------------------- 
Detected word: ON (96.6%)
----------------------------------------- 

ESP32 output:

keyword: GO
probability: 100.00
keyword: ON
probability: 99.90
send msg: 1
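
The keyword and probability printed by the ESP32 come from the line that the MAX78000 sends over UART3, which uart3_send_msg formats as "%s:%0.1f%%\n" (for example "ON:99.9%"). The sketch below shows one way such a line could be split back into its two fields. It is an illustration in Python rather than the ESP32 code from the attachment, and the function name parse_message is hypothetical.

```python
def parse_message(line):
    """Parse 'KEYWORD:PROB%' into (keyword, probability); return None if malformed."""
    text = line.strip()
    keyword, sep, rest = text.partition(":")
    if not sep or not keyword or not rest.endswith("%"):
        return None
    try:
        probability = float(rest[:-1])  # drop the trailing '%'
    except ValueError:
        return None
    return keyword, probability


print(parse_message("ON:99.9%\n"))
print(parse_message("noise"))
```

Returning None for malformed lines lets the caller ignore partial or corrupted serial data instead of acting on it.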

[Photo: LED turned on]

6.2. Turning the LED off

MAX78000 recognition output:

----------------------------------------- 
67141248 Word starts from index: 67137280, avg:388 > 350 
67148672: Word ends, Appends 5120 zeros 
67148672: Starts CNN: 386
67148672: Completes CNN: 386
CNN Time: 1849 us
Min: -124,   Max: 123 
----------------------------------------- 
Detected word: GO (97.7%)
----------------------------------------- 
67171328 Word starts from index: 67167360, avg:398 > 350 
67177728: Word ends, Appends 6144 zeros 
67177728: Starts CNN: 387
67177728: Completes CNN: 387
CNN Time: 1849 us
Min: -100,   Max: 89 
----------------------------------------- 
Detected word: OFF (99.6%)
----------------------------------------- 

ESP32 output:

keyword: GO
probability: 99.30
keyword: OFF
probability: 98.40
send msg: 0

[Photo: LED turned off]

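7. PikaPython port

As part of this project, PikaPython was ported to the MAX78000. PikaPython is a completely rewritten, ultra-lightweight Python engine with zero dependencies and zero configuration that can run on platforms with Flash <= 64KB and RAM <= 4KB.

7.1. Downloading the source

The PikaPython source is pulled from GitHub. pikaPackage.exe together with requestment.txt downloads packages in a pip-like way.

requestment.txt contains the lines below; double-click pikaPackage.exe to run it.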

pikascript-core==v1.12.0
PikaStdLib==v1.12.0

[Figure: directory structure before fetching]

[Figure: the fetch in progress]

[Figure: directory structure after fetching]

7.2. Adding PikaScript

In the Makefile of the MAX78000 Hello_World example, add the PikaScript source files and header paths, then build.


7.3. Calling the script interpreter

Create main.py in the PikaScript directory as a test. Its content, shown below, prints an incrementing counter:

count = 0
while True:
    count += 1
    print('count %d' % count)

Then, in the MAX78000 example, call the function generated when this script is precompiled into C code:

/***** Includes *****/
#include <stdio.h>
#include <stdint.h>
#include "mxc_device.h"
#include "led.h"
#include "board.h"
#include "mxc_delay.h"
#include "pikaScript.h"

/***** Definitions *****/

/***** Globals *****/

/***** Functions *****/

// *****************************************************************************
int main(void)
{
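    printf("Hello World!\n");

    // Initialize PikaPython; this also runs the precompiled main.py
    PikaObj* pikaMain = pikaScriptInit();
    (void)pikaMain; // not used further in this example

    // Only reached if main.py returns
    while (1) {
    }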
}

[Figure: the lines added to the example]

7.4. Results

After the program is built and flashed to the MAX78000, the serial output shows count incrementing as main.py specifies.

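8. Summary

This project let me try out the MAX78000, an MCU capable of deep learning inference, on the board provided for the event. It performs well: a low cost yields good results, the peripherals are plentiful, and the documentation is complete.

Training a deep learning model, however, depends heavily on the size of the dataset. I added too little data of my own, and the training results were extremely poor...

While looking into MicroPython, I came across PikaPython, a Python interpreter that is smaller and easier to port. This project got Python scripts running on the MAX78000 with it, but for lack of time, downloading script files to run and interactive execution are not supported yet. If the chance comes up in another event, I may take this further.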

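Attachments
语音识别.7z
Complete code for the voice-controlled lamp: ESP32 code, which converts keywords recognized by the MAX78000 into LED MQTT control commands; ESP8266 code, which receives the LED MQTT control commands and changes the LED state; MAX78000 code, which recognizes keywords and outputs them over the serial port
PikaPython移植.7z
Example project running PikaPython on the MAX78000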