1. Project Introduction
This project uses the MAX78000 Feather board to build a simple human-machine interaction system based on gesture recognition. The network backbone is ResNet-18, modified to fit the resources of the development board; training uses a mix of open-source gesture datasets found online and data we collected ourselves; the trained recognizer drives an LCD-based user interface.
The project implements three interaction features; gesture recognition runs first and selects between the different function pages:
- Puzzle: the pieces of a split-up picture are placed according to the recognized gestures; if the gestures are recognized in the correct order, the complete, correct picture is restored;
- Display of the development board's hardware information;
- Interaction with the board's peripherals: the recognized gesture changes the font color and toggles LED1.
2. Design Approach
(1) Build and debug the model on Windows first, until a gesture-recognition model with good accuracy is obtained
(2) Port it to the ai8x library for the MAX78000 under Linux, then adjust, retrain, quantize, and evaluate the model
(3) Convert the model and deploy it to the development board
(4) Finally, implement gesture recognition and the human-machine interaction features on the board with the LCD
Figure 1. Project design block diagram
Feature demonstration:
① Picture puzzle (Puzz page): a complete picture is split into four parts and converted with the Img2Lcd tool. The conversion is done vertically flipped, in 16-bit true color, and without the image header; the image-size header is added in the program.
The original image is shown below:
After splitting into four parts:
For display, gesture "1" stands for the first of the four parts, gesture "2" for the second, and so on; four gestures each represent a different part of the original image.
After entering this page, a recognized gesture has to be held for about 5 s (until the LCD shows count=7) before the corresponding image part is drawn. The parts are drawn one after another at the bottom of the screen and reassembled into a full image; if the gestures are recognized in the correct order the original picture is restored, otherwise the result differs from the original.
Examples:
(a) If the recognition order is "1", "2", "3", "5", the correct complete image is shown:
(b) If the recognition order is "2", "3", "1", "5", an incorrect image is shown:
(c) If the recognition order is "3", "3", "5", "5", an incorrect image is shown as well:
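For reference, the quarter images can also be prepared without Img2Lcd. The sketch below is illustrative only (it is not the project's actual tooling, and the file name puzzle.png is a placeholder): it splits one picture into four quarters, flips each vertically as the Img2Lcd setting above does, and packs the pixels as 16-bit RGB565 values ready to be dumped into a C array.

from PIL import Image, ImageOps

img = Image.open('puzzle.png').convert('RGB')       # placeholder file name
w, h = img.size
# top-left, top-right, bottom-left, bottom-right quarters
quads = [img.crop((x, y, x + w // 2, y + h // 2))
         for y in (0, h // 2) for x in (0, w // 2)]

def to_rgb565(im):
    im = ImageOps.flip(im)                           # vertically flipped, like the Img2Lcd option
    pixels = []
    for r, g, b in im.getdata():
        pixels.append(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3))
    return pixels                                    # one 16-bit value per pixel

quarter_pixels = [to_rgb565(q) for q in quads]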
② Information display (Info page):
Shows the relevant information about the MAX78000 directly;
③ Peripheral control (Peri page):
The four gestures "1", "2", "3", "5" stand for the font colors red, blue, yellow, and green; the recognized gesture changes the color of the text shown on the LCD (and toggles LED1, as described above).
3. Collecting the Data
(1) Use open-source gesture classification datasets found online and add custom labels
Figure 2. Open-source dataset
(2) Data collected by ourselves
Figure 3. Self-collected dataset
4. Training Results and Key Code
4.1 Training results
(1) Windows:
Figure 4. Training results on Windows
(2) Linux training results
Figure 5. Training results on Linux
(3) Quantization results
Figure 6. Quantization results
(4) Evaluation results
Figure 7. Evaluation results
(5) Generating the .npy file
Figure 8. Generated .npy file
(6) Model conversion
Figure 9. Model conversion results
4.2 Key code
4.2.1 Dataset splitting and label generation
The data are split 9:1 into training and test sets; the script walks each class folder, pairs every file with its label, and writes the label files for the training and test sets: train.txt and test.txt.
import os
import random

# 90% of the images go to the training set
train_ratio = 0.9
# the rest form the test set
test_ratio = 1 - train_ratio

# class_all = ["0", "1", "2", "3", "4", "5"]
class_all = ["0", "1", "2", "3", "4"]
root_data = "/home/xb/Maxim/ai8x-training/data/gesture/self_dataset/gesture/"
train_list, test_list = [], []
data_list = []

for j in range(len(class_all)):
    temp = class_all[j]                      # class folder name; j is also the label index
    root = root_data + temp + "/"
    for a, b, c in os.walk(root):
        # a: directory path, b: sub-directories (empty here), c: file names in the folder
        for i in range(len(c)):
            data_list.append(os.path.join(a, c[i]))
        for i in range(0, int(len(c) * train_ratio)):
            train_data = os.path.join(a, c[i]) + ' ' + str(j) + '\n'
            train_list.append(train_data)
        for i in range(int(len(c) * train_ratio), len(c)):
            test_data = os.path.join(a, c[i]) + ' ' + str(j) + '\n'
            test_list.append(test_data)

print("len(test):", len(test_list))
print("len(train):", len(train_list))
random.shuffle(train_list)
random.shuffle(test_list)

with open('train.txt', 'w', encoding='UTF-8') as f:
    for train_img in train_list:
        f.write(train_img)
with open('test.txt', 'w', encoding='UTF-8') as f:
    for test_img in test_list:
        f.write(test_img)
4.2.2 Model code
The model keeps the four residual blocks (blk) of ResNet-18 but trims and adapts them: the strides and padding, the kernel sizes of the convolution layers, and the number and type of layers inside each block are modified to fit the board.
import torch.nn as nn
import ai8x


class ResBlk(nn.Module):
    """
    Residual block (modified ResNet-18 basic block).
    """
    def __init__(self, ch_in, ch_out, stride=1, bias=False, **kwargs):
        """
        :param ch_in:  number of input channels
        :param ch_out: number of output channels
        """
        super(ResBlk, self).__init__()
        self.ch_in = ch_in
        self.ch_out = ch_out
        # main branch: max-pool + 3x3 conv, 3x3 conv, 1x1 conv (all fused with ReLU)
        self.conv1 = ai8x.FusedMaxPoolConv2dReLU(ch_in, ch_out, kernel_size=3, pool_size=2,
                                                 pool_stride=2, stride=stride, padding=1,
                                                 bias=bias, **kwargs)
        self.conv2 = ai8x.FusedConv2dReLU(ch_out, ch_out, kernel_size=3, stride=1, padding=1,
                                          bias=bias, **kwargs)
        self.conv4 = ai8x.FusedConv2dReLU(ch_out, ch_out, kernel_size=1, stride=1,
                                          bias=bias, **kwargs)
        # shortcut branch: 1x1 conv followed by an element-wise add
        self.resid1 = ai8x.Add()
        self.extra = ai8x.FusedConv2dReLU(ch_out, ch_out, kernel_size=1, stride=stride,
                                          bias=bias, **kwargs)

    def forward(self, x):
        """
        :param x: [b, ch, h, w]
        """
        x = self.conv1(x)          # pooled + projected input, reused as the shortcut
        out = self.conv2(x)
        out = self.conv4(out)
        out = self.extra(out)
        out = self.resid1(out, x)  # element-wise add with the shortcut
        return out


class ResNet18(nn.Module):
    def __init__(self, num_classes=5, num_channels=3, dimensions=(64, 64), bias=False, **kwargs):
        super(ResNet18, self).__init__()
        # stem: 3x64x64 -> 16x32x32
        self.conv2 = ai8x.FusedMaxPoolConv2dReLU(3, 16, kernel_size=3, pool_size=2, pool_stride=2,
                                                 stride=1, padding=1, bias=bias, **kwargs)
        # four residual blocks, each halving the spatial size with max pooling
        self.blk1 = ResBlk(16, 32, stride=1)   # 32x32 -> 16x16
        self.blk2 = ResBlk(32, 32, stride=1)   # 16x16 -> 8x8
        self.blk3 = ResBlk(32, 64, stride=1)   # 8x8  -> 4x4
        self.blk4 = ResBlk(64, 64, stride=1)   # 4x4  -> 2x2
        self.conv3 = ai8x.FusedConv2dReLU(64, 64, kernel_size=1, stride=1, padding=0,
                                          bias=bias, **kwargs)
        self.outlayer = ai8x.Linear(64 * 2 * 2, num_classes)   # 5 gesture classes

    def forward(self, x):
        x = self.conv2(x)          # [b, 3, 64, 64] -> [b, 16, 32, 32]
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        x = self.conv3(x)
        x = x.view(x.size(0), -1)  # flatten to [b, 64*2*2]
        x = self.outlayer(x)
        return x


def ai85net_gesture(pretrained=False, **kwargs):
    assert not pretrained
    return ResNet18(**kwargs)


models = [
    {
        'name': 'ai85net_gesture',
        'min_input': 1,
        'dim': 3,
    },
]
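As a quick sanity check of the architecture, the model can be fed one random 64x64 RGB tensor and should return one score per gesture class. This is only a sketch that assumes the ai8x-training environment; the exact ai8x.set_device() arguments may differ between repository versions, so check ai8x.py before running it.

import torch
import ai8x

ai8x.set_device(85, False, False)   # MAX78000 ("AI85"), no simulation, no rounded avg-pool (arguments assumed)
model = ResNet18(num_classes=5, num_channels=3, dimensions=(64, 64), bias=False)
x = torch.randn(1, 3, 64, 64)       # one dummy RGB frame
print(model(x).shape)               # expected: torch.Size([1, 5])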
4.2.3 Dataset loading code
Code that loads the dataset during training: images are resized to 64x64 and converted to tensors, and data augmentation can be added as needed.
"""
figerprint-recognition Dataset
"""
import os
import numpy as np
import torchvision
from torchvision import transforms
from PIL import Image
import ai8x
import torch
from torch.utils.data import Dataset
class FVCDataset(Dataset):
def __init__(self, data_file, transform=None, dir=None):
# 所有图片的绝对路径
# self.datas=os.listdir(data_file)
self.transform = transform
self.dir = dir
fh = open(data_file, 'r')
imgs = []
label = []
for line in fh:
line = line.strip('\n')
line = line.rstrip()
words = line.split()
imgs.append((words[0]))
label.append((words[1]))
self.imgs = imgs
self.label = label
def __getitem__(self, index):
# img_path=self.datas[index]
data = 0
path=self.imgs[index]
img=Image.open(path).convert('RGB')
label = int(self.label[index])
if self.transform is not None:
# data = torch.tensor(np.array(data))
img = self.transform(img) # torch.Size([3, 64, 64])
# print('data:',data.shape)
# return torch.from_numpy(np.array(data)), torch.from_numpy(np.array(label))
#print(img.shape) # torch.Size([9, 64, 64])
return torch.from_numpy(np.array(img)), torch.from_numpy(np.array(label))
def __len__(self):
return len(self.imgs)
def gesture_get_datasets(data, load_train=True, load_test=True):
"""
Load the figerprint-recognition dataset.
The original training dataset is split into training and validation sets (code is
inspired by https://github.com/ZhugeKongan/Fingerprint-Recognition-pytorch-for-mcu).
By default we use a 90:10 (45K:5K) training:validation split.
The output of torchvision datasets are PIL Image images of range [0, 1].
"""
(data_dir, args) = data
#data_dir='/disks/disk2/lishengyan/dataset/fingerprint/'
data_dir="/home/xb/Maxim/ai8x-training/data/gesture/self_dataset/gesture/"
if load_train:
data_path =data_dir+'train.txt'
dir = 'train'
train_transform = transforms.Compose([
# transforms.RandomCrop(32, padding=4),
# transforms.RandomHorizontalFlip(),
transforms.Resize((64,64)),
#transforms.ColorJitter(
# brightness=(0.3, .8),
# contrast=(.7, 1),
# saturation=0.2,
# ),
#transforms.RandomHorizontalFlip(p=0.5),
#transforms.RandomApply(([
# transforms.ColorJitter(),
# ]), p=0.3),
transforms.ToTensor(),
ai8x.normalize(args=args)
])
train_dataset = FVCDataset(data_path,train_transform,dir)
else:
train_dataset = None
if load_test:
data_path = data_dir + 'test.txt'
dir='test'
test_transform = transforms.Compose([
transforms.Resize((64,64)),
transforms.ToTensor(),
ai8x.normalize(args=args)
])
test_dataset = FVCDataset(data_path,test_transform,dir)
if args.truncate_testset:
test_dataset.data = test_dataset.data[:1]
else:
test_dataset = None
return train_dataset, test_dataset
datasets = [
{
'name': 'gesture',
'input': (3, 64, 64),
'output': ('0', '1', '2', '3', '4'),
'loader': gesture_get_datasets,
},
]
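A quick local check of the loader (illustrative only; it assumes the train.txt written by the script in 4.2.1 is in the current directory, and it skips the ai8x normalization step, which needs the framework's args object):

from torchvision import transforms

check_transform = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
ds = FVCDataset('train.txt', transform=check_transform, dir='train')
img, label = ds[0]
print(img.shape, label)             # expected: torch.Size([3, 64, 64]) and a class index 0..4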
4.2.4 YAML configuration for model conversion
The YAML file is written to match the model: the channel count of each layer determines the processors setting, and the residual connections use eltwise: add; the branch that is added back has to be routed through a layer whose operation is passthrough.
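For clarity, the processors / output_processors values below are 64-bit masks with one bit per channel (for up to 64 channels); which bits are set only decides which of the 64 processors hold that layer's data. A small helper (illustrative, not part of the project) reproduces the masks used in this file:

def processor_mask(num_channels, first_processor=0):
    # one bit per channel, optionally shifted to start at a given processor
    return format(((1 << num_channels) - 1) << first_processor, '#018x')

print(processor_mask(3))        # 0x0000000000000007 -> layer 0, 3 RGB channels
print(processor_mask(16, 44))   # 0x0ffff00000000000 -> blk1 input, 16 channels
print(processor_mask(32))       # 0x00000000ffffffff -> 32 channels
print(processor_mask(64))       # 0xffffffffffffffff -> 64 channels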
---
# ai85net_gesture sequential model for the gesture dataset, HWC input data format
arch: ai85net_gesture
dataset: gesture

# layer 0 (stem)
layers:
  - out_offset: 0x1000
    processors: 0x0000000000000007
    operation: conv2d
    max_pool: 2
    pool_stride: 2
    pad: 1
    kernel_size: 3x3
    activate: ReLU
    data_format: HWC

  # blk1
  - processors: 0x0ffff00000000000          # 16 input channels
    out_offset: 0x0000
    operation: conv2d
    activate: ReLU
    max_pool: 2
    pool_stride: 2
    kernel_size: 3x3
    pad: 1
    output_processors: 0x00000000ffffffff   # 32 output channels
  - processors: 0x00000000ffffffff
    out_offset: 0x2000
    write_gap: 1
    operation: passthrough
    output_processors: 0x00000000ffffffff   # 32 channels
    name: res1
  - operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1
    out_offset: 0x4000
    processors: 0x00000000ffffffff
  - operation: conv2d
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    write_gap: 1
    out_offset: 0x6000
    processors: 0x00000000ffffffff
  - operation: conv2d
    out_offset: 0x2004
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    name: res2
    write_gap: 1
    processors: 0x00000000ffffffff

  # layer 6 + blk2 (residual add of res1 and res2)
  - in_sequences: [res1, res2]
    processors: 0x00000000ffffffff
    in_offset: 0x2000
    out_offset: 0x0000
    operation: conv2d
    eltwise: add
    max_pool: 2
    pool_stride: 2
    activate: ReLU
    kernel_size: 3x3
    pad: 1
  - processors: 0x00000000ffffffff
    out_offset: 0x2000
    operation: passthrough
    write_gap: 1
    output_processors: 0x00000000ffffffff
    name: res3
  - operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1
    out_offset: 0x4000
    processors: 0x00000000ffffffff
  - operation: conv2d
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    out_offset: 0x6000
    processors: 0x00000000ffffffff
  - operation: conv2d
    out_offset: 0x2004
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    name: res4
    write_gap: 1
    processors: 0x00000000ffffffff

  # layer 11 + blk3 (residual add of res3 and res4)
  - in_sequences: [res3, res4]
    processors: 0x00000000ffffffff
    in_offset: 0x2000
    out_offset: 0x0000
    operation: conv2d
    eltwise: add
    activate: ReLU
    max_pool: 2
    pool_stride: 2
    kernel_size: 3x3
    pad: 1
  - processors: 0xffffffffffffffff           # 64 channels
    out_offset: 0x2000
    operation: passthrough
    write_gap: 1
    output_processors: 0xffffffffffffffff
    name: res5
  - operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1
    out_offset: 0x4000
    processors: 0xffffffffffffffff
  - operation: conv2d
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    out_offset: 0x6000
    processors: 0xffffffffffffffff
  - operation: conv2d
    out_offset: 0x2004
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    name: res6
    write_gap: 1
    processors: 0xffffffffffffffff

  # layer 16 + blk4 (residual add of res5 and res6)
  - in_sequences: [res5, res6]
    processors: 0xffffffffffffffff
    in_offset: 0x2000
    out_offset: 0x0000
    operation: conv2d
    eltwise: add
    activate: ReLU
    max_pool: 2
    pool_stride: 2
    kernel_size: 3x3
    pad: 1
  - processors: 0xffffffffffffffff
    out_offset: 0x2000
    operation: passthrough
    write_gap: 1
    output_processors: 0xffffffffffffffff
    name: res7
  - operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1
    out_offset: 0x4000
    processors: 0xffffffffffffffff
  - operation: conv2d
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    out_offset: 0x6000
    processors: 0xffffffffffffffff
  - operation: conv2d
    out_offset: 0x2004
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    name: res8
    write_gap: 1
    processors: 0xffffffffffffffff

  # final residual add + 1x1 convolution (conv3)
  - in_sequences: [res7, res8]
    in_offset: 0x2000
    out_offset: 0x0000
    eltwise: add
    operation: conv2d
    kernel_size: 1x1
    pad: 0
    activate: ReLU
    processors: 0xffffffffffffffff
    output_processors: 0xffffffffffffffff

  # classifier - Linear layer (mlp)
  - out_offset: 0x2000
    processors: 0xffffffffffffffff
    operation: mlp
    flatten: true
    output_width: 32
5. Results
The pages are selected with the following gestures:
Home: gesture "0"
Puzz: gesture "1"
Info: gesture "2"
Peri: gesture "3"
Page: gesture "5"
(1) Recognition of gesture 1
Figure 10. Recognition result for gesture 1
(2) Recognition of gesture 2
Figure 11. Recognition result for gesture 2
(3) Recognition of gesture 3
Figure 12. Recognition result for gesture 3
(4) Peri page
Figure 13. Recognition result on the Peri page
(5) Info page
Figure 14. Info page
(6) Puzz page
This figure shows an incorrectly assembled puzzle; a successful assembly is demonstrated in the video;
Figure 15. Recognition result on the Puzz page
6. Problems Encountered
6.1 Severe accuracy drop after quantization
6.1.1 First quantization problem
The first quantization problem appeared while training the demo: accuracy dropped sharply after quantization.
Symptom: during training the accuracy reached 95-98% or higher, but when the quantized model was evaluated, the test-set accuracy was only 4-12%, far below the training accuracy.
The fix follows a WeChat post by the blogger "诸葛灬孔暗". In short: evaluating both the quantized and the unquantized model showed that only the quantized model was degraded, while the unquantized model evaluated normally. This pointed to the quantization step itself: the quantization had probably failed without reporting an error.
Digging into the underlying code showed that the quantizer mishandles checkpoints produced by multi-GPU training, which is the root cause of the problem.
Normally quantized weights look like this:
Figure 16. Normally quantized weights
Weights from a multi-GPU training run look like this:
Figure 17. Weights from multi-GPU training
Figures 16 and 17 are taken from the post mentioned above; I forgot to save screenshots when I ran into the problem, so the post's pictures of the same issue are used instead.
There are two fixes; one is simple, the other is friendlier for further development.
(1) Train on the CPU or a single GPU and adjust the weight keys
Train with a single GPU by adding --gpus 0 to the command line, and comment out the two lines (339-340) of train.py that enable multi-GPU training.
(2) Fix the weight-loading code
Method (1) has limited applicability; for example, it rules out multi-GPU training, so it is better to attack the problem at its root. From the tests we know that evaluation goes wrong when the model has not been quantized (or rather optimized, which normally happens automatically during evaluation). When train.py evaluates a model it first saves the source model to a file prefixed with __obselete, then optimizes the model, tests it, and saves it as a new model file, and finally compares the two models and updates the parameters from the old one; the simplest fix is to comment that last step out.
In the end I tried both methods, and after quantizing the demo again the accuracy was indeed back to normal.
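For reference, the usual symptom of a multi-GPU (nn.DataParallel) checkpoint is that every parameter name carries a "module." prefix that the quantizer does not expect. Below is a minimal sketch of normalizing such a checkpoint before quantization; the file names are placeholders, it assumes the weights live under the state_dict key as ai8x-training checkpoints do, and the exact root cause described in the post may differ in detail.

import torch

ckpt = torch.load('gesture_multi_gpu.pth.tar', map_location='cpu')   # placeholder file name
state = ckpt['state_dict']
ckpt['state_dict'] = {k[len('module.'):] if k.startswith('module.') else k: v
                      for k, v in state.items()}
torch.save(ckpt, 'gesture_single_gpu.pth.tar')                        # placeholder file name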
6.1.2 Second quantization problem
The second accuracy drop after quantization appeared when training my own model; neither of the two fixes above helped, so I had to compare configurations and look for what was different.
After comparing the code and reading the documentation, the cause turned out to be that my model used layers from torch.nn directly. Once they were all replaced with their ai8x equivalents, the quantization accuracy problem was solved.
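A before/after sketch of that replacement (the layer sizes here are placeholders, not the project's actual layers; in the ai8x-training environment ai8x.set_device() must already have been called):

import torch.nn as nn
import ai8x

# before: plain torch.nn layers -- these break the ai8x quantization/evaluation flow
block_nn = nn.Sequential(nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU())

# after: the equivalent fused ai8x layer, which carries the quantization hooks
block_ai8x = ai8x.FusedConv2dReLU(16, 32, kernel_size=3, padding=1, bias=False)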
6.2 Poor prediction accuracy after deployment to the board
Training, quantization, and evaluation all looked fine and the accuracy was high, but once the model was deployed, the network's predictions on the board were very poor, essentially useless.
After repeated testing, comparison, and reading, the only remaining suspect was the YAML configuration: when writing it the first time I had not looked closely at what each field does, which ruined the accuracy after deployment. Reading the official documentation on YAML network descriptions showed that:
- the convolution stride cannot be chosen freely and defaults to 1;
- the pooling stride cannot be set above 3;
- processors and output_processors simply reflect the input and output channel counts (one bit per channel);
- the residual connections must be described with eltwise: add, and the model has to be changed accordingly to use an Add operation.
After revising the YAML file and the model and deploying again, the accuracy returned to a high level.
7. Future Plans
(1) The model can be improved further; the current model is not very accurate for gestures in complex environments and against complex backgrounds;
(2) Upgrade from classification to a detection algorithm to improve practicality;
(3) Increase the size and variety of the dataset.