Jarvis on the MAX78000: Marvel Character Recognition
This project uses the MAX78000, C, and Python to implement Jarvis V0.1, whose main function is recognizing Marvel superhero characters.
Tags
Embedded systems
沐雨
Updated 2024-01-10
Zhejiang Normal University

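1 Project Introduction

Marvel Comics and DC Comics (Detective Comics) are the two giants of the American comics industry. Marvel was founded in 1939 and formally adopted the name Marvel in 1961. Over the decades, countless animations, games, films, and merchandise have been built around Marvel IP, and superheroes such as Spider-Man, Iron Man, and Captain America are loved by audiences across the country. In the Marvel universe there is an artificial intelligence system called J.A.R.V.I.S., which appears as Iron Man's combat assistant. It offers everyday voice interaction, basic image recognition, execution of programmed statements, indexed queries against a cloud database, and basic control of networked devices, and it stands as a representative of strong AI.

Although we cannot yet build a dedicated AI system as powerful as the one in the films, advances in computer vision, deep learning, and image processing are making its functions increasingly feasible. The goal of this project is to use image recognition on the MAX78000 development board to implement Jarvis's basic image recognition function, namely accurately recognizing superheroes from input images. Because the number of Marvel characters is huge (several thousand), recognizing all of them would require resources and compute far beyond what a personal cloud server and a development board can provide. This project therefore selects only 24 well-known heroes for recognition.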

贾维斯.png

Figure 1. Jarvis startup screen from the film

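2 Design Approach

"Jarvis V0.1" has two main functions: recognizing superhero names and recognizing relationships between characters.

Kaggle was founded in Melbourne in 2010 by co-founder and CEO Anthony Goldbloom. It gives developers and data scientists a platform for hosting machine learning competitions, hosting datasets, and writing and sharing code. The training dataset for this project is taken from SH2022 on Kaggle. Link: SH2022 Dataset (kaggle.com)

The dataset contains comic and film images of 24 superheroes, with some variation between images. Most images are 224x224x3, and there are 80,616 images in total, already split into a training set and a test set, which makes the dataset well suited to training a character classification model.

Sample images:

0.jpg 61.jpg 2.jpg

Figure 2. From left to right: Spider-Man, Rocket Raccoon, Captain America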

3 Results

After the button is pressed, a frame is read from the camera, converted into an image, displayed, and saved.

image.png

Figure 3. Reading RGB565 data and converting it
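The conversion step in Figure 3 can be sketched in Python as follows. The host-side script is not shown in this write-up, so the function name `rgb565_to_rgb888` and the default big-endian byte order are assumptions for illustration; the byte order that the camera firmware actually sends should be checked against the board code.

```python
def rgb565_to_rgb888(data: bytes, big_endian: bool = True) -> list:
    """Convert packed RGB565 bytes into a list of (R, G, B) tuples, 8 bits per channel."""
    if len(data) % 2 != 0:
        raise ValueError("RGB565 data must contain an even number of bytes")
    pixels = []
    for i in range(0, len(data), 2):
        # Assemble the 16-bit pixel value (byte order is an assumption, see lead-in)
        if big_endian:
            value = (data[i] << 8) | data[i + 1]
        else:
            value = (data[i + 1] << 8) | data[i]
        r5 = (value >> 11) & 0x1F  # top 5 bits: red
        g6 = (value >> 5) & 0x3F   # middle 6 bits: green
        b5 = value & 0x1F          # low 5 bits: blue
        # Expand to 8 bits by replicating the high bits into the low bits
        r = (r5 << 3) | (r5 >> 2)
        g = (g6 << 2) | (g6 >> 4)
        b = (b5 << 3) | (b5 >> 2)
        pixels.append((r, g, b))
    return pixels
```

The resulting list of tuples can then be handed to an imaging library of your choice for display and saving.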

Photo of the hardware demonstration

image.png

Figure 4. The MAX78000 and the target image being captured

A well-captured image

1.jpg

Running the model to predict and analyze the result

image.png

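As can be seen, the model still recognizes the character successfully even though the green LED interfered during capture and the image shows a washed-out color cast.

4 Implementation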

image.png

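Figure 5. Project block diagram

When I first started this competition, my only development experience was with 32-bit microcontrollers, and I knew nothing about Linux commands or building neural network models. After a few weeks of fumbling around like a headless fly, I decided to start on Windows instead, setting up an environment with Anaconda, PyCharm, and PyTorch for training, so that I could first get a basic understanding of the whole training process, common functions, and dataset handling.

I then trained the model on Linux on an AutoDL cloud server, following the official Maxim documentation, and finally obtained C code that can be flashed to the board.

Below I share some experience organized around these two platforms, in the hope that it helps others.

4.1 Windows Platform

4.1.1 Environment Setup

From a beginner's point of view, installing the prerequisite environment for neural network training is still not easy despite the many tutorials online, especially if you do not yet know basic command-line operations.

The key software to install is PyCharm, Anaconda, and the official MSDK tools (such as Eclipse).

For the first two, I find the tutorial video by "我是土堆" on Bilibili fairly complete; you can follow it step by step.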

【最详细的 Windows PyTorch 入门深度学习环境安装与配置 CPU GPU | 土堆教程】 https://www.bilibili.com/video/BV1S5411X7FY/?share_source=copy_web&vd_source=34de5d98e3835fbec74f3447647c0018

image.png



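The official SDK can generally be installed in two ways. The first is to download and install it by following the commands in the official documentation on GitHub. This requires some command-line experience and stable, fast access to GitHub; otherwise a network error halfway through the download can easily cause the installation to fail. The second, which I generally recommend, is to download an offline installer from a network drive. See the installation tutorial by a fellow participant, Z, at https://www.eetree.cn/project/detail/2187, which gives fairly detailed steps. With this method you only need to adjust the file paths and reuse the commands as given, and the tutorial also covers environment configuration for PyCharm and VS Code.

Following these two tutorials, you should be able to complete the installation smoothly and avoid many detours. Most problems that come up can be solved with a search engine or by asking in the discussion group. I ran into one issue with configuring newer versions of Anaconda and PyCharm, so I add the solution here.

Old version: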

image.png

In older versions, the conda environment is set up by pointing to python.exe. Newer versions no longer have this file, so conda.bat must be selected instead.

New version:

image.png



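4.1.2 Getting Started with Deep Learning

At the start of the project I blindly followed tutorials for deep learning on the embedded platform. I was operating without understanding what the code meant and wasted a lot of time. Once I realized I needed to fill in the fundamentals, I chose to implement some basic functions and study examples with PyTorch in PyCharm, because Windows is more visual and interactive. This may seem unrelated to training for the embedded target, but for a beginner it has a lower barrier to entry and carries over well to code development on the embedded platform.

I recommend the tutorials by two Bilibili creators, "我是土堆" and "同济子豪兄", for getting up to speed quickly.

The video links are as follows:

Fruit image recognition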

https://www.bilibili.com/video/BV12d4y1P7xz/?share_source=copy_web&vd_source=34de5d98e3835fbec74f3447647c0018

image.png


【PyTorch深度学习快速入门教程(绝对通俗易懂!)【小土堆】】 https://www.bilibili.com/video/BV1hE411t7RN/?share_source=copy_web&vd_source=34de5d98e3835fbec74f3447647c0018

image.png


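Since the goal of this project is superhero classification, after finishing the two tutorials above I also tried building a ResNet-18 model myself for the classification task.

ResNet-18 is a residual network; the 18 refers to the number of layers with weights. It is also covered in the official YAML guide, so the model can be ported by keeping the overall structure and replacing the layers with functions from the ai8x library.

Its network structure is as follows: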

image.png

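Figure 6. ResNet-18 network structure

I also used it to train the hero classifier here.

The calling code is shown below (only part of it is shown for brevity; the complete code is uploaded as an attachment to the article):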

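import os

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torch.optim import lr_scheduler

# Use the GPU when available (this definition is assumed; it is not part of the excerpt)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Main entry point: classify images with ResNet-18, train the model, and save its parameters.
# train_model() is defined in the complete code in the attachment.
if __name__ == "__main__":
    # Initialize the ResNet-18 model
    model_ft = torchvision.models.resnet18(pretrained=False)
    # Load the ResNet-18 network parameters
    model_ft.load_state_dict(torch.load('../model/resnet18.pth'))
    # Number of input features of the fc layer
    num_ftrs = model_ft.fc.in_features
    # Replace the fully connected layer so that it outputs one score per class folder
    model_ft.fc = nn.Linear(num_ftrs, len(os.listdir('../data/train')))
    model_ft = model_ft.to(device)
    # Cross-entropy is used as the loss function and SGD with momentum as the optimizer
    criterion = nn.CrossEntropyLoss()
    # Initialize the optimizer
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
    # Decay the learning rate by gamma = 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
    # Call the generic training function, which returns the best model found.
    # The last argument, num_epochs, sets the number of training epochs. It is an
    # important parameter that determines training time and accuracy: too few epochs
    # can leave the model underfitted, and too many can lead to overfitting, so it
    # should be tuned for the best model performance.
    # In this example the dataset is fairly small, so 20 epochs are used. In practice,
    # adjust it according to the size and complexity of the dataset and the
    # performance of the model.
    model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=20)
    # Save the model
    torch.save(model_ft.state_dict(), './models/dobot20231127.pkl')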
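To make the effect of `StepLR(step_size=7, gamma=0.1)` concrete, here is a minimal sketch that reproduces the schedule in plain Python. The helper `step_lr` is hypothetical and only mirrors the formula; in the training code the value is computed by PyTorch, and this sketch assumes the scheduler is stepped once per epoch.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 7, gamma: float = 0.1) -> float:
    """Learning rate after `epoch` completed scheduler steps under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# With lr=0.001 the rate drops by a factor of 10 at epochs 7 and 14
for epoch in (0, 6, 7, 13, 14, 19):
    print(epoch, step_lr(0.001, epoch))
```

Over the 20 epochs used here the rate is therefore reduced twice, which fits the way the training loss keeps falling while the test accuracy levels off late in the run.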
Training results

train Loss: 0.3329 Acc: 0.9176

test Loss: 0.3092 Acc: 0.9065
Epoch 1/19
----------

Epoch 19/19
----------
train Loss: 0.0043 Acc: 0.9992

test Loss: 0.2752 Acc: 0.9249
 
Training complete in 98m 8s

Best test Acc: 0.927536

Prediction results, shown with the original images

image.pngimage.png


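The test accuracy reaches about 92%!

This shows that ResNet on Windows is more than sufficient for training this classification model. This completes the basic introduction on the Windows platform.

Training on your own computer can easily take hours and also makes the machine run hot and loud, so I recommend training on an online server instead, which I describe next.

4.2 Linux Platform

4.2.1 Environment Setup

One advantage of AutoDL is that its servers come with Anaconda and Python preinstalled, so you can simply follow, step by step, the tutorial written by a participant from the previous contest who used Alibaba Cloud.

Article link: https://www.eetree.cn/project/detail/1333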

image.png


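During this work I frequently ran into three problems.

1. XXX Permission denied

This mainly happens when files are created directly in the file browser during JupyterLab development and the permissions end up inconsistent.

Solution: run "chmod -R 777 FILENAME" in the terminal to make the file readable, writable, and executable by everyone.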


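2. No module named XXX

For example:

image.png

This happens when a required package has not been installed. It can usually be fixed with pip install.

For example: pip install xxhash

Some packages download slowly because they are hosted on overseas mirrors. I recommend enabling AutoDL's academic acceleration before installing.

In the terminal, run: source /etc/network_turbo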
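Before starting a long run, it can help to check up front which modules are missing instead of discovering them one error at a time. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the list of module names are illustrative, and the pip package name can differ from the import name (for example, the `yaml` module is provided by PyYAML).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "xxhash" is the example from the text; the other names are illustrative
    for name in missing_modules(["xxhash", "torch", "yaml"]):
        print(f"missing module: {name}")
```

Running this inside the activated conda environment lists what still needs to be installed with pip.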


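3. Code does not run

This can have several causes. Note that when editing .py and similar files, inconsistent indentation can creep in because of the input method or errors during copy and paste. Copying the code into PyCharm and using its indentation display quickly locates the problem.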
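If PyCharm is not at hand on the server, the same indentation check can be approximated with a few lines of Python. This is a minimal sketch that assumes the file is meant to be indented with ASCII spaces only; the function name `non_space_indentation` is hypothetical. It flags lines indented with tabs or with full-width spaces, which are easy to introduce with a Chinese input method.

```python
def non_space_indentation(source: str) -> list:
    """Return 1-based numbers of lines whose indentation contains anything but ASCII spaces."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        # Leading whitespace of the line, including tabs and full-width spaces
        indent = line[: len(line) - len(line.lstrip())]
        if any(ch != " " for ch in indent):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = "def f():\n    a = 1\n\tb = 2\n \tc = 3\n\u3000d = 4\n"
    print(non_space_indentation(sample))
```

Python also ships the `tabnanny` module (run as `python -m tabnanny file.py`), which reports ambiguous mixes of tabs and spaces.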

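In addition, some code may fail when run directly with python XXX but works when wrapped in a script file and invoked through that script, so I recommend running code through script files whenever possible.

The last cause is environment activation. It is easy to run code without having activated the environment, which produces all kinds of missing-module errors. I recommend saving frequently used environment-switching commands as a block in a document on your own computer so they can be pasted quickly, which noticeably improves efficiency.

For example: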

/root/autodl-tmp/78000/ai8x-synthesis/sample
/root/autodl-tmp/78000/ai8x-training/data/data/test
autodl-tmp/78000/ai8x-synthesis/tests/convert_sample.py
tests/convert_sample.py

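This quickly converts the sample file.


4.2.2 Building the Model

The official documentation includes an introduction to residual network models; see: MaximIntegratedAI/ai8x-training: model training for ADI's MAX78000 and MAX78002 AI devices (github.com)

Since I had already obtained good training results on Windows, my first idea was to reuse that model code, but it produced all sorts of errors. Although Maxim's tools are also based on PyTorch for model development and deployment, the CNN building blocks have been rewritten to support quantization and deployment on the MAX78000. Developing your own AI model therefore starts with rewriting it using Maxim's custom PyTorch classes, and any model intended to run on the MAX78000 should use these classes. Compared with the default torch.nn.Module classes, ai8x.py makes three main changes:

  1. Additional "fused" operations for the pooling and activation layers in a model;
  2. Rounding and clipping that match the hardware;
  3. Support for quantized operation (when the -8 command-line argument is used).

Data transformation and preprocessing: (/datasets/fa.sh)

Images are resized to 64x64, and brightness changes, random flips, and similar augmentations are applied so that the model generalizes better.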

import os

from torchvision import transforms
from torchvision.datasets import ImageFolder
import ai8x


def marvel_get_datasets(data, load_train=True, load_test=True):
    """Load the hero dataset: augment the training images and resize everything to 64x64."""
    (data_dir, args) = data

    if load_train:
        train_transform = transforms.Compose([
            # transforms.GaussianBlur(kernel_size=(3, 3), sigma=(0.1, 5)),
            transforms.RandomAffine(degrees=10, translate=(0.05, 0.05), shear=5),
            transforms.RandomPerspective(distortion_scale=0.3, p=0.2),
            transforms.RandomHorizontalFlip(),
            transforms.ColorJitter(brightness=0.7, contrast=0.7, saturation=0.7),
            # transforms.RandomGrayscale(0.2),
            transforms.Resize((64, 64)),
            transforms.ToTensor(),
            ai8x.normalize(args=args),
        ])

        train_dataset = ImageFolder(root='/root/autodl-tmp/78000/ai8x-training/data/data/train', transform=train_transform)
    else:
        train_dataset = None

    if load_test:
        # No augmentation for the test set: only resize and normalize
        test_transform = transforms.Compose([
            transforms.Resize((64, 64)),
            transforms.ToTensor(),
            ai8x.normalize(args=args),
        ])

        test_dataset = ImageFolder(root='/root/autodl-tmp/78000/ai8x-training/data/data/test', transform=test_transform)
    else:
        test_dataset = None

    return train_dataset, test_dataset


# Dataset description picked up by train.py through the --dataset argument
datasets = [
    {
        'name': 'fa',
        'input': (3, 64, 64),
        'output': ('Ant Man', 'Aquaman', 'Batman', 'Black Panther', 'Black Widow', 'Captain America', 'Captain Marvel', 'Deadpool', 'Dr Strange', 'Falcon', 'Flash', 'Green Lantern', 'Hawkeye', 'Hulk', 'Iron Man', 'Joker', 'Nebula', 'Rocket Raccon', 'Scarlet Witch', 'Shazam', 'Spider Man', 'Super Man', 'Thor', 'Wonder Woman'),
        'loader': marvel_get_datasets,
    },
]

if __name__ == '__main__':
    from PIL import Image
    # Quick visual check: dump the first 20 transformed samples of each split as PNG files.
    # ai8x.normalize may need a real args object here; None is kept as in the excerpt.
    train_dataset, test_dataset = marvel_get_datasets(('/root/autodl-tmp/78000/ai8x-training/data/data', None))
    train_out_dir = '/root/autodl-tmp/78000/ai8x-training/data/data/train'
    test_out_dir = '/root/autodl-tmp/78000/ai8x-training/data/data/test'
    transform = transforms.ToPILImage()
    for i in range(20):
        img = transform(train_dataset[i][0])
        img.save(os.path.join(train_out_dir, f'train{i}.png'))
        # Test samples go to the test output directory
        img = transform(test_dataset[i][0])
        img.save(os.path.join(test_out_dir, f'test{i}.png'))

 

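The simple ResNet-style model rewritten with the library is as follows: (/models/famodule.py)

As written below, the network has 10 convolution layers, four of which are fused with 2x2 max pooling, plus three residual additions. The resulting features are classified by a fully connected layer.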

import torch
import torch.nn as nn

import ai8x


class AI85Net_f1(nn.Module):
    """
    Residual network rewritten with the ai8x library
    """
    def __init__(self, num_classes=24, num_channels=3, dimensions=(64, 64), bias=False, **kwargs):
        super().__init__()

        # self.conv1 = ai8x.FusedConv2dReLU(num_channels, 16, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.conv1 = ai8x.FusedMaxPoolConv2dReLU(num_channels, 16, 3, pool_size=2, pool_stride=2,
                                                 stride=1, padding=1, bias=bias, **kwargs)
        self.conv2 = ai8x.FusedConv2dReLU(16, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.conv3 = ai8x.FusedConv2dReLU(20, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.conv4 = ai8x.FusedConv2dReLU(20, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.resid1 = ai8x.Add()
        self.conv5 = ai8x.FusedMaxPoolConv2dReLU(20, 20, 3, pool_size=2, pool_stride=2,
                                                 stride=1, padding=1, bias=bias, **kwargs)
        self.conv6 = ai8x.FusedConv2dReLU(20, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.resid2 = ai8x.Add()
        self.conv7 = ai8x.FusedConv2dReLU(20, 44, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.conv8 = ai8x.FusedMaxPoolConv2dReLU(44, 48, 3, pool_size=2, pool_stride=2,
                                                 stride=1, padding=1, bias=bias, **kwargs)
        self.conv9 = ai8x.FusedConv2dReLU(48, 48, 3, stride=1, padding=1, bias=bias, **kwargs)
        self.resid3 = ai8x.Add()
        self.conv10 = ai8x.FusedMaxPoolConv2dReLU(48, 32, 3, pool_size=2, pool_stride=2,
                                                  stride=1, padding=0, bias=bias, **kwargs)
        # self.conv11 = ai8x.FusedAvgPoolConv2dReLU(96, 32, 1, pool_size=2, pool_stride=1,
        #                                           padding=0, bias=bias, **kwargs)
        self.fc = ai8x.Linear(32*2*2, num_classes, bias=True, wide=True, **kwargs)
        # Kaiming initialization for every convolution inside the fused modules
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def forward(self, x):  # pylint: disable=arguments-differ
        """Forward prop"""

        x = self.conv1(x)  # 16x32x32
        x_res = self.conv2(x)  # 20x32x32
        x = self.conv3(x_res)  # 20x32x32
        x = self.resid1(x, x_res)  # 20x32x32
        x = self.conv4(x)  # 20x32x32
        x_res = self.conv5(x)  # 20x16x16
        x = self.conv6(x_res)  # 20x16x16
        x = self.resid2(x, x_res)  # 20x16x16
        x = self.conv7(x)  # 44x16x16
        x_res = self.conv8(x)  # 48x8x8
        x = self.conv9(x_res)  # 48x8x8
        x = self.resid3(x, x_res)  # 48x8x8
        x = self.conv10(x)  # 32x2x2
        # x = self.conv11(x)
        # print(x.size())
        x = x.view(x.size(0), -1)  # flatten to 128 features
        x = self.fc(x)
        return x


def ai85fa1(pretrained=False, **kwargs):
    """
    Constructs an AI85Net_f1 model.
    """
    assert not pretrained
    return AI85Net_f1(**kwargs)


# Model description picked up by train.py through the --model argument
models = [
    {
        'name': 'ai85fa1',
        'min_input': 1,
        'dim': 2,
    },

]
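The feature-map sizes and the size of the fully connected layer can be checked without running PyTorch. The sketch below traces a 64x64 input through the ten convolution layers of `AI85Net_f1` using the standard convolution output-size formula and counts the weights (biases excluded). The table `LAYERS` and the helper names are written for this illustration, and the trace assumes that the fused max-pool layers pool before they convolve.

```python
def conv_out(size: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Output width/height of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, in_channels, out_channels, 2x2 max pool before the conv, padding)
LAYERS = [
    ("conv1", 3, 16, True, 1),
    ("conv2", 16, 20, False, 1),
    ("conv3", 20, 20, False, 1),
    ("conv4", 20, 20, False, 1),
    ("conv5", 20, 20, True, 1),
    ("conv6", 20, 20, False, 1),
    ("conv7", 20, 44, False, 1),
    ("conv8", 44, 48, True, 1),
    ("conv9", 48, 48, False, 1),
    ("conv10", 48, 32, True, 0),
]


def trace(size: int = 64, num_classes: int = 24):
    """Return per-layer output shapes, the fc input size, and the total weight count."""
    shapes, weights = [], 0
    for name, c_in, c_out, pooled, pad in LAYERS:
        if pooled:
            size //= 2  # 2x2 max pooling with stride 2 halves the resolution
        size = conv_out(size, padding=pad)
        shapes.append((name, c_out, size, size))
        weights += c_in * c_out * 3 * 3  # 3x3 kernels, no bias
    fc_inputs = LAYERS[-1][2] * size * size
    weights += fc_inputs * num_classes
    return shapes, fc_inputs, weights


shapes, fc_inputs, weights = trace()
for name, c, h, w in shapes:
    print(f"{name}: {c}x{h}x{w}")
print("fc inputs:", fc_inputs, "weights:", weights)
```

The trace ends at 32x2x2, which is why the fully connected layer takes 32*2*2 = 128 inputs, and the weight total of 82272 agrees with the "Params: 82272" line in the training log in section 4.2.3.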


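4.2.3 Training Log

Because the model was not optimal at first and various problems came up along the way, only the training results of some models are shown here (trained for 1 hour on a 3090 GPU).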

2023-12-22 13:12:03,013 - Log file for this run: /root/autodl-tmp/78000/ai8x-training/logs/2023.12.22-131203/2023.12.22-131203.log
2023-12-22 13:12:06,479 - Optimizer Type: <class 'torch.optim.adam.Adam'>
2023-12-22 13:12:06,479 - Optimizer Args: {'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0.0, 'amsgrad': False}
2023-12-22 13:12:06,791 - Dataset sizes:
training=79835
validation=759
test=759
2023-12-22 13:12:06,791 - Reading compression schedule from: policies/schedule-marvel.yaml
2023-12-22 13:12:06,796 -

2023-12-22 13:12:06,796 - Training epoch: 79835 samples (128 per mini-batch)
2023-12-22 13:12:10,066 - Epoch: [0][ 10/ 624] Overall Loss 3.189340 Objective Loss 3.189340 LR 0.001000 Time 0.326921
2023-12-22 13:12:11,186 - Epoch: [0][ 20/ 624] Overall Loss 3.186848 Objective Loss 3.186848 LR 0.001000 Time 0.219448
2023-12-22 13:12:12,685 - Epoch: [0][ 30/ 624] Overall Loss 3.183414 Objective Loss 3.183414 LR 0.001000 Time 0.196226
2023-12-22 13:12:13,728 - Epoch: [0][ 40/ 624] Overall Loss 3.180689 Objective Loss 3.180689 LR 0.001000 Time 0.173231
2023-12-22 13:12:15,407 - Epoch: [0][ 50/ 624] Overall Loss 3.178444 Objective Loss 3.178444 LR 0.001000 Time 0.172155
2023-12-22 13:12:16,589 - Epoch: [0][ 60/ 624] Overall Loss 3.177114 Objective Loss 3.177114 LR 0.001000 Time 0.163169
2023-12-22 13:12:18,018 - Epoch: [0][ 70/ 624] Overall Loss 3.175762 Objective Loss 3.175762 LR 0.001000 Time 0.160263
2023-12-22 13:12:19,029 - Epoch: [0][ 80/ 624] Overall Loss 3.174467 Objective Loss 3.174467 LR 0.001000 Time 0.152857
2023-12-22 13:12:20,453 - Epoch: [0][ 90/ 624] Overall Loss 3.172292 Objective Loss 3.172292 LR 0.001000 Time 0.151690
2023-12-22 13:12:21,470 - Epoch: [0][ 100/ 624] Overall Loss 3.170104 Objective Loss 3.170104 LR 0.001000 Time 0.146688
2023-12-22 13:12:23,309 - Epoch: [0][ 110/ 624] Overall Loss 3.167526 Objective Loss 3.167526 LR 0.001000 Time 0.150068
2023-12-22 13:12:24,535 - Epoch: [0][ 120/ 624] Overall Loss 3.163776 Objective Loss 3.163776 LR 0.001000 Time 0.147780
2023-12-22 13:12:25,988 - Epoch: [0][ 130/ 624] Overall Loss 3.160534 Objective Loss 3.160534 LR 0.001000 Time 0.147587
2023-12-22 13:12:27,010 - Epoch: [0][ 140/ 624] Overall Loss 3.156668 Objective Loss 3.156668 LR 0.001000 Time 0.144340
2023-12-22 13:12:28,495 - Epoch: [0][ 150/ 624] Overall Loss 3.151955 Objective Loss 3.151955 LR 0.001000 Time 0.144616
2023-12-22 13:12:29,515 - Epoch: [0][ 160/ 624] Overall Loss 3.149120 Objective Loss 3.149120 LR 0.001000 Time 0.141946
2023-12-22 13:12:31,386 - Epoch: [0][ 170/ 624] Overall Loss 3.145820 Objective Loss 3.145820 LR 0.001000 Time 0.144603
2023-12-22 13:12:32,404 - Epoch: [0][ 180/ 624] Overall Loss 3.142398 Objective Loss 3.142398 LR 0.001000 Time 0.142221
2023-12-22 13:12:33,800 - Epoch: [0][ 190/ 624] Overall Loss 3.138780 Objective Loss 3.138780 LR 0.001000 Time 0.142080
2023-12-22 13:12:34,812 - Epoch: [0][ 200/ 624] Overall Loss 3.134052 Objective Loss 3.134052 LR 0.001000 Time 0.140034
2023-12-22 13:12:36,429 - Epoch: [0][ 210/ 624] Overall Loss 3.130013 Objective Loss 3.130013 LR 0.001000 Time 0.141063
2023-12-22 13:12:37,745 - Epoch: [0][ 220/ 624] Overall Loss 3.127031 Objective Loss 3.127031 LR 0.001000 Time 0.140633
2023-12-22 13:12:39,164 - Epoch: [0][ 230/ 624] Overall Loss 3.123732 Objective Loss 3.123732 LR 0.001000 Time 0.140685
2023-12-22 13:12:40,151 - Epoch: [0][ 240/ 624] Overall Loss 3.121415 Objective Loss 3.121415 LR 0.001000 Time 0.138933
2023-12-22 13:12:41,590 - Epoch: [0][ 250/ 624] Overall Loss 3.118425 Objective Loss 3.118425 LR 0.001000 Time 0.139130
2023-12-22 13:12:42,603 - Epoch: [0][ 260/ 624] Overall Loss 3.115832 Objective Loss 3.115832 LR 0.001000 Time 0.137676
2023-12-22 13:12:44,020 - Epoch: [0][ 270/ 624] Overall Loss 3.112909 Objective Loss 3.112909 LR 0.001000 Time 0.137824
2023-12-22 13:12:45,024 - Epoch: [0][ 280/ 624] Overall Loss 3.109520 Objective Loss 3.109520 LR 0.001000 Time 0.136484
2023-12-22 13:12:46,458 - Epoch: [0][ 290/ 624] Overall Loss 3.106657 Objective Loss 3.106657 LR 0.001000 Time 0.136723
2023-12-22 13:12:47,472 - Epoch: [0][ 300/ 624] Overall Loss 3.103554 Objective Loss 3.103554 LR 0.001000 Time 0.135542
2023-12-22 13:12:48,964 - Epoch: [0][ 310/ 624] Overall Loss 3.100075 Objective Loss 3.100075 LR 0.001000 Time 0.135981
2023-12-22 13:12:49,955 - Epoch: [0][ 320/ 624] Overall Loss 3.096546 Objective Loss 3.096546 LR 0.001000 Time 0.134827
2023-12-22 13:12:51,892 - Epoch: [0][ 330/ 624] Overall Loss 3.095006 Objective Loss 3.095006 LR 0.001000 Time 0.136611
2023-12-22 13:12:53,000 - Epoch: [0][ 340/ 624] Overall Loss 3.092562 Objective Loss 3.092562 LR 0.001000 Time 0.135850
2023-12-22 13:12:54,766 - Epoch: [0][ 350/ 624] Overall Loss 3.089351 Objective Loss 3.089351 LR 0.001000 Time 0.137013
2023-12-22 13:12:55,745 - Epoch: [0][ 360/ 624] Overall Loss 3.085655 Objective Loss 3.085655 LR 0.001000 Time 0.135927
2023-12-22 13:12:57,136 - Epoch: [0][ 370/ 624] Overall Loss 3.082963 Objective Loss 3.082963 LR 0.001000 Time 0.136011
2023-12-22 13:12:58,519 - Epoch: [0][ 380/ 624] Overall Loss 3.079355 Objective Loss 3.079355 LR 0.001000 Time 0.136071
2023-12-22 13:12:59,612 - Epoch: [0][ 390/ 624] Overall Loss 3.077248 Objective Loss 3.077248 LR 0.001000 Time 0.135384
2023-12-22 13:13:01,280 - Epoch: [0][ 400/ 624] Overall Loss 3.073946 Objective Loss 3.073946 LR 0.001000 Time 0.136167
2023-12-22 13:13:02,591 - Epoch: [0][ 410/ 624] Overall Loss 3.071275 Objective Loss 3.071275 LR 0.001000 Time 0.136042
2023-12-22 13:13:04,032 - Epoch: [0][ 420/ 624] Overall Loss 3.068194 Objective Loss 3.068194 LR 0.001000 Time 0.136232
2023-12-22 13:13:04,975 - Epoch: [0][ 430/ 624] Overall Loss 3.065454 Objective Loss 3.065454 LR 0.001000 Time 0.135256
2023-12-22 13:13:06,342 - Epoch: [0][ 440/ 624] Overall Loss 3.062620 Objective Loss 3.062620 LR 0.001000 Time 0.135290
2023-12-22 13:13:07,330 - Epoch: [0][ 450/ 624] Overall Loss 3.061413 Objective Loss 3.061413 LR 0.001000 Time 0.134477
2023-12-22 13:13:08,720 - Epoch: [0][ 460/ 624] Overall Loss 3.061051 Objective Loss 3.061051 LR 0.001000 Time 0.134575
2023-12-22 13:13:10,122 - Epoch: [0][ 470/ 624] Overall Loss 3.059812 Objective Loss 3.059812 LR 0.001000 Time 0.134693
2023-12-22 13:13:11,853 - Epoch: [0][ 480/ 624] Overall Loss 3.057819 Objective Loss 3.057819 LR 0.001000 Time 0.135494
2023-12-22 13:13:13,188 - Epoch: [0][ 490/ 624] Overall Loss 3.055687 Objective Loss 3.055687 LR 0.001000 Time 0.135452
2023-12-22 13:13:14,633 - Epoch: [0][ 500/ 624] Overall Loss 3.053592 Objective Loss 3.053592 LR 0.001000 Time 0.135632
2023-12-22 13:13:15,636 - Epoch: [0][ 510/ 624] Overall Loss 3.052252 Objective Loss 3.052252 LR 0.001000 Time 0.134937
2023-12-22 13:13:17,285 - Epoch: [0][ 520/ 624] Overall Loss 3.050222 Objective Loss 3.050222 LR 0.001000 Time 0.135513
2023-12-22 13:13:18,585 - Epoch: [0][ 530/ 624] Overall Loss 3.048126 Objective Loss 3.048126 LR 0.001000 Time 0.135408
2023-12-22 13:13:20,025 - Epoch: [0][ 540/ 624] Overall Loss 3.046032 Objective Loss 3.046032 LR 0.001000 Time 0.135567
2023-12-22 13:13:21,014 - Epoch: [0][ 550/ 624] Overall Loss 3.043823 Objective Loss 3.043823 LR 0.001000 Time 0.134899
2023-12-22 13:13:22,451 - Epoch: [0][ 560/ 624] Overall Loss 3.041580 Objective Loss 3.041580 LR 0.001000 Time 0.135056
2023-12-22 13:13:23,457 - Epoch: [0][ 570/ 624] Overall Loss 3.039217 Objective Loss 3.039217 LR 0.001000 Time 0.134451
2023-12-22 13:13:24,921 - Epoch: [0][ 580/ 624] Overall Loss 3.037450 Objective Loss 3.037450 LR 0.001000 Time 0.134657
2023-12-22 13:13:25,974 - Epoch: [0][ 590/ 624] Overall Loss 3.035085 Objective Loss 3.035085 LR 0.001000 Time 0.134158
2023-12-22 13:13:27,373 - Epoch: [0][ 600/ 624] Overall Loss 3.033019 Objective Loss 3.033019 LR 0.001000 Time 0.134252
2023-12-22 13:13:28,462 - Epoch: [0][ 610/ 624] Overall Loss 3.031329 Objective Loss 3.031329 LR 0.001000 Time 0.133837
2023-12-22 13:13:30,351 - Epoch: [0][ 620/ 624] Overall Loss 3.030050 Objective Loss 3.030050 LR 0.001000 Time 0.134724
2023-12-22 13:13:30,959 - Epoch: [0][ 624/ 624] Overall Loss 3.029147 Objective Loss 3.029147 Top1 10.502283 Top5 53.881279 LR 0.001000 Time 0.134835
2023-12-22 13:13:31,049 - --- validate (epoch=0)-----------
2023-12-22 13:13:31,049 - 759 samples (128 per mini-batch)
2023-12-22 13:13:32,375 - Epoch: [0][ 6/ 6] Loss 2.886372 Top1 13.833992 Top5 45.059289
2023-12-22 13:13:32,460 - ==> Top1: 13.834 Top5: 45.059 Loss: 2.886

2023-12-22 13:13:32,466 - ==> Best [Top1: 13.834 Top5: 45.059 Sparsity:0.00 Params: 82272 on epoch: 0]
2023-12-22 13:13:32,466 - Saving checkpoint to: logs/2023.12.22-131203/checkpoint.pth.tar
2023-12-22 13:13:32,480 -

......
2023-12-22 13:55:24,272 - Epoch: [30][ 490/ 624] Overall Loss 1.537187 Objective Loss 1.537187 LR 0.001000 Time 0.132512
2023-12-22 13:55:25,615 - Epoch: [30][ 500/ 624] Overall Loss 1.537269 Objective Loss 1.537269 LR 0.001000 Time 0.132546
2023-12-22 13:55:27,101 - Epoch: [30][ 510/ 624] Overall Loss 1.537550 Objective Loss 1.537550 LR 0.001000 Time 0.132860
2023-12-22 13:55:28,127 - Epoch: [30][ 520/ 624] Overall Loss 1.537369 Objective Loss 1.537369 LR 0.001000 Time 0.132278
2023-12-22 13:55:29,576 - Epoch: [30][ 530/ 624] Overall Loss 1.536828 Objective Loss 1.536828 LR 0.001000 Time 0.132515
2023-12-22 13:55:30,619 - Epoch: [30][ 540/ 624] Overall Loss 1.536296 Objective Loss 1.536296 LR 0.001000 Time 0.131993
2023-12-22 13:55:32,055 - Epoch: [30][ 550/ 624] Overall Loss 1.536633 Objective Loss 1.536633 LR 0.001000 Time 0.132203
2023-12-22 13:55:33,048 - Epoch: [30][ 560/ 624] Overall Loss 1.537612 Objective Loss 1.537612 LR 0.001000 Time 0.131614
2023-12-22 13:55:34,765 - Epoch: [30][ 570/ 624] Overall Loss 1.537210 Objective Loss 1.537210 LR 0.001000 Time 0.132317
2023-12-22 13:55:35,867 - Epoch: [30][ 580/ 624] Overall Loss 1.536446 Objective Loss 1.536446 LR 0.001000 Time 0.131935
2023-12-22 13:55:37,263 - Epoch: [30][ 590/ 624] Overall Loss 1.536435 Objective Loss 1.536435 LR 0.001000 Time 0.132064
2023-12-22 13:55:38,350 - Epoch: [30][ 600/ 624] Overall Loss 1.535875 Objective Loss 1.535875 LR 0.001000 Time 0.131674
2023-12-22 13:55:39,778 - Epoch: [30][ 610/ 624] Overall Loss 1.535607 Objective Loss 1.535607 LR 0.001000 Time 0.131856
2023-12-22 13:55:40,808 - Epoch: [30][ 620/ 624] Overall Loss 1.534582 Objective Loss 1.534582 LR 0.001000 Time 0.131390
2023-12-22 13:55:41,324 - Epoch: [30][ 624/ 624] Overall Loss 1.534226 Objective Loss 1.534226 Top1 59.817352 Top5 83.105023 LR 0.001000 Time 0.131374
2023-12-22 13:55:41,416 - --- validate (epoch=30)-----------
2023-12-22 13:55:41,416 - 759 samples (128 per mini-batch)
2023-12-22 13:55:42,817 - Epoch: [30][ 6/ 6] Loss 1.720627 Top1 53.754941 Top5 83.794466
2023-12-22 13:55:42,908 - ==> Top1: 53.755 Top5: 83.794 Loss: 1.721

2023-12-22 13:55:42,910 - ==> Best [Top1: 55.468 Top5: 86.430 Sparsity:0.00 Params: 82272 on epoch: 24]
2023-12-22 13:55:42,910 - Saving checkpoint to: logs/2023.12.22-131203/qat_checkpoint.pth.tar
2023-12-22 13:55:42,930 -

2023-12-22 13:55:42,930 - Training epoch: 79835 samples (128 per mini-batch)
2023-12-22 13:55:45,833 - Epoch: [31][ 10/ 624] Overall Loss 1.496132 Objective Loss 1.496132 LR 0.001000 Time 0.290178
2023-12-22 13:55:47,135 - Epoch: [31][ 20/ 624] Overall Loss 1.486714 Objective Loss 1.486714 LR 0.001000 Time 0.210148
2023-12-22 13:55:47,643 -
2023-12-22 13:55:47,644 - Log file for this run: /root/autodl-tmp/78000/ai8x-training/logs/2023.12.22-131203/2023.12.22-131203.log
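With logs this long it is easier to extract the validation summaries programmatically than to scroll for them. The following is a minimal sketch that pulls the "==> Top1 / Top5 / Loss" lines out of a log in the format shown above. The regular expression and the function name `parse_validation` are written for this illustration and are not part of ai8x-training.

```python
import re

# Matches the per-epoch validation summary lines, not the "==> Best [...]" lines
VALIDATE_RE = re.compile(r"==> Top1: ([\d.]+)\s+Top5: ([\d.]+)\s+Loss: ([\d.]+)")


def parse_validation(log_text: str):
    """Return a list of (top1, top5, loss) tuples, one per validation summary line."""
    return [tuple(float(x) for x in m.groups()) for m in VALIDATE_RE.finditer(log_text)]


sample = """\
2023-12-22 13:13:32,460 - ==> Top1: 13.834 Top5: 45.059 Loss: 2.886
2023-12-22 13:13:32,466 - ==> Best [Top1: 13.834 Top5: 45.059 Sparsity:0.00 Params: 82272 on epoch: 0]
2023-12-22 13:55:42,908 - ==> Top1: 53.755 Top5: 83.794 Loss: 1.721
"""

results = parse_validation(sample)
best = max(results, key=lambda r: r[0])
print("epochs parsed:", len(results), "best Top1:", best[0])
```

Applied to the full log file, the same function gives the Top1 curve over all epochs, which makes it easy to see where accuracy stops improving.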

4.2.4 Model Quantization (autodl-tmp/78000/ai8x-synthesis/scripts/quantize_fa.sh)

#!/bin/sh
python quantize.py ./trained/ai8x-new64-qat8.pth.tar ./trained/ai8x-new641-qat8-q.pth.tar --device MAX78000 -v "$@" -c networks/hero.yaml

Output:

(max78000) root@autodl-container-ee5311983c-c38dbca7:~/autodl-tmp/78000/ai8x-synthesis# conda activate max78000
(max78000) root@autodl-container-ee5311983c-c38dbca7:~/autodl-tmp/78000/ai8x-synthesis# cd /root/autodl-tmp/78000/ai8x-synthesis
(max78000) root@autodl-container-ee5311983c-c38dbca7:~/autodl-tmp/78000/ai8x-synthesis# scripts/quantize_fa.sh
Configuring device: MAX78000
Reading networks/hero.yaml to configure network...
NOTICE: Defaulting to "no activation" for mlp in layer sequence 13 in YAML configuration.
Converting checkpoint file ./trained/ai8x-new64-qat8.pth.tar to ./trained/ai8x-new641-qat8-q.pth.tar

Model keys (state_dict):
conv1.output_shift, conv1.weight_bits, conv1.bias_bits, conv1.quantize_activation, conv1.adjust_output_shift, conv1.shift_quantile, conv1.op.weight, conv1.op.bias, conv2.output_shift, conv2.weight_bits, conv2.bias_bits, conv2.quantize_activation, conv2.adjust_output_shift, conv2.shift_quantile, conv2.op.weight, conv2.op.bias, conv3.output_shift, conv3.weight_bits, conv3.bias_bits, conv3.quantize_activation, conv3.adjust_output_shift, conv3.shift_quantile, conv3.op.weight, conv3.op.bias, conv4.output_shift, conv4.weight_bits, conv4.bias_bits, conv4.quantize_activation, conv4.adjust_output_shift, conv4.shift_quantile, conv4.op.weight, conv4.op.bias, conv5.output_shift, conv5.weight_bits, conv5.bias_bits, conv5.quantize_activation, conv5.adjust_output_shift, conv5.shift_quantile, conv5.op.weight, conv5.op.bias, conv6.output_shift, conv6.weight_bits, conv6.bias_bits, conv6.quantize_activation, conv6.adjust_output_shift, conv6.shift_quantile, conv6.op.weight, conv6.op.bias, conv7.output_shift, conv7.weight_bits, conv7.bias_bits, conv7.quantize_activation, conv7.adjust_output_shift, conv7.shift_quantile, conv7.op.weight, conv7.op.bias, conv8.output_shift, conv8.weight_bits, conv8.bias_bits, conv8.quantize_activation, conv8.adjust_output_shift, conv8.shift_quantile, conv8.op.weight, conv8.op.bias, conv9.output_shift, conv9.weight_bits, conv9.bias_bits, conv9.quantize_activation, conv9.adjust_output_shift, conv9.shift_quantile, conv9.op.weight, conv9.op.bias, conv10.output_shift, conv10.weight_bits, conv10.bias_bits, conv10.quantize_activation, conv10.adjust_output_shift, conv10.shift_quantile, conv10.op.weight, conv10.op.bias, fc.output_shift, fc.weight_bits, fc.bias_bits, fc.quantize_activation, fc.adjust_output_shift, fc.shift_quantile, fc.op.weight, fc.op.bias
conv1.op.weight avg_max: 0.2842156 max: 0.4851619 mean: -0.00016793636 factor: [256.] bits: 8
conv1.op.bias avg_max: 0.019166237 max: 0.19449908 mean: -0.019166237 factor: [256.] bits: 8
conv2.op.weight avg_max: 0.40061802 max: 0.49217635 mean: -0.019351257 factor: [256.] bits: 8
conv2.op.bias avg_max: 0.005468032 max: 0.10562769 mean: 0.005468032 factor: [256.] bits: 8
conv3.op.weight avg_max: 0.41723236 max: 0.6840195 mean: -0.028088227 factor: [128.] bits: 8
conv3.op.bias avg_max: 0.030817986 max: 0.12903765 mean: -0.030817986 factor: [128.] bits: 8
conv4.op.weight avg_max: 0.38754082 max: 0.48139024 mean: -0.025951229 factor: [256.] bits: 8
conv4.op.bias avg_max: 0.004316292 max: 0.12373583 mean: 0.004316292 factor: [256.] bits: 8
conv5.op.weight avg_max: 0.41810822 max: 0.5295685 mean: -0.013185449 factor: [128.] bits: 8
conv5.op.bias avg_max: 0.004799053 max: 0.05806088 mean: 0.004799053 factor: [128.] bits: 8
conv6.op.weight avg_max: 0.43570262 max: 0.56144214 mean: -0.027370185 factor: [128.] bits: 8
conv6.op.bias avg_max: 0.02671298 max: 0.12536536 mean: -0.02671298 factor: [128.] bits: 8
conv7.op.weight avg_max: 0.33150923 max: 0.48439544 mean: -0.019768748 factor: [256.] bits: 8
conv7.op.bias avg_max: 0.016853213 max: 0.1298755 mean: -0.016853213 factor: [256.] bits: 8
conv8.op.weight avg_max: 0.3815161 max: 0.62232244 mean: -0.016798494 factor: [128.] bits: 8
conv8.op.bias avg_max: 0.0045858608 max: 0.13316673 mean: -0.0045858608 factor: [128.] bits: 8
conv9.op.weight avg_max: 0.41677853 max: 0.58169824 mean: -0.01796972 factor: [128.] bits: 8
conv9.op.bias avg_max: 0.0020677613 max: 0.13530329 mean: -0.0020677613 factor: [128.] bits: 8
conv10.op.weight avg_max: 0.37181193 max: 0.7121762 mean: -0.0033740979 factor: [128.] bits: 8
conv10.op.bias avg_max: 0.0018625436 max: 0.14630195 mean: -0.0018625436 factor: [128.] bits: 8
fc.op.weight avg_max: 0.8473938 max: 1.1725179 mean: -0.037812572 factor: [64.] bits: 8
fc.op.bias avg_max: 0.023375945 max: 0.16078189 mean: -0.023375945 factor: [64.] bits: 8

As can be seen, every layer was quantized successfully.
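The `factor` column in the output above is always a power of two, and it gets smaller as the largest weight in a layer gets larger. The sketch below shows one rule that is consistent with the logged values, together with rounding and clipping to signed 8 bits. It is an illustration only: the rule is inferred from the log, assumes the logged `max` is the largest absolute value, and is not taken from quantize.py; both function names are hypothetical.

```python
import math


def power_of_two_factor(max_abs: float, bits: int = 8) -> float:
    """Power-of-two scale chosen from the largest absolute weight (inferred rule, see lead-in)."""
    return 2.0 ** (bits - 1 - math.ceil(math.log2(max_abs)))


def quantize(value: float, factor: float, bits: int = 8) -> int:
    """Scale, round, and clip a value to the signed integer range of the given width."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lo, min(hi, round(value * factor)))


# Largest weights taken from the log: conv1, conv3, and fc
for name, max_abs in (("conv1", 0.4851619), ("conv3", 0.6840195), ("fc", 1.1725179)):
    factor = power_of_two_factor(max_abs)
    print(name, factor, quantize(max_abs, factor))
```

For these three layers the rule gives 256, 128, and 64, the same factors that the log reports.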

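A problem you may encounter:

547dad895a75077401ba55291d9b2d4.png

The message reports that the BN layers have not been folded.

Solution: simply comment out the check_repo function.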


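4.2.5 Model Evaluation (autodl-tmp/78000/ai8x-training/scripts/evaluate_marvel.sh)

#!/bin/sh
python train.py --model ai85fa1 --dataset fa --confusion --evaluate --exp-load-weights-from ../ai8x-synthesis/trained/ai8x-new641-qat8-q.pth.tar -8 --device MAX78000 --use-bias "$@"

The quantization script in 4.2.4 takes the network parameters saved during training (./trained/ai8x-new64-qat8.pth.tar) as input and outputs the quantized parameters (./trained/ai8x-new641-qat8-q.pth.tar), which the evaluation script above then loads. Quantization also needs a YAML file that describes the network (networks/hero.yaml). For how to write this file, see https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/Guides/YAML%20Quickstart.md. Its contents are as follows (autodl-tmp/78000/ai8x-synthesis/networks/hero.yaml):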

arch: ai85fa1  # model name
dataset: fa  # dataset

# Define layer parameters in order of the layer sequence

layers:
  # Layer 0
  - out_offset: 0x2000
    processors: 0x0000000000000007  # three input channels
    operation: conv2d
    kernel_size: 3x3
    max_pool: 2
    pool_stride: 2
    pad: 1
    activate: ReLU
    data_format: HWC

  # Layer 1
  - out_offset: 0x0000
    processors: 0x0ffff00000000000
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU

  # Layer 2 - re-form data with gap
  - out_offset: 0x2000
    processors: 0x00000000000fffff
    output_processors: 0x00000000000fffff
    operation: passthrough
    write_gap: 1

  # Layer 3
  - in_offset: 0x0000
    out_offset: 0x2004
    processors: 0x00000000000fffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1

  # Layer 4 - Residual-1
  - in_sequences: [2, 3]
    in_offset: 0x2000
    out_offset: 0x0000
    processors: 0x00000000000fffff
    eltwise: add
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU

  # Layer 5
  - out_offset: 0x2000
    processors: 0xfffff00000000000
    output_processors: 0x000000fffff00000
    max_pool: 2
    pool_stride: 2
    pad: 1
    operation: conv2d
    kernel_size: 3x3
    activate: ReLU

  # Layer 6 - re-form data with gap
  - out_offset: 0x0000
    processors: 0x000000fffff00000
    output_processors: 0x000000fffff00000
    op: passthrough
    write_gap: 1

  # Layer 7 (input offset 0x0000)
  - in_offset: 0x2000
    out_offset: 0x0004
    processors: 0x000000fffff00000
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1

  # Layer 8 - Residual-2 (input offset 0x2000)
  - in_sequences: [6, 7]
    in_offset: 0x0000
    out_offset: 0x2000
    processors: 0x000000fffff00000
    eltwise: add
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU

  # Layer 9
  - out_offset: 0x0000
    processors: 0x00000fffffffffff
    max_pool: 2
    pool_stride: 2
    pad: 1
    operation: conv2d
    kernel_size: 3x3
    activate: ReLU

  # Layer 10 - re-form data with gap
  - out_offset: 0x2000
    processors: 0x0000ffffffffffff
    output_processors: 0x0000ffffffffffff
    op: passthrough
    write_gap: 1

  # Layer 11
  - in_offset: 0x0000
    out_offset: 0x2004
    processors: 0x0000ffffffffffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    write_gap: 1

  # Layer 12 - Residual-3
  - in_sequences: [10, 11]
    in_offset: 0x2000
    out_offset: 0x0000
    processors: 0x0000ffffffffffff
    eltwise: add
    max_pool: 2
    pool_stride: 2
    pad: 0
    pool_first: false
    operation: conv2d
    kernel_size: 3x3
    activate: ReLU

  # Layer 13 - Linear
  - out_offset: 0x2000
    processors: 0x000000000ffffffff
    operation: mlp
    flatten: true
    output_width: 32
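The `processors` values in this file are 64-bit masks, and in every layer above the number of set bits equals the number of input channels of that layer (0x7 for the three RGB channels, for example). A small sketch for checking or building such masks is shown below; the helper names are hypothetical, and the one-bit-per-channel reading is an observation about this file that assumes each layer has at most 64 input channels.

```python
def enabled_processors(mask: int) -> int:
    """Number of processors enabled by a mask, i.e. its number of set bits."""
    return bin(mask).count("1")


def contiguous_mask(channels: int, first: int = 0) -> int:
    """Mask with `channels` consecutive bits set, starting at processor index `first`."""
    if not 0 < channels <= 64 - first:
        raise ValueError("the mask must fit into 64 processors")
    return ((1 << channels) - 1) << first


# Masks copied from layers 0, 1, 5, 9, and 10 of the YAML file
for mask in (0x0000000000000007, 0x0ffff00000000000, 0xfffff00000000000,
             0x00000fffffffffff, 0x0000ffffffffffff):
    print(f"{mask:#018x} -> {enabled_processors(mask)} processors")

# Layer 1 takes 16 input channels, placed on processors 44 to 59
print(f"{contiguous_mask(16, first=44):#018x}")
```

Counting bits this way is a quick check after editing the model, because a mask whose bit count no longer matches the channel count of the previous layer is an easy mistake to make.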


4.2.6 Evaluation Results

(max78000) root@autodl-container-ee5311983c-c38dbca7:~/autodl-tmp/78000/ai8x-training# scripts/evaluate_marvel.sh

Configuring device: MAX78000, simulate=True.
Log file for this run: /root/autodl-tmp/78000/ai8x-training/logs/2024.01.02-222917/2024.01.02-222917.log
WARNING: CUDA hardware acceleration is not available, training will be slow
{'start_epoch': 10, 'weight_bits': 8}
=> loading checkpoint ../ai8x-synthesis/trained/ai8x-new641-qat8-q.pth.tar
=> Checkpoint contents:
+----------------------+-------------+---------+
| Key | Type | Value |
|----------------------+-------------+---------|
| arch | str | ai85fa1 |
| compression_sched | dict | |
| epoch | int | 7 |
| extras | dict | |
| optimizer_state_dict | dict | |
| optimizer_type | type | Adam |
| state_dict | OrderedDict | |
+----------------------+-------------+---------+

=> Checkpoint['extras'] contents:
+-----------------+--------+-------------------+
| Key | Type | Value |
|-----------------+--------+-------------------|
| best_epoch | int | 7 |
| best_mAP | int | 0 |
| best_top1 | float | 55.73122529644269 |
| clipping_method | str | MAX_BIT_SHIFT |
| current_mAP | int | 0 |
| current_top1 | float | 55.73122529644269 |
+-----------------+--------+-------------------+

Loaded compression schedule from checkpoint (epoch 7)
=> loaded 'state_dict' from checkpoint '../ai8x-synthesis/trained/ai8x-new641-qat8-q.pth.tar'
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
Dataset sizes:
test=759
--- test ---------------------
759 samples (256 per mini-batch)
Test: [ 3/ 3] Loss 2.285913 Top1 55.467721 Top5 83.926219
==> Top1: 55.468 Top5: 83.926 Loss: 2.286

==> Confusion:
[[13 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 1 2 1 0 1 1 1 0]
[ 0 20 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 0 1]
[ 0 0 20 2 2 4 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1]
[ 0 0 6 7 2 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0]
[ 0 1 2 0 29 2 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0]
[ 0 2 1 0 6 16 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 2 0 5]
[ 0 1 1 0 4 1 4 0 3 0 0 0 0 0 0 0 0 1 0 0 1 2 0 4]
[ 2 0 0 0 2 0 0 20 0 0 1 0 0 0 0 0 1 0 2 0 2 0 1 1]
[ 0 1 1 0 0 1 3 0 17 0 1 0 0 0 2 1 0 0 0 0 2 1 0 1]
[ 0 0 2 0 1 5 0 0 0 9 0 0 0 0 0 1 1 0 0 0 3 0 4 1]
[ 0 0 0 0 0 0 1 1 2 0 15 0 0 0 4 0 0 2 2 0 0 0 0 1]
[ 0 0 0 0 0 0 0 0 0 0 0 31 0 2 0 0 0 0 0 0 0 0 0 0]
[ 0 0 6 1 3 2 0 0 0 0 0 0 19 0 0 0 0 3 2 0 0 0 0 0]
[ 0 1 0 1 0 0 0 0 0 0 0 0 0 28 0 1 0 0 0 0 0 0 0 3]
[ 0 0 0 1 3 0 0 1 2 1 6 0 0 0 9 0 0 3 4 0 3 3 0 1]
[ 1 2 0 2 1 0 0 0 1 0 0 0 1 1 2 13 2 4 2 2 0 0 2 0]
[ 0 0 0 0 1 0 0 0 0 0 0 0 1 0 3 1 22 0 0 0 0 0 1 0]
[ 0 2 0 1 1 0 2 0 1 1 0 0 1 0 0 0 0 22 1 0 0 0 0 4]
[ 0 0 0 0 1 0 0 2 0 0 1 0 0 0 2 2 0 0 25 0 1 0 1 2]
[ 0 0 0 0 0 0 0 0 2 0 3 0 0 0 0 0 0 0 0 23 0 0 0 1]
[ 3 0 0 0 0 1 0 5 1 0 1 0 1 0 0 1 1 2 1 0 16 2 0 1]
[ 1 0 0 0 2 2 2 0 1 0 0 0 1 0 0 0 0 1 1 0 1 23 1 0]
[ 0 2 1 1 2 3 0 1 0 0 1 0 1 0 1 2 1 3 2 0 1 0 6 4]
[ 0 5 1 0 0 2 1 0 0 0 1 0 0 0 0 2 0 3 1 0 0 0 0 14]]

Training accuracy:

......
2023-12-22 13:03:18,556 - Epoch: [7][ 620/ 624] Overall Loss 1.348479 Objective Loss 1.348479 LR 0.001000 Time 0.131116
2023-12-22 13:03:18,889 - Epoch: [7][ 624/ 624] Overall Loss 1.347655 Objective Loss 1.347655 Top1 64.383562 Top5 88.127854 LR 0.001000 Time 0.130809
2023-12-22 13:03:18,972 - --- validate (epoch=7)-----------
2023-12-22 13:03:18,972 - 759 samples (128 per mini-batch)
2023-12-22 13:03:20,180 - Epoch: [7][ 6/ 6] Loss 1.561306 Top1 57.180501 Top5 84.189723
2023-12-22 13:03:20,280 - ==> Top1: 57.181 Top5: 84.190 Loss: 1.561

2023-12-22 13:03:20,282 - ==> Best [Top1: 57.181 Top5: 84.190 Sparsity:0.00 Params: 82272 on epoch: 7]
2023-12-22 13:03:20,282 - Saving checkpoint to: logs/2023.12.22-125208/checkpoint.pth.tar
2023-12-22 13:03:20,297 -

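Comparing the two logs, the accuracy of the quantized model at evaluation is slightly lower than during training, but the drop is small: Top-1 falls from 57.18% to 55.47%, and Top-5 from 84.19% to 83.93%. The overall figure of about 83% is therefore the Top-5 accuracy; Top-1 is about 55%.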
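
The Top-1 figure in the evaluation log can be reproduced from the confusion matrix, because the correct predictions sit on its diagonal. The sketch below does that and also computes per-class recall. It assumes that rows are true classes and columns are predicted classes; the function names are illustrative only.

```python
# Confusion matrix copied from the evaluation log
# (assumed layout: rows = true class, columns = predicted class).
CONFUSION = """
13 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 1 2 1 0 1 1 1 0
0 20 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 0 1
0 0 20 2 2 4 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 1
0 0 6 7 2 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0
0 1 2 0 29 2 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
0 2 1 0 6 16 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 2 0 5
0 1 1 0 4 1 4 0 3 0 0 0 0 0 0 0 0 1 0 0 1 2 0 4
2 0 0 0 2 0 0 20 0 0 1 0 0 0 0 0 1 0 2 0 2 0 1 1
0 1 1 0 0 1 3 0 17 0 1 0 0 0 2 1 0 0 0 0 2 1 0 1
0 0 2 0 1 5 0 0 0 9 0 0 0 0 0 1 1 0 0 0 3 0 4 1
0 0 0 0 0 0 1 1 2 0 15 0 0 0 4 0 0 2 2 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 31 0 2 0 0 0 0 0 0 0 0 0 0
0 0 6 1 3 2 0 0 0 0 0 0 19 0 0 0 0 3 2 0 0 0 0 0
0 1 0 1 0 0 0 0 0 0 0 0 0 28 0 1 0 0 0 0 0 0 0 3
0 0 0 1 3 0 0 1 2 1 6 0 0 0 9 0 0 3 4 0 3 3 0 1
1 2 0 2 1 0 0 0 1 0 0 0 1 1 2 13 2 4 2 2 0 0 2 0
0 0 0 0 1 0 0 0 0 0 0 0 1 0 3 1 22 0 0 0 0 0 1 0
0 2 0 1 1 0 2 0 1 1 0 0 1 0 0 0 0 22 1 0 0 0 0 4
0 0 0 0 1 0 0 2 0 0 1 0 0 0 2 2 0 0 25 0 1 0 1 2
0 0 0 0 0 0 0 0 2 0 3 0 0 0 0 0 0 0 0 23 0 0 0 1
3 0 0 0 0 1 0 5 1 0 1 0 1 0 0 1 1 2 1 0 16 2 0 1
1 0 0 0 2 2 2 0 1 0 0 0 1 0 0 0 0 1 1 0 1 23 1 0
0 2 1 1 2 3 0 1 0 0 1 0 1 0 1 2 1 3 2 0 1 0 6 4
0 5 1 0 0 2 1 0 0 0 1 0 0 0 0 2 0 3 1 0 0 0 0 14
"""


def parse_matrix(text):
    """Turn the whitespace-separated log text into a list of integer rows."""
    return [[int(v) for v in line.split()] for line in text.strip().splitlines()]


def top1_accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Correct predictions for each class divided by that class's sample count."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


matrix = parse_matrix(CONFUSION)
recall = per_class_recall(matrix)
worst = min(range(len(recall)), key=recall.__getitem__)
best = max(range(len(recall)), key=recall.__getitem__)

print(f"Top-1: {100 * top1_accuracy(matrix):.3f}%")
print(f"weakest class index: {worst} (recall {recall[worst]:.2f})")
print(f"strongest class index: {best} (recall {recall[best]:.2f})")
```

The diagonal holds 421 of the 759 test samples, which gives the 55.468% reported by the script. Recall varies widely between classes: class index 6 gets only 4 of 22 samples right, while class index 11 gets 31 of 33.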


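Quantizing the model, however, has many pitfalls. Only a good model that also suits the ai8x library can be quantized and evaluated successfully. Otherwise the tool may well report that quantization succeeded while the model performs very poorly in evaluation, with accuracy close to random guessing.

For example: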

9cee6dd74c15d797d9c01c90704424a.png

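This is a 14-layer model I wrote earlier (modeled on a previous participant's work). Its evaluation results were far below what I had seen during testing, and for a while this made me doubt myself. Only after some trial and error did I realize that quantization had failed. The quantization output for that model was: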

5bf07b66f213a12e7e4ae3ea38b28b4.png

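In theory all 14 layers should have been quantized, but only 7 were. The most baffling part is that the quantization step itself reports no error; the errors only appear later, during evaluation and during conversion to C code.

I suspect the cause is that --use-bias is forced on, but I have not yet found a code change that fixes it. Without that option, the run fails with an error right away.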

image.png


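I therefore recommend paying close attention to the characteristics of both the model and the ai8x library whenever you modify a model. Many models cannot be used directly once something has changed (for example, the input size), and adding or changing layers arbitrarily degrades training and accuracy. It is better to start by understanding and running the official examples, or code that others have already deployed successfully, and to make your own changes only after you understand the model structure and the library's constraints. I consider that an advanced step. Unfortunately, very few tutorials on this topic can be found online, and for lack of time I could not investigate it as far as I wanted to.

I hope newcomers will not spend too much time on building their own models at the start. Creation begins with imitation; otherwise it is easy to lose the confidence to keep learning.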


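4.2.7 Converting the model to C code

Generating the C project requires three files:

(1) the quantized model (trained/ai8x-new641-qat8-q.pth.tar)

(2) the network description file (networks/hero.yaml)

(3) the sample input file (sample_fa.npy)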
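
Before running the conversion it can save time to confirm that all three inputs are in place. The sketch below is a minimal check meant to be run from the ai8x-synthesis folder. The function name missing_inputs is illustrative, and the tests/ location of the sample file is an assumption, since the text only gives its file name.

```python
from pathlib import Path

REQUIRED = [
    "trained/ai8x-new641-qat8-q.pth.tar",  # quantized model
    "networks/hero.yaml",                  # network description file
    "tests/sample_fa.npy",                 # sample input (folder is an assumption)
]


def missing_inputs(root, required=REQUIRED):
    """Return the required files that do not exist under root."""
    root = Path(root)
    return [name for name in required if not (root / name).is_file()]


missing = missing_inputs(".")
if missing:
    print("missing inputs:", ", ".join(missing))
else:
    print("all three inputs found")
```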

 

The script that generates the sample file (autodl-tmp/78000/ai8x-synthesis/scripts/marvel.sh) contains:

./train.py --model ai85fa1 --save-sample 30 --dataset fa --evaluate --exp-load-weights-from ../ai8x-synthesis/trained/ai8x-new641-qat8-q.pth.tar -8 --device MAX78000 "$@"

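The script that generates the C project (autodl-tmp/78000/ai8x-synthesis/scripts/generate.sh) contains the command below.

Run it to convert the model. --test-dir sets the base folder for the generated project (here demos, so the code is written to demos/ai85fa1), and --checkpoint-file gives the location of the quantized model, which is in the ./trained/ folder.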

./ai8xize.py --verbose --test-dir demos --prefix ai85fa1 --checkpoint-file ./trained/ai8x-new641-qat8-q.pth.tar --config-file networks/hero.yaml --device MAX78000 --compact-data --mexpress --softmax --fifo --overwrite

The output is as follows:

image.png

The generated code can then be found in autodl-tmp/78000/ai8x-synthesis/demos/ai85fa1.

image.png


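At this point we have a superhero classification model that can be deployed on the MAX78000, which completes the classification task itself. Beyond that, we still need the camera, serial port, display and other modules for human-machine interaction. The official cats and dogs demo already includes TFT screen and camera initialization and some serial output code, so we can adapt it.

4.2.8 Modifying the MCU example

The first thing to change is the range of i. In principle it should become 64 x 64 = 4096, whereas the original input was 128 x 128 = 16384. In practice, however, entering 4096 produced an error saying that the register could not be addressed. The cause still needs investigation (a number larger than 4096 can be used instead).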

void load_input(void)
{
    // This function loads the sample data input -- replace with actual data

    int i;
    const uint32_t *in0 = input_0;

    for (i = 0; i < 4096; i++) {
        // Remove the following line if there is no risk that the source would overrun the FIFO:
        while (((*((volatile uint32_t *) 0x50000004) & 1)) != 0); // Wait for FIFO 0
        *((volatile uint32_t *) 0x50000008) = *in0++; // Write FIFO 0
    }
}

With 4096, the following errors are reported:

Info : SWD DPIDR 0x2ba01477
Error: Could not find MEM-AP to control the core
Examination failed, GDB will be halted. Polling again in 3100ms
Polling target max32xxx.cpu failed, trying to reexamine
Info : SWD DPIDR 0x2ba01477
Warn : Connecting DP: stalled AP operation, issuing ABORT
Info : SWD DPIDR 0x2ba01477
Error: Could not find MEM-AP to control the core
Examination failed, GDB will be halted. Polling again in 6300ms
Polling target max32xxx.cpu failed, trying to reexamine
Info : SWD DPIDR 0x2ba01477
Warn : Connecting DP: stalled AP operation, issuing ABORT
Info : SWD DPIDR 0x2ba01477
Error: Could not find MEM-AP to control the core


The second thing to change is the list of output class names.

Change it to:
const char classes[CNN_NUM_OUTPUTS][24] = {
    "Ant Man", "Aquaman", "Batman", "Black Panther", "Black Widow", "Captain America",
    "Captain Marvel", "Deadpool", "Dr Strange", "Falcon", "Flash", "Green Lantern",
    "Hawkeye", "Hulk", "Iron Man", "Joker", "Nebula", "Rocket Raccoon",
    "Scarlet Witch", "Shazam", "Spider Man", "Super Man", "Thor", "Wonder Woman"
};
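
The class table is used by looking up the index of the largest network output. The sketch below shows that lookup for the top few outputs. It assumes that the order of the names matches the label order used during training, and the scores in the example are invented for illustration, not real network outputs.

```python
CLASSES = [
    "Ant Man", "Aquaman", "Batman", "Black Panther", "Black Widow", "Captain America",
    "Captain Marvel", "Deadpool", "Dr Strange", "Falcon", "Flash", "Green Lantern",
    "Hawkeye", "Hulk", "Iron Man", "Joker", "Nebula", "Rocket Raccoon",
    "Scarlet Witch", "Shazam", "Spider Man", "Super Man", "Thor", "Wonder Woman",
]


def top_k(scores, k=5):
    """Return the k best (class name, score) pairs, highest score first."""
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(CLASSES[i], scores[i]) for i in order[:k]]


# Invented scores: class 14 is the clear winner.
scores = [0] * len(CLASSES)
scores[14], scores[20], scores[2] = 90, 7, 3

print(top_k(scores, k=3))  # [('Iron Man', 90), ('Spider Man', 7), ('Batman', 3)]
```

If the names were listed in a different order than the training labels, every prediction would be reported under the wrong name, so the order is worth checking against the dataset folders.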

The camera driver function capture_process_camera(void) is as follows:

void capture_process_camera(void)
{
    uint8_t *raw;
    uint32_t imgLen;
    uint32_t w, h;
    int cnt = 0;
    uint8_t r, g, b;
    uint16_t rgb;
    int j = 0;
    uint8_t *data = NULL;
    stream_stat_t *stat;

    camera_start_capture_image();

    // Get the details of the image from the camera driver.
    camera_get_image(&raw, &imgLen, &w, &h);
    printf("W:%d H:%d L:%d \n", w, h, imgLen);

#if defined(TFT_ENABLE) && defined(BOARD_FTHR_REVA)
    // Initialize FTHR TFT for DMA streaming
    MXC_TFT_Stream(TFT_X_START, TFT_Y_START, w, h);
#endif

    // Get image line by line
    for (int row = 0; row < h; row++) {
        // Wait until camera streaming buffer is full
        while ((data = get_camera_stream_buffer()) == NULL) {
            if (camera_is_image_rcv()) {
                break;
            }
        }

        //LED_Toggle(LED2);
#ifdef BOARD_EVKIT_V1
        j = IMAGE_SIZE_X * 2 - 2; // mirror on display
#else
        j = 0;
#endif

        for (int k = 0; k < 4 * w; k += 4) {
            // data format: 0x00bbggrr
            r = data[k];
            g = data[k + 1];
            b = data[k + 2];
            //skip k+3

            // change the range from [0,255] to [-128,127] and store in buffer for CNN
            input_0[cnt++] = ((b << 16) | (g << 8) | r) ^ 0x00808080;

            // convert to RGB565 for display
            rgb = ((r & 0b11111000) << 8) | ((g & 0b11111100) << 3) | (b >> 3);
            data565[j] = (rgb >> 8) & 0xFF;
            data565[j + 1] = rgb & 0xFF;

#ifdef BOARD_EVKIT_V1
            j -= 2; // mirror on display
#else
            j += 2;
#endif
        }
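
The two conversions in the inner loop are easy to check off-target. XOR-ing each 8-bit channel with 0x80 flips its top bit, which, read as a signed byte, equals subtracting 128; that is how [0,255] becomes [-128,127]. The RGB565 step keeps the top 5, 6 and 5 bits of red, green and blue. The sketch below mirrors both steps; the function names are illustrative and do not come from the demo code.

```python
def pack_cnn_word(r, g, b):
    """Pack one pixel as 0x00bbggrr and flip the top bit of each channel."""
    return ((b << 16) | (g << 8) | r) ^ 0x00808080


def unpack_signed(word):
    """Split a packed word into the signed (r, g, b) values the CNN sees."""
    def to_int8(byte):
        return byte - 256 if byte >= 128 else byte
    return tuple(to_int8((word >> shift) & 0xFF) for shift in (0, 8, 16))


def rgb565_bytes(r, g, b):
    """Return the (high, low) bytes of the RGB565 value sent to the display."""
    rgb = ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
    return (rgb >> 8) & 0xFF, rgb & 0xFF


def words_per_frame(width, height):
    """One 32-bit word per pixel, since all three channels share a word."""
    return width * height


word = pack_cnn_word(0, 128, 255)
print(hex(word))                 # 0x7f0080
print(unpack_signed(word))       # (-128, 0, 127)
print(rgb565_bytes(255, 0, 0))   # (248, 0), pure red
print(words_per_frame(64, 64))   # 4096
```

Because one packed word carries all three channels of a pixel, a 64 x 64 frame fills 4096 entries of input_0, while the original 128 x 128 input needed 16384.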


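Overall, everything should run without major problems. One thing to watch for: when flashing the CNN model, the message "target not examined yet" appears easily. When it does, press SW4 during the download to reset the board.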


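While using Eclipse, I once ran into a hang in load_init() during initialization. Only after single-stepping with the debugger and watching the variables window did I find that the value of i was abnormal.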

image.png

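The problem description below it said that gcc was missing. Checking the environment variables, I found that I had deleted it while cleaning up files earlier. After reinstalling MinGW, everything ran normally.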


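5.1 Personal summary

       I have long been strongly interested in AI, but I did not know where to start and lacked a goal and motivation. When I happened to see the embedded-AI training event organized by 硬禾学堂, I signed up with excitement. The three months of learning that followed were full of setbacks. I had assumed that, with my experience in electronics design contests and my familiarity with STM32 microcontrollers and C, I would get up to speed quickly. In reality, installing the environment alone held me up for a long time. Midway, I got bogged down in studying how models are built and in trying to improve recognition accuracy. Finally, model deployment, which I had expected to be simple, was made difficult by my earlier immature models. Facing all kinds of bugs, an unfamiliar language (Python) and an unfamiliar platform (Linux), I wanted to give up countless times. Fortunately, the experts in the discussion group always helped and guided me when I was stuck, which gave me goals and the motivation to keep improving. Looking back, the learning process seems simple, yet it was anything but easy.

       This was also my first contact with relatively low-level code development. Unlike the chips I had used before, for which examples are everywhere, the MAX78000 is a relatively niche microcontroller and articles about it are scarce. Add the English-only user guide and official documentation, and the start was rather discouraging. By reading the reports of previous participants over and over, I gradually sorted out my own approach and understood many details that the official documentation does not spell out. Little by little I also began to read the English documentation on my own initiative, moving from "getting started" to "improving".

       Unfortunately, because of academic pressure, many ideas remain unrealized (inspired by a comment from another user, I had originally planned to build a super-resolution infrared tester, but that never happened). This includes Jarvis's character-relationship recognition, for which I had intended to use Neo4j and follow online examples to build a graph of Marvel character relationships. Limited by my current coding and porting skills and by the microcontroller's limited resources, I had to set it aside for now.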

image.png

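       Figure 7: Schematic of the relationship-universe graph

       All in all, these three months of learning truly took me, an AI beginner, through the door, and gave me a new understanding of the prospects of combining embedded systems with AI. Many thanks to 硬禾学堂 and ADI for organizing this event! It let me combine my hobby with my field of study and experience the joy of creating.

       As technology advances, I believe that one day Jarvis will step out of the movies and become part of our lives. Technology bringing science fiction back into everyday life is, to me, a very romantic thing.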


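5.2 Suggestions

       I strongly suggest supporting direct import of Word documents when writing reports. On 电子森林, typing lags noticeably once a report gets long, there are relatively few shortcuts, and layout is difficult. Second, I hope that more advanced material can be provided in addition to the introductory material (even though this goes beyond the scope of the event); many times I was held back simply because no relevant material was available. Finally, I hope there can be more Q&A sessions with engineers.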


Attachments
trycnn.7z
train.7z
sythsis.7z