Shared by 大工至善|大学至真 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Electronics] [2013.06] FPGA Hardware Accelerators: Case Study on Design Methodologies and Trade-Offs

Posted 2021-1-21 18:08 | Category: Research Notes | Source: Repost



This is a master's thesis from the Rochester Institute of Technology, USA (author: Matthew V. Ryan), 65 pages in total.

 


Previous research has shown that the performance of any computation is directly related to the architecture on which it is performed. As a result, the performance of compute-intensive applications can be improved using heterogeneous systems. These systems consist of various processor architectures such as CPU, FPGA, DSP, and GPU. Individual computations can be performed in parallel on different processor architectures within the heterogeneous system. Computations are performed by utilizing existing designs from implementation libraries. There is a lack of FPGA accelerators for use in these libraries, and as such additional implementations need to be designed.

Different design methodologies for developing FPGA accelerators result in implementations that vary in performance, design time, and resource utilization. A particular method and supporting toolset may produce better results for one type of design than another.

The customary method for designing FPGA accelerators is to develop the system architecture from an algorithm and model it using a hardware description language (HDL). Another method is to convert directly from a software implementation to HDL. This process is known as high-level synthesis (HLS).
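To make the HLS starting point concrete, here is a minimal sketch of the kind of software implementation such a flow might take as input. It is not taken from the thesis: the function name `dot_product` and the size `N` are hypothetical, and no vendor-specific pragmas or tool settings are shown. The one assumption it illustrates is that HLS tools generally work best with statically sized arrays and loops whose trip counts are known at compile time.

```c
#include <stddef.h>

#define N 4  /* hypothetical fixed problem size, chosen only for illustration */

/* Dot product of two length-N integer vectors.
 * The array sizes are fixed and the loop has a constant trip count,
 * which is the kind of structure an HLS tool can typically analyze
 * and map to hardware (for example, by unrolling or pipelining). */
int dot_product(const int a[N], const int b[N])
{
    int acc = 0;
    for (size_t i = 0; i < N; i++) {
        acc += a[i] * b[i];  /* one multiply-accumulate per iteration */
    }
    return acc;
}
```

In a hand-written HDL flow, the designer would instead decide explicitly how many multipliers and adders to instantiate and how to schedule them; in an HLS flow, those decisions are left largely to the tool and its directives.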

The advantages and disadvantages of these two techniques can be examined through comparison of different linear algebra operations. Many linear algebra operations are parallel in nature, which makes them potentially good choices to speed up through implementation on an FPGA. In particular, matrix multiplication is an excellent candidate for examination due to not only its parallelism but also its multitude of different algorithms. The goal of this research is to design different matrix multiplication accelerators and provide insight into the advantages and disadvantages of each design procedure.
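The two properties named above, parallelism and a choice of algorithms, can be sketched in software. The example below is illustrative only and does not reproduce the accelerators designed in the thesis: `matmul_naive`, `matmul_blocked`, `DIM`, and `TILE` are hypothetical names and sizes. It shows the textbook triple loop next to a blocked (tiled) variant that computes the same product in a different order.

```c
#include <stddef.h>

#define DIM  4   /* hypothetical matrix dimension */
#define TILE 2   /* hypothetical tile size; must divide DIM evenly */

/* Textbook triple loop. Each c[i][j] depends only on row i of a and
 * column j of b, so all DIM*DIM outputs are independent of one another
 * and could in principle be computed in parallel. */
void matmul_naive(int a[DIM][DIM], int b[DIM][DIM], int c[DIM][DIM])
{
    for (size_t i = 0; i < DIM; i++) {
        for (size_t j = 0; j < DIM; j++) {
            int acc = 0;
            for (size_t k = 0; k < DIM; k++) {
                acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
        }
    }
}

/* Blocked (tiled) variant. It produces the same result, but accumulates
 * partial sums over TILE x TILE sub-blocks, so only small pieces of a
 * and b are needed at any one time. */
void matmul_blocked(int a[DIM][DIM], int b[DIM][DIM], int c[DIM][DIM])
{
    /* Clear the output, since partial sums are accumulated into it. */
    for (size_t i = 0; i < DIM; i++) {
        for (size_t j = 0; j < DIM; j++) {
            c[i][j] = 0;
        }
    }

    for (size_t ii = 0; ii < DIM; ii += TILE) {
        for (size_t jj = 0; jj < DIM; jj += TILE) {
            for (size_t kk = 0; kk < DIM; kk += TILE) {
                /* Multiply one tile of a by one tile of b. */
                for (size_t i = ii; i < ii + TILE; i++) {
                    for (size_t j = jj; j < jj + TILE; j++) {
                        int acc = c[i][j];
                        for (size_t k = kk; k < kk + TILE; k++) {
                            acc += a[i][k] * b[k][j];
                        }
                        c[i][j] = acc;
                    }
                }
            }
        }
    }
}
```

Both functions compute the same matrix product; they differ only in the order of operations and in how much data is live at once, which is the sort of trade-off that can lead to different hardware architectures.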

 

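1. Project Background and Motivation

2. Related Work

3. Custom Implementations

4. HLS Implementations

5. System Design

6. Results

7. Design Time Comparison

8. Combined Custom/HLS Design Flow

9. Conclusion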






https://wap.sciencenet.cn/blog-69686-1268283.html
