
Fluent Parallel Core Count Comparison

3326 views | 2022-11-1 23:18 | Category: Research notes

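Test platform: 2 × AMD EPYC 7742 (64 cores each), Fluent 2020

Mesh: about 120,000 cells, density-based solver, energy equation enabled, 161 iterations per run (the log for the 20-core run reports 330 iterations)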

Cores	Time per iteration [s/iter]
10	0.124
20	0.068
40	0.023
60	0.018
80	0.016

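As the table shows, more cores give a shorter time per iteration, but the gain flattens at high core counts: doubling from 40 to 80 cores only lowers it from 0.023 s to 0.016 s.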
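The trend in the table can be put into numbers as speedup and parallel efficiency. The snippet below is a minimal sketch, assuming the 10-core run as the baseline; the function name `scaling` is made up for illustration and is not part of Fluent. The timings are copied from the table.

```python
# Time per iteration [s] by core count, copied from the table above.
times = {10: 0.124, 20: 0.068, 40: 0.023, 60: 0.018, 80: 0.016}


def scaling(times, base=None):
    """Return (cores, speedup, efficiency) rows relative to the baseline run.

    speedup    = t_base / t_cores
    efficiency = speedup / (cores / base_cores)
    The baseline defaults to the smallest core count (an assumption of this sketch).
    """
    base = min(times) if base is None else base
    t_base = times[base]
    rows = []
    for cores in sorted(times):
        speedup = t_base / times[cores]
        efficiency = speedup / (cores / base)
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, efficiency in scaling(times):
    print(f"{cores:>3} cores: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```

Because the baseline is itself a parallel run, these figures are relative rather than absolute. With such short runs and timings rounded to 1 ms, an efficiency above 100% for some core counts should be read with caution.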

However, if two or more Fluent instances run at the same time, they slow each other down.

Performance Timer for 161 iterations on 10 compute nodes
Average wall-clock time per iteration:              0.124 sec
Global reductions per iteration:                      101 ops
Global reductions time per iteration:               0.000 sec (0.0%)

Message count per iteration: 2284 messages

Data transfer per iteration: 2.981 MB

LE solves per iteration:                                3 solves
LE wall-clock time per iteration:                   0.035 sec (28.5%)
LE global solves per iteration:                         1 solves
LE global wall-clock time per iteration:            0.000 sec (0.2%)
LE global matrix maximum size:                        62
AMG cycles per iteration:                           3.000 cycles
Relaxation sweeps per iteration:                      226 sweeps
Relaxation exchanges per iteration:                     0 exchanges
LE early protections (stall) per iteration:           0.000 times
LE early protections (divergence) per iteration:      0.000 times
Total SVARS touched:                              369
Time-step updates per iteration:                     0.31 updates
Time-step wall-clock time per iteration:            0.002 sec (1.9%)

Total wall-clock time:                             19.916 sec

Performance Timer for 330 iterations on 20 compute nodes
Average wall-clock time per iteration:              0.068 sec
Global reductions per iteration:                      101 ops
Global reductions time per iteration:               0.000 sec (0.0%)
Message count per iteration:                         7687 messages
Data transfer per iteration:                        5.265 MB
LE solves per iteration:                                3 solves
LE wall-clock time per iteration:                   0.020 sec (28.7%)
LE global solves per iteration:                         1 solves
LE global wall-clock time per iteration:            0.000 sec (0.6%)
LE global matrix maximum size:                       192
AMG cycles per iteration:                           3.000 cycles
Relaxation sweeps per iteration:                      212 sweeps
Relaxation exchanges per iteration:                     0 exchanges
LE early protections (stall) per iteration:           0.000 times
LE early protections (divergence) per iteration:      0.000 times
Total SVARS touched:                              369
Time-step updates per iteration:                     0.30 updates
Time-step wall-clock time per iteration:            0.001 sec (2.1%)

Total wall-clock time:                             22.477 sec

Performance Timer for 161 iterations on 40 compute nodes
Average wall-clock time per iteration:              0.023 sec
Global reductions per iteration:                      101 ops
Global reductions time per iteration:               0.000 sec (0.0%)
Message count per iteration:                        13744 messages
Data transfer per iteration:                        7.996 MB
LE solves per iteration:                                3 solves
LE wall-clock time per iteration:                   0.008 sec (33.1%)
LE global solves per iteration:                         1 solves
LE global wall-clock time per iteration:            0.001 sec (4.0%)
LE global matrix maximum size:                       602
AMG cycles per iteration:                           3.000 cycles
Relaxation sweeps per iteration:                      200 sweeps
Relaxation exchanges per iteration:                     0 exchanges
LE early protections (stall) per iteration:           0.000 times
LE early protections (divergence) per iteration:      0.000 times
Total SVARS touched:                              369
Time-step updates per iteration:                     0.31 updates
Time-step wall-clock time per iteration:            0.001 sec (2.6%)

Total wall-clock time:                              3.732 sec

Performance Timer for 161 iterations on 60 compute nodes
Average wall-clock time per iteration:              0.018 sec
Global reductions per iteration:                      101 ops
Global reductions time per iteration:               0.000 sec (0.0%)
Message count per iteration:                        22381 messages
Data transfer per iteration:                       10.313 MB
LE solves per iteration:                                3 solves
LE wall-clock time per iteration:                   0.006 sec (34.2%)
LE global solves per iteration:                         1 solves
LE global wall-clock time per iteration:            0.001 sec (5.6%)
LE global matrix maximum size:                       612
AMG cycles per iteration:                           3.000 cycles
Relaxation sweeps per iteration:                      197 sweeps
Relaxation exchanges per iteration:                     0 exchanges
LE early protections (stall) per iteration:           0.000 times
LE early protections (divergence) per iteration:      0.000 times
Total SVARS touched:                              369
Time-step updates per iteration:                     0.31 updates
Time-step wall-clock time per iteration:            0.001 sec (3.1%)

Total wall-clock time:                              2.843 sec

Performance Timer for 161 iterations on 80 compute nodes
Average wall-clock time per iteration:              0.016 sec
Global reductions per iteration:                      101 ops
Global reductions time per iteration:               0.000 sec (0.0%)
Message count per iteration:                        31017 messages
Data transfer per iteration:                       12.356 MB
LE solves per iteration:                                3 solves
LE wall-clock time per iteration:                   0.007 sec (41.4%)
LE global solves per iteration:                         1 solves
LE global wall-clock time per iteration:            0.001 sec (7.4%)
LE global matrix maximum size:                       623
AMG cycles per iteration:                           3.000 cycles
Relaxation sweeps per iteration:                      199 sweeps
Relaxation exchanges per iteration:                     0 exchanges
LE early protections (stall) per iteration:           0.000 times
LE early protections (divergence) per iteration:      0.000 times
Total SVARS touched:                              369
Time-step updates per iteration:                     0.31 updates
Time-step wall-clock time per iteration:            0.001 sec (3.3%)

Total wall-clock time:                              2.631 sec
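The table values can be cross-checked against the logs by dividing the total wall-clock time by the iteration count. Below is a minimal sketch of a parser for that. The regular expressions are based only on the lines shown in this post and may need adjusting for other Fluent versions. The function name `parse_timers` and the dictionary keys are made up for illustration.

```python
import re

# Patterns for the three log lines of interest, as they appear in this post.
# \s* after "for" tolerates a missing space before the iteration count.
HEADER = re.compile(r"Performance Timer for\s*(\d+) iterations on (\d+) compute nodes")
AVG = re.compile(r"Average wall-clock time per iteration:\s*([\d.]+) sec")
TOTAL = re.compile(r"Total wall-clock time:\s*([\d.]+) sec")


def parse_timers(text):
    """Return one dict per 'Performance Timer' block found in text."""
    runs = []
    for line in text.splitlines():
        if m := HEADER.search(line):
            # "compute nodes" is the log's wording; the post treats it as the core count.
            runs.append({"iterations": int(m.group(1)), "cores": int(m.group(2))})
        elif runs and (m := AVG.search(line)):
            runs[-1]["avg_s"] = float(m.group(1))
        elif runs and (m := TOTAL.search(line)):
            runs[-1]["total_s"] = float(m.group(1))
    return runs


# Two shortened blocks taken from the logs above.
sample = """\
Performance Timer for 330 iterations on 20 compute nodes
Average wall-clock time per iteration:              0.068 sec
Total wall-clock time:                             22.477 sec
Performance Timer for 161 iterations on 80 compute nodes
Average wall-clock time per iteration:              0.016 sec
Total wall-clock time:                              2.631 sec
"""

for run in parse_timers(sample):
    per_iter = run["total_s"] / run["iterations"]
    print(f'{run["cores"]} cores: reported {run["avg_s"]:.3f} s/iter, '
          f'total/iterations = {per_iter:.4f} s/iter')
```

Reading the iteration count from each header, rather than assuming 161 everywhere, matters here because the 20-core run reports 330 iterations.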

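To repost this article, please contact the original author for permission and state that it comes from the ScienceNet blog of 姚程 (Yao Cheng).
Link: https://wap.sciencenet.cn/blog-531760-1361900.html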
