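1. DDR5 subsystem of Yitian 710
Yitian 710 supports state-of-the-art DDR5 DRAM, providing large memory bandwidth for cloud computing and HPC. It has 8 DDR5 channels, 4 on each die. Each channel serves memory requests independently of the others and supports DDR5-4400 at 1DPC (one DIMM per channel) or DDR5-4000 at 2DPC.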
1.1 DDR5 architecture
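One major change in DDR5 is the new DIMM channel structure (see "Channel Architecture" in Fig. 2). A DDR4 DIMM has a 72-bit bus: 64 data bits plus 8 ECC bits. A DDR5 DIMM has two independent sub-channels, each with a 40-bit bus: 32 data bits plus 8 ECC bits. Although the total data width is the same as in DDR4 (64 bits), the two independent sub-channels improve memory access efficiency and reduce latency. A single channel can only read or write at any one time, whereas the two DDR5 sub-channels can serve a read and a write concurrently.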
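The widths quoted above can be tabulated in a few lines of Python. This is only an illustrative sketch: the class name `DimmChannelLayout` and its fields are made up for this example, and the numbers are the ones stated in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimmChannelLayout:
    """Hypothetical helper describing the bus layout of one DIMM."""
    name: str
    subchannels: int  # independent (sub-)channels per DIMM
    data_bits: int    # data bits per (sub-)channel
    ecc_bits: int     # ECC bits per (sub-)channel

    @property
    def bus_bits_per_subchannel(self) -> int:
        return self.data_bits + self.ecc_bits

    @property
    def total_data_bits(self) -> int:
        return self.subchannels * self.data_bits


DDR4 = DimmChannelLayout("DDR4", subchannels=1, data_bits=64, ecc_bits=8)
DDR5 = DimmChannelLayout("DDR5", subchannels=2, data_bits=32, ecc_bits=8)

for layout in (DDR4, DDR5):
    print(layout.name, layout.bus_bits_per_subchannel, layout.total_data_bits)
```

Both generations carry 64 data bits per DIMM; what changes is that DDR5 splits them across two independently addressed 40-bit sub-channels.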
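1.2 DDR5 theoretical bandwidth
The theoretical bandwidth of Yitian 710 with DDR5-4000 at 2DPC is:
4000 MHz * 32 bit / 8 * 8 * 2 = 128 * 10^9 * 2 bytes/s = 128 GB/s * 2 = 256 GB/s
That is: effective memory frequency (4000 MHz) * sub-channel data width (32 bit) / 8 * sub-channels per die (8, i.e. 4 channels with 2 sub-channels each) * dies (2).
Note that GB and GiB are different units; the figures above are in GB.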
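The arithmetic for the theoretical bandwidth can be checked with a short Python sketch. The function name and parameter names are chosen for this example only; the inputs are the values given in the text (DDR5-4000, 32-bit sub-channels, 8 sub-channels per die, 2 dies). The same result is also converted to GiB/s to show how much the two units differ.

```python
GB = 1000 ** 3   # decimal gigabyte
GIB = 1024 ** 3  # binary gibibyte


def theoretical_bandwidth_bytes_per_s(transfer_rate_mt_s, subchannel_data_bits,
                                      subchannels_per_die, dies):
    """Peak bandwidth in bytes/s (hypothetical helper for this example)."""
    bytes_per_transfer = subchannel_data_bits // 8
    return (transfer_rate_mt_s * 1_000_000 * bytes_per_transfer
            * subchannels_per_die * dies)


peak = theoretical_bandwidth_bytes_per_s(
    transfer_rate_mt_s=4000,   # DDR5-4000 at 2DPC
    subchannel_data_bits=32,
    subchannels_per_die=8,     # 4 channels x 2 sub-channels
    dies=2,
)
print(f"{peak / GB:.0f} GB/s")    # 256 GB/s
print(f"{peak / GIB:.1f} GiB/s")  # 238.4 GiB/s
```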
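1 GB = 1000000000 bytes (= 1000^3 B = 10^9 B)
1 GiB = 1073741824 bytes (= 1024^3 B = 2^30 B)
2. Yitian 710 DDRSS PMU
The Yitian 710 DDRSS implements an independent PMU for every sub-channel, used for performance and functional debugging. Each sub-channel PMU contains 16 general-purpose counters.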
The bandwidth formulas are:
dram read bandwidth = perf_hif_rd * ddrc_width * ddrc_freq / ddrc_cycle
dram write bandwidth = (perf_hif_wr + perf_hif_rmw) * ddrc_width * ddrc_freq / ddrc_cycle
ddrc_width: units of 64 bytes
3. cloud-kernel support for the DDRSS PMU
# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  1
Core(s) per socket:  128
Socket(s):           1
NUMA node(s):        2
...
The test environment has one socket with two dies, which appear as two NUMA nodes.
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 257416 MB
node 0 free: 187991 MB
node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 257014 MB
node 1 free: 194504 MB
node distances:
node   0   1
  0:  10  15
  1:  15  10
Each NUMA node has 256 GB of memory.
#dmidecode|grep -p -a5 memorys+device|grep size|grep -v range size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: 32 gb size: no module installed ...#dmidecode -t memory | grep speed: speed: 4000 mhz configured clock speed: 4000 mhz2dpc,共插了16根dimm,每个die8根dimm,有效频率为 4000mhz。
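The DIMM population can also be derived from the `dmidecode` output programmatically. The sketch below works on a shortened sample that mirrors the listing above (16 populated slots followed by an empty one); the function name is hypothetical, and the channel count of 8 is taken from section 1.

```python
import re

# Shortened sample mirroring the listing above; further lines were elided there.
SAMPLE = "\n".join(["\tSize: 32 GB"] * 16 + ["\tSize: No Module Installed"])

SIZE_RE = re.compile(r"^\s*Size:\s+(\d+)\s+GB\s*$")


def populated_dimm_sizes_gb(text):
    """Return the size in GB of every populated DIMM slot."""
    sizes = []
    for line in text.splitlines():
        match = SIZE_RE.match(line)
        if match:  # empty slots ("No Module Installed") do not match
            sizes.append(int(match.group(1)))
    return sizes


TOTAL_CHANNELS = 8  # Yitian 710: 4 DDR5 channels per die, 2 dies

sizes = populated_dimm_sizes_gb(SAMPLE)
dimms_per_channel = len(sizes) // TOTAL_CHANNELS
print(f"{len(sizes)} DIMMs, {sum(sizes)} GB total, {dimms_per_channel}DPC")
```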
# ls /sys/bus/event_source/devices/ | grep drw
ali_drw_21000
ali_drw_21080
ali_drw_23000
ali_drw_23080
ali_drw_25000
ali_drw_25080
ali_drw_27000
ali_drw_27080
ali_drw_40021000
ali_drw_40021080
ali_drw_40023000
ali_drw_40023080
ali_drw_40025000
ali_drw_40025080
ali_drw_40027000
ali_drw_40027080
With 2DPC fully populated there are 16 PMU devices in total. ali_drw_21000 and ali_drw_21080 are the two sub-channels of the same DIMM on die 0. Devices named ali_drw_2x000 (and their ali_drw_2x080 counterparts) belong to die 0, and devices named ali_drw_4002x000 belong to die 1.
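The naming scheme described above can be decoded mechanically. The following Python sketch groups the 16 devices by die and channel. The bit positions it uses (bit 30 of the hexadecimal suffix for the die, bit 7 for the sub-channel) are an assumption inferred from the listing, not taken from hardware documentation.

```python
import re
from collections import defaultdict

DEVICES = """
ali_drw_21000 ali_drw_21080 ali_drw_23000 ali_drw_23080
ali_drw_25000 ali_drw_25080 ali_drw_27000 ali_drw_27080
ali_drw_40021000 ali_drw_40021080 ali_drw_40023000 ali_drw_40023080
ali_drw_40025000 ali_drw_40025080 ali_drw_40027000 ali_drw_40027080
""".split()

NAME_RE = re.compile(r"^ali_drw_([0-9a-f]+)$")


def decode(name):
    """Return (die, channel_base, subchannel) for a PMU device name.

    Assumed encoding, inferred from the device list only:
    bit 30 of the suffix selects the die, bit 7 selects the sub-channel.
    """
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected device name: {name}")
    addr = int(match.group(1), 16)
    die = 1 if addr & 0x40000000 else 0
    subchannel = 1 if addr & 0x80 else 0
    channel_base = addr & 0xFFFFF & ~0x80
    return die, channel_base, subchannel


groups = defaultdict(list)
for device in DEVICES:
    die, channel_base, _ = decode(device)
    groups[(die, channel_base)].append(device)

for (die, channel_base), names in sorted(groups.items()):
    print(f"die {die} channel {channel_base:#x}: {', '.join(names)}")
```

On a live system the list could instead come from `os.listdir("/sys/bus/event_source/devices")`, filtered on the `ali_drw_` prefix.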
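4. DDR bandwidth accuracy verification
4.1 TL;DR
Bandwidth unit: MB/s
The bandwidth reported by the DDR PMU deviates from the benchmark result by less than 1%. For the test methodology, see "Yitian 710 Performance Monitoring: CMN Flit Traffic Trace with Watchpoint Event".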
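The error bound can be reproduced from the figures reported in sections 4.2 to 4.5. The sketch below takes, for each test case, the PMU-derived bandwidth and the bandwidth printed by bw_mem (both in MB/s, copied from those sections) and computes the relative deviation. The dictionary and function names are made up for this example.

```python
# (PMU-derived MB/s, bw_mem MB/s), copied from sections 4.2 to 4.5
RESULTS = {
    "C0M0 rd": (20561.675, 20507.82),
    "C1M1 rd": (20516.303, 20492.53),
    "C0M0 fwr": (21969.226, 21936.50),
    "C1M1 fwr": (21966.889, 21934.51),
}


def relative_error_percent(measured, reference):
    """Deviation of `measured` from `reference`, in percent."""
    return abs(measured - reference) / reference * 100.0


errors = {
    case: relative_error_percent(pmu, bw_mem)
    for case, (pmu, bw_mem) in RESULTS.items()
}

for case, error in errors.items():
    print(f"{case:9s} {error:.2f}%")
print(f"max error: {max(errors.values()):.2f}%")
```

With these inputs every case stays well below 1%, the largest deviation being the C0M0 rd run.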
4.2 C0M0 rd
# first, run bw_mem as background workload
# numactl --cpubind=0 --membind=0 ./bw_mem 40960m rd
# then run perf command in another console
perf stat \
  -e ali_drw_21000/perf_hif_wr/ -e ali_drw_21000/perf_hif_rd/ -e ali_drw_21000/perf_hif_rmw/ -e ali_drw_21000/perf_cycle/ \
  -e ali_drw_21080/perf_hif_wr/ -e ali_drw_21080/perf_hif_rd/ -e ali_drw_21080/perf_hif_rmw/ -e ali_drw_21080/perf_cycle/ \
  -e ali_drw_23000/perf_hif_wr/ -e ali_drw_23000/perf_hif_rd/ -e ali_drw_23000/perf_hif_rmw/ -e ali_drw_23000/perf_cycle/ \
  -e ali_drw_23080/perf_hif_wr/ -e ali_drw_23080/perf_hif_rd/ -e ali_drw_23080/perf_hif_rmw/ -e ali_drw_23080/perf_cycle/ \
  -e ali_drw_25000/perf_hif_wr/ -e ali_drw_25000/perf_hif_rd/ -e ali_drw_25000/perf_hif_rmw/ -e ali_drw_25000/perf_cycle/ \
  -e ali_drw_25080/perf_hif_wr/ -e ali_drw_25080/perf_hif_rd/ -e ali_drw_25080/perf_hif_rmw/ -e ali_drw_25080/perf_cycle/ \
  -e ali_drw_27000/perf_hif_wr/ -e ali_drw_27000/perf_hif_rd/ -e ali_drw_27000/perf_hif_rmw/ -e ali_drw_27000/perf_cycle/ \
  -e ali_drw_27080/perf_hif_wr/ -e ali_drw_27080/perf_hif_rd/ -e ali_drw_27080/perf_hif_rmw/ -e ali_drw_27080/perf_cycle/ \
  -a -- sleep 1

Performance counter stats for 'system wide':

        12398      ali_drw_21000/perf_hif_wr/
     40160751      ali_drw_21000/perf_hif_rd/
          743      ali_drw_21000/perf_hif_rmw/
    500620725      ali_drw_21000/perf_cycle/
        12252      ali_drw_21080/perf_hif_wr/
     40161013      ali_drw_21080/perf_hif_rd/
          767      ali_drw_21080/perf_hif_rmw/
    500619340      ali_drw_21080/perf_cycle/
        11960      ali_drw_23000/perf_hif_wr/
     40159522      ali_drw_23000/perf_hif_rd/
          737      ali_drw_23000/perf_hif_rmw/
    500613505      ali_drw_23000/perf_cycle/
        12044      ali_drw_23080/perf_hif_wr/
     40159066      ali_drw_23080/perf_hif_rd/
          773      ali_drw_23080/perf_hif_rmw/
    500607620      ali_drw_23080/perf_cycle/
        12698      ali_drw_25000/perf_hif_wr/
     40160138      ali_drw_25000/perf_hif_rd/
          709      ali_drw_25000/perf_hif_rmw/
    500601240      ali_drw_25000/perf_cycle/
        12521      ali_drw_25080/perf_hif_wr/
     40160169      ali_drw_25080/perf_hif_rd/
          727      ali_drw_25080/perf_hif_rmw/
    500594755      ali_drw_25080/perf_cycle/
        12171      ali_drw_27000/perf_hif_wr/
     40159404      ali_drw_27000/perf_hif_rd/
          706      ali_drw_27000/perf_hif_rmw/
    500589945      ali_drw_27000/perf_cycle/
        12290      ali_drw_27080/perf_hif_wr/
     40157620      ali_drw_27080/perf_hif_rd/
          710      ali_drw_27080/perf_hif_rmw/
    500583305      ali_drw_27080/perf_cycle/

  1.000923276 seconds time elapsed

>>> 40159522*8*64/1000/1000.0
20561.675

# set cpu and memory to the same numa node
numactl --cpubind=0 --membind=0 ./bw_mem 40960m rd
40960.00 20507.82

4.3 C1M1 rd
# first, run bw_mem as background workload
# numactl --cpubind=1 --membind=1 ./bw_mem 40960m rd
# then run perf command in another console
perf stat \
  -e ali_drw_40021000/perf_hif_wr/ -e ali_drw_40021000/perf_hif_rd/ -e ali_drw_40021000/perf_hif_rmw/ -e ali_drw_40021000/perf_cycle/ \
  -e ali_drw_40021080/perf_hif_wr/ -e ali_drw_40021080/perf_hif_rd/ -e ali_drw_40021080/perf_hif_rmw/ -e ali_drw_40021080/perf_cycle/ \
  -e ali_drw_40023000/perf_hif_wr/ -e ali_drw_40023000/perf_hif_rd/ -e ali_drw_40023000/perf_hif_rmw/ -e ali_drw_40023000/perf_cycle/ \
  -e ali_drw_40023080/perf_hif_wr/ -e ali_drw_40023080/perf_hif_rd/ -e ali_drw_40023080/perf_hif_rmw/ -e ali_drw_40023080/perf_cycle/ \
  -e ali_drw_40025000/perf_hif_wr/ -e ali_drw_40025000/perf_hif_rd/ -e ali_drw_40025000/perf_hif_rmw/ -e ali_drw_40025000/perf_cycle/ \
  -e ali_drw_40025080/perf_hif_wr/ -e ali_drw_40025080/perf_hif_rd/ -e ali_drw_40025080/perf_hif_rmw/ -e ali_drw_40025080/perf_cycle/ \
  -e ali_drw_40027000/perf_hif_wr/ -e ali_drw_40027000/perf_hif_rd/ -e ali_drw_40027000/perf_hif_rmw/ -e ali_drw_40027000/perf_cycle/ \
  -e ali_drw_40027080/perf_hif_wr/ -e ali_drw_40027080/perf_hif_rd/ -e ali_drw_40027080/perf_hif_rmw/ -e ali_drw_40027080/perf_cycle/ \
  -a -- sleep 1

Performance counter stats for 'system wide':

         2329      ali_drw_40021000/perf_hif_wr/
     40071983      ali_drw_40021000/perf_hif_rd/
           58      ali_drw_40021000/perf_hif_rmw/
    500572165      ali_drw_40021000/perf_cycle/
         2374      ali_drw_40021080/perf_hif_wr/
     40071737      ali_drw_40021080/perf_hif_rd/
           39      ali_drw_40021080/perf_hif_rmw/
    500569615      ali_drw_40021080/perf_cycle/
         2330      ali_drw_40023000/perf_hif_wr/
     40071063      ali_drw_40023000/perf_hif_rd/
           74      ali_drw_40023000/perf_hif_rmw/
    500565635      ali_drw_40023000/perf_cycle/
         2372      ali_drw_40023080/perf_hif_wr/
     40070344      ali_drw_40023080/perf_hif_rd/
           54      ali_drw_40023080/perf_hif_rmw/
    500561355      ali_drw_40023080/perf_cycle/
         2362      ali_drw_40025000/perf_hif_wr/
     40070906      ali_drw_40025000/perf_hif_rd/
           45      ali_drw_40025000/perf_hif_rmw/
    500557480      ali_drw_40025000/perf_cycle/
         2385      ali_drw_40025080/perf_hif_wr/
     40070168      ali_drw_40025080/perf_hif_rd/
           46      ali_drw_40025080/perf_hif_rmw/
    500552550      ali_drw_40025080/perf_cycle/
         2333      ali_drw_40027000/perf_hif_wr/
     40069233      ali_drw_40027000/perf_hif_rd/
           28      ali_drw_40027000/perf_hif_rmw/
    500548745      ali_drw_40027000/perf_cycle/
         2211      ali_drw_40027080/perf_hif_wr/
     40068227      ali_drw_40027080/perf_hif_rd/
           30      ali_drw_40027080/perf_hif_rmw/
    500544450      ali_drw_40027080/perf_cycle/

  1.000863258 seconds time elapsed

>>> 40070906*8*64/1000/1000.0
20516.303

numactl --cpubind=1 --membind=1 ./bw_mem 40960m rd
40960.00 20492.53

4.4 C0M0 fwr
# first, run bw_mem as background workload
# numactl --cpubind=0 --membind=0 ./bw_mem 40960m fwr
# then run perf command in another console
perf stat \
  -e ali_drw_21000/perf_hif_wr/ -e ali_drw_21000/perf_hif_rd/ -e ali_drw_21000/perf_hif_rmw/ -e ali_drw_21000/perf_cycle/ \
  -e ali_drw_21080/perf_hif_wr/ -e ali_drw_21080/perf_hif_rd/ -e ali_drw_21080/perf_hif_rmw/ -e ali_drw_21080/perf_cycle/ \
  -e ali_drw_23000/perf_hif_wr/ -e ali_drw_23000/perf_hif_rd/ -e ali_drw_23000/perf_hif_rmw/ -e ali_drw_23000/perf_cycle/ \
  -e ali_drw_23080/perf_hif_wr/ -e ali_drw_23080/perf_hif_rd/ -e ali_drw_23080/perf_hif_rmw/ -e ali_drw_23080/perf_cycle/ \
  -e ali_drw_25000/perf_hif_wr/ -e ali_drw_25000/perf_hif_rd/ -e ali_drw_25000/perf_hif_rmw/ -e ali_drw_25000/perf_cycle/ \
  -e ali_drw_25080/perf_hif_wr/ -e ali_drw_25080/perf_hif_rd/ -e ali_drw_25080/perf_hif_rmw/ -e ali_drw_25080/perf_cycle/ \
  -e ali_drw_27000/perf_hif_wr/ -e ali_drw_27000/perf_hif_rd/ -e ali_drw_27000/perf_hif_rmw/ -e ali_drw_27000/perf_cycle/ \
  -e ali_drw_27080/perf_hif_wr/ -e ali_drw_27080/perf_hif_rd/ -e ali_drw_27080/perf_hif_rmw/ -e ali_drw_27080/perf_cycle/ \
  -a -- sleep 1

Performance counter stats for 'system wide':

     42910737      ali_drw_21000/perf_hif_wr/
       108397      ali_drw_21000/perf_hif_rd/
          495      ali_drw_21000/perf_hif_rmw/
    500708510      ali_drw_21000/perf_cycle/
     42911223      ali_drw_21080/perf_hif_wr/
       117280      ali_drw_21080/perf_hif_rd/
          515      ali_drw_21080/perf_hif_rmw/
    500706780      ali_drw_21080/perf_cycle/
     42910038      ali_drw_23000/perf_hif_wr/
       109179      ali_drw_23000/perf_hif_rd/
          516      ali_drw_23000/perf_hif_rmw/
    500702100      ali_drw_23000/perf_cycle/
     42911620      ali_drw_23080/perf_hif_wr/
       111038      ali_drw_23080/perf_hif_rd/
          523      ali_drw_23080/perf_hif_rmw/
    500697340      ali_drw_23080/perf_cycle/
     42910435      ali_drw_25000/perf_hif_wr/
       111748      ali_drw_25000/perf_hif_rd/
          469      ali_drw_25000/perf_hif_rmw/
    500692500      ali_drw_25000/perf_cycle/
     42908786      ali_drw_25080/perf_hif_wr/
       110177      ali_drw_25080/perf_hif_rd/
          456      ali_drw_25080/perf_hif_rmw/
    500686595      ali_drw_25080/perf_cycle/
     42908903      ali_drw_27000/perf_hif_wr/
       114093      ali_drw_27000/perf_hif_rd/
          490      ali_drw_27000/perf_hif_rmw/
    500681405      ali_drw_27000/perf_cycle/
     42908156      ali_drw_27080/perf_hif_wr/
       109668      ali_drw_27080/perf_hif_rd/
          489      ali_drw_27080/perf_hif_rmw/
    500676420      ali_drw_27080/perf_cycle/

  1.001100811 seconds time elapsed

>>> (42908156+489)*8*64/1000/1000.0
21969.226

numactl --cpubind=0 --membind=0 ./bw_mem 40960m fwr
40960.00 21936.50

4.5 C1M1 fwr
# first, run bw_mem as background workload
# numactl --cpubind=1 --membind=1 ./bw_mem 40960m fwr
# then run perf command in another console
perf stat \
  -e ali_drw_40021000/perf_hif_wr/ -e ali_drw_40021000/perf_hif_rd/ -e ali_drw_40021000/perf_hif_rmw/ -e ali_drw_40021000/perf_cycle/ \
  -e ali_drw_40021080/perf_hif_wr/ -e ali_drw_40021080/perf_hif_rd/ -e ali_drw_40021080/perf_hif_rmw/ -e ali_drw_40021080/perf_cycle/ \
  -e ali_drw_40023000/perf_hif_wr/ -e ali_drw_40023000/perf_hif_rd/ -e ali_drw_40023000/perf_hif_rmw/ -e ali_drw_40023000/perf_cycle/ \
  -e ali_drw_40023080/perf_hif_wr/ -e ali_drw_40023080/perf_hif_rd/ -e ali_drw_40023080/perf_hif_rmw/ -e ali_drw_40023080/perf_cycle/ \
  -e ali_drw_40025000/perf_hif_wr/ -e ali_drw_40025000/perf_hif_rd/ -e ali_drw_40025000/perf_hif_rmw/ -e ali_drw_40025000/perf_cycle/ \
  -e ali_drw_40025080/perf_hif_wr/ -e ali_drw_40025080/perf_hif_rd/ -e ali_drw_40025080/perf_hif_rmw/ -e ali_drw_40025080/perf_cycle/ \
  -e ali_drw_40027000/perf_hif_wr/ -e ali_drw_40027000/perf_hif_rd/ -e ali_drw_40027000/perf_hif_rmw/ -e ali_drw_40027000/perf_cycle/ \
  -e ali_drw_40027080/perf_hif_wr/ -e ali_drw_40027080/perf_hif_rd/ -e ali_drw_40027080/perf_hif_rmw/ -e ali_drw_40027080/perf_cycle/ \
  -a -- sleep 1

Performance counter stats for 'system wide':

     42906048      ali_drw_40021000/perf_hif_wr/
        33939      ali_drw_40021000/perf_hif_rd/
           76      ali_drw_40021000/perf_hif_rmw/
    500629355      ali_drw_40021000/perf_cycle/
     42905967      ali_drw_40021080/perf_hif_wr/
        34018      ali_drw_40021080/perf_hif_rd/
           63      ali_drw_40021080/perf_hif_rmw/
    500631900      ali_drw_40021080/perf_cycle/
     42905422      ali_drw_40023000/perf_hif_wr/
        33843      ali_drw_40023000/perf_hif_rd/
           75      ali_drw_40023000/perf_hif_rmw/
    500628540      ali_drw_40023000/perf_cycle/
     42905547      ali_drw_40023080/perf_hif_wr/
        33858      ali_drw_40023080/perf_hif_rd/
           68      ali_drw_40023080/perf_hif_rmw/
    500623970      ali_drw_40023080/perf_cycle/
     42905230      ali_drw_40025000/perf_hif_wr/
        34028      ali_drw_40025000/perf_hif_rd/
           56      ali_drw_40025000/perf_hif_rmw/
    500620630      ali_drw_40025000/perf_cycle/
     42904734      ali_drw_40025080/perf_hif_wr/
        34141      ali_drw_40025080/perf_hif_rd/
           61      ali_drw_40025080/perf_hif_rmw/
    500615840      ali_drw_40025080/perf_cycle/
     42903390      ali_drw_40027000/perf_hif_wr/
        33712      ali_drw_40027000/perf_hif_rd/
           84      ali_drw_40027000/perf_hif_rmw/
    500610635      ali_drw_40027000/perf_cycle/
     42903975      ali_drw_40027080/perf_hif_wr/
        33916      ali_drw_40027080/perf_hif_rd/
          106      ali_drw_40027080/perf_hif_rmw/
    500606645      ali_drw_40027080/perf_cycle/

  1.000953335 seconds time elapsed

>>> (42903975+106)*8*64/1000/1000.0
21966.889

# numactl --cpubind=1 --membind=1 ./bw_mem 40960m fwr
40960.00 21934.51
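The manual calculations above take one sub-channel's count and multiply it by 8. As an alternative, the counter lines can be parsed and summed over all listed sub-channels. The Python sketch below does this for two sub-channels copied from the C0M0 rd run. It assumes counts are printed without thousands separators, as in the listings above, and it uses the length of the measurement window in seconds in place of ddrc_cycle / ddrc_freq, which is a simplification of the formula in section 2. All function names are hypothetical.

```python
import re
from collections import defaultdict

DDRC_WIDTH_BYTES = 64  # ddrc_width: each counted request moves 64 bytes

# Counter lines for two sub-channels, copied from the C0M0 rd run.
SAMPLE = """\
    12398  ali_drw_21000/perf_hif_wr/
 40160751  ali_drw_21000/perf_hif_rd/
      743  ali_drw_21000/perf_hif_rmw/
500620725  ali_drw_21000/perf_cycle/
    12252  ali_drw_21080/perf_hif_wr/
 40161013  ali_drw_21080/perf_hif_rd/
      767  ali_drw_21080/perf_hif_rmw/
500619340  ali_drw_21080/perf_cycle/
"""

LINE_RE = re.compile(r"^\s*(\d+)\s+(ali_drw_[0-9a-f]+)/(\w+)/\s*$")


def parse_perf_stat(text):
    """Return {device: {event: count}} for every ali_drw counter line."""
    counters = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            count, device, event = match.groups()
            counters[device][event] = int(count)
    return dict(counters)


def bandwidth_mb_s(counters, window_s):
    """Read and write bandwidth summed over all devices, in MB/s (10^6 B/s).

    `window_s` is the measurement window in seconds (assumption: it stands
    in for ddrc_cycle / ddrc_freq from the formula in section 2).
    """
    reads = sum(ev["perf_hif_rd"] for ev in counters.values())
    writes = sum(ev["perf_hif_wr"] + ev["perf_hif_rmw"]
                 for ev in counters.values())
    scale = DDRC_WIDTH_BYTES / window_s / 1e6
    return reads * scale, writes * scale


counters = parse_perf_stat(SAMPLE)
read_mb_s, write_mb_s = bandwidth_mb_s(counters, window_s=1.0)
print(f"read {read_mb_s:.3f} MB/s, write {write_mb_s:.3f} MB/s")
```

Feeding in all eight sub-channels of a die gives the per-die total directly, without assuming that every sub-channel carries the same traffic.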