How to analyze the results of the NetPIPE benchmark

I measured the latency and throughput of an Ethernet connection between two Raspberry Pi Model B single-board computers with the benchmark tool NetPIPE. The benchmark tests over a range of message sizes between two processes. I ran it once using plain TCP as the end-to-end protocol and once using the Open MPI message-passing library.
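For context, NetPIPE essentially performs a ping-pong between two processes over a range of message sizes and derives latency and throughput from the round-trip times. Below is only a minimal sketch of such a measurement loop with Open MPI, not NetPIPE's actual code; the message sizes, repetition count, and output format are my own illustrative choices.

```c
/* Minimal ping-pong sketch (not NetPIPE itself): measures round-trip time
 * and throughput between two MPI ranks for a range of message sizes.
 * Run with exactly two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 100;
    for (long size = 1; size <= (1L << 22); size <<= 1) {   /* 1 B .. 4 MiB */
        char *buf = malloc(size);
        MPI_Barrier(MPI_COMM_WORLD);             /* start both ranks together */

        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rtt = (MPI_Wtime() - t0) / reps;  /* average round-trip time */

        if (rank == 0) {
            /* one-way latency estimate and throughput in Mbit/s */
            printf("%8ld B  latency %.1f us  throughput %.2f Mbit/s\n",
                   size, rtt / 2.0 * 1e6,
                   (2.0 * size * 8.0) / rtt / 1e6);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and started with mpirun -np 2 across the two Pis, a loop like this yields numbers comparable to the MPI benchmark; NetPIPE's TCP mode performs the same ping-pong over a plain TCP socket instead.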

The connection is not a direct link: an unmanaged Layer 2 switch (10/100 Mbit/s Ethernet) sits between the two devices.

MTU=1500 Bytes.

The figures show that MPI (which itself uses TCP as the transport-layer protocol) adds overhead that hurts both throughput and latency. The best throughput measured with MPI is 65 Mbit/s; with plain TCP, throughput reaches up to 85 Mbit/s.

As long as the payload fits into a single TCP segment, the latency with MPI is roughly ten times higher than with plain TCP. The maximum transmission unit (MTU), which specifies the maximum payload of an Ethernet frame, is 1500 bytes in our cluster. Consequently, the maximum segment size (MSS), which specifies the maximum payload of a TCP segment, is 1460 bytes (1500 bytes minus 20 bytes of IPv4 header and 20 bytes of TCP header).

Some questions:

  • Why do the MPI graphs have more outliers compared with the TCP graph? This can be clearly seen in the bottom left figure. Is this because of the process scheduling of the operating system? The TCP stack is part of the Linux kernel and thus runs in kernel space, while the MPI library runs in user space.

  • Why does the latency stay constant over a wider range of message sizes with MPI than with TCP? This can be clearly seen in the top right figure.

  • Are there any further explanations for the results I missed?

BTW: The poor overall Ethernet performance is probably due to the 10/100 Mbit/s Ethernet controller of the Raspberry Pi being internally connected to the USB 2.0 hub.

Update

Performance drops, especially around a payload size of 4 MB, are probably caused by the limited CPU resources of the Raspberry Pi nodes. I checked the CPU utilization with htop: when running the MPI benchmark, the CPU is almost fully utilized.


The performance drop at around 512 KiB is due to the MPI protocol:

  • eager mode => 1x round-trip time
  • rendezvous mode => 2x round-trip time
  • See: https://computing.llnl.gov/tutorials/mpi_performance/#EagerVsRendezvous

The default switchover point depends on the MPI implementation chosen, and there are configuration variables to change it; in your case you could make the switch from eager to rendezvous happen at a larger message size (see the sketch below). The additional latency causes performance to improve only slowly until a transfer size of 1 MiB is reached. The later performance drop is unclear to me.
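In Open MPI the switchover threshold is exposed as MCA parameters (for the TCP BTL the relevant one should be btl_tcp_eager_limit, which can be raised with mpirun --mca btl_tcp_eager_limit <bytes>). If you want to check which knobs your installation actually exposes, the following minimal sketch uses the standard MPI_T tools interface to print every control variable whose name contains "eager"; the buffer sizes and the filter string are just illustrative choices.

```c
/* Sketch: list the MPI control variables whose names contain "eager",
 * using the standard MPI_T tools interface (MPI-3). With Open MPI this
 * should reveal parameters such as btl_tcp_eager_limit, which govern the
 * eager/rendezvous switchover and can be set via "mpirun --mca ...".
 * Run it as a single process. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int provided, num_cvars;

    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);          /* components register their cvars here */
    MPI_T_cvar_get_num(&num_cvars);

    for (int i = 0; i < num_cvars; i++) {
        char name[256], desc[1024];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope;
        MPI_Datatype datatype;
        MPI_T_enum enumtype;

        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &datatype,
                            &enumtype, desc, &desc_len, &bind, &scope);
        if (strstr(name, "eager") != NULL)
            printf("%s\n    %s\n", name, desc);
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```

Raising the eager limit well above 512 KiB and re-running the benchmark would show whether the performance drop really moves along with the threshold.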
