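An Introduction to iperf
Introduction
According to Wikipedia, iperf is a cross-platform tool that can produce standardized performance measurements for any network.
It descends from ttcp, one of the oldest network performance testing tools, which originated in the 1980s at BRL (the U.S. Army Ballistic Research Laboratory) and became widely used on BSD systems.
The main versions are:
- iperf2: supports multithreading, suited to measuring maximum concurrent throughput on multi-core systems
- iperf3: the current mainstream version. It is a rewrite with a smaller code base, aimed at single-threaded performance and 10G/40G/100G networks; since v3.16 it also supports multithreading
Developers involved include Bruce Mah, who is both a software engineer at ESnet (the organization that maintains iperf3) and a long-time FreeBSD committer.
Mike Muuss (1958–2000) was a co-author of ttcp and also the author of the well-known ping program.
Mike Muuss's code is a core part of the BSD network stack. FreeBSD, as a direct descendant of BSD, inherited and extended his original source. To this day, running man ping on FreeBSD still shows his name in the author section.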
Options
kola:~/proj/kola_blog$ iperf3 -h
Usage: iperf3 [-s|-c host] [options]
iperf3 [-h|--help] [-v|--version]
Server or Client:
-p, --port # server port to listen on/connect to
-f, --format [kmgtKMGT] format to report: Kbits, Mbits, Gbits, Tbits
-i, --interval # seconds between periodic throughput reports
-I, --pidfile file write PID file
-F, --file name xmit/recv the specified file
-A, --affinity n/n,m set CPU affinity
-B, --bind <host>[%<dev>] bind to the interface associated with the address <host>
(optional <dev> equivalent to `--bind-dev <dev>`)
--bind-dev <dev> bind to the network interface with SO_BINDTODEVICE
-V, --verbose more detailed output
-J, --json output in JSON format
--logfile f send output to a log file
--forceflush force flushing output at every interval
--timestamps<=format> emit a timestamp at the start of each output line
(optional "=" and format string as per strftime(3))
--rcv-timeout # idle timeout for receiving data (default 120000 ms)
--snd-timeout # timeout for unacknowledged TCP data
(in ms, default is system settings)
-d, --debug[=#] emit debugging output
(optional optional "=" and debug level: 1-4. Default is 4 - all messages)
-v, --version show version information and quit
-h, --help show this message and quit
Server specific:
-s, --server run in server mode
-D, --daemon run the server as a daemon
-1, --one-off handle one client connection then exit
--server-bitrate-limit #[KMG][/#] server's total bit rate limit (default 0 = no limit)
(optional slash and number of secs interval for averaging
total data rate. Default is 5 seconds)
--idle-timeout # restart idle server after # seconds in case it
got stuck (default - no timeout)
--rsa-private-key-path path to the RSA private key used to decrypt
authentication credentials
--authorized-users-path path to the configuration file containing user
credentials
--time-skew-threshold time skew threshold (in seconds) between the server
and client during the authentication process
Client specific:
-c, --client <host>[%<dev>] run in client mode, connecting to <host>
(option <dev> equivalent to `--bind-dev <dev>`)
--sctp use SCTP rather than TCP
-X, --xbind <name> bind SCTP association to links
--nstreams # number of SCTP streams
-u, --udp use UDP rather than TCP
--connect-timeout # timeout for control connection setup (ms)
-b, --bitrate #[KMG][/#] target bitrate in bits/sec (0 for unlimited)
(default 1 Mbit/sec for UDP, unlimited for TCP)
(optional slash and packet count for burst mode)
--pacing-timer #[KMG] set the timing for pacing, in microseconds (default 1000)
--fq-rate #[KMG] enable fair-queuing based socket pacing in
bits/sec (Linux only)
-t, --time # time in seconds to transmit for (default 10 secs)
-n, --bytes #[KMG] number of bytes to transmit (instead of -t)
-k, --blockcount #[KMG] number of blocks (packets) to transmit (instead of -t or -n)
-l, --length #[KMG] length of buffer to read or write
(default 128 KB for TCP, dynamic or 1460 for UDP)
--cport <port> bind to a specific client port (TCP and UDP, default: ephemeral port)
-P, --parallel # number of parallel client streams to run
-R, --reverse run in reverse mode (server sends, client receives)
--bidir run in bidirectional mode.
Client and server send and receive data.
-w, --window #[KMG] set send/receive socket buffer sizes
(indirectly sets TCP window size)
-C, --congestion <algo> set TCP congestion control algorithm (Linux and FreeBSD only)
-M, --set-mss # set TCP/SCTP maximum segment size (MTU - 40 bytes)
-N, --no-delay set TCP/SCTP no delay, disabling Nagle's Algorithm
-4, --version4 only use IPv4
-6, --version6 only use IPv6
-S, --tos N set the IP type of service, 0-255.
The usual prefixes for octal and hex can be used,
i.e. 52, 064 and 0x34 all specify the same value.
--dscp N or --dscp val set the IP dscp value, either 0-63 or symbolic.
Numeric values can be specified in decimal,
octal and hex (see --tos above).
-L, --flowlabel N set the IPv6 flow label (only supported on Linux)
-Z, --zerocopy use a 'zero copy' method of sending data
-O, --omit N perform pre-test for N seconds and omit the pre-test statistics
-T, --title str prefix every output line with this string
--extra-data str data string to include in client and server JSON
--get-server-output get results from server
--udp-counters-64bit use 64-bit counters in UDP test packets
--repeating-payload use repeating pattern in payload, instead of
randomized payload (like in iperf2)
--dont-fragment set IPv4 Don't Fragment flag
--username username for authentication
--rsa-public-key-path path to the RSA public key used to encrypt
authentication credentials
[KMG] indicates options that support a K/M/G suffix for kilo-, mega-, or giga-
iperf3 homepage at: https://software.es.net/iperf/
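Report bugs to: https://github.com/esnet/iperf
The more important options are:
- --server-bitrate-limit: limits the server's total bit rate
- -u, --udp: test with UDP. The default is TCP
- -b, --bitrate: target bit rate. UDP defaults to only 1 Mbit/sec, so it must be set by hand when testing Gbps networks (for example -b 10G)
- -P, --parallel: number of parallel streams. Since v3.16 each stream runs in its own thread, which makes this the key option for stressing multi-core performance
- -w, --window: sets the socket buffer size, which indirectly sets the TCP window size
- -C, --congestion: sets the congestion control algorithm (such as cubic, reno, bbr)
- -M, --set-mss: sets the maximum segment size (MSS), which affects how data is segmented into packets
- -N, --no-delay: disables Nagle's algorithm. Suited to testing low-latency environments
- -Z, --zerocopy: zero-copy mode
- -O, --omit: omits the first N seconds of data. Because TCP starts with slow start, dropping the first few seconds gives a more stable average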
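To make the option list concrete, here is a minimal sketch that assembles an iperf3 client command line from a few of the flags above. The helper name build_iperf3_client_cmd and the address 192.0.2.10 (a documentation-only address) are made up for illustration; only the flags themselves come from the help output.

```python
from typing import List, Optional


def build_iperf3_client_cmd(
    host: str,
    udp: bool = False,
    bitrate: Optional[str] = None,
    parallel: int = 1,
    omit: int = 0,
    duration: int = 10,
) -> List[str]:
    """Assemble an iperf3 client argv list (hypothetical helper, not part of iperf3)."""
    cmd = ["iperf3", "-c", host, "-t", str(duration)]
    if udp:
        cmd.append("-u")  # UDP instead of the default TCP
    if bitrate is not None:
        cmd += ["-b", bitrate]  # needed for UDP, whose default is 1 Mbit/sec
    if parallel > 1:
        cmd += ["-P", str(parallel)]  # parallel streams
    if omit > 0:
        cmd += ["-O", str(omit)]  # skip the slow-start seconds in the statistics
    return cmd


# UDP test with an explicit 10 Gbit/s target.
udp_cmd = build_iperf3_client_cmd("192.0.2.10", udp=True, bitrate="10G")
# TCP test with four streams, ignoring the first 3 seconds.
tcp_cmd = build_iperf3_client_cmd("192.0.2.10", parallel=4, omit=3)

print(" ".join(udp_cmd))
print(" ".join(tcp_cmd))
```

The list form can be passed to subprocess.run unchanged, which avoids shell quoting problems.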
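Multithreading setup
Network test results are limited by several factors: per-core CPU capacity (the NIC has to support RSS to spread the load across cores), the bandwidth of the NIC itself, and memory copy bandwidth.
A good first step is to run iperf3 over loopback on your own machine to see how much a single core can push when the NIC is not involved.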
kola:~$ taskset -c 0 iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 127.0.0.1, port 47504
[ 5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 47518
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 5.21 GBytes 44.7 Gbits/sec
[ 5] 1.00-2.00 sec 4.96 GBytes 42.6 Gbits/sec
[ 5] 2.00-3.00 sec 5.07 GBytes 43.6 Gbits/sec
[ 5] 3.00-4.00 sec 5.15 GBytes 44.2 Gbits/sec
[ 5] 4.00-5.00 sec 5.08 GBytes 43.7 Gbits/sec
[ 5] 5.00-6.00 sec 5.19 GBytes 44.6 Gbits/sec
[ 5] 6.00-7.00 sec 5.11 GBytes 43.9 Gbits/sec
[ 5] 7.00-8.00 sec 5.03 GBytes 43.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-8.00 sec 42.7 GBytes 45.9 Gbits/sec receiver
iperf3: the client has terminated
kola:~$ taskset -c 1 iperf3 -c 127.0.0.1 -t 10
Connecting to host 127.0.0.1, port 5201
[ 5] local 127.0.0.1 port 47518 connected to 127.0.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 5.21 GBytes 44.7 Gbits/sec 0 1.19 MBytes
[ 5] 1.00-2.00 sec 4.96 GBytes 42.6 Gbits/sec 0 1.62 MBytes
[ 5] 2.00-3.00 sec 5.07 GBytes 43.6 Gbits/sec 0 2.19 MBytes
[ 5] 3.00-4.00 sec 5.15 GBytes 44.2 Gbits/sec 0 2.19 MBytes
[ 5] 4.00-5.00 sec 5.08 GBytes 43.7 Gbits/sec 0 2.19 MBytes
[ 5] 5.00-6.00 sec 5.19 GBytes 44.6 Gbits/sec 0 3.31 MBytes
[ 5] 6.00-7.00 sec 5.11 GBytes 43.9 Gbits/sec 0 3.31 MBytes
[ 5] 7.00-8.00 sec 5.03 GBytes 43.2 Gbits/sec 0 5.00 MBytes
[ 5] 8.00-8.37 sec 1.90 GBytes 44.3 Gbits/sec 0 5.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-8.37 sec 42.7 GBytes 43.8 Gbits/sec 0 sender
In this run a single stream reaches about 43.8 Gbit/s. Next, try the parallel option, which runs the streams in multiple threads:
kola:~$ taskset -c 0-3 iperf3 -c 127.0.0.1 -P 4 -t 10
Connecting to host 127.0.0.1, port 5201
[ 5] local 127.0.0.1 port 49584 connected to 127.0.0.1 port 5201
[ 7] local 127.0.0.1 port 49594 connected to 127.0.0.1 port 5201
[ 9] local 127.0.0.1 port 49602 connected to 127.0.0.1 port 5201
[ 11] local 127.0.0.1 port 49608 connected to 127.0.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.39 GBytes 20.5 Gbits/sec 39 4.43 MBytes
[ 7] 0.00-1.00 sec 2.63 GBytes 22.6 Gbits/sec 0 4.25 MBytes
[ 9] 0.00-1.00 sec 2.56 GBytes 21.9 Gbits/sec 2 4.12 MBytes
[ 11] 0.00-1.00 sec 2.27 GBytes 19.5 Gbits/sec 86 4.50 MBytes
[SUM] 0.00-1.00 sec 9.87 GBytes 84.7 Gbits/sec 127
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-3.12 sec 295 MBytes 20.9 Gbits/sec 0 4.43 MBytes
[ 7] 3.00-3.12 sec 296 MBytes 21.0 Gbits/sec 0 4.25 MBytes
[ 9] 3.00-3.12 sec 275 MBytes 19.7 Gbits/sec 1 4.12 MBytes
[ 11] 3.00-3.12 sec 308 MBytes 22.0 Gbits/sec 2 4.50 MBytes
[SUM] 3.00-3.12 sec 1.15 GBytes 83.2 Gbits/sec 3
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-3.12 sec 7.56 GBytes 20.8 Gbits/sec 47 sender
[ 5] 0.00-3.12 sec 0.00 Bytes 0.00 bits/sec receiver
[ 7] 0.00-3.12 sec 7.72 GBytes 21.3 Gbits/sec 9 sender
[ 7] 0.00-3.12 sec 0.00 Bytes 0.00 bits/sec receiver
[ 9] 0.00-3.12 sec 7.62 GBytes 21.0 Gbits/sec 12 sender
[ 9] 0.00-3.12 sec 0.00 Bytes 0.00 bits/sec receiver
[ 11] 0.00-3.12 sec 7.44 GBytes 20.5 Gbits/sec 98 sender
[ 11] 0.00-3.12 sec 0.00 Bytes 0.00 bits/sec receiver
[SUM] 0.00-3.12 sec 30.3 GBytes 83.5 Gbits/sec 166 sender
[SUM] 0.00-3.12 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: interrupt - the client has terminated
kola:~$ taskset -c 0 iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 127.0.0.1, port 49576
[ 5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 49584
[ 8] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 49594
[ 10] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 49602
[ 12] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 49608
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 2.39 GBytes 20.5 Gbits/sec
[ 8] 0.00-1.00 sec 2.63 GBytes 22.5 Gbits/sec
[ 10] 0.00-1.00 sec 2.56 GBytes 22.0 Gbits/sec
[ 12] 0.00-1.00 sec 2.27 GBytes 19.5 Gbits/sec
[SUM] 0.00-1.00 sec 9.85 GBytes 84.5 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 2.39 GBytes 20.5 Gbits/sec
[ 8] 2.00-3.00 sec 2.39 GBytes 20.5 Gbits/sec
[ 10] 2.00-3.00 sec 2.39 GBytes 20.5 Gbits/sec
[ 12] 2.00-3.00 sec 2.39 GBytes 20.6 Gbits/sec
[SUM] 2.00-3.00 sec 9.56 GBytes 82.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-3.00 sec 7.55 GBytes 21.6 Gbits/sec receiver
[ 8] 0.00-3.00 sec 7.72 GBytes 22.1 Gbits/sec receiver
[ 10] 0.00-3.00 sec 7.62 GBytes 21.8 Gbits/sec receiver
[ 12] 0.00-3.00 sec 7.44 GBytes 21.3 Gbits/sec receiver
[SUM] 0.00-3.00 sec 30.3 GBytes 86.8 Gbits/sec receiver
iperf3: the client has terminated
-----------------------------------------------------------
Server listening on 5201 (test #2)
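-----------------------------------------------------------
As shown above, throughput rises to 86.8 Gbit/s on the receiver side. Four streams did not give four times the single-stream rate, mainly because the server was pinned to a single core with taskset -c 0. Next, give the server four cores of its own: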
kola:~$ taskset -c 4-7 iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 127.0.0.1, port 37116
[ 5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 37122
[ 8] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 37128
[ 10] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 37136
[ 12] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 37148
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 4.58 GBytes 39.3 Gbits/sec
[ 8] 0.00-1.00 sec 3.41 GBytes 29.3 Gbits/sec
[ 10] 0.00-1.00 sec 3.36 GBytes 28.8 Gbits/sec
[ 12] 0.00-1.00 sec 4.39 GBytes 37.7 Gbits/sec
[SUM] 0.00-1.00 sec 15.7 GBytes 135 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 4.70 GBytes 40.4 Gbits/sec
[ 8] 4.00-5.00 sec 3.43 GBytes 29.4 Gbits/sec
[ 10] 4.00-5.00 sec 4.76 GBytes 40.9 Gbits/sec
[ 12] 4.00-5.00 sec 3.37 GBytes 29.0 Gbits/sec
[SUM] 4.00-5.00 sec 16.3 GBytes 140 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-5.00 sec 25.6 GBytes 44.1 Gbits/sec receiver
[ 8] 0.00-5.00 sec 18.9 GBytes 32.4 Gbits/sec receiver
[ 10] 0.00-5.00 sec 23.2 GBytes 39.8 Gbits/sec receiver
[ 12] 0.00-5.00 sec 20.9 GBytes 35.9 Gbits/sec receiver
[SUM] 0.00-5.00 sec 88.6 GBytes 152 Gbits/sec receiver
iperf3: the client has terminated
-----------------------------------------------------------
Server listening on 5201 (test #2)
kola:~$ taskset -c 0-3 iperf3 -c 127.0.0.1 -P 4 -t 10
Connecting to host 127.0.0.1, port 5201
[ 5] local 127.0.0.1 port 37122 connected to 127.0.0.1 port 5201
[ 7] local 127.0.0.1 port 37128 connected to 127.0.0.1 port 5201
[ 9] local 127.0.0.1 port 37136 connected to 127.0.0.1 port 5201
[ 11] local 127.0.0.1 port 37148 connected to 127.0.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.60 GBytes 39.3 Gbits/sec 0 3.00 MBytes
[ 7] 0.00-1.05 sec 3.56 GBytes 29.3 Gbits/sec 0 2.00 MBytes
[ 9] 0.00-1.06 sec 3.56 GBytes 28.9 Gbits/sec 0 2.00 MBytes
[ 11] 0.00-1.06 sec 4.63 GBytes 37.5 Gbits/sec 1 2.75 MBytes
[SUM] 0.00-1.00 sec 16.4 GBytes 140 Gbits/sec 1
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-5.59 sec 2.71 GBytes 39.7 Gbits/sec 0 4.18 MBytes
[ 7] 5.00-5.59 sec 2.04 GBytes 29.9 Gbits/sec 0 3.18 MBytes
[ 9] 5.00-5.59 sec 2.79 GBytes 40.9 Gbits/sec 0 2.75 MBytes
[ 11] 5.00-5.59 sec 2.02 GBytes 29.5 Gbits/sec 0 4.87 MBytes
[SUM] 5.00-5.59 sec 9.57 GBytes 140 Gbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.59 sec 25.6 GBytes 39.4 Gbits/sec 0 sender
[ 5] 0.00-5.59 sec 0.00 Bytes 0.00 bits/sec receiver
[ 7] 0.00-5.59 sec 18.9 GBytes 29.0 Gbits/sec 3 sender
[ 7] 0.00-5.59 sec 0.00 Bytes 0.00 bits/sec receiver
[ 9] 0.00-5.59 sec 23.2 GBytes 35.6 Gbits/sec 6 sender
[ 9] 0.00-5.59 sec 0.00 Bytes 0.00 bits/sec receiver
[ 11] 0.00-5.59 sec 20.9 GBytes 32.1 Gbits/sec 1 sender
[ 11] 0.00-5.59 sec 0.00 Bytes 0.00 bits/sec receiver
[SUM] 0.00-5.59 sec 88.6 GBytes 136 Gbits/sec 10 sender
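[SUM] 0.00-5.59 sec 0.00 Bytes 0.00 bits/sec receiver
Throughput now reaches 136 Gbit/s. In practice, iperf3 3.16 and later creates a separate thread for each stream on the server side as well, so running iperf3 -s without taskset also reaches about 136 Gbit/s.
The OS scheduler watches CPU load. To avoid cache contention and overheating, it usually places the four threads on different cores (for example cores 0, 2, 4 and 6).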
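To compare the three runs, the sketch below recomputes the bit rate from the Transfer and Interval columns of the sender summaries and then derives the speedup over the single stream. It assumes iperf3's unit convention of a 1024 base for GBytes and a 1000 base for Gbits, which is consistent with the per-second lines in the logs. The function names are made up for illustration.

```python
GIB = 2 ** 30  # iperf3 reports GBytes with a 1024 base (assumption, matches the logs)


def gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf3 Transfer/Interval pair into Gbit/s (1000 base)."""
    return gbytes * GIB * 8 / seconds / 1e9


def speedup(parallel_gbps: float, single_gbps: float) -> float:
    """How many times faster the parallel run is than the single stream."""
    return parallel_gbps / single_gbps


# Sender summary lines taken from the three client runs above.
single = gbits_per_sec(42.7, 8.37)            # one stream
pinned_server = gbits_per_sec(30.3, 3.12)     # -P 4, server on one core
four_core_server = gbits_per_sec(88.6, 5.59)  # -P 4, server on cores 4-7

for label, rate in [
    ("single stream", single),
    ("4 streams, server on 1 core", pinned_server),
    ("4 streams, server on 4 cores", four_core_server),
]:
    print(f"{label}: {rate:.1f} Gbit/s, speedup {speedup(rate, single):.2f}x")
```

The recomputed values can differ slightly from the printed Bitrate column because the Transfer column is already rounded to three digits. Either way, four streams land at roughly twice the single-stream rate with a one-core server and roughly three times with a four-core server, not four times.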
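Case study
Two directly connected machines (10G NIC to 1G NIC)
With the two machines cabled directly to each other, iperf3 could not fill the 1G link. It could not even sustain 200MB, and both the TCP and UDP tests suffered heavy retransmission and loss.
Some of the symptoms:
# rx_errors keeps growing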
adl@adl-D630MT:~$ ethtool -S enp4s0 | grep -iE "overrun|dropped|missed|errors"
tx_errors: 0
rx_errors: 94373
rx_missed: 3207
align_errors: 0
# All interrupts land on a single core
adl@adl-D630MT:~$ cat /proc/interrupts | grep enp4s0
137: 0 0 0 149352 IR-PCI-MSIX-0000:04:00.0 0-ed
# The NIC has a single queue and no RSS
adl@adl-D630MT:~$ sudo ethtool -l enp4s0
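netlink error: Operation not supported
Since all interrupts were landing on one core and the NIC has no RSS, the only option left was to try RPS, which distributes received packets across cores in software:
echo f | sudo tee /sys/class/net/enp4s0/queues/rx-0/rps_cpus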
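The value f written to rps_cpus is a hexadecimal CPU bitmask in which bit N selects CPU N, so f means CPUs 0 to 3. A minimal sketch of how such a mask can be derived from a list of CPU numbers follows. The helper name rps_cpu_mask is made up, and it only covers the simple case of fewer than 32 CPUs; larger systems use comma-separated 32-bit groups, which this sketch does not produce.

```python
from typing import Iterable


def rps_cpu_mask(cpus: Iterable[int]) -> str:
    """Return the hex bitmask for the given CPU numbers (bit N = CPU N)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0 or cpu >= 32:
            # Assumption: keep to a single 32-bit word for simplicity.
            raise ValueError(f"CPU {cpu} is outside the range this sketch handles")
        mask |= 1 << cpu
    return format(mask, "x")


print(rps_cpu_mask(range(4)))  # CPUs 0-3, the mask used in the command above
print(rps_cpu_mask([1, 3]))    # only CPUs 1 and 3
```

Leaving the interrupt-handling core out of the mask is a common variation, since that core is already busy servicing the NIC.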
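Enabling RPS still did not help. The next debugging step was to check whether the NIC's ring buffers could be enlarged. As shown below, they are capped at 256 and cannot be increased.
# Ring size is only 256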
adl@adl-D630MT:~$ ethtool -g enp4s0
Ring parameters for enp4s0:
Pre-set maximums:
RX: 256
RX Mini: n/a
RX Jumbo: n/a
TX: 256
TX push buff len: n/a
Current hardware settings:
RX: 256
RX Mini: n/a
RX Jumbo: n/a
TX: 256
RX Buf Len: n/a
CQE Size: n/a
TX Push: off
RX Push: off
TX push buff len: n/a
TCP data split: n/a
I also checked whether the PCIe link was fast enough, since the 1G machine is quite old.
adl@adl-D630MT:~$ ethtool -i enp4s0
driver: r8169
version: 6.17.0-20-generic
firmware-version: rtl8168h-2_0.0.2 02/26/15
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
adl@adl-D630MT:~$ sudo lspci -vvv -s 04:00.0 | grep -E "LnkCap|LnkSta"
pcilib: sysfs_read_vpd: read failed: No such device
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
LnkSta: Speed 2.5GT/s, Width x1
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
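LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
The link runs at 2.5GT/s x1, which is about 2 Gbit/s per direction after encoding overhead, so PCIe was not the limit either.
Only at the end did the real cause become clear: the 10G NIC sends its traffic in 10G-paced bursts. The problem was not that this old machine cannot receive 1G, but that it cannot receive "1G of traffic emitted in bursts at 10G pace".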
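For the PCIe check, the LnkSta line above reports 2.5GT/s at width x1. A short calculation, assuming the 8b/10b line encoding that PCIe uses at this speed (10 bits on the wire for every 8 bits of data), gives the usable rate before packet header overhead. The function name is made up for illustration.

```python
def pcie_usable_gbps(gt_per_s: float, lanes: int, encoding_efficiency: float) -> float:
    """Usable link rate in Gbit/s per direction, before protocol overhead."""
    return gt_per_s * lanes * encoding_efficiency


# Values read from the lspci output: Speed 2.5GT/s, Width x1.
gen1_x1 = pcie_usable_gbps(gt_per_s=2.5, lanes=1, encoding_efficiency=8 / 10)

print(f"{gen1_x1:.1f} Gbit/s per direction")
print("enough for a 1 Gbit/s NIC:", gen1_x1 > 1.0)
```

Even after allowing for packet headers on the bus, this leaves comfortable headroom above 1 Gbit/s, which is why the PCIe link can be ruled out.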
Trying to lower the link speed of the 10G NIC:
adl@Twinkle:~$ sudo ethtool -s enp6s0f1 speed 1000 duplex full autoneg on
Cannot set new settings: Invalid argument
not setting speed
not setting duplex
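not setting autoneg
The NIC refused the setting, so in the end I used tc to rate-limit in software, shaping the traffic to cap the send rate.
# Remove any old rules
sudo tc qdisc del dev enp6s0f1 root 2>/dev/null
# Add a Token Bucket Filter (TBF) to cap the rate and smooth out bursts
sudo tc qdisc add dev enp6s0f1 root tbf rate 1gbit burst 32kbit latency 400ms
After that it worked, and the test could reach 1G.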
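To get a feel for what the TBF parameters mean, the sketch below turns them into concrete numbers. Two things here are assumptions rather than facts from this post: that tc reads size suffixes with a 1024 base (so burst 32kbit is 4096 bytes), and that the queue limit implied by latency is roughly rate times latency plus the burst. The 1514-byte frame size is a full-size Ethernet frame without FCS, chosen for illustration.

```python
from typing import Dict


def tbf_summary(
    rate_bps: int, burst_bytes: int, latency_ms: int, frame_bytes: int = 1514
) -> Dict[str, float]:
    """Rough interpretation of tbf rate/burst/latency (approximation, see assumptions)."""
    return {
        # How many full-size frames can leave back to back before shaping kicks in.
        "burst_frames": burst_bytes // frame_bytes,
        # How long the shaper takes to refill one full burst of tokens.
        "burst_refill_us": burst_bytes * 8 / rate_bps * 1e6,
        # Approximate queue size: what the rate can drain within the latency bound.
        "approx_queue_limit_bytes": rate_bps // 8 * latency_ms // 1000 + burst_bytes,
    }


summary = tbf_summary(
    rate_bps=1_000_000_000,       # rate 1gbit
    burst_bytes=32 * 1024 // 8,   # burst 32kbit, assuming tc's 1024 base for sizes
    latency_ms=400,               # latency 400ms
)

for key, value in summary.items():
    print(f"{key}: {value}")
```

The small burst is what matters for this case: only a couple of frames can go out back to back, and everything else is spaced out at the configured rate, while the large latency value lets packets wait in the queue instead of being dropped.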