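Linux High-Speed Network Packet Configuration
Packet Transmit Path
It starts in userspace: the process builds the data and hands it to the kernel through a socket descriptor, and the system call places it on the socket send queue inside the kernel.
The packet then goes through the protocol stack, where the kernel does some processing (such as netfilter and segmentation), and is enqueued on the qdisc queue.
Next it reaches the device layer: the driver places the packet on the NIC TX ring (the third queue), and the NIC fetches it from the TX ring via DMA and sends it out.
Once the NIC has sent what is in the ring, it raises an interrupt to report completion, which tells the driver it can queue more packets.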
The three queues:
- socket queue: one queue per socket
    adl@Twinkle:~$ cat /proc/net/sockstat
    sockets: used 315
    TCP: inuse 39 orphan 0 tw 0 alloc 40 mem 4
    UDP: inuse 3 mem 3
    UDPLITE: inuse 0
    RAW: inuse 0
    FRAG: inuse 0 memory 0
- qdisc queue: modern NICs have multiple hardware transmit (TX) queues. The Linux kernel uses the "mq" (multiqueue) framework, in which a separate qdisc is attached to each hardware queue. The number of queues scales with the CPU core count or is fixed by the hardware design.
    adl@Twinkle:~$ tc -s qdisc show dev enp6s0f1
    qdisc mq 0: root
     Sent 32930743567 bytes 609828465 pkt (dropped 3, overlimits 0 requeues 7616)
     backlog 0b 0p requeues 7616
    qdisc fq_codel 0: parent :14 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 59033718 bytes 1093217 pkt (dropped 0, overlimits 0 requeues 86)
     backlog 0b 0p requeues 86
      maxpacket 54 drop_overlimit 0 new_flow_count 560 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :13 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 57459726 bytes 1064069 pkt (dropped 0, overlimits 0 requeues 64)
     backlog 0b 0p requeues 64
      maxpacket 54 drop_overlimit 0 new_flow_count 268 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :12 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 57390610 bytes 1062787 pkt (dropped 0, overlimits 0 requeues 53)
     backlog 0b 0p requeues 53
      maxpacket 54 drop_overlimit 0 new_flow_count 413 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :11 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 3034216446 bytes 56189195 pkt (dropped 0, overlimits 0 requeues 545)
     backlog 0b 0p requeues 545
      maxpacket 54 drop_overlimit 0 new_flow_count 5620 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :10 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 344980080 bytes 6388520 pkt (dropped 0, overlimits 0 requeues 169)
     backlog 0b 0p requeues 169
      maxpacket 54 drop_overlimit 0 new_flow_count 1143 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :f limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 200657544 bytes 3715875 pkt (dropped 0, overlimits 0 requeues 200)
     backlog 0b 0p requeues 200
      maxpacket 54 drop_overlimit 0 new_flow_count 1884 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :e limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 149811324 bytes 2774284 pkt (dropped 0, overlimits 0 requeues 156)
     backlog 0b 0p requeues 156
      maxpacket 54 drop_overlimit 0 new_flow_count 1191 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :d limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 195745492 bytes 3624905 pkt (dropped 0, overlimits 0 requeues 267)
     backlog 0b 0p requeues 267
      maxpacket 54 drop_overlimit 0 new_flow_count 2332 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :c limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 177561618 bytes 3288179 pkt (dropped 0, overlimits 0 requeues 711)
     backlog 0b 0p requeues 711
      maxpacket 54 drop_overlimit 0 new_flow_count 4602 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :b limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 11404401960 bytes 211192606 pkt (dropped 0, overlimits 0 requeues 1516)
     backlog 0b 0p requeues 1516
      maxpacket 54 drop_overlimit 0 new_flow_count 8062 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :a limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 164616798 bytes 3048457 pkt (dropped 3, overlimits 0 requeues 483)
     backlog 0b 0p requeues 483
      maxpacket 54 drop_overlimit 0 new_flow_count 5024 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :9 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 15427025954 bytes 285685599 pkt (dropped 0, overlimits 0 requeues 1739)
     backlog 0b 0p requeues 1739
      maxpacket 54 drop_overlimit 0 new_flow_count 18733 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 146930166 bytes 2720929 pkt (dropped 0, overlimits 0 requeues 86)
     backlog 0b 0p requeues 86
      maxpacket 54 drop_overlimit 0 new_flow_count 1189 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 104468416 bytes 1934596 pkt (dropped 0, overlimits 0 requeues 213)
     backlog 0b 0p requeues 213
      maxpacket 54 drop_overlimit 0 new_flow_count 5862 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 650631474 bytes 12048731 pkt (dropped 0, overlimits 0 requeues 205)
     backlog 0b 0p requeues 205
      maxpacket 54 drop_overlimit 0 new_flow_count 1579 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 174091626 bytes 3223919 pkt (dropped 0, overlimits 0 requeues 316)
     backlog 0b 0p requeues 316
      maxpacket 54 drop_overlimit 0 new_flow_count 2513 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 131839056 bytes 2441464 pkt (dropped 0, overlimits 0 requeues 103)
     backlog 0b 0p requeues 103
      maxpacket 54 drop_overlimit 0 new_flow_count 954 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 188725344 bytes 3494914 pkt (dropped 0, overlimits 0 requeues 270)
     backlog 0b 0p requeues 270
      maxpacket 54 drop_overlimit 0 new_flow_count 2479 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 134800200 bytes 2496300 pkt (dropped 0, overlimits 0 requeues 147)
     backlog 0b 0p requeues 147
      maxpacket 54 drop_overlimit 0 new_flow_count 1392 ecn_mark 0
      new_flows_len 0 old_flows_len 0
    qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
     Sent 126356015 bytes 2339919 pkt (dropped 0, overlimits 0 requeues 287)
     backlog 0b 0p requeues 287
      maxpacket 54 drop_overlimit 0 new_flow_
- NIC TX ring: the ring buffer that the driver fills and the NIC drains via DMA
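With one qdisc per hardware queue, the `tc -s` listing gets long, and the queues that actually dropped packets are easy to miss. The following is a minimal sketch of a filter for that output. The function name `qdisc_drops` is made up for illustration, and it assumes the usual `tc -s qdisc show` layout, where each `qdisc` line is followed by a `Sent ... (dropped N, overlimits N requeues N)` line.

```shell
#!/bin/sh
# Hypothetical helper: print one line for every qdisc that reports drops.
# Reads the output of `tc -s qdisc show dev <dev>` on stdin.
qdisc_drops() {
    awk '
        $1 == "qdisc" {
            kind = $2
            parent = "root"            # the mq root has no "parent" token
            for (i = 3; i <= NF; i++)
                if ($i == "parent") parent = $(i + 1)
        }
        $1 == "Sent" {
            # Fields look like: (dropped 3, overlimits 0 requeues 7616)
            for (i = 1; i <= NF; i++) {
                if ($i == "(dropped") { d = $(i + 1); sub(/,/, "", d) }
                if ($i == "requeues") { r = $(i + 1); sub(/\)/, "", r) }
            }
            if (d + 0 > 0)
                printf "%s %s dropped=%d requeues=%d\n", kind, parent, d, r
        }
    '
}
```

Typical use would be `tc -s qdisc show dev enp6s0f1 | qdisc_drops`. Drops concentrated on a single child qdisc suggest that one TX queue is carrying a disproportionate share of the traffic.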
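Sender-Side Configuration
XPS (Transmit Packet Steering) was the traditional way to keep all CPUs from hitting the same TX queue. Today the qdisc layer also has fq, which gives the kernel flow-based TX scheduling.
However, if you send a large volume of traffic that all shares the same 5-tuple, it will still use only one of the NIC's TX queues, even with fq or XPS.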
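The reason a single 5-tuple cannot spread across queues is that flow-based selection is deterministic: the same tuple always produces the same hash, and so the same queue. The sketch below only illustrates that property. `flow_to_queue` is a hypothetical name, and `cksum` (a CRC) stands in for the real hash functions used by the kernel and by NICs, which are different.

```shell
#!/bin/sh
# Illustrative only: map a 5-tuple to a queue index as hash(tuple) mod N.
# cksum is a stand-in hash, NOT what the kernel or the NIC actually uses.
# Arguments: proto src_ip src_port dst_ip dst_port nqueues
flow_to_queue() {
    tuple="$1 $2 $3 $4 $5"
    nqueues=$6
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    h=$(printf '%s' "$tuple" | cksum | cut -d ' ' -f 1)
    echo $((h % nqueues))
}

# The same flow lands on the same queue every time it is evaluated.
flow_to_queue tcp 10.0.0.1 40000 10.0.0.2 443 20
flow_to_queue tcp 10.0.0.1 40000 10.0.0.2 443 20
```

Spreading load across TX queues therefore needs several distinct flows, for example multiple connections that differ in source port.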
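With 20 cores, for example, XPS can be configured as follows:
    # Allow all 20 CPUs (mask fffff) on every TX queue
    for i in /sys/class/net/enp6s0f1/queues/tx-*; do
        echo fffff > "$i/xps_cpus"
    done
    # A better approach is 1 queue ↔ 1 CPU
    cd /sys/class/net/enp6s0f1/queues
    echo 00001 > tx-0/xps_cpus
    echo 00002 > tx-1/xps_cpus
    echo 00004 > tx-2/xps_cpus
    echo 00008 > tx-3/xps_cpus
XPS is only worth enabling with a multi-queue NIC, multiple cores, and a high packet rate (PPS). It maps the send flows of different CPUs to different TX queues.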
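Writing the one-queue-per-CPU masks by hand gets tedious beyond a few queues. The sketch below generates them instead. The function names `xps_mask` and `xps_pin_one_to_one` are hypothetical, and the queues directory is passed as a parameter so the logic can be tried on a scratch directory before pointing it at `/sys` (which needs root).

```shell
#!/bin/sh
# Hex CPU bitmask with only bit <cpu> set, in the format xps_cpus accepts.
# Works for CPU indexes below 63 (shell arithmetic limit); larger systems
# need the comma-separated 32-bit groups that the kernel also accepts.
xps_mask() {
    printf '%x\n' $((1 << $1))
}

# Pin tx-N to CPU (N mod ncpus).
# $1 = queues directory, e.g. /sys/class/net/enp6s0f1/queues
# $2 = number of CPUs to spread the queues over
xps_pin_one_to_one() {
    queues_dir=$1
    ncpus=$2
    for q in "$queues_dir"/tx-*; do
        [ -e "$q/xps_cpus" ] || continue   # skip if the glob matched nothing
        n=${q##*tx-}                       # queue index from the directory name
        xps_mask $((n % ncpus)) > "$q/xps_cpus"
    done
}
```

For the 20-core example this would be called as `xps_pin_one_to_one /sys/class/net/enp6s0f1/queues 20`. The modulo is an assumption for the case where the NIC exposes more TX queues than there are CPUs.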
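Receiver-Side Configuration
- RSS (Receive Side Scaling): a hardware feature, so it depends on whether the NIC supports it. The NIC itself distributes packets across different RX queues using hash(5-tuple), with one MSI-X interrupt per queue.
- RPS (Receive Packet Steering): the NIC has already placed the packet in an RX queue, but the kernel can redistribute it to another CPU in software.
- RFS (Receive Flow Steering): keeps a given TCP flow on the same CPU, ideally the one where the application consuming that flow is running.
RPS and RFS are used together and apply to the receive path (RX). Their design goal is to move the rest of the processing to another CPU after the NIC has already delivered the packet to a particular RX queue and a particular CPU.
If the NIC supports RSS, however, you generally do not need RPS + RFS, because they are intended for NICs with a single queue.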
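- RPS/RFS configuration:
    echo 4096 > /proc/sys/net/core/rps_sock_flow_entries
  Note that rps_sock_flow_entries sizes the global RFS flow table. RPS itself is enabled per RX queue by writing a CPU mask to that queue's rps_cpus file, and RFS additionally needs a per-queue rps_flow_cnt.
- irqbalance: automatically adjusts how NIC interrupts are distributed across CPUs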
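For the RPS/RFS item above, the per-queue side of the configuration lives in each RX queue's `rps_cpus` and `rps_flow_cnt` files. The following is a minimal sketch under stated assumptions: `rps_rfs_setup` is a hypothetical helper name, the queues directory is a parameter so the logic can be tried outside `/sys`, and splitting the global table evenly across queues is one common convention rather than a requirement.

```shell
#!/bin/sh
# Enable RPS and RFS on every RX queue under a queues directory.
# $1 = queues directory, e.g. /sys/class/net/enp6s0f1/queues
# $2 = hex mask of CPUs allowed to process packets, e.g. fffff for 20 CPUs
# $3 = global flow table size, i.e. the value written to
#      /proc/sys/net/core/rps_sock_flow_entries
rps_rfs_setup() {
    queues_dir=$1
    cpu_mask=$2
    total_entries=$3

    # Count the RX queues that expose rps_cpus.
    nq=0
    for q in "$queues_dir"/rx-*; do
        [ -e "$q/rps_cpus" ] && nq=$((nq + 1))
    done
    [ "$nq" -gt 0 ] || return 1

    # Assumption: divide the global table evenly between the queues.
    per_queue=$((total_entries / nq))
    for q in "$queues_dir"/rx-*; do
        [ -e "$q/rps_cpus" ] || continue
        echo "$cpu_mask"  > "$q/rps_cpus"       # RPS: CPUs for this queue
        echo "$per_queue" > "$q/rps_flow_cnt"   # RFS: per-queue flow table
    done
}
```

On a real system this would run as root after the `rps_sock_flow_entries` write, for example `rps_rfs_setup /sys/class/net/enp6s0f1/queues fffff 4096`.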