
openshift Slow performance on pod to pod communication over vxlan in openshift-s

Posted on 2022-3-18 15:00:00
Slow performance on pod to pod communication over vxlan in openshift-sdn
Environment
Red Hat OpenShift Container Platform
Red Hat CoreOS
kernel-4.18.0-193.41.1.el8_2.x86_64
iptables-nft
Issue
TCP and UDP iperf3 tests between pods on different nodes over openshift-sdn are very slow, even when the nodes are on the same hypervisor and an iperf3 test between the node IPs is fast.
[  5]   0.00-1.00   sec  5.31 MBytes  44.5 Mbits/sec   18    620 KBytes
[  5]   1.00-2.00   sec  4.12 MBytes  34.5 Mbits/sec    0    625 KBytes
[  5]   2.00-3.00   sec  4.02 MBytes  33.7 Mbits/sec    0    628 KBytes
[  5]   3.00-4.00   sec  4.13 MBytes  34.6 Mbits/sec    0    640 KBytes
[  5]   4.00-5.00   sec  4.15 MBytes  34.8 Mbits/sec    0    665 KBytes
[  5]   5.00-6.00   sec  3.95 MBytes  33.1 Mbits/sec    7    673 KBytes
[  5]   6.00-7.00   sec  4.03 MBytes  33.8 Mbits/sec    3    675 KBytes
The same iperf3 test after adding the iptables rules from the resolution section, where throughput rises from roughly 34 Mbits/sec to roughly 3.3 Gbits/sec:
[  5] 490.00-491.00 sec   382 MBytes  3.20 Gbits/sec    4   1.02 MBytes
[  5] 491.00-492.00 sec   403 MBytes  3.38 Gbits/sec    4    957 KBytes
[  5] 492.00-493.00 sec   404 MBytes  3.39 Gbits/sec   12    869 KBytes
[  5] 493.00-494.00 sec   398 MBytes  3.34 Gbits/sec    0   1.10 MBytes
[  5] 494.00-495.00 sec   384 MBytes  3.23 Gbits/sec   11   1.02 MBytes
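To compare the two runs with a single number each, the per-interval bitrates in the iperf3 listings above can be averaged. The following is a minimal sketch, not part of the original article: the function name avg_mbits is hypothetical, and it assumes iperf3's default human-readable interval lines in which the bitrate value is immediately followed by "Mbits/sec" or "Gbits/sec".

```shell
# avg_mbits: read iperf3 interval lines on stdin and print the mean
# bitrate in Mbits/sec (hypothetical helper, for illustration only).
avg_mbits() {
    awk '
        {
            # Find the unit field and take the number just before it.
            for (i = 2; i <= NF; i++) {
                if ($i == "Gbits/sec")      { sum += $(i - 1) * 1000; n++ }
                else if ($i == "Mbits/sec") { sum += $(i - 1);        n++ }
            }
        }
        END { if (n > 0) printf "%.1f\n", sum / n }
    '
}
```

Usage, assuming the iperf3 output was saved to a file named iperf3.log (a placeholder name): avg_mbits < iperf3.log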
Conntrack shows several UNREPLIED entries at the vxlan port:
$ cat /proc/net/nf_conntrack | egrep udp | egrep dport=4789 | egrep UNREPLIED | wc -l
232
$ cat /proc/net/nf_conntrack | egrep udp | egrep dport=4789 | wc -l
232
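The two counts above can be produced in one pass. This is a sketch under assumptions rather than the article's own method: the function name vxlan_conntrack_summary is hypothetical, and it assumes the line layout of /proc/net/nf_conntrack, where an entry without return traffic carries the literal tag [UNREPLIED]. Unlike the plain egrep, it does not count longer port numbers that merely start with 4789.

```shell
# vxlan_conntrack_summary [PORT]: read conntrack entries on stdin and
# print how many UDP entries target PORT (default 4789) and how many of
# those are still UNREPLIED. Hypothetical helper, for illustration only.
vxlan_conntrack_summary() {
    awk -v port="${1:-4789}" '
        # Match "dport=<port>" followed by a non-digit or end of line.
        /udp/ && $0 ~ ("dport=" port "([^0-9]|$)") {
            total++
            if (index($0, "[UNREPLIED]") > 0) unreplied++
        }
        END { printf "total=%d unreplied=%d\n", total, unreplied }
    '
}
```

Usage on a node: vxlan_conntrack_summary < /proc/net/nf_conntrack. When the two numbers are equal, as in the listing above, none of the tracked VXLAN flows has seen a reply.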
Resolution
This issue is resolved for IPv4 in the following releases:
4.9.0
4.8.10
4.7.30
4.6.45
Workaround
Apply these iptables rules on the affected nodes:
# iptables -t raw -A OUTPUT -p udp --dport 4789 -j NOTRACK
# iptables -t raw -A PREROUTING -p udp --dport 4789 -j NOTRACK
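Because -A appends a rule every time it runs, repeating the workaround leaves duplicate entries in the raw table. A minimal sketch of an idempotent variant follows; it is an illustration, not part of the original article. The function name ensure_vxlan_notrack and the IPT and VXLAN_PORT variables are hypothetical, and the sketch assumes it runs as root on the affected node. It relies on the iptables -C (check) option, which exits non-zero when the rule is absent.

```shell
# ensure_vxlan_notrack CHAIN: append the NOTRACK rule for the VXLAN port
# to CHAIN in the raw table only if it is not already present.
# IPT and VXLAN_PORT are illustrative knobs, not from the article.
IPT="${IPT:-iptables}"
VXLAN_PORT="${VXLAN_PORT:-4789}"

ensure_vxlan_notrack() {
    chain="$1"
    # -C checks whether an identical rule exists (non-zero exit if not).
    if ! "$IPT" -t raw -C "$chain" -p udp --dport "$VXLAN_PORT" -j NOTRACK 2>/dev/null; then
        "$IPT" -t raw -A "$chain" -p udp --dport "$VXLAN_PORT" -j NOTRACK
    fi
}
```

Usage: call ensure_vxlan_notrack OUTPUT and then ensure_vxlan_notrack PREROUTING, matching the two chains used in the workaround.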
Root Cause
Unlike protocols such as DNS, VXLAN does not have conversations in which a client sends a packet from ${IP1}:${PORT1} to ${IP2}:${PORT2} and expects the server's answer to come from ${IP2}:${PORT2} back to ${IP1}:${PORT1}. Instead, whenever one host wants to communicate with another, it always sends packets from ${IP1}:${RANDOM_PORT} to ${IP2}:${VXLAN_PORT:-4789}, and when the other host sends a packet to the first host, it sends it from ${IP2}:${RANDOM_PORT} to ${IP1}:${VXLAN_PORT:-4789}. A packet is never sent in reply to the random port used as the client port of another packet, so connection tracking for VXLAN is not required and does not make sense.
However, although it is not required, such connection tracking can hurt performance in some scenarios, especially when the cluster has a very large number of services and therefore a high number of iptables rules.
Each VXLAN packet then unnecessarily traverses the iptables rules, which can cause delays. Because the traversal is sequential and every rule has to be checked, it slows UDP packets down considerably. On transmit, the vxlan driver makes the following calls:
vxlan_xmit_one()->udp_tunnel_xmit_skb()->iptunnel_xmit()->ip_local_out()
On the egress side, ip_local_out() calls into the netfilter routines, as do incoming VXLAN packets on the ingress side. With the iptables rules from the resolution section in place, VXLAN packets are not connection-tracked and therefore do not traverse the NAT rules, which require nf_conntrack; this mitigates the delay and improves bandwidth.
Diagnostic Steps
Check the number of iptables rules on the nodes:
# iptables-save | wc -l
153900
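The total above does not show which table holds the bulk of the rules. The sketch below breaks the count down per table; it is an illustration only, and the function name rules_per_table is hypothetical. It assumes the usual iptables-save layout, in which each table starts with a line such as *nat and each rule starts with -A. Unlike wc -l, it counts only rule lines, so its totals are slightly lower than the line count shown above.

```shell
# rules_per_table: read iptables-save output on stdin and print
# "<table> <rule count>" per table, largest first.
# Hypothetical helper, for illustration only.
rules_per_table() {
    awk '
        /^\*/  { table = substr($0, 2) }   # table header, e.g. "*nat"
        /^-A / { count[table]++ }          # one appended rule
        END    { for (t in count) printf "%s %d\n", t, count[t] }
    ' | sort -k2,2nr
}
```

Usage on a node, as root: iptables-save | rules_per_table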