openshift Slow performance on pod to pod communication over vxlan in openshift-s

Posted on 2022-3-18 15:00:00
Slow performance on pod to pod communication over vxlan in openshift-sdn
Environment
Red Hat OpenShift Container Platform
Red Hat CoreOS
kernel-4.18.0-193.41.1.el8_2.x86_64
iptables-nft
Issue
TCP and UDP iperf3 tests between pods on different nodes over openshift-sdn are very slow, even when the nodes are on the same hypervisor and an iperf3 test between the node IPs is fast.
[  5]   0.00-1.00   sec  5.31 MBytes  44.5 Mbits/sec   18    620 KBytes
[  5]   1.00-2.00   sec  4.12 MBytes  34.5 Mbits/sec    0    625 KBytes
[  5]   2.00-3.00   sec  4.02 MBytes  33.7 Mbits/sec    0    628 KBytes
[  5]   3.00-4.00   sec  4.13 MBytes  34.6 Mbits/sec    0    640 KBytes
[  5]   4.00-5.00   sec  4.15 MBytes  34.8 Mbits/sec    0    665 KBytes
[  5]   5.00-6.00   sec  3.95 MBytes  33.1 Mbits/sec    7    673 KBytes
[  5]   6.00-7.00   sec  4.03 MBytes  33.8 Mbits/sec    3    675 KBytes
The same iperf3 test after adding the iptables rules from the Resolution section, where throughput rises from about 34 Mbits/sec to more than 3 Gbits/sec:
[  5] 490.00-491.00 sec   382 MBytes  3.20 Gbits/sec    4   1.02 MBytes
[  5] 491.00-492.00 sec   403 MBytes  3.38 Gbits/sec    4    957 KBytes
[  5] 492.00-493.00 sec   404 MBytes  3.39 Gbits/sec   12    869 KBytes
[  5] 493.00-494.00 sec   398 MBytes  3.34 Gbits/sec    0   1.10 MBytes
[  5] 494.00-495.00 sec   384 MBytes  3.23 Gbits/sec   11   1.02 MBytes
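To compare the two runs with a single number each, the interval lines can be averaged. The following is a minimal sketch, not part of the original article: the function name iperf3_avg_mbits is hypothetical, and it assumes the default iperf3 text layout shown above, where the bitrate value is immediately followed by its unit. Any summary lines fed to it would be averaged as well, so pass interval lines only.

```shell
# Average the bitrate column of iperf3 interval lines read from stdin.
# Assumption: default iperf3 text output, bitrate value followed by its unit.
# Result is printed in Mbits/sec with one decimal place.
iperf3_avg_mbits() {
  awk '
    {
      for (i = 2; i <= NF; i++) {
        if      ($i == "Kbits/sec") { sum += $(i - 1) / 1000; n++ }
        else if ($i == "Mbits/sec") { sum += $(i - 1);        n++ }
        else if ($i == "Gbits/sec") { sum += $(i - 1) * 1000; n++ }
      }
    }
    END {
      if (n > 0) printf "%.1f\n", sum / n
      else       print "no samples"
    }
  '
}
```

Fed the seven slow samples above it prints 35.6, and fed the five fast samples it prints 3308.0, which is roughly a 93-fold difference.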
Conntrack shows several UNREPLIED entries on the VXLAN port:
$ cat /proc/net/nf_conntrack | egrep udp | egrep dport=4789 | egrep UNREPLIED | wc -l
232
$ cat /proc/net/nf_conntrack | egrep udp | egrep dport=4789 | wc -l
232
Both counts match, so every tracked entry on the VXLAN port is UNREPLIED.
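The two counts can also be taken in a single pass. This is a sketch under assumptions rather than the article's own method: the function name vxlan_conntrack_summary is hypothetical, the input is assumed to use the /proc/net/nf_conntrack line layout (protocol name in the third field), and only the first dport= field of each line, which belongs to the original direction, is compared. Unlike a plain substring match, this does not count ports such as 47890.

```shell
# Summarise UDP conntrack entries whose original-direction dport is the VXLAN port.
# Usage: vxlan_conntrack_summary [FILE]   (FILE defaults to /proc/net/nf_conntrack)
# VXLAN_PORT may be set to override the default port 4789.
vxlan_conntrack_summary() {
  ct_file="${1:-/proc/net/nf_conntrack}"
  awk -v want="dport=${VXLAN_PORT:-4789}" '
    $3 == "udp" {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^dport=/) {          # first dport= is the original direction
          if ($i == want) {
            total++
            if (index($0, "[UNREPLIED]")) unreplied++
          }
          break
        }
      }
    }
    END { printf "total=%d unreplied=%d\n", total, unreplied }
  ' "$ct_file"
}
```

On a node showing the symptom described here, the two numbers would be expected to be equal or nearly so.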
Resolution
This issue is resolved for IPv4 in the following releases:
4.9.0
4.8.10
4.7.30
4.6.45
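The release list above can be turned into a quick check for a given cluster version. This is only a sketch: the function name has_vxlan_notrack_fix is hypothetical, it handles plain X.Y.Z version strings only, and it assumes that every 4.y release after 4.9.0 also carries the fix, which the list implies but does not state.

```shell
# Return 0 if VERSION (plain X.Y.Z) is at or above the first fixed release
# listed for its minor stream, 1 if it is older, 2 if it cannot be judged.
# Assumption: releases newer than 4.9.0 inherit the fix.
has_vxlan_notrack_fix() {
  case "$1" in
    *[!0-9.]*) return 2 ;;              # suffixes such as "-rc.1" are not handled
    [0-9]*.[0-9]*.[0-9]*) ;;
    *) return 2 ;;
  esac
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  patch="${patch%%.*}"
  [ "$major" -eq 4 ] || return 2        # only 4.x is covered by the list
  case "$minor" in
    6) min_patch=45 ;;
    7) min_patch=30 ;;
    8) min_patch=10 ;;
    *) if [ "$minor" -ge 9 ]; then min_patch=0; else return 1; fi ;;
  esac
  [ "$patch" -ge "$min_patch" ]
}
```

For example, has_vxlan_notrack_fix 4.8.9 returns 1 and has_vxlan_notrack_fix 4.8.10 returns 0. On OpenShift 4 the version string can typically be read with oc get clusterversion version -o jsonpath='{.status.desired.version}'.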
Workaround
Apply these iptables rules on the affected nodes:
# iptables -t raw -A OUTPUT -p udp --dport 4789 -j NOTRACK
# iptables -t raw -A PREROUTING -p udp --dport 4789 -j NOTRACK
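Because -A appends a rule every time it is run, repeating the workaround leaves duplicate rules behind. One possible way to make it repeatable is to test for the rule with -C first. This is a sketch, not the article's procedure: the function name apply_vxlan_notrack and the IPTABLES override variable are hypothetical, and the rules still need root and do not survive a reboot.

```shell
# Add the two NOTRACK rules from the workaround only if they are missing.
# IPTABLES (hypothetical override) selects the binary; VXLAN_PORT defaults to 4789.
apply_vxlan_notrack() {
  ipt="${IPTABLES:-iptables}"
  port="${VXLAN_PORT:-4789}"
  for chain in OUTPUT PREROUTING; do
    # -C exits 0 when an identical rule already exists in the chain.
    if ! "$ipt" -t raw -C "$chain" -p udp --dport "$port" -j NOTRACK 2>/dev/null; then
      "$ipt" -t raw -A "$chain" -p udp --dport "$port" -j NOTRACK || return 1
    fi
  done
}
```

Running apply_vxlan_notrack as root a second time should then leave the raw table unchanged.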
Root Cause
Unlike protocols such as DNS, VXLAN has no request/reply conversations in which a client sends a packet from ${IP1}:${PORT1} to ${IP2}:${PORT2} and expects the server's answer to come back from ${IP2}:${PORT2} to ${IP1}:${PORT1}. Instead, whenever a host wants to communicate with another host, it always sends packets from ${IP1}:${RANDOM_PORT} to ${IP2}:${VXLAN_PORT:-4789}, and when the other host sends a packet to the first host, it sends it from ${IP2}:${RANDOM_PORT} to ${IP1}:${VXLAN_PORT:-4789}. A packet is never sent in reply to the random client port of another packet, so connection tracking for VXLAN is not required and does not make sense. This is also why conntrack reports these entries as UNREPLIED.
Although not required, such connection tracking can hurt performance in some scenarios, especially when the cluster has a very large number of services and therefore a high number of iptables rules.
Each VXLAN packet then traverses the iptables rules unnecessarily, which causes delays. Because the rules are evaluated sequentially and each one must be checked, UDP packets are slowed down considerably. The VXLAN transmit path makes the following calls:
vxlan_xmit_one()->udp_tunnel_xmit_skb()->iptunnel_xmit()->ip_local_out()
On the egress side, ip_local_out() calls into the netfilter routines, as do incoming VXLAN packets on the ingress side. With the iptables rules from the Resolution section in place, the VXLAN packets are no longer tracked by nf_conntrack. Because traversing the NAT rules requires nf_conntrack, the packets skip those rules, which removes the delay and improves bandwidth.
Diagnostic Steps
Check the number of iptables rules on the nodes:
# iptables-save | wc -l
153900
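The line count above covers every table at once. To see where the rules sit, the same output can be broken down per table. This is a minimal sketch with a hypothetical function name, count_rules_per_table; it reads iptables-save output on stdin and counts only the "-A" lines, so its totals will be somewhat lower than wc -l, which also counts chain declarations and COMMIT lines.

```shell
# Count rules per table from iptables-save output read on stdin.
# Prints "TABLE COUNT" lines, largest count first.
count_rules_per_table() {
  awk '
    /^\*/  { table = substr($0, 2) }    # "*nat", "*filter", ... start a table
    /^-A / { count[table]++ }           # each "-A" line is one rule
    END    { for (t in count) printf "%s %d\n", t, count[t] }
  ' | sort -k2,2nr -k1,1
}
```

A typical invocation would be iptables-save | count_rules_per_table, run as root on the node. On a cluster with many services, most of the rules would be expected in the nat table.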