* Problems: network rx out-of-order issue
@ 2018-03-28 9:26 Anny Hu
2018-03-28 14:19 ` Eric Dumazet
0 siblings, 1 reply; 3+ messages in thread
From: Anny Hu @ 2018-03-28 9:26 UTC (permalink / raw)
To: netdev; +Cc: zhen.jiang
Dears,
Recently, we find the following patch will impact multi-core network throughput performance on kernel-4.9,
for it will cause rx packet out-of-order.
commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882
[kernel version]:
kernel-4.9
[repeat steps]
1. two our phones with load based on kernel-4.9, enable wifi and set up P2P connection;
2. two reference phones with load based on kernel-4.4, enable wifi and set up P2P connection;
3. do iperf UDP downlink test with the following commands;
Client: adb shell iperf -c 192.168.49.xx -u -i 1 -t 60 -b 15M
Server: adb shell iperf -s -u
4. out-of-order rate of our phones are obviously higher than reference phone:
DUT Lost rate out of order
Our phone 0.011% 164
Reference phone 0.0091% 1
[Code flow]
Wi-Fi driver trigger rx_thread to receive rx packets and call netif_rx_ni() to push packets to tcp stack in thread context.
[netlog]:
No Time Source ip Dest ip Protocol Len Ip id Info
3811 2010-01-01 08:11:35.075522 192.168.43.1 192.168.43.90 UDP 144 0x25ee (9710) 60834 -> 5201 Len=100
3812 2010-01-01 08:11:35.075805 192.168.43.1 192.168.43.90 UDP 144 0x25f5 (9717) 60834 -> 5201 Len=100
3813 2010-01-01 08:11:35.075549 192.168.43.1 192.168.43.90 UDP 144 0x25ef (9711) 60834 -> 5201 Len=100
3814 2010-01-01 08:11:35.075944 192.168.43.1 192.168.43.90 UDP 144 0x25f6 (9718) 60834 -> 5201 Len=100
3815 2010-01-01 08:11:35.075567 192.168.43.1 192.168.43.90 UDP 144 0x25f0 (9712) 60834 -> 5201 Len=100
out-of-order issue happens on:
1) ip_id=0x25f5 is received before ip_id=0x25ef, which are belong to the same downlink connection;
2) ip_id=0x25f6 is received before ip_id=0x25f0, which are belong to the same downlink connection.
[kernel log]:
We add debug log in rx kernel flow, such as netif_rx_ni() -> enqueue_to_backlog -> do_softirq() -> net_rx_action() -> process_backlog()
kernel time
| cpu id which process running on
| | process name function name
| | | |
[ 598.837358] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, pending=0x8a, backlog_state=1
// wifi rx_thread is running on cpu7 to receive rx packets of ip_id 0x25ee ~ 0x25f4, which are only enqueued.
[ 598.837366] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, input_qlen=1, proc_qlen=0, pending=0x8a, backlog_state=1
// because ksoftirqd is in running queue, do_softirq() exit and net_rx_action isn't executed actually.
[ 598.837373] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837383] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ef,input_qlen=2, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837391] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0 pending=0x8a, ksoftirqd_running=1
[ 598.837400] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f0,input_qlen=3, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837407] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837417] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f1,input_qlen=4, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837424] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837434] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f2,input_qlen=5, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837441] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837450] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f3,input_qlen=6, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837458] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
// after ip_id(0x25f4), there is no rx packet in this wifi interrupt, so rx_thread exit.
[ 598.837470] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f4,input_qlen=7, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837479] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
// ksoftirqd is scheduled to run and call do_softirq(), so net_rx_action() is executed.
[ 598.837511] (7)[45:ksoftirqd/7]: [RX]net_rx_action: budget=300, time=4295042005
// net_rx_action() call process_backlog() to handle rx packet of ip_id=0x25ee
[ 598.837646] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ee, delay=126308, work=1, input_qlen=0, proc_qlen=6
// wifi interrupt happens and trigger rx_thread to receive the next rx packet, which is running on cpu1 this time.
[ 598.837660] (1)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f5, input_qlen=1, proc_qlen=0, pending=0x8, backlog_state=1
// because ksoftirqd is not running, net_rx softirq is executed immediately.
[ 598.837666] (1)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8, ksoftirqd_running =0
// so ksoftirqd on cpu7 and rx_thread on cpu1 are running at the same time.
[ 598.837679] (1)[3282:rx_thread]: [RX]net_rx_action: budget=300, time=4295042005
// ip_id=0x25ef is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.
[ 598.837762] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ef, delay=70385, work=2, input_qlen=0, proc_qlen=5
// ip_id=0x25f5 is handled on cpu1 by net_rx softirq in rx_thread context, out-of-order issue happens.
[ 598.837768] (1)[3282:rx_thread]: [RX]process_backlog: ip_id=0x25f5, delay=82000, work=1, quota=64, input_qlen=0, proc_qlen=0
// ip_id=0x25f0 is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.
[ 598.837878] (7)[45:ksoftirqd/7]: [RX]process_backlog:ip_id=0x25f0, delay=47539, work=3, input_qlen=0, proc_qlen=4
Could you please help to check if this out-of-order issue is caused by kernel-4.9 patch above?
Thanks very much!
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Problems: network rx out-of-order issue
2018-03-28 9:26 Problems: network rx out-of-order issue Anny Hu
@ 2018-03-28 14:19 ` Eric Dumazet
0 siblings, 0 replies; 3+ messages in thread
From: Eric Dumazet @ 2018-03-28 14:19 UTC (permalink / raw)
To: Anny Hu, netdev; +Cc: zhen.jiang
On 03/28/2018 02:26 AM, Anny Hu wrote:
> Dears,
>
> Recently, we find the following patch will impact multi-core network throughput performance on kernel-4.9,
> for it will cause rx packet out-of-order.
> commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882
>
> [kernel version]:
> kernel-4.9
>
> [repeat steps]
> 1. two our phones with load based on kernel-4.9, enable wifi and set up P2P connection;
> 2. two reference phones with load based on kernel-4.4, enable wifi and set up P2P connection;
> 3. do iperf UDP downlink test with the following commands;
> Client: adb shell iperf -c 192.168.49.xx -u -i 1 -t 60 -b 15M
> Server: adb shell iperf -s -u
> 4. out-of-order rate of our phones are obviously higher than reference phone:
> DUT Lost rate out of order
> Our phone 0.011% 164
> Reference phone 0.0091% 1
>
> [Code flow]
> Wi-Fi driver trigger rx_thread to receive rx packets and call netif_rx_ni() to push packets to tcp stack in thread context.
>
> [netlog]:
> No Time Source ip Dest ip Protocol Len Ip id Info
> 3811 2010-01-01 08:11:35.075522 192.168.43.1 192.168.43.90 UDP 144 0x25ee (9710) 60834 -> 5201 Len=100
> 3812 2010-01-01 08:11:35.075805 192.168.43.1 192.168.43.90 UDP 144 0x25f5 (9717) 60834 -> 5201 Len=100
> 3813 2010-01-01 08:11:35.075549 192.168.43.1 192.168.43.90 UDP 144 0x25ef (9711) 60834 -> 5201 Len=100
> 3814 2010-01-01 08:11:35.075944 192.168.43.1 192.168.43.90 UDP 144 0x25f6 (9718) 60834 -> 5201 Len=100
> 3815 2010-01-01 08:11:35.075567 192.168.43.1 192.168.43.90 UDP 144 0x25f0 (9712) 60834 -> 5201 Len=100
>
> out-of-order issue happens on:
> 1) ip_id=0x25f5 is received before ip_id=0x25ef, which are belong to the same downlink connection;
> 2) ip_id=0x25f6 is received before ip_id=0x25f0, which are belong to the same downlink connection.
>
>
> [kernel log]:
> We add debug log in rx kernel flow, such as netif_rx_ni() -> enqueue_to_backlog -> do_softirq() -> net_rx_action() -> process_backlog()
>
> kernel time
> | cpu id which process running on
> | | process name function name
> | | | |
> [ 598.837358] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, pending=0x8a, backlog_state=1
>
> // wifi rx_thread is running on cpu7 to receive rx packets of ip_id 0x25ee ~ 0x25f4, which are only enqueued.
> [ 598.837366] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, input_qlen=1, proc_qlen=0, pending=0x8a, backlog_state=1
>
> // because ksoftirqd is in running queue, do_softirq() exit and net_rx_action isn't executed actually.
>
> [ 598.837373] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
>
> [ 598.837383] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ef,input_qlen=2, proc_qlen=0, pending=0x8a, backlog_state=1
> [ 598.837391] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0 pending=0x8a, ksoftirqd_running=1
> [ 598.837400] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f0,input_qlen=3, proc_qlen=0, pending=0x8a, backlog_state=1
> [ 598.837407] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
> [ 598.837417] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f1,input_qlen=4, proc_qlen=0, pending=0x8a, backlog_state=1
> [ 598.837424] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
> [ 598.837434] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f2,input_qlen=5, proc_qlen=0, pending=0x8a, backlog_state=1
> [ 598.837441] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
> [ 598.837450] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f3,input_qlen=6, proc_qlen=0, pending=0x8a, backlog_state=1
> [ 598.837458] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
>
> // after ip_id(0x25f4), there is no rx packet in this wifi interrupt, so rx_thread exit.
> [ 598.837470] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f4,input_qlen=7, proc_qlen=0, pending=0x8a, backlog_state=1
> [ 598.837479] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
>
> // ksoftirqd is scheduled to run and call do_softirq(), so net_rx_action() is executed.
> [ 598.837511] (7)[45:ksoftirqd/7]: [RX]net_rx_action: budget=300, time=4295042005
> // net_rx_action() call process_backlog() to handle rx packet of ip_id=0x25ee
>
> [ 598.837646] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ee, delay=126308, work=1, input_qlen=0, proc_qlen=6
>
> // wifi interrupt happens and trigger rx_thread to receive the next rx packet, which is running on cpu1 this time.
> [ 598.837660] (1)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f5, input_qlen=1, proc_qlen=0, pending=0x8, backlog_state=1
> // because ksoftirqd is not running, net_rx softirq is executed immediately.
> [ 598.837666] (1)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8, ksoftirqd_running =0
>
> // so ksoftirqd on cpu7 and rx_thread on cpu1 are running at the same time.
> [ 598.837679] (1)[3282:rx_thread]: [RX]net_rx_action: budget=300, time=4295042005
>
> // ip_id=0x25ef is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.
> [ 598.837762] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ef, delay=70385, work=2, input_qlen=0, proc_qlen=5
>
> // ip_id=0x25f5 is handled on cpu1 by net_rx softirq in rx_thread context, out-of-order issue happens.
>
> [ 598.837768] (1)[3282:rx_thread]: [RX]process_backlog: ip_id=0x25f5, delay=82000, work=1, quota=64, input_qlen=0, proc_qlen=0
>
> // ip_id=0x25f0 is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.
> [ 598.837878] (7)[45:ksoftirqd/7]: [RX]process_backlog:ip_id=0x25f0, delay=47539, work=3, input_qlen=0, proc_qlen=4
>
>
> Could you please help to check if this out-of-order issue is caused by kernel-4.9 patch above?
>
> Thanks very much!
>
A driver calling netif_rx_ni() can have out of order packets,
regardless of commit 4cd13c21b207e80ddb1144c57650
This patch has changed scheduling, and might have pointed out ancient bugs/issues.
'thread context' means cpu can change under us (preempt kernels)
Which driver are you using exactly ?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Problems: network rx out-of-order issue
@ 2018-03-28 9:20 Anny Hu
0 siblings, 0 replies; 3+ messages in thread
From: Anny Hu @ 2018-03-28 9:20 UTC (permalink / raw)
To: netdev
Dears,
Recently, we find the following patch will impact multi-core network throughput performance on kernel-4.9,
for it will cause rx packet out-of-order.
commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882
[kernel version]:
kernel-4.9
[repeat steps]
1. two our phones with load based on kernel-4.9, enable wifi and set up P2P connection;
2. two reference phones with load based on kernel-4.4, enable wifi and set up P2P connection;
3. do iperf UDP downlink test with the following commands;
Client: adb shell iperf -c 192.168.49.xx -u -i 1 -t 60 -b 15M
Server: adb shell iperf -s -u
4. out-of-order rate of our phones are obviously higher than reference phone:
DUT Lost rate out of order
Our phone 0.011% 164
Reference phone 0.0091% 1
[Code flow]
Wi-Fi driver trigger rx_thread to receive rx packets and call netif_rx_ni() to push packets to tcp stack in thread context.
[netlog]:
No Time Source ip Dest ip Protocol Len Ip id Info
3811 2010-01-01 08:11:35.075522 192.168.43.1 192.168.43.90 UDP 144 0x25ee (9710) 60834 -> 5201 Len=100
3812 2010-01-01 08:11:35.075805 192.168.43.1 192.168.43.90 UDP 144 0x25f5 (9717) 60834 -> 5201 Len=100
3813 2010-01-01 08:11:35.075549 192.168.43.1 192.168.43.90 UDP 144 0x25ef (9711) 60834 -> 5201 Len=100
3814 2010-01-01 08:11:35.075944 192.168.43.1 192.168.43.90 UDP 144 0x25f6 (9718) 60834 -> 5201 Len=100
3815 2010-01-01 08:11:35.075567 192.168.43.1 192.168.43.90 UDP 144 0x25f0 (9712) 60834 -> 5201 Len=100
out-of-order issue happens on:
1) ip_id=0x25f5 is received before ip_id=0x25ef, which are belong to the same downlink connection;
2) ip_id=0x25f6 is received before ip_id=0x25f0, which are belong to the same downlink connection.
[kernel log]:
We add debug log in rx kernel flow, such as netif_rx_ni() -> enqueue_to_backlog -> do_softirq() -> net_rx_action() -> process_backlog()
kernel time
| cpu id which process running on
| | process name function name
| | | |
[ 598.837358] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, pending=0x8a, backlog_state=1
// wifi rx_thread is running on cpu7 to receive rx packets of ip_id 0x25ee ~ 0x25f4, which are only enqueued.
[ 598.837366] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, input_qlen=1, proc_qlen=0, pending=0x8a, backlog_state=1
// because ksoftirqd is in running queue, do_softirq() exit and net_rx_action isn't executed actually.
[ 598.837373] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837383] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ef,input_qlen=2, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837391] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0 pending=0x8a, ksoftirqd_running=1
[ 598.837400] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f0,input_qlen=3, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837407] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837417] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f1,input_qlen=4, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837424] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837434] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f2,input_qlen=5, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837441] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
[ 598.837450] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f3,input_qlen=6, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837458] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
// after ip_id(0x25f4), there is no rx packet in this wifi interrupt, so rx_thread exit.
[ 598.837470] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f4,input_qlen=7, proc_qlen=0, pending=0x8a, backlog_state=1
[ 598.837479] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1
// ksoftirqd is scheduled to run and call do_softirq(), so net_rx_action() is executed.
[ 598.837511] (7)[45:ksoftirqd/7]: [RX]net_rx_action: budget=300, time=4295042005
// net_rx_action() call process_backlog() to handle rx packet of ip_id=0x25ee
[ 598.837646] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ee, delay=126308, work=1, input_qlen=0, proc_qlen=6
// wifi interrupt happens and trigger rx_thread to receive the next rx packet, which is running on cpu1 this time.
[ 598.837660] (1)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f5, input_qlen=1, proc_qlen=0, pending=0x8, backlog_state=1
// because ksoftirqd is not running, net_rx softirq is executed immediately.
[ 598.837666] (1)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8, ksoftirqd_running =0
// so ksoftirqd on cpu7 and rx_thread on cpu1 are running at the same time.
[ 598.837679] (1)[3282:rx_thread]: [RX]net_rx_action: budget=300, time=4295042005
// ip_id=0x25ef is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.
[ 598.837762] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ef, delay=70385, work=2, input_qlen=0, proc_qlen=5
// ip_id=0x25f5 is handled on cpu1 by net_rx softirq in rx_thread context, out-of-order issue happens.
[ 598.837768] (1)[3282:rx_thread]: [RX]process_backlog: ip_id=0x25f5, delay=82000, work=1, quota=64, input_qlen=0, proc_qlen=0
// ip_id=0x25f0 is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.
[ 598.837878] (7)[45:ksoftirqd/7]: [RX]process_backlog:ip_id=0x25f0, delay=47539, work=3, input_qlen=0, proc_qlen=4
Could you please help to check if this out-of-order issue is caused by kernel-4.9 patch above?
Thanks very much!
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-03-28 14:19 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-28 9:26 Problems: network rx out-of-order issue Anny Hu
2018-03-28 14:19 ` Eric Dumazet
-- strict thread matches above, loose matches on Subject: below --
2018-03-28 9:20 Anny Hu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.