All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anny Hu <Anny.Hu@mediatek.com>
To: <netdev@vger.kernel.org>
Subject: Problems: network rx out-of-order issue
Date: Wed, 28 Mar 2018 17:20:08 +0800	[thread overview]
Message-ID: <1522228808.8705.42.camel@mbjswglx818> (raw)

Dears,

Recently, we find the following patch will impact multi-core network throughput performance on kernel-4.9,
for it will cause rx packet out-of-order.
commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882

[kernel version]: 
kernel-4.9

[repeat steps]
1. two our phones with load based on kernel-4.9, enable wifi and set up P2P connection; 
2. two reference phones with load based on kernel-4.4, enable wifi and set up P2P connection; 
3. do iperf UDP downlink test with the following commands;
Client: adb shell iperf -c 192.168.49.xx -u -i 1 -t 60 -b 15M
Server: adb shell iperf -s -u
4. out-of-order rate of our phones are obviously higher than reference phone:
DUT                                  Lost rate              out of order
Our phone                            0.011%                  164    
Reference phone                      0.0091%                 1

[Code flow]
Wi-Fi driver trigger rx_thread to receive rx packets and call netif_rx_ni() to push packets to tcp stack in thread context.

[netlog]:
No      Time                            Source ip       Dest ip         Protocol Len     Ip id                   Info
3811    2010-01-01 08:11:35.075522      192.168.43.1    192.168.43.90   UDP      144     0x25ee (9710)           60834 -> 5201 Len=100
3812    2010-01-01 08:11:35.075805      192.168.43.1    192.168.43.90   UDP      144     0x25f5 (9717)           60834 -> 5201 Len=100
3813    2010-01-01 08:11:35.075549      192.168.43.1    192.168.43.90   UDP      144     0x25ef (9711)           60834 -> 5201 Len=100
3814    2010-01-01 08:11:35.075944      192.168.43.1    192.168.43.90   UDP      144     0x25f6 (9718)           60834 -> 5201 Len=100
3815    2010-01-01 08:11:35.075567      192.168.43.1    192.168.43.90   UDP      144     0x25f0 (9712)           60834 -> 5201 Len=100 

out-of-order issue happens on:
1) ip_id=0x25f5 is received before ip_id=0x25ef, which are belong to the same downlink connection;
2) ip_id=0x25f6 is received before ip_id=0x25f0, which are belong to the same downlink connection.


[kernel log]:
We add debug log in rx kernel flow, such as netif_rx_ni() -> enqueue_to_backlog -> do_softirq() -> net_rx_action() -> process_backlog()

    kernel time
    |          cpu id which process running on
    |           |         process name         function name
    |           |         |                    |
[  598.837358] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, pending=0x8a, backlog_state=1

// wifi rx_thread is running on cpu7 to receive rx packets of ip_id 0x25ee ~ 0x25f4, which are only enqueued.
[  598.837366] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, input_qlen=1, proc_qlen=0, pending=0x8a, backlog_state=1  
 
// because ksoftirqd is in running queue, do_softirq() exit and net_rx_action isn't executed actually.
   
[  598.837373] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1                          

[  598.837383] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ef,input_qlen=2, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837391] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0 pending=0x8a, ksoftirqd_running=1 
[  598.837400] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f0,input_qlen=3, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837407] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 
[  598.837417] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f1,input_qlen=4, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837424] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 
[  598.837434] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f2,input_qlen=5, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837441] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 
[  598.837450] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f3,input_qlen=6, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837458] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 

// after ip_id(0x25f4), there is no rx packet in this wifi interrupt, so rx_thread exit.
[  598.837470] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f4,input_qlen=7, proc_qlen=0, pending=0x8a, backlog_state=1
[  598.837479] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1                                  

//  ksoftirqd is scheduled to run and call do_softirq(), so net_rx_action() is executed.    
[  598.837511] (7)[45:ksoftirqd/7]: [RX]net_rx_action: budget=300, time=4295042005  
//  net_rx_action() call process_backlog() to handle rx packet of ip_id=0x25ee
                                                                                 
[  598.837646] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ee, delay=126308, work=1, input_qlen=0, proc_qlen=6  

// wifi interrupt happens and trigger rx_thread to receive the next rx packet, which is running on cpu1 this time. 
[  598.837660] (1)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f5, input_qlen=1, proc_qlen=0, pending=0x8, backlog_state=1     
// because ksoftirqd is not running, net_rx softirq is executed immediately.
[  598.837666] (1)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8, ksoftirqd_running =0                                  

// so ksoftirqd on cpu7 and rx_thread on cpu1 are running at the same time.
[  598.837679] (1)[3282:rx_thread]: [RX]net_rx_action: budget=300, time=4295042005  
  
//  ip_id=0x25ef is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.                                                                                      
[  598.837762] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ef, delay=70385, work=2, input_qlen=0, proc_qlen=5  
 
//  ip_id=0x25f5 is handled on cpu1 by net_rx softirq in rx_thread context, out-of-order issue happens.
                                                  
[  598.837768] (1)[3282:rx_thread]: [RX]process_backlog: ip_id=0x25f5, delay=82000, work=1, quota=64, input_qlen=0, proc_qlen=0 
             
//  ip_id=0x25f0 is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.                              
[  598.837878] (7)[45:ksoftirqd/7]: [RX]process_backlog:ip_id=0x25f0, delay=47539, work=3, input_qlen=0, proc_qlen=4   

                               
Could you please help to check if this out-of-order issue is caused by kernel-4.9 patch above?

Thanks very much!

             reply	other threads:[~2018-03-28  9:20 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-28  9:20 Anny Hu [this message]
2018-03-28  9:26 Problems: network rx out-of-order issue Anny Hu
2018-03-28 14:19 ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1522228808.8705.42.camel@mbjswglx818 \
    --to=anny.hu@mediatek.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.