All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>,
	Tariq Toukan <tariqt@mellanox.com>,
	Saeed Mahameed <saeedm@mellanox.com>
Subject: Re: [PATCH net] net: solve a NAPI race
Date: Mon, 27 Feb 2017 06:06:39 -0800	[thread overview]
Message-ID: <1488204399.9415.177.camel@edumazet-glaptop3.roam.corp.google.com> (raw)
In-Reply-To: <1488166294.9415.172.camel@edumazet-glaptop3.roam.corp.google.com>

On Sun, 2017-02-26 at 19:31 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> While playing with mlx4 hardware timestamping of RX packets, I found
> that some packets were received by TCP stack with a ~200 ms delay...
> 
> Since the timestamp was provided by the NIC, and my probe was added
> in tcp_v4_rcv() while in BH handler, I was confident it was not
> a sender issue, or a drop in the network.
> 
> This would happen with a very low probability, but hurting RPC
> workloads.
> 
> A NAPI driver normally arms the IRQ after the napi_complete_done(),
> after NAPI_STATE_SCHED is cleared, so that the hard irq handler can grab
> it.
> 
> Problem is that if another point in the stack grabs NAPI_STATE_SCHED bit
> while IRQ are not disabled, we might have later an IRQ firing and
> finding this bit set, right before napi_complete_done() clears it.
> 
> This can happen with busy polling users, or if gro_flush_timeout is
> used. But some other uses of napi_schedule() in drivers can cause this
> as well.
> 
> This patch adds a new NAPI_STATE_MISSED bit, that napi_schedule_prep()
> can set if it could not grab NAPI_STATE_SCHED
> 
> Then napi_complete_done() properly reschedules the napi to make sure
> we do not miss something.
> 
> Since we manipulate multiple bits at once, use cmpxchg() like in
> sk_busy_loop() to provide proper transactions.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/netdevice.h |   29 +++++++----------------
>  net/core/dev.c            |   44 ++++++++++++++++++++++++++++++++++--
>  2 files changed, 51 insertions(+), 22 deletions(-)

I will send a v2 of this patch.

Points trying to grab NAPI_STATE_SCHED not from the device driver IRQ
handler should not set NAPI_STATE_MISSED if they fail, otherwise this
adds extra work for no purpose.

One of this point is napi_watchdog() which can fire pretty often.

  reply	other threads:[~2017-02-27 14:15 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-25 14:22 [PATCH net] net/mlx4_en: reception NAPI/IRQ race breaker Eric Dumazet
2017-02-26 16:32 ` Saeed Mahameed
2017-02-26 17:34   ` Eric Dumazet
2017-02-26 17:40     ` Eric Dumazet
2017-02-26 18:03       ` Eric Dumazet
2017-02-27  3:31 ` [PATCH net] net: solve a NAPI race Eric Dumazet
2017-02-27 14:06   ` Eric Dumazet [this message]
2017-02-27 14:21   ` [PATCH v2 " Eric Dumazet
2017-02-27 16:19     ` David Miller
2017-02-27 16:44       ` Eric Dumazet
2017-02-27 17:10         ` Eric Dumazet
2017-02-28  2:08         ` David Miller
2017-03-01  0:22           ` Francois Romieu
2017-03-01  1:04             ` Stephen Hemminger
2017-02-27 21:00       ` Alexander Duyck
2017-02-27 20:18     ` [PATCH v3 " Eric Dumazet
2017-02-27 22:14       ` Stephen Hemminger
2017-02-27 22:35         ` Eric Dumazet
2017-02-27 22:44           ` Stephen Hemminger
2017-02-27 22:48             ` David Miller
2017-02-27 23:23               ` Stephen Hemminger
2017-02-28 10:14           ` David Laight
2017-02-28 13:04             ` Eric Dumazet
2017-02-28 13:20             ` Eric Dumazet
2017-02-28 16:17       ` Stephen Hemminger
2017-02-28 16:57         ` Eric Dumazet
2017-02-28 18:34       ` [PATCH v4 " Eric Dumazet
2017-03-01 17:53         ` David Miller
2017-02-28 17:20     ` [PATCH v2 " Alexander Duyck
2017-02-28 17:47       ` Eric Dumazet
2017-03-01 10:41       ` David Laight
2017-03-01 16:14         ` Alexander Duyck
2017-03-01 17:32           ` Eric Dumazet
2017-03-02 10:24             ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1488204399.9415.177.camel@edumazet-glaptop3.roam.corp.google.com \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.