From: "Jonathan Lemon" <jonathan.lemon@gmail.com>
To: "Jesper Dangaard Brouer" <brouer@redhat.com>
Cc: "Saeed Mahameed" <saeedm@mellanox.com>,
	davem@davemloft.net, kernel-team@fb.com, netdev@vger.kernel.org,
	ilias.apalodimas@linaro.org, "Li RongQing" <lirongqing@baidu.com>
Subject: Re: [PATCH] page_pool: use irqsave/irqrestore to protect ring access.
Date: Tue, 10 Mar 2020 10:52:37 -0700
Message-ID: <67CA3793-636E-4F73-AC80-44D05A6BDB6F@gmail.com>
In-Reply-To: <20200310110412.66b60677@carbon>



On 10 Mar 2020, at 3:04, Jesper Dangaard Brouer wrote:

> On Tue, 10 Mar 2020 02:30:34 +0000
> Saeed Mahameed <saeedm@mellanox.com> wrote:
>
>> On Mon, 2020-03-09 at 17:55 -0700, David Miller wrote:
>>> From: Jonathan Lemon <jonathan.lemon@gmail.com>
>>> Date: Mon, 9 Mar 2020 12:49:29 -0700
>>>
>>>> netpoll may be called from IRQ context, which may access the
>>>> page pool ring.  The current _bh variants do not provide sufficient
>>>> protection, so use irqsave/restore instead.
>>>>
>>>> Error observed on a modified mlx4 driver, but the code path exists
>>>> for any driver which calls page_pool_recycle from napi poll.
>
> Netpoll calls into drivers are problematic, nasty and annoying.
> Drivers usually catch netpoll calls by seeing that the NAPI budget
> is zero, and handle the situation inside the driver, e.g. [1][2]
> (even napi_consume_skb catches it this way).
>
> [1] https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c#L3179
> [2] https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c#L1110

This still doesn't seem safe.

Netpoll calls napi poll in hard-IRQ context.
Consider the following path:

  ixgbe_poll()
   ixgbe_for_each_ring()
    ixgbe_clean_tx_irq()    /* before the L3179 check above */
     xdp_return_frame()
      page_pool_put_page( .. napi=0 )

This bypasses the cache and tries to return the page to the ring.
Since in_serving_softirq() is false, it takes the _bh variant, which
blows up; see the sketch below.
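
For context, the ring-return path in net/core/page_pool.c of that era
looks roughly like this (simplified from memory; treat as approximate,
not a verbatim copy):

  /* Simplified sketch of the period ring-return path in
   * net/core/page_pool.c (approximate, not verbatim). */
  static bool __page_pool_recycle_into_ring(struct page_pool *pool,
                                            struct page *page)
  {
          int ret;

          /* BH protection not needed when already serving softirq */
          if (in_serving_softirq())
                  ret = ptr_ring_produce(&pool->ring, page);
          else
                  ret = ptr_ring_produce_bh(&pool->ring, page);

          return ret == 0;
  }

From netpoll's hard-IRQ context in_serving_softirq() is false, so the
_bh variant runs, and the local_bh_enable() on its unlock path trips
exactly the softirq.c:161 warning quoted below.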



>>>> WARNING: CPU: 34 PID: 550248 at /ro/source/kernel/softirq.c:161 __local_bh_enable_ip+0x35/0x50
>>>  ...
>>>> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
>>>
>>> The netpoll stuff always makes the locking more complicated than it
>>> needs to be.  I wonder if there is another way around this issue?
>>>
>>> Because IRQ save/restore is a high cost to pay in this critical path.
>
> Yes, huge NACK from me, this would kill performance. We need to find
> another way.

Sure - checking in_irq() in the free path is another solution.
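
Something like the following at the top of the put path would do it
(hypothetical and untested; __page_pool_return_page() is, as I read
the current source, the existing internal helper that unmaps and
frees the page):

  /* Hypothetical guard, not a tested patch: from hard-IRQ (netpoll)
   * context neither the lockless cache nor the _bh-locked ring is
   * safe, so hand the page straight back to the page allocator. */
  if (in_irq()) {
          __page_pool_return_page(pool, page);
          return;
  }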


>> a printk inside irq context led to this, so maybe it can be avoided ..
>>
>> or instead of checking in_serving_softirq(), change page_pool to
>> check in_interrupt(), which is more powerful, to avoid the ptr_ring
>> locking and the complication with netpoll altogether.
>>
>> I wonder why Jesper picked in_serving_softirq() in the first place.
>> Was there a specific reason, or did he just want it to be as
>> permissive as possible?
>
> I wanted to make it specific that this is optimized for softirq.
>
> I actually think this is a regression we have introduced recently via
> the combined patches:
>
>  304db6cb7610 ("page_pool: refill page when alloc.count of pool is zero")
>  44768decb7c0 ("page_pool: handle page recycle for NUMA_NO_NODE condition")
>
> I forgot that the in_serving_softirq() check was also protecting us
> when getting called from netpoll.  The general idea before was that if
> the in_serving_softirq() check failed, then we fell through to the
> slow path, calling normal alloc_pages_node() (the regular page
> allocator).

I don't think it ever did that - in all cases, the in_serving_softirq()
check guards access to the cache; otherwise it falls through to the ring.

Here, it seems we need a tristate (rough sketch below):
    in_irq:  fall through to the system page allocator
    napi:    use the cache
    else:    use the ring
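
In code, something along these lines (purely illustrative;
recycle_into_cache()/recycle_into_ring() are stand-ins for the
existing cache and ring paths, not real functions):

  /* Illustrative tristate for the free path -- a sketch only. */
  static void page_pool_put_page_tristate(struct page_pool *pool,
                                          struct page *page, bool napi)
  {
          if (in_irq())
                  __page_pool_return_page(pool, page); /* page allocator */
          else if (napi && in_serving_softirq())
                  recycle_into_cache(pool, page);      /* lockless cache */
          else
                  recycle_into_ring(pool, page);       /* ptr_ring, _bh lock */
  }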


However, looking at those two patches again: while they don't address
the current issue, I do agree that they introduce regressions:

   1) Under stress, the system allocator may return pages with
      a different node id.

   2) Since the check is now removed from pool_page_reusable(), the
      foreign page is placed in the cache and persists there until
      it is moved to the ring.

   3) On refill from the ring, if this page is found, processing stops,
      regardless of the other pages in the ring.

These regressions come from trying to avoid checking the page node_id
at free time.  Are there any metrics showing that this check is a
performance issue?  Drivers which do page biasing still have to
perform it in the driver (e.g. mlx4, i40e) regardless.
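
For reference, the free-time check in question was roughly this (my
recollection of pool_page_reusable() before 44768decb7c0; treat as
approximate):

  /* Approximate pre-44768decb7c0 check: only recycle pages that
   * are not pfmemalloc and that match the pool's NUMA node. */
  static bool pool_page_reusable(struct page_pool *pool, struct page *page)
  {
          return !page_is_pfmemalloc(page) &&
                 page_to_nid(page) == pool->p.nid;
  }
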
-- 
Jonathan
