From: Julian Wiedmann <jwi@linux.ibm.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Luigi Rizzo <lrizzo@google.com>
Subject: Re: [PATCH net-next 1/3] net: napi: add hard irqs deferral feature
Date: Mon, 4 May 2020 17:25:01 +0200
Message-ID: <dd7d271f-ce45-5783-45a0-e89a6c428428@linux.ibm.com>
In-Reply-To: <78e8b060-6386-b6c1-d32f-907da2c930a7@gmail.com>

On 02.05.20 18:24, Eric Dumazet wrote:
> 
> 
> On 5/2/20 9:10 AM, Julian Wiedmann wrote:
>> On 02.05.20 17:40, Eric Dumazet wrote:
>>> On Sat, May 2, 2020 at 7:56 AM Julian Wiedmann <jwi@linux.ibm.com> wrote:
>>>>
>>>> On 22.04.20 18:13, Eric Dumazet wrote:

[...]

>>>>
>>>>
>>>>> By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
>>>>>
>>>>> This patch does not change the prior behavior of gro_flush_timeout
>>>>> if used alone : NIC hard irqs should be rearmed as before.
>>>>>
>>>>> One concrete usage can be :
>>>>>
>>>>> echo 20000 >/sys/class/net/eth1/gro_flush_timeout
>>>>> echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
>>>>>
>>>>> If at least one packet is retired, then we will reset napi counter
>>>>> to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
>>>>> of the queue.
>>>>>
>>>>> On busy queues, this should avoid NIC hard IRQs, while before this patch IRQ
>>>>> avoidance was only possible if napi->poll() exhausted its budget
>>>>> and did not call napi_complete_done().
>>>>>
>>>>
>>>> I was confused here for a second, so let me just clarify how this is intended
>>>> to look for pure TX completion IRQs:
>>>>
>>>> napi->poll() calls napi_complete_done() with an accurate work_done value, but
>>>> then still returns 0 because TX completion work doesn't consume NAPI budget.
>>>
>>>
>>> If the napi budget was consumed, the driver does _not_ call
>>> napi_complete() or napi_complete_done() anyway.
>>>
>>
>> I was thinking of "TX completions are cheap and don't consume _any_ NAPI budget, ever"
>> as the current consensus, but looking at the mlx4 code that evidently isn't true
>> for all drivers.
> 
> TX completions are not cheap in many cases.
> 
> Doing the unmap stuff can be costly in the IOMMU world, and freeing skbs
> can also be expensive.
> Add to this that the TCP stack might be called back (via skb->destructor())
> to add more packets to the qdisc/device.
> 
> So using effectively the budget as a limit might help in some stress situations,
> by not re-enabling NIC interrupts, even before napi_defer_hard_irqs addition.
> 

Neat, thanks for sharing this. Now I also see the tricks that mlx4 plays to still
get netpoll working.... fun.
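
To make the budget point concrete, here is a minimal userspace sketch of a TX
completion poll routine that treats the NAPI budget as a hard limit. This is
not mlx4 code: model_tx_ring, model_tx_poll() and the irq_enabled flag are
hypothetical stand-ins, and the per-descriptor work (unmap, skb free) is
reduced to decrementing a counter. It only illustrates the pattern discussed
above: report the full budget while work remains, otherwise complete and
return 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a TX completion ring; not a kernel structure. */
struct model_tx_ring {
	int pending;       /* completed descriptors still to be cleaned */
	bool irq_enabled;  /* stands in for the NIC's interrupt mask state */
};

/*
 * Clean at most 'budget' descriptors.
 * Returns 'budget' if the limit was hit, so the caller keeps polling with
 * the interrupt masked. Otherwise marks polling as complete (the point where
 * a driver would call napi_complete_done() and rearm the IRQ) and returns 0.
 */
static int model_tx_poll(struct model_tx_ring *ring, int budget)
{
	int done = 0;

	assert(budget > 0);

	while (ring->pending > 0 && done < budget) {
		ring->pending--;  /* placeholder for unmap + skb free */
		done++;
	}

	if (done == budget)
		return budget;    /* limit hit: stay in polling mode */

	ring->irq_enabled = true; /* ring drained: rearm the interrupt */
	return 0;
}
```

With 100 pending completions and a budget of 64, the first call returns 64
and leaves the interrupt masked; the second call cleans the remaining 36,
rearms the interrupt and returns 0.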

>>
>>> If the budget is consumed, then napi_complete_done(napi, X>0) allows
>>> napi_complete_done() to return 0 if napi_defer_hard_irqs is not 0
>>>
>>> This means that the NIC hard irq will stay disabled for at least one more round.
>>>
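
For reference, the deferral behaviour described in the patch text quoted
above can be modelled in a few lines of userspace C. This is a sketch under
stated assumptions, not the kernel implementation: struct model_napi,
model_complete_done() and model_deferred_idle_polls() are hypothetical names,
the hrtimer that would trigger the next poll is left out entirely, and the
return value only answers "should the driver rearm the hard IRQ now?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; field names mirror the two sysfs knobs. */
struct model_napi {
	unsigned long gro_flush_timeout; /* ns, 0 disables timer-based polling */
	int napi_defer_hard_irqs;        /* configured number of deferrals */
	int defer_count;                 /* remaining deferrals for this NAPI */
};

/*
 * Model of the completion decision.
 * Returns true if the hard IRQ should be rearmed, false if it should stay
 * masked and the next poll be driven by the (unmodelled) timer.
 */
static bool model_complete_done(struct model_napi *n, int work_done)
{
	bool rearm = true;

	assert(n != NULL);

	/* Any retired packet resets the counter to the configured value. */
	if (work_done)
		n->defer_count = n->napi_defer_hard_irqs;

	/* While deferrals remain and a timer is configured, keep IRQ masked. */
	if (n->defer_count > 0) {
		n->defer_count--;
		if (n->gro_flush_timeout)
			rearm = false;
	}

	return rearm;
}

/* Count how many consecutive idle polls are deferred before rearming. */
static int model_deferred_idle_polls(struct model_napi *n)
{
	int deferred = 0;

	while (!model_complete_done(n, 0))
		deferred++;

	return deferred;
}
```

With gro_flush_timeout=20000 and napi_defer_hard_irqs=10, one poll that
retires a packet is deferred, and so are the next 9 idle polls, for 10
deferred rounds in total before the IRQ is rearmed. With both knobs at zero
the model rearms immediately, matching the default behaviour.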

