From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Benjamin Poirier <bpoirier@suse.com>
Cc: devel@driverdev.osuosl.org, netdev@vger.kernel.org,
	GR-Linux-NIC-Dev@marvell.com, linux-kernel@vger.kernel.org,
	Manish Chopra <manishc@marvell.com>
Subject: Re: [PATCH v2 0/17] staging: qlge: Fix rx stall in case of allocation failures
Date: Fri, 4 Oct 2019 10:19:31 +0200
Message-ID: <20191004081931.GA67764@kroah.com>
In-Reply-To: <20190927101210.23856-1-bpoirier@suse.com>

On Fri, Sep 27, 2019 at 07:11:54PM +0900, Benjamin Poirier wrote:
> qlge refills rx buffers from napi context. In case of allocation failure,
> allocation will be retried the next time napi runs. If a receive queue runs
> out of free buffers (possibly after subsequent allocation failures), it
> drops all traffic, no longer raises interrupts and napi is no longer
> scheduled; reception is stalled until manual admin intervention.
> 
> This patch series adds a fallback mechanism for rx buffer allocation. If an
> rx buffer queue becomes empty, a workqueue is scheduled to refill it from
> process context where allocation can block until mm has freed some pages
> (hopefully). This approach was inspired by the virtio_net driver (commit
> 3161e453e496 "virtio: net refill on out-of-memory").
> 
> I've compared this with how some other devices with a similar allocation
> scheme handle this situation:
> mlx4 relies on a periodic watchdog, sfc uses a timer, e1000e and fm10k rely
> on periodic hardware interrupts (IIUC). In all cases, they use this to
> schedule napi periodically at a fixed interval (10-250ms) until allocations
> succeed. This kind of approach simplifies allocations because only one
> context may refill buffers, however it is inefficient because of the fixed
> interval: either the interval was too short, the allocation fails again and
> work was done without forward progress; or the interval was too long,
> buffers could've been allocated earlier and rx restarted earlier, instead
> traffic was dropped while the system was idle.
> 
> Note that the qlge driver (and device) uses two kinds of buffers for
> received data, so-called "small buffers" and "large buffers". The two are
> arranged in ring pairs, the sbq and lbq. Depending on frame size, protocol
> content and header splitting, data can go in either type of buffers.
> Because of buffer size, lbq allocations are more likely to fail and lead to
> stall, however I've reproduced the problem with sbq as well. The problem
> was originally found when running jumbo frames. In that case, qlge uses
> order-1 allocations for the large buffers. Although the two kinds of
> buffers are managed similarly, the qlge driver duplicates most data
> structures and code for their handling. In fact, even a casual look at the
> qlge driver shows it to be in a state of disrepair, to put it kindly...
> 
> Patches 1-14 are cleanups that remove, fix and deduplicate code related to
> sbq and lbq handling. Regarding those cleanups, patches 2 ("Remove
> irq_cnt") and 8 ("Deduplicate rx buffer queue management") are the most
> important. Finally, patches 15-17 fix the actual problem of rx stalls in
> case of allocation failures by implementing the fallback of allocations to
> a workqueue.
> 
> I've tested these patches using two different approaches:
> 1) A sender uses pktgen to send udp traffic. The receiver has a large swap,
> a large net.core.rmem_max, runs a program that dirties all free memory in a
> loop and runs a program that opens as many udp sockets as possible but
> doesn't read from them. Since received data is all queued in the sockets
> rather than freed, qlge is allocating receive buffers as quickly as
> possible and faces allocation failures if the swap is slower than the
> network.
> 2) A sender uses super_netperf. Likewise, the receiver has a large swap, a
> large net.core.rmem_max and runs a program that dirties all free memory in
> a loop. After the netperf send test is started, `killall -s SIGSTOP
> netserver` on the receiver leads to the same situation as above.

As this code got moved to staging with the goal to drop it from the
tree, why are you working on fixing it up?  Do you want it moved back
out of staging into the "real" part of the tree, or are you just fixing
things that you find in order to make it cleaner before we delete it?

confused,

greg k-h


