All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Vineeth Remanan Pillai <vineethp@amazon.com>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wei Liu <wei.liu2@citrix.com>,
	Paul Durrant <paul.durrant@citrix.com>,
	xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
Date: Sun, 29 Jan 2017 18:09:21 -0500	[thread overview]
Message-ID: <38ccfaea-0a65-a6f3-c19a-e6f9c0d4ef76@oracle.com> (raw)
In-Reply-To: <66b10c64-936a-8001-6855-2ff1ed626642@amazon.com>



On 01/19/2017 11:35 AM, Vineeth Remanan Pillai wrote:
> From: Vineeth Remanan Pillai <vineethp@amazon.com>
>
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
>
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
>
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
>
> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> ---
> Changes in v2:
>     - Removed the old implementation of enabling polling on
>       skb allocation error.
>     - Corrected the refill timer logic to schedule when newly
>       created slots since last push is less than NET_RX_SLOTS_MIN.


There are couple of problems with this patch.
1. The 'if' clause now evaluates to true on pretty much every call to 
xennet_alloc_rx_buffers().
2. It tickles a latent bug during resume where the timer triggers before 
we re-connect. The trouble is that we now try to dereference 
queue->rx.sring which is NULL since we disconnect in netfront_resume(). 
(Curiously, I only observe it with 32-bit guests)

I'll send a patch later that will delete the timer since it looks like a 
bug to me in any case but the first problem seems to be more serious 
than the problem that this patch addresses.

-boris

>
>  drivers/net/xen-netfront.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 40f26b6..2c7c29f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct
> netfront_queue *queue)
>      queue->rx.req_prod_pvt = req_prod;
>
>      /* Not enough requests? Try again later. */
> -    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
> +    if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
>          mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>          return;
>      }

WARNING: multiple messages have this Message-ID (diff)
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Vineeth Remanan Pillai <vineethp@amazon.com>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wei Liu <wei.liu2@citrix.com>,
	Paul Durrant <paul.durrant@citrix.com>,
	xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
Date: Sun, 29 Jan 2017 18:09:21 -0500	[thread overview]
Message-ID: <38ccfaea-0a65-a6f3-c19a-e6f9c0d4ef76@oracle.com> (raw)
In-Reply-To: <66b10c64-936a-8001-6855-2ff1ed626642@amazon.com>



On 01/19/2017 11:35 AM, Vineeth Remanan Pillai wrote:
> From: Vineeth Remanan Pillai <vineethp@amazon.com>
>
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
>
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
>
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
>
> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> ---
> Changes in v2:
>     - Removed the old implementation of enabling polling on
>       skb allocation error.
>     - Corrected the refill timer logic to schedule when newly
>       created slots since last push is less than NET_RX_SLOTS_MIN.


There are couple of problems with this patch.
1. The 'if' clause now evaluates to true on pretty much every call to 
xennet_alloc_rx_buffers().
2. It tickles a latent bug during resume where the timer triggers before 
we re-connect. The trouble is that we now try to dereference 
queue->rx.sring which is NULL since we disconnect in netfront_resume(). 
(Curiously, I only observe it with 32-bit guests)

I'll send a patch later that will delete the timer since it looks like a 
bug to me in any case but the first problem seems to be more serious 
than the problem that this patch addresses.

-boris

>
>  drivers/net/xen-netfront.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 40f26b6..2c7c29f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct
> netfront_queue *queue)
>      queue->rx.req_prod_pvt = req_prod;
>
>      /* Not enough requests? Try again later. */
> -    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
> +    if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
>          mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>          return;
>      }

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  parent reply	other threads:[~2017-01-29 23:11 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-18 20:25 [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM Remanan Pillai
2017-01-18 20:25 ` Remanan Pillai
2017-01-19 16:35 ` Vineeth Remanan Pillai
2017-01-19 17:11   ` David Miller
2017-01-19 17:17     ` Vineeth Remanan Pillai
2017-01-19 18:10       ` David Miller
2017-01-20 19:09   ` David Miller
2017-01-29 23:09   ` Boris Ostrovsky [this message]
2017-01-29 23:09     ` Boris Ostrovsky
2017-01-30 16:47     ` Vineeth Remanan Pillai
2017-01-30 16:47       ` Vineeth Remanan Pillai
2017-01-30 16:47       ` Vineeth Remanan Pillai
2017-01-30 17:06       ` Boris Ostrovsky
2017-01-30 17:06         ` Boris Ostrovsky
2017-01-30 17:13         ` Vineeth Remanan Pillai
2017-01-30 17:13         ` Vineeth Remanan Pillai
2017-01-31 16:47       ` Vineeth Remanan Pillai
2017-01-31 16:47         ` Vineeth Remanan Pillai
  -- strict thread matches above, loose matches on Subject: below --
2017-01-12 23:09 [PATCH] " Vineeth Remanan Pillai
2017-01-13 17:55 ` [PATCH v2] " Remanan Pillai
2017-01-13 17:55   ` Remanan Pillai
2017-01-16  6:24   ` Juergen Gross
2017-01-16  6:24   ` Juergen Gross
2017-01-18 17:02     ` Vineeth Remanan Pillai
2017-01-18 17:02       ` Vineeth Remanan Pillai
2017-01-18 17:02       ` Vineeth Remanan Pillai
2017-01-18 20:08       ` David Miller
2017-01-18 20:08         ` David Miller
2017-01-18 20:10       ` David Miller
2017-01-18 20:10         ` David Miller
2017-01-18 20:24         ` Vineeth Remanan Pillai
2017-01-18 20:24           ` Vineeth Remanan Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=38ccfaea-0a65-a6f3-c19a-e6f9c0d4ef76@oracle.com \
    --to=boris.ostrovsky@oracle.com \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paul.durrant@citrix.com \
    --cc=vineethp@amazon.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.