linux-kernel.vger.kernel.org archive mirror
* [PATCH] xen-netfront: Fix Rx stall during network stress and OOM
@ 2017-01-11 23:17 Vineeth Remanan Pillai
  2017-01-12 20:17 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-11 23:17 UTC (permalink / raw)
  To: boris.ostrovsky, jgross, xen-devel, netdev, linux-kernel
  Cc: Vineeth Remanan Pillai, kamatam, aliguori

During an OOM scenario, request slots could not be created as skb
allocation fails. So the netback cannot pass in packets and netfront
wrongly assumes that there is no more work to be done and it disables
polling. This causes Rx to stall.

Fix is to consider the skb allocation failure as an error and in the
event of this error, re-enable polling so that request slots could be
created when memory is available.

Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
---
 drivers/net/xen-netfront.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 40f26b6..8275549 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -277,13 +277,14 @@ static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 }
 
 
-static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
+static int xennet_alloc_rx_buffers(struct netfront_queue *queue)
 {
 	RING_IDX req_prod = queue->rx.req_prod_pvt;
 	int notify;
+	int err = 0;
 
 	if (unlikely(!netif_carrier_ok(queue->info->netdev)))
-		return;
+		return err;
 
 	for (req_prod = queue->rx.req_prod_pvt;
 	     req_prod - queue->rx.rsp_cons < NET_RX_RING_SIZE;
@@ -295,8 +296,10 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		struct xen_netif_rx_request *req;
 
 		skb = xennet_alloc_one_rx_buffer(queue);
-		if (!skb)
+		if (!skb) {
+			err = -ENOMEM;
 			break;
+		}
 
 		id = xennet_rxidx(req_prod);
 
@@ -321,9 +324,9 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 	queue->rx.req_prod_pvt = req_prod;
 
 	/* Not enough requests? Try again later. */
-	if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
+	if (req_prod - queue->rx.sring->rsp_prod < NET_RX_SLOTS_MIN) {
 		mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
-		return;
+		return err;
 	}
 
 	wmb();		/* barrier so backend seens requests */
@@ -331,6 +334,8 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->rx, notify);
 	if (notify)
 		notify_remote_via_irq(queue->rx_irq);
+
+	return err;
 }
 
 static int xennet_open(struct net_device *dev)
@@ -1046,7 +1051,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
 
 	work_done -= handle_incoming_queue(queue, &rxq);
 
-	xennet_alloc_rx_buffers(queue);
+	err = xennet_alloc_rx_buffers(queue);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -1054,7 +1059,11 @@ static int xennet_poll(struct napi_struct *napi, int budget)
 		napi_complete(napi);
 
 		RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
-		if (more_to_do)
+
+		/* If there is more work to do or could not allocate
+		 * rx buffers, re-enable polling.
+		 */
+		if (more_to_do || err != 0)
 			napi_schedule(napi);
 	}
 
-- 
2.1.2.AMZN


* Re: [PATCH] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-11 23:17 [PATCH] xen-netfront: Fix Rx stall during network stress and OOM Vineeth Remanan Pillai
@ 2017-01-12 20:17 ` David Miller
  2017-01-12 23:09   ` Vineeth Remanan Pillai
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2017-01-12 20:17 UTC (permalink / raw)
  To: vineethp
  Cc: boris.ostrovsky, jgross, xen-devel, netdev, linux-kernel,
	kamatam, aliguori

From: Vineeth Remanan Pillai <vineethp@amazon.com>
Date: Wed, 11 Jan 2017 23:17:17 +0000

> @@ -1054,7 +1059,11 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>  		napi_complete(napi);
>  
>  		RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
> -		if (more_to_do)
> +
> +		/* If there is more work to do or could not allocate
> +		 * rx buffers, re-enable polling.
> +		 */
> +		if (more_to_do || err != 0)
>  			napi_schedule(napi);

Just polling endlessly in a loop retrying the SKB allocation over and over
again until it succeeds is not very nice behavior.

You already have that refill timer, so please use that to retry instead
of wasting cpu cycles looping in NAPI poll.

Thanks.


* Re: [PATCH] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-12 20:17 ` David Miller
@ 2017-01-12 23:09   ` Vineeth Remanan Pillai
       [not found]     ` <1484330100-3960-1-git-send-email-vineethp@u480fcf3b67f557f68df1.ant.amazon.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-12 23:09 UTC (permalink / raw)
  To: David Miller
  Cc: boris.ostrovsky, jgross, xen-devel, netdev, linux-kernel,
	kamatam, aliguori



On 01/12/2017 12:17 PM, David Miller wrote:
> From: Vineeth Remanan Pillai <vineethp@amazon.com>
> Date: Wed, 11 Jan 2017 23:17:17 +0000
>
>> @@ -1054,7 +1059,11 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>>   		napi_complete(napi);
>>   
>>   		RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
>> -		if (more_to_do)
>> +
>> +		/* If there is more work to do or could not allocate
>> +		 * rx buffers, re-enable polling.
>> +		 */
>> +		if (more_to_do || err != 0)
>>   			napi_schedule(napi);
> Just polling endlessly in a loop retrying the SKB allocation over and over
> again until it succeeds is not very nice behavior.
>
> You already have that refill timer, so please use that to retry instead
> of wasting cpu cycles looping in NAPI poll.
Thanks, Dave, for the input.
On a further look, I think I can fix this more simply by correcting the
test condition for the minimum number of slots needed before pushing
requests. The existing test looks like this:

<snip>
         /* Not enough requests? Try again later. */
        if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
                 mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
                 return;
         }
</snip>

Actually, the above check counts more than the newly created request
slots, because it counts from rsp_cons. The actual count should be the
difference between the new req_prod and the old req_prod (in the queue).
If skbs cannot be allocated, this count remains small and hence we would
schedule the timer. So the fix could be:

         /* Not enough requests? Try again later. */
-       if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
+       if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {


I have done some initial testing to verify the fix. I will send out a v2
patch after a couple more rounds of testing.

Thanks,
Vineeth
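
To illustrate the arithmetic described in the message above, here is a
minimal userspace sketch. It is not driver code: the struct, the function
names and the threshold value SKETCH_RX_SLOTS_MIN are assumptions made up
for the illustration, and only the two subtractions mirror the old and the
proposed test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold only; the driver defines its own NET_RX_SLOTS_MIN. */
#define SKETCH_RX_SLOTS_MIN 8u

/* Free-running ring index, in the spirit of RING_IDX (hypothetical typedef). */
typedef uint32_t sketch_idx_t;

/* Hypothetical, simplified view of the Rx ring indices. */
struct sketch_rx_ring {
	sketch_idx_t req_prod_pvt;    /* producer after the allocation loop    */
	sketch_idx_t shared_req_prod; /* producer last pushed to the backend   */
	sketch_idx_t rsp_cons;        /* responses consumed by the frontend    */
};

/* Existing test: counts every slot since rsp_cons, old and new alike. */
static bool old_check_arms_timer(const struct sketch_rx_ring *r)
{
	return (sketch_idx_t)(r->req_prod_pvt - r->rsp_cons) < SKETCH_RX_SLOTS_MIN;
}

/* Proposed test: counts only the slots created since the last push. */
static bool new_check_arms_timer(const struct sketch_rx_ring *r)
{
	return (sketch_idx_t)(r->req_prod_pvt - r->shared_req_prod) < SKETCH_RX_SLOTS_MIN;
}

/* OOM case: no skb could be allocated, so the producer did not move. */
void sketch_oom_example(void)
{
	struct sketch_rx_ring r = {
		.req_prod_pvt = 100, .shared_req_prod = 100, .rsp_cons = 60,
	};

	assert(!old_check_arms_timer(&r)); /* 40 slots since rsp_cons: no retry */
	assert(new_check_arms_timer(&r));  /* 0 new slots: retry is scheduled   */
}
```

With these made-up numbers the old test sees 40 slots and does not arm the
refill timer, while the proposed test sees zero new slots and does.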


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
       [not found]     ` <1484330100-3960-1-git-send-email-vineethp@u480fcf3b67f557f68df1.ant.amazon.com>
@ 2017-01-16  6:24       ` Juergen Gross
  2017-01-18 17:02         ` Vineeth Remanan Pillai
  0 siblings, 1 reply; 18+ messages in thread
From: Juergen Gross @ 2017-01-16  6:24 UTC (permalink / raw)
  To: Remanan Pillai, boris.ostrovsky, xen-devel, netdev, linux-kernel
  Cc: Vineeth Remanan Pillai, kamatam, aliguori

On 13/01/17 18:55, Remanan Pillai wrote:
> From: Vineeth Remanan Pillai <vineethp@amazon.com>
> 
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
> 
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
> 
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
> 
> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Thanks,

Juergen


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-16  6:24       ` [PATCH v2] " Juergen Gross
@ 2017-01-18 17:02         ` Vineeth Remanan Pillai
  2017-01-18 20:08           ` David Miller
  2017-01-18 20:10           ` David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-18 17:02 UTC (permalink / raw)
  To: Juergen Gross, David Miller, boris.ostrovsky, xen-devel, netdev,
	linux-kernel
  Cc: kamatam, aliguori, vineethp


On 01/15/2017 10:24 PM, Juergen Gross wrote:
> On 13/01/17 18:55, Remanan Pillai wrote:
>> From: Vineeth Remanan Pillai <vineethp@amazon.com>
>>
>> During an OOM scenario, request slots could not be created as skb
>> allocation fails. So the netback cannot pass in packets and netfront
>> wrongly assumes that there is no more work to be done and it disables
>> polling. This causes Rx to stall.
>>
>> The issue is with the retry logic which schedules the timer if the
>> created slots are less than NET_RX_SLOTS_MIN. The count of new request
>> slots to be pushed are calculated as a difference between new req_prod
>> and rsp_cons which could be more than the actual slots, if there are
>> unconsumed responses.
>>
>> The fix is to calculate the count of newly created slots as the
>> difference between new req_prod and old req_prod.
>>
>> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
Thanks Juergen.

David,

Could you please pick up this change for net-next if there are no more
concerns?

Many Thanks,
Vineeth

>
>
> Thanks,
>
> Juergen
>


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-18 17:02         ` Vineeth Remanan Pillai
@ 2017-01-18 20:08           ` David Miller
  2017-01-18 20:10           ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2017-01-18 20:08 UTC (permalink / raw)
  To: vineethp
  Cc: jgross, boris.ostrovsky, xen-devel, netdev, linux-kernel,
	kamatam, aliguori

From: Vineeth Remanan Pillai <vineethp@amazon.com>
Date: Wed, 18 Jan 2017 09:02:17 -0800

> 
> On 01/15/2017 10:24 PM, Juergen Gross wrote:
>> On 13/01/17 18:55, Remanan Pillai wrote:
>>> From: Vineeth Remanan Pillai <vineethp@amazon.com>
>>>
>>> During an OOM scenario, request slots could not be created as skb
>>> allocation fails. So the netback cannot pass in packets and netfront
>>> wrongly assumes that there is no more work to be done and it disables
>>> polling. This causes Rx to stall.
>>>
>>> The issue is with the retry logic which schedules the timer if the
>>> created slots are less than NET_RX_SLOTS_MIN. The count of new request
>>> slots to be pushed are calculated as a difference between new req_prod
>>> and rsp_cons which could be more than the actual slots, if there are
>>> unconsumed responses.
>>>
>>> The fix is to calculate the count of newly created slots as the
>>> difference between new req_prod and old req_prod.
>>>
>>> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
> Thanks Juergen.
> 
> David,
> 
> Could you please pick up this change for net-next if there no more
> concerns..

Why would I pick up "this change", if the author of the patch has
stated that he will resubmit the change implemented differently based
upon my feedback?


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-18 17:02         ` Vineeth Remanan Pillai
  2017-01-18 20:08           ` David Miller
@ 2017-01-18 20:10           ` David Miller
  2017-01-18 20:24             ` Vineeth Remanan Pillai
  1 sibling, 1 reply; 18+ messages in thread
From: David Miller @ 2017-01-18 20:10 UTC (permalink / raw)
  To: vineethp
  Cc: jgross, boris.ostrovsky, xen-devel, netdev, linux-kernel,
	kamatam, aliguori


This v2 never made it into patchwork.  I don't know why, so please resend it to
netdev with the accumulated reviewed-by etc. tags added.

Thanks.


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-18 20:10           ` David Miller
@ 2017-01-18 20:24             ` Vineeth Remanan Pillai
  0 siblings, 0 replies; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-18 20:24 UTC (permalink / raw)
  To: David Miller
  Cc: jgross, boris.ostrovsky, xen-devel, netdev, linux-kernel,
	kamatam, aliguori


On 01/18/2017 12:10 PM, David Miller wrote:
> This v2 never made it into patchwork.  I don't know why, so please resend it to
> netdev with the accumulated reviewed-by etc. tags added.
>
> Thanks.
Sorry about that. Will resend as a separate thread right away.

Thanks


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-30 16:47     ` Vineeth Remanan Pillai
  2017-01-30 17:06       ` Boris Ostrovsky
@ 2017-01-31 16:47       ` Vineeth Remanan Pillai
  1 sibling, 0 replies; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-31 16:47 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, David Miller, netdev, Wei Liu, Paul Durrant, xen-devel


On 01/30/2017 08:47 AM, Vineeth Remanan Pillai wrote:
>
> On 01/29/2017 03:09 PM, Boris Ostrovsky wrote:
>>
>> There are couple of problems with this patch.
>> 1. The 'if' clause now evaluates to true on pretty much every call to 
>> xennet_alloc_rx_buffers().
> Thanks for catching this. In my testing I did not notice this - mostly 
> because of the nature of the workload in my testing.
I am working on a patch to revert to the old behavior and solve the Rx
stall issue by scheduling the timer if any of the following conditions
are true:
  - unconsumed requests + new requests < NET_RX_SLOTS_MIN (old behavior)
  - skb allocations fail

Will send out the patch by next week after I can do some testing.

Thanks,
Vineeth
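
The two conditions listed above can be sketched as a single predicate. This
is only an illustration of the stated plan, not the patch itself: the
function name, its parameters and the SKETCH_SLOTS_MIN value are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold only; the driver defines its own NET_RX_SLOTS_MIN. */
#define SKETCH_SLOTS_MIN 8u

/*
 * Hypothetical helper: decide whether the refill timer should be armed.
 *  - req_prod:     producer index after the allocation loop
 *  - rsp_cons:     responses consumed so far
 *  - alloc_failed: true if any skb allocation failed in this pass
 * Indices are free-running, so the unsigned subtraction stays correct
 * across wraparound.
 */
bool sketch_should_arm_refill_timer(uint32_t req_prod, uint32_t rsp_cons,
				    bool alloc_failed)
{
	uint32_t outstanding = req_prod - rsp_cons; /* old behavior */

	return outstanding < SKETCH_SLOTS_MIN || alloc_failed;
}
```

A caller would arm the timer when this returns true, for example
sketch_should_arm_refill_timer(100, 60, true) in the OOM case where the
ring still holds plenty of older slots.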


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-30 17:06       ` Boris Ostrovsky
@ 2017-01-30 17:13         ` Vineeth Remanan Pillai
  0 siblings, 0 replies; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-30 17:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-kernel, David Miller, netdev, Wei Liu, Paul Durrant, xen-devel


On 01/30/2017 09:06 AM, Boris Ostrovsky wrote:
> On 01/30/2017 11:47 AM, Vineeth Remanan Pillai wrote:
>
>>> 2. It tickles a latent bug during resume where the timer triggers
>>> before we re-connect. The trouble is that we now try to dereference
>>> queue->rx.sring which is NULL since we disconnect in
>>> netfront_resume(). (Curiously, I only observe it with 32-bit guests)
>> I think we may hit this bug after removing the timer as well. We call
>> RING_PUSH_REQUESTS_AND_CHECK_NOTIFY soon after, which also dereference
>> queue->rx.sring.
> If the timer is deleted in xennet_disconnect_backend() then why would
> anyone be pushing anything to the backend after that?
Sorry, I got the ordering wrong. Thanks for the clarification.

Thanks,
Vineeth


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-30 16:47     ` Vineeth Remanan Pillai
@ 2017-01-30 17:06       ` Boris Ostrovsky
  2017-01-30 17:13         ` Vineeth Remanan Pillai
  2017-01-31 16:47       ` Vineeth Remanan Pillai
  1 sibling, 1 reply; 18+ messages in thread
From: Boris Ostrovsky @ 2017-01-30 17:06 UTC (permalink / raw)
  To: Vineeth Remanan Pillai, linux-kernel
  Cc: David Miller, netdev, Wei Liu, Paul Durrant, xen-devel

On 01/30/2017 11:47 AM, Vineeth Remanan Pillai wrote:
>
>> 2. It tickles a latent bug during resume where the timer triggers
>> before we re-connect. The trouble is that we now try to dereference
>> queue->rx.sring which is NULL since we disconnect in
>> netfront_resume(). (Curiously, I only observe it with 32-bit guests)
> I think we may hit this bug after removing the timer as well. We call
> RING_PUSH_REQUESTS_AND_CHECK_NOTIFY soon after, which also dereference
> queue->rx.sring.


If the timer is deleted in xennet_disconnect_backend() then why would
anyone be pushing anything to the backend after that?

-boris


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-29 23:09   ` Boris Ostrovsky
@ 2017-01-30 16:47     ` Vineeth Remanan Pillai
  2017-01-30 17:06       ` Boris Ostrovsky
  2017-01-31 16:47       ` Vineeth Remanan Pillai
  0 siblings, 2 replies; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-30 16:47 UTC (permalink / raw)
  To: Boris Ostrovsky, linux-kernel
  Cc: David Miller, netdev, Wei Liu, Paul Durrant, xen-devel


On 01/29/2017 03:09 PM, Boris Ostrovsky wrote:
>
> There are couple of problems with this patch.
> 1. The 'if' clause now evaluates to true on pretty much every call to 
> xennet_alloc_rx_buffers().
Thanks for catching this. In my testing I did not notice this - mostly 
because of the nature of the workload in my testing.

> 2. It tickles a latent bug during resume where the timer triggers 
> before we re-connect. The trouble is that we now try to dereference 
> queue->rx.sring which is NULL since we disconnect in 
> netfront_resume(). (Curiously, I only observe it with 32-bit guests)
I think we may hit this bug after removing the timer as well. We call
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY soon after, which also dereferences
queue->rx.sring.

Thanks,
Vineeth


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-19 16:35 ` Vineeth Remanan Pillai
  2017-01-19 17:11   ` David Miller
  2017-01-20 19:09   ` David Miller
@ 2017-01-29 23:09   ` Boris Ostrovsky
  2017-01-30 16:47     ` Vineeth Remanan Pillai
  2 siblings, 1 reply; 18+ messages in thread
From: Boris Ostrovsky @ 2017-01-29 23:09 UTC (permalink / raw)
  To: Vineeth Remanan Pillai, David Miller, netdev, linux-kernel,
	Wei Liu, Paul Durrant, xen-devel



On 01/19/2017 11:35 AM, Vineeth Remanan Pillai wrote:
> From: Vineeth Remanan Pillai <vineethp@amazon.com>
>
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
>
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
>
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
>
> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> ---
> Changes in v2:
>     - Removed the old implementation of enabling polling on
>       skb allocation error.
>     - Corrected the refill timer logic to schedule when newly
>       created slots since last push is less than NET_RX_SLOTS_MIN.


There are a couple of problems with this patch.
1. The 'if' clause now evaluates to true on pretty much every call to
xennet_alloc_rx_buffers().
2. It tickles a latent bug during resume where the timer triggers before
we re-connect. The trouble is that we now try to dereference
queue->rx.sring, which is NULL since we disconnect in netfront_resume().
(Curiously, I only observe it with 32-bit guests.)

I'll send a patch later that will delete the timer since it looks like a 
bug to me in any case but the first problem seems to be more serious 
than the problem that this patch addresses.

-boris
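
The first problem can be illustrated with a small userspace simulation. It
is a simplified model under stated assumptions, not the driver: the ring
size, the threshold, the struct and the function name are hypothetical,
allocation always succeeds, and each call consumes a fixed number of
responses. As in the v2 patch, the push is skipped whenever the new test
arms the timer.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 256u /* illustrative ring size              */
#define SKETCH_SLOTS_MIN 8u   /* illustrative, not NET_RX_SLOTS_MIN  */

/* How many calls would have armed the refill timer under each test. */
struct sketch_counts {
	unsigned old_armed; /* req_prod - rsp_cons < MIN               */
	unsigned new_armed; /* req_prod - last pushed req_prod < MIN   */
};

struct sketch_counts sketch_steady_state(unsigned calls,
					 uint32_t consumed_per_call)
{
	uint32_t rsp_cons = 0;
	uint32_t req_prod = SKETCH_RING_SIZE; /* ring starts fully posted   */
	uint32_t pushed = SKETCH_RING_SIZE;   /* and pushed to the backend  */
	struct sketch_counts c = { 0, 0 };

	assert(consumed_per_call <= SKETCH_SLOTS_MIN);

	for (unsigned i = 0; i < calls; i++) {
		/* The poll consumed a few responses... */
		rsp_cons += consumed_per_call;

		/* ...and the refill loop tops the ring up again. */
		while ((uint32_t)(req_prod - rsp_cons) < SKETCH_RING_SIZE)
			req_prod++;

		if ((uint32_t)(req_prod - rsp_cons) < SKETCH_SLOTS_MIN)
			c.old_armed++;

		if ((uint32_t)(req_prod - pushed) < SKETCH_SLOTS_MIN)
			c.new_armed++;     /* timer armed, push skipped */
		else
			pushed = req_prod; /* enough new slots: push    */
	}
	return c;
}
```

In this model, with two responses consumed per call, the old test never
arms the timer while the new test arms it on three calls out of four, even
though no allocation ever fails.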

>
>  drivers/net/xen-netfront.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 40f26b6..2c7c29f 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
>      queue->rx.req_prod_pvt = req_prod;
>
>      /* Not enough requests? Try again later. */
> -    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
> +    if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
>          mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
>          return;
>      }


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-19 16:35 ` Vineeth Remanan Pillai
  2017-01-19 17:11   ` David Miller
@ 2017-01-20 19:09   ` David Miller
  2017-01-29 23:09   ` Boris Ostrovsky
  2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2017-01-20 19:09 UTC (permalink / raw)
  To: vineethp; +Cc: netdev, linux-kernel

From: Vineeth Remanan Pillai <vineethp@amazon.com>
Date: Thu, 19 Jan 2017 08:35:39 -0800

> From: Vineeth Remanan Pillai <vineethp@amazon.com>
> 
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
> 
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
> 
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
> 
> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> ---
> Changes in v2:
> 	- Removed the old implementation of enabling polling on
> 	  skb allocation error.
> 	- Corrected the refill timer logic to schedule when newly
> 	  created slots since last push is less than NET_RX_SLOTS_MIN.

Applied.


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-19 17:17     ` Vineeth Remanan Pillai
@ 2017-01-19 18:10       ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2017-01-19 18:10 UTC (permalink / raw)
  To: vineethp; +Cc: netdev, linux-kernel

From: Vineeth Remanan Pillai <vineethp@amazon.com>
Date: Thu, 19 Jan 2017 09:17:09 -0800

> Should I try sending it once again?

No need, it just showed up.


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-19 17:11   ` David Miller
@ 2017-01-19 17:17     ` Vineeth Remanan Pillai
  2017-01-19 18:10       ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-19 17:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel



On 01/19/2017 09:11 AM, David Miller wrote:
> From: Vineeth Remanan Pillai <vineethp@amazon.com>
> Date: Thu, 19 Jan 2017 08:35:39 -0800
>
>> From: Vineeth Remanan Pillai <vineethp@amazon.com>
>>
>> During an OOM scenario, request slots could not be created as skb
>> allocation fails. So the netback cannot pass in packets and netfront
>> wrongly assumes that there is no more work to be done and it disables
>> polling. This causes Rx to stall.
>>
>> The issue is with the retry logic which schedules the timer if the
>> created slots are less than NET_RX_SLOTS_MIN. The count of new request
>> slots to be pushed are calculated as a difference between new req_prod
>> and rsp_cons which could be more than the actual slots, if there are
>> unconsumed responses.
>>
>> The fix is to calculate the count of newly created slots as the
>> difference between new req_prod and old req_prod.
>>
>> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>> ---
>> Changes in v2:
>> 	- Removed the old implementation of enabling polling on
>> 	  skb allocation error.
>> 	- Corrected the refill timer logic to schedule when newly
>> 	  created slots since last push is less than NET_RX_SLOTS_MIN.
> Your postings aren't showing up on vger.kernel.org at all.
>
> Are you getting a bounce message back?  I can only assume you are triggering
> one of the various content filters we have.
>
I haven't received any bounce messages so far. The mail showed up
in xen-devel after about 8 hours yesterday. I am not sure what is
happening with vger.kernel.org. My initial patch showed up on all the
mailing lists. The only difference is that I switched to a machine
running a later version of git.

Should I try sending it once again?

Thanks


* Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
  2017-01-19 16:35 ` Vineeth Remanan Pillai
@ 2017-01-19 17:11   ` David Miller
  2017-01-19 17:17     ` Vineeth Remanan Pillai
  2017-01-20 19:09   ` David Miller
  2017-01-29 23:09   ` Boris Ostrovsky
  2 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2017-01-19 17:11 UTC (permalink / raw)
  To: vineethp; +Cc: netdev, linux-kernel

From: Vineeth Remanan Pillai <vineethp@amazon.com>
Date: Thu, 19 Jan 2017 08:35:39 -0800

> From: Vineeth Remanan Pillai <vineethp@amazon.com>
> 
> During an OOM scenario, request slots could not be created as skb
> allocation fails. So the netback cannot pass in packets and netfront
> wrongly assumes that there is no more work to be done and it disables
> polling. This causes Rx to stall.
> 
> The issue is with the retry logic which schedules the timer if the
> created slots are less than NET_RX_SLOTS_MIN. The count of new request
> slots to be pushed are calculated as a difference between new req_prod
> and rsp_cons which could be more than the actual slots, if there are
> unconsumed responses.
> 
> The fix is to calculate the count of newly created slots as the
> difference between new req_prod and old req_prod.
> 
> Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> ---
> Changes in v2:
> 	- Removed the old implementation of enabling polling on
> 	  skb allocation error.
> 	- Corrected the refill timer logic to schedule when newly
> 	  created slots since last push is less than NET_RX_SLOTS_MIN.

Your postings aren't showing up on vger.kernel.org at all.

Are you getting a bounce message back?  I can only assume you are triggering
one of the various content filters we have.


* [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
       [not found] <1484771149-12699-1-git-send-email-vineethp@u480fcf3b67f557f68df1.ant.amazon.com>
@ 2017-01-19 16:35 ` Vineeth Remanan Pillai
  2017-01-19 17:11   ` David Miller
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Vineeth Remanan Pillai @ 2017-01-19 16:35 UTC (permalink / raw)
  To: David Miller, netdev, linux-kernel; +Cc: vineethp

From: Vineeth Remanan Pillai <vineethp@amazon.com>

During an OOM scenario, request slots could not be created as skb
allocation fails. So the netback cannot pass in packets and netfront
wrongly assumes that there is no more work to be done and it disables
polling. This causes Rx to stall.

The issue is with the retry logic which schedules the timer if the
created slots are less than NET_RX_SLOTS_MIN. The count of new request
slots to be pushed are calculated as a difference between new req_prod
and rsp_cons which could be more than the actual slots, if there are
unconsumed responses.

The fix is to calculate the count of newly created slots as the
difference between new req_prod and old req_prod.

Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Changes in v2:
	- Removed the old implementation of enabling polling on
	  skb allocation error.
	- Corrected the refill timer logic to schedule when newly
	  created slots since last push is less than NET_RX_SLOTS_MIN.

  drivers/net/xen-netfront.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 40f26b6..2c7c29f 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -321,7 +321,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
  	queue->rx.req_prod_pvt = req_prod;
  
  	/* Not enough requests? Try again later. */
-	if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN) {
+	if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN) {
  		mod_timer(&queue->rx_refill_timer, jiffies + (HZ/10));
  		return;
  	}
-- 
2.7.4


end of thread, other threads:[~2017-01-31 16:53 UTC | newest]

Thread overview: 18+ messages
2017-01-11 23:17 [PATCH] xen-netfront: Fix Rx stall during network stress and OOM Vineeth Remanan Pillai
2017-01-12 20:17 ` David Miller
2017-01-12 23:09   ` Vineeth Remanan Pillai
     [not found]     ` <1484330100-3960-1-git-send-email-vineethp@u480fcf3b67f557f68df1.ant.amazon.com>
2017-01-16  6:24       ` [PATCH v2] " Juergen Gross
2017-01-18 17:02         ` Vineeth Remanan Pillai
2017-01-18 20:08           ` David Miller
2017-01-18 20:10           ` David Miller
2017-01-18 20:24             ` Vineeth Remanan Pillai
     [not found] <1484771149-12699-1-git-send-email-vineethp@u480fcf3b67f557f68df1.ant.amazon.com>
2017-01-19 16:35 ` Vineeth Remanan Pillai
2017-01-19 17:11   ` David Miller
2017-01-19 17:17     ` Vineeth Remanan Pillai
2017-01-19 18:10       ` David Miller
2017-01-20 19:09   ` David Miller
2017-01-29 23:09   ` Boris Ostrovsky
2017-01-30 16:47     ` Vineeth Remanan Pillai
2017-01-30 17:06       ` Boris Ostrovsky
2017-01-30 17:13         ` Vineeth Remanan Pillai
2017-01-31 16:47       ` Vineeth Remanan Pillai
