netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] igb: don't reuse pages with pfmemalloc flag
@ 2014-10-22 13:50 Roman Gushchin
  2014-10-22 15:45 ` Eric Dumazet
  2014-10-22 18:30 ` [PATCH] igb: don't reuse pages with pfmemalloc flag Jeff Kirsher
  0 siblings, 2 replies; 8+ messages in thread
From: Roman Gushchin @ 2014-10-22 13:50 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, davem, sassmann, gregkh, e1000-devel
  Cc: netdev, linux-kernel, Roman Gushchin

Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.

Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.

If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.

This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.

In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.

Fix this by avoiding reuse of such pages.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0d4c897..6586392 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6178,6 +6178,9 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
 	if (unlikely(page_to_nid(page) != numa_node_id()))
 		return false;
 
+	if (unlikely(page->pfmemalloc))
+		return false;
+
 #if (PAGE_SIZE < 8192)
 	/* if we are only owner of page we can reuse it */
 	if (unlikely(page_count(page) != 1))
@@ -6245,7 +6248,8 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
 		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
 
 		/* we can reuse buffer as-is, just make sure it is local */
-		if (likely(page_to_nid(page) == numa_node_id()))
+		if (likely((page_to_nid(page) == numa_node_id()) &&
+			   !page->pfmemalloc))
 			return true;
 
 		/* this page cannot be reused so discard it */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] igb: don't reuse pages with pfmemalloc flag
  2014-10-22 13:50 [PATCH] igb: don't reuse pages with pfmemalloc flag Roman Gushchin
@ 2014-10-22 15:45 ` Eric Dumazet
  2014-10-23 11:21   ` Roman Gushchin
  2014-10-22 18:30 ` [PATCH] igb: don't reuse pages with pfmemalloc flag Jeff Kirsher
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2014-10-22 15:45 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: alexander.h.duyck, netdev, sassmann, e1000-devel,
	peter.p.waskiewicz.jr, bruce.w.allan, jesse.brandeburg, davem,
	john.ronciak, gregkh, linux-kernel

On Wed, 2014-10-22 at 17:50 +0400, Roman Gushchin wrote:
> Incoming packet is dropped silently by sk_filter(), if the skb was
> allocated from pfmemalloc reserves and the corresponding socket is
> not marked with the SOCK_MEMALLOC flag.
> 
> Igb driver allocates pages for DMA with __skb_alloc_page(), which
> calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
> of OOM condition, igb can get pages with pfmemalloc flag set.
> 
> If an incoming packet hits the pfmemalloc page and is large enough
> (small packets are copying into the memory, allocated with
> netdev_alloc_skb_ip_align(), so they are not affected), it will be
> dropped.
> 
> This behavior is ok under high memory pressure, but the problem is
> that the igb driver reuses these mapped pages. So, packets are still
> dropping even if all memory issues are gone and there is a plenty
> of free memory.
> 
> In my case, some TCP sessions hang on a small percentage (< 0.1%)
> of machines days after OOMs.
> 
> Fix this by avoiding reuse of such pages.
> 
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> ---

Interesting...

It seems we also need to clear skb->pfmemalloc in napi_reuse_skb()




------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] igb: don't reuse pages with pfmemalloc flag
  2014-10-22 13:50 [PATCH] igb: don't reuse pages with pfmemalloc flag Roman Gushchin
  2014-10-22 15:45 ` Eric Dumazet
@ 2014-10-22 18:30 ` Jeff Kirsher
  2014-10-23  7:52   ` Roman Gushchin
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Kirsher @ 2014-10-22 18:30 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: jesse.brandeburg, bruce.w.allan, carolyn.wyborny,
	donald.c.skidmore, gregory.v.rose, peter.p.waskiewicz.jr,
	alexander.h.duyck, john.ronciak, tushar.n.dave, davem, sassmann,
	gregkh, e1000-devel, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]

On Wed, 2014-10-22 at 17:50 +0400, Roman Gushchin wrote:
> Incoming packet is dropped silently by sk_filter(), if the skb was
> allocated from pfmemalloc reserves and the corresponding socket is
> not marked with the SOCK_MEMALLOC flag.
> 
> Igb driver allocates pages for DMA with __skb_alloc_page(), which
> calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
> of OOM condition, igb can get pages with pfmemalloc flag set.
> 
> If an incoming packet hits the pfmemalloc page and is large enough
> (small packets are copying into the memory, allocated with
> netdev_alloc_skb_ip_align(), so they are not affected), it will be
> dropped.
> 
> This behavior is ok under high memory pressure, but the problem is
> that the igb driver reuses these mapped pages. So, packets are still
> dropping even if all memory issues are gone and there is a plenty
> of free memory.
> 
> In my case, some TCP sessions hang on a small percentage (< 0.1%)
> of machines days after OOMs.
> 
> Fix this by avoiding reuse of such pages.
> 
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Thanks Roman, I have added you patch to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] igb: don't reuse pages with pfmemalloc flag
  2014-10-22 18:30 ` [PATCH] igb: don't reuse pages with pfmemalloc flag Jeff Kirsher
@ 2014-10-23  7:52   ` Roman Gushchin
  0 siblings, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2014-10-23  7:52 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: alexander.h.duyck, netdev, sassmann, e1000-devel,
	peter.p.waskiewicz.jr, bruce.w.allan, jesse.brandeburg, davem,
	john.ronciak, gregkh, linux-kernel

Thank you!

Probably we should add it to stable trees too?

--
Regards,
Roman

22.10.2014, 22:30, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com>:
> On Wed, 2014-10-22 at 17:50 +0400, Roman Gushchin wrote:
>>  Incoming packet is dropped silently by sk_filter(), if the skb was
>>  allocated from pfmemalloc reserves and the corresponding socket is
>>  not marked with the SOCK_MEMALLOC flag.
>>
>>  Igb driver allocates pages for DMA with __skb_alloc_page(), which
>>  calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
>>  of OOM condition, igb can get pages with pfmemalloc flag set.
>>
>>  If an incoming packet hits the pfmemalloc page and is large enough
>>  (small packets are copying into the memory, allocated with
>>  netdev_alloc_skb_ip_align(), so they are not affected), it will be
>>  dropped.
>>
>>  This behavior is ok under high memory pressure, but the problem is
>>  that the igb driver reuses these mapped pages. So, packets are still
>>  dropping even if all memory issues are gone and there is a plenty
>>  of free memory.
>>
>>  In my case, some TCP sessions hang on a small percentage (< 0.1%)
>>  of machines days after OOMs.
>>
>>  Fix this by avoiding reuse of such pages.
>>
>>  Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
>>  ---
>>   drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> Thanks Roman, I have added you patch to my queue.

------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] igb: don't reuse pages with pfmemalloc flag
  2014-10-22 15:45 ` Eric Dumazet
@ 2014-10-23 11:21   ` Roman Gushchin
  2014-10-23 13:30     ` [PATHC] net: napi_reuse_skb() should check pfmemalloc Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2014-10-23 11:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, davem, sassmann, gregkh, e1000-devel, netdev,

>
> Interesting...
>
> It seems we also need to clear skb->pfmemalloc in napi_reuse_skb()

Sounds reasonable, but are you sure, that we can just drop skb->pfmemalloc flag in napi_reuse_skb()?

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATHC] net: napi_reuse_skb() should check pfmemalloc
  2014-10-23 11:21   ` Roman Gushchin
@ 2014-10-23 13:30     ` Eric Dumazet
  2014-10-23 13:49       ` Roman Gushchin
  2014-10-27  2:47       ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2014-10-23 13:30 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, davem, sassmann, gregkh, e1000-devel, netdev,

From: Eric Dumazet <edumazet@google.com>

Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index b793e3521a36..945bbd001359 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4157,6 +4157,10 @@ EXPORT_SYMBOL(napi_gro_receive);
 
 static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 {
+	if (unlikely(skb->pfmemalloc)) {
+		consume_skb(skb);
+		return;
+	}
 	__skb_pull(skb, skb_headlen(skb));
 	/* restore the reserve we had after netdev_alloc_skb_ip_align() */
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN - skb_headroom(skb));

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATHC] net: napi_reuse_skb() should check pfmemalloc
  2014-10-23 13:30     ` [PATHC] net: napi_reuse_skb() should check pfmemalloc Eric Dumazet
@ 2014-10-23 13:49       ` Roman Gushchin
  2014-10-27  2:47       ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2014-10-23 13:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: alexander.h.duyck, netdev, sassmann, e1000-devel,
	peter.p.waskiewicz.jr, bruce.w.allan, jesse.brandeburg, davem,
	john.ronciak, gregkh, linux-kernel

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>

--
@klamm


23.10.2014, 17:30, "Eric Dumazet" <eric.dumazet@gmail.com>:
> From: Eric Dumazet <edumazet@google.com>
>
> Do not reuse skb if it was pfmemalloc tainted, otherwise
> future frame might be dropped anyway.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/core/dev.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b793e3521a36..945bbd001359 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4157,6 +4157,10 @@ EXPORT_SYMBOL(napi_gro_receive);
>
>  static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>  {
> + if (unlikely(skb->pfmemalloc)) {
> + consume_skb(skb);
> + return;
> + }
>          __skb_pull(skb, skb_headlen(skb));
>          /* restore the reserve we had after netdev_alloc_skb_ip_align() */
>          skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN - skb_headroom(skb));

------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATHC] net: napi_reuse_skb() should check pfmemalloc
  2014-10-23 13:30     ` [PATHC] net: napi_reuse_skb() should check pfmemalloc Eric Dumazet
  2014-10-23 13:49       ` Roman Gushchin
@ 2014-10-27  2:47       ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2014-10-27  2:47 UTC (permalink / raw)
  To: eric.dumazet
  Cc: klamm, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, sassmann, gregkh, e1000-devel, netdev,
	linux-kernel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 23 Oct 2014 06:30:30 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Do not reuse skb if it was pfmemalloc tainted, otherwise
> future frame might be dropped anyway.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-10-27  2:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-22 13:50 [PATCH] igb: don't reuse pages with pfmemalloc flag Roman Gushchin
2014-10-22 15:45 ` Eric Dumazet
2014-10-23 11:21   ` Roman Gushchin
2014-10-23 13:30     ` [PATHC] net: napi_reuse_skb() should check pfmemalloc Eric Dumazet
2014-10-23 13:49       ` Roman Gushchin
2014-10-27  2:47       ` David Miller
2014-10-22 18:30 ` [PATCH] igb: don't reuse pages with pfmemalloc flag Jeff Kirsher
2014-10-23  7:52   ` Roman Gushchin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).