All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
@ 2021-06-23 16:23 Jakub Kicinski
  2021-06-23 18:05 ` Eric Dumazet
  2021-06-23 19:45 ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2021-06-23 16:23 UTC (permalink / raw)
  To: davem
  Cc: netdev, willemb, eric.dumazet, dsahern, yoshfuji, Jakub Kicinski,
	Dave Jones

Dave observed number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has MTU of 64k kernel will try to
allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse if
the message length is <32k the allocation may trigger an
OOM killer.

This is entirely avoidable, we can use an skb with page frags.

af_unix solves a similar problem by limiting the head
length to SKB_MAX_ALLOC. This seems like a good and simple
approach. It means that UDP messages > 16kB will now
use fragments if underlying device supports SG, if extra
allocator pressure causes regressions in real workloads
we can switch to trying the large allocation first and
falling back.

Reported-by: Dave Jones <dsj@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/ipv4/ip_output.c  | 4 +++-
 net/ipv6/ip6_output.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c3efc7d658f6..790dd28fd198 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk,
 			if ((flags & MSG_MORE) &&
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
-			else if (!paged)
+			else if (!paged &&
+				 (fraglen + hh_len + 15 < SKB_MAX_ALLOC ||
+				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
 			else {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..ae8dbd6cdab1 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1585,7 +1585,9 @@ static int __ip6_append_data(struct sock *sk,
 			if ((flags & MSG_MORE) &&
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
-			else if (!paged)
+			else if (!paged &&
+				 (fraglen + hh_len < SKB_MAX_ALLOC ||
+				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
 			else {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
  2021-06-23 16:23 [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback Jakub Kicinski
@ 2021-06-23 18:05 ` Eric Dumazet
  2021-06-23 18:11   ` Eric Dumazet
  2021-06-23 19:45 ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2021-06-23 18:05 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, willemb, eric.dumazet, dsahern, yoshfuji, Dave Jones



On 6/23/21 6:23 PM, Jakub Kicinski wrote:
> Dave observed number of machines hitting OOM on the UDP send
> path. The workload seems to be sending large UDP packets over
> loopback. Since loopback has MTU of 64k kernel will try to
> allocate an skb with up to 64k of head space. This has a good
> chance of failing under memory pressure. What's worse if
> the message length is <32k the allocation may trigger an
> OOM killer.
> 
> This is entirely avoidable, we can use an skb with page frags.
> 
> af_unix solves a similar problem by limiting the head
> length to SKB_MAX_ALLOC. This seems like a good and simple
> approach. It means that UDP messages > 16kB will now
> use fragments if underlying device supports SG, if extra
> allocator pressure causes regressions in real workloads
> we can switch to trying the large allocation first and
> falling back.
> 
> Reported-by: Dave Jones <dsj@fb.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
  2021-06-23 18:05 ` Eric Dumazet
@ 2021-06-23 18:11   ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-06-23 18:11 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, davem
  Cc: netdev, willemb, dsahern, yoshfuji, Dave Jones



On 6/23/21 8:05 PM, Eric Dumazet wrote:
> 
> 
> On 6/23/21 6:23 PM, Jakub Kicinski wrote:
>> Dave observed number of machines hitting OOM on the UDP send
>> path. The workload seems to be sending large UDP packets over
>> loopback. Since loopback has MTU of 64k kernel will try to
>> allocate an skb with up to 64k of head space. This has a good
>> chance of failing under memory pressure. What's worse if
>> the message length is <32k the allocation may trigger an
>> OOM killer.
>>
>> This is entirely avoidable, we can use an skb with page frags.
>>
>> af_unix solves a similar problem by limiting the head
>> length to SKB_MAX_ALLOC. This seems like a good and simple
>> approach. It means that UDP messages > 16kB will now
>> use fragments if underlying device supports SG, if extra
>> allocator pressure causes regressions in real workloads
>> we can switch to trying the large allocation first and
>> falling back.
>>
>> Reported-by: Dave Jones <dsj@fb.com>
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> 
> Thanks !
> 

I am taking this back.

IPv6 side also needs to account for sizeof(struct frag_hdr) ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
  2021-06-23 16:23 [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback Jakub Kicinski
  2021-06-23 18:05 ` Eric Dumazet
@ 2021-06-23 19:45 ` Jesper Dangaard Brouer
  2021-06-23 21:07   ` Jakub Kicinski
  2021-06-23 21:14   ` Eric Dumazet
  1 sibling, 2 replies; 7+ messages in thread
From: Jesper Dangaard Brouer @ 2021-06-23 19:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: brouer, davem, netdev, willemb, eric.dumazet, dsahern, yoshfuji,
	Dave Jones

On Wed, 23 Jun 2021 09:23:28 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> Dave observed number of machines hitting OOM on the UDP send
> path. The workload seems to be sending large UDP packets over
> loopback. Since loopback has MTU of 64k kernel will try to
> allocate an skb with up to 64k of head space. This has a good
> chance of failing under memory pressure. What's worse if
> the message length is <32k the allocation may trigger an
> OOM killer.
> 
> This is entirely avoidable, we can use an skb with page frags.
> 
> af_unix solves a similar problem by limiting the head
> length to SKB_MAX_ALLOC. This seems like a good and simple
> approach. It means that UDP messages > 16kB will now
> use fragments if underlying device supports SG, if extra
> allocator pressure causes regressions in real workloads
> we can switch to trying the large allocation first and
> falling back.
> 
> Reported-by: Dave Jones <dsj@fb.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/ipv4/ip_output.c  | 4 +++-
>  net/ipv6/ip6_output.c | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index c3efc7d658f6..790dd28fd198 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk,
>  			if ((flags & MSG_MORE) &&
>  			    !(rt->dst.dev->features&NETIF_F_SG))
>  				alloclen = mtu;
> -			else if (!paged)
> +			else if (!paged &&
> +				 (fraglen + hh_len + 15 < SKB_MAX_ALLOC ||

What does the number 15 represent here?

> +				  !(rt->dst.dev->features & NETIF_F_SG)))
>  				alloclen = fraglen;
>  			else {
>  				alloclen = min_t(int, fraglen, MAX_HEADER);
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index ff4f9ebcf7f6..ae8dbd6cdab1 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1585,7 +1585,9 @@ static int __ip6_append_data(struct sock *sk,
>  			if ((flags & MSG_MORE) &&
>  			    !(rt->dst.dev->features&NETIF_F_SG))
>  				alloclen = mtu;
> -			else if (!paged)
> +			else if (!paged &&
> +				 (fraglen + hh_len < SKB_MAX_ALLOC ||

The number 15 is not use here.

> +				  !(rt->dst.dev->features & NETIF_F_SG)))
>  				alloclen = fraglen;
>  			else {
>  				alloclen = min_t(int, fraglen, MAX_HEADER);



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
  2021-06-23 19:45 ` Jesper Dangaard Brouer
@ 2021-06-23 21:07   ` Jakub Kicinski
  2021-06-24  2:23     ` Willem de Bruijn
  2021-06-23 21:14   ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2021-06-23 21:07 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, netdev, willemb, eric.dumazet, dsahern, yoshfuji, Dave Jones

On Wed, 23 Jun 2021 21:45:55 +0200 Jesper Dangaard Brouer wrote:
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index c3efc7d658f6..790dd28fd198 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk,
> >  			if ((flags & MSG_MORE) &&
> >  			    !(rt->dst.dev->features&NETIF_F_SG))
> >  				alloclen = mtu;
> > -			else if (!paged)
> > +			else if (!paged &&
> > +				 (fraglen + hh_len + 15 < SKB_MAX_ALLOC ||  
> 
> What does the number 15 represent here?

No idea, it's there on the allocation line, so I need to include it on
the size check.

Looking at super old code (2.4.x) it looks like it may have gotten
copy & pasted mistakenly? The hard headers are rounded up to 16B,
and there is code which does things like:

	skb_alloc(size + dev->hard_header_len + 15);
	skb_reserve(skb, (dev->hard_header_len + 15) & ~15);

in other spots. So if I was to guess I'd say someone decided to add the
15B "to be safe" even though hh_len already includes the round up here.

But that's just my guess. I can't get this simple patch right,
so take that with a grain of salt :/

> > +				  !(rt->dst.dev->features & NETIF_F_SG)))
> >  				alloclen = fraglen;
> >  			else {
> >  				alloclen = min_t(int, fraglen, MAX_HEADER);
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index ff4f9ebcf7f6..ae8dbd6cdab1 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -1585,7 +1585,9 @@ static int __ip6_append_data(struct sock *sk,
> >  			if ((flags & MSG_MORE) &&
> >  			    !(rt->dst.dev->features&NETIF_F_SG))
> >  				alloclen = mtu;
> > -			else if (!paged)
> > +			else if (!paged &&
> > +				 (fraglen + hh_len < SKB_MAX_ALLOC ||  
> 
> The number 15 is not use here.
> 
> > +				  !(rt->dst.dev->features & NETIF_F_SG)))
> >  				alloclen = fraglen;
> >  			else {
> >  				alloclen = min_t(int, fraglen, MAX_HEADER);  

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
  2021-06-23 19:45 ` Jesper Dangaard Brouer
  2021-06-23 21:07   ` Jakub Kicinski
@ 2021-06-23 21:14   ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-06-23 21:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Jakub Kicinski
  Cc: davem, netdev, willemb, eric.dumazet, dsahern, yoshfuji, Dave Jones



On 6/23/21 9:45 PM, Jesper Dangaard Brouer wrote:
> On Wed, 23 Jun 2021 09:23:28 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
>> Dave observed number of machines hitting OOM on the UDP send
>> path. The workload seems to be sending large UDP packets over
>> loopback. Since loopback has MTU of 64k kernel will try to
>> allocate an skb with up to 64k of head space. This has a good
>> chance of failing under memory pressure. What's worse if
>> the message length is <32k the allocation may trigger an
>> OOM killer.
>>
>> This is entirely avoidable, we can use an skb with page frags.
>>
>> af_unix solves a similar problem by limiting the head
>> length to SKB_MAX_ALLOC. This seems like a good and simple
>> approach. It means that UDP messages > 16kB will now
>> use fragments if underlying device supports SG, if extra
>> allocator pressure causes regressions in real workloads
>> we can switch to trying the large allocation first and
>> falling back.
>>
>> Reported-by: Dave Jones <dsj@fb.com>
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>>  net/ipv4/ip_output.c  | 4 +++-
>>  net/ipv6/ip6_output.c | 4 +++-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index c3efc7d658f6..790dd28fd198 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk,
>>  			if ((flags & MSG_MORE) &&
>>  			    !(rt->dst.dev->features&NETIF_F_SG))
>>  				alloclen = mtu;
>> -			else if (!paged)
>> +			else if (!paged &&
>> +				 (fraglen + hh_len + 15 < SKB_MAX_ALLOC ||
> 
> What does the number 15 represent here?

Just look at the existing code, few lines below

skb = alloc_skb(alloclen + hh_len + 15, ...


> 
>> +				  !(rt->dst.dev->features & NETIF_F_SG)))
>>  				alloclen = fraglen;
>>  			else {
>>  				alloclen = min_t(int, fraglen, MAX_HEADER);
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index ff4f9ebcf7f6..ae8dbd6cdab1 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1585,7 +1585,9 @@ static int __ip6_append_data(struct sock *sk,
>>  			if ((flags & MSG_MORE) &&
>>  			    !(rt->dst.dev->features&NETIF_F_SG))
>>  				alloclen = mtu;
>> -			else if (!paged)
>> +			else if (!paged &&
>> +				 (fraglen + hh_len < SKB_MAX_ALLOC ||
> 
> The number 15 is not use here.

Because the alloc_skb() done later does not use + 15



> 
>> +				  !(rt->dst.dev->features & NETIF_F_SG)))
>>  				alloclen = fraglen;
>>  			else {
>>  				alloclen = min_t(int, fraglen, MAX_HEADER);
> 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback
  2021-06-23 21:07   ` Jakub Kicinski
@ 2021-06-24  2:23     ` Willem de Bruijn
  0 siblings, 0 replies; 7+ messages in thread
From: Willem de Bruijn @ 2021-06-24  2:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, davem, netdev, eric.dumazet, dsahern,
	yoshfuji, Dave Jones

On Wed, Jun 23, 2021 at 5:07 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 23 Jun 2021 21:45:55 +0200 Jesper Dangaard Brouer wrote:
> > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > > index c3efc7d658f6..790dd28fd198 100644
> > > --- a/net/ipv4/ip_output.c
> > > +++ b/net/ipv4/ip_output.c
> > > @@ -1077,7 +1077,9 @@ static int __ip_append_data(struct sock *sk,
> > >                     if ((flags & MSG_MORE) &&
> > >                         !(rt->dst.dev->features&NETIF_F_SG))
> > >                             alloclen = mtu;
> > > -                   else if (!paged)
> > > +                   else if (!paged &&
> > > +                            (fraglen + hh_len + 15 < SKB_MAX_ALLOC ||
> >
> > What does the number 15 represent here?
>
> No idea, it's there on the allocation line, so I need to include it on
> the size check.
>
> Looking at super old code (2.4.x) it looks like it may have gotten
> copy & pasted mistakenly? The hard headers are rounded up to 16B,
> and there is code which does things like:
>
>         skb_alloc(size + dev->hard_header_len + 15);
>         skb_reserve(skb, (dev->hard_header_len + 15) & ~15);
>
> in other spots. So if I was to guess I'd say someone decided to add the
> 15B "to be safe" even though hh_len already includes the round up here.

The 15 seems to come from alignment indeed. Not sure when it was
introduced, but until 56951b54e87a there is also this

                /*
                 *      Get the memory we require with some space left
for alignment.
                 */

                skb = sock_alloc_send_skb(sk, fraglen+hh_len+15, 0,
flags&MSG_DONTWAIT, &err);

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-24  2:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-23 16:23 [PATCH net-next v3] net: ip: avoid OOM kills with large UDP sends over loopback Jakub Kicinski
2021-06-23 18:05 ` Eric Dumazet
2021-06-23 18:11   ` Eric Dumazet
2021-06-23 19:45 ` Jesper Dangaard Brouer
2021-06-23 21:07   ` Jakub Kicinski
2021-06-24  2:23     ` Willem de Bruijn
2021-06-23 21:14   ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.