All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: let skb_orphan_partial wake-up waiters.
@ 2021-03-30 14:24 Paolo Abeni
  2021-03-30 14:39 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2021-03-30 14:24 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet

Currently the mentioned helper can end-up freeing the socket wmem
without waking-up any processes waiting for more write memory.

If the partially orphaned skb is attached to an UDP (or raw) socket,
the lack of wake-up can hang the user-space.

Address the issue invoking the write_space callback after
releasing the memory, if the old skb destructor requires that.

Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 0ed98f20448a2..7a38332d748e7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
 
 		if (refcount_inc_not_zero(&sk->sk_refcnt)) {
 			WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
+			if (skb->destructor == sock_wfree)
+				sk->sk_write_space(sk);
 			skb->destructor = sock_efree;
 		}
 	} else {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: let skb_orphan_partial wake-up waiters.
  2021-03-30 14:24 [PATCH net] net: let skb_orphan_partial wake-up waiters Paolo Abeni
@ 2021-03-30 14:39 ` Eric Dumazet
  2021-03-30 14:40   ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2021-03-30 14:39 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev

On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Currently the mentioned helper can end-up freeing the socket wmem
> without waking-up any processes waiting for more write memory.
>
> If the partially orphaned skb is attached to an UDP (or raw) socket,
> the lack of wake-up can hang the user-space.
>
> Address the issue invoking the write_space callback after
> releasing the memory, if the old skb destructor requires that.
>
> Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/core/sock.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 0ed98f20448a2..7a38332d748e7 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
>
>                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
>                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> +                       if (skb->destructor == sock_wfree)
> +                               sk->sk_write_space(sk);

Interesting.

Why TCP is not a problem here ?

I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
&sk->sk_wmem_alloc)) by :
                        skb_orphan(skb);

This will get rid of this suspect WARN_ON() at the same time ?

>                         skb->destructor = sock_efree;
>                 }
>         } else {
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: let skb_orphan_partial wake-up waiters.
  2021-03-30 14:39 ` Eric Dumazet
@ 2021-03-30 14:40   ` Eric Dumazet
  2021-03-30 15:18     ` Paolo Abeni
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2021-03-30 14:40 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev

On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > Currently the mentioned helper can end-up freeing the socket wmem
> > without waking-up any processes waiting for more write memory.
> >
> > If the partially orphaned skb is attached to an UDP (or raw) socket,
> > the lack of wake-up can hang the user-space.
> >
> > Address the issue invoking the write_space callback after
> > releasing the memory, if the old skb destructor requires that.
> >
> > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/core/sock.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 0ed98f20448a2..7a38332d748e7 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
> >
> >                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> >                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> > +                       if (skb->destructor == sock_wfree)
> > +                               sk->sk_write_space(sk);
>
> Interesting.
>
> Why TCP is not a problem here ?
>
> I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
> &sk->sk_wmem_alloc)) by :
>                         skb_orphan(skb);

And of course re-add
                        skb->sk = sk;

>
> This will get rid of this suspect WARN_ON() at the same time ?
>
> >                         skb->destructor = sock_efree;
> >                 }
> >         } else {
> > --
> > 2.26.2
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: let skb_orphan_partial wake-up waiters.
  2021-03-30 14:40   ` Eric Dumazet
@ 2021-03-30 15:18     ` Paolo Abeni
  2021-03-30 15:29       ` Eric Dumazet
  2021-03-30 15:32       ` Eric Dumazet
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Abeni @ 2021-03-30 15:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote:
> On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote:
> > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > Currently the mentioned helper can end-up freeing the socket wmem
> > > without waking-up any processes waiting for more write memory.
> > > 
> > > If the partially orphaned skb is attached to an UDP (or raw) socket,
> > > the lack of wake-up can hang the user-space.
> > > 
> > > Address the issue invoking the write_space callback after
> > > releasing the memory, if the old skb destructor requires that.
> > > 
> > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  net/core/sock.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 0ed98f20448a2..7a38332d748e7 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
> > > 
> > >                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> > >                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> > > +                       if (skb->destructor == sock_wfree)
> > > +                               sk->sk_write_space(sk);
> > 
> > Interesting.
> > 
> > Why TCP is not a problem here ?

AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes
waiting for wmem are woken by ack processing.

> > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
> > &sk->sk_wmem_alloc)) by :
> >                         skb_orphan(skb);
> 
> And of course re-add
>                         skb->sk = sk;

Double checking to be sure. The patched slice of skb_orphan_partial()
will then look like:

	if (can_skb_orphan_partial(skb)) {
		struct sock *sk = skb->sk;
		
		if (refcount_inc_not_zero(&sk->sk_refcnt)) {
			skb_orphan(skb);
			skb->sk = sk;
			skb->destructor = sock_efree;
		}
	} // ...

Am I correct?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: let skb_orphan_partial wake-up waiters.
  2021-03-30 15:18     ` Paolo Abeni
@ 2021-03-30 15:29       ` Eric Dumazet
  2021-03-30 15:32       ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2021-03-30 15:29 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev

On Tue, Mar 30, 2021 at 5:18 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote:
> > On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote:
> > > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > Currently the mentioned helper can end-up freeing the socket wmem
> > > > without waking-up any processes waiting for more write memory.
> > > >
> > > > If the partially orphaned skb is attached to an UDP (or raw) socket,
> > > > the lack of wake-up can hang the user-space.
> > > >
> > > > Address the issue invoking the write_space callback after
> > > > releasing the memory, if the old skb destructor requires that.
> > > >
> > > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > >  net/core/sock.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index 0ed98f20448a2..7a38332d748e7 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
> > > >
> > > >                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> > > >                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> > > > +                       if (skb->destructor == sock_wfree)
> > > > +                               sk->sk_write_space(sk);
> > >
> > > Interesting.
> > >
> > > Why TCP is not a problem here ?
>
> AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes
> waiting for wmem are woken by ack processing.
>
> > > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
> > > &sk->sk_wmem_alloc)) by :
> > >                         skb_orphan(skb);
> >
> > And of course re-add
> >                         skb->sk = sk;
>
> Double checking to be sure. The patched slice of skb_orphan_partial()
> will then look like:
>
>         if (can_skb_orphan_partial(skb)) {
>                 struct sock *sk = skb->sk;
>
>                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
>                         skb_orphan(skb);
>                         skb->sk = sk;
>                         skb->destructor = sock_efree;
>                 }
>         } // ...
>
> Am I correct?
>

Yes.

We also could add a helper for the whole construct, since many other
paths do almost the same.
(They might use sock_hold(), but it seems safe to use the
refcount_inc_not_zero())
Or they omit the skb_orphan() (see can_skb_set_owner), which seems also risky.

static inline void skb_set_owner_sk_safe(sk, skb)
{
    if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) {
           skb_orphan(skb);
           skb->destructor = sock_efree;
          skb->sk = sk;
    }
}



> Thanks!
>
> Paolo
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: let skb_orphan_partial wake-up waiters.
  2021-03-30 15:18     ` Paolo Abeni
  2021-03-30 15:29       ` Eric Dumazet
@ 2021-03-30 15:32       ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2021-03-30 15:32 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev

On Tue, Mar 30, 2021 at 5:18 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote:

> > >
> > > Why TCP is not a problem here ?
>
> AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes
> waiting for wmem are woken by ack processing.

My concern was TCP Small Queue. If we do not call tcp_wfree() we might miss
an opportunity to queue more packets (and thus fallback to RTO or
incoming ACK is we are lucky)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-30 15:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-30 14:24 [PATCH net] net: let skb_orphan_partial wake-up waiters Paolo Abeni
2021-03-30 14:39 ` Eric Dumazet
2021-03-30 14:40   ` Eric Dumazet
2021-03-30 15:18     ` Paolo Abeni
2021-03-30 15:29       ` Eric Dumazet
2021-03-30 15:32       ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.