All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 11:09 ` Qiujun Huang
  0 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 11:09 UTC (permalink / raw)
  To: marcelo.leitner, davem
  Cc: vyasevich, nhorman, kuba, linux-sctp, netdev, linux-kernel,
	anenbupt, Qiujun Huang

Do accounting for skb's real sk.
In some case skb->sk != asoc->base.sk:

for the trouble SKB, it was in outq->transmitted queue

sctp_outq_sack
	sctp_check_transmitted
		SKB was moved to outq->sack
	then throw away the sack queue
		SKB was deleted from outq->sack
(but the datamsg held SKB at sctp_datamsg_to_asoc
So, sctp_wfree was not called to destroy SKB)

then migrate happened

	sctp_for_each_tx_datachunk(
	sctp_clear_owner_w);
	sctp_assoc_migrate();
	sctp_for_each_tx_datachunk(
	sctp_set_owner_w);
SKB was not in the outq, and was not changed to newsk

finally

__sctp_outq_teardown
	sctp_chunk_put (for another skb)
		sctp_datamsg_put
			__kfree_skb(msg->frag_list)
				sctp_wfree (for SKB)
this case in sctp_wfree SKB->sk was oldsk.

It looks only trouble here so handling it in sctp_wfree is enough.

Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
---
 net/sctp/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1b56fc440606..5f5c28b30e25 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -9080,7 +9080,7 @@ static void sctp_wfree(struct sk_buff *skb)
 {
 	struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg;
 	struct sctp_association *asoc = chunk->asoc;
-	struct sock *sk = asoc->base.sk;
+	struct sock *sk = skb->sk;
 
 	sk_mem_uncharge(sk, skb->truesize);
 	sk->sk_wmem_queued -= skb->truesize + sizeof(struct sctp_chunk);
@@ -9109,7 +9109,7 @@ static void sctp_wfree(struct sk_buff *skb)
 	}
 
 	sock_wfree(skb);
-	sctp_wake_up_waiters(sk, asoc);
+	sctp_wake_up_waiters(asoc->base.sk, asoc);
 
 	sctp_association_put(asoc);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 11:09 ` Qiujun Huang
  0 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 11:09 UTC (permalink / raw)
  To: marcelo.leitner, davem
  Cc: vyasevich, nhorman, kuba, linux-sctp, netdev, linux-kernel,
	anenbupt, Qiujun Huang

Do accounting for skb's real sk.
In some case skb->sk != asoc->base.sk:

for the trouble SKB, it was in outq->transmitted queue

sctp_outq_sack
	sctp_check_transmitted
		SKB was moved to outq->sack
	then throw away the sack queue
		SKB was deleted from outq->sack
(but the datamsg held SKB at sctp_datamsg_to_asoc
So, sctp_wfree was not called to destroy SKB)

then migrate happened

	sctp_for_each_tx_datachunk(
	sctp_clear_owner_w);
	sctp_assoc_migrate();
	sctp_for_each_tx_datachunk(
	sctp_set_owner_w);
SKB was not in the outq, and was not changed to newsk

finally

__sctp_outq_teardown
	sctp_chunk_put (for another skb)
		sctp_datamsg_put
			__kfree_skb(msg->frag_list)
				sctp_wfree (for SKB)
this case in sctp_wfree SKB->sk was oldsk.

It looks only trouble here so handling it in sctp_wfree is enough.

Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
---
 net/sctp/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1b56fc440606..5f5c28b30e25 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -9080,7 +9080,7 @@ static void sctp_wfree(struct sk_buff *skb)
 {
 	struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg;
 	struct sctp_association *asoc = chunk->asoc;
-	struct sock *sk = asoc->base.sk;
+	struct sock *sk = skb->sk;
 
 	sk_mem_uncharge(sk, skb->truesize);
 	sk->sk_wmem_queued -= skb->truesize + sizeof(struct sctp_chunk);
@@ -9109,7 +9109,7 @@ static void sctp_wfree(struct sk_buff *skb)
 	}
 
 	sock_wfree(skb);
-	sctp_wake_up_waiters(sk, asoc);
+	sctp_wake_up_waiters(asoc->base.sk, asoc);
 
 	sctp_association_put(asoc);
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-20 11:09 ` Qiujun Huang
@ 2020-03-20 17:10   ` Eric Dumazet
  -1 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2020-03-20 17:10 UTC (permalink / raw)
  To: Qiujun Huang, marcelo.leitner, davem
  Cc: vyasevich, nhorman, kuba, linux-sctp, netdev, linux-kernel, anenbupt



On 3/20/20 4:09 AM, Qiujun Huang wrote:
> Do accounting for skb's real sk.
> In some case skb->sk != asoc->base.sk:
> 
> for the trouble SKB, it was in outq->transmitted queue
> 
> sctp_outq_sack
> 	sctp_check_transmitted
> 		SKB was moved to outq->sack
> 	then throw away the sack queue
> 		SKB was deleted from outq->sack
> (but the datamsg held SKB at sctp_datamsg_to_asoc
> So, sctp_wfree was not called to destroy SKB)
> 
> then migrate happened
> 
> 	sctp_for_each_tx_datachunk(
> 	sctp_clear_owner_w);
> 	sctp_assoc_migrate();
> 	sctp_for_each_tx_datachunk(
> 	sctp_set_owner_w);
> SKB was not in the outq, and was not changed to newsk
> 
> finally
> 
> __sctp_outq_teardown
> 	sctp_chunk_put (for another skb)
> 		sctp_datamsg_put
> 			__kfree_skb(msg->frag_list)
> 				sctp_wfree (for SKB)
> this case in sctp_wfree SKB->sk was oldsk.
> 
> It looks only trouble here so handling it in sctp_wfree is enough.
> 
> Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
> Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
> ---
>  net/sctp/socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 1b56fc440606..5f5c28b30e25 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -9080,7 +9080,7 @@ static void sctp_wfree(struct sk_buff *skb)
>  {
>  	struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg;
>  	struct sctp_association *asoc = chunk->asoc;
> -	struct sock *sk = asoc->base.sk;
> +	struct sock *sk = skb->sk;
>  
>  	sk_mem_uncharge(sk, skb->truesize);
>  	sk->sk_wmem_queued -= skb->truesize + sizeof(struct sctp_chunk);
> @@ -9109,7 +9109,7 @@ static void sctp_wfree(struct sk_buff *skb)
>  	}
>  
>  	sock_wfree(skb);
> -	sctp_wake_up_waiters(sk, asoc);
> +	sctp_wake_up_waiters(asoc->base.sk, asoc);
>  
>  	sctp_association_put(asoc);
>  }
> 

This does not really solve the issue.

Even if the particular syzbot repro is now fine.

Really, having anything _after_ the sock_wfree(skb) is the bug, since the current thread no longer
own a reference on a socket.





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 17:10   ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2020-03-20 17:10 UTC (permalink / raw)
  To: Qiujun Huang, marcelo.leitner, davem
  Cc: vyasevich, nhorman, kuba, linux-sctp, netdev, linux-kernel, anenbupt



On 3/20/20 4:09 AM, Qiujun Huang wrote:
> Do accounting for skb's real sk.
> In some case skb->sk != asoc->base.sk:
> 
> for the trouble SKB, it was in outq->transmitted queue
> 
> sctp_outq_sack
> 	sctp_check_transmitted
> 		SKB was moved to outq->sack
> 	then throw away the sack queue
> 		SKB was deleted from outq->sack
> (but the datamsg held SKB at sctp_datamsg_to_asoc
> So, sctp_wfree was not called to destroy SKB)
> 
> then migrate happened
> 
> 	sctp_for_each_tx_datachunk(
> 	sctp_clear_owner_w);
> 	sctp_assoc_migrate();
> 	sctp_for_each_tx_datachunk(
> 	sctp_set_owner_w);
> SKB was not in the outq, and was not changed to newsk
> 
> finally
> 
> __sctp_outq_teardown
> 	sctp_chunk_put (for another skb)
> 		sctp_datamsg_put
> 			__kfree_skb(msg->frag_list)
> 				sctp_wfree (for SKB)
> this case in sctp_wfree SKB->sk was oldsk.
> 
> It looks only trouble here so handling it in sctp_wfree is enough.
> 
> Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
> Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
> ---
>  net/sctp/socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 1b56fc440606..5f5c28b30e25 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -9080,7 +9080,7 @@ static void sctp_wfree(struct sk_buff *skb)
>  {
>  	struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg;
>  	struct sctp_association *asoc = chunk->asoc;
> -	struct sock *sk = asoc->base.sk;
> +	struct sock *sk = skb->sk;
>  
>  	sk_mem_uncharge(sk, skb->truesize);
>  	sk->sk_wmem_queued -= skb->truesize + sizeof(struct sctp_chunk);
> @@ -9109,7 +9109,7 @@ static void sctp_wfree(struct sk_buff *skb)
>  	}
>  
>  	sock_wfree(skb);
> -	sctp_wake_up_waiters(sk, asoc);
> +	sctp_wake_up_waiters(asoc->base.sk, asoc);
>  
>  	sctp_association_put(asoc);
>  }
> 

This does not really solve the issue.

Even if the particular syzbot repro is now fine.

Really, having anything _after_ the sock_wfree(skb) is the bug, since the current thread no longer
own a reference on a socket.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-20 11:09 ` Qiujun Huang
@ 2020-03-20 18:52   ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-20 18:52 UTC (permalink / raw)
  To: Qiujun Huang
  Cc: davem, vyasevich, nhorman, kuba, linux-sctp, netdev,
	linux-kernel, anenbupt

On Fri, Mar 20, 2020 at 07:09:59PM +0800, Qiujun Huang wrote:
> Do accounting for skb's real sk.
> In some case skb->sk != asoc->base.sk:
> 
> for the trouble SKB, it was in outq->transmitted queue
> 
> sctp_outq_sack
> 	sctp_check_transmitted
> 		SKB was moved to outq->sack

There is no outq->sack. You mean outq->sacked, I assume.

> 	then throw away the sack queue

Where? How?
If you mean:
        /* Throw away stuff rotting on the sack queue.  */
        list_for_each_safe(lchunk, temp, &q->sacked) {
                tchunk = list_entry(lchunk, struct sctp_chunk,
                                    transmitted_list);
                tsn = ntohl(tchunk->subh.data_hdr->tsn);
                if (TSN_lte(tsn, ctsn)) {
                        list_del_init(&tchunk->transmitted_list);
                        if (asoc->peer.prsctp_capable &&
                            SCTP_PR_PRIO_ENABLED(chunk->sinfo.sinfo_flags))
                                asoc->sent_cnt_removable--;
                        sctp_chunk_free(tchunk);

Then sctp_chunk_free is supposed to free the datamsg as well for
chunks that were cumulative-sacked.
For those not cumulative-sacked, sctp_for_each_tx_datachunk() will
handle q->sacked queue as well:
        list_for_each_entry(chunk, &q->sacked, transmitted_list)
                cb(chunk);

So I don't see how skbs can be overlooked here.

> 		SKB was deleted from outq->sack
> (but the datamsg held SKB at sctp_datamsg_to_asoc

You mean sctp_datamsg_from_user ? If so, isn't it the other way
around? sctp_datamsg_assign() will hold the datamsg, not the skb.

> So, sctp_wfree was not called to destroy SKB)
> 
> then migrate happened
> 
> 	sctp_for_each_tx_datachunk(
> 	sctp_clear_owner_w);
> 	sctp_assoc_migrate();
> 	sctp_for_each_tx_datachunk(
> 	sctp_set_owner_w);
> SKB was not in the outq, and was not changed to newsk

The real fix is to fix the migration to the new socket, though the
situation on which it is happening is still not clear.

The 2nd sendto() call on the reproducer is sending 212992 bytes on a
single call. That's usually the whole sndbuf size, and will cause
fragmentation to happen. That means the datamsg will contain several
skbs. But still, the sacked chunks should be freed if needed while the
remaining ones will be left on the queues that they are.

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 18:52   ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-20 18:52 UTC (permalink / raw)
  To: Qiujun Huang
  Cc: davem, vyasevich, nhorman, kuba, linux-sctp, netdev,
	linux-kernel, anenbupt

On Fri, Mar 20, 2020 at 07:09:59PM +0800, Qiujun Huang wrote:
> Do accounting for skb's real sk.
> In some case skb->sk != asoc->base.sk:
> 
> for the trouble SKB, it was in outq->transmitted queue
> 
> sctp_outq_sack
> 	sctp_check_transmitted
> 		SKB was moved to outq->sack

There is no outq->sack. You mean outq->sacked, I assume.

> 	then throw away the sack queue

Where? How?
If you mean:
        /* Throw away stuff rotting on the sack queue.  */
        list_for_each_safe(lchunk, temp, &q->sacked) {
                tchunk = list_entry(lchunk, struct sctp_chunk,
                                    transmitted_list);
                tsn = ntohl(tchunk->subh.data_hdr->tsn);
                if (TSN_lte(tsn, ctsn)) {
                        list_del_init(&tchunk->transmitted_list);
                        if (asoc->peer.prsctp_capable &&
                            SCTP_PR_PRIO_ENABLED(chunk->sinfo.sinfo_flags))
                                asoc->sent_cnt_removable--;
                        sctp_chunk_free(tchunk);

Then sctp_chunk_free is supposed to free the datamsg as well for
chunks that were cumulative-sacked.
For those not cumulative-sacked, sctp_for_each_tx_datachunk() will
handle q->sacked queue as well:
        list_for_each_entry(chunk, &q->sacked, transmitted_list)
                cb(chunk);

So I don't see how skbs can be overlooked here.

> 		SKB was deleted from outq->sack
> (but the datamsg held SKB at sctp_datamsg_to_asoc

You mean sctp_datamsg_from_user ? If so, isn't it the other way
around? sctp_datamsg_assign() will hold the datamsg, not the skb.

> So, sctp_wfree was not called to destroy SKB)
> 
> then migrate happened
> 
> 	sctp_for_each_tx_datachunk(
> 	sctp_clear_owner_w);
> 	sctp_assoc_migrate();
> 	sctp_for_each_tx_datachunk(
> 	sctp_set_owner_w);
> SKB was not in the outq, and was not changed to newsk

The real fix is to fix the migration to the new socket, though the
situation on which it is happening is still not clear.

The 2nd sendto() call on the reproducer is sending 212992 bytes on a
single call. That's usually the whole sndbuf size, and will cause
fragmentation to happen. That means the datamsg will contain several
skbs. But still, the sacked chunks should be freed if needed while the
remaining ones will be left on the queues that they are.

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-20 17:10   ` Eric Dumazet
@ 2020-03-20 23:36     ` Qiujun Huang
  -1 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 23:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marcelo Ricardo Leitner, David S. Miller, vyasevich, nhorman,
	Jakub Kicinski, linux-sctp, netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 1:10 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
>
> This does not really solve the issue.
>
> Even if the particular syzbot repro is now fine.
>
> Really, having anything _after_ the sock_wfree(skb) is the bug, since the current thread no longer
> own a reference on a socket.

I get it, thanks.

>
>
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 23:36     ` Qiujun Huang
  0 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 23:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marcelo Ricardo Leitner, David S. Miller, vyasevich, nhorman,
	Jakub Kicinski, linux-sctp, netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 1:10 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
>
> This does not really solve the issue.
>
> Even if the particular syzbot repro is now fine.
>
> Really, having anything _after_ the sock_wfree(skb) is the bug, since the current thread no longer
> own a reference on a socket.

I get it, thanks.

>
>
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-20 18:52   ` Marcelo Ricardo Leitner
@ 2020-03-20 23:48     ` Qiujun Huang
  -1 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 23:48 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 2:52 AM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Fri, Mar 20, 2020 at 07:09:59PM +0800, Qiujun Huang wrote:
> > Do accounting for skb's real sk.
> > In some case skb->sk != asoc->base.sk:
> >
> > for the trouble SKB, it was in outq->transmitted queue
> >
> > sctp_outq_sack
> >       sctp_check_transmitted
> >               SKB was moved to outq->sack
>
> There is no outq->sack. You mean outq->sacked, I assume.

Yes, my typo.

>
> >       then throw away the sack queue
>
> Where? How?
> If you mean:
>         /* Throw away stuff rotting on the sack queue.  */
>         list_for_each_safe(lchunk, temp, &q->sacked) {
>                 tchunk = list_entry(lchunk, struct sctp_chunk,
>                                     transmitted_list);
>                 tsn = ntohl(tchunk->subh.data_hdr->tsn);
>                 if (TSN_lte(tsn, ctsn)) {
>                         list_del_init(&tchunk->transmitted_list);
>                         if (asoc->peer.prsctp_capable &&
>                             SCTP_PR_PRIO_ENABLED(chunk->sinfo.sinfo_flags))
>                                 asoc->sent_cnt_removable--;
>                         sctp_chunk_free(tchunk);

Yes, it was delected here.

>
> Then sctp_chunk_free is supposed to free the datamsg as well for
> chunks that were cumulative-sacked.

Datamsg should be freed until all his chunks had been freed.

sctp_datamsg_from_user->sctp_datamsg_assign
every chunks holds datamsg.

> For those not cumulative-sacked, sctp_for_each_tx_datachunk() will
> handle q->sacked queue as well:
>         list_for_each_entry(chunk, &q->sacked, transmitted_list)
>                 cb(chunk);
>
> So I don't see how skbs can be overlooked here.
>
> >               SKB was deleted from outq->sack
> > (but the datamsg held SKB at sctp_datamsg_to_asoc
>
> You mean sctp_datamsg_from_user ? If so, isn't it the other way
> around? sctp_datamsg_assign() will hold the datamsg, not the skb.
yeah.

>
> > So, sctp_wfree was not called to destroy SKB)
> >
> > then migrate happened
> >
> >       sctp_for_each_tx_datachunk(
> >       sctp_clear_owner_w);
> >       sctp_assoc_migrate();
> >       sctp_for_each_tx_datachunk(
> >       sctp_set_owner_w);
> > SKB was not in the outq, and was not changed to newsk
>
> The real fix is to fix the migration to the new socket, though the
> situation on which it is happening is still not clear.
>
> The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> single call. That's usually the whole sndbuf size, and will cause
> fragmentation to happen. That means the datamsg will contain several
> skbs. But still, the sacked chunks should be freed if needed while the
> remaining ones will be left on the queues that they are.
>
> Thanks,
> Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 23:48     ` Qiujun Huang
  0 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 23:48 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 2:52 AM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Fri, Mar 20, 2020 at 07:09:59PM +0800, Qiujun Huang wrote:
> > Do accounting for skb's real sk.
> > In some case skb->sk != asoc->base.sk:
> >
> > for the trouble SKB, it was in outq->transmitted queue
> >
> > sctp_outq_sack
> >       sctp_check_transmitted
> >               SKB was moved to outq->sack
>
> There is no outq->sack. You mean outq->sacked, I assume.

Yes, my typo.

>
> >       then throw away the sack queue
>
> Where? How?
> If you mean:
>         /* Throw away stuff rotting on the sack queue.  */
>         list_for_each_safe(lchunk, temp, &q->sacked) {
>                 tchunk = list_entry(lchunk, struct sctp_chunk,
>                                     transmitted_list);
>                 tsn = ntohl(tchunk->subh.data_hdr->tsn);
>                 if (TSN_lte(tsn, ctsn)) {
>                         list_del_init(&tchunk->transmitted_list);
>                         if (asoc->peer.prsctp_capable &&
>                             SCTP_PR_PRIO_ENABLED(chunk->sinfo.sinfo_flags))
>                                 asoc->sent_cnt_removable--;
>                         sctp_chunk_free(tchunk);

Yes, it was delected here.

>
> Then sctp_chunk_free is supposed to free the datamsg as well for
> chunks that were cumulative-sacked.

Datamsg should be freed until all his chunks had been freed.

sctp_datamsg_from_user->sctp_datamsg_assign
every chunks holds datamsg.

> For those not cumulative-sacked, sctp_for_each_tx_datachunk() will
> handle q->sacked queue as well:
>         list_for_each_entry(chunk, &q->sacked, transmitted_list)
>                 cb(chunk);
>
> So I don't see how skbs can be overlooked here.
>
> >               SKB was deleted from outq->sack
> > (but the datamsg held SKB at sctp_datamsg_to_asoc
>
> You mean sctp_datamsg_from_user ? If so, isn't it the other way
> around? sctp_datamsg_assign() will hold the datamsg, not the skb.
yeah.

>
> > So, sctp_wfree was not called to destroy SKB)
> >
> > then migrate happened
> >
> >       sctp_for_each_tx_datachunk(
> >       sctp_clear_owner_w);
> >       sctp_assoc_migrate();
> >       sctp_for_each_tx_datachunk(
> >       sctp_set_owner_w);
> > SKB was not in the outq, and was not changed to newsk
>
> The real fix is to fix the migration to the new socket, though the
> situation on which it is happening is still not clear.
>
> The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> single call. That's usually the whole sndbuf size, and will cause
> fragmentation to happen. That means the datamsg will contain several
> skbs. But still, the sacked chunks should be freed if needed while the
> remaining ones will be left on the queues that they are.
>
> Thanks,
> Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-20 18:52   ` Marcelo Ricardo Leitner
@ 2020-03-20 23:53     ` Qiujun Huang
  -1 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 23:53 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 2:52 AM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Fri, Mar 20, 2020 at 07:09:59PM +0800, Qiujun Huang wrote:
> > Do accounting for skb's real sk.
> > In some case skb->sk != asoc->base.sk:
> >
> > for the trouble SKB, it was in outq->transmitted queue
> >
> > sctp_outq_sack
> >       sctp_check_transmitted
> >               SKB was moved to outq->sack
>
> There is no outq->sack. You mean outq->sacked, I assume.
>
> >       then throw away the sack queue
>
> Where? How?
> If you mean:
>         /* Throw away stuff rotting on the sack queue.  */
>         list_for_each_safe(lchunk, temp, &q->sacked) {
>                 tchunk = list_entry(lchunk, struct sctp_chunk,
>                                     transmitted_list);
>                 tsn = ntohl(tchunk->subh.data_hdr->tsn);
>                 if (TSN_lte(tsn, ctsn)) {
>                         list_del_init(&tchunk->transmitted_list);
>                         if (asoc->peer.prsctp_capable &&
>                             SCTP_PR_PRIO_ENABLED(chunk->sinfo.sinfo_flags))
>                                 asoc->sent_cnt_removable--;
>                         sctp_chunk_free(tchunk);
>
> Then sctp_chunk_free is supposed to free the datamsg as well for
> chunks that were cumulative-sacked.
> For those not cumulative-sacked, sctp_for_each_tx_datachunk() will
> handle q->sacked queue as well:
>         list_for_each_entry(chunk, &q->sacked, transmitted_list)
>                 cb(chunk);
>
> So I don't see how skbs can be overlooked here.
>
> >               SKB was deleted from outq->sack
> > (but the datamsg held SKB at sctp_datamsg_to_asoc
>
> You mean sctp_datamsg_from_user ? If so, isn't it the other way
> around? sctp_datamsg_assign() will hold the datamsg, not the skb.
>
> > So, sctp_wfree was not called to destroy SKB)
> >
> > then migrate happened
> >
> >       sctp_for_each_tx_datachunk(
> >       sctp_clear_owner_w);
> >       sctp_assoc_migrate();
> >       sctp_for_each_tx_datachunk(
> >       sctp_set_owner_w);
> > SKB was not in the outq, and was not changed to newsk
>
> The real fix is to fix the migration to the new socket, though the
> situation on which it is happening is still not clear.
>
> The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> single call. That's usually the whole sndbuf size, and will cause
> fragmentation to happen. That means the datamsg will contain several
> skbs. But still, the sacked chunks should be freed if needed while the
> remaining ones will be left on the queues that they are.

in sctp_sendmsg_to_asoc
datamsg holds his chunk result in that the sacked chunks can't be freed

list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
sctp_chunk_hold(chunk);
sctp_set_owner_w(chunk);
chunk->transport = transport;
}

any ideas to handle it?

>
> Thanks,
> Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-20 23:53     ` Qiujun Huang
  0 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-20 23:53 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 2:52 AM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Fri, Mar 20, 2020 at 07:09:59PM +0800, Qiujun Huang wrote:
> > Do accounting for skb's real sk.
> > In some case skb->sk != asoc->base.sk:
> >
> > for the trouble SKB, it was in outq->transmitted queue
> >
> > sctp_outq_sack
> >       sctp_check_transmitted
> >               SKB was moved to outq->sack
>
> There is no outq->sack. You mean outq->sacked, I assume.
>
> >       then throw away the sack queue
>
> Where? How?
> If you mean:
>         /* Throw away stuff rotting on the sack queue.  */
>         list_for_each_safe(lchunk, temp, &q->sacked) {
>                 tchunk = list_entry(lchunk, struct sctp_chunk,
>                                     transmitted_list);
>                 tsn = ntohl(tchunk->subh.data_hdr->tsn);
>                 if (TSN_lte(tsn, ctsn)) {
>                         list_del_init(&tchunk->transmitted_list);
>                         if (asoc->peer.prsctp_capable &&
>                             SCTP_PR_PRIO_ENABLED(chunk->sinfo.sinfo_flags))
>                                 asoc->sent_cnt_removable--;
>                         sctp_chunk_free(tchunk);
>
> Then sctp_chunk_free is supposed to free the datamsg as well for
> chunks that were cumulative-sacked.
> For those not cumulative-sacked, sctp_for_each_tx_datachunk() will
> handle q->sacked queue as well:
>         list_for_each_entry(chunk, &q->sacked, transmitted_list)
>                 cb(chunk);
>
> So I don't see how skbs can be overlooked here.
>
> >               SKB was deleted from outq->sack
> > (but the datamsg held SKB at sctp_datamsg_to_asoc
>
> You mean sctp_datamsg_from_user ? If so, isn't it the other way
> around? sctp_datamsg_assign() will hold the datamsg, not the skb.
>
> > So, sctp_wfree was not called to destroy SKB)
> >
> > then migrate happened
> >
> >       sctp_for_each_tx_datachunk(
> >       sctp_clear_owner_w);
> >       sctp_assoc_migrate();
> >       sctp_for_each_tx_datachunk(
> >       sctp_set_owner_w);
> > SKB was not in the outq, and was not changed to newsk
>
> The real fix is to fix the migration to the new socket, though the
> situation on which it is happening is still not clear.
>
> The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> single call. That's usually the whole sndbuf size, and will cause
> fragmentation to happen. That means the datamsg will contain several
> skbs. But still, the sacked chunks should be freed if needed while the
> remaining ones will be left on the queues that they are.

in sctp_sendmsg_to_asoc
datamsg holds his chunk result in that the sacked chunks can't be freed

list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
sctp_chunk_hold(chunk);
sctp_set_owner_w(chunk);
chunk->transport = transport;
}

any ideas to handle it?

>
> Thanks,
> Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-20 23:53     ` Qiujun Huang
@ 2020-03-21  1:02       ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-21  1:02 UTC (permalink / raw)
  To: Qiujun Huang
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
...
> > > So, sctp_wfree was not called to destroy SKB)
> > >
> > > then migrate happened
> > >
> > >       sctp_for_each_tx_datachunk(
> > >       sctp_clear_owner_w);
> > >       sctp_assoc_migrate();
> > >       sctp_for_each_tx_datachunk(
> > >       sctp_set_owner_w);
> > > SKB was not in the outq, and was not changed to newsk
> >
> > The real fix is to fix the migration to the new socket, though the
> > situation on which it is happening is still not clear.
> >
> > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > single call. That's usually the whole sndbuf size, and will cause
> > fragmentation to happen. That means the datamsg will contain several
> > skbs. But still, the sacked chunks should be freed if needed while the
> > remaining ones will be left on the queues that they are.
> 
> in sctp_sendmsg_to_asoc
> datamsg holds his chunk result in that the sacked chunks can't be freed

Right! Now I see it, thanks.
In the end, it's not a locking race condition. It's just not iterating
on the lists properly.

> 
> list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> sctp_chunk_hold(chunk);
> sctp_set_owner_w(chunk);
> chunk->transport = transport;
> }
> 
> any ideas to handle it?

sctp_for_each_tx_datachunk() needs to be aware of this situation.
Instead of iterating directly/only over the chunk list, it should
iterate over the datamsgs instead. Something like the below (just
compile tested).

Then, the old socket will be free to die regardless of the new one.
Otherwise, if this association gets stuck on retransmissions or so,
the old socket would not be freed till then.

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fed26a1e9518..85c742310d26 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
 				       void (*cb)(struct sctp_chunk *))
 
 {
+	struct sctp_datamsg *msg, *prev_msg = NULL;
 	struct sctp_outq *q = &asoc->outqueue;
 	struct sctp_transport *t;
-	struct sctp_chunk *chunk;
+	struct sctp_chunk *chunk, *c;
 
 	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
 		list_for_each_entry(chunk, &t->transmitted, transmitted_list)
@@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
 	list_for_each_entry(chunk, &q->retransmit, transmitted_list)
 		cb(chunk);
 
-	list_for_each_entry(chunk, &q->sacked, transmitted_list)
-		cb(chunk);
+	list_for_each_entry(chunk, &q->sacked, transmitted_list) {
+		msg = chunk->msg;
+		if (msg == prev_msg)
+			continue;
+		list_for_each_entry(c, &msg->chunks, frag_list)
+			cb(c);
+		prev_msg = msg;
+	}
 
 	list_for_each_entry(chunk, &q->abandoned, transmitted_list)
 		cb(chunk);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-21  1:02       ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-21  1:02 UTC (permalink / raw)
  To: Qiujun Huang
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
...
> > > So, sctp_wfree was not called to destroy SKB)
> > >
> > > then migrate happened
> > >
> > >       sctp_for_each_tx_datachunk(
> > >       sctp_clear_owner_w);
> > >       sctp_assoc_migrate();
> > >       sctp_for_each_tx_datachunk(
> > >       sctp_set_owner_w);
> > > SKB was not in the outq, and was not changed to newsk
> >
> > The real fix is to fix the migration to the new socket, though the
> > situation on which it is happening is still not clear.
> >
> > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > single call. That's usually the whole sndbuf size, and will cause
> > fragmentation to happen. That means the datamsg will contain several
> > skbs. But still, the sacked chunks should be freed if needed while the
> > remaining ones will be left on the queues that they are.
> 
> in sctp_sendmsg_to_asoc
> datamsg holds his chunk result in that the sacked chunks can't be freed

Right! Now I see it, thanks.
In the end, it's not a locking race condition. It's just not iterating
on the lists properly.

> 
> list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> sctp_chunk_hold(chunk);
> sctp_set_owner_w(chunk);
> chunk->transport = transport;
> }
> 
> any ideas to handle it?

sctp_for_each_tx_datachunk() needs to be aware of this situation.
Instead of iterating directly/only over the chunk list, it should
iterate over the datamsgs instead. Something like the below (just
compile tested).

Then, the old socket will be free to die regardless of the new one.
Otherwise, if this association gets stuck on retransmissions or so,
the old socket would not be freed till then.

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fed26a1e9518..85c742310d26 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
 				       void (*cb)(struct sctp_chunk *))
 
 {
+	struct sctp_datamsg *msg, *prev_msg = NULL;
 	struct sctp_outq *q = &asoc->outqueue;
 	struct sctp_transport *t;
-	struct sctp_chunk *chunk;
+	struct sctp_chunk *chunk, *c;
 
 	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
 		list_for_each_entry(chunk, &t->transmitted, transmitted_list)
@@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
 	list_for_each_entry(chunk, &q->retransmit, transmitted_list)
 		cb(chunk);
 
-	list_for_each_entry(chunk, &q->sacked, transmitted_list)
-		cb(chunk);
+	list_for_each_entry(chunk, &q->sacked, transmitted_list) {
+		msg = chunk->msg;
+		if (msg = prev_msg)
+			continue;
+		list_for_each_entry(c, &msg->chunks, frag_list)
+			cb(c);
+		prev_msg = msg;
+	}
 
 	list_for_each_entry(chunk, &q->abandoned, transmitted_list)
 		cb(chunk);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-21  1:02       ` Marcelo Ricardo Leitner
@ 2020-03-21  1:23         ` Qiujun Huang
  -1 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-21  1:23 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 9:02 AM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
> ...
> > > > So, sctp_wfree was not called to destroy SKB)
> > > >
> > > > then migrate happened
> > > >
> > > >       sctp_for_each_tx_datachunk(
> > > >       sctp_clear_owner_w);
> > > >       sctp_assoc_migrate();
> > > >       sctp_for_each_tx_datachunk(
> > > >       sctp_set_owner_w);
> > > > SKB was not in the outq, and was not changed to newsk
> > >
> > > The real fix is to fix the migration to the new socket, though the
> > > situation on which it is happening is still not clear.
> > >
> > > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > > single call. That's usually the whole sndbuf size, and will cause
> > > fragmentation to happen. That means the datamsg will contain several
> > > skbs. But still, the sacked chunks should be freed if needed while the
> > > remaining ones will be left on the queues that they are.
> >
> > in sctp_sendmsg_to_asoc
> > datamsg holds his chunk result in that the sacked chunks can't be freed
>
> Right! Now I see it, thanks.
> In the end, it's not a locking race condition. It's just not iterating
> on the lists properly.
>
> >
> > list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> > sctp_chunk_hold(chunk);
> > sctp_set_owner_w(chunk);
> > chunk->transport = transport;
> > }
> >
> > any ideas to handle it?
>
> sctp_for_each_tx_datachunk() needs to be aware of this situation.
> Instead of iterating directly/only over the chunk list, it should
> iterate over the datamsgs instead. Something like the below (just
> compile tested).
>
> Then, the old socket will be free to die regardless of the new one.
> Otherwise, if this association gets stuck on retransmissions or so,
> the old socket would not be freed till then.
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index fed26a1e9518..85c742310d26 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
>                                        void (*cb)(struct sctp_chunk *))
>
>  {
> +       struct sctp_datamsg *msg, *prev_msg = NULL;
>         struct sctp_outq *q = &asoc->outqueue;
>         struct sctp_transport *t;
> -       struct sctp_chunk *chunk;
> +       struct sctp_chunk *chunk, *c;
>
>         list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
>                 list_for_each_entry(chunk, &t->transmitted, transmitted_list)
> @@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
>         list_for_each_entry(chunk, &q->retransmit, transmitted_list)
>                 cb(chunk);
>
> -       list_for_each_entry(chunk, &q->sacked, transmitted_list)
> -               cb(chunk);
> +       list_for_each_entry(chunk, &q->sacked, transmitted_list) {
> +               msg = chunk->msg;
> +               if (msg == prev_msg)
> +                       continue;
> +               list_for_each_entry(c, &msg->chunks, frag_list)
> +                       cb(c);
> +               prev_msg = msg;
> +       }

great. I'll trigger a syzbot test. Thanks.

>
>         list_for_each_entry(chunk, &q->abandoned, transmitted_list)
>                 cb(chunk);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-21  1:23         ` Qiujun Huang
  0 siblings, 0 replies; 18+ messages in thread
From: Qiujun Huang @ 2020-03-21  1:23 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 9:02 AM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
> ...
> > > > So, sctp_wfree was not called to destroy SKB)
> > > >
> > > > then migrate happened
> > > >
> > > >       sctp_for_each_tx_datachunk(
> > > >       sctp_clear_owner_w);
> > > >       sctp_assoc_migrate();
> > > >       sctp_for_each_tx_datachunk(
> > > >       sctp_set_owner_w);
> > > > SKB was not in the outq, and was not changed to newsk
> > >
> > > The real fix is to fix the migration to the new socket, though the
> > > situation on which it is happening is still not clear.
> > >
> > > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > > single call. That's usually the whole sndbuf size, and will cause
> > > fragmentation to happen. That means the datamsg will contain several
> > > skbs. But still, the sacked chunks should be freed if needed while the
> > > remaining ones will be left on the queues that they are.
> >
> > in sctp_sendmsg_to_asoc
> > datamsg holds his chunk result in that the sacked chunks can't be freed
>
> Right! Now I see it, thanks.
> In the end, it's not a locking race condition. It's just not iterating
> on the lists properly.
>
> >
> > list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> > sctp_chunk_hold(chunk);
> > sctp_set_owner_w(chunk);
> > chunk->transport = transport;
> > }
> >
> > any ideas to handle it?
>
> sctp_for_each_tx_datachunk() needs to be aware of this situation.
> Instead of iterating directly/only over the chunk list, it should
> iterate over the datamsgs instead. Something like the below (just
> compile tested).
>
> Then, the old socket will be free to die regardless of the new one.
> Otherwise, if this association gets stuck on retransmissions or so,
> the old socket would not be freed till then.
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index fed26a1e9518..85c742310d26 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
>                                        void (*cb)(struct sctp_chunk *))
>
>  {
> +       struct sctp_datamsg *msg, *prev_msg = NULL;
>         struct sctp_outq *q = &asoc->outqueue;
>         struct sctp_transport *t;
> -       struct sctp_chunk *chunk;
> +       struct sctp_chunk *chunk, *c;
>
>         list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
>                 list_for_each_entry(chunk, &t->transmitted, transmitted_list)
> @@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
>         list_for_each_entry(chunk, &q->retransmit, transmitted_list)
>                 cb(chunk);
>
> -       list_for_each_entry(chunk, &q->sacked, transmitted_list)
> -               cb(chunk);
> +       list_for_each_entry(chunk, &q->sacked, transmitted_list) {
> +               msg = chunk->msg;
> +               if (msg = prev_msg)
> +                       continue;
> +               list_for_each_entry(c, &msg->chunks, frag_list)
> +                       cb(c);
> +               prev_msg = msg;
> +       }

great. I'll trigger a syzbot test. Thanks.

>
>         list_for_each_entry(chunk, &q->abandoned, transmitted_list)
>                 cb(chunk);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
  2020-03-21  1:23         ` Qiujun Huang
@ 2020-03-21  1:33           ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-21  1:33 UTC (permalink / raw)
  To: Qiujun Huang
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 09:23:54AM +0800, Qiujun Huang wrote:
> On Sat, Mar 21, 2020 at 9:02 AM Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> >
> > On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
> > ...
> > > > > So, sctp_wfree was not called to destroy SKB)
> > > > >
> > > > > then migrate happened
> > > > >
> > > > >       sctp_for_each_tx_datachunk(
> > > > >       sctp_clear_owner_w);
> > > > >       sctp_assoc_migrate();
> > > > >       sctp_for_each_tx_datachunk(
> > > > >       sctp_set_owner_w);
> > > > > SKB was not in the outq, and was not changed to newsk
> > > >
> > > > The real fix is to fix the migration to the new socket, though the
> > > > situation on which it is happening is still not clear.
> > > >
> > > > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > > > single call. That's usually the whole sndbuf size, and will cause
> > > > fragmentation to happen. That means the datamsg will contain several
> > > > skbs. But still, the sacked chunks should be freed if needed while the
> > > > remaining ones will be left on the queues that they are.
> > >
> > > in sctp_sendmsg_to_asoc
> > > datamsg holds his chunk result in that the sacked chunks can't be freed
> >
> > Right! Now I see it, thanks.
> > In the end, it's not a locking race condition. It's just not iterating
> > on the lists properly.
> >
> > >
> > > list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> > > sctp_chunk_hold(chunk);
> > > sctp_set_owner_w(chunk);
> > > chunk->transport = transport;
> > > }
> > >
> > > any ideas to handle it?
> >
> > sctp_for_each_tx_datachunk() needs to be aware of this situation.
> > Instead of iterating directly/only over the chunk list, it should
> > iterate over the datamsgs instead. Something like the below (just
> > compile tested).
> >
> > Then, the old socket will be free to die regardless of the new one.
> > Otherwise, if this association gets stuck on retransmissions or so,
> > the old socket would not be freed till then.
> >
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index fed26a1e9518..85c742310d26 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
> >                                        void (*cb)(struct sctp_chunk *))
> >
> >  {
> > +       struct sctp_datamsg *msg, *prev_msg = NULL;
> >         struct sctp_outq *q = &asoc->outqueue;
> >         struct sctp_transport *t;
> > -       struct sctp_chunk *chunk;
> > +       struct sctp_chunk *chunk, *c;

I missed to swap some lines here, for reverse christmass-tree style,
btw.

> >
> >         list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
> >                 list_for_each_entry(chunk, &t->transmitted, transmitted_list)
> > @@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
> >         list_for_each_entry(chunk, &q->retransmit, transmitted_list)
> >                 cb(chunk);
> >
> > -       list_for_each_entry(chunk, &q->sacked, transmitted_list)
> > -               cb(chunk);
> > +       list_for_each_entry(chunk, &q->sacked, transmitted_list) {
> > +               msg = chunk->msg;
> > +               if (msg == prev_msg)
> > +                       continue;
> > +               list_for_each_entry(c, &msg->chunks, frag_list)
> > +                       cb(c);
> > +               prev_msg = msg;
> > +       }
> 
> great. I'll trigger a syzbot test. Thanks.

Mind that it may need to handled on the other lists as well. I didn't
check them :]

> 
> >
> >         list_for_each_entry(chunk, &q->abandoned, transmitted_list)
> >                 cb(chunk);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] sctp: fix refcount bug in sctp_wfree
@ 2020-03-21  1:33           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-03-21  1:33 UTC (permalink / raw)
  To: Qiujun Huang
  Cc: David S. Miller, vyasevich, nhorman, Jakub Kicinski, linux-sctp,
	netdev, LKML, anenbupt

On Sat, Mar 21, 2020 at 09:23:54AM +0800, Qiujun Huang wrote:
> On Sat, Mar 21, 2020 at 9:02 AM Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> >
> > On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
> > ...
> > > > > So, sctp_wfree was not called to destroy SKB)
> > > > >
> > > > > then migrate happened
> > > > >
> > > > >       sctp_for_each_tx_datachunk(
> > > > >       sctp_clear_owner_w);
> > > > >       sctp_assoc_migrate();
> > > > >       sctp_for_each_tx_datachunk(
> > > > >       sctp_set_owner_w);
> > > > > SKB was not in the outq, and was not changed to newsk
> > > >
> > > > The real fix is to fix the migration to the new socket, though the
> > > > situation on which it is happening is still not clear.
> > > >
> > > > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > > > single call. That's usually the whole sndbuf size, and will cause
> > > > fragmentation to happen. That means the datamsg will contain several
> > > > skbs. But still, the sacked chunks should be freed if needed while the
> > > > remaining ones will be left on the queues that they are.
> > >
> > > in sctp_sendmsg_to_asoc
> > > datamsg holds his chunk result in that the sacked chunks can't be freed
> >
> > Right! Now I see it, thanks.
> > In the end, it's not a locking race condition. It's just not iterating
> > on the lists properly.
> >
> > >
> > > list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> > > sctp_chunk_hold(chunk);
> > > sctp_set_owner_w(chunk);
> > > chunk->transport = transport;
> > > }
> > >
> > > any ideas to handle it?
> >
> > sctp_for_each_tx_datachunk() needs to be aware of this situation.
> > Instead of iterating directly/only over the chunk list, it should
> > iterate over the datamsgs instead. Something like the below (just
> > compile tested).
> >
> > Then, the old socket will be free to die regardless of the new one.
> > Otherwise, if this association gets stuck on retransmissions or so,
> > the old socket would not be freed till then.
> >
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index fed26a1e9518..85c742310d26 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
> >                                        void (*cb)(struct sctp_chunk *))
> >
> >  {
> > +       struct sctp_datamsg *msg, *prev_msg = NULL;
> >         struct sctp_outq *q = &asoc->outqueue;
> >         struct sctp_transport *t;
> > -       struct sctp_chunk *chunk;
> > +       struct sctp_chunk *chunk, *c;

I missed to swap some lines here, for reverse christmass-tree style,
btw.

> >
> >         list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
> >                 list_for_each_entry(chunk, &t->transmitted, transmitted_list)
> > @@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
> >         list_for_each_entry(chunk, &q->retransmit, transmitted_list)
> >                 cb(chunk);
> >
> > -       list_for_each_entry(chunk, &q->sacked, transmitted_list)
> > -               cb(chunk);
> > +       list_for_each_entry(chunk, &q->sacked, transmitted_list) {
> > +               msg = chunk->msg;
> > +               if (msg = prev_msg)
> > +                       continue;
> > +               list_for_each_entry(c, &msg->chunks, frag_list)
> > +                       cb(c);
> > +               prev_msg = msg;
> > +       }
> 
> great. I'll trigger a syzbot test. Thanks.

Mind that it may need to handled on the other lists as well. I didn't
check them :]

> 
> >
> >         list_for_each_entry(chunk, &q->abandoned, transmitted_list)
> >                 cb(chunk);

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-03-21  1:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-20 11:09 [PATCH v3] sctp: fix refcount bug in sctp_wfree Qiujun Huang
2020-03-20 11:09 ` Qiujun Huang
2020-03-20 17:10 ` Eric Dumazet
2020-03-20 17:10   ` Eric Dumazet
2020-03-20 23:36   ` Qiujun Huang
2020-03-20 23:36     ` Qiujun Huang
2020-03-20 18:52 ` Marcelo Ricardo Leitner
2020-03-20 18:52   ` Marcelo Ricardo Leitner
2020-03-20 23:48   ` Qiujun Huang
2020-03-20 23:48     ` Qiujun Huang
2020-03-20 23:53   ` Qiujun Huang
2020-03-20 23:53     ` Qiujun Huang
2020-03-21  1:02     ` Marcelo Ricardo Leitner
2020-03-21  1:02       ` Marcelo Ricardo Leitner
2020-03-21  1:23       ` Qiujun Huang
2020-03-21  1:23         ` Qiujun Huang
2020-03-21  1:33         ` Marcelo Ricardo Leitner
2020-03-21  1:33           ` Marcelo Ricardo Leitner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.