[RESEND 1/2] sctp: export sctp_endpoint_{hold,put}() for use by seperate modules

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RESEND 1/2] sctp: export sctp_endpoint_{hold,put}() for use by seperate modules
@ 2021-12-14 21:57 Lee Jones
  2021-12-14 21:57 ` [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF Lee Jones
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Jones @ 2021-12-14 21:57 UTC (permalink / raw)
  To: lee.jones
  Cc: linux-kernel, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, Jakub Kicinski,
	lksctp developers, H.P. Yarroll, Karl Knutson, Jon Grimm,
	Xingang Guo, Hui Huang, Sridhar Samudrala, Daisy Chang,
	Ryan Layer, Kevin Gao, netdev

net/sctp/diag.c for instance is built into its own separate module
(sctp_diag.ko) and requires the use of sctp_endpoint_{hold,put}() in
order to prevent a recently found use-after-free issue.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: lksctp developers <linux-sctp@vger.kernel.org>
Cc: "H.P. Yarroll" <piggy@acm.org>
Cc: Karl Knutson <karl@athena.chicago.il.us>
Cc: Jon Grimm <jgrimm@us.ibm.com>
Cc: Xingang Guo <xingang.guo@intel.com>
Cc: Hui Huang <hui.huang@nokia.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Daisy Chang <daisyc@us.ibm.com>
Cc: Ryan Layer <rmlayer@us.ibm.com>
Cc: Kevin Gao <kevin.gao@intel.com>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 net/sctp/endpointola.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 48c9c2c7602f7..7c36056f3a1b4 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -226,6 +226,7 @@ void sctp_endpoint_hold(struct sctp_endpoint *ep)
 {
 	refcount_inc(&ep->base.refcnt);
 }
+EXPORT_SYMBOL_GPL(sctp_endpoint_hold);
 
 /* Release a reference to an endpoint and clean up if there are
  * no more references.
@@ -235,6 +236,7 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
 	if (refcount_dec_and_test(&ep->base.refcnt))
 		sctp_endpoint_destroy(ep);
 }
+EXPORT_SYMBOL_GPL(sctp_endpoint_put);
 
 /* Is this the endpoint we are looking for?  */
 struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-14 21:57 [RESEND 1/2] sctp: export sctp_endpoint_{hold,put}() for use by seperate modules Lee Jones
@ 2021-12-14 21:57 ` Lee Jones
  2021-12-16  1:48   ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Jones @ 2021-12-14 21:57 UTC (permalink / raw)
  To: lee.jones
  Cc: linux-kernel, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, Jakub Kicinski,
	lksctp developers, H.P. Yarroll, Karl Knutson, Jon Grimm,
	Xingang Guo, Hui Huang, Sridhar Samudrala, Daisy Chang,
	Ryan Layer, Kevin Gao, netdev

The cause of the resultant dump_stack() reported below is a
dereference of a freed pointer to 'struct sctp_endpoint' in
sctp_sock_dump().

This race condition occurs when a transport is cached into its
associated hash table followed by an endpoint/sock migration to a new
association in sctp_assoc_migrate() prior to their subsequent use in
sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
table calling into sctp_sock_dump() where the dereference occurs.

  BUG: KASAN: use-after-free in sctp_sock_dump+0xa8/0x438 [sctp_diag]
  Call trace:
   dump_backtrace+0x0/0x2dc
   show_stack+0x20/0x2c
   dump_stack+0x120/0x144
   print_address_description+0x80/0x2f4
   __kasan_report+0x174/0x194
   kasan_report+0x10/0x18
   __asan_load8+0x84/0x8c
   sctp_sock_dump+0xa8/0x438 [sctp_diag]
   sctp_for_each_transport+0x1e0/0x26c [sctp]
   sctp_diag_dump+0x180/0x1f0 [sctp_diag]
   inet_diag_dump+0x12c/0x168
   netlink_dump+0x24c/0x5b8
   __netlink_dump_start+0x274/0x2a8
   inet_diag_handler_cmd+0x224/0x274
   sock_diag_rcv_msg+0x21c/0x230
   netlink_rcv_skb+0xe0/0x1bc
   sock_diag_rcv+0x34/0x48
   netlink_unicast+0x3b4/0x430
   netlink_sendmsg+0x4f0/0x574
   sock_write_iter+0x18c/0x1f0
   do_iter_readv_writev+0x230/0x2a8
   do_iter_write+0xc8/0x2b4
   vfs_writev+0xf8/0x184
   do_writev+0xb0/0x1a8
   __arm64_sys_writev+0x4c/0x5c
   el0_svc_common+0x118/0x250
   el0_svc_handler+0x3c/0x9c
   el0_svc+0x8/0xc

To prevent this from happening we need to take a references to the
to-be-used/dereferenced 'struct sock' and 'struct sctp_endpoint's
until such a time when we know it can be safely released.

When KASAN is not enabled, a similar, but slightly different NULL
pointer derefernce crash occurs later along the thread of execution in
inet_sctp_diag_fill() this time.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: lksctp developers <linux-sctp@vger.kernel.org>
Cc: "H.P. Yarroll" <piggy@acm.org>
Cc: Karl Knutson <karl@athena.chicago.il.us>
Cc: Jon Grimm <jgrimm@us.ibm.com>
Cc: Xingang Guo <xingang.guo@intel.com>
Cc: Hui Huang <hui.huang@nokia.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Daisy Chang <daisyc@us.ibm.com>
Cc: Ryan Layer <rmlayer@us.ibm.com>
Cc: Kevin Gao <kevin.gao@intel.com>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 net/sctp/diag.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index 760b367644c12..2029b240b6f24 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -301,6 +301,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
 	struct sctp_association *assoc;
 	int err = 0;
 
+	sctp_endpoint_hold(ep);
+	sock_hold(sk);
 	lock_sock(sk);
 	list_for_each_entry(assoc, &ep->asocs, asocs) {
 		if (cb->args[4] < cb->args[1])
@@ -341,6 +343,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
 	cb->args[4] = 0;
 release:
 	release_sock(sk);
+	sock_put(sk);
+	sctp_endpoint_put(ep);
 	return err;
 }
 
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-14 21:57 ` [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF Lee Jones
@ 2021-12-16  1:48   ` Jakub Kicinski
  2021-12-16 16:12     ` Xin Long
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2021-12-16  1:48 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-kernel, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao, netdev,
	Xin Long

On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> The cause of the resultant dump_stack() reported below is a
> dereference of a freed pointer to 'struct sctp_endpoint' in
> sctp_sock_dump().
> 
> This race condition occurs when a transport is cached into its
> associated hash table followed by an endpoint/sock migration to a new
> association in sctp_assoc_migrate() prior to their subsequent use in
> sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> table calling into sctp_sock_dump() where the dereference occurs.
> 
>   BUG: KASAN: use-after-free in sctp_sock_dump+0xa8/0x438 [sctp_diag]
>   Call trace:
>    dump_backtrace+0x0/0x2dc
>    show_stack+0x20/0x2c
>    dump_stack+0x120/0x144
>    print_address_description+0x80/0x2f4
>    __kasan_report+0x174/0x194
>    kasan_report+0x10/0x18
>    __asan_load8+0x84/0x8c
>    sctp_sock_dump+0xa8/0x438 [sctp_diag]
>    sctp_for_each_transport+0x1e0/0x26c [sctp]
>    sctp_diag_dump+0x180/0x1f0 [sctp_diag]
>    inet_diag_dump+0x12c/0x168
>    netlink_dump+0x24c/0x5b8
>    __netlink_dump_start+0x274/0x2a8
>    inet_diag_handler_cmd+0x224/0x274
>    sock_diag_rcv_msg+0x21c/0x230
>    netlink_rcv_skb+0xe0/0x1bc
>    sock_diag_rcv+0x34/0x48
>    netlink_unicast+0x3b4/0x430
>    netlink_sendmsg+0x4f0/0x574
>    sock_write_iter+0x18c/0x1f0
>    do_iter_readv_writev+0x230/0x2a8
>    do_iter_write+0xc8/0x2b4
>    vfs_writev+0xf8/0x184
>    do_writev+0xb0/0x1a8
>    __arm64_sys_writev+0x4c/0x5c
>    el0_svc_common+0x118/0x250
>    el0_svc_handler+0x3c/0x9c
>    el0_svc+0x8/0xc
> 
> To prevent this from happening we need to take a references to the
> to-be-used/dereferenced 'struct sock' and 'struct sctp_endpoint's
> until such a time when we know it can be safely released.
> 
> When KASAN is not enabled, a similar, but slightly different NULL
> pointer derefernce crash occurs later along the thread of execution in
> inet_sctp_diag_fill() this time.

Are you able to identify where the bug was introduced? Fixes tag would
be good to have here. 

You should squash the two patches together.

> diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> index 760b367644c12..2029b240b6f24 100644
> --- a/net/sctp/diag.c
> +++ b/net/sctp/diag.c
> @@ -301,6 +301,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
>  	struct sctp_association *assoc;
>  	int err = 0;
>  
> +	sctp_endpoint_hold(ep);
> +	sock_hold(sk);
>  	lock_sock(sk);
>  	list_for_each_entry(assoc, &ep->asocs, asocs) {
>  		if (cb->args[4] < cb->args[1])
> @@ -341,6 +343,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
>  	cb->args[4] = 0;
>  release:
>  	release_sock(sk);
> +	sock_put(sk);
> +	sctp_endpoint_put(ep);
>  	return err;
>  }
>  


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16  1:48   ` Jakub Kicinski
@ 2021-12-16 16:12     ` Xin Long
  2021-12-16 16:39       ` Lee Jones
  0 siblings, 1 reply; 18+ messages in thread
From: Xin Long @ 2021-12-16 16:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lee Jones, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > The cause of the resultant dump_stack() reported below is a
> > dereference of a freed pointer to 'struct sctp_endpoint' in
> > sctp_sock_dump().
> >
> > This race condition occurs when a transport is cached into its
> > associated hash table followed by an endpoint/sock migration to a new
> > association in sctp_assoc_migrate() prior to their subsequent use in
> > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > table calling into sctp_sock_dump() where the dereference occurs.
in sctp_sock_dump():
        struct sock *sk = ep->base.sk;
        ... <--[1]
        lock_sock(sk);

Do you mean in [1], the sk is peeled off and gets freed elsewhere?
if that's true, it's still late to do sock_hold(sk) in your this patch.

I talked with Marcelo about this before, if the possible UAF in [1] exists,
the problem also exists in the main RX path sctp_rcv().

> >
> >   BUG: KASAN: use-after-free in sctp_sock_dump+0xa8/0x438 [sctp_diag]
> >   Call trace:
> >    dump_backtrace+0x0/0x2dc
> >    show_stack+0x20/0x2c
> >    dump_stack+0x120/0x144
> >    print_address_description+0x80/0x2f4
> >    __kasan_report+0x174/0x194
> >    kasan_report+0x10/0x18
> >    __asan_load8+0x84/0x8c
> >    sctp_sock_dump+0xa8/0x438 [sctp_diag]
> >    sctp_for_each_transport+0x1e0/0x26c [sctp]
> >    sctp_diag_dump+0x180/0x1f0 [sctp_diag]
> >    inet_diag_dump+0x12c/0x168
> >    netlink_dump+0x24c/0x5b8
> >    __netlink_dump_start+0x274/0x2a8
> >    inet_diag_handler_cmd+0x224/0x274
> >    sock_diag_rcv_msg+0x21c/0x230
> >    netlink_rcv_skb+0xe0/0x1bc
> >    sock_diag_rcv+0x34/0x48
> >    netlink_unicast+0x3b4/0x430
> >    netlink_sendmsg+0x4f0/0x574
> >    sock_write_iter+0x18c/0x1f0
> >    do_iter_readv_writev+0x230/0x2a8
> >    do_iter_write+0xc8/0x2b4
> >    vfs_writev+0xf8/0x184
> >    do_writev+0xb0/0x1a8
> >    __arm64_sys_writev+0x4c/0x5c
> >    el0_svc_common+0x118/0x250
> >    el0_svc_handler+0x3c/0x9c
> >    el0_svc+0x8/0xc
> >
> > To prevent this from happening we need to take a references to the
> > to-be-used/dereferenced 'struct sock' and 'struct sctp_endpoint's
> > until such a time when we know it can be safely released.
> >
> > When KASAN is not enabled, a similar, but slightly different NULL
> > pointer derefernce crash occurs later along the thread of execution in
> > inet_sctp_diag_fill() this time.
Are you able to reproduce this issue?

What I'm thinking is to fix it by freeing sk in call_rcu() by
sock_set_flag(sock->sk, SOCK_RCU_FREE),
and add rcu_read_lock() in sctp_sock_dump().

Thanks.

>
> Are you able to identify where the bug was introduced? Fixes tag would
> be good to have here.
>
> You should squash the two patches together.
>
> > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > index 760b367644c12..2029b240b6f24 100644
> > --- a/net/sctp/diag.c
> > +++ b/net/sctp/diag.c
> > @@ -301,6 +301,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> >       struct sctp_association *assoc;
> >       int err = 0;
> >
> > +     sctp_endpoint_hold(ep);
> > +     sock_hold(sk);
> >       lock_sock(sk);
> >       list_for_each_entry(assoc, &ep->asocs, asocs) {
> >               if (cb->args[4] < cb->args[1])
> > @@ -341,6 +343,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> >       cb->args[4] = 0;
> >  release:
> >       release_sock(sk);
> > +     sock_put(sk);
> > +     sctp_endpoint_put(ep);
> >       return err;
> >  }
> >
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 16:12     ` Xin Long
@ 2021-12-16 16:39       ` Lee Jones
  2021-12-16 16:52         ` Xin Long
  2021-12-16 20:44         ` Jakub Kicinski
  0 siblings, 2 replies; 18+ messages in thread
From: Lee Jones @ 2021-12-16 16:39 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021, Xin Long wrote:

> On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > The cause of the resultant dump_stack() reported below is a
> > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > sctp_sock_dump().
> > >
> > > This race condition occurs when a transport is cached into its
> > > associated hash table followed by an endpoint/sock migration to a new
> > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > table calling into sctp_sock_dump() where the dereference occurs.

> in sctp_sock_dump():
>         struct sock *sk = ep->base.sk;
>         ... <--[1]
>         lock_sock(sk);
> 
> Do you mean in [1], the sk is peeled off and gets freed elsewhere?

'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().

> if that's true, it's still late to do sock_hold(sk) in your this patch.

No, that's not right.

The schedule happens *inside* the lock_sock() call.

So if you take the reference before it, you're good.

> I talked with Marcelo about this before, if the possible UAF in [1] exists,
> the problem also exists in the main RX path sctp_rcv().
> 
> > >
> > >   BUG: KASAN: use-after-free in sctp_sock_dump+0xa8/0x438 [sctp_diag]
> > >   Call trace:
> > >    dump_backtrace+0x0/0x2dc
> > >    show_stack+0x20/0x2c
> > >    dump_stack+0x120/0x144
> > >    print_address_description+0x80/0x2f4
> > >    __kasan_report+0x174/0x194
> > >    kasan_report+0x10/0x18
> > >    __asan_load8+0x84/0x8c
> > >    sctp_sock_dump+0xa8/0x438 [sctp_diag]
> > >    sctp_for_each_transport+0x1e0/0x26c [sctp]
> > >    sctp_diag_dump+0x180/0x1f0 [sctp_diag]
> > >    inet_diag_dump+0x12c/0x168
> > >    netlink_dump+0x24c/0x5b8
> > >    __netlink_dump_start+0x274/0x2a8
> > >    inet_diag_handler_cmd+0x224/0x274
> > >    sock_diag_rcv_msg+0x21c/0x230
> > >    netlink_rcv_skb+0xe0/0x1bc
> > >    sock_diag_rcv+0x34/0x48
> > >    netlink_unicast+0x3b4/0x430
> > >    netlink_sendmsg+0x4f0/0x574
> > >    sock_write_iter+0x18c/0x1f0
> > >    do_iter_readv_writev+0x230/0x2a8
> > >    do_iter_write+0xc8/0x2b4
> > >    vfs_writev+0xf8/0x184
> > >    do_writev+0xb0/0x1a8
> > >    __arm64_sys_writev+0x4c/0x5c
> > >    el0_svc_common+0x118/0x250
> > >    el0_svc_handler+0x3c/0x9c
> > >    el0_svc+0x8/0xc
> > >
> > > To prevent this from happening we need to take a references to the
> > > to-be-used/dereferenced 'struct sock' and 'struct sctp_endpoint's
> > > until such a time when we know it can be safely released.
> > >
> > > When KASAN is not enabled, a similar, but slightly different NULL
> > > pointer derefernce crash occurs later along the thread of execution in
> > > inet_sctp_diag_fill() this time.
> Are you able to reproduce this issue?

Yes 100% of the time without this patch.

0% of the time with it applied.

> What I'm thinking is to fix it by freeing sk in call_rcu() by
> sock_set_flag(sock->sk, SOCK_RCU_FREE),
> and add rcu_read_lock() in sctp_sock_dump().
> 
> Thanks.
> 
> >
> > Are you able to identify where the bug was introduced? Fixes tag would
> > be good to have here.

It's probably been there since the code was introduced.

I'll see how far back we have to go.

> > You should squash the two patches together.

I generally like patches to encapsulate functional changes.

This one depends on the other, but they are not functionally related.

You're the boss though - I'll squash them if you insist.

> > > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > > index 760b367644c12..2029b240b6f24 100644
> > > --- a/net/sctp/diag.c
> > > +++ b/net/sctp/diag.c
> > > @@ -301,6 +301,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > >       struct sctp_association *assoc;
> > >       int err = 0;
> > >
> > > +     sctp_endpoint_hold(ep);
> > > +     sock_hold(sk);
> > >       lock_sock(sk);
> > >       list_for_each_entry(assoc, &ep->asocs, asocs) {
> > >               if (cb->args[4] < cb->args[1])
> > > @@ -341,6 +343,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > >       cb->args[4] = 0;
> > >  release:
> > >       release_sock(sk);
> > > +     sock_put(sk);
> > > +     sctp_endpoint_put(ep);
> > >       return err;
> > >  }
> > >
> >

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 16:39       ` Lee Jones
@ 2021-12-16 16:52         ` Xin Long
  2021-12-16 17:13           ` Lee Jones
  2021-12-16 20:44         ` Jakub Kicinski
  1 sibling, 1 reply; 18+ messages in thread
From: Xin Long @ 2021-12-16 16:52 UTC (permalink / raw)
  To: Lee Jones
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
>
> On Thu, 16 Dec 2021, Xin Long wrote:
>
> > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > The cause of the resultant dump_stack() reported below is a
> > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > sctp_sock_dump().
> > > >
> > > > This race condition occurs when a transport is cached into its
> > > > associated hash table followed by an endpoint/sock migration to a new
> > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > table calling into sctp_sock_dump() where the dereference occurs.
>
> > in sctp_sock_dump():
> >         struct sock *sk = ep->base.sk;
> >         ... <--[1]
> >         lock_sock(sk);
> >
> > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
>
> 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
>
> > if that's true, it's still late to do sock_hold(sk) in your this patch.
>
> No, that's not right.
>
> The schedule happens *inside* the lock_sock() call.
Sorry, I don't follow this.
We can't expect when the schedule happens, why do you think this
can never be scheduled before the lock_sock() call?

If the sock is peeled off or is being freed, we shouldn't dump this sock,
and it's better to skip it.

>
> So if you take the reference before it, you're good.
>
> > I talked with Marcelo about this before, if the possible UAF in [1] exists,
> > the problem also exists in the main RX path sctp_rcv().
> >
> > > >
> > > >   BUG: KASAN: use-after-free in sctp_sock_dump+0xa8/0x438 [sctp_diag]
> > > >   Call trace:
> > > >    dump_backtrace+0x0/0x2dc
> > > >    show_stack+0x20/0x2c
> > > >    dump_stack+0x120/0x144
> > > >    print_address_description+0x80/0x2f4
> > > >    __kasan_report+0x174/0x194
> > > >    kasan_report+0x10/0x18
> > > >    __asan_load8+0x84/0x8c
> > > >    sctp_sock_dump+0xa8/0x438 [sctp_diag]
> > > >    sctp_for_each_transport+0x1e0/0x26c [sctp]
> > > >    sctp_diag_dump+0x180/0x1f0 [sctp_diag]
> > > >    inet_diag_dump+0x12c/0x168
> > > >    netlink_dump+0x24c/0x5b8
> > > >    __netlink_dump_start+0x274/0x2a8
> > > >    inet_diag_handler_cmd+0x224/0x274
> > > >    sock_diag_rcv_msg+0x21c/0x230
> > > >    netlink_rcv_skb+0xe0/0x1bc
> > > >    sock_diag_rcv+0x34/0x48
> > > >    netlink_unicast+0x3b4/0x430
> > > >    netlink_sendmsg+0x4f0/0x574
> > > >    sock_write_iter+0x18c/0x1f0
> > > >    do_iter_readv_writev+0x230/0x2a8
> > > >    do_iter_write+0xc8/0x2b4
> > > >    vfs_writev+0xf8/0x184
> > > >    do_writev+0xb0/0x1a8
> > > >    __arm64_sys_writev+0x4c/0x5c
> > > >    el0_svc_common+0x118/0x250
> > > >    el0_svc_handler+0x3c/0x9c
> > > >    el0_svc+0x8/0xc
> > > >
> > > > To prevent this from happening we need to take a references to the
> > > > to-be-used/dereferenced 'struct sock' and 'struct sctp_endpoint's
> > > > until such a time when we know it can be safely released.
> > > >
> > > > When KASAN is not enabled, a similar, but slightly different NULL
> > > > pointer derefernce crash occurs later along the thread of execution in
> > > > inet_sctp_diag_fill() this time.
> > Are you able to reproduce this issue?
>
> Yes 100% of the time without this patch.
>
> 0% of the time with it applied.
>
> > What I'm thinking is to fix it by freeing sk in call_rcu() by
> > sock_set_flag(sock->sk, SOCK_RCU_FREE),
> > and add rcu_read_lock() in sctp_sock_dump().
> >
> > Thanks.
> >
> > >
> > > Are you able to identify where the bug was introduced? Fixes tag would
> > > be good to have here.
>
> It's probably been there since the code was introduced.
>
> I'll see how far back we have to go.
>
> > > You should squash the two patches together.
>
> I generally like patches to encapsulate functional changes.
>
> This one depends on the other, but they are not functionally related.
>
> You're the boss though - I'll squash them if you insist.
>
> > > > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > > > index 760b367644c12..2029b240b6f24 100644
> > > > --- a/net/sctp/diag.c
> > > > +++ b/net/sctp/diag.c
> > > > @@ -301,6 +301,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > > >       struct sctp_association *assoc;
> > > >       int err = 0;
> > > >
> > > > +     sctp_endpoint_hold(ep);
> > > > +     sock_hold(sk);
> > > >       lock_sock(sk);
> > > >       list_for_each_entry(assoc, &ep->asocs, asocs) {
> > > >               if (cb->args[4] < cb->args[1])
> > > > @@ -341,6 +343,8 @@ static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > > >       cb->args[4] = 0;
> > > >  release:
> > > >       release_sock(sk);
> > > > +     sock_put(sk);
> > > > +     sctp_endpoint_put(ep);
> > > >       return err;
> > > >  }
> > > >
> > >
>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 16:52         ` Xin Long
@ 2021-12-16 17:13           ` Lee Jones
  2021-12-16 17:14             ` Lee Jones
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Jones @ 2021-12-16 17:13 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021, Xin Long wrote:

> On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > On Thu, 16 Dec 2021, Xin Long wrote:
> >
> > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > The cause of the resultant dump_stack() reported below is a
> > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > sctp_sock_dump().
> > > > >
> > > > > This race condition occurs when a transport is cached into its
> > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > table calling into sctp_sock_dump() where the dereference occurs.
> >
> > > in sctp_sock_dump():
> > >         struct sock *sk = ep->base.sk;
> > >         ... <--[1]
> > >         lock_sock(sk);
> > >
> > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> >
> > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> >
> > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> >
> > No, that's not right.
> >
> > The schedule happens *inside* the lock_sock() call.
> Sorry, I don't follow this.
> We can't expect when the schedule happens, why do you think this
> can never be scheduled before the lock_sock() call?

True, but I've had this running for hours and it hasn't reproduced.

Without this patch, I can reproduce this in around 2 seconds.

The C-repro for this is pretty intense!

If you want to be *sure* that a schedule will never happen, we can
take a reference directly with:

     ep = sctp_endpoint_hold(tsp->asoc->ep);
     sk = sock_hold(ep->base.sk);

Which was my original plan before I soak tested this submitted patch
for hours without any sign of reproducing the issue.

> If the sock is peeled off or is being freed, we shouldn't dump this sock,
> and it's better to skip it.

I guess we can do that too.

Are you suggesting sctp_sock_migrate() as the call site?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 17:13           ` Lee Jones
@ 2021-12-16 17:14             ` Lee Jones
  2021-12-16 18:12               ` Xin Long
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Jones @ 2021-12-16 17:14 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021, Lee Jones wrote:

> On Thu, 16 Dec 2021, Xin Long wrote:
> 
> > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > >
> > > On Thu, 16 Dec 2021, Xin Long wrote:
> > >
> > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > sctp_sock_dump().
> > > > > >
> > > > > > This race condition occurs when a transport is cached into its
> > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > >
> > > > in sctp_sock_dump():
> > > >         struct sock *sk = ep->base.sk;
> > > >         ... <--[1]
> > > >         lock_sock(sk);
> > > >
> > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > >
> > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > >
> > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > >
> > > No, that's not right.
> > >
> > > The schedule happens *inside* the lock_sock() call.
> > Sorry, I don't follow this.
> > We can't expect when the schedule happens, why do you think this
> > can never be scheduled before the lock_sock() call?
> 
> True, but I've had this running for hours and it hasn't reproduced.
> 
> Without this patch, I can reproduce this in around 2 seconds.
> 
> The C-repro for this is pretty intense!
> 
> If you want to be *sure* that a schedule will never happen, we can
> take a reference directly with:
> 
>      ep = sctp_endpoint_hold(tsp->asoc->ep);
>      sk = sock_hold(ep->base.sk);
> 
> Which was my original plan before I soak tested this submitted patch
> for hours without any sign of reproducing the issue.
> 
> > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > and it's better to skip it.
> 
> I guess we can do that too.
> 
> Are you suggesting sctp_sock_migrate() as the call site?

Also, when are you planning on testing the flag?

Won't that suffer with the same issue(s)?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 17:14             ` Lee Jones
@ 2021-12-16 18:12               ` Xin Long
  2021-12-16 18:22                 ` Xin Long
  2021-12-16 18:25                 ` Lee Jones
  0 siblings, 2 replies; 18+ messages in thread
From: Xin Long @ 2021-12-16 18:12 UTC (permalink / raw)
  To: Lee Jones
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
>
> On Thu, 16 Dec 2021, Lee Jones wrote:
>
> > On Thu, 16 Dec 2021, Xin Long wrote:
> >
> > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > >
> > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > >
> > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > sctp_sock_dump().
> > > > > > >
> > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > >
> > > > > in sctp_sock_dump():
> > > > >         struct sock *sk = ep->base.sk;
> > > > >         ... <--[1]
> > > > >         lock_sock(sk);
> > > > >
> > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > >
> > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > >
> > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > >
> > > > No, that's not right.
> > > >
> > > > The schedule happens *inside* the lock_sock() call.
> > > Sorry, I don't follow this.
> > > We can't expect when the schedule happens, why do you think this
> > > can never be scheduled before the lock_sock() call?
> >
> > True, but I've had this running for hours and it hasn't reproduced.
I understand, but it's a crash, we shouldn't take any risk that it
will never happen.
you may try to add a usleep() before the lock_sock call to reproduce it.

> >
> > Without this patch, I can reproduce this in around 2 seconds.
> >
> > The C-repro for this is pretty intense!
> >
> > If you want to be *sure* that a schedule will never happen, we can
> > take a reference directly with:
> >
> >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> >      sk = sock_hold(ep->base.sk);
> >
> > Which was my original plan before I soak tested this submitted patch
> > for hours without any sign of reproducing the issue.
we tried to not export sctp_obj_hold/put(), that's why we had
sctp_for_each_transport().

ep itself holds a reference of sk when it's alive, so it's weird to do
these 2 together.

> >
> > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > and it's better to skip it.
> >
> > I guess we can do that too.
> >
> > Are you suggesting sctp_sock_migrate() as the call site?
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 85ac2e901ffc..56ea7a0e2add 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
struct sock *newsk,
                inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
        }

+       sock_set_flag(oldsk, SOCK_RCU_FREE);
        release_sock(newsk);

        return 0;

SOCK_RCU_FREE is set to the previous sk, so that this sk will not
be freed between rcu_read_lock() and rcu_read_unlock().

>
> Also, when are you planning on testing the flag?
SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
and if it's set, it will be freed in the next grace period of RCU.

>
> Won't that suffer with the same issue(s)?
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index 7970d786c4a2..b4c4acd9e67e 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
sctp_transport *tsp, void *p)

 static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
 {
-       struct sctp_endpoint *ep = tsp->asoc->ep;
        struct sctp_comm_param *commp = p;
-       struct sock *sk = ep->base.sk;
        struct sk_buff *skb = commp->skb;
        struct netlink_callback *cb = commp->cb;
        const struct inet_diag_req_v2 *r = commp->r;
        struct sctp_association *assoc;
+       struct sctp_endpoint *ep;
+       struct sock *sk;
        int err = 0;

+       rcu_read_lock();
+       ep = tsp->asoc->ep;
+       sk = ep->base.sk;
        lock_sock(sk);
+       if (tsp->asoc->ep != ep)
+               goto release;
        list_for_each_entry(assoc, &ep->asocs, asocs) {
                if (cb->args[4] < cb->args[1])
                        goto next;
@@ -358,6 +363,7 @@ static int sctp_sock_dump(struct sctp_transport
*tsp, void *p)
        cb->args[4] = 0;
 release:
        release_sock(sk);
+       rcu_read_unlock();
        return err;
 }

rcu_read_lock() will make sure sk from tsp->asoc->ep->base.sk will not
be freed until rcu_read_unlock().

That's all I have. Do you see any other way to fix this?

Thanks.

>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 18:12               ` Xin Long
@ 2021-12-16 18:22                 ` Xin Long
  2021-12-16 19:03                   ` Lee Jones
  2021-12-16 18:25                 ` Lee Jones
  1 sibling, 1 reply; 18+ messages in thread
From: Xin Long @ 2021-12-16 18:22 UTC (permalink / raw)
  To: Lee Jones
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

(

On Thu, Dec 16, 2021 at 1:12 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > On Thu, 16 Dec 2021, Lee Jones wrote:
> >
> > > On Thu, 16 Dec 2021, Xin Long wrote:
> > >
> > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > >
> > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > >
> > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > >
> > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > sctp_sock_dump().
> > > > > > > >
> > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > >
> > > > > > in sctp_sock_dump():
> > > > > >         struct sock *sk = ep->base.sk;
> > > > > >         ... <--[1]
> > > > > >         lock_sock(sk);
> > > > > >
> > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > >
> > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > >
> > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > >
> > > > > No, that's not right.
> > > > >
> > > > > The schedule happens *inside* the lock_sock() call.
> > > > Sorry, I don't follow this.
> > > > We can't expect when the schedule happens, why do you think this
> > > > can never be scheduled before the lock_sock() call?
> > >
> > > True, but I've had this running for hours and it hasn't reproduced.
> I understand, but it's a crash, we shouldn't take any risk that it
> will never happen.
> you may try to add a usleep() before the lock_sock call to reproduce it.
>
> > >
> > > Without this patch, I can reproduce this in around 2 seconds.
> > >
> > > The C-repro for this is pretty intense!
> > >
> > > If you want to be *sure* that a schedule will never happen, we can
> > > take a reference directly with:
> > >
> > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > >      sk = sock_hold(ep->base.sk);
> > >
> > > Which was my original plan before I soak tested this submitted patch
> > > for hours without any sign of reproducing the issue.
> we tried to not export sctp_obj_hold/put(), that's why we had
> sctp_for_each_transport().
>
> ep itself holds a reference of sk when it's alive, so it's weird to do
> these 2 together.
>
> > >
> > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > and it's better to skip it.
> > >
> > > I guess we can do that too.
> > >
> > > Are you suggesting sctp_sock_migrate() as the call site?
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 85ac2e901ffc..56ea7a0e2add 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> struct sock *newsk,
>                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
>         }
>
> +       sock_set_flag(oldsk, SOCK_RCU_FREE);
>         release_sock(newsk);
>
>         return 0;
>
> SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> be freed between rcu_read_lock() and rcu_read_unlock().
>
> >
> > Also, when are you planning on testing the flag?
> SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> and if it's set, it will be freed in the next grace period of RCU.
>
> >
> > Won't that suffer with the same issue(s)?
> diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> index 7970d786c4a2..b4c4acd9e67e 100644
> --- a/net/sctp/diag.c
> +++ b/net/sctp/diag.c
> @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> sctp_transport *tsp, void *p)
>
>  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
>  {
> -       struct sctp_endpoint *ep = tsp->asoc->ep;
>         struct sctp_comm_param *commp = p;
> -       struct sock *sk = ep->base.sk;
>         struct sk_buff *skb = commp->skb;
>         struct netlink_callback *cb = commp->cb;
>         const struct inet_diag_req_v2 *r = commp->r;
>         struct sctp_association *assoc;
> +       struct sctp_endpoint *ep;
> +       struct sock *sk;
>         int err = 0;
>
> +       rcu_read_lock();
> +       ep = tsp->asoc->ep;
> +       sk = ep->base.sk;
>         lock_sock(sk);
Unfortunately, this isn't going to work, as lock_sock() may sleep,
and is not allowed to be called understand rcu_read_lock() :(

> +       if (tsp->asoc->ep != ep)
> +               goto release;
>         list_for_each_entry(assoc, &ep->asocs, asocs) {
>                 if (cb->args[4] < cb->args[1])
>                         goto next;
> @@ -358,6 +363,7 @@ static int sctp_sock_dump(struct sctp_transport
> *tsp, void *p)
>         cb->args[4] = 0;
>  release:
>         release_sock(sk);
> +       rcu_read_unlock();
>         return err;
>  }
>
> rcu_read_lock() will make sure sk from tsp->asoc->ep->base.sk will not
> be freed until rcu_read_unlock().
>
> That's all I have. Do you see any other way to fix this?
>
> Thanks.
>
> >
> > --
> > Lee Jones [李琼斯]
> > Senior Technical Lead - Developer Services
> > Linaro.org │ Open source software for Arm SoCs
> > Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 18:12               ` Xin Long
  2021-12-16 18:22                 ` Xin Long
@ 2021-12-16 18:25                 ` Lee Jones
  1 sibling, 0 replies; 18+ messages in thread
From: Lee Jones @ 2021-12-16 18:25 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021, Xin Long wrote:

> On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > On Thu, 16 Dec 2021, Lee Jones wrote:
> >
> > > On Thu, 16 Dec 2021, Xin Long wrote:
> > >
> > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > >
> > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > >
> > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > >
> > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > sctp_sock_dump().
> > > > > > > >
> > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > >
> > > > > > in sctp_sock_dump():
> > > > > >         struct sock *sk = ep->base.sk;
> > > > > >         ... <--[1]
> > > > > >         lock_sock(sk);
> > > > > >
> > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > >
> > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > >
> > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > >
> > > > > No, that's not right.
> > > > >
> > > > > The schedule happens *inside* the lock_sock() call.
> > > > Sorry, I don't follow this.
> > > > We can't expect when the schedule happens, why do you think this
> > > > can never be scheduled before the lock_sock() call?
> > >
> > > True, but I've had this running for hours and it hasn't reproduced.
> I understand, but it's a crash, we shouldn't take any risk that it
> will never happen.
> you may try to add a usleep() before the lock_sock call to reproduce it.
> 
> > >
> > > Without this patch, I can reproduce this in around 2 seconds.
> > >
> > > The C-repro for this is pretty intense!
> > >
> > > If you want to be *sure* that a schedule will never happen, we can
> > > take a reference directly with:
> > >
> > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > >      sk = sock_hold(ep->base.sk);
> > >
> > > Which was my original plan before I soak tested this submitted patch
> > > for hours without any sign of reproducing the issue.
> we tried to not export sctp_obj_hold/put(), that's why we had
> sctp_for_each_transport().
> 
> ep itself holds a reference of sk when it's alive, so it's weird to do
> these 2 together.
> 
> > >
> > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > and it's better to skip it.
> > >
> > > I guess we can do that too.
> > >
> > > Are you suggesting sctp_sock_migrate() as the call site?
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 85ac2e901ffc..56ea7a0e2add 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> struct sock *newsk,
>                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
>         }
> 
> +       sock_set_flag(oldsk, SOCK_RCU_FREE);
>         release_sock(newsk);
> 
>         return 0;
> 
> SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> be freed between rcu_read_lock() and rcu_read_unlock().
> 
> >
> > Also, when are you planning on testing the flag?
> SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> and if it's set, it will be freed in the next grace period of RCU.
> 
> >
> > Won't that suffer with the same issue(s)?
> diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> index 7970d786c4a2..b4c4acd9e67e 100644
> --- a/net/sctp/diag.c
> +++ b/net/sctp/diag.c
> @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> sctp_transport *tsp, void *p)
> 
>  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
>  {
> -       struct sctp_endpoint *ep = tsp->asoc->ep;
>         struct sctp_comm_param *commp = p;
> -       struct sock *sk = ep->base.sk;
>         struct sk_buff *skb = commp->skb;
>         struct netlink_callback *cb = commp->cb;
>         const struct inet_diag_req_v2 *r = commp->r;
>         struct sctp_association *assoc;
> +       struct sctp_endpoint *ep;
> +       struct sock *sk;
>         int err = 0;
> 
> +       rcu_read_lock();
> +       ep = tsp->asoc->ep;
> +       sk = ep->base.sk;
>         lock_sock(sk);
> +       if (tsp->asoc->ep != ep)
> +               goto release;
>         list_for_each_entry(assoc, &ep->asocs, asocs) {
>                 if (cb->args[4] < cb->args[1])
>                         goto next;
> @@ -358,6 +363,7 @@ static int sctp_sock_dump(struct sctp_transport
> *tsp, void *p)
>         cb->args[4] = 0;
>  release:
>         release_sock(sk);
> +       rcu_read_unlock();
>         return err;
>  }
> 
> rcu_read_lock() will make sure sk from tsp->asoc->ep->base.sk will not
> be freed until rcu_read_unlock().
> 
> That's all I have. Do you see any other way to fix this?

No, that sounds reasonable enough.

Do you want me to hack this up and test it, or are you planning on
submitting a this?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 18:22                 ` Xin Long
@ 2021-12-16 19:03                   ` Lee Jones
  2021-12-19 21:20                     ` Xin Long
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Jones @ 2021-12-16 19:03 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021, Xin Long wrote:

> (
> 
> On Thu, Dec 16, 2021 at 1:12 PM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> > >
> > > On Thu, 16 Dec 2021, Lee Jones wrote:
> > >
> > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > >
> > > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > >
> > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > >
> > > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > > sctp_sock_dump().
> > > > > > > > >
> > > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > > >
> > > > > > > in sctp_sock_dump():
> > > > > > >         struct sock *sk = ep->base.sk;
> > > > > > >         ... <--[1]
> > > > > > >         lock_sock(sk);
> > > > > > >
> > > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > > >
> > > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > > >
> > > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > > >
> > > > > > No, that's not right.
> > > > > >
> > > > > > The schedule happens *inside* the lock_sock() call.
> > > > > Sorry, I don't follow this.
> > > > > We can't expect when the schedule happens, why do you think this
> > > > > can never be scheduled before the lock_sock() call?
> > > >
> > > > True, but I've had this running for hours and it hasn't reproduced.
> > I understand, but it's a crash, we shouldn't take any risk that it
> > will never happen.
> > you may try to add a usleep() before the lock_sock call to reproduce it.
> >
> > > >
> > > > Without this patch, I can reproduce this in around 2 seconds.
> > > >
> > > > The C-repro for this is pretty intense!
> > > >
> > > > If you want to be *sure* that a schedule will never happen, we can
> > > > take a reference directly with:
> > > >
> > > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > > >      sk = sock_hold(ep->base.sk);
> > > >
> > > > Which was my original plan before I soak tested this submitted patch
> > > > for hours without any sign of reproducing the issue.
> > we tried to not export sctp_obj_hold/put(), that's why we had
> > sctp_for_each_transport().
> >
> > ep itself holds a reference of sk when it's alive, so it's weird to do
> > these 2 together.
> >
> > > >
> > > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > > and it's better to skip it.
> > > >
> > > > I guess we can do that too.
> > > >
> > > > Are you suggesting sctp_sock_migrate() as the call site?
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index 85ac2e901ffc..56ea7a0e2add 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> > struct sock *newsk,
> >                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
> >         }
> >
> > +       sock_set_flag(oldsk, SOCK_RCU_FREE);
> >         release_sock(newsk);
> >
> >         return 0;
> >
> > SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> > be freed between rcu_read_lock() and rcu_read_unlock().
> >
> > >
> > > Also, when are you planning on testing the flag?
> > SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> > and if it's set, it will be freed in the next grace period of RCU.
> >
> > >
> > > Won't that suffer with the same issue(s)?
> > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > index 7970d786c4a2..b4c4acd9e67e 100644
> > --- a/net/sctp/diag.c
> > +++ b/net/sctp/diag.c
> > @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> > sctp_transport *tsp, void *p)
> >
> >  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> >  {
> > -       struct sctp_endpoint *ep = tsp->asoc->ep;
> >         struct sctp_comm_param *commp = p;
> > -       struct sock *sk = ep->base.sk;
> >         struct sk_buff *skb = commp->skb;
> >         struct netlink_callback *cb = commp->cb;
> >         const struct inet_diag_req_v2 *r = commp->r;
> >         struct sctp_association *assoc;
> > +       struct sctp_endpoint *ep;
> > +       struct sock *sk;
> >         int err = 0;
> >
> > +       rcu_read_lock();
> > +       ep = tsp->asoc->ep;
> > +       sk = ep->base.sk;
> >         lock_sock(sk);
> Unfortunately, this isn't going to work, as lock_sock() may sleep,
> and is not allowed to be called understand rcu_read_lock() :(

Ah!

How about my original solution of taking:

  tsp->asoc->ep

... directly?

If it already holds the sk, we should be golden?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 16:39       ` Lee Jones
  2021-12-16 16:52         ` Xin Long
@ 2021-12-16 20:44         ` Jakub Kicinski
  2021-12-17 11:21           ` Lee Jones
  1 sibling, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2021-12-16 20:44 UTC (permalink / raw)
  To: Lee Jones
  Cc: Xin Long, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021 16:39:15 +0000 Lee Jones wrote:
> > > You should squash the two patches together.  
> 
> I generally like patches to encapsulate functional changes.
> 
> This one depends on the other, but they are not functionally related.
> 
> You're the boss though - I'll squash them if you insist.

Yes, please squash them.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 20:44         ` Jakub Kicinski
@ 2021-12-17 11:21           ` Lee Jones
  0 siblings, 0 replies; 18+ messages in thread
From: Lee Jones @ 2021-12-17 11:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Xin Long, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, 16 Dec 2021, Jakub Kicinski wrote:

> On Thu, 16 Dec 2021 16:39:15 +0000 Lee Jones wrote:
> > > > You should squash the two patches together.  
> > 
> > I generally like patches to encapsulate functional changes.
> > 
> > This one depends on the other, but they are not functionally related.
> > 
> > You're the boss though - I'll squash them if you insist.
> 
> Yes, please squash them.

I'm just about to make some changes to the patches.

Specifically, I'm going to make sctp_endpoint_hold() return the
endpoint it incremented, in order to prevent schedule related data
corruption before/after the increment of refcnt.

I'm going to keep the patches separate for the time being (since I'm
going to submit this before you get out of bed most likely).  Just let
me know if you still want them squashed, even with these additional
changes (along with their explanation in the commit message), or feel
free to squash them yourself if you choose to merge them.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-16 19:03                   ` Lee Jones
@ 2021-12-19 21:20                     ` Xin Long
  2021-12-20  8:56                       ` Lee Jones
  0 siblings, 1 reply; 18+ messages in thread
From: Xin Long @ 2021-12-19 21:20 UTC (permalink / raw)
  To: Lee Jones
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Thu, Dec 16, 2021 at 2:03 PM Lee Jones <lee.jones@linaro.org> wrote:
>
> On Thu, 16 Dec 2021, Xin Long wrote:
>
> > (
> >
> > On Thu, Dec 16, 2021 at 1:12 PM Xin Long <lucien.xin@gmail.com> wrote:
> > >
> > > On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> > > >
> > > > On Thu, 16 Dec 2021, Lee Jones wrote:
> > > >
> > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > >
> > > > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > > >
> > > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > > >
> > > > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > > > sctp_sock_dump().
> > > > > > > > > >
> > > > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > > > >
> > > > > > > > in sctp_sock_dump():
> > > > > > > >         struct sock *sk = ep->base.sk;
> > > > > > > >         ... <--[1]
> > > > > > > >         lock_sock(sk);
> > > > > > > >
> > > > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > > > >
> > > > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > > > >
> > > > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > > > >
> > > > > > > No, that's not right.
> > > > > > >
> > > > > > > The schedule happens *inside* the lock_sock() call.
> > > > > > Sorry, I don't follow this.
> > > > > > We can't expect when the schedule happens, why do you think this
> > > > > > can never be scheduled before the lock_sock() call?
> > > > >
> > > > > True, but I've had this running for hours and it hasn't reproduced.
> > > I understand, but it's a crash, we shouldn't take any risk that it
> > > will never happen.
> > > you may try to add a usleep() before the lock_sock call to reproduce it.
> > >
> > > > >
> > > > > Without this patch, I can reproduce this in around 2 seconds.
> > > > >
> > > > > The C-repro for this is pretty intense!
> > > > >
> > > > > If you want to be *sure* that a schedule will never happen, we can
> > > > > take a reference directly with:
> > > > >
> > > > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > > > >      sk = sock_hold(ep->base.sk);
> > > > >
> > > > > Which was my original plan before I soak tested this submitted patch
> > > > > for hours without any sign of reproducing the issue.
> > > we tried to not export sctp_obj_hold/put(), that's why we had
> > > sctp_for_each_transport().
> > >
> > > ep itself holds a reference of sk when it's alive, so it's weird to do
> > > these 2 together.
> > >
> > > > >
> > > > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > > > and it's better to skip it.
> > > > >
> > > > > I guess we can do that too.
> > > > >
> > > > > Are you suggesting sctp_sock_migrate() as the call site?
> > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > index 85ac2e901ffc..56ea7a0e2add 100644
> > > --- a/net/sctp/socket.c
> > > +++ b/net/sctp/socket.c
> > > @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> > > struct sock *newsk,
> > >                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
> > >         }
> > >
> > > +       sock_set_flag(oldsk, SOCK_RCU_FREE);
> > >         release_sock(newsk);
> > >
> > >         return 0;
> > >
> > > SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> > > be freed between rcu_read_lock() and rcu_read_unlock().
> > >
> > > >
> > > > Also, when are you planning on testing the flag?
> > > SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> > > and if it's set, it will be freed in the next grace period of RCU.
> > >
> > > >
> > > > Won't that suffer with the same issue(s)?
> > > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > > index 7970d786c4a2..b4c4acd9e67e 100644
> > > --- a/net/sctp/diag.c
> > > +++ b/net/sctp/diag.c
> > > @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> > > sctp_transport *tsp, void *p)
> > >
> > >  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > >  {
> > > -       struct sctp_endpoint *ep = tsp->asoc->ep;
> > >         struct sctp_comm_param *commp = p;
> > > -       struct sock *sk = ep->base.sk;
> > >         struct sk_buff *skb = commp->skb;
> > >         struct netlink_callback *cb = commp->cb;
> > >         const struct inet_diag_req_v2 *r = commp->r;
> > >         struct sctp_association *assoc;
> > > +       struct sctp_endpoint *ep;
> > > +       struct sock *sk;
> > >         int err = 0;
> > >
> > > +       rcu_read_lock();
> > > +       ep = tsp->asoc->ep;
> > > +       sk = ep->base.sk;
> > >         lock_sock(sk);
> > Unfortunately, this isn't going to work, as lock_sock() may sleep,
> > and is not allowed to be called understand rcu_read_lock() :(
>
> Ah!
>
> How about my original solution of taking:
>
>   tsp->asoc->ep
>
> ... directly?
>
> If it already holds the sk, we should be golden?
Both ep and sk could be destroyed at this moment.
you can't try to hold an object that has already been destroyed.
It holds the sk only when ep is still alive.

I don't see a way to get this fix with the current transport hashtable.
I will change to use port hashtable to dump sock/asocs for this.

Thanks.
>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-19 21:20                     ` Xin Long
@ 2021-12-20  8:56                       ` Lee Jones
  2021-12-21 19:52                         ` Xin Long
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Jones @ 2021-12-20  8:56 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Karl Knutson, Jon Grimm, Xingang Guo, Hui Huang,
	Sridhar Samudrala, Daisy Chang, Ryan Layer, Kevin Gao,
	network dev

On Sun, 19 Dec 2021, Xin Long wrote:

> On Thu, Dec 16, 2021 at 2:03 PM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > On Thu, 16 Dec 2021, Xin Long wrote:
> >
> > > (
> > >
> > > On Thu, Dec 16, 2021 at 1:12 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> > > > >
> > > > > On Thu, 16 Dec 2021, Lee Jones wrote:
> > > > >
> > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > >
> > > > > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > > > >
> > > > > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > > > > sctp_sock_dump().
> > > > > > > > > > >
> > > > > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > > > > >
> > > > > > > > > in sctp_sock_dump():
> > > > > > > > >         struct sock *sk = ep->base.sk;
> > > > > > > > >         ... <--[1]
> > > > > > > > >         lock_sock(sk);
> > > > > > > > >
> > > > > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > > > > >
> > > > > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > > > > >
> > > > > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > > > > >
> > > > > > > > No, that's not right.
> > > > > > > >
> > > > > > > > The schedule happens *inside* the lock_sock() call.
> > > > > > > Sorry, I don't follow this.
> > > > > > > We can't expect when the schedule happens, why do you think this
> > > > > > > can never be scheduled before the lock_sock() call?
> > > > > >
> > > > > > True, but I've had this running for hours and it hasn't reproduced.
> > > > I understand, but it's a crash, we shouldn't take any risk that it
> > > > will never happen.
> > > > you may try to add a usleep() before the lock_sock call to reproduce it.
> > > >
> > > > > >
> > > > > > Without this patch, I can reproduce this in around 2 seconds.
> > > > > >
> > > > > > The C-repro for this is pretty intense!
> > > > > >
> > > > > > If you want to be *sure* that a schedule will never happen, we can
> > > > > > take a reference directly with:
> > > > > >
> > > > > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > > > > >      sk = sock_hold(ep->base.sk);
> > > > > >
> > > > > > Which was my original plan before I soak tested this submitted patch
> > > > > > for hours without any sign of reproducing the issue.
> > > > we tried to not export sctp_obj_hold/put(), that's why we had
> > > > sctp_for_each_transport().
> > > >
> > > > ep itself holds a reference of sk when it's alive, so it's weird to do
> > > > these 2 together.
> > > >
> > > > > >
> > > > > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > > > > and it's better to skip it.
> > > > > >
> > > > > > I guess we can do that too.
> > > > > >
> > > > > > Are you suggesting sctp_sock_migrate() as the call site?
> > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > index 85ac2e901ffc..56ea7a0e2add 100644
> > > > --- a/net/sctp/socket.c
> > > > +++ b/net/sctp/socket.c
> > > > @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> > > > struct sock *newsk,
> > > >                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
> > > >         }
> > > >
> > > > +       sock_set_flag(oldsk, SOCK_RCU_FREE);
> > > >         release_sock(newsk);
> > > >
> > > >         return 0;
> > > >
> > > > SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> > > > be freed between rcu_read_lock() and rcu_read_unlock().
> > > >
> > > > >
> > > > > Also, when are you planning on testing the flag?
> > > > SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> > > > and if it's set, it will be freed in the next grace period of RCU.
> > > >
> > > > >
> > > > > Won't that suffer with the same issue(s)?
> > > > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > > > index 7970d786c4a2..b4c4acd9e67e 100644
> > > > --- a/net/sctp/diag.c
> > > > +++ b/net/sctp/diag.c
> > > > @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> > > > sctp_transport *tsp, void *p)
> > > >
> > > >  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > > >  {
> > > > -       struct sctp_endpoint *ep = tsp->asoc->ep;
> > > >         struct sctp_comm_param *commp = p;
> > > > -       struct sock *sk = ep->base.sk;
> > > >         struct sk_buff *skb = commp->skb;
> > > >         struct netlink_callback *cb = commp->cb;
> > > >         const struct inet_diag_req_v2 *r = commp->r;
> > > >         struct sctp_association *assoc;
> > > > +       struct sctp_endpoint *ep;
> > > > +       struct sock *sk;
> > > >         int err = 0;
> > > >
> > > > +       rcu_read_lock();
> > > > +       ep = tsp->asoc->ep;
> > > > +       sk = ep->base.sk;
> > > >         lock_sock(sk);
> > > Unfortunately, this isn't going to work, as lock_sock() may sleep,
> > > and is not allowed to be called understand rcu_read_lock() :(
> >
> > Ah!
> >
> > How about my original solution of taking:
> >
> >   tsp->asoc->ep
> >
> > ... directly?
> >
> > If it already holds the sk, we should be golden?
> Both ep and sk could be destroyed at this moment.
> you can't try to hold an object that has already been destroyed.
> It holds the sk only when ep is still alive.
> 
> I don't see a way to get this fix with the current transport hashtable.
> I will change to use port hashtable to dump sock/asocs for this.

Right.  Cache invalidation is hard!

Sure, if there is a better way, please go ahead.

Thanks.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-20  8:56                       ` Lee Jones
@ 2021-12-21 19:52                         ` Xin Long
  2021-12-22 10:58                           ` Lee Jones
  0 siblings, 1 reply; 18+ messages in thread
From: Xin Long @ 2021-12-21 19:52 UTC (permalink / raw)
  To: Lee Jones
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Hui Huang, network dev

On Mon, Dec 20, 2021 at 3:56 AM Lee Jones <lee.jones@linaro.org> wrote:
>
> On Sun, 19 Dec 2021, Xin Long wrote:
>
> > On Thu, Dec 16, 2021 at 2:03 PM Lee Jones <lee.jones@linaro.org> wrote:
> > >
> > > On Thu, 16 Dec 2021, Xin Long wrote:
> > >
> > > > (
> > > >
> > > > On Thu, Dec 16, 2021 at 1:12 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > > >
> > > > > On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > >
> > > > > > On Thu, 16 Dec 2021, Lee Jones wrote:
> > > > > >
> > > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > > >
> > > > > > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > > > > >
> > > > > > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > > > > > sctp_sock_dump().
> > > > > > > > > > > >
> > > > > > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > > > > > >
> > > > > > > > > > in sctp_sock_dump():
> > > > > > > > > >         struct sock *sk = ep->base.sk;
> > > > > > > > > >         ... <--[1]
> > > > > > > > > >         lock_sock(sk);
> > > > > > > > > >
> > > > > > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > > > > > >
> > > > > > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > > > > > >
> > > > > > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > > > > > >
> > > > > > > > > No, that's not right.
> > > > > > > > >
> > > > > > > > > The schedule happens *inside* the lock_sock() call.
> > > > > > > > Sorry, I don't follow this.
> > > > > > > > We can't expect when the schedule happens, why do you think this
> > > > > > > > can never be scheduled before the lock_sock() call?
> > > > > > >
> > > > > > > True, but I've had this running for hours and it hasn't reproduced.
> > > > > I understand, but it's a crash, we shouldn't take any risk that it
> > > > > will never happen.
> > > > > you may try to add a usleep() before the lock_sock call to reproduce it.
> > > > >
> > > > > > >
> > > > > > > Without this patch, I can reproduce this in around 2 seconds.
> > > > > > >
> > > > > > > The C-repro for this is pretty intense!
> > > > > > >
> > > > > > > If you want to be *sure* that a schedule will never happen, we can
> > > > > > > take a reference directly with:
> > > > > > >
> > > > > > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > > > > > >      sk = sock_hold(ep->base.sk);
> > > > > > >
> > > > > > > Which was my original plan before I soak tested this submitted patch
> > > > > > > for hours without any sign of reproducing the issue.
> > > > > we tried to not export sctp_obj_hold/put(), that's why we had
> > > > > sctp_for_each_transport().
> > > > >
> > > > > ep itself holds a reference of sk when it's alive, so it's weird to do
> > > > > these 2 together.
> > > > >
> > > > > > >
> > > > > > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > > > > > and it's better to skip it.
> > > > > > >
> > > > > > > I guess we can do that too.
> > > > > > >
> > > > > > > Are you suggesting sctp_sock_migrate() as the call site?
> > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > index 85ac2e901ffc..56ea7a0e2add 100644
> > > > > --- a/net/sctp/socket.c
> > > > > +++ b/net/sctp/socket.c
> > > > > @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> > > > > struct sock *newsk,
> > > > >                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
> > > > >         }
> > > > >
> > > > > +       sock_set_flag(oldsk, SOCK_RCU_FREE);
> > > > >         release_sock(newsk);
> > > > >
> > > > >         return 0;
> > > > >
> > > > > SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> > > > > be freed between rcu_read_lock() and rcu_read_unlock().
> > > > >
> > > > > >
> > > > > > Also, when are you planning on testing the flag?
> > > > > SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> > > > > and if it's set, it will be freed in the next grace period of RCU.
> > > > >
> > > > > >
> > > > > > Won't that suffer with the same issue(s)?
> > > > > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > > > > index 7970d786c4a2..b4c4acd9e67e 100644
> > > > > --- a/net/sctp/diag.c
> > > > > +++ b/net/sctp/diag.c
> > > > > @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> > > > > sctp_transport *tsp, void *p)
> > > > >
> > > > >  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > > > >  {
> > > > > -       struct sctp_endpoint *ep = tsp->asoc->ep;
> > > > >         struct sctp_comm_param *commp = p;
> > > > > -       struct sock *sk = ep->base.sk;
> > > > >         struct sk_buff *skb = commp->skb;
> > > > >         struct netlink_callback *cb = commp->cb;
> > > > >         const struct inet_diag_req_v2 *r = commp->r;
> > > > >         struct sctp_association *assoc;
> > > > > +       struct sctp_endpoint *ep;
> > > > > +       struct sock *sk;
> > > > >         int err = 0;
> > > > >
> > > > > +       rcu_read_lock();
> > > > > +       ep = tsp->asoc->ep;
> > > > > +       sk = ep->base.sk;
> > > > >         lock_sock(sk);
> > > > Unfortunately, this isn't going to work, as lock_sock() may sleep,
> > > > and is not allowed to be called understand rcu_read_lock() :(
> > >
> > > Ah!
> > >
> > > How about my original solution of taking:
> > >
> > >   tsp->asoc->ep
> > >
> > > ... directly?
> > >
> > > If it already holds the sk, we should be golden?
> > Both ep and sk could be destroyed at this moment.
> > you can't try to hold an object that has already been destroyed.
> > It holds the sk only when ep is still alive.
> >
> > I don't see a way to get this fix with the current transport hashtable.
> > I will change to use port hashtable to dump sock/asocs for this.
>
> Right.  Cache invalidation is hard!
>
> Sure, if there is a better way, please go ahead.
Hi, Jones,

Port hashtable doesn't work either as lock_sock can not be called
under spin_lock().

I posted another patch where this issue can be fixed by moving ep free
to call_rcu().
It will be great if you are able to test it.

Thanks.

>
> Thanks.
>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF
  2021-12-21 19:52                         ` Xin Long
@ 2021-12-22 10:58                           ` Lee Jones
  0 siblings, 0 replies; 18+ messages in thread
From: Lee Jones @ 2021-12-22 10:58 UTC (permalink / raw)
  To: Xin Long
  Cc: Jakub Kicinski, LKML, Vlad Yasevich, Neil Horman,
	Marcelo Ricardo Leitner, David S. Miller, lksctp developers,
	H.P. Yarroll, Hui Huang, network dev

On Tue, 21 Dec 2021, Xin Long wrote:

> On Mon, Dec 20, 2021 at 3:56 AM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > On Sun, 19 Dec 2021, Xin Long wrote:
> >
> > > On Thu, Dec 16, 2021 at 2:03 PM Lee Jones <lee.jones@linaro.org> wrote:
> > > >
> > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > >
> > > > > (
> > > > >
> > > > > On Thu, Dec 16, 2021 at 1:12 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Dec 16, 2021 at 12:14 PM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > > >
> > > > > > > On Thu, 16 Dec 2021, Lee Jones wrote:
> > > > > > >
> > > > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > > > >
> > > > > > > > > On Thu, Dec 16, 2021 at 11:39 AM Lee Jones <lee.jones@linaro.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, 16 Dec 2021, Xin Long wrote:
> > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 15, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 14 Dec 2021 21:57:32 +0000 Lee Jones wrote:
> > > > > > > > > > > > > The cause of the resultant dump_stack() reported below is a
> > > > > > > > > > > > > dereference of a freed pointer to 'struct sctp_endpoint' in
> > > > > > > > > > > > > sctp_sock_dump().
> > > > > > > > > > > > >
> > > > > > > > > > > > > This race condition occurs when a transport is cached into its
> > > > > > > > > > > > > associated hash table followed by an endpoint/sock migration to a new
> > > > > > > > > > > > > association in sctp_assoc_migrate() prior to their subsequent use in
> > > > > > > > > > > > > sctp_diag_dump() which uses sctp_for_each_transport() to walk the hash
> > > > > > > > > > > > > table calling into sctp_sock_dump() where the dereference occurs.
> > > > > > > > > >
> > > > > > > > > > > in sctp_sock_dump():
> > > > > > > > > > >         struct sock *sk = ep->base.sk;
> > > > > > > > > > >         ... <--[1]
> > > > > > > > > > >         lock_sock(sk);
> > > > > > > > > > >
> > > > > > > > > > > Do you mean in [1], the sk is peeled off and gets freed elsewhere?
> > > > > > > > > >
> > > > > > > > > > 'ep' and 'sk' are both switched out for new ones in sctp_sock_migrate().
> > > > > > > > > >
> > > > > > > > > > > if that's true, it's still late to do sock_hold(sk) in your this patch.
> > > > > > > > > >
> > > > > > > > > > No, that's not right.
> > > > > > > > > >
> > > > > > > > > > The schedule happens *inside* the lock_sock() call.
> > > > > > > > > Sorry, I don't follow this.
> > > > > > > > > We can't expect when the schedule happens, why do you think this
> > > > > > > > > can never be scheduled before the lock_sock() call?
> > > > > > > >
> > > > > > > > True, but I've had this running for hours and it hasn't reproduced.
> > > > > > I understand, but it's a crash, we shouldn't take any risk that it
> > > > > > will never happen.
> > > > > > you may try to add a usleep() before the lock_sock call to reproduce it.
> > > > > >
> > > > > > > >
> > > > > > > > Without this patch, I can reproduce this in around 2 seconds.
> > > > > > > >
> > > > > > > > The C-repro for this is pretty intense!
> > > > > > > >
> > > > > > > > If you want to be *sure* that a schedule will never happen, we can
> > > > > > > > take a reference directly with:
> > > > > > > >
> > > > > > > >      ep = sctp_endpoint_hold(tsp->asoc->ep);
> > > > > > > >      sk = sock_hold(ep->base.sk);
> > > > > > > >
> > > > > > > > Which was my original plan before I soak tested this submitted patch
> > > > > > > > for hours without any sign of reproducing the issue.
> > > > > > we tried to not export sctp_obj_hold/put(), that's why we had
> > > > > > sctp_for_each_transport().
> > > > > >
> > > > > > ep itself holds a reference of sk when it's alive, so it's weird to do
> > > > > > these 2 together.
> > > > > >
> > > > > > > >
> > > > > > > > > If the sock is peeled off or is being freed, we shouldn't dump this sock,
> > > > > > > > > and it's better to skip it.
> > > > > > > >
> > > > > > > > I guess we can do that too.
> > > > > > > >
> > > > > > > > Are you suggesting sctp_sock_migrate() as the call site?
> > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > index 85ac2e901ffc..56ea7a0e2add 100644
> > > > > > --- a/net/sctp/socket.c
> > > > > > +++ b/net/sctp/socket.c
> > > > > > @@ -9868,6 +9868,7 @@ static int sctp_sock_migrate(struct sock *oldsk,
> > > > > > struct sock *newsk,
> > > > > >                 inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
> > > > > >         }
> > > > > >
> > > > > > +       sock_set_flag(oldsk, SOCK_RCU_FREE);
> > > > > >         release_sock(newsk);
> > > > > >
> > > > > >         return 0;
> > > > > >
> > > > > > SOCK_RCU_FREE is set to the previous sk, so that this sk will not
> > > > > > be freed between rcu_read_lock() and rcu_read_unlock().
> > > > > >
> > > > > > >
> > > > > > > Also, when are you planning on testing the flag?
> > > > > > SOCK_RCU_FREE flag is used when freeing sk in sk_destruct(),
> > > > > > and if it's set, it will be freed in the next grace period of RCU.
> > > > > >
> > > > > > >
> > > > > > > Won't that suffer with the same issue(s)?
> > > > > > diff --git a/net/sctp/diag.c b/net/sctp/diag.c
> > > > > > index 7970d786c4a2..b4c4acd9e67e 100644
> > > > > > --- a/net/sctp/diag.c
> > > > > > +++ b/net/sctp/diag.c
> > > > > > @@ -309,16 +309,21 @@ static int sctp_tsp_dump_one(struct
> > > > > > sctp_transport *tsp, void *p)
> > > > > >
> > > > > >  static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > > > > >  {
> > > > > > -       struct sctp_endpoint *ep = tsp->asoc->ep;
> > > > > >         struct sctp_comm_param *commp = p;
> > > > > > -       struct sock *sk = ep->base.sk;
> > > > > >         struct sk_buff *skb = commp->skb;
> > > > > >         struct netlink_callback *cb = commp->cb;
> > > > > >         const struct inet_diag_req_v2 *r = commp->r;
> > > > > >         struct sctp_association *assoc;
> > > > > > +       struct sctp_endpoint *ep;
> > > > > > +       struct sock *sk;
> > > > > >         int err = 0;
> > > > > >
> > > > > > +       rcu_read_lock();
> > > > > > +       ep = tsp->asoc->ep;
> > > > > > +       sk = ep->base.sk;
> > > > > >         lock_sock(sk);
> > > > > Unfortunately, this isn't going to work, as lock_sock() may sleep,
> > > > > and is not allowed to be called understand rcu_read_lock() :(
> > > >
> > > > Ah!
> > > >
> > > > How about my original solution of taking:
> > > >
> > > >   tsp->asoc->ep
> > > >
> > > > ... directly?
> > > >
> > > > If it already holds the sk, we should be golden?
> > > Both ep and sk could be destroyed at this moment.
> > > you can't try to hold an object that has already been destroyed.
> > > It holds the sk only when ep is still alive.
> > >
> > > I don't see a way to get this fix with the current transport hashtable.
> > > I will change to use port hashtable to dump sock/asocs for this.
> >
> > Right.  Cache invalidation is hard!
> >
> > Sure, if there is a better way, please go ahead.
> Hi, Jones,
> 
> Port hashtable doesn't work either as lock_sock can not be called
> under spin_lock().
> 
> I posted another patch where this issue can be fixed by moving ep free
> to call_rcu().
> It will be great if you are able to test it.

I certainly will.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-12-22 10:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-14 21:57 [RESEND 1/2] sctp: export sctp_endpoint_{hold,put}() for use by seperate modules Lee Jones
2021-12-14 21:57 ` [RESEND 2/2] sctp: hold cached endpoints to prevent possible UAF Lee Jones
2021-12-16  1:48   ` Jakub Kicinski
2021-12-16 16:12     ` Xin Long
2021-12-16 16:39       ` Lee Jones
2021-12-16 16:52         ` Xin Long
2021-12-16 17:13           ` Lee Jones
2021-12-16 17:14             ` Lee Jones
2021-12-16 18:12               ` Xin Long
2021-12-16 18:22                 ` Xin Long
2021-12-16 19:03                   ` Lee Jones
2021-12-19 21:20                     ` Xin Long
2021-12-20  8:56                       ` Lee Jones
2021-12-21 19:52                         ` Xin Long
2021-12-22 10:58                           ` Lee Jones
2021-12-16 18:25                 ` Lee Jones
2021-12-16 20:44         ` Jakub Kicinski
2021-12-17 11:21           ` Lee Jones

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.