All of lore.kernel.org
 help / color / mirror / Atom feed
* Request for feedback : Possible use-after-free in routing SA query via netlink
@ 2020-04-24 15:28 Divya Indi
  2020-04-24 18:22 ` Jason Gunthorpe
  0 siblings, 1 reply; 3+ messages in thread
From: Divya Indi @ 2020-04-24 15:28 UTC (permalink / raw)
  To: linux-kernel, linux-rdma
  Cc: Jason Gunthorpe, Håkon Bugge, Kaike Wan, Doug Ledford,
	Gerd Rausch, Srinivas Eeda, Rama Nichanamatlu

Hi All,

I wanted some feedback on a crash caused due to use-after-free in the 
ibacm code path [while routing SA query via netlink].

Commit 3ebd2fd IB/sa: Put netlink request into the request list before sending

Above commit moved adding the query to the request list before ib_nl_snd_msg
and moved ib_nl_snd_msg out of the spinlock (request_lock).

However, if there is a delay in sending out the request (For
eg: Delay due to low memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may result in release of the query (by call to send_handler) 
while ib_nl_snd_msg is still accessing query.


We get the following stacktrace for the crash -

[<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core]
[<ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
[<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
[<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
[<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
[rds_rdma]
[<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850
[rds_rdma]
[<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
[<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm]
[<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr]
[<ffffffff810a02f9>] process_one_work+0x169/0x4a0
[<ffffffff810a0b2b>] worker_thread+0x5b/0x560
[<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50
[<ffffffff810a68fb>] kthread+0xcb/0xf0
[<ffffffff816ec49a>] ? __schedule+0x24a/0x810
[<ffffffff816ec49a>] ? __schedule+0x24a/0x810
[<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
[<ffffffff816f25a7>] ret_from_fork+0x47/0x90
[<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180
....
RIP  [<ffffffffa03296cd>] send_mad+0x33d/0x5d0 [ib_sa] 

On analysis of the vmcore, we see crash happens at -

ib_sa_path_rec_get
  send_mad
     ib_nl_make_request()
        ib_nl_send_msg
                1  static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
                2        struct ib_sa_query *query)
                3  {
                4   struct ib_sa_path_rec *sa_rec = query->mad_buf->context[1];
                5   struct ib_sa_mad *mad = query->mad_buf->mad;
                6   ib_sa_comp_mask comp_mask = mad->sa_hdr.comp_mask;

Page fault occurs at line 5 while trying to access query->mad_buf->mad;

If we look at the query, it does not appear to be a valid ib_sa_query. Instead
looks like a pid struct for a process -> Use-after-free situation.

We could simulate the crash by explicitly introducing a delay in ib_nl_snd_msg with
a sleep. The timer kicks in before ib_nl_send_msg has even sent out the request 
and releases the query. We could reproduce the crash with a similar stack trace.

To summarize - We have a use-after-free possibility here when the timer(ib_nl_request_timeout)
kicks in before ib_nl_snd_msg has completed sending the query out to ibacm via netlink. The 
timeout handler ie ib_nl_request_timeout may result in releasing the query while ib_nl_snd_msg 
is still accessing query.

Appreciate your thoughts on the above issue.


Thanks,
Divya 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Request for feedback : Possible use-after-free in routing SA query via netlink
  2020-04-24 15:28 Request for feedback : Possible use-after-free in routing SA query via netlink Divya Indi
@ 2020-04-24 18:22 ` Jason Gunthorpe
  2020-04-30 15:18   ` Divya Indi
  0 siblings, 1 reply; 3+ messages in thread
From: Jason Gunthorpe @ 2020-04-24 18:22 UTC (permalink / raw)
  To: Divya Indi
  Cc: linux-kernel, linux-rdma, Håkon Bugge, Kaike Wan,
	Doug Ledford, Gerd Rausch, Srinivas Eeda, Rama Nichanamatlu

On Fri, Apr 24, 2020 at 08:28:09AM -0700, Divya Indi wrote:

> If we look at the query, it does not appear to be a valid ib_sa_query. Instead
> looks like a pid struct for a process -> Use-after-free situation.
> 
> We could simulate the crash by explicitly introducing a delay in ib_nl_snd_msg with
> a sleep. The timer kicks in before ib_nl_send_msg has even sent out the request 
> and releases the query. We could reproduce the crash with a similar stack trace.
> 
> To summarize - We have a use-after-free possibility here when the timer(ib_nl_request_timeout)
> kicks in before ib_nl_snd_msg has completed sending the query out to ibacm via netlink. The 
> timeout handler ie ib_nl_request_timeout may result in releasing the query while ib_nl_snd_msg 
> is still accessing query.
> 
> Appreciate your thoughts on the above issue.

Your assesment looks right to me.

Fixing it will require being able to clearly explain what the lifetime
cycle is for ib_sa_query - and what is there today looks like a mess,
so I'm not sure what it should be changed into.

There is lots of other stuff there that looks really weird, like
ib_sa_cancel_query() keeps going even though there are still timers
running..

Jason

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Request for feedback : Possible use-after-free in routing SA query via netlink
  2020-04-24 18:22 ` Jason Gunthorpe
@ 2020-04-30 15:18   ` Divya Indi
  0 siblings, 0 replies; 3+ messages in thread
From: Divya Indi @ 2020-04-30 15:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-rdma, Håkon Bugge, Kaike Wan,
	Doug Ledford, Gerd Rausch, Srinivas Eeda, Rama Nichanamatlu

Hi Jason,

Thanks for taking the time to review. 

On 4/24/20 11:22 AM, Jason Gunthorpe wrote:
> On Fri, Apr 24, 2020 at 08:28:09AM -0700, Divya Indi wrote:
>
>> If we look at the query, it does not appear to be a valid ib_sa_query. Instead
>> looks like a pid struct for a process -> Use-after-free situation.
>>
>> We could simulate the crash by explicitly introducing a delay in ib_nl_snd_msg with
>> a sleep. The timer kicks in before ib_nl_send_msg has even sent out the request 
>> and releases the query. We could reproduce the crash with a similar stack trace.
>>
>> To summarize - We have a use-after-free possibility here when the timer(ib_nl_request_timeout)
>> kicks in before ib_nl_snd_msg has completed sending the query out to ibacm via netlink. The 
>> timeout handler ie ib_nl_request_timeout may result in releasing the query while ib_nl_snd_msg 
>> is still accessing query.
>>
>> Appreciate your thoughts on the above issue.
> Your assesment looks right to me.
>
> Fixing it will require being able to clearly explain what the lifetime
> cycle is for ib_sa_query - and what is there today looks like a mess,
> so I'm not sure what it should be changed into.

The issue reported here appears to be restricted to the ibacm code path. 
Hence, we tried to resolve it for the life cycle of sa_query in the ibacm code path.
I will follow up this email with a proposed fix(patch), it would be very helpful if
you can provide your feedback on the same.

Thanks,
Divya

>
> The
> re is lots of other stuff there that looks really weird, like
> ib_sa_cancel_query() keeps going even though there are still timers
> running..
>
> Jason

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-30 15:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-24 15:28 Request for feedback : Possible use-after-free in routing SA query via netlink Divya Indi
2020-04-24 18:22 ` Jason Gunthorpe
2020-04-30 15:18   ` Divya Indi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.