* UAF in blk_add_rq_to_plug()?
@ 2022-10-31 22:12 Al Viro
  2022-10-31 22:42 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2022-10-31 22:12 UTC (permalink / raw)
  To: linux-block

static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
{
        struct request *last = rq_list_peek(&plug->mq_list);

Suppose it's not NULL...

        if (!plug->rq_count) {
                trace_block_plug(rq->q);
        } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
                   (!blk_queue_nomerges(rq->q) &&
                    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
... and we went here:
                blk_mq_flush_plug_list(plug, false);
All requests, including the one 'last' points to, might get fed to ->queue_rq()
here.  At which point there seems to be nothing to prevent them getting
completed and freed on another CPU, possibly before we return here.

                trace_block_plug(rq->q);
        }

        if (!plug->multiple_queues && last && last->q != rq->q)
... and here we dereference last.

Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
above?
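
Something like this, perhaps (just sketching the idea, not a tested patch -
clear the local pointer once the list it was peeked from has been flushed):

                blk_mq_flush_plug_list(plug, false);
                last = NULL;    /* nothing left on ->mq_list for it to point at */
                trace_block_plug(rq->q);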


* Re: UAF in blk_add_rq_to_plug()?
  2022-10-31 22:12 UAF in blk_add_rq_to_plug()? Al Viro
@ 2022-10-31 22:42 ` Jens Axboe
  2022-10-31 23:35   ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2022-10-31 22:42 UTC (permalink / raw)
  To: Al Viro, linux-block

On 10/31/22 4:12 PM, Al Viro wrote:
> static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
> {
>         struct request *last = rq_list_peek(&plug->mq_list);
> 
> Suppose it's not NULL...
> 
>         if (!plug->rq_count) {
>                 trace_block_plug(rq->q);
>         } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
>                    (!blk_queue_nomerges(rq->q) &&
>                     blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
> ... and we went here:
>                 blk_mq_flush_plug_list(plug, false);
> All requests, including the one 'last' points to, might get fed to ->queue_rq()
> here.  At which point there seems to be nothing to prevent them getting
> completed and freed on another CPU, possibly before we return here.
> 
>                 trace_block_plug(rq->q);
>         }
> 
>         if (!plug->multiple_queues && last && last->q != rq->q)
> ... and here we dereference last.
> 
> Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
> above?

There's no UAF here as the requests aren't freed. We could clear 'last'
to make the code more explicit, and that would avoid any potential
suboptimal behavior with ->multiple_queues being wrong.

-- 
Jens Axboe


* Re: UAF in blk_add_rq_to_plug()?
  2022-10-31 22:42 ` Jens Axboe
@ 2022-10-31 23:35   ` Al Viro
  2022-11-01  0:03     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2022-10-31 23:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Mon, Oct 31, 2022 at 04:42:11PM -0600, Jens Axboe wrote:
> On 10/31/22 4:12 PM, Al Viro wrote:
> > static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
> > {
> >         struct request *last = rq_list_peek(&plug->mq_list);
> > 
> > Suppose it's not NULL...
> > 
> >         if (!plug->rq_count) {
> >                 trace_block_plug(rq->q);
> >         } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
> >                    (!blk_queue_nomerges(rq->q) &&
> >                     blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
> > ... and we went here:
> >                 blk_mq_flush_plug_list(plug, false);
> > All requests, including the one 'last' points to, might get fed to ->queue_rq()
> > here.  At which point there seems to be nothing to prevent them getting
> > completed and freed on another CPU, possibly before we return here.
> > 
> >                 trace_block_plug(rq->q);
> >         }
> > 
> >         if (!plug->multiple_queues && last && last->q != rq->q)
> > ... and here we dereference last.
> > 
> > Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
> > above?
> 
> There's no UAF here as the requests aren't freed. We could clear 'last'
> to make the code more explicit, and that would avoid any potential
> suboptimal behavior with ->multiple_queues being wrong.

Umm...
Suppose ->has_elevator is false and so's ->multiple_queues.
No ->queue_rqs(), so blk_mq_flush_plug_list() grabs rcu_read_lock() and
hits blk_mq_plug_issue_direct().
blk_mq_plug_issue_direct() picks the first request off the list
and passes it to blk_mq_request_issue_directly(), which passes it
to __blk_mq_request_issue_directly().  There we grab a tag and
proceed to __blk_mq_issue_directly(), which feeds the request to ->queue_rq().

What's to stop e.g. a worker on another CPU from picking that sucker,
completing it and calling blk_mq_end_request(), which completes all
bios involved and calls blk_mq_free_request()?

If all of that manages to happen before blk_mq_flush_plug_list()
returns to the caller...  Sure, you probably won't hit it on bare
metal, but if you are in a KVM and this virtual CPU happens to
lose the host timeslice... I've seen considerably more narrow
race windows getting hit on such setups.
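
Roughly the interleaving I have in mind (assuming the no-elevator,
single-queue direct-issue path described above):

        CPU0: blk_add_rq_to_plug()             another CPU (e.g. completion worker)
          last = rq_list_peek(&plug->mq_list);
          blk_mq_flush_plug_list(plug, false);
            ...
            __blk_mq_issue_directly();
              ->queue_rq()                     request completes
                                               blk_mq_end_request()
                                                 blk_mq_free_request()
          if (... last && last->q != rq->q)    <--- 'last' dereferenced here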

Am I missing something subtle here?  It's been a long time since
I've read through that area - as a matter of fact, I'm trying
to refresh my memories of the bio_submit()-related code paths
at the moment...


* Re: UAF in blk_add_rq_to_plug()?
  2022-10-31 23:35   ` Al Viro
@ 2022-11-01  0:03     ` Jens Axboe
  2022-11-01  0:06       ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2022-11-01  0:03 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-block

On 10/31/22 5:35 PM, Al Viro wrote:
> On Mon, Oct 31, 2022 at 04:42:11PM -0600, Jens Axboe wrote:
>> On 10/31/22 4:12 PM, Al Viro wrote:
>>> static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>>> {
>>>         struct request *last = rq_list_peek(&plug->mq_list);
>>>
>>> Suppose it's not NULL...
>>>
>>>         if (!plug->rq_count) {
>>>                 trace_block_plug(rq->q);
>>>         } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
>>>                    (!blk_queue_nomerges(rq->q) &&
>>>                     blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
>>> ... and we went here:
>>>                 blk_mq_flush_plug_list(plug, false);
>>> All requests, including the one 'last' points to, might get fed to ->queue_rq()
>>> here.  At which point there seems to be nothing to prevent them getting
>>> completed and freed on another CPU, possibly before we return here.
>>>
>>>                 trace_block_plug(rq->q);
>>>         }
>>>
>>>         if (!plug->multiple_queues && last && last->q != rq->q)
>>> ... and here we dereference last.
>>>
>>> Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
>>> above?
>>
>> There's no UAF here as the requests aren't freed. We could clear 'last'
>> to make the code more explicit, and that would avoid any potential
>> suboptimal behavior with ->multiple_queues being wrong.
> 
> Umm...
> Suppose ->has_elevator is false and so's ->multiple_queues.
> No ->queue_rqs(), so blk_mq_flush_plug_list() grabs rcu_read_lock() and
> hits blk_mq_plug_issue_direct().
> blk_mq_plug_issue_direct() picks the first request off the list
> and passes it to blk_mq_request_issue_directly(), which passes it
> to __blk_mq_request_issue_directly().  There we grab a tag and
> proceed to __blk_mq_issue_directly(), which feeds the request to ->queue_rq().
> 
> What's to stop e.g. a worker on another CPU from picking that sucker,
> completing it and calling blk_mq_end_request(), which completes all
> bios involved and calls blk_mq_free_request()?
> 
> If all of that manages to happen before blk_mq_flush_plug_list()
> returns to the caller...  Sure, you probably won't hit it on bare
> metal, but if you are in a KVM and this virtual CPU happens to
> lose the host timeslice... I've seen considerably more narrow
> race windows getting hit on such setups.
> 
> Am I missing something subtle here?  It's been a long time since
> I've read through that area - as a matter of fact, I'm trying
> to refresh my memories of the bio_submit()-related code paths
> at the moment...

With blk-mq, which all drivers are these days, the request memory is
statically allocated when the driver is set up. I'm not denying that you
could very well have issued AND completed the request which 'last'
points to by the time we dereference it, but that won't be a UAF unless
the device has also been quiesced and removed in between. Which I guess
could indeed be a possibility since it is by definition on a different
queue (->multiple_queues would be true, however, but that's also what
would be required to reach that far into that statement).

This is different from the older days of a request being literally freed
when it completes, which is what I initially reacted to.
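
Roughly, as a sketch (glossing over the details of the actual code paths):

        /* driver setup: request memory is carved out up front, per tag set */
        blk_mq_alloc_tag_set(set);

        /* completion: the tag is recycled, the request memory is not freed */
        blk_mq_end_request(rq, BLK_STS_OK);
                -> blk_mq_free_request(rq);    /* returns the tag, memory stays */

        /* only tearing down the tag set (driver removal) frees that memory */
        blk_mq_free_tag_set(set);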

As mentioned in the original reply, I do think we should just clear
'last' as you suggest. But it's not something we've seen on the FB fleet
of servers, even with the majority of hosts running this code (and on
VMs).

-- 
Jens Axboe


* Re: UAF in blk_add_rq_to_plug()?
  2022-11-01  0:03     ` Jens Axboe
@ 2022-11-01  0:06       ` Jens Axboe
  2022-11-01  0:54         ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2022-11-01  0:06 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-block

On 10/31/22 6:03 PM, Jens Axboe wrote:
> On 10/31/22 5:35 PM, Al Viro wrote:
>> On Mon, Oct 31, 2022 at 04:42:11PM -0600, Jens Axboe wrote:
>>> On 10/31/22 4:12 PM, Al Viro wrote:
>>>> static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>>>> {
>>>>         struct request *last = rq_list_peek(&plug->mq_list);
>>>>
>>>> Suppose it's not NULL...
>>>>
>>>>         if (!plug->rq_count) {
>>>>                 trace_block_plug(rq->q);
>>>>         } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
>>>>                    (!blk_queue_nomerges(rq->q) &&
>>>>                     blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
>>>> ... and we went here:
>>>>                 blk_mq_flush_plug_list(plug, false);
>>>> All requests, including the one 'last' points to, might get fed to ->queue_rq()
>>>> here.  At which point there seems to be nothing to prevent them getting
>>>> completed and freed on another CPU, possibly before we return here.
>>>>
>>>>                 trace_block_plug(rq->q);
>>>>         }
>>>>
>>>>         if (!plug->multiple_queues && last && last->q != rq->q)
>>>> ... and here we dereference last.
>>>>
>>>> Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
>>>> above?
>>>
>>> There's no UAF here as the requests aren't freed. We could clear 'last'
>>> to make the code more explicit, and that would avoid any potential
>>> suboptimal behavior with ->multiple_queues being wrong.
>>
>> Umm...
>> Suppose ->has_elevator is false and so's ->multiple_queues.
>> No ->queue_rqs(), so blk_mq_flush_plug_list() grabs rcu_read_lock() and
>> hits blk_mq_plug_issue_direct().
>> blk_mq_plug_issue_direct() picks the first request off the list
>> and passes it to blk_mq_request_issue_directly(), which passes it
>> to __blk_mq_request_issue_directly().  There we grab a tag and
>> proceed to __blk_mq_issue_directly(), which feeds the request to ->queue_rq().
>>
>> What's to stop e.g. a worker on another CPU from picking that sucker,
>> completing it and calling blk_mq_end_request(), which completes all
>> bios involved and calls blk_mq_free_request()?
>>
>> If all of that manages to happen before blk_mq_flush_plug_list()
>> returns to the caller...  Sure, you probably won't hit it on bare
>> metal, but if you are in a KVM and this virtual CPU happens to
>> lose the host timeslice... I've seen considerably more narrow
>> race windows getting hit on such setups.
>>
>> Am I missing something subtle here?  It's been a long time since
>> I've read through that area - as a matter of fact, I'm trying
>> to refresh my memories of the bio_submit()-related code paths
>> at the moment...
> 
> With blk-mq, which all drivers are these days, the request memory is
> statically allocated when the driver is set up. I'm not denying that you
> could very well have issued AND completed the request which 'last'
> points to by the time we dereference it, but that won't be a UAF unless
> the device has also been quiesced and removed in between. Which I guess
> could indeed be a possibility since it is by definition on a different
> queue (->multiple_queues would be true, however, but that's also what
> would be required to reach that far into that statement).
> 
> This is different from the older days of a request being literally freed
> when it completes, which is what I initially reacted to.
> 
> As mentioned in the original reply, I do think we should just clear
> 'last' as you suggest. But it's not something we've seen on the FB fleet
> of servers, even with the majority of hosts running this code (and on
> VMs).

Forgot to ask - do you want to send a patch for that, or do you just
want me to cook one up with a reported-by for you?

-- 
Jens Axboe




* Re: UAF in blk_add_rq_to_plug()?
  2022-11-01  0:06       ` Jens Axboe
@ 2022-11-01  0:54         ` Al Viro
  2022-11-01  2:23           ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2022-11-01  0:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Mon, Oct 31, 2022 at 06:06:34PM -0600, Jens Axboe wrote:

> >> Am I missing something subtle here?  It's been a long time since
> >> I've read through that area - as a matter of fact, I'm trying
> >> to refresh my memories of the bio_submit()-related code paths
> >> at the moment...
> > 
> > With blk-mq, which all drivers are these days, the request memory is
> > statically allocated when the driver is set up. I'm not denying that you
> > could very well have issued AND completed the request which 'last'
> > points to by the time we dereference it, but that won't be a UAF unless
> > the device has also been quiesced and removed in between. Which I guess
> > could indeed be a possibility since it is by definition on a different
> > queue (->multiple_queues would be true, however, but that's also what
> > would be required to reach that far into that statement).
> > 
> > This is different from the older days of a request being literally freed
> > when it completes, which is what I initially reacted to.

Got it (and my memories of struct request lifetime rules had been stale,
indeed).  

> > As mentioned in the original reply, I do think we should just clear
> > 'last' as you suggest. But it's not something we've seen on the FB fleet
> > of servers, even with the majority of hosts running this code (and on
> > VMs).
> 
> Forgot to ask - do you want to send a patch for that, or do you just
> want me to cook one up with a reported-by for you?

You mean, try and put together a commit message for that one-liner? ;-)

[PATCH] blk_add_rq_to_plug(): 'last' is stale after flush, if we end up doing one

blk_mq_flush_plug_list() empties ->mq_list, and the request we'd peeked at there
before that call is gone; in any case, we are not dealing with a mix
of requests for different queues now - there are no requests left in the
plug.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8070b6c10e8d..a0e044a645b3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1257,6 +1257,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
 		   (!blk_queue_nomerges(rq->q) &&
 		    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
 		blk_mq_flush_plug_list(plug, false);
+		last = NULL;
 		trace_block_plug(rq->q);
 	}
 


* Re: UAF in blk_add_rq_to_plug()?
  2022-11-01  0:54         ` Al Viro
@ 2022-11-01  2:23           ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2022-11-01  2:23 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-block

On 10/31/22 6:54 PM, Al Viro wrote:
> On Mon, Oct 31, 2022 at 06:06:34PM -0600, Jens Axboe wrote:
> 
>>>> Am I missing something subtle here?  It's been a long time since
>>>> I've read through that area - as a matter of fact, I'm trying
>>>> to refresh my memories of the bio_submit()-related code paths
>>>> at the moment...
>>>
>>> With blk-mq, which all drivers are these days, the request memory is
>>> statically allocated when the driver is set up. I'm not denying that you
>>> could very well have issued AND completed the request which 'last'
>>> points to by the time we dereference it, but that won't be a UAF unless
>>> the device has also been quiesced and removed in between. Which I guess
>>> could indeed be a possibility since it is by definition on a different
>>> queue (->multiple_queues would be true, however, but that's also what
>>> would be required to reach that far into that statement).
>>>
>>> This is different from the older days of a request being literally freed
>>> when it completes, which is what I initially reacted to.
> 
> Got it (and my memories of struct request lifetime rules had been stale,
> indeed).  
> 
>>> As mentioned in the original reply, I do think we should just clear
>>> 'last' as you suggest. But it's not something we've seen on the FB fleet
>>> of servers, even with the majority of hosts running this code (and on
>>> VMs).
>>
>> Forgot to ask - do you want to send a patch for that, or do you just
>> want me to cook one up with a reported-by for you?
> 
> You mean, try and put together a commit message for that one-liner? ;-)

Right, thanks for doing that! Queued it up for 6.1.

-- 
Jens Axboe



