* Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
       [not found]             ` <20200519153000.GB22286@lst.de>
@ 2020-05-20  1:18               ` Ming Lei
  2020-05-20  3:04                 ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-20  1:18 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel
  Cc: Thomas Gleixner, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke

On Tue, May 19, 2020 at 05:30:00PM +0200, Christoph Hellwig wrote:
> On Tue, May 19, 2020 at 09:54:20AM +0800, Ming Lei wrote:
> > As Thomas clarified, workqueues no longer have that issue, and only other
> > per-CPU kthreads can run until the CPU clears the online bit.
> > 
> > So the question is whether IO can be submitted from such a kernel context?
> 
> What other per-CPU kthreads even exist?

I don't know, which is why I'm raising this with a wider audience.

> 
> > > INACTIVE is set on the hctx, and it is set by the last CPU mapped to the
> > > hctx to be offlined.  Once the bit is set, the barrier ensures it is seen
> > > everywhere before we start waiting for the requests to finish.  What is
> > > missing?
> > 
> > Memory barriers should always be used in pairs, and you should have mentioned
> > that the barrier implied by test_and_set_bit_lock() in sbitmap_get() is the
> > pair of the smp_mb__after_atomic() in blk_mq_hctx_notify_offline().
> 
> Documentation/core-api/atomic_ops.rst makes it pretty clear that the
> special smp_mb__before_atomic and smp_mb__after_atomic barriers are only
> used around the set_bit/clear_bit/change_bit operations, and not on the
> test_bit side.  That is also how they are used in all the callsites I
> checked.

I don't particularly care whether the barrier is smp_mb__after_atomic() or
smp_mb(), because it is added in the slow path.

What I was trying to say is that every SMP memory barrier should be clearly
commented, especially its pairing; see the "SMP BARRIER PAIRING" section of
Documentation/memory-barriers.txt.

So please add a comment around the newly added smp_mb__after_atomic(),
something like:

/*
 * The following smp_mb__after_atomic() pairs with the barrier implied by
 * test_and_set_bit_lock() in sbitmap_get(), so that setting the tag bit and
 * checking INACTIVE in blk_mq_get_tag() are ordered, as are setting INACTIVE
 * and checking the tag bit in blk_mq_hctx_notify_offline().
 */
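
In other words, the intended pairing is roughly the following (a simplified
sketch only, not the actual patch; the drain helper name is purely
illustrative):

	/* hotplug side, e.g. blk_mq_hctx_notify_offline() */
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	/* pairs with test_and_set_bit_lock() in sbitmap_get() */
	smp_mb__after_atomic();
	blk_mq_drain_inflight_rqs(hctx);	/* illustrative: wait for tags to drain */

	/* allocation side, e.g. blk_mq_get_tag() */
	tag = __blk_mq_get_tag(data, bt);	/* tag bit set via test_and_set_bit_lock() */
	if (tag != -1 &&
	    test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state)) {
		/* the hctx went inactive; give the tag back and retry elsewhere */
		blk_mq_put_tag(tags, data->ctx, tag);
	}

Either the allocating CPU sees INACTIVE after setting its tag bit, or the
offlining CPU sees the tag bit after setting INACTIVE and waits for it.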

> 
> > Then setting the tag bit and checking INACTIVE in blk_mq_get_tag() can be
> > ordered, the same as setting INACTIVE and checking the tag bit in
> > blk_mq_hctx_notify_offline().
> 
> But yes, even if not, that would take care of it.

The operations are ordered in exactly that way; that is precisely the purpose
of the added memory barrier.

thanks,
Ming



* Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20  1:18               ` [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx Ming Lei
@ 2020-05-20  3:04                 ` Ming Lei
  2020-05-20  8:03                   ` io_uring vs CPU hotplug, was " Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-20  3:04 UTC (permalink / raw)
  To: Christoph Hellwig, linux-kernel
  Cc: Thomas Gleixner, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke

On Wed, May 20, 2020 at 09:18:23AM +0800, Ming Lei wrote:
> On Tue, May 19, 2020 at 05:30:00PM +0200, Christoph Hellwig wrote:
> > On Tue, May 19, 2020 at 09:54:20AM +0800, Ming Lei wrote:
> > > As Thomas clarified, workqueue hasn't such issue any more, and only other
> > > per CPU kthreads can run until the CPU clears the online bit.
> > > 
> > > So the question is if IO can be submitted from such kernel context?
> > 
> > What other per-CPU kthreads even exist?
> 
> I don't know, so expose to wider audiences.

One user is io_uring with IORING_SETUP_SQPOLL & IORING_SETUP_SQ_AFF (see
io_sq_offload_start()); it is an IO submission kthread.
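
For context, this is roughly how such a pinned SQPOLL thread is requested
from userspace (a minimal liburing sketch; the CPU number and queue depth
are just examples):

	#include <stdio.h>
	#include <string.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_params p = { };
		int ret;

		p.flags = IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF;
		p.sq_thread_cpu = 3;		/* pin the kernel submission thread to CPU 3 */
		p.sq_thread_idle = 1000;	/* ms of idle before the thread sleeps */

		ret = io_uring_queue_init_params(64, &ring, &p);
		if (ret < 0)
			fprintf(stderr, "ring setup: %s\n", strerror(-ret));
		return 0;
	}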

Thanks,
Ming



* io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20  3:04                 ` Ming Lei
@ 2020-05-20  8:03                   ` Christoph Hellwig
  2020-05-20 14:45                     ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-20  8:03 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: Christoph Hellwig, linux-kernel, Thomas Gleixner, linux-block,
	John Garry, Bart Van Assche, Hannes Reinecke, io-uring

On Wed, May 20, 2020 at 11:04:24AM +0800, Ming Lei wrote:
> On Wed, May 20, 2020 at 09:18:23AM +0800, Ming Lei wrote:
> > On Tue, May 19, 2020 at 05:30:00PM +0200, Christoph Hellwig wrote:
> > > On Tue, May 19, 2020 at 09:54:20AM +0800, Ming Lei wrote:
> > > > As Thomas clarified, workqueue hasn't such issue any more, and only other
> > > > per CPU kthreads can run until the CPU clears the online bit.
> > > > 
> > > > So the question is if IO can be submitted from such kernel context?
> > > 
> > > What other per-CPU kthreads even exist?
> > 
> > I don't know, so expose to wider audiences.
> 
> One user is io uring with IORING_SETUP_SQPOLL & IORING_SETUP_SQ_AFF, see
> io_sq_offload_start(), and it is a IO submission kthread.

As far as I can tell that code is buggy, as it still needs to migrate
the thread away when the CPU is offlined.  This isn't a per-CPU kthread
in the sense of having one for each CPU.

Jens?


* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20  8:03                   ` io_uring vs CPU hotplug, was " Christoph Hellwig
@ 2020-05-20 14:45                     ` Jens Axboe
  2020-05-20 15:20                       ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2020-05-20 14:45 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: linux-kernel, Thomas Gleixner, linux-block, John Garry,
	Bart Van Assche, Hannes Reinecke, io-uring

On 5/20/20 2:03 AM, Christoph Hellwig wrote:
> On Wed, May 20, 2020 at 11:04:24AM +0800, Ming Lei wrote:
>> On Wed, May 20, 2020 at 09:18:23AM +0800, Ming Lei wrote:
>>> On Tue, May 19, 2020 at 05:30:00PM +0200, Christoph Hellwig wrote:
>>>> On Tue, May 19, 2020 at 09:54:20AM +0800, Ming Lei wrote:
>>>>> As Thomas clarified, workqueue hasn't such issue any more, and only other
>>>>> per CPU kthreads can run until the CPU clears the online bit.
>>>>>
>>>>> So the question is if IO can be submitted from such kernel context?
>>>>
>>>> What other per-CPU kthreads even exist?
>>>
>>> I don't know, so expose to wider audiences.
>>
>> One user is io uring with IORING_SETUP_SQPOLL & IORING_SETUP_SQ_AFF, see
>> io_sq_offload_start(), and it is a IO submission kthread.
> 
> As far as I can tell that code is buggy, as it still needs to migrate
> the thread away when the cpu is offlined.  This isn't a per-cpu kthread
> in the sene of having one for each CPU.
> 
> Jens?

It just uses kthread_create_on_cpu(), nothing home-grown. Pretty sure the
thread just breaks affinity if that CPU goes offline.
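
For reference, the relevant fragment is roughly the following (simplified
from io_sq_offload_start(); error handling omitted, details may differ):

	if (p->flags & IORING_SETUP_SQ_AFF) {
		int cpu = p->sq_thread_cpu;

		/* pin the submission kthread to the requested CPU */
		ctx->sqo_thread = kthread_create_on_cpu(io_sq_thread, ctx,
							cpu, "io_uring-sq");
	} else {
		ctx->sqo_thread = kthread_create(io_sq_thread, ctx,
						 "io_uring-sq");
	}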

-- 
Jens Axboe



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 14:45                     ` Jens Axboe
@ 2020-05-20 15:20                       ` Jens Axboe
  2020-05-20 15:31                         ` Christoph Hellwig
  2020-05-20 19:41                         ` Thomas Gleixner
  0 siblings, 2 replies; 17+ messages in thread
From: Jens Axboe @ 2020-05-20 15:20 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: linux-kernel, Thomas Gleixner, linux-block, John Garry,
	Bart Van Assche, Hannes Reinecke, io-uring

On 5/20/20 8:45 AM, Jens Axboe wrote:
> On 5/20/20 2:03 AM, Christoph Hellwig wrote:
>> On Wed, May 20, 2020 at 11:04:24AM +0800, Ming Lei wrote:
>>> On Wed, May 20, 2020 at 09:18:23AM +0800, Ming Lei wrote:
>>>> On Tue, May 19, 2020 at 05:30:00PM +0200, Christoph Hellwig wrote:
>>>>> On Tue, May 19, 2020 at 09:54:20AM +0800, Ming Lei wrote:
>>>>>> As Thomas clarified, workqueue hasn't such issue any more, and only other
>>>>>> per CPU kthreads can run until the CPU clears the online bit.
>>>>>>
>>>>>> So the question is if IO can be submitted from such kernel context?
>>>>>
>>>>> What other per-CPU kthreads even exist?
>>>>
>>>> I don't know, so expose to wider audiences.
>>>
>>> One user is io uring with IORING_SETUP_SQPOLL & IORING_SETUP_SQ_AFF, see
>>> io_sq_offload_start(), and it is a IO submission kthread.
>>
>> As far as I can tell that code is buggy, as it still needs to migrate
>> the thread away when the cpu is offlined.  This isn't a per-cpu kthread
>> in the sene of having one for each CPU.
>>
>> Jens?
> 
> It just uses kthread_create_on_cpu(), nothing home grown. Pretty sure
> they just break affinity if that CPU goes offline.

Just checked, and it works fine for me. If I create an SQPOLL ring with
SQ_AFF set and bound to CPU 3, and CPU 3 then goes offline, the kthread
just appears unbound but keeps running just fine. When CPU 3 comes online
again, the mask appears correct.

So I don't think there's anything wrong on that side. The affinity is a
performance optimization, not a correctness issue. There's really not much
we can do if the chosen CPU is offlined, apart from continuing to chug along.

-- 
Jens Axboe



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 15:20                       ` Jens Axboe
@ 2020-05-20 15:31                         ` Christoph Hellwig
  2020-05-20 19:41                         ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-20 15:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-kernel, Thomas Gleixner,
	linux-block, John Garry, Bart Van Assche, Hannes Reinecke,
	io-uring

On Wed, May 20, 2020 at 09:20:50AM -0600, Jens Axboe wrote:
> Just checked, and it works fine for me. If I create an SQPOLL ring with
> SQ_AFF set and bound to CPU 3, if CPU 3 goes offline, then the kthread
> just appears unbound but runs just fine. When CPU 3 comes online again,
> the mask appears correct.
> 
> So don't think there's anything wrong on that side. The affinity is a
> performance optimization, not a correctness issue. Really not much we
> can do if the chosen CPU is offlined, apart from continue to chug along.

Ok, that sounds pretty sensible.


* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 15:20                       ` Jens Axboe
  2020-05-20 15:31                         ` Christoph Hellwig
@ 2020-05-20 19:41                         ` Thomas Gleixner
  2020-05-20 20:18                           ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-20 19:41 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Ming Lei
  Cc: linux-kernel, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, io-uring

Jens Axboe <axboe@kernel.dk> writes:
> On 5/20/20 8:45 AM, Jens Axboe wrote:
>> It just uses kthread_create_on_cpu(), nothing home grown. Pretty sure
>> they just break affinity if that CPU goes offline.
>
> Just checked, and it works fine for me. If I create an SQPOLL ring with
> SQ_AFF set and bound to CPU 3, if CPU 3 goes offline, then the kthread
> just appears unbound but runs just fine. When CPU 3 comes online again,
> the mask appears correct.

When exactly during the unplug operation is it unbound?

Thanks,

        tglx


* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 19:41                         ` Thomas Gleixner
@ 2020-05-20 20:18                           ` Jens Axboe
  2020-05-20 22:14                             ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2020-05-20 20:18 UTC (permalink / raw)
  To: Thomas Gleixner, Christoph Hellwig, Ming Lei
  Cc: linux-kernel, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, io-uring

On 5/20/20 1:41 PM, Thomas Gleixner wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>> On 5/20/20 8:45 AM, Jens Axboe wrote:
>>> It just uses kthread_create_on_cpu(), nothing home grown. Pretty sure
>>> they just break affinity if that CPU goes offline.
>>
>> Just checked, and it works fine for me. If I create an SQPOLL ring with
>> SQ_AFF set and bound to CPU 3, if CPU 3 goes offline, then the kthread
>> just appears unbound but runs just fine. When CPU 3 comes online again,
>> the mask appears correct.
> 
> When exactly during the unplug operation is it unbound?

When the CPU has been fully offlined. I checked the affinity mask; it
reports 0. But the thread is still being scheduled, and it's processing
work. Here's an example; PID 420 is the thread in question:

[root@archlinux cpu3]# taskset -p 420
pid 420's current affinity mask: 8
[root@archlinux cpu3]# echo 0 > online 
[root@archlinux cpu3]# taskset -p 420
pid 420's current affinity mask: 0
[root@archlinux cpu3]# echo 1 > online 
[root@archlinux cpu3]# taskset -p 420
pid 420's current affinity mask: 8

So as far as I can tell, it's working fine for me with the goals
I have for that kthread.

-- 
Jens Axboe



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 20:18                           ` Jens Axboe
@ 2020-05-20 22:14                             ` Thomas Gleixner
  2020-05-20 22:40                               ` Jens Axboe
  2020-05-21  2:27                               ` Ming Lei
  0 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-20 22:14 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Ming Lei
  Cc: linux-kernel, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, io-uring, Peter Zijlstra

Jens Axboe <axboe@kernel.dk> writes:

> On 5/20/20 1:41 PM, Thomas Gleixner wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>>> On 5/20/20 8:45 AM, Jens Axboe wrote:
>>>> It just uses kthread_create_on_cpu(), nothing home grown. Pretty sure
>>>> they just break affinity if that CPU goes offline.
>>>
>>> Just checked, and it works fine for me. If I create an SQPOLL ring with
>>> SQ_AFF set and bound to CPU 3, if CPU 3 goes offline, then the kthread
>>> just appears unbound but runs just fine. When CPU 3 comes online again,
>>> the mask appears correct.
>> 
>> When exactly during the unplug operation is it unbound?
>
> When the CPU has been fully offlined. I check the affinity mask, it
> reports 0. But it's still being scheduled, and it's processing work.
> Here's an example, PID 420 is the thread in question:
>
> [root@archlinux cpu3]# taskset -p 420
> pid 420's current affinity mask: 8
> [root@archlinux cpu3]# echo 0 > online 
> [root@archlinux cpu3]# taskset -p 420
> pid 420's current affinity mask: 0
> [root@archlinux cpu3]# echo 1 > online 
> [root@archlinux cpu3]# taskset -p 420
> pid 420's current affinity mask: 8
>
> So as far as I can tell, it's working fine for me with the goals
> I have for that kthread.

"Works for me" is not really useful information and does not answer my
question:

>> When exactly during the unplug operation is it unbound?

The problem Ming and Christoph are trying to solve requires that the
thread is migrated _before_ the hardware queue is shut down and
drained. That's why I asked for the exact point where this happens.

When the CPU is finally offlined, i.e. when the CPU has cleared the online
bit in the online mask, it is definitely too late, simply because the thread
still runs on that outgoing CPU _after_ the hardware queue is shut down and
drained.

This needs more thought and changes to sched and kthread so that the
kthread breaks affinity once the CPU goes offline. Too tired to figure
that out right now.

Thanks,

        tglx






* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 22:14                             ` Thomas Gleixner
@ 2020-05-20 22:40                               ` Jens Axboe
  2020-05-21  2:27                               ` Ming Lei
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2020-05-20 22:40 UTC (permalink / raw)
  To: Thomas Gleixner, Christoph Hellwig, Ming Lei
  Cc: linux-kernel, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, io-uring, Peter Zijlstra

On 5/20/20 4:14 PM, Thomas Gleixner wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 5/20/20 1:41 PM, Thomas Gleixner wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>> On 5/20/20 8:45 AM, Jens Axboe wrote:
>>>>> It just uses kthread_create_on_cpu(), nothing home grown. Pretty sure
>>>>> they just break affinity if that CPU goes offline.
>>>>
>>>> Just checked, and it works fine for me. If I create an SQPOLL ring with
>>>> SQ_AFF set and bound to CPU 3, if CPU 3 goes offline, then the kthread
>>>> just appears unbound but runs just fine. When CPU 3 comes online again,
>>>> the mask appears correct.
>>>
>>> When exactly during the unplug operation is it unbound?
>>
>> When the CPU has been fully offlined. I check the affinity mask, it
>> reports 0. But it's still being scheduled, and it's processing work.
>> Here's an example, PID 420 is the thread in question:
>>
>> [root@archlinux cpu3]# taskset -p 420
>> pid 420's current affinity mask: 8
>> [root@archlinux cpu3]# echo 0 > online 
>> [root@archlinux cpu3]# taskset -p 420
>> pid 420's current affinity mask: 0
>> [root@archlinux cpu3]# echo 1 > online 
>> [root@archlinux cpu3]# taskset -p 420
>> pid 420's current affinity mask: 8
>>
>> So as far as I can tell, it's working fine for me with the goals
>> I have for that kthread.
> 
> Works for me is not really useful information and does not answer my
> question:
> 
>>> When exactly during the unplug operation is it unbound?

I agree, and that question is relevant to the block side of things. What
Christoph asked in this particular sub-thread was specifically about the
io_uring sqpoll thread, and that's what I was addressing. For that, it
doesn't matter _when_ it becomes unbound. All that matters is that it
breaks affinity and keeps working.

> The problem Ming and Christoph are trying to solve requires that the
> thread is migrated _before_ the hardware queue is shut down and
> drained. That's why I asked for the exact point where this happens.

Right, and I haven't looked into that at all, so I don't know the answer
to that question.

> When the CPU is finally offlined, i.e. the CPU cleared the online bit in
> the online mask is definitely too late simply because it still runs on
> that outgoing CPU _after_ the hardware queue is shut down and drained.
> 
> This needs more thought and changes to sched and kthread so that the
> kthread breaks affinity once the CPU goes offline. Too tired to figure
> that out right now.

Yes, to provide the needed guarantees for the block ctx and hctx
mappings we'll need to know exactly at what stage it ceases to run on
that CPU.

-- 
Jens Axboe



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-20 22:14                             ` Thomas Gleixner
  2020-05-20 22:40                               ` Jens Axboe
@ 2020-05-21  2:27                               ` Ming Lei
  2020-05-21  8:13                                 ` Thomas Gleixner
  1 sibling, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-21  2:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jens Axboe, Christoph Hellwig, linux-kernel, linux-block,
	John Garry, Bart Van Assche, Hannes Reinecke, io-uring,
	Peter Zijlstra

On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
> > On 5/20/20 1:41 PM, Thomas Gleixner wrote:
> >> Jens Axboe <axboe@kernel.dk> writes:
> >>> On 5/20/20 8:45 AM, Jens Axboe wrote:
> >>>> It just uses kthread_create_on_cpu(), nothing home grown. Pretty sure
> >>>> they just break affinity if that CPU goes offline.
> >>>
> >>> Just checked, and it works fine for me. If I create an SQPOLL ring with
> >>> SQ_AFF set and bound to CPU 3, if CPU 3 goes offline, then the kthread
> >>> just appears unbound but runs just fine. When CPU 3 comes online again,
> >>> the mask appears correct.
> >> 
> >> When exactly during the unplug operation is it unbound?
> >
> > When the CPU has been fully offlined. I check the affinity mask, it
> > reports 0. But it's still being scheduled, and it's processing work.
> > Here's an example, PID 420 is the thread in question:
> >
> > [root@archlinux cpu3]# taskset -p 420
> > pid 420's current affinity mask: 8
> > [root@archlinux cpu3]# echo 0 > online 
> > [root@archlinux cpu3]# taskset -p 420
> > pid 420's current affinity mask: 0
> > [root@archlinux cpu3]# echo 1 > online 
> > [root@archlinux cpu3]# taskset -p 420
> > pid 420's current affinity mask: 8
> >
> > So as far as I can tell, it's working fine for me with the goals
> > I have for that kthread.
> 
> Works for me is not really useful information and does not answer my
> question:
> 
> >> When exactly during the unplug operation is it unbound?
> 
> The problem Ming and Christoph are trying to solve requires that the
> thread is migrated _before_ the hardware queue is shut down and
> drained. That's why I asked for the exact point where this happens.
> 
> When the CPU is finally offlined, i.e. the CPU cleared the online bit in
> the online mask is definitely too late simply because it still runs on
> that outgoing CPU _after_ the hardware queue is shut down and drained.

IMO, the patch in Christoph's blk-mq-hotplug.2 still works for per-CPU
kthreads.

It is just not optimal because of the retrying, but it should be fine. When
the per-CPU kthread is scheduled on the CPU being offlined:

- if the kthread doesn't observe the INACTIVE flag, the allocated request
will be drained.

- otherwise, the kthread just keeps retrying the allocate & release; sooner
or later its time slice is consumed and it is migrated away, the CPU hotplug
handler gets a chance to run and move on, and the CPU is shut down.

- after the CPU is shut down, the per-CPU kthread becomes unbound, and
allocation from a newly selected online CPU will succeed.

Thanks,
Ming



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-21  2:27                               ` Ming Lei
@ 2020-05-21  8:13                                 ` Thomas Gleixner
  2020-05-21  9:23                                   ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-21  8:13 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-kernel, linux-block,
	John Garry, Bart Van Assche, Hannes Reinecke, io-uring,
	Peter Zijlstra

Ming Lei <ming.lei@redhat.com> writes:
> On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
>> When the CPU is finally offlined, i.e. the CPU cleared the online bit in
>> the online mask is definitely too late simply because it still runs on
>> that outgoing CPU _after_ the hardware queue is shut down and drained.
>
> IMO, the patch in Christoph's blk-mq-hotplug.2 still works for percpu
> kthread.
>
> It is just not optimal in the retrying, but it should be fine. When the
> percpu kthread is scheduled on the CPU to be offlined:
>
> - if the kthread doesn't observe the INACTIVE flag, the allocated request
> will be drained.
>
> - otherwise, the kthread just retries and retries to allocate & release,
> and sooner or later, its time slice is consumed, and migrated out, and the
> cpu hotplug handler will get chance to run and move on, then the cpu is
> shutdown.

1) This is based on the assumption that the kthread is in the SCHED_OTHER
   scheduling class. Is that really a valid assumption?

2) What happens in the following scenario:

   unplug

     mq_offline
       set_ctx_inactive()
       drain_io()
       
   io_kthread()
       try_queue()
       wait_on_ctx()

   Can this happen and if so what will wake up that thread?

I'm not familiar enough with that code to answer #2, but this really
wants to be properly described and documented.

Thanks,

        tglx


* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-21  8:13                                 ` Thomas Gleixner
@ 2020-05-21  9:23                                   ` Ming Lei
  2020-05-21 18:39                                     ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-21  9:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jens Axboe, Christoph Hellwig, linux-kernel, linux-block,
	John Garry, Bart Van Assche, Hannes Reinecke, io-uring,
	Peter Zijlstra

Hi Thomas,

On Thu, May 21, 2020 at 10:13:59AM +0200, Thomas Gleixner wrote:
> Ming Lei <ming.lei@redhat.com> writes:
> > On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
> >> When the CPU is finally offlined, i.e. the CPU cleared the online bit in
> >> the online mask is definitely too late simply because it still runs on
> >> that outgoing CPU _after_ the hardware queue is shut down and drained.
> >
> > IMO, the patch in Christoph's blk-mq-hotplug.2 still works for percpu
> > kthread.
> >
> > It is just not optimal in the retrying, but it should be fine. When the
> > percpu kthread is scheduled on the CPU to be offlined:
> >
> > - if the kthread doesn't observe the INACTIVE flag, the allocated request
> > will be drained.
> >
> > - otherwise, the kthread just retries and retries to allocate & release,
> > and sooner or later, its time slice is consumed, and migrated out, and the
> > cpu hotplug handler will get chance to run and move on, then the cpu is
> > shutdown.
> 
> 1) This is based on the assumption that the kthread is in the SCHED_OTHER
>    scheduling class. Is that really a valid assumption?

Given that this is an unlikely path, we can add msleep() before retrying
when the INACTIVE bit is observed by the current thread; that avoids
spinning and should work for other scheduling classes as well.
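
Something along these lines in the tag allocation slow path (an untested
sketch only):

	/* after a tag has been allocated in blk_mq_get_tag() */
	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
		blk_mq_put_tag(tags, data->ctx, tag);	/* give the tag back */
		msleep(3);	/* don't spin; let the hotplug handler make progress */
		goto retry;	/* re-map ctx/hctx and allocate again */
	}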

> 
> 2) What happens in the following scenario:
> 
>    unplug
> 
>      mq_offline
>        set_ctx_inactive()
>        drain_io()
>        
>    io_kthread()
>        try_queue()
>        wait_on_ctx()
> 
>    Can this happen and if so what will wake up that thread?

drain_io() releases all tags of this hctx, so wait_on_ctx() will be woken
up once any tag is released.

If wait_on_ctx() is waiting on some other generic resource, it will be woken
up once that resource becomes available.
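
That is, roughly this existing path (call chain sketch only):

	blk_mq_put_tag(tags, ctx, tag)
	  -> sbitmap_queue_clear(&tags->bitmap_tags, tag, ctx->cpu)
	    -> sbitmap_queue_wake_up()	/* wakes waiters sleeping in blk_mq_get_tag() */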

thanks,
Ming



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-21  9:23                                   ` Ming Lei
@ 2020-05-21 18:39                                     ` Thomas Gleixner
  2020-05-21 18:45                                       ` Jens Axboe
  2020-05-22  1:57                                       ` Ming Lei
  0 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-21 18:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-kernel, linux-block,
	John Garry, Bart Van Assche, Hannes Reinecke, io-uring,
	Peter Zijlstra

Ming,

Ming Lei <ming.lei@redhat.com> writes:
> On Thu, May 21, 2020 at 10:13:59AM +0200, Thomas Gleixner wrote:
>> Ming Lei <ming.lei@redhat.com> writes:
>> > On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
>> > - otherwise, the kthread just retries and retries to allocate & release,
>> > and sooner or later, its time slice is consumed, and migrated out, and the
>> > cpu hotplug handler will get chance to run and move on, then the cpu is
>> > shutdown.
>> 
>> 1) This is based on the assumption that the kthread is in the SCHED_OTHER
>>    scheduling class. Is that really a valid assumption?
>
> Given it is unlikely path, we can add msleep() before retrying when INACTIVE bit
> is observed by current thread, and this way can avoid spinning and should work
> for other schedulers.

That should work, but pretty is something else.

>> 
>> 2) What happens in the following scenario:
>> 
>>    unplug
>> 
>>      mq_offline
>>        set_ctx_inactive()
>>        drain_io()
>>        
>>    io_kthread()
>>        try_queue()
>>        wait_on_ctx()
>> 
>>    Can this happen and if so what will wake up that thread?
>
> drain_io() releases all tag of this hctx, then wait_on_ctx() will be waken up
> after any tag is released.

drain_io() is already done ...

So looking at that thread function:

static int io_sq_thread(void *data)
{
	struct io_ring_ctx *ctx = data;

        while (...) {
              ....
	      to_submit = io_sqring_entries(ctx);

--> preemption

hotplug runs
   mq_offline()
      set_ctx_inactive();
      drain_io();
      finished();

--> thread runs again

      mutex_lock(&ctx->uring_lock);
      ret = io_submit_sqes(ctx, to_submit, NULL, -1, true);
      mutex_unlock(&ctx->uring_lock);

      ....

      if (!to_submit || ret == -EBUSY)
          ...
      	  wait_on_ctx();

Can this happen, or did drain_io() already take care of the 'to_submit'
items so that the call to io_submit_sqes() turns into a no-op?

If the above can happen, then nothing will wake that thread up, because the
context draining is already done and finished.

Thanks,

        tglx


* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-21 18:39                                     ` Thomas Gleixner
@ 2020-05-21 18:45                                       ` Jens Axboe
  2020-05-21 20:00                                         ` Thomas Gleixner
  2020-05-22  1:57                                       ` Ming Lei
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2020-05-21 18:45 UTC (permalink / raw)
  To: Thomas Gleixner, Ming Lei
  Cc: Christoph Hellwig, linux-kernel, linux-block, John Garry,
	Bart Van Assche, Hannes Reinecke, io-uring, Peter Zijlstra

On 5/21/20 12:39 PM, Thomas Gleixner wrote:
> Ming,
> 
> Ming Lei <ming.lei@redhat.com> writes:
>> On Thu, May 21, 2020 at 10:13:59AM +0200, Thomas Gleixner wrote:
>>> Ming Lei <ming.lei@redhat.com> writes:
>>>> On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
>>>> - otherwise, the kthread just retries and retries to allocate & release,
>>>> and sooner or later, its time slice is consumed, and migrated out, and the
>>>> cpu hotplug handler will get chance to run and move on, then the cpu is
>>>> shutdown.
>>>
>>> 1) This is based on the assumption that the kthread is in the SCHED_OTHER
>>>    scheduling class. Is that really a valid assumption?
>>
>> Given it is unlikely path, we can add msleep() before retrying when INACTIVE bit
>> is observed by current thread, and this way can avoid spinning and should work
>> for other schedulers.
> 
> That should work, but pretty is something else
> 
>>>
>>> 2) What happens in the following scenario:
>>>
>>>    unplug
>>>
>>>      mq_offline
>>>        set_ctx_inactive()
>>>        drain_io()
>>>        
>>>    io_kthread()
>>>        try_queue()
>>>        wait_on_ctx()
>>>
>>>    Can this happen and if so what will wake up that thread?
>>
>> drain_io() releases all tag of this hctx, then wait_on_ctx() will be waken up
>> after any tag is released.
> 
> drain_io() is already done ...
> 
> So looking at that thread function:
> 
> static int io_sq_thread(void *data)
> {
> 	struct io_ring_ctx *ctx = data;
> 
>         while (...) {
>               ....
> 	      to_submit = io_sqring_entries(ctx);
> 
> --> preemption
> 
> hotplug runs
>    mq_offline()
>       set_ctx_inactive();
>       drain_io();
>       finished();
> 
> --> thread runs again
> 
>       mutex_lock(&ctx->uring_lock);
>       ret = io_submit_sqes(ctx, to_submit, NULL, -1, true);
>       mutex_unlock(&ctx->uring_lock);
> 
>       ....
> 
>       if (!to_submit || ret == -EBUSY)
>           ...
>       	  wait_on_ctx();
> 
> Can this happen or did drain_io() already take care of the 'to_submit'
> items and the call to io_submit_sqes() turns into a zero action ?
> 
> If the above happens then nothing will wake it up because the context
> draining is done and finished.

Again, this is mixing up io_uring and blk-mq. Maybe it's the fact that
both use 'ctx' that makes this confusing. On the blk-mq side, the 'ctx'
is the per-CPU software queue context; for io_uring, it's the io_uring
instance.
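
To spell the distinction out (illustrative declarations only):

	struct blk_mq_ctx  *blk_ctx;	/* blk-mq: the per-CPU software queue context */
	struct io_ring_ctx *ring_ctx;	/* io_uring: one per ring instance, not per CPU */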

io_sq_thread() doesn't care about any sort of per-CPU mappings; it's happy
as long as it keeps running, regardless of whether or not the optional
pinned CPU is selected and then offlined.

-- 
Jens Axboe



* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-21 18:45                                       ` Jens Axboe
@ 2020-05-21 20:00                                         ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-05-21 20:00 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: Christoph Hellwig, linux-kernel, linux-block, John Garry,
	Bart Van Assche, Hannes Reinecke, io-uring, Peter Zijlstra

Jens Axboe <axboe@kernel.dk> writes:
> Again, this is mixing up io_uring and blk-mq. Maybe it's the fact that
> both use 'ctx' that makes this confusing. On the blk-mq side, the 'ctx'
> is the per-cpu queue context, for io_uring it's the io_uring instance.

Yes, that got me horribly confused. :)

> io_sq_thread() doesn't care about any sort of percpu mappings, it's
> happy as long as it'll keep running regardless of whether or not the
> optional pinned CPU is selected and then offlined.

Fair enough.

So aside from the potential to spin forever if the uring thread is lifted
into an RT scheduling class, this all looks good.

Though I assume that if that thread is pinned and an admin pushes it into RT
scheduling, the spinning live-lock can happen independently of CPU hotplug.

Thanks,

        tglx


* Re: io_uring vs CPU hotplug, was Re: [PATCH 5/9] blk-mq: don't set data->ctx and data->hctx in blk_mq_alloc_request_hctx
  2020-05-21 18:39                                     ` Thomas Gleixner
  2020-05-21 18:45                                       ` Jens Axboe
@ 2020-05-22  1:57                                       ` Ming Lei
  1 sibling, 0 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-22  1:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jens Axboe, Christoph Hellwig, linux-kernel, linux-block,
	John Garry, Bart Van Assche, Hannes Reinecke, io-uring,
	Peter Zijlstra

On Thu, May 21, 2020 at 08:39:16PM +0200, Thomas Gleixner wrote:
> Ming,
> 
> Ming Lei <ming.lei@redhat.com> writes:
> > On Thu, May 21, 2020 at 10:13:59AM +0200, Thomas Gleixner wrote:
> >> Ming Lei <ming.lei@redhat.com> writes:
> >> > On Thu, May 21, 2020 at 12:14:18AM +0200, Thomas Gleixner wrote:
> >> > - otherwise, the kthread just retries and retries to allocate & release,
> >> > and sooner or later, its time slice is consumed, and migrated out, and the
> >> > cpu hotplug handler will get chance to run and move on, then the cpu is
> >> > shutdown.
> >> 
> >> 1) This is based on the assumption that the kthread is in the SCHED_OTHER
> >>    scheduling class. Is that really a valid assumption?
> >
> > Given it is unlikely path, we can add msleep() before retrying when INACTIVE bit
> > is observed by current thread, and this way can avoid spinning and should work
> > for other schedulers.
> 
> That should work, but pretty is something else
> 
> >> 
> >> 2) What happens in the following scenario:
> >> 
> >>    unplug
> >> 
> >>      mq_offline
> >>        set_ctx_inactive()
> >>        drain_io()
> >>        
> >>    io_kthread()
> >>        try_queue()
> >>        wait_on_ctx()
> >> 
> >>    Can this happen and if so what will wake up that thread?
> >
> > drain_io() releases all tag of this hctx, then wait_on_ctx() will be waken up
> > after any tag is released.
> 
> drain_io() is already done ...
> 
> So looking at that thread function:
> 
> static int io_sq_thread(void *data)
> {
> 	struct io_ring_ctx *ctx = data;
> 
>         while (...) {
>               ....
> 	      to_submit = io_sqring_entries(ctx);
> 
> --> preemption
> 
> hotplug runs
>    mq_offline()
>       set_ctx_inactive();
>       drain_io();
>       finished();
> 
> --> thread runs again
> 
>       mutex_lock(&ctx->uring_lock);
>       ret = io_submit_sqes(ctx, to_submit, NULL, -1, true);
>       mutex_unlock(&ctx->uring_lock);
> 
>       ....
> 
>       if (!to_submit || ret == -EBUSY)
>           ...
>       	  wait_on_ctx();
> 
> Can this happen or did drain_io() already take care of the 'to_submit'
> items and the call to io_submit_sqes() turns into a zero action ?
> 
> If the above happens then nothing will wake it up because the context
> draining is done and finished.

As Jens replied, you mixed up the io_uring ctx and the blk-mq ctx; the two
live in separate worlds.

Any wait in this per-CPU kthread should just be a wait on a generic
resource, not something directly tied to blk-mq's inactive hctx. Once this
thread is migrated to another online CPU, it will move on.


Thanks,
Ming



