From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 1/2] loop: use a global workqueue
Date: Fri, 24 Dec 2021 21:05:53 +0900	[thread overview]
Message-ID: <f839a895-bb91-02f8-33fb-5994dd725d24@i-love.sakura.ne.jp> (raw)
In-Reply-To: <20211224060311.GC12234@lst.de>

On 2021/12/24 15:03, Christoph Hellwig wrote:
> On Thu, Dec 23, 2021 at 11:37:21PM +0900, Tetsuo Handa wrote:
>>> @@ -1115,7 +1107,6 @@ static void __loop_clr_fd(struct loop_device *lo)
>>>  	/* freeze request queue during the transition */
>>>  	blk_mq_freeze_queue(lo->lo_queue);
>>>  
>>> -	destroy_workqueue(lo->workqueue);
>>
>> is it safe to remove destroy_workqueue() call here?
>>
>>>  	spin_lock_irq(&lo->lo_work_lock);
>>>  	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
>>>  				idle_list) {
>>
>> destroy_workqueue() implies flush_workqueue() which is creating the lock
>> ordering problem. And I think that flush_workqueue() is required for making
>> sure that there is no more work to process (i.e. loop_process_work() is
>> no longer running) before start deleting idle workers.
>>
>> My understanding is that the problem is not the use of a per-device workqueue
>> but the need to call flush_workqueue() in order to make sure that all pending
>> works are completed.
> 
> All the work items are for requests, and the blk_mq_freeze_queue should
> take care of flushing them all out.

Hmm, OK.

(1) loop_queue_rq() calls blk_mq_start_request() and then calls loop_queue_work().

(2) loop_queue_work() allocates a "struct work_struct" (as part of a worker) and calls queue_work().

(3) A WQ thread runs loop_workfn(), which calls loop_process_work(), which calls loop_handle_cmd().

(4) do_req_filebacked(), called from loop_handle_cmd(), performs the read/write on lo->lo_backing_file.

(5) Either the I/O completion function or loop_handle_cmd() itself calls blk_mq_complete_request().

Therefore, as long as blk_mq_freeze_queue(lo->lo_queue) waits for (5) to complete and
blocks new events for (2), there should be no work left for loop_process_work() to process.
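
For reference, here is a heavily condensed sketch of that flow as I read
drivers/block/loop.c (locking, cgroup handling and error paths omitted; this is
not the actual code):

	/* (1)+(2): the request is started and handed to a worker */
	static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
					  const struct blk_mq_queue_data *bd)
	{
		struct request *rq = bd->rq;
		struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
		struct loop_device *lo = rq->q->queuedata;

		blk_mq_start_request(rq);
		loop_queue_work(lo, cmd);	/* finds/allocates a worker, queue_work()s it */
		return BLK_STS_OK;
	}

	/* (3)+(4)+(5): a WQ thread runs the worker */
	static void loop_workfn(struct work_struct *work)
	{
		struct loop_worker *worker =
			container_of(work, struct loop_worker, work);

		loop_process_work(worker, &worker->cmd_list, worker->lo);
		/* -> loop_handle_cmd() -> do_req_filebacked() on lo->lo_backing_file,
		 *    then blk_mq_complete_request() from the completion function
		 *    or from loop_handle_cmd() itself */
	}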

Then, we could defer the following block in __loop_clr_fd()

	destroy_workqueue(lo->workqueue);
	spin_lock_irq(&lo->lo_work_lock);
	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
				idle_list) {
		list_del(&worker->idle_list);
		rb_erase(&worker->rb_node, &lo->worker_tree);
		css_put(worker->blkcg_css);
		kfree(worker);
	}
	spin_unlock_irq(&lo->lo_work_lock);
	del_timer_sync(&lo->timer);

until loop_remove(), if we want. Assuming that loop devices are created only when
there is no free one, a loop device is likely to be reused once created; then we
would not need to care about idle workers on every loop_configure()/__loop_clr_fd()
pair?
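
If we went that way, the teardown quoted above could become a small helper called
from loop_remove() instead of from __loop_clr_fd(). A rough, untested sketch (the
helper name is made up):

	/* Hypothetical helper; the body is the block quoted above, unchanged. */
	static void loop_destroy_workers(struct loop_device *lo)
	{
		struct loop_worker *worker, *pos;

		destroy_workqueue(lo->workqueue);
		spin_lock_irq(&lo->lo_work_lock);
		list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
					 idle_list) {
			list_del(&worker->idle_list);
			rb_erase(&worker->rb_node, &lo->worker_tree);
			css_put(worker->blkcg_css);
			kfree(worker);
		}
		spin_unlock_irq(&lo->lo_work_lock);
		del_timer_sync(&lo->timer);
	}

	/* called from loop_remove() rather than from __loop_clr_fd() */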

By the way, is it safe to use a single global WQ if (4) is a synchronous I/O request?
Since there can be up to 1048576 loop devices, and one loop device can use another
loop device as lo->lo_backing_file (unless loop_validate_file() finds a circular
usage), one synchronous I/O request in (4) might recursively involve up to 1048576
works (which would be too much concurrency for a single WQ to handle)?
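
Just to make the stacking scenario concrete, such a chain can be built from
userspace roughly like this (illustrative only; device paths are examples and
error handling is omitted). Each level of the chain adds one more work item whose
synchronous I/O has to make progress at the same time:

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/loop.h>

	/* Make upper_path a loop device backed by lower_path (itself a loop
	 * device already bound to a file). Repeat for an N-deep chain. */
	static int stack_loop(const char *upper_path, const char *lower_path)
	{
		int lower = open(lower_path, O_RDWR);
		int upper = open(upper_path, O_RDWR);

		return ioctl(upper, LOOP_SET_FD, lower);
	}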

Also, is

	blk_mq_start_request(rq);

	if (lo->lo_state != Lo_bound)
		return BLK_STS_IOERR;

in loop_queue_rq() correct? (Not only is the lo->lo_state test racy, but doesn't
this path also want blk_mq_end_request(), like lo_complete_rq() does?)
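
To make the question concrete, I mean something along these lines (untested, and
possibly unnecessary if blk-mq already ends a started request when ->queue_rq
returns an error status):

	blk_mq_start_request(rq);

	if (lo->lo_state != Lo_bound) {
		/* end the started request ourselves, as lo_complete_rq()
		 * would, instead of only returning an error status */
		blk_mq_end_request(rq, BLK_STS_IOERR);
		return BLK_STS_OK;
	}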


Thread overview: 13+ messages
2021-12-23 11:25 fix loop autoclear for xfstets xfs/049 Christoph Hellwig
2021-12-23 11:25 ` [PATCH 1/2] loop: use a global workqueue Christoph Hellwig
2021-12-23 14:37   ` Tetsuo Handa
2021-12-24  6:03     ` Christoph Hellwig
2021-12-24 12:05       ` Tetsuo Handa [this message]
2021-12-24 14:05         ` Tetsuo Handa
2021-12-29 17:21         ` Christoph Hellwig
2021-12-29  2:23   ` Tetsuo Handa
2021-12-23 11:25 ` [PATCH 2/2] loop: make autoclear operation synchronous again Christoph Hellwig
2021-12-23 13:40 ` fix loop autoclear for xfstets xfs/049 Jan Kara
2021-12-24  6:02   ` Christoph Hellwig
2021-12-26  7:09     ` Tetsuo Handa
2021-12-29 17:20       ` Christoph Hellwig
