All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolai Stange <nstange@suse.de>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	axboe@kernel.dk, viro@zeniv.linux.org.uk,
	gregkh@linuxfoundation.org, rostedt@goodmis.org,
	mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com,
	nstange@suse.de, mhocko@suse.com, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Omar Sandoval <osandov@fb.com>, Hannes Reinecke <hare@suse.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [RFC 3/3] block: avoid deferral of blk_release_queue() work
Date: Thu, 02 Apr 2020 16:49:37 +0200	[thread overview]
Message-ID: <87o8saj62m.fsf@suse.de> (raw)
In-Reply-To: <774a33e8-43ba-143f-f6fd-9cb0ae0862ac@acm.org> (Bart Van Assche's message of "Wed, 1 Apr 2020 20:39:48 -0700")

Bart Van Assche <bvanassche@acm.org> writes:

> On 2020-04-01 17:00, Luis Chamberlain wrote:
>> Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") moved
>> the blk_release_queue() into a workqueue after a splat floated around
>> with some work here which could sleep in blk_exit_rl().
>> 
>> On recent commit db6d9952356 ("block: remove request_list code") though
>> Jens Axboe removed this code, now merged since v5.0. We no longer have
>> to defer this work.
>> 
>> By doing this we also avoid failing to detach / attach a block
>> device with a BLKTRACESETUP. This issue can be reproduced with
>> break-blktrace [0] using:
>> 
>>   break-blktrace -c 10 -d -s
>> 
>> The kernel does not crash without this commit, it just fails to
>> create the block device because the prior block device removal
>> deferred work is pending. After this commit we can use the above
>> flaky use of blktrace without an issue.
>> 
>> [0] https://github.com/mcgrof/break-blktrace
>> 
>> Cc: Bart Van Assche <bvanassche@acm.org>
>> Cc: Omar Sandoval <osandov@fb.com>
>> Cc: Hannes Reinecke <hare@suse.com>
>> Cc: Nicolai Stange <nstange@suse.de>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Suggested-by: Nicolai Stange <nstange@suse.de>
>> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
>> ---
>>  block/blk-sysfs.c | 18 +++++-------------
>>  1 file changed, 5 insertions(+), 13 deletions(-)
>> 
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 20f20b0fa0b9..f159b40899ee 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -862,8 +862,8 @@ static void blk_exit_queue(struct request_queue *q)
>>  
>>  
>>  /**
>> - * __blk_release_queue - release a request queue
>> - * @work: pointer to the release_work member of the request queue to be released
>> + * blk_release_queue - release a request queue
>> + * @kojb: pointer to the kobj representing the request queue
>>   *
>>   * Description:
>>   *     This function is called when a block device is being unregistered. The
>> @@ -873,9 +873,10 @@ static void blk_exit_queue(struct request_queue *q)
>>   *     of the request queue reaches zero, blk_release_queue is called to release
>>   *     all allocated resources of the request queue.
>>   */
>> -static void __blk_release_queue(struct work_struct *work)
>> +static void blk_release_queue(struct kobject *kobj)
>>  {
>> -	struct request_queue *q = container_of(work, typeof(*q), release_work);
>> +	struct request_queue *q =
>> +		container_of(kobj, struct request_queue, kobj);
>>  
>>  	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
>>  		blk_stat_remove_callback(q, q->poll_cb);
>> @@ -905,15 +906,6 @@ static void __blk_release_queue(struct work_struct *work)
>>  	call_rcu(&q->rcu_head, blk_free_queue_rcu);
>>  }
>>  
>> -static void blk_release_queue(struct kobject *kobj)
>> -{
>> -	struct request_queue *q =
>> -		container_of(kobj, struct request_queue, kobj);
>> -
>> -	INIT_WORK(&q->release_work, __blk_release_queue);
>> -	schedule_work(&q->release_work);
>> -}
>> -
>>  static const struct sysfs_ops queue_sysfs_ops = {
>>  	.show	= queue_attr_show,
>>  	.store	= queue_attr_store,
>
> The description of this patch mentions a single blk_release_queue() call
> that happened in the past from a context from which sleeping is not
> allowed and from which sleeping is allowed today. Have all other
> blk_release_queue() / blk_put_queue() calls been verified to see whether
> none of these happens from a context from which sleeping is not allowed?

I've just done this today and found the following potentially
problematic call paths to blk_put_queue().

1.) mem_cgroup_throttle_swaprate() takes a spinlock and
    calls blkcg_schedule_throttle()->blk_put_queue().

    Also note that AFAICS mem_cgroup_try_charge_delay() can be called
    with GFP_ATOMIC.

2.) scsi_unblock_requests() gets called from a lot of drivers and
    invoke blk_put_queue() through
    scsi_unblock_requests() -> scsi_run_host_queues() ->
    scsi_starved_list_run() -> blk_put_queue().

    Most call sites are fine, the ones which are not are:
    a.) pmcraid_complete_ioa_reset(). This gets assigned
        to struct pmcraid_cmd's ->cmd_done and later invoked
        under a spinlock.

    b.) qla82xx_fw_dump() and qla8044_fw_dump().
        These can potentially block w/o this patch already,
        because both invoke qla2x00_wait_for_chip_reset().

	However, they can get called from IRQ context. For example,
        qla82xx_intr_handler(), qla82xx_msix_default() and
        qla82xx_poll() call qla2x00_async_event(), which calls
        ->fw_dump().

	The aforementioned functions can also reach ->fw_dump() through
        qla24xx_process_response_queue()->qlt_handle_abts_recv()->qlt_response_pkt_all_vps()
        ->qlt_response_pkt()->qlt_handle_abts_completion()->qlt_chk_unresolv_exchg()
        -> ->fw_dump().

	But I'd consider this a problem with the driver -- either
	->fw_dump() can sleep and must not be called from IRQ context
        or they must not invoke qla2x00_wait_for_hba_ready().


(I can share the full analysis, but it's lengthy and contains nothing
 interesting except for what is listed above).


One final note though: If I'm not mistaken, then the final
blk_put_queue() can in principle block even today, simply by virtue of
the kernfs operations invoked through
kobject_put()->kobject_release()->kobject_cleanup()->kobject_del()
->sysfs_remove_dir()->kernfs_remove()->mutex_lock()?


Thanks,

Nicolai

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg), GF: Felix Imendörffer

  reply	other threads:[~2020-04-02 14:49 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
2020-04-02  0:00 ` [RFC 1/3] block: move main block debugfs initialization to its own file Luis Chamberlain
2020-04-05  3:12   ` Bart Van Assche
2020-04-06 14:23     ` Luis Chamberlain
2020-04-02  0:00 ` [RFC 2/3] blktrace: fix debugfs use after free Luis Chamberlain
2020-04-02  1:57   ` Eric Sandeen
2020-04-02 16:14     ` Luis Chamberlain
2020-04-03 14:19   ` kbuild test robot
2020-04-05  3:39   ` Bart Van Assche
2020-04-06  1:27     ` Eric Sandeen
2020-04-06  4:25       ` Bart Van Assche
2020-04-06  9:18         ` Nicolai Stange
2020-04-06 15:19           ` Luis Chamberlain
2020-04-07  8:15             ` Luis Chamberlain
2020-04-06 14:29         ` Eric Sandeen
2020-04-07  8:09           ` Luis Chamberlain
2020-04-06 15:14     ` Luis Chamberlain
2020-04-02  0:00 ` [RFC 3/3] block: avoid deferral of blk_release_queue() work Luis Chamberlain
2020-04-02  3:39   ` Bart Van Assche
2020-04-02 14:49     ` Nicolai Stange [this message]
2020-04-06  9:11       ` Nicolai Stange
2020-04-09 18:11       ` Luis Chamberlain
2020-04-02  7:44 ` [RFC 0/3] block: address blktrace use-after-free Greg KH
2020-04-03  8:19 ` Ming Lei
2020-04-03 14:06   ` Luis Chamberlain
2020-04-03 14:13   ` Bart Van Assche
2020-04-03 19:49     ` Luis Chamberlain
2020-04-07  2:47   ` yukuai (C)
2020-04-07 19:00     ` Luis Chamberlain
2020-04-09 20:59       ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8saj62m.fsf@suse.de \
    --to=nstange@suse.de \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=ming.lei@redhat.com \
    --cc=mingo@redhat.com \
    --cc=osandov@fb.com \
    --cc=rostedt@goodmis.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.