From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756371AbbJUVtX (ORCPT ); Wed, 21 Oct 2015 17:49:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47099 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750767AbbJUVtW (ORCPT ); Wed, 21 Oct 2015 17:49:22 -0400 Date: Wed, 21 Oct 2015 17:49:20 -0400 (EDT) From: Mikulas Patocka X-X-Sender: mpatocka@file01.intranet.prod.int.rdu2.redhat.com To: Ming Lei cc: Mike Snitzer , Jens Axboe , Kent Overstreet , dm-devel@redhat.com, Linux Kernel Mailing List , "Alasdair G. Kergon" , Jeff Moyer Subject: Re: [PATCH v3 for-4.4] block: flush queued bios when process blocks to avoid deadlock In-Reply-To: Message-ID: References: <5384CE82.90601@kernel.dk> <20151005205943.GB25762@redhat.com> <20151006185016.GA31955@redhat.com> <20151006201637.GA4158@redhat.com> <20151008150859.GA11770@redhat.com> <20151009195203.GA18790@redhat.com> <20151009195907.GB18790@redhat.com> <20151014204739.GA23449@redhat.com> User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 22 Oct 2015, Ming Lei wrote: > > Some drivers (dm-snapshot, dm-thin) do acquire a mutex in .make_requests() > > for every bio. It wouldn't be practical to convert them to not acquire the > > mutex (and it would also degrade performance of these drivers, if they had > > to offload every bio to a worker thread that can acquire the mutex). > > Lots of drivers handle I/O in that way, and this way makes AIO not possible > basically for dm-snapshot. It doesn't have to do anything with asynchronous I/O. Of course you can do asynchronous I/O on dm-snapshot. > >> Also sometimes it can hurt performance by converting I/O submission > >> from one context into concurrent contexts of workqueue, especially > >> in case of sequential I/O, since plug & plug merge can't be used any > >> more. > > > > You can add blk_start_plug/blk_finish_plug to the function > > bio_alloc_rescue. That would be reasonable to make sure that the requests > > are merged even when they are offloaded to rescue thread. > > The IOs submitted from each wq context becomes not contineous any > more, so plug merge isn't doable, not mention the extra context switch > cost. If the requests are mergeable, blk_start_plug/blk_finish_plug will merge them, if not, it won't. > This kind of cost can be introduced for all bio devices just for handling > the unusual case, fair? Offloading bios to a worker thread when the make_request_fn function blocks is required to avoid a deadlock (BTW. the deadlock became more common in the kernel 4.3 due to unrestricted size of bios). The bio list current->bio_list introduces a false locking dependency - completion of a bio depends on completion of other bios on current->bio_list directed for different devices, thus it could create circular dependency resulting in deadlock. To avoid the circular dependency, each bio must be offloaded to a specific workqueue, so that completion of bio for device A no longer depends on completion of another bio for device B. > >> > - queue_work(bs->rescue_workqueue, &bs->rescue_work); > >> > + spin_lock(&bs->rescue_lock); > >> > + bio_list_add(&bs->rescue_list, bio); > >> > + queue_work(bs->rescue_workqueue, &bs->rescue_work); > >> > + spin_unlock(&bs->rescue_lock); > >> > + } > >> > >> Not like rescuring path, schedule out can be quite frequent, and the > >> above change will switch to submit these I/Os from wq concurrently, > >> which might hurt performance for sequential I/O. > >> > >> Also I am wondering why not submit these I/Os in 'current' context > >> just like what flush plug does? > > > > Processing requests doesn't block (they only take the queue spinlock). > > > > Processing bios can block (they can take other mutexes or semaphores), so > > processing them from the schedule hook is impossible - the bio's > > make_request function could attempt to take some lock that is already > > held. So - we must offload the bios to a separate workqueue. > > Yes, so better to just handle dm-snapshot in this way. All dm targets and almost all other bio-processing drivers can block in the make_request_fn function (for example, they may block when allocating from a mempool). Mikulas > Thanks, > Ming Lei >