From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751479AbdKVBWC (ORCPT ); Tue, 21 Nov 2017 20:22:02 -0500 Received: from mx1.redhat.com ([209.132.183.28]:43586 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751379AbdKVBWB (ORCPT ); Tue, 21 Nov 2017 20:22:01 -0500 Date: Tue, 21 Nov 2017 20:21:54 -0500 (EST) From: Mikulas Patocka X-X-Sender: mpatocka@file01.intranet.prod.int.rdu2.redhat.com To: Mike Snitzer cc: NeilBrown , Jens Axboe , "linux-kernel@vger.kernel.org" , linux-block@vger.kernel.org, device-mapper development , Zdenek Kabelac Subject: Re: new patchset to eliminate DM's use of BIOSET_NEED_RESCUER In-Reply-To: <20171121225119.GA19630@redhat.com> Message-ID: References: <149776047907.23258.8058071140236879834.stgit@noble> <20170618184143.GA10920@kernel.dk> <87poe13rmm.fsf@notabene.neil.brown.name> <87a7zg31vx.fsf@notabene.neil.brown.name> <20171121013533.GA14520@redhat.com> <20171121121049.GA17014@redhat.com> <20171121124311.GA17243@redhat.com> <20171121194709.GA18903@redhat.com> <20171121225119.GA19630@redhat.com> User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 22 Nov 2017 01:22:00 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 21 Nov 2017, Mike Snitzer wrote: > On Tue, Nov 21 2017 at 4:23pm -0500, > Mikulas Patocka wrote: > > > This is not correct: > > > > 2206 static void dm_wq_work(struct work_struct *work) > > 2207 { > > 2208 struct mapped_device *md = container_of(work, struct mapped_device, work); > > 2209 struct bio *bio; > > 2210 int srcu_idx; > > 2211 struct dm_table *map; > > 2212 > > 2213 if (!bio_list_empty(&md->rescued)) { > > 2214 struct bio_list list; > > 2215 spin_lock_irq(&md->deferred_lock); > > 2216 list = md->rescued; > > 2217 bio_list_init(&md->rescued); > > 2218 spin_unlock_irq(&md->deferred_lock); > > 2219 while ((bio = bio_list_pop(&list))) > > 2220 generic_make_request(bio); > > 2221 } > > 2222 > > 2223 map = dm_get_live_table(md, &srcu_idx); > > 2224 > > 2225 while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) { > > 2226 spin_lock_irq(&md->deferred_lock); > > 2227 bio = bio_list_pop(&md->deferred); > > 2228 spin_unlock_irq(&md->deferred_lock); > > 2229 > > 2230 if (!bio) > > 2231 break; > > 2232 > > 2233 if (dm_request_based(md)) > > 2234 generic_make_request(bio); > > 2235 else > > 2236 __split_and_process_bio(md, map, bio); > > 2237 } > > 2238 > > 2239 dm_put_live_table(md, srcu_idx); > > 2240 } > > > > You can see that if we are in dm_wq_work in __split_and_process_bio, we > > will not process md->rescued list. > > Can you elaborate further? We cannot be "in dm_wq_work in > __split_and_process_bio" simultaneously. Do you mean as a side-effect > of scheduling away from __split_and_process_bio? > > The more detail you can share the better. Suppose this scenario: * dm_wq_work calls __split_and_process_bio * __split_and_process_bio eventually reaches the function snapshot_map * snapshot_map attempts to take the snapshot lock * the snapshot lock could be released only if some bios submitted by the snapshot driver to the underlying device complete * the bios submitted to the underlying device were already offloaded by some other task and they are waiting on the list md->rescued * the bios waiting on md->rescued are not processed, because dm_wq_work is blocked in snapshot_map (called from __split_and_process_bio) > > The processing of md->rescued is also wrong - bios for different devices > > must be offloaded to different helper threads, so that processing a bio > > for a lower device doesn't depend on processing a bio for a higher device. > > If you offload all the bios on current->bio_list to the same thread, the > > bios still depend on each other and the deadlock will still happen. > > Commit 325738403 ("dm: revise 'rescue' strategy for bio-based bioset > allocations") speaks to this with: > > "Note that only current->bio_list[0] is offloaded. current->bio_list[1] > contains bios that were scheduled *before* the current one started, so > they must have been submitted from higher up the stack, and we cannot be > waiting for them here (thanks to the "dm: ensure bio submission follows > a depth-first tree walk" commit). Also, we now rescue *all* bios on the > list as there is nothing to be gained by being more selective." I think you are right - if we only offload current->bio_list[0], then mixing of dependent bios on the offloaded list won't happen. > And again: this patchset passes your dm-snapshot deadlock test. Is > that test somehow lacking? With your patchset, the deadlock would happen only if bios are queued on &md->deferred - and that happens only in case of resume or if we are processing REQ_PREFLUSH with non-zero data size. So, the simple test that I wrote doesn't trigger it, but a more complex test involving REQ_PREFLUSH could. > Or do you see a hypothetical case where a deadlock is still possible? > That is of less concern. I'd prefer that we tackle problems for > targets, and associated scenarios, that we currently support. > > Either way, happy to review this with you further. Any fixes are > welcomed too. But I'd like us to head in a direction that this patchset > is taking us. Specifically: away from DM relying on BIOSET_NEED_RESCUER. > > Thanks, > Mike Mikulas