Date: Tue, 28 Jun 2016 10:24:49 +0200
From: Lars Ellenberg
To: Mike Snitzer
Cc: Ming Lei, linux-block@vger.kernel.org, Roland Kammerer,
	Jens Axboe, NeilBrown, Kent Overstreet, Shaohua Li,
	Alasdair Kergon, "open list:DEVICE-MAPPER (LVM)",
	Ingo Molnar, Peter Zijlstra, Takashi Iwai, Jiri Kosina,
	Zheng Liu, Keith Busch, "Martin K. Petersen",
	"Kirill A. Shutemov", Linux Kernel Mailing List,
	"open list:BCACHE (BLOCK LAYER CACHE)",
	"open list:SOFTWARE RAID (Multiple Disks) SUPPORT"
Subject: Re: block: fix blk_queue_split() resource exhaustion
Message-ID: <20160628082448.GH3239@soda.linbit>
References: <1466583730-28595-1-git-send-email-lars.ellenberg@linbit.com>
	<20160624142711.GF3239@soda.linbit>
	<20160624151547.GA13898@redhat.com>
In-Reply-To: <20160624151547.GA13898@redhat.com>

On Fri, Jun 24, 2016 at 11:15:47AM -0400, Mike Snitzer wrote:
> On Fri, Jun 24 2016 at 10:27am -0400,
> Lars Ellenberg wrote:
>
> > On Fri, Jun 24, 2016 at 07:36:57PM +0800, Ming Lei wrote:
> > > >
> > > > This is not a theoretical problem.
> > > > At least in DRBD, with an unfortunately high IO concurrency wrt.
> > > > the "max-buffers" setting, we have a reproducible deadlock
> > > > without this patch.
> > >
> > > Is there any log about the deadlock? And is there any lockdep
> > > warning if it is enabled?
> >
> > In DRBD, to avoid potentially very long internal queues as we wait
> > for our replication peer device and local backend, we limit the
> > number of in-flight bios we accept, and block in our
> > ->make_request_fn() if that number exceeds a configured watermark
> > ("max-buffers").
> >
> > This works fine, as long as we could assume that once our
> > make_request_fn() returns, any bios we "recursively" submitted
> > against the local backend would be dispatched. Which used to be
> > the case.
>
> It'd be useful to know whether this patch fixes your issue:
> https://patchwork.kernel.org/patch/7398411/

I would assume so: if current is blocked for any reason, that patch
dispatches all bios still on current->bio_list to be processed from
other contexts. So we would not deadlock, but make progress, even when
unblocking current depends on the completion of those bios.

Also see my other mail on this issue, where I try to better explain the
mechanics of "my" deadlock.

	Lars
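
P.S.: To make the mechanics concrete, here is a minimal sketch of a
throttling ->make_request_fn() along the lines described above. This is
a hypothetical example driver, not the actual DRBD code; the names
example_dev, in_flight and max_buffers are made up for illustration.

#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/wait.h>
#include <linux/atomic.h>

/* Hypothetical pass-through device, for illustration only. */
struct example_dev {
	struct block_device *backing_dev;
	wait_queue_head_t wait;
	atomic_t in_flight;
	int max_buffers;		/* think DRBD "max-buffers" */
};

static blk_qc_t example_make_request(struct request_queue *q,
				     struct bio *bio)
{
	struct example_dev *dev = q->queuedata;

	/*
	 * Throttle: block until fewer than max_buffers bios are in
	 * flight.  The completion path (not shown) is assumed to do
	 * atomic_dec(&dev->in_flight); wake_up(&dev->wait);
	 *
	 * The deadlock: if the bios that have to complete before
	 * in_flight drops are still parked on current->bio_list,
	 * they are not dispatched until this function returns --
	 * but this function does not return until they complete.
	 */
	wait_event(dev->wait,
		   atomic_read(&dev->in_flight) < dev->max_buffers);

	atomic_inc(&dev->in_flight);
	bio->bi_bdev = dev->backing_dev;

	/*
	 * Called from inside a make_request_fn, this does not
	 * dispatch the bio; generic_make_request() only appends it
	 * to current->bio_list, to be processed after we return.
	 */
	generic_make_request(bio);

	return BLK_QC_T_NONE;
}

The patch referenced above would avoid the hang in this sketch by
punting the bios still sitting on current->bio_list to another context
once the submitting task blocks, so the completions that the
wait_event() above depends on can actually happen.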