From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lars Ellenberg
Subject: Re: [RFC] block: fix blk_queue_split() resource exhaustion
Date: Thu, 7 Jul 2016 10:03:28 +0200
Message-ID: <20160707080328.GG13335@soda.linbit>
References: <1466583730-28595-1-git-send-email-lars.ellenberg@linbit.com>
 <20160704082006.GN3239@soda.linbit>
 <20160706123841.GA13335@soda.linbit>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
To: Ming Lei
Cc: Jens Axboe, Keith Busch, Linux Kernel Mailing List,
 "Martin K. Petersen", Mike Snitzer, Peter Zijlstra, Jiri Kosina,
 NeilBrown, Zheng Liu, linux-block, Takashi Iwai,
 "open list:DEVICE-MAPPER (LVM)", Ingo Molnar, "Kirill A. Shutemov",
 "open list:SOFTWARE RAID (Multiple Disks) SUPPORT",
 "open list:BCACHE (BLOCK LAYER CACHE)", Shaohua Li, Kent Overstreet
List-Id: linux-raid.ids

On Wed, Jul 06, 2016 at 11:57:51PM +0800, Ming Lei wrote:
> > ==== my suggestion
> >
> > generic_make_request(bio_orig)
> >                                     NULL  in-flight=0
> >                                     bio_orig  empty  in-flight=0
> >   qA->make_request_fn(bio_orig)
> >     blk_queue_split()
> >     result:
> >     bio_s, and bio_r stuffed away to head of remainder list.
> >                                     in-flight=1
> >     bio_c = bio_clone(bio_s)
> >     generic_make_request(bio_c to qB)
> >                                     bio_c
> >   <-return
> >                                     bio_c
> >   bio_list_pop()
> >                                     empty
> >   qB->make_request_fn(bio_c)
> >     (Assume it does not clone, but only remap.
> >     But it may also be a striping layer,
> >     and queue more than one bio here.)
> >     generic_make_request(bio_c to qC)
> >                                     bio_c
> >   <-return
> >   bio_list_pop()
> >                                     empty
> >   qC->make_request_fn(bio_c)
> >     generic_make_request(bio_c to qD)
> >                                     bio_c
> >   <-return
> >   bio_list_pop()
> >                                     empty
> >   qD->make_request_fn(bio_c)
> >     dispatches to hardware
> >   <-return
> >                                     empty
> >   bio_list_pop()
> >     NULL, great, lets pop from remainder list
> >   qA->make_request_fn(bio_r)        in-flight=?
> >
> >     May block, but only until completion of bio_c.
> >     Which may already have happened.
> >
> >     *makes progress*
>
> I admit your solution is smart, but it isn't easy to prove it as correct
> in theory. But if the traversal can be mapped into pre-order traversal
> of the above binary tree, it may be correct.

What are you talking about?
There is no tree.
There is a single fifo.
And I suggest making that one fifo and one lifo instead.

|<------ original bio ----->|
|piece|----remainder--------|

|piece| is then processed, just as it was before,
all recursive submissions turned into iterative processing,
in the exact order they have been called recursively.
Until all deeper level submissions have been fully processed.

If deeper levels are again calling blk_queue_split(), their respective
remainders are queued in front of the "top level" remainder.

And only then, the remainders are processed,
just as if they had come in as "original bio", see above.

So if it did make progress before,
it will make progress now.

	Lars
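
A toy userspace model of the "one fifo, one lifo" scheme described above may make the ordering concrete. It is written in Python rather than kernel C, and everything in it is an assumption made for illustration: the `Bio` and `Submitter` classes, the per-layer `max_sectors` split limits, and the rule that every layer only remaps to the next one. It is a sketch of the idea, not the patch under discussion, and it does not model in-flight limits or blocking.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Bio:
    start: int  # first sector covered by this bio
    size: int   # number of sectors
    layer: int  # index of the queue (qA=0, qB=1, ...) it is addressed to


class Submitter:
    """Hypothetical model of generic_make_request() with two lists."""

    def __init__(self, max_sectors):
        self.max_sectors = max_sectors  # per-layer split limit, top to bottom
        self.fifo = deque()             # recursive submissions, in call order
        self.remainders = []            # split remainders, used as a lifo
        self.active = False
        self.dispatched = []            # (start, size) that reached "hardware"

    def generic_make_request(self, bio):
        if self.active:
            # Called recursively from a make_request_fn: just queue it.
            self.fifo.append(bio)
            return
        self.active = True
        while bio is not None:
            self.make_request_fn(bio)
            if self.fifo:
                # Deeper-level submissions are processed first, in order.
                bio = self.fifo.popleft()
            elif self.remainders:
                # Only then is the most recently stashed remainder resumed.
                bio = self.remainders.pop()
            else:
                bio = None
        self.active = False

    def make_request_fn(self, bio):
        limit = self.max_sectors[bio.layer]
        if bio.size > limit:
            # Split: stash the remainder at the head of the remainder list,
            # continue with the piece.
            self.remainders.append(
                Bio(bio.start + limit, bio.size - limit, bio.layer))
            bio = Bio(bio.start, limit, bio.layer)
        if bio.layer + 1 == len(self.max_sectors):
            self.dispatched.append((bio.start, bio.size))  # lowest layer
        else:
            # Remap to the next lower queue and submit "recursively".
            self.generic_make_request(Bio(bio.start, bio.size, bio.layer + 1))


s = Submitter([4, 2, 8])               # qA splits at 4, qB at 2, qC is hardware
s.generic_make_request(Bio(0, 10, 0))
print(s.dispatched)
```

In this model each piece travels all the way down to the lowest layer before any remainder is looked at again, and a remainder created by a deeper layer is resumed before the top-level one, which is the ordering argued for in the message.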