From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()
From: James Smart
To: Bart Van Assche, Ming Lei, Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig, linux-nvme@lists.infradead.org
References: <20190318032950.17770-1-ming.lei@redhat.com> <20190318032950.17770-2-ming.lei@redhat.com>
Date: Mon, 18 Mar 2019 10:30:57 -0700
List-ID: linux-block@vger.kernel.org

On 3/17/2019 9:09 PM, Bart Van Assche wrote:
> On 3/17/19 8:29 PM, Ming Lei wrote:
>> In NVMe's error handler, the typical steps for tearing down hardware
>> are followed:
>>
>> 1) stop blk_mq hw queues
>> 2) stop the real hw queues
>> 3) cancel in-flight requests via
>>     blk_mq_tagset_busy_iter(tags, cancel_request,
>>       ...)
>> cancel_request():
>>     mark the request as aborted
>>     blk_mq_complete_request(req);
>> 4) destroy real hw queues
>>
>> However, there may be a race between #3 and #4, because
>> blk_mq_complete_request() actually completes the request
>> asynchronously.
>>
>> This patch introduces blk_mq_complete_request_sync() to fix the
>> above race.
>
> Other block drivers wait until outstanding requests have completed by
> calling blk_cleanup_queue() before hardware queues are destroyed. Why
> can't the NVMe driver follow that approach?
>

Speaking for the fabrics transports, not necessarily PCI:

The intent of this looping, which happens immediately after an error is
detected, is to terminate the outstanding requests. Otherwise, the only
recourse is to wait for the I/Os to finish, which they may never do, or
to wait for their upper-level timeouts to expire and terminate them,
which can mean a very long delay. And one of the commands on the admin
queue (a different tag set, but handled the same way), the Async Event
Reporting command, has no timeout at all, so it would never clear
without this looping.

-- james