Subject: Re: [RFC PATCH] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block@vger.kernel.org, Lai Jiangshan, "Paul E . McKenney", Josh Triplett, Bart Van Assche
References: <20200728134938.1505467-1-ming.lei@redhat.com> <20200729102856.GA1563056@T590> <20200729154957.GD1698748@T590> <20200730145325.GA1710335@T590>
From: Sagi Grimberg
Message-ID: <57689a6d-9e6f-bb28-dd5f-f575988e7cb6@grimberg.me>
Date: Thu, 30 Jul 2020 09:10:48 -0700
In-Reply-To: <20200730145325.GA1710335@T590>

>>>>>> In case of BLK_MQ_F_BLOCKING, blk-mq uses SRCU to mark read critical
>>>>>> section during dispatching request, then request queue quiesce is based on
>>>>>> SRCU. What we want to get is low cost added in fast path.
>>>>>>
>>>>>> However, from srcu_read_lock/srcu_read_unlock implementation, not see
>>>>>> it is quicker than percpu refcount, so use percpu_ref to implement
>>>>>> queue quiesce. This usage is cleaner and simpler & enough for implementing
>>>>>> queue quiesce. The main requirement is to make sure all read sections to observe
>>>>>> QUEUE_FLAG_QUIESCED once blk_mq_quiesce_queue() returns.
>>>>>>
>>>>>> Also it becomes much easier to add interface of async queue quiesce.
>>>>>
>>>>> BTW, no obvious IOPS difference is observed with this patch applied when running
>>>>> io on null_blk(blocking, submit_queues=32) in one dual-socket, 32cores system.
>>>>
>>>> Thanks Ming, can you test for non-blocking on the same setup?
>>>
>>> OK, I can do that, but this patch supposes to not affect non-blocking,
>>> care to share your motivation for testing non-blocking?
>>
>> I think it will be a significant improvement to have a single code path.
>> The code will be more robust and we won't need to face issues that are
>> specific for blocking.
>>
>> If the cost is negligible, I think the upside is worth it.
>>
>
> rcu_read_lock and rcu_read_unlock has been proved as efficient enough,
> and I don't think percpu_refcount is better than it, so I'd suggest to
> not switch non-blocking into this way.

It's not a matter of which is better, it's a matter of making the code
more robust because it has a single code path. If the cost of moving to
percpu_ref is negligible, I would suggest moving both; I don't want to
have two completely different mechanisms for blocking vs. non-blocking.

> BTW, in case of blocking, one hctx may dispatch at most one request because there
> is only single .run_work, even though when .queue_rq() is slept, that said
> blk_mq_submit_bio() queues bio in sync style. This way won't be very efficient.
> So percpu_refcount should be good enough for blocking code path, but may not be
> well enough for non-blocking case.

Not sure what you mean. The percpu_ref is taken exactly where RCU is
taken, so I don't see what the difference is.
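
For readers following the thread, the "single code path" idea can be
illustrated with a small userspace sketch. This is not the code from the
RFC patch and it uses no kernel APIs: a plain C11 atomic counter stands in
for the percpu_ref, a boolean stands in for QUEUE_FLAG_QUIESCED, and every
toy_* name is hypothetical. It only shows the property the patch
description asks for: the dispatch side enters and leaves its read section
the same way whether or not the driver may block, and once quiesce has
drained the in-flight sections, every later section observes the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not a kernel structure. */
struct toy_queue {
	atomic_long in_dispatch; /* plays the role of the percpu_ref count */
	atomic_bool quiesced;    /* plays the role of QUEUE_FLAG_QUIESCED */
	atomic_long dispatched;  /* requests handed to the "driver" */
};

void toy_queue_init(struct toy_queue *q)
{
	atomic_init(&q->in_dispatch, 0);
	atomic_init(&q->quiesced, false);
	atomic_init(&q->dispatched, 0);
}

/* Enter the dispatch read section; identical for blocking and non-blocking. */
void toy_dispatch_enter(struct toy_queue *q)
{
	atomic_fetch_add(&q->in_dispatch, 1);
}

/* Leave the dispatch read section. */
void toy_dispatch_exit(struct toy_queue *q)
{
	long prev = atomic_fetch_sub(&q->in_dispatch, 1);

	assert(prev > 0); /* exit must pair with an earlier enter */
	(void)prev;
}

/* One dispatch attempt; returns true if a request was issued. */
bool toy_run_hw_queue(struct toy_queue *q)
{
	bool ran = false;

	toy_dispatch_enter(q);
	if (!atomic_load(&q->quiesced)) {
		/* Stands in for .queue_rq(), which may sleep in the blocking case. */
		atomic_fetch_add(&q->dispatched, 1);
		ran = true;
	}
	toy_dispatch_exit(q);
	return ran;
}

/* Start quiescing: from now on, new read sections see the flag. */
void toy_quiesce_begin(struct toy_queue *q)
{
	atomic_store(&q->quiesced, true);
}

/* True once every read section that might have missed the flag has left. */
bool toy_quiesce_done(struct toy_queue *q)
{
	return atomic_load(&q->in_dispatch) == 0;
}

/* Synchronous quiesce: set the flag, then wait for in-flight sections. */
void toy_quiesce(struct toy_queue *q)
{
	toy_quiesce_begin(q);
	while (!toy_quiesce_done(q)) {
		/* A kernel implementation would sleep on a waitqueue here. */
	}
}

void toy_unquiesce(struct toy_queue *q)
{
	atomic_store(&q->quiesced, false);
}

long toy_dispatched(struct toy_queue *q)
{
	return atomic_load(&q->dispatched);
}
```

Splitting quiesce into toy_quiesce_begin() and toy_quiesce_done() is also a
rough picture of why an async quiesce interface becomes easy to add in this
scheme: a caller could begin quiescing several queues and wait for all of
them afterwards. Whether the shared counter is cheap enough for the
non-blocking fast path is exactly the open question in this thread, and the
sketch says nothing about that cost.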