From: Weiping Zhang
Date: Wed, 18 Nov 2020 13:55:08 +0800
Subject: Re: [PATCH v5 0/2] fix inaccurate io_ticks
To: Ming Lei
Cc: Jens Axboe, Mike Snitzer, mpatocka@redhat.com, linux-block@vger.kernel.org
In-Reply-To: <20201117074039.GA74954@T590>
References: <20201027045411.GA39796@192.168.3.9> <20201117032756.GE56247@T590> <20201117074039.GA74954@T590>
X-Mailing-List: linux-block@vger.kernel.org

On Tue, Nov 17, 2020 at 3:40 PM Ming Lei wrote:
>
> On Tue, Nov 17, 2020 at 12:59:46PM +0800, Weiping Zhang wrote:
> > On Tue, Nov 17, 2020 at 11:28 AM Ming Lei wrote:
> > >
> > > On Tue, Nov 17, 2020 at 11:01:49AM +0800, Weiping Zhang wrote:
> > > > Hi Jens,
> > > >
> > > > Ping
> > >
> > > Hello Weiping,
> > >
> > > Not sure we have to fix this issue: adding blk_mq_queue_inflight()
> > > back to the IO path brings a cost which turns out to be visible, and
> > > I did get a soft lockup report on Azure NVMe because of this kind of
> > > cost.
> > >
> > Have you tested v5? This patch is different from v1: v1 gets the
> > inflight count for each IO, while v5 has changed to get the inflight
> > count once per jiffy.
>
> I meant the issue can be reproduced on kernels before 5b18b5a73760
> ("block: delete part_round_stats and switch to less precise counting").
>
> Also, do we really need to fix this issue? I understand device
> utilization becomes inaccurate at very small load, but is it really
> worth adding runtime load to the fast path to fix it?
>
Hello Ming,

The problem is that it is hard for users to know how busy the disk
really is: a small load shows high utilization, and a heavy load also
shows high utilization, which makes %util meaningless. The following
test case shows a big gap for the same workload:

modprobe null_blk submit_queues=8 queue_mode=2 irqmode=2 completion_nsec=100000

fio -name=test -ioengine=sync -bs=4K -rw=write -filename=/dev/nullb0 \
    -size=100M -time_based=1 -direct=1 -runtime=300 -rate=4m &

                  w/s    w_await    %util
-----------------------------------------------
before patch     1024       0.15      100
after patch      1024       0.15     14.5

I know that for a hyper-speed disk adding such accounting in the fast
path is harmful. Maybe we can add an interface to enable/disable
io_ticks accounting, like what /sys/block/<dev>/queue/iostats does,
e.g. /sys/block/<dev>/queue/iostat_io_ticks: when 0 is written to it,
io_ticks accounting is disabled entirely (a rough sketch is at the end
of this mail). Or is there any other good idea?

> >
> > If for v5, can we reproduce it on null_blk ?
>
> No, I just saw the report on Azure NVMe.
>
> > >
> > > BTW, supposing the io accounting issue needs to be fixed, I am just
> > > wondering why not simply revert 5b18b5a73760 ("block: delete
> > > part_round_stats and switch to less precise counting"); the original
> > > way had worked for decades.
> > >
> > This patch is better than before: it breaks early when it finds
> > inflight io on any cpu; only in the worst case (the io is running on
> > the last cpu) does it iterate over all cpus.
>
> Yes, it's the worst case.

Actually, v5 has two improvements compared to the code before
5b18b5a73760:
1. for io end, v5 does not get the inflight count at all;
2. for io start, v5 only looks for the first inflight io on any cpu,
   so only in the worst case does it do the same amount of work as
   before.

> Please see the following case:
>
> 1) one device has 256 hw queues, the system has 256 cpu cores, and
> each hw queue's depth is 1k.
>
> 2) there isn't any io load on CPUs 0 ~ 254.
>
> 3) heavy io load is run on CPU 255.
>
> So with your trick the code still needs to iterate over hw queues 0 to
> 254, and that load isn't something which can be ignored, especially
> since it is just for io accounting.
>
> Thanks,
> Ming

Thanks
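
P.S. To make the iostat_io_ticks idea above a bit more concrete, here is
a very rough, untested sketch of what I have in mind, modeled on the
existing "iostats" / QUEUE_FLAG_IO_STAT attribute in block/blk-sysfs.c.
QUEUE_FLAG_IO_TICKS, the queue_io_ticks_* helpers and the attribute name
are all made up for illustration; queue_var_show()/queue_var_store(),
blk_queue_flag_set()/blk_queue_flag_clear() and struct queue_sysfs_entry
are the pieces blk-sysfs.c already provides.

/*
 * Illustrative only: mirrors the existing "iostats" attribute, but
 * controls a hypothetical QUEUE_FLAG_IO_TICKS bit instead.
 */
static ssize_t queue_io_ticks_show(struct request_queue *q, char *page)
{
        return queue_var_show(test_bit(QUEUE_FLAG_IO_TICKS,
                                       &q->queue_flags), page);
}

static ssize_t queue_io_ticks_store(struct request_queue *q,
                                    const char *page, size_t count)
{
        unsigned long val;
        ssize_t ret;

        ret = queue_var_store(&val, page, count);
        if (ret < 0)
                return ret;

        if (val)
                blk_queue_flag_set(QUEUE_FLAG_IO_TICKS, q);
        else
                blk_queue_flag_clear(QUEUE_FLAG_IO_TICKS, q);

        return ret;
}

/* the entry would also have to be added to queue_attrs[] */
static struct queue_sysfs_entry queue_io_ticks_entry = {
        .attr   = { .name = "iostat_io_ticks", .mode = 0644 },
        .show   = queue_io_ticks_show,
        .store  = queue_io_ticks_store,
};

With something like this, update_io_ticks() (and the per-cpu inflight
scan that feeds it) could simply return early when the flag is cleared,
so high-speed devices that do not want the extra fast-path cost would
not pay anything for it.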