Subject: Re: [PATCH] iosched: Add i10 I/O Scheduler
To: Ming Lei, Rachit Agarwal
Cc: Jens Axboe, Qizhe Cai, Rachit Agarwal, linux-kernel@vger.kernel.org,
 linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
 Midhul Vuppalapati, Jaehyun Hwang, Rachit Agarwal, Keith Busch,
 Sagi Grimberg, Christoph Hellwig
References: <20201112140752.1554-1-rach4x0r@gmail.com>
 <20201113145912.GA1074955@T590>
From: Sagi Grimberg
Message-ID: <44d5bcb0-689e-50c8-fa8e-a7d2b569f75c@grimberg.me>
Date: Fri, 13 Nov 2020 12:58:10 -0800
In-Reply-To: <20201113145912.GA1074955@T590>

> blk-mq actually has a built-in batching (or sort of) mechanism, which
> is enabled if the hw queue is busy (hctx->dispatch_busy is > 0). We
> use EWMA to compute hctx->dispatch_busy, and it is adaptive, even
> though the implementation is quite coarse. But there should be much
> room to improve, IMO.

You are correct, however nvme-tcp should be getting to
dispatch_busy > 0 IIUC.
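(For reference, the EWMA here is the one maintained by
blk_mq_update_dispatch_busy() in block/blk-mq.c. Below is a minimal
user-space sketch of that update -- it keeps the kernel's weight/factor
constants but simplifies the surrounding types, so treat it as a
paraphrase for illustration, not the actual kernel code:)

#include <stdio.h>
#include <stdbool.h>

#define EWMA_WEIGHT 8  /* as BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT */
#define EWMA_FACTOR 4  /* as BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR */

/* One EWMA step: decay the old average by (WEIGHT-1)/WEIGHT and mix
 * in the new sample; a busy dispatch contributes 1 << EWMA_FACTOR. */
static unsigned int update_dispatch_busy(unsigned int ewma, bool busy)
{
	if (!ewma && !busy)
		return 0;
	ewma *= EWMA_WEIGHT - 1;
	if (busy)
		ewma += 1 << EWMA_FACTOR;
	ewma /= EWMA_WEIGHT;
	return ewma;
}

int main(void)
{
	unsigned int busy_ewma = 0;
	int i;

	/* A run of busy dispatches ramps the average up quickly... */
	for (i = 0; i < 8; i++)
		busy_ewma = update_dispatch_busy(busy_ewma, true);
	printf("after 8 busy dispatches: %u\n", busy_ewma);

	/* ...and idle dispatches decay it back toward zero. */
	for (i = 0; i < 8; i++)
		busy_ewma = update_dispatch_busy(busy_ewma, false);
	printf("after 8 idle dispatches: %u\n", busy_ewma);
	return 0;
}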
> It is reported that this way improves SQ high-end SCSI SSDs very
> much [1], and MMC performance gets improved too [2].
>
> [1] https://lore.kernel.org/linux-block/3cc3e03901dc1a63ef32e036182521af@mail.gmail.com/
> [2] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@mail.gmail.com/

Yes, the guys paid attention to the MMC-related improvements that you
made.

>> The i10 I/O scheduler builds upon recent work [6]. We have tested
>> the i10 I/O scheduler with the nvme-tcp optimizations [2,3] and
>> batching dispatch [4], varying the number of cores, read/write
>> ratios, and request sizes, with NVMe SSDs and a RAM block device.
>> For NVMe SSDs, the i10 I/O scheduler achieves ~60% improvement in
>> IOPS per core over the "noop" I/O scheduler. These results are
>> available at [5], and many additional results are presented in [6].
>
> In case of the none scheduler, the nvme driver basically won't
> provide any queue busy feedback, so the built-in batching dispatch
> simply doesn't work.

Exactly.

> The kyber scheduler uses io latency feedback to throttle and build io
> batches; can you compare i10 with kyber on nvme/nvme-tcp?

I assume it should be simple to get, I'll let Rachit/Jaehyun comment.

>> While other schedulers may also batch I/O (e.g., mq-deadline), the
>> optimization target in the i10 I/O scheduler is throughput
>> maximization. Hence there is no latency target nor a need for a
>> global tracking context, so a new scheduler is needed rather than
>> building this functionality into an existing scheduler.
>>
>> We currently use fixed default values as batching thresholds (e.g.,
>> 16 for #requests, 64KB for #bytes, and 50us for timeout). These
>> default values are based on sensitivity tests in [6]. For our future
>> work, we plan to support adaptive batching according to
>
> Frankly speaking, hardcoding 16 #requests or 64KB may not work
> everywhere, and production environments can be much more complicated
> than your sensitivity tests. If possible, please start with adaptive
> batching.

That was my feedback as well, for sure. But given that this is a
scheduler one would opt in to anyway, that won't be a must-have
initially. I'm not sure if the guys made progress with this yet, I'll
let them comment.
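(To make the thresholds above concrete: as I read the cover letter,
the policy is to hold requests back until any one of the three limits
trips, then dispatch the whole batch. A hypothetical stand-alone
sketch -- the struct, the field names, and the OR-of-thresholds
reading are mine, not code from the patch:)

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-queue batch state. */
struct i10_batch {
	unsigned int nr_reqs;   /* requests currently held back */
	size_t bytes;           /* total bytes held back */
	long oldest_age_ns;     /* age of the oldest held request */
};

/* Default thresholds quoted in the cover letter. */
#define I10_BATCH_NR         16
#define I10_BATCH_BYTES      (64 * 1024)
#define I10_BATCH_TIMEOUT_NS (50 * 1000)  /* 50us */

/* Dispatch the batch as soon as any single threshold trips. */
static bool i10_should_dispatch(const struct i10_batch *b)
{
	return b->nr_reqs >= I10_BATCH_NR ||
	       b->bytes >= I10_BATCH_BYTES ||
	       b->oldest_age_ns >= I10_BATCH_TIMEOUT_NS;
}

int main(void)
{
	/* Only 4 requests queued, but 80KB of data: the byte
	 * threshold trips and the batch goes out. */
	struct i10_batch b = {
		.nr_reqs = 4,
		.bytes = 80 * 1024,
		.oldest_age_ns = 10 * 1000,
	};

	printf("should dispatch: %s\n",
	       i10_should_dispatch(&b) ? "yes" : "no");
	return 0;
}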