Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
From: "Srivatsa S. Bhat"
To: Paolo Valente
Cc: linux-fsdevel@vger.kernel.org, linux-block, linux-ext4@vger.kernel.org,
    cgroups@vger.kernel.org, kernel list, Jens Axboe, Jan Kara,
    jmoyer@redhat.com, Theodore Ts'o, amakhalov@vmware.com,
    anishs@vmware.com, srivatsab@vmware.com
Date: Thu, 30 May 2019 01:29:23 -0700
Message-ID: <0d6e3c02-1952-2177-02d7-10ebeb133940@csail.mit.edu>
List-ID: linux-ext4@vger.kernel.org

On 5/29/19 12:41 AM, Paolo Valente wrote:
>
>
>> On 29 May 2019, at 03:09, Srivatsa S. Bhat wrote:
>>
>> On 5/23/19 11:51 PM, Paolo Valente wrote:
>>>
>>>> On 24 May 2019, at 01:43, Srivatsa S. Bhat wrote:
>>>>
>>>> When trying to run multiple dd tasks simultaneously, I get the
>>>> kernel panic shown below (mainline is fine, without these patches).
>>>>
>>>
>>> Could you please send me the output of
>>> list *(bfq_serv_to_charge+0x21) ?
>>>
>>
>> Hi Paolo,
>>
>> Sorry for the delay! Here you go:
>>
>> (gdb) list *(bfq_serv_to_charge+0x21)
>> 0xffffffff814bad91 is in bfq_serv_to_charge (./include/linux/blkdev.h:919).
>> 914
>> 915     extern unsigned int blk_rq_err_bytes(const struct request *rq);
>> 916
>> 917     static inline unsigned int blk_rq_sectors(const struct request *rq)
>> 918     {
>> 919             return blk_rq_bytes(rq) >> SECTOR_SHIFT;
>> 920     }
>> 921
>> 922     static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
>> 923     {
>> (gdb)
>>
>> For some reason, I've not been able to reproduce this issue after
>> reporting it here. (Perhaps I just got lucky when I hit the kernel
>> panic a bunch of times last week.)
>>
>> I'll test with your fix applied and see how it goes.
>>
>
> Great! The offending line above gives me hope that my fix is correct.
> If no more failures occur, then I'm eager (and a little worried ...)
> to see how it goes with throughput :)
>

Your fix held up well under my testing :)

As for throughput, with low_latency = 1, I get around 1.4 MB/s with
bfq (vs. 1.6 MB/s with mq-deadline). This is a huge improvement
compared to what it was before (70 KB/s).

With tracing on, the throughput is a bit lower (as expected, I guess),
about 1 MB/s, and the corresponding trace file
(trace-waker-detection-1MBps) is available at:

https://www.dropbox.com/s/3roycp1zwk372zo/bfq-traces.tar.gz?dl=0

Thank you so much for your tireless efforts in fixing this issue!

Regards,
Srivatsa
VMware Photon OS
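
P.S. For anyone who wants to reproduce these throughput numbers, the
test is roughly the following (a sketch, not the exact commands I ran;
the device name, mount point, cgroup name, and dd arguments below are
illustrative):

  # Switch the device to bfq and enable low_latency
  echo bfq > /sys/block/sdb/queue/scheduler
  echo 1 > /sys/block/sdb/queue/iosched/low_latency

  # Put the shell into its own blkio cgroup (cgroup v1)
  mkdir /sys/fs/cgroup/blkio/dd-test
  echo $$ > /sys/fs/cgroup/blkio/dd-test/cgroup.procs

  # Small synchronous writes to a file on ext4; one such dd task
  # shows the throughput behavior discussed above, and running
  # several of them simultaneously is what originally triggered
  # the panic
  dd if=/dev/zero of=/mnt/ext4/testfile bs=512 count=10000 oflag=dsync

The throughput figure is then just what dd reports at the end of the
run (or count * bs divided by the elapsed time).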