Subject: Re: [PATCH 7/8] wbt: add general throttling mechanism
From: Jens Axboe
To: Jan Kara, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, dchinner@redhat.com, sedat.dilek@gmail.com
Date: Tue, 3 May 2016 12:14:34 -0600
Message-ID: <5728EA8A.9040405@kernel.dk>
In-Reply-To: <5728D90F.8080204@kernel.dk>

On 05/03/2016 10:59 AM, Jens Axboe wrote:
> On 05/03/2016 09:48 AM, Jan Kara wrote:
>> On Tue 03-05-16 17:40:32, Jan Kara wrote:
>>> On Tue 03-05-16 11:34:10, Jan Kara wrote:
>>>> Yeah, once I hunt down that regression with the old disk, I can
>>>> have a look into how writeback throttling plays together with the
>>>> blkio-controller.
>>>
>>> So I've tried the following script (note that you need cgroup v2 for
>>> writeback IO to be throttled):
>>>
>>> ---
>>> mkdir /sys/fs/cgroup/group1
>>> echo 1000 >/sys/fs/cgroup/group1/io.weight
>>> dd if=/dev/zero of=/mnt/file1 bs=1M count=10000 &
>>> DD1=$!
>>> echo $DD1 >/sys/fs/cgroup/group1/cgroup.procs
>>>
>>> mkdir /sys/fs/cgroup/group2
>>> echo 100 >/sys/fs/cgroup/group2/io.weight
>>> #echo "259:65536 wbps=5000000" >/sys/fs/cgroup/group2/io.max
>>> echo "259:65536 wbps=max" >/sys/fs/cgroup/group2/io.max
>>> dd if=/dev/zero of=/mnt/file2 bs=1M count=10000 &
>>> DD2=$!
>>> echo $DD2 >/sys/fs/cgroup/group2/cgroup.procs
>>>
>>> while true; do
>>>     sleep 1
>>>     kill -USR1 $DD1
>>>     kill -USR1 $DD2
>>>     echo '======================================================='
>>> done
>>> ---
>>>
>>> and watched the progress of the dd processes in the different
>>> cgroups. The 1/10 weight difference has no effect with your
>>> writeback patches - the situation after one minute:
>>>
>>> 3120+1 records in
>>> 3120+1 records out
>>> 3272392704 bytes (3.3 GB) copied, 63.7119 s, 51.4 MB/s
>>> 3217+1 records in
>>> 3217+1 records out
>>> 3374010368 bytes (3.4 GB) copied, 63.5819 s, 53.1 MB/s
>>>
>>> I should add that even without your patches the progress doesn't
>>> quite correspond to the weight ratio:
>>
>> Forgot to fill in the corresponding data for the unpatched kernel
>> here:
>>
>> 5962+2 records in
>> 5962+2 records out
>> 6252281856 bytes (6.3 GB) copied, 64.1719 s, 97.4 MB/s
>> 1502+0 records in
>> 1502+0 records out
>> 1574961152 bytes (1.6 GB) copied, 64.207 s, 24.5 MB/s
>
> Thanks for testing this, I'll see what we can do about that. It stands
> to reason that we'll throttle a heavier writer more, statistically.
> But I'm assuming the above test was run basically with just the writes
> going, so no real competition? And hence we end up throttling them
> equally much, destroying the weighting in the process. But for both
> cases, we basically don't pay any attention to cgroup weights.
>
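To make that condition concrete, the case the commit guards against is
along the lines of the sketch below. This is only an illustration - the
device, mount point, and cgroup name are made-up examples, and the v1
blkio interface shown here is simply the usual way a non-root group
ends up attached to CFQ:

---
# Put CFQ in charge of the disk, then attach a non-root cgroup with a
# proportional weight (all names here are hypothetical examples).
echo cfq > /sys/block/sda/queue/scheduler
mkdir /sys/fs/cgroup/blkio/slow
echo 100 > /sys/fs/cgroup/blkio/slow/blkio.weight
echo $$ > /sys/fs/cgroup/blkio/slow/cgroup.procs
# With the commit above, wbt should now back off for this queue and
# leave the prioritization to CFQ, avoiding the inversion Jan saw.
---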
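As a side note for anyone wanting to reproduce Jan's test: his script
assumes a cgroup2 hierarchy is already mounted with controllers enabled
for child groups. A minimal sketch of that setup (the mount point is
the usual default; cgroup v2 writeback needs the memory controller as
well as io, so dirty pages can be attributed to the right group):

---
# Mount the unified (v2) hierarchy and enable the io and memory
# controllers for child groups; run as root.
mount -t cgroup2 none /sys/fs/cgroup
echo "+io +memory" > /sys/fs/cgroup/cgroup.subtree_control
---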
>>> but still there is a noticeable difference between cgroups with
>>> different weights.
>>>
>>> OTOH blk-throttle combines well with your patches: limiting one
>>> cgroup to 5 MB/s results in numbers like:
>>>
>>> 3883+2 records in
>>> 3883+2 records out
>>> 4072091648 bytes (4.1 GB) copied, 36.6713 s, 111 MB/s
>>> 413+0 records in
>>> 413+0 records out
>>> 433061888 bytes (433 MB) copied, 36.8939 s, 11.7 MB/s
>>>
>>> which is fine and comparable with the unpatched kernel. The higher
>>> throughput number is because we do buffered writes and dd reports
>>> what it wrote into the page cache. And it is no wonder blk-throttle
>>> combines fine - it throttles bios, which happens before we reach the
>>> writeback throttling mechanism.
>
> OK, that's good, at least that part works fine. And yes, the throttle
> path is hit before we end up in the make_request_fn, which is where
> wbt drops in.
>
>>> So I believe this demonstrates that your writeback throttling just
>>> doesn't work well with a selective scheduling policy that happens
>>> below it, because it can essentially lead to IO priority inversion
>>> issues...
>
> Is this testing still done on the QD=1 ATA disk? Not too surprising
> that this falls apart, since we have very little room to maneuver. I
> wonder if a normal SATA drive with NCQ would behave better in this
> regard. I'll have to test a bit and think about how we can best handle
> this case.

I think what we'll do for now is just disable wbt IFF we have a
non-root cgroup attached to CFQ. Done here:

http://git.kernel.dk/cgit/linux-block/commit/?h=wb-buf-throttle&id=7315756efe76bbdf83076fc9dbc569bbb4da5d32

We don't have a strong need for wbt (supposedly) since CFQ should take
care of most of it, if you have policies set for proportional sharing.
Longer term it's not a concern either, as we'll move away from that
model anyway.

-- 
Jens Axboe