From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933262AbcECMkW (ORCPT ); Tue, 3 May 2016 08:40:22 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:49975 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932825AbcECMkT (ORCPT ); Tue, 3 May 2016 08:40:19 -0400 Date: Tue, 3 May 2016 08:40:11 -0400 From: Chris Mason To: Jan Kara CC: Jens Axboe , , , , , Subject: Re: [PATCHSET v5] Make background writeback great again for the first time Message-ID: <20160503124011.igocaapb2nvnjj3o@floor.masoncoding.com> Mail-Followup-To: Chris Mason , Jan Kara , Jens Axboe , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, dchinner@redhat.com, sedat.dilek@gmail.com References: <1461686131-22999-1-git-send-email-axboe@fb.com> <20160427180105.GA17362@quack2.suse.cz> <5721021E.8060006@fb.com> <20160427203708.GA25397@kernel.dk> <20160427205915.GC25397@kernel.dk> <20160428115401.GD17362@quack2.suse.cz> <57225A91.50002@kernel.dk> <20160503121719.GA25436@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20160503121719.GA25436@quack2.suse.cz> User-Agent: Mutt/1.5.23.1 (2014-03-12) X-Originating-IP: [192.168.52.123] X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-05-03_05:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 03, 2016 at 02:17:19PM +0200, Jan Kara wrote: > On Thu 28-04-16 12:46:41, Jens Axboe wrote: > > >>- rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step)); > > >>- rwb->wb_normal = (rwb->wb_max + 1) / 2; > > >>- rwb->wb_background = (rwb->wb_max + 3) / 4; > > >>+ if (rwb->queue_depth == 1) { > > >>+ rwb->wb_max = rwb->wb_normal = 2; > > >>+ rwb->wb_background = 1; > > > > > >This breaks the detection of too big scale_step in scale_up() where we key > > >of wb_max == 1 value. However even with that fixed no luck :(: > > > > Yeah, I need to look at that. For QD=1, I think the only sensible values for > > max/normal/bg is 2/2/1 and 1/1/1 if we step down. > > > > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync > > >Runtime: 105.126 107.125 105.641 > > > > > >So about the same as before. I'll try to debug this later today... > > > > Thanks, I'm very interested in what you find! > > OK, so the reason was relatively standard in the end. I was using ext3 (or > more exactly ext4 without delayed allocation) for the test. The throttling > of background writes gave more priority to writes from the journalling > thread which happen with WRITE_SYNC and thus are not throttled. Thus the > journalling thread ended up having to do more data writeback to be able to > commit a transaction (due to requirements of data=ordered mode) and it is > less efficient at that than the normal flusher thread. > > So this is an example where throttling background writeback effectively > just pushes more work into another context which does it less efficiently > and indirectly makes everyone wait for it. ext3 has been always sensitive to > issues like this. ext4 is using delayed allocation and thus only data > writes into holes end up being part of a transaction -> simple dd test case > doesn't hit that path. And indeed when I repeat the same test with ext4, > the numbers with and without your patch are exactly the same. > > The question remains how common a pattern where throttling of background > writeback delays also something else is. I'll schedule a couple of > benchmarks to measure impact of your patches for a wider range of workloads > (but sadly pretty limited set of hw). If ext3 is the only one seeing > issues, I would be willing to accept that ext3 takes the hit since it is > doing something rather stupid (but inherent in its journal design) and we > have a way to deal with this either by enabling delayed allocation or by > turning off the writeback throttling... At least in the case of io that we know is going to be data=ordered, we can bump the prio of those pages? -chris