From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751518Ab1AWKZf (ORCPT <rfc822;w@1wt.eu>);
	Sun, 23 Jan 2011 05:25:35 -0500
Received: from mail-fx0-f46.google.com ([209.85.161.46]:57346 "EHLO
	mail-fx0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751447Ab1AWKZd (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Sun, 23 Jan 2011 05:25:33 -0500
Date: Sun, 23 Jan 2011 11:25:26 +0100
From: Tejun Heo <tj@kernel.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: axboe@kernel.dk, tytso@mit.edu, djwong@us.ibm.com, shli@kernel.org,
        neilb@suse.de, adilger.kernel@dilger.ca, jack@suse.cz,
        snitzer@redhat.com, linux-kernel@vger.kernel.org, kmannth@us.ibm.com,
        cmm@us.ibm.com, linux-ext4@vger.kernel.org, rwheeler@redhat.com,
        hch@lst.de, josef@redhat.com
Subject: Re: [PATCH 3/3] block: reimplement FLUSH/FUA to support merge
Message-ID: <20110123102526.GA23121@htj.dyndns.org>
References: <1295625598-15203-1-git-send-email-tj@kernel.org>
 <1295625598-15203-4-git-send-email-tj@kernel.org>
 <20110121185617.GI12072@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110121185617.GI12072@redhat.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Fri, Jan 21, 2011 at 01:56:17PM -0500, Vivek Goyal wrote:
> > + * Currently, the following conditions are used to determine when to issue
> > + * flush.
> > + *
> > + * C1. At any given time, only one flush shall be in progress.  This makes
> > + *     double buffering sufficient.
> > + *
> > + * C2. Flush is not deferred if any request is executing DATA of its
> > + *     sequence.  This avoids issuing separate POSTFLUSHes for requests
> > + *     which shared PREFLUSH.
> 
> Tejun, did you mean "Flush is deferred" instead of "Flush is not deferred"
> above?

Oh yeah, I did.  :-)
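To show why C1 makes double buffering sufficient, here is a minimal userspace sketch. It assumes a hypothetical model: struct flush_model, pending_idx, running_idx and the helper functions are illustrative names, not the block layer's code. Requests that arrive while a flush is running are appended to the other list, and the two lists swap roles when that flush completes. Because at most one flush is ever in flight, two lists are always enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of C1 + double buffering; not kernel code. */
#define MAX_REQS 16

struct flush_model {
    int queue[2][MAX_REQS];   /* two pending lists (double buffering) */
    size_t len[2];            /* number of requests in each list */
    int pending_idx;          /* list that new arrivals are appended to */
    int running_idx;          /* list covered by the in-progress flush */
};

static void model_init(struct flush_model *m)
{
    m->len[0] = m->len[1] = 0;
    m->pending_idx = m->running_idx = 0;
}

/* C1: a flush is in flight exactly when the two indices differ. */
static bool flush_in_progress(const struct flush_model *m)
{
    return m->pending_idx != m->running_idx;
}

/* Append a request to whichever list is currently collecting arrivals. */
static void queue_request(struct flush_model *m, int id)
{
    int i = m->pending_idx;

    assert(m->len[i] < MAX_REQS);
    m->queue[i][m->len[i]++] = id;
}

/*
 * Start a flush covering everything in the pending list.  Refuses if a
 * flush is already running (C1) or if nothing is pending.
 */
static bool start_flush(struct flush_model *m)
{
    if (flush_in_progress(m) || m->len[m->pending_idx] == 0)
        return false;
    m->pending_idx ^= 1;      /* later arrivals go to the other list */
    return true;
}

/* Finish the running flush; returns how many requests it covered. */
static size_t complete_flush(struct flush_model *m)
{
    assert(flush_in_progress(m));
    size_t done = m->len[m->running_idx];

    m->len[m->running_idx] = 0;
    m->running_idx ^= 1;      /* indices match again: no flush in flight */
    return done;
}
```

For example, queue two requests and start a flush, then queue a third: the third lands in the other list, and start_flush() refuses to issue again until complete_flush() reports the first two done.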

> IIUC, C2 might help only if requests which contain data are also going to
> issue a POSTFLUSH. A couple of cases come to mind.

That's true.  I didn't want to make it too sophisticated.  I wanted
something fairly mechanical (without intricate parameters) and
effective enough for common cases.

> - If the queue supports FUA, I think we will not issue POSTFLUSH. In that
>   case issuing the next PREFLUSH while data is in flight might make sense.
>
> - Even if the queue does not support FUA, if we are only getting requests
>   with REQ_FLUSH, then waiting for data requests to finish before
>   issuing the next FLUSH might not help either.
> 
> - Even if the queue does not support FUA and we have a mix of REQ_FUA
>   and REQ_FLUSH, this will help only if a batch contains more than
>   one request that is going to issue a POSTFLUSH and those POSTFLUSHes
>   get merged.

Sure, not applying C2 and C3 if the underlying device supports REQ_FUA
would probably be the most compelling change of the bunch; however,
please keep in mind that issuing a flush as soon as possible doesn't
necessarily result in better performance.  It's inherently a balancing
act between latency and throughput.  Even inducing artificial issue
latencies is likely to help if done right (as the ioscheds do).

So, I think it's better to start with something simple and improve it
with actual testing.  If the current simple implementation can match
Darrick's previous numbers, let's settle the mechanisms first.  We can
tune the latency/throughput balance all we want later.  Other than the
double buffering constraint (which can be relaxed too, but I don't
think that would be necessary or a good idea), things can be easily
adjusted in blk_kick_flush().  It's intentionally designed that way.
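As an illustration of the kind of adjustment meant here, this is a sketch of a single decision function in the spirit of blk_kick_flush(). Everything in it is hypothetical: struct flush_state, should_kick_flush() and the skip_c2_if_fua knob are illustrative names, and C3 is not modeled. It only encodes C1 and the corrected C2 ("Flush is deferred if any request is executing DATA of its sequence"), with C2 optionally skipped when the device supports FUA, as suggested in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of queue state; field names are illustrative. */
struct flush_state {
    bool flush_running;      /* C1: a flush is already in flight */
    int  nr_pending;         /* requests waiting for the next flush */
    int  nr_data_in_flight;  /* requests currently executing their DATA step */
    bool dev_has_fua;        /* device honours REQ_FUA natively */
    bool skip_c2_if_fua;     /* tuning knob: ignore C2 on FUA devices */
};

/* Return true if a flush should be issued now. */
static bool should_kick_flush(const struct flush_state *s)
{
    assert(s->nr_pending >= 0 && s->nr_data_in_flight >= 0);

    /* C1: only one flush at a time, and only if someone is waiting. */
    if (s->flush_running || s->nr_pending == 0)
        return false;

    /*
     * C2: defer while DATA is executing, so that requests which shared a
     * PREFLUSH can also share one POSTFLUSH.  With native FUA there is no
     * POSTFLUSH to share, so the knob allows skipping the deferral.
     */
    if (s->nr_data_in_flight > 0 &&
        !(s->skip_c2_if_fua && s->dev_has_fua))
        return false;

    return true;
}
```

Keeping the policy in one predicate means the latency/throughput trade-off can be tuned by changing a single function without touching the sequencing machinery.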

> - Ric Wheeler once mentioned that there are boxes which advertise a
>   writeback cache but are battery backed, so they ignore flushes internally
>   and signal completion immediately. I am not sure how prevalent those
>   cases are, but I think waiting for data to finish will delay processing
>   of new REQ_FLUSH requests in the pending queue for such arrays. There
>   we will not benefit from merging of FLUSH anyway.

I don't really think we should design the whole thing around broken
devices which incorrectly report a writeback cache when they need not.
The correct place to work around that is during device identification,
not in the flush logic.

> Given that C2 is going to benefit primarily only if the queue does not
> support FUA and we have many requests with REQ_FUA set, would it make
> sense to add additional checks for C2? At least a simple "does the queue
> support FUA" check might help.
> 
> In practice, does C2 really help, or can we get rid of it entirely?

Again, issuing flushes as fast as possible isn't necessarily better.
It might feel counter-intuitive, but it generally makes sense to delay
flushes if there is a lot of concurrent flush activity going on.
Another interesting, related point is that with flush merging,
depending on the workload, FUA, even if the device supports it, may
well result in worse performance than a merged DATA + single
POSTFLUSH sequence.
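A back-of-the-envelope count of the commands each strategy sends to the device may help here. It assumes n requests, each carrying data with REQ_FLUSH|REQ_FUA, and in the merged case assumes all n share a single flush sequence. count_cmds() and struct cmd_count are hypothetical helpers; the counts say nothing about actual latency, which depends on the device and is exactly why this needs measurement.

```c
#include <assert.h>
#include <stdbool.h>

/* Commands sent to the device for one batch (hypothetical model). */
struct cmd_count {
    unsigned flushes;     /* PREFLUSH + POSTFLUSH commands */
    unsigned writes;      /* plain DATA writes */
    unsigned fua_writes;  /* DATA writes carrying FUA */
};

/*
 * n requests, each REQ_FLUSH|REQ_FUA with data.
 * merged:  all n share one PREFLUSH (and one POSTFLUSH if needed).
 * use_fua: the device does FUA natively, so no POSTFLUSH is issued.
 */
static struct cmd_count count_cmds(unsigned n, bool merged, bool use_fua)
{
    struct cmd_count c = { 0, 0, 0 };

    if (n == 0)
        return c;

    unsigned seqs = merged ? 1 : n;  /* number of flush sequences */

    c.flushes = seqs;                /* one PREFLUSH per sequence */
    if (use_fua) {
        c.fua_writes = n;            /* DATA carries FUA, no POSTFLUSH */
    } else {
        c.writes = n;
        c.flushes += seqs;           /* one POSTFLUSH per sequence */
    }
    return c;
}
```

For n = 4, the unmerged non-FUA path issues 8 flushes, merging cuts that to 2 (one PREFLUSH and one POSTFLUSH), and the merged FUA path issues 1 flush plus 4 FUA writes. Whether four FUA writes beat four plain writes plus one POSTFLUSH depends on how the device implements FUA.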

Thanks.

-- 
tejun