From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tom Marshall <tom@cyngn.com>
Subject: Re: fs compression
Date: Wed, 27 May 2015 17:20:57 -0700
Message-ID: <20150528002057.GA30909@eden.sea.cyngn.com>
References: <20150513064802.GA48682@jaegeuk-mac02.hsd1.ca.comcast.net>
 <20150514003721.GN15721@dastard>
 <20150516132403.GA2998@thunk.org>
 <20150516171326.GA24795@eden.sea.cyngn.com>
 <20150520174635.GA17651@eden.sea.cyngn.com>
 <20150520213641.GM2871@thunk.org>
 <20150520224630.GA10927@eden.sea.cyngn.com>
 <20150521042819.GA14709@eden.sea.cyngn.com>
 <5566129D.9040509@cyngn.com>
 <20150527233800.GB18540@thunk.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Jaegeuk Kim <jaegeuk@kernel.org>, linux-fsdevel@vger.kernel.org
To: Theodore Ts'o <tytso@mit.edu>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-pa0-f46.google.com ([209.85.220.46]:33365 "EHLO
	mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750922AbbE1AVA (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 27 May 2015 20:21:00 -0400
Received: by padbw4 with SMTP id bw4so9715206pad.0
        for <linux-fsdevel@vger.kernel.org>; Wed, 27 May 2015 17:20:59 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20150527233800.GB18540@thunk.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, May 27, 2015 at 07:38:00PM -0400, Theodore Ts'o wrote:
> On Wed, May 27, 2015 at 11:53:17AM -0700, Tom Marshall wrote:
> > But one thing I'm wrestling with is how to be asynchronously notified when
> > the lower readpage/readpages complete.  The three ideas that come to mind are
> > (1) plumbing a callback into mpage_end_io(), (2) allowing override of
> > mpage_end_io() with a custom function, (3) creating kernel threads analogous
> > to kblockd to wait for pending pages.
> 
> Not all file systems use mpage_end_io(), so that's not a good
> solution.

Ah, thanks, I was not aware of that.  So that leaves waiting on pages, which
probably means a fair amount of plumbing to do correctly.

> You can do something like
> 
> 	wait_on_page_bit(page, PG_uptodate);
> 
> ... although to be robust you will also need to wake up if PG_error is
> set (if there is an I/O error, PG_error is set instead of
> PG_uptodate).  So that means you'd have to spin your own wait function
> using the waitqueue primitives and page_waitqueue(), using
> __wait_on_bit() as an initial model.

Right, that should be pretty easy.
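
A rough userspace model of that wait loop, for illustration only.  This is
not kernel code: struct fake_page, the FAKE_PG_* bits and the helper names
are all hypothetical, and a pthread condition variable stands in for
page_waitqueue().  The point is just the shape of the wait: sleep until
either bit is set, then report which one it was.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for PG_uptodate and PG_error. */
enum { FAKE_PG_UPTODATE = 1 << 0, FAKE_PG_ERROR = 1 << 1 };

/* Hypothetical userspace model of a page with a wait queue. */
struct fake_page {
	pthread_mutex_t lock;
	pthread_cond_t  wq;	/* stands in for page_waitqueue(page) */
	unsigned int    flags;
};

static void fake_page_init(struct fake_page *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->wq, NULL);
	p->flags = 0;
}

/* I/O completion side: set exactly one result bit and wake all waiters. */
static void fake_page_complete(struct fake_page *p, unsigned int bit)
{
	assert(bit == FAKE_PG_UPTODATE || bit == FAKE_PG_ERROR);
	pthread_mutex_lock(&p->lock);
	p->flags |= bit;
	pthread_cond_broadcast(&p->wq);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Waiter side: block until the read has either succeeded or failed.
 * Returns 0 if the page became up to date, -EIO if the error bit was set.
 */
static int fake_wait_on_page_read(struct fake_page *p)
{
	int ret;

	pthread_mutex_lock(&p->lock);
	while (!(p->flags & (FAKE_PG_UPTODATE | FAKE_PG_ERROR)))
		pthread_cond_wait(&p->wq, &p->lock);
	ret = (p->flags & FAKE_PG_ERROR) ? -EIO : 0;
	pthread_mutex_unlock(&p->lock);
	return ret;
}
```

A caller would start the read, call fake_wait_on_page_read(), and only
decompress if it returns 0.  The in-kernel version would presumably replace
the mutex/condvar pair with the waitqueue primitives mentioned above.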

> This suggestion should not be taken as an endorsement of your
> higher-level architecture.  I suggest you think very carefully about
> whether or not you need to be able to support random write
> functionality, and if you don't, there are simpler ways such as the
> one I outlined to you earlier.

I recall this:

> [...] So it's better to have the file system supply the physical location on
> disk, and then to read in the compressed data to a scratch set of pages
> which is freed immediately after you are done decompressing things.

Is that what you're referring to?

If so, I'm not seeing how this makes things simpler.  It's still
asynchronous, right?  ext4_readpage calls back into mpage_readpage which
uses ext4_get_block, which then queues a bio request.  I don't see any way
to avoid queueing asynchronous bio requests or even getting completion
notifications.

Now, I could pass ext4_get_block up into my code and set up my own bio
requests so that I can get callbacks.  But this basically means implementing
the equivalent of do_mpage_readpage in my own code, and that's not really
trivial code to copy/paste/hack.  And it also doesn't address filesystems
that don't use mpage_end_io(), right?

Am I missing something?