linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Miklos Szeredi <miklos@szeredi.hu>
To: Maxim Patlasov <mpatlasov@parallels.com>
Cc: riel@redhat.com, Kirill Korotaev <dev@parallels.com>,
	Pavel Emelianov <xemul@parallels.com>,
	fuse-devel <fuse-devel@lists.sourceforge.net>,
	Brian Foster <bfoster@redhat.com>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>,
	James Bottomley <jbottomley@parallels.com>,
	linux-mm@kvack.org, Al Viro <viro@zeniv.linux.org.uk>,
	Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	fengguang.wu@intel.com, devel@openvz.org,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 10/16] fuse: Implement writepages callback
Date: Fri, 30 Aug 2013 12:12:07 +0200	[thread overview]
Message-ID: <20130830101207.GD19636@tucsk.piliscsaba.szeredi.hu> (raw)
In-Reply-To: <52050474.8040608@parallels.com>

On Fri, Aug 09, 2013 at 07:02:12PM +0400, Maxim Patlasov wrote:
> 08/06/2013 08:25 PM, Miklos Szeredi пишет:
> >Hmm.  Direct IO on an mmaped file will do get_user_pages() which will
> >do the necessary page fault magic and ->page_mkwrite() will be called.
> >At least AFAICS.
> 
> Yes, I agree.
> 
> >
> >The page cannot become dirty through a memory mapping without first
> >switching the pte from read-only to read-write first.  Page accounting
> >logic relies on this too.  The other way the page can become dirty is
> >through write(2) on the fs.  But we do get notified about that too.
> 
> Yes, that's correct, but I don't understand why you disregard two
> other cases of marking page dirty (both related to direct AIO read
> from a file to a memory region mmap-ed to a fuse file):
> 
> 1. dio_bio_submit() -->
>       bio_set_pages_dirty() -->
>         set_page_dirty_lock()
> 
> 2. dio_bio_complete() -->
>       bio_check_pages_dirty() -->
>          bio_dirty_fn() -->
>             bio_set_pages_dirty() -->
>                set_page_dirty_lock()
> 
> As soon as a page became dirty through a memory mapping (exactly as
> you explained), nothing would prevent it to be written-back. And
> fuse will call end_page_writeback almost immediately after copying
> the real page to a temporary one. Then dio_bio_submit may re-dirty
> page speculatively w/o notifying fuse. And again, since then nothing
> would prevent it to be written-back once more. Hence we can end up
> in more then one temporary page in fuse write-back. And similar
> concern for dio_bio_complete() re-dirty.
> 
> This make me think that we do need fuse_page_is_writeback() in
> fuse_writepages_fill(). But it shouldn't be harmful because it will
> no-op practically always due to waiting for fuse writeback in
> ->page_mkwrite() and in course of handling write(2).

The problem is: if we need it in ->writepages, we need it in ->writepage too.
And that's where we can't have it because it would deadlock in reclaim.

There's a way to work around this:

   - if the request is still in queue, just update it with the contents of the
     new page

   - if the request already in userspace, create a new reqest, but only let
     userspace have it once the previous request for the same page completes, so
     the ordering is not messed up

But that's a lot of hairy code.

Any other ideas?

The best would be if we could get rid of the ugly temporary page requirement for
fuse writeback.  The big blocker has always been direct reclaim: allocation must
not wait on fuse writebacks.  AFAICS there's still a wait_on_page_writeback() in
relation to memcg.  And it interacts with page migration which also has them.
Those are the really difficult ones...

The other offender, balance_dirty_pages() is much easier and needs to be tought
about fuse behavior anyway.

Thoughts?

Thanks,
Miklos

  reply	other threads:[~2013-08-30 10:12 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-29 17:41 [PATCH v5 00/16] fuse: An attempt to implement a write-back cache policy Maxim Patlasov
2013-06-29 17:42 ` [PATCH 01/16] fuse: Linking file to inode helper Maxim Patlasov
2013-06-29 17:42 ` [PATCH 02/16] fuse: Getting file for writeback helper Maxim Patlasov
2013-06-29 17:42 ` [PATCH 03/16] fuse: Prepare to handle short reads Maxim Patlasov
2013-06-29 17:42 ` [PATCH 04/16] fuse: Prepare to handle multiple pages in writeback Maxim Patlasov
2013-06-29 17:42 ` [PATCH 05/16] fuse: Connection bit for enabling writeback Maxim Patlasov
2013-06-29 17:44 ` [PATCH 06/16] fuse: Trust kernel i_size only - v4 Maxim Patlasov
2013-06-29 17:44 ` [PATCH 07/16] fuse: Trust kernel i_mtime only Maxim Patlasov
2013-07-11 16:14   ` [PATCH 07/16] fuse: Trust kernel i_mtime only -v2 Maxim Patlasov
2013-06-29 17:45 ` [PATCH 08/16] fuse: Flush files on wb close Maxim Patlasov
2013-07-11 16:18   ` [PATCH 08/16] fuse: Flush files on wb close -v2 Maxim Patlasov
2013-06-29 17:45 ` [PATCH 09/16] fuse: restructure fuse_readpage() Maxim Patlasov
2013-06-29 17:45 ` [PATCH 10/16] fuse: Implement writepages callback Maxim Patlasov
2013-07-19 16:50   ` Miklos Szeredi
2013-08-02 15:40     ` Maxim Patlasov
2013-08-06 16:25       ` Miklos Szeredi
2013-08-09 15:02         ` Maxim Patlasov
2013-08-30 10:12           ` Miklos Szeredi [this message]
2013-08-30 14:50             ` Maxim Patlasov
2013-09-03 10:31               ` Miklos Szeredi
2013-09-03 16:02                 ` Maxim Patlasov
2013-06-29 17:45 ` [PATCH 11/16] fuse: Implement write_begin/write_end callbacks Maxim Patlasov
2013-06-29 17:46 ` [PATCH 12/16] fuse: fuse_writepage_locked() should wait on writeback Maxim Patlasov
2013-06-29 17:46 ` [PATCH 13/16] fuse: fuse_flush() " Maxim Patlasov
2013-06-29 17:46 ` [PATCH 14/16] fuse: Fix O_DIRECT operations vs cached writeback misorder - v2 Maxim Patlasov
2013-06-29 17:47 ` [PATCH 15/16] fuse: Turn writeback cache on Maxim Patlasov
2013-06-29 17:48 ` [PATCH 16/16] mm: strictlimit feature Maxim Patlasov
2013-07-01 21:16   ` Andrew Morton
2013-07-02  8:33     ` Maxim Patlasov
2013-07-02 17:44   ` [PATCH] mm: strictlimit feature -v2 Maxim Patlasov
2013-07-02 19:38     ` Andrew Morton
2013-07-03 11:01       ` Maxim Patlasov
2013-07-03 23:16         ` Jan Kara
2013-07-05 13:14           ` Maxim Patlasov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130830101207.GD19636@tucsk.piliscsaba.szeredi.hu \
    --to=miklos@szeredi.hu \
    --cc=akpm@linux-foundation.org \
    --cc=bfoster@redhat.com \
    --cc=dev@parallels.com \
    --cc=devel@openvz.org \
    --cc=fengguang.wu@intel.com \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=jbottomley@parallels.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mpatlasov@parallels.com \
    --cc=riel@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).