From mboxrd@z Thu Jan 1 00:00:00 1970 From: Maxim Patlasov Subject: Re: [PATCH 10/16] fuse: Implement writepages callback Date: Fri, 9 Aug 2013 19:02:12 +0400 Message-ID: <52050474.8040608@parallels.com> References: <20130629172211.20175.70154.stgit@maximpc.sw.ru> <20130629174525.20175.18987.stgit@maximpc.sw.ru> <20130719165037.GA18358@tucsk.piliscsaba.szeredi.hu> <51FBD2DF.50506@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: quoted-printable Cc: , Kirill Korotaev , Pavel Emelianov , fuse-devel , Brian Foster , Kernel Mailing List , James Bottomley , , Al Viro , Linux-Fsdevel , Andrew Morton , , , Mel Gorman To: Miklos Szeredi Return-path: In-Reply-To: Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org Hi Miklos, 08/06/2013 08:25 PM, Miklos Szeredi =D0=BF=D0=B8=D1=88=D0=B5=D1=82: > On Fri, Aug 2, 2013 at 5:40 PM, Maxim Patlasov wrote: >> 07/19/2013 08:50 PM, Miklos Szeredi =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >> >>> On Sat, Jun 29, 2013 at 09:45:29PM +0400, Maxim Patlasov wrote: >>>> From: Pavel Emelyanov >>>> >>>> The .writepages one is required to make each writeback request carry= more >>>> than >>>> one page on it. The patch enables optimized behaviour unconditionall= y, >>>> i.e. mmap-ed writes will benefit from the patch even if >>>> fc->writeback_cache=3D0. >>> I rewrote this a bit, so we won't have to do the thing in two passes, >>> which >>> makes it simpler and more robust. Waiting for page writeback here is >>> wrong >>> anyway, see comment above fuse_page_mkwrite(). BTW we had a race the= re >>> because >>> fuse_page_mkwrite() didn't take the page lock. I've also fixed that = up >>> and >>> pushed a series containing these patches up to implementing ->writepa= ges() >>> to >>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git >>> writepages >>> >>> Passed some trivial testing but more is needed. >> >> Thanks a lot for efforts. The approach you implemented looks promising= , but >> it introduces the following assumption: a page cannot become dirty bef= ore we >> have a chance to wait on fuse writeback holding the page locked. This = is >> already true for mmap-ed writes (due to your fixes) and it seems doabl= e for >> cached writes as well (like we do in fuse_perform_write). But the assu= mption >> seems to be broken in case of direct read from local fs (e.g. ext4) to= a >> memory region mmap-ed to a file on fuse fs. See how dio_bio_submit() m= arks >> pages dirty by bio_set_pages_dirty(). I can't see any solution for thi= s >> use-case. Do you? > Hmm. Direct IO on an mmaped file will do get_user_pages() which will > do the necessary page fault magic and ->page_mkwrite() will be called. > At least AFAICS. Yes, I agree. > > The page cannot become dirty through a memory mapping without first > switching the pte from read-only to read-write first. Page accounting > logic relies on this too. The other way the page can become dirty is > through write(2) on the fs. But we do get notified about that too. Yes, that's correct, but I don't understand why you disregard two other=20 cases of marking page dirty (both related to direct AIO read from a file=20 to a memory region mmap-ed to a fuse file): 1. dio_bio_submit() --> bio_set_pages_dirty() --> set_page_dirty_lock() 2. dio_bio_complete() --> bio_check_pages_dirty() --> bio_dirty_fn() --> bio_set_pages_dirty() --> set_page_dirty_lock() As soon as a page became dirty through a memory mapping (exactly as you=20 explained), nothing would prevent it to be written-back. And fuse will=20 call end_page_writeback almost immediately after copying the real page=20 to a temporary one. Then dio_bio_submit may re-dirty page speculatively=20 w/o notifying fuse. And again, since then nothing would prevent it to be=20 written-back once more. Hence we can end up in more then one temporary=20 page in fuse write-back. And similar concern for dio_bio_complete()=20 re-dirty. This make me think that we do need fuse_page_is_writeback() in=20 fuse_writepages_fill(). But it shouldn't be harmful because it will=20 no-op practically always due to waiting for fuse writeback in=20 ->page_mkwrite() and in course of handling write(2). Thanks, Maxim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org