From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:41098 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754819AbcG1Jpf convert rfc822-to-8bit (ORCPT ); Thu, 28 Jul 2016 05:45:35 -0400 From: "Benjamin Coddington" To: "Trond Myklebust" Cc: "hch@infradead.org" , "List Linux NFS Mailing" Subject: Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics Date: Thu, 28 Jul 2016 05:47:21 -0400 Message-ID: In-Reply-To: <4F9280AE-46C2-4D1A-8CC9-F28DCFCE6271@primarydata.com> References: <1467844205-76852-23-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-24-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-25-git-send-email-trond.myklebust@primarydata.com> <20160718034847.GA1195@infradead.org> <1468817945.5273.2.camel@primarydata.com> <20160719035843.GA24437@infradead.org> <20160721082216.GA15247@infradead.org> <9E2543EF-0442-491E-AB22-3E1667DABBA4@redhat.com> <70375D08-FCA9-4C73-9991-38490C11B171@redhat.com> <8B8BB73D-39A6-4291-A10E-736B0F907459@primarydata.com> <737F0071-AAD2-49B5-9BD0-02DF98049B76@primarydata.com> <10749CC8-F76E-4DBD-9C27-9802475CF8A5@redhat.com> <3448E968-3564-43C1-ADBB-A16C9687B0F3@primarydata.com> <4CA4361B-2501-48AA-8B2C-715E5A3FD35C@redhat.com> <7EEB460E-F197-4B68-BBD1-7EEF16B77A71@redhat.com> <455341C5-F202-4FA7-8A9B-7AA73CFA21CF@primarydata.com> <06BBFA63-02D9-4BDE-A61D-14FE9F4E9E5F@primarydata.com> <4F9280AE-46C2-4D1A-8CC9-F28DCFCE6271@primarydata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 27 Jul 2016, at 14:05, Trond Myklebust wrote: >> On Jul 27, 2016, at 12:14, Benjamin Coddington >> wrote: >> >> On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >> >>>> On Jul 27, 2016, at 08:15, Trond Myklebust >>>> wrote: >>>> >>>> >>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington >>>>> wrote: >>>>> >>>>> After adding more debugging, I see that all of that is working >>>>> correctly, >>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 >>>>> from the >>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never >>>>> brings it back >>>>> up again. >>>>> >>>> >>>> Excellent! Thanks for debugging that. >>>> >>>>> Now I see that we should be marking the block extents as written >>>>> atomically with >>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a >>>>> LAYOUTCOMMIT can >>>>> collect extents just added from the next bl_write_cleanup(). >>>>> Then, the next >>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the >>>>> first >>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole >>>>> fix, but >>>>> that's something to work on. >>>>> >>>>> I see ways to fix that: >>>>> >>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to >>>>> call >>>>> ext_tree_mark_written() inside the i_lock >>>>> >>>>> - make another pnfs_layoutdriver_type operation to be used within >>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), >>>>> and call >>>>> ext_tree_mark_written() within that.. >>>>> >>>>> - have .prepare_layoutcommit return a new positive plh_lwb that >>>>> would >>>>> extend the current LAYOUTCOMMIT >>>>> >>>>> - make ext_tree_prepare_commit only encode up to plh_lwb >>>> >>>> I see no reason why ext_tree_prepare_commit() shouldn’t be >>>> allowed to extend the args->lastbytewritten. This is a metadata >>>> operation that is owned by the pNFS layout driver. >>>> The only thing I’d note is you should then rewrite the failure >>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on the >>>> saved “end_pos”, but uses args->lastbytewritten instead (with a >>>> comment to the effect why)… >>> >>> In fact, given the potential for races here, I think the right thing >>> to do is to have ext_tree_prepare_commit() always set the correct >>> value for args->lastbytewritten. >> >> OK, that has cleared up that common failure case that was getting in >> the >> way, but now it can still fail like this: >> > > Good progress! :-) > >> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR >> set, and sets NFS_INO_LAYOUTCOMMIT >> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears >> layoutcommit flag sets NFS_INO_LAYOUTCOMMITING >> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR >> set, and sets NFS_INO_LAYOUTCOMMIT >> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, >> NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING >> 1st nfs_getattr -> __revalidate_inode sets size 4096, >> NFS_INO_INVALID_ATTR not set.. cache is valid >> 2nd nfs_getattr immediately returns 4096 even though >> NFS_INO_LAYOUTCOMMIT > > Is this being tested on top of the current linux-next/testing? > Normally, I’d expect > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed > to cause 1st nfs_getattr() to not declare the cache valid. Yes, this is on your linux-next branch. When the 1st nfs_getattr() goes through nfs_update_inode() the second time (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything, since all the attributes returned match the cache. So even though NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false", the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable. Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR? Ben