From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:45064 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932411AbcG1Pgd convert rfc822-to-8bit (ORCPT ); Thu, 28 Jul 2016 11:36:33 -0400 From: "Benjamin Coddington" To: "Trond Myklebust" Cc: "hch@infradead.org" , "List Linux NFS Mailing" Subject: Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics Date: Thu, 28 Jul 2016 11:38:18 -0400 Message-ID: <82A0BF81-D7D4-4972-B331-0BA60529E5BC@redhat.com> In-Reply-To: <54CFDC65-76C2-46AE-9401-861EA4AA8127@primarydata.com> References: <1467844205-76852-23-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-24-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-25-git-send-email-trond.myklebust@primarydata.com> <20160718034847.GA1195@infradead.org> <1468817945.5273.2.camel@primarydata.com> <20160719035843.GA24437@infradead.org> <20160721082216.GA15247@infradead.org> <9E2543EF-0442-491E-AB22-3E1667DABBA4@redhat.com> <70375D08-FCA9-4C73-9991-38490C11B171@redhat.com> <8B8BB73D-39A6-4291-A10E-736B0F907459@primarydata.com> <737F0071-AAD2-49B5-9BD0-02DF98049B76@primarydata.com> <10749CC8-F76E-4DBD-9C27-9802475CF8A5@redhat.com> <3448E968-3564-43C1-ADBB-A16C9687B0F3@primarydata.com> <4CA4361B-2501-48AA-8B2C-715E5A3FD35C@redhat.com> <7EEB460E-F197-4B68-BBD1-7EEF16B77A71@redhat.com> <455341C5-F202-4FA7-8A9B-7AA73CFA21CF@primarydata.com> <06BBFA63-02D9-4BDE-A61D-14FE9F4E9E5F@primarydata.com> <4F9280AE-46C2-4D1A-8CC9-F28DCFCE6271@primarydata.com> <4FC77AD6-40D9-48E4-BF08-0EA03334BDC7@primarydata.com> <54CFDC65-76C2-46AE-9401-861EA4AA8127@primarydata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 28 Jul 2016, at 10:04, Trond Myklebust wrote: >> On Jul 28, 2016, at 08:31, Trond Myklebust >> wrote: >> >> >>> On Jul 28, 2016, at 05:47, Benjamin Coddington >>> wrote: >>> >>> >>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote: >>> >>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington >>>>> wrote: >>>>> >>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >>>>> >>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington >>>>>>>> wrote: >>>>>>>> >>>>>>>> After adding more debugging, I see that all of that is working >>>>>>>> correctly, >>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 >>>>>>>> from the >>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never >>>>>>>> brings it back >>>>>>>> up again. >>>>>>>> >>>>>>> >>>>>>> Excellent! Thanks for debugging that. >>>>>>> >>>>>>>> Now I see that we should be marking the block extents as >>>>>>>> written atomically with >>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a >>>>>>>> LAYOUTCOMMIT can >>>>>>>> collect extents just added from the next bl_write_cleanup(). >>>>>>>> Then, the next >>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from >>>>>>>> the first >>>>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole >>>>>>>> fix, but >>>>>>>> that's something to work on. >>>>>>>> >>>>>>>> I see ways to fix that: >>>>>>>> >>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to >>>>>>>> call >>>>>>>> ext_tree_mark_written() inside the i_lock >>>>>>>> >>>>>>>> - make another pnfs_layoutdriver_type operation to be used >>>>>>>> within >>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), >>>>>>>> and call >>>>>>>> ext_tree_mark_written() within that.. >>>>>>>> >>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that >>>>>>>> would >>>>>>>> extend the current LAYOUTCOMMIT >>>>>>>> >>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb >>>>>>> >>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be >>>>>>> allowed to extend the args->lastbytewritten. This is a metadata >>>>>>> operation that is owned by the pNFS layout driver. >>>>>>> The only thing I’d note is you should then rewrite the failure >>>>>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on >>>>>>> the saved “end_pos”, but uses args->lastbytewritten instead >>>>>>> (with a comment to the effect why)… >>>>>> >>>>>> In fact, given the potential for races here, I think the right >>>>>> thing to do is to have ext_tree_prepare_commit() always set the >>>>>> correct value for args->lastbytewritten. >>>>> >>>>> OK, that has cleared up that common failure case that was getting >>>>> in the >>>>> way, but now it can still fail like this: >>>>> >>>> >>>> Good progress! :-) >>>> >>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR >>>>> set, and sets NFS_INO_LAYOUTCOMMIT >>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears >>>>> layoutcommit flag sets NFS_INO_LAYOUTCOMMITING >>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR >>>>> set, and sets NFS_INO_LAYOUTCOMMIT >>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, >>>>> NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING >>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, >>>>> NFS_INO_INVALID_ATTR not set.. cache is valid >>>>> 2nd nfs_getattr immediately returns 4096 even though >>>>> NFS_INO_LAYOUTCOMMIT >>>> >>>> Is this being tested on top of the current linux-next/testing? >>>> Normally, I’d expect >>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed >>>> to cause 1st nfs_getattr() to not declare the cache valid. >>> >>> Yes, this is on your linux-next branch. >>> >>> When the 1st nfs_getattr() goes through nfs_update_inode() the >>> second time >>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by >>> anything, >>> since all the attributes returned match the cache. So even though >>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is >>> "false", >>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local >>> variable. >>> >>> Should pnfs_layoutcommit_outstanding() always set >>> NFS_INO_INVALID_ATTR? >>> >>> Ben >> >> nfs_post_op_update_inode_locked() should be doing that as part of the >> callchain in nfs_writeback_update_inode(). >> > > By the way. I just noticed that nothing appears to be using the > attributes we retrieve as part of the layoutcommit call. Does adding a > nfs_refresh_inode() to the “success” path in > nfs4_layoutcommit_done() perhaps help? We do it in layoutcommit_release: nfs4_layoutcommit_done [nfsv4]() { ... } nfs4_layoutcommit_release [nfsv4]() { ... nfs_post_op_update_inode_force_wcc [nfs]() { nfs_post_op_update_inode_force_wcc_locked [nfs]() { nfs_post_op_update_inode_locked [nfs]() { nfs4_have_delegation [nfsv4]() { nfs4_do_check_delegation [nfsv4](); } nfs_refresh_inode_locked [nfs]() { nfs_update_inode [nfs]() { Should I still try adding it in nfs4_layoutcommit_done()? Ben