From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Hellwig
Subject: Re: [PATCH 17/18] xfs: implement pnfs export operations
Date: Wed, 7 Jan 2015 11:40:10 +0100
Message-ID: <20150107104010.GD28783@lst.de>
References: <1420561721-9150-1-git-send-email-hch@lst.de> <1420561721-9150-18-git-send-email-hch@lst.de> <20150107002434.GG31508@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: Christoph Hellwig , "J. Bruce Fields" , Jeff Layton ,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org
To: Dave Chinner
Return-path:
Content-Disposition: inline
In-Reply-To: <20150107002434.GG31508@dastard>
Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, Jan 07, 2015 at 11:24:34AM +1100, Dave Chinner wrote:
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index fdc6422..2b86be8 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -601,6 +601,8 @@ xfs_growfs_data(
> >  	if (!mutex_trylock(&mp->m_growlock))
> >  		return -EWOULDBLOCK;
> >  	error = xfs_growfs_data_private(mp, in);
> > +	if (!error)
> > +		mp->m_generation++;
> >  	mutex_unlock(&mp->m_growlock);
> >  	return error;
> >  }
> 
> I couldn't find an explanation of what this generation number is
> for. What are its semantics w.r.t. server crashes?

The generation is incremented when we grow the filesystem, so that a
new layout (block mapping) returned to the client refers to the new
NFS device ID, which will make the client aware of the new size.

The device IDs aren't persistent, so after a server crash / reboot
we'll start at zero again.

I'll add comments explaining this to the code.

> Why does this function get passed an offset that is not actually used?

Historic reasons..
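To make the device-generation semantics concrete, here is a minimal
userspace sketch of the handshake (all names here are illustrative
inventions, not actual XFS or NFS client code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the device-generation handshake.  The server
 * bumps a generation counter on each successful growfs; every layout
 * it hands out carries the current generation, and a client that sees
 * a generation it does not recognise must re-read the device
 * information to learn the new size. */

struct server_state {
	uint32_t m_generation;		/* bumped on each successful growfs */
};

struct client_state {
	uint32_t cached_generation;	/* generation of the device info we hold */
	bool	 device_info_stale;	/* set when device info must be re-read */
};

/* Server side: growfs succeeded, so layouts handed out from now on
 * refer to a new device ID. */
static void server_growfs(struct server_state *s)
{
	s->m_generation++;
}

/* Client side: compare the generation carried by a fresh layout with
 * the cached one; on mismatch, mark the device info stale (in real
 * pNFS this would trigger a GETDEVICEINFO round trip). */
static void client_check_layout(struct client_state *c, uint32_t layout_gen)
{
	if (layout_gen != c->cached_generation) {
		c->device_info_stale = true;
		c->cached_generation = layout_gen;
	}
}
```

The sketch only models the compare-and-refresh behaviour; it also shows
why the non-persistent counter is fine: after a reboot all layouts are
invalid anyway, so clients re-fetch device info regardless of where the
counter restarts.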
> > +static int
> > +xfs_fs_update_flags(
> > +	struct xfs_inode	*ip)
> > +{
> > +	struct xfs_mount	*mp = ip->i_mount;
> > +	struct xfs_trans	*tp;
> > +	int			error;
> > +
> > +	/*
> > +	 * Update the mode, and prealloc flag bits.
> > +	 */
> > +	tp = xfs_trans_alloc(mp, XFS_TRANS_WRITEID);
> > +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_writeid, 0, 0);
> > +	if (error) {
> > +		xfs_trans_cancel(tp, 0);
> > +		return error;
> > +	}
> > +
> > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > +	ip->i_d.di_mode &= ~S_ISUID;
> > +	if (ip->i_d.di_mode & S_IXGRP)
> > +		ip->i_d.di_mode &= ~S_ISGID;
> > +
> > +	ip->i_d.di_flags |= XFS_DIFLAG_PREALLOC;
> > +
> > +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > +	return xfs_trans_commit(tp, 0);
> > +}
> 
> That needs timestamp changes as well. i.e.:
> 
> 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);

The timestamps are only updated when we actually commit the data.
Updating them here might be harmless, but I'll have to dig into the
protocol specification and tests a bit more to check whether the
additional timestamp update would cause any problems.

> > +
> > +/*
> > + * Get a layout for the pNFS client.
> > + *
> > + * Note that in the allocation case we do force out the transaction here.
> > + * There is no metadata update that is required to be stable for NFS
> > + * semantics, and layouts are not valid over a server crash.  Instead
> > + * we'll have to be careful in the commit routine as it might pass us
> > + * blocks for an allocation that never made it to disk in the recovery
> > + * case.
> 
> I think you are saying that because block allocation is an async
> transaction, then we have to deal with the possibility that we crash
> before the transaction hits the disk.
> 
> How often do we have to allocate new blocks like this? Do we need to
> use async transactions for this case, or should we simply do the
> brute force thing (by making the allocation transaction synchronous)
> initially and then, if performance problems arise, optimise from
> there?

Every block allocation from a pNFS client goes through this path, so
yes it is performance critical.

> > +	xfs_map_iomap(ip, iomap, &imap, offset);
> > +	*device_generation = mp->m_generation;
> 
> So whenever the server first starts up the generation number in a
> map is going to be zero - what purpose does this actually serve?

So that we can communicate to the client that a device was grown, in
which case the client needs to re-read the device information.

> > +		if (!length)
> > +			continue;
> > +
> > +		error = xfs_iomap_write_unwritten(ip, start, length);
> > +		if (error)
> > +			goto out_drop_iolock;
> > +	}
> > +
> > +	/*
> > +	 * Make sure reads through the pagecache see the new data.
> > +	 */
> > +	invalidate_inode_pages2(inode->i_mapping);
> 
> Probably should do that first. Also, what happens if there is local
> dirty data on the file at this point? Doesn't this just toss them
> away?

If there was local data it will be tossed.  For regular writes this
can't happen because we recall outstanding layouts in the write path.
For mmap we ignore this problem for now, as a pNFS server should
generally not be used locally.
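The write-path interaction described above can be sketched as a toy
userspace model (purely illustrative; none of these names exist in the
kernel, and the real mechanism involves layout recalls over the NFS
callback channel):

```c
/* Toy model: a local buffered write first recalls all outstanding
 * client layouts, so by the time local dirty pages exist no client
 * holds a layout whose commit could invalidate them.  An mmap write
 * skips the recall, which is the acknowledged gap. */

struct pnfs_file {
	int outstanding_layouts;	/* layouts handed out to pNFS clients */
	int dirty_pages;		/* locally dirtied pagecache pages */
};

/* Local buffered write: recall completes before any page is dirtied. */
static void local_write(struct pnfs_file *f)
{
	f->outstanding_layouts = 0;
	f->dirty_pages++;
}

/* mmap write: no recall barrier, dirty pages coexist with layouts. */
static void mmap_write(struct pnfs_file *f)
{
	f->dirty_pages++;
}

/* Layout commit: after converting unwritten extents, invalidate the
 * pagecache (the invalidate_inode_pages2() analogue); returns how many
 * locally dirtied pages were tossed. */
static int layout_commit(struct pnfs_file *f)
{
	int tossed = f->dirty_pages;

	f->dirty_pages = 0;
	return tossed;
}
```

The point of the model: for the buffered-write path, `layout_commit()`
can never find locally dirtied pages, while the mmap path has no such
barrier, matching the trade-off accepted above.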
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html