* I can't get no readdir satisfaction @ 2016-08-23 15:09 Benjamin Coddington 2016-08-23 15:36 ` Trond Myklebust 2016-08-23 19:36 ` J. Bruce Fields 0 siblings, 2 replies; 19+ messages in thread From: Benjamin Coddington @ 2016-08-23 15:09 UTC (permalink / raw) To: List Linux NFS Mailing Hi linux-nfs, 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -l' situations") changed when nfs_readdir() decides to revalidate the directory's mapping, which contains all the entries. In addition to just checking if the attribute cache has expired, it includes a check to see if NFS_INO_INVALID_DATA is set on the directory. Well, customers that have directories with very many dentries and that same directory's attributes are frequently updated are now grouchy that `ls -l` takes so long since any update of the directory causes the mapping to be invalidated and we have to start over filling the directory's mapping. I actually haven't put real hard thought into it yet (because often for me that just wastes a lot of time), so I am doing the lazy thing by asking this question: Can we go back to just the using the attribute cache timeout, or should we get all heuristical about readdir? Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-23 15:09 I can't get no readdir satisfaction Benjamin Coddington @ 2016-08-23 15:36 ` Trond Myklebust 2016-08-23 21:21 ` Benjamin Coddington 2016-08-23 19:36 ` J. Bruce Fields 1 sibling, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2016-08-23 15:36 UTC (permalink / raw) To: Coddington Benjamin; +Cc: List Linux NFS Mailing > On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@redhat.com> wrot= e: >=20 > Hi linux-nfs, >=20 > 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -l' > situations") changed when nfs_readdir() decides to revalidate the > directory's mapping, which contains all the entries. In addition to just > checking if the attribute cache has expired, it includes a check to see i= f > NFS_INO_INVALID_DATA is set on the directory. >=20 > Well, customers that have directories with very many dentries and that sa= me > directory's attributes are frequently updated are now grouchy that `ls -l= ` > takes so long since any update of the directory causes the mapping to be > invalidated and we have to start over filling the directory's mapping. >=20 > I actually haven't put real hard thought into it yet (because often for m= e > that just wastes a lot of time), so I am doing the lazy thing by asking t= his > question: >=20 > Can we go back to just the using the attribute cache timeout, or should w= e > get all heuristical about readdir? >=20 We are all heuristical at this point. How are the heuristics failing? The original problem those heuristics were designed to solve was that all t= he stat() calls took forever to complete, since they are all synchronous; T= igran showed some very convincing numbers for a large directory where the d= ifference in performance was an order of magnitude improved by using readdi= rplus instead of readdir=85 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-23 15:36 ` Trond Myklebust @ 2016-08-23 21:21 ` Benjamin Coddington 2016-08-24 12:18 ` Trond Myklebust 2016-08-24 13:02 ` Benjamin Coddington 0 siblings, 2 replies; 19+ messages in thread From: Benjamin Coddington @ 2016-08-23 21:21 UTC (permalink / raw) To: Trond Myklebust; +Cc: List Linux NFS Mailing On 23 Aug 2016, at 11:36, Trond Myklebust wrote: >> On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> Hi linux-nfs, >> >> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls >> -l' >> situations") changed when nfs_readdir() decides to revalidate the >> directory's mapping, which contains all the entries. In addition to >> just >> checking if the attribute cache has expired, it includes a check to >> see if >> NFS_INO_INVALID_DATA is set on the directory. >> >> Well, customers that have directories with very many dentries and >> that same >> directory's attributes are frequently updated are now grouchy that >> `ls -l` >> takes so long since any update of the directory causes the mapping to >> be >> invalidated and we have to start over filling the directory's >> mapping. >> >> I actually haven't put real hard thought into it yet (because often >> for me >> that just wastes a lot of time), so I am doing the lazy thing by >> asking this >> question: >> >> Can we go back to just the using the attribute cache timeout, or >> should we >> get all heuristical about readdir? >> > > We are all heuristical at this point. How are the heuristics failing? > > The original problem those heuristics were designed to solve was that > all > the stat() calls took forever to complete, since they are all > synchronous; > Tigran showed some very convincing numbers for a large directory where > the > difference in performance was an order of magnitude improved by using > readdirplus instead of readdir… I'll try to present a better explanation. While `ls -l` is walking through a directory repeatedly entering nfs_readdir(), a CREATE response send us through nfs_post_op_update_inode_locked(): 1531 if (S_ISDIR(inode->i_mode)) 1532 invalid |= NFS_INO_INVALID_DATA; 1533 nfs_set_cache_invalid(inode, invalid); Now, the next entry into nfs_readdir() has us do nfs_revalidate_mapping(), which will do nfs_invalidate_mapping() for the directory, and so we have to start over with cookie 0 sending READDIRPLUS to re-fill the directory's mapping to get back to where we are for the current nfs_readdir(). This process repeats for every entry into nfs_readdir() if the directory keeps getting updated, and it becomes more likely that it will be updated as each pass takes longer and longer to re-fill the mapping as the current nfs_readdir() invocation is further along. READDIRPLUS isn't the problem, the problem is invalidating the directory mapping in the middle of a series of getdents() if we do a CREATE. Also, I think a similar thing happens if the directory's mtime or ctime is updated - but in that case we set NFS_INO_INVALID_DATA because the change_attr updates. So, for directories with a large number of entries that updates often, it can be very slow to list the directory. Why did 311324ad1713 change nfs_readdir from if (nfs_attribute_cache_expired(inode)) nfs_revalidate_mapping(inode, file->f_mapping); to if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & NFS_INO_INVALID_DATA) nfs_revalidate_mapping(inode, file->f_mapping); .. and can we go back to the way it was before? OK.. I understand why -- it is more correct since if we know the directory has changed, we might as well fetch the change. Otherwise, we might be creating files and then wondering why they aren't listed. It might be nicer to not invalidate the mapping we're currently using for readdir, though. Maybe there's a way to keep the mapping for the currently opened directory and invalidate it once it's closed. I'm using v3 for all this, but I think the problem is the same for v4. Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-23 21:21 ` Benjamin Coddington @ 2016-08-24 12:18 ` Trond Myklebust 2016-08-24 13:15 ` Benjamin Coddington 2016-08-24 13:56 ` J. Bruce Fields 2016-08-24 13:02 ` Benjamin Coddington 1 sibling, 2 replies; 19+ messages in thread From: Trond Myklebust @ 2016-08-24 12:18 UTC (permalink / raw) To: Coddington Benjamin; +Cc: List Linux NFS Mailing DQo+IE9uIEF1ZyAyMywgMjAxNiwgYXQgMTc6MjEsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2Rk aW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPiANCj4gDQo+IA0KPiBPbiAyMyBBdWcgMjAxNiwgYXQg MTE6MzYsIFRyb25kIE15a2xlYnVzdCB3cm90ZToNCj4gDQo+Pj4gT24gQXVnIDIzLCAyMDE2LCBh dCAxMTowOSwgQmVuamFtaW4gQ29kZGluZ3RvbiA8YmNvZGRpbmdAcmVkaGF0LmNvbT4gd3JvdGU6 DQo+Pj4gDQo+Pj4gSGkgbGludXgtbmZzLA0KPj4+IA0KPj4+IDMxMTMyNGFkMTcxMyAoIk5GUzog QmUgbW9yZSBhZ2dyZXNzaXZlIGluIHVzaW5nIHJlYWRkaXJwbHVzIGZvciAnbHMgLWwnDQo+Pj4g c2l0dWF0aW9ucyIpIGNoYW5nZWQgd2hlbiBuZnNfcmVhZGRpcigpIGRlY2lkZXMgdG8gcmV2YWxp ZGF0ZSB0aGUNCj4+PiBkaXJlY3RvcnkncyBtYXBwaW5nLCB3aGljaCBjb250YWlucyBhbGwgdGhl IGVudHJpZXMuICBJbiBhZGRpdGlvbiB0byBqdXN0DQo+Pj4gY2hlY2tpbmcgaWYgdGhlIGF0dHJp YnV0ZSBjYWNoZSBoYXMgZXhwaXJlZCwgaXQgaW5jbHVkZXMgYSBjaGVjayB0byBzZWUgaWYNCj4+ PiBORlNfSU5PX0lOVkFMSURfREFUQSBpcyBzZXQgb24gdGhlIGRpcmVjdG9yeS4NCj4+PiANCj4+ PiBXZWxsLCBjdXN0b21lcnMgdGhhdCBoYXZlIGRpcmVjdG9yaWVzIHdpdGggdmVyeSBtYW55IGRl bnRyaWVzIGFuZCB0aGF0IHNhbWUNCj4+PiBkaXJlY3RvcnkncyBhdHRyaWJ1dGVzIGFyZSBmcmVx dWVudGx5IHVwZGF0ZWQgYXJlIG5vdyBncm91Y2h5IHRoYXQgYGxzIC1sYA0KPj4+IHRha2VzIHNv IGxvbmcgc2luY2UgYW55IHVwZGF0ZSBvZiB0aGUgZGlyZWN0b3J5IGNhdXNlcyB0aGUgbWFwcGlu ZyB0byBiZQ0KPj4+IGludmFsaWRhdGVkIGFuZCB3ZSBoYXZlIHRvIHN0YXJ0IG92ZXIgZmlsbGlu ZyB0aGUgZGlyZWN0b3J5J3MgbWFwcGluZy4NCj4+PiANCj4+PiBJIGFjdHVhbGx5IGhhdmVuJ3Qg cHV0IHJlYWwgaGFyZCB0aG91Z2h0IGludG8gaXQgeWV0IChiZWNhdXNlIG9mdGVuIGZvciBtZQ0K Pj4+IHRoYXQganVzdCB3YXN0ZXMgYSBsb3Qgb2YgdGltZSksIHNvIEkgYW0gZG9pbmcgdGhlIGxh enkgdGhpbmcgYnkgYXNraW5nIHRoaXMNCj4+PiBxdWVzdGlvbjoNCj4+PiANCj4+PiBDYW4gd2Ug Z28gYmFjayB0byBqdXN0IHRoZSB1c2luZyB0aGUgYXR0cmlidXRlIGNhY2hlIHRpbWVvdXQsIG9y IHNob3VsZCB3ZQ0KPj4+IGdldCBhbGwgaGV1cmlzdGljYWwgYWJvdXQgcmVhZGRpcj8NCj4+PiAN Cj4+IA0KPj4gV2UgYXJlIGFsbCBoZXVyaXN0aWNhbCBhdCB0aGlzIHBvaW50LiBIb3cgYXJlIHRo ZSBoZXVyaXN0aWNzIGZhaWxpbmc/DQo+PiANCj4+IFRoZSBvcmlnaW5hbCBwcm9ibGVtIHRob3Nl IGhldXJpc3RpY3Mgd2VyZSBkZXNpZ25lZCB0byBzb2x2ZSB3YXMgdGhhdCBhbGwNCj4+IHRoZSBz dGF0KCkgY2FsbHMgdG9vayBmb3JldmVyIHRvIGNvbXBsZXRlLCBzaW5jZSB0aGV5IGFyZSBhbGwg c3luY2hyb25vdXM7DQo+PiBUaWdyYW4gc2hvd2VkIHNvbWUgdmVyeSBjb252aW5jaW5nIG51bWJl cnMgZm9yIGEgbGFyZ2UgZGlyZWN0b3J5IHdoZXJlIHRoZQ0KPj4gZGlmZmVyZW5jZSBpbiBwZXJm b3JtYW5jZSB3YXMgYW4gb3JkZXIgb2YgbWFnbml0dWRlIGltcHJvdmVkIGJ5IHVzaW5nDQo+PiBy ZWFkZGlycGx1cyBpbnN0ZWFkIG9mIHJlYWRkaXLigKYNCj4gDQo+IEknbGwgdHJ5IHRvIHByZXNl bnQgYSBiZXR0ZXIgZXhwbGFuYXRpb24uICBXaGlsZSBgbHMgLWxgIGlzIHdhbGtpbmcgdGhyb3Vn aA0KPiBhIGRpcmVjdG9yeSByZXBlYXRlZGx5IGVudGVyaW5nIG5mc19yZWFkZGlyKCksIGEgQ1JF QVRFIHJlc3BvbnNlIHNlbmQgdXMNCj4gdGhyb3VnaCBuZnNfcG9zdF9vcF91cGRhdGVfaW5vZGVf bG9ja2VkKCk6DQo+IA0KPiAxNTMxICAgICBpZiAoU19JU0RJUihpbm9kZS0+aV9tb2RlKSkNCj4g MTUzMiAgICAgICAgIGludmFsaWQgfD0gTkZTX0lOT19JTlZBTElEX0RBVEE7DQo+IDE1MzMgICAg IG5mc19zZXRfY2FjaGVfaW52YWxpZChpbm9kZSwgaW52YWxpZCk7DQo+IA0KPiBOb3csIHRoZSBu ZXh0IGVudHJ5IGludG8gbmZzX3JlYWRkaXIoKSBoYXMgdXMgZG8gbmZzX3JldmFsaWRhdGVfbWFw cGluZygpLA0KPiB3aGljaCB3aWxsIGRvIG5mc19pbnZhbGlkYXRlX21hcHBpbmcoKSBmb3IgdGhl IGRpcmVjdG9yeSwgYW5kIHNvIHdlIGhhdmUgdG8NCj4gc3RhcnQgb3ZlciB3aXRoIGNvb2tpZSAw IHNlbmRpbmcgUkVBRERJUlBMVVMgdG8gcmUtZmlsbCB0aGUgZGlyZWN0b3J5J3MNCj4gbWFwcGlu ZyB0byBnZXQgYmFjayB0byB3aGVyZSB3ZSBhcmUgZm9yIHRoZSBjdXJyZW50IG5mc19yZWFkZGly KCkuDQo+IA0KPiBUaGlzIHByb2Nlc3MgcmVwZWF0cyBmb3IgZXZlcnkgZW50cnkgaW50byBuZnNf cmVhZGRpcigpIGlmIHRoZSBkaXJlY3RvcnkNCj4ga2VlcHMgZ2V0dGluZyB1cGRhdGVkLCBhbmQg aXQgYmVjb21lcyBtb3JlIGxpa2VseSB0aGF0IGl0IHdpbGwgYmUgdXBkYXRlZCBhcw0KPiBlYWNo IHBhc3MgdGFrZXMgbG9uZ2VyIGFuZCBsb25nZXIgdG8gcmUtZmlsbCB0aGUgbWFwcGluZyBhcyB0 aGUgY3VycmVudA0KPiBuZnNfcmVhZGRpcigpIGludm9jYXRpb24gaXMgZnVydGhlciBhbG9uZy4N Cj4gDQo+IFJFQURESVJQTFVTIGlzbid0IHRoZSBwcm9ibGVtLCB0aGUgcHJvYmxlbSBpcyBpbnZh bGlkYXRpbmcgdGhlIGRpcmVjdG9yeQ0KPiBtYXBwaW5nIGluIHRoZSBtaWRkbGUgb2YgYSBzZXJp ZXMgb2YgZ2V0ZGVudHMoKSBpZiB3ZSBkbyBhIENSRUFURS4gIEFsc28sIEkNCj4gdGhpbmsgYSBz aW1pbGFyIHRoaW5nIGhhcHBlbnMgaWYgdGhlIGRpcmVjdG9yeSdzIG10aW1lIG9yIGN0aW1lIGlz IHVwZGF0ZWQgLQ0KPiBidXQgaW4gdGhhdCBjYXNlIHdlIHNldCBORlNfSU5PX0lOVkFMSURfREFU QSBiZWNhdXNlIHRoZSBjaGFuZ2VfYXR0cg0KPiB1cGRhdGVzLg0KPiANCj4gU28sIGZvciBkaXJl Y3RvcmllcyB3aXRoIGEgbGFyZ2UgbnVtYmVyIG9mIGVudHJpZXMgdGhhdCB1cGRhdGVzIG9mdGVu LCBpdCBjYW4NCj4gYmUgdmVyeSBzbG93IHRvIGxpc3QgdGhlIGRpcmVjdG9yeS4NCj4gDQo+IFdo eSBkaWQgMzExMzI0YWQxNzEzIGNoYW5nZSBuZnNfcmVhZGRpciBmcm9tDQo+IA0KPiBpZiAobmZz X2F0dHJpYnV0ZV9jYWNoZV9leHBpcmVkKGlub2RlKSkNCj4gICAgbmZzX3JldmFsaWRhdGVfbWFw cGluZyhpbm9kZSwgZmlsZS0+Zl9tYXBwaW5nKTsNCj4gdG8NCj4gDQo+IGlmIChuZnNfYXR0cmli dXRlX2NhY2hlX2V4cGlyZWQoaW5vZGUpIHx8IG5mc2ktPmNhY2hlX3ZhbGlkaXR5ICYgTkZTX0lO T19JTlZBTElEX0RBVEEpDQo+ICAgIG5mc19yZXZhbGlkYXRlX21hcHBpbmcoaW5vZGUsIGZpbGUt PmZfbWFwcGluZyk7DQoNCkFzIHRoZSBjb21taXQgbWVzc2FnZSBzYXlzLCB0aGUgd2hvbGUgcHVy cG9zZSB3YXMgdG8gdXNlIFJFQURESVJQTFVTIGFzIGEgc3Vic3RpdHV0ZSBmb3IgbXVsdGlwbGUg R0VUQVRUUiBjYWxscyB3aGVuIHRoZSBoZXVyaXN0aWMgdGVsbHMgdXMgdGhhdCB0aGUgdXNlciBp cyBwZXJmb3JtaW5nIGFuIOKAmGxzIC1s4oCZIHN0eWxlIG9mIHdvcmtsb2FkLg0KDQo+IA0KPiAu LiBhbmQgY2FuIHdlIGdvIGJhY2sgdG8gdGhlIHdheSBpdCB3YXMgYmVmb3JlPw0KDQpOb3Qgd2l0 aG91dCBzbG93aW5nIGRvd24g4oCYbHMgLWzigJkgb24gbGFyZ2UgZGlyZWN0b3JpZXMuDQoNCj4g DQo+IE9LLi4gSSB1bmRlcnN0YW5kIHdoeSAtLSBpdCBpcyBtb3JlIGNvcnJlY3Qgc2luY2UgaWYg d2Uga25vdyB0aGUgZGlyZWN0b3J5IGhhcw0KPiBjaGFuZ2VkLCB3ZSBtaWdodCBhcyB3ZWxsIGZl dGNoIHRoZSBjaGFuZ2UuICBPdGhlcndpc2UsIHdlIG1pZ2h0IGJlIGNyZWF0aW5nDQo+IGZpbGVz IGFuZCB0aGVuIHdvbmRlcmluZyB3aHkgdGhleSBhcmVuJ3QgbGlzdGVkLg0KPiANCj4gSXQgbWln aHQgYmUgbmljZXIgdG8gbm90IGludmFsaWRhdGUgdGhlIG1hcHBpbmcgd2UncmUgY3VycmVudGx5 IHVzaW5nIGZvcg0KPiByZWFkZGlyLCB0aG91Z2guICBNYXliZSB0aGVyZSdzIGEgd2F5IHRvIGtl ZXAgdGhlIG1hcHBpbmcgZm9yIHRoZSBjdXJyZW50bHkNCj4gb3BlbmVkIGRpcmVjdG9yeSBhbmQg aW52YWxpZGF0ZSBpdCBvbmNlIGl0J3MgY2xvc2VkLg0KDQogUE9TSVggcmVxdWlyZXMgdGhhdCB5 b3UgcmV2YWxpZGF0ZSBvbiBvcGVuZGlyKCkgYW5kIHJld2luZGRpcigpLCBhbmQgbGVhdmVzIGJl aGF2aW91ciB3LnIudC4gZmlsZSBhZGRpdGlvbiBhbmQgcmVtb3ZhbCBhZnRlciB0aGUgY2FsbCB0 byBvcGVuZGlyKCkvcmV3aW5kZGlyKCkgdW5kZWZpbmVkIChodHRwOi8vcHVicy5vcGVuZ3JvdXAu b3JnL29ubGluZXB1YnMvOTY5OTkxOTc5OS9mdW5jdGlvbnMvcmVhZGRpci5odG1sKS4gSSBiZWxp ZXZlIG1vc3QgZmlsZXN5c3RlbXMgdW5kZXIgTGludXggd2lsbCBlbnN1cmUgdGhhdCBpZiBhIGZp bGUgaXMgcmVtb3ZlZCwgaXQgaXMgbm8gbG9uZ2VyIHNlZW4gaW4gdGhlIHJlYWRkaXIgc3RyZWFt LCBhbmQgYSBmaWxlIHRoYXQgaXMgYWRkZWQgd2lsbCBiZSBzZWVuIHVubGVzcyB0aGUgcmVhZGRp ciBjdXJzb3IgaGFzIGFscmVhZHkgcGFzc2VkIGl0LCBzbyB0aGVyZSBpcyB0aGF0IHRvIGNvbnNp ZGVyLg0KDQpIb3dldmVyIGNvcnJlY3RuZXNzIHdhcyBub3QgdGhlIG1haW4gaXNzdWUgaGVyZS4N Cg0K ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 12:18 ` Trond Myklebust @ 2016-08-24 13:15 ` Benjamin Coddington 2016-08-24 13:39 ` Trond Myklebust 2016-08-24 13:56 ` J. Bruce Fields 1 sibling, 1 reply; 19+ messages in thread From: Benjamin Coddington @ 2016-08-24 13:15 UTC (permalink / raw) To: Trond Myklebust; +Cc: List Linux NFS Mailing Apologies for the overlapping messages.. I had a mailer issue that caused me to miss this message. On 24 Aug 2016, at 8:18, Trond Myklebust wrote: >> On Aug 23, 2016, at 17:21, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> So, for directories with a large number of entries that updates >> often, it can >> be very slow to list the directory. >> >> Why did 311324ad1713 change nfs_readdir from >> >> if (nfs_attribute_cache_expired(inode)) >> nfs_revalidate_mapping(inode, file->f_mapping); >> to >> >> if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & >> NFS_INO_INVALID_DATA) >> nfs_revalidate_mapping(inode, file->f_mapping); > > As the commit message says, the whole purpose was to use READDIRPLUS > as a substitute for multiple GETATTR calls when the heuristic tells us > that the user is performing an ‘ls -l’ style of workload. > >> >> .. and can we go back to the way it was before? > > Not without slowing down ‘ls -l’ on large directories. Ah, so my musing is not about reverting the READDIRPLUS optimization, just the bit where we revalidate the mapping if NFS_INO_INVALID_DATA. I'll send a patch for that, let's see what happens to it! >> OK.. I understand why -- it is more correct since if we know the >> directory has >> changed, we might as well fetch the change. Otherwise, we might be >> creating >> files and then wondering why they aren't listed. >> >> It might be nicer to not invalidate the mapping we're currently using >> for >> readdir, though. Maybe there's a way to keep the mapping for the >> currently >> opened directory and invalidate it once it's closed. > > POSIX requires that you revalidate on opendir() and rewinddir(), and > leaves behaviour w.r.t. file addition and removal after the call to > opendir()/rewinddir() undefined > > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html). > I believe most filesystems under Linux will ensure that if a file is > removed, it is no longer seen in the readdir stream, and a file that > is > added will be seen unless the readdir cursor has already passed it, > so > there is that to consider. In the NFS case it would be up to the server to decide if it wants to throw out the in-use cookiverf with BADCOOKIE, or just include the new file at some later entry cookie. I think if the server wanted to insert the entry, it must invalidate all currently in-use cookieverf values. > However correctness was not the main issue here. I think by "here" you mean the READDIRPLUS optimization, and not the POSIX undefined behavior, in which case I take you to mean that the main issue was performance. That's the issue at hand for the case I described as well. Dropping the mid-directory revalidation should keep the optimization for your case and restore the optimization for this case. Again, apologies for the messy mail thread. :/ Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 13:15 ` Benjamin Coddington @ 2016-08-24 13:39 ` Trond Myklebust 0 siblings, 0 replies; 19+ messages in thread From: Trond Myklebust @ 2016-08-24 13:39 UTC (permalink / raw) To: Coddington Benjamin; +Cc: List Linux NFS Mailing DQo+IE9uIEF1ZyAyNCwgMjAxNiwgYXQgMDk6MTUsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2Rk aW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPiANCj4gQXBvbG9naWVzIGZvciB0aGUgb3ZlcmxhcHBp bmcgbWVzc2FnZXMuLiBJIGhhZCBhIG1haWxlciBpc3N1ZSB0aGF0IGNhdXNlZCBtZQ0KPiB0byBt aXNzIHRoaXMgbWVzc2FnZS4NCj4gDQo+IE9uIDI0IEF1ZyAyMDE2LCBhdCA4OjE4LCBUcm9uZCBN eWtsZWJ1c3Qgd3JvdGU6DQo+Pj4gT24gQXVnIDIzLCAyMDE2LCBhdCAxNzoyMSwgQmVuamFtaW4g Q29kZGluZ3RvbiA8YmNvZGRpbmdAcmVkaGF0LmNvbT4gd3JvdGU6DQo+Pj4gU28sIGZvciBkaXJl Y3RvcmllcyB3aXRoIGEgbGFyZ2UgbnVtYmVyIG9mIGVudHJpZXMgdGhhdCB1cGRhdGVzIG9mdGVu LCBpdCBjYW4NCj4+PiBiZSB2ZXJ5IHNsb3cgdG8gbGlzdCB0aGUgZGlyZWN0b3J5Lg0KPj4+IA0K Pj4+IFdoeSBkaWQgMzExMzI0YWQxNzEzIGNoYW5nZSBuZnNfcmVhZGRpciBmcm9tDQo+Pj4gDQo+ Pj4gaWYgKG5mc19hdHRyaWJ1dGVfY2FjaGVfZXhwaXJlZChpbm9kZSkpDQo+Pj4gICBuZnNfcmV2 YWxpZGF0ZV9tYXBwaW5nKGlub2RlLCBmaWxlLT5mX21hcHBpbmcpOw0KPj4+IHRvDQo+Pj4gDQo+ Pj4gaWYgKG5mc19hdHRyaWJ1dGVfY2FjaGVfZXhwaXJlZChpbm9kZSkgfHwgbmZzaS0+Y2FjaGVf dmFsaWRpdHkgJiBORlNfSU5PX0lOVkFMSURfREFUQSkNCj4+PiAgIG5mc19yZXZhbGlkYXRlX21h cHBpbmcoaW5vZGUsIGZpbGUtPmZfbWFwcGluZyk7DQo+PiANCj4+IEFzIHRoZSBjb21taXQgbWVz c2FnZSBzYXlzLCB0aGUgd2hvbGUgcHVycG9zZSB3YXMgdG8gdXNlIFJFQURESVJQTFVTIGFzIGEg c3Vic3RpdHV0ZSBmb3IgbXVsdGlwbGUgR0VUQVRUUiBjYWxscyB3aGVuIHRoZSBoZXVyaXN0aWMg dGVsbHMgdXMgdGhhdCB0aGUgdXNlciBpcyBwZXJmb3JtaW5nIGFuIOKAmGxzIC1s4oCZIHN0eWxl IG9mIHdvcmtsb2FkLg0KPj4gDQo+Pj4gDQo+Pj4gLi4gYW5kIGNhbiB3ZSBnbyBiYWNrIHRvIHRo ZSB3YXkgaXQgd2FzIGJlZm9yZT8NCj4+IA0KPj4gTm90IHdpdGhvdXQgc2xvd2luZyBkb3duIOKA mGxzIC1s4oCZIG9uIGxhcmdlIGRpcmVjdG9yaWVzLg0KPiANCj4gQWgsIHNvIG15IG11c2luZyBp cyBub3QgYWJvdXQgcmV2ZXJ0aW5nIHRoZSBSRUFERElSUExVUyBvcHRpbWl6YXRpb24sIGp1c3QN Cj4gdGhlIGJpdCB3aGVyZSB3ZSByZXZhbGlkYXRlIHRoZSBtYXBwaW5nIGlmIE5GU19JTk9fSU5W QUxJRF9EQVRBLg0KPiANCj4gSSdsbCBzZW5kIGEgcGF0Y2ggZm9yIHRoYXQsIGxldCdzIHNlZSB3 aGF0IGhhcHBlbnMgdG8gaXQhDQo+IA0KPj4+IE9LLi4gSSB1bmRlcnN0YW5kIHdoeSAtLSBpdCBp cyBtb3JlIGNvcnJlY3Qgc2luY2UgaWYgd2Uga25vdyB0aGUgZGlyZWN0b3J5IGhhcw0KPj4+IGNo YW5nZWQsIHdlIG1pZ2h0IGFzIHdlbGwgZmV0Y2ggdGhlIGNoYW5nZS4gIE90aGVyd2lzZSwgd2Ug bWlnaHQgYmUgY3JlYXRpbmcNCj4+PiBmaWxlcyBhbmQgdGhlbiB3b25kZXJpbmcgd2h5IHRoZXkg YXJlbid0IGxpc3RlZC4NCj4+PiANCj4+PiBJdCBtaWdodCBiZSBuaWNlciB0byBub3QgaW52YWxp ZGF0ZSB0aGUgbWFwcGluZyB3ZSdyZSBjdXJyZW50bHkgdXNpbmcgZm9yDQo+Pj4gcmVhZGRpciwg dGhvdWdoLiAgTWF5YmUgdGhlcmUncyBhIHdheSB0byBrZWVwIHRoZSBtYXBwaW5nIGZvciB0aGUg Y3VycmVudGx5DQo+Pj4gb3BlbmVkIGRpcmVjdG9yeSBhbmQgaW52YWxpZGF0ZSBpdCBvbmNlIGl0 J3MgY2xvc2VkLg0KPj4gDQo+PiBQT1NJWCByZXF1aXJlcyB0aGF0IHlvdSByZXZhbGlkYXRlIG9u IG9wZW5kaXIoKSBhbmQgcmV3aW5kZGlyKCksIGFuZA0KPj4gbGVhdmVzIGJlaGF2aW91ciB3LnIu dC4gZmlsZSBhZGRpdGlvbiBhbmQgcmVtb3ZhbCBhZnRlciB0aGUgY2FsbCB0bw0KPj4gb3BlbmRp cigpL3Jld2luZGRpcigpIHVuZGVmaW5lZA0KPj4gKGh0dHA6Ly9wdWJzLm9wZW5ncm91cC5vcmcv b25saW5lcHVicy85Njk5OTE5Nzk5L2Z1bmN0aW9ucy9yZWFkZGlyLmh0bWwpLg0KPj4gSSBiZWxp ZXZlIG1vc3QgZmlsZXN5c3RlbXMgdW5kZXIgTGludXggd2lsbCBlbnN1cmUgdGhhdCBpZiBhIGZp bGUgaXMNCj4+IHJlbW92ZWQsIGl0IGlzIG5vIGxvbmdlciBzZWVuIGluIHRoZSByZWFkZGlyIHN0 cmVhbSwgYW5kIGEgZmlsZSB0aGF0IGlzDQo+PiBhZGRlZCB3aWxsIGJlIHNlZW4gdW5sZXNzIHRo ZSByZWFkZGlyIGN1cnNvciBoYXMgYWxyZWFkeSBwYXNzZWQgaXQsIHNvDQo+PiB0aGVyZSBpcyB0 aGF0IHRvIGNvbnNpZGVyLg0KPiANCj4gSW4gdGhlIE5GUyBjYXNlIGl0IHdvdWxkIGJlIHVwIHRv IHRoZSBzZXJ2ZXIgdG8gZGVjaWRlIGlmIGl0IHdhbnRzIHRvIHRocm93DQo+IG91dCB0aGUgaW4t dXNlIGNvb2tpdmVyZiB3aXRoIEJBRENPT0tJRSwgb3IganVzdCBpbmNsdWRlIHRoZSBuZXcgZmls ZSBhdA0KPiBzb21lIGxhdGVyIGVudHJ5IGNvb2tpZS4gIEkgdGhpbmsgaWYgdGhlIHNlcnZlciB3 YW50ZWQgdG8gaW5zZXJ0IHRoZSBlbnRyeSwNCj4gaXQgbXVzdCBpbnZhbGlkYXRlIGFsbCBjdXJy ZW50bHkgaW4tdXNlIGNvb2tpZXZlcmYgdmFsdWVzLg0KDQpUaGUgd2hvbGUgcmVhZGRpciB2ZXJp ZmllciBjcmFwIGlzIG5vdCBzdXBwb3J0ZWQgb3IgaW1wbGVtZW50ZWQgaW4gdGhlIExpbnV4IGNs aWVudC4gU2VlIHRoZSBwbGV0aG9yYSBvZiBoaXN0b3JpY2FsIHRocmVhZHMgb24gdGhpcyB0b3Bp YyBhbmQgd2h5IHlvdSBjYW5ub3Qgc3VwcG9ydCB0ZWxsZGlyKCkvc2Vla2RpcigpIHNlbWFudGlj cyB3aXRoIGNvb2tpZXMgdGhhdCBjaGFuZ2Ugd2lsbHkgbmlsbHkuLi4gDQoNCj4gDQo+PiBIb3dl dmVyIGNvcnJlY3RuZXNzIHdhcyBub3QgdGhlIG1haW4gaXNzdWUgaGVyZS4NCj4gDQo+IEkgdGhp bmsgYnkgImhlcmUiIHlvdSBtZWFuIHRoZSBSRUFERElSUExVUyBvcHRpbWl6YXRpb24sIGFuZCBu b3QgdGhlIFBPU0lYDQo+IHVuZGVmaW5lZCBiZWhhdmlvciwgaW4gd2hpY2ggY2FzZSBJIHRha2Ug eW91IHRvIG1lYW4gdGhhdCB0aGUgbWFpbiBpc3N1ZSB3YXMNCj4gcGVyZm9ybWFuY2UuICBUaGF0 J3MgdGhlIGlzc3VlIGF0IGhhbmQgZm9yIHRoZSBjYXNlIEkgZGVzY3JpYmVkIGFzIHdlbGwuDQo+ IERyb3BwaW5nIHRoZSBtaWQtZGlyZWN0b3J5IHJldmFsaWRhdGlvbiBzaG91bGQga2VlcCB0aGUg b3B0aW1pemF0aW9uIGZvcg0KPiB5b3VyIGNhc2UgYW5kIHJlc3RvcmUgdGhlIG9wdGltaXphdGlv biBmb3IgdGhpcyBjYXNlLg0KPiANCj4gQWdhaW4sIGFwb2xvZ2llcyBmb3IgdGhlIG1lc3N5IG1h aWwgdGhyZWFkLiAgOi8NCj4gDQo+IEJlbg0KDQo= ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 12:18 ` Trond Myklebust 2016-08-24 13:15 ` Benjamin Coddington @ 2016-08-24 13:56 ` J. Bruce Fields 2016-08-24 14:02 ` Trond Myklebust 1 sibling, 1 reply; 19+ messages in thread From: J. Bruce Fields @ 2016-08-24 13:56 UTC (permalink / raw) To: Trond Myklebust; +Cc: Coddington Benjamin, List Linux NFS Mailing On Wed, Aug 24, 2016 at 12:18:04PM +0000, Trond Myklebust wrote: >=20 > > On Aug 23, 2016, at 17:21, Benjamin Coddington <bcodding@redhat.com> wr= ote: > >=20 > >=20 > >=20 > > On 23 Aug 2016, at 11:36, Trond Myklebust wrote: > >=20 > >>> On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@redhat.com> = wrote: > >>>=20 > >>> Hi linux-nfs, > >>>=20 > >>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -= l' > >>> situations") changed when nfs_readdir() decides to revalidate the > >>> directory's mapping, which contains all the entries. In addition to = just > >>> checking if the attribute cache has expired, it includes a check to s= ee if > >>> NFS_INO_INVALID_DATA is set on the directory. > >>>=20 > >>> Well, customers that have directories with very many dentries and tha= t same > >>> directory's attributes are frequently updated are now grouchy that `l= s -l` > >>> takes so long since any update of the directory causes the mapping to= be > >>> invalidated and we have to start over filling the directory's mapping. > >>>=20 > >>> I actually haven't put real hard thought into it yet (because often f= or me > >>> that just wastes a lot of time), so I am doing the lazy thing by aski= ng this > >>> question: > >>>=20 > >>> Can we go back to just the using the attribute cache timeout, or shou= ld we > >>> get all heuristical about readdir? > >>>=20 > >>=20 > >> We are all heuristical at this point. How are the heuristics failing? > >>=20 > >> The original problem those heuristics were designed to solve was that = all > >> the stat() calls took forever to complete, since they are all synchron= ous; > >> Tigran showed some very convincing numbers for a large directory where= the > >> difference in performance was an order of magnitude improved by using > >> readdirplus instead of readdir=E2=80=A6 > >=20 > > I'll try to present a better explanation. While `ls -l` is walking thr= ough > > a directory repeatedly entering nfs_readdir(), a CREATE response send us > > through nfs_post_op_update_inode_locked(): > >=20 > > 1531 if (S_ISDIR(inode->i_mode)) > > 1532 invalid |=3D NFS_INO_INVALID_DATA; > > 1533 nfs_set_cache_invalid(inode, invalid); > >=20 > > Now, the next entry into nfs_readdir() has us do nfs_revalidate_mapping= (), > > which will do nfs_invalidate_mapping() for the directory, and so we hav= e to > > start over with cookie 0 sending READDIRPLUS to re-fill the directory's > > mapping to get back to where we are for the current nfs_readdir(). > >=20 > > This process repeats for every entry into nfs_readdir() if the directory > > keeps getting updated, and it becomes more likely that it will be updat= ed as > > each pass takes longer and longer to re-fill the mapping as the current > > nfs_readdir() invocation is further along. > >=20 > > READDIRPLUS isn't the problem, the problem is invalidating the directory > > mapping in the middle of a series of getdents() if we do a CREATE. Als= o, I > > think a similar thing happens if the directory's mtime or ctime is upda= ted - > > but in that case we set NFS_INO_INVALID_DATA because the change_attr > > updates. > >=20 > > So, for directories with a large number of entries that updates often, = it can > > be very slow to list the directory. > >=20 > > Why did 311324ad1713 change nfs_readdir from > >=20 > > if (nfs_attribute_cache_expired(inode)) > > nfs_revalidate_mapping(inode, file->f_mapping); > > to > >=20 > > if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & NFS_IN= O_INVALID_DATA) > > nfs_revalidate_mapping(inode, file->f_mapping); >=20 > As the commit message says, the whole purpose was to use READDIRPLUS as a= substitute for multiple GETATTR calls when the heuristic tells us that the= user is performing an =E2=80=98ls -l=E2=80=99 style of workload. >=20 > >=20 > > .. and can we go back to the way it was before? >=20 > Not without slowing down =E2=80=98ls -l=E2=80=99 on large directories. >=20 > >=20 > > OK.. I understand why -- it is more correct since if we know the direct= ory has > > changed, we might as well fetch the change. Otherwise, we might be cre= ating > > files and then wondering why they aren't listed. > >=20 > > It might be nicer to not invalidate the mapping we're currently using f= or > > readdir, though. Maybe there's a way to keep the mapping for the curre= ntly > > opened directory and invalidate it once it's closed. >=20 > POSIX requires that you revalidate on opendir() and rewinddir(), and > leaves behaviour w.r.t. file addition and removal after the call to > opendir()/rewinddir() undefined > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html). It is only undefined whether the added or removed entry is returned. Other entries still need to returned exactly once. In this case we're restarting the read of the directory from scratch--I don't understand how that's possible while avoiding skipped or duplicated entries. Surely the only safe thing to do is to continue reading using the last cookie returned from the server. --b. > I believe most filesystems under Linux will ensure that if a file is > removed, it is no longer seen in the readdir stream, and a file that > is added will be seen unless the readdir cursor has already passed > it, so there is that to consider. >=20 > However correctness was not the main issue here. >=20 > N?????r??y????b?X??=C7=A7v?^?)=DE=BA{.n?+????{???"??^n?r???z?=1A??h?????&= ??=1E?G???h?=03(?=E9=9A=8E?=DD=A2j"??=1A?=1Bm??????z?=DE=96???f???h???~?m ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 13:56 ` J. Bruce Fields @ 2016-08-24 14:02 ` Trond Myklebust 2016-08-24 14:16 ` Benjamin Coddington 2016-08-24 14:20 ` Fields Bruce James 0 siblings, 2 replies; 19+ messages in thread From: Trond Myklebust @ 2016-08-24 14:02 UTC (permalink / raw) To: Fields Bruce James; +Cc: Coddington Benjamin, List Linux NFS Mailing DQo+IE9uIEF1ZyAyNCwgMjAxNiwgYXQgMDk6NTYsIEouIEJydWNlIEZpZWxkcyA8YmZpZWxkc0Bm aWVsZHNlcy5vcmc+IHdyb3RlOg0KPiANCj4gT24gV2VkLCBBdWcgMjQsIDIwMTYgYXQgMTI6MTg6 MDRQTSArMDAwMCwgVHJvbmQgTXlrbGVidXN0IHdyb3RlOg0KPj4gDQo+Pj4gT24gQXVnIDIzLCAy MDE2LCBhdCAxNzoyMSwgQmVuamFtaW4gQ29kZGluZ3RvbiA8YmNvZGRpbmdAcmVkaGF0LmNvbT4g d3JvdGU6DQo+Pj4gDQo+Pj4gDQo+Pj4gDQo+Pj4gT24gMjMgQXVnIDIwMTYsIGF0IDExOjM2LCBU cm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+Pj4gDQo+Pj4+PiBPbiBBdWcgMjMsIDIwMTYsIGF0IDEx OjA5LCBCZW5qYW1pbiBDb2RkaW5ndG9uIDxiY29kZGluZ0ByZWRoYXQuY29tPiB3cm90ZToNCj4+ Pj4+IA0KPj4+Pj4gSGkgbGludXgtbmZzLA0KPj4+Pj4gDQo+Pj4+PiAzMTEzMjRhZDE3MTMgKCJO RlM6IEJlIG1vcmUgYWdncmVzc2l2ZSBpbiB1c2luZyByZWFkZGlycGx1cyBmb3IgJ2xzIC1sJw0K Pj4+Pj4gc2l0dWF0aW9ucyIpIGNoYW5nZWQgd2hlbiBuZnNfcmVhZGRpcigpIGRlY2lkZXMgdG8g cmV2YWxpZGF0ZSB0aGUNCj4+Pj4+IGRpcmVjdG9yeSdzIG1hcHBpbmcsIHdoaWNoIGNvbnRhaW5z IGFsbCB0aGUgZW50cmllcy4gIEluIGFkZGl0aW9uIHRvIGp1c3QNCj4+Pj4+IGNoZWNraW5nIGlm IHRoZSBhdHRyaWJ1dGUgY2FjaGUgaGFzIGV4cGlyZWQsIGl0IGluY2x1ZGVzIGEgY2hlY2sgdG8g c2VlIGlmDQo+Pj4+PiBORlNfSU5PX0lOVkFMSURfREFUQSBpcyBzZXQgb24gdGhlIGRpcmVjdG9y eS4NCj4+Pj4+IA0KPj4+Pj4gV2VsbCwgY3VzdG9tZXJzIHRoYXQgaGF2ZSBkaXJlY3RvcmllcyB3 aXRoIHZlcnkgbWFueSBkZW50cmllcyBhbmQgdGhhdCBzYW1lDQo+Pj4+PiBkaXJlY3RvcnkncyBh dHRyaWJ1dGVzIGFyZSBmcmVxdWVudGx5IHVwZGF0ZWQgYXJlIG5vdyBncm91Y2h5IHRoYXQgYGxz IC1sYA0KPj4+Pj4gdGFrZXMgc28gbG9uZyBzaW5jZSBhbnkgdXBkYXRlIG9mIHRoZSBkaXJlY3Rv cnkgY2F1c2VzIHRoZSBtYXBwaW5nIHRvIGJlDQo+Pj4+PiBpbnZhbGlkYXRlZCBhbmQgd2UgaGF2 ZSB0byBzdGFydCBvdmVyIGZpbGxpbmcgdGhlIGRpcmVjdG9yeSdzIG1hcHBpbmcuDQo+Pj4+PiAN Cj4+Pj4+IEkgYWN0dWFsbHkgaGF2ZW4ndCBwdXQgcmVhbCBoYXJkIHRob3VnaHQgaW50byBpdCB5 ZXQgKGJlY2F1c2Ugb2Z0ZW4gZm9yIG1lDQo+Pj4+PiB0aGF0IGp1c3Qgd2FzdGVzIGEgbG90IG9m IHRpbWUpLCBzbyBJIGFtIGRvaW5nIHRoZSBsYXp5IHRoaW5nIGJ5IGFza2luZyB0aGlzDQo+Pj4+ PiBxdWVzdGlvbjoNCj4+Pj4+IA0KPj4+Pj4gQ2FuIHdlIGdvIGJhY2sgdG8ganVzdCB0aGUgdXNp bmcgdGhlIGF0dHJpYnV0ZSBjYWNoZSB0aW1lb3V0LCBvciBzaG91bGQgd2UNCj4+Pj4+IGdldCBh bGwgaGV1cmlzdGljYWwgYWJvdXQgcmVhZGRpcj8NCj4+Pj4+IA0KPj4+PiANCj4+Pj4gV2UgYXJl IGFsbCBoZXVyaXN0aWNhbCBhdCB0aGlzIHBvaW50LiBIb3cgYXJlIHRoZSBoZXVyaXN0aWNzIGZh aWxpbmc/DQo+Pj4+IA0KPj4+PiBUaGUgb3JpZ2luYWwgcHJvYmxlbSB0aG9zZSBoZXVyaXN0aWNz IHdlcmUgZGVzaWduZWQgdG8gc29sdmUgd2FzIHRoYXQgYWxsDQo+Pj4+IHRoZSBzdGF0KCkgY2Fs bHMgdG9vayBmb3JldmVyIHRvIGNvbXBsZXRlLCBzaW5jZSB0aGV5IGFyZSBhbGwgc3luY2hyb25v dXM7DQo+Pj4+IFRpZ3JhbiBzaG93ZWQgc29tZSB2ZXJ5IGNvbnZpbmNpbmcgbnVtYmVycyBmb3Ig YSBsYXJnZSBkaXJlY3Rvcnkgd2hlcmUgdGhlDQo+Pj4+IGRpZmZlcmVuY2UgaW4gcGVyZm9ybWFu Y2Ugd2FzIGFuIG9yZGVyIG9mIG1hZ25pdHVkZSBpbXByb3ZlZCBieSB1c2luZw0KPj4+PiByZWFk ZGlycGx1cyBpbnN0ZWFkIG9mIHJlYWRkaXLigKYNCj4+PiANCj4+PiBJJ2xsIHRyeSB0byBwcmVz ZW50IGEgYmV0dGVyIGV4cGxhbmF0aW9uLiAgV2hpbGUgYGxzIC1sYCBpcyB3YWxraW5nIHRocm91 Z2gNCj4+PiBhIGRpcmVjdG9yeSByZXBlYXRlZGx5IGVudGVyaW5nIG5mc19yZWFkZGlyKCksIGEg Q1JFQVRFIHJlc3BvbnNlIHNlbmQgdXMNCj4+PiB0aHJvdWdoIG5mc19wb3N0X29wX3VwZGF0ZV9p bm9kZV9sb2NrZWQoKToNCj4+PiANCj4+PiAxNTMxICAgICBpZiAoU19JU0RJUihpbm9kZS0+aV9t b2RlKSkNCj4+PiAxNTMyICAgICAgICAgaW52YWxpZCB8PSBORlNfSU5PX0lOVkFMSURfREFUQTsN Cj4+PiAxNTMzICAgICBuZnNfc2V0X2NhY2hlX2ludmFsaWQoaW5vZGUsIGludmFsaWQpOw0KPj4+ IA0KPj4+IE5vdywgdGhlIG5leHQgZW50cnkgaW50byBuZnNfcmVhZGRpcigpIGhhcyB1cyBkbyBu ZnNfcmV2YWxpZGF0ZV9tYXBwaW5nKCksDQo+Pj4gd2hpY2ggd2lsbCBkbyBuZnNfaW52YWxpZGF0 ZV9tYXBwaW5nKCkgZm9yIHRoZSBkaXJlY3RvcnksIGFuZCBzbyB3ZSBoYXZlIHRvDQo+Pj4gc3Rh cnQgb3ZlciB3aXRoIGNvb2tpZSAwIHNlbmRpbmcgUkVBRERJUlBMVVMgdG8gcmUtZmlsbCB0aGUg ZGlyZWN0b3J5J3MNCj4+PiBtYXBwaW5nIHRvIGdldCBiYWNrIHRvIHdoZXJlIHdlIGFyZSBmb3Ig dGhlIGN1cnJlbnQgbmZzX3JlYWRkaXIoKS4NCj4+PiANCj4+PiBUaGlzIHByb2Nlc3MgcmVwZWF0 cyBmb3IgZXZlcnkgZW50cnkgaW50byBuZnNfcmVhZGRpcigpIGlmIHRoZSBkaXJlY3RvcnkNCj4+ PiBrZWVwcyBnZXR0aW5nIHVwZGF0ZWQsIGFuZCBpdCBiZWNvbWVzIG1vcmUgbGlrZWx5IHRoYXQg aXQgd2lsbCBiZSB1cGRhdGVkIGFzDQo+Pj4gZWFjaCBwYXNzIHRha2VzIGxvbmdlciBhbmQgbG9u Z2VyIHRvIHJlLWZpbGwgdGhlIG1hcHBpbmcgYXMgdGhlIGN1cnJlbnQNCj4+PiBuZnNfcmVhZGRp cigpIGludm9jYXRpb24gaXMgZnVydGhlciBhbG9uZy4NCj4+PiANCj4+PiBSRUFERElSUExVUyBp c24ndCB0aGUgcHJvYmxlbSwgdGhlIHByb2JsZW0gaXMgaW52YWxpZGF0aW5nIHRoZSBkaXJlY3Rv cnkNCj4+PiBtYXBwaW5nIGluIHRoZSBtaWRkbGUgb2YgYSBzZXJpZXMgb2YgZ2V0ZGVudHMoKSBp ZiB3ZSBkbyBhIENSRUFURS4gIEFsc28sIEkNCj4+PiB0aGluayBhIHNpbWlsYXIgdGhpbmcgaGFw cGVucyBpZiB0aGUgZGlyZWN0b3J5J3MgbXRpbWUgb3IgY3RpbWUgaXMgdXBkYXRlZCAtDQo+Pj4g YnV0IGluIHRoYXQgY2FzZSB3ZSBzZXQgTkZTX0lOT19JTlZBTElEX0RBVEEgYmVjYXVzZSB0aGUg Y2hhbmdlX2F0dHINCj4+PiB1cGRhdGVzLg0KPj4+IA0KPj4+IFNvLCBmb3IgZGlyZWN0b3JpZXMg d2l0aCBhIGxhcmdlIG51bWJlciBvZiBlbnRyaWVzIHRoYXQgdXBkYXRlcyBvZnRlbiwgaXQgY2Fu DQo+Pj4gYmUgdmVyeSBzbG93IHRvIGxpc3QgdGhlIGRpcmVjdG9yeS4NCj4+PiANCj4+PiBXaHkg ZGlkIDMxMTMyNGFkMTcxMyBjaGFuZ2UgbmZzX3JlYWRkaXIgZnJvbQ0KPj4+IA0KPj4+IGlmIChu ZnNfYXR0cmlidXRlX2NhY2hlX2V4cGlyZWQoaW5vZGUpKQ0KPj4+ICAgbmZzX3JldmFsaWRhdGVf bWFwcGluZyhpbm9kZSwgZmlsZS0+Zl9tYXBwaW5nKTsNCj4+PiB0bw0KPj4+IA0KPj4+IGlmIChu ZnNfYXR0cmlidXRlX2NhY2hlX2V4cGlyZWQoaW5vZGUpIHx8IG5mc2ktPmNhY2hlX3ZhbGlkaXR5 ICYgTkZTX0lOT19JTlZBTElEX0RBVEEpDQo+Pj4gICBuZnNfcmV2YWxpZGF0ZV9tYXBwaW5nKGlu b2RlLCBmaWxlLT5mX21hcHBpbmcpOw0KPj4gDQo+PiBBcyB0aGUgY29tbWl0IG1lc3NhZ2Ugc2F5 cywgdGhlIHdob2xlIHB1cnBvc2Ugd2FzIHRvIHVzZSBSRUFERElSUExVUyBhcyBhIHN1YnN0aXR1 dGUgZm9yIG11bHRpcGxlIEdFVEFUVFIgY2FsbHMgd2hlbiB0aGUgaGV1cmlzdGljIHRlbGxzIHVz IHRoYXQgdGhlIHVzZXIgaXMgcGVyZm9ybWluZyBhbiDigJhscyAtbOKAmSBzdHlsZSBvZiB3b3Jr bG9hZC4NCj4+IA0KPj4+IA0KPj4+IC4uIGFuZCBjYW4gd2UgZ28gYmFjayB0byB0aGUgd2F5IGl0 IHdhcyBiZWZvcmU/DQo+PiANCj4+IE5vdCB3aXRob3V0IHNsb3dpbmcgZG93biDigJhscyAtbOKA mSBvbiBsYXJnZSBkaXJlY3Rvcmllcy4NCj4+IA0KPj4+IA0KPj4+IE9LLi4gSSB1bmRlcnN0YW5k IHdoeSAtLSBpdCBpcyBtb3JlIGNvcnJlY3Qgc2luY2UgaWYgd2Uga25vdyB0aGUgZGlyZWN0b3J5 IGhhcw0KPj4+IGNoYW5nZWQsIHdlIG1pZ2h0IGFzIHdlbGwgZmV0Y2ggdGhlIGNoYW5nZS4gIE90 aGVyd2lzZSwgd2UgbWlnaHQgYmUgY3JlYXRpbmcNCj4+PiBmaWxlcyBhbmQgdGhlbiB3b25kZXJp bmcgd2h5IHRoZXkgYXJlbid0IGxpc3RlZC4NCj4+PiANCj4+PiBJdCBtaWdodCBiZSBuaWNlciB0 byBub3QgaW52YWxpZGF0ZSB0aGUgbWFwcGluZyB3ZSdyZSBjdXJyZW50bHkgdXNpbmcgZm9yDQo+ Pj4gcmVhZGRpciwgdGhvdWdoLiAgTWF5YmUgdGhlcmUncyBhIHdheSB0byBrZWVwIHRoZSBtYXBw aW5nIGZvciB0aGUgY3VycmVudGx5DQo+Pj4gb3BlbmVkIGRpcmVjdG9yeSBhbmQgaW52YWxpZGF0 ZSBpdCBvbmNlIGl0J3MgY2xvc2VkLg0KPj4gDQo+IA0KPj4gUE9TSVggcmVxdWlyZXMgdGhhdCB5 b3UgcmV2YWxpZGF0ZSBvbiBvcGVuZGlyKCkgYW5kIHJld2luZGRpcigpLCBhbmQNCj4+IGxlYXZl cyBiZWhhdmlvdXIgdy5yLnQuIGZpbGUgYWRkaXRpb24gYW5kIHJlbW92YWwgYWZ0ZXIgdGhlIGNh bGwgdG8NCj4+IG9wZW5kaXIoKS9yZXdpbmRkaXIoKSB1bmRlZmluZWQNCj4+IChodHRwOi8vcHVi cy5vcGVuZ3JvdXAub3JnL29ubGluZXB1YnMvOTY5OTkxOTc5OS9mdW5jdGlvbnMvcmVhZGRpci5o dG1sKS4NCj4gDQo+IEl0IGlzIG9ubHkgdW5kZWZpbmVkIHdoZXRoZXIgdGhlIGFkZGVkIG9yIHJl bW92ZWQgZW50cnkgaXMgcmV0dXJuZWQuDQo+IE90aGVyIGVudHJpZXMgc3RpbGwgbmVlZCB0byBy ZXR1cm5lZCBleGFjdGx5IG9uY2UuDQo+IA0KPiBJbiB0aGlzIGNhc2Ugd2UncmUgcmVzdGFydGlu ZyB0aGUgcmVhZCBvZiB0aGUgZGlyZWN0b3J5IGZyb20gc2NyYXRjaC0tSQ0KPiBkb24ndCB1bmRl cnN0YW5kIGhvdyB0aGF0J3MgcG9zc2libGUgd2hpbGUgYXZvaWRpbmcgc2tpcHBlZCBvcg0KPiBk dXBsaWNhdGVkIGVudHJpZXMuDQo+IA0KPiBTdXJlbHkgdGhlIG9ubHkgc2FmZSB0aGluZyB0byBk byBpcyB0byBjb250aW51ZSByZWFkaW5nIHVzaW5nIHRoZSBsYXN0DQo+IGNvb2tpZSByZXR1cm5l ZCBmcm9tIHRoZSBzZXJ2ZXIuDQoNCldoeT8gVGhlIGNsaWVudCBzaG91bGQgYmUgYWJsZSB0byBy ZXN0YXJ0IHVzaW5nIGFueSBjb29raWUgYXQgYW55IHRpbWUsIGFuZCB3ZSByZWx5IG9uIHRoZSBj b29raWVzIGJlaW5nIHVuaXF1ZSB0byBlYWNoIGVudHJ5LiBJZiB5b3Ugd2FudCBtb3JlIHJlbGF4 ZWQgY29va2llIHNlbWFudGljcyB0aGVuIGJlIHByZXBhcmVkIHRvIGhhdmUgdG8gc2V0IHVwIGEg c3RhdGVmdWwgTkZTIHJlYWRkaXIgcHJvdG9jb2wuDQoNClRyb25k ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:02 ` Trond Myklebust @ 2016-08-24 14:16 ` Benjamin Coddington 2016-08-24 14:19 ` Trond Myklebust 2016-08-24 14:20 ` Fields Bruce James 1 sibling, 1 reply; 19+ messages in thread From: Benjamin Coddington @ 2016-08-24 14:16 UTC (permalink / raw) To: Trond Myklebust; +Cc: Fields Bruce James, List Linux NFS Mailing On 24 Aug 2016, at 10:02, Trond Myklebust wrote: >> On Aug 24, 2016, at 09:56, J. Bruce Fields <bfields@fieldses.org> >> wrote: >> >> On Wed, Aug 24, 2016 at 12:18:04PM +0000, Trond Myklebust wrote: >>> >>>> On Aug 23, 2016, at 17:21, Benjamin Coddington >>>> <bcodding@redhat.com> wrote: >>>> >>>> >>>> >>>> On 23 Aug 2016, at 11:36, Trond Myklebust wrote: >>>> >>>>>> On Aug 23, 2016, at 11:09, Benjamin Coddington >>>>>> <bcodding@redhat.com> wrote: >>>>>> >>>>>> Hi linux-nfs, >>>>>> >>>>>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for >>>>>> 'ls -l' >>>>>> situations") changed when nfs_readdir() decides to revalidate the >>>>>> directory's mapping, which contains all the entries. In addition >>>>>> to just >>>>>> checking if the attribute cache has expired, it includes a check >>>>>> to see if >>>>>> NFS_INO_INVALID_DATA is set on the directory. >>>>>> >>>>>> Well, customers that have directories with very many dentries and >>>>>> that same >>>>>> directory's attributes are frequently updated are now grouchy >>>>>> that `ls -l` >>>>>> takes so long since any update of the directory causes the >>>>>> mapping to be >>>>>> invalidated and we have to start over filling the directory's >>>>>> mapping. >>>>>> >>>>>> I actually haven't put real hard thought into it yet (because >>>>>> often for me >>>>>> that just wastes a lot of time), so I am doing the lazy thing by >>>>>> asking this >>>>>> question: >>>>>> >>>>>> Can we go back to just the using the attribute cache timeout, or >>>>>> should we >>>>>> get all heuristical about readdir? >>>>>> >>>>> >>>>> We are all heuristical at this point. How are the heuristics >>>>> failing? >>>>> >>>>> The original problem those heuristics were designed to solve was >>>>> that all >>>>> the stat() calls took forever to complete, since they are all >>>>> synchronous; >>>>> Tigran showed some very convincing numbers for a large directory >>>>> where the >>>>> difference in performance was an order of magnitude improved by >>>>> using >>>>> readdirplus instead of readdir… >>>> >>>> I'll try to present a better explanation. While `ls -l` is walking >>>> through >>>> a directory repeatedly entering nfs_readdir(), a CREATE response >>>> send us >>>> through nfs_post_op_update_inode_locked(): >>>> >>>> 1531 if (S_ISDIR(inode->i_mode)) >>>> 1532 invalid |= NFS_INO_INVALID_DATA; >>>> 1533 nfs_set_cache_invalid(inode, invalid); >>>> >>>> Now, the next entry into nfs_readdir() has us do >>>> nfs_revalidate_mapping(), >>>> which will do nfs_invalidate_mapping() for the directory, and so we >>>> have to >>>> start over with cookie 0 sending READDIRPLUS to re-fill the >>>> directory's >>>> mapping to get back to where we are for the current nfs_readdir(). >>>> >>>> This process repeats for every entry into nfs_readdir() if the >>>> directory >>>> keeps getting updated, and it becomes more likely that it will be >>>> updated as >>>> each pass takes longer and longer to re-fill the mapping as the >>>> current >>>> nfs_readdir() invocation is further along. >>>> >>>> READDIRPLUS isn't the problem, the problem is invalidating the >>>> directory >>>> mapping in the middle of a series of getdents() if we do a CREATE. >>>> Also, I >>>> think a similar thing happens if the directory's mtime or ctime is >>>> updated - >>>> but in that case we set NFS_INO_INVALID_DATA because the >>>> change_attr >>>> updates. >>>> >>>> So, for directories with a large number of entries that updates >>>> often, it can >>>> be very slow to list the directory. >>>> >>>> Why did 311324ad1713 change nfs_readdir from >>>> >>>> if (nfs_attribute_cache_expired(inode)) >>>> nfs_revalidate_mapping(inode, file->f_mapping); >>>> to >>>> >>>> if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & >>>> NFS_INO_INVALID_DATA) >>>> nfs_revalidate_mapping(inode, file->f_mapping); >>> >>> As the commit message says, the whole purpose was to use READDIRPLUS >>> as a substitute for multiple GETATTR calls when the heuristic tells >>> us that the user is performing an ‘ls -l’ style of workload. >>> >>>> >>>> .. and can we go back to the way it was before? >>> >>> Not without slowing down ‘ls -l’ on large directories. >>> >>>> >>>> OK.. I understand why -- it is more correct since if we know the >>>> directory has >>>> changed, we might as well fetch the change. Otherwise, we might be >>>> creating >>>> files and then wondering why they aren't listed. >>>> >>>> It might be nicer to not invalidate the mapping we're currently >>>> using for >>>> readdir, though. Maybe there's a way to keep the mapping for the >>>> currently >>>> opened directory and invalidate it once it's closed. >>> >> >>> POSIX requires that you revalidate on opendir() and rewinddir(), and >>> leaves behaviour w.r.t. file addition and removal after the call to >>> opendir()/rewinddir() undefined >>> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html). >> >> It is only undefined whether the added or removed entry is returned. >> Other entries still need to returned exactly once. >> >> In this case we're restarting the read of the directory from >> scratch--I >> don't understand how that's possible while avoiding skipped or >> duplicated entries. >> >> Surely the only safe thing to do is to continue reading using the >> last >> cookie returned from the server. > > Why? The client should be able to restart using any cookie at any > time, > and we rely on the cookies being unique to each entry. If you want > more > relaxed cookie semantics then be prepared to have to set up a stateful > NFS > readdir protocol. It looks to me as if I send in a non-zero position to nfs_readdir(), but the mapping has been invalidated, then the client starts over at cookie 0 sending READDIRs to the server and reading the directory entries into the page cache in order to figure out which position maps to which entry/cookie. It seems possible to get duplicated or skipped entries. Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:16 ` Benjamin Coddington @ 2016-08-24 14:19 ` Trond Myklebust 2016-08-24 15:18 ` Benjamin Coddington 0 siblings, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2016-08-24 14:19 UTC (permalink / raw) To: Coddington Benjamin; +Cc: Fields Bruce James, List Linux NFS Mailing DQo+IE9uIEF1ZyAyNCwgMjAxNiwgYXQgMTA6MTYsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2Rk aW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPiANCj4gT24gMjQgQXVnIDIwMTYsIGF0IDEwOjAyLCBU cm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+IA0KPj4+IE9uIEF1ZyAyNCwgMjAxNiwgYXQgMDk6NTYs IEouIEJydWNlIEZpZWxkcyA8YmZpZWxkc0BmaWVsZHNlcy5vcmc+IHdyb3RlOg0KPj4+IA0KPj4+ IE9uIFdlZCwgQXVnIDI0LCAyMDE2IGF0IDEyOjE4OjA0UE0gKzAwMDAsIFRyb25kIE15a2xlYnVz dCB3cm90ZToNCj4+Pj4gDQo+Pj4+PiBPbiBBdWcgMjMsIDIwMTYsIGF0IDE3OjIxLCBCZW5qYW1p biBDb2RkaW5ndG9uIDxiY29kZGluZ0ByZWRoYXQuY29tPiB3cm90ZToNCj4+Pj4+IA0KPj4+Pj4g DQo+Pj4+PiANCj4+Pj4+IE9uIDIzIEF1ZyAyMDE2LCBhdCAxMTozNiwgVHJvbmQgTXlrbGVidXN0 IHdyb3RlOg0KPj4+Pj4gDQo+Pj4+Pj4+IE9uIEF1ZyAyMywgMjAxNiwgYXQgMTE6MDksIEJlbmph bWluIENvZGRpbmd0b24gPGJjb2RkaW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPj4+Pj4+PiANCj4+ Pj4+Pj4gSGkgbGludXgtbmZzLA0KPj4+Pj4+PiANCj4+Pj4+Pj4gMzExMzI0YWQxNzEzICgiTkZT OiBCZSBtb3JlIGFnZ3Jlc3NpdmUgaW4gdXNpbmcgcmVhZGRpcnBsdXMgZm9yICdscyAtbCcNCj4+ Pj4+Pj4gc2l0dWF0aW9ucyIpIGNoYW5nZWQgd2hlbiBuZnNfcmVhZGRpcigpIGRlY2lkZXMgdG8g cmV2YWxpZGF0ZSB0aGUNCj4+Pj4+Pj4gZGlyZWN0b3J5J3MgbWFwcGluZywgd2hpY2ggY29udGFp bnMgYWxsIHRoZSBlbnRyaWVzLiAgSW4gYWRkaXRpb24gdG8ganVzdA0KPj4+Pj4+PiBjaGVja2lu ZyBpZiB0aGUgYXR0cmlidXRlIGNhY2hlIGhhcyBleHBpcmVkLCBpdCBpbmNsdWRlcyBhIGNoZWNr IHRvIHNlZSBpZg0KPj4+Pj4+PiBORlNfSU5PX0lOVkFMSURfREFUQSBpcyBzZXQgb24gdGhlIGRp cmVjdG9yeS4NCj4+Pj4+Pj4gDQo+Pj4+Pj4+IFdlbGwsIGN1c3RvbWVycyB0aGF0IGhhdmUgZGly ZWN0b3JpZXMgd2l0aCB2ZXJ5IG1hbnkgZGVudHJpZXMgYW5kIHRoYXQgc2FtZQ0KPj4+Pj4+PiBk aXJlY3RvcnkncyBhdHRyaWJ1dGVzIGFyZSBmcmVxdWVudGx5IHVwZGF0ZWQgYXJlIG5vdyBncm91 Y2h5IHRoYXQgYGxzIC1sYA0KPj4+Pj4+PiB0YWtlcyBzbyBsb25nIHNpbmNlIGFueSB1cGRhdGUg b2YgdGhlIGRpcmVjdG9yeSBjYXVzZXMgdGhlIG1hcHBpbmcgdG8gYmUNCj4+Pj4+Pj4gaW52YWxp ZGF0ZWQgYW5kIHdlIGhhdmUgdG8gc3RhcnQgb3ZlciBmaWxsaW5nIHRoZSBkaXJlY3RvcnkncyBt YXBwaW5nLg0KPj4+Pj4+PiANCj4+Pj4+Pj4gSSBhY3R1YWxseSBoYXZlbid0IHB1dCByZWFsIGhh cmQgdGhvdWdodCBpbnRvIGl0IHlldCAoYmVjYXVzZSBvZnRlbiBmb3IgbWUNCj4+Pj4+Pj4gdGhh dCBqdXN0IHdhc3RlcyBhIGxvdCBvZiB0aW1lKSwgc28gSSBhbSBkb2luZyB0aGUgbGF6eSB0aGlu ZyBieSBhc2tpbmcgdGhpcw0KPj4+Pj4+PiBxdWVzdGlvbjoNCj4+Pj4+Pj4gDQo+Pj4+Pj4+IENh biB3ZSBnbyBiYWNrIHRvIGp1c3QgdGhlIHVzaW5nIHRoZSBhdHRyaWJ1dGUgY2FjaGUgdGltZW91 dCwgb3Igc2hvdWxkIHdlDQo+Pj4+Pj4+IGdldCBhbGwgaGV1cmlzdGljYWwgYWJvdXQgcmVhZGRp cj8NCj4+Pj4+Pj4gDQo+Pj4+Pj4gDQo+Pj4+Pj4gV2UgYXJlIGFsbCBoZXVyaXN0aWNhbCBhdCB0 aGlzIHBvaW50LiBIb3cgYXJlIHRoZSBoZXVyaXN0aWNzIGZhaWxpbmc/DQo+Pj4+Pj4gDQo+Pj4+ Pj4gVGhlIG9yaWdpbmFsIHByb2JsZW0gdGhvc2UgaGV1cmlzdGljcyB3ZXJlIGRlc2lnbmVkIHRv IHNvbHZlIHdhcyB0aGF0IGFsbA0KPj4+Pj4+IHRoZSBzdGF0KCkgY2FsbHMgdG9vayBmb3JldmVy IHRvIGNvbXBsZXRlLCBzaW5jZSB0aGV5IGFyZSBhbGwgc3luY2hyb25vdXM7DQo+Pj4+Pj4gVGln cmFuIHNob3dlZCBzb21lIHZlcnkgY29udmluY2luZyBudW1iZXJzIGZvciBhIGxhcmdlIGRpcmVj dG9yeSB3aGVyZSB0aGUNCj4+Pj4+PiBkaWZmZXJlbmNlIGluIHBlcmZvcm1hbmNlIHdhcyBhbiBv cmRlciBvZiBtYWduaXR1ZGUgaW1wcm92ZWQgYnkgdXNpbmcNCj4+Pj4+PiByZWFkZGlycGx1cyBp bnN0ZWFkIG9mIHJlYWRkaXLigKYNCj4+Pj4+IA0KPj4+Pj4gSSdsbCB0cnkgdG8gcHJlc2VudCBh IGJldHRlciBleHBsYW5hdGlvbi4gIFdoaWxlIGBscyAtbGAgaXMgd2Fsa2luZyB0aHJvdWdoDQo+ Pj4+PiBhIGRpcmVjdG9yeSByZXBlYXRlZGx5IGVudGVyaW5nIG5mc19yZWFkZGlyKCksIGEgQ1JF QVRFIHJlc3BvbnNlIHNlbmQgdXMNCj4+Pj4+IHRocm91Z2ggbmZzX3Bvc3Rfb3BfdXBkYXRlX2lu b2RlX2xvY2tlZCgpOg0KPj4+Pj4gDQo+Pj4+PiAxNTMxICAgICBpZiAoU19JU0RJUihpbm9kZS0+ aV9tb2RlKSkNCj4+Pj4+IDE1MzIgICAgICAgICBpbnZhbGlkIHw9IE5GU19JTk9fSU5WQUxJRF9E QVRBOw0KPj4+Pj4gMTUzMyAgICAgbmZzX3NldF9jYWNoZV9pbnZhbGlkKGlub2RlLCBpbnZhbGlk KTsNCj4+Pj4+IA0KPj4+Pj4gTm93LCB0aGUgbmV4dCBlbnRyeSBpbnRvIG5mc19yZWFkZGlyKCkg aGFzIHVzIGRvIG5mc19yZXZhbGlkYXRlX21hcHBpbmcoKSwNCj4+Pj4+IHdoaWNoIHdpbGwgZG8g bmZzX2ludmFsaWRhdGVfbWFwcGluZygpIGZvciB0aGUgZGlyZWN0b3J5LCBhbmQgc28gd2UgaGF2 ZSB0bw0KPj4+Pj4gc3RhcnQgb3ZlciB3aXRoIGNvb2tpZSAwIHNlbmRpbmcgUkVBRERJUlBMVVMg dG8gcmUtZmlsbCB0aGUgZGlyZWN0b3J5J3MNCj4+Pj4+IG1hcHBpbmcgdG8gZ2V0IGJhY2sgdG8g d2hlcmUgd2UgYXJlIGZvciB0aGUgY3VycmVudCBuZnNfcmVhZGRpcigpLg0KPj4+Pj4gDQo+Pj4+ PiBUaGlzIHByb2Nlc3MgcmVwZWF0cyBmb3IgZXZlcnkgZW50cnkgaW50byBuZnNfcmVhZGRpcigp IGlmIHRoZSBkaXJlY3RvcnkNCj4+Pj4+IGtlZXBzIGdldHRpbmcgdXBkYXRlZCwgYW5kIGl0IGJl Y29tZXMgbW9yZSBsaWtlbHkgdGhhdCBpdCB3aWxsIGJlIHVwZGF0ZWQgYXMNCj4+Pj4+IGVhY2gg cGFzcyB0YWtlcyBsb25nZXIgYW5kIGxvbmdlciB0byByZS1maWxsIHRoZSBtYXBwaW5nIGFzIHRo ZSBjdXJyZW50DQo+Pj4+PiBuZnNfcmVhZGRpcigpIGludm9jYXRpb24gaXMgZnVydGhlciBhbG9u Zy4NCj4+Pj4+IA0KPj4+Pj4gUkVBRERJUlBMVVMgaXNuJ3QgdGhlIHByb2JsZW0sIHRoZSBwcm9i bGVtIGlzIGludmFsaWRhdGluZyB0aGUgZGlyZWN0b3J5DQo+Pj4+PiBtYXBwaW5nIGluIHRoZSBt aWRkbGUgb2YgYSBzZXJpZXMgb2YgZ2V0ZGVudHMoKSBpZiB3ZSBkbyBhIENSRUFURS4gIEFsc28s IEkNCj4+Pj4+IHRoaW5rIGEgc2ltaWxhciB0aGluZyBoYXBwZW5zIGlmIHRoZSBkaXJlY3Rvcnkn cyBtdGltZSBvciBjdGltZSBpcyB1cGRhdGVkIC0NCj4+Pj4+IGJ1dCBpbiB0aGF0IGNhc2Ugd2Ug c2V0IE5GU19JTk9fSU5WQUxJRF9EQVRBIGJlY2F1c2UgdGhlIGNoYW5nZV9hdHRyDQo+Pj4+PiB1 cGRhdGVzLg0KPj4+Pj4gDQo+Pj4+PiBTbywgZm9yIGRpcmVjdG9yaWVzIHdpdGggYSBsYXJnZSBu dW1iZXIgb2YgZW50cmllcyB0aGF0IHVwZGF0ZXMgb2Z0ZW4sIGl0IGNhbg0KPj4+Pj4gYmUgdmVy eSBzbG93IHRvIGxpc3QgdGhlIGRpcmVjdG9yeS4NCj4+Pj4+IA0KPj4+Pj4gV2h5IGRpZCAzMTEz MjRhZDE3MTMgY2hhbmdlIG5mc19yZWFkZGlyIGZyb20NCj4+Pj4+IA0KPj4+Pj4gaWYgKG5mc19h dHRyaWJ1dGVfY2FjaGVfZXhwaXJlZChpbm9kZSkpDQo+Pj4+PiAgbmZzX3JldmFsaWRhdGVfbWFw cGluZyhpbm9kZSwgZmlsZS0+Zl9tYXBwaW5nKTsNCj4+Pj4+IHRvDQo+Pj4+PiANCj4+Pj4+IGlm IChuZnNfYXR0cmlidXRlX2NhY2hlX2V4cGlyZWQoaW5vZGUpIHx8IG5mc2ktPmNhY2hlX3ZhbGlk aXR5ICYgTkZTX0lOT19JTlZBTElEX0RBVEEpDQo+Pj4+PiAgbmZzX3JldmFsaWRhdGVfbWFwcGlu Zyhpbm9kZSwgZmlsZS0+Zl9tYXBwaW5nKTsNCj4+Pj4gDQo+Pj4+IEFzIHRoZSBjb21taXQgbWVz c2FnZSBzYXlzLCB0aGUgd2hvbGUgcHVycG9zZSB3YXMgdG8gdXNlIFJFQURESVJQTFVTIGFzIGEg c3Vic3RpdHV0ZSBmb3IgbXVsdGlwbGUgR0VUQVRUUiBjYWxscyB3aGVuIHRoZSBoZXVyaXN0aWMg dGVsbHMgdXMgdGhhdCB0aGUgdXNlciBpcyBwZXJmb3JtaW5nIGFuIOKAmGxzIC1s4oCZIHN0eWxl IG9mIHdvcmtsb2FkLg0KPj4+PiANCj4+Pj4+IA0KPj4+Pj4gLi4gYW5kIGNhbiB3ZSBnbyBiYWNr IHRvIHRoZSB3YXkgaXQgd2FzIGJlZm9yZT8NCj4+Pj4gDQo+Pj4+IE5vdCB3aXRob3V0IHNsb3dp bmcgZG93biDigJhscyAtbOKAmSBvbiBsYXJnZSBkaXJlY3Rvcmllcy4NCj4+Pj4gDQo+Pj4+PiAN Cj4+Pj4+IE9LLi4gSSB1bmRlcnN0YW5kIHdoeSAtLSBpdCBpcyBtb3JlIGNvcnJlY3Qgc2luY2Ug aWYgd2Uga25vdyB0aGUgZGlyZWN0b3J5IGhhcw0KPj4+Pj4gY2hhbmdlZCwgd2UgbWlnaHQgYXMg d2VsbCBmZXRjaCB0aGUgY2hhbmdlLiAgT3RoZXJ3aXNlLCB3ZSBtaWdodCBiZSBjcmVhdGluZw0K Pj4+Pj4gZmlsZXMgYW5kIHRoZW4gd29uZGVyaW5nIHdoeSB0aGV5IGFyZW4ndCBsaXN0ZWQuDQo+ Pj4+PiANCj4+Pj4+IEl0IG1pZ2h0IGJlIG5pY2VyIHRvIG5vdCBpbnZhbGlkYXRlIHRoZSBtYXBw aW5nIHdlJ3JlIGN1cnJlbnRseSB1c2luZyBmb3INCj4+Pj4+IHJlYWRkaXIsIHRob3VnaC4gIE1h eWJlIHRoZXJlJ3MgYSB3YXkgdG8ga2VlcCB0aGUgbWFwcGluZyBmb3IgdGhlIGN1cnJlbnRseQ0K Pj4+Pj4gb3BlbmVkIGRpcmVjdG9yeSBhbmQgaW52YWxpZGF0ZSBpdCBvbmNlIGl0J3MgY2xvc2Vk Lg0KPj4+PiANCj4+PiANCj4+Pj4gUE9TSVggcmVxdWlyZXMgdGhhdCB5b3UgcmV2YWxpZGF0ZSBv biBvcGVuZGlyKCkgYW5kIHJld2luZGRpcigpLCBhbmQNCj4+Pj4gbGVhdmVzIGJlaGF2aW91ciB3 LnIudC4gZmlsZSBhZGRpdGlvbiBhbmQgcmVtb3ZhbCBhZnRlciB0aGUgY2FsbCB0bw0KPj4+PiBv cGVuZGlyKCkvcmV3aW5kZGlyKCkgdW5kZWZpbmVkDQo+Pj4+IChodHRwOi8vcHVicy5vcGVuZ3Jv dXAub3JnL29ubGluZXB1YnMvOTY5OTkxOTc5OS9mdW5jdGlvbnMvcmVhZGRpci5odG1sKS4NCj4+ PiANCj4+PiBJdCBpcyBvbmx5IHVuZGVmaW5lZCB3aGV0aGVyIHRoZSBhZGRlZCBvciByZW1vdmVk IGVudHJ5IGlzIHJldHVybmVkLg0KPj4+IE90aGVyIGVudHJpZXMgc3RpbGwgbmVlZCB0byByZXR1 cm5lZCBleGFjdGx5IG9uY2UuDQo+Pj4gDQo+Pj4gSW4gdGhpcyBjYXNlIHdlJ3JlIHJlc3RhcnRp bmcgdGhlIHJlYWQgb2YgdGhlIGRpcmVjdG9yeSBmcm9tIHNjcmF0Y2gtLUkNCj4+PiBkb24ndCB1 bmRlcnN0YW5kIGhvdyB0aGF0J3MgcG9zc2libGUgd2hpbGUgYXZvaWRpbmcgc2tpcHBlZCBvcg0K Pj4+IGR1cGxpY2F0ZWQgZW50cmllcy4NCj4+PiANCj4+PiBTdXJlbHkgdGhlIG9ubHkgc2FmZSB0 aGluZyB0byBkbyBpcyB0byBjb250aW51ZSByZWFkaW5nIHVzaW5nIHRoZSBsYXN0DQo+Pj4gY29v a2llIHJldHVybmVkIGZyb20gdGhlIHNlcnZlci4NCj4+IA0KPj4gV2h5PyBUaGUgY2xpZW50IHNo b3VsZCBiZSBhYmxlIHRvIHJlc3RhcnQgdXNpbmcgYW55IGNvb2tpZSBhdCBhbnkgdGltZSwNCj4+ IGFuZCB3ZSByZWx5IG9uIHRoZSBjb29raWVzIGJlaW5nIHVuaXF1ZSB0byBlYWNoIGVudHJ5LiBJ ZiB5b3Ugd2FudCBtb3JlDQo+PiByZWxheGVkIGNvb2tpZSBzZW1hbnRpY3MgdGhlbiBiZSBwcmVw YXJlZCB0byBoYXZlIHRvIHNldCB1cCBhIHN0YXRlZnVsIE5GUw0KPj4gcmVhZGRpciBwcm90b2Nv bC4NCj4gDQo+IEl0IGxvb2tzIHRvIG1lIGFzIGlmIEkgc2VuZCBpbiBhIG5vbi16ZXJvIHBvc2l0 aW9uIHRvIG5mc19yZWFkZGlyKCksIGJ1dCB0aGUgbWFwcGluZw0KPiBoYXMgYmVlbiBpbnZhbGlk YXRlZCwgdGhlbiB0aGUgY2xpZW50IHN0YXJ0cyBvdmVyIGF0IGNvb2tpZSAwIHNlbmRpbmcNCj4g UkVBRERJUnMgdG8gdGhlIHNlcnZlciBhbmQgcmVhZGluZyB0aGUgZGlyZWN0b3J5IGVudHJpZXMg aW50byB0aGUgcGFnZSBjYWNoZQ0KPiBpbiBvcmRlciB0byBmaWd1cmUgb3V0IHdoaWNoIHBvc2l0 aW9uIG1hcHMgdG8gd2hpY2ggZW50cnkvY29va2llLg0KDQpZZXMuDQoNCj4gSXQgc2VlbXMgcG9z c2libGUgdG8gZ2V0IGR1cGxpY2F0ZWQgb3Igc2tpcHBlZCBlbnRyaWVzLg0KDQpUaGVyZSBpcyBh IGxvZ2ljYWwgc3RlcCBtaXNzaW5nIGhlcmUuIEhvdyBkbyB5b3UgZ2V0IGZyb20g4oCcd2UgY2Fu IGVuZCB1cCByZS1maWxsaW5nIHRoZSBjYWNoZSBmcm9tIHNjcmF0Y2jigJ0gdG8g4oCcdGhlcmUg Y2FuIGJlIGR1cGxpY2F0ZSBvciBza2lwcGVkIGVudHJpZXPigJ0/DQoNCg0K ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:19 ` Trond Myklebust @ 2016-08-24 15:18 ` Benjamin Coddington 0 siblings, 0 replies; 19+ messages in thread From: Benjamin Coddington @ 2016-08-24 15:18 UTC (permalink / raw) To: Trond Myklebust; +Cc: Fields Bruce James, List Linux NFS Mailing On 24 Aug 2016, at 10:19, Trond Myklebust wrote: >> On Aug 24, 2016, at 10:16, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 24 Aug 2016, at 10:02, Trond Myklebust wrote: >> >>>> On Aug 24, 2016, at 09:56, J. Bruce Fields <bfields@fieldses.org> >>>> wrote: >>>> >>>> On Wed, Aug 24, 2016 at 12:18:04PM +0000, Trond Myklebust wrote: >>>>> >>>>>> On Aug 23, 2016, at 17:21, Benjamin Coddington >>>>>> <bcodding@redhat.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 23 Aug 2016, at 11:36, Trond Myklebust wrote: >>>>>> >>>>>>>> On Aug 23, 2016, at 11:09, Benjamin Coddington >>>>>>>> <bcodding@redhat.com> wrote: >>>>>>>> >>>>>>>> Hi linux-nfs, >>>>>>>> >>>>>>>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for >>>>>>>> 'ls -l' >>>>>>>> situations") changed when nfs_readdir() decides to revalidate >>>>>>>> the >>>>>>>> directory's mapping, which contains all the entries. In >>>>>>>> addition to just >>>>>>>> checking if the attribute cache has expired, it includes a >>>>>>>> check to see if >>>>>>>> NFS_INO_INVALID_DATA is set on the directory. >>>>>>>> >>>>>>>> Well, customers that have directories with very many dentries >>>>>>>> and that same >>>>>>>> directory's attributes are frequently updated are now grouchy >>>>>>>> that `ls -l` >>>>>>>> takes so long since any update of the directory causes the >>>>>>>> mapping to be >>>>>>>> invalidated and we have to start over filling the directory's >>>>>>>> mapping. >>>>>>>> >>>>>>>> I actually haven't put real hard thought into it yet (because >>>>>>>> often for me >>>>>>>> that just wastes a lot of time), so I am doing the lazy thing >>>>>>>> by asking this >>>>>>>> question: >>>>>>>> >>>>>>>> Can we go back to just the using the attribute cache timeout, >>>>>>>> or should we >>>>>>>> get all heuristical about readdir? >>>>>>>> >>>>>>> >>>>>>> We are all heuristical at this point. How are the heuristics >>>>>>> failing? >>>>>>> >>>>>>> The original problem those heuristics were designed to solve was >>>>>>> that all >>>>>>> the stat() calls took forever to complete, since they are all >>>>>>> synchronous; >>>>>>> Tigran showed some very convincing numbers for a large directory >>>>>>> where the >>>>>>> difference in performance was an order of magnitude improved by >>>>>>> using >>>>>>> readdirplus instead of readdir… >>>>>> >>>>>> I'll try to present a better explanation. While `ls -l` is >>>>>> walking through >>>>>> a directory repeatedly entering nfs_readdir(), a CREATE response >>>>>> send us >>>>>> through nfs_post_op_update_inode_locked(): >>>>>> >>>>>> 1531 if (S_ISDIR(inode->i_mode)) >>>>>> 1532 invalid |= NFS_INO_INVALID_DATA; >>>>>> 1533 nfs_set_cache_invalid(inode, invalid); >>>>>> >>>>>> Now, the next entry into nfs_readdir() has us do >>>>>> nfs_revalidate_mapping(), >>>>>> which will do nfs_invalidate_mapping() for the directory, and so >>>>>> we have to >>>>>> start over with cookie 0 sending READDIRPLUS to re-fill the >>>>>> directory's >>>>>> mapping to get back to where we are for the current >>>>>> nfs_readdir(). >>>>>> >>>>>> This process repeats for every entry into nfs_readdir() if the >>>>>> directory >>>>>> keeps getting updated, and it becomes more likely that it will be >>>>>> updated as >>>>>> each pass takes longer and longer to re-fill the mapping as the >>>>>> current >>>>>> nfs_readdir() invocation is further along. >>>>>> >>>>>> READDIRPLUS isn't the problem, the problem is invalidating the >>>>>> directory >>>>>> mapping in the middle of a series of getdents() if we do a >>>>>> CREATE. Also, I >>>>>> think a similar thing happens if the directory's mtime or ctime >>>>>> is updated - >>>>>> but in that case we set NFS_INO_INVALID_DATA because the >>>>>> change_attr >>>>>> updates. >>>>>> >>>>>> So, for directories with a large number of entries that updates >>>>>> often, it can >>>>>> be very slow to list the directory. >>>>>> >>>>>> Why did 311324ad1713 change nfs_readdir from >>>>>> >>>>>> if (nfs_attribute_cache_expired(inode)) >>>>>> nfs_revalidate_mapping(inode, file->f_mapping); >>>>>> to >>>>>> >>>>>> if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & >>>>>> NFS_INO_INVALID_DATA) >>>>>> nfs_revalidate_mapping(inode, file->f_mapping); >>>>> >>>>> As the commit message says, the whole purpose was to use >>>>> READDIRPLUS as a substitute for multiple GETATTR calls when the >>>>> heuristic tells us that the user is performing an ‘ls -l’ >>>>> style of workload. >>>>> >>>>>> >>>>>> .. and can we go back to the way it was before? >>>>> >>>>> Not without slowing down ‘ls -l’ on large directories. >>>>> >>>>>> >>>>>> OK.. I understand why -- it is more correct since if we know the >>>>>> directory has >>>>>> changed, we might as well fetch the change. Otherwise, we might >>>>>> be creating >>>>>> files and then wondering why they aren't listed. >>>>>> >>>>>> It might be nicer to not invalidate the mapping we're currently >>>>>> using for >>>>>> readdir, though. Maybe there's a way to keep the mapping for the >>>>>> currently >>>>>> opened directory and invalidate it once it's closed. >>>>> >>>> >>>>> POSIX requires that you revalidate on opendir() and rewinddir(), >>>>> and >>>>> leaves behaviour w.r.t. file addition and removal after the call >>>>> to >>>>> opendir()/rewinddir() undefined >>>>> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html). >>>> >>>> It is only undefined whether the added or removed entry is >>>> returned. >>>> Other entries still need to returned exactly once. >>>> >>>> In this case we're restarting the read of the directory from >>>> scratch--I >>>> don't understand how that's possible while avoiding skipped or >>>> duplicated entries. >>>> >>>> Surely the only safe thing to do is to continue reading using the >>>> last >>>> cookie returned from the server. >>> >>> Why? The client should be able to restart using any cookie at any >>> time, >>> and we rely on the cookies being unique to each entry. If you want >>> more >>> relaxed cookie semantics then be prepared to have to set up a >>> stateful NFS >>> readdir protocol. >> >> It looks to me as if I send in a non-zero position to nfs_readdir(), >> but the mapping >> has been invalidated, then the client starts over at cookie 0 sending >> READDIRs to the server and reading the directory entries into the >> page cache >> in order to figure out which position maps to which entry/cookie. > > Yes. > >> It seems possible to get duplicated or skipped entries. > > There is a logical step missing here. How do you get from “we can > end up > re-filling the cache from scratch” to “there can be duplicate or > skipped > entries”? Because I thought we were searching for the position, not the cookie. I missed that we keep the last position/cookie around. Thanks, I need to read more carefully. It seems a seekdir would clear that cookie, then we might have duplicate/missing entries.. but this is getting far from the original point. There's plenty of previous discussion on telldir/seekdir as you point out, so I'll go off and read that. Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:02 ` Trond Myklebust 2016-08-24 14:16 ` Benjamin Coddington @ 2016-08-24 14:20 ` Fields Bruce James 2016-08-24 14:26 ` Trond Myklebust 1 sibling, 1 reply; 19+ messages in thread From: Fields Bruce James @ 2016-08-24 14:20 UTC (permalink / raw) To: Trond Myklebust; +Cc: Coddington Benjamin, List Linux NFS Mailing On Wed, Aug 24, 2016 at 02:02:04PM +0000, Trond Myklebust wrote: > > > On Aug 24, 2016, at 09:56, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Wed, Aug 24, 2016 at 12:18:04PM +0000, Trond Myklebust wrote: > >> > >>> On Aug 23, 2016, at 17:21, Benjamin Coddington <bcodding@redhat.com> wrote: > >>> > >>> > >>> > >>> On 23 Aug 2016, at 11:36, Trond Myklebust wrote: > >>> > >>>>> On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@redhat.com> wrote: > >>>>> > >>>>> Hi linux-nfs, > >>>>> > >>>>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -l' > >>>>> situations") changed when nfs_readdir() decides to revalidate the > >>>>> directory's mapping, which contains all the entries. In addition to just > >>>>> checking if the attribute cache has expired, it includes a check to see if > >>>>> NFS_INO_INVALID_DATA is set on the directory. > >>>>> > >>>>> Well, customers that have directories with very many dentries and that same > >>>>> directory's attributes are frequently updated are now grouchy that `ls -l` > >>>>> takes so long since any update of the directory causes the mapping to be > >>>>> invalidated and we have to start over filling the directory's mapping. > >>>>> > >>>>> I actually haven't put real hard thought into it yet (because often for me > >>>>> that just wastes a lot of time), so I am doing the lazy thing by asking this > >>>>> question: > >>>>> > >>>>> Can we go back to just the using the attribute cache timeout, or should we > >>>>> get all heuristical about readdir? > >>>>> > >>>> > >>>> We are all heuristical at this point. How are the heuristics failing? > >>>> > >>>> The original problem those heuristics were designed to solve was that all > >>>> the stat() calls took forever to complete, since they are all synchronous; > >>>> Tigran showed some very convincing numbers for a large directory where the > >>>> difference in performance was an order of magnitude improved by using > >>>> readdirplus instead of readdir… > >>> > >>> I'll try to present a better explanation. While `ls -l` is walking through > >>> a directory repeatedly entering nfs_readdir(), a CREATE response send us > >>> through nfs_post_op_update_inode_locked(): > >>> > >>> 1531 if (S_ISDIR(inode->i_mode)) > >>> 1532 invalid |= NFS_INO_INVALID_DATA; > >>> 1533 nfs_set_cache_invalid(inode, invalid); > >>> > >>> Now, the next entry into nfs_readdir() has us do nfs_revalidate_mapping(), > >>> which will do nfs_invalidate_mapping() for the directory, and so we have to > >>> start over with cookie 0 sending READDIRPLUS to re-fill the directory's > >>> mapping to get back to where we are for the current nfs_readdir(). > >>> > >>> This process repeats for every entry into nfs_readdir() if the directory > >>> keeps getting updated, and it becomes more likely that it will be updated as > >>> each pass takes longer and longer to re-fill the mapping as the current > >>> nfs_readdir() invocation is further along. > >>> > >>> READDIRPLUS isn't the problem, the problem is invalidating the directory > >>> mapping in the middle of a series of getdents() if we do a CREATE. Also, I > >>> think a similar thing happens if the directory's mtime or ctime is updated - > >>> but in that case we set NFS_INO_INVALID_DATA because the change_attr > >>> updates. > >>> > >>> So, for directories with a large number of entries that updates often, it can > >>> be very slow to list the directory. > >>> > >>> Why did 311324ad1713 change nfs_readdir from > >>> > >>> if (nfs_attribute_cache_expired(inode)) > >>> nfs_revalidate_mapping(inode, file->f_mapping); > >>> to > >>> > >>> if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & NFS_INO_INVALID_DATA) > >>> nfs_revalidate_mapping(inode, file->f_mapping); > >> > >> As the commit message says, the whole purpose was to use READDIRPLUS as a substitute for multiple GETATTR calls when the heuristic tells us that the user is performing an ‘ls -l’ style of workload. > >> > >>> > >>> .. and can we go back to the way it was before? > >> > >> Not without slowing down ‘ls -l’ on large directories. > >> > >>> > >>> OK.. I understand why -- it is more correct since if we know the directory has > >>> changed, we might as well fetch the change. Otherwise, we might be creating > >>> files and then wondering why they aren't listed. > >>> > >>> It might be nicer to not invalidate the mapping we're currently using for > >>> readdir, though. Maybe there's a way to keep the mapping for the currently > >>> opened directory and invalidate it once it's closed. > >> > > > >> POSIX requires that you revalidate on opendir() and rewinddir(), and > >> leaves behaviour w.r.t. file addition and removal after the call to > >> opendir()/rewinddir() undefined > >> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html). > > > > It is only undefined whether the added or removed entry is returned. > > Other entries still need to returned exactly once. > > > > In this case we're restarting the read of the directory from scratch--I > > don't understand how that's possible while avoiding skipped or > > duplicated entries. > > > > Surely the only safe thing to do is to continue reading using the last > > cookie returned from the server. > > Why? The client should be able to restart using any cookie at any time, and we rely on the cookies being unique to each entry. If you want more relaxed cookie semantics then be prepared to have to set up a stateful NFS readdir protocol. I may be misunderstanding what the client's doing. If it's doing something like this: - return first hundred entries to user - throw out cache, restart readdir with cookie 0. - skip first hundred entries, return entries 101-200 to user. then it's risking skipped or duplicate entries. I don't see how to avoid that unless you use the cookie returned after entry 100 to read the subsequent entries. --b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:20 ` Fields Bruce James @ 2016-08-24 14:26 ` Trond Myklebust 2016-08-24 14:40 ` J. Bruce Fields 0 siblings, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2016-08-24 14:26 UTC (permalink / raw) To: Fields Bruce James; +Cc: Coddington Benjamin, List Linux NFS Mailing DQo+IE9uIEF1ZyAyNCwgMjAxNiwgYXQgMTA6MjAsIEZpZWxkcyBCcnVjZSBKYW1lcyA8YmZpZWxk c0BmaWVsZHNlcy5vcmc+IHdyb3RlOg0KPiANCj4gT24gV2VkLCBBdWcgMjQsIDIwMTYgYXQgMDI6 MDI6MDRQTSArMDAwMCwgVHJvbmQgTXlrbGVidXN0IHdyb3RlOg0KPj4gDQo+Pj4gT24gQXVnIDI0 LCAyMDE2LCBhdCAwOTo1NiwgSi4gQnJ1Y2UgRmllbGRzIDxiZmllbGRzQGZpZWxkc2VzLm9yZz4g d3JvdGU6DQo+Pj4gDQo+Pj4gT24gV2VkLCBBdWcgMjQsIDIwMTYgYXQgMTI6MTg6MDRQTSArMDAw MCwgVHJvbmQgTXlrbGVidXN0IHdyb3RlOg0KPj4+PiANCj4+Pj4+IE9uIEF1ZyAyMywgMjAxNiwg YXQgMTc6MjEsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2RkaW5nQHJlZGhhdC5jb20+IHdyb3Rl Og0KPj4+Pj4gDQo+Pj4+PiANCj4+Pj4+IA0KPj4+Pj4gT24gMjMgQXVnIDIwMTYsIGF0IDExOjM2 LCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+Pj4+PiANCj4+Pj4+Pj4gT24gQXVnIDIzLCAyMDE2 LCBhdCAxMTowOSwgQmVuamFtaW4gQ29kZGluZ3RvbiA8YmNvZGRpbmdAcmVkaGF0LmNvbT4gd3Jv dGU6DQo+Pj4+Pj4+IA0KPj4+Pj4+PiBIaSBsaW51eC1uZnMsDQo+Pj4+Pj4+IA0KPj4+Pj4+PiAz MTEzMjRhZDE3MTMgKCJORlM6IEJlIG1vcmUgYWdncmVzc2l2ZSBpbiB1c2luZyByZWFkZGlycGx1 cyBmb3IgJ2xzIC1sJw0KPj4+Pj4+PiBzaXR1YXRpb25zIikgY2hhbmdlZCB3aGVuIG5mc19yZWFk ZGlyKCkgZGVjaWRlcyB0byByZXZhbGlkYXRlIHRoZQ0KPj4+Pj4+PiBkaXJlY3RvcnkncyBtYXBw aW5nLCB3aGljaCBjb250YWlucyBhbGwgdGhlIGVudHJpZXMuICBJbiBhZGRpdGlvbiB0byBqdXN0 DQo+Pj4+Pj4+IGNoZWNraW5nIGlmIHRoZSBhdHRyaWJ1dGUgY2FjaGUgaGFzIGV4cGlyZWQsIGl0 IGluY2x1ZGVzIGEgY2hlY2sgdG8gc2VlIGlmDQo+Pj4+Pj4+IE5GU19JTk9fSU5WQUxJRF9EQVRB IGlzIHNldCBvbiB0aGUgZGlyZWN0b3J5Lg0KPj4+Pj4+PiANCj4+Pj4+Pj4gV2VsbCwgY3VzdG9t ZXJzIHRoYXQgaGF2ZSBkaXJlY3RvcmllcyB3aXRoIHZlcnkgbWFueSBkZW50cmllcyBhbmQgdGhh dCBzYW1lDQo+Pj4+Pj4+IGRpcmVjdG9yeSdzIGF0dHJpYnV0ZXMgYXJlIGZyZXF1ZW50bHkgdXBk YXRlZCBhcmUgbm93IGdyb3VjaHkgdGhhdCBgbHMgLWxgDQo+Pj4+Pj4+IHRha2VzIHNvIGxvbmcg c2luY2UgYW55IHVwZGF0ZSBvZiB0aGUgZGlyZWN0b3J5IGNhdXNlcyB0aGUgbWFwcGluZyB0byBi ZQ0KPj4+Pj4+PiBpbnZhbGlkYXRlZCBhbmQgd2UgaGF2ZSB0byBzdGFydCBvdmVyIGZpbGxpbmcg dGhlIGRpcmVjdG9yeSdzIG1hcHBpbmcuDQo+Pj4+Pj4+IA0KPj4+Pj4+PiBJIGFjdHVhbGx5IGhh dmVuJ3QgcHV0IHJlYWwgaGFyZCB0aG91Z2h0IGludG8gaXQgeWV0IChiZWNhdXNlIG9mdGVuIGZv ciBtZQ0KPj4+Pj4+PiB0aGF0IGp1c3Qgd2FzdGVzIGEgbG90IG9mIHRpbWUpLCBzbyBJIGFtIGRv aW5nIHRoZSBsYXp5IHRoaW5nIGJ5IGFza2luZyB0aGlzDQo+Pj4+Pj4+IHF1ZXN0aW9uOg0KPj4+ Pj4+PiANCj4+Pj4+Pj4gQ2FuIHdlIGdvIGJhY2sgdG8ganVzdCB0aGUgdXNpbmcgdGhlIGF0dHJp YnV0ZSBjYWNoZSB0aW1lb3V0LCBvciBzaG91bGQgd2UNCj4+Pj4+Pj4gZ2V0IGFsbCBoZXVyaXN0 aWNhbCBhYm91dCByZWFkZGlyPw0KPj4+Pj4+PiANCj4+Pj4+PiANCj4+Pj4+PiBXZSBhcmUgYWxs IGhldXJpc3RpY2FsIGF0IHRoaXMgcG9pbnQuIEhvdyBhcmUgdGhlIGhldXJpc3RpY3MgZmFpbGlu Zz8NCj4+Pj4+PiANCj4+Pj4+PiBUaGUgb3JpZ2luYWwgcHJvYmxlbSB0aG9zZSBoZXVyaXN0aWNz IHdlcmUgZGVzaWduZWQgdG8gc29sdmUgd2FzIHRoYXQgYWxsDQo+Pj4+Pj4gdGhlIHN0YXQoKSBj YWxscyB0b29rIGZvcmV2ZXIgdG8gY29tcGxldGUsIHNpbmNlIHRoZXkgYXJlIGFsbCBzeW5jaHJv bm91czsNCj4+Pj4+PiBUaWdyYW4gc2hvd2VkIHNvbWUgdmVyeSBjb252aW5jaW5nIG51bWJlcnMg Zm9yIGEgbGFyZ2UgZGlyZWN0b3J5IHdoZXJlIHRoZQ0KPj4+Pj4+IGRpZmZlcmVuY2UgaW4gcGVy Zm9ybWFuY2Ugd2FzIGFuIG9yZGVyIG9mIG1hZ25pdHVkZSBpbXByb3ZlZCBieSB1c2luZw0KPj4+ Pj4+IHJlYWRkaXJwbHVzIGluc3RlYWQgb2YgcmVhZGRpcuKApg0KPj4+Pj4gDQo+Pj4+PiBJJ2xs IHRyeSB0byBwcmVzZW50IGEgYmV0dGVyIGV4cGxhbmF0aW9uLiAgV2hpbGUgYGxzIC1sYCBpcyB3 YWxraW5nIHRocm91Z2gNCj4+Pj4+IGEgZGlyZWN0b3J5IHJlcGVhdGVkbHkgZW50ZXJpbmcgbmZz X3JlYWRkaXIoKSwgYSBDUkVBVEUgcmVzcG9uc2Ugc2VuZCB1cw0KPj4+Pj4gdGhyb3VnaCBuZnNf cG9zdF9vcF91cGRhdGVfaW5vZGVfbG9ja2VkKCk6DQo+Pj4+PiANCj4+Pj4+IDE1MzEgICAgIGlm IChTX0lTRElSKGlub2RlLT5pX21vZGUpKQ0KPj4+Pj4gMTUzMiAgICAgICAgIGludmFsaWQgfD0g TkZTX0lOT19JTlZBTElEX0RBVEE7DQo+Pj4+PiAxNTMzICAgICBuZnNfc2V0X2NhY2hlX2ludmFs aWQoaW5vZGUsIGludmFsaWQpOw0KPj4+Pj4gDQo+Pj4+PiBOb3csIHRoZSBuZXh0IGVudHJ5IGlu dG8gbmZzX3JlYWRkaXIoKSBoYXMgdXMgZG8gbmZzX3JldmFsaWRhdGVfbWFwcGluZygpLA0KPj4+ Pj4gd2hpY2ggd2lsbCBkbyBuZnNfaW52YWxpZGF0ZV9tYXBwaW5nKCkgZm9yIHRoZSBkaXJlY3Rv cnksIGFuZCBzbyB3ZSBoYXZlIHRvDQo+Pj4+PiBzdGFydCBvdmVyIHdpdGggY29va2llIDAgc2Vu ZGluZyBSRUFERElSUExVUyB0byByZS1maWxsIHRoZSBkaXJlY3Rvcnkncw0KPj4+Pj4gbWFwcGlu ZyB0byBnZXQgYmFjayB0byB3aGVyZSB3ZSBhcmUgZm9yIHRoZSBjdXJyZW50IG5mc19yZWFkZGly KCkuDQo+Pj4+PiANCj4+Pj4+IFRoaXMgcHJvY2VzcyByZXBlYXRzIGZvciBldmVyeSBlbnRyeSBp bnRvIG5mc19yZWFkZGlyKCkgaWYgdGhlIGRpcmVjdG9yeQ0KPj4+Pj4ga2VlcHMgZ2V0dGluZyB1 cGRhdGVkLCBhbmQgaXQgYmVjb21lcyBtb3JlIGxpa2VseSB0aGF0IGl0IHdpbGwgYmUgdXBkYXRl ZCBhcw0KPj4+Pj4gZWFjaCBwYXNzIHRha2VzIGxvbmdlciBhbmQgbG9uZ2VyIHRvIHJlLWZpbGwg dGhlIG1hcHBpbmcgYXMgdGhlIGN1cnJlbnQNCj4+Pj4+IG5mc19yZWFkZGlyKCkgaW52b2NhdGlv biBpcyBmdXJ0aGVyIGFsb25nLg0KPj4+Pj4gDQo+Pj4+PiBSRUFERElSUExVUyBpc24ndCB0aGUg cHJvYmxlbSwgdGhlIHByb2JsZW0gaXMgaW52YWxpZGF0aW5nIHRoZSBkaXJlY3RvcnkNCj4+Pj4+ IG1hcHBpbmcgaW4gdGhlIG1pZGRsZSBvZiBhIHNlcmllcyBvZiBnZXRkZW50cygpIGlmIHdlIGRv IGEgQ1JFQVRFLiAgQWxzbywgSQ0KPj4+Pj4gdGhpbmsgYSBzaW1pbGFyIHRoaW5nIGhhcHBlbnMg aWYgdGhlIGRpcmVjdG9yeSdzIG10aW1lIG9yIGN0aW1lIGlzIHVwZGF0ZWQgLQ0KPj4+Pj4gYnV0 IGluIHRoYXQgY2FzZSB3ZSBzZXQgTkZTX0lOT19JTlZBTElEX0RBVEEgYmVjYXVzZSB0aGUgY2hh bmdlX2F0dHINCj4+Pj4+IHVwZGF0ZXMuDQo+Pj4+PiANCj4+Pj4+IFNvLCBmb3IgZGlyZWN0b3Jp ZXMgd2l0aCBhIGxhcmdlIG51bWJlciBvZiBlbnRyaWVzIHRoYXQgdXBkYXRlcyBvZnRlbiwgaXQg Y2FuDQo+Pj4+PiBiZSB2ZXJ5IHNsb3cgdG8gbGlzdCB0aGUgZGlyZWN0b3J5Lg0KPj4+Pj4gDQo+ Pj4+PiBXaHkgZGlkIDMxMTMyNGFkMTcxMyBjaGFuZ2UgbmZzX3JlYWRkaXIgZnJvbQ0KPj4+Pj4g DQo+Pj4+PiBpZiAobmZzX2F0dHJpYnV0ZV9jYWNoZV9leHBpcmVkKGlub2RlKSkNCj4+Pj4+ICBu ZnNfcmV2YWxpZGF0ZV9tYXBwaW5nKGlub2RlLCBmaWxlLT5mX21hcHBpbmcpOw0KPj4+Pj4gdG8N Cj4+Pj4+IA0KPj4+Pj4gaWYgKG5mc19hdHRyaWJ1dGVfY2FjaGVfZXhwaXJlZChpbm9kZSkgfHwg bmZzaS0+Y2FjaGVfdmFsaWRpdHkgJiBORlNfSU5PX0lOVkFMSURfREFUQSkNCj4+Pj4+ICBuZnNf cmV2YWxpZGF0ZV9tYXBwaW5nKGlub2RlLCBmaWxlLT5mX21hcHBpbmcpOw0KPj4+PiANCj4+Pj4g QXMgdGhlIGNvbW1pdCBtZXNzYWdlIHNheXMsIHRoZSB3aG9sZSBwdXJwb3NlIHdhcyB0byB1c2Ug UkVBRERJUlBMVVMgYXMgYSBzdWJzdGl0dXRlIGZvciBtdWx0aXBsZSBHRVRBVFRSIGNhbGxzIHdo ZW4gdGhlIGhldXJpc3RpYyB0ZWxscyB1cyB0aGF0IHRoZSB1c2VyIGlzIHBlcmZvcm1pbmcgYW4g 4oCYbHMgLWzigJkgc3R5bGUgb2Ygd29ya2xvYWQuDQo+Pj4+IA0KPj4+Pj4gDQo+Pj4+PiAuLiBh bmQgY2FuIHdlIGdvIGJhY2sgdG8gdGhlIHdheSBpdCB3YXMgYmVmb3JlPw0KPj4+PiANCj4+Pj4g Tm90IHdpdGhvdXQgc2xvd2luZyBkb3duIOKAmGxzIC1s4oCZIG9uIGxhcmdlIGRpcmVjdG9yaWVz Lg0KPj4+PiANCj4+Pj4+IA0KPj4+Pj4gT0suLiBJIHVuZGVyc3RhbmQgd2h5IC0tIGl0IGlzIG1v cmUgY29ycmVjdCBzaW5jZSBpZiB3ZSBrbm93IHRoZSBkaXJlY3RvcnkgaGFzDQo+Pj4+PiBjaGFu Z2VkLCB3ZSBtaWdodCBhcyB3ZWxsIGZldGNoIHRoZSBjaGFuZ2UuICBPdGhlcndpc2UsIHdlIG1p Z2h0IGJlIGNyZWF0aW5nDQo+Pj4+PiBmaWxlcyBhbmQgdGhlbiB3b25kZXJpbmcgd2h5IHRoZXkg YXJlbid0IGxpc3RlZC4NCj4+Pj4+IA0KPj4+Pj4gSXQgbWlnaHQgYmUgbmljZXIgdG8gbm90IGlu dmFsaWRhdGUgdGhlIG1hcHBpbmcgd2UncmUgY3VycmVudGx5IHVzaW5nIGZvcg0KPj4+Pj4gcmVh ZGRpciwgdGhvdWdoLiAgTWF5YmUgdGhlcmUncyBhIHdheSB0byBrZWVwIHRoZSBtYXBwaW5nIGZv ciB0aGUgY3VycmVudGx5DQo+Pj4+PiBvcGVuZWQgZGlyZWN0b3J5IGFuZCBpbnZhbGlkYXRlIGl0 IG9uY2UgaXQncyBjbG9zZWQuDQo+Pj4+IA0KPj4+IA0KPj4+PiBQT1NJWCByZXF1aXJlcyB0aGF0 IHlvdSByZXZhbGlkYXRlIG9uIG9wZW5kaXIoKSBhbmQgcmV3aW5kZGlyKCksIGFuZA0KPj4+PiBs ZWF2ZXMgYmVoYXZpb3VyIHcuci50LiBmaWxlIGFkZGl0aW9uIGFuZCByZW1vdmFsIGFmdGVyIHRo ZSBjYWxsIHRvDQo+Pj4+IG9wZW5kaXIoKS9yZXdpbmRkaXIoKSB1bmRlZmluZWQNCj4+Pj4gKGh0 dHA6Ly9wdWJzLm9wZW5ncm91cC5vcmcvb25saW5lcHVicy85Njk5OTE5Nzk5L2Z1bmN0aW9ucy9y ZWFkZGlyLmh0bWwpLg0KPj4+IA0KPj4+IEl0IGlzIG9ubHkgdW5kZWZpbmVkIHdoZXRoZXIgdGhl IGFkZGVkIG9yIHJlbW92ZWQgZW50cnkgaXMgcmV0dXJuZWQuDQo+Pj4gT3RoZXIgZW50cmllcyBz dGlsbCBuZWVkIHRvIHJldHVybmVkIGV4YWN0bHkgb25jZS4NCj4+PiANCj4+PiBJbiB0aGlzIGNh c2Ugd2UncmUgcmVzdGFydGluZyB0aGUgcmVhZCBvZiB0aGUgZGlyZWN0b3J5IGZyb20gc2NyYXRj aC0tSQ0KPj4+IGRvbid0IHVuZGVyc3RhbmQgaG93IHRoYXQncyBwb3NzaWJsZSB3aGlsZSBhdm9p ZGluZyBza2lwcGVkIG9yDQo+Pj4gZHVwbGljYXRlZCBlbnRyaWVzLg0KPj4+IA0KPj4+IFN1cmVs eSB0aGUgb25seSBzYWZlIHRoaW5nIHRvIGRvIGlzIHRvIGNvbnRpbnVlIHJlYWRpbmcgdXNpbmcg dGhlIGxhc3QNCj4+PiBjb29raWUgcmV0dXJuZWQgZnJvbSB0aGUgc2VydmVyLg0KPj4gDQo+PiBX aHk/IFRoZSBjbGllbnQgc2hvdWxkIGJlIGFibGUgdG8gcmVzdGFydCB1c2luZyBhbnkgY29va2ll IGF0IGFueSB0aW1lLCBhbmQgd2UgcmVseSBvbiB0aGUgY29va2llcyBiZWluZyB1bmlxdWUgdG8g ZWFjaCBlbnRyeS4gSWYgeW91IHdhbnQgbW9yZSByZWxheGVkIGNvb2tpZSBzZW1hbnRpY3MgdGhl biBiZSBwcmVwYXJlZCB0byBoYXZlIHRvIHNldCB1cCBhIHN0YXRlZnVsIE5GUyByZWFkZGlyIHBy b3RvY29sLg0KPiANCj4gSSBtYXkgYmUgbWlzdW5kZXJzdGFuZGluZyB3aGF0IHRoZSBjbGllbnQn cyBkb2luZy4gIElmIGl0J3MgZG9pbmcNCj4gc29tZXRoaW5nIGxpa2UgdGhpczoNCj4gDQo+IAkt IHJldHVybiBmaXJzdCBodW5kcmVkIGVudHJpZXMgdG8gdXNlcg0KPiAJLSB0aHJvdyBvdXQgY2Fj aGUsIHJlc3RhcnQgcmVhZGRpciB3aXRoIGNvb2tpZSAwLg0KPiAJLSBza2lwIGZpcnN0IGh1bmRy ZWQgZW50cmllcywgcmV0dXJuIGVudHJpZXMgMTAxLTIwMCB0byB1c2VyLg0KDQpOby4gSWYgdGhl IGNsaWVudCBoYXMgYSBjYWNoZWQgY29va2llIGFjdGluZyBhcyB0aGUgY3Vyc29yLCB0aGVuIHdl IGxvb2sgZm9yIHRoYXQgY29va2llIGluIHRoZSBuZXcgc3RyZWFtLiBJZiB3ZSBkb27igJl0IGZp bmQgdGhhdCBjb29raWUsIHRoZW4gd2UgZG8gYW5vdGhlciByZWFkZGlyIHdoZXJlIHRoYXQgY29v a2llIGFjdHMgYXMgYW4gYXJndW1lbnQsIGFuZCB1c2UgdGhhdCB0byBmaWd1cmUgb3V0IHdoZXJl IHRoZSBzdHJlYW0gbm93IGNvbnRpbnVlcy4NCg0KDQo+IA0KPiB0aGVuIGl0J3Mgcmlza2luZyBz a2lwcGVkIG9yIGR1cGxpY2F0ZSBlbnRyaWVzLg0KPiANCj4gSSBkb24ndCBzZWUgaG93IHRvIGF2 b2lkIHRoYXQgdW5sZXNzIHlvdSB1c2UgdGhlIGNvb2tpZSByZXR1cm5lZCBhZnRlcg0KPiBlbnRy eSAxMDAgdG8gcmVhZCB0aGUgc3Vic2VxdWVudCBlbnRyaWVzLg0KPiANCg0KU2VlIGFib3ZlDQoN Cg== ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:26 ` Trond Myklebust @ 2016-08-24 14:40 ` J. Bruce Fields 2016-08-24 14:53 ` Trond Myklebust 0 siblings, 1 reply; 19+ messages in thread From: J. Bruce Fields @ 2016-08-24 14:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: Coddington Benjamin, List Linux NFS Mailing On Wed, Aug 24, 2016 at 02:26:21PM +0000, Trond Myklebust wrote: > > > On Aug 24, 2016, at 10:20, <bfields@fieldses.org> wrote: > > I may be misunderstanding what the client's doing. If it's doing > > something like this: > > > > - return first hundred entries to user > > - throw out cache, restart readdir with cookie 0. > > - skip first hundred entries, return entries 101-200 to user. > > No. If the client has a cached cookie acting as the cursor, then we look for that cookie in the new stream. If we don’t find that cookie, then we do another readdir where that cookie acts as an argument, and use that to figure out where the stream now continues. Got it, thanks, so the struct file caches the real server side cookie even after you throw out the cached entries? OK, I guess I need to make another attempt at understanding the actual code, apologies. (Doesn't that mean that in the case your last position was a deleted file, you're probably going to have to read the entire directory to find out the cookie's not there? It seems so much more painful than starting off with the last cookie.) --b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:40 ` J. Bruce Fields @ 2016-08-24 14:53 ` Trond Myklebust 2016-08-24 15:16 ` Fields Bruce James 0 siblings, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2016-08-24 14:53 UTC (permalink / raw) To: Fields Bruce James; +Cc: Coddington Benjamin, List Linux NFS Mailing DQo+IE9uIEF1ZyAyNCwgMjAxNiwgYXQgMTA6NDAsIEouIEJydWNlIEZpZWxkcyA8YmZpZWxkc0Bm aWVsZHNlcy5vcmc+IHdyb3RlOg0KPiANCj4gT24gV2VkLCBBdWcgMjQsIDIwMTYgYXQgMDI6MjY6 MjFQTSArMDAwMCwgVHJvbmQgTXlrbGVidXN0IHdyb3RlOg0KPj4gDQo+Pj4gT24gQXVnIDI0LCAy MDE2LCBhdCAxMDoyMCwgPGJmaWVsZHNAZmllbGRzZXMub3JnPiB3cm90ZToNCj4+PiBJIG1heSBi ZSBtaXN1bmRlcnN0YW5kaW5nIHdoYXQgdGhlIGNsaWVudCdzIGRvaW5nLiAgSWYgaXQncyBkb2lu Zw0KPj4+IHNvbWV0aGluZyBsaWtlIHRoaXM6DQo+Pj4gDQo+Pj4gCS0gcmV0dXJuIGZpcnN0IGh1 bmRyZWQgZW50cmllcyB0byB1c2VyDQo+Pj4gCS0gdGhyb3cgb3V0IGNhY2hlLCByZXN0YXJ0IHJl YWRkaXIgd2l0aCBjb29raWUgMC4NCj4+PiAJLSBza2lwIGZpcnN0IGh1bmRyZWQgZW50cmllcywg cmV0dXJuIGVudHJpZXMgMTAxLTIwMCB0byB1c2VyLg0KPj4gDQo+PiBOby4gSWYgdGhlIGNsaWVu dCBoYXMgYSBjYWNoZWQgY29va2llIGFjdGluZyBhcyB0aGUgY3Vyc29yLCB0aGVuIHdlIGxvb2sg Zm9yIHRoYXQgY29va2llIGluIHRoZSBuZXcgc3RyZWFtLiBJZiB3ZSBkb27igJl0IGZpbmQgdGhh dCBjb29raWUsIHRoZW4gd2UgZG8gYW5vdGhlciByZWFkZGlyIHdoZXJlIHRoYXQgY29va2llIGFj dHMgYXMgYW4gYXJndW1lbnQsIGFuZCB1c2UgdGhhdCB0byBmaWd1cmUgb3V0IHdoZXJlIHRoZSBz dHJlYW0gbm93IGNvbnRpbnVlcy4NCj4gDQo+IEdvdCBpdCwgdGhhbmtzLCBzbyB0aGUgc3RydWN0 IGZpbGUgY2FjaGVzIHRoZSByZWFsIHNlcnZlciBzaWRlIGNvb2tpZQ0KPiBldmVuIGFmdGVyIHlv dSB0aHJvdyBvdXQgdGhlIGNhY2hlZCBlbnRyaWVzPyAgT0ssIEkgZ3Vlc3MgSSBuZWVkIHRvIG1h a2UNCj4gYW5vdGhlciBhdHRlbXB0IGF0IHVuZGVyc3RhbmRpbmcgdGhlIGFjdHVhbCBjb2RlLCBh cG9sb2dpZXMuDQo+IA0KPiAoRG9lc24ndCB0aGF0IG1lYW4gdGhhdCBpbiB0aGUgY2FzZSB5b3Vy IGxhc3QgcG9zaXRpb24gd2FzIGEgZGVsZXRlZA0KPiBmaWxlLCB5b3UncmUgcHJvYmFibHkgZ29p bmcgdG8gaGF2ZSB0byByZWFkIHRoZSBlbnRpcmUgZGlyZWN0b3J5IHRvIGZpbmQNCj4gb3V0IHRo ZSBjb29raWUncyBub3QgdGhlcmU/ICBJdCBzZWVtcyBzbyBtdWNoIG1vcmUgcGFpbmZ1bCB0aGFu IHN0YXJ0aW5nDQo+IG9mZiB3aXRoIHRoZSBsYXN0IGNvb2tpZS4pDQoNCkhvdyBkbyB5b3UgY2Fj aGUgdGhlIHJlc3VsdHMgb2Ygc3VjaCBhIHJlYWRkaXI/IFlvdSdsbCBoYXZlIG5vIHJlZmVyZW5j ZSBwb2ludCB0byBmaWd1cmUgb3V0IHdoZXJlIHlvdSBhcmUgaW4gdGhlIHN0cmVhbS4NCkFsc28s IGhvdyBkbyB5b3UgY29udmVydCB0aGF0IGNvb2tpZSBpbnRvIGFuIG9mZnNldCB0aGF0IHRlbGxk aXIoKSBjYW4gdXNlPw0KDQoNCg== ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-24 14:53 ` Trond Myklebust @ 2016-08-24 15:16 ` Fields Bruce James 0 siblings, 0 replies; 19+ messages in thread From: Fields Bruce James @ 2016-08-24 15:16 UTC (permalink / raw) To: Trond Myklebust; +Cc: Coddington Benjamin, List Linux NFS Mailing On Wed, Aug 24, 2016 at 02:53:10PM +0000, Trond Myklebust wrote: > > > On Aug 24, 2016, at 10:40, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Wed, Aug 24, 2016 at 02:26:21PM +0000, Trond Myklebust wrote: > >> > >>> On Aug 24, 2016, at 10:20, <bfields@fieldses.org> wrote: > >>> I may be misunderstanding what the client's doing. If it's doing > >>> something like this: > >>> > >>> - return first hundred entries to user > >>> - throw out cache, restart readdir with cookie 0. > >>> - skip first hundred entries, return entries 101-200 to user. > >> > >> No. If the client has a cached cookie acting as the cursor, then we look for that cookie in the new stream. If we don’t find that cookie, then we do another readdir where that cookie acts as an argument, and use that to figure out where the stream now continues. > > > > Got it, thanks, so the struct file caches the real server side cookie > > even after you throw out the cached entries? OK, I guess I need to make > > another attempt at understanding the actual code, apologies. > > > > (Doesn't that mean that in the case your last position was a deleted > > file, you're probably going to have to read the entire directory to find > > out the cookie's not there? It seems so much more painful than starting > > off with the last cookie.) > > How do you cache the results of such a readdir? You'll have no reference point to figure out where you are in the stream. > Also, how do you convert that cookie into an offset that telldir() can use? No idea. But then, I don't really understand how we're going to do that even after reading the whole directory. Any telldir offsets that we've returned previously are in the future going to index into a completely new array, so they seem kind of meaningless. --b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-23 21:21 ` Benjamin Coddington 2016-08-24 12:18 ` Trond Myklebust @ 2016-08-24 13:02 ` Benjamin Coddington 1 sibling, 0 replies; 19+ messages in thread From: Benjamin Coddington @ 2016-08-24 13:02 UTC (permalink / raw) To: Trond Myklebust; +Cc: List Linux NFS Mailing On 23 Aug 2016, at 17:21, Benjamin Coddington wrote: > On 23 Aug 2016, at 11:36, Trond Myklebust wrote: > >>> On Aug 23, 2016, at 11:09, Benjamin Coddington <bcodding@redhat.com> >>> wrote: >>> >>> Hi linux-nfs, >>> >>> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls >>> -l' >>> situations") changed when nfs_readdir() decides to revalidate the >>> directory's mapping, which contains all the entries. In addition to >>> just >>> checking if the attribute cache has expired, it includes a check to >>> see if >>> NFS_INO_INVALID_DATA is set on the directory. >>> >>> Well, customers that have directories with very many dentries and >>> that same >>> directory's attributes are frequently updated are now grouchy that >>> `ls -l` >>> takes so long since any update of the directory causes the mapping >>> to be >>> invalidated and we have to start over filling the directory's >>> mapping. >>> >>> I actually haven't put real hard thought into it yet (because often >>> for me >>> that just wastes a lot of time), so I am doing the lazy thing by >>> asking this >>> question: >>> >>> Can we go back to just the using the attribute cache timeout, or >>> should we >>> get all heuristical about readdir? >>> >> >> We are all heuristical at this point. How are the heuristics failing? >> >> The original problem those heuristics were designed to solve was that >> all >> the stat() calls took forever to complete, since they are all >> synchronous; >> Tigran showed some very convincing numbers for a large directory >> where the >> difference in performance was an order of magnitude improved by using >> readdirplus instead of readdir… > > I'll try to present a better explanation. While `ls -l` is walking > through > a directory repeatedly entering nfs_readdir(), a CREATE response send > us > through nfs_post_op_update_inode_locked(): > > 1531 if (S_ISDIR(inode->i_mode)) > 1532 invalid |= NFS_INO_INVALID_DATA; > 1533 nfs_set_cache_invalid(inode, invalid); > > Now, the next entry into nfs_readdir() has us do > nfs_revalidate_mapping(), > which will do nfs_invalidate_mapping() for the directory, and so we > have to > start over with cookie 0 sending READDIRPLUS to re-fill the > directory's > mapping to get back to where we are for the current nfs_readdir(). > > This process repeats for every entry into nfs_readdir() if the > directory > keeps getting updated, and it becomes more likely that it will be > updated as > each pass takes longer and longer to re-fill the mapping as the > current > nfs_readdir() invocation is further along. > > READDIRPLUS isn't the problem, the problem is invalidating the > directory > mapping in the middle of a series of getdents() if we do a CREATE. > Also, I > think a similar thing happens if the directory's mtime or ctime is > updated - > but in that case we set NFS_INO_INVALID_DATA because the change_attr > updates. > > So, for directories with a large number of entries that updates often, > it can > be very slow to list the directory. > > Why did 311324ad1713 change nfs_readdir from > > if (nfs_attribute_cache_expired(inode)) > nfs_revalidate_mapping(inode, file->f_mapping); > to > > if (nfs_attribute_cache_expired(inode) || nfsi->cache_validity & > NFS_INO_INVALID_DATA) > nfs_revalidate_mapping(inode, file->f_mapping); > > .. and can we go back to the way it was before? > > OK.. I understand why -- it is more correct since if we know the > directory has > changed, we might as well fetch the change. Otherwise, we might be > creating > files and then wondering why they aren't listed. But that won't happen unless a process deliberately starts over at position 0.. I think a user creating a file, then listing the directory would work as expected. Scott Mayhew already addressed this problem in: 07b5ce8ef2d8 ("NFS: Make nfs_readdir revalidate less often") So, from 3.10 to 3.14 we had the behavior where we wouldn't clear the mapping unless pos = 0 or the attribute cache expired. > It might be nicer to not invalidate the mapping we're currently using > for > readdir, though. Maybe there's a way to keep the mapping for the > currently > opened directory and invalidate it once it's closed. POSIX readdir says: If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. However, RFC 1813 says: The client should be careful to avoid holding directory entry cookies across operations that modify the directory contents, such as REMOVE and CREATE. Ideally, we would keep using the same cookieverf until the server sends NFS3ERR_BAD_COOKIE, and maybe just keep track of the last position/entry cookie so we could pick up where we left off after invalidation. Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-23 15:09 I can't get no readdir satisfaction Benjamin Coddington 2016-08-23 15:36 ` Trond Myklebust @ 2016-08-23 19:36 ` J. Bruce Fields 2016-08-23 21:50 ` Benjamin Coddington 1 sibling, 1 reply; 19+ messages in thread From: J. Bruce Fields @ 2016-08-23 19:36 UTC (permalink / raw) To: Benjamin Coddington; +Cc: List Linux NFS Mailing On Tue, Aug 23, 2016 at 11:09:37AM -0400, Benjamin Coddington wrote: > Hi linux-nfs, > > 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls -l' > situations") changed when nfs_readdir() decides to revalidate the > directory's mapping, which contains all the entries. In addition to just > checking if the attribute cache has expired, it includes a check to see if > NFS_INO_INVALID_DATA is set on the directory. > > Well, customers that have directories with very many dentries and that same > directory's attributes are frequently updated are now grouchy that `ls -l` > takes so long since any update of the directory causes the mapping to be > invalidated and we have to start over filling the directory's mapping. Apologies, I don't understand the client's readdir implementation. So it really zeroes out the cookie every time it invalidates the directory cache? I also seem to remember it makes up its own cookies to return to users instead of returning the server's. Is the cookie invalidation a consequence of that? I don't think it should have to be. And as long as these two things (cache and cookie validity) are tied together, I can't see how we're going to guarantee readdir progress. --b. > > I actually haven't put real hard thought into it yet (because often for me > that just wastes a lot of time), so I am doing the lazy thing by asking this > question: > > Can we go back to just the using the attribute cache timeout, or should we > get all heuristical about readdir? > > Ben > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: I can't get no readdir satisfaction 2016-08-23 19:36 ` J. Bruce Fields @ 2016-08-23 21:50 ` Benjamin Coddington 0 siblings, 0 replies; 19+ messages in thread From: Benjamin Coddington @ 2016-08-23 21:50 UTC (permalink / raw) To: J. Bruce Fields; +Cc: List Linux NFS Mailing On 23 Aug 2016, at 15:36, J. Bruce Fields wrote: > On Tue, Aug 23, 2016 at 11:09:37AM -0400, Benjamin Coddington wrote: >> Hi linux-nfs, >> >> 311324ad1713 ("NFS: Be more aggressive in using readdirplus for 'ls >> -l' >> situations") changed when nfs_readdir() decides to revalidate the >> directory's mapping, which contains all the entries. In addition to >> just >> checking if the attribute cache has expired, it includes a check to >> see if >> NFS_INO_INVALID_DATA is set on the directory. >> >> Well, customers that have directories with very many dentries and >> that same >> directory's attributes are frequently updated are now grouchy that >> `ls -l` >> takes so long since any update of the directory causes the mapping to >> be >> invalidated and we have to start over filling the directory's >> mapping. > > Apologies, I don't understand the client's readdir implementation. So > it really zeroes out the cookie every time it invalidates the > directory > cache? Yes, it zeros the cookieverf, so as I understand it we have to start over. But, maybe for a given open() of a directory we can just keep the same cookieverf, even if we invalidate the mapping we can continue to return entries where the last nfs_readdir() left off.. I don't fully understand the client's readdir implementation either.. > I also seem to remember it makes up its own cookies to return to users > instead of returning the server's. Is the cookie invalidation a > consequence of that? I don't think it should have to be. And as long > as these two things (cache and cookie validity) are tied together, I > can't see how we're going to guarantee readdir progress. For users, it looks like a "position" is used, which is like an index into the entries. After READDIR, each entry's cookie is stored in the pagecache in an array indexed by this position. So in order to resume reading the directory at a position, we need retrieve all the prior entries from the server to determine what cookie to start from. .. that made sense to me. I hope I am understanding this correctly. Ben ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2016-08-24 15:18 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-23 15:09 I can't get no readdir satisfaction Benjamin Coddington 2016-08-23 15:36 ` Trond Myklebust 2016-08-23 21:21 ` Benjamin Coddington 2016-08-24 12:18 ` Trond Myklebust 2016-08-24 13:15 ` Benjamin Coddington 2016-08-24 13:39 ` Trond Myklebust 2016-08-24 13:56 ` J. Bruce Fields 2016-08-24 14:02 ` Trond Myklebust 2016-08-24 14:16 ` Benjamin Coddington 2016-08-24 14:19 ` Trond Myklebust 2016-08-24 15:18 ` Benjamin Coddington 2016-08-24 14:20 ` Fields Bruce James 2016-08-24 14:26 ` Trond Myklebust 2016-08-24 14:40 ` J. Bruce Fields 2016-08-24 14:53 ` Trond Myklebust 2016-08-24 15:16 ` Fields Bruce James 2016-08-24 13:02 ` Benjamin Coddington 2016-08-23 19:36 ` J. Bruce Fields 2016-08-23 21:50 ` Benjamin Coddington
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.