linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* FS-Cache: Duplicate cookie detected
@ 2019-03-06  0:24 Christian Kujau
  2019-03-06 13:16 ` David Howells
  2019-03-06 17:20 ` David Howells
  0 siblings, 2 replies; 15+ messages in thread
From: Christian Kujau @ 2019-03-06  0:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Howells, Anna Schumaker, Steve Dickson

Hi,

ever since ec0328e46d6e ("fscache: Maintain a catalogue of allocated 
cookies") was commited, people are seeing[0] those "Duplicate cookie 
detected" messages in syslog, see below. NFS and CIFS mounts appear to 
continue to work, but these messsages are new and I too am wondering if 
this is something to worry about.

They _are_ logged with pr_err in fs/fscache/cookie.c, but maybe this needs 
to be changed to a different loglevel?

Thanks,
Christian.

 FS-Cache: Duplicate cookie detected
 FS-Cache: O-cookie c=000000009da9dbf0 [p=000000001593f904 fl=222 nc=0 na=1]
 FS-Cache: O-cookie d=00000000287febd9 n=00000000980c9e8a
 FS-Cache: O-key=[8] '020001bdc0a80064'
 FS-Cache: N-cookie c=00000000bfe3f869 [p=000000001593f904 fl=2 nc=0 na=1]
 FS-Cache: N-cookie d=00000000287febd9 n=00000000e153f178
 FS-Cache: N-key=[8] '020001bdc0a80064'


[0] https://bugzilla.kernel.org/show_bug.cgi?id=200145
-- 
BOFH excuse #318:

Your EMAIL is now being delivered by the USPS.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-06  0:24 FS-Cache: Duplicate cookie detected Christian Kujau
@ 2019-03-06 13:16 ` David Howells
  2019-03-06 17:20 ` David Howells
  1 sibling, 0 replies; 15+ messages in thread
From: David Howells @ 2019-03-06 13:16 UTC (permalink / raw)
  To: Christian Kujau; +Cc: dhowells, linux-kernel, Anna Schumaker, Steve Dickson

Christian Kujau <lists@nerdbynature.de> wrote:

>  FS-Cache: Duplicate cookie detected
>  FS-Cache: O-cookie c=000000009da9dbf0 [p=000000001593f904 fl=222 nc=0 na=1]
>  FS-Cache: O-cookie d=00000000287febd9 n=00000000980c9e8a
>  FS-Cache: O-key=[8] '020001bdc0a80064'
>  FS-Cache: N-cookie c=00000000bfe3f869 [p=000000001593f904 fl=2 nc=0 na=1]
>  FS-Cache: N-cookie d=00000000287febd9 n=00000000e153f178
>  FS-Cache: N-key=[8] '020001bdc0a80064'

I've just spotted something that all these things have in common.  The old
cookie has FSCACHE_COOKIE_RELINQUISHING set in the flags (fl=222).  I don't
have a wait for this situation in fscache - which I should.

Interestingly, '020001bdc0a80064' looks like is could be the key from the
server level, assuming this is from NFS:

	0200 01bdc0a8 0064

could be:

	AF_INET (LE), 1.189.192.168, 100 (BE)

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-06  0:24 FS-Cache: Duplicate cookie detected Christian Kujau
  2019-03-06 13:16 ` David Howells
@ 2019-03-06 17:20 ` David Howells
  2019-03-06 17:56   ` Christian Kujau
  2019-03-08 14:49   ` David Howells
  1 sibling, 2 replies; 15+ messages in thread
From: David Howells @ 2019-03-06 17:20 UTC (permalink / raw)
  To: Christian Kujau; +Cc: dhowells, linux-kernel, Anna Schumaker, Steve Dickson

David Howells <dhowells@redhat.com> wrote:

> >  FS-Cache: Duplicate cookie detected
> >  FS-Cache: O-cookie c=000000009da9dbf0 [p=000000001593f904 fl=222 nc=0 na=1]
> >  FS-Cache: O-cookie d=00000000287febd9 n=00000000980c9e8a
> >  FS-Cache: O-key=[8] '020001bdc0a80064'
> >  FS-Cache: N-cookie c=00000000bfe3f869 [p=000000001593f904 fl=2 nc=0 na=1]
> >  FS-Cache: N-cookie d=00000000287febd9 n=00000000e153f178
> >  FS-Cache: N-key=[8] '020001bdc0a80064'
> 
> I've just spotted something that all these things have in common.  The old
> cookie has FSCACHE_COOKIE_RELINQUISHING set in the flags (fl=222).  I don't
> have a wait for this situation in fscache - which I should.

I take it back - 0x200 is FSCACHE_COOKIE_ACQUIRED. :-/

I can reproduce a slightly different problem by setting off ~6000 parallel
processes, each reading its own individual directory of files.

I also see reports like this:

   FS-Cache: Duplicate cookie detected
   FS-Cache: O-cookie c=00000000db33ad59 [p=000000004bc53500 fl=218 nc=0 na=0]
   FS-Cache: O-cookie d=          (null) n=          (null)
   FS-Cache: O-cookie o=000000006cf6db4f
   FS-Cache: O-key=[16] '0100010101000000e51fc6000323ae02'
   FS-Cache: N-cookie c=00000000791c49d0 [p=000000004bc53500 fl=2 nc=0 na=1]
   FS-Cache: N-cookie d=00000000e220fe14 n=00000000d4484489
   FS-Cache: N-key=[16] '0100010101000000e51fc6000323ae02'

with no cookie def or netfs data and flags ACQUIRED, RELINQUISHED and
INVALIDATING - which I can insert a wait for.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-06 17:20 ` David Howells
@ 2019-03-06 17:56   ` Christian Kujau
  2019-03-08 14:49   ` David Howells
  1 sibling, 0 replies; 15+ messages in thread
From: Christian Kujau @ 2019-03-06 17:56 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Anna Schumaker, Steve Dickson

On Wed, 6 Mar 2019, David Howells wrote:
> I can reproduce a slightly different problem by setting off ~6000 parallel
> processes, each reading its own individual directory of files.

Ususually I only see it shortly after mount, and only once, but I too can 
reproduce it with NFStest ([0], and there's a Fedora package too) via 
"nfstest_cache --server $SERVER --client `hostname`", which then produces 
a couple of these messages:

FS-Cache: Duplicate cookie detected
FS-Cache: O-cookie c=000000002fcc866b [p=00000000c10c6e18 fl=222 nc=0 na=1]
FS-Cache: O-cookie d=00000000d5ed73bb n=00000000076c9150
FS-Cache: O-key=[10] '040002000801c0a80073'
FS-Cache: N-cookie c=00000000e8d5dcd4 [p=00000000c10c6e18 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=00000000d5ed73bb n=00000000a54e9705
FS-Cache: N-key=[10] '040002000801c0a80073'


...and the O-key does indeed seem to resemble the server address, 
somewhat:

 >>> s = "040002000801c0a80073";
 >>> bytes = ["".join(x) for x in zip(*[iter(s)]*2)]; bytes = [int(x, 16) for x in bytes]; print ".".join(str(x) for x in reversed(bytes))
 115.0.168.192.1.8.0.2.0.4
 ^^^^^^^^^^^^^

Mount options on that client are:

$ mount | grep nfs4
nfs:/usr/local/src on /usr/local/src type nfs4 (ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139,local_lock=none,addr=192.168.0.115)

...so FS-Cache ("fsc") isn't even used here. The server in that scenario 
is a current Fedora 29 installation.

HTH,
Christian.

[0] http://wiki.linux-nfs.org/wiki/index.php/NFStest

> 
> I also see reports like this:
> 
>    FS-Cache: Duplicate cookie detected
>    FS-Cache: O-cookie c=00000000db33ad59 [p=000000004bc53500 fl=218 nc=0 na=0]
>    FS-Cache: O-cookie d=          (null) n=          (null)
>    FS-Cache: O-cookie o=000000006cf6db4f
>    FS-Cache: O-key=[16] '0100010101000000e51fc6000323ae02'
>    FS-Cache: N-cookie c=00000000791c49d0 [p=000000004bc53500 fl=2 nc=0 na=1]
>    FS-Cache: N-cookie d=00000000e220fe14 n=00000000d4484489
>    FS-Cache: N-key=[16] '0100010101000000e51fc6000323ae02'
> 
> with no cookie def or netfs data and flags ACQUIRED, RELINQUISHED and
> INVALIDATING - which I can insert a wait for.
> 
> David

-- 
BOFH excuse #420:

Feature was not beta tested

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-06 17:20 ` David Howells
  2019-03-06 17:56   ` Christian Kujau
@ 2019-03-08 14:49   ` David Howells
  2019-03-08 17:57     ` Christian Kujau
                       ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: David Howells @ 2019-03-08 14:49 UTC (permalink / raw)
  To: Christian Kujau; +Cc: dhowells, linux-kernel, Anna Schumaker, Steve Dickson

Christian Kujau <lists@nerdbynature.de> wrote:

> $ mount | grep nfs4
> nfs:/usr/local/src on /usr/local/src type nfs4 (ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139,local_lock=none,addr=192.168.0.115)
> 
> ...so FS-Cache ("fsc") isn't even used here.

Interesting.  Can you do:

	cat /proc/fs/nfsfs/volumes

See the attached for a patch that helps with certain kinds of collision,
though I can't see that it should help with what you're seeing since the
RELINQUISHED flag isn't set on the old cookie (fl=222, but 0x10 isn't in
there).  You can monitor the number of waits by looking in
/proc/fs/fscache/stats for the:

	Acquire: n=289166 nul=0 noc=0 ok=286331 nbf=2 oom=0 wrq=23748

line.  See the wrq= value.

David
---
commit abe600bc6e5022e14c0ce135fa3d9ceaafa3cb46
Author: David Howells <dhowells@redhat.com>
Date:   Fri Mar 8 13:14:36 2019 +0000

    fscache: Fix cookie collision
    
    When adding a new cookie, a collision can sometimes occur because an old
    cookie with the same key is still in the process of being asynchronously
    relinquished, leading to something like the following being emitted into
    the kernel log:
    
       FS-Cache: Duplicate cookie detected
       FS-Cache: O-cookie c=00000000db33ad59 [p=000000004bc53500 fl=218 nc=0 na=0]
       FS-Cache: O-cookie d=          (null) n=          (null)
       FS-Cache: O-cookie o=000000006cf6db4f
       FS-Cache: O-key=[16] '0100010101000000e51fc6000323ae02'
       FS-Cache: N-cookie c=00000000791c49d0 [p=000000004bc53500 fl=2 nc=0 na=1]
       FS-Cache: N-cookie d=00000000e220fe14 n=00000000d4484489
       FS-Cache: N-key=[16] '0100010101000000e51fc6000323ae02'
    
    with the old cookie (O- lines) showing no cookie def or netfs data and
    showing flags ACQUIRED, RELINQUISHED and INVALIDATING (fl=218).
    
    Fix this by:
    
     (1) Setting FSCACHE_COOKIE_RELINQUISHING on the cookie we're about to tear
         dispose of.
    
     (2) Making fscache_hash_cookie() sleep on RELINQUISHING if there's a
         collision with a cookie that has RELINQUISHED set.
    
     (3) Clearing RELINQUISHING upon having unhashed the cookie and waking up
         anyone waiting for it to transition to unset.
    
    Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
    Signed-off-by: David Howells <dhowells@redhat.com>

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 50f0a5757f48..f304bda88bb1 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -231,6 +231,7 @@ proc files.
 		ok=N	Number of acq reqs succeeded
 		nbf=N	Number of acq reqs rejected due to error
 		oom=N	Number of acq reqs failed on ENOMEM
+		wrq=N	Number of waits for relinquishment of conflicting old cookies.
 	Lookups	n=N	Number of lookup calls made on cache backends
 		neg=N	Number of negative lookups made
 		pos=N	Number of positive lookups made
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index c550512ce335..75d0ffd36ac0 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -202,7 +202,9 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
 	struct hlist_bl_head *h;
 	struct hlist_bl_node *p;
 	unsigned int bucket;
+	int ret;
 
+retry:
 	bucket = candidate->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1);
 	h = &fscache_cookie_hash[bucket];
 
@@ -220,19 +222,37 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
 	return candidate;
 
 collision:
-	if (test_and_set_bit(FSCACHE_COOKIE_ACQUIRED, &cursor->flags)) {
-		trace_fscache_cookie(cursor, fscache_cookie_collision,
-				     atomic_read(&cursor->usage));
-		pr_err("Duplicate cookie detected\n");
-		fscache_print_cookie(cursor, 'O');
-		fscache_print_cookie(candidate, 'N');
-		hlist_bl_unlock(h);
-		return NULL;
-	}
+	if (test_and_set_bit(FSCACHE_COOKIE_ACQUIRED, &cursor->flags))
+		goto duplicate;
 
 	fscache_cookie_get(cursor, fscache_cookie_get_reacquire);
 	hlist_bl_unlock(h);
 	return cursor;
+
+duplicate:
+	if (test_bit(FSCACHE_COOKIE_RELINQUISHED, &cursor->flags))
+		goto wait_for_removal;
+
+	trace_fscache_cookie(cursor, fscache_cookie_collision,
+			     atomic_read(&cursor->usage));
+	pr_err("Duplicate cookie detected\n");
+	fscache_print_cookie(cursor, 'O');
+	fscache_print_cookie(candidate, 'N');
+	hlist_bl_unlock(h);
+	return NULL;
+
+wait_for_removal:
+	fscache_cookie_get(cursor, fscache_cookie_get_wait);
+	hlist_bl_unlock(h);
+
+	fscache_stat(&fscache_n_acquires_wait_relinq);
+	ret = wait_on_bit(&cursor->flags, FSCACHE_COOKIE_RELINQUISHING,
+			  TASK_INTERRUPTIBLE);
+
+	fscache_cookie_put(cursor, fscache_cookie_put_wait);
+	if (ret < 0)
+		return NULL;
+	goto retry;
 }
 
 /*
@@ -816,9 +836,16 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	/* No further netfs-accessing operations on this cookie permitted */
 	if (test_and_set_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags))
 		BUG();
+	set_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags);
 
 	__fscache_disable_cookie(cookie, aux_data, retire);
 
+	spin_lock(&cookie->lock);
+	if (test_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags) &&
+	    hlist_empty(&cookie->backing_objects))
+		fscache_unhash_cookie(cookie);
+	spin_unlock(&cookie->lock);
+
 	/* Clear pointers back to the netfs */
 	cookie->netfs_data	= NULL;
 	cookie->def		= NULL;
@@ -839,19 +866,24 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 EXPORT_SYMBOL(__fscache_relinquish_cookie);
 
 /*
- * Remove a cookie from the hash table.
+ * Remove a cookie from the hash table and wake up anyone waiting on this.
  */
-static void fscache_unhash_cookie(struct fscache_cookie *cookie)
+void fscache_unhash_cookie(struct fscache_cookie *cookie)
 {
 	struct hlist_bl_head *h;
 	unsigned int bucket;
 
+	_enter("%p", cookie);
+
 	bucket = cookie->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1);
 	h = &fscache_cookie_hash[bucket];
 
 	hlist_bl_lock(h);
-	hlist_bl_del(&cookie->hash_link);
+	hlist_bl_del_init(&cookie->hash_link);
 	hlist_bl_unlock(h);
+
+	if (test_and_clear_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags))
+		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_RELINQUISHING);
 }
 
 /*
@@ -873,8 +905,8 @@ void fscache_cookie_put(struct fscache_cookie *cookie,
 			return;
 		BUG_ON(usage < 0);
 
+		ASSERT(hlist_bl_unhashed(&cookie->hash_link));
 		parent = cookie->parent;
-		fscache_unhash_cookie(cookie);
 		fscache_free_cookie(cookie);
 
 		cookie = parent;
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index d6209022e965..caf0df77e1b1 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -57,6 +57,7 @@ extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
 						   const void *, size_t,
 						   void *, loff_t);
 extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
+extern void fscache_unhash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
 
@@ -227,6 +228,7 @@ extern atomic_t fscache_n_acquires_no_cache;
 extern atomic_t fscache_n_acquires_ok;
 extern atomic_t fscache_n_acquires_nobufs;
 extern atomic_t fscache_n_acquires_oom;
+extern atomic_t fscache_n_acquires_wait_relinq;
 
 extern atomic_t fscache_n_invalidates;
 extern atomic_t fscache_n_invalidates_run;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 6d9cb1719de5..d16737e0307a 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -706,7 +706,7 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob
 	struct fscache_object *parent = object->parent;
 	struct fscache_cookie *cookie = object->cookie;
 	struct fscache_cache *cache = object->cache;
-	bool awaken = false;
+	int awaken = 0;
 
 	_enter("{OBJ%x,%d},%d", object->debug_id, object->n_children, event);
 
@@ -723,13 +723,18 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob
 	 */
 	spin_lock(&cookie->lock);
 	hlist_del_init(&object->cookie_link);
-	if (hlist_empty(&cookie->backing_objects) &&
-	    test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		awaken = true;
+	if (hlist_empty(&cookie->backing_objects)) {
+		if (test_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags))
+			awaken = 2;
+		else if (test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
+			awaken = 1;
+	}
 	spin_unlock(&cookie->lock);
 
 	if (awaken)
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+	if (awaken == 2)
+		fscache_unhash_cookie(cookie);
 	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 00564a1dfd76..ca53e84cb7ea 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -80,6 +80,7 @@ atomic_t fscache_n_acquires_no_cache;
 atomic_t fscache_n_acquires_ok;
 atomic_t fscache_n_acquires_nobufs;
 atomic_t fscache_n_acquires_oom;
+atomic_t fscache_n_acquires_wait_relinq;
 
 atomic_t fscache_n_invalidates;
 atomic_t fscache_n_invalidates_run;
@@ -163,13 +164,14 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_uncaches));
 
 	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
-		   " oom=%u\n",
+		   " oom=%u wrq=%u\n",
 		   atomic_read(&fscache_n_acquires),
 		   atomic_read(&fscache_n_acquires_null),
 		   atomic_read(&fscache_n_acquires_no_cache),
 		   atomic_read(&fscache_n_acquires_ok),
 		   atomic_read(&fscache_n_acquires_nobufs),
-		   atomic_read(&fscache_n_acquires_oom));
+		   atomic_read(&fscache_n_acquires_oom),
+		   atomic_read(&fscache_n_acquires_wait_relinq));
 
 	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
 		   atomic_read(&fscache_n_object_lookups),
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 686cfe997ed2..d9a969642da7 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -30,11 +30,13 @@ enum fscache_cookie_trace {
 	fscache_cookie_get_attach_object,
 	fscache_cookie_get_reacquire,
 	fscache_cookie_get_register_netfs,
+	fscache_cookie_get_wait,
 	fscache_cookie_put_acquire_nobufs,
 	fscache_cookie_put_dup_netfs,
 	fscache_cookie_put_relinquish,
 	fscache_cookie_put_object,
 	fscache_cookie_put_parent,
+	fscache_cookie_put_wait,
 };
 
 enum fscache_page_trace {
@@ -96,11 +98,13 @@ enum fscache_page_op_trace {
 	EM(fscache_cookie_get_attach_object,	"GET obj")		\
 	EM(fscache_cookie_get_reacquire,	"GET raq")		\
 	EM(fscache_cookie_get_register_netfs,	"GET net")		\
+	EM(fscache_cookie_get_wait,		"GET wai")		\
 	EM(fscache_cookie_put_acquire_nobufs,	"PUT nbf")		\
 	EM(fscache_cookie_put_dup_netfs,	"PUT dnt")		\
 	EM(fscache_cookie_put_relinquish,	"PUT rlq")		\
 	EM(fscache_cookie_put_object,		"PUT obj")		\
-	E_(fscache_cookie_put_parent,		"PUT prn")
+	EM(fscache_cookie_put_parent,		"PUT prn")		\
+	E_(fscache_cookie_put_wait,		"PUT wai")
 
 #define fscache_page_traces						\
 	EM(fscache_page_cached,			"Cached ")		\

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-08 14:49   ` David Howells
@ 2019-03-08 17:57     ` Christian Kujau
  2019-03-08 23:15     ` Christian Kujau
  2019-03-11 14:48     ` David Howells
  2 siblings, 0 replies; 15+ messages in thread
From: Christian Kujau @ 2019-03-08 17:57 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Anna Schumaker, Steve Dickson

On Fri, 8 Mar 2019, David Howells wrote:
> > $ mount | grep nfs4
> > nfs:/usr/local/src on /usr/local/src type nfs4 (ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139,local_lock=none,addr=192.168.0.115)
> > 
> > ...so FS-Cache ("fsc") isn't even used here.
> 
> Interesting.  Can you do:
> 
> 	cat /proc/fs/nfsfs/volumes

That seems to confirm the mount options, fsc is disabled:

# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV          FSID                              FSC
v4 c0a80073  801 0:46         1cfd45bf1921474d:a795870ea80f5ff7 no

> See the attached for a patch that helps with certain kinds of collision,
> though I can't see that it should help with what you're seeing since the
> RELINQUISHED flag isn't set on the old cookie (fl=222, but 0x10 isn't in
> there).  You can monitor the number of waits by looking in
> /proc/fs/fscache/stats for the:
> 
> 	Acquire: n=289166 nul=0 noc=0 ok=286331 nbf=2 oom=0 wrq=23748

Ah, the wrq= field gets only introduced by this patch. OK, I'll see if I 
can build a test kernel with that and will report back.

Thanks for looking in to this,
Christian.
-- 
BOFH excuse #290:

The CPU has shifted, and become decentralized.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-08 14:49   ` David Howells
  2019-03-08 17:57     ` Christian Kujau
@ 2019-03-08 23:15     ` Christian Kujau
  2019-03-08 23:25       ` Christian Kujau
  2019-03-11 14:48     ` David Howells
  2 siblings, 1 reply; 15+ messages in thread
From: Christian Kujau @ 2019-03-08 23:15 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Anna Schumaker, Steve Dickson

On Fri, 8 Mar 2019, David Howells wrote:
> See the attached for a patch that helps with certain kinds of collision,
> though I can't see that it should help with what you're seeing since the
> RELINQUISHED flag isn't set on the old cookie (fl=222, but 0x10 isn't in
> there).  You can monitor the number of waits by looking in
> /proc/fs/fscache/stats for the:
> 
> 	Acquire: n=289166 nul=0 noc=0 ok=286331 nbf=2 oom=0 wrq=23748

Running Linux v5.0 with this patch applied does indeed still produce the 
"Duplicate cookie detected" messages, but I only ever see wrq=0 when 
running nfstest_cache: 

   https://paste.fedoraproject.org/paste/dkav0FQzYZxE9-V7GphjAQ

(Scroll down until the messages start to appear again)

Only the n= field seems to change during that test:


fedora0# grep wrq n2.log  | sort | uniq -c | sort -n
     28 Acquire: n=8 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
     29 Acquire: n=7 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
     34 Acquire: n=6 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
     82 Acquire: n=9 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0
     93 Acquire: n=5 nul=0 noc=0 ok=1 nbf=0 oom=0 wrq=0

HTH,
Christian.
-- 
BOFH excuse #5:

static from plastic slide rules

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-08 23:15     ` Christian Kujau
@ 2019-03-08 23:25       ` Christian Kujau
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Kujau @ 2019-03-08 23:25 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Anna Schumaker, Steve Dickson

On Fri, 8 Mar 2019, Christian Kujau wrote:
> Running Linux v5.0 with this patch applied does indeed still produce the 
> "Duplicate cookie detected" messages, but I only ever see wrq=0 when 
> running nfstest_cache: 
> 
>    https://paste.fedoraproject.org/paste/dkav0FQzYZxE9-V7GphjAQ

And again with the whole /proc/fs/fscache/stats output and better time 
stamps: https://paste.fedoraproject.org/paste/hZtCPStJlqB1d9JXnTFndQ

C
-- 
BOFH excuse #5:

static from plastic slide rules

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-08 14:49   ` David Howells
  2019-03-08 17:57     ` Christian Kujau
  2019-03-08 23:15     ` Christian Kujau
@ 2019-03-11 14:48     ` David Howells
  2019-03-11 14:50       ` [PATCH 1/3] fscache: Fix cookie collision David Howells
                         ` (4 more replies)
  2 siblings, 5 replies; 15+ messages in thread
From: David Howells @ 2019-03-11 14:48 UTC (permalink / raw)
  To: Christian Kujau; +Cc: dhowells, linux-kernel, Anna Schumaker, Steve Dickson

Hi Christian,

I've a couple more patches for you - one a bugfix and one that will print more
information.  They don't actually affect the problem you're seeing.  I'll post
them as replies to this message.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] fscache: Fix cookie collision
  2019-03-11 14:48     ` David Howells
@ 2019-03-11 14:50       ` David Howells
  2019-03-11 14:50       ` [PATCH 2/3] fscache: Add missing unhash David Howells
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2019-03-11 14:50 UTC (permalink / raw)
  To: lists; +Cc: linux-kernel, anna.schumaker, steved, dhowells

When adding a new cookie, a collision can sometimes occur because an old
cookie with the same key is still in the process of being asynchronously
relinquished, leading to something like the following being emitted into
the kernel log:

   FS-Cache: Duplicate cookie detected
   FS-Cache: O-cookie c=00000000db33ad59 [p=000000004bc53500 fl=218 nc=0 na=0]
   FS-Cache: O-cookie d=          (null) n=          (null)
   FS-Cache: O-cookie o=000000006cf6db4f
   FS-Cache: O-key=[16] '0100010101000000e51fc6000323ae02'
   FS-Cache: N-cookie c=00000000791c49d0 [p=000000004bc53500 fl=2 nc=0 na=1]
   FS-Cache: N-cookie d=00000000e220fe14 n=00000000d4484489
   FS-Cache: N-key=[16] '0100010101000000e51fc6000323ae02'

with the old cookie (O- lines) showing no cookie def or netfs data and
showing flags ACQUIRED, RELINQUISHED and INVALIDATING (fl=218).

Fix this by:

 (1) Setting FSCACHE_COOKIE_RELINQUISHING on the cookie we're about to tear
     dispose of.

 (2) Making fscache_hash_cookie() sleep on RELINQUISHING if there's a
     collision with a cookie that has RELINQUISHED set.

 (3) Clearing RELINQUISHING upon having unhashed the cookie and waking up
     anyone waiting for it to transition to unset.

Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/filesystems/caching/fscache.txt |    1 
 fs/fscache/cookie.c                           |   58 +++++++++++++++++++------
 fs/fscache/internal.h                         |    2 +
 fs/fscache/object.c                           |   13 ++++--
 fs/fscache/stats.c                            |    6 ++-
 include/trace/events/fscache.h                |    6 ++-
 6 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 50f0a5757f48..f304bda88bb1 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -231,6 +231,7 @@ proc files.
 		ok=N	Number of acq reqs succeeded
 		nbf=N	Number of acq reqs rejected due to error
 		oom=N	Number of acq reqs failed on ENOMEM
+		wrq=N	Number of waits for relinquishment of conflicting old cookies.
 	Lookups	n=N	Number of lookup calls made on cache backends
 		neg=N	Number of negative lookups made
 		pos=N	Number of positive lookups made
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index c550512ce335..75d0ffd36ac0 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -202,7 +202,9 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
 	struct hlist_bl_head *h;
 	struct hlist_bl_node *p;
 	unsigned int bucket;
+	int ret;
 
+retry:
 	bucket = candidate->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1);
 	h = &fscache_cookie_hash[bucket];
 
@@ -220,19 +222,37 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
 	return candidate;
 
 collision:
-	if (test_and_set_bit(FSCACHE_COOKIE_ACQUIRED, &cursor->flags)) {
-		trace_fscache_cookie(cursor, fscache_cookie_collision,
-				     atomic_read(&cursor->usage));
-		pr_err("Duplicate cookie detected\n");
-		fscache_print_cookie(cursor, 'O');
-		fscache_print_cookie(candidate, 'N');
-		hlist_bl_unlock(h);
-		return NULL;
-	}
+	if (test_and_set_bit(FSCACHE_COOKIE_ACQUIRED, &cursor->flags))
+		goto duplicate;
 
 	fscache_cookie_get(cursor, fscache_cookie_get_reacquire);
 	hlist_bl_unlock(h);
 	return cursor;
+
+duplicate:
+	if (test_bit(FSCACHE_COOKIE_RELINQUISHED, &cursor->flags))
+		goto wait_for_removal;
+
+	trace_fscache_cookie(cursor, fscache_cookie_collision,
+			     atomic_read(&cursor->usage));
+	pr_err("Duplicate cookie detected\n");
+	fscache_print_cookie(cursor, 'O');
+	fscache_print_cookie(candidate, 'N');
+	hlist_bl_unlock(h);
+	return NULL;
+
+wait_for_removal:
+	fscache_cookie_get(cursor, fscache_cookie_get_wait);
+	hlist_bl_unlock(h);
+
+	fscache_stat(&fscache_n_acquires_wait_relinq);
+	ret = wait_on_bit(&cursor->flags, FSCACHE_COOKIE_RELINQUISHING,
+			  TASK_INTERRUPTIBLE);
+
+	fscache_cookie_put(cursor, fscache_cookie_put_wait);
+	if (ret < 0)
+		return NULL;
+	goto retry;
 }
 
 /*
@@ -816,9 +836,16 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	/* No further netfs-accessing operations on this cookie permitted */
 	if (test_and_set_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags))
 		BUG();
+	set_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags);
 
 	__fscache_disable_cookie(cookie, aux_data, retire);
 
+	spin_lock(&cookie->lock);
+	if (test_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags) &&
+	    hlist_empty(&cookie->backing_objects))
+		fscache_unhash_cookie(cookie);
+	spin_unlock(&cookie->lock);
+
 	/* Clear pointers back to the netfs */
 	cookie->netfs_data	= NULL;
 	cookie->def		= NULL;
@@ -839,19 +866,24 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 EXPORT_SYMBOL(__fscache_relinquish_cookie);
 
 /*
- * Remove a cookie from the hash table.
+ * Remove a cookie from the hash table and wake up anyone waiting on this.
  */
-static void fscache_unhash_cookie(struct fscache_cookie *cookie)
+void fscache_unhash_cookie(struct fscache_cookie *cookie)
 {
 	struct hlist_bl_head *h;
 	unsigned int bucket;
 
+	_enter("%p", cookie);
+
 	bucket = cookie->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1);
 	h = &fscache_cookie_hash[bucket];
 
 	hlist_bl_lock(h);
-	hlist_bl_del(&cookie->hash_link);
+	hlist_bl_del_init(&cookie->hash_link);
 	hlist_bl_unlock(h);
+
+	if (test_and_clear_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags))
+		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_RELINQUISHING);
 }
 
 /*
@@ -873,8 +905,8 @@ void fscache_cookie_put(struct fscache_cookie *cookie,
 			return;
 		BUG_ON(usage < 0);
 
+		ASSERT(hlist_bl_unhashed(&cookie->hash_link));
 		parent = cookie->parent;
-		fscache_unhash_cookie(cookie);
 		fscache_free_cookie(cookie);
 
 		cookie = parent;
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index d6209022e965..caf0df77e1b1 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -57,6 +57,7 @@ extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
 						   const void *, size_t,
 						   void *, loff_t);
 extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
+extern void fscache_unhash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
 
@@ -227,6 +228,7 @@ extern atomic_t fscache_n_acquires_no_cache;
 extern atomic_t fscache_n_acquires_ok;
 extern atomic_t fscache_n_acquires_nobufs;
 extern atomic_t fscache_n_acquires_oom;
+extern atomic_t fscache_n_acquires_wait_relinq;
 
 extern atomic_t fscache_n_invalidates;
 extern atomic_t fscache_n_invalidates_run;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 6d9cb1719de5..d16737e0307a 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -706,7 +706,7 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob
 	struct fscache_object *parent = object->parent;
 	struct fscache_cookie *cookie = object->cookie;
 	struct fscache_cache *cache = object->cache;
-	bool awaken = false;
+	int awaken = 0;
 
 	_enter("{OBJ%x,%d},%d", object->debug_id, object->n_children, event);
 
@@ -723,13 +723,18 @@ static const struct fscache_state *fscache_drop_object(struct fscache_object *ob
 	 */
 	spin_lock(&cookie->lock);
 	hlist_del_init(&object->cookie_link);
-	if (hlist_empty(&cookie->backing_objects) &&
-	    test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		awaken = true;
+	if (hlist_empty(&cookie->backing_objects)) {
+		if (test_bit(FSCACHE_COOKIE_RELINQUISHING, &cookie->flags))
+			awaken = 2;
+		else if (test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
+			awaken = 1;
+	}
 	spin_unlock(&cookie->lock);
 
 	if (awaken)
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+	if (awaken == 2)
+		fscache_unhash_cookie(cookie);
 	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 00564a1dfd76..ca53e84cb7ea 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -80,6 +80,7 @@ atomic_t fscache_n_acquires_no_cache;
 atomic_t fscache_n_acquires_ok;
 atomic_t fscache_n_acquires_nobufs;
 atomic_t fscache_n_acquires_oom;
+atomic_t fscache_n_acquires_wait_relinq;
 
 atomic_t fscache_n_invalidates;
 atomic_t fscache_n_invalidates_run;
@@ -163,13 +164,14 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_uncaches));
 
 	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
-		   " oom=%u\n",
+		   " oom=%u wrq=%u\n",
 		   atomic_read(&fscache_n_acquires),
 		   atomic_read(&fscache_n_acquires_null),
 		   atomic_read(&fscache_n_acquires_no_cache),
 		   atomic_read(&fscache_n_acquires_ok),
 		   atomic_read(&fscache_n_acquires_nobufs),
-		   atomic_read(&fscache_n_acquires_oom));
+		   atomic_read(&fscache_n_acquires_oom),
+		   atomic_read(&fscache_n_acquires_wait_relinq));
 
 	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
 		   atomic_read(&fscache_n_object_lookups),
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 686cfe997ed2..d9a969642da7 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -30,11 +30,13 @@ enum fscache_cookie_trace {
 	fscache_cookie_get_attach_object,
 	fscache_cookie_get_reacquire,
 	fscache_cookie_get_register_netfs,
+	fscache_cookie_get_wait,
 	fscache_cookie_put_acquire_nobufs,
 	fscache_cookie_put_dup_netfs,
 	fscache_cookie_put_relinquish,
 	fscache_cookie_put_object,
 	fscache_cookie_put_parent,
+	fscache_cookie_put_wait,
 };
 
 enum fscache_page_trace {
@@ -96,11 +98,13 @@ enum fscache_page_op_trace {
 	EM(fscache_cookie_get_attach_object,	"GET obj")		\
 	EM(fscache_cookie_get_reacquire,	"GET raq")		\
 	EM(fscache_cookie_get_register_netfs,	"GET net")		\
+	EM(fscache_cookie_get_wait,		"GET wai")		\
 	EM(fscache_cookie_put_acquire_nobufs,	"PUT nbf")		\
 	EM(fscache_cookie_put_dup_netfs,	"PUT dnt")		\
 	EM(fscache_cookie_put_relinquish,	"PUT rlq")		\
 	EM(fscache_cookie_put_object,		"PUT obj")		\
-	E_(fscache_cookie_put_parent,		"PUT prn")
+	EM(fscache_cookie_put_parent,		"PUT prn")		\
+	E_(fscache_cookie_put_wait,		"PUT wai")
 
 #define fscache_page_traces						\
 	EM(fscache_page_cached,			"Cached ")		\


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] fscache: Add missing unhash
  2019-03-11 14:48     ` David Howells
  2019-03-11 14:50       ` [PATCH 1/3] fscache: Fix cookie collision David Howells
@ 2019-03-11 14:50       ` David Howells
  2019-03-11 14:50       ` [PATCH 3/3] fscache: Print cookie debugging information David Howells
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2019-03-11 14:50 UTC (permalink / raw)
  To: lists; +Cc: linux-kernel, anna.schumaker, steved, dhowells

In __fscache_acquire_cookie() if fscache_acquire_non_index_cookie() fails,
we clean up and return NULL (indicating no cookie) - however, we don't
unhash the previously hashed object first.

Fix this by moving the failure code out of the main path through the
function to its own goto label and add the missing fscache_unhash_cookie()
call.

Moving the code out to its own label allows the remaining part of the
if-statement to be simplified also.

Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c |   26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 75d0ffd36ac0..6931e45b7d59 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -346,20 +346,10 @@ struct fscache_cookie *__fscache_acquire_cookie(
 		/* if the object is an index then we need do nothing more here
 		 * - we create indices on disk when we need them as an index
 		 * may exist in multiple caches */
-		if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
-			if (fscache_acquire_non_index_cookie(cookie, object_size) == 0) {
-				set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-			} else {
-				atomic_dec(&parent->n_children);
-				fscache_cookie_put(cookie,
-						   fscache_cookie_put_acquire_nobufs);
-				fscache_stat(&fscache_n_acquires_nobufs);
-				_leave(" = NULL");
-				return NULL;
-			}
-		} else {
-			set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-		}
+		if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX &&
+		    fscache_acquire_non_index_cookie(cookie, object_size) < 0)
+			goto failed;
+		set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
 	}
 
 	fscache_stat(&fscache_n_acquires_ok);
@@ -367,6 +357,14 @@ struct fscache_cookie *__fscache_acquire_cookie(
 out:
 	fscache_free_cookie(candidate);
 	return cookie;
+
+failed:
+	atomic_dec(&parent->n_children);
+	fscache_unhash_cookie(cookie);
+	fscache_cookie_put(cookie, fscache_cookie_put_acquire_nobufs);
+	fscache_stat(&fscache_n_acquires_nobufs);
+	_leave(" = NULL");
+	return NULL;
 }
 EXPORT_SYMBOL(__fscache_acquire_cookie);
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] fscache: Print cookie debugging information
  2019-03-11 14:48     ` David Howells
  2019-03-11 14:50       ` [PATCH 1/3] fscache: Fix cookie collision David Howells
  2019-03-11 14:50       ` [PATCH 2/3] fscache: Add missing unhash David Howells
@ 2019-03-11 14:50       ` David Howells
  2019-03-12  6:07       ` FS-Cache: Duplicate cookie detected Christian Kujau
  2019-03-12  7:01       ` David Howells
  4 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2019-03-11 14:50 UTC (permalink / raw)
  To: lists; +Cc: linux-kernel, anna.schumaker, steved, dhowells


---

 fs/afs/cache.c          |   11 +++++++++++
 fs/fscache/cookie.c     |    7 +++++--
 fs/nfs/fscache-index.c  |   11 +++++++++++
 include/linux/fscache.h |    2 ++
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index f6d0a21e8052..c16b10dcf895 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -32,10 +32,21 @@ struct fscache_cookie_def afs_volume_cache_index_def = {
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 };
 
+static void afs_vnode_print_cookie(void *cookie_netfs_data, char prefix)
+{
+	struct afs_vnode *vnode = cookie_netfs_data;
+
+	pr_err("kAFS    : %c-vnode %llx:%llx:%x\n",
+	       prefix, vnode->fid.vid, vnode->fid.vnode, vnode->fid.unique);
+	pr_err("kAFS    : %c-vol v=%p %llx\n",
+	       prefix, vnode->volume, vnode->volume->vid);
+}
+
 struct fscache_cookie_def afs_vnode_cache_index_def = {
 	.name		= "AFS.vnode",
 	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
 	.check_aux	= afs_vnode_cache_check_aux,
+	.print		= afs_vnode_print_cookie,
 };
 
 /*
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 6931e45b7d59..50db256e6593 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -41,8 +41,9 @@ static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 	       prefix, cookie, cookie->parent, cookie->flags,
 	       atomic_read(&cookie->n_children),
 	       atomic_read(&cookie->n_active));
-	pr_err("%c-cookie d=%p n=%p\n",
-	       prefix, cookie->def, cookie->netfs_data);
+	pr_err("%c-cookie d=%p n=%p [%s]\n",
+	       prefix, cookie->def, cookie->netfs_data,
+	       cookie->def ? cookie->def->name : "");
 
 	object = READ_ONCE(cookie->backing_objects.first);
 	if (object)
@@ -55,6 +56,8 @@ static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 	for (loop = 0; loop < cookie->key_len; loop++)
 		pr_cont("%02x", k[loop]);
 	pr_cont("'\n");
+	if (cookie->def && cookie->def->print)
+		cookie->def->print(cookie->netfs_data, prefix);
 }
 
 void fscache_free_cookie(struct fscache_cookie *cookie)
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 666415d13d52..bba6753cdfb7 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -124,6 +124,16 @@ static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
 		put_nfs_open_context(context);
 }
 
+static void nfs_inode_print_cookie(void *cookie_netfs_data, char prefix)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	struct inode *i = &nfsi->vfs_inode;
+
+	pr_err("NFS     : %c-nfs fl=%lx cv=%lx ii=%lx if=%x sb=%p sf=%lx\n",
+	       prefix, nfsi->flags, nfsi->cache_validity,
+	       i->i_ino, i->i_flags, i->i_sb, i->i_sb->s_flags);
+}
+
 /*
  * Define the inode object for FS-Cache.  This is used to describe an inode
  * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
@@ -139,4 +149,5 @@ const struct fscache_cookie_def nfs_fscache_inode_object_def = {
 	.check_aux	= nfs_fscache_inode_check_aux,
 	.get_context	= nfs_fh_get_context,
 	.put_context	= nfs_fh_put_context,
+	.print		= nfs_inode_print_cookie,
 };
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 84b90a79d75a..84525722522e 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -114,6 +114,8 @@ struct fscache_cookie_def {
 	void (*mark_page_cached)(void *cookie_netfs_data,
 				 struct address_space *mapping,
 				 struct page *page);
+
+	void (*print)(void *cookie_netfs_data, char prefix);
 };
 
 /*


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-11 14:48     ` David Howells
                         ` (2 preceding siblings ...)
  2019-03-11 14:50       ` [PATCH 3/3] fscache: Print cookie debugging information David Howells
@ 2019-03-12  6:07       ` Christian Kujau
  2019-03-12  7:01       ` David Howells
  4 siblings, 0 replies; 15+ messages in thread
From: Christian Kujau @ 2019-03-12  6:07 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Anna Schumaker, Steve Dickson

[-- Attachment #1: Type: text/plain, Size: 3829 bytes --]

On Mon, 11 Mar 2019, David Howells wrote:
> I've a couple more patches for you - one a bugfix and one that will print more
> information.  They don't actually affect the problem you're seeing.  I'll post
> them as replies to this message.

Thanks for the patches. I've applied all three to v5.0 and ran 
"nfstest_cache" and was able to reproduce the messages. Please note that 
I'm only running "nfstest_cache" because it's somehow able to reproduce 
the message reliably - otherwise the message just shows up once or twice 
in syslog, but I didn't know how to reproduce it.

But I noticed something else this time, and I did not notice that before: 
while running nfstest_cache, the "duplicate cookie" messages were only 
triggered when my other, non-test mount was also mounted during the test. 
Let me describe my F29 test VM again:

* VM boots, and /usr/local/src gets mounted via NFS, read-only, and
  with w/o fsc options. cachefilesd isn't even installed here.

* I run nfstest_cache and apparently it's mounting the same NFS export
  from the server to /mnt/t, as a readonly mount.

So two mounts, one in /usr/local/src, the other in /mnt/t, both readonly 
and both w/o "fsc", but the "duplicate cookie" message is only printed 
when /usr/local/src was mounted. If /usr/local/src wasn't mounted, the 
test would complete[0] and no "duplicate message was printed. And then I 
noticed:

----------------------------------------------------------
$ mount | tail -2 | fold
horus:/usr/local/src on /usr/local/src type nfs4 
(ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255, 
hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139, 
local_lock=none,addr=192.168.0.115)

horus:/ on /mnt/t type nfs4 
(rw,relatime,vers=4.1,rsize=4096,wsize=4096,namlen=255, 
hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.56.139, 
local_lock=none,addr=192.168.0.115) 
----------------------------------------------------------

My /usr/local/src mount was mounted with vers=4.2 (default), while 
nfstest_cache was mounting its test-mount with vers=4.1! Apart from the 
different rsize/wsize values, the version number stood out. And indeed, 
when I mount my regular NFS mount /usr/local/src with vers=4.1, the 
"duplicate cookie" is no longer printed.

For simplicity, I've attached two logs to this email:

* nfs_no-mount.txt.xz - showing /proc/fs/nfsfs/volumes and
  /proc/fs/fscache/stats every 0.01 seconds, while running nfstest_cache
  in another terminal. Note that no duplicate "cookie messages" were 
  triggered, as /usr/local/src was not mounted.

* nfs_with-mount.txt.xz - same, but here /usr/local/src was mounted (and 
  defaulted to vers=4.2), and thus "duplicate cookie" messages were 
  printed.


I fear that all this may complicate this strange behaviour, and now we're 
examining NFS mount versions, but I only noticed that now, not earlier :-\

I can't comment on the patches much, as you mentioned they won't make the 
message go away, but I hope it printed more details now.

Thanks,
Christian.


[0] Again, I'm using nfstest_cache only to trigger the message. Everytime
    I execute it, the test fails, because I think it expects a rw-mount:

 $ nfstest_cache --server horus --client fedora0 --runtest=acregmin_attr
 *** Verify consistency of attribute caching with NFSv4.1 on a file 
 acregmin = 10
     TEST: Running test 'acregmin_attr'
     FAIL: Traceback (most recent call last):
            File "/usr/bin/nfstest_cache", line 199, in do_file_test
              fdw = open(self.absfile, "w")
            IOError: [Errno 30] Read-only file system: 
 '/mnt/t/nfstest_cache_20190311223404_f_1'
    TIME: 4.497078s
 1 tests (0 passed, 1 failed)
 Total time: 5.529826s

-- 
BOFH excuse #209:

Only people with names beginning with 'A' are getting mail this week (a la Microsoft)

[-- Attachment #2: Type: application/x-xz, Size: 5284 bytes --]

[-- Attachment #3: Type: application/x-xz, Size: 4820 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-11 14:48     ` David Howells
                         ` (3 preceding siblings ...)
  2019-03-12  6:07       ` FS-Cache: Duplicate cookie detected Christian Kujau
@ 2019-03-12  7:01       ` David Howells
  2019-03-31  9:05         ` Christian Kujau
  4 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2019-03-12  7:01 UTC (permalink / raw)
  To: Christian Kujau; +Cc: dhowells, linux-kernel, Anna Schumaker, Steve Dickson

Christian Kujau <lists@nerdbynature.de> wrote:

> My /usr/local/src mount was mounted with vers=4.2 (default), while 
> nfstest_cache was mounting its test-mount with vers=4.1! Apart from the 
> different rsize/wsize values, the version number stood out. And indeed, 
> when I mount my regular NFS mount /usr/local/src with vers=4.1, the 
> "duplicate cookie" is no longer printed.

Yeah - NFS superblocks are differentiated by a whole host of parameters,
including protocol version number, and caches aren't shared between
superblocks because this introduces a tricky coherency problem.

The issue is that NFS superblocks to the same place do not currently manage
coherency (inode attributes, data) between themselves, except via the server.

However, if "fsc" isn't given on the mount commandline, the superblock
probably shouldn't get a server-level cookie if we can avoid it.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: FS-Cache: Duplicate cookie detected
  2019-03-12  7:01       ` David Howells
@ 2019-03-31  9:05         ` Christian Kujau
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Kujau @ 2019-03-31  9:05 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, Anna Schumaker, Steve Dickson

Hi David,

On Tue, 12 Mar 2019, David Howells wrote:
> > My /usr/local/src mount was mounted with vers=4.2 (default), while 
> > nfstest_cache was mounting its test-mount with vers=4.1! Apart from the 
> > different rsize/wsize values, the version number stood out. And indeed, 
> > when I mount my regular NFS mount /usr/local/src with vers=4.1, the 
> > "duplicate cookie" is no longer printed.
> 
> Yeah - NFS superblocks are differentiated by a whole host of parameters,
> including protocol version number, and caches aren't shared between
> superblocks because this introduces a tricky coherency problem.
> 
> The issue is that NFS superblocks to the same place do not currently manage
> coherency (inode attributes, data) between themselves, except via the server.
> 
> However, if "fsc" isn't given on the mount commandline, the superblock
> probably shouldn't get a server-level cookie if we can avoid it.

Just checking - are you waiting for new results from me, should I test 
something that I missed? Or are new patches in the works? :-D

Thanks,
Christian.
-- 
BOFH excuse #139:

UBNC (user brain not connected)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-03-31  9:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-06  0:24 FS-Cache: Duplicate cookie detected Christian Kujau
2019-03-06 13:16 ` David Howells
2019-03-06 17:20 ` David Howells
2019-03-06 17:56   ` Christian Kujau
2019-03-08 14:49   ` David Howells
2019-03-08 17:57     ` Christian Kujau
2019-03-08 23:15     ` Christian Kujau
2019-03-08 23:25       ` Christian Kujau
2019-03-11 14:48     ` David Howells
2019-03-11 14:50       ` [PATCH 1/3] fscache: Fix cookie collision David Howells
2019-03-11 14:50       ` [PATCH 2/3] fscache: Add missing unhash David Howells
2019-03-11 14:50       ` [PATCH 3/3] fscache: Print cookie debugging information David Howells
2019-03-12  6:07       ` FS-Cache: Duplicate cookie detected Christian Kujau
2019-03-12  7:01       ` David Howells
2019-03-31  9:05         ` Christian Kujau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).