All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
@ 2010-12-29 20:47 J. Bruce Fields
  2010-12-29 20:59 ` J. Bruce Fields
  2011-01-03 22:26 ` J. Bruce Fields
  0 siblings, 2 replies; 20+ messages in thread
From: J. Bruce Fields @ 2010-12-29 20:47 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: J. Bruce Fields <bfields@redhat.com>

Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
(and allowing any concurrent users to destroy it on last put) instead of
trying to update it in place.

Otherwise someone referencing the ip_map we're modifying here could try
to use the m_client just as we're putting the last reference.

The bug should only be seen by users of the legacy nfsd interfaces.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 net/sunrpc/svcauth_unix.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

Intended to apply for 2.6.38 if this looks right....

--b.

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index a04ac91..70586ad 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -386,6 +386,21 @@ int auth_unix_forget_old(struct auth_domain *dom)
 }
 EXPORT_SYMBOL_GPL(auth_unix_forget_old);
 
+static void auth_unix_invalidate_ip_map(struct cache_detail *cd, struct ip_map *ipm)
+{
+	struct cache_head *ch;
+	struct ip_map ip;
+
+	ip.m_client = ipm->m_client;
+	ip.h.flags = CACHE_NEGATIVE;
+	ip.h.expiry_time = ipm->h.expiry_time;
+	ch = sunrpc_cache_update(cd, &ip.h, &ipm->h,
+				 hash_str(ipm->m_class, IP_HASHBITS) ^
+				 hash_ip6(ipm->m_addr));
+	if (ch)
+		cache_put(ch, cd);
+}
+
 struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
 {
 	struct ip_map *ipm;
@@ -401,8 +416,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
 		return NULL;
 
 	if ((ipm->m_client->addr_changes - ipm->m_add_change) >0) {
-		if (test_and_set_bit(CACHE_NEGATIVE, &ipm->h.flags) == 0)
-			auth_domain_put(&ipm->m_client->h);
+		auth_unix_invalidate_ip_map(sn->ip_map_cache, ipm);
 		rv = NULL;
 	} else {
 		rv = &ipm->m_client->h;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2010-12-29 20:47 [PATCH] svcrpc: modifying positive sunrpc cache entries is racy J. Bruce Fields
@ 2010-12-29 20:59 ` J. Bruce Fields
  2010-12-30  1:19   ` Neil Brown
  2011-01-03 22:26 ` J. Bruce Fields
  1 sibling, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2010-12-29 20:59 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> From: J. Bruce Fields <bfields@redhat.com>
> 
> Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> (and allowing any concurrent users to destroy it on last put) instead of
> trying to update it in place.
> 
> Otherwise someone referencing the ip_map we're modifying here could try
> to use the m_client just as we're putting the last reference.
> 
> The bug should only be seen by users of the legacy nfsd interfaces.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  net/sunrpc/svcauth_unix.c |   18 ++++++++++++++++--
>  1 files changed, 16 insertions(+), 2 deletions(-)
> 
> Intended to apply for 2.6.38 if this looks right....

Also noticed while trying to track down an rhel5 oops in
svcauth_unix_set_client():

	- cache_check() can set an entry negative in place, which if
	  nothing else must cause a leak in some cases.  (Because when
	  the entry is eventually destroyed, it will be assumed to not
	  have any contents.)  I suppose the fix is again to try to
	  adding a new negative entry instead.

	- since cache_check() doesn't use any locking, I can't see what
	  guarantees that when it sees the CACHE_VALID bit set and
	  CACHE_NEGATIVE cleared, it must necessarily see the new
	  contents.   I think that'd be fixed by a wmb() before setting
	  those bits and a rmb() after checking them.  I don't know if
	  it's actually possible to hit that bug....

--b.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2010-12-29 20:59 ` J. Bruce Fields
@ 2010-12-30  1:19   ` Neil Brown
  2010-12-30  1:57     ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2010-12-30  1:19 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Wed, 29 Dec 2010 15:59:42 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> > From: J. Bruce Fields <bfields@redhat.com>
> > 
> > Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> > (and allowing any concurrent users to destroy it on last put) instead of
> > trying to update it in place.
> > 
> > Otherwise someone referencing the ip_map we're modifying here could try
> > to use the m_client just as we're putting the last reference.
> > 
> > The bug should only be seen by users of the legacy nfsd interfaces.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  net/sunrpc/svcauth_unix.c |   18 ++++++++++++++++--
> >  1 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > Intended to apply for 2.6.38 if this looks right....
> 
> Also noticed while trying to track down an rhel5 oops in
> svcauth_unix_set_client():
> 
> 	- cache_check() can set an entry negative in place, which if
> 	  nothing else must cause a leak in some cases.  (Because when
> 	  the entry is eventually destroyed, it will be assumed to not
> 	  have any contents.)  I suppose the fix is again to try to
> 	  adding a new negative entry instead.

cache_check should only set an entry 'negative' if it is not already valid
(rv == -EAGAIN) and there is no up-call pending.

Maybe we should check CACHE_VALID again after the test_and_set of
CACHE_PENDING, but is a very unlikely race (if it is actually a race at all)

> 
> 	- since cache_check() doesn't use any locking, I can't see what
> 	  guarantees that when it sees the CACHE_VALID bit set and
> 	  CACHE_NEGATIVE cleared, it must necessarily see the new
> 	  contents.   I think that'd be fixed by a wmb() before setting
> 	  those bits and a rmb() after checking them.  I don't know if
> 	  it's actually possible to hit that bug....

Yes, we probably want a set_bit_lock in cache_fresh_locked() though I don't
think that exists, so we could use test_and_set_bit_locked() instead.

But it does feel like maybe we should add some locking to cache_check.
Take the lock at the the start, and release it after the
test_and_set_bit(CACHE_PENDING) or once we have decided not to do that ???

I think when I wrote this I might have thought that bit ops implied memory
ordering ... or maybe I just didn't think through the issues properly at all.

Thanks,
NeilBrown


> 
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2010-12-30  1:19   ` Neil Brown
@ 2010-12-30  1:57     ` J. Bruce Fields
  2011-01-03 20:55       ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2010-12-30  1:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Thu, Dec 30, 2010 at 12:19:40PM +1100, Neil Brown wrote:
> On Wed, 29 Dec 2010 15:59:42 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> > > From: J. Bruce Fields <bfields@redhat.com>
> > > 
> > > Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> > > (and allowing any concurrent users to destroy it on last put) instead of
> > > trying to update it in place.
> > > 
> > > Otherwise someone referencing the ip_map we're modifying here could try
> > > to use the m_client just as we're putting the last reference.
> > > 
> > > The bug should only be seen by users of the legacy nfsd interfaces.
> > > 
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  net/sunrpc/svcauth_unix.c |   18 ++++++++++++++++--
> > >  1 files changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > Intended to apply for 2.6.38 if this looks right....
> > 
> > Also noticed while trying to track down an rhel5 oops in
> > svcauth_unix_set_client():
> > 
> > 	- cache_check() can set an entry negative in place, which if
> > 	  nothing else must cause a leak in some cases.  (Because when
> > 	  the entry is eventually destroyed, it will be assumed to not
> > 	  have any contents.)  I suppose the fix is again to try to
> > 	  adding a new negative entry instead.
> 
> cache_check should only set an entry 'negative' if it is not already valid
> (rv == -EAGAIN) and there is no up-call pending.

I don't think anything keeps VALID from being set after the
cache_is_valid check but before the code that does the
set_bit(CACHE_NEGATIVE).

> Maybe we should check CACHE_VALID again after the test_and_set of
> CACHE_PENDING, but is a very unlikely race (if it is actually a race at all)
> 
> > 
> > 	- since cache_check() doesn't use any locking, I can't see what
> > 	  guarantees that when it sees the CACHE_VALID bit set and
> > 	  CACHE_NEGATIVE cleared, it must necessarily see the new
> > 	  contents.   I think that'd be fixed by a wmb() before setting
> > 	  those bits and a rmb() after checking them.  I don't know if
> > 	  it's actually possible to hit that bug....
> 
> Yes, we probably want a set_bit_lock in cache_fresh_locked() though I don't
> think that exists, so we could use test_and_set_bit_locked() instead.
> 
> But it does feel like maybe we should add some locking to cache_check.
> Take the lock at the the start, and release it after the
> test_and_set_bit(CACHE_PENDING) or once we have decided not to do that ???

Maybe so.

--b.

> 
> I think when I wrote this I might have thought that bit ops implied memory
> ordering ... or maybe I just didn't think through the issues properly at all.
> 
> Thanks,
> NeilBrown
> 
> 
> > 
> > --b.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2010-12-30  1:57     ` J. Bruce Fields
@ 2011-01-03 20:55       ` J. Bruce Fields
  2011-01-04  5:01         ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-03 20:55 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Wed, Dec 29, 2010 at 08:57:19PM -0500, J. Bruce Fields wrote:
> On Thu, Dec 30, 2010 at 12:19:40PM +1100, Neil Brown wrote:
> > On Wed, 29 Dec 2010 15:59:42 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> > > Also noticed while trying to track down an rhel5 oops in
> > > svcauth_unix_set_client():
> > > 
> > > 	- cache_check() can set an entry negative in place, which if
> > > 	  nothing else must cause a leak in some cases.  (Because when
> > > 	  the entry is eventually destroyed, it will be assumed to not
> > > 	  have any contents.)  I suppose the fix is again to try to
> > > 	  adding a new negative entry instead.
> > 
> > cache_check should only set an entry 'negative' if it is not already valid
> > (rv == -EAGAIN) and there is no up-call pending.
> 
> I don't think anything keeps VALID from being set after the
> cache_is_valid check but before the code that does the
> set_bit(CACHE_NEGATIVE).
> 
> > Maybe we should check CACHE_VALID again after the test_and_set of
> > CACHE_PENDING, but is a very unlikely race (if it is actually a race at all)
> > 
> > > 
> > > 	- since cache_check() doesn't use any locking, I can't see what
> > > 	  guarantees that when it sees the CACHE_VALID bit set and
> > > 	  CACHE_NEGATIVE cleared, it must necessarily see the new
> > > 	  contents.   I think that'd be fixed by a wmb() before setting
> > > 	  those bits and a rmb() after checking them.  I don't know if
> > > 	  it's actually possible to hit that bug....
> > 
> > Yes, we probably want a set_bit_lock in cache_fresh_locked() though I don't
> > think that exists, so we could use test_and_set_bit_locked() instead.
> > 
> > But it does feel like maybe we should add some locking to cache_check.
> > Take the lock at the the start, and release it after the
> > test_and_set_bit(CACHE_PENDING) or once we have decided not to do that ???
> 
> Maybe so.

Here's one attempt.

--b.

commit 55563023f85d01698ccf72325c87e3a7039a189b
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Mon Jan 3 15:10:27 2011 -0500

    svcrpc: take locks to fix cache_check races
    
    There are at least a couple races in cache_check:
    
    	- We attempt to turn a cache entry negative in place.  But that
    	  entry may already have been filled in by some other task since
    	  we last checked whether it was valid, so we could be modifying
    	  an already-valid entry.  If nothing else there's a likely leak
    	  in such a case when the entry is eventually put() and contents
    	  are not freed because it has CACHE_NEGATIVE set.
    	- If cache_check races with an update that is turning the entry
    	  CACHE_VALID, then it's possible that the CACHE_VALID bit could
    	  become visible on this CPU before the actual contents do, so
    	  we could tell the caller this entry is ready to use when in
    	  fact the caller could still get invalid contents.
    
    Some memory barriers might be sufficient to fix at least the latter; but
    for now let's keep things simple and take the hash_lock when we turn an
    entry negative or check the CACHE_VALID bit.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 0d6002f..2105b40 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -200,8 +200,9 @@ static int cache_make_upcall(struct cache_detail *cd, struct cache_head *h)
 	return cd->cache_upcall(cd, h);
 }
 
-static inline int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
+static int __cache_is_valid(struct cache_detail *detail, struct cache_head *h)
 {
+
 	if (!test_bit(CACHE_VALID, &h->flags))
 		return -EAGAIN;
 	else {
@@ -213,6 +214,33 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
 	}
 }
 
+static int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
+{
+	int rv;
+
+	read_lock(&detail->hash_lock);
+	rv = __cache_is_valid(detail, h);
+	read_unlock(&detail->hash_lock);
+	return rv;
+}
+
+static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h)
+{
+	int rv;
+
+	write_lock(&detail->hash_lock);
+	rv = __cache_is_valid(detail, h);
+	if (rv != -EAGAIN) {
+		write_unlock(&detail->hash_lock);
+		return rv;
+	}
+	set_bit(CACHE_NEGATIVE, &h->flags);
+	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
+	write_unlock(&detail->hash_lock);
+	cache_fresh_unlocked(h, detail);
+	return -ENOENT;
+}
+
 /*
  * This is the generic cache management routine for all
  * the authentication caches.
@@ -251,14 +279,8 @@ int cache_check(struct cache_detail *detail,
 			case -EINVAL:
 				clear_bit(CACHE_PENDING, &h->flags);
 				cache_revisit_request(h);
-				if (rv == -EAGAIN) {
-					set_bit(CACHE_NEGATIVE, &h->flags);
-					cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
-					cache_fresh_unlocked(h, detail);
-					rv = -ENOENT;
-				}
+				rv = try_to_negate_entry(detail, h);
 				break;
-
 			case -EAGAIN:
 				clear_bit(CACHE_PENDING, &h->flags);
 				cache_revisit_request(h);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2010-12-29 20:47 [PATCH] svcrpc: modifying positive sunrpc cache entries is racy J. Bruce Fields
  2010-12-29 20:59 ` J. Bruce Fields
@ 2011-01-03 22:26 ` J. Bruce Fields
  2011-01-04  3:08   ` J. Bruce Fields
  1 sibling, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-03 22:26 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> From: J. Bruce Fields <bfields@redhat.com>
> 
> Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> (and allowing any concurrent users to destroy it on last put) instead of
> trying to update it in place.

Or the following seems simpler.

(And I was thinking it was necessary to ensure that the right thing
happened to the cached xprt->xpt_auth_cache entry--though on a second
look I see that sunrpc_cache_update also expires the replaced entry
immediately.  Still, this seems simpler if it also works.)

--b.

commit 20771de3185bf0031aa767d3b19f3e744a465a0c
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Fri Dec 24 14:03:38 2010 -0500

    svcrpc: modifying valid sunrpc cache entries is racy
    
    Once a sunrpc cache entry is non-NEGATIVE, we shouldn't attempt to
    modify it in place.  Otherwise someone referencing the ip_map we're
    modifying here could try to use the m_client just as we're putting the
    last reference.
    
    Instead, just set the expiry time so it expires immediately.
    
    The bug should only be seen by users of the legacy nfsd interfaces.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index a04ac91..5edc147 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -401,8 +401,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
 		return NULL;
 
 	if ((ipm->m_client->addr_changes - ipm->m_add_change) >0) {
-		if (test_and_set_bit(CACHE_NEGATIVE, &ipm->h.flags) == 0)
-			auth_domain_put(&ipm->m_client->h);
+		ipm->h.expiry_time = 0;
 		rv = NULL;
 	} else {
 		rv = &ipm->m_client->h;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-03 22:26 ` J. Bruce Fields
@ 2011-01-04  3:08   ` J. Bruce Fields
  2011-01-04  4:51     ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04  3:08 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

On Mon, Jan 03, 2011 at 05:26:05PM -0500, J. Bruce Fields wrote:
> On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> > From: J. Bruce Fields <bfields@redhat.com>
> > 
> > Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> > (and allowing any concurrent users to destroy it on last put) instead of
> > trying to update it in place.
> 
> Or the following seems simpler.
> 
> (And I was thinking it was necessary to ensure that the right thing
> happened to the cached xprt->xpt_auth_cache entry--though on a second
> look I see that sunrpc_cache_update also expires the replaced entry
> immediately.  Still, this seems simpler if it also works.)

Eh, on third thoughts: we probably do want a real negative entry created
in the cache, so I think the original patch was correct!

--b.

> 
> --b.
> 
> commit 20771de3185bf0031aa767d3b19f3e744a465a0c
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Fri Dec 24 14:03:38 2010 -0500
> 
>     svcrpc: modifying valid sunrpc cache entries is racy
>     
>     Once a sunrpc cache entry is non-NEGATIVE, we shouldn't attempt to
>     modify it in place.  Otherwise someone referencing the ip_map we're
>     modifying here could try to use the m_client just as we're putting the
>     last reference.
>     
>     Instead, just set the expiry time so it expires immediately.
>     
>     The bug should only be seen by users of the legacy nfsd interfaces.
>     
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index a04ac91..5edc147 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -401,8 +401,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
>  		return NULL;
>  
>  	if ((ipm->m_client->addr_changes - ipm->m_add_change) >0) {
> -		if (test_and_set_bit(CACHE_NEGATIVE, &ipm->h.flags) == 0)
> -			auth_domain_put(&ipm->m_client->h);
> +		ipm->h.expiry_time = 0;
>  		rv = NULL;
>  	} else {
>  		rv = &ipm->m_client->h;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04  3:08   ` J. Bruce Fields
@ 2011-01-04  4:51     ` NeilBrown
  2011-01-04 18:43       ` J. Bruce Fields
  2011-01-04 21:46       ` J. Bruce Fields
  0 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2011-01-04  4:51 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Mon, 3 Jan 2011 22:08:05 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Mon, Jan 03, 2011 at 05:26:05PM -0500, J. Bruce Fields wrote:
> > On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> > > From: J. Bruce Fields <bfields@redhat.com>
> > > 
> > > Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> > > (and allowing any concurrent users to destroy it on last put) instead of
> > > trying to update it in place.
> > 
> > Or the following seems simpler.
> > 
> > (And I was thinking it was necessary to ensure that the right thing
> > happened to the cached xprt->xpt_auth_cache entry--though on a second
> > look I see that sunrpc_cache_update also expires the replaced entry
> > immediately.  Still, this seems simpler if it also works.)
> 
> Eh, on third thoughts: we probably do want a real negative entry created
> in the cache, so I think the original patch was correct!

I like your second thought better than your third.
I don't see any reason to push this item out of the cache extra quickly.
In fact I think it would still be correct to just remove those two lines and
not set expiry_time to 0.  auth_unix_lookup will never find that IP address,
and so it doesn't matter if it remains in the cache or not.
I guess there could result in the cache appearing to contain different data
depending on whether you look at it the 'old' way or the 'new' way, but I
don't that is a real problem, and setting expiry_time to 0 overcomes that.

What was the substance of your third thought?

BTW, you could use sunrpc_invalidate rather than just setting expiry_time to
zero, which would hurry it out of the cache a bit faster.


And this all made me realise that there is more code that can be placed under
CONFIG_NFSD_DEPRECTATED.


SUNRPC: Remove more code when NFSD_DEPRECATED is not configured

Signed-off-by: NeilBrown <neilb@suse.de>


diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 6950c98..c2aebe8 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -255,10 +255,13 @@ static inline time_t get_expiry(char **bpp)
 	return rv - boot.tv_sec;
 }
 
+#ifdef CONFIG_NFSD_DEPRECATED
 static inline void sunrpc_invalidate(struct cache_head *h,
 				     struct cache_detail *detail)
 {
 	h->expiry_time = seconds_since_boot() - 1;
 	detail->nextcheck = seconds_since_boot();
 }
+#endif /* CONFIG_NFSD_DEPRECATED */
+
 #endif /*  _LINUX_SUNRPC_CACHE_H_ */
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 560677d..92d47ad 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -30,7 +30,9 @@
 
 struct unix_domain {
 	struct auth_domain	h;
+#ifdef CONFIG_NFSD_DEPRECATED
 	int	addr_changes;
+#endif /* CONFIG_NFSD_DEPRECATED */
 	/* other stuff later */
 };
 
@@ -64,7 +66,9 @@ struct auth_domain *unix_domain_find(char *name)
 			return NULL;
 		}
 		new->h.flavour = &svcauth_unix;
+#ifdef CONFIG_NFSD_DEPRECATED
 		new->addr_changes = 0;
+#endif /* CONFIG_NFSD_DEPRECATED */
 		rv = auth_domain_lookup(name, &new->h);
 	}
 }
@@ -92,7 +96,9 @@ struct ip_map {
 	char			m_class[8]; /* e.g. "nfsd" */
 	struct in6_addr		m_addr;
 	struct unix_domain	*m_client;
+#ifdef CONFIG_NFSD_DEPRECATED
 	int			m_add_change;
+#endif /* CONFIG_NFSD_DEPRECATED */
 };
 
 static void ip_map_put(struct kref *kref)
@@ -146,7 +152,9 @@ static void update(struct cache_head *cnew, struct cache_head *citem)
 
 	kref_get(&item->m_client->h.ref);
 	new->m_client = item->m_client;
+#ifdef CONFIG_NFSD_DEPRECATED
 	new->m_add_change = item->m_add_change;
+#endif /* CONFIG_NFSD_DEPRECATED */
 }
 static struct cache_head *ip_map_alloc(void)
 {
@@ -331,6 +339,7 @@ static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm,
 	ip.h.flags = 0;
 	if (!udom)
 		set_bit(CACHE_NEGATIVE, &ip.h.flags);
+#ifdef CONFIG_NFSD_DEPRECATED
 	else {
 		ip.m_add_change = udom->addr_changes;
 		/* if this is from the legacy set_client system call,
@@ -339,6 +348,7 @@ static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm,
 		if (expiry == NEVER)
 			ip.m_add_change++;
 	}
+#endif /* CONFIG_NFSD_DEPRECATED */
 	ip.h.expiry_time = expiry;
 	ch = sunrpc_cache_update(cd, &ip.h, &ipm->h,
 				 hash_str(ipm->m_class, IP_HASHBITS) ^
@@ -358,6 +368,7 @@ static inline int ip_map_update(struct net *net, struct ip_map *ipm,
 	return __ip_map_update(sn->ip_map_cache, ipm, udom, expiry);
 }
 
+#ifdef CONFIG_NFSD_DEPRECATED
 int auth_unix_add_addr(struct net *net, struct in6_addr *addr, struct auth_domain *dom)
 {
 	struct unix_domain *udom;
@@ -426,6 +437,7 @@ void svcauth_unix_purge(void)
 	}
 }
 EXPORT_SYMBOL_GPL(svcauth_unix_purge);
+#endif /* CONFIG_NFSD_DEPRECATED */
 
 static inline struct ip_map *
 ip_map_cached_get(struct svc_xprt *xprt)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-03 20:55       ` J. Bruce Fields
@ 2011-01-04  5:01         ` NeilBrown
  2011-01-04 15:22           ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2011-01-04  5:01 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Mon, 3 Jan 2011 15:55:14 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Dec 29, 2010 at 08:57:19PM -0500, J. Bruce Fields wrote:
> > On Thu, Dec 30, 2010 at 12:19:40PM +1100, Neil Brown wrote:
> > > On Wed, 29 Dec 2010 15:59:42 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> > > wrote:
> > > > Also noticed while trying to track down an rhel5 oops in
> > > > svcauth_unix_set_client():
> > > > 
> > > > 	- cache_check() can set an entry negative in place, which if
> > > > 	  nothing else must cause a leak in some cases.  (Because when
> > > > 	  the entry is eventually destroyed, it will be assumed to not
> > > > 	  have any contents.)  I suppose the fix is again to try to
> > > > 	  adding a new negative entry instead.
> > > 
> > > cache_check should only set an entry 'negative' if it is not already valid
> > > (rv == -EAGAIN) and there is no up-call pending.
> > 
> > I don't think anything keeps VALID from being set after the
> > cache_is_valid check but before the code that does the
> > set_bit(CACHE_NEGATIVE).
> > 
> > > Maybe we should check CACHE_VALID again after the test_and_set of
> > > CACHE_PENDING, but is a very unlikely race (if it is actually a race at all)
> > > 
> > > > 
> > > > 	- since cache_check() doesn't use any locking, I can't see what
> > > > 	  guarantees that when it sees the CACHE_VALID bit set and
> > > > 	  CACHE_NEGATIVE cleared, it must necessarily see the new
> > > > 	  contents.   I think that'd be fixed by a wmb() before setting
> > > > 	  those bits and a rmb() after checking them.  I don't know if
> > > > 	  it's actually possible to hit that bug....
> > > 
> > > Yes, we probably want a set_bit_lock in cache_fresh_locked() though I don't
> > > think that exists, so we could use test_and_set_bit_locked() instead.
> > > 
> > > But it does feel like maybe we should add some locking to cache_check.
> > > Take the lock at the the start, and release it after the
> > > test_and_set_bit(CACHE_PENDING) or once we have decided not to do that ???
> > 
> > Maybe so.
> 
> Here's one attempt.
> 
> --b.
> 
> commit 55563023f85d01698ccf72325c87e3a7039a189b
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Mon Jan 3 15:10:27 2011 -0500
> 
>     svcrpc: take locks to fix cache_check races
>     
>     There are at least a couple races in cache_check:
>     
>     	- We attempt to turn a cache entry negative in place.  But that
>     	  entry may already have been filled in by some other task since
>     	  we last checked whether it was valid, so we could be modifying
>     	  an already-valid entry.  If nothing else there's a likely leak
>     	  in such a case when the entry is eventually put() and contents
>     	  are not freed because it has CACHE_NEGATIVE set.
>     	- If cache_check races with an update that is turning the entry
>     	  CACHE_VALID, then it's possible that the CACHE_VALID bit could
>     	  become visible on this CPU before the actual contents do, so
>     	  we could tell the caller this entry is ready to use when in
>     	  fact the caller could still get invalid contents.
>     
>     Some memory barriers might be sufficient to fix at least the latter; but
>     for now let's keep things simple and take the hash_lock when we turn an
>     entry negative or check the CACHE_VALID bit.
>     
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 0d6002f..2105b40 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -200,8 +200,9 @@ static int cache_make_upcall(struct cache_detail *cd, struct cache_head *h)
>  	return cd->cache_upcall(cd, h);
>  }
>  
> -static inline int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> +static int __cache_is_valid(struct cache_detail *detail, struct cache_head *h)
>  {
> +
>  	if (!test_bit(CACHE_VALID, &h->flags))
>  		return -EAGAIN;
>  	else {
> @@ -213,6 +214,33 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
>  	}
>  }
>  
> +static int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> +{
> +	int rv;
> +
> +	read_lock(&detail->hash_lock);
> +	rv = __cache_is_valid(detail, h);
> +	read_unlock(&detail->hash_lock);
> +	return rv;
> +}

I don't think there is anything in __cache_is_valid that needs to be
protected.
The compiler will almost certainly produce code which loads f->flags once and
then performs 1 or 2 bit tests against the value in the register and produces
one of 3 possible return values based on the result.
There is absolutely no value in putting locking around that, especially as
CACHE_VALID is never cleared.

Maybe you imagine a re-ordering of setting CACHE_NEGATIVE and CACHE_VALID,
but as they are in the same cache line (and in fact in the same byte) they
cannot be re-ordered.  We always set CACHE_NEGATIVE before CACHE_VALID and
there is no way those two could get to memory in the wrong order.



> +
> +static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h)
> +{
> +	int rv;
> +
> +	write_lock(&detail->hash_lock);
> +	rv = __cache_is_valid(detail, h);
> +	if (rv != -EAGAIN) {
> +		write_unlock(&detail->hash_lock);
> +		return rv;
> +	}
> +	set_bit(CACHE_NEGATIVE, &h->flags);
> +	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> +	write_unlock(&detail->hash_lock);
> +	cache_fresh_unlocked(h, detail);
> +	return -ENOENT;
> +}
> +
>  /*
>   * This is the generic cache management routine for all
>   * the authentication caches.
> @@ -251,14 +279,8 @@ int cache_check(struct cache_detail *detail,
>  			case -EINVAL:
>  				clear_bit(CACHE_PENDING, &h->flags);
>  				cache_revisit_request(h);
> -				if (rv == -EAGAIN) {
> -					set_bit(CACHE_NEGATIVE, &h->flags);
> -					cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> -					cache_fresh_unlocked(h, detail);
> -					rv = -ENOENT;
> -				}
> +				rv = try_to_negate_entry(detail, h);
>  				break;

This bit looks good those.  It feels much better having an 'unlock' between
'cache_fresh_locked' and 'cache_fresh_unlocked' !!

Thanks,

NeilBrown


> -
>  			case -EAGAIN:
>  				clear_bit(CACHE_PENDING, &h->flags);
>  				cache_revisit_request(h);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04  5:01         ` NeilBrown
@ 2011-01-04 15:22           ` J. Bruce Fields
  2011-01-04 19:23             ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 15:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

On Tue, Jan 04, 2011 at 04:01:52PM +1100, NeilBrown wrote:
> On Mon, 3 Jan 2011 15:55:14 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> > @@ -213,6 +214,33 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
> >  	}
> >  }
> >  
> > +static int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> > +{
> > +	int rv;
> > +
> > +	read_lock(&detail->hash_lock);
> > +	rv = __cache_is_valid(detail, h);
> > +	read_unlock(&detail->hash_lock);
> > +	return rv;
> > +}
> 
> I don't think there is anything in __cache_is_valid that needs to be
> protected.
> The compiler will almost certainly produce code which loads f->flags once and
> then performs 1 or 2 bit tests against the value in the register and produces
> one of 3 possible return values based on the result.
> There is absolutely no value in putting locking around that, especially as
> CACHE_VALID is never cleared.
> 
> Maybe you imagine a re-ordering of setting CACHE_NEGATIVE and CACHE_VALID,
> but as they are in the same cache line (and in fact in the same byte) they
> cannot be re-ordered.  We always set CACHE_NEGATIVE before CACHE_VALID and
> there is no way those two could get to memory in the wrong order.

The risk would be a reordering of CACHE_VALID with setting of the actual
contents in the !NEGATIVE case, so:

	task doing lookup		task doing update
	^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^

	1. test for CACHE_VALID		3. set item->contents
	   & !CACHE_NEGATIVE

	2. dereference			4. set CACHE_VALID, clear
	   item->contents->...		   CACHE_NEGATIVE.

As I understand it, if we want to gaurantee that item->contents is good
at step 2, then we need a write barrier between 3 and 4, together with a
read barrier between 1 and 2.  Taking the spinlock is overkill, but
should accomplish the same thing, as it forces 1 to occur before 3 or
after 4, and adds any necessary memory barriers.

--b.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04  4:51     ` NeilBrown
@ 2011-01-04 18:43       ` J. Bruce Fields
  2011-01-04 21:15         ` NeilBrown
  2011-01-04 21:46       ` J. Bruce Fields
  1 sibling, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 18:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

On Tue, Jan 04, 2011 at 03:51:07PM +1100, NeilBrown wrote:
> On Mon, 3 Jan 2011 22:08:05 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Mon, Jan 03, 2011 at 05:26:05PM -0500, J. Bruce Fields wrote:
> > > On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> > > > From: J. Bruce Fields <bfields@redhat.com>
> > > > 
> > > > Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> > > > (and allowing any concurrent users to destroy it on last put) instead of
> > > > trying to update it in place.
> > > 
> > > Or the following seems simpler.
> > > 
> > > (And I was thinking it was necessary to ensure that the right thing
> > > happened to the cached xprt->xpt_auth_cache entry--though on a second
> > > look I see that sunrpc_cache_update also expires the replaced entry
> > > immediately.  Still, this seems simpler if it also works.)
> > 
> > Eh, on third thoughts: we probably do want a real negative entry created
> > in the cache, so I think the original patch was correct!
> 
> I like your second thought better than your third.
> I don't see any reason to push this item out of the cache extra quickly.
> In fact I think it would still be correct to just remove those two lines and
> not set expiry_time to 0.  auth_unix_lookup will never find that IP address,
> and so it doesn't matter if it remains in the cache or not.
> I guess there could result in the cache appearing to contain different data
> depending on whether you look at it the 'old' way or the 'new' way, but I
> don't that is a real problem, and setting expiry_time to 0 overcomes that.
> 
> What was the substance of your third thought?

I had some idea that we could end up with a cache entry stuck in the
cache forever.

OK, actually I think late last night I reverted into some sort of "maybe
if I type the first thing that comes to my mind, somebody else will
think this through for me" state.  Apologies.  Um, did I win?

> BTW, you could use sunrpc_invalidate rather than just setting expiry_time to
> zero, which would hurry it out of the cache a bit faster.

Yep, I like that, and I'd forgotten about sunrpc_invalidate, thanks!
Done....  (As below).

> And this all made me realise that there is more code that can be placed under
> CONFIG_NFSD_DEPRECTATED.

Yep, applied.

(When do we get to remove all this?  Taking a stab at the 2.6.40 merge
window....  OK, party at my place in May!)

--b.

commit ab5c05c579b0b37b9bf2c79c9c8f0ef6045ee41d
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Fri Dec 24 14:03:38 2010 -0500

    svcrpc: modifying valid sunrpc cache entries is racy
    
    Once a sunrpc cache entry is VALID, we should be replacing it (and
    allowing any concurrent users to destroy it on last put) instead of
    trying to update it in place.
    
    Otherwise someone referencing the ip_map we're modifying here could try
    to use the m_client just as we're putting the last reference.
    
    The bug should only be seen by users of the legacy nfsd interfaces.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index a04ac91..59a7c52 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -401,8 +401,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
 		return NULL;
 
 	if ((ipm->m_client->addr_changes - ipm->m_add_change) >0) {
-		if (test_and_set_bit(CACHE_NEGATIVE, &ipm->h.flags) == 0)
-			auth_domain_put(&ipm->m_client->h);
+		sunrpc_invalidate(&ipm->h, sn->ip_map_cache);
 		rv = NULL;
 	} else {
 		rv = &ipm->m_client->h;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04 15:22           ` J. Bruce Fields
@ 2011-01-04 19:23             ` J. Bruce Fields
  2011-01-04 19:31               ` [PATCH 1/2] svcrpc: take lock on turning entry NEGATIVE in cache_check J. Bruce Fields
                                 ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 19:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

On Tue, Jan 04, 2011 at 10:22:31AM -0500, J. Bruce Fields wrote:
> On Tue, Jan 04, 2011 at 04:01:52PM +1100, NeilBrown wrote:
> > On Mon, 3 Jan 2011 15:55:14 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> > > @@ -213,6 +214,33 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
> > >  	}
> > >  }
> > >  
> > > +static int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> > > +{
> > > +	int rv;
> > > +
> > > +	read_lock(&detail->hash_lock);
> > > +	rv = __cache_is_valid(detail, h);
> > > +	read_unlock(&detail->hash_lock);
> > > +	return rv;
> > > +}
> > 
> > I don't think there is anything in __cache_is_valid that needs to be
> > protected.
> > The compiler will almost certainly produce code which loads f->flags once and
> > then performs 1 or 2 bit tests against the value in the register and produces
> > one of 3 possible return values based on the result.
> > There is absolutely no value in putting locking around that, especially as
> > CACHE_VALID is never cleared.
> > 
> > Maybe you imagine a re-ordering of setting CACHE_NEGATIVE and CACHE_VALID,
> > but as they are in the same cache line (and in fact in the same byte) they
> > cannot be re-ordered.  We always set CACHE_NEGATIVE before CACHE_VALID and
> > there is no way those two could get to memory in the wrong order.
> 
> The risk would be a reordering of CACHE_VALID with setting of the actual
> contents in the !NEGATIVE case, so:
> 
> 	task doing lookup		task doing update
> 	^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^
> 
> 	1. test for CACHE_VALID		3. set item->contents
> 	   & !CACHE_NEGATIVE
> 
> 	2. dereference			4. set CACHE_VALID, clear
> 	   item->contents->...		   CACHE_NEGATIVE.
> 
> As I understand it, if we want to gaurantee that item->contents is good
> at step 2, then we need a write barrier between 3 and 4, together with a
> read barrier between 1 and 2.  Taking the spinlock is overkill, but
> should accomplish the same thing, as it forces 1 to occur before 3 or
> after 4, and adds any necessary memory barriers.

So, that being the real problem, perhaps it's clearer to use explicit
memory barriers instead of more locking, and add some comments.  Also
split this into a separate patch, as in the following.

--b.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] svcrpc: take lock on turning entry NEGATIVE in cache_check
  2011-01-04 19:23             ` J. Bruce Fields
@ 2011-01-04 19:31               ` J. Bruce Fields
  2011-01-04 19:31               ` [PATCH 2/2] svcrpc: ensure cache_check caller sees updated entry J. Bruce Fields
  2011-01-04 21:10               ` [PATCH] svcrpc: modifying positive sunrpc cache entries is racy NeilBrown
  2 siblings, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 19:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs, J. Bruce Fields

We attempt to turn a cache entry negative in place.  But that entry may
already have been filled in by some other task since we last checked
whether it was valid, so we could be modifying an already-valid entry.
If nothing else there's a likely leak in such a case when the entry is
eventually put() and contents are not freed because it has
CACHE_NEGATIVE set.

So, take the cache_lock just as sunrpc_cache_update() does.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 net/sunrpc/cache.c |   25 ++++++++++++++++++-------
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 0d6002f..a6c5733 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -213,6 +213,23 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
 	}
 }
 
+static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h)
+{
+	int rv;
+
+	write_lock(&detail->hash_lock);
+	rv = cache_is_valid(detail, h);
+	if (rv != -EAGAIN) {
+		write_unlock(&detail->hash_lock);
+		return rv;
+	}
+	set_bit(CACHE_NEGATIVE, &h->flags);
+	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
+	write_unlock(&detail->hash_lock);
+	cache_fresh_unlocked(h, detail);
+	return -ENOENT;
+}
+
 /*
  * This is the generic cache management routine for all
  * the authentication caches.
@@ -251,14 +268,8 @@ int cache_check(struct cache_detail *detail,
 			case -EINVAL:
 				clear_bit(CACHE_PENDING, &h->flags);
 				cache_revisit_request(h);
-				if (rv == -EAGAIN) {
-					set_bit(CACHE_NEGATIVE, &h->flags);
-					cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
-					cache_fresh_unlocked(h, detail);
-					rv = -ENOENT;
-				}
+				rv = try_to_negate_entry(detail, h);
 				break;
-
 			case -EAGAIN:
 				clear_bit(CACHE_PENDING, &h->flags);
 				cache_revisit_request(h);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] svcrpc: ensure cache_check caller sees updated entry
  2011-01-04 19:23             ` J. Bruce Fields
  2011-01-04 19:31               ` [PATCH 1/2] svcrpc: take lock on turning entry NEGATIVE in cache_check J. Bruce Fields
@ 2011-01-04 19:31               ` J. Bruce Fields
  2011-01-04 21:10               ` [PATCH] svcrpc: modifying positive sunrpc cache entries is racy NeilBrown
  2 siblings, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 19:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs, J. Bruce Fields

Supposes cache_check runs simultaneously with an update on a different
CPU:

	cache_check			task doing update
	^^^^^^^^^^^			^^^^^^^^^^^^^^^^^

	1. test for CACHE_VALID		1'. set entry->data
	   & !CACHE_NEGATIVE

	2. use entry->data		2'. set CACHE_VALID

If the two memory writes performed in step 1' and 2' appear misordered
with respect to the reads in step 1 and 2, then the caller could get
stale data at step 2 even though it saw CACHE_VALID set on the cache
entry.

Add memory barriers to prevent this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 net/sunrpc/cache.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index a6c5733..72ad836 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -128,6 +128,7 @@ static void cache_fresh_locked(struct cache_head *head, time_t expiry)
 {
 	head->expiry_time = expiry;
 	head->last_refresh = seconds_since_boot();
+	smp_wmb(); /* paired with smp_rmb() in cache_is_valid() */
 	set_bit(CACHE_VALID, &head->flags);
 }
 
@@ -208,8 +209,16 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
 		/* entry is valid */
 		if (test_bit(CACHE_NEGATIVE, &h->flags))
 			return -ENOENT;
-		else
+		else {
+			/*
+			 * In combination with write barrier in
+			 * sunrpc_cache_update, ensures that anyone
+			 * using the cache entry after this sees the
+			 * updated contents:
+			 */
+			smp_rmb();
 			return 0;
+		}
 	}
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04 19:23             ` J. Bruce Fields
  2011-01-04 19:31               ` [PATCH 1/2] svcrpc: take lock on turning entry NEGATIVE in cache_check J. Bruce Fields
  2011-01-04 19:31               ` [PATCH 2/2] svcrpc: ensure cache_check caller sees updated entry J. Bruce Fields
@ 2011-01-04 21:10               ` NeilBrown
       [not found]                 ` <20110105081031.220bfbc9-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
  2 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2011-01-04 21:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Tue, 4 Jan 2011 14:23:51 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> So, that being the real problem, perhaps it's clearer to use explicit
> memory barriers instead of more locking, and add some comments.  Also
> split this into a separate patch, as in the following.

Me likie.

Two separate patches is good, and nice comments next to the memory barriers
is good.
Reviewed-by: NeilBrown <neilb@suse.de>

Thanks!

NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
       [not found]                 ` <20110105081031.220bfbc9-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2011-01-04 21:15                   ` J. Bruce Fields
  0 siblings, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 21:15 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

On Wed, Jan 05, 2011 at 08:10:31AM +1100, NeilBrown wrote:
> On Tue, 4 Jan 2011 14:23:51 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > So, that being the real problem, perhaps it's clearer to use explicit
> > memory barriers instead of more locking, and add some comments.  Also
> > split this into a separate patch, as in the following.
> 
> Me likie.
> 
> Two separate patches is good, and nice comments next to the memory barriers
> is good.
> Reviewed-by: NeilBrown <neilb@suse.de>

Good, thanks--queueing up for 2.6.38.--b.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04 18:43       ` J. Bruce Fields
@ 2011-01-04 21:15         ` NeilBrown
  2011-01-04 21:21           ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2011-01-04 21:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Tue, 4 Jan 2011 13:43:14 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Tue, Jan 04, 2011 at 03:51:07PM +1100, NeilBrown wrote:
> > On Mon, 3 Jan 2011 22:08:05 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> > 
> > > On Mon, Jan 03, 2011 at 05:26:05PM -0500, J. Bruce Fields wrote:
> > > > On Wed, Dec 29, 2010 at 03:47:52PM -0500, bfields wrote:
> > > > > From: J. Bruce Fields <bfields@redhat.com>
> > > > > 
> > > > > Once a sunrpc cache entry is non-NEGATIVE, we should be replacing it
> > > > > (and allowing any concurrent users to destroy it on last put) instead of
> > > > > trying to update it in place.
> > > > 
> > > > Or the following seems simpler.
> > > > 
> > > > (And I was thinking it was necessary to ensure that the right thing
> > > > happened to the cached xprt->xpt_auth_cache entry--though on a second
> > > > look I see that sunrpc_cache_update also expires the replaced entry
> > > > immediately.  Still, this seems simpler if it also works.)
> > > 
> > > Eh, on third thoughts: we probably do want a real negative entry created
> > > in the cache, so I think the original patch was correct!
> > 
> > I like your second thought better than your third.
> > I don't see any reason to push this item out of the cache extra quickly.
> > In fact I think it would still be correct to just remove those two lines and
> > not set expiry_time to 0.  auth_unix_lookup will never find that IP address,
> > and so it doesn't matter if it remains in the cache or not.
> > I guess there could result in the cache appearing to contain different data
> > depending on whether you look at it the 'old' way or the 'new' way, but I
> > don't that is a real problem, and setting expiry_time to 0 overcomes that.
> > 
> > What was the substance of your third thought?
> 
> I had some idea that we could end up with a cache entry stuck in the
> cache forever.
> 
> OK, actually I think late last night I reverted into some sort of "maybe
> if I type the first thing that comes to my mind, somebody else will
> think this through for me" state.  Apologies.  Um, did I win?

This would be the Andrew Morton method of community involvement:  say
something obviously wrong as it produces more responses than saying something
right - and some of them might be useful.... I fall for it all the time!


> 
> > BTW, you could use sunrpc_invalidate rather than just setting expiry_time to
> > zero, which would hurry it out of the cache a bit faster.
> 
> Yep, I like that, and I'd forgotten about sunrpc_invalidate, thanks!
> Done....  (As below).
> 
> > And this all made me realise that there is more code that can be placed under
> > CONFIG_NFSD_DEPRECTATED.
> 
> Yep, applied.
> 
> (When do we get to remove all this?  Taking a stab at the 2.6.40 merge
> window....

That is what is says in feature-removal-schedule.txt (which no-one reads).


>             OK, party at my place in May!)

Will there still be snow?  I'm not coming if there is no snow!

(patch looks good)

NeilBrown



> 
> --b.
> 
> commit ab5c05c579b0b37b9bf2c79c9c8f0ef6045ee41d
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Fri Dec 24 14:03:38 2010 -0500
> 
>     svcrpc: modifying valid sunrpc cache entries is racy
>     
>     Once a sunrpc cache entry is VALID, we should be replacing it (and
>     allowing any concurrent users to destroy it on last put) instead of
>     trying to update it in place.
>     
>     Otherwise someone referencing the ip_map we're modifying here could try
>     to use the m_client just as we're putting the last reference.
>     
>     The bug should only be seen by users of the legacy nfsd interfaces.
>     
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index a04ac91..59a7c52 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -401,8 +401,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
>  		return NULL;
>  
>  	if ((ipm->m_client->addr_changes - ipm->m_add_change) >0) {
> -		if (test_and_set_bit(CACHE_NEGATIVE, &ipm->h.flags) == 0)
> -			auth_domain_put(&ipm->m_client->h);
> +		sunrpc_invalidate(&ipm->h, sn->ip_map_cache);
>  		rv = NULL;
>  	} else {
>  		rv = &ipm->m_client->h;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04 21:15         ` NeilBrown
@ 2011-01-04 21:21           ` J. Bruce Fields
  0 siblings, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 21:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

On Wed, Jan 05, 2011 at 08:15:38AM +1100, NeilBrown wrote:
> On Tue, 4 Jan 2011 13:43:14 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> > Yep, applied.
> > 
> > (When do we get to remove all this?  Taking a stab at the 2.6.40 merge
> > window....
> 
> That is what is says in feature-removal-schedule.txt (which no-one reads).
> 
> 
> >             OK, party at my place in May!)
> 
> Will there still be snow?  I'm not coming if there is no snow!

If anyone attends from Australia, I'll find the snow.  It may melt
rather quickly, though....

> (patch looks good)

Applied--thanks for the help!--b.

> 
> NeilBrown
> 
> 
> 
> > 
> > --b.
> > 
> > commit ab5c05c579b0b37b9bf2c79c9c8f0ef6045ee41d
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date:   Fri Dec 24 14:03:38 2010 -0500
> > 
> >     svcrpc: modifying valid sunrpc cache entries is racy
> >     
> >     Once a sunrpc cache entry is VALID, we should be replacing it (and
> >     allowing any concurrent users to destroy it on last put) instead of
> >     trying to update it in place.
> >     
> >     Otherwise someone referencing the ip_map we're modifying here could try
> >     to use the m_client just as we're putting the last reference.
> >     
> >     The bug should only be seen by users of the legacy nfsd interfaces.
> >     
> >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > 
> > diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> > index a04ac91..59a7c52 100644
> > --- a/net/sunrpc/svcauth_unix.c
> > +++ b/net/sunrpc/svcauth_unix.c
> > @@ -401,8 +401,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
> >  		return NULL;
> >  
> >  	if ((ipm->m_client->addr_changes - ipm->m_add_change) >0) {
> > -		if (test_and_set_bit(CACHE_NEGATIVE, &ipm->h.flags) == 0)
> > -			auth_domain_put(&ipm->m_client->h);
> > +		sunrpc_invalidate(&ipm->h, sn->ip_map_cache);
> >  		rv = NULL;
> >  	} else {
> >  		rv = &ipm->m_client->h;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04  4:51     ` NeilBrown
  2011-01-04 18:43       ` J. Bruce Fields
@ 2011-01-04 21:46       ` J. Bruce Fields
  2011-01-04 23:05         ` NeilBrown
  1 sibling, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2011-01-04 21:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

On Tue, Jan 04, 2011 at 03:51:07PM +1100, NeilBrown wrote:
> And this all made me realise that there is more code that can be placed under
> CONFIG_NFSD_DEPRECTATED.
> 
> 
> SUNRPC: Remove more code when NFSD_DEPRECATED is not configured
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

I did need one change to get a succesful !CONFIG_NFSD_DEPRECATED
compile.

--b.

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 46e1cbc..30916b0 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -422,6 +422,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
 	return rv;
 }
 EXPORT_SYMBOL_GPL(auth_unix_lookup);
+#endif /* CONFIG_NFSD_DEPRECATED */
 
 void svcauth_unix_purge(void)
 {
@@ -435,7 +436,6 @@ void svcauth_unix_purge(void)
 	}
 }
 EXPORT_SYMBOL_GPL(svcauth_unix_purge);
-#endif /* CONFIG_NFSD_DEPRECATED */
 
 static inline struct ip_map *
 ip_map_cached_get(struct svc_xprt *xprt)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
  2011-01-04 21:46       ` J. Bruce Fields
@ 2011-01-04 23:05         ` NeilBrown
  0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2011-01-04 23:05 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Tue, 4 Jan 2011 16:46:39 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Tue, Jan 04, 2011 at 03:51:07PM +1100, NeilBrown wrote:
> > And this all made me realise that there is more code that can be placed under
> > CONFIG_NFSD_DEPRECTATED.
> > 
> > 
> > SUNRPC: Remove more code when NFSD_DEPRECATED is not configured
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> 
> I did need one change to get a succesful !CONFIG_NFSD_DEPRECATED
> compile.

yep.... I had done a test compile, but I managed to get the sense of
CONFIG_NFSD_DEPRECATED inverted :-(

NeilBrown


> 
> --b.
> 
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 46e1cbc..30916b0 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -422,6 +422,7 @@ struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr)
>  	return rv;
>  }
>  EXPORT_SYMBOL_GPL(auth_unix_lookup);
> +#endif /* CONFIG_NFSD_DEPRECATED */
>  
>  void svcauth_unix_purge(void)
>  {
> @@ -435,7 +436,6 @@ void svcauth_unix_purge(void)
>  	}
>  }
>  EXPORT_SYMBOL_GPL(svcauth_unix_purge);
> -#endif /* CONFIG_NFSD_DEPRECATED */
>  
>  static inline struct ip_map *
>  ip_map_cached_get(struct svc_xprt *xprt)


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-01-04 23:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-29 20:47 [PATCH] svcrpc: modifying positive sunrpc cache entries is racy J. Bruce Fields
2010-12-29 20:59 ` J. Bruce Fields
2010-12-30  1:19   ` Neil Brown
2010-12-30  1:57     ` J. Bruce Fields
2011-01-03 20:55       ` J. Bruce Fields
2011-01-04  5:01         ` NeilBrown
2011-01-04 15:22           ` J. Bruce Fields
2011-01-04 19:23             ` J. Bruce Fields
2011-01-04 19:31               ` [PATCH 1/2] svcrpc: take lock on turning entry NEGATIVE in cache_check J. Bruce Fields
2011-01-04 19:31               ` [PATCH 2/2] svcrpc: ensure cache_check caller sees updated entry J. Bruce Fields
2011-01-04 21:10               ` [PATCH] svcrpc: modifying positive sunrpc cache entries is racy NeilBrown
     [not found]                 ` <20110105081031.220bfbc9-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2011-01-04 21:15                   ` J. Bruce Fields
2011-01-03 22:26 ` J. Bruce Fields
2011-01-04  3:08   ` J. Bruce Fields
2011-01-04  4:51     ` NeilBrown
2011-01-04 18:43       ` J. Bruce Fields
2011-01-04 21:15         ` NeilBrown
2011-01-04 21:21           ` J. Bruce Fields
2011-01-04 21:46       ` J. Bruce Fields
2011-01-04 23:05         ` NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.