* [PATCH Version 3] SVCAUTH update the rsc cacue on RPC_GSS_PROC_DESTROY
@ 2016-12-19 23:55 andros
2016-12-19 23:55 ` [PATCH Version 3] SVCAUTH update the rsc cache " andros
0 siblings, 1 reply; 5+ messages in thread
From: andros @ 2016-12-19 23:55 UTC (permalink / raw)
To: bfields; +Cc: neilb, linux-nfs, Andy Adamson
From: Andy Adamson <andros@rhel7-2-ga-2.androsad.fake>
I instrumented cache_get, cache_put, cache_clean, svcauth_cache_lookup
and svcauth_gss_accept.
Then I mounted -o sec=krb5, listed the mount directory, and umounted.
CASE 1: Here is the instrumented output without the patch:
svcauth_gss_accept gc_proc 3 rq_proc 0 (process RPC_GSS_PROC_DESTROY)
--> sunrpc_cache_lookup from rsc_lookup key ffffc90002b9bc58 hash 940
sunrpc_cache_lookup 1 calling cache_get key ffffc90002b9bc58 tmp
ffff880079b3f500
--> cache_get h ffff880079b3f500 refcount 1
sunrpc_cache_lookup RETURN 1 key ffffc90002b9bc58 tmp ffff880079b3f500
--> rsc_free h ffffc90002b9bc58
&rsci->h is ffff880079b3f500, which identifies the cache entry we want
destroyed.
Case: RPC_GSS_PROC_DESTROY:
svcauth_gss_accept RPC_GSS_PROC_DESTROY h ffff880079b3f500 ref 2
expiry 1481823331 <<<<<
svcauth_gss_accept AFTER SETTING h ffff880079b3f500 expiry 1481819736 <<<<<<<
Note: expiry_time (and CACHE_NEGATIVE) are set.
END of svcauth_gss_accept:
svcauth_gss_accept END calling cache_put h ffff880079b3f500
--> cache_put h ffff880079b3f500 refcount 2 cd->cache_put ffffffffa045c180
Dec 15 11:35:36 rhel7-2-ga-2 kernel: <-- cache_put h ffff880079b3f500 refcount 1
refcount is 1, but cache_clean is not setup to reap the entry as
rsc_update was not called. Besides using the write_lock (which should
be used for entry changes) rsc_update also sets other fields to
trigger cache_clean to remove the entry from the cache_list under
the write_lock and do a final cache_put.
cache_clean CLEAN h ffff880079b3e540 h->expiry_time 1481819736 hash 982
The above cache_clean occurs about 45 minutes after the END
svcauth_gss_accept cache_put.
I waited about 2 hours, and the cache entry was still not reaped
even though the expiry time was set.
CASE 2: Here is the instrumented output with the patch:
svcauth_gss_accept gc_proc 3 rq_proc 0 (process RPC_GSS_PROC_DESTROY)
--> sunrpc_cache_lookup from rsc_lookup key ffffc90002befc30 hash 982
sunrpc_cache_lookup calling cache_get key ffffc90002befc30 tmp ffff88006fcc6cc0
--> cache_get h ffff88006fcc6cc0 refcount 1
sunrpc_cache_lookup RETURN 1 key ffffc90002befc30 tmp ffff88006fcc6cc0
--> rsc_free h ffffc90002befc30
&rsci->h is ffff88006fcc6cc0, which identifies the cache entry we want
destroyed.
Case: RPC_GSS_PROC_DESTROY:
svcauth_gss_accept RPC_GSS_PROC_DESTROY h ffff88006fcc6cc0 ref 2
--> rsc_update new h ffffc90002befd18 new ref 0 old h ffff88006fcc6cc0 old ref 2
--> update_rsc new->h ffff88006fcc6cc0 tmp->h ffffc90002befd18
END of svcauth_gss_accept:
svcauth_gss_accept END calling cache_put h ffff88006fcc6cc0
--> cache_put h ffff88006fcc6cc0 refcount 2 cd->cache_put ffffffffa03cd170
<-- cache_put h ffff88006fcc6cc0 refcount 1
refcount is 1, setup for cache_clean.
cache_clean CLEAN h ffff88006fcc6cc0 h->expiry_time 0 hash 982
cache_clean DELETE h ffff88006fcc6cc0 from cache_list
cache_clean CLEANED calling cache_put h ffff88006fcc6cc0
--> cache_put h ffff88006fcc6cc0 refcount 1
--> rsc_put PUT h ffff88006fcc6cc0
--> rsc_free h ffff88006fcc6cc0
<-- cache_put h ffff88006fcc6cc0 refcount 0
The cache entry is destroyed.
Andy Adamson (1):
SVCAUTH update the rsc cache on RPC_GSS_PROC_DESTROY
net/sunrpc/auth_gss/svcauth_gss.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH Version 3] SVCAUTH update the rsc cache on RPC_GSS_PROC_DESTROY
2016-12-19 23:55 [PATCH Version 3] SVCAUTH update the rsc cacue on RPC_GSS_PROC_DESTROY andros
@ 2016-12-19 23:55 ` andros
0 siblings, 0 replies; 5+ messages in thread
From: andros @ 2016-12-19 23:55 UTC (permalink / raw)
To: bfields; +Cc: neilb, linux-nfs, Andy Adamson
From: Andy Adamson <andros@netapp.com>
The rsc cache code operates in a read_lock/write_lock environment.
Changes to a cache entry should use the provided rsc_update
routine which takes the write_lock.
The current code sets the expiry_time and the CACHE_NEGATIVE flag
without taking the write_lock as it does not call rsc_update.
Without this patch, while cache_clean sees the entries to be
removed, it does not remove the rsc_entries. This is because
rsc_update sets other fields in the entry to properly trigger
cache_clean.
Cache_clean takes the write_lock to remove expired or invalid
entries from the cache_list and calls cache_put on the entry.
Looking at sunrpc_cache_update, what we want is to invalidate the
cache entry, so that it is direclty replaced which means that
update_rsc is called. We pass in a new zero'ed rsc cache entry
to rsc_update with an expiry_time set to 0 along with the
invalidatedcache entry to be destroyed.
The cache_put at the end of svcauth_gss_accept processing drops
the reference count to 1 which allows the cache_put called by
cache_clean to result in a call to rsc_put and rsc_free
to reap the entry after it has been removed from the cache_list
under the write_lock.
Signed-off-by: Andy Adamson <andros@netapp.com>
---
net/sunrpc/auth_gss/svcauth_gss.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 45662d7..b8093da 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1409,7 +1409,9 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
u32 crlen;
struct gss_svc_data *svcdata = rqstp->rq_auth_data;
struct rpc_gss_wire_cred *gc;
- struct rsc *rsci = NULL;
+ struct rsc *rsci = NULL, new = {
+ .mechctx = 0,
+ };
__be32 *rpcstart;
__be32 *reject_stat = resv->iov_base + resv->iov_len;
int ret;
@@ -1489,10 +1491,20 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
case RPC_GSS_PROC_DESTROY:
if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq))
goto auth_err;
- rsci->h.expiry_time = get_seconds();
- set_bit(CACHE_NEGATIVE, &rsci->h.flags);
+
+ /** Invalidate the cache entry so sunrpc_update_cache
+ * direclty updates rsci. new->h.expiry_time is zero,
+ * so rsci->h.expiry_time will be set to zero and
+ * cache_clean will properly remove rsci.
+ */
+ clear_bit(CACHE_VALID, &rsci->h.flags);
+ rsci = rsc_update(sn->rsc_cache, &new, rsci);
+ if (!rsci)
+ goto drop;
+
if (resv->iov_len + 4 > PAGE_SIZE)
goto drop;
+
svc_putnl(resv, RPC_SUCCESS);
goto complete;
case RPC_GSS_PROC_DATA:
--
1.8.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH Version 3] SVCAUTH update the rsc cacue on RPC_GSS_PROC_DESTROY
2016-12-20 3:57 ` NeilBrown
@ 2016-12-20 15:06 ` Andy Adamson
0 siblings, 0 replies; 5+ messages in thread
From: Andy Adamson @ 2016-12-20 15:06 UTC (permalink / raw)
To: NeilBrown; +Cc: J. Bruce Fields, Neil Brown, NFS list
On Mon, Dec 19, 2016 at 10:57 PM, NeilBrown <neilb@suse.com> wrote:
> On Tue, Dec 20 2016, andros@netapp.com wrote:
>
>> From: Andy Adamson <andros@netapp.com>
>>
>> I instrumented cache_get, cache_put, cache_clean, svcauth_cache_lookup
>> and svcauth_gss_accept.
>>
>> Then I mounted -o sec=krb5, listed the mount directory, and umounted.
>>
>> CASE 1: Here is the instrumented output without the patch:
>>
>> svcauth_gss_accept gc_proc 3 rq_proc 0 (process RPC_GSS_PROC_DESTROY)
>> --> sunrpc_cache_lookup from rsc_lookup key ffffc90002b9bc58 hash 940
>> sunrpc_cache_lookup 1 calling cache_get key ffffc90002b9bc58 tmp
>> ffff880079b3f500
>> --> cache_get h ffff880079b3f500 refcount 1
>> sunrpc_cache_lookup RETURN 1 key ffffc90002b9bc58 tmp ffff880079b3f500
>> --> rsc_free h ffffc90002b9bc58
>>
>> &rsci->h is ffff880079b3f500, which identifies the cache entry we want
>> destroyed.
>>
>> Case: RPC_GSS_PROC_DESTROY:
>> svcauth_gss_accept RPC_GSS_PROC_DESTROY h ffff880079b3f500 ref 2
>> expiry 1481823331 <<<<<
>> svcauth_gss_accept AFTER SETTING h ffff880079b3f500 expiry 1481819736 <<<<<<<
>>
>> Note: expiry_time (and CACHE_NEGATIVE) are set.
>>
>> END of svcauth_gss_accept:
>> svcauth_gss_accept END calling cache_put h ffff880079b3f500
>> --> cache_put h ffff880079b3f500 refcount 2 cd->cache_put ffffffffa045c180
>> Dec 15 11:35:36 rhel7-2-ga-2 kernel: <-- cache_put h ffff880079b3f500 refcount 1
>>
>> refcount is 1, but cache_clean is not setup to reap the entry as
>> rsc_update was not called.
>
> What exactly do you mean by "cache_clean is not setup to reap the entry"?
>
> I think the problem is that detail->flush_time hasn't been updated to
> match the new (reduced) ->expiry_time. If that is what you meant by the
> above, I completely agree.
Here is the code that prevents cache_clean from reaping the entry:
cache_clean:
hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
if (current_detail->nextcheck > ch->expiry_time)
current_detail->nextcheck = ch->expiry_time+1;
if (!cache_is_expired(current_detail, ch))
continue;
<<<<<<<<<<<<<<<<<<<<<<<<<<<<< does not make it past this continue.
hlist_del_init(&ch->cache_list);
current_detail->entries--;
rv = 1;
break;
}
static inline int cache_is_expired(struct cache_detail *detail, struct
cache_head *h)
{
return (h->expiry_time < seconds_since_boot()) ||
(detail->flush_time >= h->last_refresh);
}
so either setting the expiry_time < seconds_since_boot or ensuring
that the flush_time is greater than the last_refresh is what I mean.
The current code does neither, and so the rsc entry is _never_ reaped.
>
>
>> Besides using the write_lock (which should
>> be used for entry changes) rsc_update also sets other fields to
>> trigger cache_clean to remove the entry from the cache_list under
>> the write_lock and do a final cache_put.
>
> The write_lock is needed for some changes, but not necessarily all.
> e.g. just clearing a bit is already atomic, so doesn't need a lock.
> Decreasing ->flush_time to match a new ->expiry_time does need the lock
> as it needs to be atomic.
And as I pointed out above, the write_lock is needed to remove the
cache entry from the cache_list.
Note that if a cache_put is called with an input refcount of 1 and so
calls rsc_put/rsc_free without first removing the entry from the
cache_list, the next time cache_clean walks the list, it encounters
the hole left by the freed entry and produces a kernel oops. So either
we
1) set it up so that the cache_is_expired() test passes and the refcount is 1
or
2) take the write_lock in rsc_put, and delete the entry from the
cache_list prior to freeing the entry.
This patch chooses #1.
I'm confused as to why the architecture is not clear. Why is it not
obvious, or well documented, as to what needs to be called to expire
an entry?
-->Andy
>
> I'll comment on the patch separately.
>
> Thanks,
> NeilBrown
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH Version 3] SVCAUTH update the rsc cacue on RPC_GSS_PROC_DESTROY
2016-12-20 0:00 [PATCH Version 3] SVCAUTH update the rsc cacue " andros
@ 2016-12-20 3:57 ` NeilBrown
2016-12-20 15:06 ` Andy Adamson
0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2016-12-20 3:57 UTC (permalink / raw)
To: andros, bfields; +Cc: neilb, linux-nfs, Andy Adamson
[-- Attachment #1: Type: text/plain, Size: 2202 bytes --]
On Tue, Dec 20 2016, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
>
> I instrumented cache_get, cache_put, cache_clean, svcauth_cache_lookup
> and svcauth_gss_accept.
>
> Then I mounted -o sec=krb5, listed the mount directory, and umounted.
>
> CASE 1: Here is the instrumented output without the patch:
>
> svcauth_gss_accept gc_proc 3 rq_proc 0 (process RPC_GSS_PROC_DESTROY)
> --> sunrpc_cache_lookup from rsc_lookup key ffffc90002b9bc58 hash 940
> sunrpc_cache_lookup 1 calling cache_get key ffffc90002b9bc58 tmp
> ffff880079b3f500
> --> cache_get h ffff880079b3f500 refcount 1
> sunrpc_cache_lookup RETURN 1 key ffffc90002b9bc58 tmp ffff880079b3f500
> --> rsc_free h ffffc90002b9bc58
>
> &rsci->h is ffff880079b3f500, which identifies the cache entry we want
> destroyed.
>
> Case: RPC_GSS_PROC_DESTROY:
> svcauth_gss_accept RPC_GSS_PROC_DESTROY h ffff880079b3f500 ref 2
> expiry 1481823331 <<<<<
> svcauth_gss_accept AFTER SETTING h ffff880079b3f500 expiry 1481819736 <<<<<<<
>
> Note: expiry_time (and CACHE_NEGATIVE) are set.
>
> END of svcauth_gss_accept:
> svcauth_gss_accept END calling cache_put h ffff880079b3f500
> --> cache_put h ffff880079b3f500 refcount 2 cd->cache_put ffffffffa045c180
> Dec 15 11:35:36 rhel7-2-ga-2 kernel: <-- cache_put h ffff880079b3f500 refcount 1
>
> refcount is 1, but cache_clean is not setup to reap the entry as
> rsc_update was not called.
What exactly do you mean by "cache_clean is not setup to reap the entry"?
I think the problem is that detail->flush_time hasn't been updated to
match the new (reduced) ->expiry_time. If that is what you meant by the
above, I completely agree.
> Besides using the write_lock (which should
> be used for entry changes) rsc_update also sets other fields to
> trigger cache_clean to remove the entry from the cache_list under
> the write_lock and do a final cache_put.
The write_lock is needed for some changes, but not necessarily all.
e.g. just clearing a bit is already atomic, so doesn't need a lock.
Decreasing ->flush_time to match a new ->expiry_time does need the lock
as it needs to be atomic.
I'll comment on the patch separately.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH Version 3] SVCAUTH update the rsc cacue on RPC_GSS_PROC_DESTROY
@ 2016-12-20 0:00 andros
2016-12-20 3:57 ` NeilBrown
0 siblings, 1 reply; 5+ messages in thread
From: andros @ 2016-12-20 0:00 UTC (permalink / raw)
To: bfields; +Cc: neilb, linux-nfs, Andy Adamson
From: Andy Adamson <andros@netapp.com>
I instrumented cache_get, cache_put, cache_clean, svcauth_cache_lookup
and svcauth_gss_accept.
Then I mounted -o sec=krb5, listed the mount directory, and umounted.
CASE 1: Here is the instrumented output without the patch:
svcauth_gss_accept gc_proc 3 rq_proc 0 (process RPC_GSS_PROC_DESTROY)
--> sunrpc_cache_lookup from rsc_lookup key ffffc90002b9bc58 hash 940
sunrpc_cache_lookup 1 calling cache_get key ffffc90002b9bc58 tmp
ffff880079b3f500
--> cache_get h ffff880079b3f500 refcount 1
sunrpc_cache_lookup RETURN 1 key ffffc90002b9bc58 tmp ffff880079b3f500
--> rsc_free h ffffc90002b9bc58
&rsci->h is ffff880079b3f500, which identifies the cache entry we want
destroyed.
Case: RPC_GSS_PROC_DESTROY:
svcauth_gss_accept RPC_GSS_PROC_DESTROY h ffff880079b3f500 ref 2
expiry 1481823331 <<<<<
svcauth_gss_accept AFTER SETTING h ffff880079b3f500 expiry 1481819736 <<<<<<<
Note: expiry_time (and CACHE_NEGATIVE) are set.
END of svcauth_gss_accept:
svcauth_gss_accept END calling cache_put h ffff880079b3f500
--> cache_put h ffff880079b3f500 refcount 2 cd->cache_put ffffffffa045c180
Dec 15 11:35:36 rhel7-2-ga-2 kernel: <-- cache_put h ffff880079b3f500 refcount 1
refcount is 1, but cache_clean is not setup to reap the entry as
rsc_update was not called. Besides using the write_lock (which should
be used for entry changes) rsc_update also sets other fields to
trigger cache_clean to remove the entry from the cache_list under
the write_lock and do a final cache_put.
cache_clean CLEAN h ffff880079b3e540 h->expiry_time 1481819736 hash 982
The above cache_clean occurs about 45 minutes after the END
svcauth_gss_accept cache_put.
I waited about 2 hours, and the cache entry was still not reaped
even though the expiry time was set.
CASE 2: Here is the instrumented output with the patch:
svcauth_gss_accept gc_proc 3 rq_proc 0 (process RPC_GSS_PROC_DESTROY)
--> sunrpc_cache_lookup from rsc_lookup key ffffc90002befc30 hash 982
sunrpc_cache_lookup calling cache_get key ffffc90002befc30 tmp ffff88006fcc6cc0
--> cache_get h ffff88006fcc6cc0 refcount 1
sunrpc_cache_lookup RETURN 1 key ffffc90002befc30 tmp ffff88006fcc6cc0
--> rsc_free h ffffc90002befc30
&rsci->h is ffff88006fcc6cc0, which identifies the cache entry we want
destroyed.
Case: RPC_GSS_PROC_DESTROY:
svcauth_gss_accept RPC_GSS_PROC_DESTROY h ffff88006fcc6cc0 ref 2
--> rsc_update new h ffffc90002befd18 new ref 0 old h ffff88006fcc6cc0 old ref 2
--> update_rsc new->h ffff88006fcc6cc0 tmp->h ffffc90002befd18
END of svcauth_gss_accept:
svcauth_gss_accept END calling cache_put h ffff88006fcc6cc0
--> cache_put h ffff88006fcc6cc0 refcount 2 cd->cache_put ffffffffa03cd170
<-- cache_put h ffff88006fcc6cc0 refcount 1
refcount is 1, setup for cache_clean.
cache_clean CLEAN h ffff88006fcc6cc0 h->expiry_time 0 hash 982
cache_clean DELETE h ffff88006fcc6cc0 from cache_list
cache_clean CLEANED calling cache_put h ffff88006fcc6cc0
--> cache_put h ffff88006fcc6cc0 refcount 1
--> rsc_put PUT h ffff88006fcc6cc0
--> rsc_free h ffff88006fcc6cc0
<-- cache_put h ffff88006fcc6cc0 refcount 0
The cache entry is destroyed.
Andy Adamson (1):
SVCAUTH update the rsc cache on RPC_GSS_PROC_DESTROY
net/sunrpc/auth_gss/svcauth_gss.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-12-20 15:06 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-19 23:55 [PATCH Version 3] SVCAUTH update the rsc cacue on RPC_GSS_PROC_DESTROY andros
2016-12-19 23:55 ` [PATCH Version 3] SVCAUTH update the rsc cache " andros
2016-12-20 0:00 [PATCH Version 3] SVCAUTH update the rsc cacue " andros
2016-12-20 3:57 ` NeilBrown
2016-12-20 15:06 ` Andy Adamson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.