* [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free (try #6)
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

This is the sixth patchset to fix the use-after-free problem in lockd
which we originally discussed back in October. The main problem is
detailed in the last patch of the series.

Along the way, Christoph Hellwig mentioned that it would be advantageous
to convert lockd to use the kthread API. This patch set first makes that
change and then patches it to actually fix the use-after-free problem. It
also fixes a couple of minor bugs in the current lockd implementation.

Most of the changes from the last patchset were ones suggested by Neil
Brown and are:

+ fix a preexisting bug that would cause a NULL pointer dereference if
  the later kmallocs failed in svc_prepare_thread

+ additional comments to explain the rationale behind nlmsvc_ref
  increments and decrements

+ removed module_get/put from lockd(). It should no longer be necessary
  and isn't safe

+ sanity checks in lockd_down have been changed to BUG() calls. They
  should never happen and if they do, then something is very wrong.

I've done some basic testing and everything seems to work as expected.
I've also tested this against the reproducer that I have for the
use-after-free problem and this does fix it. I've tried to make this
cleanly bisectable, but have only really tested the final result.

Many thanks to Trond Myklebust, Chuck Lever, Neil Brown and Christoph
Hellwig for their guidance on this.

Signed-off-by: Jeff Layton <jlayton@redhat.com>

* [PATCH 1/6] SUNRPC: spin svc_rqst initialization to its own function
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

Move the initialization in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.

Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/sunrpc/svc.h |    2 +
 net/sunrpc/svc.c           |   59 +++++++++++++++++++++++++++++++------------
 2 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 8531a70..5f07300 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -382,6 +382,8 @@ struct svc_procedure {
  */
 struct svc_serv *  svc_create(struct svc_program *, unsigned int,
 			      void (*shutdown)(struct svc_serv*));
+struct svc_rqst *svc_prepare_thread(struct svc_serv *serv,
+					struct svc_pool *pool);
 int		   svc_create_thread(svc_thread_fn, struct svc_serv *);
 void		   svc_exit_thread(struct svc_rqst *);
 struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index fca17d0..f9636bf 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -538,31 +538,17 @@ svc_release_buffer(struct svc_rqst *rqstp)
 			put_page(rqstp->rq_pages[i]);
 }

-/*
- * Create a thread in the given pool.  Caller must hold BKL.
- * On a NUMA or SMP machine, with a multi-pool serv, the thread
- * will be restricted to run on the cpus belonging to the pool.
- */
-static int
-__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
-		    struct svc_pool *pool)
+struct svc_rqst *
+svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool)
 {
 	struct svc_rqst	*rqstp;
-	int		error = -ENOMEM;
-	int		have_oldmask = 0;
-	cpumask_t	oldmask;

 	rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
 	if (!rqstp)
-		goto out;
+		goto out_enomem;

 	init_waitqueue_head(&rqstp->rq_wait);

-	if (!(rqstp->rq_argp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
-	 || !(rqstp->rq_resp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
-	 || !svc_init_buffer(rqstp, serv->sv_max_mesg))
-		goto out_thread;
-
 	serv->sv_nrthreads++;
 	spin_lock_bh(&pool->sp_lock);
 	pool->sp_nrthreads++;
@@ -571,6 +557,45 @@ __svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
 	rqstp->rq_server = serv;
 	rqstp->rq_pool = pool;

+	rqstp->rq_argp = kmalloc(serv->sv_xdrsize, GFP_KERNEL);
+	if (!rqstp->rq_argp)
+		goto out_thread;
+
+	rqstp->rq_resp = kmalloc(serv->sv_xdrsize, GFP_KERNEL);
+	if (!rqstp->rq_resp)
+		goto out_thread;
+
+	if (!svc_init_buffer(rqstp, serv->sv_max_mesg))
+		goto out_thread;
+
+	return rqstp;
+out_thread:
+	svc_exit_thread(rqstp);
+out_enomem:
+	return ERR_PTR(-ENOMEM);
+}
+EXPORT_SYMBOL(svc_prepare_thread);
+
+/*
+ * Create a thread in the given pool.  Caller must hold BKL.
+ * On a NUMA or SMP machine, with a multi-pool serv, the thread
+ * will be restricted to run on the cpus belonging to the pool.
+ */
+static int
+__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
+		    struct svc_pool *pool)
+{
+	struct svc_rqst	*rqstp;
+	int		error = -ENOMEM;
+	int		have_oldmask = 0;
+	cpumask_t	oldmask;
+
+	rqstp = svc_prepare_thread(serv, pool);
+	if (IS_ERR(rqstp)) {
+		error = PTR_ERR(rqstp);
+		goto out;
+	}
+
 	if (serv->sv_nrpools > 1)
 		have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask);

-- 
1.5.3.3

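For readers following the series, here is a minimal userspace sketch of the ordering issue that patch 1/6 describes: if a shared teardown function dereferences back-pointers, those pointers have to be set before any allocation that can fail and jump to that teardown. This is not the kernel code. The names `struct pool`, `struct rqst`, `prepare_thread`, `exit_thread` and the `fail_at` parameter are hypothetical, and the `ERR_PTR` helpers are userspace stand-ins that assume the top 4095 address values are never valid pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (assumption). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct pool { int nrthreads; };	/* hypothetical, loosely mirrors svc_pool */
struct rqst {			/* hypothetical, loosely mirrors svc_rqst */
	struct pool *pool;
	void *argp;
	void *resp;
};

/* Single teardown path. It dereferences rq->pool, so the pool pointer
 * must already be set whenever this is reached from an error path. */
static void exit_thread(struct rqst *rq)
{
	assert(rq->pool != NULL);
	free(rq->resp);		/* free(NULL) is a no-op */
	free(rq->argp);
	rq->pool->nrthreads--;
	free(rq);
}

/* fail_at lets a caller simulate an allocation failure at step 1 or 2;
 * pass 0 for no simulated failure. */
static void *maybe_alloc(size_t n, int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(n);
}

static struct rqst *prepare_thread(struct pool *pool, int fail_at)
{
	struct rqst *rq = calloc(1, sizeof(*rq));

	if (!rq)
		return ERR_PTR(-ENOMEM);

	/* Bookkeeping first, so exit_thread() is safe from here on. */
	pool->nrthreads++;
	rq->pool = pool;

	rq->argp = maybe_alloc(64, 1, fail_at);
	if (!rq->argp)
		goto out_thread;

	rq->resp = maybe_alloc(64, 2, fail_at);
	if (!rq->resp)
		goto out_thread;

	return rq;

out_thread:
	exit_thread(rq);
	return ERR_PTR(-ENOMEM);
}
```

A caller checks the result with `IS_ERR()` and recovers the error code with `PTR_ERR()`, the same shape as the `__svc_create_thread` hunk above. Had the fallible allocations come before `rq->pool = pool`, the error path would have dereferenced a NULL pool pointer in the teardown.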
* [PATCH 2/6] SUNRPC: export svc_sock_update_bufs
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

Needed since the plan is to not have a svc_create_thread helper and to
have current users of that function just call kthread_run directly.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 net/sunrpc/svcsock.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 057c870..f2bef16 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1407,6 +1407,7 @@ svc_sock_update_bufs(struct svc_serv *serv)
 	}
 	spin_unlock_bh(&serv->sv_lock);
 }
+EXPORT_SYMBOL(svc_sock_update_bufs);

 /*
  * Receive the next request on any socket.  This code is carefully
-- 
1.5.3.3

* [PATCH 3/6] NLM: Initialize completion variable in lockd_up
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

lockd_start_done is a global var that can be reused if lockd is
restarted, but it's never reinitialized. On all but the first use,
wait_for_completion isn't actually waiting on it since it has
already completed once.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 82e2192..0f4148a 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -300,6 +300,7 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	/*
 	 * Create the kernel thread and wait for it to start.
 	 */
+	init_completion(&lockd_start_done);
 	error = svc_create_thread(lockd, serv);
 	if (error) {
 		printk(KERN_WARNING
-- 
1.5.3.3

* [PATCH 4/6] NLM: Have lockd call try_to_freeze
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

lockd makes itself freezable, but never calls try_to_freeze(). Have it
call try_to_freeze() within the main loop.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 0f4148a..03a83a0 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp)
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];

+		if (try_to_freeze())
+			continue;
+
 		if (signalled()) {
 			flush_signals(current);
 			if (nlmsvc_ops) {
-- 
1.5.3.3

* [PATCH 5/6] NLM: Convert lockd to use kthreads
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

Have lockd_up start lockd using kthread_run. With this change,
lockd_down now blocks until lockd actually exits, so there's no longer
a need for the waitqueue code at the end of lockd_down. This also means
that only one lockd can be running at a time, which simplifies the code
within lockd's main loop.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c |   93 ++++++++++++++++++++++++--------------------------------
 1 files changed, 40 insertions(+), 53 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 03a83a0..0777a4e 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -25,6 +25,7 @@
 #include <linux/smp.h>
 #include <linux/smp_lock.h>
 #include <linux/mutex.h>
+#include <linux/kthread.h>
 #include <linux/freezer.h>

 #include <linux/sunrpc/types.h>
@@ -48,13 +49,12 @@ EXPORT_SYMBOL(nlmsvc_ops);

 static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int		nlmsvc_users;
-static pid_t			nlmsvc_pid;
+static struct task_struct	*nlmsvc_task;
 static struct svc_serv		*nlmsvc_serv;
 int				nlmsvc_grace_period;
 unsigned long			nlmsvc_timeout;

 static DECLARE_COMPLETION(lockd_start_done);
-static DECLARE_WAIT_QUEUE_HEAD(lockd_exit);

 /*
  * These can be set at insmod time (useful for NFS as root filesystem),
@@ -111,10 +111,11 @@ static inline void clear_grace_period(void)
 /*
  * This is the lockd kernel thread
  */
-static void
-lockd(struct svc_rqst *rqstp)
+static int
+lockd(void *vrqstp)
 {
 	int		err = 0;
+	struct svc_rqst *rqstp = vrqstp;
 	unsigned long grace_period_expire;

 	/* Lock module and set up kernel thread */
@@ -122,17 +123,14 @@ lockd(struct svc_rqst *rqstp)
 	 * be holding a reference to this module, so it
 	 * is safe to just claim another reference
 	 */
-	__module_get(THIS_MODULE);
 	lock_kernel();

 	/*
 	 * Let our maker know we're running.
 	 */
-	nlmsvc_pid = current->pid;
 	nlmsvc_serv = rqstp->rq_server;
 	complete(&lockd_start_done);

-	daemonize("lockd");
 	set_freezable();

 	/* Process request with signals blocked, but allow SIGKILL.  */
@@ -151,7 +149,7 @@ lockd(struct svc_rqst *rqstp)
 	 * NFS mount or NFS daemon has gone away, and we've been sent a
 	 * signal, or else another process has taken over our job.
 	 */
-	while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) {
+	while (!kthread_should_stop()) {
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];

@@ -199,27 +197,18 @@ lockd(struct svc_rqst *rqstp)

 	flush_signals(current);

-	/*
-	 * Check whether there's a new lockd process before
-	 * shutting down the hosts and clearing the slot.
-	 */
-	if (!nlmsvc_pid || current->pid == nlmsvc_pid) {
-		if (nlmsvc_ops)
-			nlmsvc_invalidate_all();
-		nlm_shutdown_hosts();
-		nlmsvc_pid = 0;
-		nlmsvc_serv = NULL;
-	} else
-		printk(KERN_DEBUG
-			"lockd: new process, skipping host shutdown\n");
-	wake_up(&lockd_exit);
+	if (nlmsvc_ops)
+		nlmsvc_invalidate_all();
+	nlm_shutdown_hosts();
+	nlmsvc_task = NULL;
+	nlmsvc_serv = NULL;

 	/* Exit the RPC thread */
 	svc_exit_thread(rqstp);

 	/* Release module */
 	unlock_kernel();
-	module_put_and_exit(0);
+	return 0;
 }


@@ -269,14 +258,15 @@ static int make_socks(struct svc_serv *serv, int proto)
 int
 lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 {
-	struct svc_serv *	serv;
-	int			error = 0;
+	struct svc_serv *serv;
+	struct svc_rqst *rqstp;
+	int		error = 0;

 	mutex_lock(&nlmsvc_mutex);
 	/*
 	 * Check whether we're already up and running.
 	 */
-	if (nlmsvc_pid) {
+	if (nlmsvc_task) {
 		if (proto)
 			error = make_socks(nlmsvc_serv, proto);
 		goto out;
@@ -303,11 +293,24 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	/*
 	 * Create the kernel thread and wait for it to start.
 	 */
+	rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]);
+	if (IS_ERR(rqstp)) {
+		error = PTR_ERR(rqstp);
+		printk(KERN_WARNING
+			"lockd_up: svc_rqst allocation failed, error=%d\n",
+			error);
+		goto destroy_and_out;
+	}
+
+	svc_sock_update_bufs(serv);
 	init_completion(&lockd_start_done);
-	error = svc_create_thread(lockd, serv);
-	if (error) {
+	nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name);
+	if (IS_ERR(nlmsvc_task)) {
+		error = PTR_ERR(nlmsvc_task);
+		nlmsvc_task = NULL;
 		printk(KERN_WARNING
-			"lockd_up: create thread failed, error=%d\n", error);
+			"lockd_up: kthread_run failed, error=%d\n", error);
+		svc_exit_thread(rqstp);
 		goto destroy_and_out;
 	}
 	wait_for_completion(&lockd_start_done);
@@ -332,37 +335,21 @@ EXPORT_SYMBOL(lockd_up);
 void
 lockd_down(void)
 {
-	static int warned;
-
 	mutex_lock(&nlmsvc_mutex);
 	if (nlmsvc_users) {
 		if (--nlmsvc_users)
 			goto out;
-	} else
-		printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid);
-
-	if (!nlmsvc_pid) {
-		if (warned++ == 0)
-			printk(KERN_WARNING "lockd_down: no lockd running.\n");
-		goto out;
+	} else {
+		printk(KERN_ERR "lockd_down: no users! task=%p\n",
+			nlmsvc_task);
+		BUG();
 	}
-	warned = 0;
-	kill_proc(nlmsvc_pid, SIGKILL, 1);
-	/*
-	 * Wait for the lockd process to exit, but since we're holding
-	 * the lockd semaphore, we can't wait around forever ...
-	 */
-	clear_thread_flag(TIF_SIGPENDING);
-	interruptible_sleep_on_timeout(&lockd_exit, HZ);
-	if (nlmsvc_pid) {
-		printk(KERN_WARNING
-			"lockd_down: lockd failed to exit, clearing pid\n");
-		nlmsvc_pid = 0;
+	if (!nlmsvc_task) {
+		printk(KERN_ERR "lockd_down: no lockd running.\n");
+		BUG();
 	}
-	spin_lock_irq(&current->sighand->siglock);
-	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
+	kthread_stop(nlmsvc_task);
 out:
 	mutex_unlock(&nlmsvc_mutex);
 }

-- 
1.5.3.3

* [PATCH 6/6] NLM: Add reference counting to lockd
From: Jeff Layton @ 2008-01-08 19:33 UTC
To: akpm, neilb; +Cc: linux-nfs, linux-kernel

...and only have lockd exit when the last reference is dropped.

The problem is this:

When a lock that a client is blocking on comes free, lockd does this in
nlmsvc_grant_blocked():

    nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);

the callback from this call is nlmsvc_grant_callback(). That function
does this at the end to wake up lockd:

    svc_wake_up(block->b_daemon);

However there is no guarantee that lockd will be up when this happens.
If someone shuts down or restarts lockd before the async call completes,
then the b_daemon pointer will point to freed memory and the kernel may
oops.

I first noticed this on older kernels and had mistakenly thought that
newer kernels weren't susceptible, but that's not correct. There's a bit
of a race to make sure that the nlm_host is bound when the async call is
done, but I can now reproduce this at will on current kernels.

This patch is based on Trond's suggestion to add a new reference counter
to lockd, and only allows lockd to go down when it reaches 0. With this
change we can't use kthread_stop here. nlmsvc_unlink_block is called by
lockd and a kthread can't call kthread_stop on itself. So the patch
changes lockd to check the refcount itself and to return if it goes to
0. We do the checking and exit while holding the nlmsvc_mutex to make
sure that a new lockd is not started until the old one is down.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c              |   50 +++++++++++++++++++++++++++++++++++++------
 fs/lockd/svclock.c          |    8 +++++++
 include/linux/lockd/lockd.h |    1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 0777a4e..b1918e9 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int		nlmsvc_users;
 static struct task_struct	*nlmsvc_task;
 static struct svc_serv		*nlmsvc_serv;
+atomic_t			nlmsvc_ref = ATOMIC_INIT(0);
 int				nlmsvc_grace_period;
 unsigned long			nlmsvc_timeout;

@@ -133,7 +134,10 @@ lockd(void *vrqstp)

 	set_freezable();

-	/* Process request with signals blocked, but allow SIGKILL.  */
+	/*
+	 * Process request with signals blocked, but allow SIGKILL which
+	 * signifies that lockd should drop all of its locks.
+	 */
 	allow_signal(SIGKILL);

 	dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
@@ -146,15 +150,19 @@ lockd(void *vrqstp)

 	/*
 	 * The main request loop. We don't terminate until the last
-	 * NFS mount or NFS daemon has gone away, and we've been sent a
-	 * signal, or else another process has taken over our job.
+	 * NFS mount or NFS daemon has gone away, and the nlm_blocked
+	 * list is empty. The nlmsvc_mutex ensures that we prevent a
+	 * new lockd from being started before the old one is down.
 	 */
-	while (!kthread_should_stop()) {
+	mutex_lock(&nlmsvc_mutex);
+	while (atomic_read(&nlmsvc_ref) != 0) {
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];

+		mutex_unlock(&nlmsvc_mutex);
+
 		if (try_to_freeze())
-			continue;
+			goto again;

 		if (signalled()) {
 			flush_signals(current);
@@ -181,11 +189,12 @@ lockd(void *vrqstp)
 		 */
 		err = svc_recv(rqstp, timeout);
 		if (err == -EAGAIN || err == -EINTR)
-			continue;
+			goto again;
 		if (err < 0) {
 			printk(KERN_WARNING
 			       "lockd: terminating on error %d\n",
 			       -err);
+			mutex_lock(&nlmsvc_mutex);
 			break;
 		}

@@ -193,8 +202,15 @@ lockd(void *vrqstp)
 				svc_print_addr(rqstp, buf, sizeof(buf)));

 		svc_process(rqstp);
+again:
+		mutex_lock(&nlmsvc_mutex);
 	}

+	/*
+	 * at this point lockd is committed to going down. We hold the
+	 * nlmsvc_mutex until just before exit to prevent a new one
+	 * from starting before it's down.
+	 */
 	flush_signals(current);

 	if (nlmsvc_ops)
@@ -202,6 +218,7 @@ lockd(void *vrqstp)
 	nlm_shutdown_hosts();
 	nlmsvc_task = NULL;
 	nlmsvc_serv = NULL;
+	mutex_unlock(&nlmsvc_mutex);

 	/* Exit the RPC thread */
 	svc_exit_thread(rqstp);
@@ -263,6 +280,11 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	int		error = 0;

 	mutex_lock(&nlmsvc_mutex);
+
+	/* first lockd_up caller takes a nlmsvc_ref */
+	if (!nlmsvc_users)
+		atomic_inc(&nlmsvc_ref);
+
 	/*
 	 * Check whether we're already up and running.
 	 */
@@ -322,6 +344,9 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 destroy_and_out:
 	svc_destroy(serv);
 out:
+	/* if there was an error and this is the first user, drop reference */
+	if (!nlmsvc_users && error)
+		atomic_dec(&nlmsvc_ref);
 	if (!error)
 		nlmsvc_users++;
 	mutex_unlock(&nlmsvc_mutex);
@@ -349,7 +374,18 @@ lockd_down(void)
 		printk(KERN_ERR "lockd_down: no lockd running.\n");
 		BUG();
 	}
-	kthread_stop(nlmsvc_task);
+	if (!atomic_dec_and_test(&nlmsvc_ref))
+		printk(KERN_WARNING "lockd_down: lockd is waiting for "
+			"outstanding requests to complete before exiting.\n");
+
+	/*
+	 * Sending a signal is necessary here. If we get to this point and
+	 * nlm_blocked isn't empty then lockd may be held hostage by clients
+	 * that are still blocking. Sending the signal makes sure that lockd
+	 * invalidates all of its locks so that it's just waiting on RPC
+	 * callbacks to complete
+	 */
+	kill_proc(nlmsvc_task->pid, SIGKILL, 1);
 out:
 	mutex_unlock(&nlmsvc_mutex);
 }
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index d120ec3..8333315 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -61,6 +61,11 @@ nlmsvc_insert_block(struct nlm_block *block, unsigned long when)
 	struct list_head *pos;

 	dprintk("lockd: nlmsvc_insert_block(%p, %ld)\n", block, when);
+
+	/* take a lockd reference when first lock goes on nlm_blocked */
+	if (list_empty(&nlm_blocked))
+		atomic_inc(&nlmsvc_ref);
+
 	if (list_empty(&block->b_list)) {
 		kref_get(&block->b_count);
 	} else {
@@ -239,6 +244,9 @@ static int nlmsvc_unlink_block(struct nlm_block *block)
 	/* Remove block from list */
 	status = posix_unblock_lock(block->b_file->f_file, &block->b_call->a_args.lock.fl);
 	nlmsvc_remove_block(block);
+	/* drop lockd reference when last lock is removed from nlm_blocked */
+	if (list_empty(&nlm_blocked))
+		atomic_dec(&nlmsvc_ref);
 	return status;
 }

diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index e2d1ce3..7389553 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -154,6 +154,7 @@ extern struct svc_procedure	nlmsvc_procedures4[];
 extern int			nlmsvc_grace_period;
 extern unsigned long		nlmsvc_timeout;
 extern int			nsm_use_hostnames;
+extern atomic_t			nlmsvc_ref;

 /*
  * Lockd client functions
-- 
1.5.3.3

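The reference-counting rule in patch 6/6 can be modelled in a few lines of userspace C. This is a simplified, single-threaded sketch: it ignores nlmsvc_mutex, signals and the RPC machinery, and it replaces the nlm_blocked list with a plain counter. The `model_*` function names, `nr_blocked` and `lockd_may_exit` are hypothetical; only `nlmsvc_ref` and `nlmsvc_users` are names taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int nlmsvc_ref;		/* models the patch's nlmsvc_ref */
static unsigned int nlmsvc_users;	/* number of lockd_up() callers */
static unsigned int nr_blocked;		/* stands in for the nlm_blocked list length */

/* First lockd_up() caller takes one reference on behalf of all users. */
static void model_lockd_up(void)
{
	if (nlmsvc_users == 0)
		atomic_fetch_add(&nlmsvc_ref, 1);
	nlmsvc_users++;
}

/* Last lockd_down() caller drops the users' reference. */
static void model_lockd_down(void)
{
	assert(nlmsvc_users > 0);
	if (--nlmsvc_users == 0)
		atomic_fetch_sub(&nlmsvc_ref, 1);
}

/* First blocked lock takes one reference on behalf of the whole list. */
static void model_insert_block(void)
{
	if (nr_blocked == 0)
		atomic_fetch_add(&nlmsvc_ref, 1);
	nr_blocked++;
}

/* Removing the last blocked lock drops the list's reference. */
static void model_unlink_block(void)
{
	assert(nr_blocked > 0);
	if (--nr_blocked == 0)
		atomic_fetch_sub(&nlmsvc_ref, 1);
}

/* The main loop in the patch keeps running while this is false. */
static bool lockd_may_exit(void)
{
	return atomic_load(&nlmsvc_ref) == 0;
}
```

In this model the count is at most 2: one reference for "there are users" and one for "nlm_blocked is non-empty". If the last user calls down while a block is still queued, `lockd_may_exit()` stays false until that block is unlinked, which is the behaviour the commit message describes.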
* Re: [PATCH 6/6] NLM: Add reference counting to lockd
From: Christoph Hellwig @ 2008-01-09 17:47 UTC
To: Jeff Layton; +Cc: akpm, neilb, linux-nfs, linux-kernel

On Tue, Jan 08, 2008 at 02:33:18PM -0500, Jeff Layton wrote:
> ...and only have lockd exit when the last reference is dropped.
>
> The problem is this:
>
> When a lock that a client is blocking on comes free, lockd does this in
> nlmsvc_grant_blocked():
>
>     nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);
>
> the callback from this call is nlmsvc_grant_callback(). That function
> does this at the end to wake up lockd:
>
>     svc_wake_up(block->b_daemon);
>
> However there is no guarantee that lockd will be up when this happens.
> If someone shuts down or restarts lockd before the async call completes,
> then the b_daemon pointer will point to freed memory and the kernel may
> oops.
>
> I first noticed this on older kernels and had mistakenly thought that
> newer kernels weren't susceptible, but that's not correct. There's a bit
> of a race to make sure that the nlm_host is bound when the async call is
> done, but I can now reproduce this at will on current kernels.
>
> This patch is based on Trond's suggestion to add a new reference counter
> to lockd, and only allows lockd to go down when it reaches 0. With this
> change we can't use kthread_stop here. nlmsvc_unlink_block is called by
> lockd and a kthread can't call kthread_stop on itself. So the patch
> changes lockd to check the refcount itself and to return if it goes to
> 0. We do the checking and exit while holding the nlmsvc_mutex to make
> sure that a new lockd is not started until the old one is down.

I don't like this signals/kthread mixture at all.  Why can't we simply
call kthread_stop when the refcount hits zero and keep all the nice
kthread helpers?

* Re: [PATCH 6/6] NLM: Add reference counting to lockd
From: Jeff Layton @ 2008-01-09 18:36 UTC
To: Christoph Hellwig; +Cc: akpm, neilb, linux-nfs, linux-kernel

On Wed, 9 Jan 2008 17:47:07 +0000
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jan 08, 2008 at 02:33:18PM -0500, Jeff Layton wrote:
> > ...and only have lockd exit when the last reference is dropped.
> >
> > The problem is this:
> >
> > When a lock that a client is blocking on comes free, lockd does
> > this in nlmsvc_grant_blocked():
> >
> >     nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG,
> >     &nlmsvc_grant_ops);
> >
> > the callback from this call is nlmsvc_grant_callback(). That
> > function does this at the end to wake up lockd:
> >
> >     svc_wake_up(block->b_daemon);
> >
> > However there is no guarantee that lockd will be up when this
> > happens. If someone shuts down or restarts lockd before the async
> > call completes, then the b_daemon pointer will point to freed
> > memory and the kernel may oops.
> >
> > I first noticed this on older kernels and had mistakenly thought
> > that newer kernels weren't susceptible, but that's not correct.
> > There's a bit of a race to make sure that the nlm_host is bound
> > when the async call is done, but I can now reproduce this at will
> > on current kernels.
> >
> > This patch is based on Trond's suggestion to add a new reference
> > counter to lockd, and only allows lockd to go down when it reaches
> > 0. With this change we can't use kthread_stop here.
> > nlmsvc_unlink_block is called by lockd and a kthread can't call
> > kthread_stop on itself. So the patch changes lockd to check the
> > refcount itself and to return if it goes to 0. We do the checking
> > and exit while holding the nlmsvc_mutex to make sure that a new
> > lockd is not started until the old one is down.
>
> I don't like this signals/kthread mixture at all.  Why can't we simply
> call kthread_stop when the refcount hits zero and keep all the nice
> kthread helpers?
>

As I stated in an earlier email, I'm not fond of this either :-)

I don't see a good alternative though. We need to be able to drop and
check the refcount in nlmsvc_unlink_block. That function is called
from lockd, and we can't have lockd call kthread_stop on itself.

If you see a better way to do this, I'm certainly open to suggestions.

I'll note that my first stab at fixing this problem was to change the
svc_wake_up() call in the rpc callback to a routine to wake up any
lockd on the box that happened to be up. That sidesteps this entire
problem of having to make sure lockd stays up. If we decided that was
the right approach we could dump the last patch in this series
altogether.

That said, there could be other use-after-free bugs lurking in the lockd
code, so maybe keeping lockd up until nlm_blocked is empty is the right
thing to do.

-- 
Jeff Layton <jlayton@redhat.com>

* Re: [PATCH 6/6] NLM: Add reference counting to lockd
From: Christoph Hellwig @ 2008-01-09 18:48 UTC
To: Jeff Layton; +Cc: Christoph Hellwig, akpm, neilb, linux-nfs, linux-kernel

On Wed, Jan 09, 2008 at 01:36:21PM -0500, Jeff Layton wrote:
> I don't see a good alternative though. We need to be able to drop and
> check the refcount in nlmsvc_unlink_block. That function is called
> from lockd, and we can't have lockd call kthread_stop on itself.
>
> If you see a better way to do this, I'm certainly open to suggestions.
>
> I'll note that my first stab at fixing this problem was to change the
> svc_wake_up() call in the rpc callback to a routine to wake up any
> lockd on the box that happened to be up. That sidesteps this entire
> problem of having to make sure lockd stays up. If we decided that was
> the right approach we could dump the last patch in this series
> altogether.
>
> That said there could be other use after free bugs lurking in the lockd
> code so maybe keeping lockd up until nlm_blocked is empty is the right
> thing to do.

What about just not exiting from lockd as long as nlm_blocked is not
empty?  lockd_down still simply calls kthread_stop, but lockd only
honours it when nlm_blocked is empty?

* Re: [PATCH 6/6] NLM: Add reference counting to lockd
From: Jeff Layton @ 2008-01-09 18:59 UTC
To: Christoph Hellwig; +Cc: akpm, neilb, linux-nfs, linux-kernel

On Wed, 9 Jan 2008 18:48:14 +0000
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Jan 09, 2008 at 01:36:21PM -0500, Jeff Layton wrote:
> > I don't see a good alternative though. We need to be able to drop
> > and check the refcount in nlmsvc_unlink_block. That function is
> > called from lockd, and we can't have lockd call kthread_stop on
> > itself.
> >
> > If you see a better way to do this, I'm certainly open to
> > suggestions.
> >
> > I'll note that my first stab at fixing this problem was to change
> > the svc_wake_up() call in the rpc callback to a routine to wake up
> > any lockd on the box that happened to be up. That sidesteps this
> > entire problem of having to make sure lockd stays up. If we decided
> > that was the right approach we could dump the last patch in this
> > series altogether.
> >
> > That said there could be other use after free bugs lurking in the
> > lockd code so maybe keeping lockd up until nlm_blocked is empty is
> > the right thing to do.
>
> What about just not exiting from lockd as long as nlm_blocked is not
> empty?  lockd_down still simply calls kthread_stop, but lockd only
> honours it when nlm_blocked is empty?

lockd can basically block forever in this situation if the client goes
away for good. With the current kthread implementation, kthread_stop
calls are serialized, and I don't think we want to monopolize the
kthread_stop queue.

If kthread_stop calls could occur in parallel, that would be a different
situation :-)

-- 
Jeff Layton <jlayton@redhat.com>

* Re: [PATCH 6/6] NLM: Add reference counting to lockd
From: Neil Brown @ 2008-01-10 3:29 UTC
To: Jeff Layton; +Cc: akpm, linux-nfs, linux-kernel

On Tuesday January 8, jlayton@redhat.com wrote:
> ...and only have lockd exit when the last reference is dropped.
>
> The problem is this:
>
> When a lock that a client is blocking on comes free, lockd does this in
> nlmsvc_grant_blocked():
>
>     nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);
>
> the callback from this call is nlmsvc_grant_callback(). That function
> does this at the end to wake up lockd:
>
>     svc_wake_up(block->b_daemon);

Uhmmm... Maybe there is an easier way.

block->b_daemon will always be nlmsvc_serv, so can we simply make this

     svc_wake_up(nlmsvc_serv);
with a little locking to make sure nlmsvc_serv is valid?

Actually svc_wake_up is only called from lockd and goes through
various hoops to find the right rqstp, which we could have known in
advance.
So store the rqstp in some global wrapped in a spinlock so we can
access it safely and just:

   spin_lock(whatever)
   if (nlmsvc_rqstp)
       wake_up(&nlmsvc_rqstp->rq_wait)
   spin_unlock(whatever)


That seems a somewhat simpler way of avoiding the particular problem.


Hmmm.... I guess that nlmsvc_grant_callback could then be run after
the 'lockd' module had been unloaded.
Maybe nlm_shutdown_hosts could call rpc_killall_tasks(host->h_rpcclnt)
on each host.  That should ensure the callback won't happen afterwards.

Maybe?

NeilBrown

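A rough userspace sketch of the "global pointer behind a lock" idea suggested above, for illustration only. It assumes a C11 `atomic_flag` as a stand-in for the kernel spinlock and a counter as a stand-in for `wake_up(&rqstp->rq_wait)`. All names here (`struct waiter`, `daemon_register`, `daemon_unregister`, `daemon_wake_up`) are hypothetical and do not exist in lockd.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the daemon's svc_rqst and its rq_wait queue. */
struct waiter {
	unsigned int wakeups;	/* counts wake-ups instead of waking a thread */
};

/* Minimal test-and-set spinlock standing in for the kernel's spinlock_t. */
static atomic_flag daemon_lock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait until the holder clears the flag */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* NULL while no daemon is running; only touched under daemon_lock. */
static struct waiter *daemon_waiter;

/* Called by the daemon once it is up. */
static void daemon_register(struct waiter *w)
{
	spin_lock(&daemon_lock);
	daemon_waiter = w;
	spin_unlock(&daemon_lock);
}

/* Called by the daemon before it frees its state and exits. */
static void daemon_unregister(void)
{
	spin_lock(&daemon_lock);
	daemon_waiter = NULL;
	spin_unlock(&daemon_lock);
}

/* Called from the async callback: wake the daemon only if one exists.
 * Returns true if a daemon was woken. */
static bool daemon_wake_up(void)
{
	bool woke = false;

	spin_lock(&daemon_lock);
	if (daemon_waiter) {
		daemon_waiter->wakeups++;
		woke = true;
	}
	spin_unlock(&daemon_lock);
	return woke;
}
```

The point of the design is that the callback never holds its own copy of the daemon pointer. It looks the pointer up under the lock each time, so a late callback after shutdown finds NULL rather than freed memory. As noted in the message, this does not by itself cover the case where the callback runs after the module has been unloaded.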
* Re: [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-10 3:29 ` Neil Brown @ 2008-01-10 11:58 ` Jeff Layton 0 siblings, 0 replies; 32+ messages in thread From: Jeff Layton @ 2008-01-10 11:58 UTC (permalink / raw) To: Neil Brown; +Cc: akpm, linux-nfs, linux-kernel On Thu, 10 Jan 2008 14:29:22 +1100 Neil Brown <neilb@suse.de> wrote: > On Tuesday January 8, jlayton@redhat.com wrote: > > ...and only have lockd exit when the last reference is dropped. > > > > The problem is this: > > > > When a lock that a client is blocking on comes free, lockd does > > this in nlmsvc_grant_blocked(): > > > > nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, > > &nlmsvc_grant_ops); > > > > the callback from this call is nlmsvc_grant_callback(). That > > function does this at the end to wake up lockd: > > > > svc_wake_up(block->b_daemon); > > Uhmmm... Maybe there is an easier way. > > block->b_daemon will always be nlmsvc_serv, so can we simply make this > > svc_wake_up(nlmsvc_serv); > with a little locking to make sure nlmsvc_serv is valid? > That's very close to my original patch to fix this problem. I just replaced svc_wake_up with a call to a new function that wakes up any lockd that happens to be up. I'm not sure that my original patch was careful enough with the locking though... > Actually svc_wake_up is only called from lockd and goes through > various hoops to find the right rqstp, which we could have known in > advance. > So store the rqstp in some global wrapped in a spinlock so we can > access it safely and just: > > spin_lock(whatever) > if (nlmsvc_rqstp) > wake_up(&nlmsvc_rqstp->rq_wait) > spin_unlock(whatever) > > > That seems a somewhat simpler way of avoiding the particular problem. > Yes. Much. > > Hmmm.... I guess that nlmsvc_grant_callback could then be run after > the 'lockd' module had been unloaded. > Maybe nlm_shutdown_hosts could call rpc_killall_tasks(host->h_rpcclnt) > on each host. That should ensure the callback wont happen afterwards. > > Maybe? 
> I think so. If we let lockd go down before all the RPC's are done, then the whole problem of accessing lockd data from them sounds like it could be a problem. If not now, then future changes could cause it. IIRC, The reason we don't get nlm_destroy_host done on each nlm_host in this situation is because the h_count is too high. Doing rpc_killall_tasks in this situation might fix that, but the logic in all of this is pretty convoluted. I'll see if I can cook up a new patchset that does this instead. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
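A minimal userspace sketch of the "global pointer guarded by a lock" idea from the pseudocode above, assuming POSIX threads in place of kernel primitives. The names fake_rqst, daemon_lock, daemon_register(), daemon_unregister() and daemon_wake_up() are hypothetical and do not come from the lockd sources; a pthread mutex stands in for the spinlock and a condition variable for the rq_wait wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct svc_rqst. */
struct fake_rqst {
	pthread_cond_t rq_wait;	/* plays the role of rqstp->rq_wait */
	unsigned int wakeups;	/* bookkeeping for this sketch only */
};

/* Plays the role of the "whatever" spinlock in the pseudocode. */
static pthread_mutex_t daemon_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fake_rqst *daemon_rqstp;	/* NULL while no daemon is up */

/* Daemon startup: publish the pointer under the lock. */
void daemon_register(struct fake_rqst *rqstp)
{
	assert(rqstp != NULL);
	pthread_mutex_lock(&daemon_lock);
	daemon_rqstp = rqstp;
	pthread_mutex_unlock(&daemon_lock);
}

/* Daemon shutdown: clear the pointer before the rqstp is released. */
void daemon_unregister(void)
{
	pthread_mutex_lock(&daemon_lock);
	daemon_rqstp = NULL;
	pthread_mutex_unlock(&daemon_lock);
}

/*
 * Callback side: wake the daemon only if one is registered.
 * Returns true when a wake-up was actually delivered.
 */
bool daemon_wake_up(void)
{
	bool woke = false;

	pthread_mutex_lock(&daemon_lock);
	if (daemon_rqstp) {
		daemon_rqstp->wakeups++;
		pthread_cond_signal(&daemon_rqstp->rq_wait);
		woke = true;
	}
	pthread_mutex_unlock(&daemon_lock);
	return woke;
}
```

Because the pointer is only ever read or cleared under the lock, a late callback sees either a valid rqstp or NULL, never a stale one. The sketch does not address the separate concern raised above, that the callback code itself could run after the module is unloaded.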
* Re: [PATCH 5/6] NLM: Convert lockd to use kthreads 2008-01-08 19:33 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton 2008-01-08 19:33 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton @ 2008-01-09 17:45 ` Christoph Hellwig 2008-01-09 18:08 ` Jeff Layton 1 sibling, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2008-01-09 17:45 UTC (permalink / raw) To: Jeff Layton; +Cc: akpm, neilb, linux-nfs, linux-kernel On Tue, Jan 08, 2008 at 02:33:17PM -0500, Jeff Layton wrote: > - struct svc_serv * serv; > - int error = 0; > + struct svc_serv *serv; > + struct svc_rqst *rqstp; > + int error = 0; > > mutex_lock(&nlmsvc_mutex); > /* > * Check whether we're already up and running. > */ > - if (nlmsvc_pid) { > + if (nlmsvc_task) { > if (proto) > error = make_socks(nlmsvc_serv, proto); While equivalent I think it would be cleaner to check for nlmsvc_serv above as that's what we're passing to make_socks. But I think the whole of lockd_up could use a little makeover, but that's for later. > void > lockd_down(void) > { > mutex_lock(&nlmsvc_mutex); > if (nlmsvc_users) { > if (--nlmsvc_users) > goto out; > + } else { > + printk(KERN_ERR "lockd_down: no users! task=%p\n", > + nlmsvc_task); > + BUG(); > } > + if (!nlmsvc_task) { > + printk(KERN_ERR "lockd_down: no lockd running.\n"); > + BUG(); > } > + kthread_stop(nlmsvc_task); I think all this user/foo checking here should be BUG_ONs as it's quite fatal errors. e.g. void lockd_down(void) { mutex_lock(&nlmsvc_mutex); BUG_ON(!nlmsvc_task); BUG_ON(!nlmsvc_users); if (!--nlmsvc_users) kthread_stop(nlmsvc_task); mutex_unlock(&nlmsvc_mutex); } same applies for similar checks in lockd_up as well. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 5/6] NLM: Convert lockd to use kthreads 2008-01-09 17:45 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Christoph Hellwig @ 2008-01-09 18:08 ` Jeff Layton 0 siblings, 0 replies; 32+ messages in thread From: Jeff Layton @ 2008-01-09 18:08 UTC (permalink / raw) To: Christoph Hellwig; +Cc: akpm, neilb, linux-nfs, linux-kernel On Wed, 9 Jan 2008 17:45:06 +0000 Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Jan 08, 2008 at 02:33:17PM -0500, Jeff Layton wrote: > > - struct svc_serv * serv; > > - int error = 0; > > + struct svc_serv *serv; > > + struct svc_rqst *rqstp; > > + int error = 0; > > > > mutex_lock(&nlmsvc_mutex); > > /* > > * Check whether we're already up and running. > > */ > > - if (nlmsvc_pid) { > > + if (nlmsvc_task) { > > if (proto) > > error = make_socks(nlmsvc_serv, proto); > > While equivalent I think it would be cleaner to check for nlmsvc_serv > above as that's what we're passing to make_socks. But I think the > whole of lockd_up could use a little makeover, but that's for later. > Probably so. If I respin, I'll plan to fix that too. > > void > > lockd_down(void) > > { > > mutex_lock(&nlmsvc_mutex); > > if (nlmsvc_users) { > > if (--nlmsvc_users) > > goto out; > > + } else { > > + printk(KERN_ERR "lockd_down: no users! task=%p\n", > > + nlmsvc_task); > > + BUG(); > > } > > + if (!nlmsvc_task) { > > + printk(KERN_ERR "lockd_down: no lockd running.\n"); > > + BUG(); > > } > > + kthread_stop(nlmsvc_task); > > I think all this user/foo checking here should be BUG_ONs as it's > quite fatal errors. > > e.g. > > void > lockd_down(void) > { > mutex_lock(&nlmsvc_mutex); > > BUG_ON(!nlmsvc_task); > BUG_ON(!nlmsvc_users); > > if (!--nlmsvc_users) > kthread_stop(nlmsvc_task); > mutex_unlock(&nlmsvc_mutex); > } > > > same applies for similar checks in lockd_up as well. > With this patch the lockd_down checks should now be BUGs. I decided not to do that in lockd_up. 
If there's an error within the main lockd loop, it can exit without being requested to do so. If someone then calls lockd_up then the counts will be off and the check will fire. It seems like if we're going to make the check in lockd_up be a BUG, then we should also BUG rather than letting lockd exit prematurely. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
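The user-count logic in the BUG_ON version of lockd_down() quoted above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: BUG_ON() is re-created with assert(), model_up()/model_down() and the three counters are hypothetical names, and the nlmsvc_mutex locking is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in: the kernel's BUG_ON() halts on a true condition. */
#define BUG_ON(cond) assert(!(cond))

static unsigned int model_users;	/* plays the role of nlmsvc_users */
static bool model_running;		/* stands in for a non-NULL nlmsvc_task */
static unsigned int model_stops;	/* counts simulated kthread_stop() calls */

/* Take a reference, starting the (simulated) daemon on first use. */
void model_up(void)
{
	if (!model_running)
		model_running = true;	/* the real code would start the thread */
	model_users++;
}

/* Drop a reference; only the last user stops the daemon. */
void model_down(void)
{
	BUG_ON(!model_running);
	BUG_ON(!model_users);

	if (!--model_users) {
		model_running = false;	/* the real code would call kthread_stop() */
		model_stops++;
	}
}
```

The model assumes the daemon never exits on its own. As noted above, the real lockd can leave its main loop after an error, which is exactly the case where the running/users invariants could be violated by a later lockd_up.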
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-08 19:33 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton 2008-01-08 19:33 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton @ 2008-01-09 17:35 ` Christoph Hellwig 2008-01-09 18:05 ` Jeff Layton 2008-01-13 13:27 ` Jeff Layton 1 sibling, 2 replies; 32+ messages in thread From: Christoph Hellwig @ 2008-01-09 17:35 UTC (permalink / raw) To: Jeff Layton; +Cc: akpm, neilb, linux-nfs, linux-kernel On Tue, Jan 08, 2008 at 02:33:15PM -0500, Jeff Layton wrote: > lockd_start_done is a global var that can be reused if lockd is > restarted, but it's never reinitialized. On all but the first use, > wait_for_completion isn't actually waiting on it since it has > already completed once. I don't think we'll need lockd_start_done anymore after the kthread conversion. When kthread_run returns the thread it created is guaranteed to have run until it scheduled away. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-09 17:35 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Christoph Hellwig @ 2008-01-09 18:05 ` Jeff Layton 2008-01-09 18:14 ` Christoph Hellwig 2008-01-13 13:27 ` Jeff Layton 1 sibling, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-09 18:05 UTC (permalink / raw) To: Christoph Hellwig; +Cc: akpm, neilb, linux-nfs, linux-kernel On Wed, 9 Jan 2008 17:35:42 +0000 Christoph Hellwig <hch@infradead.org> wrote: > I don't think we'll need lockd_start_done anymore after the kthread > conversion. When kthread_run returns the thread it created is > guaranteed to have run until it scheduled away. > Makes sense. My only concern is that we make sure this is behavior we can count on in the future and not just an artifact of the current kthread implementation. If that's the case, then I'll plan to remove it on the next respin. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-09 18:05 ` Jeff Layton @ 2008-01-09 18:14 ` Christoph Hellwig 0 siblings, 0 replies; 32+ messages in thread From: Christoph Hellwig @ 2008-01-09 18:14 UTC (permalink / raw) To: Jeff Layton; +Cc: Christoph Hellwig, akpm, neilb, linux-nfs, linux-kernel On Wed, Jan 09, 2008 at 01:05:54PM -0500, Jeff Layton wrote: > Makes sense. My only concern is that we make sure this is behavior we > can count on in the future and not just an artifact of the current > kthread implementation. If that's the case, then I'll plan to remove it > on the next respin. It's absolutely intentional and one of the reasons why the kthread infrastructure is so much nicer than plain kthread_create :) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-09 17:35 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Christoph Hellwig 2008-01-09 18:05 ` Jeff Layton @ 2008-01-13 13:27 ` Jeff Layton 2008-01-13 18:17 ` Christoph Hellwig 1 sibling, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-13 13:27 UTC (permalink / raw) To: Christoph Hellwig; +Cc: akpm, neilb, linux-nfs, linux-kernel On Wed, 9 Jan 2008 17:35:42 +0000 Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Jan 08, 2008 at 02:33:15PM -0500, Jeff Layton wrote: > > lockd_start_done is a global var that can be reused if lockd is > > restarted, but it's never reinitialized. On all but the first use, > > wait_for_completion isn't actually waiting on it since it has > > already completed once. > > I don't think we'll need lockd_start_done anymore after the kthread > conversion. When kthread_run returns the thread it created is > guaranteed to have run until it scheduled away. > Christoph, I've been hitting an intermittent null pointer dereference ever since I've made this change: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000038 printing eip: e09ddee1 *pde = 1f377067 *pte = 00000000 Oops: 0000 [#1] SMP Modules linked in: nfsd nfs_acl auth_rpcgss exportfs rfcomm l2cap bluetooth autofs4 lockd sunrpc nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 loop dm_multipath pcspkr 8139cp 8139too mii joydev i2c_piix4 i2c_core sr_mod sg cdrom dm_snapshot dm_zero dm_mirror dm_mod ata_piix pata_acpi ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd Pid: 1946, comm: rpc.nfsd Not tainted (2.6.24-0.138.rc7.kthread.2.fc9 #1) EIP: 0060:[<e09ddee1>] EFLAGS: 00010202 CPU: 0 EIP is at find_socket+0xa/0x3f [lockd] EAX: 00000000 EBX: 00000006 ECX: 00000000 EDX: 00000011 ESI: 00000000 EDI: 00000011 EBP: df358ec4 ESP: df358eb8 DS: 007b ES: 007b FS: 00d8 GS: 0033 
SS: 0068 Process rpc.nfsd (pid: 1946, ti=df358000 task=df38ae10 task.ti=df358000) Stack: 00000006 00000000 00000006 df358ee0 e09de0cb 22222222 22222222 00000006 00000008 00000801 df358f04 e09de337 00000000 00000000 00000000 00000000 00000801 00000008 00000801 df358f1c e0aa05ac 00000000 df356200 00000008 Call Trace: [<c040649a>] show_trace_log_lvl+0x1a/0x2f [<c040654a>] show_stack_log_lvl+0x9b/0xa3 [<c04065f9>] show_registers+0xa7/0x178 [<c04067ff>] die+0x135/0x220 [<c063ff1b>] do_page_fault+0x553/0x631 [<c063e5a2>] error_code+0x72/0x78 [<e09de0cb>] make_socks+0x27/0xbe [lockd] [<e09de337>] lockd_up+0x3b/0x148 [lockd] [<e0aa05ac>] nfsd_svc+0xf2/0x107 [nfsd] [<e0aa0b19>] write_svc+0x1a/0x20 [nfsd] [<e0aa0d10>] nfsctl_transaction_write+0x39/0x63 [nfsd] [<c04b97cf>] sys_nfsservctl+0x11f/0x160 [<c0405252>] syscall_call+0x7/0xb ======================= Code: 89 d8 5b 5d c3 55 89 e5 c7 05 48 a4 9e e0 01 00 00 00 e8 a7 ff ff ff 8b 15 00 0c 76 c0 5d 01 d0 c3 55 89 e5 57 89 d7 56 89 c6 53 <8b> 48 38 83 e9 08 eb 15 8b 41 14 0f b6 40 29 39 f8 75 07 b8 01 EIP: [<e09ddee1>] find_socket+0xa/0x3f [lockd] SS:ESP 0068:df358eb8 ---[ end trace 7d509b4c18b144aa ]--- The problem is that make_socks is occasionally getting called with a NULL nlmsvc_serv pointer. I think the problem occurs here in lockd_up(). if (nlmsvc_task) { if (proto) error = make_socks(nlmsvc_serv, proto); goto out; } You pointed out earlier that this should really be checking that nlmsvc_serv is non-NULL. I can and will make this change, and that will likely fix this particular oops, but the fact that I'm hitting it here suggests that kthread_run is returning before lockd has a chance to set nlmsvc_serv. It shouldn't be according to your statement above. Are you sure that kthread_run is working correctly? It seems like it might not be doing the right thing here... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-13 13:27 ` Jeff Layton @ 2008-01-13 18:17 ` Christoph Hellwig 2008-01-13 19:12 ` J. Bruce Fields ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Christoph Hellwig @ 2008-01-13 18:17 UTC (permalink / raw) To: Jeff Layton; +Cc: Christoph Hellwig, akpm, neilb, linux-nfs, linux-kernel On Sun, Jan 13, 2008 at 08:27:18AM -0500, Jeff Layton wrote: > I've been hitting an intermittent null pointer dereference ever > since I've made this change: The first thing lockd does is to call lock_kernel(). This may either block (or spin) when it is contended and thus delay updating nlmsvc_serv. Now lockd_up checks for nlmsvc_task which already is non-NULL and happily dereferences nlmsvc_serv. The patch below updates nlmsvc_serv in lockd_up where it is protected by nlmsvc_mutex and also checks for nlmsvc_serv being set instead of nlmsvc_task to fix this problem. The patch hasn't actually been tested but I'm sure it will fix this issue. Btw, lockd() takes BKL just after starting up and only implicitly drops it when blocking. This seems very dangerous to me and badly wants updating to some real locking scheme.. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/fs/lockd/svc.c =================================================================== --- linux-2.6.orig/fs/lockd/svc.c 2008-01-13 19:07:17.000000000 +0100 +++ linux-2.6/fs/lockd/svc.c 2008-01-13 19:13:23.000000000 +0100 @@ -118,7 +118,6 @@ lockd(void *vrqstp) /* set up kernel thread */ lock_kernel(); - nlmsvc_serv = rqstp->rq_server; set_freezable(); /* Allow SIGKILL to tell lockd to drop all of its locks */ @@ -253,7 +252,7 @@ lockd_up(int proto) /* Maybe add a 'fami /* * Check whether we're already up and running. 
*/ - if (nlmsvc_task) { + if (nlmsvc_serv) { if (proto) error = make_socks(nlmsvc_serv, proto); goto out; @@ -290,6 +289,9 @@ lockd_up(int proto) /* Maybe add a 'fami } svc_sock_update_bufs(serv); + + nlmsvc_serv = rqstp->rq_server; + nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name); if (IS_ERR(nlmsvc_task)) { error = PTR_ERR(nlmsvc_task); ^ permalink raw reply [flat|nested] 32+ messages in thread
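The ordering fix in the patch above (publish the shared pointer before the thread is started, and test that pointer rather than the thread handle) can be sketched in userspace with POSIX threads. Everything here is an assumption made for illustration: fake_serv, up_mutex, big_lock, service_up() and service_down() are hypothetical names, with up_mutex playing the part of nlmsvc_mutex and big_lock the part of the BKL that can stall the new thread at startup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svc_serv. */
struct fake_serv {
	int id;
};

static pthread_mutex_t up_mutex = PTHREAD_MUTEX_INITIALIZER;	/* like nlmsvc_mutex */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;	/* like the BKL */
static struct fake_serv *cur_serv;	/* like nlmsvc_serv; NULL when down */
static pthread_t worker;

static void *worker_fn(void *arg)
{
	(void)arg;
	/* May stall here for a long time, as lockd can in lock_kernel(). */
	pthread_mutex_lock(&big_lock);
	pthread_mutex_unlock(&big_lock);
	return NULL;
}

/*
 * Bring the service up if needed. Returns the pointer a caller would
 * hand to something like make_socks(), or NULL if startup failed.
 */
struct fake_serv *service_up(struct fake_serv *serv)
{
	struct fake_serv *ret;

	pthread_mutex_lock(&up_mutex);
	if (!cur_serv) {
		/* Publish first: the worker is not relied on to set this. */
		cur_serv = serv;
		if (pthread_create(&worker, NULL, worker_fn, NULL) != 0)
			cur_serv = NULL;
	}
	ret = cur_serv;
	pthread_mutex_unlock(&up_mutex);
	return ret;
}

/* Wait for the worker to finish and mark the service as down. */
void service_down(void)
{
	pthread_mutex_lock(&up_mutex);
	if (cur_serv) {
		pthread_join(worker, NULL);
		cur_serv = NULL;
	}
	pthread_mutex_unlock(&up_mutex);
}
```

With this ordering a second service_up() caller gets a valid pointer even while the worker is still stuck on big_lock, which is the situation that produced the NULL dereference in make_socks().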
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-13 18:17 ` Christoph Hellwig @ 2008-01-13 19:12 ` J. Bruce Fields 2008-01-14 14:24 ` Jeff Layton 2008-03-15 3:44 ` Mike Snitzer 2 siblings, 0 replies; 32+ messages in thread From: J. Bruce Fields @ 2008-01-13 19:12 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jeff Layton, akpm, neilb, linux-nfs, linux-kernel On Sun, Jan 13, 2008 at 06:17:43PM +0000, Christoph Hellwig wrote: > Btw, lockd() takes BKL just after starting up and only implicitly drops > it when blocking. This seems very dangerous to me and badly wants > updating to some real locking scheme.. Yep. --b. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-13 18:17 ` Christoph Hellwig 2008-01-13 19:12 ` J. Bruce Fields @ 2008-01-14 14:24 ` Jeff Layton 2008-01-14 14:25 ` Christoph Hellwig 2008-03-15 3:44 ` Mike Snitzer 2 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-14 14:24 UTC (permalink / raw) To: Christoph Hellwig; +Cc: akpm, neilb, linux-nfs, linux-kernel On Sun, 13 Jan 2008 18:17:43 +0000 Christoph Hellwig <hch@infradead.org> wrote: > On Sun, Jan 13, 2008 at 08:27:18AM -0500, Jeff Layton wrote: > > I've been hitting an intermittent null pointer dereference ever > > since I've made this change: > > The first thing lockd does is to call lock_kernel(). This may either > block (or spin) when it is contended and thus delay updating > nlmsvc_serv. Now lockd_up checks for nlmsvc_task which already is > non-NULL and happily dereferences nlmsvc_serv. The patch below > updates nlmsvc_serv in lockd_up where it is protected by nlmsvc_mutex > and also checks for nlmsvc_serv being set instead of nlmsvc_task to > fix this problem. > > The patch hasn't actually been tested but I'm sure it will fix this > issue. > Thanks Christoph. I incorporated this into my latest patchset. It does seem to fix the issue (tested by bouncing NFS up and down for 30 mins or so). Let me know if you want me to add a signed-off-by line for you... > Btw, lockd() takes BKL just after starting up and only implicitly > drops it when blocking. This seems very dangerous to me and badly > wants updating to some real locking scheme.. > Yep -- It's ugly. I took a look a while back at what it would take to change that. The problem is that it's very difficult to tell exactly what the BKL is intended to protect. I assume it does it for the same reason that fs/locks.c uses it, but there may be other things that need protection if it's removed. 
It might be best to try to change this incrementally -- gradually audit and move pieces of lockd() outside of the BKL, until it's clear that it's no longer needed. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-14 14:24 ` Jeff Layton @ 2008-01-14 14:25 ` Christoph Hellwig 0 siblings, 0 replies; 32+ messages in thread From: Christoph Hellwig @ 2008-01-14 14:25 UTC (permalink / raw) To: Jeff Layton; +Cc: Christoph Hellwig, akpm, neilb, linux-nfs, linux-kernel On Mon, Jan 14, 2008 at 09:24:54AM -0500, Jeff Layton wrote: > Thanks Christoph. I incorporated this into my latest patchset. It does > seem to fix the issue (tested by bouncing NFS up and down for 30 mins > or so). Let me know if you want me to add a signed-off-by line for > you... No need to add anything, this was just a two-liner trivial fix.. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-13 18:17 ` Christoph Hellwig 2008-01-13 19:12 ` J. Bruce Fields 2008-01-14 14:24 ` Jeff Layton @ 2008-03-15 3:44 ` Mike Snitzer 2008-03-15 6:34 ` Christoph Hellwig 2 siblings, 1 reply; 32+ messages in thread From: Mike Snitzer @ 2008-03-15 3:44 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jeff Layton, akpm, neilb, linux-nfs, linux-kernel On Sun, Jan 13, 2008 at 1:17 PM, Christoph Hellwig <hch@infradead.org> wrote: > Btw, lockd() takes BKL just after starting up and only implicitly drops > it when blocking. This seems very dangerous to me and badly wants > updating to some real locking scheme.. Can you elaborate on what is meant by lockd "blocking"? Blocking in svc_recv() or during a SETLKW or ??? I'm trying to come to terms with why nlmsvc_lock() wouldn't have the BKL on entry. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-03-15 3:44 ` Mike Snitzer @ 2008-03-15 6:34 ` Christoph Hellwig 0 siblings, 0 replies; 32+ messages in thread From: Christoph Hellwig @ 2008-03-15 6:34 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, Jeff Layton, akpm, neilb, linux-nfs, linux-kernel On Fri, Mar 14, 2008 at 10:44:31PM -0500, Mike Snitzer wrote: > On Sun, Jan 13, 2008 at 1:17 PM, Christoph Hellwig <hch@infradead.org> wrote: > > Btw, lockd() takes BKL just after starting up and only implicitly drops > > it when blocking. This seems very dangerous to me and badly wants > > updating to some real locking scheme.. > > Can you elaborate on what is meant by lockd "blocking"? Blocking in > svc_recv() or during a SETLKW or ??? Blocking in kernel context means sleeping aka scheduling away. So in the sentence above that means BKL is dropped once lockd sleeps on a synchronization primitive the first time. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free (try #5) @ 2008-01-05 12:02 Jeff Layton 2008-01-05 12:02 ` [PATCH 1/6] SUNRPC: spin svc_rqst initialization to its own function Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel This is the fifth patchset to fix the use-after-free problem in lockd which we originally discussed back in October. The main problem is detailed in the last patch of the series. Along the way, Christoph Hellwig mentioned that it would be advantageous to convert lockd to use the kthread API. This patch set first makes that change and then patches it to actually fix the use after free problem. It also fixes a couple of minor bugs in the current lockd implementation. The main changes from the original patchset are: + dropped the new thread creation helper and just have lockd_up call kthread_run directly. + dropped the first patch that changed svc_pool_map_set_cpumask, since it's no longer needed. + added a warning message when lockd_down is called for the final time, but lockd is still up + done some style cleanups recommended by checkpatch.pl. I've done some basic smoke testing and everything seems to work as expected. I've also tested this against the reproducer that I have for the use-after-free problem and this does fix it. I've tried to make this cleanly bisectable, but have only really tested the final result. Many thanks to Trond Myklebust, Chuck Lever and Christoph Hellwig for their guidance on this. Signed-off-by: Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 1/6] SUNRPC: spin svc_rqst initialization to its own function 2008-01-05 12:02 [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free (try #5) Jeff Layton @ 2008-01-05 12:02 ` Jeff Layton 2008-01-05 12:02 ` [PATCH 2/6] SUNRPC: export svc_sock_update_bufs Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel Move the initialization in __svc_create_thread that happens prior to thread creation to a new function. Export the function to allow services to have better control over the svc_rqst structs. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- include/linux/sunrpc/svc.h | 2 ++ net/sunrpc/svc.c | 43 +++++++++++++++++++++++++++++++------------ 2 files changed, 33 insertions(+), 12 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 8531a70..5f07300 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -382,6 +382,8 @@ struct svc_procedure { */ struct svc_serv * svc_create(struct svc_program *, unsigned int, void (*shutdown)(struct svc_serv*)); +struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, + struct svc_pool *pool); int svc_create_thread(svc_thread_fn, struct svc_serv *); void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index fca17d0..b29ed43 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -538,23 +538,14 @@ svc_release_buffer(struct svc_rqst *rqstp) put_page(rqstp->rq_pages[i]); } -/* - * Create a thread in the given pool. Caller must hold BKL. - * On a NUMA or SMP machine, with a multi-pool serv, the thread - * will be restricted to run on the cpus belonging to the pool. 
- */ -static int -__svc_create_thread(svc_thread_fn func, struct svc_serv *serv, - struct svc_pool *pool) +struct svc_rqst * +svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool) { struct svc_rqst *rqstp; - int error = -ENOMEM; - int have_oldmask = 0; - cpumask_t oldmask; rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL); if (!rqstp) - goto out; + goto out_enomem; init_waitqueue_head(&rqstp->rq_wait); @@ -570,6 +561,34 @@ __svc_create_thread(svc_thread_fn func, struct svc_serv *serv, spin_unlock_bh(&pool->sp_lock); rqstp->rq_server = serv; rqstp->rq_pool = pool; + return rqstp; + +out_thread: + svc_exit_thread(rqstp); +out_enomem: + return ERR_PTR(-ENOMEM); +} +EXPORT_SYMBOL(svc_prepare_thread); + +/* + * Create a thread in the given pool. Caller must hold BKL. + * On a NUMA or SMP machine, with a multi-pool serv, the thread + * will be restricted to run on the cpus belonging to the pool. + */ +static int +__svc_create_thread(svc_thread_fn func, struct svc_serv *serv, + struct svc_pool *pool) +{ + struct svc_rqst *rqstp; + int error = -ENOMEM; + int have_oldmask = 0; + cpumask_t oldmask; + + rqstp = svc_prepare_thread(serv, pool); + if (IS_ERR(rqstp)) { + error = PTR_ERR(rqstp); + goto out; + } if (serv->sv_nrpools > 1) have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask); -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
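The patch above returns either a valid svc_rqst pointer or an errno value encoded in the pointer, which the caller tests with IS_ERR() and decodes with PTR_ERR(). The sketch below re-creates that idiom in userspace so it can be compiled outside the kernel. The helper definitions mimic the kernel's linux/err.h but are written here for illustration, and fake_rqst, prepare_thread(), release_thread() and the fail_late switch are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error values occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct svc_rqst with two late allocations. */
struct fake_rqst {
	void *argp;
	void *resp;
};

/* Safe on a partially set up object: free(NULL) is a no-op. */
void release_thread(struct fake_rqst *rqstp)
{
	free(rqstp->resp);
	free(rqstp->argp);
	free(rqstp);
}

/* fail_late simulates a failure of the second buffer allocation. */
struct fake_rqst *prepare_thread(size_t size, bool fail_late)
{
	struct fake_rqst *rqstp;

	rqstp = calloc(1, sizeof(*rqstp));
	if (!rqstp)
		goto out_enomem;

	rqstp->argp = malloc(size);
	if (!rqstp->argp)
		goto out_thread;

	rqstp->resp = fail_late ? NULL : malloc(size);
	if (!rqstp->resp)
		goto out_thread;

	return rqstp;

out_thread:
	release_thread(rqstp);
out_enomem:
	return ERR_PTR(-ENOMEM);
}
```

The two labels mirror out_thread and out_enomem in the patch: a failure after the struct exists must release it, while a failure of the first allocation has nothing to undo. The cleanup routine has to tolerate a half-initialized object for the out_thread path to be safe.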
* [PATCH 2/6] SUNRPC: export svc_sock_update_bufs 2008-01-05 12:02 ` [PATCH 1/6] SUNRPC: spin svc_rqst initialization to its own function Jeff Layton @ 2008-01-05 12:02 ` Jeff Layton 2008-01-05 12:02 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel Needed since the plan is to not have a svc_create_thread helper and to have current users of that function just call kthread_run directly. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- net/sunrpc/svcsock.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 057c870..f2bef16 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1407,6 +1407,7 @@ svc_sock_update_bufs(struct svc_serv *serv) } spin_unlock_bh(&serv->sv_lock); } +EXPORT_SYMBOL(svc_sock_update_bufs); /* * Receive the next request on any socket. This code is carefully -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 3/6] NLM: Initialize completion variable in lockd_up 2008-01-05 12:02 ` [PATCH 2/6] SUNRPC: export svc_sock_update_bufs Jeff Layton @ 2008-01-05 12:02 ` Jeff Layton 2008-01-05 12:02 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel lockd_start_done is a global var that can be reused if lockd is restarted, but it's never reinitialized. On all but the first use, wait_for_completion isn't actually waiting on it since it has already completed once. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/lockd/svc.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 82e2192..0f4148a 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -300,6 +300,7 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */ /* * Create the kernel thread and wait for it to start. */ + init_completion(&lockd_start_done); error = svc_create_thread(lockd, serv); if (error) { printk(KERN_WARNING -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 4/6] NLM: Have lockd call try_to_freeze 2008-01-05 12:02 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton @ 2008-01-05 12:02 ` Jeff Layton 2008-01-05 12:02 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel lockd makes itself freezable, but never calls try_to_freeze(). Have it call try_to_freeze() within the main loop. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/lockd/svc.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 0f4148a..03a83a0 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp) long timeout = MAX_SCHEDULE_TIMEOUT; char buf[RPC_MAX_ADDRBUFLEN]; + if (try_to_freeze()) + continue; + if (signalled()) { flush_signals(current); if (nlmsvc_ops) { -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 5/6] NLM: Convert lockd to use kthreads 2008-01-05 12:02 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton @ 2008-01-05 12:02 ` Jeff Layton 2008-01-05 12:02 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel Have lockd_up start lockd using kthread_run. With this change, lockd_down now blocks until lockd actually exits, so there's no longer need for the waitqueue code at the end of lockd_down. This also means that only one lockd can be running at a time which simplifies the code within lockd's main loop. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/lockd/svc.c | 79 ++++++++++++++++++++++++++----------------------------- 1 files changed, 37 insertions(+), 42 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 03a83a0..d7209ea 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -25,6 +25,7 @@ #include <linux/smp.h> #include <linux/smp_lock.h> #include <linux/mutex.h> +#include <linux/kthread.h> #include <linux/freezer.h> #include <linux/sunrpc/types.h> @@ -48,13 +49,12 @@ EXPORT_SYMBOL(nlmsvc_ops); static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; -static pid_t nlmsvc_pid; +static struct task_struct *nlmsvc_task; static struct svc_serv *nlmsvc_serv; int nlmsvc_grace_period; unsigned long nlmsvc_timeout; static DECLARE_COMPLETION(lockd_start_done); -static DECLARE_WAIT_QUEUE_HEAD(lockd_exit); /* * These can be set at insmod time (useful for NFS as root filesystem), @@ -111,10 +111,11 @@ static inline void clear_grace_period(void) /* * This is the lockd kernel thread */ -static void -lockd(struct svc_rqst *rqstp) +static int +lockd(void *vrqstp) { int err = 0; + struct svc_rqst *rqstp = vrqstp; unsigned long grace_period_expire; /* Lock module and set up kernel thread */ @@ -128,11 +129,9 @@ lockd(struct svc_rqst *rqstp) /* * Let our maker know we're running. 
*/ - nlmsvc_pid = current->pid; nlmsvc_serv = rqstp->rq_server; complete(&lockd_start_done); - daemonize("lockd"); set_freezable(); /* Process request with signals blocked, but allow SIGKILL. */ @@ -151,7 +150,7 @@ lockd(struct svc_rqst *rqstp) * NFS mount or NFS daemon has gone away, and we've been sent a * signal, or else another process has taken over our job. */ - while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) { + while (!kthread_should_stop()) { long timeout = MAX_SCHEDULE_TIMEOUT; char buf[RPC_MAX_ADDRBUFLEN]; @@ -203,23 +202,19 @@ lockd(struct svc_rqst *rqstp) * Check whether there's a new lockd process before * shutting down the hosts and clearing the slot. */ - if (!nlmsvc_pid || current->pid == nlmsvc_pid) { - if (nlmsvc_ops) - nlmsvc_invalidate_all(); - nlm_shutdown_hosts(); - nlmsvc_pid = 0; - nlmsvc_serv = NULL; - } else - printk(KERN_DEBUG - "lockd: new process, skipping host shutdown\n"); - wake_up(&lockd_exit); + if (nlmsvc_ops) + nlmsvc_invalidate_all(); + nlm_shutdown_hosts(); + nlmsvc_task = NULL; + nlmsvc_serv = NULL; /* Exit the RPC thread */ svc_exit_thread(rqstp); /* Release module */ unlock_kernel(); - module_put_and_exit(0); + module_put(THIS_MODULE); + return 0; } @@ -269,14 +264,15 @@ static int make_socks(struct svc_serv *serv, int proto) int lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */ { - struct svc_serv * serv; - int error = 0; + struct svc_serv *serv; + struct svc_rqst *rqstp; + int error = 0; mutex_lock(&nlmsvc_mutex); /* * Check whether we're already up and running. */ - if (nlmsvc_pid) { + if (nlmsvc_task) { if (proto) error = make_socks(nlmsvc_serv, proto); goto out; @@ -303,11 +299,24 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */ /* * Create the kernel thread and wait for it to start. 
*/ + rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]); + if (IS_ERR(rqstp)) { + error = PTR_ERR(rqstp); + printk(KERN_WARNING + "lockd_up: svc_rqst allocation failed, error=%d\n", + error); + goto destroy_and_out; + } + + svc_sock_update_bufs(serv); init_completion(&lockd_start_done); - error = svc_create_thread(lockd, serv); - if (error) { + nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name); + if (IS_ERR(nlmsvc_task)) { + error = PTR_ERR(nlmsvc_task); + nlmsvc_task = NULL; printk(KERN_WARNING - "lockd_up: create thread failed, error=%d\n", error); + "lockd_up: kthread_run failed, error=%d\n", error); + svc_exit_thread(rqstp); goto destroy_and_out; } wait_for_completion(&lockd_start_done); @@ -339,30 +348,16 @@ lockd_down(void) if (--nlmsvc_users) goto out; } else - printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid); + printk(KERN_WARNING "lockd_down: no users! task=%p\n", + nlmsvc_task); - if (!nlmsvc_pid) { + if (!nlmsvc_task) { if (warned++ == 0) printk(KERN_WARNING "lockd_down: no lockd running.\n"); goto out; } warned = 0; - - kill_proc(nlmsvc_pid, SIGKILL, 1); - /* - * Wait for the lockd process to exit, but since we're holding - * the lockd semaphore, we can't wait around forever ... - */ - clear_thread_flag(TIF_SIGPENDING); - interruptible_sleep_on_timeout(&lockd_exit, HZ); - if (nlmsvc_pid) { - printk(KERN_WARNING - "lockd_down: lockd failed to exit, clearing pid\n"); - nlmsvc_pid = 0; - } - spin_lock_irq(&current->sighand->siglock); - recalc_sigpending(); - spin_unlock_irq(&current->sighand->siglock); + kthread_stop(nlmsvc_task); out: mutex_unlock(&nlmsvc_mutex); } -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-05 12:02 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton @ 2008-01-05 12:02 ` Jeff Layton 2008-01-08 6:46 ` Neil Brown 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2008-01-05 12:02 UTC (permalink / raw) To: akpm; +Cc: linux-nfs, linux-kernel ...and only have lockd exit when the last reference is dropped. The problem is this: When a lock that a client is blocking on comes free, lockd does this in nlmsvc_grant_blocked(): nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops); the callback from this call is nlmsvc_grant_callback(). That function does this at the end to wake up lockd: svc_wake_up(block->b_daemon); However there is no guarantee that lockd will be up when this happens. If someone shuts down or restarts lockd before the async call completes, then the b_daemon pointer will point to freed memory and the kernel may oops. I first noticed this on older kernels and had mistakenly thought that newer kernels weren't susceptible, but that's not correct. There's a bit of a race to make sure that the nlm_host is bound when the async call is done, but I can now reproduce this at will on current kernels. This patch is based on Trond's suggestion to add a new reference counter to lockd, and only allows lockd to go down when it reaches 0. With this change we can't use kthread_stop here. nlmsvc_unlink_block is called by lockd and a kthread can't call kthread_stop on itself. So the patch changes lockd to check the refcount itself and to return if it goes to 0. We do the checking and exit while holding the nlmsvc_mutex to make sure that a new lockd is not started until the old one is down. 
Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/lockd/svc.c | 51 +++++++++++++++++++++++++++++++++--------- fs/lockd/svclock.c | 5 ++++ include/linux/lockd/lockd.h | 1 + 3 files changed, 46 insertions(+), 11 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index d7209ea..0f56edf 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; static struct task_struct *nlmsvc_task; static struct svc_serv *nlmsvc_serv; +atomic_t nlmsvc_ref = ATOMIC_INIT(0); int nlmsvc_grace_period; unsigned long nlmsvc_timeout; @@ -134,7 +135,10 @@ lockd(void *vrqstp) set_freezable(); - /* Process request with signals blocked, but allow SIGKILL. */ + /* + * Process request with signals blocked, but allow SIGKILL which + * signifies that lockd should drop all of its locks. + */ allow_signal(SIGKILL); dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n"); @@ -147,15 +151,19 @@ lockd(void *vrqstp) /* * The main request loop. We don't terminate until the last - * NFS mount or NFS daemon has gone away, and we've been sent a - * signal, or else another process has taken over our job. + * NFS mount or NFS daemon has gone away, and the nlm_blocked + * list is empty. The nlmsvc_mutex ensures that we prevent a + * new lockd from being started before the old one is down. 
*/ - while (!kthread_should_stop()) { + mutex_lock(&nlmsvc_mutex); + while (atomic_read(&nlmsvc_ref) != 0) { long timeout = MAX_SCHEDULE_TIMEOUT; char buf[RPC_MAX_ADDRBUFLEN]; + mutex_unlock(&nlmsvc_mutex); + if (try_to_freeze()) - continue; + goto again; if (signalled()) { flush_signals(current); @@ -182,11 +190,12 @@ lockd(void *vrqstp) */ err = svc_recv(rqstp, timeout); if (err == -EAGAIN || err == -EINTR) - continue; + goto again; if (err < 0) { printk(KERN_WARNING "lockd: terminating on error %d\n", -err); + mutex_lock(&nlmsvc_mutex); break; } @@ -194,19 +203,22 @@ lockd(void *vrqstp) svc_print_addr(rqstp, buf, sizeof(buf))); svc_process(rqstp); +again: + mutex_lock(&nlmsvc_mutex); } - flush_signals(current); - /* - * Check whether there's a new lockd process before - * shutting down the hosts and clearing the slot. + * at this point lockd is committed to going down. We hold the + * nlmsvc_mutex until just before exit to prevent a new one + * from starting before it's down. */ + flush_signals(current); if (nlmsvc_ops) nlmsvc_invalidate_all(); nlm_shutdown_hosts(); nlmsvc_task = NULL; nlmsvc_serv = NULL; + mutex_unlock(&nlmsvc_mutex); /* Exit the RPC thread */ svc_exit_thread(rqstp); @@ -269,6 +281,10 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */ int error = 0; mutex_lock(&nlmsvc_mutex); + + if (!nlmsvc_users) + atomic_inc(&nlmsvc_ref); + /* * Check whether we're already up and running. */ @@ -328,6 +344,8 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? 
*/ destroy_and_out: svc_destroy(serv); out: + if (!nlmsvc_users && error) + atomic_dec(&nlmsvc_ref); if (!error) nlmsvc_users++; mutex_unlock(&nlmsvc_mutex); @@ -357,7 +375,18 @@ lockd_down(void) goto out; } warned = 0; - kthread_stop(nlmsvc_task); + if (atomic_sub_return(1, &nlmsvc_ref) != 0) + printk(KERN_WARNING "lockd_down: lockd is waiting for " + "outstanding requests to complete before exiting.\n"); + + /* + * Sending a signal is necessary here. If we get to this point and + * nlm_blocked isn't empty then lockd may be held hostage by clients + * that are still blocking. Sending the signal makes sure that lockd + * invalidates all of its locks so that it's just waiting on RPC + * callbacks to complete + */ + kill_proc(nlmsvc_task->pid, SIGKILL, 1); out: mutex_unlock(&nlmsvc_mutex); } diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index d120ec3..b8fbda3 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -61,6 +61,9 @@ nlmsvc_insert_block(struct nlm_block *block, unsigned long when) struct list_head *pos; dprintk("lockd: nlmsvc_insert_block(%p, %ld)\n", block, when); + if (list_empty(&nlm_blocked)) + atomic_inc(&nlmsvc_ref); + if (list_empty(&block->b_list)) { kref_get(&block->b_count); } else { @@ -239,6 +242,8 @@ static int nlmsvc_unlink_block(struct nlm_block *block) /* Remove block from list */ status = posix_unblock_lock(block->b_file->f_file, &block->b_call->a_args.lock.fl); nlmsvc_remove_block(block); + if (list_empty(&nlm_blocked)) + atomic_dec(&nlmsvc_ref); return status; } diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index e2d1ce3..7389553 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -154,6 +154,7 @@ extern struct svc_procedure nlmsvc_procedures4[]; extern int nlmsvc_grace_period; extern unsigned long nlmsvc_timeout; extern int nsm_use_hostnames; +extern atomic_t nlmsvc_ref; /* * Lockd client functions -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 32+ messages in thread
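[Editorial sketch, not part of the posted patch] The nlmsvc_ref accounting described in patch 6/6 can be modelled in userspace roughly as follows. This is a simplified illustration under stated assumptions: there are no threads and no nlmsvc_mutex, the nlm_blocked list is reduced to a counter, every insert is treated as a brand-new block, and all model_* names are hypothetical stand-ins rather than kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the state touched by the patch. */
static atomic_int nlmsvc_ref;      /* models atomic_t nlmsvc_ref */
static unsigned int nlmsvc_users;  /* models the lockd_up/lockd_down user count */
static unsigned int nlm_blocked;   /* models the length of the nlm_blocked list */

/* lockd_up: only the first user takes a reference. */
static void model_lockd_up(void)
{
	if (nlmsvc_users == 0)
		atomic_fetch_add(&nlmsvc_ref, 1);
	nlmsvc_users++;
}

/* nlmsvc_insert_block: the empty -> non-empty transition takes a reference.
 * The emptiness check has to happen BEFORE the block is added. */
static void model_insert_block(void)
{
	if (nlm_blocked == 0)
		atomic_fetch_add(&nlmsvc_ref, 1);
	nlm_blocked++;
}

/* nlmsvc_unlink_block: the non-empty -> empty transition drops it again. */
static void model_unlink_block(void)
{
	assert(nlm_blocked > 0);
	nlm_blocked--;
	if (nlm_blocked == 0)
		atomic_fetch_sub(&nlmsvc_ref, 1);
}

/* lockd_down: the last user drops the reference taken in lockd_up. */
static void model_lockd_down(void)
{
	assert(nlmsvc_users > 0);
	if (--nlmsvc_users == 0)
		atomic_fetch_sub(&nlmsvc_ref, 1);
}

/* Main-loop condition: lockd keeps running while any reference remains. */
static bool model_lockd_should_run(void)
{
	return atomic_load(&nlmsvc_ref) != 0;
}
```

In this model the counter never exceeds 2: one reference for "at least one user" and one for "nlm_blocked is non-empty". lockd keeps running after the last lockd_down for as long as blocked locks are still outstanding.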
* Re: [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-05 12:02 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton @ 2008-01-08 6:46 ` Neil Brown 2008-01-08 13:26 ` Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Neil Brown @ 2008-01-08 6:46 UTC (permalink / raw) To: Jeff Layton; +Cc: akpm, linux-nfs, linux-kernel On Saturday January 5, jlayton@redhat.com wrote: > @@ -357,7 +375,18 @@ lockd_down(void) > goto out; > } > warned = 0; > - kthread_stop(nlmsvc_task); > + if (atomic_sub_return(1, &nlmsvc_ref) != 0) > + printk(KERN_WARNING "lockd_down: lockd is waiting for " > + "outstanding requests to complete before exiting.\n"); Why not "atomic_dec_and_test" ?? > + > + /* > + * Sending a signal is necessary here. If we get to this point and > + * nlm_blocked isn't empty then lockd may be held hostage by clients > + * that are still blocking. Sending the signal makes sure that lockd > + * invalidates all of its locks so that it's just waiting on RPC > + * callbacks to complete > + */ > + kill_proc(nlmsvc_task->pid, SIGKILL, 1); The previous patch removes a kill_proc(... SIGKILL), this one adds it back. That makes me wonder if the intermediate state is 'correct'. But I also wonder what "correct" means. Do we want all locks to be dropped when the last nfsd thread dies? The answer is presumably either "yes" or "no". If "yes", then we don't have that because if there are any NFS mounts active, lockd will not be killed. If "no", then we don't want this kill_proc here. The comment in lockd() which currently reads: /* * The main request loop. We don't terminate until the last * NFS mount or NFS daemon has gone away, and we've been sent a * signal, or else another process has taken over our job. */ suggests that someone once thought that lockd could hang around after all nfsd threads and nfs mounts had gone, but I don't think it does. 
We really should think this through and get it right, because if lockd ever drops it's locks, then we really need to make sure sm_notify gets run. So it needs to be a well defined event. Thoughts? Also, it is sad that the inc/dec of nlmsvc_ref is called in somewhat non-obvious ways. e.g. > + if (!nlmsvc_users && error) > + atomic_dec(&nlmsvc_ref); and > + if (list_empty(&nlm_blocked)) > + atomic_inc(&nlmsvc_ref); > + > if (list_empty(&block->b_list)) { > kref_get(&block->b_count); > } else { where if we moved the atomic_inc a little bit later next to the "list_add_tail" (which seems to make more sense) it would actually be wrong... But I think that code is correct as it is - just non-obvious. NeilBrown ^ permalink raw reply [flat|nested] 32+ messages in thread
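[Editorial sketch, not part of the thread] The atomic_dec_and_test question above comes down to two spellings of the same check. A minimal userspace model with C11 atomics, assuming the usual kernel semantics (atomic_sub_return returns the new value; atomic_dec_and_test returns true when the new value is zero), might look like this. The model_* and still_referenced_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of atomic_sub_return(): subtract i and return the NEW value. */
static int model_atomic_sub_return(int i, atomic_int *v)
{
	/* atomic_fetch_sub() yields the old value, so adjust by i. */
	return atomic_fetch_sub(v, i) - i;
}

/* Model of atomic_dec_and_test(): decrement, true iff the result is zero. */
static bool model_atomic_dec_and_test(atomic_int *v)
{
	int old = atomic_fetch_sub(v, 1);

	assert(old > 0); /* dropping below zero would be a refcount bug */
	return old == 1;
}

/* The form used in the posted patch. */
static bool still_referenced_sub_return(atomic_int *ref)
{
	return model_atomic_sub_return(1, ref) != 0;
}

/* The form suggested in review. */
static bool still_referenced_dec_and_test(atomic_int *ref)
{
	return !model_atomic_dec_and_test(ref);
}
```

Both helpers report "still referenced" for the same inputs; the second simply states the intent (drop one reference and ask whether it was the last) more directly.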
* Re: [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-08 6:46 ` Neil Brown @ 2008-01-08 13:26 ` Jeff Layton 2008-01-08 15:52 ` Wendy Cheng 2008-01-08 16:13 ` Peter Staubach 0 siblings, 2 replies; 32+ messages in thread From: Jeff Layton @ 2008-01-08 13:26 UTC (permalink / raw) To: Neil Brown; +Cc: akpm, linux-nfs, linux-kernel On Tue, 8 Jan 2008 17:46:33 +1100 Neil Brown <neilb@suse.de> wrote: The comments about patch 5/6 seem sane. I'll plan to incorporate them in the respin... > On Saturday January 5, jlayton@redhat.com wrote: > > @@ -357,7 +375,18 @@ lockd_down(void) > > goto out; > > } > > warned = 0; > > - kthread_stop(nlmsvc_task); > > + if (atomic_sub_return(1, &nlmsvc_ref) != 0) > > + printk(KERN_WARNING "lockd_down: lockd is waiting > > for " > > + "outstanding requests to complete before > > exiting.\n"); > > Why not "atomic_dec_and_test" ?? > Temporary amnesia? :-) I'll change that, atomic_dec_and_test will be clearer. > > + > > + /* > > + * Sending a signal is necessary here. If we get to this > > point and > > + * nlm_blocked isn't empty then lockd may be held hostage > > by clients > > + * that are still blocking. Sending the signal makes sure > > that lockd > > + * invalidates all of its locks so that it's just waiting > > on RPC > > + * callbacks to complete > > + */ > > + kill_proc(nlmsvc_task->pid, SIGKILL, 1); > > The previous patch removes a kill_proc(... SIGKILL), this one adds it > back. > That makes me wonder if the intermediate state is 'correct'. > > But I also wonder what "correct" means. > Do we want all locks to be dropped when the last nfsd thread dies? > The answer is presumably either "yes" or "no". > If "yes", then we don't have that because if there are any NFS mounts > active, lockd will not be killed. > If "no", then we don't want this kill_proc here. > > The comment in lockd() which currently reads: > > /* > * The main request loop. 
We don't terminate until the last > * NFS mount or NFS daemon has gone away, and we've been sent > a > * signal, or else another process has taken over our job. > */ > > suggests that someone once thought that lockd could hang around after > all nfsd threads and nfs mounts had gone, but I don't think it does. > > We really should think this through and get it right, because if lockd > ever drops it's locks, then we really need to make sure sm_notify gets > run. So it needs to be a well defined event. > > Thoughts? > This is the part I've been struggling with the most -- defining what proper behavior should be when lockd is restarted. As you point out, restarting lockd without doing a sm_notify could be bad news for data integrity. Then again, we'd like someone to be able to shut down the NFS "service" and be able to unmount underlying filesystems without jumping through special hoops.... Overall, I think I'd vote "yes". We need to drop locks when the last nfsd goes down. If userspace brings down nfsd, then it's userspace's responsibility to make sure that a sm_notify is sent when nfsd and lockd are restarted. As a side note, I'm not thrilled with this design that mixes signals and kthreads, but didn't see another way to do this. I'm open to suggestions if anyone has them... > Also, it is sad that the inc/dec of nlmsvc_ref is called in somewhat > non-obvious ways. > e.g. > > > + if (!nlmsvc_users && error) > > + atomic_dec(&nlmsvc_ref); > > and > > > + if (list_empty(&nlm_blocked)) > > + atomic_inc(&nlmsvc_ref); > > + > > if (list_empty(&block->b_list)) { > > kref_get(&block->b_count); > > } else { > > where if we moved the atomic_inc a little bit later next to the > "list_add_tail" (which seems to make more sense) it would actually be > wrong... But I think that code is correct as it is - just non-obvious. > The nlmsvc_ref logic is pretty convoluted, unfortunately. I'll plan to add some comments to clarify what I'm doing there. Thanks for the review, Neil. 
I'll see if I can get a new patchset done in the next few days. Cheers, -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-08 13:26 ` Jeff Layton @ 2008-01-08 15:52 ` Wendy Cheng 2008-01-08 16:13 ` Jeff Layton 2008-01-08 16:13 ` Peter Staubach 1 sibling, 1 reply; 32+ messages in thread From: Wendy Cheng @ 2008-01-08 15:52 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, akpm, linux-nfs, linux-kernel Jeff Layton wrote: > >> The previous patch removes a kill_proc(... SIGKILL), this one adds it >> back. >> That makes me wonder if the intermediate state is 'correct'. >> >> But I also wonder what "correct" means. >> Do we want all locks to be dropped when the last nfsd thread dies? >> The answer is presumably either "yes" or "no". >> If "yes", then we don't have that because if there are any NFS mounts >> active, lockd will not be killed. >> If "no", then we don't want this kill_proc here. >> >> The comment in lockd() which currently reads: >> >> /* >> * The main request loop. We don't terminate until the last >> * NFS mount or NFS daemon has gone away, and we've been sent >> a >> * signal, or else another process has taken over our job. >> */ >> >> suggests that someone once thought that lockd could hang around after >> all nfsd threads and nfs mounts had gone, but I don't think it does. >> >> We really should think this through and get it right, because if lockd >> ever drops it's locks, then we really need to make sure sm_notify gets >> run. So it needs to be a well defined event. >> >> Thoughts? >> >> > > This is the part I've been struggling with the most -- defining what > proper behavior should be when lockd is restarted. As you point out, > restarting lockd without doing a sm_notify could be bad news for data > integrity. > > Then again, we'd like someone to be able to shut down the NFS "service" > and be able to unmount underlying filesystems without jumping through > special hoops.... > > Overall, I think I'd vote "yes". We need to drop locks when the last > nfsd goes down. 
If userspace brings down nfsd, then it's userspace's > responsibility to make sure that a sm_notify is sent when nfsd and lockd > are restarted. > I would vote for "no", at least for nfs v3. Shutting down lockd would require clients to reclaim the locks. With current status (protocol, design, and even the implementation itself, etc), it is simply too disruptive. I understand current logic (i.e. shutting down nfsd but leaving lockd alone) is awkward but debugging multiple platforms (remember clients may not be on linux boxes) is very non-trivial. -- Wendy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-08 15:52 ` Wendy Cheng @ 2008-01-08 16:13 ` Jeff Layton 0 siblings, 0 replies; 32+ messages in thread From: Jeff Layton @ 2008-01-08 16:13 UTC (permalink / raw) To: Wendy Cheng; +Cc: Neil Brown, akpm, linux-nfs, linux-kernel On Tue, 08 Jan 2008 10:52:19 -0500 Wendy Cheng <wcheng@redhat.com> wrote: > Jeff Layton wrote: > > > >> The previous patch removes a kill_proc(... SIGKILL), this one > >> adds it back. > >> That makes me wonder if the intermediate state is 'correct'. > >> > >> But I also wonder what "correct" means. > >> Do we want all locks to be dropped when the last nfsd thread dies? > >> The answer is presumably either "yes" or "no". > >> If "yes", then we don't have that because if there are any NFS > >> mounts active, lockd will not be killed. > >> If "no", then we don't want this kill_proc here. > >> > >> The comment in lockd() which currently reads: > >> > >> /* > >> * The main request loop. We don't terminate until the last > >> * NFS mount or NFS daemon has gone away, and we've been > >> sent a > >> * signal, or else another process has taken over our job. > >> */ > >> > >> suggests that someone once thought that lockd could hang around > >> after all nfsd threads and nfs mounts had gone, but I don't think > >> it does. > >> > >> We really should think this through and get it right, because if > >> lockd ever drops it's locks, then we really need to make sure > >> sm_notify gets run. So it needs to be a well defined event. > >> > >> Thoughts? > >> > >> > > > > This is the part I've been struggling with the most -- defining what > > proper behavior should be when lockd is restarted. As you point out, > > restarting lockd without doing a sm_notify could be bad news for > > data integrity. > > > > Then again, we'd like someone to be able to shut down the NFS > > "service" and be able to unmount underlying filesystems without > > jumping through special hoops.... 
> > > > Overall, I think I'd vote "yes". We need to drop locks when the last > > nfsd goes down. If userspace brings down nfsd, then it's userspace's > > responsibility to make sure that a sm_notify is sent when nfsd and > > lockd are restarted. > > > > I would vote for "no", at least for nfs v3. Shutting down lockd would > require clients to reclaim the locks. With current status (protocol, > design, and even the implementation itself, etc), it is simply too > disruptive. I understand current logic (i.e. shutting down nfsd but > leaving lockd alone) is awkward but debugging multiple platforms > (remember clients may not be on linux boxes) is very non-trivial. > The current lockd implementation already drops all locks if nfsd goes down (providing there are no local NFS mounts). The last lockd_down call will bring down lockd and it will drop all of its locks in the process. My vote for "yes" is a vote to keep things the way they are. I don't think I'd consider it disruptive. Changing lockd to not drop locks will mean that userspace will need to take extra steps if someone wants to bring down NFS and unmount an underlying filesystem. Those extra steps could be a SIGKILL to lockd or a call into the new interfaces your recent patchset adds. Either way, that would mean a change in behavior that will have to be accounted for in userspace. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 6/6] NLM: Add reference counting to lockd 2008-01-08 13:26 ` Jeff Layton 2008-01-08 15:52 ` Wendy Cheng @ 2008-01-08 16:13 ` Peter Staubach 1 sibling, 0 replies; 32+ messages in thread From: Peter Staubach @ 2008-01-08 16:13 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, akpm, linux-nfs, linux-kernel Jeff Layton wrote: > On Tue, 8 Jan 2008 17:46:33 +1100 > Neil Brown <neilb@suse.de> wrote: > > The comments about patch 5/6 seem sane. I'll plan to incorporate them > in the respin... > > >> On Saturday January 5, jlayton@redhat.com wrote: >> >>> @@ -357,7 +375,18 @@ lockd_down(void) >>> goto out; >>> } >>> warned = 0; >>> - kthread_stop(nlmsvc_task); >>> + if (atomic_sub_return(1, &nlmsvc_ref) != 0) >>> + printk(KERN_WARNING "lockd_down: lockd is waiting >>> for " >>> + "outstanding requests to complete before >>> exiting.\n"); >>> >> Why not "atomic_dec_and_test" ?? >> >> > > Temporary amnesia? :-) I'll change that, atomic_dec_and_test will be > clearer. > > >>> + >>> + /* >>> + * Sending a signal is necessary here. If we get to this >>> point and >>> + * nlm_blocked isn't empty then lockd may be held hostage >>> by clients >>> + * that are still blocking. Sending the signal makes sure >>> that lockd >>> + * invalidates all of its locks so that it's just waiting >>> on RPC >>> + * callbacks to complete >>> + */ >>> + kill_proc(nlmsvc_task->pid, SIGKILL, 1); >>> >> The previous patch removes a kill_proc(... SIGKILL), this one adds it >> back. >> That makes me wonder if the intermediate state is 'correct'. >> >> But I also wonder what "correct" means. >> Do we want all locks to be dropped when the last nfsd thread dies? >> The answer is presumably either "yes" or "no". >> If "yes", then we don't have that because if there are any NFS mounts >> active, lockd will not be killed. >> If "no", then we don't want this kill_proc here. >> >> The comment in lockd() which currently reads: >> >> /* >> * The main request loop. 
We don't terminate until the last >> * NFS mount or NFS daemon has gone away, and we've been sent >> a >> * signal, or else another process has taken over our job. >> */ >> >> suggests that someone once thought that lockd could hang around after >> all nfsd threads and nfs mounts had gone, but I don't think it does. >> >> We really should think this through and get it right, because if lockd >> ever drops it's locks, then we really need to make sure sm_notify gets >> run. So it needs to be a well defined event. >> >> Thoughts? >> >> > > This is the part I've been struggling with the most -- defining what > proper behavior should be when lockd is restarted. As you point out, > restarting lockd without doing a sm_notify could be bad news for data > integrity. > > Then again, we'd like someone to be able to shut down the NFS "service" > and be able to unmount underlying filesystems without jumping through > special hoops.... > > Overall, I think I'd vote "yes". We need to drop locks when the last > nfsd goes down. If userspace brings down nfsd, then it's userspace's > responsibility to make sure that a sm_notify is sent when nfsd and lockd > are restarted. > I would vote for the simplest possible model that makes sense. We need a simple model for admins as well as a simple model which is easy to implement in as bug free way as possible. The trick is not making it too simple because that can cost performance, but not making it too complicated to implement reasonably and for admins to be able to figure out. So, I would vote for "yes" as well. That will yield an architecture where we can shutdown systems cleanly and will be easy to understand when locks for clients exist and when they do not. Thanx... ps > As a side note, I'm not thrilled with this design that mixes signals > and kthreads, but didn't see another way to do this. I'm open to > suggestions if anyone has them... > > >> Also, it is sad that the inc/dec of nlmsvc_ref is called in somewhat >> non-obvious ways. 
>> e.g. >> >> >>> + if (!nlmsvc_users && error) >>> + atomic_dec(&nlmsvc_ref); >>> >> and >> >> >>> + if (list_empty(&nlm_blocked)) >>> + atomic_inc(&nlmsvc_ref); >>> + >>> if (list_empty(&block->b_list)) { >>> kref_get(&block->b_count); >>> } else { >>> >> where if we moved the atomic_inc a little bit later next to the >> "list_add_tail" (which seems to make more sense) it would actually be >> wrong... But I think that code is correct as it is - just non-obvious. >> >> > > The nlmsvc_ref logic is pretty convoluted, unfortunately. I'll plan to > add some comments to clarify what I'm doing there. > > Thanks for the review, Neil. I'll see if I can get a new patchset done > in the next few days. > > Cheers, > ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free
@ 2007-12-13 20:40 Jeff Layton
2007-12-13 20:40 ` [PATCH 1/6] SUNRPC: Allow svc_pool_map_set_cpumask to work with any task Jeff Layton
0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2007-12-13 20:40 UTC (permalink / raw)
To: linux-nfs; +Cc: linux-kernel, nfsv4
The only reply that I got to my last patchset to fix the use-after-free
problem in lockd was from Christoph Hellwig, who said:
> might be better to do the refcounting outside the thread and use the
> kthread api, which is something we still need to do for lockd anyway.
This patchset is an attempt to implement that suggestion. The first
two patches add a new svc_create_kthread() function that works like
svc_create_thread, but uses the kthread API under the covers. The rest
of the patches convert lockd to use this function (and fix a couple of
lockd bugs). The final patch adds reference counting that's needed
to fix the original problem.
Unfortunately, moving the refcounting outside of the thread altogether
isn't feasible for reasons outlined in the description of the 6th patch.
Comments and suggestions appreciated...
Signed-off-by: Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 1/6] SUNRPC: Allow svc_pool_map_set_cpumask to work with any task 2007-12-13 20:40 [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free Jeff Layton @ 2007-12-13 20:40 ` Jeff Layton 2007-12-13 20:40 ` [PATCH 2/6] SUNRPC: Break up __svc_create_thread and make svc_create_kthread Jeff Layton 0 siblings, 1 reply; 32+ messages in thread From: Jeff Layton @ 2007-12-13 20:40 UTC (permalink / raw) To: linux-nfs; +Cc: linux-kernel, nfsv4 svc_pool_map_set_cpumask will only affect "current" as of now. Add a new arg so that it can change the cpumask on any given task. Also if we're not changing "current" we don't care what the oldmask was, so allow it to be a NULL pointer. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- net/sunrpc/svc.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index a4a6bf7..9696ae7 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -297,7 +297,8 @@ svc_pool_map_put(void) * Returns 1 and fills in oldmask iff a cpumask was applied. 
*/ static inline int -svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask) +svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx, + cpumask_t *oldmask) { struct svc_pool_map *m = &svc_pool_map; unsigned int node; /* or cpu */ @@ -314,13 +315,15 @@ svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask) return 0; case SVC_POOL_PERCPU: node = m->pool_to[pidx]; - *oldmask = current->cpus_allowed; - set_cpus_allowed(current, cpumask_of_cpu(node)); + if (oldmask != NULL) + *oldmask = task->cpus_allowed; + set_cpus_allowed(task, cpumask_of_cpu(node)); return 1; case SVC_POOL_PERNODE: node = m->pool_to[pidx]; - *oldmask = current->cpus_allowed; - set_cpus_allowed(current, node_to_cpumask(node)); + if (oldmask != NULL) + *oldmask = task->cpus_allowed; + set_cpus_allowed(task, node_to_cpumask(node)); return 1; } } @@ -569,7 +572,8 @@ __svc_create_thread(svc_thread_fn func, struct svc_serv *serv, rqstp->rq_pool = pool; if (serv->sv_nrpools > 1) - have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask); + have_oldmask = svc_pool_map_set_cpumask(current, pool->sp_id, + &oldmask); error = kernel_thread((int (*)(void *)) func, rqstp, 0); -- 1.5.3.3 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 2/6] SUNRPC: Break up __svc_create_thread and make svc_create_kthread
  2007-12-13 20:40 ` [PATCH 1/6] SUNRPC: Allow svc_pool_map_set_cpumask to work with any task Jeff Layton
@ 2007-12-13 20:40 ` Jeff Layton
  2007-12-13 20:40 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2007-12-13 20:40 UTC
To: linux-nfs; +Cc: linux-kernel, nfsv4

Move the initialization that happens prior to thread creation to a new
function (svc_prepare_thread) so that we can call it from a new thread
creation routine.

Add a new function svc_create_kthread that spawns svc threads using
kthread API. We should be able to eventually convert all of the callers
to the kthread API, at which point we can drop __svc_create_thread.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/sunrpc/svc.h |    2 +
 net/sunrpc/sunrpc_syms.c   |    1 +
 net/sunrpc/svc.c           |   69 +++++++++++++++++++++++++++++++++++++++----
 3 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 8531a70..fd980af 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -383,6 +383,8 @@ struct svc_procedure {
 struct svc_serv *  svc_create(struct svc_program *, unsigned int,
 				 void (*shutdown)(struct svc_serv*));
 int		   svc_create_thread(svc_thread_fn, struct svc_serv *);
+int		   svc_create_kthread(svc_thread_fn func,
+				struct svc_serv *serv, struct svc_pool *pool);
 void		   svc_exit_thread(struct svc_rqst *);
 struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
 			void (*shutdown)(struct svc_serv*),
diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index 33d89e8..7feb878 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -64,6 +64,7 @@ EXPORT_SYMBOL(put_rpccred);
 /* RPC server stuff */
 EXPORT_SYMBOL(svc_create);
 EXPORT_SYMBOL(svc_create_thread);
+EXPORT_SYMBOL(svc_create_kthread);
 EXPORT_SYMBOL(svc_create_pooled);
 EXPORT_SYMBOL(svc_set_num_threads);
 EXPORT_SYMBOL(svc_exit_thread);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 9696ae7..d9c26e3 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/kthread.h>

 #include <linux/sunrpc/types.h>
 #include <linux/sunrpc/xdr.h>
@@ -543,18 +544,15 @@ svc_release_buffer(struct svc_rqst *rqstp)
  * On a NUMA or SMP machine, with a multi-pool serv, the thread
  * will be restricted to run on the cpus belonging to the pool.
  */
-static int
-__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
-		    struct svc_pool *pool)
+static struct svc_rqst *
+svc_prepare_thread(svc_thread_fn func, struct svc_serv *serv,
+		   struct svc_pool *pool)
 {
 	struct svc_rqst	*rqstp;
-	int		error = -ENOMEM;
-	int		have_oldmask = 0;
-	cpumask_t	oldmask;

 	rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
 	if (!rqstp)
-		goto out;
+		goto out_enomem;

 	init_waitqueue_head(&rqstp->rq_wait);

@@ -570,6 +568,30 @@ __svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
 	spin_unlock_bh(&pool->sp_lock);
 	rqstp->rq_server = serv;
 	rqstp->rq_pool = pool;
+out:
+	return rqstp;
+
+out_thread:
+	svc_exit_thread(rqstp);
+out_enomem:
+	rqstp = ERR_PTR(-ENOMEM);
+	goto out;
+}
+
+static int
+__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
+		    struct svc_pool *pool)
+{
+	struct svc_rqst *rqstp;
+	int have_oldmask = 0;
+	cpumask_t oldmask;
+	int error;
+
+	rqstp = svc_prepare_thread(func, serv, pool);
+	if (IS_ERR(rqstp)) {
+		error = PTR_ERR(rqstp);
+		goto out;
+	}

 	if (serv->sv_nrpools > 1)
 		have_oldmask = svc_pool_map_set_cpumask(current, pool->sp_id,
@@ -601,6 +623,39 @@ svc_create_thread(svc_thread_fn func, struct svc_serv *serv)
 	return __svc_create_thread(func, serv, &serv->sv_pools[0]);
 }

+int
+svc_create_kthread(svc_thread_fn func, struct svc_serv *serv,
+		   struct svc_pool *pool)
+{
+	struct svc_rqst *rqstp;
+	struct task_struct *task;
+	int error = 0;
+
+	rqstp = svc_prepare_thread(func, serv, pool);
+	if (IS_ERR(rqstp)) {
+		error = PTR_ERR(rqstp);
+		goto out;
+	}
+
+	task = kthread_create((int (*)(void *)) func, rqstp, serv->sv_name);
+	if (IS_ERR(task)) {
+		error = PTR_ERR(task);
+		goto out_thread;
+	}
+
+	if (serv->sv_nrpools > 1)
+		svc_pool_map_set_cpumask(task, pool->sp_id, NULL);
+
+	svc_sock_update_bufs(serv);
+	wake_up_process(task);
+out:
+	return error;
+
+out_thread:
+	svc_exit_thread(rqstp);
+	goto out;
+}
+
 /*
  * Choose a pool in which to create a new thread, for svc_set_num_threads
  */
-- 
1.5.3.3

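For readers less familiar with the error-pointer convention that svc_prepare_thread and its callers use in the patch above, here is a minimal userspace sketch. ERR_PTR, IS_ERR and PTR_ERR are re-implemented in simplified form as stand-ins for the kernel helpers; struct rqst, prepare_thread, create_thread and the simulate_oom flag are hypothetical names invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical request context; stands in for struct svc_rqst. */
struct rqst {
	int pool_id;
};

/* Prepare step: allocate and initialize, or return an encoded errno. */
static struct rqst *prepare_thread(int pool_id, int simulate_oom)
{
	struct rqst *rq = simulate_oom ? NULL : calloc(1, sizeof(*rq));

	if (!rq)
		return ERR_PTR(-ENOMEM);
	rq->pool_id = pool_id;
	return rq;
}

/* Create step: decode the error, or consume the prepared context. */
static int create_thread(int pool_id, int simulate_oom)
{
	struct rqst *rq = prepare_thread(pool_id, simulate_oom);

	if (IS_ERR(rq))
		return (int)PTR_ERR(rq);
	/* A real caller would start a thread here; the sketch just frees. */
	free(rq);
	return 0;
}

/* Usage: success returns 0, a failed prepare step surfaces as -ENOMEM. */
void err_ptr_example(void)
{
	assert(create_thread(0, 0) == 0);
	assert(create_thread(0, 1) == -ENOMEM);
}
```

The point of the split is that one pointer return value carries either the prepared object or the reason preparation failed, so both thread-creation routines can share the same prepare step.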
* [PATCH 3/6] NLM: Initialize completion variable in lockd_up
  2007-12-13 20:40 ` [PATCH 2/6] SUNRPC: Break up __svc_create_thread and make svc_create_kthread Jeff Layton
@ 2007-12-13 20:40 ` Jeff Layton
  2007-12-13 20:40 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2007-12-13 20:40 UTC
To: linux-nfs; +Cc: linux-kernel, nfsv4

lockd_start_done is a global var that can be reused if lockd is
restarted, but it's never reinitialized. On all but the first use,
wait_for_completion isn't actually waiting on it since it has
already completed once.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 82e2192..0f4148a 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -300,6 +300,7 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	/*
 	 * Create the kernel thread and wait for it to start.
 	 */
+	init_completion(&lockd_start_done);
 	error = svc_create_thread(lockd, serv);
 	if (error) {
 		printk(KERN_WARNING
-- 
1.5.3.3

* [PATCH 4/6] NLM: Have lockd call try_to_freeze
  2007-12-13 20:40 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton
@ 2007-12-13 20:40 ` Jeff Layton
  2007-12-13 20:40 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2007-12-13 20:40 UTC
To: linux-nfs; +Cc: linux-kernel, nfsv4

lockd makes itself freezable, but never calls try_to_freeze(). Have it
call try_to_freeze() within the main loop.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 0f4148a..03a83a0 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp)
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];

+		if (try_to_freeze())
+			continue;
+
 		if (signalled()) {
 			flush_signals(current);
 			if (nlmsvc_ops) {
-- 
1.5.3.3

* [PATCH 5/6] NLM: Convert lockd to use kthreads
  2007-12-13 20:40 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton
@ 2007-12-13 20:40 ` Jeff Layton
  2007-12-13 20:40 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2007-12-13 20:40 UTC
To: linux-nfs; +Cc: linux-kernel, nfsv4

Have lockd_up start lockd using svc_create_kthread. With this change,
lockd_down now blocks until lockd actually exits, so there's no longer
need for the waitqueue code at the end of lockd_down. This also means
that only one lockd can be running at a time which simplifies the code
within lockd's main loop a bit.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c |   53 ++++++++++++++++-------------------------------------
 1 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 03a83a0..1303ce8 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -25,6 +25,7 @@
 #include <linux/smp.h>
 #include <linux/smp_lock.h>
 #include <linux/mutex.h>
+#include <linux/kthread.h>
 #include <linux/freezer.h>

 #include <linux/sunrpc/types.h>
@@ -48,13 +49,12 @@ EXPORT_SYMBOL(nlmsvc_ops);

 static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int		nlmsvc_users;
-static pid_t			nlmsvc_pid;
-static struct svc_serv		*nlmsvc_serv;
+static struct task_struct *	nlmsvc_task;
+static struct svc_serv *	nlmsvc_serv;
 int				nlmsvc_grace_period;
 unsigned long			nlmsvc_timeout;

 static DECLARE_COMPLETION(lockd_start_done);
-static DECLARE_WAIT_QUEUE_HEAD(lockd_exit);

 /*
  * These can be set at insmod time (useful for NFS as root filesystem),
@@ -128,11 +128,10 @@ lockd(struct svc_rqst *rqstp)
 	/*
 	 * Let our maker know we're running.
 	 */
-	nlmsvc_pid = current->pid;
+	nlmsvc_task = current;
 	nlmsvc_serv = rqstp->rq_server;
 	complete(&lockd_start_done);

-	daemonize("lockd");
 	set_freezable();

 	/* Process request with signals blocked, but allow SIGKILL.  */
@@ -151,7 +150,7 @@ lockd(struct svc_rqst *rqstp)
 	 * NFS mount or NFS daemon has gone away, and we've been sent a
 	 * signal, or else another process has taken over our job.
 	 */
-	while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) {
+	while (!kthread_should_stop()) {
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];

@@ -203,23 +202,18 @@ lockd(struct svc_rqst *rqstp)
 	 * Check whether there's a new lockd process before
 	 * shutting down the hosts and clearing the slot.
 	 */
-	if (!nlmsvc_pid || current->pid == nlmsvc_pid) {
-		if (nlmsvc_ops)
-			nlmsvc_invalidate_all();
-		nlm_shutdown_hosts();
-		nlmsvc_pid = 0;
-		nlmsvc_serv = NULL;
-	} else
-		printk(KERN_DEBUG
-			"lockd: new process, skipping host shutdown\n");
-	wake_up(&lockd_exit);
+	if (nlmsvc_ops)
+		nlmsvc_invalidate_all();
+	nlm_shutdown_hosts();
+	nlmsvc_task = NULL;
+	nlmsvc_serv = NULL;

 	/* Exit the RPC thread */
 	svc_exit_thread(rqstp);

 	/* Release module */
 	unlock_kernel();
-	module_put_and_exit(0);
+	module_put(THIS_MODULE);
 }


@@ -276,7 +270,7 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	/*
 	 * Check whether we're already up and running.
 	 */
-	if (nlmsvc_pid) {
+	if (nlmsvc_task) {
 		if (proto)
 			error = make_socks(nlmsvc_serv, proto);
 		goto out;
@@ -304,7 +298,7 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	 * Create the kernel thread and wait for it to start.
 	 */
 	init_completion(&lockd_start_done);
-	error = svc_create_thread(lockd, serv);
+	error = svc_create_kthread(lockd, serv, &serv->sv_pools[0]);
 	if (error) {
 		printk(KERN_WARNING
 			"lockd_up: create thread failed, error=%d\n", error);
@@ -339,30 +333,15 @@ lockd_down(void)
 		if (--nlmsvc_users)
 			goto out;
 	} else
-		printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid);
+		printk(KERN_WARNING "lockd_down: no users! task=%p\n", nlmsvc_task);

-	if (!nlmsvc_pid) {
+	if (!nlmsvc_task) {
 		if (warned++ == 0)
 			printk(KERN_WARNING "lockd_down: no lockd running.\n");
 		goto out;
 	}
 	warned = 0;
-
-	kill_proc(nlmsvc_pid, SIGKILL, 1);
-	/*
-	 * Wait for the lockd process to exit, but since we're holding
-	 * the lockd semaphore, we can't wait around forever ...
-	 */
-	clear_thread_flag(TIF_SIGPENDING);
-	interruptible_sleep_on_timeout(&lockd_exit, HZ);
-	if (nlmsvc_pid) {
-		printk(KERN_WARNING
-			"lockd_down: lockd failed to exit, clearing pid\n");
-		nlmsvc_pid = 0;
-	}
-	spin_lock_irq(&current->sighand->siglock);
-	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
+	kthread_stop(nlmsvc_task);
 out:
 	mutex_unlock(&nlmsvc_mutex);
 }
-- 
1.5.3.3

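The stop handshake that patch 5/6 relies on (the thread polls kthread_should_stop(), and lockd_down blocks in kthread_stop() until the thread has exited) can be approximated in userspace. The following is only a rough analogue built on POSIX threads and C11 atomics, not kernel code; struct worker and the worker_* functions are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kthread: a handle plus a stop flag. */
struct worker {
	pthread_t thread;
	atomic_int should_stop;
	atomic_int iterations;
};

/* Rough analogue of kthread_should_stop(). */
static int worker_should_stop(struct worker *w)
{
	return atomic_load(&w->should_stop);
}

/* Main loop: keep "servicing requests" until asked to stop. */
static void *worker_main(void *arg)
{
	struct worker *w = arg;

	while (!worker_should_stop(w)) {
		atomic_fetch_add(&w->iterations, 1);
		sched_yield();
	}
	return NULL;
}

/* Start the worker; returns 0 on success, like pthread_create(). */
static int worker_start(struct worker *w)
{
	atomic_init(&w->should_stop, 0);
	atomic_init(&w->iterations, 0);
	return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Rough analogue of kthread_stop(): set the flag, then wait for exit. */
static int worker_stop(struct worker *w)
{
	atomic_store(&w->should_stop, 1);
	return pthread_join(w->thread, NULL);
}

/* Usage: once worker_stop() returns, the thread is guaranteed to be gone. */
void worker_example(void)
{
	struct worker w;

	assert(worker_start(&w) == 0);
	assert(worker_stop(&w) == 0);
	assert(worker_should_stop(&w));
}
```

Because the stop call does not return until the thread has exited, the caller needs no separate wait queue or timeout, which is the simplification the commit message describes for lockd_down.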
* [PATCH 6/6] NLM: Add reference counting to lockd
  2007-12-13 20:40 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton
@ 2007-12-13 20:40 ` Jeff Layton
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2007-12-13 20:40 UTC
To: linux-nfs; +Cc: linux-kernel, nfsv4

...and only have lockd exit when the last reference is dropped.

This means that we can't use kthread_stop here. nlmsvc_unlink_block is
called by lockd and a kthread can't call kthread_stop on itself. So,
change lockd to check the refcount itself and to return if it goes to 0.
We do the checking and exit while holding the nlmsvc_mutex to make sure
that a new lockd is not started until the old one is down.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svc.c              |   51 ++++++++++++++++++++++++++++++++----------
 fs/lockd/svclock.c          |    5 ++++
 include/linux/lockd/lockd.h |    1 +
 3 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 1303ce8..05d2317 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int		nlmsvc_users;
 static struct task_struct *	nlmsvc_task;
 static struct svc_serv *	nlmsvc_serv;
+atomic_t			nlmsvc_ref = ATOMIC_INIT(0);
 int				nlmsvc_grace_period;
 unsigned long			nlmsvc_timeout;

@@ -134,7 +135,10 @@ lockd(struct svc_rqst *rqstp)

 	set_freezable();

-	/* Process request with signals blocked, but allow SIGKILL.  */
+	/*
+	 * Process request with signals blocked, but allow SIGKILL which
+	 * signifies that lockd should drop all of its locks.
+	 */
 	allow_signal(SIGKILL);

 	dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
@@ -147,15 +151,19 @@ lockd(struct svc_rqst *rqstp)

 	/*
 	 * The main request loop. We don't terminate until the last
-	 * NFS mount or NFS daemon has gone away, and we've been sent a
-	 * signal, or else another process has taken over our job.
+	 * NFS mount or NFS daemon has gone away, and the nlm_blocked
+	 * list is empty. The nlmsvc_mutex ensures that we prevent a
+	 * new lockd from being started before the old one is down.
 	 */
-	while (!kthread_should_stop()) {
+	mutex_lock(&nlmsvc_mutex);
+	while (atomic_read(&nlmsvc_ref) != 0) {
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];

+		mutex_unlock(&nlmsvc_mutex);
+
 		if (try_to_freeze())
-			continue;
+			goto again;

 		if (signalled()) {
 			flush_signals(current);
@@ -182,11 +190,12 @@ lockd(struct svc_rqst *rqstp)
 		 */
 		err = svc_recv(rqstp, timeout);
 		if (err == -EAGAIN || err == -EINTR)
-			continue;
+			goto again;
 		if (err < 0) {
 			printk(KERN_WARNING
 			       "lockd: terminating on error %d\n",
 			       -err);
+			mutex_lock(&nlmsvc_mutex);
 			break;
 		}

@@ -194,19 +203,22 @@ lockd(struct svc_rqst *rqstp)
 			svc_print_addr(rqstp, buf, sizeof(buf)));

 		svc_process(rqstp);
+again:
+		mutex_lock(&nlmsvc_mutex);
 	}

-	flush_signals(current);
-
 	/*
-	 * Check whether there's a new lockd process before
-	 * shutting down the hosts and clearing the slot.
-	 */
+	 * at this point lockd is committed to going down. We hold the
+	 * nlmsvc_mutex until just before exit to prevent a new one
+	 * from starting before it's down.
+	 */
+	flush_signals(current);
 	if (nlmsvc_ops)
 		nlmsvc_invalidate_all();
 	nlm_shutdown_hosts();
 	nlmsvc_task = NULL;
 	nlmsvc_serv = NULL;
+	mutex_unlock(&nlmsvc_mutex);

 	/* Exit the RPC thread */
 	svc_exit_thread(rqstp);
@@ -267,6 +279,10 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	int			error = 0;

 	mutex_lock(&nlmsvc_mutex);
+
+	if (!nlmsvc_users)
+		atomic_inc(&nlmsvc_ref);
+
 	/*
 	 * Check whether we're already up and running.
 	 */
@@ -313,6 +329,8 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 destroy_and_out:
 	svc_destroy(serv);
 out:
+	if (!nlmsvc_users && error)
+		atomic_dec(&nlmsvc_ref);
 	if (!error)
 		nlmsvc_users++;
 	mutex_unlock(&nlmsvc_mutex);
@@ -341,7 +359,16 @@ lockd_down(void)
 		goto out;
 	}
 	warned = 0;
-	kthread_stop(nlmsvc_task);
+	atomic_dec(&nlmsvc_ref);
+
+	/*
+	 * Sending a signal is necessary here. If we get to this point and
+	 * nlm_blocked isn't empty then lockd may be held hostage by clients
+	 * that are still blocking. Sending the signal makes sure that lockd
+	 * invalidates all of its locks so that it's just waiting on RPC
+	 * callbacks to complete
+	 */
+	kill_proc(nlmsvc_task->pid, SIGKILL, 1);
 out:
 	mutex_unlock(&nlmsvc_mutex);
 }
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index d120ec3..b8fbda3 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -61,6 +61,9 @@ nlmsvc_insert_block(struct nlm_block *block, unsigned long when)
 	struct list_head *pos;

 	dprintk("lockd: nlmsvc_insert_block(%p, %ld)\n", block, when);
+	if (list_empty(&nlm_blocked))
+		atomic_inc(&nlmsvc_ref);
+
 	if (list_empty(&block->b_list)) {
 		kref_get(&block->b_count);
 	} else {
@@ -239,6 +242,8 @@ static int nlmsvc_unlink_block(struct nlm_block *block)
 	/* Remove block from list */
 	status = posix_unblock_lock(block->b_file->f_file, &block->b_call->a_args.lock.fl);
 	nlmsvc_remove_block(block);
+	if (list_empty(&nlm_blocked))
+		atomic_dec(&nlmsvc_ref);
 	return status;
 }

diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index e2d1ce3..7389553 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -154,6 +154,7 @@ extern struct svc_procedure	nlmsvc_procedures4[];
 extern int			nlmsvc_grace_period;
 extern unsigned long		nlmsvc_timeout;
 extern int			nsm_use_hostnames;
+extern atomic_t			nlmsvc_ref;

 /*
  * Lockd client functions
-- 
1.5.3.3

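To make the reference-counting rule in patch 6/6 easier to follow, here is a deliberately simplified, single-threaded model of the counters it touches. It is a sketch under stated assumptions, not the kernel code: struct lockd_model and the model_* functions are hypothetical names, the nlm_blocked list is reduced to a length counter, and locking, signals, the "no users" warning path and re-insertion of an already queued block are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the patch manipulates. */
struct lockd_model {
	unsigned int users;   /* models nlmsvc_users */
	unsigned int blocked; /* models the length of the nlm_blocked list */
	int ref;              /* models nlmsvc_ref */
};

/* Models lockd_up(): the first user takes one reference for all users. */
static void model_up(struct lockd_model *m)
{
	if (m->users == 0)
		m->ref++;
	m->users++;
}

/* Models lockd_down(): the last user drops that reference. */
static void model_down(struct lockd_model *m)
{
	if (m->users == 0)
		return; /* simplification: the patch warns on this path */
	if (--m->users == 0)
		m->ref--;
}

/* Models nlmsvc_insert_block(): the first blocked lock takes a reference. */
static void model_insert_block(struct lockd_model *m)
{
	if (m->blocked == 0)
		m->ref++;
	m->blocked++;
}

/* Models nlmsvc_unlink_block(): removing the last blocked lock drops it. */
static void model_unlink_block(struct lockd_model *m)
{
	if (m->blocked == 0)
		return;
	if (--m->blocked == 0)
		m->ref--;
}

/* Models the loop condition in lockd(): run while any reference remains. */
static bool model_should_run(const struct lockd_model *m)
{
	return m->ref != 0;
}

/* Usage: lockd outlives its last user while a blocked lock is pending. */
void refcount_example(void)
{
	struct lockd_model m = { 0, 0, 0 };

	model_up(&m);
	model_insert_block(&m);
	model_down(&m);
	assert(model_should_run(&m));   /* block still pending */
	model_unlink_block(&m);
	assert(!model_should_run(&m));  /* last reference gone */
}
```

The model shows why kthread_stop no longer fits: the final reference may be dropped from inside lockd itself, when it unlinks the last blocked lock, so the thread has to notice the zero count and exit on its own.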
end of thread, other threads:[~2008-03-15 6:34 UTC | newest]

Thread overview: 32+ messages
2008-01-08 19:33 [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free (try #6) Jeff Layton
2008-01-08 19:33 ` [PATCH 1/6] SUNRPC: spin svc_rqst initialization to its own function Jeff Layton
2008-01-08 19:33 ` [PATCH 2/6] SUNRPC: export svc_sock_update_bufs Jeff Layton
2008-01-08 19:33 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton
2008-01-08 19:33 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton
2008-01-08 19:33 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton
2008-01-08 19:33 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton
2008-01-09 17:47 ` Christoph Hellwig
2008-01-09 18:36 ` Jeff Layton
2008-01-09 18:48 ` Christoph Hellwig
2008-01-09 18:59 ` Jeff Layton
2008-01-10 3:29 ` Neil Brown
2008-01-10 11:58 ` Jeff Layton
2008-01-09 17:45 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Christoph Hellwig
2008-01-09 18:08 ` Jeff Layton
2008-01-09 17:35 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Christoph Hellwig
2008-01-09 18:05 ` Jeff Layton
2008-01-09 18:14 ` Christoph Hellwig
2008-01-13 13:27 ` Jeff Layton
2008-01-13 18:17 ` Christoph Hellwig
2008-01-13 19:12 ` J. Bruce Fields
2008-01-14 14:24 ` Jeff Layton
2008-01-14 14:25 ` Christoph Hellwig
2008-03-15 3:44 ` Mike Snitzer
2008-03-15 6:34 ` Christoph Hellwig

-- strict thread matches above, loose matches on Subject: below --
2008-01-05 12:02 [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free (try #5) Jeff Layton
2008-01-05 12:02 ` [PATCH 1/6] SUNRPC: spin svc_rqst initialization to its own function Jeff Layton
2008-01-05 12:02 ` [PATCH 2/6] SUNRPC: export svc_sock_update_bufs Jeff Layton
2008-01-05 12:02 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton
2008-01-05 12:02 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton
2008-01-05 12:02 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton
2008-01-05 12:02 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton
2008-01-08 6:46 ` Neil Brown
2008-01-08 13:26 ` Jeff Layton
2008-01-08 15:52 ` Wendy Cheng
2008-01-08 16:13 ` Jeff Layton
2008-01-08 16:13 ` Peter Staubach
2007-12-13 20:40 [PATCH 0/6] Intro: convert lockd to kthread and fix use-after-free Jeff Layton
2007-12-13 20:40 ` [PATCH 1/6] SUNRPC: Allow svc_pool_map_set_cpumask to work with any task Jeff Layton
2007-12-13 20:40 ` [PATCH 2/6] SUNRPC: Break up __svc_create_thread and make svc_create_kthread Jeff Layton
2007-12-13 20:40 ` [PATCH 3/6] NLM: Initialize completion variable in lockd_up Jeff Layton
2007-12-13 20:40 ` [PATCH 4/6] NLM: Have lockd call try_to_freeze Jeff Layton
2007-12-13 20:40 ` [PATCH 5/6] NLM: Convert lockd to use kthreads Jeff Layton
2007-12-13 20:40 ` [PATCH 6/6] NLM: Add reference counting to lockd Jeff Layton