* [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility
@ 2017-09-01 15:40 David Howells
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:40 UTC (permalink / raw)
  To: linux-afs; +Cc: Tejun Heo, linux-fsdevel, dhowells, Lai Jiangshan, linux-kernel

Add a facility to the workqueue subsystem whereby an atomic_t can be
registered by a work function such that the work function dispatcher will
decrement the atomic after the work function has returned and then call
wake_up_atomic_t() on it if it reaches 0.

This is analogous to complete_and_exit() for kernel threads and is used to
avoid a race between notifying that a work item is about to finish and the
.text segment of the module it belongs to being discarded.

The way this is used is that the work function calls:

	dec_after_work(atomic_t *counter);

to register the counter; after the work function returns, process_one_work()
decrements it, wakes it if it reaches 0 and clears the registration.

The reason I've used an atomic_t rather than a completion is that (1) it
takes up less space and (2) it can monitor multiple objects.
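
For illustration, a minimal usage sketch (hypothetical module code, not part
of this patch; all of the "my_*" names are invented):

	static atomic_t my_outstanding = ATOMIC_INIT(0);

	static int my_wait(atomic_t *p)
	{
		schedule();
		return 0;
	}

	static void my_work_func(struct work_struct *work)
	{
		/* ... do the actual work ... */

		/* process_one_work() decrements the counter after this
		 * function has returned and wakes any waiter if it hits 0,
		 * so no module code is running at that point.
		 */
		dec_after_work(&my_outstanding);
	}

	static DECLARE_WORK(my_work, my_work_func);

	static void my_queue(void)
	{
		atomic_inc(&my_outstanding);
		if (!queue_work(system_wq, &my_work))
			atomic_dec(&my_outstanding);	/* already pending */
	}

	static void __exit my_exit(void)
	{
		wait_on_atomic_t(&my_outstanding, my_wait,
				 TASK_UNINTERRUPTIBLE);
	}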

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tejun Heo <tj@kernel.org>
cc: Lai Jiangshan <jiangshanlai@gmail.com>
---

 include/linux/workqueue.h   |    1 +
 kernel/workqueue.c          |   25 +++++++++++++++++++++++++
 kernel/workqueue_internal.h |    1 +
 3 files changed, 27 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index db6dc9dc0482..ceaed1387e9b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -451,6 +451,7 @@ extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 
 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
+extern void dec_after_work(atomic_t *counter);
 
 extern int schedule_on_each_cpu(work_func_t func);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ca937b0c3a96..2936ad0ab293 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2112,6 +2112,12 @@ __acquires(&pool->lock)
 		dump_stack();
 	}
 
+	if (worker->dec_after) {
+		if (atomic_dec_and_test(worker->dec_after))
+			wake_up_atomic_t(worker->dec_after);
+		worker->dec_after = NULL;
+	}
+
 	/*
 	 * The following prevents a kworker from hogging CPU on !PREEMPT
 	 * kernels, where a requeueing work item waiting for something to
@@ -3087,6 +3093,25 @@ int schedule_on_each_cpu(work_func_t func)
 }
 
 /**
+ * dec_after_work - Register counter to dec and wake after work func returns
+ * @counter: The counter to decrement and wake
+ *
+ * Register an atomic counter to be decremented after a work function returns
+ * to the core.  The counter is 'woken' if it is decremented to 0.  This allows
+ * synchronisation to be effected by one or more work functions in a module
+ * without leaving a window in which the work function code can be unloaded.
+ */
+void dec_after_work(atomic_t *counter)
+{
+	struct worker *worker = current_wq_worker();
+
+	BUG_ON(!worker);
+	BUG_ON(worker->dec_after);
+	worker->dec_after = counter;
+}
+EXPORT_SYMBOL(dec_after_work);
+
+/**
  * execute_in_process_context - reliably execute the routine with user context
  * @fn:		the function to execute
  * @ew:		guaranteed storage for the execute work structure (must
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 8635417c587b..94ea1ca9b01f 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -28,6 +28,7 @@ struct worker {
 
 	struct work_struct	*current_work;	/* L: work being processed */
 	work_func_t		current_func;	/* L: current_work's fn */
+	atomic_t		*dec_after;	/* Decrement after func returns */
 	struct pool_workqueue	*current_pwq; /* L: current_work's pwq */
 	bool			desc_valid;	/* ->desc is valid */
 	struct list_head	scheduled;	/* L: scheduled works */


* [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 16:42   ` Peter Zijlstra
                     ` (3 more replies)
  2017-09-01 15:41 ` [RFC PATCH 03/11] Pass mode to wait_on_atomic_t() action funcs and provide default actions David Howells
                   ` (11 subsequent siblings)
  12 siblings, 4 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs
  Cc: Peter Zijlstra, linux-fsdevel, dhowells, Kees Cook, linux-kernel

Implement functions that increment or decrement a refcount_t object and
return the new value.  The decrement-and-return function can be used to
maintain a counter in a cache where 1 means the object is unused but still
available, and the garbage collector can use refcount_dec_if_one() to make
the object unavailable.  Further, both functions allow the refcount to be
traced accurately (refcount_inc() followed by refcount_read() can't be
considered accurate).

The interface is as follows:

	unsigned int refcount_dec_return(refcount_t *r);
	unsigned int refcount_inc_return(refcount_t *r);

Both return the new value of the refcount.
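
For illustration, a hedged sketch of the cache pattern described above (the
object and function names are invented):

	struct obj {
		refcount_t	usage;	/* 1 == cached but unused */
	};

	static void put_obj(struct obj *obj)
	{
		unsigned int n = refcount_dec_return(&obj->usage);

		/* The returned value can be traced accurately, unlike
		 * refcount_dec() followed by refcount_read().
		 */
		trace_printk("obj %p usage now %u\n", obj, n);
	}

	static bool gc_obj(struct obj *obj)
	{
		/* Only reap the object if it is unused (usage == 1). */
		return refcount_dec_if_one(&obj->usage);
	}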

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
cc: Kees Cook <keescook@chromium.org>
---

 include/linux/refcount.h |   12 ++++++++
 lib/refcount.c           |   67 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 591792c8e5b0..566c0cea7343 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -52,6 +52,8 @@ extern __must_check bool refcount_sub_and_test(unsigned int i, refcount_t *r);
 
 extern __must_check bool refcount_dec_and_test(refcount_t *r);
 extern void refcount_dec(refcount_t *r);
+extern __must_check unsigned int refcount_inc_return(refcount_t *r);
+extern __must_check unsigned int refcount_dec_return(refcount_t *r);
 #else
 static inline __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
 {
@@ -87,6 +89,16 @@ static inline void refcount_dec(refcount_t *r)
 {
 	atomic_dec(&r->refs);
 }
+
+static inline unsigned int refcount_inc_return(refcount_t *r)
+{
+	return atomic_inc_return(&r->refs);
+}
+
+static inline unsigned int refcount_dec_return(refcount_t *r)
+{
+	return atomic_dec_return(&r->refs);
+}
 #endif /* CONFIG_REFCOUNT_FULL */
 
 extern __must_check bool refcount_dec_if_one(refcount_t *r);
diff --git a/lib/refcount.c b/lib/refcount.c
index 5d0582a9480c..3a1d800bf830 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -154,6 +154,40 @@ void refcount_inc(refcount_t *r)
 EXPORT_SYMBOL(refcount_inc);
 
 /**
+ * refcount_inc_return - increment a refcount and return the new value
+ * @r: the refcount to increment
+ *
+ * Similar to atomic_inc_return(), but will saturate at UINT_MAX and WARN.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores. See the comment on top.
+ *
+ * Return: the new value.
+ */
+unsigned int refcount_inc_return(refcount_t *r)
+{
+	unsigned int new, val = atomic_read(&r->refs);
+
+	do {
+		new = val + 1;
+
+		if (!val) {
+			WARN_ONCE(!val, "refcount_t: increment on 0; use-after-free.\n");
+			return 0;
+		}
+
+		if (unlikely(!new))
+			return UINT_MAX;
+
+	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
+
+	WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
+	return new;
+}
+EXPORT_SYMBOL(refcount_inc_return);
+
+/**
  * refcount_sub_and_test - subtract from a refcount and test if it is 0
  * @i: amount to subtract from the refcount
  * @r: the refcount
@@ -227,6 +261,39 @@ void refcount_dec(refcount_t *r)
 	WARN_ONCE(refcount_dec_and_test(r), "refcount_t: decrement hit 0; leaking memory.\n");
 }
 EXPORT_SYMBOL(refcount_dec);
+
+/**
+ * refcount_dec_return - Decrement a refcount and return the new value.
+ * @r: the refcount
+ *
+ * Similar to atomic_dec_return(), it will WARN on underflow and fail to
+ * decrement when saturated at UINT_MAX.  It isn't permitted to use this to
+ * decrement a counter to 0.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before.
+ */
+unsigned int refcount_dec_return(refcount_t *r)
+{
+	unsigned int new, val = atomic_read(&r->refs);
+
+	do {
+		if (unlikely(val == UINT_MAX))
+			return val;
+
+		new = val - 1;
+		if (unlikely(val == 0)) {
+			WARN_ONCE(val == 0, "refcount_t: underflow; use-after-free.\n");
+			return val;
+		}
+
+		WARN_ONCE(val == 1, "refcount_t: decrement hit 0; leaking memory.\n");
+
+	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
+
+	return new;
+}
+EXPORT_SYMBOL(refcount_dec_return);
 #endif /* CONFIG_REFCOUNT_FULL */
 
 /**


* [RFC PATCH 03/11] Pass mode to wait_on_atomic_t() action funcs and provide default actions
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 15:41 ` [RFC PATCH 04/11] Add a function to start/reduce a timer David Howells
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, Ingo Molnar, linux-kernel

Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
extra argument and make it 'unsigned int' throughout.

Also, consolidate a bunch of identical action functions into a default
function that can do the appropriate thing for the mode.

Also, change the argument name in the bit_wait*() function declarations to
reflect the fact that it's the mode and not the bit number.
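
As a hedged sketch of what callers look like after the conversion (the
"my_*" names are invented):

	struct my_obj {
		atomic_t		usecount;
	};

	static int my_wait_killable(atomic_t *p, unsigned int mode)
	{
		schedule();
		return signal_pending_state(mode, current) ? -EINTR : 0;
	}

	static void my_drain(struct my_obj *obj)
	{
		/* A plain uninterruptible wait now uses the default action. */
		wait_on_atomic_t(&obj->usecount, atomic_t_wait,
				 TASK_UNINTERRUPTIBLE);
	}

	static int my_drain_killable(struct my_obj *obj)
	{
		/* A custom action receives the mode and can honour signals. */
		return wait_on_atomic_t(&obj->usecount, my_wait_killable,
					TASK_KILLABLE);
	}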

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ingo Molnar <mingo@kernel.org>
---

 arch/mips/kernel/traps.c                           |   14 +---------
 drivers/gpu/drm/drm_dp_aux_dev.c                   |    8 +-----
 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c |   10 +------
 drivers/media/platform/qcom/venus/hfi.c            |    8 +-----
 fs/afs/rxrpc.c                                     |    8 +-----
 fs/btrfs/extent-tree.c                             |   27 ++------------------
 fs/fscache/cookie.c                                |    2 +
 fs/fscache/internal.h                              |    2 -
 fs/fscache/main.c                                  |    9 -------
 fs/nfs/inode.c                                     |    4 +--
 fs/nfs/internal.h                                  |    2 +
 fs/ocfs2/filecheck.c                               |    8 +-----
 include/linux/wait_bit.h                           |   15 +++++++----
 kernel/sched/wait_bit.c                            |   18 ++++++++++---
 14 files changed, 37 insertions(+), 98 deletions(-)

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index b68b4d0726d3..3b3589ff3ecb 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1235,18 +1235,6 @@ static int default_cu2_call(struct notifier_block *nfb, unsigned long action,
 	return NOTIFY_OK;
 }
 
-static int wait_on_fp_mode_switch(atomic_t *p)
-{
-	/*
-	 * The FP mode for this task is currently being switched. That may
-	 * involve modifications to the format of this tasks FP context which
-	 * make it unsafe to proceed with execution for the moment. Instead,
-	 * schedule some other task.
-	 */
-	schedule();
-	return 0;
-}
-
 static int enable_restore_fp_context(int msa)
 {
 	int err, was_fpu_owner, prior_msa;
@@ -1256,7 +1244,7 @@ static int enable_restore_fp_context(int msa)
 	 * complete before proceeding.
 	 */
 	wait_on_atomic_t(&current->mm->context.fp_mode_switching,
-			 wait_on_fp_mode_switch, TASK_KILLABLE);
+			 atomic_t_wait, TASK_KILLABLE);
 
 	if (!used_math()) {
 		/* First time FP context user. */
diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c
index d34e5096887a..053044201e31 100644
--- a/drivers/gpu/drm/drm_dp_aux_dev.c
+++ b/drivers/gpu/drm/drm_dp_aux_dev.c
@@ -263,12 +263,6 @@ static struct drm_dp_aux_dev *drm_dp_aux_dev_get_by_aux(struct drm_dp_aux *aux)
 	return aux_dev;
 }
 
-static int auxdev_wait_atomic_t(atomic_t *p)
-{
-	schedule();
-	return 0;
-}
-
 void drm_dp_aux_unregister_devnode(struct drm_dp_aux *aux)
 {
 	struct drm_dp_aux_dev *aux_dev;
@@ -283,7 +277,7 @@ void drm_dp_aux_unregister_devnode(struct drm_dp_aux *aux)
 	mutex_unlock(&aux_idr_mutex);
 
 	atomic_dec(&aux_dev->usecount);
-	wait_on_atomic_t(&aux_dev->usecount, auxdev_wait_atomic_t,
+	wait_on_atomic_t(&aux_dev->usecount, atomic_t_wait,
 			 TASK_UNINTERRUPTIBLE);
 
 	minor = aux_dev->index;
diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
index 7276194c04f7..f4815b96db0f 100644
--- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
@@ -271,13 +271,7 @@ struct igt_wakeup {
 	u32 seqno;
 };
 
-static int wait_atomic(atomic_t *p)
-{
-	schedule();
-	return 0;
-}
-
-static int wait_atomic_timeout(atomic_t *p)
+static int wait_atomic_timeout(atomic_t *p, unsigned int mode)
 {
 	return schedule_timeout(10 * HZ) ? 0 : -ETIMEDOUT;
 }
@@ -348,7 +342,7 @@ static void igt_wake_all_sync(atomic_t *ready,
 	atomic_set(ready, 0);
 	wake_up_all(wq);
 
-	wait_on_atomic_t(set, wait_atomic, TASK_UNINTERRUPTIBLE);
+	wait_on_atomic_t(set, atomic_t_wait, TASK_UNINTERRUPTIBLE);
 	atomic_set(ready, count);
 	atomic_set(done, count);
 }
diff --git a/drivers/media/platform/qcom/venus/hfi.c b/drivers/media/platform/qcom/venus/hfi.c
index c09490876516..e374c7d1a618 100644
--- a/drivers/media/platform/qcom/venus/hfi.c
+++ b/drivers/media/platform/qcom/venus/hfi.c
@@ -88,12 +88,6 @@ int hfi_core_init(struct venus_core *core)
 	return ret;
 }
 
-static int core_deinit_wait_atomic_t(atomic_t *p)
-{
-	schedule();
-	return 0;
-}
-
 int hfi_core_deinit(struct venus_core *core, bool blocking)
 {
 	int ret = 0, empty;
@@ -112,7 +106,7 @@ int hfi_core_deinit(struct venus_core *core, bool blocking)
 
 	if (!empty) {
 		mutex_unlock(&core->lock);
-		wait_on_atomic_t(&core->insts_count, core_deinit_wait_atomic_t,
+		wait_on_atomic_t(&core->insts_count, atomic_t_wait,
 				 TASK_UNINTERRUPTIBLE);
 		mutex_lock(&core->lock);
 	}
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 0bf191f0dbaf..cc7f7b3369ab 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -41,12 +41,6 @@ static void afs_charge_preallocation(struct work_struct *);
 
 static DECLARE_WORK(afs_charge_preallocation_work, afs_charge_preallocation);
 
-static int afs_wait_atomic_t(atomic_t *p)
-{
-	schedule();
-	return 0;
-}
-
 /*
  * open an RxRPC socket and bind it to be a server for callback notifications
  * - the socket is left in blocking mode and non-blocking ops use MSG_DONTWAIT
@@ -121,7 +115,7 @@ void afs_close_socket(void)
 	}
 
 	_debug("outstanding %u", atomic_read(&afs_outstanding_calls));
-	wait_on_atomic_t(&afs_outstanding_calls, afs_wait_atomic_t,
+	wait_on_atomic_t(&afs_outstanding_calls, atomic_t_wait,
 			 TASK_UNINTERRUPTIBLE);
 	_debug("no outstanding calls");
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e3b0b4196d3d..14c8437eb4fc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3934,16 +3934,9 @@ void btrfs_dec_nocow_writers(struct btrfs_fs_info *fs_info, u64 bytenr)
 	btrfs_put_block_group(bg);
 }
 
-static int btrfs_wait_nocow_writers_atomic_t(atomic_t *a)
-{
-	schedule();
-	return 0;
-}
-
 void btrfs_wait_nocow_writers(struct btrfs_block_group_cache *bg)
 {
-	wait_on_atomic_t(&bg->nocow_writers,
-			 btrfs_wait_nocow_writers_atomic_t,
+	wait_on_atomic_t(&bg->nocow_writers, atomic_t_wait,
 			 TASK_UNINTERRUPTIBLE);
 }
 
@@ -6523,12 +6516,6 @@ void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
 	btrfs_put_block_group(bg);
 }
 
-static int btrfs_wait_bg_reservations_atomic_t(atomic_t *a)
-{
-	schedule();
-	return 0;
-}
-
 void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 {
 	struct btrfs_space_info *space_info = bg->space_info;
@@ -6551,8 +6538,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 	down_write(&space_info->groups_sem);
 	up_write(&space_info->groups_sem);
 
-	wait_on_atomic_t(&bg->reservations,
-			 btrfs_wait_bg_reservations_atomic_t,
+	wait_on_atomic_t(&bg->reservations, atomic_t_wait,
 			 TASK_UNINTERRUPTIBLE);
 }
 
@@ -11036,12 +11022,6 @@ int btrfs_start_write_no_snapshoting(struct btrfs_root *root)
 	return 1;
 }
 
-static int wait_snapshoting_atomic_t(atomic_t *a)
-{
-	schedule();
-	return 0;
-}
-
 void btrfs_wait_for_snapshot_creation(struct btrfs_root *root)
 {
 	while (true) {
@@ -11050,8 +11030,7 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root)
 		ret = btrfs_start_write_no_snapshoting(root);
 		if (ret)
 			break;
-		wait_on_atomic_t(&root->will_be_snapshoted,
-				 wait_snapshoting_atomic_t,
+		wait_on_atomic_t(&root->will_be_snapshoted, atomic_t_wait,
 				 TASK_UNINTERRUPTIBLE);
 	}
 }
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 40d61077bead..ff84258132bb 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -558,7 +558,7 @@ void __fscache_disable_cookie(struct fscache_cookie *cookie, bool invalidate)
 	 * have completed.
 	 */
 	if (!atomic_dec_and_test(&cookie->n_active))
-		wait_on_atomic_t(&cookie->n_active, fscache_wait_atomic_t,
+		wait_on_atomic_t(&cookie->n_active, atomic_t_wait,
 				 TASK_UNINTERRUPTIBLE);
 
 	/* Make sure any pending writes are cancelled. */
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 97ec45110957..0ff4b49a0037 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -97,8 +97,6 @@ static inline bool fscache_object_congested(void)
 	return workqueue_congested(WORK_CPU_UNBOUND, fscache_object_wq);
 }
 
-extern int fscache_wait_atomic_t(atomic_t *);
-
 /*
  * object.c
  */
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index b39d487ccfb0..249968dcbf5c 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -195,12 +195,3 @@ static void __exit fscache_exit(void)
 }
 
 module_exit(fscache_exit);
-
-/*
- * wait_on_atomic_t() sleep function for uninterruptible waiting
- */
-int fscache_wait_atomic_t(atomic_t *p)
-{
-	schedule();
-	return 0;
-}
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 109279d6d91b..8b0475b3d48e 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -85,9 +85,9 @@ int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
 }
 EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
 
-int nfs_wait_atomic_killable(atomic_t *p)
+int nfs_wait_atomic_killable(atomic_t *p, unsigned int mode)
 {
-	return nfs_wait_killable(TASK_KILLABLE);
+	return nfs_wait_killable(mode);
 }
 
 /**
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index dc456416d2be..39998b94a3de 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -388,7 +388,7 @@ extern void nfs_evict_inode(struct inode *);
 void nfs_zap_acl_cache(struct inode *inode);
 extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
 extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
-extern int nfs_wait_atomic_killable(atomic_t *p);
+extern int nfs_wait_atomic_killable(atomic_t *p, unsigned int mode);
 
 /* super.c */
 extern const struct super_operations nfs_sops;
diff --git a/fs/ocfs2/filecheck.c b/fs/ocfs2/filecheck.c
index 2cabbcf2f28e..e87279e49ba3 100644
--- a/fs/ocfs2/filecheck.c
+++ b/fs/ocfs2/filecheck.c
@@ -129,19 +129,13 @@ static struct kobj_attribute ocfs2_attr_filecheck_set =
 					ocfs2_filecheck_show,
 					ocfs2_filecheck_store);
 
-static int ocfs2_filecheck_sysfs_wait(atomic_t *p)
-{
-	schedule();
-	return 0;
-}
-
 static void
 ocfs2_filecheck_sysfs_free(struct ocfs2_filecheck_sysfs_entry *entry)
 {
 	struct ocfs2_filecheck_entry *p;
 
 	if (!atomic_dec_and_test(&entry->fs_count))
-		wait_on_atomic_t(&entry->fs_count, ocfs2_filecheck_sysfs_wait,
+		wait_on_atomic_t(&entry->fs_count, atomic_t_wait,
 				 TASK_UNINTERRUPTIBLE);
 
 	spin_lock(&entry->fs_fcheck->fc_lock);
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 12b26660d7e9..40b9dd5b5ba1 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -25,6 +25,8 @@ struct wait_bit_queue_entry {
 	{ .flags = p, .bit_nr = WAIT_ATOMIC_T_BIT_NR, }
 
 typedef int wait_bit_action_f(struct wait_bit_key *key, int mode);
+typedef int wait_atomic_t_action_f(atomic_t *counter, unsigned int mode);
+
 void __wake_up_bit(struct wait_queue_head *wq_head, void *word, int bit);
 int __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode);
 int __wait_on_bit_lock(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode);
@@ -33,7 +35,7 @@ void wake_up_atomic_t(atomic_t *p);
 int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
 int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode);
-int out_of_line_wait_on_atomic_t(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
+int out_of_line_wait_on_atomic_t(atomic_t *p, wait_atomic_t_action_f action, unsigned int mode);
 struct wait_queue_head *bit_waitqueue(void *word, int bit);
 extern void __init wait_bit_init(void);
 
@@ -50,10 +52,11 @@ int wake_bit_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync
 		},								\
 	}
 
-extern int bit_wait(struct wait_bit_key *key, int bit);
-extern int bit_wait_io(struct wait_bit_key *key, int bit);
-extern int bit_wait_timeout(struct wait_bit_key *key, int bit);
-extern int bit_wait_io_timeout(struct wait_bit_key *key, int bit);
+extern int bit_wait(struct wait_bit_key *key, int mode);
+extern int bit_wait_io(struct wait_bit_key *key, int mode);
+extern int bit_wait_timeout(struct wait_bit_key *key, int mode);
+extern int bit_wait_io_timeout(struct wait_bit_key *key, int mode);
+extern int atomic_t_wait(atomic_t *counter, unsigned int mode);
 
 /**
  * wait_on_bit - wait for a bit to be cleared
@@ -250,7 +253,7 @@ wait_on_bit_lock_action(unsigned long *word, int bit, wait_bit_action_f *action,
  * outside of the target 'word'.
  */
 static inline
-int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
+int wait_on_atomic_t(atomic_t *val, wait_atomic_t_action_f action, unsigned mode)
 {
 	might_sleep();
 	if (atomic_read(val) == 0)
diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index f8159698aa4d..84cb3acd9260 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -183,7 +183,7 @@ static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mo
  */
 static __sched
 int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry,
-		       int (*action)(atomic_t *), unsigned mode)
+		       wait_atomic_t_action_f action, unsigned int mode)
 {
 	atomic_t *val;
 	int ret = 0;
@@ -193,7 +193,7 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 		val = wbq_entry->key.flags;
 		if (atomic_read(val) == 0)
 			break;
-		ret = (*action)(val);
+		ret = (*action)(val, mode);
 	} while (!ret && atomic_read(val) != 0);
 	finish_wait(wq_head, &wbq_entry->wq_entry);
 	return ret;
@@ -210,8 +210,9 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 		},							\
 	}
 
-__sched int out_of_line_wait_on_atomic_t(atomic_t *p, int (*action)(atomic_t *),
-					 unsigned mode)
+__sched int out_of_line_wait_on_atomic_t(atomic_t *p,
+					 wait_atomic_t_action_f action,
+					 unsigned int mode)
 {
 	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
 	DEFINE_WAIT_ATOMIC_T(wq_entry, p);
@@ -220,6 +221,15 @@ __sched int out_of_line_wait_on_atomic_t(atomic_t *p, int (*action)(atomic_t *),
 }
 EXPORT_SYMBOL(out_of_line_wait_on_atomic_t);
 
+__sched int atomic_t_wait(atomic_t *counter, unsigned int mode)
+{
+	schedule();
+	if (signal_pending_state(mode, current))
+		return -EINTR;
+	return 0;
+}
+EXPORT_SYMBOL(atomic_t_wait);
+
 /**
  * wake_up_atomic_t - Wake up a waiter on a atomic_t
  * @p: The atomic_t being waited on, a kernel virtual address


* [RFC PATCH 04/11] Add a function to start/reduce a timer
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
  2017-09-01 15:41 ` [RFC PATCH 03/11] Pass mode to wait_on_atomic_t() action funcs and provide default actions David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-10-20 12:20   ` Thomas Gleixner
  2017-11-09  0:33   ` David Howells
  2017-09-01 15:41 ` [RFC PATCH 05/11] afs: Lay the groundwork for supporting network namespaces David Howells
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, Thomas Gleixner, linux-kernel

Add a function, similar to mod_timer(), that will start a timer if it isn't
running and will modify it if it is running and has an expiry time longer
than the new time.  If the timer is running with an expiry time that's the
same or sooner, no change is made.

The function looks like:

	int reduce_timer(struct timer_list *timer, unsigned long expires);
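
As a hedged usage sketch (the caller names are invented): a single timer
that tracks the earliest of many deadlines only ever needs to be brought
forward, never pushed back:

	struct my_manager {
		struct timer_list	timer;
	};

	static void my_note_deadline(struct my_manager *mgr,
				     unsigned long expires)
	{
		/* Starts the timer if it isn't running; if it is running,
		 * only ever shortens it, so the earliest deadline wins.
		 */
		reduce_timer(&mgr->timer, expires);
	}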

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Thomas Gleixner <tglx@linutronix.de>
---

 include/linux/timer.h |    1 +
 kernel/time/timer.c   |   49 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index e6789b8757d5..6ec5d897606d 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -187,6 +187,7 @@ extern void add_timer_on(struct timer_list *timer, int cpu);
 extern int del_timer(struct timer_list * timer);
 extern int mod_timer(struct timer_list *timer, unsigned long expires);
 extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
+extern int reduce_timer(struct timer_list *timer, unsigned long expires);
 
 /*
  * The jiffies value which is added to now, when there is no timer
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8f5d1bf18854..6fcbad70a924 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -922,8 +922,11 @@ static struct timer_base *lock_timer_base(struct timer_list *timer,
 	}
 }
 
+#define MOD_TIMER_PENDING_ONLY	0x01
+#define MOD_TIMER_REDUCE	0x02
+
 static inline int
-__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
+__mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
 {
 	struct timer_base *base, *new_base;
 	unsigned int idx = UINT_MAX;
@@ -938,8 +941,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 	 * same array bucket then just return:
 	 */
 	if (timer_pending(timer)) {
-		if (timer->expires == expires)
-			return 1;
+		if (options & MOD_TIMER_REDUCE) {
+			if (time_before_eq(timer->expires, expires))
+				return 1;
+		} else {
+			if (timer->expires == expires)
+				return 1;
+		}
 
 		/*
 		 * We lock timer base and calculate the bucket index right
@@ -949,6 +957,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		 */
 		base = lock_timer_base(timer, &flags);
 
+		if (timer_pending(timer) &&
+		    options & MOD_TIMER_REDUCE &&
+		    time_before_eq(timer->expires, expires)) {
+			ret = 1;
+			goto out_unlock;
+		}
+
 		clk = base->clk;
 		idx = calc_wheel_index(expires, clk);
 
@@ -958,7 +973,10 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		 * subsequent call will exit in the expires check above.
 		 */
 		if (idx == timer_get_idx(timer)) {
-			timer->expires = expires;
+			if (!(options & MOD_TIMER_REDUCE))
+				timer->expires = expires;
+			else if (time_after(timer->expires, expires))
+				timer->expires = expires;
 			ret = 1;
 			goto out_unlock;
 		}
@@ -967,7 +985,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 	}
 
 	ret = detach_if_pending(timer, base, false);
-	if (!ret && pending_only)
+	if (!ret && options & MOD_TIMER_PENDING_ONLY)
 		goto out_unlock;
 
 	debug_activate(timer, expires);
@@ -1030,7 +1048,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
  */
 int mod_timer_pending(struct timer_list *timer, unsigned long expires)
 {
-	return __mod_timer(timer, expires, true);
+	return __mod_timer(timer, expires, MOD_TIMER_PENDING_ONLY);
 }
 EXPORT_SYMBOL(mod_timer_pending);
 
@@ -1056,11 +1074,26 @@ EXPORT_SYMBOL(mod_timer_pending);
  */
 int mod_timer(struct timer_list *timer, unsigned long expires)
 {
-	return __mod_timer(timer, expires, false);
+	return __mod_timer(timer, expires, 0);
 }
 EXPORT_SYMBOL(mod_timer);
 
 /**
+ * reduce_timer - modify a timer's timeout if it would reduce the timeout
+ * @timer: the timer to be modified
+ * @expires: new timeout in jiffies
+ *
+ * reduce_timer() is very similar to mod_timer(), except that it will only
+ * modify a running timer if that would reduce the expiration time (it will
+ * start a timer that isn't running).
+ */
+int reduce_timer(struct timer_list *timer, unsigned long expires)
+{
+	return __mod_timer(timer, expires, MOD_TIMER_REDUCE);
+}
+EXPORT_SYMBOL(reduce_timer);
+
+/**
  * add_timer - start a timer
  * @timer: the timer to be added
  *
@@ -1707,7 +1740,7 @@ signed long __sched schedule_timeout(signed long timeout)
 	expire = timeout + jiffies;
 
 	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
-	__mod_timer(&timer, expire, false);
+	__mod_timer(&timer, expire, 0);
 	schedule();
 	del_singleshot_timer_sync(&timer);
 


* [RFC PATCH 05/11] afs: Lay the groundwork for supporting network namespaces
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (2 preceding siblings ...)
  2017-09-01 15:41 ` [RFC PATCH 04/11] Add a function to start/reduce a timer David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 15:41 ` [RFC PATCH 06/11] afs: Add some protocol defs David Howells
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

Lay the groundwork for supporting network namespaces (netns) in the AFS
filesystem by moving various global features into a network-namespace struct
(afs_net) and providing an instance of this as a temporary global variable
that everything uses via accessor functions for the moment.

The following changes have been made:

 (1) Store the netns in the superblock info.  This will be obtained from
     the mounter's nsproxy on a manual mount and inherited from the parent
     superblock on an automount.

 (2) The cell list is made per-netns.  It can be viewed through
     /proc/net/afs/cells and also be modified by writing commands to that
     file.

 (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
     This is unset by default.

 (4) The 'rootcell' module parameter, which sets a cell and VL server list,
     modifies the init net namespace, thereby allowing an AFS root fs to be
     used, at least in theory.

 (5) The volume location lists and the file lock manager are made
     per-netns.

 (6) The AF_RXRPC socket and associated I/O bits are made per-ns.

The various workqueues remain global for the moment.
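
The per-netns state implied by the changes below roughly takes the following
shape (an approximation for orientation, not the exact struct from this
patch):

	struct afs_net {
		struct afs_uuid		uuid;
		struct socket		*socket;	/* AF_RXRPC socket */
		struct afs_cell		*ws_cell;	/* workstation (root) cell */
		struct list_head	cells;
		rwlock_t		cells_lock;
		struct rw_semaphore	cells_sem;	/* add/remove serialisation */
		wait_queue_head_t	cells_freeable_wq;
		struct list_head	proc_cells;
		struct rw_semaphore	proc_cells_sem;
	};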

Changes still to be made:

 (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
     from the old name.

 (2) A per-netns subsys needs to be registered for AFS into which it can
     store its per-netns data.

 (3) Rather than the AF_RXRPC socket being opened on module init, it needs
     to be opened on the creation of a superblock in that netns.

 (4) The socket needs to be closed when the last superblock using it is
     destroyed and all outstanding client calls on it have been completed.
     This prevents a reference loop on the namespace.

 (5) It is possible that several namespaces will want to use AFS, in which
     case each one will need its own UDP port.  These can either be set
     through /proc/net/afs/cm_port or the kernel can pick one at random.
     The init_ns gets 7001 by default.

Issues:

 (1) The DNS keyring needs net-namespacing.

 (2) Where do upcalls go (eg. DNS request-key upcall)?

 (3) Need something like open_socket_in_file_ns() syscall so that AFS
     command line tools attempting to operate on an AFS file/volume have
     their RPC calls go to the right place.
---

 fs/afs/afs.h               |    9 +++
 fs/afs/callback.c          |   24 +------
 fs/afs/cell.c              |  130 ++++++++++++++++++-------------------
 fs/afs/cmservice.c         |   26 ++++---
 fs/afs/flock.c             |   39 +----------
 fs/afs/fsclient.c          |   56 +++++++++++-----
 fs/afs/internal.h          |  153 ++++++++++++++++++++++++++++++++------------
 fs/afs/main.c              |  153 ++++++++++++++++++++++++++++++--------------
 fs/afs/proc.c              |   64 ++++++++++++------
 fs/afs/rxrpc.c             |  125 ++++++++++++++++++------------------
 fs/afs/server.c            |   82 +++++++++++-------------
 fs/afs/super.c             |   45 +++++++------
 fs/afs/vlclient.c          |   10 ++-
 fs/afs/vlocation.c         |  151 ++++++++++++++++++++-----------------------
 fs/afs/volume.c            |   10 +--
 include/uapi/linux/magic.h |    1 
 16 files changed, 589 insertions(+), 489 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index 3c462ff6db63..93053115bcfc 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -72,6 +72,15 @@ struct afs_callback {
 
 #define AFSCBMAX 50	/* maximum callbacks transferred per bulk op */
 
+struct afs_uuid {
+	__be32		time_low;			/* low part of timestamp */
+	__be16		time_mid;			/* mid part of timestamp */
+	__be16		time_hi_and_version;		/* high part of timestamp and version  */
+	__u8		clock_seq_hi_and_reserved;	/* clock seq hi and variant */
+	__u8		clock_seq_low;			/* clock seq low */
+	__u8		node[6];			/* spatially unique node ID (MAC addr) */
+};
+
 /*
  * AFS volume information
  */
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 25d404d22cae..d12dffb76b67 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -28,9 +28,7 @@ unsigned afs_vnode_update_timeout = 10;
 	CIRC_SPACE((server)->cb_break_head, (server)->cb_break_tail,	\
 		   ARRAY_SIZE((server)->cb_break))
 
-//static void afs_callback_updater(struct work_struct *);
-
-static struct workqueue_struct *afs_callback_update_worker;
+struct workqueue_struct *afs_callback_update_worker;
 
 /*
  * allow the fileserver to request callback state (re-)initialisation
@@ -343,7 +341,7 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
 	 *   had callbacks entirely, and the server will call us later to break
 	 *   them
 	 */
-	afs_fs_give_up_callbacks(server, true);
+	afs_fs_give_up_callbacks(server->cell->net, server, true);
 }
 
 /*
@@ -456,21 +454,3 @@ static void afs_callback_updater(struct work_struct *work)
 	afs_put_vnode(vl);
 }
 #endif
-
-/*
- * initialise the callback update process
- */
-int __init afs_callback_update_init(void)
-{
-	afs_callback_update_worker = alloc_ordered_workqueue("kafs_callbackd",
-							     WQ_MEM_RECLAIM);
-	return afs_callback_update_worker ? 0 : -ENOMEM;
-}
-
-/*
- * shut down the callback update process
- */
-void afs_callback_update_kill(void)
-{
-	destroy_workqueue(afs_callback_update_worker);
-}
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index ca0a3cf93791..bd570fa539a0 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -18,20 +18,12 @@
 #include <keys/rxrpc-type.h>
 #include "internal.h"
 
-DECLARE_RWSEM(afs_proc_cells_sem);
-LIST_HEAD(afs_proc_cells);
-
-static LIST_HEAD(afs_cells);
-static DEFINE_RWLOCK(afs_cells_lock);
-static DECLARE_RWSEM(afs_cells_sem); /* add/remove serialisation */
-static DECLARE_WAIT_QUEUE_HEAD(afs_cells_freeable_wq);
-static struct afs_cell *afs_cell_root;
-
 /*
  * allocate a cell record and fill in its name, VL server address list and
  * allocate an anonymous key
  */
-static struct afs_cell *afs_cell_alloc(const char *name, unsigned namelen,
+static struct afs_cell *afs_cell_alloc(struct afs_net *net,
+				       const char *name, unsigned namelen,
 				       char *vllist)
 {
 	struct afs_cell *cell;
@@ -62,6 +54,7 @@ static struct afs_cell *afs_cell_alloc(const char *name, unsigned namelen,
 
 	atomic_set(&cell->usage, 1);
 	INIT_LIST_HEAD(&cell->link);
+	cell->net = net;
 	rwlock_init(&cell->servers_lock);
 	INIT_LIST_HEAD(&cell->servers);
 	init_rwsem(&cell->vl_sem);
@@ -142,12 +135,14 @@ static struct afs_cell *afs_cell_alloc(const char *name, unsigned namelen,
 
 /*
  * afs_cell_crate() - create a cell record
+ * @net:	The network namespace
  * @name:	is the name of the cell.
  * @namsesz:	is the strlen of the cell name.
  * @vllist:	is a colon separated list of IP addresses in "a.b.c.d" format.
  * @retref:	is T to return the cell reference when the cell exists.
  */
-struct afs_cell *afs_cell_create(const char *name, unsigned namesz,
+struct afs_cell *afs_cell_create(struct afs_net *net,
+				 const char *name, unsigned namesz,
 				 char *vllist, bool retref)
 {
 	struct afs_cell *cell;
@@ -155,23 +150,23 @@ struct afs_cell *afs_cell_create(const char *name, unsigned namesz,
 
 	_enter("%*.*s,%s", namesz, namesz, name ?: "", vllist);
 
-	down_write(&afs_cells_sem);
-	read_lock(&afs_cells_lock);
-	list_for_each_entry(cell, &afs_cells, link) {
+	down_write(&net->cells_sem);
+	read_lock(&net->cells_lock);
+	list_for_each_entry(cell, &net->cells, link) {
 		if (strncasecmp(cell->name, name, namesz) == 0)
 			goto duplicate_name;
 	}
-	read_unlock(&afs_cells_lock);
+	read_unlock(&net->cells_lock);
 
-	cell = afs_cell_alloc(name, namesz, vllist);
+	cell = afs_cell_alloc(net, name, namesz, vllist);
 	if (IS_ERR(cell)) {
 		_leave(" = %ld", PTR_ERR(cell));
-		up_write(&afs_cells_sem);
+		up_write(&net->cells_sem);
 		return cell;
 	}
 
 	/* add a proc directory for this cell */
-	ret = afs_proc_cell_setup(cell);
+	ret = afs_proc_cell_setup(net, cell);
 	if (ret < 0)
 		goto error;
 
@@ -183,20 +178,20 @@ struct afs_cell *afs_cell_create(const char *name, unsigned namesz,
 #endif
 
 	/* add to the cell lists */
-	write_lock(&afs_cells_lock);
-	list_add_tail(&cell->link, &afs_cells);
-	write_unlock(&afs_cells_lock);
+	write_lock(&net->cells_lock);
+	list_add_tail(&cell->link, &net->cells);
+	write_unlock(&net->cells_lock);
 
-	down_write(&afs_proc_cells_sem);
-	list_add_tail(&cell->proc_link, &afs_proc_cells);
-	up_write(&afs_proc_cells_sem);
-	up_write(&afs_cells_sem);
+	down_write(&net->proc_cells_sem);
+	list_add_tail(&cell->proc_link, &net->proc_cells);
+	up_write(&net->proc_cells_sem);
+	up_write(&net->cells_sem);
 
 	_leave(" = %p", cell);
 	return cell;
 
 error:
-	up_write(&afs_cells_sem);
+	up_write(&net->cells_sem);
 	key_put(cell->anonymous_key);
 	kfree(cell);
 	_leave(" = %d", ret);
@@ -206,8 +201,8 @@ struct afs_cell *afs_cell_create(const char *name, unsigned namesz,
 	if (retref && !IS_ERR(cell))
 		afs_get_cell(cell);
 
-	read_unlock(&afs_cells_lock);
-	up_write(&afs_cells_sem);
+	read_unlock(&net->cells_lock);
+	up_write(&net->cells_sem);
 
 	if (retref) {
 		_leave(" = %p", cell);
@@ -223,7 +218,7 @@ struct afs_cell *afs_cell_create(const char *name, unsigned namesz,
  * - can be called with a module parameter string
  * - can be called from a write to /proc/fs/afs/rootcell
  */
-int afs_cell_init(char *rootcell)
+int afs_cell_init(struct afs_net *net, char *rootcell)
 {
 	struct afs_cell *old_root, *new_root;
 	char *cp;
@@ -245,17 +240,17 @@ int afs_cell_init(char *rootcell)
 		*cp++ = 0;
 
 	/* allocate a cell record for the root cell */
-	new_root = afs_cell_create(rootcell, strlen(rootcell), cp, false);
+	new_root = afs_cell_create(net, rootcell, strlen(rootcell), cp, false);
 	if (IS_ERR(new_root)) {
 		_leave(" = %ld", PTR_ERR(new_root));
 		return PTR_ERR(new_root);
 	}
 
 	/* install the new cell */
-	write_lock(&afs_cells_lock);
-	old_root = afs_cell_root;
-	afs_cell_root = new_root;
-	write_unlock(&afs_cells_lock);
+	write_lock(&net->cells_lock);
+	old_root = net->ws_cell;
+	net->ws_cell = new_root;
+	write_unlock(&net->cells_lock);
 	afs_put_cell(old_root);
 
 	_leave(" = 0");
@@ -265,19 +260,20 @@ int afs_cell_init(char *rootcell)
 /*
  * lookup a cell record
  */
-struct afs_cell *afs_cell_lookup(const char *name, unsigned namesz,
+struct afs_cell *afs_cell_lookup(struct afs_net *net,
+				 const char *name, unsigned namesz,
 				 bool dns_cell)
 {
 	struct afs_cell *cell;
 
 	_enter("\"%*.*s\",", namesz, namesz, name ?: "");
 
-	down_read(&afs_cells_sem);
-	read_lock(&afs_cells_lock);
+	down_read(&net->cells_sem);
+	read_lock(&net->cells_lock);
 
 	if (name) {
 		/* if the cell was named, look for it in the cell record list */
-		list_for_each_entry(cell, &afs_cells, link) {
+		list_for_each_entry(cell, &net->cells, link) {
 			if (strncmp(cell->name, name, namesz) == 0) {
 				afs_get_cell(cell);
 				goto found;
@@ -289,7 +285,7 @@ struct afs_cell *afs_cell_lookup(const char *name, unsigned namesz,
 	found:
 		;
 	} else {
-		cell = afs_cell_root;
+		cell = net->ws_cell;
 		if (!cell) {
 			/* this should not happen unless user tries to mount
 			 * when root cell is not set. Return an impossibly
@@ -304,16 +300,16 @@ struct afs_cell *afs_cell_lookup(const char *name, unsigned namesz,
 
 	}
 
-	read_unlock(&afs_cells_lock);
-	up_read(&afs_cells_sem);
+	read_unlock(&net->cells_lock);
+	up_read(&net->cells_sem);
 	_leave(" = %p", cell);
 	return cell;
 
 create_cell:
-	read_unlock(&afs_cells_lock);
-	up_read(&afs_cells_sem);
+	read_unlock(&net->cells_lock);
+	up_read(&net->cells_sem);
 
-	cell = afs_cell_create(name, namesz, NULL, true);
+	cell = afs_cell_create(net, name, namesz, NULL, true);
 
 	_leave(" = %p", cell);
 	return cell;
@@ -325,14 +321,14 @@ struct afs_cell *afs_cell_lookup(const char *name, unsigned namesz,
  */
 struct afs_cell *afs_get_cell_maybe(struct afs_cell *cell)
 {
-	write_lock(&afs_cells_lock);
+	write_lock(&net->cells_lock);
 
 	if (cell && !list_empty(&cell->link))
 		afs_get_cell(cell);
 	else
 		cell = NULL;
 
-	write_unlock(&afs_cells_lock);
+	write_unlock(&net->cells_lock);
 	return cell;
 }
 #endif  /*  0  */
@@ -351,10 +347,10 @@ void afs_put_cell(struct afs_cell *cell)
 
 	/* to prevent a race, the decrement and the dequeue must be effectively
 	 * atomic */
-	write_lock(&afs_cells_lock);
+	write_lock(&cell->net->cells_lock);
 
 	if (likely(!atomic_dec_and_test(&cell->usage))) {
-		write_unlock(&afs_cells_lock);
+		write_unlock(&cell->net->cells_lock);
 		_leave("");
 		return;
 	}
@@ -362,19 +358,19 @@ void afs_put_cell(struct afs_cell *cell)
 	ASSERT(list_empty(&cell->servers));
 	ASSERT(list_empty(&cell->vl_list));
 
-	write_unlock(&afs_cells_lock);
+	wake_up(&cell->net->cells_freeable_wq);
 
-	wake_up(&afs_cells_freeable_wq);
+	write_unlock(&cell->net->cells_lock);
 
 	_leave(" [unused]");
 }
 
 /*
  * destroy a cell record
- * - must be called with the afs_cells_sem write-locked
+ * - must be called with the net->cells_sem write-locked
  * - cell->link should have been broken by the caller
  */
-static void afs_cell_destroy(struct afs_cell *cell)
+static void afs_cell_destroy(struct afs_net *net, struct afs_cell *cell)
 {
 	_enter("%p{%d,%s}", cell, atomic_read(&cell->usage), cell->name);
 
@@ -387,14 +383,14 @@ static void afs_cell_destroy(struct afs_cell *cell)
 
 		_debug("wait for cell %s", cell->name);
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		add_wait_queue(&afs_cells_freeable_wq, &myself);
+		add_wait_queue(&net->cells_freeable_wq, &myself);
 
 		while (atomic_read(&cell->usage) > 0) {
 			schedule();
 			set_current_state(TASK_UNINTERRUPTIBLE);
 		}
 
-		remove_wait_queue(&afs_cells_freeable_wq, &myself);
+		remove_wait_queue(&net->cells_freeable_wq, &myself);
 		set_current_state(TASK_RUNNING);
 	}
 
@@ -403,11 +399,11 @@ static void afs_cell_destroy(struct afs_cell *cell)
 	ASSERT(list_empty(&cell->servers));
 	ASSERT(list_empty(&cell->vl_list));
 
-	afs_proc_cell_remove(cell);
+	afs_proc_cell_remove(net, cell);
 
-	down_write(&afs_proc_cells_sem);
+	down_write(&net->proc_cells_sem);
 	list_del_init(&cell->proc_link);
-	up_write(&afs_proc_cells_sem);
+	up_write(&net->proc_cells_sem);
 
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_relinquish_cookie(cell->cache, 0);
@@ -422,39 +418,39 @@ static void afs_cell_destroy(struct afs_cell *cell)
  * purge in-memory cell database on module unload or afs_init() failure
  * - the timeout daemon is stopped before calling this
  */
-void afs_cell_purge(void)
+void afs_cell_purge(struct afs_net *net)
 {
 	struct afs_cell *cell;
 
 	_enter("");
 
-	afs_put_cell(afs_cell_root);
+	afs_put_cell(net->ws_cell);
 
-	down_write(&afs_cells_sem);
+	down_write(&net->cells_sem);
 
-	while (!list_empty(&afs_cells)) {
+	while (!list_empty(&net->cells)) {
 		cell = NULL;
 
 		/* remove the next cell from the front of the list */
-		write_lock(&afs_cells_lock);
+		write_lock(&net->cells_lock);
 
-		if (!list_empty(&afs_cells)) {
-			cell = list_entry(afs_cells.next,
+		if (!list_empty(&net->cells)) {
+			cell = list_entry(net->cells.next,
 					  struct afs_cell, link);
 			list_del_init(&cell->link);
 		}
 
-		write_unlock(&afs_cells_lock);
+		write_unlock(&net->cells_lock);
 
 		if (cell) {
 			_debug("PURGING CELL %s (%d)",
 			       cell->name, atomic_read(&cell->usage));
 
 			/* now the cell should be left with no references */
-			afs_cell_destroy(cell);
+			afs_cell_destroy(net, cell);
 		}
 	}
 
-	up_write(&afs_cells_sem);
+	up_write(&net->cells_sem);
 	_leave("");
 }
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 782d4d05a53b..30ce4be4165f 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -193,7 +193,7 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 
 	switch (call->unmarshall) {
 	case 0:
-		rxrpc_kernel_get_peer(afs_socket, call->rxcall, &srx);
+		rxrpc_kernel_get_peer(call->net->socket, call->rxcall, &srx);
 		call->offset = 0;
 		call->unmarshall++;
 
@@ -290,7 +290,7 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 
 	/* we'll need the file server record as that tells us which set of
 	 * vnodes to operate upon */
-	server = afs_find_server(&srx);
+	server = afs_find_server(call->net, &srx);
 	if (!server)
 		return -ENOTCONN;
 	call->server = server;
@@ -324,7 +324,7 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *call)
 
 	_enter("");
 
-	rxrpc_kernel_get_peer(afs_socket, call->rxcall, &srx);
+	rxrpc_kernel_get_peer(call->net->socket, call->rxcall, &srx);
 
 	ret = afs_extract_data(call, NULL, 0, false);
 	if (ret < 0)
@@ -335,7 +335,7 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *call)
 
 	/* we'll need the file server record as that tells us which set of
 	 * vnodes to operate upon */
-	server = afs_find_server(&srx);
+	server = afs_find_server(call->net, &srx);
 	if (!server)
 		return -ENOTCONN;
 	call->server = server;
@@ -357,7 +357,7 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
 
 	_enter("");
 
-	rxrpc_kernel_get_peer(afs_socket, call->rxcall, &srx);
+	rxrpc_kernel_get_peer(call->net->socket, call->rxcall, &srx);
 
 	_enter("{%u}", call->unmarshall);
 
@@ -407,7 +407,7 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
 
 	/* we'll need the file server record as that tells us which set of
 	 * vnodes to operate upon */
-	server = afs_find_server(&srx);
+	server = afs_find_server(call->net, &srx);
 	if (!server)
 		return -ENOTCONN;
 	call->server = server;
@@ -461,7 +461,7 @@ static void SRXAFSCB_ProbeUuid(struct work_struct *work)
 
 	_enter("");
 
-	if (memcmp(r, &afs_uuid, sizeof(afs_uuid)) == 0)
+	if (memcmp(r, &call->net->uuid, sizeof(call->net->uuid)) == 0)
 		reply.match = htonl(0);
 	else
 		reply.match = htonl(1);
@@ -568,13 +568,13 @@ static void SRXAFSCB_TellMeAboutYourself(struct work_struct *work)
 	memset(&reply, 0, sizeof(reply));
 	reply.ia.nifs = htonl(nifs);
 
-	reply.ia.uuid[0] = afs_uuid.time_low;
-	reply.ia.uuid[1] = htonl(ntohs(afs_uuid.time_mid));
-	reply.ia.uuid[2] = htonl(ntohs(afs_uuid.time_hi_and_version));
-	reply.ia.uuid[3] = htonl((s8) afs_uuid.clock_seq_hi_and_reserved);
-	reply.ia.uuid[4] = htonl((s8) afs_uuid.clock_seq_low);
+	reply.ia.uuid[0] = call->net->uuid.time_low;
+	reply.ia.uuid[1] = htonl(ntohs(call->net->uuid.time_mid));
+	reply.ia.uuid[2] = htonl(ntohs(call->net->uuid.time_hi_and_version));
+	reply.ia.uuid[3] = htonl((s8) call->net->uuid.clock_seq_hi_and_reserved);
+	reply.ia.uuid[4] = htonl((s8) call->net->uuid.clock_seq_low);
 	for (loop = 0; loop < 6; loop++)
-		reply.ia.uuid[loop + 5] = htonl((s8) afs_uuid.node[loop]);
+		reply.ia.uuid[loop + 5] = htonl((s8) call->net->uuid.node[loop]);
 
 	if (ifs) {
 		for (loop = 0; loop < nifs; loop++) {
diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 3191dff2c156..559ac00af5f7 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -14,48 +14,17 @@
 #define AFS_LOCK_GRANTED	0
 #define AFS_LOCK_PENDING	1
 
+struct workqueue_struct *afs_lock_manager;
+
 static void afs_fl_copy_lock(struct file_lock *new, struct file_lock *fl);
 static void afs_fl_release_private(struct file_lock *fl);
 
-static struct workqueue_struct *afs_lock_manager;
-static DEFINE_MUTEX(afs_lock_manager_mutex);
-
 static const struct file_lock_operations afs_lock_ops = {
 	.fl_copy_lock		= afs_fl_copy_lock,
 	.fl_release_private	= afs_fl_release_private,
 };
 
 /*
- * initialise the lock manager thread if it isn't already running
- */
-static int afs_init_lock_manager(void)
-{
-	int ret;
-
-	ret = 0;
-	if (!afs_lock_manager) {
-		mutex_lock(&afs_lock_manager_mutex);
-		if (!afs_lock_manager) {
-			afs_lock_manager = alloc_workqueue("kafs_lockd",
-							   WQ_MEM_RECLAIM, 0);
-			if (!afs_lock_manager)
-				ret = -ENOMEM;
-		}
-		mutex_unlock(&afs_lock_manager_mutex);
-	}
-	return ret;
-}
-
-/*
- * destroy the lock manager thread if it's running
- */
-void __exit afs_kill_lock_manager(void)
-{
-	if (afs_lock_manager)
-		destroy_workqueue(afs_lock_manager);
-}
-
-/*
  * if the callback is broken on this vnode, then the lock may now be available
  */
 void afs_lock_may_be_available(struct afs_vnode *vnode)
@@ -264,10 +233,6 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
 	if (fl->fl_start != 0 || fl->fl_end != OFFSET_MAX)
 		return -EINVAL;
 
-	ret = afs_init_lock_manager();
-	if (ret < 0)
-		return ret;
-
 	fl->fl_ops = &afs_lock_ops;
 	INIT_LIST_HEAD(&fl->fl_u.afs.link);
 	fl->fl_u.afs.state = AFS_LOCK_PENDING;
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 19f76ae36982..ce6f0159e1d4 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -284,12 +284,13 @@ int afs_fs_fetch_file_status(struct afs_server *server,
 			     bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter(",%x,{%x:%u},,",
 	       key_serial(key), vnode->fid.vid, vnode->fid.vnode);
 
-	call = afs_alloc_flat_call(&afs_RXFSFetchStatus, 16, (21 + 3 + 6) * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSFetchStatus, 16, (21 + 3 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -490,11 +491,12 @@ static int afs_fs_fetch_data64(struct afs_server *server,
 			       bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter("");
 
-	call = afs_alloc_flat_call(&afs_RXFSFetchData64, 32, (21 + 3 + 6) * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSFetchData64, 32, (21 + 3 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -531,6 +533,7 @@ int afs_fs_fetch_data(struct afs_server *server,
 		      bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	if (upper_32_bits(req->pos) ||
@@ -540,7 +543,7 @@ int afs_fs_fetch_data(struct afs_server *server,
 
 	_enter("");
 
-	call = afs_alloc_flat_call(&afs_RXFSFetchData, 24, (21 + 3 + 6) * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSFetchData, 24, (21 + 3 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -590,7 +593,8 @@ static const struct afs_call_type afs_RXFSGiveUpCallBacks = {
  * give up a set of callbacks
  * - the callbacks are held in the server->cb_break ring
  */
-int afs_fs_give_up_callbacks(struct afs_server *server,
+int afs_fs_give_up_callbacks(struct afs_net *net,
+			     struct afs_server *server,
 			     bool async)
 {
 	struct afs_call *call;
@@ -610,7 +614,7 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
 
 	_debug("break %zu callbacks", ncallbacks);
 
-	call = afs_alloc_flat_call(&afs_RXFSGiveUpCallBacks,
+	call = afs_alloc_flat_call(net, &afs_RXFSGiveUpCallBacks,
 				   12 + ncallbacks * 6 * 4, 0);
 	if (!call)
 		return -ENOMEM;
@@ -699,6 +703,7 @@ int afs_fs_create(struct afs_server *server,
 		  bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	size_t namesz, reqsz, padsz;
 	__be32 *bp;
 
@@ -708,7 +713,7 @@ int afs_fs_create(struct afs_server *server,
 	padsz = (4 - (namesz & 3)) & 3;
 	reqsz = (5 * 4) + namesz + padsz + (6 * 4);
 
-	call = afs_alloc_flat_call(&afs_RXFSCreateXXXX, reqsz,
+	call = afs_alloc_flat_call(net, &afs_RXFSCreateXXXX, reqsz,
 				   (3 + 21 + 21 + 3 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
@@ -789,6 +794,7 @@ int afs_fs_remove(struct afs_server *server,
 		  bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	size_t namesz, reqsz, padsz;
 	__be32 *bp;
 
@@ -798,7 +804,7 @@ int afs_fs_remove(struct afs_server *server,
 	padsz = (4 - (namesz & 3)) & 3;
 	reqsz = (5 * 4) + namesz + padsz;
 
-	call = afs_alloc_flat_call(&afs_RXFSRemoveXXXX, reqsz, (21 + 6) * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSRemoveXXXX, reqsz, (21 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -870,6 +876,7 @@ int afs_fs_link(struct afs_server *server,
 		bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	size_t namesz, reqsz, padsz;
 	__be32 *bp;
 
@@ -879,7 +886,7 @@ int afs_fs_link(struct afs_server *server,
 	padsz = (4 - (namesz & 3)) & 3;
 	reqsz = (5 * 4) + namesz + padsz + (3 * 4);
 
-	call = afs_alloc_flat_call(&afs_RXFSLink, reqsz, (21 + 21 + 6) * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSLink, reqsz, (21 + 21 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -958,6 +965,7 @@ int afs_fs_symlink(struct afs_server *server,
 		   bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	size_t namesz, reqsz, padsz, c_namesz, c_padsz;
 	__be32 *bp;
 
@@ -971,7 +979,7 @@ int afs_fs_symlink(struct afs_server *server,
 
 	reqsz = (6 * 4) + namesz + padsz + c_namesz + c_padsz + (6 * 4);
 
-	call = afs_alloc_flat_call(&afs_RXFSSymlink, reqsz,
+	call = afs_alloc_flat_call(net, &afs_RXFSSymlink, reqsz,
 				   (3 + 21 + 21 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
@@ -1062,6 +1070,7 @@ int afs_fs_rename(struct afs_server *server,
 		  bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(orig_dvnode);
 	size_t reqsz, o_namesz, o_padsz, n_namesz, n_padsz;
 	__be32 *bp;
 
@@ -1078,7 +1087,7 @@ int afs_fs_rename(struct afs_server *server,
 		(3 * 4) +
 		4 + n_namesz + n_padsz;
 
-	call = afs_alloc_flat_call(&afs_RXFSRename, reqsz, (21 + 21 + 6) * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSRename, reqsz, (21 + 21 + 6) * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -1172,12 +1181,13 @@ static int afs_fs_store_data64(struct afs_server *server,
 {
 	struct afs_vnode *vnode = wb->vnode;
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter(",%x,{%x:%u},,",
 	       key_serial(wb->key), vnode->fid.vid, vnode->fid.vnode);
 
-	call = afs_alloc_flat_call(&afs_RXFSStoreData64,
+	call = afs_alloc_flat_call(net, &afs_RXFSStoreData64,
 				   (4 + 6 + 3 * 2) * 4,
 				   (21 + 6) * 4);
 	if (!call)
@@ -1230,6 +1240,7 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 {
 	struct afs_vnode *vnode = wb->vnode;
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	loff_t size, pos, i_size;
 	__be32 *bp;
 
@@ -1254,7 +1265,7 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 		return afs_fs_store_data64(server, wb, first, last, offset, to,
 					   size, pos, i_size, async);
 
-	call = afs_alloc_flat_call(&afs_RXFSStoreData,
+	call = afs_alloc_flat_call(net, &afs_RXFSStoreData,
 				   (4 + 6 + 3) * 4,
 				   (21 + 6) * 4);
 	if (!call)
@@ -1356,6 +1367,7 @@ static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
 				 bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter(",%x,{%x:%u},,",
@@ -1363,7 +1375,7 @@ static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
 
 	ASSERT(attr->ia_valid & ATTR_SIZE);
 
-	call = afs_alloc_flat_call(&afs_RXFSStoreData64_as_Status,
+	call = afs_alloc_flat_call(net, &afs_RXFSStoreData64_as_Status,
 				   (4 + 6 + 3 * 2) * 4,
 				   (21 + 6) * 4);
 	if (!call)
@@ -1404,6 +1416,7 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 			       bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter(",%x,{%x:%u},,",
@@ -1414,7 +1427,7 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 		return afs_fs_setattr_size64(server, key, vnode, attr,
 					     async);
 
-	call = afs_alloc_flat_call(&afs_RXFSStoreData_as_Status,
+	call = afs_alloc_flat_call(net, &afs_RXFSStoreData_as_Status,
 				   (4 + 6 + 3) * 4,
 				   (21 + 6) * 4);
 	if (!call)
@@ -1452,6 +1465,7 @@ int afs_fs_setattr(struct afs_server *server, struct key *key,
 		   bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	if (attr->ia_valid & ATTR_SIZE)
@@ -1461,7 +1475,7 @@ int afs_fs_setattr(struct afs_server *server, struct key *key,
 	_enter(",%x,{%x:%u},,",
 	       key_serial(key), vnode->fid.vid, vnode->fid.vnode);
 
-	call = afs_alloc_flat_call(&afs_RXFSStoreStatus,
+	call = afs_alloc_flat_call(net, &afs_RXFSStoreStatus,
 				   (4 + 6) * 4,
 				   (21 + 6) * 4);
 	if (!call)
@@ -1687,6 +1701,7 @@ int afs_fs_get_volume_status(struct afs_server *server,
 			     bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 	void *tmpbuf;
 
@@ -1696,7 +1711,7 @@ int afs_fs_get_volume_status(struct afs_server *server,
 	if (!tmpbuf)
 		return -ENOMEM;
 
-	call = afs_alloc_flat_call(&afs_RXFSGetVolumeStatus, 2 * 4, 12 * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSGetVolumeStatus, 2 * 4, 12 * 4);
 	if (!call) {
 		kfree(tmpbuf);
 		return -ENOMEM;
@@ -1779,11 +1794,12 @@ int afs_fs_set_lock(struct afs_server *server,
 		    bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter("");
 
-	call = afs_alloc_flat_call(&afs_RXFSSetLock, 5 * 4, 6 * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSSetLock, 5 * 4, 6 * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -1812,11 +1828,12 @@ int afs_fs_extend_lock(struct afs_server *server,
 		       bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter("");
 
-	call = afs_alloc_flat_call(&afs_RXFSExtendLock, 4 * 4, 6 * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSExtendLock, 4 * 4, 6 * 4);
 	if (!call)
 		return -ENOMEM;
 
@@ -1844,11 +1861,12 @@ int afs_fs_release_lock(struct afs_server *server,
 			bool async)
 {
 	struct afs_call *call;
+	struct afs_net *net = afs_v2net(vnode);
 	__be32 *bp;
 
 	_enter("");
 
-	call = afs_alloc_flat_call(&afs_RXFSReleaseLock, 4 * 4, 6 * 4);
+	call = afs_alloc_flat_call(net, &afs_RXFSReleaseLock, 4 * 4, 6 * 4);
 	if (!call)
 		return -ENOMEM;
 
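
The fsclient.c changes above are all the same mechanical transformation: each RPC
dispatcher derives the afs_net from the vnode it is operating on (or has it passed
in by the caller where there is no vnode, as with afs_fs_give_up_callbacks()) and
threads it through to afs_alloc_flat_call().  As a rough sketch of the shape only,
where afs_fs_example_op, afs_RXFSExampleOp and the buffer sizes are invented for
illustration and only afs_v2net() and the new afs_alloc_flat_call() signature come
from this patch:

	int afs_fs_example_op(struct afs_server *server, struct key *key,
			      struct afs_vnode *vnode, bool async)
	{
		struct afs_call *call;
		struct afs_net *net = afs_v2net(vnode);	/* currently always &__afs_net */

		call = afs_alloc_flat_call(net, &afs_RXFSExampleOp, 4 * 4, (21 + 6) * 4);
		if (!call)
			return -ENOMEM;

		call->key = key;
		/* ... marshal the request into call->request as the real dispatchers do ... */

		return afs_make_call(&server->addr, call, GFP_NOFS, async);
	}
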
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 82e16556afea..e0484a38c9ce 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -21,6 +21,7 @@
 #include <linux/fscache.h>
 #include <linux/backing-dev.h>
 #include <linux/uuid.h>
+#include <net/net_namespace.h>
 #include <net/af_rxrpc.h>
 
 #include "afs.h"
@@ -48,6 +49,7 @@ struct afs_mount_params {
 	afs_voltype_t		type;		/* type of volume requested */
 	int			volnamesz;	/* size of volume name */
 	const char		*volname;	/* name of volume to mount */
+	struct afs_net		*net;		/* Network namespace in effect */
 	struct afs_cell		*cell;		/* cell in which to find volume */
 	struct afs_volume	*volume;	/* volume record */
 	struct key		*key;		/* key to use for secure mounting */
@@ -62,6 +64,7 @@ enum afs_call_state {
 	AFS_CALL_AWAIT_ACK,	/* awaiting final ACK of incoming call */
 	AFS_CALL_COMPLETE,	/* Completed or failed */
 };
+
 /*
  * a record of an in-progress RxRPC call
  */
@@ -72,6 +75,7 @@ struct afs_call {
 	struct work_struct	work;		/* actual work processor */
 	struct rxrpc_call	*rxcall;	/* RxRPC call handle */
 	struct key		*key;		/* security for this call */
+	struct afs_net		*net;		/* The network namespace */
 	struct afs_server	*server;	/* server affected by incoming CM call */
 	void			*request;	/* request data (first part) */
 	struct address_space	*mapping;	/* page set */
@@ -172,6 +176,7 @@ struct afs_writeback {
  * - there's one superblock per volume
  */
 struct afs_super_info {
+	struct afs_net		*net;		/* Network namespace */
 	struct afs_volume	*volume;	/* volume record */
 	char			rwparent;	/* T if parent is R/W AFS volume */
 };
@@ -192,11 +197,61 @@ struct afs_cache_cell {
 };
 
 /*
+ * AFS network namespace record.
+ */
+struct afs_net {
+	struct afs_uuid		uuid;
+	bool			live;		/* F if this namespace is being removed */
+
+	/* AF_RXRPC I/O stuff */
+	struct socket		*socket;
+	struct afs_call		*spare_incoming_call;
+	struct work_struct	charge_preallocation_work;
+	struct mutex		socket_mutex;
+	atomic_t		nr_outstanding_calls;
+	atomic_t		nr_superblocks;
+
+	/* Cell database */
+	struct list_head	cells;
+	struct afs_cell		*ws_cell;
+	rwlock_t		cells_lock;
+	struct rw_semaphore	cells_sem;
+	wait_queue_head_t	cells_freeable_wq;
+
+	struct rw_semaphore	proc_cells_sem;
+	struct list_head	proc_cells;
+
+	/* Volume location database */
+	struct list_head	vl_updates;		/* VL records in need-update order */
+	struct list_head	vl_graveyard;		/* Inactive VL records */
+	struct delayed_work	vl_reaper;
+	struct delayed_work	vl_updater;
+	spinlock_t		vl_updates_lock;
+	spinlock_t		vl_graveyard_lock;
+
+	/* File locking renewal management */
+	struct mutex		lock_manager_mutex;
+
+	/* Server database */
+	struct rb_root		servers;		/* Active servers */
+	rwlock_t		servers_lock;
+	struct list_head	server_graveyard;	/* Inactive server LRU list */
+	spinlock_t		server_graveyard_lock;
+	struct delayed_work	server_reaper;
+
+	/* Misc */
+	struct proc_dir_entry	*proc_afs;		/* /proc/fs/afs directory */
+};
+
+extern struct afs_net __afs_net;	/* Dummy AFS network namespace; TODO: replace with real netns */
+
+/*
  * AFS cell record
  */
 struct afs_cell {
 	atomic_t		usage;
 	struct list_head	link;		/* main cell list link */
+	struct afs_net		*net;		/* The network namespace */
 	struct key		*anonymous_key;	/* anonymous user key for this cell */
 	struct list_head	proc_link;	/* /proc cell list link */
 #ifdef CONFIG_AFS_FSCACHE
@@ -410,15 +465,6 @@ struct afs_interface {
 	unsigned	mtu;		/* MTU of interface */
 };
 
-struct afs_uuid {
-	__be32		time_low;			/* low part of timestamp */
-	__be16		time_mid;			/* mid part of timestamp */
-	__be16		time_hi_and_version;		/* high part of timestamp and version  */
-	__u8		clock_seq_hi_and_reserved;	/* clock seq hi and variant */
-	__u8		clock_seq_low;			/* clock seq low */
-	__u8		node[6];			/* spatially unique node ID (MAC addr) */
-};
-
 /*****************************************************************************/
 /*
  * cache.c
@@ -439,6 +485,8 @@ extern struct fscache_cookie_def afs_vnode_cache_index_def;
 /*
  * callback.c
  */
+extern struct workqueue_struct *afs_callback_update_worker;
+
 extern void afs_init_callback_state(struct afs_server *);
 extern void afs_broken_callback_work(struct work_struct *);
 extern void afs_break_callbacks(struct afs_server *, size_t,
@@ -447,22 +495,17 @@ extern void afs_discard_callback_on_delete(struct afs_vnode *);
 extern void afs_give_up_callback(struct afs_vnode *);
 extern void afs_dispatch_give_up_callbacks(struct work_struct *);
 extern void afs_flush_callback_breaks(struct afs_server *);
-extern int __init afs_callback_update_init(void);
-extern void afs_callback_update_kill(void);
 
 /*
  * cell.c
  */
-extern struct rw_semaphore afs_proc_cells_sem;
-extern struct list_head afs_proc_cells;
-
 #define afs_get_cell(C) do { atomic_inc(&(C)->usage); } while(0)
-extern int afs_cell_init(char *);
-extern struct afs_cell *afs_cell_create(const char *, unsigned, char *, bool);
-extern struct afs_cell *afs_cell_lookup(const char *, unsigned, bool);
+extern int __net_init afs_cell_init(struct afs_net *, char *);
+extern struct afs_cell *afs_cell_create(struct afs_net *, const char *, unsigned, char *, bool);
+extern struct afs_cell *afs_cell_lookup(struct afs_net *, const char *, unsigned, bool);
 extern struct afs_cell *afs_grab_cell(struct afs_cell *);
 extern void afs_put_cell(struct afs_cell *);
-extern void afs_cell_purge(void);
+extern void __net_exit afs_cell_purge(struct afs_net *);
 
 /*
  * cmservice.c
@@ -491,7 +534,8 @@ extern void afs_put_read(struct afs_read *);
 /*
  * flock.c
  */
-extern void __exit afs_kill_lock_manager(void);
+extern struct workqueue_struct *afs_lock_manager;
+
 extern void afs_lock_work(struct work_struct *);
 extern void afs_lock_may_be_available(struct afs_vnode *);
 extern int afs_lock(struct file *, int, struct file_lock *);
@@ -503,7 +547,7 @@ extern int afs_flock(struct file *, int, struct file_lock *);
 extern int afs_fs_fetch_file_status(struct afs_server *, struct key *,
 				    struct afs_vnode *, struct afs_volsync *,
 				    bool);
-extern int afs_fs_give_up_callbacks(struct afs_server *, bool);
+extern int afs_fs_give_up_callbacks(struct afs_net *, struct afs_server *, bool);
 extern int afs_fs_fetch_data(struct afs_server *, struct key *,
 			     struct afs_vnode *, struct afs_read *, bool);
 extern int afs_fs_create(struct afs_server *, struct key *,
@@ -553,7 +597,25 @@ extern int afs_drop_inode(struct inode *);
  * main.c
  */
 extern struct workqueue_struct *afs_wq;
-extern struct afs_uuid afs_uuid;
+
+static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
+{
+	return &__afs_net;
+}
+
+static inline struct afs_net *afs_sock2net(struct sock *sk)
+{
+	return &__afs_net;
+}
+
+static inline struct afs_net *afs_net_get(struct afs_net *net)
+{
+	return net;
+}
+
+static inline void afs_net_put(struct afs_net *net)
+{
+}
 
 /*
  * misc.c
@@ -578,23 +640,24 @@ extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool);
 /*
  * proc.c
  */
-extern int afs_proc_init(void);
-extern void afs_proc_cleanup(void);
-extern int afs_proc_cell_setup(struct afs_cell *);
-extern void afs_proc_cell_remove(struct afs_cell *);
+extern int __net_init afs_proc_init(struct afs_net *);
+extern void __net_exit afs_proc_cleanup(struct afs_net *);
+extern int afs_proc_cell_setup(struct afs_net *, struct afs_cell *);
+extern void afs_proc_cell_remove(struct afs_net *, struct afs_cell *);
 
 /*
  * rxrpc.c
  */
-extern struct socket *afs_socket;
-extern atomic_t afs_outstanding_calls;
+extern struct workqueue_struct *afs_async_calls;
 
-extern int afs_open_socket(void);
-extern void afs_close_socket(void);
+extern int __net_init afs_open_socket(struct afs_net *);
+extern void __net_exit afs_close_socket(struct afs_net *);
+extern void afs_charge_preallocation(struct work_struct *);
 extern void afs_put_call(struct afs_call *);
 extern int afs_queue_call_work(struct afs_call *);
 extern int afs_make_call(struct in_addr *, struct afs_call *, gfp_t, bool);
-extern struct afs_call *afs_alloc_flat_call(const struct afs_call_type *,
+extern struct afs_call *afs_alloc_flat_call(struct afs_net *,
+					    const struct afs_call_type *,
 					    size_t, size_t);
 extern void afs_flat_call_destructor(struct afs_call *);
 extern void afs_send_empty_reply(struct afs_call *);
@@ -628,37 +691,45 @@ do {								\
 
 extern struct afs_server *afs_lookup_server(struct afs_cell *,
 					    const struct in_addr *);
-extern struct afs_server *afs_find_server(const struct sockaddr_rxrpc *);
+extern struct afs_server *afs_find_server(struct afs_net *,
+					  const struct sockaddr_rxrpc *);
 extern void afs_put_server(struct afs_server *);
-extern void __exit afs_purge_servers(void);
+extern void afs_reap_server(struct work_struct *);
+extern void __net_exit afs_purge_servers(struct afs_net *);
 
 /*
  * super.c
  */
-extern int afs_fs_init(void);
-extern void afs_fs_exit(void);
+extern int __init afs_fs_init(void);
+extern void __exit afs_fs_exit(void);
 
 /*
  * vlclient.c
  */
-extern int afs_vl_get_entry_by_name(struct in_addr *, struct key *,
+extern int afs_vl_get_entry_by_name(struct afs_net *,
+				    struct in_addr *, struct key *,
 				    const char *, struct afs_cache_vlocation *,
 				    bool);
-extern int afs_vl_get_entry_by_id(struct in_addr *, struct key *,
+extern int afs_vl_get_entry_by_id(struct afs_net *,
+				  struct in_addr *, struct key *,
 				  afs_volid_t, afs_voltype_t,
 				  struct afs_cache_vlocation *, bool);
 
 /*
  * vlocation.c
  */
+extern struct workqueue_struct *afs_vlocation_update_worker;
+
 #define afs_get_vlocation(V) do { atomic_inc(&(V)->usage); } while(0)
 
-extern int __init afs_vlocation_update_init(void);
-extern struct afs_vlocation *afs_vlocation_lookup(struct afs_cell *,
+extern struct afs_vlocation *afs_vlocation_lookup(struct afs_net *,
+						  struct afs_cell *,
 						  struct key *,
 						  const char *, size_t);
-extern void afs_put_vlocation(struct afs_vlocation *);
-extern void afs_vlocation_purge(void);
+extern void afs_put_vlocation(struct afs_net *, struct afs_vlocation *);
+extern void afs_vlocation_updater(struct work_struct *);
+extern void afs_vlocation_reaper(struct work_struct *);
+extern void __net_exit afs_vlocation_purge(struct afs_net *);
 
 /*
  * vnode.c
@@ -706,7 +777,7 @@ extern int afs_vnode_release_lock(struct afs_vnode *, struct key *);
  */
 #define afs_get_volume(V) do { atomic_inc(&(V)->usage); } while(0)
 
-extern void afs_put_volume(struct afs_volume *);
+extern void afs_put_volume(struct afs_net *, struct afs_volume *);
 extern struct afs_volume *afs_volume_lookup(struct afs_mount_params *);
 extern struct afs_server *afs_volume_pick_fileserver(struct afs_vnode *);
 extern int afs_volume_release_fileserver(struct afs_vnode *,
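
The static inline helpers added to internal.h (afs_v2net(), afs_sock2net(),
afs_net_get(), afs_net_put()) all collapse to the dummy &__afs_net for now.  A
possible later shape, assuming struct afs_net eventually becomes a pernet
allocation keyed by an afs_net_id registered with register_pernet_subsys()
(none of which exists in this patch), might be:

	/* Sketch only: assumes <net/netns/generic.h> and a registered afs_net_id. */
	static inline struct afs_net *afs_sock2net(struct sock *sk)
	{
		return net_generic(sock_net(sk), afs_net_id);
	}

	static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
	{
		return vnode->volume->cell->net;	/* cell->net is added by this patch */
	}
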
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 9944770849da..87b1a9c8000d 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -31,30 +31,104 @@ static char *rootcell;
 module_param(rootcell, charp, 0);
 MODULE_PARM_DESC(rootcell, "root AFS cell name and VL server IP addr list");
 
-struct afs_uuid afs_uuid;
 struct workqueue_struct *afs_wq;
+struct afs_net __afs_net;
+
+/*
+ * Initialise an AFS network namespace record.
+ */
+static int __net_init afs_net_init(struct afs_net *net)
+{
+	int ret;
+
+	net->live = true;
+	generate_random_uuid((unsigned char *)&net->uuid);
+
+	INIT_WORK(&net->charge_preallocation_work, afs_charge_preallocation);
+	mutex_init(&net->socket_mutex);
+	INIT_LIST_HEAD(&net->cells);
+	rwlock_init(&net->cells_lock);
+	init_rwsem(&net->cells_sem);
+	init_waitqueue_head(&net->cells_freeable_wq);
+	init_rwsem(&net->proc_cells_sem);
+	INIT_LIST_HEAD(&net->proc_cells);
+	INIT_LIST_HEAD(&net->vl_updates);
+	INIT_LIST_HEAD(&net->vl_graveyard);
+	INIT_DELAYED_WORK(&net->vl_reaper, afs_vlocation_reaper);
+	INIT_DELAYED_WORK(&net->vl_updater, afs_vlocation_updater);
+	spin_lock_init(&net->vl_updates_lock);
+	spin_lock_init(&net->vl_graveyard_lock);
+	net->servers = RB_ROOT;
+	rwlock_init(&net->servers_lock);
+	INIT_LIST_HEAD(&net->server_graveyard);
+	spin_lock_init(&net->server_graveyard_lock);
+	INIT_DELAYED_WORK(&net->server_reaper, afs_reap_server);
+
+	/* Register the /proc stuff */
+	ret = afs_proc_init(net);
+	if (ret < 0)
+		goto error_proc;
+
+	/* Initialise the cell DB */
+	ret = afs_cell_init(net, rootcell);
+	if (ret < 0)
+		goto error_cell_init;
+
+	/* Create the RxRPC transport */
+	ret = afs_open_socket(net);
+	if (ret < 0)
+		goto error_open_socket;
+
+	return 0;
+
+error_open_socket:
+	afs_vlocation_purge(net);
+	afs_cell_purge(net);
+error_cell_init:
+	afs_proc_cleanup(net);
+error_proc:
+	return ret;
+}
+
+/*
+ * Clean up and destroy an AFS network namespace record.
+ */
+static void __net_exit afs_net_exit(struct afs_net *net)
+{
+	net->live = false;
+	afs_close_socket(net);
+	afs_purge_servers(net);
+	afs_vlocation_purge(net);
+	afs_cell_purge(net);
+	afs_proc_cleanup(net);
+}
 
 /*
  * initialise the AFS client FS module
  */
 static int __init afs_init(void)
 {
-	int ret;
+	int ret = -ENOMEM;
 
 	printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 registering.\n");
 
-	generate_random_uuid((unsigned char *)&afs_uuid);
-
-	/* create workqueue */
-	ret = -ENOMEM;
 	afs_wq = alloc_workqueue("afs", 0, 0);
 	if (!afs_wq)
-		return ret;
-
-	/* register the /proc stuff */
-	ret = afs_proc_init();
-	if (ret < 0)
-		goto error_proc;
+		goto error_afs_wq;
+	afs_async_calls = alloc_workqueue("kafsd", WQ_MEM_RECLAIM, 0);
+	if (!afs_async_calls)
+		goto error_async;
+	afs_vlocation_update_worker =
+		alloc_workqueue("kafs_vlupdated", WQ_MEM_RECLAIM, 0);
+	if (!afs_vlocation_update_worker)
+		goto error_vl_up;
+	afs_callback_update_worker =
+		alloc_ordered_workqueue("kafs_callbackd", WQ_MEM_RECLAIM);
+	if (!afs_callback_update_worker)
+		goto error_callback;
+	afs_lock_manager = alloc_workqueue("kafs_lockd", WQ_MEM_RECLAIM, 0);
+	if (!afs_lock_manager)
+		goto error_lockmgr;
 
 #ifdef CONFIG_AFS_FSCACHE
 	/* we want to be able to cache */
@@ -63,25 +137,9 @@ static int __init afs_init(void)
 		goto error_cache;
 #endif
 
-	/* initialise the cell DB */
-	ret = afs_cell_init(rootcell);
-	if (ret < 0)
-		goto error_cell_init;
-
-	/* initialise the VL update process */
-	ret = afs_vlocation_update_init();
-	if (ret < 0)
-		goto error_vl_update_init;
-
-	/* initialise the callback update process */
-	ret = afs_callback_update_init();
+	ret = afs_net_init(&__afs_net);
 	if (ret < 0)
-		goto error_callback_update_init;
-
-	/* create the RxRPC transport */
-	ret = afs_open_socket();
-	if (ret < 0)
-		goto error_open_socket;
+		goto error_net;
 
 	/* register the filesystems */
 	ret = afs_fs_init();
@@ -91,21 +149,22 @@ static int __init afs_init(void)
 	return ret;
 
 error_fs:
-	afs_close_socket();
-error_open_socket:
-	afs_callback_update_kill();
-error_callback_update_init:
-	afs_vlocation_purge();
-error_vl_update_init:
-	afs_cell_purge();
-error_cell_init:
+	afs_net_exit(&__afs_net);
+error_net:
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_unregister_netfs(&afs_cache_netfs);
 error_cache:
 #endif
-	afs_proc_cleanup();
-error_proc:
+	destroy_workqueue(afs_lock_manager);
+error_lockmgr:
+	destroy_workqueue(afs_callback_update_worker);
+error_callback:
+	destroy_workqueue(afs_vlocation_update_worker);
+error_vl_up:
+	destroy_workqueue(afs_async_calls);
+error_async:
 	destroy_workqueue(afs_wq);
+error_afs_wq:
 	rcu_barrier();
 	printk(KERN_ERR "kAFS: failed to register: %d\n", ret);
 	return ret;
@@ -124,17 +183,15 @@ static void __exit afs_exit(void)
 	printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 unregistering.\n");
 
 	afs_fs_exit();
-	afs_kill_lock_manager();
-	afs_close_socket();
-	afs_purge_servers();
-	afs_callback_update_kill();
-	afs_vlocation_purge();
-	destroy_workqueue(afs_wq);
-	afs_cell_purge();
+	afs_net_exit(&__afs_net);
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_unregister_netfs(&afs_cache_netfs);
 #endif
-	afs_proc_cleanup();
+	destroy_workqueue(afs_lock_manager);
+	destroy_workqueue(afs_callback_update_worker);
+	destroy_workqueue(afs_vlocation_update_worker);
+	destroy_workqueue(afs_async_calls);
+	destroy_workqueue(afs_wq);
 	rcu_barrier();
 }
 
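
afs_net_init() and afs_net_exit() are written as a __net_init/__net_exit pair so
that they can later be hung off pernet_operations; this patch still calls them
directly on the single __afs_net from afs_init()/afs_exit().  A hypothetical
wiring for the real-netns step, with afs_net_id and afs_net_ops invented for the
sketch, would look something like:

	static unsigned int afs_net_id;		/* assumed pernet key, not in this patch */

	static int __net_init afs_pernet_init(struct net *net)
	{
		return afs_net_init(net_generic(net, afs_net_id));
	}

	static void __net_exit afs_pernet_exit(struct net *net)
	{
		afs_net_exit(net_generic(net, afs_net_id));
	}

	static struct pernet_operations afs_net_ops = {
		.init	= afs_pernet_init,
		.exit	= afs_pernet_exit,
		.id	= &afs_net_id,
		.size	= sizeof(struct afs_net),
	};

register_pernet_subsys(&afs_net_ops) would then replace the direct
afs_net_init(&__afs_net) call above.
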
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 35efb9a31dd7..c93433460348 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -17,8 +17,15 @@
 #include <linux/uaccess.h>
 #include "internal.h"
 
-static struct proc_dir_entry *proc_afs;
+static inline struct afs_net *afs_proc2net(struct file *f)
+{
+	return &__afs_net;
+}
 
+static inline struct afs_net *afs_seq2net(struct seq_file *m)
+{
+	return &__afs_net; /* TODO: use seq_file_net(m) */
+}
 
 static int afs_proc_cells_open(struct inode *inode, struct file *file);
 static void *afs_proc_cells_start(struct seq_file *p, loff_t *pos);
@@ -122,23 +129,23 @@ static const struct file_operations afs_proc_cell_servers_fops = {
 /*
  * initialise the /proc/fs/afs/ directory
  */
-int afs_proc_init(void)
+int afs_proc_init(struct afs_net *net)
 {
 	_enter("");
 
-	proc_afs = proc_mkdir("fs/afs", NULL);
-	if (!proc_afs)
+	net->proc_afs = proc_mkdir("fs/afs", NULL);
+	if (!net->proc_afs)
 		goto error_dir;
 
-	if (!proc_create("cells", 0644, proc_afs, &afs_proc_cells_fops) ||
-	    !proc_create("rootcell", 0644, proc_afs, &afs_proc_rootcell_fops))
+	if (!proc_create("cells", 0644, net->proc_afs, &afs_proc_cells_fops) ||
+	    !proc_create("rootcell", 0644, net->proc_afs, &afs_proc_rootcell_fops))
 		goto error_tree;
 
 	_leave(" = 0");
 	return 0;
 
 error_tree:
-	remove_proc_subtree("fs/afs", NULL);
+	proc_remove(net->proc_afs);
 error_dir:
 	_leave(" = -ENOMEM");
 	return -ENOMEM;
@@ -147,9 +154,10 @@ int afs_proc_init(void)
 /*
  * clean up the /proc/fs/afs/ directory
  */
-void afs_proc_cleanup(void)
+void afs_proc_cleanup(struct afs_net *net)
 {
-	remove_proc_subtree("fs/afs", NULL);
+	proc_remove(net->proc_afs);
+	net->proc_afs = NULL;
 }
 
 /*
@@ -176,25 +184,30 @@ static int afs_proc_cells_open(struct inode *inode, struct file *file)
  */
 static void *afs_proc_cells_start(struct seq_file *m, loff_t *_pos)
 {
-	/* lock the list against modification */
-	down_read(&afs_proc_cells_sem);
-	return seq_list_start_head(&afs_proc_cells, *_pos);
+	struct afs_net *net = afs_seq2net(m);
+
+	down_read(&net->proc_cells_sem);
+	return seq_list_start_head(&net->proc_cells, *_pos);
 }
 
 /*
  * move to next cell in cells list
  */
-static void *afs_proc_cells_next(struct seq_file *p, void *v, loff_t *pos)
+static void *afs_proc_cells_next(struct seq_file *m, void *v, loff_t *pos)
 {
-	return seq_list_next(v, &afs_proc_cells, pos);
+	struct afs_net *net = afs_seq2net(m);
+
+	return seq_list_next(v, &net->proc_cells, pos);
 }
 
 /*
  * clean up after reading from the cells list
  */
-static void afs_proc_cells_stop(struct seq_file *p, void *v)
+static void afs_proc_cells_stop(struct seq_file *m, void *v)
 {
-	up_read(&afs_proc_cells_sem);
+	struct afs_net *net = afs_seq2net(m);
+
+	up_read(&net->proc_cells_sem);
 }
 
 /*
@@ -203,8 +216,9 @@ static void afs_proc_cells_stop(struct seq_file *p, void *v)
 static int afs_proc_cells_show(struct seq_file *m, void *v)
 {
 	struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link);
+	struct afs_net *net = afs_seq2net(m);
 
-	if (v == &afs_proc_cells) {
+	if (v == &net->proc_cells) {
 		/* display header on line 1 */
 		seq_puts(m, "USE NAME\n");
 		return 0;
@@ -223,6 +237,7 @@ static int afs_proc_cells_show(struct seq_file *m, void *v)
 static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 				    size_t size, loff_t *_pos)
 {
+	struct afs_net *net = afs_proc2net(file);
 	char *kbuf, *name, *args;
 	int ret;
 
@@ -264,7 +279,7 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 	if (strcmp(kbuf, "add") == 0) {
 		struct afs_cell *cell;
 
-		cell = afs_cell_create(name, strlen(name), args, false);
+		cell = afs_cell_create(net, name, strlen(name), args, false);
 		if (IS_ERR(cell)) {
 			ret = PTR_ERR(cell);
 			goto done;
@@ -303,6 +318,7 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 				       const char __user *buf,
 				       size_t size, loff_t *_pos)
 {
+	struct afs_net *net = afs_proc2net(file);
 	char *kbuf, *s;
 	int ret;
 
@@ -322,7 +338,7 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 	/* determine command to perform */
 	_debug("rootcell=%s", kbuf);
 
-	ret = afs_cell_init(kbuf);
+	ret = afs_cell_init(net, kbuf);
 	if (ret >= 0)
 		ret = size;	/* consume everything, always */
 
@@ -334,13 +350,13 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 /*
  * initialise /proc/fs/afs/<cell>/
  */
-int afs_proc_cell_setup(struct afs_cell *cell)
+int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
 {
 	struct proc_dir_entry *dir;
 
 	_enter("%p{%s}", cell, cell->name);
 
-	dir = proc_mkdir(cell->name, proc_afs);
+	dir = proc_mkdir(cell->name, net->proc_afs);
 	if (!dir)
 		goto error_dir;
 
@@ -356,7 +372,7 @@ int afs_proc_cell_setup(struct afs_cell *cell)
 	return 0;
 
 error_tree:
-	remove_proc_subtree(cell->name, proc_afs);
+	remove_proc_subtree(cell->name, net->proc_afs);
 error_dir:
 	_leave(" = -ENOMEM");
 	return -ENOMEM;
@@ -365,11 +381,11 @@ int afs_proc_cell_setup(struct afs_cell *cell)
 /*
  * remove /proc/fs/afs/<cell>/
  */
-void afs_proc_cell_remove(struct afs_cell *cell)
+void afs_proc_cell_remove(struct afs_net *net, struct afs_cell *cell)
 {
 	_enter("");
 
-	remove_proc_subtree(cell->name, proc_afs);
+	remove_proc_subtree(cell->name, net->proc_afs);
 
 	_leave("");
 }
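
The afs_seq2net() and afs_proc2net() stubs at the top of proc.c both return the
dummy namespace for now; the TODO marks the intended replacement.  Assuming the
seq_files are eventually opened with the net-aware helpers from
<linux/seq_file_net.h> and an afs_net_id pernet key (neither is used by this
patch), the end state is roughly:

	static inline struct afs_net *afs_seq2net(struct seq_file *m)
	{
		return net_generic(seq_file_net(m), afs_net_id);	/* sketch only */
	}
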
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index cc7f7b3369ab..fba7d56b64b4 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -17,10 +17,7 @@
 #include "internal.h"
 #include "afs_cm.h"
 
-struct socket *afs_socket; /* my RxRPC socket */
-static struct workqueue_struct *afs_async_calls;
-static struct afs_call *afs_spare_incoming_call;
-atomic_t afs_outstanding_calls;
+struct workqueue_struct *afs_async_calls;
 
 static void afs_wake_up_call_waiter(struct sock *, struct rxrpc_call *, unsigned long);
 static int afs_wait_for_call_to_complete(struct afs_call *);
@@ -37,15 +34,11 @@ static const struct afs_call_type afs_RXCMxxxx = {
 	.abort_to_error	= afs_abort_to_error,
 };
 
-static void afs_charge_preallocation(struct work_struct *);
-
-static DECLARE_WORK(afs_charge_preallocation_work, afs_charge_preallocation);
-
 /*
  * open an RxRPC socket and bind it to be a server for callback notifications
  * - the socket is left in blocking mode and non-blocking ops use MSG_DONTWAIT
  */
-int afs_open_socket(void)
+int afs_open_socket(struct afs_net *net)
 {
 	struct sockaddr_rxrpc srx;
 	struct socket *socket;
@@ -53,11 +46,6 @@ int afs_open_socket(void)
 
 	_enter("");
 
-	ret = -ENOMEM;
-	afs_async_calls = alloc_workqueue("kafsd", WQ_MEM_RECLAIM, 0);
-	if (!afs_async_calls)
-		goto error_0;
-
 	ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET, &socket);
 	if (ret < 0)
 		goto error_1;
@@ -85,16 +73,14 @@ int afs_open_socket(void)
 	if (ret < 0)
 		goto error_2;
 
-	afs_socket = socket;
-	afs_charge_preallocation(NULL);
+	net->socket = socket;
+	afs_charge_preallocation(&net->charge_preallocation_work);
 	_leave(" = 0");
 	return 0;
 
 error_2:
 	sock_release(socket);
 error_1:
-	destroy_workqueue(afs_async_calls);
-error_0:
 	_leave(" = %d", ret);
 	return ret;
 }
@@ -102,36 +88,36 @@ int afs_open_socket(void)
 /*
  * close the RxRPC socket AFS was using
  */
-void afs_close_socket(void)
+void afs_close_socket(struct afs_net *net)
 {
 	_enter("");
 
-	kernel_listen(afs_socket, 0);
+	kernel_listen(net->socket, 0);
 	flush_workqueue(afs_async_calls);
 
-	if (afs_spare_incoming_call) {
-		afs_put_call(afs_spare_incoming_call);
-		afs_spare_incoming_call = NULL;
+	if (net->spare_incoming_call) {
+		afs_put_call(net->spare_incoming_call);
+		net->spare_incoming_call = NULL;
 	}
 
-	_debug("outstanding %u", atomic_read(&afs_outstanding_calls));
-	wait_on_atomic_t(&afs_outstanding_calls, atomic_t_wait,
+	_debug("outstanding %u", atomic_read(&net->nr_outstanding_calls));
+	wait_on_atomic_t(&net->nr_outstanding_calls, atomic_t_wait,
 			 TASK_UNINTERRUPTIBLE);
 	_debug("no outstanding calls");
 
-	kernel_sock_shutdown(afs_socket, SHUT_RDWR);
+	kernel_sock_shutdown(net->socket, SHUT_RDWR);
 	flush_workqueue(afs_async_calls);
-	sock_release(afs_socket);
+	sock_release(net->socket);
 
 	_debug("dework");
-	destroy_workqueue(afs_async_calls);
 	_leave("");
 }
 
 /*
  * Allocate a call.
  */
-static struct afs_call *afs_alloc_call(const struct afs_call_type *type,
+static struct afs_call *afs_alloc_call(struct afs_net *net,
+				       const struct afs_call_type *type,
 				       gfp_t gfp)
 {
 	struct afs_call *call;
@@ -142,11 +128,12 @@ static struct afs_call *afs_alloc_call(const struct afs_call_type *type,
 		return NULL;
 
 	call->type = type;
+	call->net = net;
 	atomic_set(&call->usage, 1);
 	INIT_WORK(&call->async_work, afs_process_async_call);
 	init_waitqueue_head(&call->waitq);
 
-	o = atomic_inc_return(&afs_outstanding_calls);
+	o = atomic_inc_return(&net->nr_outstanding_calls);
 	trace_afs_call(call, afs_call_trace_alloc, 1, o,
 		       __builtin_return_address(0));
 	return call;
@@ -157,8 +144,9 @@ static struct afs_call *afs_alloc_call(const struct afs_call_type *type,
  */
 void afs_put_call(struct afs_call *call)
 {
+	struct afs_net *net = call->net;
 	int n = atomic_dec_return(&call->usage);
-	int o = atomic_read(&afs_outstanding_calls);
+	int o = atomic_read(&net->nr_outstanding_calls);
 
 	trace_afs_call(call, afs_call_trace_put, n + 1, o,
 		       __builtin_return_address(0));
@@ -169,7 +157,7 @@ void afs_put_call(struct afs_call *call)
 		ASSERT(call->type->name != NULL);
 
 		if (call->rxcall) {
-			rxrpc_kernel_end_call(afs_socket, call->rxcall);
+			rxrpc_kernel_end_call(net->socket, call->rxcall);
 			call->rxcall = NULL;
 		}
 		if (call->type->destructor)
@@ -178,11 +166,11 @@ void afs_put_call(struct afs_call *call)
 		kfree(call->request);
 		kfree(call);
 
-		o = atomic_dec_return(&afs_outstanding_calls);
+		o = atomic_dec_return(&net->nr_outstanding_calls);
 		trace_afs_call(call, afs_call_trace_free, 0, o,
 			       __builtin_return_address(0));
 		if (o == 0)
-			wake_up_atomic_t(&afs_outstanding_calls);
+			wake_up_atomic_t(&net->nr_outstanding_calls);
 	}
 }
 
@@ -194,7 +182,7 @@ int afs_queue_call_work(struct afs_call *call)
 	int u = atomic_inc_return(&call->usage);
 
 	trace_afs_call(call, afs_call_trace_work, u,
-		       atomic_read(&afs_outstanding_calls),
+		       atomic_read(&call->net->nr_outstanding_calls),
 		       __builtin_return_address(0));
 
 	INIT_WORK(&call->work, call->type->work);
@@ -207,12 +195,13 @@ int afs_queue_call_work(struct afs_call *call)
 /*
  * allocate a call with flat request and reply buffers
  */
-struct afs_call *afs_alloc_flat_call(const struct afs_call_type *type,
+struct afs_call *afs_alloc_flat_call(struct afs_net *net,
+				     const struct afs_call_type *type,
 				     size_t request_size, size_t reply_max)
 {
 	struct afs_call *call;
 
-	call = afs_alloc_call(type, GFP_NOFS);
+	call = afs_alloc_call(net, type, GFP_NOFS);
 	if (!call)
 		goto nomem_call;
 
@@ -317,7 +306,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 		bytes = msg->msg_iter.count;
 		nr = msg->msg_iter.nr_segs;
 
-		ret = rxrpc_kernel_send_data(afs_socket, call->rxcall, msg,
+		ret = rxrpc_kernel_send_data(call->net->socket, call->rxcall, msg,
 					     bytes, afs_notify_end_request_tx);
 		for (loop = 0; loop < nr; loop++)
 			put_page(bv[loop].bv_page);
@@ -352,7 +341,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 
 	_debug("____MAKE %p{%s,%x} [%d]____",
 	       call, call->type->name, key_serial(call->key),
-	       atomic_read(&afs_outstanding_calls));
+	       atomic_read(&call->net->nr_outstanding_calls));
 
 	call->async = async;
 
@@ -376,7 +365,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	}
 
 	/* create a call */
-	rxcall = rxrpc_kernel_begin_call(afs_socket, &srx, call->key,
+	rxcall = rxrpc_kernel_begin_call(call->net->socket, &srx, call->key,
 					 (unsigned long)call,
 					 tx_total_len, gfp,
 					 (async ?
@@ -409,7 +398,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	 */
 	if (!call->send_pages)
 		call->state = AFS_CALL_AWAIT_REPLY;
-	ret = rxrpc_kernel_send_data(afs_socket, rxcall,
+	ret = rxrpc_kernel_send_data(call->net->socket, rxcall,
 				     &msg, call->request_size,
 				     afs_notify_end_request_tx);
 	if (ret < 0)
@@ -431,13 +420,13 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 error_do_abort:
 	call->state = AFS_CALL_COMPLETE;
 	if (ret != -ECONNABORTED) {
-		rxrpc_kernel_abort_call(afs_socket, rxcall, RX_USER_ABORT,
-					ret, "KSD");
+		rxrpc_kernel_abort_call(call->net->socket, rxcall,
+					RX_USER_ABORT, ret, "KSD");
 	} else {
 		abort_code = 0;
 		offset = 0;
-		rxrpc_kernel_recv_data(afs_socket, rxcall, NULL, 0, &offset,
-				       false, &abort_code);
+		rxrpc_kernel_recv_data(call->net->socket, rxcall, NULL,
+				       0, &offset, false, &abort_code);
 		ret = call->type->abort_to_error(abort_code);
 	}
 error_kill_call:
@@ -463,7 +452,8 @@ static void afs_deliver_to_call(struct afs_call *call)
 	       ) {
 		if (call->state == AFS_CALL_AWAIT_ACK) {
 			size_t offset = 0;
-			ret = rxrpc_kernel_recv_data(afs_socket, call->rxcall,
+			ret = rxrpc_kernel_recv_data(call->net->socket,
+						     call->rxcall,
 						     NULL, 0, &offset, false,
 						     &call->abort_code);
 			trace_afs_recv_data(call, 0, offset, false, ret);
@@ -490,12 +480,12 @@ static void afs_deliver_to_call(struct afs_call *call)
 			goto call_complete;
 		case -ENOTCONN:
 			abort_code = RX_CALL_DEAD;
-			rxrpc_kernel_abort_call(afs_socket, call->rxcall,
+			rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
 						abort_code, ret, "KNC");
 			goto save_error;
 		case -ENOTSUPP:
 			abort_code = RXGEN_OPCODE;
-			rxrpc_kernel_abort_call(afs_socket, call->rxcall,
+			rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
 						abort_code, ret, "KIV");
 			goto save_error;
 		case -ENODATA:
@@ -505,7 +495,7 @@ static void afs_deliver_to_call(struct afs_call *call)
 			abort_code = RXGEN_CC_UNMARSHAL;
 			if (call->state != AFS_CALL_AWAIT_REPLY)
 				abort_code = RXGEN_SS_UNMARSHAL;
-			rxrpc_kernel_abort_call(afs_socket, call->rxcall,
+			rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
 						abort_code, -EBADMSG, "KUM");
 			goto save_error;
 		}
@@ -560,7 +550,7 @@ static int afs_wait_for_call_to_complete(struct afs_call *call)
 	/* Kill off the call if it's still live. */
 	if (call->state < AFS_CALL_COMPLETE) {
 		_debug("call interrupted");
-		rxrpc_kernel_abort_call(afs_socket, call->rxcall,
+		rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
 					RX_USER_ABORT, -EINTR, "KWI");
 	}
 
@@ -598,7 +588,7 @@ static void afs_wake_up_async_call(struct sock *sk, struct rxrpc_call *rxcall,
 	u = __atomic_add_unless(&call->usage, 1, 0);
 	if (u != 0) {
 		trace_afs_call(call, afs_call_trace_wake, u,
-			       atomic_read(&afs_outstanding_calls),
+			       atomic_read(&call->net->nr_outstanding_calls),
 			       __builtin_return_address(0));
 
 		if (!queue_work(afs_async_calls, &call->async_work))
@@ -662,13 +652,15 @@ static void afs_rx_attach(struct rxrpc_call *rxcall, unsigned long user_call_ID)
 /*
  * Charge the incoming call preallocation.
  */
-static void afs_charge_preallocation(struct work_struct *work)
+void afs_charge_preallocation(struct work_struct *work)
 {
-	struct afs_call *call = afs_spare_incoming_call;
+	struct afs_net *net =
+		container_of(work, struct afs_net, charge_preallocation_work);
+	struct afs_call *call = net->spare_incoming_call;
 
 	for (;;) {
 		if (!call) {
-			call = afs_alloc_call(&afs_RXCMxxxx, GFP_KERNEL);
+			call = afs_alloc_call(net, &afs_RXCMxxxx, GFP_KERNEL);
 			if (!call)
 				break;
 
@@ -677,7 +669,7 @@ static void afs_charge_preallocation(struct work_struct *work)
 			init_waitqueue_head(&call->waitq);
 		}
 
-		if (rxrpc_kernel_charge_accept(afs_socket,
+		if (rxrpc_kernel_charge_accept(net->socket,
 					       afs_wake_up_async_call,
 					       afs_rx_attach,
 					       (unsigned long)call,
@@ -685,7 +677,7 @@ static void afs_charge_preallocation(struct work_struct *work)
 			break;
 		call = NULL;
 	}
-	afs_spare_incoming_call = call;
+	net->spare_incoming_call = call;
 }
 
 /*
@@ -706,7 +698,9 @@ static void afs_rx_discard_new_call(struct rxrpc_call *rxcall,
 static void afs_rx_new_call(struct sock *sk, struct rxrpc_call *rxcall,
 			    unsigned long user_call_ID)
 {
-	queue_work(afs_wq, &afs_charge_preallocation_work);
+	struct afs_net *net = afs_sock2net(sk);
+
+	queue_work(afs_wq, &net->charge_preallocation_work);
 }
 
 /*
@@ -761,11 +755,12 @@ static void afs_notify_end_reply_tx(struct sock *sock,
  */
 void afs_send_empty_reply(struct afs_call *call)
 {
+	struct afs_net *net = call->net;
 	struct msghdr msg;
 
 	_enter("");
 
-	rxrpc_kernel_set_tx_length(afs_socket, call->rxcall, 0);
+	rxrpc_kernel_set_tx_length(net->socket, call->rxcall, 0);
 
 	msg.msg_name		= NULL;
 	msg.msg_namelen		= 0;
@@ -775,7 +770,7 @@ void afs_send_empty_reply(struct afs_call *call)
 	msg.msg_flags		= 0;
 
 	call->state = AFS_CALL_AWAIT_ACK;
-	switch (rxrpc_kernel_send_data(afs_socket, call->rxcall, &msg, 0,
+	switch (rxrpc_kernel_send_data(net->socket, call->rxcall, &msg, 0,
 				       afs_notify_end_reply_tx)) {
 	case 0:
 		_leave(" [replied]");
@@ -783,7 +778,7 @@ void afs_send_empty_reply(struct afs_call *call)
 
 	case -ENOMEM:
 		_debug("oom");
-		rxrpc_kernel_abort_call(afs_socket, call->rxcall,
+		rxrpc_kernel_abort_call(net->socket, call->rxcall,
 					RX_USER_ABORT, -ENOMEM, "KOO");
 	default:
 		_leave(" [error]");
@@ -796,13 +791,14 @@ void afs_send_empty_reply(struct afs_call *call)
  */
 void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 {
+	struct afs_net *net = call->net;
 	struct msghdr msg;
 	struct kvec iov[1];
 	int n;
 
 	_enter("");
 
-	rxrpc_kernel_set_tx_length(afs_socket, call->rxcall, len);
+	rxrpc_kernel_set_tx_length(net->socket, call->rxcall, len);
 
 	iov[0].iov_base		= (void *) buf;
 	iov[0].iov_len		= len;
@@ -814,7 +810,7 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 	msg.msg_flags		= 0;
 
 	call->state = AFS_CALL_AWAIT_ACK;
-	n = rxrpc_kernel_send_data(afs_socket, call->rxcall, &msg, len,
+	n = rxrpc_kernel_send_data(net->socket, call->rxcall, &msg, len,
 				   afs_notify_end_reply_tx);
 	if (n >= 0) {
 		/* Success */
@@ -824,7 +820,7 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 
 	if (n == -ENOMEM) {
 		_debug("oom");
-		rxrpc_kernel_abort_call(afs_socket, call->rxcall,
+		rxrpc_kernel_abort_call(net->socket, call->rxcall,
 					RX_USER_ABORT, -ENOMEM, "KOO");
 	}
 	_leave(" [error]");
@@ -836,6 +832,7 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 int afs_extract_data(struct afs_call *call, void *buf, size_t count,
 		     bool want_more)
 {
+	struct afs_net *net = call->net;
 	int ret;
 
 	_enter("{%s,%zu},,%zu,%d",
@@ -843,7 +840,7 @@ int afs_extract_data(struct afs_call *call, void *buf, size_t count,
 
 	ASSERTCMP(call->offset, <=, count);
 
-	ret = rxrpc_kernel_recv_data(afs_socket, call->rxcall,
+	ret = rxrpc_kernel_recv_data(net->socket, call->rxcall,
 				     buf, count, &call->offset,
 				     want_more, &call->abort_code);
 	trace_afs_recv_data(call, count, call->offset, want_more, ret);
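
The per-net nr_outstanding_calls counter replaces the old global
afs_outstanding_calls and is what afs_close_socket() now waits on before the
socket can be released.  Condensed from the hunks above (no new API assumed),
the pairing is:

	/* allocation side, afs_alloc_call() */
	atomic_inc_return(&net->nr_outstanding_calls);

	/* release side, afs_put_call() */
	if (atomic_dec_return(&net->nr_outstanding_calls) == 0)
		wake_up_atomic_t(&net->nr_outstanding_calls);

	/* teardown side, afs_close_socket() */
	wait_on_atomic_t(&net->nr_outstanding_calls, atomic_t_wait,
			 TASK_UNINTERRUPTIBLE);
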
diff --git a/fs/afs/server.c b/fs/afs/server.c
index c001b1f2455f..e47fd9bc0ddc 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -15,32 +15,22 @@
 
 static unsigned afs_server_timeout = 10;	/* server timeout in seconds */
 
-static void afs_reap_server(struct work_struct *);
-
-/* tree of all the servers, indexed by IP address */
-static struct rb_root afs_servers = RB_ROOT;
-static DEFINE_RWLOCK(afs_servers_lock);
-
-/* LRU list of all the servers not currently in use */
-static LIST_HEAD(afs_server_graveyard);
-static DEFINE_SPINLOCK(afs_server_graveyard_lock);
-static DECLARE_DELAYED_WORK(afs_server_reaper, afs_reap_server);
-
 /*
  * install a server record in the master tree
  */
 static int afs_install_server(struct afs_server *server)
 {
 	struct afs_server *xserver;
+	struct afs_net *net = server->cell->net;
 	struct rb_node **pp, *p;
 	int ret;
 
 	_enter("%p", server);
 
-	write_lock(&afs_servers_lock);
+	write_lock(&net->servers_lock);
 
 	ret = -EEXIST;
-	pp = &afs_servers.rb_node;
+	pp = &net->servers.rb_node;
 	p = NULL;
 	while (*pp) {
 		p = *pp;
@@ -55,11 +45,11 @@ static int afs_install_server(struct afs_server *server)
 	}
 
 	rb_link_node(&server->master_rb, p, pp);
-	rb_insert_color(&server->master_rb, &afs_servers);
+	rb_insert_color(&server->master_rb, &net->servers);
 	ret = 0;
 
 error:
-	write_unlock(&afs_servers_lock);
+	write_unlock(&net->servers_lock);
 	return ret;
 }
 
@@ -150,9 +140,9 @@ struct afs_server *afs_lookup_server(struct afs_cell *cell,
 	read_unlock(&cell->servers_lock);
 no_longer_unused:
 	if (!list_empty(&server->grave)) {
-		spin_lock(&afs_server_graveyard_lock);
+		spin_lock(&cell->net->server_graveyard_lock);
 		list_del_init(&server->grave);
-		spin_unlock(&afs_server_graveyard_lock);
+		spin_unlock(&cell->net->server_graveyard_lock);
 	}
 	_leave(" = %p{%d}", server, atomic_read(&server->usage));
 	return server;
@@ -178,7 +168,8 @@ struct afs_server *afs_lookup_server(struct afs_cell *cell,
 /*
  * look up a server by its IP address
  */
-struct afs_server *afs_find_server(const struct sockaddr_rxrpc *srx)
+struct afs_server *afs_find_server(struct afs_net *net,
+				   const struct sockaddr_rxrpc *srx)
 {
 	struct afs_server *server = NULL;
 	struct rb_node *p;
@@ -191,9 +182,9 @@ struct afs_server *afs_find_server(const struct sockaddr_rxrpc *srx)
 		return NULL;
 	}
 
-	read_lock(&afs_servers_lock);
+	read_lock(&net->servers_lock);
 
-	p = afs_servers.rb_node;
+	p = net->servers.rb_node;
 	while (p) {
 		server = rb_entry(p, struct afs_server, master_rb);
 
@@ -211,7 +202,7 @@ struct afs_server *afs_find_server(const struct sockaddr_rxrpc *srx)
 
 	server = NULL;
 found:
-	read_unlock(&afs_servers_lock);
+	read_unlock(&net->servers_lock);
 	ASSERTIFCMP(server, server->addr.s_addr, ==, addr.s_addr);
 	_leave(" = %p", server);
 	return server;
@@ -223,6 +214,8 @@ struct afs_server *afs_find_server(const struct sockaddr_rxrpc *srx)
  */
 void afs_put_server(struct afs_server *server)
 {
+	struct afs_net *net = server->cell->net;
+
 	if (!server)
 		return;
 
@@ -239,14 +232,14 @@ void afs_put_server(struct afs_server *server)
 
 	afs_flush_callback_breaks(server);
 
-	spin_lock(&afs_server_graveyard_lock);
+	spin_lock(&net->server_graveyard_lock);
 	if (atomic_read(&server->usage) == 0) {
-		list_move_tail(&server->grave, &afs_server_graveyard);
+		list_move_tail(&server->grave, &net->server_graveyard);
 		server->time_of_death = ktime_get_real_seconds();
-		queue_delayed_work(afs_wq, &afs_server_reaper,
-				   afs_server_timeout * HZ);
+		queue_delayed_work(afs_wq, &net->server_reaper,
+				   net->live ? afs_server_timeout * HZ : 0);
 	}
-	spin_unlock(&afs_server_graveyard_lock);
+	spin_unlock(&net->server_graveyard_lock);
 	_leave(" [dead]");
 }
 
@@ -272,42 +265,45 @@ static void afs_destroy_server(struct afs_server *server)
 /*
  * reap dead server records
  */
-static void afs_reap_server(struct work_struct *work)
+void afs_reap_server(struct work_struct *work)
 {
 	LIST_HEAD(corpses);
 	struct afs_server *server;
+	struct afs_net *net = container_of(work, struct afs_net, server_reaper.work);
 	unsigned long delay, expiry;
 	time64_t now;
 
 	now = ktime_get_real_seconds();
-	spin_lock(&afs_server_graveyard_lock);
+	spin_lock(&net->server_graveyard_lock);
 
-	while (!list_empty(&afs_server_graveyard)) {
-		server = list_entry(afs_server_graveyard.next,
+	while (!list_empty(&net->server_graveyard)) {
+		server = list_entry(net->server_graveyard.next,
 				    struct afs_server, grave);
 
 		/* the queue is ordered most dead first */
-		expiry = server->time_of_death + afs_server_timeout;
-		if (expiry > now) {
-			delay = (expiry - now) * HZ;
-			mod_delayed_work(afs_wq, &afs_server_reaper, delay);
-			break;
+		if (net->live) {
+			expiry = server->time_of_death + afs_server_timeout;
+			if (expiry > now) {
+				delay = (expiry - now) * HZ;
+				mod_delayed_work(afs_wq, &net->server_reaper, delay);
+				break;
+			}
 		}
 
 		write_lock(&server->cell->servers_lock);
-		write_lock(&afs_servers_lock);
+		write_lock(&net->servers_lock);
 		if (atomic_read(&server->usage) > 0) {
 			list_del_init(&server->grave);
 		} else {
 			list_move_tail(&server->grave, &corpses);
 			list_del_init(&server->link);
-			rb_erase(&server->master_rb, &afs_servers);
+			rb_erase(&server->master_rb, &net->servers);
 		}
-		write_unlock(&afs_servers_lock);
+		write_unlock(&net->servers_lock);
 		write_unlock(&server->cell->servers_lock);
 	}
 
-	spin_unlock(&afs_server_graveyard_lock);
+	spin_unlock(&net->server_graveyard_lock);
 
 	/* now reap the corpses we've extracted */
 	while (!list_empty(&corpses)) {
@@ -318,10 +314,10 @@ static void afs_reap_server(struct work_struct *work)
 }
 
 /*
- * discard all the server records for rmmod
+ * Discard all the server records from a net namespace when it is destroyed or
+ * the afs module is removed.
  */
-void __exit afs_purge_servers(void)
+void __net_exit afs_purge_servers(struct afs_net *net)
 {
-	afs_server_timeout = 0;
-	mod_delayed_work(afs_wq, &afs_server_reaper, 0);
+	mod_delayed_work(afs_wq, &net->server_reaper, 0);
 }
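
With the file-local globals gone, afs_reap_server() recovers its afs_net by
container_of() on the embedded delayed_work, and the same idiom is used by the
VL reaper and updater below.  Stripped down to just the recovery step (the
function name here is illustrative only):

	static void afs_example_reaper(struct work_struct *work)
	{
		struct afs_net *net =
			container_of(work, struct afs_net, server_reaper.work);

		/* walk net->server_graveyard under net->server_graveyard_lock ... */
	}
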
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 689173c0a682..1bfc7b28700b 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -25,11 +25,10 @@
 #include <linux/statfs.h>
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
+#include <linux/magic.h>
 #include <net/net_namespace.h>
 #include "internal.h"
 
-#define AFS_FS_MAGIC 0x6B414653 /* 'kAFS' */
-
 static void afs_i_init_once(void *foo);
 static struct dentry *afs_mount(struct file_system_type *fs_type,
 		      int flags, const char *dev_name, void *data);
@@ -201,7 +200,8 @@ static int afs_parse_options(struct afs_mount_params *params,
 		token = match_token(p, afs_options_list, args);
 		switch (token) {
 		case afs_opt_cell:
-			cell = afs_cell_lookup(args[0].from,
+			cell = afs_cell_lookup(params->net,
+					       args[0].from,
 					       args[0].to - args[0].from,
 					       false);
 			if (IS_ERR(cell))
@@ -308,7 +308,7 @@ static int afs_parse_device_name(struct afs_mount_params *params,
 
 	/* lookup the cell record */
 	if (cellname || !params->cell) {
-		cell = afs_cell_lookup(cellname, cellnamesz, true);
+		cell = afs_cell_lookup(params->net, cellname, cellnamesz, true);
 		if (IS_ERR(cell)) {
 			printk(KERN_ERR "kAFS: unable to lookup cell '%*.*s'\n",
 			       cellnamesz, cellnamesz, cellname ?: "");
@@ -334,7 +334,7 @@ static int afs_test_super(struct super_block *sb, void *data)
 	struct afs_super_info *as1 = data;
 	struct afs_super_info *as = sb->s_fs_info;
 
-	return as->volume == as1->volume;
+	return as->net == as1->net && as->volume == as1->volume;
 }
 
 static int afs_set_super(struct super_block *sb, void *data)
@@ -411,6 +411,7 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 	_enter(",,%s,%p", dev_name, options);
 
 	memset(&params, 0, sizeof(params));
+	params.net = &__afs_net;
 
 	ret = -EINVAL;
 	if (current->nsproxy->net_ns != &init_net)
@@ -444,36 +445,32 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 	}
 
 	/* allocate a superblock info record */
+	ret = -ENOMEM;
 	as = kzalloc(sizeof(struct afs_super_info), GFP_KERNEL);
-	if (!as) {
-		ret = -ENOMEM;
-		afs_put_volume(vol);
-		goto error;
-	}
+	if (!as)
+		goto error_vol;
+
+	as->net = afs_net_get(params.net);
 	as->volume = vol;
 
 	/* allocate a deviceless superblock */
 	sb = sget(fs_type, afs_test_super, afs_set_super, flags, as);
 	if (IS_ERR(sb)) {
 		ret = PTR_ERR(sb);
-		afs_put_volume(vol);
-		kfree(as);
-		goto error;
+		goto error_as;
 	}
 
 	if (!sb->s_root) {
 		/* initial superblock/root creation */
 		_debug("create");
 		ret = afs_fill_super(sb, &params);
-		if (ret < 0) {
-			deactivate_locked_super(sb);
-			goto error;
-		}
+		if (ret < 0)
+			goto error_sb;
 		sb->s_flags |= MS_ACTIVE;
 	} else {
 		_debug("reuse");
 		ASSERTCMP(sb->s_flags, &, MS_ACTIVE);
-		afs_put_volume(vol);
+		afs_put_volume(params.net, vol);
 		kfree(as);
 	}
 
@@ -482,6 +479,14 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 	_leave(" = 0 [%p]", sb);
 	return dget(sb->s_root);
 
+error_sb:
+	deactivate_locked_super(sb);
+	goto error;
+error_as:
+	afs_net_put(as->net);
+	kfree(as);
+error_vol:
+	afs_put_volume(params.net, vol);
 error:
 	afs_put_cell(params.cell);
 	key_put(params.key);
@@ -493,8 +498,10 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 static void afs_kill_super(struct super_block *sb)
 {
 	struct afs_super_info *as = sb->s_fs_info;
+	struct afs_net *net = as->net;
+
 	kill_anon_super(sb);
-	afs_put_volume(as->volume);
+	afs_put_volume(net, as->volume);
 	kfree(as);
 }
 
diff --git a/fs/afs/vlclient.c b/fs/afs/vlclient.c
index a5e4cc561b6c..f5a043a9ba61 100644
--- a/fs/afs/vlclient.c
+++ b/fs/afs/vlclient.c
@@ -143,7 +143,8 @@ static const struct afs_call_type afs_RXVLGetEntryById = {
 /*
  * dispatch a get volume entry by name operation
  */
-int afs_vl_get_entry_by_name(struct in_addr *addr,
+int afs_vl_get_entry_by_name(struct afs_net *net,
+			     struct in_addr *addr,
 			     struct key *key,
 			     const char *volname,
 			     struct afs_cache_vlocation *entry,
@@ -159,7 +160,7 @@ int afs_vl_get_entry_by_name(struct in_addr *addr,
 	padsz = (4 - (volnamesz & 3)) & 3;
 	reqsz = 8 + volnamesz + padsz;
 
-	call = afs_alloc_flat_call(&afs_RXVLGetEntryByName, reqsz, 384);
+	call = afs_alloc_flat_call(net, &afs_RXVLGetEntryByName, reqsz, 384);
 	if (!call)
 		return -ENOMEM;
 
@@ -183,7 +184,8 @@ int afs_vl_get_entry_by_name(struct in_addr *addr,
 /*
  * dispatch a get volume entry by ID operation
  */
-int afs_vl_get_entry_by_id(struct in_addr *addr,
+int afs_vl_get_entry_by_id(struct afs_net *net,
+			   struct in_addr *addr,
 			   struct key *key,
 			   afs_volid_t volid,
 			   afs_voltype_t voltype,
@@ -195,7 +197,7 @@ int afs_vl_get_entry_by_id(struct in_addr *addr,
 
 	_enter("");
 
-	call = afs_alloc_flat_call(&afs_RXVLGetEntryById, 12, 384);
+	call = afs_alloc_flat_call(net, &afs_RXVLGetEntryById, 12, 384);
 	if (!call)
 		return -ENOMEM;
 
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 37b7c3b342a6..ccb7aacfbeca 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -16,20 +16,11 @@
 #include <linux/sched.h>
 #include "internal.h"
 
+struct workqueue_struct *afs_vlocation_update_worker;
+
 static unsigned afs_vlocation_timeout = 10;	/* volume location timeout in seconds */
 static unsigned afs_vlocation_update_timeout = 10 * 60;
 
-static void afs_vlocation_reaper(struct work_struct *);
-static void afs_vlocation_updater(struct work_struct *);
-
-static LIST_HEAD(afs_vlocation_updates);
-static LIST_HEAD(afs_vlocation_graveyard);
-static DEFINE_SPINLOCK(afs_vlocation_updates_lock);
-static DEFINE_SPINLOCK(afs_vlocation_graveyard_lock);
-static DECLARE_DELAYED_WORK(afs_vlocation_reap, afs_vlocation_reaper);
-static DECLARE_DELAYED_WORK(afs_vlocation_update, afs_vlocation_updater);
-static struct workqueue_struct *afs_vlocation_update_worker;
-
 /*
  * iterate through the VL servers in a cell until one of them admits knowing
  * about the volume in question
@@ -52,8 +43,8 @@ static int afs_vlocation_access_vl_by_name(struct afs_vlocation *vl,
 		_debug("CellServ[%hu]: %08x", cell->vl_curr_svix, addr.s_addr);
 
 		/* attempt to access the VL server */
-		ret = afs_vl_get_entry_by_name(&addr, key, vl->vldb.name, vldb,
-					       false);
+		ret = afs_vl_get_entry_by_name(cell->net, &addr, key,
+					       vl->vldb.name, vldb, false);
 		switch (ret) {
 		case 0:
 			goto out;
@@ -110,8 +101,8 @@ static int afs_vlocation_access_vl_by_id(struct afs_vlocation *vl,
 		_debug("CellServ[%hu]: %08x", cell->vl_curr_svix, addr.s_addr);
 
 		/* attempt to access the VL server */
-		ret = afs_vl_get_entry_by_id(&addr, key, volid, voltype, vldb,
-					     false);
+		ret = afs_vl_get_entry_by_id(cell->net, &addr, key, volid,
+					     voltype, vldb, false);
 		switch (ret) {
 		case 0:
 			goto out;
@@ -335,7 +326,8 @@ static int afs_vlocation_fill_in_record(struct afs_vlocation *vl,
 /*
  * queue a vlocation record for updates
  */
-static void afs_vlocation_queue_for_updates(struct afs_vlocation *vl)
+static void afs_vlocation_queue_for_updates(struct afs_net *net,
+					    struct afs_vlocation *vl)
 {
 	struct afs_vlocation *xvl;
 
@@ -343,25 +335,25 @@ static void afs_vlocation_queue_for_updates(struct afs_vlocation *vl)
 	vl->update_at = ktime_get_real_seconds() +
 			afs_vlocation_update_timeout;
 
-	spin_lock(&afs_vlocation_updates_lock);
+	spin_lock(&net->vl_updates_lock);
 
-	if (!list_empty(&afs_vlocation_updates)) {
+	if (!list_empty(&net->vl_updates)) {
 		/* ... but wait at least 1 second more than the newest record
 		 * already queued so that we don't spam the VL server suddenly
 		 * with lots of requests
 		 */
-		xvl = list_entry(afs_vlocation_updates.prev,
+		xvl = list_entry(net->vl_updates.prev,
 				 struct afs_vlocation, update);
 		if (vl->update_at <= xvl->update_at)
 			vl->update_at = xvl->update_at + 1;
-	} else {
+	} else if (net->live) {
 		queue_delayed_work(afs_vlocation_update_worker,
-				   &afs_vlocation_update,
+				   &net->vl_updater,
 				   afs_vlocation_update_timeout * HZ);
 	}
 
-	list_add_tail(&vl->update, &afs_vlocation_updates);
-	spin_unlock(&afs_vlocation_updates_lock);
+	list_add_tail(&vl->update, &net->vl_updates);
+	spin_unlock(&net->vl_updates_lock);
 }
 
 /*
@@ -371,7 +363,8 @@ static void afs_vlocation_queue_for_updates(struct afs_vlocation *vl)
  * - lookup in the local cache if not able to find on the VL server
  * - insert/update in the local cache if did get a VL response
  */
-struct afs_vlocation *afs_vlocation_lookup(struct afs_cell *cell,
+struct afs_vlocation *afs_vlocation_lookup(struct afs_net *net,
+					   struct afs_cell *cell,
 					   struct key *key,
 					   const char *name,
 					   size_t namesz)
@@ -427,7 +420,7 @@ struct afs_vlocation *afs_vlocation_lookup(struct afs_cell *cell,
 #endif
 
 	/* schedule for regular updates */
-	afs_vlocation_queue_for_updates(vl);
+	afs_vlocation_queue_for_updates(net, vl);
 	goto success;
 
 found_in_memory:
@@ -436,9 +429,9 @@ struct afs_vlocation *afs_vlocation_lookup(struct afs_cell *cell,
 	atomic_inc(&vl->usage);
 	spin_unlock(&cell->vl_lock);
 	if (!list_empty(&vl->grave)) {
-		spin_lock(&afs_vlocation_graveyard_lock);
+		spin_lock(&net->vl_graveyard_lock);
 		list_del_init(&vl->grave);
-		spin_unlock(&afs_vlocation_graveyard_lock);
+		spin_unlock(&net->vl_graveyard_lock);
 	}
 	up_write(&cell->vl_sem);
 
@@ -481,7 +474,7 @@ struct afs_vlocation *afs_vlocation_lookup(struct afs_cell *cell,
 	wake_up(&vl->waitq);
 error:
 	ASSERT(vl != NULL);
-	afs_put_vlocation(vl);
+	afs_put_vlocation(net, vl);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
 }
@@ -489,7 +482,7 @@ struct afs_vlocation *afs_vlocation_lookup(struct afs_cell *cell,
 /*
  * finish using a volume location record
  */
-void afs_put_vlocation(struct afs_vlocation *vl)
+void afs_put_vlocation(struct afs_net *net, struct afs_vlocation *vl)
 {
 	if (!vl)
 		return;
@@ -503,22 +496,22 @@ void afs_put_vlocation(struct afs_vlocation *vl)
 		return;
 	}
 
-	spin_lock(&afs_vlocation_graveyard_lock);
+	spin_lock(&net->vl_graveyard_lock);
 	if (atomic_read(&vl->usage) == 0) {
 		_debug("buried");
-		list_move_tail(&vl->grave, &afs_vlocation_graveyard);
+		list_move_tail(&vl->grave, &net->vl_graveyard);
 		vl->time_of_death = ktime_get_real_seconds();
-		queue_delayed_work(afs_wq, &afs_vlocation_reap,
+		queue_delayed_work(afs_wq, &net->vl_reaper,
 				   afs_vlocation_timeout * HZ);
 
 		/* suspend updates on this record */
 		if (!list_empty(&vl->update)) {
-			spin_lock(&afs_vlocation_updates_lock);
+			spin_lock(&net->vl_updates_lock);
 			list_del_init(&vl->update);
-			spin_unlock(&afs_vlocation_updates_lock);
+			spin_unlock(&net->vl_updates_lock);
 		}
 	}
-	spin_unlock(&afs_vlocation_graveyard_lock);
+	spin_unlock(&net->vl_graveyard_lock);
 	_leave(" [killed?]");
 }
 
@@ -539,31 +532,34 @@ static void afs_vlocation_destroy(struct afs_vlocation *vl)
 /*
  * reap dead volume location records
  */
-static void afs_vlocation_reaper(struct work_struct *work)
+void afs_vlocation_reaper(struct work_struct *work)
 {
 	LIST_HEAD(corpses);
 	struct afs_vlocation *vl;
+	struct afs_net *net = container_of(work, struct afs_net, vl_reaper.work);
 	unsigned long delay, expiry;
 	time64_t now;
 
 	_enter("");
 
 	now = ktime_get_real_seconds();
-	spin_lock(&afs_vlocation_graveyard_lock);
+	spin_lock(&net->vl_graveyard_lock);
 
-	while (!list_empty(&afs_vlocation_graveyard)) {
-		vl = list_entry(afs_vlocation_graveyard.next,
+	while (!list_empty(&net->vl_graveyard)) {
+		vl = list_entry(net->vl_graveyard.next,
 				struct afs_vlocation, grave);
 
 		_debug("check %p", vl);
 
 		/* the queue is ordered most dead first */
-		expiry = vl->time_of_death + afs_vlocation_timeout;
-		if (expiry > now) {
-			delay = (expiry - now) * HZ;
-			_debug("delay %lu", delay);
-			mod_delayed_work(afs_wq, &afs_vlocation_reap, delay);
-			break;
+		if (net->live) {
+			expiry = vl->time_of_death + afs_vlocation_timeout;
+			if (expiry > now) {
+				delay = (expiry - now) * HZ;
+				_debug("delay %lu", delay);
+				mod_delayed_work(afs_wq, &net->vl_reaper, delay);
+				break;
+			}
 		}
 
 		spin_lock(&vl->cell->vl_lock);
@@ -578,7 +574,7 @@ static void afs_vlocation_reaper(struct work_struct *work)
 		spin_unlock(&vl->cell->vl_lock);
 	}
 
-	spin_unlock(&afs_vlocation_graveyard_lock);
+	spin_unlock(&net->vl_graveyard_lock);
 
 	/* now reap the corpses we've extracted */
 	while (!list_empty(&corpses)) {
@@ -591,56 +587,46 @@ static void afs_vlocation_reaper(struct work_struct *work)
 }
 
 /*
- * initialise the VL update process
- */
-int __init afs_vlocation_update_init(void)
-{
-	afs_vlocation_update_worker = alloc_workqueue("kafs_vlupdated",
-						      WQ_MEM_RECLAIM, 0);
-	return afs_vlocation_update_worker ? 0 : -ENOMEM;
-}
-
-/*
  * discard all the volume location records for rmmod
  */
-void afs_vlocation_purge(void)
+void __net_exit afs_vlocation_purge(struct afs_net *net)
 {
-	afs_vlocation_timeout = 0;
-
-	spin_lock(&afs_vlocation_updates_lock);
-	list_del_init(&afs_vlocation_updates);
-	spin_unlock(&afs_vlocation_updates_lock);
-	mod_delayed_work(afs_vlocation_update_worker, &afs_vlocation_update, 0);
-	destroy_workqueue(afs_vlocation_update_worker);
-
-	mod_delayed_work(afs_wq, &afs_vlocation_reap, 0);
+	spin_lock(&net->vl_updates_lock);
+	list_del_init(&net->vl_updates);
+	spin_unlock(&net->vl_updates_lock);
+	mod_delayed_work(afs_vlocation_update_worker, &net->vl_updater, 0);
+	mod_delayed_work(afs_wq, &net->vl_reaper, 0);
 }
 
 /*
  * update a volume location
  */
-static void afs_vlocation_updater(struct work_struct *work)
+void afs_vlocation_updater(struct work_struct *work)
 {
 	struct afs_cache_vlocation vldb;
 	struct afs_vlocation *vl, *xvl;
+	struct afs_net *net = container_of(work, struct afs_net, vl_updater.work);
 	time64_t now;
 	long timeout;
 	int ret;
 
+	if (!net->live)
+		return;
+
 	_enter("");
 
 	now = ktime_get_real_seconds();
 
 	/* find a record to update */
-	spin_lock(&afs_vlocation_updates_lock);
+	spin_lock(&net->vl_updates_lock);
 	for (;;) {
-		if (list_empty(&afs_vlocation_updates)) {
-			spin_unlock(&afs_vlocation_updates_lock);
+		if (list_empty(&net->vl_updates) || !net->live) {
+			spin_unlock(&net->vl_updates_lock);
 			_leave(" [nothing]");
 			return;
 		}
 
-		vl = list_entry(afs_vlocation_updates.next,
+		vl = list_entry(net->vl_updates.next,
 				struct afs_vlocation, update);
 		if (atomic_read(&vl->usage) > 0)
 			break;
@@ -650,15 +636,15 @@ static void afs_vlocation_updater(struct work_struct *work)
 	timeout = vl->update_at - now;
 	if (timeout > 0) {
 		queue_delayed_work(afs_vlocation_update_worker,
-				   &afs_vlocation_update, timeout * HZ);
-		spin_unlock(&afs_vlocation_updates_lock);
+				   &net->vl_updater, timeout * HZ);
+		spin_unlock(&net->vl_updates_lock);
 		_leave(" [nothing]");
 		return;
 	}
 
 	list_del_init(&vl->update);
 	atomic_inc(&vl->usage);
-	spin_unlock(&afs_vlocation_updates_lock);
+	spin_unlock(&net->vl_updates_lock);
 
 	/* we can now perform the update */
 	_debug("update %s", vl->vldb.name);
@@ -688,18 +674,18 @@ static void afs_vlocation_updater(struct work_struct *work)
 	vl->update_at = ktime_get_real_seconds() +
 			afs_vlocation_update_timeout;
 
-	spin_lock(&afs_vlocation_updates_lock);
+	spin_lock(&net->vl_updates_lock);
 
-	if (!list_empty(&afs_vlocation_updates)) {
+	if (!list_empty(&net->vl_updates)) {
 		/* next update in 10 minutes, but wait at least 1 second more
 		 * than the newest record already queued so that we don't spam
 		 * the VL server suddenly with lots of requests
 		 */
-		xvl = list_entry(afs_vlocation_updates.prev,
+		xvl = list_entry(net->vl_updates.prev,
 				 struct afs_vlocation, update);
 		if (vl->update_at <= xvl->update_at)
 			vl->update_at = xvl->update_at + 1;
-		xvl = list_entry(afs_vlocation_updates.next,
+		xvl = list_entry(net->vl_updates.next,
 				 struct afs_vlocation, update);
 		timeout = xvl->update_at - now;
 		if (timeout < 0)
@@ -710,11 +696,10 @@ static void afs_vlocation_updater(struct work_struct *work)
 
 	ASSERT(list_empty(&vl->update));
 
-	list_add_tail(&vl->update, &afs_vlocation_updates);
+	list_add_tail(&vl->update, &net->vl_updates);
 
 	_debug("timeout %ld", timeout);
-	queue_delayed_work(afs_vlocation_update_worker,
-			   &afs_vlocation_update, timeout * HZ);
-	spin_unlock(&afs_vlocation_updates_lock);
-	afs_put_vlocation(vl);
+	queue_delayed_work(afs_vlocation_update_worker, &net->vl_updater, timeout * HZ);
+	spin_unlock(&net->vl_updates_lock);
+	afs_put_vlocation(net, vl);
 }
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index db73d6dad02b..3d5363e0b7e1 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -54,7 +54,7 @@ struct afs_volume *afs_volume_lookup(struct afs_mount_params *params)
 	       params->volnamesz, params->volnamesz, params->volname, params->rwpath);
 
 	/* lookup the volume location record */
-	vlocation = afs_vlocation_lookup(params->cell, params->key,
+	vlocation = afs_vlocation_lookup(params->net, params->cell, params->key,
 					 params->volname, params->volnamesz);
 	if (IS_ERR(vlocation)) {
 		ret = PTR_ERR(vlocation);
@@ -138,7 +138,7 @@ struct afs_volume *afs_volume_lookup(struct afs_mount_params *params)
 	_debug("kAFS selected %s volume %08x",
 	       afs_voltypes[volume->type], volume->vid);
 	up_write(&params->cell->vl_sem);
-	afs_put_vlocation(vlocation);
+	afs_put_vlocation(params->net, vlocation);
 	_leave(" = %p", volume);
 	return volume;
 
@@ -146,7 +146,7 @@ struct afs_volume *afs_volume_lookup(struct afs_mount_params *params)
 error_up:
 	up_write(&params->cell->vl_sem);
 error:
-	afs_put_vlocation(vlocation);
+	afs_put_vlocation(params->net, vlocation);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
 
@@ -163,7 +163,7 @@ struct afs_volume *afs_volume_lookup(struct afs_mount_params *params)
 /*
  * destroy a volume record
  */
-void afs_put_volume(struct afs_volume *volume)
+void afs_put_volume(struct afs_net *net, struct afs_volume *volume)
 {
 	struct afs_vlocation *vlocation;
 	int loop;
@@ -195,7 +195,7 @@ void afs_put_volume(struct afs_volume *volume)
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_relinquish_cookie(volume->cache, 0);
 #endif
-	afs_put_vlocation(vlocation);
+	afs_put_vlocation(net, vlocation);
 
 	for (loop = volume->nservers - 1; loop >= 0; loop--)
 		afs_put_server(volume->servers[loop]);
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index e439565df838..b61025e1944e 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -46,6 +46,7 @@
 #define OPENPROM_SUPER_MAGIC	0x9fa1
 #define QNX4_SUPER_MAGIC	0x002f		/* qnx4 fs detection */
 #define QNX6_SUPER_MAGIC	0x68191122	/* qnx6 fs detection */
+#define AFS_FS_MAGIC		0x6B414653
 
 #define REISERFS_SUPER_MAGIC	0x52654973	/* used by gcc */
 					/* used by file system utilities that


* [RFC PATCH 06/11] afs: Add some protocol defs
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (3 preceding siblings ...)
  2017-09-01 15:41 ` [RFC PATCH 05/11] afs: Lay the groundwork for supporting network namespaces David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 15:41 ` [RFC PATCH 07/11] afs: Update the cache index structure David Howells
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

Add some protocol definitions, including max field lengths, flag defs and
an XDR-encoded UUID def.
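
As a rough sketch (not part of the patch): XDR carries each UUID field in a
full big-endian 32-bit word, so a decode helper might look like the following.
struct afs_uuid and afs_uuid_from_xdr() are assumptions for illustration; only
afs_uuid__xdr is defined by the patch.

	struct afs_uuid {
		u32	time_low;
		u16	time_mid;
		u16	time_hi_and_version;
		u8	clock_seq_hi_and_reserved;
		u8	clock_seq_low;
		u8	node[6];
	};

	/* Illustrative only: unpack the XDR-encoded UUID into host order. */
	static void afs_uuid_from_xdr(const struct afs_uuid__xdr *x,
				      struct afs_uuid *u)
	{
		int i;

		u->time_low			= ntohl(x->time_low);
		u->time_mid			= ntohl(x->time_mid);
		u->time_hi_and_version		= ntohl(x->time_hi_and_version);
		u->clock_seq_hi_and_reserved	= ntohl(x->clock_seq_hi_and_reserved);
		u->clock_seq_low		= ntohl(x->clock_seq_low);
		for (i = 0; i < 6; i++)
			u->node[i] = ntohl(x->node[i]); /* one node byte per word */
	}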

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/afs.h    |   25 ++++++++++++++++++++-----
 fs/afs/afs_vl.h |    6 +++++-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index 93053115bcfc..0d837bddbf7d 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -14,11 +14,14 @@
 
 #include <linux/in.h>
 
-#define AFS_MAXCELLNAME	64		/* maximum length of a cell name */
-#define AFS_MAXVOLNAME	64		/* maximum length of a volume name */
-#define AFSNAMEMAX	256		/* maximum length of a filename plus NUL */
-#define AFSPATHMAX	1024		/* maximum length of a pathname plus NUL */
-#define AFSOPAQUEMAX	1024		/* maximum length of an opaque field */
+#define AFS_MAXCELLNAME		64  	/* Maximum length of a cell name */
+#define AFS_MAXVOLNAME		64  	/* Maximum length of a volume name */
+#define AFS_MAXNSERVERS		8   	/* Maximum servers in a basic volume record */
+#define AFS_NMAXNSERVERS	13  	/* Maximum servers in a N/U-class volume record */
+#define AFS_MAXTYPES		3	/* Maximum number of volume types */
+#define AFSNAMEMAX		256 	/* Maximum length of a filename plus NUL */
+#define AFSPATHMAX		1024	/* Maximum length of a pathname plus NUL */
+#define AFSOPAQUEMAX		1024	/* Maximum length of an opaque field */
 
 typedef unsigned			afs_volid_t;
 typedef unsigned			afs_vnodeid_t;
@@ -176,4 +179,16 @@ struct afs_volume_status {
 
 #define AFS_BLOCK_SIZE	1024
 
+/*
+ * XDR encoding of UUID in AFS.
+ */
+struct afs_uuid__xdr {
+	__be32		time_low;
+	__be32		time_mid;
+	__be32		time_hi_and_version;
+	__be32		clock_seq_hi_and_reserved;
+	__be32		clock_seq_low;
+	__be32		node[6];
+};
+
 #endif /* AFS_H */
diff --git a/fs/afs/afs_vl.h b/fs/afs/afs_vl.h
index 800f607ffaf5..5b7cd9d7e9e6 100644
--- a/fs/afs/afs_vl.h
+++ b/fs/afs/afs_vl.h
@@ -74,11 +74,15 @@ struct afs_vldbentry {
 		struct in_addr	addr;		/* server address */
 		unsigned	partition;	/* partition ID on this server */
 		unsigned	flags;		/* server specific flags */
-#define AFS_VLSF_NEWREPSITE	0x0001	/* unused */
+#define AFS_VLSF_NEWREPSITE	0x0001	/* Ignore all 'non-new' servers */
 #define AFS_VLSF_ROVOL		0x0002	/* this server holds a R/O instance of the volume */
 #define AFS_VLSF_RWVOL		0x0004	/* this server holds a R/W instance of the volume */
 #define AFS_VLSF_BACKVOL	0x0008	/* this server holds a backup instance of the volume */
+#define AFS_VLSF_UUID		0x0010	/* This server is referred to by its UUID */
+#define AFS_VLSF_DONTUSE	0x0020	/* This server ref should be ignored */
 	} servers[8];
 };
 
+#define AFS_VLDB_MAXNAMELEN 65
+
 #endif /* AFS_VL_H */


* [RFC PATCH 07/11] afs: Update the cache index structure
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (4 preceding siblings ...)
  2017-09-01 15:41 ` [RFC PATCH 06/11] afs: Add some protocol defs David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 15:41 ` [RFC PATCH 08/11] afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr David Howells
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

Update the cache index structure in the following ways:

 (1) Don't use the volume name followed by the volume type as levels in the
     cache index.  Volumes can be renamed.  Use the volume ID instead.

 (2) Don't store the VLDB data for a volume in the tree.  If the volume
     database should be cached locally, then it should be done in a separate
     tree.

 (3) Expand the volume ID stored in the cache to 64 bits.

 (4) Expand the file/vnode ID stored in the cache to 96 bits.

 (5) Increment the cache structure version number to 1.
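
As an illustration of (3) and (4) (not part of the patch; taken from the
get_key callbacks in the diff below), the new index keys reduce to small
packed structures:

	/* Volume index key: the 64-bit volume ID. */
	struct {
		u64 volid;
	} __packed volume_key;

	/* Vnode index key: room for a 96-bit vnode ID; only the low 32 bits
	 * are filled in for now, the upper words are zeroed.
	 */
	struct {
		u32 vnode_id[3];
	} __packed vnode_key;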

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cache.c     |  239 +++++++++-------------------------------------------
 fs/afs/internal.h  |   21 -----
 fs/afs/vlocation.c |   39 +-------
 fs/afs/volume.c    |    2 
 4 files changed, 50 insertions(+), 251 deletions(-)

diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index 577763c3d88b..e19a643e0c29 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -14,19 +14,6 @@
 
 static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
 				       void *buffer, uint16_t buflen);
-static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
-				       void *buffer, uint16_t buflen);
-static enum fscache_checkaux afs_cell_cache_check_aux(void *cookie_netfs_data,
-						      const void *buffer,
-						      uint16_t buflen);
-
-static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
-					    void *buffer, uint16_t buflen);
-static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
-					    void *buffer, uint16_t buflen);
-static enum fscache_checkaux afs_vlocation_cache_check_aux(
-	void *cookie_netfs_data, const void *buffer, uint16_t buflen);
-
 static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
 					 void *buffer, uint16_t buflen);
 
@@ -43,23 +30,13 @@ static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
 
 struct fscache_netfs afs_cache_netfs = {
 	.name			= "afs",
-	.version		= 0,
+	.version		= 1,
 };
 
 struct fscache_cookie_def afs_cell_cache_index_def = {
 	.name		= "AFS.cell",
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 	.get_key	= afs_cell_cache_get_key,
-	.get_aux	= afs_cell_cache_get_aux,
-	.check_aux	= afs_cell_cache_check_aux,
-};
-
-struct fscache_cookie_def afs_vlocation_cache_index_def = {
-	.name			= "AFS.vldb",
-	.type			= FSCACHE_COOKIE_TYPE_INDEX,
-	.get_key		= afs_vlocation_cache_get_key,
-	.get_aux		= afs_vlocation_cache_get_aux,
-	.check_aux		= afs_vlocation_cache_check_aux,
 };
 
 struct fscache_cookie_def afs_volume_cache_index_def = {
@@ -97,150 +74,26 @@ static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
 	return klen;
 }
 
-/*
- * provide new auxiliary cache data
- */
-static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
-				       void *buffer, uint16_t bufmax)
-{
-	const struct afs_cell *cell = cookie_netfs_data;
-	uint16_t dlen;
-
-	_enter("%p,%p,%u", cell, buffer, bufmax);
-
-	dlen = cell->vl_naddrs * sizeof(cell->vl_addrs[0]);
-	dlen = min(dlen, bufmax);
-	dlen &= ~(sizeof(cell->vl_addrs[0]) - 1);
-
-	memcpy(buffer, cell->vl_addrs, dlen);
-	return dlen;
-}
-
-/*
- * check that the auxiliary data indicates that the entry is still valid
- */
-static enum fscache_checkaux afs_cell_cache_check_aux(void *cookie_netfs_data,
-						      const void *buffer,
-						      uint16_t buflen)
-{
-	_leave(" = OKAY");
-	return FSCACHE_CHECKAUX_OKAY;
-}
-
-/*****************************************************************************/
-/*
- * set the key for the index entry
- */
-static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
-					    void *buffer, uint16_t bufmax)
-{
-	const struct afs_vlocation *vlocation = cookie_netfs_data;
-	uint16_t klen;
-
-	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
-
-	klen = strnlen(vlocation->vldb.name, sizeof(vlocation->vldb.name));
-	if (klen > bufmax)
-		return 0;
-
-	memcpy(buffer, vlocation->vldb.name, klen);
-
-	_leave(" = %u", klen);
-	return klen;
-}
-
-/*
- * provide new auxiliary cache data
- */
-static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
-					    void *buffer, uint16_t bufmax)
-{
-	const struct afs_vlocation *vlocation = cookie_netfs_data;
-	uint16_t dlen;
-
-	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
-
-	dlen = sizeof(struct afs_cache_vlocation);
-	dlen -= offsetof(struct afs_cache_vlocation, nservers);
-	if (dlen > bufmax)
-		return 0;
-
-	memcpy(buffer, (uint8_t *)&vlocation->vldb.nservers, dlen);
-
-	_leave(" = %u", dlen);
-	return dlen;
-}
-
-/*
- * check that the auxiliary data indicates that the entry is still valid
- */
-static
-enum fscache_checkaux afs_vlocation_cache_check_aux(void *cookie_netfs_data,
-						    const void *buffer,
-						    uint16_t buflen)
-{
-	const struct afs_cache_vlocation *cvldb;
-	struct afs_vlocation *vlocation = cookie_netfs_data;
-	uint16_t dlen;
-
-	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, buflen);
-
-	/* check the size of the data is what we're expecting */
-	dlen = sizeof(struct afs_cache_vlocation);
-	dlen -= offsetof(struct afs_cache_vlocation, nservers);
-	if (dlen != buflen)
-		return FSCACHE_CHECKAUX_OBSOLETE;
-
-	cvldb = container_of(buffer, struct afs_cache_vlocation, nservers);
-
-	/* if what's on disk is more valid than what's in memory, then use the
-	 * VL record from the cache */
-	if (!vlocation->valid || vlocation->vldb.rtime == cvldb->rtime) {
-		memcpy((uint8_t *)&vlocation->vldb.nservers, buffer, dlen);
-		vlocation->valid = 1;
-		_leave(" = SUCCESS [c->m]");
-		return FSCACHE_CHECKAUX_OKAY;
-	}
-
-	/* need to update the cache if the cached info differs */
-	if (memcmp(&vlocation->vldb, buffer, dlen) != 0) {
-		/* delete if the volume IDs for this name differ */
-		if (memcmp(&vlocation->vldb.vid, &cvldb->vid,
-			   sizeof(cvldb->vid)) != 0
-		    ) {
-			_leave(" = OBSOLETE");
-			return FSCACHE_CHECKAUX_OBSOLETE;
-		}
-
-		_leave(" = UPDATE");
-		return FSCACHE_CHECKAUX_NEEDS_UPDATE;
-	}
-
-	_leave(" = OKAY");
-	return FSCACHE_CHECKAUX_OKAY;
-}
-
 /*****************************************************************************/
 /*
  * set the key for the volume index entry
  */
 static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
-					void *buffer, uint16_t bufmax)
+					 void *buffer, uint16_t bufmax)
 {
 	const struct afs_volume *volume = cookie_netfs_data;
-	uint16_t klen;
+	struct {
+		u64 volid;
+	} __packed key;
 
 	_enter("{%u},%p,%u", volume->type, buffer, bufmax);
 
-	klen = sizeof(volume->type);
-	if (klen > bufmax)
+	if (bufmax < sizeof(key))
 		return 0;
 
-	memcpy(buffer, &volume->type, sizeof(volume->type));
-
-	_leave(" = %u", klen);
-	return klen;
-
+	key.volid = volume->vid;
+	memcpy(buffer, &key, sizeof(key));
+	return sizeof(key);
 }
 
 /*****************************************************************************/
@@ -251,20 +104,25 @@ static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
 					void *buffer, uint16_t bufmax)
 {
 	const struct afs_vnode *vnode = cookie_netfs_data;
-	uint16_t klen;
+	struct {
+		u32 vnode_id[3];
+	} __packed key;
 
 	_enter("{%x,%x,%llx},%p,%u",
 	       vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
 	       buffer, bufmax);
 
-	klen = sizeof(vnode->fid.vnode);
-	if (klen > bufmax)
-		return 0;
+	/* Allow for a 96-bit key */
+	memset(&key, 0, sizeof(key));
+	key.vnode_id[0] = vnode->fid.vnode;
+	key.vnode_id[1] = 0;
+	key.vnode_id[2] = 0;
 
-	memcpy(buffer, &vnode->fid.vnode, sizeof(vnode->fid.vnode));
+	if (sizeof(key) > bufmax)
+		return 0;
 
-	_leave(" = %u", klen);
-	return klen;
+	memcpy(buffer, &key, sizeof(key));
+	return sizeof(key);
 }
 
 /*
@@ -282,6 +140,11 @@ static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
 	*size = vnode->status.size;
 }
 
+struct afs_vnode_cache_aux {
+	u64 data_version;
+	u32 fid_unique;
+} __packed;
+
 /*
  * provide new auxiliary cache data
  */
@@ -289,23 +152,21 @@ static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
 					void *buffer, uint16_t bufmax)
 {
 	const struct afs_vnode *vnode = cookie_netfs_data;
-	uint16_t dlen;
+	struct afs_vnode_cache_aux aux;
 
 	_enter("{%x,%x,%Lx},%p,%u",
 	       vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
 	       buffer, bufmax);
 
-	dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.data_version);
-	if (dlen > bufmax)
-		return 0;
+	memset(&aux, 0, sizeof(aux));
+	aux.data_version = vnode->status.data_version;
+	aux.fid_unique = vnode->fid.unique;
 
-	memcpy(buffer, &vnode->fid.unique, sizeof(vnode->fid.unique));
-	buffer += sizeof(vnode->fid.unique);
-	memcpy(buffer, &vnode->status.data_version,
-	       sizeof(vnode->status.data_version));
+	if (bufmax < sizeof(aux))
+		return 0;
 
-	_leave(" = %u", dlen);
-	return dlen;
+	memcpy(buffer, &aux, sizeof(aux));
+	return sizeof(aux);
 }
 
 /*
@@ -316,43 +177,29 @@ static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
 						       uint16_t buflen)
 {
 	struct afs_vnode *vnode = cookie_netfs_data;
-	uint16_t dlen;
+	struct afs_vnode_cache_aux aux;
 
 	_enter("{%x,%x,%llx},%p,%u",
 	       vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
 	       buffer, buflen);
 
+	memcpy(&aux, buffer, sizeof(aux));
+
 	/* check the size of the data is what we're expecting */
-	dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.data_version);
-	if (dlen != buflen) {
-		_leave(" = OBSOLETE [len %hx != %hx]", dlen, buflen);
+	if (buflen != sizeof(aux)) {
+		_leave(" = OBSOLETE [len %hx != %zx]", buflen, sizeof(aux));
 		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
-	if (memcmp(buffer,
-		   &vnode->fid.unique,
-		   sizeof(vnode->fid.unique)
-		   ) != 0) {
-		unsigned unique;
-
-		memcpy(&unique, buffer, sizeof(unique));
-
+	if (vnode->fid.unique != aux.fid_unique) {
 		_leave(" = OBSOLETE [uniq %x != %x]",
-		       unique, vnode->fid.unique);
+		       aux.fid_unique, vnode->fid.unique);
 		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
-	if (memcmp(buffer + sizeof(vnode->fid.unique),
-		   &vnode->status.data_version,
-		   sizeof(vnode->status.data_version)
-		   ) != 0) {
-		afs_dataversion_t version;
-
-		memcpy(&version, buffer + sizeof(vnode->fid.unique),
-		       sizeof(version));
-
+	if (vnode->status.data_version != aux.data_version) {
 		_leave(" = OBSOLETE [vers %llx != %llx]",
-		       version, vnode->status.data_version);
+		       aux.data_version, vnode->status.data_version);
 		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e0484a38c9ce..0ffdb02a58cb 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -189,14 +189,6 @@ static inline struct afs_super_info *AFS_FS_S(struct super_block *sb)
 extern struct file_system_type afs_fs_type;
 
 /*
- * entry in the cached cell catalogue
- */
-struct afs_cache_cell {
-	char		name[AFS_MAXCELLNAME];	/* cell name (padded with NULs) */
-	struct in_addr	vl_servers[15];		/* cached cell VL servers */
-};
-
-/*
  * AFS network namespace record.
  */
 struct afs_net {
@@ -293,14 +285,6 @@ struct afs_cache_vlocation {
 };
 
 /*
- * volume -> vnode hash table entry
- */
-struct afs_cache_vhash {
-	afs_voltype_t		vtype;		/* which volume variation */
-	uint8_t			hash_bucket;	/* which hash bucket this represents */
-} __attribute__((packed));
-
-/*
  * AFS volume location record
  */
 struct afs_vlocation {
@@ -310,9 +294,6 @@ struct afs_vlocation {
 	struct list_head	grave;		/* link in master graveyard list */
 	struct list_head	update;		/* link in master update list */
 	struct afs_cell		*cell;		/* cell to which volume belongs */
-#ifdef CONFIG_AFS_FSCACHE
-	struct fscache_cookie	*cache;		/* caching cookie */
-#endif
 	struct afs_cache_vlocation vldb;	/* volume information DB record */
 	struct afs_volume	*vols[3];	/* volume access record pointer (index by type) */
 	wait_queue_head_t	waitq;		/* status change waitqueue */
@@ -472,12 +453,10 @@ struct afs_interface {
 #ifdef CONFIG_AFS_FSCACHE
 extern struct fscache_netfs afs_cache_netfs;
 extern struct fscache_cookie_def afs_cell_cache_index_def;
-extern struct fscache_cookie_def afs_vlocation_cache_index_def;
 extern struct fscache_cookie_def afs_volume_cache_index_def;
 extern struct fscache_cookie_def afs_vnode_cache_index_def;
 #else
 #define afs_cell_cache_index_def	(*(struct fscache_cookie_def *) NULL)
-#define afs_vlocation_cache_index_def	(*(struct fscache_cookie_def *) NULL)
 #define afs_volume_cache_index_def	(*(struct fscache_cookie_def *) NULL)
 #define afs_vnode_cache_index_def	(*(struct fscache_cookie_def *) NULL)
 #endif
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index ccb7aacfbeca..d9a5d7acdb86 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -273,10 +273,6 @@ static void afs_vlocation_apply_update(struct afs_vlocation *vl,
 		       vl->vldb.name, vldb->name);
 
 	vl->vldb = *vldb;
-
-#ifdef CONFIG_AFS_FSCACHE
-	fscache_update_cookie(vl->cache);
-#endif
 }
 
 /*
@@ -295,27 +291,12 @@ static int afs_vlocation_fill_in_record(struct afs_vlocation *vl,
 
 	memset(&vldb, 0, sizeof(vldb));
 
-	/* see if we have an in-cache copy (will set vl->valid if there is) */
-#ifdef CONFIG_AFS_FSCACHE
-	vl->cache = fscache_acquire_cookie(vl->cell->cache,
-					   &afs_vlocation_cache_index_def, vl,
-					   true);
-#endif
-
-	if (vl->valid) {
-		/* try to update a known volume in the cell VL databases by
-		 * ID as the name may have changed */
-		_debug("found in cache");
-		ret = afs_vlocation_update_record(vl, key, &vldb);
-	} else {
-		/* try to look up an unknown volume in the cell VL databases by
-		 * name */
-		ret = afs_vlocation_access_vl_by_name(vl, key, &vldb);
-		if (ret < 0) {
-			printk("kAFS: failed to locate '%s' in cell '%s'\n",
-			       vl->vldb.name, vl->cell->name);
-			return ret;
-		}
+	/* Try to look up an unknown volume in the cell VL databases by name */
+	ret = afs_vlocation_access_vl_by_name(vl, key, &vldb);
+	if (ret < 0) {
+		printk("kAFS: failed to locate '%s' in cell '%s'\n",
+		       vl->vldb.name, vl->cell->name);
+		return ret;
 	}
 
 	afs_vlocation_apply_update(vl, &vldb);
@@ -414,11 +395,6 @@ struct afs_vlocation *afs_vlocation_lookup(struct afs_net *net,
 	spin_unlock(&vl->lock);
 	wake_up(&vl->waitq);
 
-	/* update volume entry in local cache */
-#ifdef CONFIG_AFS_FSCACHE
-	fscache_update_cookie(vl->cache);
-#endif
-
 	/* schedule for regular updates */
 	afs_vlocation_queue_for_updates(net, vl);
 	goto success;
@@ -522,9 +498,6 @@ static void afs_vlocation_destroy(struct afs_vlocation *vl)
 {
 	_enter("%p", vl);
 
-#ifdef CONFIG_AFS_FSCACHE
-	fscache_relinquish_cookie(vl->cache, 0);
-#endif
 	afs_put_cell(vl->cell);
 	kfree(vl);
 }
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 3d5363e0b7e1..bc52203695b7 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -125,7 +125,7 @@ struct afs_volume *afs_volume_lookup(struct afs_mount_params *params)
 
 	/* attach the cache and volume location */
 #ifdef CONFIG_AFS_FSCACHE
-	volume->cache = fscache_acquire_cookie(vlocation->cache,
+	volume->cache = fscache_acquire_cookie(volume->cell->cache,
 					       &afs_volume_cache_index_def,
 					       volume, true);
 #endif


* [RFC PATCH 08/11] afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (5 preceding siblings ...)
  2017-09-01 15:41 ` [RFC PATCH 07/11] afs: Update the cache index structure David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 15:41 ` [RFC PATCH 09/11] afs: Allow IPv6 address specification of VL servers David Howells
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

Keep and pass around sockaddr_rxrpc addresses rather than in_addr addresses
so that IPv6 addresses and non-standard port numbers can be supported in
future.

This also allows the port and service_id fields to be removed from the
afs_call struct.
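
For illustration (not part of the patch), the cell code now preloads each VL
server slot as a complete rxrpc transport address rather than building one at
call time.  A minimal sketch of that set-up, using 10.0.0.1 purely as an
example address:

	struct sockaddr_rxrpc srx;

	memset(&srx, 0, sizeof(srx));
	srx.srx_family			= AF_RXRPC;
	srx.srx_service			= VL_SERVICE;	/* volume location service */
	srx.transport_type		= SOCK_DGRAM;
	srx.transport_len		= sizeof(struct sockaddr_in);
	srx.transport.sin.sin_family	= AF_INET;
	srx.transport.sin.sin_port	= htons(AFS_VL_PORT);
	srx.transport.sin.sin_addr.s_addr = htonl(0x0a000001);	/* 10.0.0.1 (example) */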

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cell.c      |   18 +++++++++++++++---
 fs/afs/fsclient.c  |   36 ------------------------------------
 fs/afs/internal.h  |   16 +++++++---------
 fs/afs/proc.c      |   10 +++++-----
 fs/afs/rxrpc.c     |   18 ++++--------------
 fs/afs/server.c    |   31 ++++++++++++++++---------------
 fs/afs/vlclient.c  |   20 ++++++++++++--------
 fs/afs/vlocation.c |   30 ++++++++----------------------
 fs/afs/vnode.c     |   28 ++++++++++++++--------------
 fs/afs/volume.c    |    9 ++++-----
 10 files changed, 85 insertions(+), 131 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index bd570fa539a0..3a6c91ae11a3 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -31,7 +31,7 @@ static struct afs_cell *afs_cell_alloc(struct afs_net *net,
 	char keyname[4 + AFS_MAXCELLNAME + 1], *cp, *dp, *next;
 	char  *dvllist = NULL, *_vllist = NULL;
 	char  delimiter = ':';
-	int ret;
+	int ret, i;
 
 	_enter("%*.*s,%s", namelen, namelen, name ?: "", vllist);
 
@@ -61,6 +61,14 @@ static struct afs_cell *afs_cell_alloc(struct afs_net *net,
 	INIT_LIST_HEAD(&cell->vl_list);
 	spin_lock_init(&cell->vl_lock);
 
+	for (i = 0; i < AFS_CELL_MAX_ADDRS; i++) {
+		struct sockaddr_rxrpc *srx = &cell->vl_addrs[i];
+		srx->srx_family			= AF_RXRPC;
+		srx->srx_service		= VL_SERVICE;
+		srx->transport_type		= SOCK_DGRAM;
+		srx->transport.sin.sin_port	= htons(AFS_VL_PORT);
+	}
+
 	/* if the ip address is invalid, try dns query */
 	if (!vllist || strlen(vllist) < 7) {
 		ret = dns_query("afsdb", name, namelen, "ipv4", &dvllist, NULL);
@@ -83,6 +91,7 @@ static struct afs_cell *afs_cell_alloc(struct afs_net *net,
 
 	/* fill in the VL server list from the rest of the string */
 	do {
+		struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
 		unsigned a, b, c, d;
 
 		next = strchr(_vllist, delimiter);
@@ -95,10 +104,13 @@ static struct afs_cell *afs_cell_alloc(struct afs_net *net,
 		if (a > 255 || b > 255 || c > 255 || d > 255)
 			goto bad_address;
 
-		cell->vl_addrs[cell->vl_naddrs++].s_addr =
+		srx->transport_len		= sizeof(struct sockaddr_in);
+		srx->transport.sin.sin_family	= AF_INET;
+		srx->transport.sin.sin_addr.s_addr =
 			htonl((a << 24) | (b << 16) | (c << 8) | d);
 
-	} while (cell->vl_naddrs < AFS_CELL_MAX_ADDRS && (_vllist = next));
+	} while (cell->vl_naddrs++,
+		 cell->vl_naddrs < AFS_CELL_MAX_ADDRS && (_vllist = next));
 
 	/* create a key to represent an anonymous user */
 	memcpy(keyname, "afs@", 4);
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index ce6f0159e1d4..bac2e8db6e75 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -297,8 +297,6 @@ int afs_fs_fetch_file_status(struct afs_server *server,
 	call->key = key;
 	call->reply = vnode;
 	call->reply2 = volsync;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -504,8 +502,6 @@ static int afs_fs_fetch_data64(struct afs_server *server,
 	call->reply = vnode;
 	call->reply2 = NULL; /* volsync */
 	call->reply3 = req;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->operation_ID = FSFETCHDATA64;
 
 	/* marshall the parameters */
@@ -551,8 +547,6 @@ int afs_fs_fetch_data(struct afs_server *server,
 	call->reply = vnode;
 	call->reply2 = NULL; /* volsync */
 	call->reply3 = req;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->operation_ID = FSFETCHDATA;
 
 	/* marshall the parameters */
@@ -619,8 +613,6 @@ int afs_fs_give_up_callbacks(struct afs_net *net,
 	if (!call)
 		return -ENOMEM;
 
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -723,8 +715,6 @@ int afs_fs_create(struct afs_server *server,
 	call->reply2 = newfid;
 	call->reply3 = newstatus;
 	call->reply4 = newcb;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -810,8 +800,6 @@ int afs_fs_remove(struct afs_server *server,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -893,8 +881,6 @@ int afs_fs_link(struct afs_server *server,
 	call->key = key;
 	call->reply = dvnode;
 	call->reply2 = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -988,8 +974,6 @@ int afs_fs_symlink(struct afs_server *server,
 	call->reply = vnode;
 	call->reply2 = newfid;
 	call->reply3 = newstatus;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1094,8 +1078,6 @@ int afs_fs_rename(struct afs_server *server,
 	call->key = key;
 	call->reply = orig_dvnode;
 	call->reply2 = new_dvnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1196,8 +1178,6 @@ static int afs_fs_store_data64(struct afs_server *server,
 	call->wb = wb;
 	call->key = wb->key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->mapping = vnode->vfs_inode.i_mapping;
 	call->first = first;
 	call->last = last;
@@ -1274,8 +1254,6 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 	call->wb = wb;
 	call->key = wb->key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->mapping = vnode->vfs_inode.i_mapping;
 	call->first = first;
 	call->last = last;
@@ -1383,8 +1361,6 @@ static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->store_version = vnode->status.data_version + 1;
 	call->operation_ID = FSSTOREDATA;
 
@@ -1435,8 +1411,6 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->store_version = vnode->status.data_version + 1;
 	call->operation_ID = FSSTOREDATA;
 
@@ -1483,8 +1457,6 @@ int afs_fs_setattr(struct afs_server *server, struct key *key,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 	call->operation_ID = FSSTORESTATUS;
 
 	/* marshall the parameters */
@@ -1721,8 +1693,6 @@ int afs_fs_get_volume_status(struct afs_server *server,
 	call->reply = vnode;
 	call->reply2 = vs;
 	call->reply3 = tmpbuf;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1805,8 +1775,6 @@ int afs_fs_set_lock(struct afs_server *server,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1839,8 +1807,6 @@ int afs_fs_extend_lock(struct afs_server *server,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1872,8 +1838,6 @@ int afs_fs_release_lock(struct afs_server *server,
 
 	call->key = key;
 	call->reply = vnode;
-	call->service_id = FS_SERVICE;
-	call->port = htons(AFS_FS_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 0ffdb02a58cb..47d5ae08f071 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -104,8 +104,6 @@ struct afs_call {
 	bool			send_pages;	/* T if data from mapping should be sent */
 	bool			need_attention;	/* T if RxRPC poked us */
 	bool			async;		/* T if asynchronous */
-	u16			service_id;	/* RxRPC service ID to call */
-	__be16			port;		/* target UDP port */
 	u32			operation_ID;	/* operation ID for an incoming call */
 	u32			count;		/* count for use in unmarshalling */
 	__be32			tmp;		/* place to extract temporary data */
@@ -260,7 +258,7 @@ struct afs_cell {
 	spinlock_t		vl_lock;	/* vl_list lock */
 	unsigned short		vl_naddrs;	/* number of VL servers in addr list */
 	unsigned short		vl_curr_svix;	/* current server index */
-	struct in_addr		vl_addrs[AFS_CELL_MAX_ADDRS];	/* cell VL server addresses */
+	struct sockaddr_rxrpc	vl_addrs[AFS_CELL_MAX_ADDRS];	/* cell VL server addresses */
 
 	char			name[0];	/* cell name - must go last */
 };
@@ -280,7 +278,7 @@ struct afs_cache_vlocation {
 #define AFS_VOL_VTM_BAK	0x04 /* backup version of the volume is available (on this server) */
 
 	afs_volid_t		vid[3];		/* volume IDs for R/W, R/O and Bak volumes */
-	struct in_addr		servers[8];	/* fileserver addresses */
+	struct sockaddr_rxrpc	servers[8];	/* fileserver addresses */
 	time_t			rtime;		/* last retrieval time */
 };
 
@@ -311,7 +309,7 @@ struct afs_vlocation {
 struct afs_server {
 	atomic_t		usage;
 	time64_t		time_of_death;	/* time at which put reduced usage to 0 */
-	struct in_addr		addr;		/* server address */
+	struct sockaddr_rxrpc	addr;		/* server address */
 	struct afs_cell		*cell;		/* cell in which server resides */
 	struct list_head	link;		/* link in cell's server list */
 	struct list_head	grave;		/* link in master graveyard list */
@@ -634,7 +632,7 @@ extern void __net_exit afs_close_socket(struct afs_net *);
 extern void afs_charge_preallocation(struct work_struct *);
 extern void afs_put_call(struct afs_call *);
 extern int afs_queue_call_work(struct afs_call *);
-extern int afs_make_call(struct in_addr *, struct afs_call *, gfp_t, bool);
+extern int afs_make_call(struct sockaddr_rxrpc *, struct afs_call *, gfp_t, bool);
 extern struct afs_call *afs_alloc_flat_call(struct afs_net *,
 					    const struct afs_call_type *,
 					    size_t, size_t);
@@ -669,7 +667,7 @@ do {								\
 } while(0)
 
 extern struct afs_server *afs_lookup_server(struct afs_cell *,
-					    const struct in_addr *);
+					    struct sockaddr_rxrpc *);
 extern struct afs_server *afs_find_server(struct afs_net *,
 					  const struct sockaddr_rxrpc *);
 extern void afs_put_server(struct afs_server *);
@@ -686,11 +684,11 @@ extern void __exit afs_fs_exit(void);
  * vlclient.c
  */
 extern int afs_vl_get_entry_by_name(struct afs_net *,
-				    struct in_addr *, struct key *,
+				    struct sockaddr_rxrpc *, struct key *,
 				    const char *, struct afs_cache_vlocation *,
 				    bool);
 extern int afs_vl_get_entry_by_id(struct afs_net *,
-				  struct in_addr *, struct key *,
+				  struct sockaddr_rxrpc *, struct key *,
 				  afs_volid_t, afs_voltype_t,
 				  struct afs_cache_vlocation *, bool);
 
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index c93433460348..4d609869a57b 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -570,16 +570,16 @@ static void afs_proc_cell_vlservers_stop(struct seq_file *p, void *v)
  */
 static int afs_proc_cell_vlservers_show(struct seq_file *m, void *v)
 {
-	struct in_addr *addr = v;
+	struct sockaddr_rxrpc *addr = v;
 
 	/* display header on line 1 */
-	if (v == (struct in_addr *) 1) {
+	if (v == (void *)1) {
 		seq_puts(m, "ADDRESS\n");
 		return 0;
 	}
 
 	/* display one cell per line on subsequent lines */
-	seq_printf(m, "%pI4\n", &addr->s_addr);
+	seq_printf(m, "%pISp\n", &addr->transport);
 	return 0;
 }
 
@@ -652,7 +652,7 @@ static int afs_proc_cell_servers_show(struct seq_file *m, void *v)
 {
 	struct afs_cell *cell = m->private;
 	struct afs_server *server = list_entry(v, struct afs_server, link);
-	char ipaddr[20];
+	char ipaddr[64];
 
 	/* display header on line 1 */
 	if (v == &cell->servers) {
@@ -661,7 +661,7 @@ static int afs_proc_cell_servers_show(struct seq_file *m, void *v)
 	}
 
 	/* display one cell per line on subsequent lines */
-	sprintf(ipaddr, "%pI4", &server->addr);
+	sprintf(ipaddr, "%pISp", &server->addr.transport);
 	seq_printf(m, "%3d %-15.15s %5d\n",
 		   atomic_read(&server->usage), ipaddr, server->fs_state);
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index fba7d56b64b4..f57a2d41a310 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -322,10 +322,9 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 /*
  * initiate a call
  */
-int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
-		  bool async)
+int afs_make_call(struct sockaddr_rxrpc *srx, struct afs_call *call,
+		  gfp_t gfp, bool async)
 {
-	struct sockaddr_rxrpc srx;
 	struct rxrpc_call *rxcall;
 	struct msghdr msg;
 	struct kvec iov[1];
@@ -334,7 +333,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	u32 abort_code;
 	int ret;
 
-	_enter("%x,{%d},", addr->s_addr, ntohs(call->port));
+	_enter(",{%pISp},", &srx->transport);
 
 	ASSERT(call->type != NULL);
 	ASSERT(call->type->name != NULL);
@@ -345,15 +344,6 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 
 	call->async = async;
 
-	memset(&srx, 0, sizeof(srx));
-	srx.srx_family = AF_RXRPC;
-	srx.srx_service = call->service_id;
-	srx.transport_type = SOCK_DGRAM;
-	srx.transport_len = sizeof(srx.transport.sin);
-	srx.transport.sin.sin_family = AF_INET;
-	srx.transport.sin.sin_port = call->port;
-	memcpy(&srx.transport.sin.sin_addr, addr, 4);
-
 	/* Work out the length we're going to transmit.  This is awkward for
 	 * calls such as FS.StoreData where there's an extra injection of data
 	 * after the initial fixed part.
@@ -365,7 +355,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	}
 
 	/* create a call */
-	rxcall = rxrpc_kernel_begin_call(call->net->socket, &srx, call->key,
+	rxcall = rxrpc_kernel_begin_call(call->net->socket, srx, call->key,
 					 (unsigned long)call,
 					 tx_total_len, gfp,
 					 (async ?
diff --git a/fs/afs/server.c b/fs/afs/server.c
index e47fd9bc0ddc..7d103321efab 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -11,6 +11,7 @@
 
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include "afs_fs.h"
 #include "internal.h"
 
 static unsigned afs_server_timeout = 10;	/* server timeout in seconds */
@@ -23,7 +24,7 @@ static int afs_install_server(struct afs_server *server)
 	struct afs_server *xserver;
 	struct afs_net *net = server->cell->net;
 	struct rb_node **pp, *p;
-	int ret;
+	int ret, diff;
 
 	_enter("%p", server);
 
@@ -36,9 +37,10 @@ static int afs_install_server(struct afs_server *server)
 		p = *pp;
 		_debug("- consider %p", p);
 		xserver = rb_entry(p, struct afs_server, master_rb);
-		if (server->addr.s_addr < xserver->addr.s_addr)
+		diff = memcmp(&server->addr, &xserver->addr, sizeof(server->addr));
+		if (diff < 0)
 			pp = &(*pp)->rb_left;
-		else if (server->addr.s_addr > xserver->addr.s_addr)
+		else if (diff > 0)
 			pp = &(*pp)->rb_right;
 		else
 			goto error;
@@ -57,7 +59,7 @@ static int afs_install_server(struct afs_server *server)
  * allocate a new server record
  */
 static struct afs_server *afs_alloc_server(struct afs_cell *cell,
-					   const struct in_addr *addr)
+					   const struct sockaddr_rxrpc *addr)
 {
 	struct afs_server *server;
 
@@ -79,8 +81,7 @@ static struct afs_server *afs_alloc_server(struct afs_cell *cell,
 		INIT_DELAYED_WORK(&server->cb_break_work,
 				  afs_dispatch_give_up_callbacks);
 
-		memcpy(&server->addr, addr, sizeof(struct in_addr));
-		server->addr.s_addr = addr->s_addr;
+		server->addr = *addr;
 		_leave(" = %p{%d}", server, atomic_read(&server->usage));
 	} else {
 		_leave(" = NULL [nomem]");
@@ -92,17 +93,17 @@ static struct afs_server *afs_alloc_server(struct afs_cell *cell,
  * get an FS-server record for a cell
  */
 struct afs_server *afs_lookup_server(struct afs_cell *cell,
-				     const struct in_addr *addr)
+				     struct sockaddr_rxrpc *addr)
 {
 	struct afs_server *server, *candidate;
 
-	_enter("%p,%pI4", cell, &addr->s_addr);
+	_enter("%p,%pIS", cell, &addr->transport);
 
 	/* quick scan of the list to see if we already have the server */
 	read_lock(&cell->servers_lock);
 
 	list_for_each_entry(server, &cell->servers, link) {
-		if (server->addr.s_addr == addr->s_addr)
+		if (memcmp(&server->addr, addr, sizeof(*addr)) == 0)
 			goto found_server_quickly;
 	}
 	read_unlock(&cell->servers_lock);
@@ -117,7 +118,7 @@ struct afs_server *afs_lookup_server(struct afs_cell *cell,
 
 	/* check the cell's server list again */
 	list_for_each_entry(server, &cell->servers, link) {
-		if (server->addr.s_addr == addr->s_addr)
+		if (memcmp(&server->addr, addr, sizeof(*addr)) == 0)
 			goto found_server;
 	}
 
@@ -173,9 +174,9 @@ struct afs_server *afs_find_server(struct afs_net *net,
 {
 	struct afs_server *server = NULL;
 	struct rb_node *p;
-	struct in_addr addr = srx->transport.sin.sin_addr;
+	int diff;
 
-	_enter("{%d,%pI4}", srx->transport.family, &addr.s_addr);
+	_enter("{%d,%pIS}", srx->transport.family, &srx->transport);
 
 	if (srx->transport.family != AF_INET) {
 		WARN(true, "AFS does not yes support non-IPv4 addresses\n");
@@ -190,9 +191,10 @@ struct afs_server *afs_find_server(struct afs_net *net,
 
 		_debug("- consider %p", p);
 
-		if (addr.s_addr < server->addr.s_addr) {
+		diff = memcmp(srx, &server->addr, sizeof(*srx));
+		if (diff < 0) {
 			p = p->rb_left;
-		} else if (addr.s_addr > server->addr.s_addr) {
+		} else if (diff > 0) {
 			p = p->rb_right;
 		} else {
 			afs_get_server(server);
@@ -203,7 +205,6 @@ struct afs_server *afs_find_server(struct afs_net *net,
 	server = NULL;
 found:
 	read_unlock(&net->servers_lock);
-	ASSERTIFCMP(server, server->addr.s_addr, ==, addr.s_addr);
 	_leave(" = %p", server);
 	return server;
 }
diff --git a/fs/afs/vlclient.c b/fs/afs/vlclient.c
index f5a043a9ba61..48d137628d6a 100644
--- a/fs/afs/vlclient.c
+++ b/fs/afs/vlclient.c
@@ -12,6 +12,7 @@
 #include <linux/gfp.h>
 #include <linux/init.h>
 #include <linux/sched.h>
+#include "afs_fs.h"
 #include "internal.h"
 
 /*
@@ -83,8 +84,15 @@ static int afs_deliver_vl_get_entry_by_xxx(struct afs_call *call)
 	bp++; /* type */
 	entry->nservers = ntohl(*bp++);
 
-	for (loop = 0; loop < 8; loop++)
-		entry->servers[loop].s_addr = *bp++;
+	for (loop = 0; loop < 8; loop++) {
+		entry->servers[loop].srx_family = AF_RXRPC;
+		entry->servers[loop].srx_service = FS_SERVICE;
+		entry->servers[loop].transport_type = SOCK_DGRAM;
+		entry->servers[loop].transport_len = sizeof(entry->servers[loop].transport.sin);
+		entry->servers[loop].transport.sin.sin_family = AF_INET;
+		entry->servers[loop].transport.sin.sin_port = htons(AFS_FS_PORT);
+		entry->servers[loop].transport.sin.sin_addr.s_addr = *bp++;
+	}
 
 	bp += 8; /* partition IDs */
 
@@ -144,7 +152,7 @@ static const struct afs_call_type afs_RXVLGetEntryById = {
  * dispatch a get volume entry by name operation
  */
 int afs_vl_get_entry_by_name(struct afs_net *net,
-			     struct in_addr *addr,
+			     struct sockaddr_rxrpc *addr,
 			     struct key *key,
 			     const char *volname,
 			     struct afs_cache_vlocation *entry,
@@ -166,8 +174,6 @@ int afs_vl_get_entry_by_name(struct afs_net *net,
 
 	call->key = key;
 	call->reply = entry;
-	call->service_id = VL_SERVICE;
-	call->port = htons(AFS_VL_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -185,7 +191,7 @@ int afs_vl_get_entry_by_name(struct afs_net *net,
  * dispatch a get volume entry by ID operation
  */
 int afs_vl_get_entry_by_id(struct afs_net *net,
-			   struct in_addr *addr,
+			   struct sockaddr_rxrpc *addr,
 			   struct key *key,
 			   afs_volid_t volid,
 			   afs_voltype_t voltype,
@@ -203,8 +209,6 @@ int afs_vl_get_entry_by_id(struct afs_net *net,
 
 	call->key = key;
 	call->reply = entry;
-	call->service_id = VL_SERVICE;
-	call->port = htons(AFS_VL_PORT);
 
 	/* marshall the parameters */
 	bp = call->request;
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index d9a5d7acdb86..4f8c15c09a6d 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -30,7 +30,6 @@ static int afs_vlocation_access_vl_by_name(struct afs_vlocation *vl,
 					   struct afs_cache_vlocation *vldb)
 {
 	struct afs_cell *cell = vl->cell;
-	struct in_addr addr;
 	int count, ret;
 
 	_enter("%s,%s", cell->name, vl->vldb.name);
@@ -38,12 +37,12 @@ static int afs_vlocation_access_vl_by_name(struct afs_vlocation *vl,
 	down_write(&vl->cell->vl_sem);
 	ret = -ENOMEDIUM;
 	for (count = cell->vl_naddrs; count > 0; count--) {
-		addr = cell->vl_addrs[cell->vl_curr_svix];
+		struct sockaddr_rxrpc *addr = &cell->vl_addrs[cell->vl_curr_svix];
 
-		_debug("CellServ[%hu]: %08x", cell->vl_curr_svix, addr.s_addr);
+		_debug("CellServ[%hu]: %pIS", cell->vl_curr_svix, &addr->transport);
 
 		/* attempt to access the VL server */
-		ret = afs_vl_get_entry_by_name(cell->net, &addr, key,
+		ret = afs_vl_get_entry_by_name(cell->net, addr, key,
 					       vl->vldb.name, vldb, false);
 		switch (ret) {
 		case 0:
@@ -88,7 +87,6 @@ static int afs_vlocation_access_vl_by_id(struct afs_vlocation *vl,
 					 struct afs_cache_vlocation *vldb)
 {
 	struct afs_cell *cell = vl->cell;
-	struct in_addr addr;
 	int count, ret;
 
 	_enter("%s,%x,%d,", cell->name, volid, voltype);
@@ -96,12 +94,12 @@ static int afs_vlocation_access_vl_by_id(struct afs_vlocation *vl,
 	down_write(&vl->cell->vl_sem);
 	ret = -ENOMEDIUM;
 	for (count = cell->vl_naddrs; count > 0; count--) {
-		addr = cell->vl_addrs[cell->vl_curr_svix];
+		struct sockaddr_rxrpc *addr = &cell->vl_addrs[cell->vl_curr_svix];
 
-		_debug("CellServ[%hu]: %08x", cell->vl_curr_svix, addr.s_addr);
+		_debug("CellServ[%hu]: %pIS", cell->vl_curr_svix, &addr->transport);
 
 		/* attempt to access the VL server */
-		ret = afs_vl_get_entry_by_id(cell->net, &addr, key, volid,
+		ret = afs_vl_get_entry_by_id(cell->net, addr, key, volid,
 					     voltype, vldb, false);
 		switch (ret) {
 		case 0:
@@ -192,15 +190,7 @@ static int afs_vlocation_update_record(struct afs_vlocation *vl,
 	int ret;
 
 	/* try to look up a cached volume in the cell VL databases by ID */
-	_debug("Locally Cached: %s %02x { %08x(%x) %08x(%x) %08x(%x) }",
-	       vl->vldb.name,
-	       vl->vldb.vidmask,
-	       ntohl(vl->vldb.servers[0].s_addr),
-	       vl->vldb.srvtmask[0],
-	       ntohl(vl->vldb.servers[1].s_addr),
-	       vl->vldb.srvtmask[1],
-	       ntohl(vl->vldb.servers[2].s_addr),
-	       vl->vldb.srvtmask[2]);
+	_debug("Locally Cached: %s %02x", vl->vldb.name, vl->vldb.vidmask);
 
 	_debug("Vids: %08x %08x %08x",
 	       vl->vldb.vid[0],
@@ -258,11 +248,7 @@ static int afs_vlocation_update_record(struct afs_vlocation *vl,
 static void afs_vlocation_apply_update(struct afs_vlocation *vl,
 				       struct afs_cache_vlocation *vldb)
 {
-	_debug("Done VL Lookup: %s %02x { %08x(%x) %08x(%x) %08x(%x) }",
-	       vldb->name, vldb->vidmask,
-	       ntohl(vldb->servers[0].s_addr), vldb->srvtmask[0],
-	       ntohl(vldb->servers[1].s_addr), vldb->srvtmask[1],
-	       ntohl(vldb->servers[2].s_addr), vldb->srvtmask[2]);
+	_debug("Done VL Lookup: %s %02x", vldb->name, vldb->vidmask);
 
 	_debug("Vids: %08x %08x %08x",
 	       vldb->vid[0], vldb->vid[1], vldb->vid[2]);
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index dcb956143c86..64834b20f0f6 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -354,8 +354,8 @@ int afs_vnode_fetch_status(struct afs_vnode *vnode,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %p{%08x}",
-		       server, ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %p{%pIS}",
+		       server, &server->addr.transport);
 
 		ret = afs_fs_fetch_file_status(server, key, vnode, NULL,
 					       false);
@@ -418,7 +418,7 @@ int afs_vnode_fetch_data(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_fetch_data(server, key, vnode, desc,
 					false);
@@ -474,7 +474,7 @@ int afs_vnode_create(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_create(server, key, vnode, name, mode, newfid,
 				    newstatus, newcb, false);
@@ -530,7 +530,7 @@ int afs_vnode_remove(struct afs_vnode *vnode, struct key *key, const char *name,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_remove(server, key, vnode, name, isdir,
 				    false);
@@ -592,7 +592,7 @@ int afs_vnode_link(struct afs_vnode *dvnode, struct afs_vnode *vnode,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_link(server, key, dvnode, vnode, name,
 				  false);
@@ -656,7 +656,7 @@ int afs_vnode_symlink(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_symlink(server, key, vnode, name, content,
 				     newfid, newstatus, false);
@@ -726,7 +726,7 @@ int afs_vnode_rename(struct afs_vnode *orig_dvnode,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_rename(server, key, orig_dvnode, orig_name,
 				    new_dvnode, new_name, false);
@@ -792,7 +792,7 @@ int afs_vnode_store_data(struct afs_writeback *wb, pgoff_t first, pgoff_t last,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_store_data(server, wb, first, last, offset, to,
 					false);
@@ -845,7 +845,7 @@ int afs_vnode_setattr(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_setattr(server, key, vnode, attr, false);
 
@@ -892,7 +892,7 @@ int afs_vnode_get_volume_status(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_get_volume_status(server, key, vnode, vs, false);
 
@@ -931,7 +931,7 @@ int afs_vnode_set_lock(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_set_lock(server, key, vnode, type, false);
 
@@ -969,7 +969,7 @@ int afs_vnode_extend_lock(struct afs_vnode *vnode, struct key *key)
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_extend_lock(server, key, vnode, false);
 
@@ -1007,7 +1007,7 @@ int afs_vnode_release_lock(struct afs_vnode *vnode, struct key *key)
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
+		_debug("USING SERVER: %pIS\n", &server->addr.transport);
 
 		ret = afs_fs_release_lock(server, key, vnode, false);
 
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index bc52203695b7..fbbb470ac027 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -248,8 +248,8 @@ struct afs_server *afs_volume_pick_fileserver(struct afs_vnode *vnode)
 		case 0:
 			afs_get_server(server);
 			up_read(&volume->server_sem);
-			_leave(" = %p (picked %08x)",
-			       server, ntohl(server->addr.s_addr));
+			_leave(" = %p (picked %pIS)",
+			       server, &server->addr.transport);
 			return server;
 
 		case -ENETUNREACH:
@@ -303,9 +303,8 @@ int afs_volume_release_fileserver(struct afs_vnode *vnode,
 	struct afs_volume *volume = vnode->volume;
 	unsigned loop;
 
-	_enter("%s,%08x,%d",
-	       volume->vlocation->vldb.name, ntohl(server->addr.s_addr),
-	       result);
+	_enter("%s,%pIS,%d",
+	       volume->vlocation->vldb.name, &server->addr.transport, result);
 
 	switch (result) {
 		/* success */


* [RFC PATCH 09/11] afs: Allow IPv6 address specification of VL servers
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (6 preceding siblings ...)
  2017-09-01 15:41 ` [RFC PATCH 08/11] afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr David Howells
@ 2017-09-01 15:41 ` David Howells
  2017-09-01 15:42 ` [RFC PATCH 10/11] afs: Overhaul cell database management David Howells
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:41 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

Allow VL server specifications to be given IPv6 addresses as well as IPv4
addresses, for example as:

	echo add procyon.org.uk 1111:2222:3333:0:4444:5555:6666:7777 >/proc/fs/afs/cells

Note that ':' is the expected separator between IPv4 addresses, but if a
',' is found, or no '.' is present in the string, the delimiter is switched
to ','.
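
In code terms, the delimiter choice and the storage of IPv4 entries as
v4-mapped IPv6 addresses come down to roughly the following.  This is a
condensed sketch of the new logic in afs_cell_alloc() (see the fs/afs/cell.c
diff below), not a verbatim copy; 'entry' and 'end' stand in for the local
variables used there and 'srx' points at the sockaddr_rxrpc slot being
filled:

	char delimiter = ':';

	/* A ',' anywhere, or no '.' at all, switches the delimiter to ',' */
	if (strchr(vllist, ',') || !strchr(vllist, '.'))
		delimiter = ',';

	/* Each entry is stored as AF_INET6; IPv4 entries become v4-mapped
	 * addresses (::ffff:a.b.c.d).
	 */
	if (in4_pton(entry, -1,
		     (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
		     -1, &end)) {
		srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
		srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
		srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
	} else if (!in6_pton(entry, -1,
			     srx->transport.sin6.sin6_addr.s6_addr,
			     -1, &end)) {
		goto bad_address;
	}
	srx->transport_len		= sizeof(struct sockaddr_in6);
	srx->transport.sin6.sin6_family	= AF_INET6;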

This also works with DNS AFSDB or SRV record strings fetched by upcall from
userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cell.c     |   31 +++++++++++++++++++++----------
 fs/afs/proc.c     |    2 +-
 fs/afs/rxrpc.c    |   11 +++++------
 fs/afs/server.c   |    5 -----
 fs/afs/vlclient.c |   13 +++++++++----
 5 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 3a6c91ae11a3..6c49532a79de 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -15,6 +15,7 @@
 #include <linux/ctype.h>
 #include <linux/dns_resolver.h>
 #include <linux/sched.h>
+#include <linux/inet.h>
 #include <keys/rxrpc-type.h>
 #include "internal.h"
 
@@ -86,28 +87,38 @@ static struct afs_cell *afs_cell_alloc(struct afs_net *net,
 		delimiter = ',';
 
 	} else {
+		if (strchr(vllist, ',') || !strchr(vllist, '.'))
+			delimiter = ',';
 		_vllist = vllist;
 	}
 
 	/* fill in the VL server list from the rest of the string */
 	do {
 		struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
-		unsigned a, b, c, d;
+		const char *end;
 
 		next = strchr(_vllist, delimiter);
 		if (next)
 			*next++ = 0;
 
-		if (sscanf(_vllist, "%u.%u.%u.%u", &a, &b, &c, &d) != 4)
-			goto bad_address;
-
-		if (a > 255 || b > 255 || c > 255 || d > 255)
+		if (in4_pton(_vllist, -1, (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
+			     -1, &end)) {
+			srx->transport_len		= sizeof(struct sockaddr_in6);
+			srx->transport.sin6.sin6_family	= AF_INET6;
+			srx->transport.sin6.sin6_flowinfo = 0;
+			srx->transport.sin6.sin6_scope_id = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
+		} else if (in6_pton(_vllist, -1, srx->transport.sin6.sin6_addr.s6_addr,
+				    -1, &end)) {
+			srx->transport_len		= sizeof(struct sockaddr_in6);
+			srx->transport.sin6.sin6_family	= AF_INET6;
+			srx->transport.sin6.sin6_flowinfo = 0;
+			srx->transport.sin6.sin6_scope_id = 0;
+		} else {
 			goto bad_address;
-
-		srx->transport_len		= sizeof(struct sockaddr_in);
-		srx->transport.sin.sin_family	= AF_INET;
-		srx->transport.sin.sin_addr.s_addr =
-			htonl((a << 24) | (b << 16) | (c << 8) | d);
+		}
 
 	} while (cell->vl_naddrs++,
 		 cell->vl_naddrs < AFS_CELL_MAX_ADDRS && (_vllist = next));
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 4d609869a57b..ee10f0089d87 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -662,7 +662,7 @@ static int afs_proc_cell_servers_show(struct seq_file *m, void *v)
 
 	/* display one cell per line on subsequent lines */
 	sprintf(ipaddr, "%pISp", &server->addr.transport);
-	seq_printf(m, "%3d %-15.15s %5d\n",
+	seq_printf(m, "%3d %-15s %5d\n",
 		   atomic_read(&server->usage), ipaddr, server->fs_state);
 
 	return 0;
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index f57a2d41a310..805ae0542478 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -46,21 +46,20 @@ int afs_open_socket(struct afs_net *net)
 
 	_enter("");
 
-	ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET, &socket);
+	ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket);
 	if (ret < 0)
 		goto error_1;
 
 	socket->sk->sk_allocation = GFP_NOFS;
 
 	/* bind the callback manager's address to make this a server socket */
+	memset(&srx, 0, sizeof(srx));
 	srx.srx_family			= AF_RXRPC;
 	srx.srx_service			= CM_SERVICE;
 	srx.transport_type		= SOCK_DGRAM;
-	srx.transport_len		= sizeof(srx.transport.sin);
-	srx.transport.sin.sin_family	= AF_INET;
-	srx.transport.sin.sin_port	= htons(AFS_CM_PORT);
-	memset(&srx.transport.sin.sin_addr, 0,
-	       sizeof(srx.transport.sin.sin_addr));
+	srx.transport_len		= sizeof(srx.transport.sin6);
+	srx.transport.sin6.sin6_family	= AF_INET6;
+	srx.transport.sin6.sin6_port	= htons(AFS_CM_PORT);
 
 	ret = kernel_bind(socket, (struct sockaddr *) &srx, sizeof(srx));
 	if (ret < 0)
diff --git a/fs/afs/server.c b/fs/afs/server.c
index 7d103321efab..9b38f386a142 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -178,11 +178,6 @@ struct afs_server *afs_find_server(struct afs_net *net,
 
 	_enter("{%d,%pIS}", srx->transport.family, &srx->transport);
 
-	if (srx->transport.family != AF_INET) {
-		WARN(true, "AFS does not yes support non-IPv4 addresses\n");
-		return NULL;
-	}
-
 	read_lock(&net->servers_lock);
 
 	p = net->servers.rb_node;
diff --git a/fs/afs/vlclient.c b/fs/afs/vlclient.c
index 48d137628d6a..276319aa86d8 100644
--- a/fs/afs/vlclient.c
+++ b/fs/afs/vlclient.c
@@ -88,10 +88,15 @@ static int afs_deliver_vl_get_entry_by_xxx(struct afs_call *call)
 		entry->servers[loop].srx_family = AF_RXRPC;
 		entry->servers[loop].srx_service = FS_SERVICE;
 		entry->servers[loop].transport_type = SOCK_DGRAM;
-		entry->servers[loop].transport_len = sizeof(entry->servers[loop].transport.sin);
-		entry->servers[loop].transport.sin.sin_family = AF_INET;
-		entry->servers[loop].transport.sin.sin_port = htons(AFS_FS_PORT);
-		entry->servers[loop].transport.sin.sin_addr.s_addr = *bp++;
+		entry->servers[loop].transport_len = sizeof(entry->servers[loop].transport.sin6);
+		entry->servers[loop].transport.sin6.sin6_family = AF_INET6;
+		entry->servers[loop].transport.sin6.sin6_port = htons(AFS_FS_PORT);
+		entry->servers[loop].transport.sin6.sin6_flowinfo = 0;
+		entry->servers[loop].transport.sin6.sin6_scope_id = 0;
+		entry->servers[loop].transport.sin6.sin6_addr.s6_addr32[0] = 0;
+		entry->servers[loop].transport.sin6.sin6_addr.s6_addr32[1] = 0;
+		entry->servers[loop].transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
+		entry->servers[loop].transport.sin6.sin6_addr.s6_addr32[3] = *bp++;
 	}
 
 	bp += 8; /* partition IDs */

* [RFC PATCH 10/11] afs: Overhaul cell database management
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (7 preceding siblings ...)
  2017-09-01 15:41 ` [RFC PATCH 09/11] afs: Allow IPv6 address specification of VL servers David Howells
@ 2017-09-01 15:42 ` David Howells
  2017-09-01 15:42 ` [RFC PATCH 11/11] afs: Retry rxrpc calls with address rotation on network error David Howells
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:42 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

Overhaul the way that the in-kernel AFS client keeps track of cells in the
following manner:

 (1) Cells are now held in an rbtree to make walking them quicker, and the
     tree is RCU-managed (though this is probably overkill).

 (2) Cells now have a manager work item that:

     (A) Looks after fetching and refreshing the VL server list.

     (B) Manages cell record lifetime, including initialisation and
     	 destruction.

     (C) Manages cell record caching, whereby records are kept around for a
     	 certain time after last use and then destroyed.

     (D) Manages the FS-Cache index cookie for a cell.  It is not permitted
     	 for a cookie to be in use twice, so we have to be careful not to
     	 allow a new cell record to exist at the same time as an old record
     	 of the same name.

 (3) Each AFS network namespace is given a manager work item that manages
     the cells within it, maintaining a single timer to prod cells into
     updating their DNS records.

     This uses the reduce_timer() facility to make the timer expire at the
     time of the soonest event that needs to happen.

 (4) When a module is being unloaded, cells and cell managers are now
     counted out using dec_after_work() to make sure the module text is
     pinned until after the data structures have been cleaned up (see the
     condensed sketch after this list).

 (5) Each cell's VL server list is now protected by a seqlock rather than a
     semaphore.
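
To make point (4) concrete, the counting-out pattern reduces to roughly the
following.  This is a condensed sketch of the code added in the fs/afs/cell.c
diff below, not a verbatim copy; the elided parts are marked:

	void afs_manage_cells(struct work_struct *work)
	{
		struct afs_net *net =
			container_of(work, struct afs_net, cells_manager);

		/* ... walk net->cells, dispatching per-cell managers and
		 * rearming net->cells_timer via afs_set_cell_timer() ...
		 */

		/* The decrement happens after this function has returned,
		 * so the module text stays pinned until then.
		 */
		dec_after_work(&net->cells_outstanding);
	}

	void afs_cell_purge(struct afs_net *net)
	{
		/* ... drop the workstation cell and kick the manager ... */

		wait_on_atomic_t(&net->cells_outstanding, atomic_t_wait,
				 TASK_UNINTERRUPTIBLE);
	}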

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/callback.c  |    2 
 fs/afs/cell.c      |  913 ++++++++++++++++++++++++++++++++++++----------------
 fs/afs/internal.h  |   62 +++-
 fs/afs/main.c      |   16 +
 fs/afs/proc.c      |   15 -
 fs/afs/server.c    |    3 
 fs/afs/super.c     |   20 +
 fs/afs/vlocation.c |    6 
 fs/afs/xattr.c     |    2 
 9 files changed, 710 insertions(+), 329 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index d12dffb76b67..bdccc4e72c0e 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -341,7 +341,7 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
 	 *   had callbacks entirely, and the server will call us later to break
 	 *   them
 	 */
-	afs_fs_give_up_callbacks(server->cell->net, server, true);
+	afs_fs_give_up_callbacks(server->net, server, true);
 }
 
 /*
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 6c49532a79de..078ffd90e5f4 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -1,6 +1,6 @@
 /* AFS cell and server record management
  *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2002, 2017 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * This program is free software; you can redistribute it and/or
@@ -19,128 +19,192 @@
 #include <keys/rxrpc-type.h>
 #include "internal.h"
 
+unsigned __read_mostly afs_cell_gc_delay = 10;
+
+static void afs_manage_cell(struct work_struct *);
+
+static void afs_dec_cells_outstanding(struct afs_net *net)
+{
+	if (atomic_dec_and_test(&net->cells_outstanding))
+		wake_up_atomic_t(&net->cells_outstanding);
+}
+
 /*
- * allocate a cell record and fill in its name, VL server address list and
- * allocate an anonymous key
+ * Set the cell timer to fire after a given delay, assuming it's not already
+ * set for an earlier time.
  */
-static struct afs_cell *afs_cell_alloc(struct afs_net *net,
-				       const char *name, unsigned namelen,
-				       char *vllist)
+static void afs_set_cell_timer(struct afs_net *net, time64_t delay)
 {
-	struct afs_cell *cell;
-	struct key *key;
-	char keyname[4 + AFS_MAXCELLNAME + 1], *cp, *dp, *next;
-	char  *dvllist = NULL, *_vllist = NULL;
-	char  delimiter = ':';
-	int ret, i;
+	if (net->live) {
+		atomic_inc(&net->cells_outstanding);
+		if (reduce_timer(&net->cells_timer, jiffies + delay * HZ))
+			afs_dec_cells_outstanding(net);
+	}
+}
+
+/*
+ * Look up and get an activation reference on a cell record under RCU
+ * conditions.  The caller must hold the RCU read lock.
+ */
+struct afs_cell *afs_lookup_cell_rcu(struct afs_net *net,
+				     const char *name, unsigned int namesz)
+{
+	struct afs_cell *cell = NULL;
+	struct rb_node *p;
+	unsigned int seq = 0, n;
+	int ret = 0;
+
+	_enter("%*.*s", namesz, namesz, name);
+
+	if (name && namesz == 0)
+		return ERR_PTR(-EINVAL);
+	if (namesz > AFS_MAXCELLNAME)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	do {
+		/* Unfortunately, rbtree walking doesn't give reliable results
+		 * under just the RCU read lock, so we have to check for
+		 * changes.
+		 */
+		read_seqbegin_or_lock(&net->cells_lock, &seq);
+
+		if (!name) {
+			ret = -EDESTADDRREQ;
+			cell = rcu_dereference_raw(net->ws_cell);
+			if (!cell)
+				goto done;
+
+			afs_get_cell(cell);
+			goto done;
+		}
+
+		p = rcu_dereference_raw(net->cells.rb_node);
+		while (p) {
+			cell = rb_entry(p, struct afs_cell, net_node);
+
+			n = strncasecmp(cell->name, name,
+					min_t(size_t, cell->name_len, namesz));
+			if (n == 0)
+				n = cell->name_len - namesz;
+			if (n < 0) {
+				p = rcu_dereference_raw(p->rb_left);
+			} else if (n > 0) {
+				p = rcu_dereference_raw(p->rb_right);
+			} else {
+				if (refcount_inc_not_zero(&cell->usage))
+					goto done;
+				/* We want to repeat the search, this time with
+				 * the lock properly locked.
+				 */
+			}
+			cell = NULL;
+		}
+	} while (need_seqretry(&net->cells_lock, seq));
+
+	ret = -ENOENT;
 
-	_enter("%*.*s,%s", namelen, namelen, name ?: "", vllist);
+done:
+	done_seqretry(&net->cells_lock, seq);
 
-	BUG_ON(!name); /* TODO: want to look up "this cell" in the cache */
+	return ret == 0 ? cell : ERR_PTR(ret);
+}
+
+/*
+ * Set up a cell record and fill in its name, VL server address list and
+ * allocate an anonymous key
+ */
+static struct afs_cell *afs_alloc_cell(struct afs_net *net,
+				       const char *name, unsigned int namelen,
+				       const char *vllist)
+{
+	struct afs_cell *cell;
+	int i, ret;
 
+	ASSERT(name);
+	if (namelen == 0)
+		return ERR_PTR(-EINVAL);
 	if (namelen > AFS_MAXCELLNAME) {
 		_leave(" = -ENAMETOOLONG");
 		return ERR_PTR(-ENAMETOOLONG);
 	}
 
-	/* allocate and initialise a cell record */
-	cell = kzalloc(sizeof(struct afs_cell) + namelen + 1, GFP_KERNEL);
+	_enter("%*.*s,%s", namelen, namelen, name, vllist);
+
+	cell = kzalloc(sizeof(struct afs_cell), GFP_KERNEL);
 	if (!cell) {
 		_leave(" = -ENOMEM");
 		return ERR_PTR(-ENOMEM);
 	}
 
-	memcpy(cell->name, name, namelen);
-	cell->name[namelen] = 0;
-
-	atomic_set(&cell->usage, 1);
-	INIT_LIST_HEAD(&cell->link);
 	cell->net = net;
+	cell->name_len = namelen;
+	for (i = 0; i < namelen; i++)
+		cell->name[i] = tolower(name[i]);
+
+	refcount_set(&cell->usage, 2);
+	INIT_WORK(&cell->manager, afs_manage_cell);
 	rwlock_init(&cell->servers_lock);
 	INIT_LIST_HEAD(&cell->servers);
 	init_rwsem(&cell->vl_sem);
 	INIT_LIST_HEAD(&cell->vl_list);
 	spin_lock_init(&cell->vl_lock);
+	seqlock_init(&cell->vl_addrs_lock);
+	cell->flags = (1 << AFS_CELL_FL_NOT_READY);
 
 	for (i = 0; i < AFS_CELL_MAX_ADDRS; i++) {
 		struct sockaddr_rxrpc *srx = &cell->vl_addrs[i];
 		srx->srx_family			= AF_RXRPC;
 		srx->srx_service		= VL_SERVICE;
 		srx->transport_type		= SOCK_DGRAM;
-		srx->transport.sin.sin_port	= htons(AFS_VL_PORT);
+		srx->transport.sin6.sin6_family	= AF_INET6;
+		srx->transport.sin6.sin6_port	= htons(AFS_VL_PORT);
 	}
 
-	/* if the ip address is invalid, try dns query */
-	if (!vllist || strlen(vllist) < 7) {
-		ret = dns_query("afsdb", name, namelen, "ipv4", &dvllist, NULL);
-		if (ret < 0) {
-			if (ret == -ENODATA || ret == -EAGAIN || ret == -ENOKEY)
-				/* translate these errors into something
-				 * userspace might understand */
-				ret = -EDESTADDRREQ;
-			_leave(" = %d", ret);
-			return ERR_PTR(ret);
-		}
-		_vllist = dvllist;
+	/* Fill in the VL server list if we were given a list of addresses to
+	 * use.
+	 */
+	if (vllist) {
+		char delim = ':';
 
-		/* change the delimiter for user-space reply */
-		delimiter = ',';
-
-	} else {
 		if (strchr(vllist, ',') || !strchr(vllist, '.'))
-			delimiter = ',';
-		_vllist = vllist;
-	}
-
-	/* fill in the VL server list from the rest of the string */
-	do {
-		struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
-		const char *end;
-
-		next = strchr(_vllist, delimiter);
-		if (next)
-			*next++ = 0;
+			delim = ',';
+
+		do {
+			struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
+
+			if (in4_pton(vllist, -1,
+				     (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
+				     delim, &vllist)) {
+				srx->transport_len = sizeof(struct sockaddr_in6);
+				srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
+				srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
+				srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
+			} else if (in6_pton(vllist, -1,
+					    srx->transport.sin6.sin6_addr.s6_addr,
+					    delim, &vllist)) {
+				srx->transport_len = sizeof(struct sockaddr_in6);
+				srx->transport.sin6.sin6_family	= AF_INET6;
+			} else {
+				goto bad_address;
+			}
 
-		if (in4_pton(_vllist, -1, (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
-			     -1, &end)) {
-			srx->transport_len		= sizeof(struct sockaddr_in6);
-			srx->transport.sin6.sin6_family	= AF_INET6;
-			srx->transport.sin6.sin6_flowinfo = 0;
-			srx->transport.sin6.sin6_scope_id = 0;
-			srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
-			srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
-			srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
-		} else if (in6_pton(_vllist, -1, srx->transport.sin6.sin6_addr.s6_addr,
-				    -1, &end)) {
-			srx->transport_len		= sizeof(struct sockaddr_in6);
-			srx->transport.sin6.sin6_family	= AF_INET6;
-			srx->transport.sin6.sin6_flowinfo = 0;
-			srx->transport.sin6.sin6_scope_id = 0;
-		} else {
-			goto bad_address;
-		}
+			cell->vl_naddrs++;
+			if (!*vllist)
+				break;
+			vllist++;
 
-	} while (cell->vl_naddrs++,
-		 cell->vl_naddrs < AFS_CELL_MAX_ADDRS && (_vllist = next));
+		} while (cell->vl_naddrs < AFS_CELL_MAX_ADDRS && vllist);
 
-	/* create a key to represent an anonymous user */
-	memcpy(keyname, "afs@", 4);
-	dp = keyname + 4;
-	cp = cell->name;
-	do {
-		*dp++ = toupper(*cp);
-	} while (*cp++);
-
-	key = rxrpc_get_null_key(keyname);
-	if (IS_ERR(key)) {
-		_debug("no key");
-		ret = PTR_ERR(key);
-		goto error;
+		/* Disable DNS refresh for manually-specified cells and set the
+		 * no-garbage collect flag (which pins the active count).
+		 */
+		cell->dns_expiry = TIME64_MAX;
+	} else {
+		/* We're going to need to 'refresh' this cell's VL server list
+		 * from the DNS before we can use it.
+		 */
+		cell->dns_expiry = S64_MIN;
 	}
-	cell->anonymous_key = key;
-
-	_debug("anon key %p{%x}",
-	       cell->anonymous_key, key_serial(cell->anonymous_key));
 
 	_leave(" = %p", cell);
 	return cell;
@@ -148,92 +212,128 @@ static struct afs_cell *afs_cell_alloc(struct afs_net *net,
 bad_address:
 	printk(KERN_ERR "kAFS: bad VL server IP address\n");
 	ret = -EINVAL;
-error:
-	key_put(cell->anonymous_key);
-	kfree(dvllist);
 	kfree(cell);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
 }
 
 /*
- * afs_cell_crate() - create a cell record
+ * afs_lookup_cell - Look up or create a cell record.
  * @net:	The network namespace
- * @name:	is the name of the cell.
- * @namsesz:	is the strlen of the cell name.
- * @vllist:	is a colon separated list of IP addresses in "a.b.c.d" format.
- * @retref:	is T to return the cell reference when the cell exists.
+ * @name:	The name of the cell.
+ * @namesz:	The strlen of the cell name.
+ * @vllist:	A colon/comma separated list of numeric IP addresses or NULL.
+ * @excl:	T if an error should be given if the cell name already exists.
+ *
+ * Look up a cell record by name and query the DNS for VL server addresses if
+ * needed.  Note that the actual DNS query is punted off to the manager thread
+ * so that this function can return immediately if interrupted whilst allowing
+ * cell records to be shared even if not yet fully constructed.
  */
-struct afs_cell *afs_cell_create(struct afs_net *net,
-				 const char *name, unsigned namesz,
-				 char *vllist, bool retref)
+struct afs_cell *afs_lookup_cell(struct afs_net *net,
+				 const char *name, unsigned int namesz,
+				 const char *vllist, bool excl)
 {
-	struct afs_cell *cell;
-	int ret;
-
-	_enter("%*.*s,%s", namesz, namesz, name ?: "", vllist);
+	struct afs_cell *cell, *candidate, *cursor;
+	struct rb_node *parent, **pp;
+	int ret, n;
+
+	_enter("%s,%s", name, vllist);
+
+	if (!excl) {
+		rcu_read_lock();
+		cell = afs_lookup_cell_rcu(net, name, namesz);
+		rcu_read_unlock();
+		if (!IS_ERR(cell)) {
+			if (excl) {
+				afs_put_cell(net, cell);
+				return ERR_PTR(-EEXIST);
+			}
+			goto wait_for_cell;
+		}
+	}
 
-	down_write(&net->cells_sem);
-	read_lock(&net->cells_lock);
-	list_for_each_entry(cell, &net->cells, link) {
-		if (strncasecmp(cell->name, name, namesz) == 0)
-			goto duplicate_name;
+	/* Assume we're probably going to create a cell and preallocate and
+	 * mostly set up a candidate record.  We can then use this to stash the
+	 * name, the net namespace and VL server addresses.
+	 *
+	 * We also want to do this before we hold any locks as it may involve
+	 * upcalling to userspace to make DNS queries.
+	 */
+	candidate = afs_alloc_cell(net, name, namesz, vllist);
+	if (IS_ERR(candidate)) {
+		_leave(" = %ld", PTR_ERR(candidate));
+		return candidate;
 	}
-	read_unlock(&net->cells_lock);
 
-	cell = afs_cell_alloc(net, name, namesz, vllist);
-	if (IS_ERR(cell)) {
-		_leave(" = %ld", PTR_ERR(cell));
-		up_write(&net->cells_sem);
-		return cell;
+	/* Find the insertion point and check to see if someone else added a
+	 * cell whilst we were allocating.
+	 */
+	write_seqlock(&net->cells_lock);
+
+	pp = &net->cells.rb_node;
+	parent = NULL;
+	while (*pp) {
+		parent = *pp;
+		cursor = rb_entry(parent, struct afs_cell, net_node);
+
+		n = strncasecmp(cursor->name, name,
+				min_t(size_t, cursor->name_len, namesz));
+		if (n == 0)
+			n = cursor->name_len - namesz;
+		if (n < 0)
+			pp = &(*pp)->rb_left;
+		else if (n > 0)
+			pp = &(*pp)->rb_right;
+		else
+			goto cell_already_exists;
 	}
 
-	/* add a proc directory for this cell */
-	ret = afs_proc_cell_setup(net, cell);
-	if (ret < 0)
+	cell = candidate;
+	candidate = NULL;
+	rb_link_node_rcu(&cell->net_node, parent, pp);
+	rb_insert_color(&cell->net_node, &net->cells);
+	atomic_inc(&net->cells_outstanding);
+	write_sequnlock(&net->cells_lock);
+
+	schedule_work(&cell->manager);
+
+wait_for_cell:
+	_debug("wait_for_cell");
+	ret = wait_on_bit(&cell->flags, AFS_CELL_FL_NOT_READY, TASK_INTERRUPTIBLE);
+	smp_rmb();
+
+	switch (READ_ONCE(cell->state)) {
+	case AFS_CELL_FAILED:
+		ret = cell->error;
+	default:
+		_debug("weird %u %d", cell->state, cell->error);
 		goto error;
+	case AFS_CELL_ACTIVE:
+		break;
+	}
 
-#ifdef CONFIG_AFS_FSCACHE
-	/* put it up for caching (this never returns an error) */
-	cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
-					     &afs_cell_cache_index_def,
-					     cell, true);
-#endif
-
-	/* add to the cell lists */
-	write_lock(&net->cells_lock);
-	list_add_tail(&cell->link, &net->cells);
-	write_unlock(&net->cells_lock);
-
-	down_write(&net->proc_cells_sem);
-	list_add_tail(&cell->proc_link, &net->proc_cells);
-	up_write(&net->proc_cells_sem);
-	up_write(&net->cells_sem);
-
-	_leave(" = %p", cell);
+	_leave(" = %p [cell]", cell);
 	return cell;
 
+cell_already_exists:
+	_debug("cell exists");
+	cell = cursor;
+	if (excl) {
+		ret = -EEXIST;
+	} else {
+		ASSERTCMP(refcount_read(&cursor->usage), >=, 1);
+		refcount_inc(&cursor->usage);
+		ret = 0;
+	}
+	write_sequnlock(&net->cells_lock);
+	kfree(candidate);
+	if (ret == 0)
+		goto wait_for_cell;
 error:
-	up_write(&net->cells_sem);
-	key_put(cell->anonymous_key);
-	kfree(cell);
-	_leave(" = %d", ret);
+	afs_put_cell(net, cell);
+	_leave(" = %d [error]", ret);
 	return ERR_PTR(ret);
-
-duplicate_name:
-	if (retref && !IS_ERR(cell))
-		afs_get_cell(cell);
-
-	read_unlock(&net->cells_lock);
-	up_write(&net->cells_sem);
-
-	if (retref) {
-		_leave(" = %p", cell);
-		return cell;
-	}
-
-	_leave(" = -EEXIST");
-	return ERR_PTR(-EEXIST);
 }
 
 /*
@@ -241,10 +341,11 @@ struct afs_cell *afs_cell_create(struct afs_net *net,
  * - can be called with a module parameter string
  * - can be called from a write to /proc/fs/afs/rootcell
  */
-int afs_cell_init(struct afs_net *net, char *rootcell)
+int afs_cell_init(struct afs_net *net, const char *rootcell)
 {
 	struct afs_cell *old_root, *new_root;
-	char *cp;
+	const char *cp, *vllist;
+	size_t len;
 
 	_enter("");
 
@@ -257,223 +358,467 @@ int afs_cell_init(struct afs_net *net, char *rootcell)
 	}
 
 	cp = strchr(rootcell, ':');
-	if (!cp)
+	if (!cp) {
 		_debug("kAFS: no VL server IP addresses specified");
-	else
-		*cp++ = 0;
+		vllist = NULL;
+		len = strlen(rootcell);
+	} else {
+		vllist = cp + 1;
+		len = cp - rootcell;
+	}
 
 	/* allocate a cell record for the root cell */
-	new_root = afs_cell_create(net, rootcell, strlen(rootcell), cp, false);
+	new_root = afs_lookup_cell(net, rootcell, len, vllist, false);
 	if (IS_ERR(new_root)) {
 		_leave(" = %ld", PTR_ERR(new_root));
 		return PTR_ERR(new_root);
 	}
 
+	set_bit(AFS_CELL_FL_NO_GC, &new_root->flags);
+	afs_get_cell(new_root);
+
 	/* install the new cell */
-	write_lock(&net->cells_lock);
+	write_seqlock(&net->cells_lock);
 	old_root = net->ws_cell;
 	net->ws_cell = new_root;
-	write_unlock(&net->cells_lock);
-	afs_put_cell(old_root);
+	write_sequnlock(&net->cells_lock);
 
+	afs_put_cell(net, old_root);
 	_leave(" = 0");
 	return 0;
 }
 
 /*
- * lookup a cell record
+ * Update a cell's VL server address list from the DNS.
  */
-struct afs_cell *afs_cell_lookup(struct afs_net *net,
-				 const char *name, unsigned namesz,
-				 bool dns_cell)
+static void afs_update_cell(struct afs_cell *cell)
 {
-	struct afs_cell *cell;
+	time64_t now, expiry;
+	char *vllist = NULL;
+	int ret;
 
-	_enter("\"%*.*s\",", namesz, namesz, name ?: "");
+	_enter("%s", cell->name);
+
+	ret = dns_query("afsdb", cell->name, cell->name_len,
+			"ipv4", &vllist, &expiry);
+	_debug("query %d", ret);
+	switch (ret) {
+	case 0 ... INT_MAX:
+		clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
+		clear_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
+		goto parse_dns_data;
+
+	case -ENODATA:
+		clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
+		set_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
+		cell->dns_expiry = ktime_get_real_seconds() + 61;
+		cell->error = -EDESTADDRREQ;
+		goto out;
+
+	case -EAGAIN:
+	case -ECONNREFUSED:
+	default:
+		/* Unable to query DNS. */
+		set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
+		cell->dns_expiry = ktime_get_real_seconds() + 10;
+		cell->error = -EDESTADDRREQ;
+		goto out;
+	}
 
-	down_read(&net->cells_sem);
-	read_lock(&net->cells_lock);
+parse_dns_data:
+	write_seqlock(&cell->vl_addrs_lock);
 
-	if (name) {
-		/* if the cell was named, look for it in the cell record list */
-		list_for_each_entry(cell, &net->cells, link) {
-			if (strncmp(cell->name, name, namesz) == 0) {
-				afs_get_cell(cell);
-				goto found;
-			}
-		}
-		cell = ERR_PTR(-ENOENT);
-		if (dns_cell)
-			goto create_cell;
-	found:
-		;
-	} else {
-		cell = net->ws_cell;
-		if (!cell) {
-			/* this should not happen unless user tries to mount
-			 * when root cell is not set. Return an impossibly
-			 * bizarre errno to alert the user. Things like
-			 * ENOENT might be "more appropriate" but they happen
-			 * for other reasons.
-			 */
-			cell = ERR_PTR(-EDESTADDRREQ);
+	ret = -EINVAL;
+	do {
+		struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
+
+		if (in4_pton(vllist, -1,
+			     (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
+			     ',', (const char **)&vllist)) {
+			srx->transport_len = sizeof(struct sockaddr_in6);
+			srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
+		} else if (in6_pton(vllist, -1,
+				    srx->transport.sin6.sin6_addr.s6_addr,
+				    ',', (const char **)&vllist)) {
+			srx->transport_len = sizeof(struct sockaddr_in6);
+			srx->transport.sin6.sin6_family	= AF_INET6;
 		} else {
-			afs_get_cell(cell);
+			goto bad_address;
 		}
 
-	}
-
-	read_unlock(&net->cells_lock);
-	up_read(&net->cells_sem);
-	_leave(" = %p", cell);
-	return cell;
+		cell->vl_naddrs++;
+		if (!*vllist)
+			break;
+		vllist++;
 
-create_cell:
-	read_unlock(&net->cells_lock);
-	up_read(&net->cells_sem);
+	} while (cell->vl_naddrs < AFS_CELL_MAX_ADDRS);
 
-	cell = afs_cell_create(net, name, namesz, NULL, true);
+	if (cell->vl_naddrs < AFS_CELL_MAX_ADDRS)
+		memset(cell->vl_addrs + cell->vl_naddrs, 0,
+		       (AFS_CELL_MAX_ADDRS - cell->vl_naddrs) * sizeof(cell->vl_addrs[0]));
 
-	_leave(" = %p", cell);
-	return cell;
+	now = ktime_get_real_seconds();
+	cell->dns_expiry = expiry;
+	afs_set_cell_timer(cell->net, expiry - now);
+bad_address:
+	write_sequnlock(&cell->vl_addrs_lock);
+out:
+	_leave("");
 }
 
-#if 0
 /*
- * try and get a cell record
+ * Destroy a cell record
  */
-struct afs_cell *afs_get_cell_maybe(struct afs_cell *cell)
+static void afs_cell_destroy(struct rcu_head *rcu)
 {
-	write_lock(&net->cells_lock);
+	struct afs_cell *cell = container_of(rcu, struct afs_cell, rcu);
 
-	if (cell && !list_empty(&cell->link))
-		afs_get_cell(cell);
-	else
-		cell = NULL;
+	_enter("%p{%s}", cell, cell->name);
 
-	write_unlock(&net->cells_lock);
-	return cell;
+	ASSERTCMP(refcount_read(&cell->usage), ==, 0);
+
+	key_put(cell->anonymous_key);
+	kfree(cell);
+
+	_leave(" [destroyed]");
 }
-#endif  /*  0  */
 
 /*
- * destroy a cell record
+ * Queue the cell manager.
  */
-void afs_put_cell(struct afs_cell *cell)
+static void afs_queue_cell_manager(struct afs_net *net)
 {
-	if (!cell)
-		return;
+	int outstanding = atomic_inc_return(&net->cells_outstanding);
+
+	_enter("%d", outstanding);
 
-	_enter("%p{%d,%s}", cell, atomic_read(&cell->usage), cell->name);
+	if (!schedule_work(&net->cells_manager))
+		afs_dec_cells_outstanding(net);
+}
 
-	ASSERTCMP(atomic_read(&cell->usage), >, 0);
+/*
+ * Cell management timer.  We have an increment on cells_outstanding that we
+ * need to pass along to the work item.
+ */
+void afs_cells_timer(unsigned long data)
+{
+	struct afs_net *net = (struct afs_net *)data;
+
+	_enter("");
+	if (!schedule_work(&net->cells_manager))
+		afs_dec_cells_outstanding(net);
+}
 
-	/* to prevent a race, the decrement and the dequeue must be effectively
-	 * atomic */
-	write_lock(&cell->net->cells_lock);
+/*
+ * Drop a reference on a cell record.
+ */
+void afs_put_cell(struct afs_net *net, struct afs_cell *cell)
+{
+	time64_t now, expire_delay;
 
-	if (likely(!atomic_dec_and_test(&cell->usage))) {
-		write_unlock(&cell->net->cells_lock);
-		_leave("");
+	if (!cell)
 		return;
-	}
 
-	ASSERT(list_empty(&cell->servers));
-	ASSERT(list_empty(&cell->vl_list));
+	_enter("%s", cell->name);
 
-	wake_up(&cell->net->cells_freeable_wq);
+	now = ktime_get_real_seconds();
+	cell->last_inactive = now;
+	expire_delay = 0;
+	if (!test_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags) &&
+	    !test_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags))
+		expire_delay = afs_cell_gc_delay;
 
-	write_unlock(&cell->net->cells_lock);
+	if (refcount_dec_return(&cell->usage) > 1)
+		return;
 
-	_leave(" [unused]");
+	/* 'cell' may now be garbage collected. */
+	afs_set_cell_timer(net, expire_delay);
 }
 
 /*
- * destroy a cell record
- * - must be called with the net->cells_sem write-locked
- * - cell->link should have been broken by the caller
+ * Allocate a key to use as a placeholder for anonymous user security.
  */
-static void afs_cell_destroy(struct afs_net *net, struct afs_cell *cell)
+static int afs_alloc_anon_key(struct afs_cell *cell)
 {
-	_enter("%p{%d,%s}", cell, atomic_read(&cell->usage), cell->name);
+	struct key *key;
+	char keyname[4 + AFS_MAXCELLNAME + 1], *cp, *dp;
 
-	ASSERTCMP(atomic_read(&cell->usage), >=, 0);
-	ASSERT(list_empty(&cell->link));
+	/* Create a key to represent an anonymous user. */
+	memcpy(keyname, "afs@", 4);
+	dp = keyname + 4;
+	cp = cell->name;
+	do {
+		*dp++ = toupper(*cp);
+	} while (*cp++);
 
-	/* wait for everyone to stop using the cell */
-	if (atomic_read(&cell->usage) > 0) {
-		DECLARE_WAITQUEUE(myself, current);
+	key = rxrpc_get_null_key(keyname);
+	if (IS_ERR(key))
+		return PTR_ERR(key);
 
-		_debug("wait for cell %s", cell->name);
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		add_wait_queue(&net->cells_freeable_wq, &myself);
+	cell->anonymous_key = key;
 
-		while (atomic_read(&cell->usage) > 0) {
-			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
-		}
+	_debug("anon key %p{%x}",
+	       cell->anonymous_key, key_serial(cell->anonymous_key));
+	return 0;
+}
 
-		remove_wait_queue(&net->cells_freeable_wq, &myself);
-		set_current_state(TASK_RUNNING);
+/*
+ * Activate a cell.
+ */
+static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
+{
+	int ret;
+
+	if (!cell->anonymous_key) {
+		ret = afs_alloc_anon_key(cell);
+		if (ret < 0)
+			return ret;
 	}
 
-	_debug("cell dead");
-	ASSERTCMP(atomic_read(&cell->usage), ==, 0);
-	ASSERT(list_empty(&cell->servers));
-	ASSERT(list_empty(&cell->vl_list));
+#ifdef CONFIG_AFS_FSCACHE
+	cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
+					     &afs_cell_cache_index_def,
+					     cell, true);
+#endif
+	ret = afs_proc_cell_setup(net, cell);
+	if (ret < 0)
+		return ret;
+	spin_lock(&net->proc_cells_lock);
+	list_add_tail(&cell->proc_link, &net->proc_cells);
+	spin_unlock(&net->proc_cells_lock);
+	return 0;
+}
+
+/*
+ * Deactivate a cell.
+ */
+static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
+{
+	_enter("%s", cell->name);
 
 	afs_proc_cell_remove(net, cell);
 
-	down_write(&net->proc_cells_sem);
+	spin_lock(&net->proc_cells_lock);
 	list_del_init(&cell->proc_link);
-	up_write(&net->proc_cells_sem);
+	spin_unlock(&net->proc_cells_lock);
 
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_relinquish_cookie(cell->cache, 0);
+	cell->cache = NULL;
 #endif
-	key_put(cell->anonymous_key);
-	kfree(cell);
 
-	_leave(" [destroyed]");
+	_leave("");
 }
 
 /*
- * purge in-memory cell database on module unload or afs_init() failure
- * - the timeout daemon is stopped before calling this
+ * Manage a cell record, initialising and destroying it, maintaining its DNS
+ * records.
  */
-void afs_cell_purge(struct afs_net *net)
+static void afs_manage_cell(struct work_struct *work)
 {
-	struct afs_cell *cell;
+	struct afs_cell *cell = container_of(work, struct afs_cell, manager);
+	struct afs_net *net = cell->net;
+	bool deleted;
+	int ret;
+
+	_enter("%s", cell->name);
+
+again:
+	_debug("state %u", cell->state);
+	switch (cell->state) {
+	case AFS_CELL_INACTIVE:
+	case AFS_CELL_FAILED:
+		write_seqlock(&net->cells_lock);
+		deleted = refcount_dec_if_one(&cell->usage);
+		if (deleted)
+			rb_erase(&cell->net_node, &net->cells);
+		write_sequnlock(&net->cells_lock);
+		if (deleted)
+			goto final_destruction;
+		if (cell->state == AFS_CELL_FAILED)
+			goto done;
+		cell->state = AFS_CELL_UNSET;
+		goto again;
+
+	case AFS_CELL_UNSET:
+		cell->state = AFS_CELL_ACTIVATING;
+		goto again;
+
+	case AFS_CELL_ACTIVATING:
+		ret = afs_activate_cell(net, cell);
+		if (ret < 0)
+			goto activation_failed;
+
+		cell->state = AFS_CELL_ACTIVE;
+		smp_wmb();
+		clear_bit(AFS_CELL_FL_NOT_READY, &cell->flags);
+		wake_up_bit(&cell->flags, AFS_CELL_FL_NOT_READY);
+		goto again;
+
+	case AFS_CELL_ACTIVE:
+		if (refcount_read(&cell->usage) > 1) {
+			time64_t now = ktime_get_real_seconds();
+			if (cell->dns_expiry <= now && net->live)
+				afs_update_cell(cell);
+			goto done;
+		}
+		cell->state = AFS_CELL_DEACTIVATING;
+		goto again;
+
+	case AFS_CELL_DEACTIVATING:
+		set_bit(AFS_CELL_FL_NOT_READY, &cell->flags);
+		if (refcount_read(&cell->usage) > 1)
+			goto reverse_deactivation;
+		afs_deactivate_cell(net, cell);
+		cell->state = AFS_CELL_INACTIVE;
+		goto again;
+
+	default:
+		break;
+	}
+	_debug("bad state %u", cell->state);
+	BUG(); /* Unhandled state */
+
+activation_failed:
+	cell->error = ret;
+	afs_deactivate_cell(net, cell);
+
+	cell->state = AFS_CELL_FAILED;
+	smp_wmb();
+	if (test_and_clear_bit(AFS_CELL_FL_NOT_READY, &cell->flags))
+		wake_up_bit(&cell->flags, AFS_CELL_FL_NOT_READY);
+	goto again;
+
+reverse_deactivation:
+	cell->state = AFS_CELL_ACTIVE;
+	smp_wmb();
+	clear_bit(AFS_CELL_FL_NOT_READY, &cell->flags);
+	wake_up_bit(&cell->flags, AFS_CELL_FL_NOT_READY);
+	_leave(" [deact->act]");
+	return;
+
+done:
+	_leave(" [done %u]", cell->state);
+	return;
+
+final_destruction:
+	call_rcu(&cell->rcu, afs_cell_destroy);
+	dec_after_work(&net->cells_outstanding);
+	_leave(" [destruct %d]", atomic_read(&net->cells_outstanding));
+}
+
+/*
+ * Manage the records of cells known to a network namespace.  This includes
+ * updating the DNS records and garbage collecting unused cells that were
+ * automatically added.
+ *
+ * Note that constructed cell records may only be removed from net->cells by
+ * this work item, so it is safe for this work item to stash a cursor pointing
+ * into the tree and then return to caller (provided it skips cells that are
+ * still under construction).
+ *
+ * Note also that we were given an increment on net->cells_outstanding by
+ * whoever queued us that we need to deal with before returning.
+ */
+void afs_manage_cells(struct work_struct *work)
+{
+	struct afs_net *net = container_of(work, struct afs_net, cells_manager);
+	struct rb_node *cursor;
+	time64_t now = ktime_get_real_seconds(), next_manage = TIME64_MAX;
+	bool purging = !net->live;
 
 	_enter("");
 
-	afs_put_cell(net->ws_cell);
+	/* Trawl the cell database looking for cells that have expired from
+	 * lack of use and cells whose DNS results have expired and dispatch
+	 * their managers.
+	 */
+	read_seqlock_excl(&net->cells_lock);
 
-	down_write(&net->cells_sem);
+	for (cursor = rb_first(&net->cells); cursor; cursor = rb_next(cursor)) {
+		struct afs_cell *cell =
+			rb_entry(cursor, struct afs_cell, net_node);
+		bool sched_cell = false;
 
-	while (!list_empty(&net->cells)) {
-		cell = NULL;
+		_debug("manage %s %u", cell->name, refcount_read(&cell->usage));
 
-		/* remove the next cell from the front of the list */
-		write_lock(&net->cells_lock);
+		ASSERTCMP(refcount_read(&cell->usage), >=, 1);
 
-		if (!list_empty(&net->cells)) {
-			cell = list_entry(net->cells.next,
-					  struct afs_cell, link);
-			list_del_init(&cell->link);
+		if (purging && test_and_clear_bit(AFS_CELL_FL_NO_GC, &cell->flags))
+			refcount_dec(&cell->usage);
+
+		ASSERTIFCMP(purging, refcount_read(&cell->usage), ==, 1);
+
+		if (refcount_read(&cell->usage) == 1) {
+			time64_t expire_at = cell->last_inactive;
+
+			if (!test_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags) &&
+			    !test_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags))
+				expire_at += afs_cell_gc_delay;
+			if (purging || expire_at <= now)
+				sched_cell = true;
+			else if (expire_at < next_manage)
+				next_manage = expire_at;
 		}
 
-		write_unlock(&net->cells_lock);
+		if (!purging) {
+			if (cell->dns_expiry <= now)
+				sched_cell = true;
+			else if (cell->dns_expiry <= next_manage)
+				next_manage = cell->dns_expiry;
+		}
+
+		if (sched_cell)
+			schedule_work(&cell->manager);
+	}
+
+	read_sequnlock_excl(&net->cells_lock);
 
-		if (cell) {
-			_debug("PURGING CELL %s (%d)",
-			       cell->name, atomic_read(&cell->usage));
+	/* Update the timer on the way out.  We have to pass an increment on
+	 * cells_outstanding in the namespace that we are in to the timer or
+	 * the work scheduler.
+	 */
+	if (!purging && next_manage < TIME64_MAX) {
+		now = ktime_get_real_seconds();
 
-			/* now the cell should be left with no references */
-			afs_cell_destroy(net, cell);
+		if (next_manage - now <= 0) {
+			if (schedule_work(&net->cells_manager))
+				atomic_inc(&net->cells_outstanding);
+		} else {
+			afs_set_cell_timer(net, next_manage - now);
 		}
 	}
 
-	up_write(&net->cells_sem);
+	dec_after_work(&net->cells_outstanding);
+	_leave(" [%d]", atomic_read(&net->cells_outstanding));
+}
+
+/*
+ * Purge in-memory cell database.
+ */
+void afs_cell_purge(struct afs_net *net)
+{
+	struct afs_cell *ws;
+
+	_enter("");
+
+	write_seqlock(&net->cells_lock);
+	ws = net->ws_cell;
+	net->ws_cell = NULL;
+	write_sequnlock(&net->cells_lock);
+	afs_put_cell(net, ws);
+
+	_debug("del timer");
+	if (del_timer_sync(&net->cells_timer))
+		atomic_dec(&net->cells_outstanding);
+
+	_debug("kick mgr");
+	afs_queue_cell_manager(net);
+
+	_debug("wait");
+	wait_on_atomic_t(&net->cells_outstanding, atomic_t_wait,
+			 TASK_UNINTERRUPTIBLE);
 	_leave("");
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 47d5ae08f071..f96398163a68 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -202,13 +202,14 @@ struct afs_net {
 	atomic_t		nr_superblocks;
 
 	/* Cell database */
-	struct list_head	cells;
+	struct rb_root		cells;
 	struct afs_cell		*ws_cell;
-	rwlock_t		cells_lock;
-	struct rw_semaphore	cells_sem;
-	wait_queue_head_t	cells_freeable_wq;
+	struct work_struct	cells_manager;
+	struct timer_list	cells_timer;
+	atomic_t		cells_outstanding;
+	seqlock_t		cells_lock;
 
-	struct rw_semaphore	proc_cells_sem;
+	spinlock_t		proc_cells_lock;
 	struct list_head	proc_cells;
 
 	/* Volume location database */
@@ -235,14 +236,26 @@ struct afs_net {
 
 extern struct afs_net __afs_net;// Dummy AFS network namespace; TODO: replace with real netns
 
+enum afs_cell_state {
+	AFS_CELL_UNSET,
+	AFS_CELL_ACTIVATING,
+	AFS_CELL_ACTIVE,
+	AFS_CELL_DEACTIVATING,
+	AFS_CELL_INACTIVE,
+	AFS_CELL_FAILED,
+};
+
 /*
  * AFS cell record
  */
 struct afs_cell {
-	atomic_t		usage;
-	struct list_head	link;		/* main cell list link */
-	struct afs_net		*net;		/* The network namespace */
+	union {
+		struct rcu_head	rcu;
+		struct rb_node	net_node;	/* Node in net->cells */
+	};
+	struct afs_net		*net;
 	struct key		*anonymous_key;	/* anonymous user key for this cell */
+	struct work_struct	manager;	/* Manager for init/deinit/dns */
 	struct list_head	proc_link;	/* /proc cell list link */
 #ifdef CONFIG_AFS_FSCACHE
 	struct fscache_cookie	*cache;		/* caching cookie */
@@ -255,12 +268,26 @@ struct afs_cell {
 	/* volume location record management */
 	struct rw_semaphore	vl_sem;		/* volume management serialisation semaphore */
 	struct list_head	vl_list;	/* cell's active VL record list */
+	time64_t		dns_expiry;	/* Time AFSDB/SRV record expires */
+	time64_t		last_inactive;	/* Time of last drop of usage count */
+	refcount_t		usage;
+	unsigned long		flags;
+#define AFS_CELL_FL_NOT_READY	0		/* The cell record is not ready for use */
+#define AFS_CELL_FL_NO_GC	1		/* The cell was added manually, don't auto-gc */
+#define AFS_CELL_FL_NOT_FOUND	2		/* Permanent DNS error */
+#define AFS_CELL_FL_DNS_FAIL	3		/* Failed to access DNS */
+	enum afs_cell_state	state;
+	short			error;
+
 	spinlock_t		vl_lock;	/* vl_list lock */
+
+	/* VLDB server list. */
+	seqlock_t		vl_addrs_lock;
 	unsigned short		vl_naddrs;	/* number of VL servers in addr list */
 	unsigned short		vl_curr_svix;	/* current server index */
 	struct sockaddr_rxrpc	vl_addrs[AFS_CELL_MAX_ADDRS];	/* cell VL server addresses */
-
-	char			name[0];	/* cell name - must go last */
+	u8			name_len;	/* Length of name */
+	char			name[64 + 1];	/* Cell name, case-flattened and NUL-padded */
 };
 
 /*
@@ -310,6 +337,7 @@ struct afs_server {
 	atomic_t		usage;
 	time64_t		time_of_death;	/* time at which put reduced usage to 0 */
 	struct sockaddr_rxrpc	addr;		/* server address */
+	struct afs_net		*net;		/* The network namespace */
 	struct afs_cell		*cell;		/* cell in which server resides */
 	struct list_head	link;		/* link in cell's server list */
 	struct list_head	grave;		/* link in master graveyard list */
@@ -476,12 +504,14 @@ extern void afs_flush_callback_breaks(struct afs_server *);
 /*
  * cell.c
  */
-#define afs_get_cell(C) do { atomic_inc(&(C)->usage); } while(0)
-extern int __net_init afs_cell_init(struct afs_net *, char *);
-extern struct afs_cell *afs_cell_create(struct afs_net *, const char *, unsigned, char *, bool);
-extern struct afs_cell *afs_cell_lookup(struct afs_net *, const char *, unsigned, bool);
-extern struct afs_cell *afs_grab_cell(struct afs_cell *);
-extern void afs_put_cell(struct afs_cell *);
+#define afs_get_cell(C) do { refcount_inc(&(C)->usage); } while(0)
+extern int __net_init afs_cell_init(struct afs_net *, const char *);
+extern struct afs_cell *afs_lookup_cell_rcu(struct afs_net *, const char *, unsigned);
+extern struct afs_cell *afs_lookup_cell(struct afs_net *, const char *, unsigned,
+					const char *, bool);
+extern void afs_put_cell(struct afs_net *, struct afs_cell *);
+extern void afs_manage_cells(struct work_struct *);
+extern void afs_cells_timer(unsigned long);
 extern void __net_exit afs_cell_purge(struct afs_net *);
 
 /*
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 87b1a9c8000d..e0342e445099 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -46,12 +46,15 @@ static int __net_init afs_net_init(struct afs_net *net)
 
 	INIT_WORK(&net->charge_preallocation_work, afs_charge_preallocation);
 	mutex_init(&net->socket_mutex);
-	INIT_LIST_HEAD(&net->cells);
-	rwlock_init(&net->cells_lock);
-	init_rwsem(&net->cells_sem);
-	init_waitqueue_head(&net->cells_freeable_wq);
-	init_rwsem(&net->proc_cells_sem);
+
+	net->cells = RB_ROOT;
+	seqlock_init(&net->cells_lock);
+	INIT_WORK(&net->cells_manager, afs_manage_cells);
+	setup_timer(&net->cells_timer, afs_cells_timer, (unsigned long)net);
+
+	spin_lock_init(&net->proc_cells_lock);
 	INIT_LIST_HEAD(&net->proc_cells);
+
 	INIT_LIST_HEAD(&net->vl_updates);
 	INIT_LIST_HEAD(&net->vl_graveyard);
 	INIT_DELAYED_WORK(&net->vl_reaper, afs_vlocation_reaper);
@@ -82,11 +85,14 @@ static int __net_init afs_net_init(struct afs_net *net)
 	return 0;
 
 error_open_socket:
+	net->live = false;
 	afs_vlocation_purge(net);
 	afs_cell_purge(net);
 error_cell_init:
+	net->live = false;
 	afs_proc_cleanup(net);
 error_proc:
+	net->live = false;
 	return ret;
 }
 
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index ee10f0089d87..df3614306056 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -186,7 +186,7 @@ static void *afs_proc_cells_start(struct seq_file *m, loff_t *_pos)
 {
 	struct afs_net *net = afs_seq2net(m);
 
-	down_read(&net->proc_cells_sem);
+	rcu_read_lock();
 	return seq_list_start_head(&net->proc_cells, *_pos);
 }
 
@@ -205,9 +205,7 @@ static void *afs_proc_cells_next(struct seq_file *m, void *v, loff_t *pos)
  */
 static void afs_proc_cells_stop(struct seq_file *m, void *v)
 {
-	struct afs_net *net = afs_seq2net(m);
-
-	up_read(&net->proc_cells_sem);
+	rcu_read_unlock();
 }
 
 /*
@@ -225,8 +223,7 @@ static int afs_proc_cells_show(struct seq_file *m, void *v)
 	}
 
 	/* display one cell per line on subsequent lines */
-	seq_printf(m, "%3d %s\n",
-		   atomic_read(&cell->usage), cell->name);
+	seq_printf(m, "%3u %s\n", refcount_read(&cell->usage), cell->name);
 	return 0;
 }
 
@@ -279,13 +276,13 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 	if (strcmp(kbuf, "add") == 0) {
 		struct afs_cell *cell;
 
-		cell = afs_cell_create(net, name, strlen(name), args, false);
+		cell = afs_lookup_cell(net, name, strlen(name), args, true);
 		if (IS_ERR(cell)) {
 			ret = PTR_ERR(cell);
 			goto done;
 		}
 
-		afs_put_cell(cell);
+		set_bit(AFS_CELL_FL_NO_GC, &cell->flags);
 		printk("kAFS: Added new cell '%s'\n", name);
 	} else {
 		goto inval;
@@ -354,7 +351,7 @@ int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
 {
 	struct proc_dir_entry *dir;
 
-	_enter("%p{%s}", cell, cell->name);
+	_enter("%p{%s},%p", cell, cell->name, net->proc_afs);
 
 	dir = proc_mkdir(cell->name, net->proc_afs);
 	if (!dir)
diff --git a/fs/afs/server.c b/fs/afs/server.c
index 9b38f386a142..57c2f605e11b 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -68,6 +68,7 @@ static struct afs_server *afs_alloc_server(struct afs_cell *cell,
 	server = kzalloc(sizeof(struct afs_server), GFP_KERNEL);
 	if (server) {
 		atomic_set(&server->usage, 1);
+		server->net = cell->net;
 		server->cell = cell;
 
 		INIT_LIST_HEAD(&server->link);
@@ -254,7 +255,7 @@ static void afs_destroy_server(struct afs_server *server)
 	ASSERTCMP(server->cb_break_head, ==, server->cb_break_tail);
 	ASSERTCMP(atomic_read(&server->cb_break_n), ==, 0);
 
-	afs_put_cell(server->cell);
+	afs_put_cell(server->net, server->cell);
 	kfree(server);
 }
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 1bfc7b28700b..5113a48f907d 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -200,13 +200,14 @@ static int afs_parse_options(struct afs_mount_params *params,
 		token = match_token(p, afs_options_list, args);
 		switch (token) {
 		case afs_opt_cell:
-			cell = afs_cell_lookup(params->net,
-					       args[0].from,
-					       args[0].to - args[0].from,
-					       false);
+			rcu_read_lock();
+			cell = afs_lookup_cell_rcu(params->net,
+						   args[0].from,
+						   args[0].to - args[0].from);
+			rcu_read_unlock();
 			if (IS_ERR(cell))
 				return PTR_ERR(cell);
-			afs_put_cell(params->cell);
+			afs_put_cell(params->net, params->cell);
 			params->cell = cell;
 			break;
 
@@ -308,13 +309,14 @@ static int afs_parse_device_name(struct afs_mount_params *params,
 
 	/* lookup the cell record */
 	if (cellname || !params->cell) {
-		cell = afs_cell_lookup(params->net, cellname, cellnamesz, true);
+		cell = afs_lookup_cell(params->net, cellname, cellnamesz,
+				       NULL, false);
 		if (IS_ERR(cell)) {
 			printk(KERN_ERR "kAFS: unable to lookup cell '%*.*s'\n",
 			       cellnamesz, cellnamesz, cellname ?: "");
 			return PTR_ERR(cell);
 		}
-		afs_put_cell(params->cell);
+		afs_put_cell(params->net, params->cell);
 		params->cell = cell;
 	}
 
@@ -474,7 +476,7 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 		kfree(as);
 	}
 
-	afs_put_cell(params.cell);
+	afs_put_cell(params.net, params.cell);
 	kfree(new_opts);
 	_leave(" = 0 [%p]", sb);
 	return dget(sb->s_root);
@@ -488,7 +490,7 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 error_vol:
 	afs_put_volume(params.net, vol);
 error:
-	afs_put_cell(params.cell);
+	afs_put_cell(params.net, params.cell);
 	key_put(params.key);
 	kfree(new_opts);
 	_leave(" = %d", ret);
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 4f8c15c09a6d..ec5ab8dc9bc8 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -480,11 +480,11 @@ void afs_put_vlocation(struct afs_net *net, struct afs_vlocation *vl)
 /*
  * destroy a dead volume location record
  */
-static void afs_vlocation_destroy(struct afs_vlocation *vl)
+static void afs_vlocation_destroy(struct afs_net *net, struct afs_vlocation *vl)
 {
 	_enter("%p", vl);
 
-	afs_put_cell(vl->cell);
+	afs_put_cell(net, vl->cell);
 	kfree(vl);
 }
 
@@ -539,7 +539,7 @@ void afs_vlocation_reaper(struct work_struct *work)
 	while (!list_empty(&corpses)) {
 		vl = list_entry(corpses.next, struct afs_vlocation, grave);
 		list_del(&vl->grave);
-		afs_vlocation_destroy(vl);
+		afs_vlocation_destroy(net, vl);
 	}
 
 	_leave("");
diff --git a/fs/afs/xattr.c b/fs/afs/xattr.c
index 2830e4f48d85..e58e00ee9747 100644
--- a/fs/afs/xattr.c
+++ b/fs/afs/xattr.c
@@ -45,7 +45,7 @@ static int afs_xattr_get_cell(const struct xattr_handler *handler,
 	struct afs_cell *cell = vnode->volume->cell;
 	size_t namelen;
 
-	namelen = strlen(cell->name);
+	namelen = cell->name_len;
 	if (size == 0)
 		return namelen;
 	if (namelen > size)

* [RFC PATCH 11/11] afs: Retry rxrpc calls with address rotation on network error
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (8 preceding siblings ...)
  2017-09-01 15:42 ` [RFC PATCH 10/11] afs: Overhaul cell database management David Howells
@ 2017-09-01 15:42 ` David Howells
  2017-09-01 15:52 ` [RFC PATCH 00/11] AFS: Namespacing part 1 David Howells
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:42 UTC (permalink / raw)
  To: linux-afs; +Cc: linux-fsdevel, dhowells, linux-kernel

When a network error occurs whilst we are attempting a call, we want to
rotate the set of addresses we have for that peer and try the call again.
Use the new AF_RXRPC call-retrying facility to do this; it lets us reuse
the Tx queue from the dead call and so avoids re-encrypting the data each
time.

This method will work for accessing alternate VL servers and the various
addresses available for a single FS server, but should not be used to go to
alternate FS servers since that has other implications (such as getting
callbacks on other servers).

To this end:

 (1) An 'address list' concept is introduced.  Address lists are RCU
     replaceable lists of addresses.

 (2) A cell's VL server address list can be loaded directly via insmod or
     echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB
     or SRV records.

 (3) An FS server's address list, for the moment, has a single entry that
     is the key to the server list.  This will change in the future when a
     server is instead keyed on its UUID and the VL.GetAddrsU operation is
     used.

 (4) Anyone wanting to use a cell's VL server address must wait until the
     cell record comes online and has tried to obtain some addresses.

 (5) An 'address cursor' concept is introduced to handle stepping through
     the address list (see the sketch after this list).  For client calls,
     this is driven from a wrapper around rxrpc_kernel_send_data().  It
     isn't used for CM service call replies as they have to go to the
     caller's address.
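
As a rough illustration of how the cursor steps through a list, this is an
abridged copy of afs_get_address() from the fs/afs/addr_list.c diff below;
the final return is paraphrased rather than quoted:

	struct sockaddr_rxrpc *afs_get_address(struct afs_addr_cursor *ac)
	{
		unsigned short index;

		if (!ac->alist)
			return ERR_PTR(ac->error);

		/* Stop once every address in the list has been tried */
		ac->index++;
		if (ac->index == ac->alist->nr_addrs)
			return ERR_PTR(-EDESTADDRREQ);

		/* Step on from the recorded start point, wrapping at the
		 * end of the list.
		 */
		index = ac->start + ac->index;
		if (index >= ac->alist->nr_addrs)
			index -= ac->alist->nr_addrs;

		return &ac->alist->addrs[index];
	}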

In the future, we might want to annotate the list with information about
how each address fares.  We might then want to propagate such annotations
over address list replacement.

Whilst we're at it, we allow IPv6 addresses to be specified in
colon-delimited lists by enclosing them in square brackets.
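
For example (a hypothetical call, not taken from the patch; the address
string, port and variable names are made up for illustration), a bracketed
IPv6 entry can then sit in a colon-delimited list next to an IPv4 entry,
with a '+port' suffix overriding the default port for that one server:

	static const char example[] = "[1111:2222::7777]+7003:192.168.4.4";
	struct afs_addr_list *alist;

	alist = afs_parse_text_addrs(example, strlen(example), ':',
				     VL_SERVICE, AFS_VL_PORT);
	if (IS_ERR(alist))
		return PTR_ERR(alist);

	/* alist->nr_addrs is now 2: the IPv6 address on port 7003 and the
	 * v4-mapped 192.168.4.4 on the default AFS_VL_PORT.
	 */
	afs_put_addrlist(alist);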

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/Makefile    |    1 
 fs/afs/addr_list.c |  310 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/afs/cell.c      |  191 ++++++++++++--------------------
 fs/afs/fsclient.c  |   54 ++++++---
 fs/afs/internal.h  |   60 ++++++++--
 fs/afs/proc.c      |   23 ++--
 fs/afs/rxrpc.c     |  180 +++++++++++++++++++++++++++---
 fs/afs/server.c    |   66 +++++++----
 fs/afs/vlclient.c  |   19 ++-
 fs/afs/vlocation.c |  150 +++----------------------
 fs/afs/vnode.c     |   27 ++---
 fs/afs/volume.c    |    5 +
 12 files changed, 720 insertions(+), 366 deletions(-)
 create mode 100644 fs/afs/addr_list.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index 095c54165dfd..7cb4d55f6f1f 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -6,6 +6,7 @@ afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o
 
 kafs-objs := \
 	$(afs-cache-y) \
+	addr_list.o \
 	callback.o \
 	cell.o \
 	cmservice.o \
diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
new file mode 100644
index 000000000000..c2dcb6021cb2
--- /dev/null
+++ b/fs/afs/addr_list.c
@@ -0,0 +1,310 @@
+/* Server address list management
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/slab.h>
+#include <linux/ctype.h>
+#include <linux/dns_resolver.h>
+#include <linux/inet.h>
+#include <keys/rxrpc-type.h>
+#include "internal.h"
+
+#define AFS_MAX_ADDRESSES \
+	((PAGE_SIZE - sizeof(struct afs_addr_list)) / sizeof(struct sockaddr_rxrpc))
+
+/*
+ * Release an address list.
+ */
+void afs_put_addrlist(struct afs_addr_list *alist)
+{
+	if (alist) {
+		int usage = refcount_dec_return(&alist->usage);
+
+		if (usage == 0)
+			call_rcu(&alist->rcu, (rcu_callback_t)kfree);
+	}
+}
+
+/*
+ * Allocate an address list.
+ */
+static struct afs_addr_list *afs_alloc_addrlist(unsigned int nr,
+						unsigned short service,
+						unsigned short port)
+{
+	struct afs_addr_list *alist;
+	unsigned int i;
+
+	_enter("%u,%u,%u", nr, service, port);
+
+	alist = kzalloc(sizeof(*alist) + sizeof(alist->addrs[0]) * nr,
+			GFP_KERNEL);
+	if (!alist)
+		return NULL;
+
+	refcount_set(&alist->usage, 1);
+
+	for (i = 0; i < nr; i++) {
+		struct sockaddr_rxrpc *srx = &alist->addrs[i];
+		srx->srx_family			= AF_RXRPC;
+		srx->srx_service		= service;
+		srx->transport_type		= SOCK_DGRAM;
+		srx->transport_len		= sizeof(srx->transport.sin6);
+		srx->transport.sin6.sin6_family	= AF_INET6;
+		srx->transport.sin6.sin6_port	= htons(port);
+	}
+
+	return alist;
+}
+
+/*
+ * Parse a text string consisting of delimited addresses.
+ */
+struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len,
+					   char delim,
+					   unsigned short service,
+					   unsigned short port)
+{
+	struct afs_addr_list *alist;
+	const char *p, *end = text + len;
+	unsigned int nr = 0;
+
+	_enter("%*.*s,%c", (int)len, (int)len, text, delim);
+
+	if (!len)
+		return ERR_PTR(-EDESTADDRREQ);
+
+	if (delim == ':' && (memchr(text, ',', len) || !memchr(text, '.', len)))
+		delim = ',';
+
+	/* Count the addresses */
+	p = text;
+	do {
+		if (!*p)
+			return ERR_PTR(-EINVAL);
+		if (*p == delim)
+			continue;
+		nr++;
+		if (*p == '[') {
+			p++;
+			if (p == end)
+				return ERR_PTR(-EINVAL);
+			p = memchr(p, ']', end - p);
+			if (!p)
+				return ERR_PTR(-EINVAL);
+			p++;
+			if (p >= end)
+				break;
+		}
+
+		p = memchr(p, delim, end - p);
+		if (!p)
+			break;
+		p++;
+	} while (p < end);
+
+	_debug("%u/%lu addresses", nr, AFS_MAX_ADDRESSES);
+	if (nr > AFS_MAX_ADDRESSES)
+		nr = AFS_MAX_ADDRESSES;
+
+	alist = afs_alloc_addrlist(nr, service, port);
+	if (!alist)
+		return ERR_PTR(-ENOMEM);
+
+	/* Extract the addresses */
+	p = text;
+	do {
+		struct sockaddr_rxrpc *srx = &alist->addrs[alist->nr_addrs];
+		char tdelim = delim;
+
+		if (*p == delim) {
+			p++;
+			continue;
+		}
+
+		if (*p == '[') {
+			p++;
+			tdelim = ']';
+		}
+
+		if (in4_pton(p, end - p,
+			     (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
+			     tdelim, &p)) {
+			srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
+			srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
+		} else if (in6_pton(p, end - p,
+				    srx->transport.sin6.sin6_addr.s6_addr,
+				    tdelim, &p)) {
+			/* Nothing to do */
+		} else {
+			goto bad_address;
+		}
+
+		if (tdelim == ']') {
+			if (p == end || *p != ']')
+				goto bad_address;
+			p++;
+		}
+
+		if (p < end) {
+			if (*p == '+') {
+				/* Port number specification "+1234" */
+				unsigned int xport = 0;
+				p++;
+				if (p >= end || !isdigit(*p))
+					goto bad_address;
+				do {
+					xport *= 10;
+					xport += *p - '0';
+					if (xport > 65535)
+						goto bad_address;
+					p++;
+				} while (p < end && isdigit(*p));
+				srx->transport.sin6.sin6_port = htons(xport);
+			} else if (*p == delim) {
+				p++;
+			} else {
+				goto bad_address;
+			}
+		}
+
+		alist->nr_addrs++;
+	} while (p < end && alist->nr_addrs < AFS_MAX_ADDRESSES);
+
+	_leave(" = [nr %u]", alist->nr_addrs);
+	return alist;
+
+bad_address:
+	kfree(alist);
+	return ERR_PTR(-EINVAL);
+}
+
+/*
+ * Compare old and new address lists to see if there's been any change.
+ * - How to do this in better than O(N log N) time?
+ *   - We don't really want to sort the address list, but would rather take the
+ *     list as we got it so as not to undo record rotation by the DNS server.
+ */
+#if 0
+static int afs_cmp_addr_list(const struct afs_addr_list *a1,
+			     const struct afs_addr_list *a2)
+{
+}
+#endif
+
+/*
+ * Perform a DNS query for VL servers and build up an address list.
+ */
+struct afs_addr_list *afs_dns_query(struct afs_cell *cell, time64_t *_expiry)
+{
+	struct afs_addr_list *alist;
+	char *vllist = NULL;
+	int ret;
+
+	_enter("%s", cell->name);
+
+	ret = dns_query("afsdb", cell->name, cell->name_len,
+			"ipv4", &vllist, _expiry);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	alist = afs_parse_text_addrs(vllist, strlen(vllist), ',',
+				     VL_SERVICE, AFS_VL_PORT);
+	if (IS_ERR(alist)) {
+		kfree(vllist);
+		if (alist != ERR_PTR(-ENOMEM))
+			pr_err("Failed to parse DNS data\n");
+		return alist;
+	}
+
+	kfree(vllist);
+	return alist;
+}
+
+/*
+ * Get an address to try.
+ */
+struct sockaddr_rxrpc *afs_get_address(struct afs_addr_cursor *ac)
+{
+	unsigned short index;
+
+	_enter("%hu+%hd", ac->start, (short)ac->index);
+
+	if (!ac->alist)
+		return ERR_PTR(ac->error);
+
+	ac->index++;
+	if (ac->index == ac->alist->nr_addrs)
+		return ERR_PTR(-EDESTADDRREQ);
+
+	index = ac->start + ac->index;
+	if (index >= ac->alist->nr_addrs)
+		index -= ac->alist->nr_addrs;
+
+	return &ac->alist->addrs[index];
+}
+
+/*
+ * Release an address list cursor.
+ */
+void afs_end_cursor(struct afs_addr_cursor *ac)
+{
+	afs_put_addrlist(ac->alist);
+}
+
+/*
+ * Set the address cursor for iterating over VL servers.
+ */
+void afs_set_vl_cursor(struct afs_call *call, struct afs_cell *cell)
+{
+	struct afs_addr_cursor *ac = &call->cursor;
+	struct afs_addr_list *alist;
+	int ret;
+
+	if (!rcu_access_pointer(cell->vl_addrs)) {
+		ret = wait_on_bit(&cell->flags, AFS_CELL_FL_NO_LOOKUP_YET,
+				  TASK_INTERRUPTIBLE);
+		if (ret < 0) {
+			ac->error = ret;
+			return;
+		}
+
+		if (!rcu_access_pointer(cell->vl_addrs) &&
+		    ktime_get_real_seconds() < cell->dns_expiry) {
+			ac->error = cell->error;
+			return;
+		}
+	}
+
+	read_lock(&cell->vl_addrs_lock);
+	alist = rcu_dereference_protected(cell->vl_addrs,
+					  lockdep_is_held(&cell->vl_addrs_lock));
+	afs_get_addrlist(alist);
+	read_unlock(&cell->vl_addrs_lock);
+
+	ac->alist = alist;
+	ac->start = alist->index;
+	ac->index = 0xffff;
+	ac->error = 0;
+}
+
+/*
+ * Set the address cursor for iterating over FS servers.
+ */
+void afs_set_fs_cursor(struct afs_call *call, struct afs_server *server)
+{
+	struct afs_addr_cursor *ac = &call->cursor;
+
+	ac->alist = afs_get_addrlist(server->addrs);
+	ac->start = ac->alist->index;
+	ac->index = 0xffff;
+	ac->error = 0;
+}
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 078ffd90e5f4..d99824fc7f3f 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -9,7 +9,6 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/key.h>
 #include <linux/ctype.h>
@@ -51,8 +50,8 @@ struct afs_cell *afs_lookup_cell_rcu(struct afs_net *net,
 {
 	struct afs_cell *cell = NULL;
 	struct rb_node *p;
-	unsigned int seq = 0, n;
-	int ret = 0;
+	unsigned int seq = 0;
+	int n, ret = 0;
 
 	_enter("%*.*s", namesz, namesz, name);
 
@@ -69,12 +68,12 @@ struct afs_cell *afs_lookup_cell_rcu(struct afs_net *net,
 		read_seqbegin_or_lock(&net->cells_lock, &seq);
 
 		if (!name) {
-			ret = -EDESTADDRREQ;
 			cell = rcu_dereference_raw(net->ws_cell);
-			if (!cell)
+			if (cell) {
+				afs_get_cell(cell);
 				goto done;
-
-			afs_get_cell(cell);
+			}
+			ret = -EDESTADDRREQ;
 			goto done;
 		}
 
@@ -148,70 +147,33 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
 	init_rwsem(&cell->vl_sem);
 	INIT_LIST_HEAD(&cell->vl_list);
 	spin_lock_init(&cell->vl_lock);
-	seqlock_init(&cell->vl_addrs_lock);
-	cell->flags = (1 << AFS_CELL_FL_NOT_READY);
-
-	for (i = 0; i < AFS_CELL_MAX_ADDRS; i++) {
-		struct sockaddr_rxrpc *srx = &cell->vl_addrs[i];
-		srx->srx_family			= AF_RXRPC;
-		srx->srx_service		= VL_SERVICE;
-		srx->transport_type		= SOCK_DGRAM;
-		srx->transport.sin6.sin6_family	= AF_INET6;
-		srx->transport.sin6.sin6_port	= htons(AFS_VL_PORT);
-	}
+	cell->flags = ((1 << AFS_CELL_FL_NOT_READY) |
+		       (1 << AFS_CELL_FL_NO_LOOKUP_YET));
+	rwlock_init(&cell->vl_addrs_lock);
 
 	/* Fill in the VL server list if we were given a list of addresses to
 	 * use.
 	 */
 	if (vllist) {
-		char delim = ':';
-
-		if (strchr(vllist, ',') || !strchr(vllist, '.'))
-			delim = ',';
-
-		do {
-			struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
-
-			if (in4_pton(vllist, -1,
-				     (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
-				     delim, &vllist)) {
-				srx->transport_len = sizeof(struct sockaddr_in6);
-				srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
-				srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
-				srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
-			} else if (in6_pton(vllist, -1,
-					    srx->transport.sin6.sin6_addr.s6_addr,
-					    delim, &vllist)) {
-				srx->transport_len = sizeof(struct sockaddr_in6);
-				srx->transport.sin6.sin6_family	= AF_INET6;
-			} else {
-				goto bad_address;
-			}
-
-			cell->vl_naddrs++;
-			if (!*vllist)
-				break;
-			vllist++;
+		struct afs_addr_list *alist;
 
-		} while (cell->vl_naddrs < AFS_CELL_MAX_ADDRS && vllist);
+		alist = afs_parse_text_addrs(vllist, strlen(vllist), ':',
+					     VL_SERVICE, AFS_VL_PORT);
+		if (IS_ERR(alist)) {
+			ret = PTR_ERR(alist);
+			goto parse_failed;
+		}
 
-		/* Disable DNS refresh for manually-specified cells and set the
-		 * no-garbage collect flag (which pins the active count).
-		 */
+		rcu_assign_pointer(cell->vl_addrs, alist);
 		cell->dns_expiry = TIME64_MAX;
-	} else {
-		/* We're going to need to 'refresh' this cell's VL server list
-		 * from the DNS before we can use it.
-		 */
-		cell->dns_expiry = S64_MIN;
 	}
 
 	_leave(" = %p", cell);
 	return cell;
 
-bad_address:
-	printk(KERN_ERR "kAFS: bad VL server IP address\n");
-	ret = -EINVAL;
+parse_failed:
+	if (ret == -EINVAL)
+		printk(KERN_ERR "kAFS: bad VL server IP address\n");
 	kfree(cell);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
@@ -322,16 +284,17 @@ struct afs_cell *afs_lookup_cell(struct afs_net *net,
 	if (excl) {
 		ret = -EEXIST;
 	} else {
-		ASSERTCMP(refcount_read(&cursor->usage), >=, 1);
-		refcount_inc(&cursor->usage);
+		afs_get_cell(cursor);
 		ret = 0;
 	}
 	write_sequnlock(&net->cells_lock);
 	kfree(candidate);
 	if (ret == 0)
 		goto wait_for_cell;
+	goto error_noput;
 error:
 	afs_put_cell(net, cell);
+error_noput:
 	_leave(" = %d [error]", ret);
 	return ERR_PTR(ret);
 }
@@ -393,78 +356,50 @@ int afs_cell_init(struct afs_net *net, const char *rootcell)
  */
 static void afs_update_cell(struct afs_cell *cell)
 {
+	struct afs_addr_list *alist, *old;
 	time64_t now, expiry;
-	char *vllist = NULL;
-	int ret;
 
 	_enter("%s", cell->name);
 
-	ret = dns_query("afsdb", cell->name, cell->name_len,
-			"ipv4", &vllist, &expiry);
-	_debug("query %d", ret);
-	switch (ret) {
-	case 0 ... INT_MAX:
-		clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-		clear_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
-		goto parse_dns_data;
+	alist = afs_dns_query(cell, &expiry);
+	if (IS_ERR(alist)) {
+		switch (PTR_ERR(alist)) {
+		case -ENODATA:
+			/* The DNS said that the cell does not exist */
+			set_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
+			clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
+			cell->dns_expiry = ktime_get_real_seconds() + 61;
+			break;
 
-	case -ENODATA:
-		clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-		set_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
-		cell->dns_expiry = ktime_get_real_seconds() + 61;
-		cell->error = -EDESTADDRREQ;
-		goto out;
+		case -EAGAIN:
+		case -ECONNREFUSED:
+		default:
+			set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
+			cell->dns_expiry = ktime_get_real_seconds() + 10;
+			break;
+		}
 
-	case -EAGAIN:
-	case -ECONNREFUSED:
-	default:
-		/* Unable to query DNS. */
-		set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-		cell->dns_expiry = ktime_get_real_seconds() + 10;
 		cell->error = -EDESTADDRREQ;
-		goto out;
-	}
-
-parse_dns_data:
-	write_seqlock(&cell->vl_addrs_lock);
-
-	ret = -EINVAL;
-	do {
-		struct sockaddr_rxrpc *srx = &cell->vl_addrs[cell->vl_naddrs];
-
-		if (in4_pton(vllist, -1,
-			     (u8 *)&srx->transport.sin6.sin6_addr.s6_addr32[3],
-			     ',', (const char **)&vllist)) {
-			srx->transport_len = sizeof(struct sockaddr_in6);
-			srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
-			srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
-			srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
-		} else if (in6_pton(vllist, -1,
-				    srx->transport.sin6.sin6_addr.s6_addr,
-				    ',', (const char **)&vllist)) {
-			srx->transport_len = sizeof(struct sockaddr_in6);
-			srx->transport.sin6.sin6_family	= AF_INET6;
-		} else {
-			goto bad_address;
-		}
+	} else {
+		clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
+		clear_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
 
-		cell->vl_naddrs++;
-		if (!*vllist)
-			break;
-		vllist++;
+		/* Exclusion on changing vl_addrs is achieved by a
+		 * non-reentrant work item.
+		 */
+		old = rcu_dereference_protected(cell->vl_addrs, true);
+		rcu_assign_pointer(cell->vl_addrs, alist);
+		cell->dns_expiry = expiry;
 
-	} while (cell->vl_naddrs < AFS_CELL_MAX_ADDRS);
+		if (old)
+			afs_put_addrlist(old);
+	}
 
-	if (cell->vl_naddrs < AFS_CELL_MAX_ADDRS)
-		memset(cell->vl_addrs + cell->vl_naddrs, 0,
-		       (AFS_CELL_MAX_ADDRS - cell->vl_naddrs) * sizeof(cell->vl_addrs[0]));
+	if (test_and_clear_bit(AFS_CELL_FL_NO_LOOKUP_YET, &cell->flags))
+		wake_up_bit(&cell->flags, AFS_CELL_FL_NO_LOOKUP_YET);
 
 	now = ktime_get_real_seconds();
-	cell->dns_expiry = expiry;
-	afs_set_cell_timer(cell->net, expiry - now);
-bad_address:
-	write_sequnlock(&cell->vl_addrs_lock);
-out:
+	afs_set_cell_timer(cell->net, cell->dns_expiry - now);
 	_leave("");
 }
 
@@ -479,6 +414,7 @@ static void afs_cell_destroy(struct rcu_head *rcu)
 
 	ASSERTCMP(refcount_read(&cell->usage), ==, 0);
 
+	afs_put_addrlist(cell->vl_addrs);
 	key_put(cell->anonymous_key);
 	kfree(cell);
 
@@ -512,11 +448,23 @@ void afs_cells_timer(unsigned long data)
 }
 
 /*
+ * Get a reference on a cell record.
+ */
+struct afs_cell *afs_get_cell(struct afs_cell *cell)
+{
+	unsigned int usage;
+
+	usage = refcount_inc_return(&cell->usage);
+	return cell;
+}
+
+/*
  * Drop a reference on a cell record.
  */
 void afs_put_cell(struct afs_net *net, struct afs_cell *cell)
 {
 	time64_t now, expire_delay;
+	unsigned int usage;
 
 	if (!cell)
 		return;
@@ -530,7 +478,8 @@ void afs_put_cell(struct afs_net *net, struct afs_cell *cell)
 	    !test_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags))
 		expire_delay = afs_cell_gc_delay;
 
-	if (refcount_dec_return(&cell->usage) > 1)
+	usage = refcount_dec_return(&cell->usage);
+	if (usage > 1)
 		return;
 
 	/* 'cell' may now be garbage collected. */
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index bac2e8db6e75..f4e3ec104ac4 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -305,7 +305,8 @@ int afs_fs_fetch_file_status(struct afs_server *server,
 	bp[2] = htonl(vnode->fid.vnode);
 	bp[3] = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -516,7 +517,8 @@ static int afs_fs_fetch_data64(struct afs_server *server,
 	bp[7] = htonl(lower_32_bits(req->len));
 
 	atomic_inc(&req->usage);
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -559,7 +561,8 @@ int afs_fs_fetch_data(struct afs_server *server,
 	bp[5] = htonl(lower_32_bits(req->len));
 
 	atomic_inc(&req->usage);
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -641,7 +644,8 @@ int afs_fs_give_up_callbacks(struct afs_net *net,
 	ASSERT(ncallbacks > 0);
 	wake_up_nr(&server->cb_break_waitq, ncallbacks);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -736,7 +740,8 @@ int afs_fs_create(struct afs_server *server,
 	*bp++ = htonl(mode & S_IALLUGO); /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -815,7 +820,8 @@ int afs_fs_remove(struct afs_server *server,
 		bp = (void *) bp + padsz;
 	}
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -899,7 +905,8 @@ int afs_fs_link(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.vnode);
 	*bp++ = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1002,7 +1009,8 @@ int afs_fs_symlink(struct afs_server *server,
 	*bp++ = htonl(S_IRWXUGO); /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1104,7 +1112,8 @@ int afs_fs_rename(struct afs_server *server,
 		bp = (void *) bp + n_padsz;
 	}
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1207,7 +1216,8 @@ static int afs_fs_store_data64(struct afs_server *server,
 	*bp++ = htonl(i_size >> 32);
 	*bp++ = htonl((u32) i_size);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1280,7 +1290,8 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 	*bp++ = htonl(size);
 	*bp++ = htonl(i_size);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1380,7 +1391,8 @@ static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
 	*bp++ = htonl(attr->ia_size >> 32);	/* new file length */
 	*bp++ = htonl((u32) attr->ia_size);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1427,7 +1439,8 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 	*bp++ = 0;				/* size of write */
 	*bp++ = htonl(attr->ia_size);		/* new file length */
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1468,7 +1481,8 @@ int afs_fs_setattr(struct afs_server *server, struct key *key,
 
 	xdr_encode_AFS_StoreStatus(&bp, attr);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1699,7 +1713,8 @@ int afs_fs_get_volume_status(struct afs_server *server,
 	bp[0] = htonl(FSGETVOLUMESTATUS);
 	bp[1] = htonl(vnode->fid.vid);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1784,7 +1799,8 @@ int afs_fs_set_lock(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.unique);
 	*bp++ = htonl(type);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1815,7 +1831,8 @@ int afs_fs_extend_lock(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.vnode);
 	*bp++ = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
 
 /*
@@ -1846,5 +1863,6 @@ int afs_fs_release_lock(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.vnode);
 	*bp++ = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, async);
+	afs_set_fs_cursor(call, server);
+	return afs_make_call(call, GFP_NOFS, async);
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f96398163a68..a75b67e816cd 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -31,6 +31,7 @@
 
 struct pagevec;
 struct afs_call;
+struct afs_addr_cursor;
 
 typedef enum {
 	AFS_VL_NEW,			/* new, uninitialised record */
@@ -66,6 +67,24 @@ enum afs_call_state {
 };
 
 /*
+ * List of server addresses.
+ */
+struct afs_addr_list {
+	struct rcu_head		rcu;		/* Must be first */
+	refcount_t		usage;
+	unsigned short		nr_addrs;
+	unsigned short		index;		/* Address currently in use */
+	struct sockaddr_rxrpc	addrs[];
+};
+
+struct afs_addr_cursor {
+	struct afs_addr_list	*alist;
+	unsigned short		start;		/* Starting point in alist->addrs[] */
+	unsigned short		index;		/* Wrapping offset from start to current addr */
+	short			error;
+};
+
+/*
  * a record of an in-progress RxRPC call
  */
 struct afs_call {
@@ -77,6 +96,7 @@ struct afs_call {
 	struct key		*key;		/* security for this call */
 	struct afs_net		*net;		/* The network namespace */
 	struct afs_server	*server;	/* server affected by incoming CM call */
+	struct afs_addr_cursor	cursor;		/* Address/server rotation cursor */
 	void			*request;	/* request data (first part) */
 	struct address_space	*mapping;	/* page set */
 	struct afs_writeback	*wb;		/* writeback being performed */
@@ -276,16 +296,15 @@ struct afs_cell {
 #define AFS_CELL_FL_NO_GC	1		/* The cell was added manually, don't auto-gc */
 #define AFS_CELL_FL_NOT_FOUND	2		/* Permanent DNS error */
 #define AFS_CELL_FL_DNS_FAIL	3		/* Failed to access DNS */
+#define AFS_CELL_FL_NO_LOOKUP_YET 4		/* Not completed first DNS lookup yet */
 	enum afs_cell_state	state;
 	short			error;
 
 	spinlock_t		vl_lock;	/* vl_list lock */
 
 	/* VLDB server list. */
-	seqlock_t		vl_addrs_lock;
-	unsigned short		vl_naddrs;	/* number of VL servers in addr list */
-	unsigned short		vl_curr_svix;	/* current server index */
-	struct sockaddr_rxrpc	vl_addrs[AFS_CELL_MAX_ADDRS];	/* cell VL server addresses */
+	rwlock_t		vl_addrs_lock;	/* Lock on vl_addrs */
+	struct afs_addr_list	__rcu *vl_addrs; /* List of VL servers */
 	u8			name_len;	/* Length of name */
 	char			name[64 + 1];	/* Cell name, case-flattened and NUL-padded */
 };
@@ -336,7 +355,7 @@ struct afs_vlocation {
 struct afs_server {
 	atomic_t		usage;
 	time64_t		time_of_death;	/* time at which put reduced usage to 0 */
-	struct sockaddr_rxrpc	addr;		/* server address */
+	struct afs_addr_list	__rcu *addrs;	/* List of addresses for this server */
 	struct afs_net		*net;		/* The network namespace */
 	struct afs_cell		*cell;		/* cell in which server resides */
 	struct list_head	link;		/* link in cell's server list */
@@ -474,6 +493,23 @@ struct afs_interface {
 
 /*****************************************************************************/
 /*
+ * addr_list.c
+ */
+static inline struct afs_addr_list *afs_get_addrlist(struct afs_addr_list *alist)
+{
+	refcount_inc(&alist->usage);
+	return alist;
+}
+extern void afs_put_addrlist(struct afs_addr_list *);
+extern struct afs_addr_list *afs_parse_text_addrs(const char *, size_t, char,
+						  unsigned short, unsigned short);
+extern struct afs_addr_list *afs_dns_query(struct afs_cell *, time64_t *);
+extern void afs_set_vl_cursor(struct afs_call *, struct afs_cell *);
+extern void afs_set_fs_cursor(struct afs_call *, struct afs_server *);
+extern struct sockaddr_rxrpc *afs_get_address(struct afs_addr_cursor *);
+extern void afs_end_cursor(struct afs_addr_cursor *);
+
+/*
  * cache.c
  */
 #ifdef CONFIG_AFS_FSCACHE
@@ -504,11 +540,11 @@ extern void afs_flush_callback_breaks(struct afs_server *);
 /*
  * cell.c
  */
-#define afs_get_cell(C) do { refcount_inc(&(C)->usage); } while(0)
 extern int __net_init afs_cell_init(struct afs_net *, const char *);
 extern struct afs_cell *afs_lookup_cell_rcu(struct afs_net *, const char *, unsigned);
 extern struct afs_cell *afs_lookup_cell(struct afs_net *, const char *, unsigned,
 					const char *, bool);
+extern struct afs_cell *afs_get_cell(struct afs_cell *);
 extern void afs_put_cell(struct afs_net *, struct afs_cell *);
 extern void afs_manage_cells(struct work_struct *);
 extern void afs_cells_timer(unsigned long);
@@ -662,7 +698,7 @@ extern void __net_exit afs_close_socket(struct afs_net *);
 extern void afs_charge_preallocation(struct work_struct *);
 extern void afs_put_call(struct afs_call *);
 extern int afs_queue_call_work(struct afs_call *);
-extern int afs_make_call(struct sockaddr_rxrpc *, struct afs_call *, gfp_t, bool);
+extern int afs_make_call(struct afs_call *, gfp_t, bool);
 extern struct afs_call *afs_alloc_flat_call(struct afs_net *,
 					    const struct afs_call_type *,
 					    size_t, size_t);
@@ -713,12 +749,10 @@ extern void __exit afs_fs_exit(void);
 /*
  * vlclient.c
  */
-extern int afs_vl_get_entry_by_name(struct afs_net *,
-				    struct sockaddr_rxrpc *, struct key *,
-				    const char *, struct afs_cache_vlocation *,
-				    bool);
-extern int afs_vl_get_entry_by_id(struct afs_net *,
-				  struct sockaddr_rxrpc *, struct key *,
+extern int afs_vl_get_entry_by_name(struct afs_cell *, struct key *,
+				    const char *,
+				    struct afs_cache_vlocation *, bool);
+extern int afs_vl_get_entry_by_id(struct afs_cell *, struct key *,
 				  afs_volid_t, afs_voltype_t,
 				  struct afs_cache_vlocation *, bool);
 
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index df3614306056..d75626613ed9 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -514,23 +514,23 @@ static int afs_proc_cell_vlservers_open(struct inode *inode, struct file *file)
  */
 static void *afs_proc_cell_vlservers_start(struct seq_file *m, loff_t *_pos)
 {
+	struct afs_addr_list *alist;
 	struct afs_cell *cell = m->private;
 	loff_t pos = *_pos;
 
-	_enter("cell=%p pos=%Ld", cell, *_pos);
+	rcu_read_lock();
 
-	/* lock the list against modification */
-	down_read(&cell->vl_sem);
+	alist = rcu_dereference(cell->vl_addrs);
 
 	/* allow for the header line */
 	if (!pos)
 		return (void *) 1;
 	pos--;
 
-	if (pos >= cell->vl_naddrs)
+	if (!alist || pos >= alist->nr_addrs)
 		return NULL;
 
-	return &cell->vl_addrs[pos];
+	return alist->addrs + pos;
 }
 
 /*
@@ -539,17 +539,18 @@ static void *afs_proc_cell_vlservers_start(struct seq_file *m, loff_t *_pos)
 static void *afs_proc_cell_vlservers_next(struct seq_file *p, void *v,
 					  loff_t *_pos)
 {
+	struct afs_addr_list *alist;
 	struct afs_cell *cell = p->private;
 	loff_t pos;
 
-	_enter("cell=%p{nad=%u} pos=%Ld", cell, cell->vl_naddrs, *_pos);
+	alist = rcu_dereference(cell->vl_addrs);
 
 	pos = *_pos;
 	(*_pos)++;
-	if (pos >= cell->vl_naddrs)
+	if (!alist || pos >= alist->nr_addrs)
 		return NULL;
 
-	return &cell->vl_addrs[pos];
+	return alist->addrs + pos;
 }
 
 /*
@@ -557,9 +558,7 @@ static void *afs_proc_cell_vlservers_next(struct seq_file *p, void *v,
  */
 static void afs_proc_cell_vlservers_stop(struct seq_file *p, void *v)
 {
-	struct afs_cell *cell = p->private;
-
-	up_read(&cell->vl_sem);
+	rcu_read_unlock();
 }
 
 /*
@@ -658,7 +657,7 @@ static int afs_proc_cell_servers_show(struct seq_file *m, void *v)
 	}
 
 	/* display one cell per line on subsequent lines */
-	sprintf(ipaddr, "%pISp", &server->addr.transport);
+	sprintf(ipaddr, "%pISp", &server->addrs->addrs[0].transport);
 	seq_printf(m, "%3d %-15s %5d\n",
 		   atomic_read(&server->usage), ipaddr, server->fs_state);
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 805ae0542478..ab149f67f908 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -162,6 +162,7 @@ void afs_put_call(struct afs_call *call)
 		if (call->type->destructor)
 			call->type->destructor(call);
 
+		afs_end_cursor(&call->cursor);
 		kfree(call->request);
 		kfree(call);
 
@@ -287,6 +288,84 @@ static void afs_notify_end_request_tx(struct sock *sock,
 }
 
 /*
+ * Send data through rxrpc and rotate the destination address if a network
+ * error of some sort occurs.
+ */
+static int afs_send_data(struct afs_call *call, struct msghdr *msg,
+			 unsigned int bytes)
+{
+	enum rxrpc_call_completion compl;
+	struct sockaddr_rxrpc *srx;
+	int ret;
+
+resume:
+	ret = rxrpc_kernel_send_data(call->net->socket, call->rxcall,
+				     msg, bytes, afs_notify_end_request_tx);
+
+	/* Success and obvious local errors are returned immediately.  Note
+	 * that for an async operation, the call struct may already have
+	 * evaporated.
+	 */
+	if (ret >= 0 ||
+	    ret == -ENOMEM ||
+	    ret == -ENONET ||
+	    ret == -EINTR ||
+	    ret == -EFAULT ||
+	    ret == -ERESTARTSYS ||
+	    ret == -EKEYEXPIRED ||
+	    ret == -EKEYREVOKED ||
+	    ret == -EKEYREJECTED ||
+	    ret == -EPERM)
+		return ret;
+
+	/* Check to see if it's an error that meant the call data packets never
+	 * reached the peer.
+	 */
+	call->error = rxrpc_kernel_check_call(call->net->socket, call->rxcall,
+					      &compl, &call->abort_code);
+	if (call->error != -EINPROGRESS)
+		return ret;
+
+	switch (compl) {
+	case RXRPC_CALL_SUCCEEDED:
+	default:
+		WARN_ONCE(true, "AFS: Call succeeded despite send-data failing\n");
+		return 0;
+
+	case RXRPC_CALL_REMOTELY_ABORTED:
+	case RXRPC_CALL_LOCALLY_ABORTED:
+		/* All of these indicate that we had some interaction with the
+		 * server, so there's no point trying another server.
+		 */
+		return call->error;
+
+	case RXRPC_CALL_LOCAL_ERROR:
+	case RXRPC_CALL_NETWORK_ERROR:
+		/* Local errors from an attempt to connect a call and network
+		 * errors reported back by ICMP suggest skipping the current
+		 * address and trying the next.
+		 */
+		break;
+	}
+
+	/* Rotate servers if possible. */
+	srx = afs_get_address(&call->cursor);
+	if (IS_ERR(srx)) {
+		_leave(" = %ld [cursor]", PTR_ERR(srx));
+		return PTR_ERR(srx);
+	}
+
+	ret = rxrpc_kernel_retry_call(call->net->socket, call->rxcall,
+				      srx, call->key);
+	if (ret < 0)
+		return ret;
+
+	if (msg_data_left(msg) > 0)
+		goto resume;
+	return 0;
+}
+
+/*
  * attach the data from a bunch of pages on an inode to a call
  */
 static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
@@ -305,8 +384,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 		bytes = msg->msg_iter.count;
 		nr = msg->msg_iter.nr_segs;
 
-		ret = rxrpc_kernel_send_data(call->net->socket, call->rxcall, msg,
-					     bytes, afs_notify_end_request_tx);
+		ret = afs_send_data(call, msg, bytes);
 		for (loop = 0; loop < nr; loop++)
 			put_page(bv[loop].bv_page);
 		if (ret < 0)
@@ -321,9 +399,9 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 /*
  * initiate a call
  */
-int afs_make_call(struct sockaddr_rxrpc *srx, struct afs_call *call,
-		  gfp_t gfp, bool async)
+int afs_make_call(struct afs_call *call, gfp_t gfp, bool async)
 {
+	struct sockaddr_rxrpc *srx;
 	struct rxrpc_call *rxcall;
 	struct msghdr msg;
 	struct kvec iov[1];
@@ -332,7 +410,7 @@ int afs_make_call(struct sockaddr_rxrpc *srx, struct afs_call *call,
 	u32 abort_code;
 	int ret;
 
-	_enter(",{%pISp},", &srx->transport);
+	_enter("");
 
 	ASSERT(call->type != NULL);
 	ASSERT(call->type->name != NULL);
@@ -354,6 +432,11 @@ int afs_make_call(struct sockaddr_rxrpc *srx, struct afs_call *call,
 	}
 
 	/* create a call */
+	srx = afs_get_address(&call->cursor);
+	if (IS_ERR(srx))
+		return PTR_ERR(srx);
+
+	_debug("call %pISp", &srx->transport);
 	rxcall = rxrpc_kernel_begin_call(call->net->socket, srx, call->key,
 					 (unsigned long)call,
 					 tx_total_len, gfp,
@@ -380,16 +463,7 @@ int afs_make_call(struct sockaddr_rxrpc *srx, struct afs_call *call,
 	msg.msg_controllen	= 0;
 	msg.msg_flags		= (call->send_pages ? MSG_MORE : 0);
 
-	/* We have to change the state *before* sending the last packet as
-	 * rxrpc might give us the reply before it returns from sending the
-	 * request.  Further, if the send fails, we may already have been given
-	 * a notification and may have collected it.
-	 */
-	if (!call->send_pages)
-		call->state = AFS_CALL_AWAIT_REPLY;
-	ret = rxrpc_kernel_send_data(call->net->socket, rxcall,
-				     &msg, call->request_size,
-				     afs_notify_end_request_tx);
+	ret = afs_send_data(call, &msg, call->request_size);
 	if (ret < 0)
 		goto error_do_abort;
 
@@ -758,7 +832,6 @@ void afs_send_empty_reply(struct afs_call *call)
 	msg.msg_controllen	= 0;
 	msg.msg_flags		= 0;
 
-	call->state = AFS_CALL_AWAIT_ACK;
 	switch (rxrpc_kernel_send_data(net->socket, call->rxcall, &msg, 0,
 				       afs_notify_end_reply_tx)) {
 	case 0:
@@ -798,7 +871,6 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 	msg.msg_controllen	= 0;
 	msg.msg_flags		= 0;
 
-	call->state = AFS_CALL_AWAIT_ACK;
 	n = rxrpc_kernel_send_data(net->socket, call->rxcall, &msg, len,
 				   afs_notify_end_reply_tx);
 	if (n >= 0) {
@@ -816,6 +888,69 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 }
 
 /*
+ * Rotate the destination address if a network error of some sort occurs and
+ * retry the call.
+ */
+static int afs_retry_call(struct afs_call *call, int ret)
+{
+	enum rxrpc_call_completion compl;
+	struct sockaddr_rxrpc *srx;
+
+	if (ret == -ENOMEM ||
+	    ret == -ENONET ||
+	    ret == -EINTR ||
+	    ret == -EFAULT ||
+	    ret == -ERESTARTSYS ||
+	    ret == -EKEYEXPIRED ||
+	    ret == -EKEYREVOKED ||
+	    ret == -EKEYREJECTED ||
+	    ret == -EPERM)
+		return ret;
+
+	/* Check to see if it's an error that meant the call data packets never
+	 * reached the peer.
+	 */
+	call->error = rxrpc_kernel_check_call(call->net->socket, call->rxcall,
+					      &compl, &call->abort_code);
+	if (call->error == -EINPROGRESS)
+		return ret;
+
+	switch (compl) {
+	case RXRPC_CALL_SUCCEEDED:
+	default:
+		WARN_ONCE(true, "AFS: Call succeeded despite send-data failing\n");
+		return 0;
+
+	case RXRPC_CALL_REMOTELY_ABORTED:
+	case RXRPC_CALL_LOCALLY_ABORTED:
+		/* All of these indicate that we had some interaction with the
+		 * server, so there's no point trying another server.
+		 */
+		return call->error;
+
+	case RXRPC_CALL_LOCAL_ERROR:
+	case RXRPC_CALL_NETWORK_ERROR:
+		/* Local errors from an attempt to connect a call and network
+		 * errors reported back by ICMP suggest skipping the current
+		 * address and trying the next.
+		 */
+		break;
+	}
+
+	/* Rotate servers if possible. */
+	srx = afs_get_address(&call->cursor);
+	if (IS_ERR(srx))
+		return PTR_ERR(srx);
+
+	_debug("retry %pISp", &srx->transport);
+	call->error = 0;
+	ret = rxrpc_kernel_retry_call(call->net->socket, call->rxcall,
+				      srx, call->key);
+	_leave(" = %d [retry]", ret);
+	return ret;
+}
+
+/*
  * Extract a piece of data from the received data socket buffers.
  */
 int afs_extract_data(struct afs_call *call, void *buf, size_t count,
@@ -850,10 +985,15 @@ int afs_extract_data(struct afs_call *call, void *buf, size_t count,
 		return 0;
 	}
 
-	if (ret == -ECONNABORTED)
+	if (ret == -ECONNABORTED) {
 		call->error = call->type->abort_to_error(call->abort_code);
-	else
-		call->error = ret;
+		goto out;
+	}
+
+	ret = afs_retry_call(call, ret);
+	if (ret == 0)
+		return -EAGAIN;
+out:
 	call->state = AFS_CALL_COMPLETE;
 	return ret;
 }
diff --git a/fs/afs/server.c b/fs/afs/server.c
index 57c2f605e11b..0f2e84966d3e 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -37,7 +37,9 @@ static int afs_install_server(struct afs_server *server)
 		p = *pp;
 		_debug("- consider %p", p);
 		xserver = rb_entry(p, struct afs_server, master_rb);
-		diff = memcmp(&server->addr, &xserver->addr, sizeof(server->addr));
+		diff = memcmp(&server->addrs->addrs[0],
+			      &xserver->addrs->addrs[0],
+			      sizeof(server->addrs->addrs[0]));
 		if (diff < 0)
 			pp = &(*pp)->rb_left;
 		else if (diff > 0)
@@ -66,28 +68,41 @@ static struct afs_server *afs_alloc_server(struct afs_cell *cell,
 	_enter("");
 
 	server = kzalloc(sizeof(struct afs_server), GFP_KERNEL);
-	if (server) {
-		atomic_set(&server->usage, 1);
-		server->net = cell->net;
-		server->cell = cell;
-
-		INIT_LIST_HEAD(&server->link);
-		INIT_LIST_HEAD(&server->grave);
-		init_rwsem(&server->sem);
-		spin_lock_init(&server->fs_lock);
-		server->fs_vnodes = RB_ROOT;
-		server->cb_promises = RB_ROOT;
-		spin_lock_init(&server->cb_lock);
-		init_waitqueue_head(&server->cb_break_waitq);
-		INIT_DELAYED_WORK(&server->cb_break_work,
-				  afs_dispatch_give_up_callbacks);
-
-		server->addr = *addr;
-		_leave(" = %p{%d}", server, atomic_read(&server->usage));
-	} else {
-		_leave(" = NULL [nomem]");
-	}
+	if (!server)
+		goto enomem;
+	server->addrs = kzalloc(sizeof(struct afs_addr_list) +
+				sizeof(struct sockaddr_rxrpc),
+				GFP_KERNEL);
+	if (!server->addrs)
+		goto enomem_server;
+
+	atomic_set(&server->usage, 1);
+	server->net = cell->net;
+	server->cell = cell;
+
+	INIT_LIST_HEAD(&server->link);
+	INIT_LIST_HEAD(&server->grave);
+	init_rwsem(&server->sem);
+	spin_lock_init(&server->fs_lock);
+	server->fs_vnodes = RB_ROOT;
+	server->cb_promises = RB_ROOT;
+	spin_lock_init(&server->cb_lock);
+	init_waitqueue_head(&server->cb_break_waitq);
+	INIT_DELAYED_WORK(&server->cb_break_work,
+			  afs_dispatch_give_up_callbacks);
+
+	refcount_set(&server->addrs->usage, 1);
+	server->addrs->nr_addrs = 1;
+	server->addrs->addrs[0] = *addr;
+
+	_leave(" = %p{%d}", server, atomic_read(&server->usage));
 	return server;
+
+enomem_server:
+	kfree(server);
+enomem:
+	_leave(" = NULL [nomem]");
+	return NULL;
 }
 
 /*
@@ -104,7 +119,7 @@ struct afs_server *afs_lookup_server(struct afs_cell *cell,
 	read_lock(&cell->servers_lock);
 
 	list_for_each_entry(server, &cell->servers, link) {
-		if (memcmp(&server->addr, addr, sizeof(*addr)) == 0)
+		if (memcmp(&server->addrs->addrs[0], addr, sizeof(*addr)) == 0)
 			goto found_server_quickly;
 	}
 	read_unlock(&cell->servers_lock);
@@ -119,7 +134,7 @@ struct afs_server *afs_lookup_server(struct afs_cell *cell,
 
 	/* check the cell's server list again */
 	list_for_each_entry(server, &cell->servers, link) {
-		if (memcmp(&server->addr, addr, sizeof(*addr)) == 0)
+		if (memcmp(&server->addrs->addrs[0], addr, sizeof(*addr)) == 0)
 			goto found_server;
 	}
 
@@ -187,7 +202,7 @@ struct afs_server *afs_find_server(struct afs_net *net,
 
 		_debug("- consider %p", p);
 
-		diff = memcmp(srx, &server->addr, sizeof(*srx));
+		diff = memcmp(srx, &server->addrs->addrs[0], sizeof(*srx));
 		if (diff < 0) {
 			p = p->rb_left;
 		} else if (diff > 0) {
@@ -256,6 +271,7 @@ static void afs_destroy_server(struct afs_server *server)
 	ASSERTCMP(atomic_read(&server->cb_break_n), ==, 0);
 
 	afs_put_cell(server->net, server->cell);
+	afs_put_addrlist(server->addrs);
 	kfree(server);
 }
 
diff --git a/fs/afs/vlclient.c b/fs/afs/vlclient.c
index 276319aa86d8..54d02e5ea20a 100644
--- a/fs/afs/vlclient.c
+++ b/fs/afs/vlclient.c
@@ -156,8 +156,7 @@ static const struct afs_call_type afs_RXVLGetEntryById = {
 /*
  * dispatch a get volume entry by name operation
  */
-int afs_vl_get_entry_by_name(struct afs_net *net,
-			     struct sockaddr_rxrpc *addr,
+int afs_vl_get_entry_by_name(struct afs_cell *cell,
 			     struct key *key,
 			     const char *volname,
 			     struct afs_cache_vlocation *entry,
@@ -173,10 +172,13 @@ int afs_vl_get_entry_by_name(struct afs_net *net,
 	padsz = (4 - (volnamesz & 3)) & 3;
 	reqsz = 8 + volnamesz + padsz;
 
-	call = afs_alloc_flat_call(net, &afs_RXVLGetEntryByName, reqsz, 384);
+	call = afs_alloc_flat_call(cell->net, &afs_RXVLGetEntryByName,
+				   reqsz, 384);
 	if (!call)
 		return -ENOMEM;
 
+	afs_set_vl_cursor(call, cell);
+
 	call->key = key;
 	call->reply = entry;
 
@@ -189,14 +191,13 @@ int afs_vl_get_entry_by_name(struct afs_net *net,
 		memset((void *) bp + volnamesz, 0, padsz);
 
 	/* initiate the call */
-	return afs_make_call(addr, call, GFP_KERNEL, async);
+	return afs_make_call(call, GFP_KERNEL, async);
 }
 
 /*
  * dispatch a get volume entry by ID operation
  */
-int afs_vl_get_entry_by_id(struct afs_net *net,
-			   struct sockaddr_rxrpc *addr,
+int afs_vl_get_entry_by_id(struct afs_cell *cell,
 			   struct key *key,
 			   afs_volid_t volid,
 			   afs_voltype_t voltype,
@@ -208,10 +209,12 @@ int afs_vl_get_entry_by_id(struct afs_net *net,
 
 	_enter("");
 
-	call = afs_alloc_flat_call(net, &afs_RXVLGetEntryById, 12, 384);
+	call = afs_alloc_flat_call(cell->net, &afs_RXVLGetEntryById, 12, 384);
 	if (!call)
 		return -ENOMEM;
 
+	afs_set_vl_cursor(call, cell);
+
 	call->key = key;
 	call->reply = entry;
 
@@ -222,5 +225,5 @@ int afs_vl_get_entry_by_id(struct afs_net *net,
 	*bp   = htonl(voltype);
 
 	/* initiate the call */
-	return afs_make_call(addr, call, GFP_KERNEL, async);
+	return afs_make_call(call, GFP_KERNEL, async);
 }
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index ec5ab8dc9bc8..8c64a16c0aaf 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -22,137 +22,6 @@ static unsigned afs_vlocation_timeout = 10;	/* volume location timeout in second
 static unsigned afs_vlocation_update_timeout = 10 * 60;
 
 /*
- * iterate through the VL servers in a cell until one of them admits knowing
- * about the volume in question
- */
-static int afs_vlocation_access_vl_by_name(struct afs_vlocation *vl,
-					   struct key *key,
-					   struct afs_cache_vlocation *vldb)
-{
-	struct afs_cell *cell = vl->cell;
-	int count, ret;
-
-	_enter("%s,%s", cell->name, vl->vldb.name);
-
-	down_write(&vl->cell->vl_sem);
-	ret = -ENOMEDIUM;
-	for (count = cell->vl_naddrs; count > 0; count--) {
-		struct sockaddr_rxrpc *addr = &cell->vl_addrs[cell->vl_curr_svix];
-
-		_debug("CellServ[%hu]: %pIS", cell->vl_curr_svix, &addr->transport);
-
-		/* attempt to access the VL server */
-		ret = afs_vl_get_entry_by_name(cell->net, addr, key,
-					       vl->vldb.name, vldb, false);
-		switch (ret) {
-		case 0:
-			goto out;
-		case -ENOMEM:
-		case -ENONET:
-		case -ENETUNREACH:
-		case -EHOSTUNREACH:
-		case -ECONNREFUSED:
-			if (ret == -ENOMEM || ret == -ENONET)
-				goto out;
-			goto rotate;
-		case -ENOMEDIUM:
-		case -EKEYREJECTED:
-		case -EKEYEXPIRED:
-			goto out;
-		default:
-			ret = -EIO;
-			goto rotate;
-		}
-
-		/* rotate the server records upon lookup failure */
-	rotate:
-		cell->vl_curr_svix++;
-		cell->vl_curr_svix %= cell->vl_naddrs;
-	}
-
-out:
-	up_write(&vl->cell->vl_sem);
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
- * iterate through the VL servers in a cell until one of them admits knowing
- * about the volume in question
- */
-static int afs_vlocation_access_vl_by_id(struct afs_vlocation *vl,
-					 struct key *key,
-					 afs_volid_t volid,
-					 afs_voltype_t voltype,
-					 struct afs_cache_vlocation *vldb)
-{
-	struct afs_cell *cell = vl->cell;
-	int count, ret;
-
-	_enter("%s,%x,%d,", cell->name, volid, voltype);
-
-	down_write(&vl->cell->vl_sem);
-	ret = -ENOMEDIUM;
-	for (count = cell->vl_naddrs; count > 0; count--) {
-		struct sockaddr_rxrpc *addr = &cell->vl_addrs[cell->vl_curr_svix];
-
-		_debug("CellServ[%hu]: %pIS", cell->vl_curr_svix, &addr->transport);
-
-		/* attempt to access the VL server */
-		ret = afs_vl_get_entry_by_id(cell->net, addr, key, volid,
-					     voltype, vldb, false);
-		switch (ret) {
-		case 0:
-			goto out;
-		case -ENOMEM:
-		case -ENONET:
-		case -ENETUNREACH:
-		case -EHOSTUNREACH:
-		case -ECONNREFUSED:
-			if (ret == -ENOMEM || ret == -ENONET)
-				goto out;
-			goto rotate;
-		case -EBUSY:
-			vl->upd_busy_cnt++;
-			if (vl->upd_busy_cnt <= 3) {
-				if (vl->upd_busy_cnt > 1) {
-					/* second+ BUSY - sleep a little bit */
-					set_current_state(TASK_UNINTERRUPTIBLE);
-					schedule_timeout(1);
-				}
-				continue;
-			}
-			break;
-		case -ENOMEDIUM:
-			vl->upd_rej_cnt++;
-			goto rotate;
-		default:
-			ret = -EIO;
-			goto rotate;
-		}
-
-		/* rotate the server records upon lookup failure */
-	rotate:
-		cell->vl_curr_svix++;
-		cell->vl_curr_svix %= cell->vl_naddrs;
-		vl->upd_busy_cnt = 0;
-	}
-
-out:
-	if (ret < 0 && vl->upd_rej_cnt > 0) {
-		printk(KERN_NOTICE "kAFS:"
-		       " Active volume no longer valid '%s'\n",
-		       vl->vldb.name);
-		vl->valid = 0;
-		ret = -ENOMEDIUM;
-	}
-
-	up_write(&vl->cell->vl_sem);
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
  * allocate a volume location record
  */
 static struct afs_vlocation *afs_vlocation_alloc(struct afs_cell *cell,
@@ -197,6 +66,7 @@ static int afs_vlocation_update_record(struct afs_vlocation *vl,
 	       vl->vldb.vid[1],
 	       vl->vldb.vid[2]);
 
+retry:
 	if (vl->vldb.vidmask & AFS_VOL_VTM_RW) {
 		vid = vl->vldb.vid[0];
 		voltype = AFSVL_RWVOL;
@@ -215,7 +85,8 @@ static int afs_vlocation_update_record(struct afs_vlocation *vl,
 	/* contact the server to make sure the volume is still available
 	 * - TODO: need to handle disconnected operation here
 	 */
-	ret = afs_vlocation_access_vl_by_id(vl, key, vid, voltype, vldb);
+	ret = afs_vl_get_entry_by_id(vl->cell, key, vid, voltype,
+				     vldb, false);
 	switch (ret) {
 		/* net error */
 	default:
@@ -239,6 +110,18 @@ static int afs_vlocation_update_record(struct afs_vlocation *vl,
 		/* TODO: make existing record unavailable */
 		_leave(" = %d", ret);
 		return ret;
+
+	case -EBUSY:
+		vl->upd_busy_cnt++;
+		if (vl->upd_busy_cnt <= 3) {
+			if (vl->upd_busy_cnt > 1) {
+				/* second+ BUSY - sleep a little bit */
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(1);
+			}
+			goto retry;
+		}
+		return -EBUSY;
 	}
 }
 
@@ -278,7 +161,8 @@ static int afs_vlocation_fill_in_record(struct afs_vlocation *vl,
 	memset(&vldb, 0, sizeof(vldb));
 
 	/* Try to look up an unknown volume in the cell VL databases by name */
-	ret = afs_vlocation_access_vl_by_name(vl, key, &vldb);
+	ret = afs_vl_get_entry_by_name(vl->cell, key, vl->vldb.name,
+				       &vldb, false);
 	if (ret < 0) {
 		printk("kAFS: failed to locate '%s' in cell '%s'\n",
 		       vl->vldb.name, vl->cell->name);
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index 64834b20f0f6..8dcf4921340a 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -354,8 +354,7 @@ int afs_vnode_fetch_status(struct afs_vnode *vnode,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %p{%pIS}",
-		       server, &server->addr.transport);
+		_debug("USING SERVER: %pISp", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_fetch_file_status(server, key, vnode, NULL,
 					       false);
@@ -418,7 +417,7 @@ int afs_vnode_fetch_data(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pISp", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_fetch_data(server, key, vnode, desc,
 					false);
@@ -474,7 +473,7 @@ int afs_vnode_create(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pISp", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_create(server, key, vnode, name, mode, newfid,
 				    newstatus, newcb, false);
@@ -530,7 +529,7 @@ int afs_vnode_remove(struct afs_vnode *vnode, struct key *key, const char *name,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_remove(server, key, vnode, name, isdir,
 				    false);
@@ -592,7 +591,7 @@ int afs_vnode_link(struct afs_vnode *dvnode, struct afs_vnode *vnode,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_link(server, key, dvnode, vnode, name,
 				  false);
@@ -656,7 +655,7 @@ int afs_vnode_symlink(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_symlink(server, key, vnode, name, content,
 				     newfid, newstatus, false);
@@ -726,7 +725,7 @@ int afs_vnode_rename(struct afs_vnode *orig_dvnode,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_rename(server, key, orig_dvnode, orig_name,
 				    new_dvnode, new_name, false);
@@ -792,7 +791,7 @@ int afs_vnode_store_data(struct afs_writeback *wb, pgoff_t first, pgoff_t last,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_store_data(server, wb, first, last, offset, to,
 					false);
@@ -845,7 +844,7 @@ int afs_vnode_setattr(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_setattr(server, key, vnode, attr, false);
 
@@ -892,7 +891,7 @@ int afs_vnode_get_volume_status(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_get_volume_status(server, key, vnode, vs, false);
 
@@ -931,7 +930,7 @@ int afs_vnode_set_lock(struct afs_vnode *vnode, struct key *key,
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_set_lock(server, key, vnode, type, false);
 
@@ -969,7 +968,7 @@ int afs_vnode_extend_lock(struct afs_vnode *vnode, struct key *key)
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_extend_lock(server, key, vnode, false);
 
@@ -1007,7 +1006,7 @@ int afs_vnode_release_lock(struct afs_vnode *vnode, struct key *key)
 		if (IS_ERR(server))
 			goto no_server;
 
-		_debug("USING SERVER: %pIS\n", &server->addr.transport);
+		_debug("USING SERVER: %pIS\n", &server->addrs->addrs[0].transport);
 
 		ret = afs_fs_release_lock(server, key, vnode, false);
 
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index fbbb470ac027..c0d4e9725d5e 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -249,7 +249,7 @@ struct afs_server *afs_volume_pick_fileserver(struct afs_vnode *vnode)
 			afs_get_server(server);
 			up_read(&volume->server_sem);
 			_leave(" = %p (picked %pIS)",
-			       server, &server->addr.transport);
+			       server, &server->addrs->addrs[0].transport);
 			return server;
 
 		case -ENETUNREACH:
@@ -304,7 +304,8 @@ int afs_volume_release_fileserver(struct afs_vnode *vnode,
 	unsigned loop;
 
 	_enter("%s,%pIS,%d",
-	       volume->vlocation->vldb.name, &server->addr.transport, result);
+	       volume->vlocation->vldb.name, &server->addrs->addrs[0].transport,
+	       result);
 
 	switch (result) {
 		/* success */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 00/11] AFS: Namespacing part 1
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (9 preceding siblings ...)
  2017-09-01 15:42 ` [RFC PATCH 11/11] afs: Retry rxrpc calls with address rotation on network error David Howells
@ 2017-09-01 15:52 ` David Howells
  2017-09-05 13:29 ` [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility Tejun Heo
  2017-09-05 14:50 ` David Howells
  12 siblings, 0 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 15:52 UTC (permalink / raw)
  To: linux-afs; +Cc: dhowells, linux-fsdevel, linux-kernel

Here are some changes to the AFS filesystem that form the first part of
network-namespacing and IPv6-enabling the AFS filesystem.  AF_RXRPC is
already namespaced.

This is built on AF_RXRPC changes tagged with rxrpc-next-20170829 (which is
also in net-next).

The AFS changes are:

 (1) Create a dummy AFS network namespace and shift a bunch of global
     things into it and start using it.

 (2) Add some more AFS RPC protocol definitions.

 (3) Update the cache infrastructure to remove some stuff that is redundant
     or not actually useful and increment the version.

 (4) Keep track of internal addresses in terms of sockaddr_rxrpc structs
     rather than in_addr structs.  This will enable the use of IPv6.

 (5) Allow IPv6 addresses for VL servers to be specified.  Note that this
     doesn't help with finding FS servers as that requires a protocol
     change.  Such a protocol extension is available in the AuriStor
     AFS-compatible server, though I haven't implemented that yet.

 (6) Overhaul cell database management to manage them better, making them
     automatically kept up to date from the DNS server.

 (7) Make use of the new AF_RXRPC call-retry to implement address rotation
     for VL servers and FS servers without the need to re-encrypt client
     call data.
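
A condensed sketch of how (7) hangs together (declarations, the cursor
set-up and most of the error handling are elided; this is not the exact
code from the patches):

	call->error = rxrpc_kernel_check_call(call->net->socket, call->rxcall,
					      &compl, &call->abort_code);
	switch (compl) {
	case RXRPC_CALL_LOCAL_ERROR:
	case RXRPC_CALL_NETWORK_ERROR:
		/* The request may not have reached the server: advance the
		 * address cursor and ask AF_RXRPC to retry the call on the
		 * next address without re-encrypting the request data.
		 */
		srx = afs_get_address(&call->cursor);
		if (!IS_ERR(srx))
			ret = rxrpc_kernel_retry_call(call->net->socket,
						      call->rxcall, srx,
						      call->key);
		break;
	default:
		/* Aborts and success mean the server was reached, so there's
		 * no point rotating to another address.
		 */
		break;
	}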

To make this work, I've added some extensions to the core kernel:

 (1) Add a decrement-after-return function for workqueues that allows a
     work item to ask the workqueue manager to decrement an atomic_t and
     'wake it up' if it reaches 0.  This is analogous to
     complete_and_exit() and can be used to protect rmmod against code
     removal.

 (2) Add refcount_inc/dec_return() functions that return the new value of
     the refcount_t.  This makes maintaining a cache easier where you want
     to schedule timed garbage collection when the refcount reaches 1.  It
     also makes tracing easier as the value is obtained atomically.

 (3) Pass the wait mode to wait_on_atomic_t() and provide a default action
     function.  This allows various default actions scattered about the
     place to be deleted.

 (4) Add a function to start or reduce the timeout on a timer if it's
     already running.  This makes it easier to maintain a single timer for
     multiple events without requiring extra locking to check/modify the
     timer (the timer has its own lock after all).


The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
@ 2017-09-01 16:42   ` Peter Zijlstra
  2017-09-01 21:15   ` David Howells
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-01 16:42 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, Kees Cook, linux-kernel

On Fri, Sep 01, 2017 at 04:41:01PM +0100, David Howells wrote:
> Implement functions that increment or decrement a refcount_t object and
> return the value.  The dec-and-ret function can be used to maintain a
> counter in a cache where 1 means the object is unused, but available and
> the garbage collector can use refcount_dec_if_one() to make the object
> unavailable.  Further, both functions can be used to accurately trace the
> refcount (refcount_inc() followed by refcount_read() can't be considered
> accurate).
> 
> The interface is as follows:
> 
> 	unsigned int refcount_dec_return(refcount_t *r);
> 	unsigned int refcount_inc_return(refcount_t *r);
> 

I'm not immediately seeing how wanting 1 to mean unused leads to
requiring these two functions.

If you'll remember, I did that for inode_count and only needed
dec_unless().

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
  2017-09-01 16:42   ` Peter Zijlstra
@ 2017-09-01 21:15   ` David Howells
  2017-09-01 21:50     ` Peter Zijlstra
  2017-09-01 22:51     ` David Howells
  2017-09-04 15:36   ` Christoph Hellwig
  2017-09-04 16:08   ` David Howells
  3 siblings, 2 replies; 26+ messages in thread
From: David Howells @ 2017-09-01 21:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, linux-afs, linux-fsdevel, Kees Cook, linux-kernel

Peter Zijlstra <peterz@infradead.org> wrote:

> > 	unsigned int refcount_dec_return(refcount_t *r);
> > 	unsigned int refcount_inc_return(refcount_t *r);
> > 
> 
> I'm not immediately seeing how wanting 1 to mean unused leads to
> requiring these two functions.

Did you read the other part of the description?

	Further, both functions can be used to accurately trace the refcount
	(refcount_inc() followed by refcount_read() can't be considered
	accurate).

> If you'll remember, I did that for inode_count and only needed
> dec_unless().

I don't remember.  inode_count?  I can't find such a thing - did you mean
i_count?  I don't find anything matching "dec_unless.*i_count" either.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 21:15   ` David Howells
@ 2017-09-01 21:50     ` Peter Zijlstra
  2017-09-01 22:03       ` Peter Zijlstra
  2017-09-01 22:51     ` David Howells
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-01 21:50 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, Kees Cook, linux-kernel

On Fri, Sep 01, 2017 at 10:15:39PM +0100, David Howells wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > 	unsigned int refcount_dec_return(refcount_t *r);
> > > 	unsigned int refcount_inc_return(refcount_t *r);
> > > 
> > 
> > I'm not immediately seeing how wanting 1 to mean unused leads to
> > requiring these two functions.
> 
> Did you read the other part of the description?
> 
> 	Further, both functions can be used to accurately trace the refcount
> 	(refcount_inc() followed by refcount_read() can't be considered
> 	accurate).

I must admit to having overlooked that. But can we treat the two issues
separately? They are quite distinct.

> > If you'll remember, I did that for inode_count and only needed
> > dec_unless().
> 
> I don't remember.  inode_count?  I can't find such a thing - did you mean
> i_count?  I don't find anything matching "dec_unless.*i_count" either.

Ah, yes, i_count. See these:

https://lkml.kernel.org/r/20170224162044.479190330@infradead.org
https://lkml.kernel.org/r/20170224162044.548813302@infradead.org

But looking at them, i_count was rather special; a normal GC-based
scheme doesn't need anything new AFAICT:

add:
	spin_lock(&map->lock)
	refcount_set(&obj->refs, 1);
	map_link(map, obj);
	spin_unlock(&map->lock);

lookup:
	rcu_read_lock();
	obj = map_find(map, key);
	if (obj && !refcount_inc_not_zero(&obj->refs))
	  obj = NULL;
	rcu_read_unlock();

	if (obj) {
	  /* use obj */
	  refcount_dec(&obj->refs); /* should never hit 0 */
	}

GC:
	spin_lock(&map->lock);
	map_for_each_obj_safe(obj, map) {
	  if (refcount_dec_if_one(&obj->refs)) {
	    map_unlink(map, obj);
	    rcu_free(obj);
	  }
	}
	spin_unlock(&map->lock);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 21:50     ` Peter Zijlstra
@ 2017-09-01 22:03       ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-01 22:03 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, Kees Cook, linux-kernel

On Fri, Sep 01, 2017 at 11:50:03PM +0200, Peter Zijlstra wrote:
> > Did you read the other part of the description?
> > 
> > 	Further, both functions can be used to accurately trace the refcount
> > 	(refcount_inc() followed by refcount_read() can't be considered
> > 	accurate).
> 
> I must admit to having overlooked that. But can we treat the two issues
> separately? They are quite distinct.

So for tracing purposes inc_return/dec_return don't cover the full set.

In particular: inc_not_zero, dec_not_one and dec_and_*lock are not
covered.
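
(With a made-up tracepoint, a successful inc_not_zero, say, would still be
stuck with the same window:

	if (refcount_inc_not_zero(&obj->ref))
		trace_obj_get(obj, refcount_read(&obj->ref));	/* racy again */

unless it too grew a *_return variant.)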

dec_if_one I suppose we only care about the success case, in which case
we knew it was one by inference.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 21:15   ` David Howells
  2017-09-01 21:50     ` Peter Zijlstra
@ 2017-09-01 22:51     ` David Howells
  2017-09-04  7:30       ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: David Howells @ 2017-09-01 22:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, linux-afs, linux-fsdevel, Kees Cook, linux-kernel

Peter Zijlstra <peterz@infradead.org> wrote:

> 	if (obj) {
> 	  /* use obj */
> 	  refcount_dec(&obj->refs); /* should never hit 0 */
> 	}

You've missed a bit: We need to tell the gc to run when we reduce the refcount
to 1:

	if (obj) {
		...
		if (refcount_dec_return(&obj->refs) == 1)
			schedule_gc();
	}

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 22:51     ` David Howells
@ 2017-09-04  7:30       ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-04  7:30 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, Kees Cook, linux-kernel

On Fri, Sep 01, 2017 at 11:51:53PM +0100, David Howells wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > 	if (obj) {
> > 	  /* use obj */
> > 	  refcount_dec(&obj->refs); /* should never hit 0 */
> > 	}
> 
> You've missed a bit: We need to tell the gc to run when we reduce the refcount
> to 1:
> 
> 	if (obj) {
> 		...
> 		if (refcount_dec_return(&obj->refs) == 1)
> 			schedule_gc();
> 	}

Ah, so that isn't fundamental to having a GC. But yes if that's your
requirement, then this makes sense.

Please clarify in the Changelog.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
  2017-09-01 16:42   ` Peter Zijlstra
  2017-09-01 21:15   ` David Howells
@ 2017-09-04 15:36   ` Christoph Hellwig
  2017-09-04 16:08   ` David Howells
  3 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2017-09-04 15:36 UTC (permalink / raw)
  To: David Howells
  Cc: linux-afs, Peter Zijlstra, linux-fsdevel, Kees Cook, linux-kernel

On Fri, Sep 01, 2017 at 04:41:01PM +0100, David Howells wrote:
> Implement functions that increment or decrement a refcount_t object and
> return the value.  The dec-and-ret function can be used to maintain a
> counter in a cache where 1 means the object is unused, but available and
> the garbage collector can use refcount_dec_if_one() to make the object
> unavailable.  Further, both functions can be used to accurately trace the
> refcount (refcount_inc() followed by refcount_read() can't be considered
> accurate).

Please just use a different interface for that instead of overloading
refcount_t.  The main use case of that type is that it is so simple that
it is hard to get wrong (and has additional checking if things go
wrong).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
                     ` (2 preceding siblings ...)
  2017-09-04 15:36   ` Christoph Hellwig
@ 2017-09-04 16:08   ` David Howells
  2017-09-05  6:45     ` Christoph Hellwig
  3 siblings, 1 reply; 26+ messages in thread
From: David Howells @ 2017-09-04 16:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, linux-afs, Peter Zijlstra, linux-fsdevel, Kees Cook,
	linux-kernel

Christoph Hellwig <hch@infradead.org> wrote:

> > Implement functions that increment or decrement a refcount_t object and
> > return the value.  The dec-and-ret function can be used to maintain a
> > counter in a cache where 1 means the object is unused, but available and
> > the garbage collector can use refcount_dec_if_one() to make the object
> > unavailable.  Further, both functions can be used to accurately trace the
> > refcount (refcount_inc() followed by refcount_read() can't be considered
> > accurate).
> 
> Please just use a different interface for that instead of overloading
> refcount_t.  The main use case of that type is that it is so simple that
> it is hard to get wrong (and has additional checking if things go
> wrong).

Which bit are you objecting to?  Wanting to use a refcount_t with 1 to
represent an otherwise-unreferenced object sat in a cache?  Or wanting to
display accurate usage counts when tracing gets and puts of objects?

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions
  2017-09-04 16:08   ` David Howells
@ 2017-09-05  6:45     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2017-09-05  6:45 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, Kees Cook, Peter Zijlstra, linux-kernel,
	linux-fsdevel, linux-afs

On Mon, Sep 04, 2017 at 05:08:29PM +0100, David Howells wrote:
> Which bit are you objecting to?  Wanting to use a refcount_t with 1 to
> represent an otherwise-unreferenced object sat in a cache?  Or wanting to
> display accurate usage counts when tracing gets and puts of objects?

Primarily the first, but also any feature creep of the refcount_t in
general.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (10 preceding siblings ...)
  2017-09-01 15:52 ` [RFC PATCH 00/11] AFS: Namespacing part 1 David Howells
@ 2017-09-05 13:29 ` Tejun Heo
  2017-09-05 14:50 ` David Howells
  12 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2017-09-05 13:29 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, Lai Jiangshan, linux-kernel

Hello, David.

On Fri, Sep 01, 2017 at 04:40:53PM +0100, David Howells wrote:
> Add a facility to the workqueue subsystem whereby an atomic_t can be
> registered by a work function such that the work function dispatcher will
> decrement the atomic after the work function has returned and then call
> wake_up_atomic() on it if it reached 0.
> 
> This is analogous to complete_and_exit() for kernel threads and is used to
> avoid a race between notifying that a work item is about to finish and the
> .text segment from a module being discarded.
> 
> The way this is used is that the work function calls:
> 
> 	dec_after_work(atomic_t *counter);
> 
> to register the counter and then process_one_work() calls it, potentially
> wakes it and clears the registration.
> 
> The reason I've used an atomic_t rather than a completion is that (1) it
> takes up less space and (2) it can monitor multiple objects.

Given how work items are used, I think this is too inviting to abuses
where people build complex event chains through these counters and
those chains would be completely opaque.  If the goal is protecting
.text of a work item, can't we just do that?  Can you please describe
your use case in more detail?  Why can't it be done via the usual
"flush from exit"?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility
  2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
                   ` (11 preceding siblings ...)
  2017-09-05 13:29 ` [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility Tejun Heo
@ 2017-09-05 14:50 ` David Howells
  2017-09-06 14:51   ` Tejun Heo
  12 siblings, 1 reply; 26+ messages in thread
From: David Howells @ 2017-09-05 14:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: dhowells, linux-afs, linux-fsdevel, Lai Jiangshan, linux-kernel

Tejun Heo <tj@kernel.org> wrote:

> Given how work items are used, I think this is too inviting to abuses
> where people build complex event chains through these counters and
> those chains would be completely opaque.  If the goal is protecting
> .text of a work item, can't we just do that?  Can you please describe
> your use case in more detail?

With one of my latest patches to AFS, there's a set of cell records, where
each cell has a manager work item that maintains that cell, including
refreshing DNS records and excising expired records from the list.  Performing
the excision in the manager work item makes handling the fscache index cookie
easier (you can't have two cookies attached to the same object), amongst other
things.

There's also an overseer work item that maintains a single expiry timer for
all the cells and queues the per-cell work items to do DNS updates and cell
removal.

The reason that the overseer exists is that it makes it easier to do a put on
a cell.  The put decrements the cell refcount and then wants to schedule the
cell for destruction - but it's no longer permitted to touch the cell.  I
could use atomic_dec_and_lock(), but that's messy.  It's cleaner just to set
the timer on the overseer and leave it to that.
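
Roughly - all the names below are hypothetical, not the actual afs code - the
put side and the overseer come out looking something like:

	/* Put a cell; we may not touch it afterwards, so just pull the
	 * overseer's timer forward and let it do the excision.
	 */
	static void afs_put_cell(struct afs_cell *cell)
	{
		if (refcount_dec_return(&cell->ref) == 1)
			reduce_timer(&afs_cells_timer,
				     jiffies + afs_cell_gc_delay * HZ);
	}

	/* Overseer: requeue the per-cell manager work items that need
	 * attention (DNS refresh, expiry, excision).
	 */
	static void afs_manage_cells(struct work_struct *work)
	{
		struct afs_cell *cell;

		list_for_each_entry(cell, &afs_cells, link) {
			if (afs_cell_needs_attention(cell))
				queue_work(afs_wq, &cell->manager);
		}
	}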

However, if someone does rmmod, I have to be able to clean everything up.  The
overseer timer may be queued or running; the overseer may be queued *and*
running and may get queued again by the timer; and each cell's work item may
be queued *and* running and may get queued again by the manager.

> Why can't it be done via the usual "flush from exit"?

Well, it can, but you need a flush for each separate level of dependencies,
where one dependency will kick off another level of dependency during the
cleanup.

So what I think I would have to do is set a flag to say that no one is allowed
to set the timer now (this shouldn't happen outside of server or volume cache
clearance), delete the timer synchronously, flush the work queue four times
and then do an RCU barrier.

However, since I have volumes with dependencies on servers and cells, possibly
with their own managers, I think I may need up to 12 flushes, possibly with
interspersed RCU barriers.

It's much simpler to count out the objects than to try and get the flushing
right.
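
Counting the objects out with the proposed facility amounts to something like
this (names hypothetical; using the wait_on_atomic_t() action signature as it
stands upstream, i.e. without patch 03):

	/* The module holds one count; each queued work item holds another. */
	static atomic_t afs_outstanding = ATOMIC_INIT(1);

	static void afs_queue_cell_manager(struct afs_cell *cell)
	{
		atomic_inc(&afs_outstanding);
		if (!queue_work(afs_wq, &cell->manager))
			atomic_dec(&afs_outstanding);
	}

	/* Work function: get the dispatcher to drop our count only after
	 * we've returned, so the waiter can't run whilst our .text is still
	 * in use.
	 */
	static void afs_cell_manager(struct work_struct *work)
	{
		/* ... maintain the cell ... */
		dec_after_work(&afs_outstanding);
	}

	static int afs_outstanding_wait(atomic_t *p)
	{
		schedule();
		return 0;
	}

	/* rmmod: once new queueing has been stopped, drop the module's own
	 * count and wait for everything else to count itself out.
	 */
	static void afs_count_out(void)
	{
		atomic_dec(&afs_outstanding);
		wait_on_atomic_t(&afs_outstanding, afs_outstanding_wait,
				 TASK_UNINTERRUPTIBLE);
		rcu_barrier();
	}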

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility
  2017-09-05 14:50 ` David Howells
@ 2017-09-06 14:51   ` Tejun Heo
  0 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2017-09-06 14:51 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, Lai Jiangshan, linux-kernel

Hello, David.

On Tue, Sep 05, 2017 at 03:50:16PM +0100, David Howells wrote:
> With one of my latest patches to AFS, there's a set of cell records, where
> each cell has a manager work item that maintains that cell, including
> refreshing DNS records and excising expired records from the list.  Performing
> the excision in the manager work item makes handling the fscache index cookie
> easier (you can't have two cookies attached to the same object), amongst other
> things.
> 
> There's also an overseer work item that maintains a single expiry timer for
> all the cells and queues the per-cell work items to do DNS updates and cell
> removal.
> 
> The reason that the overseer exists is that it makes it easier to do a put on
> a cell.  The put decrements the cell refcount and then wants to schedule the
> cell for destruction - but it's no longer permitted to touch the cell.  I
> could use atomic_dec_and_lock(), but that's messy.  It's cleaner just to set
> the timer on the overseer and leave it to that.
> 
> However, if someone does rmmod, I have to be able to clean everything up.  The
> overseer timer may be queued or running; the overseer may be queued *and*
> running and may get queued again by the timer; and each cell's work item may
> be queued *and* running and may get queued again by the manager.

Thanks for the detailed explanation.

> > Why can't it be done via the usual "flush from exit"?
> 
> Well, it can, but you need a flush for each separate level of dependencies,
> where one dependency will kick off another level of dependency during the
> cleanup.
> 
> So what I think I would have to do is set a flag to say that no one is allowed
> to set the timer now (this shouldn't happen outside of server or volume cache
> clearance), delete the timer synchronously, flush the work queue four times
> and then do an RCU barrier.
> 
> However, since I have volumes with dependencies on servers and cells, possibly
> with their own managers, I think I may need up to 12 flushes, possibly with
> interspersed RCU barriers.

Would it be possible to isolate work items for the cell in its own
workqueue and use drain_workqueue()?  Separating out flush domains is
one of the main use cases for dedicated workqueues after all.
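
Roughly (names hypothetical), that would be:

	afs_wq = alloc_workqueue("kafs_cells", 0, 0);

	...

	/* on exit: shut off the external source of requeueing first, then
	 * let drain_workqueue() wait out the items that requeue themselves.
	 */
	del_timer_sync(&afs_cells_timer);
	drain_workqueue(afs_wq);
	destroy_workqueue(afs_wq);
	rcu_barrier();

drain_workqueue() keeps flushing until the queue stays empty, so it covers
work items that requeue themselves; only queueing from outside the workqueue
(the timer, here) has to be stopped beforehand.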

> It's much simpler to count out the objects than to try and get the flushing
> right.

I still feel very reluctant to add a generic counting & trigger
mechanism to work items for this.  I think it's too generic a solution
for a very specific problem.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 04/11] Add a function to start/reduce a timer
  2017-09-01 15:41 ` [RFC PATCH 04/11] Add a function to start/reduce a timer David Howells
@ 2017-10-20 12:20   ` Thomas Gleixner
  2017-11-09  0:33   ` David Howells
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2017-10-20 12:20 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, linux-fsdevel, linux-kernel

On Fri, 1 Sep 2017, David Howells wrote:

> Add a function, similar to mod_timer(), that will start a timer it isn't

s/it /if it /

> running and will modify it if it is running and has an expiry time longer
> than the new time.  If the timer is running with an expiry time that's the
> same or sooner, no change is made.
>
> The function looks like:
>
>       int reduce_timer(struct timer_list *timer, unsigned long expires);

Well, yes. But what's the purpose of this function? You explain the what,
but not the why.

> +extern int reduce_timer(struct timer_list *timer, unsigned long expires);

For new timer functions we really should use the timer_xxxx()
convention. The historic naming convention is horrible.

Aside from that, timer_reduce() is kinda ugly but I failed to come up with
something reasonable as well.

>  static inline int
> -__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
> +__mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
>  {
>  	struct timer_base *base, *new_base;
>  	unsigned int idx = UINT_MAX;
> @@ -938,8 +941,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
>  	 * same array bucket then just return:
>  	 */
>  	if (timer_pending(timer)) {
> -		if (timer->expires == expires)
> -			return 1;
> +		if (options & MOD_TIMER_REDUCE) {
> +			if (time_before_eq(timer->expires, expires))
> +				return 1;
> +		} else {
> +			if (timer->expires == expires)
> +				return 1;
> +		}

This hurts the common networking optimization case. Please keep that check
first:

		if (timer->expires == expires)
			return 1;

		if ((options & MOD_TIMER_REDUCE) &&
		    time_before(timer->expires, expires))
		    	return 1;

Also please check whether it's more efficient code-wise to have that option
thing or if an additional 'bool reduce' argument creates better code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 04/11] Add a function to start/reduce a timer
  2017-09-01 15:41 ` [RFC PATCH 04/11] Add a function to start/reduce a timer David Howells
  2017-10-20 12:20   ` Thomas Gleixner
@ 2017-11-09  0:33   ` David Howells
  1 sibling, 0 replies; 26+ messages in thread
From: David Howells @ 2017-11-09  0:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: dhowells, linux-fsdevel, linux-afs, linux-kernel

Thomas Gleixner <tglx@linutronix.de> wrote:

> > +extern int reduce_timer(struct timer_list *timer, unsigned long expires);
> 
> For new timer functions we really should use the timer_xxxx()
> convention. The historic naming convention is horrible.
> 
> Aside from that, timer_reduce() is kinda ugly but I failed to come up with
> something reasonable as well.

reduce_timer() sounds snappier, probably because the verb is first, but I can
make it timer_reduce() if you prefer.  Or maybe timer_advance() - though
that's less clear as to the direction.

> > +		if (options & MOD_TIMER_REDUCE) {
> > +			if (time_before_eq(timer->expires, expires))
> > +				return 1;
> > +		} else {
> > +			if (timer->expires == expires)
> > +				return 1;
> > +		}
> 
> This hurts the common networking optimzation case. Please keep that check
> first:

The way the code stands, it doesn't make much difference because the compiler
optimises away the MOD_TIMER_REDUCE case for __mod_timer() and optimises away
the other branch for reduce_timer().

> 		if (timer->expires == expires)
> 			return 1;
> 
> 		if ((options & MOD_TIMER_REDUCE) &&
> 		    time_before(timer->expires, expires))
> 		    	return 1;
> 
> Also please check whether it's more efficient code-wise to have that option
> thing or if an additional 'bool reduce' argument creates better code.

It's a constant passed into an inline function - gcc-7's optimiser copes with
that for x86_64 at least.  mod_timer() contains:

   0xffffffff810bb7a0 <+31>:    cmpq   $0x0,0x8(%rdi)
   0xffffffff810bb7a5 <+36>:    mov    %rsi,%r12
   0xffffffff810bb7a8 <+39>:    mov    %rdi,%rbx
   0xffffffff810bb7ab <+42>:    je     0xffffffff810bb829 <mod_timer+168>
   0xffffffff810bb7ad <+44>:    cmp    0x10(%rdi),%rsi
   0xffffffff810bb7b1 <+48>:    movl   $0x1,-0x38(%rbp)
   0xffffffff810bb7b8 <+55>:    je     0xffffffff810bba9f <mod_timer+798>

and reduce_timer() contains:

   0xffffffff810bbaed <+31>:    cmpq   $0x0,0x8(%rdi)
   0xffffffff810bbaf2 <+36>:    mov    %rsi,%r13
   0xffffffff810bbaf5 <+39>:    mov    %rdi,%rbx
   0xffffffff810bbaf8 <+42>:    je     0xffffffff810bbb9d <reduce_timer+207>
   0xffffffff810bbafe <+48>:    cmp    0x10(%rdi),%rsi
   0xffffffff810bbb02 <+52>:    mov    $0x1,%r14d
   0xffffffff810bbb08 <+58>:    jns    0xffffffff810bbe23 <reduce_timer+853>

As you can see, the relevant jump instruction is JE in one and JNS in the
other.

If I make the change you suggest with the equality check being unconditional,
mod_timer() is unchanged and reduce_timer() then contains:

   0xffffffff810bbaed <+31>:    cmpq   $0x0,0x8(%rdi)
   0xffffffff810bbaf2 <+36>:    mov    %rsi,%r13
   0xffffffff810bbaf5 <+39>:    mov    %rdi,%rbx
   0xffffffff810bbaf8 <+42>:    je     0xffffffff810bbba9 <reduce_timer+219>
   0xffffffff810bbafe <+48>:    mov    0x10(%rdi),%rax
   0xffffffff810bbb02 <+52>:    mov    $0x1,%r14d
   0xffffffff810bbb08 <+58>:    cmp    %rax,%rsi
   0xffffffff810bbb0b <+61>:    je     0xffffffff810bbe2f <reduce_timer+865>
   0xffffffff810bbb11 <+67>:    cmp    %rax,%rsi
   0xffffffff810bbb14 <+70>:    jns    0xffffffff810bbe2f <reduce_timer+865>

which smacks of a missed optimisation, since time_before_eq() covers the ==
case.  Doing:

		long diff = timer->expires - expires;
		if (diff == 0)
			return 1;
		if (options & MOD_TIMER_REDUCE &&
		    diff <= 0)
			return 1;

gets me the same code in mod_timer() and the following in reduce_timer():

   0xffffffff810bbaed <+31>:    cmpq   $0x0,0x8(%rdi)
   0xffffffff810bbaf2 <+36>:    mov    %rsi,%r13
   0xffffffff810bbaf5 <+39>:    mov    %rdi,%rbx
   0xffffffff810bbaf8 <+42>:    je     0xffffffff810bbba3 <reduce_timer+213>
   0xffffffff810bbafe <+48>:    mov    0x10(%rdi),%rax
   0xffffffff810bbb02 <+52>:    mov    $0x1,%r14d
   0xffffffff810bbb08 <+58>:    sub    %rsi,%rax
   0xffffffff810bbb0b <+61>:    test   %rax,%rax
   0xffffffff810bbb0e <+64>:    jle    0xffffffff810bbe29 <reduce_timer+859>

which is marginally better - though I think it could still be optimised
better by the compiler.

Actually, something that might increase efficiency overall is to make
add_timer() an inline and forego the check - but that's a separate matter.

Thanks,
David

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-11-09  0:33 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-01 15:40 [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility David Howells
2017-09-01 15:41 ` [RFC PATCH 02/11] refcount: Implement inc/decrement-and-return functions David Howells
2017-09-01 16:42   ` Peter Zijlstra
2017-09-01 21:15   ` David Howells
2017-09-01 21:50     ` Peter Zijlstra
2017-09-01 22:03       ` Peter Zijlstra
2017-09-01 22:51     ` David Howells
2017-09-04  7:30       ` Peter Zijlstra
2017-09-04 15:36   ` Christoph Hellwig
2017-09-04 16:08   ` David Howells
2017-09-05  6:45     ` Christoph Hellwig
2017-09-01 15:41 ` [RFC PATCH 03/11] Pass mode to wait_on_atomic_t() action funcs and provide default actions David Howells
2017-09-01 15:41 ` [RFC PATCH 04/11] Add a function to start/reduce a timer David Howells
2017-10-20 12:20   ` Thomas Gleixner
2017-11-09  0:33   ` David Howells
2017-09-01 15:41 ` [RFC PATCH 05/11] afs: Lay the groundwork for supporting network namespaces David Howells
2017-09-01 15:41 ` [RFC PATCH 06/11] afs: Add some protocol defs David Howells
2017-09-01 15:41 ` [RFC PATCH 07/11] afs: Update the cache index structure David Howells
2017-09-01 15:41 ` [RFC PATCH 08/11] afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr David Howells
2017-09-01 15:41 ` [RFC PATCH 09/11] afs: Allow IPv6 address specification of VL servers David Howells
2017-09-01 15:42 ` [RFC PATCH 10/11] afs: Overhaul cell database management David Howells
2017-09-01 15:42 ` [RFC PATCH 11/11] afs: Retry rxrpc calls with address rotation on network error David Howells
2017-09-01 15:52 ` [RFC PATCH 00/11] AFS: Namespacing part 1 David Howells
2017-09-05 13:29 ` [RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility Tejun Heo
2017-09-05 14:50 ` David Howells
2017-09-06 14:51   ` Tejun Heo
