linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Waiman Long <Waiman.Long@hpe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <dchinner@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists
Date: Wed, 13 Jul 2016 12:08:23 -0400	[thread overview]
Message-ID: <20160713160823.GD4065@mtj.duckdns.org> (raw)
In-Reply-To: <1468258332-61537-2-git-send-email-Waiman.Long@hpe.com>

Hello,

On Mon, Jul 11, 2016 at 01:32:06PM -0400, Waiman Long wrote:
...
> A new header file include/linux/dlock-list.h will be added with the

Heh, I think perpcu_list was the better name but suppose I'm too late.

> associated dlock_list_head and dlock_list_node structures. The following
> functions are provided to manage the per-cpu list:
> 
>  1. int init_dlock_list_head(struct dlock_list_head **pdlock_head)
>  2. void dlock_list_add(struct dlock_list_node *node,
> 		        struct dlock_list_head *head)
>  3. void dlock_list_del(struct dlock_list *node)
> 
> Iteration of all the list entries within a group of per-cpu
> lists is done by calling either the dlock_list_iterate() or
> dlock_list_iterate_safe() functions in a while loop. They correspond
> to the list_for_each_entry() and list_for_each_entry_safe() macros
> respectively. The iteration states are keep in a dlock_list_state
> structure that is passed to the iteration functions.

Why do we need two variants of this?  We need a state variable to walk
the list anyway.  Why not make dlock_list_iterate() safe against
removal and get rid of the second variant?  Also, dlock_list_next()
probably is a better name.

> +/*
> + * include/linux/dlock-list.h
> + *
> + * A distributed (per-cpu) set of lists each of which is protected by its
> + * own spinlock, but acts like a single consolidated list to the callers.
> + *
> + * The dlock_list_head structure contains the spinlock, the other
> + * dlock_list_node structures only contains a pointer to the spinlock in
> + * dlock_list_head.
> + */
> +struct dlock_list_head {
> +	struct list_head list;
> +	spinlock_t lock;
> +};
> +
> +#define DLOCK_LIST_HEAD_INIT(name)				\
> +	{							\
> +		.list.prev = &name.list,			\
> +		.list.next = &name.list,			\
> +		.list.lock = __SPIN_LOCK_UNLOCKED(name),	\
> +	}

This is confusing.  It looks like dlock_list_head and
DLOCK_LIST_HEAD_INIT() can be used to define and initialize static
dlock_lists but that isn't true.  It's weird to require the user to
deal with percpu declaration of the data type.  Shouldn't it be more
like the following?

struct dlock_list_head_cpu {
	struct list_head list;
	spinlock_t lock;
};

struct dlock_list_head {
	struct dlock_list_head_percpu *head_cpu;
};

> +/*
> + * Per-cpu list iteration state
> + */
> +struct dlock_list_state {
> +	int			 cpu;
> +	spinlock_t		*lock;
> +	struct list_head	*head;	/* List head of current per-cpu list */
> +	struct dlock_list_node	*curr;
> +	struct dlock_list_node	*next;
> +};

Maybe dlock_list_iter[ator] is a better name?

> +#define DLOCK_LIST_STATE_INIT()			\
> +	{					\
> +		.cpu  = -1,			\
> +		.lock = NULL,			\
> +		.head = NULL,			\
> +		.curr = NULL,			\
> +		.next = NULL,			\
> +	}

The NULL inits are unnecessary and prone to being left behind.

> +#define DEFINE_DLOCK_LIST_STATE(s)		\
> +	struct dlock_list_state s = DLOCK_LIST_STATE_INIT()
> +
> +static inline void init_dlock_list_state(struct dlock_list_state *state)
> +{
> +	state->cpu  = -1;
> +	state->lock = NULL;
> +	state->head = NULL;
> +	state->curr = NULL;
> +	state->next = NULL;
> +}

Why not "state = (struct dlock_list_state)DLOCK_LIST_STATE_INIT;"?

> +#ifdef CONFIG_DEBUG_SPINLOCK
> +#define DLOCK_LIST_WARN_ON(x)	WARN_ON(x)
> +#else
> +#define DLOCK_LIST_WARN_ON(x)
> +#endif

I'd just use WARN_ON_ONCE() without the CONFIG guard.

> +/*
> + * Next per-cpu list entry
> + */
> +#define dlock_list_next_entry(pos, member) list_next_entry(pos, member.list)

Why does this need to be exposed?

> +/*
> + * Per-cpu node data structure
> + */
> +struct dlock_list_node {
> +	struct list_head list;
> +	spinlock_t *lockptr;
> +};
> +
> +#define DLOCK_LIST_NODE_INIT(name)		\
> +	{					\
> +		.list.prev = &name.list,	\
> +		.list.next = &name.list,	\
> +		.list.lockptr = NULL		\
> +	}

Ditto with NULL init.

> +static inline void init_dlock_list_node(struct dlock_list_node *node)
> +{
> +	INIT_LIST_HEAD(&node->list);
> +	node->lockptr = NULL;
> +}

Ditto with init.

> +static inline void free_dlock_list_head(struct dlock_list_head **pdlock_head)
> +{
> +	free_percpu(*pdlock_head);
> +	*pdlock_head = NULL;
> +}

Why does this need to be inlined?

> +/*
> + * Check if all the per-cpu lists are empty
> + */

Please use proper function comments.

> +static inline bool dlock_list_empty(struct dlock_list_head *dlock_head)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		if (!list_empty(&per_cpu_ptr(dlock_head, cpu)->list))
> +			return false;
> +	return true;
> +}
> +
> +/*
> + * Helper function to find the first entry of the next per-cpu list
> + * It works somewhat like for_each_possible_cpu(cpu).
> + *
> + * Return: true if the entry is found, false if all the lists exhausted

Ditto about the comment.

> + */
> +static __always_inline bool
> +__dlock_list_next_cpu(struct dlock_list_head *head,
> +		      struct dlock_list_state *state)
> +{
...
> +static inline bool dlock_list_iterate_safe(struct dlock_list_head *head,
> +					   struct dlock_list_state *state)
> +{

Inlining these doesn't make senes to me.

> diff --git a/lib/Makefile b/lib/Makefile
> index 499fb35..92e8c38 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> +/*
> + * The dlock list lock needs its own class to avoid warning and stack
> + * trace when lockdep is enabled.
> + */

Can you please elaborate on this?

> +static struct lock_class_key dlock_list_key;
> +
> +/*
> + * Initialize the per-cpu list head
> + */
> +int init_dlock_list_head(struct dlock_list_head **pdlock_head)
> +{
> +	struct dlock_list_head *dlock_head;
> +	int cpu;
> +
> +	dlock_head = alloc_percpu(struct dlock_list_head);
> +	if (!dlock_head)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct dlock_list_head *head = per_cpu_ptr(dlock_head, cpu);
> +
> +		INIT_LIST_HEAD(&head->list);
> +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> +		lockdep_set_class(&head->lock, &dlock_list_key);
> +	}
> +
> +	*pdlock_head = dlock_head;
> +	return 0;
> +}

Why is this called init?  Why not do the following?

struct dlock_list_head *alloc_dlock_list_head(void);

Also, the pointer type needs to include __percpu annotation.

> +/*
> + * List selection is based on the CPU being used when the dlock_list_add()
> + * function is called. However, deletion may be done by a different CPU.
> + * So we still need to use a lock to protect the content of the list.
> + */
> +void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *head)
> +{
> +	struct dlock_list_head *myhead;
> +
> +	/*
> +	 * Disable preemption to make sure that CPU won't gets changed.
> +	 */
> +	myhead = get_cpu_ptr(head);
> +	spin_lock(&myhead->lock);
> +	node->lockptr = &myhead->lock;
> +	list_add(&node->list, &myhead->list);
> +	spin_unlock(&myhead->lock);
> +	put_cpu_ptr(head);
> +}

I wonder whether it'd be better to use irqsafe operations.  lists tend
to be often used from irq contexts.

> +/*
> + * Delete a node from a dlock list
> + *
> + * We need to check the lock pointer again after taking the lock to guard
> + * against concurrent delete of the same node. If the lock pointer changes
> + * (becomes NULL or to a different one), we assume that the deletion was done
> + * elsewhere.
> + */
> +void dlock_list_del(struct dlock_list_node *node)
> +{
> +	spinlock_t *lock = READ_ONCE(node->lockptr);
> +
> +	if (unlikely(!lock)) {
> +		WARN(1, "dlock_list_del: node 0x%lx has no associated lock\n",
> +			(unsigned long)node);
> +		return;
> +	}
> +
> +	spin_lock(lock);
> +	if (likely(lock == node->lockptr)) {
> +		list_del_init(&node->list);
> +		node->lockptr = NULL;
> +	} else {
> +		/*
> +		 * This path should never be executed.
> +		 */

What if it races against someone else removing and adding back?
Shouldn't it retry on those cases?

Thanks.

-- 
tejun

  reply	other threads:[~2016-07-13 16:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-11 17:32 [PATCH v2 0/7] vfs: Use dlock list for SB's s_inodes list Waiman Long
2016-07-11 17:32 ` [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists Waiman Long
2016-07-13 16:08   ` Tejun Heo [this message]
2016-07-14  2:54     ` Waiman Long
2016-07-14 11:50       ` Tejun Heo
2016-07-14 14:35         ` Jan Kara
2016-07-14 14:51           ` Tejun Heo
2016-07-14 16:16             ` Waiman Long
2016-07-14 17:13         ` Waiman Long
2016-07-14 17:41           ` Tejun Heo
2016-07-11 17:32 ` [PATCH v2 2/7] lib/dlock-list: Add __percpu modifier for parameters Waiman Long
2016-07-13 16:08   ` Tejun Heo
2016-07-14  2:54     ` Waiman Long
2016-07-11 17:32 ` [PATCH v2 3/7] fsnotify: Simplify inode iteration on umount Waiman Long
2016-07-11 17:32 ` [PATCH v2 4/7] vfs: Remove unnecessary list_for_each_entry_safe() variants Waiman Long
2016-07-11 17:32 ` [PATCH v2 5/7] vfs: Use dlock list for superblock's inode list Waiman Long
2016-07-11 17:32 ` [RFC PATCH v2 6/7] lib/persubnode: Introducing a simple per-subnode APIs Waiman Long
2016-07-12  3:14   ` Boqun Feng
2016-07-12 18:37     ` Waiman Long
2016-07-12 14:27   ` Tejun Heo
2016-07-12 18:51     ` Waiman Long
2016-07-12 18:57       ` Tejun Heo
2016-07-12 19:42         ` Waiman Long
2016-07-11 17:32 ` [RFC PATCH v2 7/7] lib/dlock-list: Use the per-subnode APIs for managing lists Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160713160823.GD4065@mtj.duckdns.org \
    --to=tj@kernel.org \
    --cc=Waiman.Long@hpe.com \
    --cc=andi@firstfloor.org \
    --cc=bfields@fieldses.org \
    --cc=boqun.feng@gmail.com \
    --cc=cl@linux-foundation.org \
    --cc=dchinner@redhat.com \
    --cc=doug.hatch@hpe.com \
    --cc=jack@suse.com \
    --cc=jlayton@poochiereds.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hpe.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).