linux-fsdevel.vger.kernel.org archive mirror
From: Boqun Feng <boqun.feng@gmail.com>
To: Waiman Long <longman@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <dchinner@redhat.com>,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: Re: [PATCH v7 1/6] lib/dlock-list: Distributed and lock-protected lists
Date: Wed, 18 Oct 2017 16:55:59 +0800
Message-ID: <20171018085559.wwtqvofjguhveozg@tardis>
In-Reply-To: <1507229008-20569-2-git-send-email-longman@redhat.com>

On Thu, Oct 05, 2017 at 06:43:23PM +0000, Waiman Long wrote:
[...]
> +/*
> + * Find the first entry of the next available list.
> + */
> +extern struct dlock_list_node *
> +__dlock_list_next_list(struct dlock_list_iter *iter);
> +
> +/**
> + * __dlock_list_next_entry - Iterate to the next entry of the dlock list
> + * @curr : Pointer to the current dlock_list_node structure
> + * @iter : Pointer to the dlock list iterator structure
> + * Return: Pointer to the next entry or NULL if all the entries are iterated
> + *
> + * The iterator has to be properly initialized before calling this function.
> + */
> +static inline struct dlock_list_node *
> +__dlock_list_next_entry(struct dlock_list_node *curr,
> +			struct dlock_list_iter *iter)
> +{
> +	/*
> +	 * Find next entry
> +	 */
> +	if (curr)
> +		curr = list_next_entry(curr, list);
> +
> +	if (!curr || (&curr->list == &iter->entry->list)) {
> +		/*
> +		 * The current list has been exhausted, try the next available
> +		 * list.
> +		 */
> +		curr = __dlock_list_next_list(iter);
> +	}
> +
> +	return curr;	/* Continue the iteration */
> +}
> +
> +/**
> + * dlock_list_first_entry - get the first element from a list
> + * @iter  : The dlock list iterator.
> + * @type  : The type of the struct this is embedded in.
> + * @member: The name of the dlock_list_node within the struct.
> + * Return : Pointer to the next entry or NULL if all the entries are iterated.
> + */
> +#define dlock_list_first_entry(iter, type, member)			\
> +	({								\
> +		struct dlock_list_node *_n;				\
> +		_n = __dlock_list_next_entry(NULL, iter);		\
> +		_n ? list_entry(_n, type, member) : NULL;		\
> +	})
> +
> +/**
> + * dlock_list_next_entry - iterate to the next entry of the list
> + * @pos   : The type * to cursor
> + * @iter  : The dlock list iterator.
> + * @member: The name of the dlock_list_node within the struct.
> + * Return : Pointer to the next entry or NULL if all the entries are iterated.
> + *
> + * Note that pos can't be NULL.
> + */
> +#define dlock_list_next_entry(pos, iter, member)			\
> +	({								\
> +		struct dlock_list_node *_n;				\
> +		_n = __dlock_list_next_entry(&(pos)->member, iter);	\
> +		_n ? list_entry(_n, typeof(*(pos)), member) : NULL;	\
> +	})
> +

[...]

> +/**
> + * dlist_for_each_entry_safe - iterate over the dlock list & safe over removal
> + * @pos   : Type * to use as a loop cursor
> + * @n	  : Another type * to use as temporary storage
> + * @iter  : The dlock list iterator
> + * @member: The name of the dlock_list_node within the struct
> + *
> + * This iteration macro is safe with respect to list entry removal.
> + * However, it cannot correctly iterate newly added entries right after the
> + * current one.
> + */
> +#define dlist_for_each_entry_safe(pos, n, iter, member)			\

So I missed something interesting here ;-)

> +	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
> +	    ({								\
> +		bool _b = (pos != NULL);				\
> +		if (_b)							\
> +			n = dlock_list_next_entry(pos, iter, member);	\

If @pos is the last item on the list for the current index/CPU,
dlock_list_next_entry() will eventually call __dlock_list_next_list(),
which drops the lock for the current list and grabs the lock for the
next list, leaving @pos unprotected. Meanwhile, another thread could
delete @pos via dlock_lists_del() and free it while the loop body is
still using it. This is a use-after-free.
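
To make the ordering concrete, here is a minimal single-threaded
user-space model of the v7 loop. It is only a sketch: model_head,
model_pos, model_iter and count_unprotected_visits() are hypothetical
names, a bool stands in for the spinlock, and array slots stand in for
list nodes. It counts how many times the loop body would run on a @pos
whose list lock has already been dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LISTS 3

/* Hypothetical stand-ins for dlock_list_head and a node position. */
struct model_head { bool locked; int nr; };   /* lock state, node count */
struct model_pos  { int list; int slot; };    /* list == -1 means NULL */

struct model_iter {
	struct model_head *head;   /* array of NR_LISTS heads */
	int index;                 /* list currently locked, -1 at start */
};

static const struct model_pos MODEL_END = { -1, -1 };

/* Mirrors __dlock_list_next_list(): unlock current, lock next non-empty. */
static struct model_pos model_next_list(struct model_iter *it)
{
	if (it->index >= 0 && it->index < NR_LISTS)
		it->head[it->index].locked = false;     /* spin_unlock() */

	while (++it->index < NR_LISTS) {
		if (it->head[it->index].nr == 0)
			continue;                       /* skip empty list */
		it->head[it->index].locked = true;      /* spin_lock() */
		return (struct model_pos){ it->index, 0 };
	}
	return MODEL_END;
}

/* Mirrors __dlock_list_next_entry(). */
static struct model_pos model_next_entry(struct model_iter *it,
					 struct model_pos cur)
{
	if (cur.list >= 0 && cur.slot + 1 < it->head[cur.list].nr)
		return (struct model_pos){ cur.list, cur.slot + 1 };
	return model_next_list(it);
}

/* Walk all lists the way the v7 macro does and count unsafe visits. */
int count_unprotected_visits(const int nr[NR_LISTS])
{
	struct model_head head[NR_LISTS];
	struct model_iter it = { head, -1 };
	struct model_pos pos, n;
	int unprotected = 0;

	for (int i = 0; i < NR_LISTS; i++)
		head[i] = (struct model_head){ false, nr[i] };

	for (pos = model_next_entry(&it, MODEL_END); pos.list >= 0; pos = n) {
		/* As in v7, @n is fetched before the body touches @pos. */
		n = model_next_entry(&it, pos);

		/* Loop body: is the lock covering @pos still held? */
		if (!head[pos.list].locked)
			unprotected++;
	}

	for (int i = 0; i < NR_LISTS; i++)
		assert(!head[i].locked);                /* no lock leaked */
	return unprotected;
}
```

In this model every non-empty list contributes one unprotected visit,
namely its last node, because the lock is dropped while fetching @n.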

I think we can have something like:

	(by adding a ->prev_entry in dlock_list_iter and several helper
	functions.)

	bool dlist_is_last_perlist(struct dlock_list_node *n)
	{
		return list_is_last(&n->list, &n->head->list);
	}

	/* Drop the lock still held on the previous list, if any. */
	void dlock_list_release_prev(struct dlock_list_iter *iter)
	{
		if (!iter->prev_entry)
			return;
		spin_unlock(&iter->prev_entry->lock);
		iter->prev_entry = NULL;
	}

	#define dlist_for_each_entry_safe(pos, n, iter, member)			\
		for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);	\
		    ({									\
			bool _b = (pos != NULL);					\
			if (_b) {							\
				if (dlist_is_last_perlist(&(pos)->member)) {		\
					/* keep pos's list locked for the body */	\
					(iter)->prev_entry = (iter)->entry;		\
					(iter)->entry = NULL;				\
					n = dlock_list_first_entry(iter,		\
							typeof(*(pos)), member);	\
				} else {						\
					n = dlock_list_next_entry(pos, iter, member);	\
				}							\
			}								\
			_b;								\
		    });									\
		    pos = n, dlock_list_release_prev(iter))

Of course, the dlock_list_first_entry() here may need a better name ;-)
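
For illustration only, the deferred-unlock idea can be checked with a
small single-threaded user-space model. All names here (model_head,
model_pos, model_iter, count_unprotected_visits_deferred()) are
hypothetical, a bool stands in for the spinlock, and array slots stand
in for list nodes. It is a sketch of the ordering, not the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LISTS 3

/* Hypothetical stand-ins for dlock_list_head and a node position. */
struct model_head { bool locked; int nr; };   /* lock state, node count */
struct model_pos  { int list; int slot; };    /* list == -1 means NULL */

struct model_iter {
	struct model_head *head;   /* array of NR_LISTS heads */
	int index;                 /* list being walked, -1 at start */
	int prev;                  /* list whose lock is kept for @pos, or -1 */
};

/* Lock the next non-empty list WITHOUT unlocking the current one. */
static struct model_pos model_first_of_next_list(struct model_iter *it)
{
	while (++it->index < NR_LISTS) {
		if (it->head[it->index].nr == 0)
			continue;                       /* skip empty list */
		it->head[it->index].locked = true;      /* spin_lock() */
		return (struct model_pos){ it->index, 0 };
	}
	return (struct model_pos){ -1, -1 };
}

/* Models the release helper: drop the deferred lock, if there is one. */
static void model_release_prev(struct model_iter *it)
{
	if (it->prev < 0)
		return;
	it->head[it->prev].locked = false;              /* spin_unlock() */
	it->prev = -1;
}

/* Walk all lists with deferred unlock and count unsafe visits. */
int count_unprotected_visits_deferred(const int nr[NR_LISTS])
{
	struct model_head head[NR_LISTS];
	struct model_iter it = { head, -1, -1 };
	struct model_pos pos, n;
	int unprotected = 0;

	for (int i = 0; i < NR_LISTS; i++)
		head[i] = (struct model_head){ false, nr[i] };

	for (pos = model_first_of_next_list(&it); pos.list >= 0;
	     pos = n, model_release_prev(&it)) {
		if (pos.slot + 1 < head[pos.list].nr) {
			n = (struct model_pos){ pos.list, pos.slot + 1 };
		} else {
			/* Last node: remember its list, keep the lock held. */
			it.prev = pos.list;
			n = model_first_of_next_list(&it);
		}

		/* Loop body: is the lock covering @pos still held? */
		if (!head[pos.list].locked)
			unprotected++;
	}

	for (int i = 0; i < NR_LISTS; i++)
		assert(!head[i].locked);                /* no lock leaked */
	return unprotected;
}
```

In this model the body never sees an unlocked @pos, at the cost of
briefly holding two per-list locks while moving from one list to the
next.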

Thoughts?

Regards,
Boqun

> +		_b;							\
> +	    });								\
> +	    pos = n)
> +
> +#endif /* __LINUX_DLOCK_LIST_H */

[...]

> +/**
> + * __dlock_list_next_list: Find the first entry of the next available list
> + * @dlist: Pointer to the dlock_list_heads structure
> + * @iter : Pointer to the dlock list iterator structure
> + * Return: true if the entry is found, false if all the lists exhausted
> + *
> + * The information about the next available list will be put into the iterator.
> + */
> +struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
> +{
> +	struct dlock_list_node *next;
> +	struct dlock_list_head *head;
> +
> +restart:
> +	if (iter->entry) {
> +		spin_unlock(&iter->entry->lock);
> +		iter->entry = NULL;
> +	}
> +
> +next_list:
> +	/*
> +	 * Try next list
> +	 */
> +	if (++iter->index >= nr_cpu_ids)
> +		return NULL;	/* All the entries iterated */
> +
> +	if (list_empty(&iter->head[iter->index].list))
> +		goto next_list;
> +
> +	head = iter->entry = &iter->head[iter->index];
> +	spin_lock(&head->lock);
> +	/*
> +	 * There is a slight chance that the list may become empty just
> +	 * before the lock is acquired. So an additional check is
> +	 * needed to make sure that a valid node will be returned.
> +	 */
> +	if (list_empty(&head->list))
> +		goto restart;
> +
> +	next = list_entry(head->list.next, struct dlock_list_node,
> +			  list);
> +	WARN_ON_ONCE(next->head != head);
> +
> +	return next;
> +}
> -- 
> 1.8.3.1
> 
