From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753808AbcBXCA5 (ORCPT ); Tue, 23 Feb 2016 21:00:57 -0500
Received: from mail-pf0-f178.google.com ([209.85.192.178]:34384 "EHLO
	mail-pf0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752080AbcBXCAy (ORCPT ); Tue, 23 Feb 2016 21:00:54 -0500
Date: Wed, 24 Feb 2016 10:00:19 +0800
From: Boqun Feng
To: Waiman Long
Cc: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields", Tejun Heo,
	Christoph Lameter, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Andi Kleen,
	Dave Chinner, Scott J Norton, Douglas Hatch
Subject: Re: [PATCH v3 1/3] lib/percpu-list: Per-cpu list with associated per-cpu locks
Message-ID: <20160224020009.GA10956@fixme-laptop.cn.ibm.com>
References: <1456254272-42313-1-git-send-email-Waiman.Long@hpe.com>
	<1456254272-42313-2-git-send-email-Waiman.Long@hpe.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="hQiwHBbRI9kgIhsi"
Content-Disposition: inline
In-Reply-To: <1456254272-42313-2-git-send-email-Waiman.Long@hpe.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--hQiwHBbRI9kgIhsi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Waiman,

On Tue, Feb 23, 2016 at 02:04:30PM -0500, Waiman Long wrote:
> Linked list is used everywhere in the Linux kernel. However, if many
> threads are trying to add or delete entries into the same linked list,
> it can create a performance bottleneck.
>
> This patch introduces a new per-cpu list subystem with associated
> per-cpu locks for protecting each of the lists individually.
> This allows list entries insertion and deletion operations to happen in
> parallel instead of being serialized with a global list and lock.
>
> List entry insertion is strictly per cpu. List deletion, however, can
> happen in a cpu other than the one that did the insertion. So we still
> need lock to protect the list. Because of that, there may still be
> a small amount of contention when deletion is being done.
>
> A new header file include/linux/percpu-list.h will be added with the
> associated pcpu_list_head and pcpu_list_node structures. The following
> functions are provided to manage the per-cpu list:
>
>  1. int init_pcpu_list_head(struct pcpu_list_head **ppcpu_head)
>  2. void pcpu_list_add(struct pcpu_list_node *node,
> 		       struct pcpu_list_head *head)
>  3. void pcpu_list_del(struct pcpu_list *node)
>
> Iteration of all the list entries within a group of per-cpu
> lists is done by calling either the pcpu_list_iterate() or
> pcpu_list_iterate_safe() functions in a while loop. They correspond
> to the list_for_each_entry() and list_for_each_entry_safe() macros
> respectively. The iteration states are keep in a pcpu_list_state
> structure that is passed to the iteration functions.
>
> Signed-off-by: Waiman Long
> ---
>  include/linux/percpu-list.h |  235 +++++++++++++++++++++++++++++++++++++++++++
>  lib/Makefile                |    2 +-
>  lib/percpu-list.c           |   85 ++++++++++++++++
>  3 files changed, 321 insertions(+), 1 deletions(-)
>  create mode 100644 include/linux/percpu-list.h
>  create mode 100644 lib/percpu-list.c
>
> diff --git a/include/linux/percpu-list.h b/include/linux/percpu-list.h
> new file mode 100644
> index 0000000..8759fec
> --- /dev/null
> +++ b/include/linux/percpu-list.h
> @@ -0,0 +1,235 @@
> +/*
> + * Per-cpu list
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
> + *
> + * Authors: Waiman Long
> + */
> +#ifndef __LINUX_PERCPU_LIST_H
> +#define __LINUX_PERCPU_LIST_H
> +
> +#include
> +#include
> +#include
> +
> +/*
> + * include/linux/percpu-list.h
> + *
> + * A per-cpu list protected by a per-cpu spinlock.
> + *
> + * The pcpu_list_head structure contains the spinlock, the other
> + * pcpu_list_node structures only contains a pointer to the spinlock in
> + * pcpu_list_head.
> + */
> +struct pcpu_list_head {
> +	struct list_head list;
> +	spinlock_t lock;
> +};
> +
> +struct pcpu_list_node {
> +	struct list_head list;
> +	spinlock_t *lockptr;
> +};
> +
> +/*
> + * Per-cpu list iteration state
> + */
> +struct pcpu_list_state {
> +	int cpu;
> +	spinlock_t *lock;
> +	struct list_head *head;	/* List head of current per-cpu list */
> +	struct pcpu_list_node *curr;
> +	struct pcpu_list_node *next;
> +};
> +
> +#define PCPU_LIST_HEAD_INIT(name)				\
> +	{							\
> +		.list.prev = &name.list,			\
> +		.list.next = &name.list,			\
> +		.list.lock = __SPIN_LOCK_UNLOCKED(name),	\
> +	}
> +
> +#define PCPU_LIST_NODE_INIT(name)		\
> +	{					\
> +		.list.prev = &name.list,	\
> +		.list.next = &name.list,	\
> +		.list.lockptr = NULL		\
> +	}
> +
> +#define PCPU_LIST_STATE_INIT()	\
> +	{			\
> +		.cpu  = -1,	\
> +		.lock = NULL,	\
> +		.head = NULL,	\
> +		.curr = NULL,	\
> +		.next = NULL,	\
> +	}
> +
> +#define DEFINE_PCPU_LIST_STATE(s)		\
> +	struct pcpu_list_state s = PCPU_LIST_STATE_INIT()
> +
> +#define pcpu_list_next_entry(pos, member) list_next_entry(pos, member.list)
> +
> +static inline void init_pcpu_list_node(struct pcpu_list_node *node)
> +{
> +	INIT_LIST_HEAD(&node->list);
> +	node->lockptr = NULL;
> +}
> +
> +static inline void free_pcpu_list_head(struct pcpu_list_head **ppcpu_head)
> +{
> +	free_percpu(*ppcpu_head);
> +	*ppcpu_head = NULL;
> +}
> +
> +static inline void init_pcpu_list_state(struct pcpu_list_state *state)
> +{
> +	state->cpu  = -1;
> +	state->lock = NULL;
> +	state->head = NULL;
> +	state->curr = NULL;
> +	state->next = NULL;
> +}
> +
> +#if NR_CPUS == 1
> +/*
> + * For uniprocessor, the list head and lock in struct pcpu_list_head are
> + * used directly.
> + */
> +static inline bool pcpu_list_empty(struct pcpu_list_head *pcpu_head)
> +{
> +	return list_empty(&pcpu_head->list);
> +}
> +
> +static __always_inline bool
> +__pcpu_list_next_cpu(struct pcpu_list_head *head, struct pcpu_list_state *state)
> +{
> +	if (state->lock)
> +		spin_unlock(state->lock);
> +
> +	if (state->cpu++ >= 0)
> +		return false;
> +
> +	state->curr = list_entry(head->list.next, struct pcpu_list_node, list);
> +	if (list_empty(&state->curr->list))
> +		return false;
> +	state->lock = &head->lock;
> +	spin_lock(state->lock);
> +	return true;
> +
> +}
> +#else /* NR_CPUS == 1 */
> +/*
> + * Multiprocessor
> + */
> +static inline bool pcpu_list_empty(struct pcpu_list_head *pcpu_head)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		if (!list_empty(&per_cpu_ptr(pcpu_head, cpu)->list))
> +			return false;
> +	return true;
> +}
> +
> +/*
> + * Helper function to find the first entry of the next per-cpu list
> + * It works somewhat like for_each_possible_cpu(cpu).
> + *
> + * Return: true if the entry is found, false if all the lists exhausted
> + */
> +static __always_inline bool
> +__pcpu_list_next_cpu(struct pcpu_list_head *head, struct pcpu_list_state *state)
> +{
> +	if (state->lock)
> +		spin_unlock(state->lock);
> +next_cpu:
> +	/*
> +	 * for_each_possible_cpu(cpu)
> +	 */
> +	state->cpu = cpumask_next(state->cpu, cpu_possible_mask);
> +	if (state->cpu >= nr_cpu_ids)
> +		return false;	/* All the per-cpu lists iterated */
> +
> +	state->head = &per_cpu_ptr(head, state->cpu)->list;
> +	state->lock = &per_cpu_ptr(head, state->cpu)->lock;
> +	state->curr = list_entry(state->head->next,
> +				 struct pcpu_list_node, list);
> +	if (&state->curr->list == state->head)
> +		goto next_cpu;
> +
> +	spin_lock(state->lock);
> +	return true;
> +}
> +#endif /* NR_CPUS == 1 */
> +
> +/*
> + * Iterate to the next entry of the group of per-cpu lists
> + *
> + * Return: true if the next entry is found, false if all the entries iterated
> + */
> +static inline bool pcpu_list_iterate(struct pcpu_list_head *head,
> +				     struct pcpu_list_state *state)
> +{
> +	/*
> +	 * Find next entry
> +	 */
> +	if (state->curr)
> +		state->curr = list_next_entry(state->curr, list);
> +
> +	if (!state->curr || (&state->curr->list == state->head)) {
> +		/*
> +		 * The current per-cpu list has been exhausted, try the next
> +		 * per-cpu list.
> +		 */
> +		if (!__pcpu_list_next_cpu(head, state))
> +			return false;
> +	}
> +	return true;	/* Continue the iteration */
> +}
> +
> +/*
> + * Iterate to the next entry of the group of per-cpu lists and safe
> + * against removal of list_entry
> + *
> + * Return: true if the next entry is found, false if all the entries iterated
> + */
> +static inline bool pcpu_list_iterate_safe(struct pcpu_list_head *head,
> +					  struct pcpu_list_state *state)
> +{
> +	/*
> +	 * Find next entry
> +	 */
> +	if (state->curr) {
> +		state->curr = state->next;
> +		state->next = list_next_entry(state->next, list);
> +	}
> +
> +	if (!state->curr || (&state->curr->list == state->head)) {
> +		/*
> +		 * The current per-cpu list has been exhausted, try the next
> +		 * per-cpu list.
> +		 */
> +		if (!__pcpu_list_next_cpu(head, state))
> +			return false;
> +		state->next = list_next_entry(state->curr, list);
> +	}
> +	return true;	/* Continue the iteration */
> +}
> +
> +extern int init_pcpu_list_head(struct pcpu_list_head **ppcpu_head);
> +extern void pcpu_list_add(struct pcpu_list_node *node,
> +			  struct pcpu_list_head *head);
> +extern void pcpu_list_del(struct pcpu_list_node *node);
> +
> +#endif /* __LINUX_PERCPU_LIST_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index a7c26a4..71a25d4 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -27,7 +27,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
>  	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
>  	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
>  	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
> -	 once.o
> +	 once.o percpu-list.o
>  obj-y += string_helpers.o
>  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
>  obj-y += hexdump.o
> diff --git a/lib/percpu-list.c b/lib/percpu-list.c
> new file mode 100644
> index 0000000..45bbb2a
> --- /dev/null
> +++ b/lib/percpu-list.c
> @@ -0,0 +1,85 @@
> +/*
> + * Per-cpu list
> + *
> + * This program is free
> + * software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
> + *
> + * Authors: Waiman Long
> + */
> +#include
> +
> +/*
> + * Initialize the per-cpu list
> + */
> +int init_pcpu_list_head(struct pcpu_list_head **ppcpu_head)
> +{
> +	struct pcpu_list_head *pcpu_head = alloc_percpu(struct pcpu_list_head);
> +	int cpu;
> +
> +	if (!pcpu_head)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct pcpu_list_head *head = per_cpu_ptr(pcpu_head, cpu);
> +
> +		INIT_LIST_HEAD(&head->list);
> +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> +	}
> +
> +	*ppcpu_head = pcpu_head;
> +	return 0;
> +}
> +
> +/*
> + * List selection is based on the CPU being used when the pcpu_list_add()
> + * function is called. However, deletion may be done by a different CPU.
> + * So we still need to use a lock to protect the content of the list.
> + */
> +void pcpu_list_add(struct pcpu_list_node *node, struct pcpu_list_head *head)
> +{
> +	spinlock_t *lock;
> +
> +	/*
> +	 * There is a very slight chance the cpu will be changed
> +	 * (by preemption) before calling spin_lock(). We only need to put
> +	 * the node in one of the per-cpu lists. It may not need to be
> +	 * that of the current cpu.
> +	 */

Just curious about the comment here, what if the following happens:

	CPU 0					CPU 1
	=====================			=====================
	task_1:
	  lock = this_cpu_ptr(&head->lock);
	  // head->lock is on CPU0
						continue to task_1:
						  spin_lock(lock);
						  node->lockptr = lock;
						  // head->list is on CPU1
						  list_add(&node->list, this_cpu_ptr(&head->list));
						  spin_unlock(lock);

That is, if task_1 is preempted and migrated between the this_cpu_ptr()
lookup of the lock and the spin_lock() call, the node ends up in the list
on CPU1 while its ->lockptr points to the lock on CPU0.

If there is another node whose ->lockptr points to the lock on CPU1 and
which is also in the list on CPU1, what will happen if these two nodes
get deleted simultaneously?

Regards,
Boqun

> +	lock = this_cpu_ptr(&head->lock);
> +	spin_lock(lock);
> +	node->lockptr = lock;
> +	list_add(&node->list, this_cpu_ptr(&head->list));
> +	spin_unlock(lock);
> +}
> +
> +/*
> + * Delete a node from a percpu list
> + *
> + * We need to check the lock pointer again after taking the lock to guard
> + * against concurrent delete of the same node. If the lock pointer changes
> + * (becomes NULL or to a different one), we assume that the deletion was done
> + * elsewhere.
> + */
> +void pcpu_list_del(struct pcpu_list_node *node)
> +{
> +	spinlock_t *lock = READ_ONCE(node->lockptr);
> +
> +	if (unlikely(!lock))
> +		return;
> +
> +	spin_lock(lock);
> +	if (likely(lock == node->lockptr)) {
> +		list_del_init(&node->list);
> +		node->lockptr = NULL;
> +	}
> +	spin_unlock(lock);
> +}
> -- 
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--hQiwHBbRI9kgIhsi
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAABCAAGBQJWzQ6vAAoJEEl56MO1B/q4LRcH/1O7GqI6tGRszrBwE1mlBuyT
hnh6D5l5GadOFGI+oXpew6SaPFF+HDlwBZgeq97H5VfIpFLjM6gCwQL31YsLkRdy
KHbGU5QlxbO+OzUOh5lAbmrAjqKCdQYqXiEmINDQah2/S+F15VZWZyvH4dmB2tal
1kM/YioVPLxfh3LUf3pQbhma5GB+GQiLTx5Vm36eLmfwE+STbyZbSY4aw8RFOAMq
QtVmhOPCvOArJVFqTsJmGckCofgK6MWOYZ/wWt5SCxVwxqD53zgfGk6WYiAKuxul
qQ33Sx4/37FUW+EMxoIg/CCr5LWoWSQo/1nMwlLlRXcEdcGUbMT+6wcJB6IhYGQ=
=/ec3
-----END PGP SIGNATURE-----

--hQiwHBbRI9kgIhsi--
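
Editorial illustration, not part of the original message: the scenario
raised in the reply can be reproduced in a small user-space model. This is
only a sketch under loud assumptions. Every model_* name, add_two_lookups(),
add_one_lookup(), lock_matches_list() and run_scenario() is hypothetical, the
int "lock" merely stands in for spinlock_t, and the migration is forced
deterministically rather than caused by real preemption. add_two_lookups()
mirrors the shape of pcpu_list_add() in the quoted patch; add_one_lookup()
shows one possible way to keep the lock and the list paired, which is not
necessarily what the patch author would choose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2		/* hypothetical: two simulated CPUs */

struct list_head { struct list_head *next, *prev; };

struct pcpu_list_head {
	struct list_head list;
	int lock;		/* stand-in for spinlock_t, not a real lock */
};

struct pcpu_list_node {
	struct list_head list;
	int *lockptr;
};

static struct pcpu_list_head model_heads[MODEL_NR_CPUS];
static int model_cpu;		/* CPU the simulated task currently runs on */

static void model_list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Stand-in for this_cpu_ptr(head): depends on where the task runs now. */
static struct pcpu_list_head *model_this_cpu_head(void)
{
	return &model_heads[model_cpu];
}

/* Take the lock, but first force a "migration" to the other CPU. */
static void model_lock_after_migration(int *lock)
{
	model_cpu = (model_cpu + 1) % MODEL_NR_CPUS;
	*lock = 1;
}

static void model_unlock(int *lock)
{
	*lock = 0;
}

/* Shape of pcpu_list_add() in the patch: two separate per-cpu lookups. */
static void add_two_lookups(struct pcpu_list_node *node)
{
	int *lock = &model_this_cpu_head()->lock;

	model_lock_after_migration(lock);
	node->lockptr = lock;
	model_list_add(&node->list, &model_this_cpu_head()->list);
	model_unlock(lock);
}

/* Possible alternative: resolve the per-cpu head once and reuse it. */
static void add_one_lookup(struct pcpu_list_node *node)
{
	struct pcpu_list_head *h = model_this_cpu_head();

	model_lock_after_migration(&h->lock);
	node->lockptr = &h->lock;
	model_list_add(&node->list, &h->list);
	model_unlock(&h->lock);
}

/* True if the node sits on the list that node->lockptr protects. */
static bool lock_matches_list(const struct pcpu_list_node *node)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct list_head *head = &model_heads[cpu].list;

		if (node->lockptr != &model_heads[cpu].lock)
			continue;
		for (const struct list_head *p = head->next; p != head; p = p->next)
			if (p == &node->list)
				return true;
	}
	return false;
}

/* Reset the model, add one node across a forced migration, check pairing. */
static bool run_scenario(bool one_lookup)
{
	struct pcpu_list_node node = { .lockptr = NULL };

	model_cpu = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_heads[cpu].list.next = &model_heads[cpu].list;
		model_heads[cpu].list.prev = &model_heads[cpu].list;
		model_heads[cpu].lock = 0;
	}
	if (one_lookup)
		add_one_lookup(&node);
	else
		add_two_lookups(&node);
	assert(node.lockptr != NULL);
	return lock_matches_list(&node);
}
```

In this model run_scenario(false) reports a mismatch: the node is linked
into the CPU 1 list while its lockptr refers to the CPU 0 lock, so a later
deletion would modify the CPU 1 list without holding the lock that protects
it. run_scenario(true) keeps the two paired because the head is looked up
only once, at the cost of the node possibly landing on the list of the CPU
the task has just left, which the patch comment already treats as acceptable.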