* [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention
@ 2018-12-12 11:03 Roman Penyaev
  2018-12-12 11:03 ` [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order Roman Penyaev
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Roman Penyaev @ 2018-12-12 11:03 UTC (permalink / raw)
  Cc: Roman Penyaev, Davidlohr Bueso, Jason Baron, Al Viro,
	Paul E. McKenney, Linus Torvalds, Andrew Morton, linux-fsdevel,
	linux-kernel

The last patch targets the contention problem in ep_poll_callback(), which
can be very well reproduced by generating events (write to a pipe or eventfd)
from many threads while a consumer thread does the polling.

The following are some microbenchmark results based on the test [1], which
starts a number of threads, each generating N events.  The test ends when all
events have been successfully fetched by the poller thread:

 spinlock
 ========

 threads  events/ms  run-time ms
       8       6402        12495
      16       7045        22709
      32       7395        43268

 rwlock + xchg
 =============

 threads  events/ms  run-time ms
       8      10038         7969
      16      12178        13138
      32      13223        24199


According to the results, the bandwidth of delivered events is significantly
increased and the execution time is therefore reduced.

This series is based on linux-next/akpm and differs from the RFC in that
additional cleanup patches and explicit comments have been added.

[1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
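
For reference, the gist of the test is roughly the following (a simplified
sketch only, not the actual code from [1]; the thread count, the per-thread
event count and the omitted error handling are placeholders):

#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <pthread.h>
#include <unistd.h>
#include <stdint.h>

#define NTHREADS 8
#define NEVENTS  10000ULL	/* events generated per writer thread */

static void *writer(void *arg)
{
	int efd = (int)(intptr_t)arg;
	uint64_t one = 1;

	for (uint64_t i = 0; i < NEVENTS; i++)
		write(efd, &one, sizeof(one));
	return NULL;
}

int main(void)
{
	pthread_t thr[NTHREADS];
	struct epoll_event ev, evs[NTHREADS];
	uint64_t cnt, total = 0;
	int epfd = epoll_create1(0);

	/* one eventfd per writer thread, all polled by a single epoll fd */
	for (int i = 0; i < NTHREADS; i++) {
		int efd = eventfd(0, EFD_NONBLOCK);

		ev.events = EPOLLIN;
		ev.data.fd = efd;
		epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
		pthread_create(&thr[i], NULL, writer, (void *)(intptr_t)efd);
	}

	/* poller: the test ends when all events have been fetched */
	while (total < NTHREADS * NEVENTS) {
		int n = epoll_wait(epfd, evs, NTHREADS, -1);

		for (int i = 0; i < n; i++)
			if (read(evs[i].data.fd, &cnt, sizeof(cnt)) > 0)
				total += cnt;
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(thr[i], NULL);
	return 0;
}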

Roman Penyaev (3):
  epoll: make sure all elements in ready list are in FIFO order
  epoll: loosen irq safety in ep_poll_callback()
  epoll: use rwlock in order to reduce ep_poll_callback() contention

 fs/eventpoll.c | 127 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 93 insertions(+), 34 deletions(-)

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
-- 
2.19.1



* [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order
  2018-12-12 11:03 [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
@ 2018-12-12 11:03 ` Roman Penyaev
  2018-12-13 19:30   ` Davidlohr Bueso
  2018-12-12 11:03 ` [PATCH 2/3] epoll: loosen irq safety in ep_poll_callback() Roman Penyaev
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Roman Penyaev @ 2018-12-12 11:03 UTC (permalink / raw)
  Cc: Roman Penyaev, Davidlohr Bueso, Jason Baron, Al Viro,
	Andrew Morton, Linus Torvalds, linux-fsdevel, linux-kernel

All incoming events are stored in the ready list in FIFO order, and the same
should also apply to ->ovflist, which is originally a stack, i.e. LIFO.

Thus, to keep the correct FIFO order, ->ovflist should be reversed by adding
elements to the head of the ready list rather than to the tail.
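
A hypothetical example of the difference (assuming events A, B and C arrive
in that order while the ready list is being scanned, so all three land on
->ovflist):

  ->ovflist (a stack):                     C -> B -> A
  old code, list_add_tail() per element:   rdllist = C, B, A  (order reversed)
  new code, list_add() per element:        rdllist = A, B, C  (FIFO restored)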

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2329f96469e2..3627c2e07149 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -722,7 +722,11 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 		 * contain them, and the list_splice() below takes care of them.
 		 */
 		if (!ep_is_linked(epi)) {
-			list_add_tail(&epi->rdllink, &ep->rdllist);
+			/*
+			 * ->ovflist is LIFO, so we have to reverse it in order
+			 * to keep the FIFO order.
+			 */
+			list_add(&epi->rdllink, &ep->rdllist);
 			ep_pm_stay_awake(epi);
 		}
 	}
-- 
2.19.1



* [PATCH 2/3] epoll: loosen irq safety in ep_poll_callback()
  2018-12-12 11:03 [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
  2018-12-12 11:03 ` [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order Roman Penyaev
@ 2018-12-12 11:03 ` Roman Penyaev
  2018-12-12 11:03 ` [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
  2018-12-13 18:13 ` [PATCH 0/3] " Davidlohr Bueso
  3 siblings, 0 replies; 11+ messages in thread
From: Roman Penyaev @ 2018-12-12 11:03 UTC (permalink / raw)
  Cc: Roman Penyaev, Davidlohr Bueso, Jason Baron, Al Viro,
	Andrew Morton, Linus Torvalds, linux-fsdevel, linux-kernel

Callers of ep_poll_callback() (the whole set of wake_up_*poll() variants)
disable interrupts, so there is no need to save/restore the irq flags.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3627c2e07149..ea0025e77519 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1124,13 +1124,15 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
 static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 {
 	int pwake = 0;
-	unsigned long flags;
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
 	__poll_t pollflags = key_to_poll(key);
 	int ewake = 0;
 
-	spin_lock_irqsave(&ep->wq.lock, flags);
+	/* Interrupts are disabled by the wake_up_*poll() callers */
+	lockdep_assert_irqs_disabled();
+
+	spin_lock(&ep->wq.lock);
 
 	ep_set_busy_poll_napi_id(epi);
 
@@ -1207,7 +1209,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 		pwake++;
 
 out_unlock:
-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock(&ep->wq.lock);
 
 	/* We have to call this outside the lock */
 	if (pwake)
-- 
2.19.1



* [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
  2018-12-12 11:03 [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
  2018-12-12 11:03 ` [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order Roman Penyaev
  2018-12-12 11:03 ` [PATCH 2/3] epoll: loosen irq safety in ep_poll_callback() Roman Penyaev
@ 2018-12-12 11:03 ` Roman Penyaev
       [not found]   ` <20181212171348.GA12786@andrea>
  2018-12-13 18:13 ` [PATCH 0/3] " Davidlohr Bueso
  3 siblings, 1 reply; 11+ messages in thread
From: Roman Penyaev @ 2018-12-12 11:03 UTC (permalink / raw)
  Cc: Roman Penyaev, Davidlohr Bueso, Jason Baron, Al Viro,
	Paul E. McKenney, Linus Torvalds, Andrew Morton, linux-fsdevel,
	linux-kernel

The goal of this patch is to reduce the contention of ep_poll_callback(),
which can be called concurrently from different CPUs in the case of high
event rates and many fds per epoll.  The problem can be very well reproduced
by generating events (write to a pipe or eventfd) from many threads while a
consumer thread does the polling.  In other words, this patch increases the
bandwidth of events which can be delivered from sources to the poller by
adding poll items to the list in a lockless way.

The main change is the replacement of the spinlock with an rwlock, which is
taken on read in ep_poll_callback(); poll items are then added to the tail of
the list using the xchg atomic instruction.  The write lock is taken
everywhere else in order to stop list modifications and to guarantee that
list updates are fully completed (I assume that the write side of an rwlock
does not starve; the qrwlock implementation seems to provide this guarantee).
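
Schematically the locking then looks as follows (a simplified sketch of the
scheme only; the real code is in the diff below):

  /* event side: many CPUs may run this in parallel */
  ep_poll_callback():
          read_lock(&ep->lock);
          list_add_tail_lockless(&epi->rdllink, &ep->rdllist);  /* xchg based */
          read_unlock(&ep->lock);

  /* consumer side: exclusive access to the lists */
  ep_scan_ready_list(), ep_insert(), ep_remove(), ...:
          write_lock_irq(&ep->lock);
          /* all lockless adds are fully completed here, the lists can
           * be safely traversed or modified */
          ...
          write_unlock_irq(&ep->lock);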

The following are some microbenchmark results based on the test [1], which
starts a number of threads, each generating N events.  The test ends when all
events have been successfully fetched by the poller thread:

 spinlock
 ========

 threads  events/ms  run-time ms
       8       6402        12495
      16       7045        22709
      32       7395        43268

 rwlock + xchg
 =============

 threads  events/ms  run-time ms
       8      10038         7969
      16      12178        13138
      32      13223        24199

According to the results, the bandwidth of delivered events is significantly
increased and the execution time is therefore reduced.

This patch was tested with different sorts of microbenchmarks and with
artificial delays (e.g. "udelay(get_random_int() % 0xff)") introduced in the
kernel on the paths where items are added to the lists.

[1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 117 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 85 insertions(+), 32 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ea0025e77519..2af4bb21fde8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -50,10 +50,10 @@
  *
  * 1) epmutex (mutex)
  * 2) ep->mtx (mutex)
- * 3) ep->wq.lock (spinlock)
+ * 3) ep->lock (rwlock)
  *
  * The acquire order is the one listed above, from 1 to 3.
- * We need a spinlock (ep->wq.lock) because we manipulate objects
+ * We need a rwlock (ep->lock) because we manipulate objects
  * from inside the poll callback, that might be triggered from
  * a wake_up() that in turn might be called from IRQ context.
  * So we can't sleep inside the poll callback and hence we need
@@ -85,7 +85,7 @@
  * of epoll file descriptors, we use the current recursion depth as
  * the lockdep subkey.
  * It is possible to drop the "ep->mtx" and to use the global
- * mutex "epmutex" (together with "ep->wq.lock") to have it working,
+ * mutex "epmutex" (together with "ep->lock") to have it working,
  * but having "ep->mtx" will make the interface more scalable.
  * Events that require holding "epmutex" are very rare, while for
  * normal operations the epoll private "ep->mtx" will guarantee
@@ -182,8 +182,6 @@ struct epitem {
  * This structure is stored inside the "private_data" member of the file
  * structure and represents the main data structure for the eventpoll
  * interface.
- *
- * Access to it is protected by the lock inside wq.
  */
 struct eventpoll {
 	/*
@@ -203,13 +201,16 @@ struct eventpoll {
 	/* List of ready file descriptors */
 	struct list_head rdllist;
 
+	/* Lock which protects rdllist and ovflist */
+	rwlock_t lock;
+
 	/* RB tree root used to store monitored fd structs */
 	struct rb_root_cached rbr;
 
 	/*
 	 * This is a single linked list that chains all the "struct epitem" that
 	 * happened while transferring ready events to userspace w/out
-	 * holding ->wq.lock.
+	 * holding ->lock.
 	 */
 	struct epitem *ovflist;
 
@@ -697,17 +698,17 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	 * because we want the "sproc" callback to be able to do it
 	 * in a lockless way.
 	 */
-	spin_lock_irq(&ep->wq.lock);
+	write_lock_irq(&ep->lock);
 	list_splice_init(&ep->rdllist, &txlist);
 	WRITE_ONCE(ep->ovflist, NULL);
-	spin_unlock_irq(&ep->wq.lock);
+	write_unlock_irq(&ep->lock);
 
 	/*
 	 * Now call the callback function.
 	 */
 	res = (*sproc)(ep, &txlist, priv);
 
-	spin_lock_irq(&ep->wq.lock);
+	write_lock_irq(&ep->lock);
 	/*
 	 * During the time we spent inside the "sproc" callback, some
 	 * other events might have been queued by the poll callback.
@@ -749,11 +750,11 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 		 * the ->poll() wait list (delayed after we release the lock).
 		 */
 		if (waitqueue_active(&ep->wq))
-			wake_up_locked(&ep->wq);
+			wake_up(&ep->wq);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
-	spin_unlock_irq(&ep->wq.lock);
+	write_unlock_irq(&ep->lock);
 
 	if (!ep_locked)
 		mutex_unlock(&ep->mtx);
@@ -793,10 +794,10 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 
 	rb_erase_cached(&epi->rbn, &ep->rbr);
 
-	spin_lock_irq(&ep->wq.lock);
+	write_lock_irq(&ep->lock);
 	if (ep_is_linked(epi))
 		list_del_init(&epi->rdllink);
-	spin_unlock_irq(&ep->wq.lock);
+	write_unlock_irq(&ep->lock);
 
 	wakeup_source_unregister(ep_wakeup_source(epi));
 	/*
@@ -846,7 +847,7 @@ static void ep_free(struct eventpoll *ep)
 	 * Walks through the whole tree by freeing each "struct epitem". At this
 	 * point we are sure no poll callbacks will be lingering around, and also by
 	 * holding "epmutex" we can be sure that no file cleanup code will hit
-	 * us during this operation. So we can avoid the lock on "ep->wq.lock".
+	 * us during this operation. So we can avoid the lock on "ep->lock".
 	 * We do not need to lock ep->mtx, either, we only do it to prevent
 	 * a lockdep warning.
 	 */
@@ -1027,6 +1028,7 @@ static int ep_alloc(struct eventpoll **pep)
 		goto free_uid;
 
 	mutex_init(&ep->mtx);
+	rwlock_init(&ep->lock);
 	init_waitqueue_head(&ep->wq);
 	init_waitqueue_head(&ep->poll_wait);
 	INIT_LIST_HEAD(&ep->rdllist);
@@ -1116,10 +1118,61 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
 }
 #endif /* CONFIG_CHECKPOINT_RESTORE */
 
+/*
+ * Adds a new entry to the tail of the list in a lockless way, i.e.
+ * multiple CPUs are allowed to call this function concurrently.
+ *
+ * Beware: it is necessary to prevent any other modifications of the
+ *         existing list until all changes are completed, in other words
+ *         concurrent list_add_tail_lockless() calls should be protected
+ *         with a read lock, where write lock acts as a barrier which
+ *         makes sure all list_add_tail_lockless() calls are fully
+ *         completed.
+ *
+ *         Also an element can be locklessly added to the list only in one
+ *         direction, i.e. either to the tail or to the head; otherwise
+ *         concurrent access will corrupt the list.
+ */
+static inline void list_add_tail_lockless(struct list_head *new,
+					  struct list_head *head)
+{
+	struct list_head *prev;
+
+	new->next = head;
+
+	/*
+	 * Initially ->next of a new element must be updated with the head
+	 * (we are inserting to the tail) and only then pointers are atomically
+	 * exchanged.  XCHG guarantees memory ordering, thus ->next should be
+	 * updated before pointers are actually swapped.
+	 */
+
+	prev = xchg(&head->prev, new);
+
+	/*
+	 * It is safe to modify prev->next and new->prev, because a new element
+	 * is added only to the tail and new->next is updated before XCHG.
+	 */
+
+	prev->next = new;
+	new->prev = prev;
+}
+
 /*
  * This is the callback that is passed to the wait queue wakeup
  * mechanism. It is called by the stored file descriptors when they
  * have events to report.
+ *
+ * This callback takes a read lock in order not to contend with concurrent
+ * events from other file descriptors, thus all modifications to ->rdllist
+ * or ->ovflist are lockless.  The read lock is paired with the write lock
+ * from ep_scan_ready_list(), which stops all list modifications and
+ * guarantees that the state of the lists is seen correctly.
+ *
+ * Another thing worth mentioning is that ep_poll_callback() can't be called
+ * concurrently for the same @epi, because wq.lock must be taken by the caller
+ * with interrupts disabled, thus the states of epi->next and ep_is_linked()
+ * will be seen correctly.
  */
 static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 {
@@ -1132,7 +1185,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	/* Interrupts are disabled by the wake_up_*poll() callers */
 	lockdep_assert_irqs_disabled();
 
-	spin_lock(&ep->wq.lock);
+	read_lock(&ep->lock);
 
 	ep_set_busy_poll_napi_id(epi);
 
@@ -1162,8 +1215,8 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	 */
 	if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
 		if (epi->next == EP_UNACTIVE_PTR) {
-			epi->next = READ_ONCE(ep->ovflist);
-			WRITE_ONCE(ep->ovflist, epi);
+			/* Atomically exchange tail */
+			epi->next = xchg(&ep->ovflist, epi);
 			if (epi->ws) {
 				/*
 				 * Activate ep->ws since epi->ws may get
@@ -1178,7 +1231,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 
 	/* If this file is already in the ready list we exit soon */
 	if (!ep_is_linked(epi)) {
-		list_add_tail(&epi->rdllink, &ep->rdllist);
+		list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
 		ep_pm_stay_awake_rcu(epi);
 	}
 
@@ -1203,13 +1256,13 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 				break;
 			}
 		}
-		wake_up_locked(&ep->wq);
+		wake_up(&ep->wq);
 	}
 	if (waitqueue_active(&ep->poll_wait))
 		pwake++;
 
 out_unlock:
-	spin_unlock(&ep->wq.lock);
+	read_unlock(&ep->lock);
 
 	/* We have to call this outside the lock */
 	if (pwake)
@@ -1494,7 +1547,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 		goto error_remove_epi;
 
 	/* We have to drop the new item inside our item list to keep track of it */
-	spin_lock_irq(&ep->wq.lock);
+	write_lock_irq(&ep->lock);
 
 	/* record NAPI ID of new item if present */
 	ep_set_busy_poll_napi_id(epi);
@@ -1506,12 +1559,12 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 
 		/* Notify waiting tasks that events are available */
 		if (waitqueue_active(&ep->wq))
-			wake_up_locked(&ep->wq);
+			wake_up(&ep->wq);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
 
-	spin_unlock_irq(&ep->wq.lock);
+	write_unlock_irq(&ep->lock);
 
 	atomic_long_inc(&ep->user->epoll_watches);
 
@@ -1537,10 +1590,10 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	 * list, since that is used/cleaned only inside a section bound by "mtx".
 	 * And ep_insert() is called with "mtx" held.
 	 */
-	spin_lock_irq(&ep->wq.lock);
+	write_lock_irq(&ep->lock);
 	if (ep_is_linked(epi))
 		list_del_init(&epi->rdllink);
-	spin_unlock_irq(&ep->wq.lock);
+	write_unlock_irq(&ep->lock);
 
 	wakeup_source_unregister(ep_wakeup_source(epi));
 
@@ -1584,9 +1637,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 	 * 1) Flush epi changes above to other CPUs.  This ensures
 	 *    we do not miss events from ep_poll_callback if an
 	 *    event occurs immediately after we call f_op->poll().
-	 *    We need this because we did not take ep->wq.lock while
+	 *    We need this because we did not take ep->lock while
 	 *    changing epi above (but ep_poll_callback does take
-	 *    ep->wq.lock).
+	 *    ep->lock).
 	 *
 	 * 2) We also need to ensure we do not miss _past_ events
 	 *    when calling f_op->poll().  This barrier also
@@ -1605,18 +1658,18 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 	 * list, push it inside.
 	 */
 	if (ep_item_poll(epi, &pt, 1)) {
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		if (!ep_is_linked(epi)) {
 			list_add_tail(&epi->rdllink, &ep->rdllist);
 			ep_pm_stay_awake(epi);
 
 			/* Notify waiting tasks that events are available */
 			if (waitqueue_active(&ep->wq))
-				wake_up_locked(&ep->wq);
+				wake_up(&ep->wq);
 			if (waitqueue_active(&ep->poll_wait))
 				pwake++;
 		}
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	/* We have to call this outside the lock */
@@ -1777,9 +1830,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 */
 		timed_out = 1;
 
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		eavail = ep_events_available(ep);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 
 		goto send_events;
 	}
-- 
2.19.1



* Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
       [not found]   ` <20181212171348.GA12786@andrea>
@ 2018-12-13 10:13     ` Roman Penyaev
  2018-12-13 11:19       ` Andrea Parri
  0 siblings, 1 reply; 11+ messages in thread
From: Roman Penyaev @ 2018-12-13 10:13 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Davidlohr Bueso, Jason Baron, Al Viro, Paul E. McKenney,
	Linus Torvalds, Andrew Morton, linux-fsdevel, linux-kernel

On 2018-12-12 18:13, Andrea Parri wrote:
> On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:

[...]

>> +static inline void list_add_tail_lockless(struct list_head *new,
>> +					  struct list_head *head)
>> +{
>> +	struct list_head *prev;
>> +
>> +	new->next = head;
>> +
>> +	/*
>> +	 * Initially ->next of a new element must be updated with the head
>> +	 * (we are inserting to the tail) and only then pointers are 
>> atomically
>> +	 * exchanged.  XCHG guarantees memory ordering, thus ->next should 
>> be
>> +	 * updated before pointers are actually swapped.
>> +	 */
>> +
>> +	prev = xchg(&head->prev, new);
>> +
>> +	/*
>> +	 * It is safe to modify prev->next and new->prev, because a new 
>> element
>> +	 * is added only to the tail and new->next is updated before XCHG.
>> +	 */
> 
> IIUC, you're also relying on "some" ordering between the atomic load
> of &head->prev above and the store to prev->next below: consider the
> following snippet for two concurrent list_add_tail_lockless()'s:
> 
> {Initially: List := H -> A -> B}
> 
> CPU0					CPU1
> 
> list_add_tail_lockless(C, H):		list_add_tail_lockless(D, H):
> 
> C->next = H				D->next = H
> prev = xchg(&H->prev, C) // =B		prev = xchg(&H->prev, D) // =C
> B->next = C				C->next = D
> C->prev = B				D->prev = C
> 
> Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
> latter reads the value of &H->prev that the former stored to that same
> location).
> 
> As you noted above, the xchg() guarantees that CPU0's store to C->next
> is "ordered before" CPU0's store to &H->prev.
> 
> But we also want CPU1's load from &H->prev to be ordered before CPU1's
> store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
> the address dependency between these two memory accesses).
> 
> I do not see what could guarantee "C->next == D" in the end, otherwise.
> 
> What am I missing?

Hi Andrea,

xchg always acts as a full memory barrier, i.e. mfence in x86 terms.  So the
following statement should always be true, otherwise nothing would work, as
the same code pattern is used in many generic places:

    CPU0               CPU1

  C->next = H
  xchg(&ptr, C)
                      C = xchg(&ptr, D)
                      C->next = D


This is the only guarantee we need; to make it simpler:

    C->next = H
    mfence            mfence
                      C->next = D

the guarantee that two stores won't reorder.  The pattern is always the same:
we prepare a chunk of memory on CPU0 and do a pointer xchg; CPU1 then sees the
chunk of memory with all stores committed by CPU0 (regardless of whether CPU1
does loads or stores to this chunk).

I am repeating the same thing which you also noted, but I just want to be
sure that I do not say nonsense.  So basically I am repeating it to myself.

Ok, let's commit that.  Returning to your question: "I do not see what
could guarantee "C->next == D" in the end"

At the end of what?  The lockless insert procedure (insert to tail) relies
only on "head->prev".  This is the single "place" where we atomically exchange
list elements and "somehow" chain them.  So the insert needs only the actual
"head->prev", and xchg provides this guarantee to us.

But there is also a user of the list, who needs to iterate over the list
or to delete elements, etc., i.e. this user of the list needs the list to be
fully committed to memory.  This user takes write_lock().  So answering your
question (if I understood it correctly): in the end write_lock() guarantees
that the list won't be seen as corrupted and that the updates to the last
element, i.e. the "->next" or "->prev" pointers of the last element, are
committed and seen correctly.
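
To illustrate that pairing (a simplified sketch, not the exact kernel code):

  CPU0 (ep_poll_callback):
          read_lock(&ep->lock);
          list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
          read_unlock(&ep->lock);

  CPU1 (ep_scan_ready_list), some time later:
          write_lock_irq(&ep->lock);
          /* all read sections which started before us have finished,
           * so every lockless add is fully completed and visible */
          list_splice_init(&ep->rdllist, &txlist);
          write_unlock_irq(&ep->lock);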

--
Roman







* Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
  2018-12-13 10:13     ` Roman Penyaev
@ 2018-12-13 11:19       ` Andrea Parri
  2018-12-13 12:19         ` Roman Penyaev
  0 siblings, 1 reply; 11+ messages in thread
From: Andrea Parri @ 2018-12-13 11:19 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Davidlohr Bueso, Jason Baron, Al Viro, Paul E. McKenney,
	Linus Torvalds, Andrew Morton, linux-fsdevel, linux-kernel

On Thu, Dec 13, 2018 at 11:13:58AM +0100, Roman Penyaev wrote:
> On 2018-12-12 18:13, Andrea Parri wrote:
> > On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:
> 
> [...]
> 
> > > +static inline void list_add_tail_lockless(struct list_head *new,
> > > +					  struct list_head *head)
> > > +{
> > > +	struct list_head *prev;
> > > +
> > > +	new->next = head;
> > > +
> > > +	/*
> > > +	 * Initially ->next of a new element must be updated with the head
> > > +	 * (we are inserting to the tail) and only then pointers are
> > > atomically
> > > +	 * exchanged.  XCHG guarantees memory ordering, thus ->next should
> > > be
> > > +	 * updated before pointers are actually swapped.
> > > +	 */
> > > +
> > > +	prev = xchg(&head->prev, new);
> > > +
> > > +	/*
> > > +	 * It is safe to modify prev->next and new->prev, because a new
> > > element
> > > +	 * is added only to the tail and new->next is updated before XCHG.
> > > +	 */
> > 
> > IIUC, you're also relying on "some" ordering between the atomic load
> > of &head->prev above and the store to prev->next below: consider the
> > following snippet for two concurrent list_add_tail_lockless()'s:
> > 
> > {Initially: List := H -> A -> B}
> > 
> > CPU0					CPU1
> > 
> > list_add_tail_lockless(C, H):		list_add_tail_lockless(D, H):
> > 
> > C->next = H				D->next = H
> > prev = xchg(&H->prev, C) // =B		prev = xchg(&H->prev, D) // =C
> > B->next = C				C->next = D
> > C->prev = B				D->prev = C
> > 
> > Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
> > latter reads the value of &H->prev that the former stored to that same
> > location).
> > 
> > As you noted above, the xchg() guarantees that CPU0's store to C->next
> > is "ordered before" CPU0's store to &H->prev.
> > 
> > But we also want CPU1's load from &H->prev to be ordered before CPU1's
> > store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
> > the address dependency between these two memory accesses).
> > 
> > I do not see what could guarantee "C->next == D" in the end, otherwise.
> > 
> > What am I missing?
> 
> Hi Andrea,
> 
> xchg always acts as a full memory barrier, i.e. mfence in x86 terms.  So the
> following statement should be always true, otherwise nothing should work as
> the same code pattern is used in many generic places:
> 
>    CPU0               CPU1
> 
>  C->next = H
>  xchg(&ptr, C)
>                      C = xchg(&ptr, D)
>                      C->next = D
> 
> 
> This is the only guarantee we need, i.e. make it simpler:
> 
>    C->next = H
>    mfence            mfence
>                      C->next = D
> 
> the guarantee that two stores won't reorder.  Pattern is always the same: we
> prepare chunk of memory on CPU0 and do pointers xchg, CPU1 sees chunks of
> memory with all stores committed by CPU0 (regardless of CPU1 does loads
> or stores to this chunk).
> 
> I am repeating the same thing which you also noted, but I just want to be
> sure that I do not say nonsense.  So basically repeating to myself.
> 
> Ok, let's commit that.  Returning to your question: "I do not see what
> could guarantee "C->next == D" in the end"
> 
> At the end of what?  Lockless insert procedure (insert to tail) relies only
> on "head->prev".  This is the single "place" where we atomically exchange
> list elements and "somehow" chain them.  So insert needs only actual
> "head->prev", and xchg provides this guarantee to us.

When all the operations reported in the snippet have completed (i.e.,
executed and propagated to memory).

To rephrase my remark:

I am saying that we do need some ordering between the xchg() and the
program-order _subsequent stores, and implicitly suggesting to write
this down in the comment.  As I wrote, this ordering _is provided by
the xchg() itself or by the dependency; so, maybe, something like:

	/*
	 * [...]  XCHG guarantees memory ordering, thus new->next is
	 * updated before pointers are actually swapped and pointers
	 * are swapped before prev->next is updated.
	 */

Adding a snippet, say in the form you reported above, would not hurt
of course. ;-)

  Andrea


> 
> But there is also a user of the list, who needs to iterate over the list
> or to delete elements, etc, i.e. this user of the list needs list fully
> committed to the memory.  This user takes write_lock().  So answering your
> question (if I understood it correctly): at the end write_lock() guarantees
> that list won't be seen as corrupted and updates to the last element, i.e.
> "->next" or "->prev" pointers of the last element are committed and seen
> correctly.
> 
> --
> Roman
> 
> 
> 
> 
> 


* Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
  2018-12-13 11:19       ` Andrea Parri
@ 2018-12-13 12:19         ` Roman Penyaev
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Penyaev @ 2018-12-13 12:19 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Davidlohr Bueso, Jason Baron, Al Viro, Paul E. McKenney,
	Linus Torvalds, Andrew Morton, linux-fsdevel, linux-kernel

On 2018-12-13 12:19, Andrea Parri wrote:
> On Thu, Dec 13, 2018 at 11:13:58AM +0100, Roman Penyaev wrote:
>> On 2018-12-12 18:13, Andrea Parri wrote:
>> > On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:
>> 
>> [...]
>> 
>> > > +static inline void list_add_tail_lockless(struct list_head *new,
>> > > +					  struct list_head *head)
>> > > +{
>> > > +	struct list_head *prev;
>> > > +
>> > > +	new->next = head;
>> > > +
>> > > +	/*
>> > > +	 * Initially ->next of a new element must be updated with the head
>> > > +	 * (we are inserting to the tail) and only then pointers are
>> > > atomically
>> > > +	 * exchanged.  XCHG guarantees memory ordering, thus ->next should
>> > > be
>> > > +	 * updated before pointers are actually swapped.
>> > > +	 */
>> > > +
>> > > +	prev = xchg(&head->prev, new);
>> > > +
>> > > +	/*
>> > > +	 * It is safe to modify prev->next and new->prev, because a new
>> > > element
>> > > +	 * is added only to the tail and new->next is updated before XCHG.
>> > > +	 */
>> >
>> > IIUC, you're also relying on "some" ordering between the atomic load
>> > of &head->prev above and the store to prev->next below: consider the
>> > following snippet for two concurrent list_add_tail_lockless()'s:
>> >
>> > {Initially: List := H -> A -> B}
>> >
>> > CPU0					CPU1
>> >
>> > list_add_tail_lockless(C, H):		list_add_tail_lockless(D, H):
>> >
>> > C->next = H				D->next = H
>> > prev = xchg(&H->prev, C) // =B		prev = xchg(&H->prev, D) // =C
>> > B->next = C				C->next = D
>> > C->prev = B				D->prev = C
>> >
>> > Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
>> > latter reads the value of &H->prev that the former stored to that same
>> > location).
>> >
>> > As you noted above, the xchg() guarantees that CPU0's store to C->next
>> > is "ordered before" CPU0's store to &H->prev.
>> >
>> > But we also want CPU1's load from &H->prev to be ordered before CPU1's
>> > store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
>> > the address dependency between these two memory accesses).
>> >
>> > I do not see what could guarantee "C->next == D" in the end, otherwise.
>> >
>> > What am I missing?
>> 
>> Hi Andrea,
>> 
>> xchg always acts as a full memory barrier, i.e. mfence in x86 terms.  
>> So the
>> following statement should be always true, otherwise nothing should 
>> work as
>> the same code pattern is used in many generic places:
>> 
>>    CPU0               CPU1
>> 
>>  C->next = H
>>  xchg(&ptr, C)
>>                      C = xchg(&ptr, D)
>>                      C->next = D
>> 
>> 
>> This is the only guarantee we need, i.e. make it simpler:
>> 
>>    C->next = H
>>    mfence            mfence
>>                      C->next = D
>> 
>> the guarantee that two stores won't reorder.  Pattern is always the 
>> same: we
>> prepare chunk of memory on CPU0 and do pointers xchg, CPU1 sees chunks 
>> of
>> memory with all stores committed by CPU0 (regardless of CPU1 does 
>> loads
>> or stores to this chunk).
>> 
>> I am repeating the same thing which you also noted, but I just want to 
>> be
>> sure that I do not say nonsense.  So basically repeating to myself.
>> 
>> Ok, let's commit that.  Returning to your question: "I do not see what
>> could guarantee "C->next == D" in the end"
>> 
>> At the end of what?  Lockless insert procedure (insert to tail) relies 
>> only
>> on "head->prev".  This is the single "place" where we atomically 
>> exchange
>> list elements and "somehow" chain them.  So insert needs only actual
>> "head->prev", and xchg provides this guarantee to us.
> 
> When all the operations reported in the snippet have completed (i.e.,
> executed and propagated to memory).
> 
> To rephrase my remark:
> 
> I am saying that we do need some ordering between the xchg() and the
> program-order _subsequent stores, and implicitly suggesting to write
> this down in the comment.  As I wrote, this ordering _is provided by
> the xchg() itself or by the dependency; so, maybe, something like:
> 
> 	/*
> 	 * [...]  XCHG guarantees memory ordering, thus new->next is
> 	 * updated before pointers are actually swapped and pointers
> 	 * are swapped before prev->next is updated.
> 	 */
> 
> Adding a snippet, say in the form you reported above, would not hurt
> of course. ;-)

Sure thing.  Will extend the comments.

--
Roman



* Re: [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention
  2018-12-12 11:03 [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
                   ` (2 preceding siblings ...)
  2018-12-12 11:03 ` [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
@ 2018-12-13 18:13 ` Davidlohr Bueso
  2018-12-17 11:49   ` Roman Penyaev
  3 siblings, 1 reply; 11+ messages in thread
From: Davidlohr Bueso @ 2018-12-13 18:13 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Jason Baron, Al Viro, Paul E. McKenney, Linus Torvalds,
	Andrew Morton, linux-fsdevel, linux-kernel

On 2018-12-12 03:03, Roman Penyaev wrote:
> The last patch targets the contention problem in ep_poll_callback(), 
> which
> can be very well reproduced by generating events (write to pipe or 
> eventfd)
> from many threads, while consumer thread does polling.
> 
> The following are some microbenchmark results based on the test [1] 
> which
> starts threads which generate N events each.  The test ends when all 
> events
> are successfully fetched by the poller thread:
> 
>  spinlock
>  ========
> 
>  threads  events/ms  run-time ms
>        8       6402        12495
>       16       7045        22709
>       32       7395        43268
> 
>  rwlock + xchg
>  =============
> 
>  threads  events/ms  run-time ms
>        8      10038         7969
>       16      12178        13138
>       32      13223        24199
> 
> 
> According to the results bandwidth of delivered events is significantly
> increased, thus execution time is reduced.
> 
> This series is based on linux-next/akpm and differs from RFC in that
> additional cleanup patches and explicit comments have been added.
> 
> [1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c

Care to "port" this to 'perf bench epoll', in linux-next? I've been 
trying to unify into perf bench the whole set of epoll performance testcases 
kernel developers can use when making changes, and it would be useful.

I ran these patches on the 'wait' workload, which is an epoll_wait(2) 
stresser. On a 40-core IvyBridge it shows good performance improvements 
for an increasing number of file descriptors that each of the 40 threads 
deals with:

64   fds: +20%
512  fds: +30%
1024 fds: +50%

(Yes, these are pretty raw measurements in ops/sec.) Unlike your benchmark, 
though, there is only a single writer thread, and it is therefore less ideal 
for measuring optimizations when IO becomes available. Hence it would be 
nice to also have this.

Thanks,
Davidlohr


* Re: [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order
  2018-12-12 11:03 ` [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order Roman Penyaev
@ 2018-12-13 19:30   ` Davidlohr Bueso
  0 siblings, 0 replies; 11+ messages in thread
From: Davidlohr Bueso @ 2018-12-13 19:30 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Jason Baron, Al Viro, Andrew Morton, Linus Torvalds,
	linux-fsdevel, linux-kernel

On 2018-12-12 03:03, Roman Penyaev wrote:
> All incoming events are stored in FIFO order and this should also be
> applicable to ->ovflist, which originally is a stack, i.e. LIFO.
> 
> Thus to keep the correct FIFO order ->ovflist should be reversed by adding
> elements to the head of the ready list but not to the tail.

So the window for which the ovflist is used can actually be non-trivial 
(i.e. lots of copy_to_user) and I just hope nobody out there is relying on a 
particular wakeup order. OTOH nobody has ever complained about this 
"reverse" order and about not having the perfect queue. And hopefully the 
same will be true for this case.

With that:

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  fs/eventpoll.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 2329f96469e2..3627c2e07149 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -722,7 +722,11 @@ static __poll_t ep_scan_ready_list(struct 
> eventpoll *ep,
>  		 * contain them, and the list_splice() below takes care of them.
>  		 */
>  		if (!ep_is_linked(epi)) {
> -			list_add_tail(&epi->rdllink, &ep->rdllist);
> +			/*
> +			 * ->ovflist is LIFO, so we have to reverse it in order
> +			 * to keep the FIFO order.
> +			 */
> +			list_add(&epi->rdllink, &ep->rdllist);
>  			ep_pm_stay_awake(epi);
>  		}
>  	}



* Re: [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention
  2018-12-13 18:13 ` [PATCH 0/3] " Davidlohr Bueso
@ 2018-12-17 11:49   ` Roman Penyaev
  2018-12-17 18:01     ` Davidlohr Bueso
  0 siblings, 1 reply; 11+ messages in thread
From: Roman Penyaev @ 2018-12-17 11:49 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Jason Baron, Al Viro, Paul E. McKenney, Linus Torvalds,
	Andrew Morton, linux-fsdevel, linux-kernel

On 2018-12-13 19:13, Davidlohr Bueso wrote:
> On 2018-12-12 03:03, Roman Penyaev wrote:
>> The last patch targets the contention problem in ep_poll_callback(), 
>> which
>> can be very well reproduced by generating events (write to pipe or 
>> eventfd)
>> from many threads, while consumer thread does polling.
>> 
>> The following are some microbenchmark results based on the test [1] 
>> which
>> starts threads which generate N events each.  The test ends when all 
>> events
>> are successfully fetched by the poller thread:
>> 
>>  spinlock
>>  ========
>> 
>>  threads  events/ms  run-time ms
>>        8       6402        12495
>>       16       7045        22709
>>       32       7395        43268
>> 
>>  rwlock + xchg
>>  =============
>> 
>>  threads  events/ms  run-time ms
>>        8      10038         7969
>>       16      12178        13138
>>       32      13223        24199
>> 
>> 
>> According to the results bandwidth of delivered events is 
>> significantly
>> increased, thus execution time is reduced.
>> 
>> This series is based on linux-next/akpm and differs from RFC in that
>> additional cleanup patches and explicit comments have been added.
>> 
>> [1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
> 
> Care to "port" this to 'perf bench epoll', in linux-next? I've been
> trying to unify into perf bench the whole epoll performance testcases
> kernel developers can use when making changes and it would be useful.

Yes, good idea.  But frankly I do not want to bloat epoll-wait.c with
my multi-writers-single-reader test case, because soon epoll-wait.c
will become unmaintainable with all the possible loads and sets of
different options.

Can we have a single, small and separate source for each epoll load?
Easy to fix, easy to maintain, debug/hack.

> I ran these patches on the 'wait' workload which is a epoll_wait(2)
> stresser. On a 40-core IvyBridge it shows good performance
> improvements for increasing number of file descriptors each of the 40
> threads deals with:
> 
> 64   fds: +20%
> 512  fds: +30%
> 1024 fds: +50%
> 
> (Yes these are pretty raw measurements ops/sec). Unlike your
> benchmark, though, there is only single writer thread, and therefore
> is less ideal to measure optimizations when IO becomes available.
> Hence it would be nice to also have this.

That's weird. One writer thread does not contend with anybody, only with
consumers, so there should not be any big difference.

--
Roman



* Re: [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention
  2018-12-17 11:49   ` Roman Penyaev
@ 2018-12-17 18:01     ` Davidlohr Bueso
  0 siblings, 0 replies; 11+ messages in thread
From: Davidlohr Bueso @ 2018-12-17 18:01 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Jason Baron, Al Viro, Paul E. McKenney, Linus Torvalds,
	Andrew Morton, linux-fsdevel, linux-kernel

On 2018-12-17 03:49, Roman Penyaev wrote:
> On 2018-12-13 19:13, Davidlohr Bueso wrote:
> Yes, good idea.  But frankly I do not want to bloat epoll-wait.c with
> my multi-writers-single-reader test case, because soon epoll-wait.c
> will become unmaintainable with all possible loads and set of
> different options.
> 
> Can we have a single, small and separate source for each epoll load?
> Easy to fix, easy to maintain, debug/hack.

Yes completely agree; I was actually thinking along those lines.

> 
>> I ran these patches on the 'wait' workload which is a epoll_wait(2)
>> stresser. On a 40-core IvyBridge it shows good performance
>> improvements for increasing number of file descriptors each of the 40
>> threads deals with:
>> 
>> 64   fds: +20%
>> 512  fds: +30%
>> 1024 fds: +50%
>> 
>> (Yes these are pretty raw measurements ops/sec). Unlike your
>> benchmark, though, there is only single writer thread, and therefore
>> is less ideal to measure optimizations when IO becomes available.
>> Hence it would be nice to also have this.
> 
> That's weird. One writer thread does not contend with anybody, only 
> with
> consumers, so should not be any big difference.

Yeah, so the irq optimization patch, which is known to boost numbers on 
this microbench, is an important factor. I just put them all together 
when testing.

Thanks,
Davidlohr


Thread overview: 11 messages
2018-12-12 11:03 [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
2018-12-12 11:03 ` [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order Roman Penyaev
2018-12-13 19:30   ` Davidlohr Bueso
2018-12-12 11:03 ` [PATCH 2/3] epoll: loosen irq safety in ep_poll_callback() Roman Penyaev
2018-12-12 11:03 ` [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
     [not found]   ` <20181212171348.GA12786@andrea>
2018-12-13 10:13     ` Roman Penyaev
2018-12-13 11:19       ` Andrea Parri
2018-12-13 12:19         ` Roman Penyaev
2018-12-13 18:13 ` [PATCH 0/3] " Davidlohr Bueso
2018-12-17 11:49   ` Roman Penyaev
2018-12-17 18:01     ` Davidlohr Bueso
