linux-kernel.vger.kernel.org archive mirror
* [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
@ 2018-07-20 17:29 Davidlohr Bueso
  2018-07-20 17:29 ` [PATCH 1/2] fs/epoll: loosen irq safety in ep_scan_ready_list() Davidlohr Bueso
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-20 17:29 UTC (permalink / raw)
  To: akpm; +Cc: jbaron, viro, linux-kernel, dave

Hi,

Both patches replace saving+restoring interrupts when taking the
ep->lock (now the waitqueue lock), with just disabling local irqs.
This shows immediate performance benefits in patch 1 for an epoll
workload running on Xen. The main concern with this sort of change
in epoll is ep_poll_callback(), which is passed to the wait queue
wakeup and is very often called in irq context; these patches do not
touch that call.

Patches have been tested pretty heavily with the customer workload,
microbenchmarks, ltp testcases and two high level workloads that
use epoll under the hood: nginx and libevent benchmarks.

Details are in the individual patches.

Applies on top of mmotd.

Thanks!

Davidlohr Bueso (2):
  fs/epoll: loosen irq safety in ep_scan_ready_list()
  fs/epoll: loosen irq safety in epoll_insert() and epoll_remove()

 fs/eventpoll.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] fs/epoll: loosen irq safety in ep_scan_ready_list()
  2018-07-20 17:29 [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Davidlohr Bueso
@ 2018-07-20 17:29 ` Davidlohr Bueso
  2018-07-20 17:29 ` [PATCH 2/2] fs/epoll: loosen irq safety in epoll_insert() and epoll_remove() Davidlohr Bueso
  2018-07-20 19:42 ` [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Andrew Morton
  2 siblings, 0 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-20 17:29 UTC (permalink / raw)
  To: akpm; +Cc: jbaron, viro, linux-kernel, dave, Davidlohr Bueso

Saving and restoring interrupts in ep_scan_ready_list() is overkill,
as it is never called with interrupts disabled. Loosen this to simply
disabling local irqs, which benefits archs where managing irqs is
expensive, as well as virtual environments. This patch yields some
throughput improvements on an epoll-intensive workload running on a
single Xen DomU.

1 Job	 7500	-->    8800 enq/s  (+17%)
2 Jobs	14000   -->   15200 enq/s  (+8%)
3 Jobs	20500	-->   22300 enq/s  (+8%)
4 Jobs	25000   -->   28000 enq/s  (+8-12%)

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/eventpoll.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2737ef591b3e..2247769eb941 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -667,7 +667,6 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 {
 	__poll_t res;
 	int pwake = 0;
-	unsigned long flags;
 	struct epitem *epi, *nepi;
 	LIST_HEAD(txlist);
 
@@ -687,17 +686,17 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	 * because we want the "sproc" callback to be able to do it
 	 * in a lockless way.
 	 */
-	spin_lock_irqsave(&ep->wq.lock, flags);
+	spin_lock_irq(&ep->wq.lock);
 	list_splice_init(&ep->rdllist, &txlist);
 	ep->ovflist = NULL;
-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock_irq(&ep->wq.lock);
 
 	/*
 	 * Now call the callback function.
 	 */
 	res = (*sproc)(ep, &txlist, priv);
 
-	spin_lock_irqsave(&ep->wq.lock, flags);
+	spin_lock_irq(&ep->wq.lock);
 	/*
 	 * During the time we spent inside the "sproc" callback, some
 	 * other events might have been queued by the poll callback.
@@ -739,7 +738,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock_irq(&ep->wq.lock);
 
 	if (!ep_locked)
 		mutex_unlock(&ep->mtx);
-- 
2.16.4



* [PATCH 2/2] fs/epoll: loosen irq safety in epoll_insert() and epoll_remove()
  2018-07-20 17:29 [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Davidlohr Bueso
  2018-07-20 17:29 ` [PATCH 1/2] fs/epoll: loosen irq safety in ep_scan_ready_list() Davidlohr Bueso
@ 2018-07-20 17:29 ` Davidlohr Bueso
  2018-07-20 19:42 ` [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Andrew Morton
  2 siblings, 0 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-20 17:29 UTC (permalink / raw)
  To: akpm; +Cc: jbaron, viro, linux-kernel, dave, Davidlohr Bueso

Both functions are called in a context similar to that of ep_modify(),
via epoll_ctl(2). Just like ep_modify(), saving and restoring interrupts
in these calls is overkill, as neither is ever called with irqs disabled.
While ep_remove() can be called directly from EPOLL_CTL_DEL, it can also
be called when releasing the file; that path, too, runs with irqs
enabled.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/eventpoll.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2247769eb941..1b1abc461fc0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -762,7 +762,6 @@ static void epi_rcu_free(struct rcu_head *head)
  */
 static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 {
-	unsigned long flags;
 	struct file *file = epi->ffd.file;
 
 	/*
@@ -777,10 +776,10 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 
 	rb_erase_cached(&epi->rbn, &ep->rbr);
 
-	spin_lock_irqsave(&ep->wq.lock, flags);
+	spin_lock_irq(&ep->wq.lock);
 	if (ep_is_linked(&epi->rdllink))
 		list_del_init(&epi->rdllink);
-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock_irq(&ep->wq.lock);
 
 	wakeup_source_unregister(ep_wakeup_source(epi));
 	/*
@@ -1409,7 +1408,6 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 {
 	int error, pwake = 0;
 	__poll_t revents;
-	unsigned long flags;
 	long user_watches;
 	struct epitem *epi;
 	struct ep_pqueue epq;
@@ -1476,7 +1474,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 		goto error_remove_epi;
 
 	/* We have to drop the new item inside our item list to keep track of it */
-	spin_lock_irqsave(&ep->wq.lock, flags);
+	spin_lock_irq(&ep->wq.lock);
 
 	/* record NAPI ID of new item if present */
 	ep_set_busy_poll_napi_id(epi);
@@ -1493,7 +1491,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 			pwake++;
 	}
 
-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock_irq(&ep->wq.lock);
 
 	atomic_long_inc(&ep->user->epoll_watches);
 
@@ -1519,10 +1517,10 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	 * list, since that is used/cleaned only inside a section bound by "mtx".
 	 * And ep_insert() is called with "mtx" held.
 	 */
-	spin_lock_irqsave(&ep->wq.lock, flags);
+	spin_lock_irq(&ep->wq.lock);
 	if (ep_is_linked(&epi->rdllink))
 		list_del_init(&epi->rdllink);
-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock_irq(&ep->wq.lock);
 
 	wakeup_source_unregister(ep_wakeup_source(epi));
 
-- 
2.16.4



* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-20 17:29 [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Davidlohr Bueso
  2018-07-20 17:29 ` [PATCH 1/2] fs/epoll: loosen irq safety in ep_scan_ready_list() Davidlohr Bueso
  2018-07-20 17:29 ` [PATCH 2/2] fs/epoll: loosen irq safety in epoll_insert() and epoll_remove() Davidlohr Bueso
@ 2018-07-20 19:42 ` Andrew Morton
  2018-07-20 20:05   ` Davidlohr Bueso
  2 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2018-07-20 19:42 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: jbaron, viro, linux-kernel, Peter Zijlstra

On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso <dave@stgolabs.net> wrote:

> Hi,
> 
> Both patches replace saving+restoring interrupts when taking the
> ep->lock (now the waitqueue lock), with just disabling local irqs.
> This shows immediate performance benefits in patch 1 for an epoll
> workload running on Xen.

I'm surprised.  Is spin_lock_irqsave() significantly more expensive
than spin_lock_irq()?  Relative to all the other stuff those functions
are doing?  If so, how come?  Some architectural thing makes
local_irq_save() much more costly than local_irq_disable()?

> The main concern with this sort of change in epoll is
> ep_poll_callback(), which is passed to the wait queue wakeup and is
> very often called in irq context; these patches do not touch that
> call.

Yeah, these changes are scary.  For the code as it stands now, and for
the code as it evolves.

I'd have more confidence if we had some warning mechanism if we run
spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
afaict we don't have that.  Probably for good reasons - I wonder what
they are?

> Patches have been tested pretty heavily with the customer workload,
> microbenchmarks, ltp testcases and two high level workloads that
> use epoll under the hood: nginx and libevent benchmarks.
> 
> Details are in the individual patches.
> 
> Applies on top of mmotd.

Please convince me about the performance benefits?


* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-20 19:42 ` [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Andrew Morton
@ 2018-07-20 20:05   ` Davidlohr Bueso
  2018-07-20 20:44     ` Andrew Morton
  2018-09-06 19:11     ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-20 20:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jbaron, viro, linux-kernel, Peter Zijlstra

On Fri, 20 Jul 2018, Andrew Morton wrote:

>On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> Hi,
>>
>> Both patches replace saving+restoring interrupts when taking the
>> ep->lock (now the waitqueue lock), with just disabling local irqs.
>> This shows immediate performance benefits in patch 1 for an epoll
>> workload running on Xen.
>
>I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>than spin_lock_irq()?  Relative to all the other stuff those functions
>are doing?  If so, how come?  Some architectural thing makes
>local_irq_save() much more costly than local_irq_disable()?

For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
the cost of Xen is much higher.

And at least considering ep_scan_ready_list(), the lock is taken/released
twice, to deal with the ovflist when the ep->wq.lock is not held. To the
point that it yields measurable results (see patch 1) across incremental
thread counts.

>
>> The main concern with this sort of change in epoll is
>> ep_poll_callback(), which is passed to the wait queue wakeup and is
>> very often called in irq context; these patches do not touch that
>> call.
>
>Yeah, these changes are scary.  For the code as it stands now, and for
>the code as it evolves.

Yes which is why I've been throwing lots of epoll workloads at it.

>
>I'd have more confidence if we had some warning mechanism if we run
>spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
>afaict we don't have that.  Probably for good reasons - I wonder what
>they are?
>
>> Patches have been tested pretty heavily with the customer workload,
>> microbenchmarks, ltp testcases and two high level workloads that
>> use epoll under the hood: nginx and libevent benchmarks.
>>
>> Details are in the individual patches.
>>
>> Applies on top of mmotd.
>
>Please convince me about the performance benefits?

As for numbers, I only have those for patch 1.

Thanks,
Davidlohr


* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-20 20:05   ` Davidlohr Bueso
@ 2018-07-20 20:44     ` Andrew Morton
  2018-07-21  0:22       ` Davidlohr Bueso
  2018-07-21 17:21       ` Davidlohr Bueso
  2018-09-06 19:11     ` Peter Zijlstra
  1 sibling, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2018-07-20 20:44 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: jbaron, viro, linux-kernel, Peter Zijlstra

On Fri, 20 Jul 2018 13:05:59 -0700 Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Fri, 20 Jul 2018, Andrew Morton wrote:
> 
> >On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso <dave@stgolabs.net> wrote:
> >
> >> Hi,
> >>
> >> Both patches replace saving+restoring interrupts when taking the
> >> ep->lock (now the waitqueue lock), with just disabling local irqs.
> >> This shows immediate performance benefits in patch 1 for an epoll
> >> workload running on Xen.
> >
> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
> 
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.
> 
> And at least considering ep_scan_ready_list(), the lock is taken/released
> twice, to deal with the ovflist when the ep->wq.lock is not held. To the
> point that it yields measurable results (see patch 1) across incremental
> thread counts.

Did you try measuring it on bare hardware?

> >
> >> The main concern with this sort of change in epoll is
> >> ep_poll_callback(), which is passed to the wait queue wakeup and is
> >> very often called in irq context; these patches do not touch that
> >> call.
> >
> >Yeah, these changes are scary.  For the code as it stands now, and for
> >the code as it evolves.
> 
> Yes which is why I've been throwing lots of epoll workloads at it.

I'm sure.  It's the "as it evolves" that is worrisome, and has caught
us in the past.

> >
> >I'd have more confidence if we had some warning mechanism if we run
> >spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
> >afaict we don't have that.  Probably for good reasons - I wonder what
> >they are?

Well ignored ;)

We could open-code it locally.  Add a couple of
WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
Xen but surely just reading the thing isn't too expensive?




* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-20 20:44     ` Andrew Morton
@ 2018-07-21  0:22       ` Davidlohr Bueso
  2018-07-21 17:21       ` Davidlohr Bueso
  1 sibling, 0 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-21  0:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jbaron, viro, linux-kernel, Peter Zijlstra

On Fri, 20 Jul 2018, Andrew Morton wrote:

>Did you try measuring it on bare hardware?

I did and wasn't expecting much difference.

These runs are on a 2-socket, 40-core (ht) IvyBridge, for a few workloads.
Unfortunately I don't have a Xen environment, and for the Xen results I do
have (the numbers in patch 1) I don't have the actual workload, so I cannot
compare them directly.

1) Different configurations were used for an epoll_wait (pipe io) microbench
(http://linux-scalability.org/epoll/epoll-test.c), which shows around a 7-10%
improvement in the overall total number of epoll_wait() loops, using both
regular and nested epolls -- so very raw numbers, but measurable nonetheless.

# threads	vanilla		dirty
     1		1677717		1805587
     2		1660510		1854064
     4		1610184		1805484
     8		1577696		1751222
     16		1568837		1725299
     32		1291532		1378463
     64		 752584		 787368

Note that stddev is pretty small.

2) Another pipe test, which shows no real measurable improvement.
(http://www.xmailserver.org/linux-patches/pipetest.c)

>> >
>> >I'd have more confidence if we had some warning mechanism if we run
>> >spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
>> >afaict we don't have that.  Probably for good reasons - I wonder what
>> >they are?
>
>Well ignored ;)
>
>We could open-code it locally.  Add a couple of
>WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>Xen but surely just reading the thing isn't too expensive?

I agree, I'll see what I can come up with and also ask the customer to test
in his setup. Bare metal would also need some new numbers I guess.

Thanks,
Davidlohr


* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-20 20:44     ` Andrew Morton
  2018-07-21  0:22       ` Davidlohr Bueso
@ 2018-07-21 17:21       ` Davidlohr Bueso
  2018-07-21 17:39         ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-21 17:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jbaron, viro, linux-kernel, Peter Zijlstra

On Fri, 20 Jul 2018, Andrew Morton wrote:

>We could open-code it locally.  Add a couple of
>WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>Xen but surely just reading the thing isn't too expensive?

We could also pass on the responsibility to lockdep and just use
lockdep_assert_irqs_enabled(). But I guess that would be less effective
than just open-coding it in epoll without lockdep -- note that over 80
places in the kernel do this.

Thanks,
Davidlohr


* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-21 17:21       ` Davidlohr Bueso
@ 2018-07-21 17:39         ` Peter Zijlstra
  2018-07-21 18:31           ` Davidlohr Bueso
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2018-07-21 17:39 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, jbaron, viro, linux-kernel

On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:
> 
> > We could open-code it locally.  Add a couple of
> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
> > Xen but surely just reading the thing isn't too expensive?
> 
> We could also pass on the responsibility to lockdep and just use
> lockdep_assert_irqs_enabled(). But I guess that would be less effective
> than just open-coding it in epoll without lockdep -- note that over 80
> places in the kernel do this.

The lockdep thing is relatively recent. I think someone proposed to go
replace a bunch of the open-coded ones at some point.


* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-21 17:39         ` Peter Zijlstra
@ 2018-07-21 18:31           ` Davidlohr Bueso
  2018-07-24 23:43             ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Davidlohr Bueso @ 2018-07-21 18:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, jbaron, viro, linux-kernel

On Sat, 21 Jul 2018, Peter Zijlstra wrote:

>On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
>> On Fri, 20 Jul 2018, Andrew Morton wrote:
>>
>> > We could open-code it locally.  Add a couple of
>> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>> > Xen but surely just reading the thing isn't too expensive?
>>
>> We could also pass on the responsibility to lockdep and just use
>> lockdep_assert_irqs_enabled(). But I guess that would be less effective
>> than just open-coding it in epoll without lockdep -- note that over 80
>> places in the kernel do this.
>
>The lockdep thing is relatively recent. I think someone proposed to go
>replace a bunch of the open-coded ones at some point.

For the open-coded checks, I'm seeing a small (1-2% ish) cost for bare
metal on workload 1). I don't see (via code inspection) any additional
overhead in Xen either. While negligible in the overall scheme of things,
I do like the idea of lockdep handling it nonetheless.

I can add the open-coded version if people really feel that it would catch
more bugs (no lockdep users out there in production afaik :) in the long
term; but if lockdep is where things are headed...

Thanks,
Davidlohr

-------8<--------------------------------------------------------
[PATCH -next 3/2] fs/epoll: robustify irq safety with  lockdep_assert_irqs_enabled()

Sprinkle lockdep_assert_irqs_enabled() checks in the functions that
do not save and restore interrupts when dealing with the ep->wq.lock.
These are ep_scan_ready_list() and those called by epoll_ctl():
ep_insert, ep_modify and ep_remove.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/eventpoll.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1b1abc461fc0..97b9b73dfec8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -670,6 +670,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	struct epitem *epi, *nepi;
 	LIST_HEAD(txlist);
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	/*
 	 * We need to lock this because we could be hit by
 	 * eventpoll_release_file() and epoll_ctl().
@@ -764,6 +767,9 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 {
 	struct file *file = epi->ffd.file;
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	/*
 	 * Removes poll wait queue hooks.
 	 */
@@ -1412,6 +1418,9 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	struct epitem *epi;
 	struct ep_pqueue epq;
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	user_watches = atomic_long_read(&ep->user->epoll_watches);
 	if (unlikely(user_watches >= max_user_watches))
 		return -ENOSPC;
@@ -1540,6 +1549,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 	int pwake = 0;
 	poll_table pt;
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	init_poll_funcptr(&pt, NULL);
 
 	/*
-- 
2.16.4



* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-21 18:31           ` Davidlohr Bueso
@ 2018-07-24 23:43             ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2018-07-24 23:43 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Peter Zijlstra, jbaron, viro, linux-kernel

On Sat, 21 Jul 2018 11:31:27 -0700 Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Sat, 21 Jul 2018, Peter Zijlstra wrote:
> 
> >On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
> >> On Fri, 20 Jul 2018, Andrew Morton wrote:
> >>
> >> > We could open-code it locally.  Add a couple of
> >> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
> >> > Xen but surely just reading the thing isn't too expensive?
> >>
> >> We could also pass on the responsibility to lockdep and just use
> >> lockdep_assert_irqs_enabled(). But I guess that would be less effective
> >> than just open-coding it in epoll without lockdep -- note that over 80
> >> places in the kernel do this.
> >
> >The lockdep thing is relatively recent. I think someone proposed to go
> >replace a bunch of the open-coded ones at some point.
> 
> For the open coded checks, I'm seeing a small (1-2% ish) cost for bare
> metal on workload 1). I don't see (via code inspection) any additional
> overhead in xen either. While negligible in the overall of things, I do
> like the idea of lockdep handling it nonetheless.

It's good that the overhead goes away when lockdep is disabled.  That's
a big advantage over open-coding it.

> I can add the open coded version if people really feel that it would catch
> more bugs (no lockdep users out there in production afaik :) in the long
> term; but if lockdep is where things are headed...
> 
> ...
>
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -670,6 +670,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>  	struct epitem *epi, *nepi;
>  	LIST_HEAD(txlist);
>  
> +	/* must not be called with irqs off */
> +	lockdep_assert_irqs_enabled();

Although I am the humble comment's most avid fan, these ones are too obvious
to be worth the space ;)

Would you survive if I zapped 'em?

--- a/fs/eventpoll.c~fs-epoll-robustify-irq-safety-with-lockdep_assert_irqs_enabled-fix
+++ a/fs/eventpoll.c
@@ -670,7 +670,6 @@ static __poll_t ep_scan_ready_list(struc
 	struct epitem *epi, *nepi;
 	LIST_HEAD(txlist);
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	/*
@@ -767,7 +766,6 @@ static int ep_remove(struct eventpoll *e
 {
 	struct file *file = epi->ffd.file;
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	/*
@@ -1418,7 +1416,6 @@ static int ep_insert(struct eventpoll *e
 	struct epitem *epi;
 	struct ep_pqueue epq;
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	user_watches = atomic_long_read(&ep->user->epoll_watches);
@@ -1549,7 +1546,6 @@ static int ep_modify(struct eventpoll *e
 	int pwake = 0;
 	poll_table pt;
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	init_poll_funcptr(&pt, NULL);
_



* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-07-20 20:05   ` Davidlohr Bueso
  2018-07-20 20:44     ` Andrew Morton
@ 2018-09-06 19:11     ` Peter Zijlstra
  2018-09-06 20:55       ` Davidlohr Bueso
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2018-09-06 19:11 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Andrew Morton, jbaron, viro, linux-kernel, Thomas Gleixner

On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:

> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
> 
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.

Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.


* Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
  2018-09-06 19:11     ` Peter Zijlstra
@ 2018-09-06 20:55       ` Davidlohr Bueso
  0 siblings, 0 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2018-09-06 20:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, jbaron, viro, linux-kernel, Thomas Gleixner

On Thu, 06 Sep 2018, Peter Zijlstra wrote:

>On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
>> On Fri, 20 Jul 2018, Andrew Morton wrote:
>
>> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>> >than spin_lock_irq()?  Relative to all the other stuff those functions
>> >are doing?  If so, how come?  Some architectural thing makes
>> >local_irq_save() much more costly than local_irq_disable()?
>>
>> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
>> the cost of Xen is much higher.
>
>Xen is a moot argument. IIRC the point is that POPF (as used by
>*irqrestore()) is a very expensive operation because it changes all
>flags and thus has very 'difficult' instruction dependencies, killing
>the front end reorder and generating a giant bubble in the pipeline.
>
>Similarly, I suppose PUSHF is an expensive instruction because it needs
>all the flags 'stable' and thus needs to wait for a fair number of prior
>instructions to retire before it can get on with it.
>
>Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
>because the latter only has dependencies on instructions that muck about
>with IF -- not that many.

ack.

In fact, it turns out that my Xen numbers for this patch were actually
native (popf) and not xen_restore_fl(), as the guest was using HVM and
not paravirt.

Thanks,
Davidlohr


Thread overview: 13+ messages
2018-07-20 17:29 [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Davidlohr Bueso
2018-07-20 17:29 ` [PATCH 1/2] fs/epoll: loosen irq safety in ep_scan_ready_list() Davidlohr Bueso
2018-07-20 17:29 ` [PATCH 2/2] fs/epoll: loosen irq safety in epoll_insert() and epoll_remove() Davidlohr Bueso
2018-07-20 19:42 ` [PATCH -next 0/2] fs/epoll: loosen irq safety when possible Andrew Morton
2018-07-20 20:05   ` Davidlohr Bueso
2018-07-20 20:44     ` Andrew Morton
2018-07-21  0:22       ` Davidlohr Bueso
2018-07-21 17:21       ` Davidlohr Bueso
2018-07-21 17:39         ` Peter Zijlstra
2018-07-21 18:31           ` Davidlohr Bueso
2018-07-24 23:43             ` Andrew Morton
2018-09-06 19:11     ` Peter Zijlstra
2018-09-06 20:55       ` Davidlohr Bueso
