* [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
@ 2017-09-25 14:49 Arnaldo Carvalho de Melo
  2017-09-25 19:45 ` Arnaldo Carvalho de Melo
       [not found] ` <20170925144949.GP29668-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 2 replies; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-25 14:49 UTC (permalink / raw)
  To: Mike Marciniszyn, Dennis Dalessandro
  Cc: Thomas Gleixner, Clark Williams, Dean Luick, Doug Ledford,
	Jubin John, Kaike Wan, Leon Romanovsky, Peter Zijlstra,
	Sebastian Andrzej Siewior, Sebastian Sanchez, Steven Rostedt,
	linux-rt-users, linux-rdma

Hi,

	I'm trying to get an Infiniband test case working with the RT
kernel, and ended up tripping over this case:

	In drivers/infiniband/hw/hfi1/pio.c sc_buffer_alloc() disables
preemption that will be reenabled by either pio_copy() or
seg_pio_copy_end().

	But before disabling preemption it grabs a spin lock that will
be dropped after it disables preemption, which ends up triggering a
warning in migrate_disable() later on.

  spin_lock_irqsave(&sc->alloc_lock)
    migrate_disable() ++p->migrate_disable -> 2
  preempt_disable()
  spin_unlock_irqrestore(&sc->alloc_lock)
    migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
  spin_lock_irqsave(some other lock) -> b00m

    And the WARN_ON code ends up tripping over this over and over in
    log_store().
    
    Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
    command.
    
    [512258.613862] sm-3297 16 .....11 359465349134644: sc_buffer_alloc <-hfi1_verbs_send_pio
    [512258.613876] sm-3297 16 .....11 359465349134719: migrate_disable <-sc_buffer_alloc
    [512258.613890] sm-3297 16 .....12 359465349134798: rt_spin_lock <-sc_buffer_alloc
    [512258.613903] sm-3297 16 ....112 359465349135481: rt_spin_unlock <-sc_buffer_alloc
    [512258.613916] sm-3297 16 ....112 359465349135556: migrate_enable <-sc_buffer_alloc
    [512258.613935] sm-3297 16 ....112 359465349135788: seg_pio_copy_start <-hfi1_verbs_send_pio
    [512258.613954] sm-3297 16 ....112 359465349136273: update_sge <-hfi1_verbs_send_pio
    [512258.613981] sm-3297 16 ....112 359465349136373: seg_pio_copy_mid <-hfi1_verbs_send_pio
    [512258.613999] sm-3297 16 ....112 359465349136873: update_sge <-hfi1_verbs_send_pio
    [512258.614017] sm-3297 16 ....112 359465349136956: seg_pio_copy_mid <-hfi1_verbs_send_pio
    [512258.614035] sm-3297 16 ....112 359465349137221: seg_pio_copy_end <-hfi1_verbs_send_pio
    [512258.614048] sm-3297 16 .....12 359465349137360: migrate_disable <-hfi1_verbs_send_pio
    [512258.614065] sm-3297 16 .....12 359465349137476: warn_slowpath_null <-migrate_disable
    [512258.614081] sm-3297 16 .....12 359465349137564: __warn <-warn_slowpath_null
    [512258.614088] sm-3297 16 .....12 359465349137958: printk <-__warn
    [512258.614096] sm-3297 16 .....12 359465349138055: vprintk_default <-printk
    [512258.614104] sm-3297 16 .....12 359465349138144: vprintk_emit <-vprintk_default
    [512258.614111] sm-3297 16 d....12 359465349138312: _raw_spin_lock <-vprintk_emit
    [512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
    [512258.614127] sm-3297 16 .....12 359465349139068: migrate_disable <-vprintk_emit

I'm wondering if turning this sc->alloc_lock into a raw_spin_lock is the
right solution. I'm afraid it's not, as there are places where it is
held and then the code goes on to grab other non-raw spinlocks...

I got this patch in my test branch and it makes the test case go further
before splatting on other problems with infiniband + PREEMPT_RT_FULL,
but as I said, I fear it's not the right solution. Ideas?

The kernel I'm seeing this on is RHEL's + the PREEMPT_RT_FULL patch:

Linux version 3.10.0-709.rt56.636.test.el7.x86_64 (acme@seventh) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #
1 SMP PREEMPT RT Wed Sep 20 18:04:55 -03 2017

I will try to build with the latest PREEMPT_RT_FULL patch, but the
infiniband codebase in RHEL seems to be up to date with what is upstream,
and I just looked at patches-4.11.12-rt14/add_migrate_disable.patch and
that WARN_ON_ONCE(p->migrate_disable_atomic) is still there :-\

- Arnaldo

commit 7ec7d80c7f46bb04da5f39836096de4c0ddde71a
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Wed Sep 6 10:30:08 2017 -0300

    infiniband: Convert per-NUMA send_context->alloc_lock to a raw spinlock

diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
index 615be68e40b3..8f28f8fe842d 100644
--- a/drivers/infiniband/hw/hfi1/pio.c
+++ b/drivers/infiniband/hw/hfi1/pio.c
@@ -744,7 +744,7 @@ struct send_context *sc_alloc(struct hfi1_devdata *dd, int type,
 	sc->dd = dd;
 	sc->node = numa;
 	sc->type = type;
-	spin_lock_init(&sc->alloc_lock);
+	raw_spin_lock_init(&sc->alloc_lock);
 	spin_lock_init(&sc->release_lock);
 	spin_lock_init(&sc->credit_ctrl_lock);
 	INIT_LIST_HEAD(&sc->piowait);
@@ -929,13 +929,13 @@ void sc_disable(struct send_context *sc)
 		return;
 
 	/* do all steps, even if already disabled */
-	spin_lock_irqsave(&sc->alloc_lock, flags);
+	raw_spin_lock_irqsave(&sc->alloc_lock, flags);
 	reg = read_kctxt_csr(sc->dd, sc->hw_context, SC(CTRL));
 	reg &= ~SC(CTRL_CTXT_ENABLE_SMASK);
 	sc->flags &= ~SCF_ENABLED;
 	sc_wait_for_packet_egress(sc, 1);
 	write_kctxt_csr(sc->dd, sc->hw_context, SC(CTRL), reg);
-	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+	raw_spin_unlock_irqrestore(&sc->alloc_lock, flags);
 
 	/*
 	 * Flush any waiters.  Once the context is disabled,
@@ -1232,7 +1232,7 @@ int sc_enable(struct send_context *sc)
 	 * worry about locking since the releaser will not do anything
 	 * if the context accounting values have not changed.
 	 */
-	spin_lock_irqsave(&sc->alloc_lock, flags);
+	raw_spin_lock_irqsave(&sc->alloc_lock, flags);
 	sc_ctrl = read_kctxt_csr(dd, sc->hw_context, SC(CTRL));
 	if ((sc_ctrl & SC(CTRL_CTXT_ENABLE_SMASK)))
 		goto unlock; /* already enabled */
@@ -1303,7 +1303,7 @@ int sc_enable(struct send_context *sc)
 	sc->flags |= SCF_ENABLED;
 
 unlock:
-	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+	raw_spin_unlock_irqrestore(&sc->alloc_lock, flags);
 
 	return ret;
 }
@@ -1361,9 +1361,9 @@ void sc_stop(struct send_context *sc, int flag)
 	sc->flags |= flag;
 
 	/* stop buffer allocations */
-	spin_lock_irqsave(&sc->alloc_lock, flags);
+	raw_spin_lock_irqsave(&sc->alloc_lock, flags);
 	sc->flags &= ~SCF_ENABLED;
-	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+	raw_spin_unlock_irqrestore(&sc->alloc_lock, flags);
 	wake_up(&sc->halt_wait);
 }
 
@@ -1391,9 +1391,9 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
 	int trycount = 0;
 	u32 head, next;
 
-	spin_lock_irqsave(&sc->alloc_lock, flags);
+	raw_spin_lock_irqsave(&sc->alloc_lock, flags);
 	if (!(sc->flags & SCF_ENABLED)) {
-		spin_unlock_irqrestore(&sc->alloc_lock, flags);
+		raw_spin_unlock_irqrestore(&sc->alloc_lock, flags);
 		goto done;
 	}
 
@@ -1402,7 +1402,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
 	if (blocks > avail) {
 		/* not enough room */
 		if (unlikely(trycount))	{ /* already tried to get more room */
-			spin_unlock_irqrestore(&sc->alloc_lock, flags);
+			raw_spin_unlock_irqrestore(&sc->alloc_lock, flags);
 			goto done;
 		}
 		/* copy from receiver cache line and recalculate */
@@ -1458,7 +1458,7 @@ struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
 	 */
 	smp_wmb();
 	sc->sr_head = next;
-	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+	raw_spin_unlock_irqrestore(&sc->alloc_lock, flags);
 
 	/* finish filling in the buffer outside the lock */
 	pbuf->start = sc->base_addr + fill_wrap * PIO_BLOCK_SIZE;
diff --git a/drivers/infiniband/hw/hfi1/pio.h b/drivers/infiniband/hw/hfi1/pio.h
index 867e5ffc3595..06dfc6f81fd5 100644
--- a/drivers/infiniband/hw/hfi1/pio.h
+++ b/drivers/infiniband/hw/hfi1/pio.h
@@ -112,7 +112,7 @@ struct send_context {
 	u8  group;			/* credit return group */
 
 	/* allocator fields */
-	spinlock_t alloc_lock ____cacheline_aligned_in_smp;
+	raw_spinlock_t alloc_lock ____cacheline_aligned_in_smp;
 	u32 sr_head;			/* shadow ring head */
 	unsigned long fill;		/* official alloc count */
 	unsigned long alloc_free;	/* copy of free (less cache thrash) */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
  2017-09-25 14:49 [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues Arnaldo Carvalho de Melo
@ 2017-09-25 19:45 ` Arnaldo Carvalho de Melo
       [not found] ` <20170925144949.GP29668-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  1 sibling, 0 replies; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-25 19:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Mike Marciniszyn, Dennis Dalessandro, Thomas Gleixner,
	Clark Williams, Dean Luick, Doug Ledford, Jubin John, Kaike Wan,
	Leon Romanovsky, Peter Zijlstra, Sebastian Sanchez,
	Steven Rostedt, linux-rt-users, linux-rdma

On Mon, Sep 25, 2017 at 11:49:49AM -0300, Arnaldo Carvalho de Melo wrote:
> 	I'm trying to get an Infiniband test case working with the RT
> kernel, and ended up tripping over this case:
> 
> 	In drivers/infiniband/hw/hfi1/pio.c sc_buffer_alloc() disables
> preemption that will be reenabled by either pio_copy() or
> seg_pio_copy_end().
> 
> 	But before disabling preemption it grabs a spin lock that will
> be dropped after it disables preemption, which ends up triggering a
> warning in migrate_disable() later on.
> 
>   spin_lock_irqsave(&sc->alloc_lock)
>     migrate_disable() ++p->migrate_disable -> 2
>   preempt_disable()
>   spin_unlock_irqrestore(&sc->alloc_lock)
>     migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
>   spin_lock_irqsave(some other lock) -> b00m

Sebastian, perhaps use local locks like you did for the random.c case?
I'll study to see if that is possible...

I mean this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/drivers/char/random.c?h=linux-4.11.y-rt&id=4bed11300e24d5178829758e535cc4996490b7c8

-------------
random: avoid preempt_disable()ed section
extract_crng() will use sleeping locks while in a preempt_disable()
section due to get_cpu_var().
Work around it with local_locks.

Cc: stable-rt@vger.kernel.org # where it applies to
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-------------

- Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
       [not found] ` <20170925144949.GP29668-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-09-25 20:15   ` Steven Rostedt
       [not found]     ` <20170925161528.52d34769-ZM9ACYiE99GSuEeoRQArULNAH6kLmebB@public.gmane.org>
  2017-09-25 21:14     ` Julia Cartwright
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2017-09-25 20:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Mike Marciniszyn, Dennis Dalessandro, Thomas Gleixner,
	Clark Williams, Dean Luick, Doug Ledford, Jubin John, Kaike Wan,
	Leon Romanovsky, Peter Zijlstra, Sebastian Andrzej Siewior,
	Sebastian Sanchez, linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, 25 Sep 2017 11:49:49 -0300
Arnaldo Carvalho de Melo <acme-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> Hi,
> 
> 	I'm trying to get an Infiniband test case working with the RT
> kernel, and ended up tripping over this case:
> 
> 	In drivers/infiniband/hw/hfi1/pio.c sc_buffer_alloc() disables
> preemption that will be reenabled by either pio_copy() or
> seg_pio_copy_end().
> 
> 	But before disabling preemption it grabs a spin lock that will
> be dropped after it disables preemption, which ends up triggering a
> warning in migrate_disable() later on.
> 
>   spin_lock_irqsave(&sc->alloc_lock)
>     migrate_disable() ++p->migrate_disable -> 2
>   preempt_disable()
>   spin_unlock_irqrestore(&sc->alloc_lock)
>     migrate_enable() in_atomic(), so just returns, migrate_disable stays at 2
>   spin_lock_irqsave(some other lock) -> b00m
> 
>     And the WARN_ON code ends up tripping over this over and over in
>     log_store().
>     
>     Sequence captured via ftrace_dump_on_oops + crash utility 'dmesg'
>     command.
>     
>     [512258.613862] sm-3297 16 .....11 359465349134644: sc_buffer_alloc <-hfi1_verbs_send_pio
>     [512258.613876] sm-3297 16 .....11 359465349134719: migrate_disable <-sc_buffer_alloc
>     [512258.613890] sm-3297 16 .....12 359465349134798: rt_spin_lock <-sc_buffer_alloc
>     [512258.613903] sm-3297 16 ....112 359465349135481: rt_spin_unlock <-sc_buffer_alloc
>     [512258.613916] sm-3297 16 ....112 359465349135556: migrate_enable <-sc_buffer_alloc
>     [512258.613935] sm-3297 16 ....112 359465349135788: seg_pio_copy_start <-hfi1_verbs_send_pio
>     [512258.613954] sm-3297 16 ....112 359465349136273: update_sge <-hfi1_verbs_send_pio
>     [512258.613981] sm-3297 16 ....112 359465349136373: seg_pio_copy_mid <-hfi1_verbs_send_pio
>     [512258.613999] sm-3297 16 ....112 359465349136873: update_sge <-hfi1_verbs_send_pio
>     [512258.614017] sm-3297 16 ....112 359465349136956: seg_pio_copy_mid <-hfi1_verbs_send_pio
>     [512258.614035] sm-3297 16 ....112 359465349137221: seg_pio_copy_end <-hfi1_verbs_send_pio
>     [512258.614048] sm-3297 16 .....12 359465349137360: migrate_disable <-hfi1_verbs_send_pio
>     [512258.614065] sm-3297 16 .....12 359465349137476: warn_slowpath_null <-migrate_disable
>     [512258.614081] sm-3297 16 .....12 359465349137564: __warn <-warn_slowpath_null
>     [512258.614088] sm-3297 16 .....12 359465349137958: printk <-__warn
>     [512258.614096] sm-3297 16 .....12 359465349138055: vprintk_default <-printk
>     [512258.614104] sm-3297 16 .....12 359465349138144: vprintk_emit <-vprintk_default
>     [512258.614111] sm-3297 16 d....12 359465349138312: _raw_spin_lock <-vprintk_emit
>     [512258.614119] sm-3297 16 d...112 359465349138789: log_store <-vprintk_emit
>     [512258.614127] sm-3297 16 .....12 359465349139068: migrate_disable <-vprintk_emit
> 
> I'm wondering if turning this sc->alloc_lock into a raw_spin_lock is the
> right solution. I'm afraid it's not, as there are places where it
> is held and then the code goes on to grab other non-raw spinlocks...

No, the correct solution is to convert the preempt_disable into a
local_lock(), which will be a preempt_disable when PREEMPT_RT is not
set. Look for other patches that convert preempt_disable() into
local_lock()s for examples.

-- Steve

> 
> I got this patch in my test branch and it makes the test case go further
> before splatting on other problems with infiniband + PREEMPT_RT_FULL,
> but as I said, I fear it's not the right solution. Ideas?
> 
> The kernel I'm seeing this on is RHEL's + the PREEMPT_RT_FULL patch:
> 
> Linux version 3.10.0-709.rt56.636.test.el7.x86_64 (acme@seventh) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #
> 1 SMP PREEMPT RT Wed Sep 20 18:04:55 -03 2017
> 
> I will try and build with the latest PREEMPT_RT_FULL patch, but the
> infiniband codebase in RHEL seems to be up to what is upstream and
> I just looked at patches-4.11.12-rt14/add_migrate_disable.patch and that
> WARN_ON_ONCE(p->migrate_disable_atomic) is still there :-\
> 
> - Arnaldo
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
       [not found] ` <20170925144949.GP29668-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-09-25 21:14     ` Julia Cartwright
  2017-09-25 21:14     ` Julia Cartwright
  1 sibling, 0 replies; 13+ messages in thread
From: Julia Cartwright @ 2017-09-25 21:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Mike Marciniszyn, Dennis Dalessandro, Thomas Gleixner,
	Clark Williams, Dean Luick, Doug Ledford, Jubin John, Kaike Wan,
	Leon Romanovsky, Peter Zijlstra, Sebastian Andrzej Siewior,
	Sebastian Sanchez, Steven Rostedt,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Sep 25, 2017 at 11:49:49AM -0300, Arnaldo Carvalho de Melo wrote:
> Hi,
> 
> 	I'm trying to get an Infiniband test case working with the RT
> kernel, and ended up tripping over this case:
> 
> 	In drivers/infiniband/hw/hfi1/pio.c sc_buffer_alloc() disables
> preemption that will be reenabled by either pio_copy() or
> seg_pio_copy_end().

Perhaps it's worth revisiting why this design decision was chosen in the
first place?  The introducing commit, a054374f15428cbc1
("staging/rdma/hfi1: convert buffers allocated atomic to per cpu")
mentions the previous atomic as being expensive, but the usage of
preempt_disable()/preempt_enable() is also expensive, albeit in a
different way.

My first question would be: is disabling preemption here really
necessary?

AFAICT, preemption is only disabled to protect the access of the
'buffers_allocated' percpu counters, nothing else.

However, because these counters are only observed in aggregate, across
CPUs (see get_buffers_allocated()), the sum will be the same, regardless
of whether a thread is migrated between a this_cpu_inc(buffers_allocated) and a
this_cpu_dec(buffers_allocated).

If that's the case, perhaps the fix is as easy as removing the
preempt_disable() and preempt_enable()?

   Julia

PS. Perhaps a longer term approach would be to investigate whether the
usage of percpu_ref or percpu_counter is a better fit for this driver.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
       [not found]     ` <20170925161528.52d34769-ZM9ACYiE99GSuEeoRQArULNAH6kLmebB@public.gmane.org>
@ 2017-09-26 13:15       ` Arnaldo Carvalho de Melo
       [not found]         ` <20170926131529.GB25735-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-26 13:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mike Marciniszyn, Dennis Dalessandro, Thomas Gleixner,
	Clark Williams, Dean Luick, Doug Ledford, Jubin John, Kaike Wan,
	Leon Romanovsky, Peter Zijlstra, Sebastian Andrzej Siewior,
	Sebastian Sanchez, linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Sep 25, 2017 at 04:15:28PM -0400, Steven Rostedt wrote:
> On Mon, 25 Sep 2017 11:49:49 -0300 Arnaldo Carvalho de Melo <acme-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > I'm wondering if turning this sc->alloc_lock into a raw_spin_lock is the
> > right solution. I'm afraid it's not, as there are places where it
> > is held and then the code goes on to grab other non-raw spinlocks...
 
> No, the correct solution is to convert the preempt_disable into a
> local_lock(), which will be a preempt_disable when PREEMPT_RT is not
> set. Look for other patches that convert preempt_disable() into
> local_lock()s for examples.

Thanks, I had seen a patch by Sebastian for random.c doing that, and
will continue in that direction.

- Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
  2017-09-25 21:14     ` Julia Cartwright
  (?)
@ 2017-09-26 14:10     ` Marciniszyn, Mike
       [not found]       ` <32E1700B9017364D9B60AED9960492BC3441B3BD-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
                         ` (2 more replies)
  -1 siblings, 3 replies; 13+ messages in thread
From: Marciniszyn, Mike @ 2017-09-26 14:10 UTC (permalink / raw)
  To: Julia Cartwright, Arnaldo Carvalho de Melo
  Cc: Dalessandro, Dennis, Thomas Gleixner, Clark Williams, Luick,
	Dean, Doug Ledford, Wan, Kaike, Leon Romanovsky, Peter Zijlstra,
	Sebastian Andrzej Siewior, Sanchez, Sebastian, Steven Rostedt,
	linux-rt-users, linux-rdma

> > Hi,
> >
> > 	I'm trying to get an Infiniband test case working with the RT
> > kernel, and ended over tripping over this case:
> >
> > 	In drivers/infiniband/hw/hfi1/pio.c sc_buffer_alloc() disables
> > preemption that will be reenabled by either pio_copy() or
> > seg_pio_copy_end().
> 
> Perhaps it's worth revisiting why this design decision was chosen in the
> first place?  The introducing commit, a054374f15428cbc1
> ("staging/rdma/hfi1: convert buffers allocated atomic to per cpu")
> mentions the previous atomic as being expensive, but the usage of
> preempt_disable()/preempt_enable() is also expensive, albeit in a
> different way.
> 
> My first question would be: is disabling preemption here really
> necessary?
> 

The motivation for the preempt manipulation has nothing to do with the counter.

The buffer allocation returns a set of ringed write-combining 64B MMIO buffers.   The allocate lock is then dropped right after allocation to allow parallel allocates.

The hardware keeps track of the buffer fill across multiple CPUs, so that after the oldest buffer is copied the ring is advanced.

The idea was to minimize the time from the drop of the allocate lock to the end of the copy to ensure the highest rate of copy to the ring from multiple cores.

> AFAICT, preemption is only disabled to protect the access of the
> 'buffers_allocated' percpu counters, nothing else.
>
> However, because these counters are only observed in aggregate, across
> CPUs (see get_buffers_allocated()), the sum will be the same, regardless
> of a thread is migrated between a this_cpu_inc(buffers_allocated) and a
> this_cpu_dec(buffers_allocated).
> 

The counter exists purely as a hot path optimization to avoid an atomic.

The only time an accurate counter is required is during reset sequences, to wait for allocate/copy pairs to finish, and we don't care whether the closing decrement lands on the same core.

The percpu refcount might be a better choice, but I don't think it existed at the time the code was written.

> If that's the case, perhaps the fix is as easy as removing the
> preempt_disable() and preempt_enable()?
> 

That would certainly be the easiest solution, but would compromise performance if a migration occurs after the allocate lock is dropped.

We have been looking at the patch that was submitted for using the raw spin lock and we share Arnaldo's concern, particularly about mixing lock types.

Other locks can be acquired in the timeout case in sc_wait_for_packet_egress(), inside queue_work() and dd_dev_err()/printk().

Mike
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
       [not found]       ` <32E1700B9017364D9B60AED9960492BC3441B3BD-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-09-26 14:56         ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2017-09-26 14:56 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Julia Cartwright, Arnaldo Carvalho de Melo, Dalessandro, Dennis,
	Thomas Gleixner, Clark Williams, Luick, Dean, Doug Ledford, Wan,
	Kaike, Leon Romanovsky, Peter Zijlstra,
	Sebastian Andrzej Siewior, Sanchez, Sebastian,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, 26 Sep 2017 14:10:52 +0000
"Marciniszyn, Mike" <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> > My first question would be: is disabling preemption here really
> > necessary?
> >   
> 
> The motivation for the preempt manipulation has nothing to do with
> the counter.
> 
> The buffer allocation returns a set of ringed write-combining 64B
> MMIO buffers.   The allocate lock is then dropped right after
> allocation to allow parallel allocates.
> 
> The hardware keeps track of the buffer fill across multiple CPUs, so
> that after the oldest buffer is copied the ring is advanced.
> 
> The idea was to minimize the time from the drop of the allocate lock
> > to the end of the copy to ensure the highest rate of copy to the ring
> from multiple cores.
> 
> > AFAICT, preemption is only disabled to protect the access of the
> > 'buffers_allocated' percpu counters, nothing else.
> >
> > However, because these counters are only observed in aggregate,
> > across CPUs (see get_buffers_allocated()), the sum will be the
> > same, regardless of whether a thread is migrated between a
> > this_cpu_inc(buffers_allocated) and a
> > this_cpu_dec(buffers_allocated). 
> 
> The counter exists purely as a hot path optimization to avoid an
> atomic.
> 
> The only time an accurate counter is required is during reset
> sequences to wait for allocate/copy pairs to finish and we don't care
> if the closing decrement is on the same core.
> 
> The percpu refcount might be a better choice, but I don't think it
> existed at the time the code was written.
> 
> > If that's the case, perhaps the fix is as easy as removing the
> > preempt_disable() and preempt_enable()?
> >   
> 
> That would certainly be the easiest solution, but would compromise
> performance if a migration occurs after the allocate lock is dropped.

local_lock() disables migration. It's currently only in the -rt patch.
It's made to allow preemption to occur as much as possible, as -rt
cares about reaction time more than performance (although we would
love to keep performance as well).

> 
> We have been looking at the patch that was submitted for using the
> raw spin lock and we share Arnaldo's concern, particularly mixing
> lock types.
> 
> Other locks can be procured in the timeout case in
> sc_wait_for_packet_egress() inside queue_work() and
> dd_dev_err()/printk().
> 

A raw_spin_lock sacrifices reaction time, which goes against the goal
of -rt. When PREEMPT_RT is disabled, the local_lock becomes
preempt_disable, so it has no effect on the vanilla kernel.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
       [not found]         ` <20170926131529.GB25735-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-09-26 17:55           ` Marciniszyn, Mike
  2017-09-26 18:28             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 13+ messages in thread
From: Marciniszyn, Mike @ 2017-09-26 17:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Steven Rostedt
  Cc: Dalessandro, Dennis, Thomas Gleixner, Clark Williams, Luick,
	Dean, Doug Ledford, Wan, Kaike, Leon Romanovsky, Peter Zijlstra,
	Sebastian Andrzej Siewior, Sanchez, Sebastian,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

> > No, the correct solution is to convert the preempt_disable into a
> > local_lock(), which will be a preempt_disable when PREEMPT_RT is not
> > set. Look for other patches that convert preempt_disable() into
> > local_lock()s for examples.
> 
> Thanks, I had seen a patch by Sebastian for random.c doing that,
> will continue in that direction.
> 

I'm assuming you want to end up with common driver code base?

Will there eventually be a wrapper upstream to allow that?

Or will the rt patch handle those issues?

Mike

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
  2017-09-26 17:55           ` Marciniszyn, Mike
@ 2017-09-26 18:28             ` Arnaldo Carvalho de Melo
  2017-09-26 21:11               ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-26 18:28 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Steven Rostedt, Dalessandro, Dennis, Thomas Gleixner,
	Clark Williams, Luick, Dean, Doug Ledford, Wan, Kaike,
	Leon Romanovsky, Peter Zijlstra, Sebastian Andrzej Siewior,
	Sanchez, Sebastian, linux-rt-users, linux-rdma

On Tue, Sep 26, 2017 at 05:55:45PM +0000, Marciniszyn, Mike wrote:
> > > No, the correct solution is to convert the preempt_disable into a
> > > local_lock(), which will be a preempt_disable when PREEMPT_RT is not
> > > set. Look for other patches that convert preempt_disable() into
> > > local_lock()s for examples.
> > 
> > Thanks, I had seen a patch by Sebastian for random.c doing that,
> > will continue in that direction.
> > 
> 
> I'm assuming you want to end up with common driver code base?
> 
> Will there eventually be a wrapper upstream to allow that?
> 
> Or will the rt patch handle those issues?

So what happens is that infrastructure in the rt patch eventually lands
upstream, at which point this difference ceases to exist.

Steven, are there plans for local locks to go upstream?

Well, back to trying to learn about them and to use them in this case
to see how it ends up...

- Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
  2017-09-26 14:10     ` Marciniszyn, Mike
       [not found]       ` <32E1700B9017364D9B60AED9960492BC3441B3BD-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-09-26 21:00       ` Julia Cartwright
  2017-10-06  9:23       ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 13+ messages in thread
From: Julia Cartwright @ 2017-09-26 21:00 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Arnaldo Carvalho de Melo, Dalessandro, Dennis, Thomas Gleixner,
	Clark Williams, Luick, Dean, Doug Ledford, Wan, Kaike,
	Leon Romanovsky, Peter Zijlstra, Sebastian Andrzej Siewior,
	Sanchez, Sebastian, Steven Rostedt, linux-rt-users, linux-rdma

On Tue, Sep 26, 2017 at 02:10:52PM +0000, Marciniszyn, Mike wrote:
> > > Hi,
> > >
> > > 	I'm trying to get an Infiniband test case working with the RT
> > > kernel, and ended up tripping over this case:
> > >
> > > 	In drivers/infiniband/hw/hfi1/pio.c sc_buffer_alloc() disables
> > > preemption that will be reenabled by either pio_copy() or
> > > seg_pio_copy_end().
> > 
> > Perhaps it's worth revisiting why this design decision was made in the
> > first place?  The introducing commit, a054374f15428cbc1
> > ("staging/rdma/hfi1: convert buffers allocated atomic to per cpu")
> > mentions the previous atomic as being expensive, but the usage of
> > preempt_disable()/preempt_enable() is also expensive, albeit in a
> > different way.
> > 
> > My first question would be: is disabling preemption here really
> > necessary?
> > 
>
> The motivation for the preempt manipulation has nothing to do with the counter.
>
> The buffer allocation returns a set of ringed write-combining 64B MMIO
> buffers.   The allocate lock is then dropped right after allocation to
> allow parallel allocates.
>
> The hardware keeps track of the buffer fill across multiple CPUs, so
> that after the oldest buffer is copied the ring is advanced.
>
> The idea was to minimize the time from the drop of the allocate lock
> to the end of the copy to ensure the highest rate of copy to the ring
> from multiple cores.

Okay, so to be clear, disabling preemption here is strictly a
performance consideration, not a correctness one?

> > If that's the case, perhaps the fix is as easy as removing the
> > preempt_disable() and preempt_enable()?
> > 
> 
> That would certainly be the easiest solution, but would compromise
> performance if a migration occurs after the allocate lock is dropped.

Given this response, I'm assuming that to be the case.

Disabling preemption to avoid "compromising performance" presumes a
very specific definition of "performance" that does not apply _in
general_, and certainly does not apply for RT users, who care more for
"response time" (as Steven put it).

Instead, the kernel already directly provides tools for _users_ to make
decisions about what the relative portions of their workloads are,
namely scheduling parameters, CPU affinities, isolcpus, etc.  Why are
these not sufficient in this case?

In addition, the use of local locks is a waste here; they provide more
guarantees than are actually necessary.  This driver will function fine
if two or more threads on the same CPU are within this "critical
section".  If you want to maintain preempt_disable()/_enable() in
non-RT, then whatever, use the preempt_disable_nort()/_enable_nort()
varieties.
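
(For concreteness, the _nort variants in the RT patch are defined roughly
as below -- sketched from memory, so exact definitions may differ by RT
series version: on !RT they are the ordinary preempt_disable()/
preempt_enable(), on RT they compile away to a barrier.)

```c
/* Sketch of the RT-patch _nort helpers (see include/linux/preempt.h in
 * the RT series; reproduced from memory, may differ by version). */
#ifdef CONFIG_PREEMPT_RT_FULL
# define preempt_disable_nort()		barrier()
# define preempt_enable_nort()		barrier()
#else
# define preempt_disable_nort()		preempt_disable()
# define preempt_enable_nort()		preempt_enable()
#endif
```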

   Julia


* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
  2017-09-26 18:28             ` Arnaldo Carvalho de Melo
@ 2017-09-26 21:11               ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2017-09-26 21:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Thomas Gleixner,
	Clark Williams, Luick, Dean, Doug Ledford, Wan, Kaike,
	Leon Romanovsky, Peter Zijlstra, Sebastian Andrzej Siewior,
	Sanchez, Sebastian, linux-rt-users, linux-rdma

On Tue, 26 Sep 2017 15:28:00 -0300
Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> So what happens is that infrastructure in the rt patch eventually lands
> upstream, then this difference ceases to exist.
> 
> Steven, are there plans for local locks to go upstream?
> 

Yes, but they don't make sense unless we get PREEMPT_RT (sleeping
spinlocks) upstream. Otherwise, they will just be another name for
preempt_disable().

That said, there is some rationale for getting them upstream before
sleeping spinlocks, and that is to annotate what a preempt_disable() is
trying to protect.

-- Steve


* Re: [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues
  2017-09-26 14:10     ` Marciniszyn, Mike
       [not found]       ` <32E1700B9017364D9B60AED9960492BC3441B3BD-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2017-09-26 21:00       ` Julia Cartwright
@ 2017-10-06  9:23       ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-10-06  9:23 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Julia Cartwright, Arnaldo Carvalho de Melo, Dalessandro, Dennis,
	Thomas Gleixner, Clark Williams, Luick, Dean, Doug Ledford, Wan,
	Kaike, Leon Romanovsky, Peter Zijlstra, Sanchez, Sebastian,
	Steven Rostedt, linux-rt-users, linux-rdma

On 2017-09-26 14:10:52 [+0000], Marciniszyn, Mike wrote:
> > My first question would be: is disabling preemption here really
> > necessary?
> > 
> 
> The motivation for the preempt manipulation has nothing to do with the counter.
> 
> The buffer allocation returns a set of ringed write-combining 64B MMIO buffers.   The allocate lock is then dropped right after allocation to allow parallel allocates.
> 
> The hardware keeps track of the buffer fill across multiple CPUs, so that after the oldest buffer is copied the ring is advanced.
> 
> The idea was to minimize the time from the drop of the allocate lock to the end of the copy to ensure the highest rate of copy to the ring from multiple cores.

So on -RT we would like to get rid of that preempt_disable() section. Do
you have any numbers that say that this does any good?
If so, by how much, and how much does it differ compared to a !PREEMPT
kernel, if I may ask?

Sebastian


end of thread, other threads:[~2017-10-06  9:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-25 14:49 [RFC+PATCH] Infiniband hfi1 + PREEMPT_RT_FULL issues Arnaldo Carvalho de Melo
2017-09-25 19:45 ` Arnaldo Carvalho de Melo
     [not found] ` <20170925144949.GP29668-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-09-25 20:15   ` Steven Rostedt
     [not found]     ` <20170925161528.52d34769-ZM9ACYiE99GSuEeoRQArULNAH6kLmebB@public.gmane.org>
2017-09-26 13:15       ` Arnaldo Carvalho de Melo
     [not found]         ` <20170926131529.GB25735-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-09-26 17:55           ` Marciniszyn, Mike
2017-09-26 18:28             ` Arnaldo Carvalho de Melo
2017-09-26 21:11               ` Steven Rostedt
2017-09-25 21:14   ` Julia Cartwright
2017-09-25 21:14     ` Julia Cartwright
2017-09-26 14:10     ` Marciniszyn, Mike
     [not found]       ` <32E1700B9017364D9B60AED9960492BC3441B3BD-RjuIdWtd+YbTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-09-26 14:56         ` Steven Rostedt
2017-09-26 21:00       ` Julia Cartwright
2017-10-06  9:23       ` Sebastian Andrzej Siewior
