linux-kernel.vger.kernel.org archive mirror
* Context switch latency in tickless isolated CPU
@ 2016-08-17  6:26 GeHao Kang
  2016-08-17 12:18 ` Chris Metcalf
  0 siblings, 1 reply; 11+ messages in thread
From: GeHao Kang @ 2016-08-17  6:26 UTC (permalink / raw)
  To: fweisbec, cmetcalf, linux-api, linux-kernel; +Cc: peterz, tglx, mingo, paulmck

Hi Frederic and Chris,

When lmbench runs on the tickless isolated CPU, the context switch
latency on that CPU is higher than on the other CPUs. The test
platform is Linux 4.4.12 with NO_HZ_FULL on an i.MX6Q SABRE SD board.
The following are the lmbench results for context switching:

lmbench runs on nonspecific CPU:
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
imx6qsabr Linux 4.4.12-   12.6   12.8   16.1   26.6   42.1    36.5    70.0

lmbench runs on the isolated CPU:
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
imx6qsabr Linux 4.4.12-   17.7   21.9   27.6   42.0   40.3    44.0    77.1

From these results, only the 8p/64K case has lower latency on the
isolated CPU; all the other cases are slower.

To investigate the cause, I used the kernel event tracer and found
that the context_tracking events user_enter and user_exit occur on the
tickless isolated CPU. These two events mean that the CPU enters and
exits the RCU extended quiescent state. In addition, the execution
times of these two events, measured with ktime, are 3 us and 2 us
respectively. Is this the reason why context switches have higher
latency on the tickless isolated CPU?

Thanks,

Regards,
- Kang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Context switch latency in tickless isolated CPU
  2016-08-17  6:26 Context switch latency in tickless isolated CPU GeHao Kang
@ 2016-08-17 12:18 ` Chris Metcalf
  2016-08-18  3:25   ` GeHao Kang
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Metcalf @ 2016-08-17 12:18 UTC (permalink / raw)
  To: GeHao Kang, fweisbec, linux-api, linux-kernel
  Cc: peterz, tglx, mingo, paulmck

On 8/17/2016 2:26 AM, GeHao Kang wrote:
> To investigate the cause, I used the kernel event tracer and found
> that the context_tracking events user_enter and user_exit occur on the
> tickless isolated CPU. These two events mean that the CPU enters and
> exits the RCU extended quiescent state. In addition, the execution
> times of these two events, measured with ktime, are 3 us and 2 us
> respectively. Is this the reason why context switches have higher
> latency on the tickless isolated CPU?

The increased context switch time is likely from the increased
time to return from the kernel to userspace, due to ensuring
that various things in the kernel are quiesced.

Of course I'm sure it goes without saying that context switch
time is probably near the absolute bottom of things that
we care about as a metric for task isolation, since when you
are using it as designed, you never actually context switch.
But that said, it's always good to quantify what the overheads
are, so thanks.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

* Re: Context switch latency in tickless isolated CPU
  2016-08-17 12:18 ` Chris Metcalf
@ 2016-08-18  3:25   ` GeHao Kang
  2016-08-19 12:34     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: GeHao Kang @ 2016-08-18  3:25 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: fweisbec, linux-api, linux-kernel, peterz, tglx, mingo, paulmck

Hi Chris,

Thanks for your reply.

Is the increased time a fixed cost for every context switch? Because
this increase adds directly to the latency of our real-time
application, we would like to confirm it. Thanks.


Regards,
- Kang



On Wed, Aug 17, 2016 at 8:18 PM, Chris Metcalf <cmetcalf@mellanox.com> wrote:
> On 8/17/2016 2:26 AM, GeHao Kang wrote:
>>
>> To investigate the cause, I used the kernel event tracer and found
>> that the context_tracking events user_enter and user_exit occur on the
>> tickless isolated CPU. These two events mean that the CPU enters and
>> exits the RCU extended quiescent state. In addition, the execution
>> times of these two events, measured with ktime, are 3 us and 2 us
>> respectively. Is this the reason why context switches have higher
>> latency on the tickless isolated CPU?
>
>
> The increased context switch time is likely from the increased
> time to return from the kernel to userspace, due to ensuring
> that various things in the kernel are quiesced.
>
> Of course I'm sure it goes without saying that context switch
> time is probably near the absolute bottom of things that
> we care about as a metric for task isolation, since when you
> are using it as designed, you never actually context switch.
> But that said, it's always good to quantify what the overheads
> are, so thanks.
>
> --
> Chris Metcalf, Mellanox Technologies
> http://www.mellanox.com
>

* Re: Context switch latency in tickless isolated CPU
  2016-08-18  3:25   ` GeHao Kang
@ 2016-08-19 12:34     ` Peter Zijlstra
  2016-08-21 11:26       ` GeHao Kang
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-08-19 12:34 UTC (permalink / raw)
  To: GeHao Kang
  Cc: Chris Metcalf, fweisbec, linux-api, linux-kernel, tglx, mingo, paulmck

On Thu, Aug 18, 2016 at 11:25:00AM +0800, GeHao Kang wrote:

> Is the increased time a fixed cost for every context switch? Because
> this increase adds directly to the latency of our real-time
> application, we would like to confirm it. Thanks.

Why are you wanting to use nohz_full if you do syscalls?

* Re: Context switch latency in tickless isolated CPU
  2016-08-19 12:34     ` Peter Zijlstra
@ 2016-08-21 11:26       ` GeHao Kang
  2016-08-21 14:53         ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: GeHao Kang @ 2016-08-21 11:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Metcalf, Frédéric Weisbecker, linux-api,
	linux-kernel, tglx, mingo, Paul McKenney

On Fri, Aug 19, 2016 at 8:34 PM, Peter Zijlstra <peterz@infradead.org> wrote:

> Why are you wanting to use nohz_full if you do syscalls?

We hope to reduce the tick overhead while real-time applications run,
and these applications might make some syscalls to operate I/O devices
such as EtherCAT.

* Re: Context switch latency in tickless isolated CPU
  2016-08-21 11:26       ` GeHao Kang
@ 2016-08-21 14:53         ` Paul E. McKenney
  2016-08-22  9:40           ` GeHao Kang
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2016-08-21 14:53 UTC (permalink / raw)
  To: GeHao Kang
  Cc: Peter Zijlstra, Chris Metcalf, Frédéric Weisbecker,
	linux-api, linux-kernel, tglx, mingo

On Sun, Aug 21, 2016 at 07:26:04PM +0800, GeHao Kang wrote:
> On Fri, Aug 19, 2016 at 8:34 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Why are you wanting to use nohz_full if you do syscalls?
> 
> We hope to reduce the tick overhead while real-time applications run,
> and these applications might make some syscalls to operate I/O devices
> such as EtherCAT.

If latency is all you care about, one approach is to map the device
registers into userspace and do the I/O without assistance from the
kernel.

Alternatively, use in-memory mailbox/queuing techniques to hand the
I/O off to some other thread.

							Thanx, Paul

* Re: Context switch latency in tickless isolated CPU
  2016-08-21 14:53         ` Paul E. McKenney
@ 2016-08-22  9:40           ` GeHao Kang
  2016-08-22 14:48             ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: GeHao Kang @ 2016-08-22  9:40 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, Chris Metcalf, Frédéric Weisbecker,
	linux-api, linux-kernel, tglx, mingo

On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> If latency is all you care about, one approach is to map the device
> registers into userspace and do the I/O without assistance from the
> kernel.
In addition to the context switch latency, local interrupts are also
disabled during the user_enter and user_exit paths of context
tracking. Therefore, the interrupt latency might also increase on the
isolated tickless CPU, which would degrade real-time performance. Is
the duration of these two events deterministic?

Thanks,
Kang

* Re: Context switch latency in tickless isolated CPU
  2016-08-22  9:40           ` GeHao Kang
@ 2016-08-22 14:48             ` Paul E. McKenney
  2016-08-22 15:12               ` Mark Hounschell
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2016-08-22 14:48 UTC (permalink / raw)
  To: GeHao Kang
  Cc: Peter Zijlstra, Chris Metcalf, Frédéric Weisbecker,
	linux-api, linux-kernel, tglx, mingo

On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:
> On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > If latency is all you care about, one approach is to map the device
> > registers into userspace and do the I/O without assistance from the
> > kernel.
> In addition to the context switch latency, local interrupts are also
> disabled during the user_enter and user_exit paths of context
> tracking. Therefore, the interrupt latency might also increase on the
> isolated tickless CPU, which would degrade real-time performance. Is
> the duration of these two events deterministic?

Hmmm...  Why would you be taking interrupts on your isolated tickless
CPUs?  Doesn't that defeat the purpose of designating them as isolated
and tickless?

The key point being that effective use of NO_HZ_FULL requires
careful configuration and complete understanding of your workload.
And it is quite possible that you instead need to use something
other than NO_HZ_FULL.

If your question is instead "why must interrupts be disabled during
context tracking", I must defer to people who understand the x86
entry/exit code paths better than I do.

							Thanx, Paul

* Re: Context switch latency in tickless isolated CPU
  2016-08-22 14:48             ` Paul E. McKenney
@ 2016-08-22 15:12               ` Mark Hounschell
  2016-08-22 15:37                 ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Hounschell @ 2016-08-22 15:12 UTC (permalink / raw)
  To: paulmck, GeHao Kang
  Cc: Peter Zijlstra, Chris Metcalf, Frédéric Weisbecker,
	linux-api, linux-kernel, tglx, mingo

On 08/22/2016 10:48 AM, Paul E. McKenney wrote:
> On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:
>> On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>>> If latency is all you care about, one approach is to map the device
>>> registers into userspace and do the I/O without assistance from the
>>> kernel.
>> In addition to the context switch latency, local interrupts are also
>> disabled during the user_enter and user_exit paths of context
>> tracking. Therefore, the interrupt latency might also increase on the
>> isolated tickless CPU, which would degrade real-time performance. Is
>> the duration of these two events deterministic?
>
> Hmmm...  Why would you be taking interrupts on your isolated tickless
> CPUs?  Doesn't that defeat the purpose of designating them as isolated
> and tickless?
>

Don't mean to butt in here, but think about a "special" PCI card that 
does nothing but take one or more external interrupts from an outside 
source, where what matters is the latency between the time the event 
occurs on the outside and the time the isolated processor can act on 
it. The IRQ of that card is also pinned/isolated to that processor. 
This is a very common thing in the RT world.

Mark

> The key point being that effective use of NO_HZ_FULL requires
> careful configuration and complete understanding of your workload.
> And it is quite possible that you instead need to use something
> other than NO_HZ_FULL.
>
> If your question is instead "why must interrupts be disabled during
> context tracking", I must defer to people who understand the x86
> entry/exit code paths better than I do.
>
> 							Thanx, Paul
>
>

* Re: Context switch latency in tickless isolated CPU
  2016-08-22 15:12               ` Mark Hounschell
@ 2016-08-22 15:37                 ` Paul E. McKenney
  2016-08-22 16:35                   ` Mark Hounschell
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2016-08-22 15:37 UTC (permalink / raw)
  To: Mark Hounschell
  Cc: GeHao Kang, Peter Zijlstra, Chris Metcalf,
	Frédéric Weisbecker, linux-api, linux-kernel, tglx,
	mingo

On Mon, Aug 22, 2016 at 11:12:45AM -0400, Mark Hounschell wrote:
> On 08/22/2016 10:48 AM, Paul E. McKenney wrote:
> >On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:
> >>On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
> >><paulmck@linux.vnet.ibm.com> wrote:
> >>>If latency is all you care about, one approach is to map the device
> >>>registers into userspace and do the I/O without assistance from the
> >>>kernel.
> >>In addition to the context switch latency, local interrupts are also
> >>disabled during the user_enter and user_exit paths of context
> >>tracking. Therefore, the interrupt latency might also increase on the
> >>isolated tickless CPU, which would degrade real-time performance. Is
> >>the duration of these two events deterministic?
> >
> >Hmmm...  Why would you be taking interrupts on your isolated tickless
> >CPUs?  Doesn't that defeat the purpose of designating them as isolated
> >and tickless?
> 
> Don't mean to butt in here, but think about a "special" PCI card that
> does nothing but take one or more external interrupts from an outside
> source, where what matters is the latency between the time the event
> occurs on the outside and the time the isolated processor can act on
> it. The IRQ of that card is also pinned/isolated to that processor.
> This is a very common thing in the RT world.

In this case, the host OS would see an event-driven real-time workload
from the PCI card, which would lead me to suggest -not- using NO_HZ_FULL
on the host OS.

Of course, if you are instead building an OS to run on the PCI card
itself, then the choice of configuration would depend on how the PCI
card was set up.  If it polled hardware, then NO_HZ_FULL on the PCI card
might work quite well.  But then you wouldn't have interrupts (on the
PCI card), so I am guessing that you mean the scenario covered in the
first paragraph.

Or am I missing your point?

							Thanx, Paul

> Mark
> 
> >The key point being that effective use of NO_HZ_FULL requires
> >careful configuration and complete understanding of your workload.
> >And it is quite possible that you instead need to use something
> >other than NO_HZ_FULL.
> >
> >If your question is instead "why must interrupts be disabled during
> >context tracking", I must defer to people who understand the x86
> >entry/exit code paths better than I do.
> >
> >							Thanx, Paul
> >
> >
> 

* Re: Context switch latency in tickless isolated CPU
  2016-08-22 15:37                 ` Paul E. McKenney
@ 2016-08-22 16:35                   ` Mark Hounschell
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Hounschell @ 2016-08-22 16:35 UTC (permalink / raw)
  To: paulmck
  Cc: GeHao Kang, Peter Zijlstra, Chris Metcalf,
	Frédéric Weisbecker, linux-api, linux-kernel, tglx,
	mingo

On 08/22/2016 11:37 AM, Paul E. McKenney wrote:
> On Mon, Aug 22, 2016 at 11:12:45AM -0400, Mark Hounschell wrote:
>> On 08/22/2016 10:48 AM, Paul E. McKenney wrote:
>>> On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:
>>>> On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
>>>> <paulmck@linux.vnet.ibm.com> wrote:
>>>>> If latency is all you care about, one approach is to map the device
>>>>> registers into userspace and do the I/O without assistance from the
>>>>> kernel.
>>>> In addition to the context switch latency, local interrupts are also
>>>> disabled during the user_enter and user_exit paths of context
>>>> tracking. Therefore, the interrupt latency might also increase on the
>>>> isolated tickless CPU, which would degrade real-time performance. Is
>>>> the duration of these two events deterministic?
>>>
>>> Hmmm...  Why would you be taking interrupts on your isolated tickless
>>> CPUs?  Doesn't that defeat the purpose of designating them as isolated
>>> and tickless?
>>
>> Don't mean to butt in here, but think about a "special" PCI card that
>> does nothing but take one or more external interrupts from an outside
>> source, where what matters is the latency between the time the event
>> occurs on the outside and the time the isolated processor can act on
>> it. The IRQ of that card is also pinned/isolated to that processor.
>> This is a very common thing in the RT world.
>
> In this case, the host OS would see an event-driven real-time workload
> from the PCI card, which would lead me to suggest -not- using NO_HZ_FULL
> on the host OS.
>
> Of course, if you are instead building an OS to run on the PCI card
> itself, then the choice of configuration would depend on how the PCI
> card was set up.  If it polled hardware, then NO_HZ_FULL on the PCI card
> might work quite well.  But then you wouldn't have interrupts (on the
> PCI card), so I am guessing that you mean the scenario covered in the
> first paragraph.
>
> Or am I missing your point?
>
> 							Thanx, Paul
>

The first-paragraph scenario is the one I was referring to.

Thanks
Mark

>> Mark
>>
>>> The key point being that effective use of NO_HZ_FULL requires
>>> careful configuration and complete understanding of your workload.
>>> And it is quite possible that you instead need to use something
>>> other than NO_HZ_FULL.
>>>
>>> If your question is instead "why must interrupts be disabled during
>>> context tracking", I must defer to people who understand the x86
>>> entry/exit code paths better than I do.
>>>
>>> 							Thanx, Paul
>>>
>>>
>>
>
>


Thread overview: 11+ messages
2016-08-17  6:26 Context switch latency in tickless isolated CPU GeHao Kang
2016-08-17 12:18 ` Chris Metcalf
2016-08-18  3:25   ` GeHao Kang
2016-08-19 12:34     ` Peter Zijlstra
2016-08-21 11:26       ` GeHao Kang
2016-08-21 14:53         ` Paul E. McKenney
2016-08-22  9:40           ` GeHao Kang
2016-08-22 14:48             ` Paul E. McKenney
2016-08-22 15:12               ` Mark Hounschell
2016-08-22 15:37                 ` Paul E. McKenney
2016-08-22 16:35                   ` Mark Hounschell
