linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers
@ 2019-02-14 13:37 Juri Lelli
  2019-02-14 13:37 ` [RFC PATCH RT 1/2] time/hrtimer: Add PINNED_HARD mode for realtime hrtimers Juri Lelli
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Juri Lelli @ 2019-02-14 13:37 UTC (permalink / raw)
  To: tglx, bigeasy
  Cc: linux-rt-users, peterz, linux-kernel, bristot, williams, Juri Lelli

Hi,

I've been experimenting with 4.19.15-rt12 and I found out that, while
running cyclictest on isolated CPUs, the following thing, where CPU0 is
one of the housekeeping CPUs and CPU2 is isolated, was happening:

     <idle>-0     [000] ... hrtimer_cancel:       hrtimer=0xffffb4a74be7fe70
     <idle>-0     [000] ... hrtimer_expire_entry: hrtimer=0xffffb4a74be7fe70 now=144805770984 function=hrtimer_wakeup/0x0
     <idle>-0     [000] ... sched_wakeup:         cyclictest:1171 [4] success=1 CPU:002
     <idle>-0     [000] ... hrtimer_expire_exit:  hrtimer=0xffffb4a74be7fe70
     <idle>-0     [002] ... sched_switch:         swapper/2:0 [120] R ==> cyclictest:1171 [4]
 cyclictest-1171  [002] ... hrtimer_init:         hrtimer=0xffffb4a74be7fe70 clockid=CLOCK_MONOTONIC mode=0x8
 cyclictest-1171  [002] ... hrtimer_start:        hrtimer=0xffffb4a74be7fe70 function=hrtimer_wakeup/0x0 ...
 cyclictest-1171  [002] ... sched_switch:         cyclictest:1171 [4] S ==> swapper/2:0 [120]

While cyclitest was arming the hrtimer while running on isolated CPU2
(by means of clock_nanosleep), the hrtimer was then firing on CPU0. This
is due to the fact that switch_hrtimer_base(), called at hrtimer enqueue
time, will prefer to enqueue the timer on an housekeeping !idle CPU, if
the timer is not pinned, as it is currently the case.

This looked odd to me, as the problem with this seems to be that we are
measuring wake up latencies across isolated and !isolated domains, which
is against the purpose of configuring the latter.

Since PREEMPT_RT_FULL already forces HARD mode for hrtimers armed by
tasks running with RT policies, it seems to make sense to also force
PINNED mode under the same conditions.

This set implements the behavior, achieving something like the
following:

     <idle>-0     [002] ... hrtimer_cancel:       hrtimer=0xffffafbacc19fe78
     <idle>-0     [002] ... hrtimer_expire_entry: hrtimer=0xffffafbacc19fe78 now=104335855898 function=hrtimer_wakeup/0x0
     <idle>-0     [002] ... sched_wakeup:         cyclictest:1165 [4] success=1 CPU:002
     <idle>-0     [002] ... hrtimer_expire_exit:  hrtimer=0xffffafbacc19fe78
     <idle>-0     [002] ... sched_switch:         swapper/2:0 [120] R ==> cyclictest:1165 [4]
 cyclictest-1165  [002] ... hrtimer_init:         hrtimer=0xffffafbacc19fe78 clockid=CLOCK_MONOTONIC mode=0xa
 cyclictest-1165  [002] ... hrtimer_start:        hrtimer=0xffffafbacc19fe78 function=hrtimer_wakeup/0x0 ...
 cyclictest-1165  [002] ... sched_switch:         cyclictest:1165 [4] S ==> swapper/2:0 [120]

Now, I'm sending this and an RFC, as I'm wondering if the first behavior
is actually what we want, and it is not odd at all for reasons that are
not evident to me at the moment. In this case this posting might also
function as a question: why we need things to work as they are today?

Thanks!

- Juri

Juri Lelli (2):
  time/hrtimer: Add PINNED_HARD mode for realtime hrtimers
  time/hrtimer: Embed hrtimer mode into hrtimer_sleeper

 include/linux/hrtimer.h |  4 ++++
 kernel/time/hrtimer.c   | 13 +++++++------
 2 files changed, 11 insertions(+), 6 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH RT 1/2] time/hrtimer: Add PINNED_HARD mode for realtime hrtimers
  2019-02-14 13:37 [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Juri Lelli
@ 2019-02-14 13:37 ` Juri Lelli
  2019-02-14 13:37 ` [RFC PATCH RT 2/2] time/hrtimer: Embed hrtimer mode into hrtimer_sleeper Juri Lelli
  2019-02-19 17:19 ` [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Sebastian Andrzej Siewior
  2 siblings, 0 replies; 7+ messages in thread
From: Juri Lelli @ 2019-02-14 13:37 UTC (permalink / raw)
  To: tglx, bigeasy
  Cc: linux-rt-users, peterz, linux-kernel, bristot, williams, Juri Lelli

While running cyclictest on isolated CPUs I noticed the following odd
behavior, where CPU0 is one of the housekeeping CPUs and CPU2 is
isolated:

     <idle>-0     [000] ... hrtimer_cancel:       hrtimer=0xffffb4a74be7fe70
     <idle>-0     [000] ... hrtimer_expire_entry: hrtimer=0xffffb4a74be7fe70 now=144805770984 function=hrtimer_wakeup/0x0
     <idle>-0     [000] ... sched_wakeup:         cyclictest:1171 [4] success=1 CPU:002
     <idle>-0     [000] ... hrtimer_expire_exit:  hrtimer=0xffffb4a74be7fe70
     <idle>-0     [002] ... sched_switch:         swapper/2:0 [120] R ==> cyclictest:1171 [4]
 cyclictest-1171  [002] ... hrtimer_init:         hrtimer=0xffffb4a74be7fe70 clockid=CLOCK_MONOTONIC mode=0x8
 cyclictest-1171  [002] ... hrtimer_start:        hrtimer=0xffffb4a74be7fe70 function=hrtimer_wakeup/0x0 ...
 cyclictest-1171  [002] ... sched_switch:         cyclictest:1171 [4] S ==> swapper/2:0 [120]

While cyclitest was arming the hrtimer while running on isolated CPU2
(by means of clock_nanosleep), the hrtimer was then firing on CPU0. This
is due to the fact that switch_hrtimer_base(), called at hrtimer enqueue
time, will prefer to enqueue the timer on an housekeeping !idle CPU, if
the timer is not pinned, as it is currently the case.

The problem with this is that we are measuring wake up latencies across
isolated and !isolated domains, which is against the purpose of
configuring the latter.

Since PREEMPT_RT_FULL already forces HARD mode for hrtimers armed by
tasks running with RT policies, it makes sense to also force PINNED mode
under the same conditions.

This patch implements this behavior, achieving something like the
following:

     <idle>-0     [002] ... hrtimer_cancel:       hrtimer=0xffffafbacc19fe78
     <idle>-0     [002] ... hrtimer_expire_entry: hrtimer=0xffffafbacc19fe78 now=104335855898 function=hrtimer_wakeup/0x0
     <idle>-0     [002] ... sched_wakeup:         cyclictest:1165 [4] success=1 CPU:002
     <idle>-0     [002] ... hrtimer_expire_exit:  hrtimer=0xffffafbacc19fe78
     <idle>-0     [002] ... sched_switch:         swapper/2:0 [120] R ==> cyclictest:1165 [4]
 cyclictest-1165  [002] ... hrtimer_init:         hrtimer=0xffffafbacc19fe78 clockid=CLOCK_MONOTONIC mode=0xa
 cyclictest-1165  [002] ... hrtimer_start:        hrtimer=0xffffafbacc19fe78 function=hrtimer_wakeup/0x0 ...
 cyclictest-1165  [002] ... sched_switch:         cyclictest:1165 [4] S ==> swapper/2:0 [120]

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
 include/linux/hrtimer.h | 2 ++
 kernel/time/hrtimer.c   | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 2bdb047c7656..c6d4941c7dd8 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -56,6 +56,8 @@ enum hrtimer_mode {
 	HRTIMER_MODE_ABS_HARD	= HRTIMER_MODE_ABS | HRTIMER_MODE_HARD,
 	HRTIMER_MODE_REL_HARD	= HRTIMER_MODE_REL | HRTIMER_MODE_HARD,
 
+	HRTIMER_MODE_PINNED_HARD = HRTIMER_MODE_PINNED | HRTIMER_MODE_HARD,
+
 	HRTIMER_MODE_ABS_PINNED_HARD = HRTIMER_MODE_ABS_PINNED | HRTIMER_MODE_HARD,
 	HRTIMER_MODE_REL_PINNED_HARD = HRTIMER_MODE_REL_PINNED | HRTIMER_MODE_HARD,
 };
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 851b2134e77f..4a905fb721a1 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1717,7 +1717,7 @@ static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
 #ifdef CONFIG_PREEMPT_RT_FULL
 	if (!(mode & (HRTIMER_MODE_SOFT | HRTIMER_MODE_HARD))) {
 		if (task_is_realtime(current) || system_state != SYSTEM_RUNNING)
-			mode |= HRTIMER_MODE_HARD;
+			mode |= HRTIMER_MODE_PINNED_HARD;
 		else
 			mode |= HRTIMER_MODE_SOFT;
 	}
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH RT 2/2] time/hrtimer: Embed hrtimer mode into hrtimer_sleeper
  2019-02-14 13:37 [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Juri Lelli
  2019-02-14 13:37 ` [RFC PATCH RT 1/2] time/hrtimer: Add PINNED_HARD mode for realtime hrtimers Juri Lelli
@ 2019-02-14 13:37 ` Juri Lelli
  2019-02-19 17:19 ` [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Sebastian Andrzej Siewior
  2 siblings, 0 replies; 7+ messages in thread
From: Juri Lelli @ 2019-02-14 13:37 UTC (permalink / raw)
  To: tglx, bigeasy
  Cc: linux-rt-users, peterz, linux-kernel, bristot, williams, Juri Lelli

Changes to hrtimer mode (potentially made by __hrtimer_init_sleeper() on
PREEMPT_RT_FULL are not visible to do_nanosleep(), and thus not
accounted for by hrtimer_start_expires() call path.

Embed hrtimer mode into hrtimer_sleeper struct, so that the same mode is
used by code following hrtimer_sleeper initialization.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
 include/linux/hrtimer.h |  2 ++
 kernel/time/hrtimer.c   | 11 ++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index c6d4941c7dd8..d5f11ef5330a 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -130,12 +130,14 @@ struct hrtimer {
  * struct hrtimer_sleeper - simple sleeper structure
  * @timer:	embedded timer structure
  * @task:	task to wake up
+ * @mode:	hrtimer mode
  *
  * task is set to NULL, when the timer expires.
  */
 struct hrtimer_sleeper {
 	struct hrtimer timer;
 	struct task_struct *task;
+	enum hrtimer_mode mode;
 };
 
 #ifdef CONFIG_64BIT
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 4a905fb721a1..5067aba434fe 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1725,6 +1725,7 @@ static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
 	__hrtimer_init(&sl->timer, clock_id, mode);
 	sl->timer.function = hrtimer_wakeup;
 	sl->task = task;
+	sl->mode = mode;
 }
 
 /**
@@ -1774,20 +1775,20 @@ int nanosleep_copyout(struct restart_block *restart, struct timespec64 *ts)
 	return -ERESTART_RESTARTBLOCK;
 }
 
-static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
+static int __sched do_nanosleep(struct hrtimer_sleeper *t)
 {
 	struct restart_block *restart;
 
 	do {
 		set_current_state(TASK_INTERRUPTIBLE);
-		hrtimer_start_expires(&t->timer, mode);
+		hrtimer_start_expires(&t->timer, t->mode);
 
 		if (likely(t->task))
 			freezable_schedule();
 
 		__set_current_state(TASK_RUNNING);
 		hrtimer_cancel(&t->timer);
-		mode = HRTIMER_MODE_ABS;
+		t->mode = HRTIMER_MODE_ABS;
 
 	} while (t->task && !signal_pending(current));
 
@@ -1817,7 +1818,7 @@ static long __sched hrtimer_nanosleep_restart(struct restart_block *restart)
 	hrtimer_init_sleeper_on_stack(&t, restart->nanosleep.clockid,
 				      HRTIMER_MODE_ABS, current);
 	hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires);
-	ret = do_nanosleep(&t, HRTIMER_MODE_ABS);
+	ret = do_nanosleep(&t);
 	destroy_hrtimer_on_stack(&t.timer);
 	return ret;
 }
@@ -1836,7 +1837,7 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
 
 	hrtimer_init_sleeper_on_stack(&t, clockid, mode, current);
 	hrtimer_set_expires_range_ns(&t.timer, timespec64_to_ktime(*rqtp), slack);
-	ret = do_nanosleep(&t, mode);
+	ret = do_nanosleep(&t);
 	if (ret != -ERESTART_RESTARTBLOCK)
 		goto out;
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers
  2019-02-14 13:37 [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Juri Lelli
  2019-02-14 13:37 ` [RFC PATCH RT 1/2] time/hrtimer: Add PINNED_HARD mode for realtime hrtimers Juri Lelli
  2019-02-14 13:37 ` [RFC PATCH RT 2/2] time/hrtimer: Embed hrtimer mode into hrtimer_sleeper Juri Lelli
@ 2019-02-19 17:19 ` Sebastian Andrzej Siewior
  2019-02-20  7:47   ` Juri Lelli
  2 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-19 17:19 UTC (permalink / raw)
  To: Juri Lelli; +Cc: tglx, linux-rt-users, peterz, linux-kernel, bristot, williams

On 2019-02-14 14:37:14 [+0100], Juri Lelli wrote:
> Hi,
Hi,

> Now, I'm sending this and an RFC, as I'm wondering if the first behavior
> is actually what we want, and it is not odd at all for reasons that are
> not evident to me at the moment. In this case this posting might also
> function as a question: why we need things to work as they are today?

There is /proc/sys/kernel/timer_migration which should disable this but
I think you know that already.

So this is a NO_HZ feature. Basically try to move all the timers to a
designated CPU so all others can deep idle while one CPU does the work.
Ideally you have no timer which is pending / will expire if you go idle.
And then, once the timer fires the housekeeping CPU does the work so
chances are that the CPU, that programmed the timer, may remain idle.

In this case you prepare the wakeup and then wake the CPU anyway. There
should be no downside to this unless the housekeeping CPU is busy and in
irq-off regions which would increase the latency. Also in case of
	cyclictest -d0

the one CPU would have to process all timers. So the latency will be
worse compared to every CPU does its own wakeup. And on RT you probably
do not want to do deep idle anyway.

> Thanks!
> 
> - Juri

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers
  2019-02-19 17:19 ` [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Sebastian Andrzej Siewior
@ 2019-02-20  7:47   ` Juri Lelli
  2019-02-20 15:30     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Lelli @ 2019-02-20  7:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: tglx, linux-rt-users, peterz, linux-kernel, bristot, williams

On 19/02/19 18:19, Sebastian Andrzej Siewior wrote:
> On 2019-02-14 14:37:14 [+0100], Juri Lelli wrote:
> > Hi,
> Hi,
> 
> > Now, I'm sending this and an RFC, as I'm wondering if the first behavior
> > is actually what we want, and it is not odd at all for reasons that are
> > not evident to me at the moment. In this case this posting might also
> > function as a question: why we need things to work as they are today?
> 
> There is /proc/sys/kernel/timer_migration which should disable this but
> I think you know that already.
> 
> So this is a NO_HZ feature. Basically try to move all the timers to a
> designated CPU so all others can deep idle while one CPU does the work.
> Ideally you have no timer which is pending / will expire if you go idle.
> And then, once the timer fires the housekeeping CPU does the work so
> chances are that the CPU, that programmed the timer, may remain idle.

Right.

> In this case you prepare the wakeup and then wake the CPU anyway. There
> should be no downside to this unless the housekeeping CPU is busy and in
> irq-off regions which would increase the latency. Also in case of
> 	cyclictest -d0
> 
> the one CPU would have to process all timers. So the latency will be
> worse compared to every CPU does its own wakeup. And on RT you probably
> do not want to do deep idle anyway.

Mmm, right. But, still very much dependent on the workload, I understand
you are saying? So, no one size fits all solution.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers
  2019-02-20  7:47   ` Juri Lelli
@ 2019-02-20 15:30     ` Sebastian Andrzej Siewior
  2019-02-21  8:46       ` Juri Lelli
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-20 15:30 UTC (permalink / raw)
  To: Juri Lelli; +Cc: tglx, linux-rt-users, peterz, linux-kernel, bristot, williams

On 2019-02-20 08:47:51 [+0100], Juri Lelli wrote:
> > In this case you prepare the wakeup and then wake the CPU anyway. There
> > should be no downside to this unless the housekeeping CPU is busy and in
> > irq-off regions which would increase the latency. Also in case of
> > 	cyclictest -d0
> > 
> > the one CPU would have to process all timers. So the latency will be
> > worse compared to every CPU does its own wakeup. And on RT you probably
> > do not want to do deep idle anyway.
> 
> Mmm, right. But, still very much dependent on the workload, I understand
> you are saying? So, no one size fits all solution.

Now that I slept over it, I think it makes sense from RT point of view
to always pin the timer. I'm not sure if we want to swap the sysctl or
make the PINNED change like you suggested.
 
> Thanks,
> 
> - Juri

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers
  2019-02-20 15:30     ` Sebastian Andrzej Siewior
@ 2019-02-21  8:46       ` Juri Lelli
  0 siblings, 0 replies; 7+ messages in thread
From: Juri Lelli @ 2019-02-21  8:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: tglx, linux-rt-users, peterz, linux-kernel, bristot, williams

On 20/02/19 16:30, Sebastian Andrzej Siewior wrote:
> On 2019-02-20 08:47:51 [+0100], Juri Lelli wrote:
> > > In this case you prepare the wakeup and then wake the CPU anyway. There
> > > should be no downside to this unless the housekeeping CPU is busy and in
> > > irq-off regions which would increase the latency. Also in case of
> > > 	cyclictest -d0
> > > 
> > > the one CPU would have to process all timers. So the latency will be
> > > worse compared to every CPU does its own wakeup. And on RT you probably
> > > do not want to do deep idle anyway.
> > 
> > Mmm, right. But, still very much dependent on the workload, I understand
> > you are saying? So, no one size fits all solution.
> 
> Now that I slept over it, I think it makes sense from RT point of view
> to always pin the timer. I'm not sure if we want to swap the sysctl or
> make the PINNED change like you suggested.

My thinking was that it would be nice to be able to discern between
timers coming from RT and !RT tasks so that the latter can be migrated
to housekeeping CPUs, leaving potentially isolated CPUs to deal with the
former only.

Not sure we can achieve this "best of both worlds" policy any way we set
the sysctl default to be.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-02-21  8:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14 13:37 [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Juri Lelli
2019-02-14 13:37 ` [RFC PATCH RT 1/2] time/hrtimer: Add PINNED_HARD mode for realtime hrtimers Juri Lelli
2019-02-14 13:37 ` [RFC PATCH RT 2/2] time/hrtimer: Embed hrtimer mode into hrtimer_sleeper Juri Lelli
2019-02-19 17:19 ` [RFC PATCH RT 0/2] Add PINNED_HARD mode to hrtimers Sebastian Andrzej Siewior
2019-02-20  7:47   ` Juri Lelli
2019-02-20 15:30     ` Sebastian Andrzej Siewior
2019-02-21  8:46       ` Juri Lelli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).