All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock
@ 2017-07-12 15:01 Jia He
  2017-07-12 22:45 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 4+ messages in thread
From: Jia He @ 2017-07-12 15:01 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Ingo Molnar, fweisbec, Thomas Gleixner,
	John Stultz, Stanislaw Gruszka, Ivan Mikhaylov, cyrilbur, Jia He

Virtual time base(vtb) is a register which increases only in guest.
Any exit from guest to host will stop the vtb(saved and restored by kvm).
But if there is an IO causes guest exits to host, the guest's watchdog
(watchdog_timer_fn -> is_softlockup -> get_timestamp -> running_clock)
needs to also include the time elapsed in host. get_vtb is not correct in
this case.

Also, the TB_OFFSET is well saved and restored by qemu after commit [1].
So we can use get_tb here.

[1] http://git.qemu.org/?p=qemu.git;a=commit;h=42043e4f1

Signed-off-by: Jia He <hejianet@gmail.com>
---
 arch/powerpc/kernel/time.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fe6f3a2..c542dd3 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -695,16 +695,15 @@ notrace unsigned long long sched_clock(void)
 unsigned long long running_clock(void)
 {
 	/*
-	 * Don't read the VTB as a host since KVM does not switch in host
-	 * timebase into the VTB when it takes a guest off the CPU, reading the
-	 * VTB would result in reading 'last switched out' guest VTB.
+	 * Use get_tb instead of get_vtb for guest since the TB_OFFSET has been
+	 * well saved/restored when qemu does suspend/resume.
 	 *
 	 * Host kernels are often compiled with CONFIG_PPC_PSERIES checked, it
 	 * would be unsafe to rely only on the #ifdef above.
 	 */
 	if (firmware_has_feature(FW_FEATURE_LPAR) &&
 	    cpu_has_feature(CPU_FTR_ARCH_207S))
-		return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
+		return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
 
 	/*
 	 * This is a next best approximation without a VTB.
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock
  2017-07-12 15:01 [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock Jia He
@ 2017-07-12 22:45 ` Benjamin Herrenschmidt
  2017-07-13  6:55   ` hejianet
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-12 22:45 UTC (permalink / raw)
  To: Jia He, linuxppc-dev
  Cc: linux-kernel, Paul Mackerras, Michael Ellerman, Ingo Molnar,
	fweisbec, Thomas Gleixner, John Stultz, Stanislaw Gruszka,
	Ivan Mikhaylov, cyrilbur

On Wed, 2017-07-12 at 23:01 +0800, Jia He wrote:
> Virtual time base(vtb) is a register which increases only in guest.
> Any exit from guest to host will stop the vtb(saved and restored by kvm).
> But if there is an IO causes guest exits to host, the guest's watchdog
> (watchdog_timer_fn -> is_softlockup -> get_timestamp -> running_clock)
> needs to also include the time elapsed in host. get_vtb is not correct in
> this case.
> 
> Also, the TB_OFFSET is well saved and restored by qemu after commit [1].
> So we can use get_tb here.

That completely defeats the purpose here... This was done specifically
to exploit the VTB which doesn't count in hypervisor mode.

> 
> [1] http://git.qemu.org/?p=qemu.git;a=commit;h=42043e4f1
> 
> Signed-off-by: Jia He <hejianet@gmail.com>
> ---
>  arch/powerpc/kernel/time.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index fe6f3a2..c542dd3 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -695,16 +695,15 @@ notrace unsigned long long sched_clock(void)
>  unsigned long long running_clock(void)
>  {
>  	/*
> -	 * Don't read the VTB as a host since KVM does not switch in host
> -	 * timebase into the VTB when it takes a guest off the CPU, reading the
> -	 * VTB would result in reading 'last switched out' guest VTB.
> +	 * Use get_tb instead of get_vtb for guest since the TB_OFFSET has been
> +	 * well saved/restored when qemu does suspend/resume.
>  	 *
>  	 * Host kernels are often compiled with CONFIG_PPC_PSERIES checked, it
>  	 * would be unsafe to rely only on the #ifdef above.
>  	 */
>  	if (firmware_has_feature(FW_FEATURE_LPAR) &&
>  	    cpu_has_feature(CPU_FTR_ARCH_207S))
> -		return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> +		return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
>  
>  	/*
>  	 * This is a next best approximation without a VTB.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock
  2017-07-12 22:45 ` Benjamin Herrenschmidt
@ 2017-07-13  6:55   ` hejianet
  2017-07-13 20:50     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 4+ messages in thread
From: hejianet @ 2017-07-13  6:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev
  Cc: linux-kernel, Paul Mackerras, Michael Ellerman, Ingo Molnar,
	fweisbec, Thomas Gleixner, John Stultz, Stanislaw Gruszka,
	Ivan Mikhaylov, cyrilbur

Hi Ben
I add some printk logs in watchdog_timer_fn in the guest
[   16.025222] get_vtb=8236291881, get_tb=13756711357, get_timestamp=4
[   20.025624] get_vtb=9745285807, get_tb=15804711283, get_timestamp=7
[   24.025042] get_vtb=11518119641, get_tb=17852711085, get_timestamp=10
[   28.024074] get_vtb=13192704319, get_tb=19900711071, get_timestamp=13
[   32.024086] get_vtb=14856516982, get_tb=21948711066, get_timestamp=16
[   36.024075] get_vtb=16569127618, get_tb=23996711078, get_timestamp=20
[   40.024138] get_vtb=17008865823, get_tb=26044718418, get_timestamp=20
[   44.023993] get_vtb=17020637241, get_tb=28092716383, get_timestamp=20
[   48.023996] get_vtb=17022857170, get_tb=30140718472, get_timestamp=20
[   52.023996] get_vtb=17024268541, get_tb=32188718432, get_timestamp=20
[   56.023996] get_vtb=17036577783, get_tb=34236718077, get_timestamp=20
[   60.023996] get_vtb=17037829743, get_tb=36284718437, get_timestamp=20
[   64.023992] get_vtb=17039846747, get_tb=38332716609, get_timestamp=20
[   68.023991] get_vtb=17041448345, get_tb=40380715903, get_timestamp=20

The get_timestamp(use get_vtb(),unit is second) is slower down compared
with printk time. You also can obviously watch the get_vtb increment is
slowly less than get_tb increment.

Without this patch, I thought there might be some softlockup warnings
missed in guest.

-Jia

On 13/07/2017 6:45 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-12 at 23:01 +0800, Jia He wrote:
>> Virtual time base(vtb) is a register which increases only in guest.
>> Any exit from guest to host will stop the vtb(saved and restored by kvm).
>> But if there is an IO causes guest exits to host, the guest's watchdog
>> (watchdog_timer_fn -> is_softlockup -> get_timestamp -> running_clock)
>> needs to also include the time elapsed in host. get_vtb is not correct in
>> this case.
>>
>> Also, the TB_OFFSET is well saved and restored by qemu after commit [1].
>> So we can use get_tb here.
> 
> That completely defeats the purpose here... This was done specifically
> to exploit the VTB which doesn't count in hypervisor mode.
> 
>>
>> [1] http://git.qemu.org/?p=qemu.git;a=commit;h=42043e4f1
>>
>> Signed-off-by: Jia He <hejianet@gmail.com>
>> ---
>>   arch/powerpc/kernel/time.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
>> index fe6f3a2..c542dd3 100644
>> --- a/arch/powerpc/kernel/time.c
>> +++ b/arch/powerpc/kernel/time.c
>> @@ -695,16 +695,15 @@ notrace unsigned long long sched_clock(void)
>>   unsigned long long running_clock(void)
>>   {
>>   	/*
>> -	 * Don't read the VTB as a host since KVM does not switch in host
>> -	 * timebase into the VTB when it takes a guest off the CPU, reading the
>> -	 * VTB would result in reading 'last switched out' guest VTB.
>> +	 * Use get_tb instead of get_vtb for guest since the TB_OFFSET has been
>> +	 * well saved/restored when qemu does suspend/resume.
>>   	 *
>>   	 * Host kernels are often compiled with CONFIG_PPC_PSERIES checked, it
>>   	 * would be unsafe to rely only on the #ifdef above.
>>   	 */
>>   	if (firmware_has_feature(FW_FEATURE_LPAR) &&
>>   	    cpu_has_feature(CPU_FTR_ARCH_207S))
>> -		return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
>> +		return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
>>   
>>   	/*
>>   	 * This is a next best approximation without a VTB.
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock
  2017-07-13  6:55   ` hejianet
@ 2017-07-13 20:50     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 4+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-13 20:50 UTC (permalink / raw)
  To: hejianet, linuxppc-dev
  Cc: linux-kernel, Paul Mackerras, Michael Ellerman, Ingo Molnar,
	fweisbec, Thomas Gleixner, John Stultz, Stanislaw Gruszka,
	Ivan Mikhaylov, cyrilbur

On Thu, 2017-07-13 at 14:55 +0800, hejianet wrote:
> Hi Ben
> I add some printk logs in watchdog_timer_fn in the guest
> [   16.025222] get_vtb=8236291881, get_tb=13756711357, get_timestamp=4
> [   20.025624] get_vtb=9745285807, get_tb=15804711283, get_timestamp=7
> [   24.025042] get_vtb=11518119641, get_tb=17852711085, get_timestamp=10
> [   28.024074] get_vtb=13192704319, get_tb=19900711071, get_timestamp=13
> [   32.024086] get_vtb=14856516982, get_tb=21948711066, get_timestamp=16
> [   36.024075] get_vtb=16569127618, get_tb=23996711078, get_timestamp=20
> [   40.024138] get_vtb=17008865823, get_tb=26044718418, get_timestamp=20
> [   44.023993] get_vtb=17020637241, get_tb=28092716383, get_timestamp=20
> [   48.023996] get_vtb=17022857170, get_tb=30140718472, get_timestamp=20
> [   52.023996] get_vtb=17024268541, get_tb=32188718432, get_timestamp=20
> [   56.023996] get_vtb=17036577783, get_tb=34236718077, get_timestamp=20
> [   60.023996] get_vtb=17037829743, get_tb=36284718437, get_timestamp=20
> [   64.023992] get_vtb=17039846747, get_tb=38332716609, get_timestamp=20
> [   68.023991] get_vtb=17041448345, get_tb=40380715903, get_timestamp=20
> 
> The get_timestamp(use get_vtb(),unit is second) is slower down compared
> with printk time. You also can obviously watch the get_vtb increment is
> slowly less than get_tb increment.

But that is the entire point of vtb ... because it only counts when
running in the guest, not the time spent in the host.

> Without this patch, I thought there might be some softlockup warnings
> missed in guest.

Ugh ? We already have too many of these ! On the contrary, we don't
want the guest to start to spew soft lockup warnings because it was not
scheduled by the host enough. The guest soft lockup warnings should be
specifically about something bad happening inside the guest.

Your patch seems to defeat the whole purpose of that running_clock()
function unless I'm somewhat mistaken.

Ben.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-07-13 21:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-12 15:01 [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock Jia He
2017-07-12 22:45 ` Benjamin Herrenschmidt
2017-07-13  6:55   ` hejianet
2017-07-13 20:50     ` Benjamin Herrenschmidt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.