* [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage
@ 2019-03-01  2:35 Dongli Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Dongli Zhang @ 2019-03-01  2:35 UTC (permalink / raw)
  To: xen-devel, stable, linux-kernel
  Cc: jgross, Herbert Van Den Bergh, sstabellini, sboyd, joe.jin,
	john.stultz, boris.ostrovsky, tglx

This issue only affects stable 4.9.x (e.g., 4.9.160), although the root cause
is still present in the latest mainline kernel.

In mainline the problem is avoided by the feature patch set ending with
b672592f0221 ("sched/cputime: Remove generic asm headers").

After a xen guest has been up for a long time, hotplugging a new vcpu can make
the corresponding steal usage jump to 100%, and the steal time reported in
/proc/stat then increases abnormally.

Since waiting long enough to reproduce the issue naturally is impractical, here
is how I reproduce it on purpose, by accounting a large initial steal clock for
the new vcpus 2 and 3.

1. Apply the patch below to the 4.9.160 guest kernel to account a large
initial steal clock for the new vcpus 2 and 3:

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index ac5f23f..3cf629e 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -85,7 +85,14 @@ u64 xen_steal_clock(int cpu)
        struct vcpu_runstate_info state;
 
        xen_get_runstate_snapshot_cpu(&state, cpu);
-       return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
+
+       if (cpu == 2 || cpu == 3)
+               return state.time[RUNSTATE_runnable]
+                      + state.time[RUNSTATE_offline]
+                      + 0x00071e87e677aa12;
+       else
+               return state.time[RUNSTATE_runnable]
+                      + state.time[RUNSTATE_offline];
 }
 
 void xen_setup_runstate_info(int cpu)


2. Boot the hvm guest with "vcpus=2" and "maxvcpus=4". By default, the VM
boots with vcpus 0 and 1.

3. Hotplug vcpu 2 and 3 via "xl vcpu-set <domid> 4" on dom0.

In my environment, steal usage reaches 100% within 10s of issuing the
"xl vcpu-set" command on dom0.
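
For reference, a minimal xl guest configuration along the lines of step 2
could look like the sketch below. All names, paths and sizes are placeholders
of mine, and option names can vary slightly between Xen versions:

name     = "guest-4-9-160"
type     = "hvm"
memory   = 2048
vcpus    = 2         # vcpus online at boot
maxvcpus = 4         # allows "xl vcpu-set <domid> 4" to bring up vcpu 2 and 3
disk     = [ "file:/path/to/disk.img,xvda,w" ]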

I can reproduce this on kvm with a similar method. However, as the initial
steal clock on a kvm guest is always 0, I do not think it is easy to hit this
issue on kvm.

--------------------------------------------------------

The root cause is that the return type of jiffies_to_usecs() is 'unsigned
int' rather than 'unsigned long'. As a result, the upper 32 bits of the
conversion result are discarded.

jiffies_to_usecs() is called indirectly via cputime_to_nsecs() at line 264.
If the guest has already been up for a long time, the initial steal time for
a new vcpu can be large, and the upper 32 bits of the jiffies_to_usecs()
result are discarded.

As a result, the steal value at line 259 stays large while
this_rq()->prev_steal_time at line 264 stays small, so the difference computed
at line 260 is large every time steal_account_process_time() is invoked.
Consequently, the steal time in /proc/stat increases abnormally.

252 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
253 {
254 #ifdef CONFIG_PARAVIRT
255         if (static_key_false(&paravirt_steal_enabled)) {
256                 cputime_t steal_cputime;
257                 u64 steal;
258 
259                 steal = paravirt_steal_clock(smp_processor_id());
260                 steal -= this_rq()->prev_steal_time;
261 
262                 steal_cputime = min(nsecs_to_cputime(steal), maxtime);
263                 account_steal_time(steal_cputime);
264                 this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
265 
266                 return steal_cputime;
267         }
268 #endif
269         return 0;
270 }
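
To make the truncation concrete, here is a minimal user-space sketch (my own
illustration, not kernel code) that mimics the 4.9 jiffies-based helpers,
assuming HZ=250, a 64-bit build, and the steal value injected by the
reproducer above:

#include <stdio.h>
#include <stdint.h>

#define HZ            250ULL
#define NSEC_PER_SEC  1000000000ULL
#define USEC_PER_SEC  1000000ULL
#define NSEC_PER_USEC 1000ULL

/* same shape as 4.9's jiffies_to_usecs(): note the 32-bit return type */
static unsigned int jiffies_to_usecs(unsigned long j)
{
        return (USEC_PER_SEC / HZ) * j;
}

int main(void)
{
        uint64_t steal = 0x00071e87e677aa12ULL;  /* ~23 days, in nanoseconds */
        /* roughly what nsecs_to_cputime() does for jiffies-based cputime */
        unsigned long steal_cputime = steal / (NSEC_PER_SEC / HZ);
        /* what cputime_to_nsecs() does at line 264 */
        uint64_t back = (uint64_t)jiffies_to_usecs(steal_cputime) * NSEC_PER_USEC;

        printf("steal delta          : %llu ns\n", (unsigned long long)steal);
        printf("added to prev_steal  : %llu ns\n", (unsigned long long)back);
        /*
         * 'back' is orders of magnitude smaller than 'steal' because the
         * intermediate microseconds value was truncated to 32 bits, so
         * prev_steal_time cannot catch up with the paravirt steal clock and
         * the huge delta keeps being accounted as steal time on every tick.
         */
        return 0;
}

With these assumptions the ~2.0e15 ns delta round-trips to only a few 1e12 ns,
which matches the behaviour described above.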

--------------------------------------------------------

I have emailed the kernel mailing list about the return type of
jiffies_to_usecs() and jiffies_to_msecs():

https://lkml.org/lkml/2019/2/26/899


So far, I can think of three options:

1. Change the return type from 'unsigned int' to 'unsigned long' as in the
link above, but I am afraid it could have side effects. The return type in the
latest mainline kernel is still 'unsigned int'.

2. Something like below based on stable 4.9.160:

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 734377a..9b1fc40 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -286,10 +286,11 @@ extern unsigned long preset_lpj;
  */
 extern unsigned int jiffies_to_msecs(const unsigned long j);
 extern unsigned int jiffies_to_usecs(const unsigned long j);
+extern unsigned long jiffies_to_usecs64(const unsigned long j);
 
 static inline u64 jiffies_to_nsecs(const unsigned long j)
 {
-       return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
+       return (u64)jiffies_to_usecs64(j) * NSEC_PER_USEC;
 }
 
 extern u64 jiffies64_to_nsecs(u64 j);
diff --git a/kernel/time/time.c b/kernel/time/time.c
index a5b6d98..256c147 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -288,6 +288,27 @@ unsigned int jiffies_to_usecs(const unsigned long j)
 }
 EXPORT_SYMBOL(jiffies_to_usecs);
 
+unsigned long jiffies_to_usecs64(const unsigned long j)
+{
+       /*
+        * Hz usually doesn't go much further MSEC_PER_SEC.
+        * jiffies_to_usecs() and usecs_to_jiffies() depend on that.
+        */
+       BUILD_BUG_ON(HZ > USEC_PER_SEC);
+
+#if !(USEC_PER_SEC % HZ)
+       return (USEC_PER_SEC / HZ) * j;
+#else
+# if BITS_PER_LONG == 32
+       return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32;
+# else
+       return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN;
+# endif
+#endif
+}
+EXPORT_SYMBOL(jiffies_to_usecs64);
+
+
 /**
  * timespec_trunc - Truncate timespec to a granularity
  * @t: Timespec

People may dislike the 2nd solution.

3. Backport the patch set ending with b672592f0221 ("sched/cputime:
Remove generic asm headers").

This is not really feasible for a stable branch, as the patch set involves
lots of changes.


Would you please let me know if you have any suggestions on this issue?

Thank you very much!

Dongli Zhang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage
  2019-03-01 23:43 ` Thomas Gleixner
  2019-03-05  2:19   ` Dongli Zhang
@ 2019-03-05  2:19   ` Dongli Zhang
  1 sibling, 0 replies; 10+ messages in thread
From: Dongli Zhang @ 2019-03-05  2:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: xen-devel, stable, linux-kernel, boris.ostrovsky, sstabellini,
	jgross, joe.jin, Herbert Van Den Bergh, sboyd, john.stultz

Hi Thomas,

On 3/2/19 7:43 AM, Thomas Gleixner wrote:
> On Thu, 28 Feb 2019, Dongli Zhang wrote:
>>
>> The root cause is that the return type of jiffies_to_usecs() is 'unsigned int',
>> but not 'unsigned long'. As a result, the leading 32 bits are discarded.
> 
> Errm. No. The root cause is that jiffies_to_usecs() is used for that in the
> first place. The function has been that way forever and all usage sites
> (except a broken dev_debug print in infiniband) feed delta values. Yes, it
> could have documentation....

Thank you very much for the explanation. It would help developers if the
intended usage of jiffies_to_usecs() (which should always be fed a delta
value) were clarified with a comment above it.

Indeed, the input value in this bug is also a delta value. Because of the
special mechanism xen uses to account the steal clock, the initial delta value
is very large, but only when a new cpu is added after the VM has already been
up for a very long time.
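
As a sketch of what such a comment could look like (my own wording, just to
illustrate the idea rather than an actual patch):

/**
 * jiffies_to_usecs - convert a jiffies delta to microseconds
 * @j: jiffies delta to convert
 *
 * The return type is 'unsigned int', so the result wraps once the delta
 * exceeds about 2^32 microseconds (roughly 71 minutes of time, independent
 * of HZ). Only feed small deltas, never absolute jiffies values or large
 * accumulated counts.
 */
extern unsigned int jiffies_to_usecs(const unsigned long j);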

Dongli Zhang


> 
>> jiffies_to_usecs() is indirectly triggered by cputime_to_nsecs() at line 264.
>> If guest is already up for long time, the initial steal time for new vcpu might
>> be large and the leading 32 bits of jiffies_to_usecs() would be discarded.
> 
>> So far, I have two solutions:
>>
>> 1. Change the return type from 'unsigned int' to 'unsigned long' as in above
>> link and I am afraid it would bring side effect. The return type in latest
>> mainline kernel is still 'unsigned int'.
> 
> Changing it to unsigned long would just solve the issue for 64bit.
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage
  2019-03-04  8:14 ` Juergen Gross
@ 2019-03-05  2:14   ` Dongli Zhang
  2019-03-05  2:14   ` Dongli Zhang
  1 sibling, 0 replies; 10+ messages in thread
From: Dongli Zhang @ 2019-03-05  2:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, stable, linux-kernel, boris.ostrovsky, sstabellini,
	tglx, joe.jin, Herbert Van Den Bergh, sboyd, john.stultz

Hi Juergen,

On 3/4/19 4:14 PM, Juergen Gross wrote:
> On 01/03/2019 03:35, Dongli Zhang wrote:
>> This issue is only for stable 4.9.x (e.g., 4.9.160), while the root cause is
>> still in the latest mainline kernel.
>>
>> This is obviated by new feature patch set ended with b672592f0221
>> ("sched/cputime: Remove generic asm headers").
>>
>> After xen guest is up for long time, once we hotplug new vcpu, the corresponding
>> steal usage might become 100% and the steal time from /proc/stat would increase
>> abnormally.
>>
>> As we cannot wait for long time to reproduce the issue, here is how I reproduce
>> it on purpose by accounting a large initial steal clock for new vcpu 2 and 3.
>>
>> 1. Apply the below patch to guest 4.9.160 to account large initial steal clock
>> for new vcpu 2 and 3:
>>
>> diff --git a/drivers/xen/time.c b/drivers/xen/time.c
>> index ac5f23f..3cf629e 100644
>> --- a/drivers/xen/time.c
>> +++ b/drivers/xen/time.c
>> @@ -85,7 +85,14 @@ u64 xen_steal_clock(int cpu)
>>         struct vcpu_runstate_info state;
>>  
>>         xen_get_runstate_snapshot_cpu(&state, cpu);
>> -       return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
>> +
>> +       if (cpu == 2 || cpu == 3)
>> +               return state.time[RUNSTATE_runnable]
>> +                      + state.time[RUNSTATE_offline]
>> +                      + 0x00071e87e677aa12;
>> +       else
>> +               return state.time[RUNSTATE_runnable]
>> +                      + state.time[RUNSTATE_offline];
>>  }
>>  
>>  void xen_setup_runstate_info(int cpu)
>>
>>
>> 2. Boot hvm guest with "vcpus=2" and "maxvcpus=4". By default, VM boot with
>> vcpu 0 and 1.
>>
>> 3. Hotplug vcpu 2 and 3 via "xl vcpu-set <domid> 4" on dom0.
>>
>> In my env, the steal becomes 100% within 10s after the "xl vcpu-set" command on
>> dom0.
>>
>> I can reproduce on kvm with similar method. However, as the initial steal clock
>> on kvm guest is always 0, I do not think it is easy to hit this issue on kvm.
>>
>> --------------------------------------------------------
>>
>> The root cause is that the return type of jiffies_to_usecs() is 'unsigned int',
>> but not 'unsigned long'. As a result, the leading 32 bits are discarded.
>>
>> jiffies_to_usecs() is indirectly triggered by cputime_to_nsecs() at line 264.
>> If guest is already up for long time, the initial steal time for new vcpu might
>> be large and the leading 32 bits of jiffies_to_usecs() would be discarded.
>>
>> As a result, the steal at line 259 is always large and the
>> this_rq()->prev_steal_time at line 264 is always small. The difference at line
>> 260 is always large during each time steal_account_process_time() is involved.
>> Finally, the steal time in /proc/stat would increase abnormally.
>>
>> 252 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
>> 253 {
>> 254 #ifdef CONFIG_PARAVIRT
>> 255         if (static_key_false(&paravirt_steal_enabled)) {
>> 256                 cputime_t steal_cputime;
>> 257                 u64 steal;
>> 258 
>> 259                 steal = paravirt_steal_clock(smp_processor_id());
>> 260                 steal -= this_rq()->prev_steal_time;
>> 261 
>> 262                 steal_cputime = min(nsecs_to_cputime(steal), maxtime);
>> 263                 account_steal_time(steal_cputime);
>> 264                 this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
>> 265 
>> 266                 return steal_cputime;
>> 267         }
>> 268 #endif
>> 269         return 0;
>> 270 }
>>
>> --------------------------------------------------------
>>
>> I have emailed the kernel mailing list about the return type of
>> jiffies_to_usecs() and jiffies_to_msecs():
>>
>> https://lkml.org/lkml/2019/2/26/899
>>
>>
>> So far, I have two solutions:
>>
>> 1. Change the return type from 'unsigned int' to 'unsigned long' as in above
>> link and I am afraid it would bring side effect. The return type in latest
>> mainline kernel is still 'unsigned int'.
>>
>> 2. Something like below based on stable 4.9.160:
> 
> 3. use jiffies64_to_nsecs() instead of trying to open code it.

Thank you very much for the suggestion!

I have verified that jiffies64_to_nsecs() works well by reproducing the issue
in a kvm guest on purpose.

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 734377a..94aff43 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -287,13 +287,13 @@ extern unsigned long preset_lpj;
 extern unsigned int jiffies_to_msecs(const unsigned long j);
 extern unsigned int jiffies_to_usecs(const unsigned long j);

+extern u64 jiffies64_to_nsecs(u64 j);
+
 static inline u64 jiffies_to_nsecs(const unsigned long j)
 {
-       return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
+       return jiffies64_to_nsecs(j);
 }

-extern u64 jiffies64_to_nsecs(u64 j);



Below is the patch used to reproduce the issue on kvm. cpu 2 is hotplugged on
purpose via "device_add
qemu64-x86_64-cpu,id=core2,socket-id=2,core-id=0,thread-id=0" in the qemu
monitor. The guest is booted with the qemu option "-smp 2,maxcpus=3".

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 77f17cb..54d46e0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -407,7 +407,10 @@ static u64 kvm_steal_clock(int cpu)
                rmb();
        } while ((version & 1) || (version != src->version));

-       return steal;
+       if (cpu == 0 || cpu == 1)
+               return steal;
+       else
+               return steal + 0x00071e87e677aa12;
 }
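
For completeness, the full qemu invocation is not spelled out above; a command
line roughly like the following (memory size, disk path and the remaining
machine options are placeholders of mine) should reproduce the setup:

qemu-system-x86_64 -enable-kvm -cpu qemu64 -m 2048 \
    -smp 2,maxcpus=3 \
    -monitor stdio \
    -drive file=/path/to/guest-4.9.160.img,format=raw,if=virtio

# then hotplug cpu 2 from the qemu monitor:
(qemu) device_add qemu64-x86_64-cpu,id=core2,socket-id=2,core-id=0,thread-id=0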


Dongli Zhang

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage
  2019-03-01  2:35 Dongli Zhang
  2019-03-01 23:43 ` Thomas Gleixner
  2019-03-01 23:43 ` Thomas Gleixner
@ 2019-03-04  8:14 ` Juergen Gross
  2019-03-05  2:14   ` Dongli Zhang
  2019-03-05  2:14   ` Dongli Zhang
  2019-03-04  8:14 ` Juergen Gross
  3 siblings, 2 replies; 10+ messages in thread
From: Juergen Gross @ 2019-03-04  8:14 UTC (permalink / raw)
  To: Dongli Zhang, xen-devel, stable, linux-kernel
  Cc: boris.ostrovsky, sstabellini, tglx, joe.jin,
	Herbert Van Den Bergh, sboyd, john.stultz

On 01/03/2019 03:35, Dongli Zhang wrote:
> This issue is only for stable 4.9.x (e.g., 4.9.160), while the root cause is
> still in the latest mainline kernel.
> 
> This is obviated by new feature patch set ended with b672592f0221
> ("sched/cputime: Remove generic asm headers").
> 
> After xen guest is up for long time, once we hotplug new vcpu, the corresponding
> steal usage might become 100% and the steal time from /proc/stat would increase
> abnormally.
> 
> As we cannot wait for long time to reproduce the issue, here is how I reproduce
> it on purpose by accounting a large initial steal clock for new vcpu 2 and 3.
> 
> 1. Apply the below patch to guest 4.9.160 to account large initial steal clock
> for new vcpu 2 and 3:
> 
> diff --git a/drivers/xen/time.c b/drivers/xen/time.c
> index ac5f23f..3cf629e 100644
> --- a/drivers/xen/time.c
> +++ b/drivers/xen/time.c
> @@ -85,7 +85,14 @@ u64 xen_steal_clock(int cpu)
>         struct vcpu_runstate_info state;
>  
>         xen_get_runstate_snapshot_cpu(&state, cpu);
> -       return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline];
> +
> +       if (cpu == 2 || cpu == 3)
> +               return state.time[RUNSTATE_runnable]
> +                      + state.time[RUNSTATE_offline]
> +                      + 0x00071e87e677aa12;
> +       else
> +               return state.time[RUNSTATE_runnable]
> +                      + state.time[RUNSTATE_offline];
>  }
>  
>  void xen_setup_runstate_info(int cpu)
> 
> 
> 2. Boot hvm guest with "vcpus=2" and "maxvcpus=4". By default, VM boot with
> vcpu 0 and 1.
> 
> 3. Hotplug vcpu 2 and 3 via "xl vcpu-set <domid> 4" on dom0.
> 
> In my env, the steal becomes 100% within 10s after the "xl vcpu-set" command on
> dom0.
> 
> I can reproduce on kvm with similar method. However, as the initial steal clock
> on kvm guest is always 0, I do not think it is easy to hit this issue on kvm.
> 
> --------------------------------------------------------
> 
> The root cause is that the return type of jiffies_to_usecs() is 'unsigned int',
> but not 'unsigned long'. As a result, the leading 32 bits are discarded.
> 
> jiffies_to_usecs() is indirectly triggered by cputime_to_nsecs() at line 264.
> If guest is already up for long time, the initial steal time for new vcpu might
> be large and the leading 32 bits of jiffies_to_usecs() would be discarded.
> 
> As a result, the steal at line 259 is always large and the
> this_rq()->prev_steal_time at line 264 is always small. The difference at line
> 260 is always large during each time steal_account_process_time() is involved.
> Finally, the steal time in /proc/stat would increase abnormally.
> 
> 252 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
> 253 {
> 254 #ifdef CONFIG_PARAVIRT
> 255         if (static_key_false(&paravirt_steal_enabled)) {
> 256                 cputime_t steal_cputime;
> 257                 u64 steal;
> 258 
> 259                 steal = paravirt_steal_clock(smp_processor_id());
> 260                 steal -= this_rq()->prev_steal_time;
> 261 
> 262                 steal_cputime = min(nsecs_to_cputime(steal), maxtime);
> 263                 account_steal_time(steal_cputime);
> 264                 this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
> 265 
> 266                 return steal_cputime;
> 267         }
> 268 #endif
> 269         return 0;
> 270 }
> 
> --------------------------------------------------------
> 
> I have emailed the kernel mailing list about the return type of
> jiffies_to_usecs() and jiffies_to_msecs():
> 
> https://lkml.org/lkml/2019/2/26/899
> 
> 
> So far, I have two solutions:
> 
> 1. Change the return type from 'unsigned int' to 'unsigned long' as in above
> link and I am afraid it would bring side effect. The return type in latest
> mainline kernel is still 'unsigned int'.
> 
> 2. Something like below based on stable 4.9.160:

3. use jiffies64_to_nsecs() instead of trying to open code it.


Juergen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage
  2019-03-01  2:35 Dongli Zhang
  2019-03-01 23:43 ` Thomas Gleixner
@ 2019-03-01 23:43 ` Thomas Gleixner
  2019-03-05  2:19   ` Dongli Zhang
  2019-03-05  2:19   ` Dongli Zhang
  2019-03-04  8:14 ` Juergen Gross
  2019-03-04  8:14 ` Juergen Gross
  3 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2019-03-01 23:43 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: xen-devel, stable, linux-kernel, boris.ostrovsky, sstabellini,
	jgross, joe.jin, Herbert Van Den Bergh, sboyd, john.stultz

On Thu, 28 Feb 2019, Dongli Zhang wrote:
> 
> The root cause is that the return type of jiffies_to_usecs() is 'unsigned int',
> but not 'unsigned long'. As a result, the leading 32 bits are discarded.

Errm. No. The root cause is that jiffies_to_usecs() is used for that in the
first place. The function has been that way forever and all usage sites
(except a broken dev_debug print in infiniband) feed delta values. Yes, it
could have documentation....

> jiffies_to_usecs() is indirectly triggered by cputime_to_nsecs() at line 264.
> If guest is already up for long time, the initial steal time for new vcpu might
> be large and the leading 32 bits of jiffies_to_usecs() would be discarded.

> So far, I have two solutions:
> 
> 1. Change the return type from 'unsigned int' to 'unsigned long' as in above
> link and I am afraid it would bring side effect. The return type in latest
> mainline kernel is still 'unsigned int'.

Changing it to unsigned long would just solve the issue for 64bit.

Thanks,

	tglx
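
A small user-space sketch of the 64bit-only point above (my own illustration,
not part of the original mail, assuming the usual ILP32 and LP64 ABIs): on a
32-bit kernel 'unsigned long' is still 32 bits wide, so widening the return
type would not remove the truncation there.

#include <stdio.h>
#include <limits.h>

int main(void)
{
        /* on ILP32 both types are 32 bits; only LP64 makes 'long' wider */
        printf("unsigned int : %zu bits\n", sizeof(unsigned int) * CHAR_BIT);
        printf("unsigned long: %zu bits\n", sizeof(unsigned long) * CHAR_BIT);
        return 0;
}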


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-03-05  2:16 UTC | newest]

Thread overview: 5 messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-01  2:35 [BUG linux-4.9.x] xen hotplug cpu leads to 100% steal usage Dongli Zhang
2019-03-01 23:43 ` Thomas Gleixner
2019-03-05  2:19   ` Dongli Zhang
2019-03-04  8:14 ` Juergen Gross
2019-03-05  2:14   ` Dongli Zhang
