All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug
@ 2020-08-27  6:47 Kajol Jain
  2020-09-02 15:05 ` Arnaldo Carvalho de Melo
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Kajol Jain @ 2020-08-27  6:47 UTC (permalink / raw)
  To: acme, peterz
  Cc: jolsa, linux-kernel, linux-perf-users, maddy, mingo,
	mark.rutland, alexander.shishkin, namhyung, daniel, brho, srikar,
	kjain

Commit 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
added assignment of ret value as -EAGAIN in case function
call to 'smp_call_function_single' fails.
For non-zero ret value, it did
'ret = !ret ? data.ret : -EAGAIN;', which always
assign -EAGAIN to ret and make second if condition useless.

In scenarios like when executing a perf stat with --per-thread option, and
if any of the monitoring cpu goes offline, the 'smp_call_function_single'
function could return -ENXIO, and with the above check,
task_function_call hung and increases CPU
usage (because of repeated 'smp_call_function_single()')

Recration scenario:
	# perf stat -a --per-thread && (offline a CPU )

Patch here removes the tertiary condition added as part of that
commit and added a check for NULL and -EAGAIN.

Fixes: 2ed6edd33a21("perf: Add cond_resched() to task_function_call()")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/events/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Changelog:
- Remove RFC tag
- Resolve some nits issues like space after if and
  added -ENXIO in comment msg of function 'task_function_call'
  as suggested by Barret Rhoden.

Link to the RFC: https://lkml.org/lkml/2020/8/26/896

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bfe8e3c6e44..cef646084198 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -99,7 +99,7 @@ static void remote_function(void *data)
  * retry due to any failures in smp_call_function_single(), such as if the
  * task_cpu() goes offline concurrently.
  *
- * returns @func return value or -ESRCH when the process isn't running
+ * returns @func return value or -ESRCH or -ENXIO when the process isn't running
  */
 static int
 task_function_call(struct task_struct *p, remote_function_f func, void *info)
@@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
 	for (;;) {
 		ret = smp_call_function_single(task_cpu(p), remote_function,
 					       &data, 1);
-		ret = !ret ? data.ret : -EAGAIN;
+		if (!ret)
+			ret = data.ret;
 
 		if (ret != -EAGAIN)
 			break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug
  2020-08-27  6:47 [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug Kajol Jain
@ 2020-09-02 15:05 ` Arnaldo Carvalho de Melo
  2020-09-02 15:57   ` kajoljain
  2020-10-08 12:25 ` kajoljain
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-09-02 15:05 UTC (permalink / raw)
  To: Peter Zijlstra, Kajol Jain
  Cc: jolsa, linux-kernel, linux-perf-users, maddy, mingo,
	mark.rutland, alexander.shishkin, namhyung, daniel, brho, srikar

Em Thu, Aug 27, 2020 at 12:17:32PM +0530, Kajol Jain escreveu:
> Commit 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
> added assignment of ret value as -EAGAIN in case function
> call to 'smp_call_function_single' fails.
> For non-zero ret value, it did
> 'ret = !ret ? data.ret : -EAGAIN;', which always
> assign -EAGAIN to ret and make second if condition useless.
> 
> In scenarios like when executing a perf stat with --per-thread option, and
> if any of the monitoring cpu goes offline, the 'smp_call_function_single'
> function could return -ENXIO, and with the above check,
> task_function_call hung and increases CPU
> usage (because of repeated 'smp_call_function_single()')
> 
> Recration scenario:
> 	# perf stat -a --per-thread && (offline a CPU )
> 
> Patch here removes the tertiary condition added as part of that
> commit and added a check for NULL and -EAGAIN.

I reproduced this issue with v5.9-rc3, now have to reboot for a conf
call, will try to test the patch afterwards,

[65108.467416] IRQ 165: no longer affine to CPU23
[65108.468495] smpboot: CPU 23 is now offline
[65129.003879] Uhhuh. NMI received for unknown reason 1c on CPU 20.
[65129.003880] Do you have a strange power saving mode enabled?
[65129.003880] Dazed and confused, but trying to continue
[65156.155539] Uhhuh. NMI received for unknown reason 1c on CPU 2.
[65156.155539] Do you have a strange power saving mode enabled?
[65156.155540] Dazed and confused, but trying to continue
[65161.985284] Uhhuh. NMI received for unknown reason 1c on CPU 21.
[65161.985285] Do you have a strange power saving mode enabled?
[65161.985285] Dazed and confused, but trying to continue
[65183.154444] Uhhuh. NMI received for unknown reason 1c on CPU 1.
[65183.154445] Do you have a strange power saving mode enabled?
[65183.154446] Dazed and confused, but trying to continue
[65189.724797] Uhhuh. NMI received for unknown reason 0c on CPU 4.
[65189.724798] Do you have a strange power saving mode enabled?
[65189.724799] Dazed and confused, but trying to continue
[65196.259918] Uhhuh. NMI received for unknown reason 1c on CPU 11.
[65196.259918] Do you have a strange power saving mode enabled?
[65196.259918] Dazed and confused, but trying to continue
[65234.794490] Uhhuh. NMI received for unknown reason 0c on CPU 5.
[65234.794491] Do you have a strange power saving mode enabled?
[65234.794491] Dazed and confused, but trying to continue
[65454.559831] Uhhuh. NMI received for unknown reason 1c on CPU 19.
[65454.559832] Do you have a strange power saving mode enabled?
[65454.559832] Dazed and confused, but trying to continue
[65529.657789] Uhhuh. NMI received for unknown reason 1c on CPU 3.
[65529.657790] Do you have a strange power saving mode enabled?
[65529.657790] Dazed and confused, but trying to continue
[acme@five perf]$


Things seems to be working again after bringing that CPU back online:

[root@five ~]# perf top --stdio -C 0-22
Error:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.

[root@five ~]# perf stat -e cycles sleep 1
Error:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.

[root@five ~]# perf record -e cycles sleep 1
Error:
The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.

[root@five ~]# echo 1 > /sys/devices/system/cpu/cpu23/online
[root@five ~]# perf record -e cycles sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.039 MB perf.data (7 samples) ]
[root@five ~]# perf stat -e cycles sleep 1

 Performance counter stats for 'sleep 1':

           842,743      cycles

       1.000903853 seconds time elapsed

       0.000902000 seconds user
       0.000000000 seconds sys


[root@five ~]# perf stat -e cycles sleep 1


- Arnaldo

 
> Fixes: 2ed6edd33a21("perf: Add cond_resched() to task_function_call()")
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Reviewed-by: Barret Rhoden <brho@google.com>
> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  kernel/events/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Changelog:
> - Remove RFC tag
> - Resolve some nits issues like space after if and
>   added -ENXIO in comment msg of function 'task_function_call'
>   as suggested by Barret Rhoden.
> 
> Link to the RFC: https://lkml.org/lkml/2020/8/26/896
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5bfe8e3c6e44..cef646084198 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -99,7 +99,7 @@ static void remote_function(void *data)
>   * retry due to any failures in smp_call_function_single(), such as if the
>   * task_cpu() goes offline concurrently.
>   *
> - * returns @func return value or -ESRCH when the process isn't running
> + * returns @func return value or -ESRCH or -ENXIO when the process isn't running
>   */
>  static int
>  task_function_call(struct task_struct *p, remote_function_f func, void *info)
> @@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
>  	for (;;) {
>  		ret = smp_call_function_single(task_cpu(p), remote_function,
>  					       &data, 1);
> -		ret = !ret ? data.ret : -EAGAIN;
> +		if (!ret)
> +			ret = data.ret;
>  
>  		if (ret != -EAGAIN)
>  			break;
> -- 
> 2.26.2
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug
  2020-09-02 15:05 ` Arnaldo Carvalho de Melo
@ 2020-09-02 15:57   ` kajoljain
  0 siblings, 0 replies; 7+ messages in thread
From: kajoljain @ 2020-09-02 15:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: jolsa, linux-kernel, linux-perf-users, maddy, mingo,
	mark.rutland, alexander.shishkin, namhyung, daniel, brho, srikar



On 9/2/20 8:35 PM, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 27, 2020 at 12:17:32PM +0530, Kajol Jain escreveu:
>> Commit 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
>> added assignment of ret value as -EAGAIN in case function
>> call to 'smp_call_function_single' fails.
>> For non-zero ret value, it did
>> 'ret = !ret ? data.ret : -EAGAIN;', which always
>> assign -EAGAIN to ret and make second if condition useless.
>>
>> In scenarios like when executing a perf stat with --per-thread option, and
>> if any of the monitoring cpu goes offline, the 'smp_call_function_single'
>> function could return -ENXIO, and with the above check,
>> task_function_call hung and increases CPU
>> usage (because of repeated 'smp_call_function_single()')
>>
>> Recration scenario:
>> 	# perf stat -a --per-thread && (offline a CPU )
>>
>> Patch here removes the tertiary condition added as part of that
>> commit and added a check for NULL and -EAGAIN.
> 
> I reproduced this issue with v5.9-rc3, now have to reboot for a conf
> call, will try to test the patch afterwards,
> 
> [65108.467416] IRQ 165: no longer affine to CPU23
> [65108.468495] smpboot: CPU 23 is now offline
> [65129.003879] Uhhuh. NMI received for unknown reason 1c on CPU 20.
> [65129.003880] Do you have a strange power saving mode enabled?
> [65129.003880] Dazed and confused, but trying to continue
> [65156.155539] Uhhuh. NMI received for unknown reason 1c on CPU 2.
> [65156.155539] Do you have a strange power saving mode enabled?
> [65156.155540] Dazed and confused, but trying to continue
> [65161.985284] Uhhuh. NMI received for unknown reason 1c on CPU 21.
> [65161.985285] Do you have a strange power saving mode enabled?
> [65161.985285] Dazed and confused, but trying to continue
> [65183.154444] Uhhuh. NMI received for unknown reason 1c on CPU 1.
> [65183.154445] Do you have a strange power saving mode enabled?
> [65183.154446] Dazed and confused, but trying to continue
> [65189.724797] Uhhuh. NMI received for unknown reason 0c on CPU 4.
> [65189.724798] Do you have a strange power saving mode enabled?
> [65189.724799] Dazed and confused, but trying to continue
> [65196.259918] Uhhuh. NMI received for unknown reason 1c on CPU 11.
> [65196.259918] Do you have a strange power saving mode enabled?
> [65196.259918] Dazed and confused, but trying to continue
> [65234.794490] Uhhuh. NMI received for unknown reason 0c on CPU 5.
> [65234.794491] Do you have a strange power saving mode enabled?
> [65234.794491] Dazed and confused, but trying to continue
> [65454.559831] Uhhuh. NMI received for unknown reason 1c on CPU 19.
> [65454.559832] Do you have a strange power saving mode enabled?
> [65454.559832] Dazed and confused, but trying to continue
> [65529.657789] Uhhuh. NMI received for unknown reason 1c on CPU 3.
> [65529.657790] Do you have a strange power saving mode enabled?
> [65529.657790] Dazed and confused, but trying to continue
> [acme@five perf]$
> 
> 
> Things seems to be working again after bringing that CPU back online:

Hi Arnaldo,
    You are right, once we bring back the CPU again, things will start working as our 'smp_call_function_single'
will not fail and we will come out of the loop. But till then, task_function_call will be hung.

Thanks,
Kajol Jain
> 
> [root@five ~]# perf top --stdio -C 0-22
> Error:
> The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
> /bin/dmesg | grep -i perf may provide additional information.
> 
> [root@five ~]# perf stat -e cycles sleep 1
> Error:
> The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
> /bin/dmesg | grep -i perf may provide additional information.
> 
> [root@five ~]# perf record -e cycles sleep 1
> Error:
> The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (cycles).
> /bin/dmesg | grep -i perf may provide additional information.
> 
> [root@five ~]# echo 1 > /sys/devices/system/cpu/cpu23/online
> [root@five ~]# perf record -e cycles sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data (7 samples) ]
> [root@five ~]# perf stat -e cycles sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>            842,743      cycles
> 
>        1.000903853 seconds time elapsed
> 
>        0.000902000 seconds user
>        0.000000000 seconds sys
> 
> 
> [root@five ~]# perf stat -e cycles sleep 1
> 
> 
> - Arnaldo
> 
>  
>> Fixes: 2ed6edd33a21("perf: Add cond_resched() to task_function_call()")
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> Reviewed-by: Barret Rhoden <brho@google.com>
>> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> ---
>>  kernel/events/core.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> Changelog:
>> - Remove RFC tag
>> - Resolve some nits issues like space after if and
>>   added -ENXIO in comment msg of function 'task_function_call'
>>   as suggested by Barret Rhoden.
>>
>> Link to the RFC: https://lkml.org/lkml/2020/8/26/896
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 5bfe8e3c6e44..cef646084198 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -99,7 +99,7 @@ static void remote_function(void *data)
>>   * retry due to any failures in smp_call_function_single(), such as if the
>>   * task_cpu() goes offline concurrently.
>>   *
>> - * returns @func return value or -ESRCH when the process isn't running
>> + * returns @func return value or -ESRCH or -ENXIO when the process isn't running
>>   */
>>  static int
>>  task_function_call(struct task_struct *p, remote_function_f func, void *info)
>> @@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
>>  	for (;;) {
>>  		ret = smp_call_function_single(task_cpu(p), remote_function,
>>  					       &data, 1);
>> -		ret = !ret ? data.ret : -EAGAIN;
>> +		if (!ret)
>> +			ret = data.ret;
>>  
>>  		if (ret != -EAGAIN)
>>  			break;
>> -- 
>> 2.26.2
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug
  2020-08-27  6:47 [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug Kajol Jain
  2020-09-02 15:05 ` Arnaldo Carvalho de Melo
@ 2020-10-08 12:25 ` kajoljain
  2020-10-08 13:15   ` Peter Zijlstra
  2020-10-08 13:19 ` [tip: perf/core] perf: Fix task_function_call() error handling tip-bot2 for Kajol Jain
  2020-10-09  6:24 ` [tip: perf/urgent] " tip-bot2 for Kajol Jain
  3 siblings, 1 reply; 7+ messages in thread
From: kajoljain @ 2020-10-08 12:25 UTC (permalink / raw)
  To: acme, peterz
  Cc: jolsa, linux-kernel, linux-perf-users, maddy, mingo,
	mark.rutland, alexander.shishkin, namhyung, daniel, brho, srikar



On 8/27/20 12:17 PM, Kajol Jain wrote:
> Commit 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
> added assignment of ret value as -EAGAIN in case function
> call to 'smp_call_function_single' fails.
> For non-zero ret value, it did
> 'ret = !ret ? data.ret : -EAGAIN;', which always
> assign -EAGAIN to ret and make second if condition useless.
> 
> In scenarios like when executing a perf stat with --per-thread option, and
> if any of the monitoring cpu goes offline, the 'smp_call_function_single'
> function could return -ENXIO, and with the above check,
> task_function_call hung and increases CPU
> usage (because of repeated 'smp_call_function_single()')
> 
> Recration scenario:
> 	# perf stat -a --per-thread && (offline a CPU )
> 
> Patch here removes the tertiary condition added as part of that
> commit and added a check for NULL and -EAGAIN.

Hi Peter,
     Please let me know if you have any comment on this patch.

Thanks,
Kajol Jain
> 
> Fixes: 2ed6edd33a21("perf: Add cond_resched() to task_function_call()")
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Reviewed-by: Barret Rhoden <brho@google.com>
> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  kernel/events/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Changelog:
> - Remove RFC tag
> - Resolve some nits issues like space after if and
>   added -ENXIO in comment msg of function 'task_function_call'
>   as suggested by Barret Rhoden.
> 
> Link to the RFC: https://lkml.org/lkml/2020/8/26/896
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5bfe8e3c6e44..cef646084198 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -99,7 +99,7 @@ static void remote_function(void *data)
>   * retry due to any failures in smp_call_function_single(), such as if the
>   * task_cpu() goes offline concurrently.
>   *
> - * returns @func return value or -ESRCH when the process isn't running
> + * returns @func return value or -ESRCH or -ENXIO when the process isn't running
>   */
>  static int
>  task_function_call(struct task_struct *p, remote_function_f func, void *info)
> @@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
>  	for (;;) {
>  		ret = smp_call_function_single(task_cpu(p), remote_function,
>  					       &data, 1);
> -		ret = !ret ? data.ret : -EAGAIN;
> +		if (!ret)
> +			ret = data.ret;
>  
>  		if (ret != -EAGAIN)
>  			break;
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug
  2020-10-08 12:25 ` kajoljain
@ 2020-10-08 13:15   ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2020-10-08 13:15 UTC (permalink / raw)
  To: kajoljain
  Cc: acme, jolsa, linux-kernel, linux-perf-users, maddy, mingo,
	mark.rutland, alexander.shishkin, namhyung, daniel, brho, srikar

On Thu, Oct 08, 2020 at 05:55:35PM +0530, kajoljain wrote:
> 
> 
> On 8/27/20 12:17 PM, Kajol Jain wrote:
> > Commit 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
> > added assignment of ret value as -EAGAIN in case function
> > call to 'smp_call_function_single' fails.
> > For non-zero ret value, it did
> > 'ret = !ret ? data.ret : -EAGAIN;', which always
> > assign -EAGAIN to ret and make second if condition useless.
> > 
> > In scenarios like when executing a perf stat with --per-thread option, and
> > if any of the monitoring cpu goes offline, the 'smp_call_function_single'
> > function could return -ENXIO, and with the above check,
> > task_function_call hung and increases CPU
> > usage (because of repeated 'smp_call_function_single()')
> > 
> > Recration scenario:
> > 	# perf stat -a --per-thread && (offline a CPU )
> > 
> > Patch here removes the tertiary condition added as part of that
> > commit and added a check for NULL and -EAGAIN.
> 
> Hi Peter,
>      Please let me know if you have any comment on this patch.

Yes, sorry. I've got it now. Thanks!

---
Subject: perf: Fix task_function_call() error handling
From: Kajol Jain <kjain@linux.ibm.com>
Date: Thu, 27 Aug 2020 12:17:32 +0530

From: Kajol Jain <kjain@linux.ibm.com>

The error handling introduced by commit:

  2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")

looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.

Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
---
 kernel/events/core.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -99,7 +99,7 @@ static void remote_function(void *data)
  * retry due to any failures in smp_call_function_single(), such as if the
  * task_cpu() goes offline concurrently.
  *
- * returns @func return value or -ESRCH when the process isn't running
+ * returns @func return value or -ESRCH or -ENXIO when the process isn't running
  */
 static int
 task_function_call(struct task_struct *p, remote_function_f func, void *info)
@@ -115,7 +115,8 @@ task_function_call(struct task_struct *p
 	for (;;) {
 		ret = smp_call_function_single(task_cpu(p), remote_function,
 					       &data, 1);
-		ret = !ret ? data.ret : -EAGAIN;
+		if (!ret)
+			ret = data.ret;
 
 		if (ret != -EAGAIN)
 			break;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip: perf/core] perf: Fix task_function_call() error handling
  2020-08-27  6:47 [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug Kajol Jain
  2020-09-02 15:05 ` Arnaldo Carvalho de Melo
  2020-10-08 12:25 ` kajoljain
@ 2020-10-08 13:19 ` tip-bot2 for Kajol Jain
  2020-10-09  6:24 ` [tip: perf/urgent] " tip-bot2 for Kajol Jain
  3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for Kajol Jain @ 2020-10-08 13:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Srikar Dronamraju, Kajol Jain, Peter Zijlstra (Intel),
	Barret Rhoden, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     84ad70320241566e028ada955c694ab92f3351e3
Gitweb:        https://git.kernel.org/tip/84ad70320241566e028ada955c694ab92f3351e3
Author:        Kajol Jain <kjain@linux.ibm.com>
AuthorDate:    Thu, 27 Aug 2020 12:17:32 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 08 Oct 2020 15:16:29 +02:00

perf: Fix task_function_call() error handling

The error handling introduced by commit:

  2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")

looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.

Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
---
 kernel/events/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 45edb85..85a6e7f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -99,7 +99,7 @@ static void remote_function(void *data)
  * retry due to any failures in smp_call_function_single(), such as if the
  * task_cpu() goes offline concurrently.
  *
- * returns @func return value or -ESRCH when the process isn't running
+ * returns @func return value or -ESRCH or -ENXIO when the process isn't running
  */
 static int
 task_function_call(struct task_struct *p, remote_function_f func, void *info)
@@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
 	for (;;) {
 		ret = smp_call_function_single(task_cpu(p), remote_function,
 					       &data, 1);
-		ret = !ret ? data.ret : -EAGAIN;
+		if (!ret)
+			ret = data.ret;
 
 		if (ret != -EAGAIN)
 			break;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [tip: perf/urgent] perf: Fix task_function_call() error handling
  2020-08-27  6:47 [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug Kajol Jain
                   ` (2 preceding siblings ...)
  2020-10-08 13:19 ` [tip: perf/core] perf: Fix task_function_call() error handling tip-bot2 for Kajol Jain
@ 2020-10-09  6:24 ` tip-bot2 for Kajol Jain
  3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for Kajol Jain @ 2020-10-09  6:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Srikar Dronamraju, Kajol Jain, Peter Zijlstra (Intel),
	Ingo Molnar, Barret Rhoden, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     6d6b8b9f4fceab7266ca03d194f60ec72bd4b654
Gitweb:        https://git.kernel.org/tip/6d6b8b9f4fceab7266ca03d194f60ec72bd4b654
Author:        Kajol Jain <kjain@linux.ibm.com>
AuthorDate:    Thu, 27 Aug 2020 12:17:32 +05:30
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 09 Oct 2020 08:18:33 +02:00

perf: Fix task_function_call() error handling

The error handling introduced by commit:

  2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")

looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.

Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
---
 kernel/events/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7ed5248..e8bf922 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -99,7 +99,7 @@ static void remote_function(void *data)
  * retry due to any failures in smp_call_function_single(), such as if the
  * task_cpu() goes offline concurrently.
  *
- * returns @func return value or -ESRCH when the process isn't running
+ * returns @func return value or -ESRCH or -ENXIO when the process isn't running
  */
 static int
 task_function_call(struct task_struct *p, remote_function_f func, void *info)
@@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
 	for (;;) {
 		ret = smp_call_function_single(task_cpu(p), remote_function,
 					       &data, 1);
-		ret = !ret ? data.ret : -EAGAIN;
+		if (!ret)
+			ret = data.ret;
 
 		if (ret != -EAGAIN)
 			break;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-10-09  6:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-27  6:47 [PATCH] perf/core: Fix hung issue on perf stat command during cpu hotplug Kajol Jain
2020-09-02 15:05 ` Arnaldo Carvalho de Melo
2020-09-02 15:57   ` kajoljain
2020-10-08 12:25 ` kajoljain
2020-10-08 13:15   ` Peter Zijlstra
2020-10-08 13:19 ` [tip: perf/core] perf: Fix task_function_call() error handling tip-bot2 for Kajol Jain
2020-10-09  6:24 ` [tip: perf/urgent] " tip-bot2 for Kajol Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.