All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix crash during platform suspend when CPUs are related to each other
@ 2015-07-01  9:11 Sudarshan N Ramachandra
  2015-07-08  0:12 ` Rafael J. Wysocki
  0 siblings, 1 reply; 2+ messages in thread
From: Sudarshan N Ramachandra @ 2015-07-01  9:11 UTC (permalink / raw)
  To: linux-acpi; +Cc: Sudarshan N Ramachandra

When CPUs are related to each other, The policy structure is shared among all
related CPUs. Similarly we need to take care of sharing/initializing/freeing
of data in acpi-cpufreq.c This patch addresses the same.

When acpi-cpufreq.c initializes CPUs specifying that they are related to each
other (CPUFREQ_SHARED_TYPE_ALL), we will encounter a crash during suspend exit
when acpi_processor_register_performance() is called. This is because
acpi_processor_unregister_performance() would not have been called for that
CPU during suspend entry

When CPU 0, CPU1 are related and CPU2, CPU3 are related, During suspend
exit, CPU2 fails to initialize since acpi_processor_register_performance()
returns error.  Finally when CPU3 is added, below crash is seen:

[ 92.712008] BUG: unable to handle kernel paging request at 0000000677ac20e8
[ 92.712035] IP:
[ 92.712036] [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
[ 92.712049] PGD 58fed067 PUD 0
[ 92.712067] Oops: 0000 1 PREEMPT SMP
[ 92.712149] CPU: 0 PID: 2945 Comm: system_server Tainted: G WC O 3.14.37-x86_64-L1-R443-g9a60c52-dirty #18
[ 92.712166] task: ffff8800594003d0 ti: ffff880059402000 task.ti: ffff880059402000
[ 92.712184] RIP: 0010:[<ffffffff8177da22>]
[ 92.712184] [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
[ 92.712191] RSP: 0018:ffff880059403b28 EFLAGS: 00010282
[ 92.712196] RAX: 00000000dead4ead RBX: ffff8800729e9c00 RCX: 000000000000c3c2
[ 92.712202] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff81fc77e3
[ 92.712207] RBP: ffff880059403b38 R08: 00000000000142d8 R09: 0000000000000000
[ 92.712211] R10: 0000000000000000 R11: 0000000000040000 R12: 000000000000eae0
[ 92.712217] R13: ffff8800729e9d18 R14: ffffffff82385818 R15: 0000000000000003
[ 92.712225] FS: 00007fd57be09b60(0000) GS:ffff880076c00000(0000) knlGS:00007fd58fc24000
[ 92.712230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 92.712236] CR2: 0000000677ac20e8 CR3: 0000000067f24000 CR4: 00000000001007f0
[ 92.712247] Last Branch Records:
[ 92.712263] to: [<ffffffff819fff40>] page_fault+0x0/0x80
[ 92.712276] from: [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
[ 92.712289] to: [<ffffffff8177da18>] cpufreq_frequency_table_update_policy_cpu+0x38/0x90
[ 92.712301] from: [<ffffffff819ee551>] printk+0x55/0x56
[ 92.712310] to: [<ffffffff819ee550>] printk+0x54/0x56
[ 92.712327] from: [<ffffffff810d9144>] vprintk_emit+0x184/0x540
[ 92.712337] to: [<ffffffff810d912f>] vprintk_emit+0x16f/0x540
[ 92.712353] from: [<ffffffff81133569>] trace_hardirqs_on+0x9/0xf0
[ 92.712364] to: [<ffffffff81133560>] trace_hardirqs_on+0x0/0xf0
[ 92.712375] from: [<ffffffff810d912a>] vprintk_emit+0x16a/0x540
[ 92.712386] to: [<ffffffff810d911c>] vprintk_emit+0x15c/0x540
[ 92.712402] from: [<ffffffff819ff205>] _raw_spin_unlock+0x25/0x40
[ 92.712413] to: [<ffffffff819ff1f8>] _raw_spin_unlock+0x18/0x40
[ 92.712426] from: [<ffffffff81a034bc>] preempt_count_sub+0x3c/0xb0
[ 92.712437] to: [<ffffffff81a034b1>] preempt_count_sub+0x31/0xb0
[ 92.712447] from: [<ffffffff81a034f4>] preempt_count_sub+0x74/0xb0
[ 92.712449] Stack:
[ 92.712469] ffff8800729e9c00 0000000000000003 ffff880059403b60 ffffffff8177a8f0
[ 92.712482] ffff8800729e9c00 0000000000000202 0000000000000003 ffff880059403bc0
[ 92.712496] ffffffff8177c8fe ffffffff81025af0 ffff880059403ba0 0000000000000003
[ 92.712499] Call Trace:
[ 92.712521] [<ffffffff8177a8f0>] update_policy_cpu+0x50/0xa0
[ 92.712535] [<ffffffff8177c8fe>] __cpufreq_add_dev.isra.22+0x31e/0xa70
[ 92.712550] [<ffffffff81025af0>] ? mce_device_create+0x160/0x200
[ 92.712564] [<ffffffff8177d0c2>] cpufreq_cpu_callback+0x72/0xc0
[ 92.712576] [<ffffffff81a032e3>] notifier_call_chain+0x53/0xa0
[ 92.712594] [<ffffffff810afffe>] __raw_notifier_call_chain+0xe/0x10
[ 92.712609] [<ffffffff81088af3>] cpu_notify+0x23/0x50
[ 92.712619] [<ffffffff81088d98>] _cpu_up+0x168/0x180
[ 92.712636] [<ffffffff819e98f3>] enable_nonboot_cpus+0xa3/0x1b0
[ 92.712650] [<ffffffff810d5afc>] suspend_enter+0x17c/0x490
[ 92.712663] [<ffffffff819ee550>] ? printk+0x54/0x56
[ 92.712682] [<ffffffff8157dc42>] ? dpm_show_time+0x92/0xc0
[ 92.712696] [<ffffffff81580c4e>] ? dpm_suspend+0x1ce/0x390
[ 92.712710] [<ffffffff810d5eaa>] suspend_devices_and_enter+0x9a/0x150
[ 92.712721] [<ffffffff810d6236>] pm_suspend+0x2d6/0x3a0
[ 92.712732] [<ffffffff810d4db7>] state_store+0x87/0xa0
[ 92.712749] [<ffffffff8137e12f>] kobj_attr_store+0xf/0x20
[ 92.712764] [<ffffffff8122861d>] sysfs_kf_write+0x3d/0x50
[ 92.712778] [<ffffffff8122c622>] kernfs_fop_write+0xd2/0x140
[ 92.712795] [<ffffffff811b2f1a>] vfs_write+0xba/0x1e0
[ 92.712808] [<ffffffff811b3919>] SyS_write+0x49/0xb0
[ 92.712828] [<ffffffff81a07632>] system_call_fastpath+0x16/0x1b
[ 92.712959] Code: fb 8b 77 14 8b 57 18 48 c7 c7 a8 5a 06 82 e8 f2 0a 27 00 48 c7 c7 d7 77 fc 81 31 c0 e8 e4 0a 27 00 8b 43 18 48 c7 c7 e3 77 fc 81 <48> 8b 04 c5 80 ab 41 82 49 8b 14 04 8b 43 14 48 8b 04 c5 80 ab
[ 92.712974] RIP
[ 92.712976] [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
[ 92.712980] RSP <ffff880059403b28>
[ 92.712984] CR2: 0000000677ac20e8
[ 92.713050] --[ end trace b17e2a918293dc7c ]--
[ 92.713059] Kernel panic - not syncing: Fatal exception
[ 93.338306] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 93.376633] ACPI MEMORY or I/O RESET_REG.

Signed-off-by: Sudarshan N Ramachandra <sudarshan.n.ramachandra@intel.com>
---
 drivers/cpufreq/acpi-cpufreq.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index b0c18ed..a479ece 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -70,6 +70,7 @@ struct acpi_cpufreq_data {
 	unsigned int resume;
 	unsigned int cpu_feature;
 	cpumask_var_t freqdomain_cpus;
+	unsigned int owner;
 };
 
 static DEFINE_PER_CPU(struct acpi_cpufreq_data *, acfreq_data);
@@ -673,6 +674,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	}
 
 	data->acpi_data = per_cpu_ptr(acpi_perf_data, cpu);
+	data->owner = cpu;
 	per_cpu(acfreq_data, cpu) = data;
 
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
@@ -711,6 +713,10 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	}
 #endif
 
+	/* All related CPUs will use same acfreq_data */
+	for_each_cpu(i, policy->cpus)
+		per_cpu(acfreq_data, i) = per_cpu(acfreq_data, cpu);
+
 	/* capability check */
 	if (perf->state_count <= 1) {
 		pr_debug("No P-States\n");
@@ -850,13 +856,16 @@ err_free:
 static int acpi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
 	struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+	unsigned int i;
 
 	pr_debug("acpi_cpufreq_cpu_exit\n");
 
 	if (data) {
-		per_cpu(acfreq_data, policy->cpu) = NULL;
+		/* Set NULL for all related CPUs (online + offline) */
+		for_each_cpu(i, policy->related_cpus)
+			per_cpu(acfreq_data, i) = NULL;
 		acpi_processor_unregister_performance(data->acpi_data,
-						      policy->cpu);
+								data->owner);
 		free_cpumask_var(data->freqdomain_cpus);
 		kfree(data->freq_table);
 		kfree(data);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] Fix crash during platform suspend when CPUs are related to each other
  2015-07-01  9:11 [PATCH] Fix crash during platform suspend when CPUs are related to each other Sudarshan N Ramachandra
@ 2015-07-08  0:12 ` Rafael J. Wysocki
  0 siblings, 0 replies; 2+ messages in thread
From: Rafael J. Wysocki @ 2015-07-08  0:12 UTC (permalink / raw)
  To: Sudarshan N Ramachandra; +Cc: linux-acpi, Linux PM list, Pan Xinhui

On Wednesday, July 01, 2015 02:41:16 PM Sudarshan N Ramachandra wrote:
> When CPUs are related to each other, The policy structure is shared among all
> related CPUs. Similarly we need to take care of sharing/initializing/freeing
> of data in acpi-cpufreq.c This patch addresses the same.
> 
> When acpi-cpufreq.c initializes CPUs specifying that they are related to each
> other (CPUFREQ_SHARED_TYPE_ALL), we will encounter a crash during suspend exit
> when acpi_processor_register_performance() is called. This is because
> acpi_processor_unregister_performance() would not have been called for that
> CPU during suspend entry
> 
> When CPU 0, CPU1 are related and CPU2, CPU3 are related, During suspend
> exit, CPU2 fails to initialize since acpi_processor_register_performance()
> returns error.  Finally when CPU3 is added, below crash is seen:

While this explains what's wrong, the changelog doesn't say anything about how
the problem is being fixed.  It should.

> [ 92.712008] BUG: unable to handle kernel paging request at 0000000677ac20e8
> [ 92.712035] IP:
> [ 92.712036] [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
> [ 92.712049] PGD 58fed067 PUD 0
> [ 92.712067] Oops: 0000 1 PREEMPT SMP
> [ 92.712149] CPU: 0 PID: 2945 Comm: system_server Tainted: G WC O 3.14.37-x86_64-L1-R443-g9a60c52-dirty #18
> [ 92.712166] task: ffff8800594003d0 ti: ffff880059402000 task.ti: ffff880059402000
> [ 92.712184] RIP: 0010:[<ffffffff8177da22>]
> [ 92.712184] [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
> [ 92.712191] RSP: 0018:ffff880059403b28 EFLAGS: 00010282
> [ 92.712196] RAX: 00000000dead4ead RBX: ffff8800729e9c00 RCX: 000000000000c3c2
> [ 92.712202] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff81fc77e3
> [ 92.712207] RBP: ffff880059403b38 R08: 00000000000142d8 R09: 0000000000000000
> [ 92.712211] R10: 0000000000000000 R11: 0000000000040000 R12: 000000000000eae0
> [ 92.712217] R13: ffff8800729e9d18 R14: ffffffff82385818 R15: 0000000000000003
> [ 92.712225] FS: 00007fd57be09b60(0000) GS:ffff880076c00000(0000) knlGS:00007fd58fc24000
> [ 92.712230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 92.712236] CR2: 0000000677ac20e8 CR3: 0000000067f24000 CR4: 00000000001007f0
> [ 92.712247] Last Branch Records:
> [ 92.712263] to: [<ffffffff819fff40>] page_fault+0x0/0x80
> [ 92.712276] from: [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
> [ 92.712289] to: [<ffffffff8177da18>] cpufreq_frequency_table_update_policy_cpu+0x38/0x90
> [ 92.712301] from: [<ffffffff819ee551>] printk+0x55/0x56
> [ 92.712310] to: [<ffffffff819ee550>] printk+0x54/0x56
> [ 92.712327] from: [<ffffffff810d9144>] vprintk_emit+0x184/0x540
> [ 92.712337] to: [<ffffffff810d912f>] vprintk_emit+0x16f/0x540
> [ 92.712353] from: [<ffffffff81133569>] trace_hardirqs_on+0x9/0xf0
> [ 92.712364] to: [<ffffffff81133560>] trace_hardirqs_on+0x0/0xf0
> [ 92.712375] from: [<ffffffff810d912a>] vprintk_emit+0x16a/0x540
> [ 92.712386] to: [<ffffffff810d911c>] vprintk_emit+0x15c/0x540
> [ 92.712402] from: [<ffffffff819ff205>] _raw_spin_unlock+0x25/0x40
> [ 92.712413] to: [<ffffffff819ff1f8>] _raw_spin_unlock+0x18/0x40
> [ 92.712426] from: [<ffffffff81a034bc>] preempt_count_sub+0x3c/0xb0
> [ 92.712437] to: [<ffffffff81a034b1>] preempt_count_sub+0x31/0xb0
> [ 92.712447] from: [<ffffffff81a034f4>] preempt_count_sub+0x74/0xb0
> [ 92.712449] Stack:
> [ 92.712469] ffff8800729e9c00 0000000000000003 ffff880059403b60 ffffffff8177a8f0
> [ 92.712482] ffff8800729e9c00 0000000000000202 0000000000000003 ffff880059403bc0
> [ 92.712496] ffffffff8177c8fe ffffffff81025af0 ffff880059403ba0 0000000000000003
> [ 92.712499] Call Trace:
> [ 92.712521] [<ffffffff8177a8f0>] update_policy_cpu+0x50/0xa0
> [ 92.712535] [<ffffffff8177c8fe>] __cpufreq_add_dev.isra.22+0x31e/0xa70
> [ 92.712550] [<ffffffff81025af0>] ? mce_device_create+0x160/0x200
> [ 92.712564] [<ffffffff8177d0c2>] cpufreq_cpu_callback+0x72/0xc0
> [ 92.712576] [<ffffffff81a032e3>] notifier_call_chain+0x53/0xa0
> [ 92.712594] [<ffffffff810afffe>] __raw_notifier_call_chain+0xe/0x10
> [ 92.712609] [<ffffffff81088af3>] cpu_notify+0x23/0x50
> [ 92.712619] [<ffffffff81088d98>] _cpu_up+0x168/0x180
> [ 92.712636] [<ffffffff819e98f3>] enable_nonboot_cpus+0xa3/0x1b0
> [ 92.712650] [<ffffffff810d5afc>] suspend_enter+0x17c/0x490
> [ 92.712663] [<ffffffff819ee550>] ? printk+0x54/0x56
> [ 92.712682] [<ffffffff8157dc42>] ? dpm_show_time+0x92/0xc0
> [ 92.712696] [<ffffffff81580c4e>] ? dpm_suspend+0x1ce/0x390
> [ 92.712710] [<ffffffff810d5eaa>] suspend_devices_and_enter+0x9a/0x150
> [ 92.712721] [<ffffffff810d6236>] pm_suspend+0x2d6/0x3a0
> [ 92.712732] [<ffffffff810d4db7>] state_store+0x87/0xa0
> [ 92.712749] [<ffffffff8137e12f>] kobj_attr_store+0xf/0x20
> [ 92.712764] [<ffffffff8122861d>] sysfs_kf_write+0x3d/0x50
> [ 92.712778] [<ffffffff8122c622>] kernfs_fop_write+0xd2/0x140
> [ 92.712795] [<ffffffff811b2f1a>] vfs_write+0xba/0x1e0
> [ 92.712808] [<ffffffff811b3919>] SyS_write+0x49/0xb0
> [ 92.712828] [<ffffffff81a07632>] system_call_fastpath+0x16/0x1b
> [ 92.712959] Code: fb 8b 77 14 8b 57 18 48 c7 c7 a8 5a 06 82 e8 f2 0a 27 00 48 c7 c7 d7 77 fc 81 31 c0 e8 e4 0a 27 00 8b 43 18 48 c7 c7 e3 77 fc 81 <48> 8b 04 c5 80 ab 41 82 49 8b 14 04 8b 43 14 48 8b 04 c5 80 ab
> [ 92.712974] RIP
> [ 92.712976] [<ffffffff8177da22>] cpufreq_frequency_table_update_policy_cpu+0x42/0x90
> [ 92.712980] RSP <ffff880059403b28>
> [ 92.712984] CR2: 0000000677ac20e8
> [ 92.713050] --[ end trace b17e2a918293dc7c ]--
> [ 92.713059] Kernel panic - not syncing: Fatal exception
> [ 93.338306] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> [ 93.376633] ACPI MEMORY or I/O RESET_REG.
> 
> Signed-off-by: Sudarshan N Ramachandra <sudarshan.n.ramachandra@intel.com>
> ---
>  drivers/cpufreq/acpi-cpufreq.c |   13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index b0c18ed..a479ece 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -70,6 +70,7 @@ struct acpi_cpufreq_data {
>  	unsigned int resume;
>  	unsigned int cpu_feature;
>  	cpumask_var_t freqdomain_cpus;
> +	unsigned int owner;
>  };
>  
>  static DEFINE_PER_CPU(struct acpi_cpufreq_data *, acfreq_data);
> @@ -673,6 +674,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	}
>  
>  	data->acpi_data = per_cpu_ptr(acpi_perf_data, cpu);
> +	data->owner = cpu;
>  	per_cpu(acfreq_data, cpu) = data;
>  
>  	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
> @@ -711,6 +713,10 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  	}
>  #endif
>  
> +	/* All related CPUs will use same acfreq_data */
> +	for_each_cpu(i, policy->cpus)
> +		per_cpu(acfreq_data, i) = per_cpu(acfreq_data, cpu);
> +
>  	/* capability check */
>  	if (perf->state_count <= 1) {
>  		pr_debug("No P-States\n");
> @@ -850,13 +856,16 @@ err_free:
>  static int acpi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
>  {
>  	struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
> +	unsigned int i;
>  
>  	pr_debug("acpi_cpufreq_cpu_exit\n");
>  
>  	if (data) {
> -		per_cpu(acfreq_data, policy->cpu) = NULL;
> +		/* Set NULL for all related CPUs (online + offline) */
> +		for_each_cpu(i, policy->related_cpus)
> +			per_cpu(acfreq_data, i) = NULL;

You clear this for policy->related_cpus while you have set it for
policy->cpus.  Why is this OK?

>  		acpi_processor_unregister_performance(data->acpi_data,
> -						      policy->cpu);
> +								data->owner);
>  		free_cpumask_var(data->freqdomain_cpus);
>  		kfree(data->freq_table);
>  		kfree(data);

Besides, or maybe most importantly, your patch clashes with this patch
from Pan Xinhui:

https://patchwork.kernel.org/patch/6732431/

Can you both please coordinate somehow?

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2015-07-08  0:12 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-01  9:11 [PATCH] Fix crash during platform suspend when CPUs are related to each other Sudarshan N Ramachandra
2015-07-08  0:12 ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.