linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] socfpga_a10: fix a kexec boot issue
@ 2017-05-10  5:13 yanjiang.jin
  2017-05-10  5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin
  0 siblings, 1 reply; 8+ messages in thread
From: yanjiang.jin @ 2017-05-10  5:13 UTC (permalink / raw)
  To: hiraku.toyooka.gu, dinguyen, dinguyen, linux
  Cc: linux-arm-kernel, linux-kernel, yanjiang.jin, jinyanjiang

From: Yanjiang Jin <yanjiang.jin@windriver.com>

I guess socfpga's other boards may need this patch, but I have Arria10
board only. 
So add a new function socfpga_a10_cpu_kill(), keep the old function
socfpga_cpu_kill() for other boards.
I also verified CPU_HOTPLUG after applying this patch, everything seems well.

Test steps:

1. Enable kexec and build a SOCFPGA kernel;
2. Use zImage as 1st and 2nd kernel;
3. kexec -l /root/zImage --append="`cat /proc/cmdline`" 
4. kexec -e

Test env:

U-Boot 2014.10 (Jan 13 2016 - 11:07:09)

CPU   : Altera SOCFPGA Arria 10 Platform
BOARD : Altera SOCFPGA Arria 10 Dev Kit

-- 
1.9.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-10  5:13 [PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin
@ 2017-05-10  5:13 ` yanjiang.jin
  2017-05-12 14:25   ` Dinh Nguyen
  2017-05-12 14:36   ` Mark Rutland
  0 siblings, 2 replies; 8+ messages in thread
From: yanjiang.jin @ 2017-05-10  5:13 UTC (permalink / raw)
  To: hiraku.toyooka.gu, dinguyen, dinguyen, linux
  Cc: linux-arm-kernel, linux-kernel, yanjiang.jin, jinyanjiang

From: Yanjiang Jin <yanjiang.jin@windriver.com>

Kexec's second kernel would hang if CPU1 isn't reset.

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
---
 arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c
index 0ee7677..db3940e 100644
--- a/arch/arm/mach-socfpga/platsmp.c
+++ b/arch/arm/mach-socfpga/platsmp.c
@@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu)
 {
 	return 1;
 }
+
+static int socfpga_a10_cpu_kill(unsigned int cpu)
+{
+	/* This will put CPU #1 into reset. */
+	if (socfpga_cpu1start_addr)
+		writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr +
+			SOCFPGA_A10_RSTMGR_MODMPURST);
+
+	return 1;
+}
 #endif
 
 static const struct smp_operations socfpga_smp_ops __initconst = {
@@ -133,7 +143,7 @@ static int socfpga_cpu_kill(unsigned int cpu)
 	.smp_boot_secondary	= socfpga_a10_boot_secondary,
 #ifdef CONFIG_HOTPLUG_CPU
 	.cpu_die		= socfpga_cpu_die,
-	.cpu_kill		= socfpga_cpu_kill,
+	.cpu_kill		= socfpga_a10_cpu_kill,
 #endif
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-10  5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin
@ 2017-05-12 14:25   ` Dinh Nguyen
  2017-05-12 14:36   ` Mark Rutland
  1 sibling, 0 replies; 8+ messages in thread
From: Dinh Nguyen @ 2017-05-12 14:25 UTC (permalink / raw)
  To: yanjiang.jin, hiraku.toyooka.gu, linux
  Cc: linux-arm-kernel, linux-kernel, jinyanjiang



On 05/10/2017 12:13 AM, yanjiang.jin@windriver.com wrote:
> From: Yanjiang Jin <yanjiang.jin@windriver.com>
> 
> Kexec's second kernel would hang if CPU1 isn't reset.
> 

Can you please be a bit more descriptive on the commit log? Is it
because when kexec starts, the SMP on the kexec's kernel try to run on CPU1?


> Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
> ---
>  arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c
> index 0ee7677..db3940e 100644
> --- a/arch/arm/mach-socfpga/platsmp.c
> +++ b/arch/arm/mach-socfpga/platsmp.c
> @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu)
>  {
>  	return 1;
>  }
> +
> +static int socfpga_a10_cpu_kill(unsigned int cpu)
> +{
> +	/* This will put CPU #1 into reset. */
> +	if (socfpga_cpu1start_addr)

Do you need to check for socfpga_cpu1start_addr?

Dinh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-10  5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin
  2017-05-12 14:25   ` Dinh Nguyen
@ 2017-05-12 14:36   ` Mark Rutland
  2017-05-15  9:05     ` yjin
  2017-05-15 10:00     ` Russell King - ARM Linux
  1 sibling, 2 replies; 8+ messages in thread
From: Mark Rutland @ 2017-05-12 14:36 UTC (permalink / raw)
  To: yanjiang.jin
  Cc: hiraku.toyooka.gu, dinguyen, dinguyen, linux, jinyanjiang,
	linux-kernel, linux-arm-kernel

On Wed, May 10, 2017 at 01:13:04AM -0400, yanjiang.jin@windriver.com wrote:
> From: Yanjiang Jin <yanjiang.jin@windriver.com>
> 
> Kexec's second kernel would hang if CPU1 isn't reset.
> 
> Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
> ---
>  arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c
> index 0ee7677..db3940e 100644
> --- a/arch/arm/mach-socfpga/platsmp.c
> +++ b/arch/arm/mach-socfpga/platsmp.c
> @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu)
>  {
>  	return 1;
>  }
> +
> +static int socfpga_a10_cpu_kill(unsigned int cpu)
> +{
> +	/* This will put CPU #1 into reset. */
> +	if (socfpga_cpu1start_addr)
> +		writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr +
> +			SOCFPGA_A10_RSTMGR_MODMPURST);
> +
> +	return 1;
> +}
>  #endif

I agree that currently, socfpga_cpu_die is completely bogus, as the CPU is just
sat in WFI with the MMU, caches, etc enabled, and not actually off:

static void socfpga_cpu_die(unsigned int cpu)
{
        /* Do WFI. If we wake up early, go back into WFI */
        while (1)
                cpu_do_idle();
}

... so that goes wrong as soon as the kerenl text gets ovewritten.

However, AFAICT, this patch forcibly resets is without any teardown
having happened. That will surely result in data being lost from the
caches, for example.

So I think this is incomplete.

Thanks,
Mark.

>  
>  static const struct smp_operations socfpga_smp_ops __initconst = {
> @@ -133,7 +143,7 @@ static int socfpga_cpu_kill(unsigned int cpu)
>  	.smp_boot_secondary	= socfpga_a10_boot_secondary,
>  #ifdef CONFIG_HOTPLUG_CPU
>  	.cpu_die		= socfpga_cpu_die,
> -	.cpu_kill		= socfpga_cpu_kill,
> +	.cpu_kill		= socfpga_a10_cpu_kill,
>  #endif
>  };
>  
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-12 14:36   ` Mark Rutland
@ 2017-05-15  9:05     ` yjin
  2017-05-15 10:00     ` Russell King - ARM Linux
  1 sibling, 0 replies; 8+ messages in thread
From: yjin @ 2017-05-15  9:05 UTC (permalink / raw)
  To: Mark Rutland
  Cc: dinguyen, dinguyen, linux, jinyanjiang, linux-kernel, linux-arm-kernel


On 2017年05月12日 22:36, Mark Rutland wrote:
> On Wed, May 10, 2017 at 01:13:04AM -0400, yanjiang.jin@windriver.com wrote:
>> From: Yanjiang Jin <yanjiang.jin@windriver.com>
>>
>> Kexec's second kernel would hang if CPU1 isn't reset.
>>
>> Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
>> ---
>>   arch/arm/mach-socfpga/platsmp.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c
>> index 0ee7677..db3940e 100644
>> --- a/arch/arm/mach-socfpga/platsmp.c
>> +++ b/arch/arm/mach-socfpga/platsmp.c
>> @@ -117,6 +117,16 @@ static int socfpga_cpu_kill(unsigned int cpu)
>>   {
>>   	return 1;
>>   }
>> +
>> +static int socfpga_a10_cpu_kill(unsigned int cpu)
>> +{
>> +	/* This will put CPU #1 into reset. */
>> +	if (socfpga_cpu1start_addr)
>> +		writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr +
>> +			SOCFPGA_A10_RSTMGR_MODMPURST);
>> +
>> +	return 1;
>> +}
>>   #endif
> I agree that currently, socfpga_cpu_die is completely bogus, as the CPU is just
> sat in WFI with the MMU, caches, etc enabled, and not actually off:
>
> static void socfpga_cpu_die(unsigned int cpu)
> {
>          /* Do WFI. If we wake up early, go back into WFI */
>          while (1)
>                  cpu_do_idle();
> }
>
> ... so that goes wrong as soon as the kerenl text gets ovewritten.
>
> However, AFAICT, this patch forcibly resets is without any teardown
> having happened. That will surely result in data being lost from the
> caches, for example.
>
> So I think this is incomplete.
Hi Mark,

I think I got what you mean. So far, flush_cache is the only thing that 
I can think of.
Can we just add flush_cache_all in cpu_die() now, and add other missing 
parts(if have) in the future.

Thanks!
Yanjiang

> Thanks,
> Mark.
>
>>   
>>   static const struct smp_operations socfpga_smp_ops __initconst = {
>> @@ -133,7 +143,7 @@ static int socfpga_cpu_kill(unsigned int cpu)
>>   	.smp_boot_secondary	= socfpga_a10_boot_secondary,
>>   #ifdef CONFIG_HOTPLUG_CPU
>>   	.cpu_die		= socfpga_cpu_die,
>> -	.cpu_kill		= socfpga_cpu_kill,
>> +	.cpu_kill		= socfpga_a10_cpu_kill,
>>   #endif
>>   };
>>   
>> -- 
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-12 14:36   ` Mark Rutland
  2017-05-15  9:05     ` yjin
@ 2017-05-15 10:00     ` Russell King - ARM Linux
  1 sibling, 0 replies; 8+ messages in thread
From: Russell King - ARM Linux @ 2017-05-15 10:00 UTC (permalink / raw)
  To: Mark Rutland
  Cc: yanjiang.jin, hiraku.toyooka.gu, dinguyen, dinguyen, jinyanjiang,
	linux-kernel, linux-arm-kernel

On Fri, May 12, 2017 at 03:36:01PM +0100, Mark Rutland wrote:
> However, AFAICT, this patch forcibly resets is without any teardown
> having happened. That will surely result in data being lost from the
> caches, for example.

You're wrong on that point.  Having each bloody platform implement
the same friggin teardown is utter madness, and leads to all sorts
of synchronisation issues.

The generic code already deals with the cache issues, and has
synchronisation to ensure that the dying CPU completes the cache
handling before the requesting CPU continues with the killing.
The only thing that platform code need concern itself with is doing
is the "make the CPU die" thing.

It's not perfect, but it's good enough for the majority of cases.

In any case, encouraging people to add flush_cache_all() into their
cpu_die() function is NOT the way forward if there is a problem -
that introduces a new race between flush_cache_all() walking all the
cache lines and cpu_kill() actually turning the power off to the CPU,
which could very well happen either before or during the
flush_cache_all() execution.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-15  9:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin
@ 2017-05-15  9:55   ` Russell King - ARM Linux
  0 siblings, 0 replies; 8+ messages in thread
From: Russell King - ARM Linux @ 2017-05-15  9:55 UTC (permalink / raw)
  To: yanjiang.jin
  Cc: mark.rutland, dinguyen, dinguyen, jinyanjiang, linux-arm-kernel,
	linux-kernel

On Mon, May 15, 2017 at 05:13:36PM +0800, yanjiang.jin@windriver.com wrote:
> From: Yanjiang Jin <yanjiang.jin@windriver.com>
> 
> socfpga_cpu_die() just puts CPU in WFI, not actually off. So that Kexec's
> second kernel goes wrong since the kerenl text gets ovewritten.
> Now reset CPU1 in cpu_kill() to avoid this error.
> Also add flush_cache_all() to prevent data from being lost in cpu_die().

You should not need this, as the common code does the cache flushing
for you.

It's rather unsafe too - you are completely unsynchronised between the
CPU executing in cpu_die() and the CPU executing cpu_kill() - it's
quite possible for the code in cpu_kill() to take effect when you're
part-way through the flush_cache_all().

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill()
  2017-05-15  9:13 [V2 PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin
@ 2017-05-15  9:13 ` yanjiang.jin
  2017-05-15  9:55   ` Russell King - ARM Linux
  0 siblings, 1 reply; 8+ messages in thread
From: yanjiang.jin @ 2017-05-15  9:13 UTC (permalink / raw)
  To: mark.rutland, dinguyen, dinguyen, linux
  Cc: jinyanjiang, linux-arm-kernel, linux-kernel

From: Yanjiang Jin <yanjiang.jin@windriver.com>

socfpga_cpu_die() just puts CPU in WFI, not actually off. So that Kexec's
second kernel goes wrong since the kerenl text gets ovewritten.
Now reset CPU1 in cpu_kill() to avoid this error.
Also add flush_cache_all() to prevent data from being lost in cpu_die().

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
---
 arch/arm/mach-socfpga/platsmp.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c
index 0ee7677..53f290e 100644
--- a/arch/arm/mach-socfpga/platsmp.c
+++ b/arch/arm/mach-socfpga/platsmp.c
@@ -102,6 +102,8 @@ static void __init socfpga_smp_prepare_cpus(unsigned int max_cpus)
  */
 static void socfpga_cpu_die(unsigned int cpu)
 {
+	flush_cache_all();
+
 	/* Do WFI. If we wake up early, go back into WFI */
 	while (1)
 		cpu_do_idle();
@@ -117,6 +119,15 @@ static int socfpga_cpu_kill(unsigned int cpu)
 {
 	return 1;
 }
+
+static int socfpga_a10_cpu_kill(unsigned int cpu)
+{
+	/* This will put CPU #1 into reset. */
+	writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr +
+		SOCFPGA_A10_RSTMGR_MODMPURST);
+
+	return 1;
+}
 #endif
 
 static const struct smp_operations socfpga_smp_ops __initconst = {
@@ -133,7 +144,7 @@ static int socfpga_cpu_kill(unsigned int cpu)
 	.smp_boot_secondary	= socfpga_a10_boot_secondary,
 #ifdef CONFIG_HOTPLUG_CPU
 	.cpu_die		= socfpga_cpu_die,
-	.cpu_kill		= socfpga_cpu_kill,
+	.cpu_kill		= socfpga_a10_cpu_kill,
 #endif
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-05-15 10:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-10  5:13 [PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin
2017-05-10  5:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin
2017-05-12 14:25   ` Dinh Nguyen
2017-05-12 14:36   ` Mark Rutland
2017-05-15  9:05     ` yjin
2017-05-15 10:00     ` Russell King - ARM Linux
2017-05-15  9:13 [V2 PATCH] socfpga_a10: fix a kexec boot issue yanjiang.jin
2017-05-15  9:13 ` [PATCH] socfpga_a10: reset CPU1 in socfpga_cpu_kill() yanjiang.jin
2017-05-15  9:55   ` Russell King - ARM Linux

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).