Kexec on arm64


* Kexec on arm64
@ 2014-07-09 10:13 Arun Chandran
  2014-07-09 13:58 ` Arun Chandran
  2014-07-09 18:33 ` Geoff Levand
  0 siblings, 2 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-09 10:13 UTC (permalink / raw)
  To: kexec; +Cc: Geoff Levand

Hi,

I found the kexec development on arm64 here
http://www.spinics.net/lists/arm-kernel/msg329563.html

Went to try it on my armv8 hardware after cloning

git://git.linaro.org/people/geoff.levand/linux-kexec.git and
https://git.linaro.org/people/geoff.levand/kexec-tools.git

Did
1) ~/work/aarch64-kernel/kexec-tools$ ./bootstrap
2) ~/work/aarch64-kernel/kexec-tools$ ./configure --prefix=/tmp/kexec_install --host=aarch64-linux-gnu
3) make

and got this error

elf-arm64.o kexec/arch/arm64/kexec-elf-arm64.c
kexec/arch/arm64/kexec-elf-arm64.c: In function ‘elf_arm64_probe’:
kexec/arch/arm64/kexec-elf-arm64.c:32:24: error: ‘EM_AARCH64’
undeclared (first use in this function)
  if (ehdr.e_machine != EM_AARCH64) {
                        ^
kexec/arch/arm64/kexec-elf-arm64.c:32:24: note: each undeclared
identifier is reported only once for each function it appears in
make: *** [kexec/arch/arm64/kexec-elf-arm64.o] Error 1

Am I missing something here?

--Arun

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: Kexec on arm64
  2014-07-09 10:13 Kexec on arm64 Arun Chandran
@ 2014-07-09 13:58 ` Arun Chandran
  2014-07-09 18:49   ` Geoff Levand
  2014-07-09 18:33 ` Geoff Levand
  1 sibling, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-09 13:58 UTC (permalink / raw)
  To: kexec; +Cc: Geoff Levand

On Wed, Jul 9, 2014 at 3:43 PM, Arun Chandran <achandran@mvista.com> wrote:
> Hi,
>
> I found the kexec development on arm64 here
> http://www.spinics.net/lists/arm-kernel/msg329563.html
>
> Went to try it on my armv8 hardware after cloning
>
> git://git.linaro.org/people/geoff.levand/linux-kexec.git and
> https://git.linaro.org/people/geoff.levand/kexec-tools.git
>
> Did
> 1) ~/work/aarch64-kernel/kexec-tools$ ./bootstrap
> 2)~/work/aarch64-kernel/kexec-tools$ ./configure
> --prefix=/tmp/kexec_install --host=aarch64-linux-gnu
> 3) make
>
> and got this error
>
> elf-arm64.o kexec/arch/arm64/kexec-elf-arm64.c
> kexec/arch/arm64/kexec-elf-arm64.c: In function ‘elf_arm64_probe’:
> kexec/arch/arm64/kexec-elf-arm64.c:32:24: error: ‘EM_AARCH64’
> undeclared (first use in this function)
>   if (ehdr.e_machine != EM_AARCH64) {
>                         ^
> kexec/arch/arm64/kexec-elf-arm64.c:32:24: note: each undeclared
> identifier is reported only once for each function it appears in
> make: *** [kexec/arch/arm64/kexec-elf-arm64.o] Error 1
>

I compiled kexec-tools
(https://git.linaro.org/people/geoff.levand/kexec-tools.git)
by commenting out that "machine check":

--- a/kexec/arch/arm64/kexec-elf-arm64.c
+++ b/kexec/arch/arm64/kexec-elf-arm64.c
@@ -28,12 +28,13 @@ int elf_arm64_probe(const char *kernel_buf, off_t kernel_size)
                dbgprintf("Not an ELF executable.\n");
                goto on_exit;
        }
-
+#if 0
        if (ehdr.e_machine != EM_AARCH64) {
                dbgprintf("Not an AARCH64 executable.\n");
                result = -EINVAL;
                goto on_exit;
        }
+#endif

        result = 0;


After copying the resulting binaries to my target, I tried loading the
kernel Image:

# kexec -l /Image
Modified cmdline: root=/dev/nfs
Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
from purgatory is diabled
Cannot determine the file type of /Image

It failed to load the kernel Image. Any pointers?

--Arun

> Am I missing something here?
>
> --Arun


* Re: Kexec on arm64
  2014-07-09 13:58 ` Arun Chandran
@ 2014-07-09 18:49   ` Geoff Levand
  2014-07-11  9:23     ` Arun Chandran
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-09 18:49 UTC (permalink / raw)
  To: Arun Chandran; +Cc: kexec

Hi Arun,

On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote:
> After copying the resulting binaries to my target; I tried loading the
> kernel Image
> 
> # kexec -l /Image
> Modified cmdline: root=/dev/nfs
> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
> from purgatory is diabled
> Cannot determine the file type of /Image
> 
> It failed to load the kernel Image. Any pointers?

My kexec-tools [1] only supports loading of arm64 elf files, so
vmlinux, or a stripped version of it.  Image is a raw binary, and
is not yet supported.

Maybe something like this is what you need:

 ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb
 ./kexec -d -e
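
For illustration only, the vmlinux versus Image distinction above comes down
to the leading bytes of the file. The sketch below is not the kexec-tools
probe; is_elf64 is a hypothetical helper, and it assumes glibc's <elf.h> is
available. It checks only the ELF magic and the 64-bit class.

```c
#include <elf.h>     /* ELFMAG, SELFMAG, EI_CLASS, EI_NIDENT, ELFCLASS64 */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of kexec-tools): returns 1 if the buffer
 * starts with a 64-bit ELF identification block, 0 otherwise. A raw
 * Image has no "\177ELF" magic, so it is rejected here.
 */
static int is_elf64(const unsigned char *buf, size_t len)
{
	if (len < EI_NIDENT)
		return 0;                       /* too short to hold e_ident */
	if (memcmp(buf, ELFMAG, SELFMAG) != 0)
		return 0;                       /* magic bytes do not match */
	return buf[EI_CLASS] == ELFCLASS64;     /* must be a 64-bit object */
}
```

A loader that accepts only ELF would stop at a check of this kind for a raw
binary, which would be consistent with the "Cannot determine the file type"
message quoted above.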

Also, my current master branch will only work reliably with PSCI
boot.  Spin-table boot has a bug and will be unstable for the next
few days.  Spin-table will re-boot, but only the primary cpu will
come up.

[1] https://git.linaro.org/people/geoff.levand/kexec-tools.git
 https://git.linaro.org/people/geoff.levand/linux-kexec.git

-Geoff



* Re: Kexec on arm64
  2014-07-09 18:49   ` Geoff Levand
@ 2014-07-11  9:23     ` Arun Chandran
  2014-07-11 16:58       ` Geoff Levand
  2014-07-11 11:26     ` Arun Chandran
  2014-07-11 15:43     ` Arun Chandran
  2 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-11  9:23 UTC (permalink / raw)
  To: Geoff Levand; +Cc: kexec

Hi Geoff,

Finally I am able to get my kernel loaded with the patch below.


diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index 894e0e1..bfca40d 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -327,7 +327,7 @@ static int get_memory_ranges_dt(struct memory_range *array, unsigned int *count)
                        dbgprintf("%s:  RAM: %016llx - %016llx\n", __func__,
                                r.start, r.end);

-                       if (!arm64_mem.memstart || r.start < arm64_mem.memstart)
+                       if ((region->size) && (!arm64_mem.memstart || r.start < arm64_mem.memstart))
                                arm64_mem.memstart = r.start;
                }
        }


I suspect it was due to zero-sized memory ranges in my dtb that follow the
actual region.

get_memory_ranges_dt: node_1676 memory
get_memory_ranges_dt:  RAM: 0000004000000000 - 0000004400000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000

As a result, arm64_mem.memstart was getting overwritten with zero,
and I was stuck at this error:

kexec: kexec/arch/arm64/kexec-arm64.c:242: virt_to_phys: Assertion
`arm64_mem.memstart' failed.

So the patch checks that the region is valid (non-zero size) before
updating 'memstart'.
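
A minimal sketch of the same idea, outside kexec-tools: struct mem_region and
lowest_ram_start are hypothetical names, not the real data structures. Instead
of treating a zero memstart as "unset", it tracks a separate flag, so empty
entries are skipped regardless of their order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one parsed entry of a memory node's "reg". */
struct mem_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Stores the lowest start address of any non-empty region in *out and
 * returns 1; returns 0 (leaving *out untouched) if all regions are empty.
 */
static int lowest_ram_start(const struct mem_region *r, size_t n,
			    uint64_t *out)
{
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].size == 0)
			continue;               /* padding entry, not RAM */
		if (!found || r[i].start < *out) {
			*out = r[i].start;
			found = 1;
		}
	}
	return found;
}
```

With the ranges printed above (one region starting at 0x4000000000 followed by
four empty ones), this yields 0x4000000000 rather than zero.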

I will try with kexec -e next.

--Arun

On Thu, Jul 10, 2014 at 12:19 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote:
>> After copying the resulting binaries to my target; I tried loading the
>> kernel Image
>>
>> # kexec -l /Image
>> Modified cmdline: root=/dev/nfs
>> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
>> from purgatory is diabled
>> Cannot determine the file type of /Image
>>
>> It failed to load the kernel Image. Any pointers?
>
> My kexec-tools [1] only supports loading of arm64 elf files, so
> vmlinux, or a stripped version of it.  Image is a raw binary, and
> is not yet supported.
>
> Maybe something like this is what you need:
>
>  ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb
>  ./kexec -d -e
>
> Also, my current master branch will only work reliably with PSCI
> boot.  Spin-table boot has a bug and will be unstable for the next
> few days.  Spin-table will re-boot, but only the primary cpu will
> come up.
>
> [1] https://git.linaro.org/people/geoff.levand/kexec-tools.git
>  https://git.linaro.org/people/geoff.levand/linux-kexec.git
>
> -Geoff
>


* Re: Kexec on arm64
  2014-07-11  9:23     ` Arun Chandran
@ 2014-07-11 16:58       ` Geoff Levand
  0 siblings, 0 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-11 16:58 UTC (permalink / raw)
  To: Arun Chandran; +Cc: kexec

Hi Arun,

On Fri, 2014-07-11 at 14:53 +0530, Arun Chandran wrote:
> Finally I am able to get my kernel loaded with the patch below.

OK, great.

> I suspect it was due to zero sized memory ranges in my dtb followed by the
> actual region.

Could you please send me your dts?  If you don't want to post it to the
list, could you send it to me in a private mail?

-Geoff



* Re: Kexec on arm64
  2014-07-09 18:49   ` Geoff Levand
  2014-07-11  9:23     ` Arun Chandran
@ 2014-07-11 11:26     ` Arun Chandran
  2014-07-12  0:19       ` Geoff Levand
  2014-07-11 15:43     ` Arun Chandran
  2 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-11 11:26 UTC (permalink / raw)
  To: Geoff Levand; +Cc: kexec

Hi Geoff,

This is a minor "NULL check" issue in the code.

kexec fails to load when no kernel command line option (--append) is given.

The code below fixes that issue (a missing NULL check).

diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index 894e0e1..0ca0df4 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -135,8 +135,10 @@ int arm64_load_other_segments(struct kexec_info *info)

        /* Processing for arm64_opts.dtb and arm64_opts.command_line. */

-       strncpy(command_line, arm64_opts.command_line, sizeof(command_line));
-       command_line[sizeof(command_line) - 1] = 0;
+       if (arm64_opts.command_line != NULL) {
+               strncpy(command_line, arm64_opts.command_line, sizeof(command_line));
+               command_line[sizeof(command_line) - 1] = 0;
+       }
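
One thing the hunk does not show is how command_line is declared; if it is
not zero-initialized, skipping the copy would leave it unset. As a sketch
only, with copy_cmdline a hypothetical helper rather than anything in
kexec-tools, a NULL source can instead be treated as an empty string:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst (capacity cap bytes), always
 * NUL-terminating. A NULL src produces an empty string, so the caller
 * never reads an uninitialized buffer.
 */
static void copy_cmdline(char *dst, size_t cap, const char *src)
{
	if (cap == 0)
		return;                 /* nothing can be stored */
	if (src == NULL)
		src = "";               /* no --append given */
	strncpy(dst, src, cap - 1);     /* pads the remainder with zeros */
	dst[cap - 1] = '\0';            /* terminate even when truncated */
}
```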


--Arun

On Thu, Jul 10, 2014 at 12:19 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote:
>> After copying the resulting binaries to my target; I tried loading the
>> kernel Image
>>
>> # kexec -l /Image
>> Modified cmdline: root=/dev/nfs
>> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
>> from purgatory is diabled
>> Cannot determine the file type of /Image
>>
>> It failed to load the kernel Image. Any pointers?
>
> My kexec-tools [1] only supports loading of arm64 elf files, so
> vmlinux, or a stripped version of it.  Image is a raw binary, and
> is not yet supported.
>
> Maybe something like this is what you need:
>
>  ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb
>  ./kexec -d -e
>
> Also, my current master branch will only work reliably with PSCI
> boot.  Spin-table boot has a bug and will be unstable for the next
> few days.  Spin-table will re-boot, but only the primary cpu will
> come up.
>
> [1] https://git.linaro.org/people/geoff.levand/kexec-tools.git
>  https://git.linaro.org/people/geoff.levand/linux-kexec.git
>
> -Geoff
>


* Re: Kexec on arm64
  2014-07-11 11:26     ` Arun Chandran
@ 2014-07-12  0:19       ` Geoff Levand
  2014-07-14 12:21         ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-07-12  0:19 UTC (permalink / raw)
  To: Arun Chandran; +Cc: kexec

Hi Arun,

On Fri, 2014-07-11 at 16:56 +0530, Arun Chandran wrote:
> The below code fixes that issue (missing null check.

I just pushed out a fix for this, slightly different
from yours.  Please let me know if you still have a
problem with it.

-Geoff 



* Re: Kexec on arm64
  2014-07-12  0:19       ` Geoff Levand
@ 2014-07-14 12:21         ` Arun Chandran
  0 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-14 12:21 UTC (permalink / raw)
  To: Geoff Levand; +Cc: kexec

Hi Geoff,

On Sat, Jul 12, 2014 at 5:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Fri, 2014-07-11 at 16:56 +0530, Arun Chandran wrote:
>> The below code fixes that issue (missing null check.
>
> I just pushed out a fix for this, slightly different
> from yours.  Please let me know if you still have a
> problem with it.
>

Got it. The loading part is fine with it.

kexec -e is still giving trouble.


--Arun


* Re: Kexec on arm64
  2014-07-09 18:49   ` Geoff Levand
  2014-07-11  9:23     ` Arun Chandran
  2014-07-11 11:26     ` Arun Chandran
@ 2014-07-11 15:43     ` Arun Chandran
  2014-07-14 22:05       ` Geoff Levand
  2 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-11 15:43 UTC (permalink / raw)
  To: Geoff Levand; +Cc: kexec

[-- Attachment #1: Type: text/plain, Size: 1793 bytes --]

On Thu, Jul 10, 2014 at 12:19 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote:
>> After copying the resulting binaries to my target; I tried loading the
>> kernel Image
>>
>> # kexec -l /Image
>> Modified cmdline: root=/dev/nfs
>> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
>> from purgatory is diabled
>> Cannot determine the file type of /Image
>>
>> It failed to load the kernel Image. Any pointers?
>
> My kexec-tools [1] only supports loading of arm64 elf files, so
> vmlinux, or a stripped version of it.  Image is a raw binary, and
> is not yet supported.
>
> Maybe something like this is what you need:
>
>  ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb
>  ./kexec -d -e
>

Hi,

I tried kexec reboot with my modified kexec-tools and got a nice kernel panic.
Please find the error log attached.

The only change in the kernel sources is dtb-related, as shown below.

###########
                        cpu-release-addr = <0x1 0x0000fff8>;
-                       cpu-return-addr = <0 0>
+                       cpu-return-addr = <0x0 0x0>;
###########

I just assigned a random value to "cpu-return-addr" in my dtb file.
Do you have any idea how to find out that value for my hardware?


> Also, my current master branch will only work reliably with PSCI
> boot.  Spin-table boot has a bug and will be unstable for the next
> few days.  Spin-table will re-boot, but only the primary cpu will
> come up.
>

My hardware uses the spin-table method for SMP. Since you said there
is a bug in kexec for the spin-table method, what about trying it without
CONFIG_SMP? Does your code support that?


--Arun

[-- Attachment #2: error_log.txt --]
[-- Type: text/plain, Size: 11059 bytes --]

# kexec -l vmlinux.strip --dtb=dtb.dtb -d --command-line="console=ttyS0,115200"
random: nonblocking pool is initialized
kexec version: 14.07.10.11.38-g426996b
arch_process_options:83: command_line: console=ttyS0,115200
arch_process_options:85: initrd: (null)
arch_process_options:86: dtmachine_kexec_prepare:547:
b:  kexec image info:
     type:        0
dt    start:       4000080000
    head:        0
b.    nr_segments: 2
dt      segment[0]: 0000004000080000 - 000000400084d000, 7cd000h bytes, 1997 pages
kexec_is_dtb:136: magic: 10000014 : 14000010 : no

      segment[1]: 0000004000860000 - 0000004000862000, 2000h bytes, 2 pages
kekexec_is_dtb:136: magic: d00dfeed : edfe0dd0 : yes
rnkexec_is_dtb:136: magic: 10000014 : 14000010 : no
ekexec_is_dtb:136: magic: d00dfeed : edfe0dd0 : yes
l:kexec_boot_info_init:459: cpu_count: 8
 0x7f7d7kexec_cpu_info_init:422: cpu-0: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
29kexec_cpu_info_init:422: cpu-1: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
01kexec_cpu_info_init:422: cpu-2: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
0 kexec_cpu_info_init:422: cpu-3: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
kekexec_cpu_info_init:422: cpu-4: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
rnkexec_cpu_info_init:422: cpu-5: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
elkexec_cpu_info_init:422: cpu-6: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
_skexec_cpu_info_init:422: cpu-7: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000
izkexec_boot_info_init:459: cpu_count: 8
e:kexec_cpu_info_init:422: cpu-0: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
 0kexec_cpu_info_init:422: cpu-1: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
x7kexec_cpu_info_init:422: cpu-2: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
adkexec_cpu_info_init:422: cpu-3: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
22kexec_cpu_info_init:422: cpu-4: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
kexec_cpu_info_init:422: cpu-5: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000

Mkexec_cpu_info_init:422: cpu-6: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
odkexec_cpu_info_init:422: cpu-7: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000
ified cmdline: root=/dev/nfs 
Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled
get_memory_ranges_dt: node_1676 memory
get_memory_ranges_dt:  RAM: 0000004000000000 - 0000004400000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000
get_memory_ranges_dt:  RAM: 0000000000000000 - 0000000000000000
get_memory_ranges_dt: Success
p_paddr: ffffffc000080000
p_vaddr: ffffffc000080000
p_filesz: 000000000079cc08
p_memsz: 00000000007cce28
p_offset: 0000000000010000
text_offset: 0000000000080000
page_offset: ffffffc000000000
memstart: 0000004000000000
p_vaddr: ffffffc000080000
virt_to_phys: ffffffc000080000 -> 0000004000080000
text_offset: 0000000000080000
page_offset: ffffffc000000000
memstart:    0000004000000000
entry:       0x4000080000
virt_to_phys: ffffffc000080000 -> 0000004000080000
add_segment_phys_virt: 0000007f7d739010 - 0000007f7ded5c18 (0079cc08) -> 0000004000080000 - 000000400084d000 (007cd000)
dtb:    base 4000860000, size 1c36h (7222)
add_segment_phys_virt: 000000002cc78510 - 000000002cc7a146 (00001c36) -> 0000004000860000 - 0000004000862000 (00002000)
kexec_load: entry = 0x4000080000 flags = 0xb70000
nr_segments = 2
segment[0].buf   = 0x7f7d739010
segment[0].bufsz = 0x79cc08
segment[0].mem   = 0x4000080000
segment[0].memsz = 0x7cd000
segment[1].buf   = 0x2cc78510
segment[1].bufsz = 0x1c36
segment[1].mem   = 0x4000860000
segment[1].memsz = 0x2000
#  
# 
# 
# kexec -e -d
kexekvm: exiting hardware virtualization
c veStarting new kernel
rssmp_spin_table_cpu_die:125: id: 1, holding count: 0
smp_spin_table_cpu_die:125: id: 7, holding count: 0
smp_spin_table_cpu_die:125: id: 6, holding count: 0
smp_spin_table_cpu_die:125: id: 4, holding count: 0
smp_spin_table_cpu_die:125: id: 3, holding count: 0
smp_spin_table_cpu_die:125: id: 2, holding count: 0
smp_spin_table_cpu_die:125: id: 5, holding count: 0
machine_kexec:612: smp_processor_id = 0
Bad mode in Synchronous Abort handler detected, code 0x86000005
machine_kexec:614:
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.15.0-rc4+ #44
  kexec image info:
task: ffffffc3ef143340 ti: ffffffc3ef160000 task.ti: ffffffc3ef160000
    type:        0
    start:       4000080000
PC is at 0x4000489f80
    head:        43edaef002
LR is at 0xffffffc0000802e8
    nr_segments: 2
pc : [<0000004000489f80>] lr : [<ffffffc0000802e8>] pstate: 600001c5
      segment[0]: 0000004000080000 - 000000400084d000, 7cd000h bytes, 1997 pages
sp : ffffffc3ef163d90
x29: ffffffc3ef163d90 x28: 0000008000000000 
kexec_is_dtb:136: magic: 0 : 0 : no

      segment[1]: 0000004000860000 - 0000004000862000, 2000h bytes, 2 pages
x27: ffffffc3ef160000 x26: ffffffc000604000 
kexec_is_dtb:136: magic: 0 : 0 : no

machine_kexec:622: control_code_page:        ffffffbcedbfc6f8
x25: ffffffc000602000 
machine_kexec:624: reboot_code_buffer_phys:  00000043eda69000
x24: ffffffc000587000 
machine_kexec:626: reboot_code_buffer:       ffffffc3eda69000

machine_kexec:628: relocate_new_kernel:      ffffffc000093158
x23: ffffffc000c72000 
machine_kexec:630: relocate_new_kernel_size: b8h(184) bytes
x22: 00000000500f0000 
machine_kexec:633: kexec_dtb_addr:           0000004000860000

machine_kexec:635: kexec_kimage_head:        00000043edaef002
x21: 0000000000000000 
machine_kexec:637: kexec_kimage_start:       0000004000080000
x20: 0000000000000000 
machine_kexec:639: kexec_entry_dump:

  I 43edaef002 = 43edaef000 (ffffffc3edaef000)
x19: ffffffc00048a8b0 
  D 4000080001 = 4000080000 (ffffffc000080000)
x18: 0000007fd31cc6e0 
x17: 00000000004a6c60 x16: ffffffc00017a834 
  I 43ed4f4002 = 43ed4f4000 (ffffffc3ed4f4000)

x15: 0000000000000006 x14: 00000000000fdb04 
x13: 0000000000000054 x12: 0000004000489f80 
  I 43ed6f4002 = 43ed6f4000 (ffffffc3ed6f4000)

x11: 00000000000024fc x10: ffffffffffffffff 
x9 : 0000000000000033 x8 : ffffffc3fee8153f 
  I 43ed0f4002 = 43ed0f4000 (ffffffc3ed0f4000)

x7 : 0000000000000000 x6 : 00000000000f0000 
x5 : 00000000000f0000 
  D 4000860001 = 4000860000 (ffffffc000860000)
x4 : ffffffc000caa208 
  DONE 0000000004


  kexec_entry_dump: 0 pages
x3 : ffffffc000c72000 x2 : ffffffc0000805d8 
x1 : 0000000000000000 x0 : ffffffc000c72000 
dump_cpus: all:        0 1 2 3 4 5 6 7

Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
dump_cpus: possible:   0 1 2 3 4 5 6 7
Modules linked in:
dump_cpus: present:    0 1 2 3 4 5 6 7
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.15.0-rc4+ #44
dump_cpus: active:     0 1 2 3 4 5 6 7
task: ffffffc3ef143340 ti: ffffffc3ef160000 task.ti: ffffffc3ef160000
dump_cpus: online:     0
PC is at 0x4000489f80
LR is at 0xffffffc0000802e8
dump_cpus: not online: 1 2 3 4 5 6 7
pc : [<0000004000489f80>] lr : [<ffffffc0000802e8>] pstate: 600001c5
Bye!
sp : ffffffc3ef163d90
x29: ffffffc3ef163d90 x28: 0000008000000000 
x27: ffffffc3ef160000 x26: ffffffc000604000 
x25: ffffffc000602000 x24: ffffffc000587000 
x23: ffffffc000c72000 x22: 00000000500f0000 
x21: 0000000000000000 x20: 0000000000000000 
x19: ffffffc00048a8b0 x18: 0000007fd31cc6e0 
x17: 00000000004a6c60 x16: ffffffc00017a834 
x15: 0000000000000006 x14: 00000000000fdb04 
x13: 0000000000000054 x12: 0000004000489f80 
x11: 00000000000024fc x10: ffffffffffffffff 
x9 : 0000000000000033 x8 : ffffffc3fee8153f 
x7 : 0000000000000000 x6 : 00000000000f0000 
x5 : 00000000000f0000 x4 : ffffffc000caa208 
x3 : ffffffc000c72000 x2 : ffffffc0000805d8 
x1 : 0000000000000000 x0 : ffffffc000c72000 

Process swapper/7 (pid: 0, stack limit = 0xffffffc3ef160058)
Stack: (0xffffffc3ef163d90 to 0xffffffc3ef164000)
3d80:                                     ef163dc0 ffffffc3 0008ed2c ffffffc0
3da0: 00caa170 ffffffc0 00000007 00000000 00c5ea98 ffffffc0 0008ecb8 ffffffc0
3dc0: ef163e10 ffffffc3 000812d0 ffffffc0 0000200c ffffff80 ef163e40 ffffffc3
3de0: 00cc2700 ffffffc0 00002010 ffffff80 60000145 00000000 00ca871a ffffffc0
3e00: ef163e40 ffffffc3 00000003 00000000 ef163f60 ffffffc3 00083da4 ffffffc0
3e20: ef160000 ffffffc3 ef160000 ffffffc3 ef163f60 ffffffc3 000852b4 ffffffc0
3e40: 00000007 00000000 00597e48 ffffffc0 ef163ef0 ffffffc3 00000001 00000000
3e60: 0000004e 00000000 14000000 00000000 25c17d03 00000002 00000000 00000000
3e80: 00000018 00000000 ef163d70 ffffffc3 00000400 00000000 00000400 00000000
3ea0: 00000000 00000000 ffffffff ffffffff ffffffff ffffffff 981e8598 0000007f
3ec0: 0017a834 ffffffc0 004a6c60 00000000 d31cc6e0 0000007f ef160000 ffffffc3
3ee0: ef160000 ffffffc3 00caf500 ffffffc0 0048d000 ffffffc0 00587a68 ffffffc0
3f00: 00ca871a ffffffc0 0058d200 ffffffc0 00000001 00000000 ef160000 ffffffc3
3f20: 00000000 00000080 ef163f60 ffffffc3 000852b0 ffffffc0 ef163f60 ffffffc3
3f40: 000852b4 ffffffc0 60000145 00000000 ef160000 ffffffc3 ef160000 ffffffc3
3f60: ef163f70 ffffffc3 000e10a4 ffffffc0 ef163fd0 ffffffc3 0008e81c ffffffc0
3f80: 00000007 00000000 00000000 00000000 00caa1b0 ffffffc0 500f0000 00000000
3fa0: 00c72000 00000040 00000000 00000040 0007b000 00000040 0007d000 00000040
3fc0: 00080300 ffffffc0 0008e7fc ffffffc0 00000000 00000000 000802e8 00000040
3fe0: 00002000 00000000 00000000 00000000 08d0b004 05101514 a688cdaa 2620a80f
Call trace:
[<0000004000489f80>] 0x4000489f80
[<ffffffc00008ed28>] handle_IPI+0x1a8/0x1b0
[<ffffffc0000812cc>] gic_handle_irq+0x78/0x80
Exception stack(0xffffffc3ef163e20 to 0xffffffc3ef163f40)
3e20: ef160000 ffffffc3 ef160000 ffffffc3 ef163f60 ffffffc3 000852b4 ffffffc0
3e40: 00000007 00000000 00597e48 ffffffc0 ef163ef0 ffffffc3 00000001 00000000
3e60: 0000004e 00000000 14000000 00000000 25c17d03 00000002 00000000 00000000
3e80: 00000018 00000000 ef163d70 ffffffc3 00000400 00000000 00000400 00000000
3ea0: 00000000 00000000 ffffffff ffffffff ffffffff ffffffff 981e8598 0000007f
3ec0: 0017a834 ffffffc0 004a6c60 00000000 d31cc6e0 0000007f ef160000 ffffffc3
3ee0: ef160000 ffffffc3 00caf500 ffffffc0 0048d000 ffffffc0 00587a68 ffffffc0
3f00: 00ca871a ffffffc0 0058d200 ffffffc0 00000001 00000000 ef160000 ffffffc3
3f20: 00000000 00000080 ef163f60 ffffffc3 000852b0 ffffffc0 ef163f60 ffffffc3
[<ffffffc000083da0>] el1_irq+0x60/0xd0
[<ffffffc0000e10a0>] cpu_startup_entry+0x118/0x180
[<ffffffc00008e818>] secondary_start_kernel+0x10c/0x11c
Code: bad PC value
---[ end trace 7467eabc0e0787f2 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 1 seconds..Reboot failed -- System halted



* Re: Kexec on arm64
  2014-07-11 15:43     ` Arun Chandran
@ 2014-07-14 22:05       ` Geoff Levand
  2014-07-15 15:28         ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-07-14 22:05 UTC (permalink / raw)
  To: Arun Chandran; +Cc: kexec

Hi Arun,

On Fri, 2014-07-11 at 21:13 +0530, Arun Chandran wrote:
> I tried kexec reboot with my modified kexec-tools and got a nice kernel panic.
> Please find the error log attached.
> There is only dtb file related change in the kernel sources, as given below.
> 
> ###########
>                         cpu-release-addr = <0x1 0x0000fff8>;
> -                       cpu-return-addr = <0 0>
> +                       cpu-return-addr = <0x0 0x0>;
> ###########
> 
> I just assigned a random value to "cpu-return-addr" in my dtb file;

You need the proper cpu-return-addr value.  Because cpu-return-addr
was zero, kexec sent all your secondary processors back to
secondary_startup, which is why you see that in your error log.

I put in a hack for zero cpu-return-addr in my latest version.  Please
try it.  If you use the same kernel for both the 1st and 2nd stage it
should work.

> Do you have any idea how to find out that value for my hardware?

It is a property of your bootloader, not hardware.  Ask your
bootloader maintainer if you can, or check the bootloader sources.

For the rtsm_ve_aemv8a model bootwrapper I set cpu-return-addr to
<0x0 0x80000000> because the bootwrapper linker script puts _start
at PHYS_OFFSET, which is 0x80000000 for that platform.

https://git.kernel.org/cgit/linux/kernel/git/mark/boot-wrapper-aarch64.git/tree/model.lds.S#n30
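
As a side note on the notation, and assuming the node uses two address cells
as the <0x1 0x0000fff8> value suggests, each of these properties is a
(high, low) pair of 32-bit cells. cells_to_addr below is a hypothetical
helper, shown only to make the mapping to a 64-bit address explicit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine two 32-bit device tree cells, given as
 * <hi lo>, into one 64-bit physical address.
 */
static uint64_t cells_to_addr(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

With this, <0x1 0x0000fff8> gives 0x10000fff8, which matches the release_addr
printed in the log, and <0x0 0x80000000> gives 0x80000000.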

> My hardware uses spin-table method for SMP. As you said there
> is a bug in kexec for spin-table method, what about trying it without
> CONFIG_SMP? Does your code support that?

CONFIG_SMP=n should work, but I haven't tried it in a long time.

-Geoff


* Re: Kexec on arm64
  2014-07-14 22:05       ` Geoff Levand
@ 2014-07-15 15:28         ` Arun Chandran
  0 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-15 15:28 UTC (permalink / raw)
  To: Geoff Levand; +Cc: kexec

Hi Geoff,

I think we had better discuss this on the arm mailing list, as it is getting
more arm64-specific.

I will post my reply there.

--Arun

On Tue, Jul 15, 2014 at 3:35 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Fri, 2014-07-11 at 21:13 +0530, Arun Chandran wrote:
>> I tried kexec reboot with my modified kexec-tools and got a nice kernel panic.
>> Please find the error log attached.
>> There is only dtb file related change in the kernel sources, as given below.
>>
>> ###########
>>                         cpu-release-addr = <0x1 0x0000fff8>;
>> -                       cpu-return-addr = <0 0>
>> +                       cpu-return-addr = <0x0 0x0>;
>> ###########
>>
>> I just assigned a random value to "cpu-return-addr" in my dtb file;
>
> You need the proper cpu-return-addr value.  Because cpu-return-addr
> was zero, kexec sent all your secondary processors back to
> secondary_startup, and is why you see that in your error log.
>
> I put in a hack for zero cpu-return-addr in my latest version.  Please
> try it.  If you use the same kernel for both the 1st and 2nd stage it
> should work.
>
>> Do you have any idea how to find out that value for my hardware?
>
> It is a property of your bootloader, not hardware.  Ask your
> bootloader maintainer if you can, or check the bootloader sources.
>
> For the rtsm_ve_aemv8a model bootwrapper I set cpu-return-addr to
> <0x0 0x80000000> because the bootwrapper linker script puts _start
> at PHYS_OFFSET, which is 0x80000000 for that platform.
>
> https://git.kernel.org/cgit/linux/kernel/git/mark/boot-wrapper-aarch64.git/tree/model.lds.S#n30
>
>> My hardware uses spin-table method for SMP. As you said there
>> is a bug in kexec for spin-table method, what about trying it without
>> CONFIG_SMP? Does your code support that?
>
> CONFIG_SMP=n should work, but I haven't tried it in a long time.
>
> -Geoff
>
>


* Re: Kexec on arm64
  2014-07-09 10:13 Kexec on arm64 Arun Chandran
  2014-07-09 13:58 ` Arun Chandran
@ 2014-07-09 18:33 ` Geoff Levand
  1 sibling, 0 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-09 18:33 UTC (permalink / raw)
  To: Arun Chandran; +Cc: kexec

Hi Arun,

On Wed, 2014-07-09 at 15:43 +0530, Arun Chandran wrote:
> and got this error
> 
> elf-arm64.o kexec/arch/arm64/kexec-elf-arm64.c
> kexec/arch/arm64/kexec-elf-arm64.c: In function ‘elf_arm64_probe’:
> kexec/arch/arm64/kexec-elf-arm64.c:32:24: error: ‘EM_AARCH64’
> undeclared (first use in this function)
>   if (ehdr.e_machine != EM_AARCH64) {
>                         ^
> kexec/arch/arm64/kexec-elf-arm64.c:32:24: note: each undeclared
> identifier is reported only once for each function it appears in
> make: *** [kexec/arch/arm64/kexec-elf-arm64.o] Error 1
> 
> Am I missing something here?

Your toolchain is old, or built incorrectly.  Try to upgrade
it.  You need headers that define EM_AARCH64.

As a work-around, maybe you can do something like:

  make CFLAGS='-DEM_AARCH64=183'
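
The same work-around can also be sketched in source form: define the
constant only when the toolchain headers do not provide it. The value 183
is the ELF machine number for AArch64; the helper name is_aarch64_elf is
hypothetical and not taken from kexec-tools.

```c
#include <elf.h>
#include <stdint.h>

/* Fallback for older toolchain headers that lack the AArch64 machine ID. */
#ifndef EM_AARCH64
#define EM_AARCH64 183
#endif

/* Hypothetical helper: returns 1 if e_machine names an AArch64 ELF file. */
static int is_aarch64_elf(uint16_t e_machine)
{
	return e_machine == EM_AARCH64;
}
```

With newer headers the #ifndef is a no-op, so the fallback does not get in
the way once the toolchain is upgraded.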

-Geoff




_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 61+ messages in thread

[parent not found: <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com>]

* Kexec on arm64
       [not found] <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com>
@ 2014-07-15 17:04 ` Geoff Levand
  2014-07-16 17:57   ` Feng Kan
  0 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-07-15 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Tue, 2014-07-15 at 21:35 +0530, Arun Chandran wrote:
> So I am assuming some code at 0x238 is waiting for
> secondary_kernel_start address
> @000000400000fff8
> 
> So I modified my dtb file and made 0x238 as 'cpu-return-addr '
> 
...
> 
> Is this correct?

I doubt it.  I recommend you contact whoever you are contracted to do
this work for and ask them for the value.

As I mentioned in a mail from yesterday, I pushed out a work around for
this.  Did you try it?

For anyone who comes across this thread and wants to follow it back to
find that mail I sent out, good luck...

-Geoff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-15 17:04 ` Geoff Levand
@ 2014-07-16 17:57   ` Feng Kan
  2014-07-16 23:04     ` Geoff Levand
  0 siblings, 1 reply; 61+ messages in thread
From: Feng Kan @ 2014-07-16 17:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 15, 2014 at 10:04 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Tue, 2014-07-15 at 21:35 +0530, Arun Chandran wrote:
>> So I am assuming some code at 0x238 is waiting for
>> secondary_kernel_start address
>> @000000400000fff8
>>
>> So I modified my dtb file and made 0x238 as 'cpu-return-addr '
>>
> ...
>>
>> Is this correct?
>
> I doubt it.  I recommend you contact whoever you are contracted to do
> this work for and ask them for the value.
>
> As I mentioned in a mail from yesterday, I pushed out a work around for
> this.  Did you try it?
>
> For anyone who comes across this thread and wants to follow it back to
> find that mail I sent out, good luck...

Hi Guys:

Just following up on the conversation. The cpu return address of 0 should
work in your case. Since that's the _start of the bootloader, it will run
some core init code and then put the core back in wfe. However, I think
this functionality should be pushed back into the kernel side to provide
some small page of spin code rather than depend on the bootloader. The
address 0 could be OCM or NOR flash; in the on-chip memory case, the space
could potentially be used for other purposes.
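
As a rough illustration of the "spin code" being discussed, a spin-table
holding pen polls the cpu-release-addr slot until it holds a non-zero entry
address. The sketch below is a portable model only: the function name and
the poll limit are hypothetical, and real firmware would execute wfe between
polls and branch to the entry address instead of returning it.

```c
#include <stdint.h>

/*
 * Hypothetical model of a spin-table holding pen.
 * Polls *release_addr up to max_polls times.  Returns the entry address
 * once the slot is non-zero, or 0 if the CPU would still be parked.
 */
static uint64_t spin_table_wait(const volatile uint64_t *release_addr,
				unsigned long max_polls)
{
	unsigned long i;

	for (i = 0; i < max_polls; i++) {
		uint64_t entry = *release_addr;

		if (entry != 0)
			return entry;	/* firmware would branch here */
		/* firmware would execute `wfe` here to wait for an event */
	}
	return 0;			/* still parked */
}
```

Whoever owns this loop (bootloader or kernel) also owns the memory it runs
from, which is the trade-off raised above.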

>
> -Geoff
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-16 17:57   ` Feng Kan
@ 2014-07-16 23:04     ` Geoff Levand
  2014-07-22  9:44       ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-07-16 23:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Feng,

On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote:
> Just following up on the conversation. The cpu return address of 0 should work
> in your case. Since thats the _start of the bootloader, it will run
> some core init
> code and then put the core back in wfe. 

OK, I fixed up my code so that zero is a valid cpu return address.  Arun,
could you try the latest version I pushed out today?

> However, I think this
> functionality should
> be pushed back into the kernel side to provide some small page of spin
> code rather
> than depend on the bootloader. 

This method was already discussed in another thread [1] and decided
against.  With this we would need to set that spin code memory as
reserved from kernel use, and so each boot stage would leak some memory.
A system would eventually run out of memory over repeated kexecs.

[1] http://comments.gmane.org/gmane.linux.ports.arm.kernel/323440

-Geoff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-16 23:04     ` Geoff Levand
@ 2014-07-22  9:44       ` Arun Chandran
  2014-07-22 13:25         ` Arun Chandran
                           ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-22  9:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 4:34 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Feng,
>
> On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote:
>> Just following up on the conversation. The cpu return address of 0 should work
>> in your case. Since thats the _start of the bootloader, it will run
>> some core init
>> code and then put the core back in wfe.
>
> OK, I fixed up my code so that zero is valid cpu return address.  Arun,
> could you try my latest I pushed out today?
>
Hi Geoff,

Sorry for the late reply; I was away.

Yes I tried the new code.
My dts file has the below change.

################
diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index e0bf91d..b64e549 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -24,64 +24,64 @@
                        compatible = "apm,potenza", "arm,armv8";
                        reg = <0x0 0x000>;
                        enable-method = "spin-table";
-                       cpu-release-addr = <0x1 0x0000fff8>;
-                       cpu-return-addr = <0x0 0x0> /* Updated by bootloader */
+                       cpu-release-addr = <0x40 0x0000fff8>;
+                       cpu-return-addr = <0x0 0x0>; /* Updated by bootloader */
                };
#################
All other cpu nodes have similar change.
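
For reference, the two 32-bit cells of cpu-release-addr combine into one
64-bit physical address, high cell first, so <0x40 0x0000fff8> reads as
0x400000fff8. A minimal sketch of that arithmetic, assuming two address
cells; the helper name is hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine two 32-bit device tree cells
 * (most significant cell first) into a 64-bit value.
 */
static uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

dt_cells_to_u64(0x40, 0x0000fff8) gives 0x400000fff8, and the old value
<0x1 0x0000fff8> gives 0x10000fff8.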

1) Loading (I don't change the command line or dtb, assuming kexec will
reuse whatever command line and dtb the booted kernel has)

# kexec -l vmlinux.strip
kexec version: 14.07.17.12.17-gb6cccb4
Modified cmdline: root=/dev/nfs
Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled
Modified cmdline: root=/dev/nfs
Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled
machine_kexec_prepare:508:
  kexec image info:
    type:        0
    start:       4000080004
    head:        0
    nr_segments: 2
      segment[0]: 0000004000080000 - 000000400088c000, 80c000h bytes, 2060 pages
kexec_is_dtb:115: magic: 4d5a0091 : 91005a4d : no
      segment[1]: 00000040008a0000 - 00000040008a3000, 3000h bytes, 3 pages
kexec_is_dtb:115: magic: d00dfeed : edfe0dd0 : yes
kexec_boot_info_init:384: cpu_count: 8
kexec_cpu_info_init:352: cpu-0: hwid-0, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-1: hwid-1, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-2: hwid-100, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-3: hwid-101, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-4: hwid-200, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-5: hwid-201, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-6: hwid-300, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-7: hwid-301, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_is_dtb:115: magic: 4d5a0091 : 91005a4d : no
kexec_is_dtb:115: magic: d00dfeed : edfe0dd0 : yes
kexec_boot_info_init:384: cpu_count: 8
kexec_cpu_info_init:352: cpu-0: hwid-0, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-1: hwid-1, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-2: hwid-100, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-3: hwid-101, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-4: hwid-200, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-5: hwid-201, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-6: hwid-300, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-7: hwid-301, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_check:440: hwid-0 OK
kexec_cpu_check:440: hwid-1 OK
kexec_cpu_check:440: hwid-100 OK
kexec_cpu_check:440: hwid-101 OK
kexec_cpu_check:440: hwid-200 OK
kexec_cpu_check:440: hwid-201 OK
kexec_cpu_check:440: hwid-300 OK
kexec_cpu_check:440: hwid-301 OK
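
For readers of the log above: the kexec_is_dtb lines print the first word
of each segment, and the segment is taken as a device tree when that word,
read big-endian, is the flattened device tree magic 0xd00dfeed. A minimal
sketch of such a check follows; the function names are hypothetical and this
is not the code from the kexec patches.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu	/* flattened device tree magic */

/* Read a 32-bit big-endian value regardless of host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical check: does the buffer start with the FDT magic? */
static int looks_like_dtb(const unsigned char *buf, size_t len)
{
	if (len < 4)
		return 0;
	return load_be32(buf) == FDT_MAGIC;
}
```

In the log, segment[0] starts with 4d5a0091 and is rejected, while
segment[1] starts with d00dfeed and is accepted.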


2) Rebooting
#########################
# kexec -e
kexec version: 1kvm: exiting hardware virtualization
4.07.17.12.17-gbStarting new kernel
6cccb4
 Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e
 kernel NULL pointer dereference at virtual address 00000291
smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u
bsys cpu
smpL_isnpuixn _table_cpu_diev:e1r2s7i:o ni d: 6, hol3d.i1n6g. 0c-orucnt: 0
4+ (arun at arun-OptiPlex-9010) (gcc version 4.9.1 20140505 (prerelease)
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05) )
#25 SMP smp_sPpin_tablReE_EcMpPuT_ dTie:127: iude:  J5u, hlo ld2ing
coun2:
37:03 IST 2014
smp_Cspin_tPabUl:e _AcApruc_die:127: id: h46,4  hPorldiong count:c e0s
 or [500f0000] revision 0
smp_speifni_:t aGbeltet_cpu_die:i1n2g7 : ipd:a 2, holdinrga mceount:t e0
rs from FDT:
smep_fsip:i nC_atable_cpu_dien:'1t27: i df: 1i, holdinngd  cSoyusntt:e 0
m Table in device tree!
macchimne_kexaec:: 572C: smp_pMrAo:c efsasiorl_ied = 0
d to reserve 16 MiB
dachine_kexecO:n5 7n4o:
 e 0 totalpages: 4194304
a k e xec image inNfoor:m
 l zone: 57344 pages used for memmap
    type:        0
z   sta r tN:o     r  m4a0000800l04
 one: 4194304 pages, LIFO batch:31
    head:   P E R   43Cea9bPf002
U: Embedded 11 pages/cpu @ffffffc3fff7d000 s13120 r8192 d23744 u45056
    nrp_cspeug-maelnltosc: 2
: s13120 r8192 d23744 u45056 alloc=11*4096
   p c p usegment-[a0l]l:o c0:0 0[0000400]00 800 000[ -0 ]0000004000
881c 0[000],  280c000h bytes,  [200]6 0 3pages
 [0] 4 [0] 5 [0] 6 [0] 7
eexBeuc_is_dtb:1i1l5:t  m1a gizc: 0 : 0 : noon
 lists in Zone order, mobility grouping on.  Total pages: 4136960
/  K e r segment[1]: n0e0l0 00c0o400m08a0000 m-a n0d00
00040l008ai30n00e,:  3r000ho obty=tes, 3/ dpeagesv
 nfs rw nfsroot=10.162.103.228:/nfs_root/dora_june_6/apm-image-minimal-mustangbe
ip=10.162.103.21:10.162.103.228:10.162.103.1:255.255.255.0:mustangk:eextehc_0is:_odtb:1f15f:
 pmaangic:i 0c =: 0 : 1no
console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug
maxcpus=8 swiotlb=65536 log_buf_len=1M
8aclhinoeg_kex_ec:582: cobnturfo_ll_ecode_page:        nf:f f1ff0fbc4edb67ee8
 576
6achinee_akrelxyec: 58l4: reobogo tb_ucfo dfe_buffer_physr:e e :0
000010435eaffb00007
 (92%)
macPhine_IkDexec:58 6h:a srehb otot_acode_bufferb:l e   e n t
rfifffffce3se:a f4f0b000
96 (order: 3, 32768 bytes)
macDhine_kexeecn:t5r88: ryel occate_neaw_ckheer nelh:a s     fffffhf
c0t0a0093b18
ble entries: 2097152 (order: 12, 16777216 bytes)
machineI_kneoxedc:5e90-: relocate_cnaecwh_ek ehransel_size: b8hh( 1t84)a bytes
ble entries: 1048576 (order: 11, 8388608 bytes)
machinMe_keemxoercy::5 913: kexec6_5d0t8b_6addr2:4 K
0/0100004600708a0000
77216K available (4360K kernel code, 299K rwdata, 1528K rodata, 6556K
init, 202K bss, 268592K reserved)
oacVhinei_kretxueacl: 595: kexec_kkeirmnaegle _mhead:     e m o r0y0
00l0043eaa9bf002y
 ut:
    vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000   (245759 MB)
    vmemmap : 0xffffffbce0000000 - 0xffffffbcee000000   (   224 MB)
0  machine_ kmeoxdecu:l597:e kexecs_ k:i m0age_xsftarft:    f  f
f0f0b0f0f0c04000080004
 00000 - 0xffffffc000000000   (    64 MB)
    memory  : 0xffffffc000000000 - 0xffffffc400000000   ( 16384 MB)
      .init : 0xffffffc000642000 -machine_ke x0excf:f5f99:f
kexec_efntfrcy0_0d0ump:
ca9340   (  6557 kB)
      .text : 0xffffffc000080000 - 0xffffffc0006411c4   (  5893 kB)
f     .data : 0xffffffc000caa000 - 0xffffffc000cf4f28     (I
43ea9bf002 =  343ea9b0f000 (f0fffffc 3ekaB9)b
 000)
d D 40S0LU00B8:0 0H0W1 = a4000080000l i(gfnf=f6f4ffc,00008000 0O)r
######################

This doesn't seem to be working, and the behavior is random. Sometimes it
rebooted to the u-boot prompt. Sometimes the kernel soft-resets itself in
an endless loop (the boot log repeats over and over).

To debug what is happening, I put a while(1) just before the jump into the
kexec reboot code.

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..8843623 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -85,6 +85,7 @@ void soft_restart(unsigned long addr)

        smp_secondary_shutdown();

+       while(1);
        /* Switch to the identity mapping */
        phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
        phys_reset(addr);

I break into target with BDI3000 now; and see the below output

TARGET#0>state
Core#0: halted 0xffffffc000085240 External Debug Request
Core#1: halted 0x0000004000080394 External Debug Request
Core#2: halted 0x0000004000080394 External Debug Request
Core#3: halted 0x0000004000080394 External Debug Request
Core#4: halted 0xffffffc0000802f8 External Debug Request
Core#5: halted 0x0000004000080394 External Debug Request
Core#6: halted 0xffffffc0000802f8 External Debug Request
Core#7: halted 0x0000004000080394 External Debug Request

I think some of the secondary CPUs are not behaving as expected.
As of now I don't have any clue why.


--Arun

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-22  9:44       ` Arun Chandran
@ 2014-07-22 13:25         ` Arun Chandran
  2014-07-24  0:38           ` Geoff Levand
  2014-07-30  3:26           ` Feng Kan
  2014-07-24  0:10         ` Geoff Levand
  2014-07-24  9:13         ` Mark Rutland
  2 siblings, 2 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-22 13:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Tue, Jul 22, 2014 at 3:14 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Thu, Jul 17, 2014 at 4:34 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Feng,
>>
>> On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote:
>>> Just following up on the conversation. The cpu return address of 0 should work
>>> in your case. Since thats the _start of the bootloader, it will run
>>> some core init
>>> code and then put the core back in wfe.
>>
>> OK, I fixed up my code so that zero is valid cpu return address.  Arun,
>> could you try my latest I pushed out today?
>>
> Hi Geoff,
>
> Sorry for the late reply I was away.
>
> Yes I tried the new code.
> My dts file has the below change.
>
> ################
> diff --git a/arch/arm64/boot/dts/apm-storm.dtsi
> b/arch/arm64/boot/dts/apm-storm.dtsi
> index e0bf91d..b64e549 100644
> --- a/arch/arm64/boot/dts/apm-storm.dtsi
> +++ b/arch/arm64/boot/dts/apm-storm.dtsi
> @@ -24,64 +24,64 @@
>                         compatible = "apm,potenza", "arm,armv8";
>                         reg = <0x0 0x000>;
>                         enable-method = "spin-table";
> -                       cpu-release-addr = <0x1 0x0000fff8>;
> -                       cpu-return-addr = <0x0 0x0> /* Updated by bootloader */
> +                       cpu-release-addr = <0x40 0x0000fff8>;
> +                       cpu-return-addr = <0x0 0x0>; /* Updated by bootloader */
>                 };
> #################
> All other cpu nodes have similar change.
>

I tried the same dtb with a UP configuration. To get the UP kernel to
compile, I made the modifications below.

##############################
diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
index 1ccedb4c..c6a2a7e 100644
--- a/arch/arm64/kernel/cpu_ops.c
+++ b/arch/arm64/kernel/cpu_ops.c
@@ -89,6 +89,7 @@ void __init cpu_read_bootcpu_ops(void)
        cpu_read_ops(dn, 0);
 }

+#if 0
 int __init cpu_ops_init(void)
 {
        int result = 0;
@@ -110,3 +111,4 @@ void cpu_ops_shutdown(void)
        if (cpu_operation_psci.shutdown)
                cpu_operation_psci.shutdown();
 }
+#endif
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index fba6d50..c3cf246 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -609,7 +609,7 @@ void machine_kexec(struct kimage *image)
        flush_icache_range((unsigned long)reboot_code_buffer,
                (unsigned long)reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);

-       dump_cpus();
+       //dump_cpus();

        pr_info("Bye!\n");

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..6bc85f78 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -83,7 +83,7 @@ void soft_restart(unsigned long addr)

        setup_restart();

-       smp_secondary_shutdown();
+       //smp_secondary_shutdown();

        /* Switch to the identity mapping */
        phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
@@ -130,7 +130,7 @@ void arch_cpu_idle_dead(void)
  */
 void machine_shutdown(void)
 {
-       smp_send_stop();
+       //smp_send_stop();
 }

 /*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c29dde1..14c339c 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,7 +73,7 @@ ENTRY(cpu_reset)
        bic     x1, x1, #1
        msr     sctlr_el1, x1                   // disable the MMU
        isb
-       bl      secondary_shutdown
+#      bl      secondary_shutdown
        ret     x0
 ENDPROC(cpu_reset)
#####################

With the default target configuration, "kexec -e" also failed to execute
in the UP scenario.

But I had some luck when I did the same steps with the L3 cache
disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
the platform has an L3 cache. Luckily I was able to disable it in u-boot.

With the L3 cache disabled, I am able to do "kexec -e". Please see the
log attached.

Feng,
I suspect the kernel is unaware of the presence of the L3 cache, and that
this is what makes "kexec -e" fail.

Do you have any idea how to make the kernel take control of the L3 cache?

--Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_unipro_log
Type: application/octet-stream
Size: 9227 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140722/8336d688/attachment-0001.obj>

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-22 13:25         ` Arun Chandran
@ 2014-07-24  0:38           ` Geoff Levand
  2014-07-24  9:36             ` Mark Rutland
  2014-07-24 11:50             ` Arun Chandran
  2014-07-30  3:26           ` Feng Kan
  1 sibling, 2 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-24  0:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:

> I tried the same dtb with UP configuration. For UP kernel to compile
> did the below modifications

I'll test and fixup the kexec UP build in the next few days.

...

> With the default target configuration "kexec -e" failed to execute
> in UP scenario also.
> 
> But I had some luck when I did the same steps with L3 cache
> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> it has an L3 cache. Luckily I was able to disable it in u-boot.
> 
> With the L3 cache disabled configuration I am able to
> do "kexec -e". Please see the log attached.

All memory management for the main cpu is done by the arch code.  Kexec
and cpu hot plug only work with the secondary cpus, so the problem would
be in the arch memory code, either in setup_restart() for shutdown, or
in the startup code.

I guess setup_restart() is not doing something it needs to do for your
platform.

-Geoff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-24  0:38           ` Geoff Levand
@ 2014-07-24  9:36             ` Mark Rutland
  2014-07-24 12:49               ` Arun Chandran
                                 ` (2 more replies)
  2014-07-24 11:50             ` Arun Chandran
  1 sibling, 3 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-24  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
> Hi Arun,
> 
> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
> 
> > I tried the same dtb with UP configuration. For UP kernel to compile
> > did the below modifications
> 
> I'll test and fixup the kexec UP build in the next few days.
> 
> ...
> 
> > With the default target configuration "kexec -e" failed to execute
> > in UP scenario also.

It would be helpful to know _how_ it failed. Do you have any log output?

> > 
> > But I had some luck when I did the same steps with L3 cache
> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> > it has an L3 cache. Luckily I was able to disable it in u-boot.
> > 
> > With the L3 cache disabled configuration I am able to
> > do "kexec -e". Please see the log attached.

Hmm. We don't expect the kernel to do any L3 management. It seems that
memory subsystems with L3 caches respecting cache maintenance by VA are
going to become relatively common, and we expect to handle them all by
performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
kernel cache requirements for boot) for what we do at boot time.

> 
> All memory management for the main cpu is done by the arch code.  Kexec
> and cpu hot plug only work with the secondary cpus, so the problem would
> be in the arch memory code, either in setup_restart() for shutdown, or
> in the startup code.

It's possible that soft_restart and setup_restart are a little dodgy, as
they also rely on the compiler being smart and not touching the stack
after setup_restart().

However, they provide absolutely no guarantee that any data has been
flushed out to the PoC [1]. If you require any data to be flushed out to the
PoC so as to be visible to noncacheable accesses, you will need to
ensure that this is flushed by VA before soft_restart is called. Data
may have migrated to another cache (e.g. another CPU, or the L3) where
it is not visible.
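
To make "flushed by VA" concrete: the maintenance walks the buffer one
cache line at a time and issues a by-VA operation on each line-aligned
address (on arm64, `dc civac` cleans and invalidates to the PoC), followed
by a barrier. The sketch below models only the address walk in portable C;
the names are hypothetical and the per-line operation is a callback so that
it can run anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-line operation; on arm64 this would be `dc civac`. */
typedef void (*line_op_t)(uintptr_t line_addr, void *ctx);

/*
 * Apply op to every cache line overlapping [start, start + size).
 * line_size must be a power of two.  Returns the number of lines visited.
 */
static size_t for_each_line_by_va(uintptr_t start, size_t size,
				  size_t line_size, line_op_t op, void *ctx)
{
	uintptr_t end = start + size;
	uintptr_t addr = start & ~((uintptr_t)line_size - 1);
	size_t lines = 0;

	if (size == 0)
		return 0;

	for (; addr < end; addr += line_size) {
		if (op)
			op(addr, ctx);	/* arm64: dc civac, addr */
		lines++;
	}
	/* arm64: a `dsb sy` would follow to complete the maintenance */
	return lines;
}
```

A 0x80-byte object at 0x1010 with 64-byte lines touches three lines
(0x1000, 0x1040 and 0x1080), so all three need the operation.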

> 
> I guess setup_restart() is not doing something it needs to do for your
> platform.

Unless you can see soft_restart/setup_restart making stack accesses
after the first flush_cache_all call, I suspect your code is not
flushing some data it requires.

I'd been meaning to clean up soft_restart and setup_restart for a while.
I'll have a go at cleaning them up along with any remaining
flush_cache_all abuse in arm64.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-24  9:36             ` Mark Rutland
@ 2014-07-24 12:49               ` Arun Chandran
  2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:26               ` Arun Chandran
  2 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-24 12:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>> Hi Arun,
>>
>> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
>>
>> > I tried the same dtb with UP configuration. For UP kernel to compile
>> > did the below modifications
>>
>> I'll test and fixup the kexec UP build in the next few days.
>>
>> ...
>>
>> > With the default target configuration "kexec -e" failed to execute
>> > in UP scenario also.
>
> It would be helpful to know _how_ it failed. Do you have any log output?

I don't have any error log for this.
If I loop just before jumping to relocate_new_kernel and break in with
the BDI, I see the following:

CPU#0>state
Core#0: halted 0x0000004000094230 External Debug Request

CPU#0>rd
GPR00: 000000000000003f 0000000034d5d918 0000004000000000 0000000000000004
GPR04: 000000000000001f 000000000000001b 0000000000000000 ffffffffffffffff
GPR08: 0000000000000014 ffffffffffffffff 0000000000000000 0000000000000002
GPR12: ffffffc000085030 aa0e03ed3600004a 9100218cf940018a ffffffffffffffff
GPR16: ffffffc0000cc00c 0000000000434260 0000007ffe514d80 00000043eadd2000
GPR20: ffffffc3eadd2000 ffffffc000cc5000 ffffffc3ea42bf88 00000003eadd2000
GPR24: ffffffc0004ada58 ffffffc0005b2e68 ffffffc0005b2e88 ffffffc0005b2ef0
GPR28: ffffffc000092000 ffffffc3f017bce0 ffffffc000085054 ffffffc3f017bce0

x0 contains a corrupted address in this case.

If I do the same without disabling the D-cache
#######
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6bc85f78..61f95a2 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+//     cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+//     flush_cache_all();
 }

 void soft_restart(unsigned long addr)
##########
I can see

CPU#0>state
Core#0: halted 0x0000004000094230 External Debug Request

CPU#0>
CPU#0>rd
GPR00: 00000043eae90000 0000000034d5d91c 0000004000000000 0000000000000004
GPR04: 000000000000001f 000000000000001b 0000000000000000 ffffffffffffffff
GPR08: 0000000000000014 ffffffffffffffff 0000000000000000 0000000000000002
GPR12: ffffffc000085028 aa0e03ed3600004a 9100218cf940018a ffffffffffffffff
GPR16: ffffffc0000cc00c 0000000000434260 0000007fe79f31e0 00000043eae90000
GPR20: ffffffc3eae90000 ffffffc000cc5000 ffffffc3ea42ef88 00000003eae90000
GPR24: ffffffc0004ada58 ffffffc0005b2e68 ffffffc0005b2e88 ffffffc0005b2ef0
GPR28: ffffffc000092000 ffffffc3ebad7ce0 ffffffc00008504c ffffffc3ebad7ce0
CPU#0>
CPU#0>go 0x00000043eae90000
CPU#0>
CPU#0>
CPU#0>h
    Core number       : 0
    Core state        : debug (AArch64 EL1)
    Debug entry cause : External Debug Request
    Current PC        : 0xffffffc000083200
    Current CPSR      : 0x600003c5 (EL1h)

In this case x0 contains the correct jump address, but the kexec'ed
kernel ends up at 0xffffffc000083200. It did not print anything on the
console.

If you have any other experiments that would help find the root cause,
please tell me and I can run them.

>
>> >
>> > But I had some luck when I did the same steps with L3 cache
>> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
>> > it has an L3 cache. Luckily I was able to disable it in u-boot.
>> >
>> > With the L3 cache disabled configuration I am able to
>> > do "kexec -e". Please see the log attached.
>
> Hmm. We don't expect the kernel to do any L3 management. It seems that
> memory subsystems with L3 caches respecting cache maintenance by VA are
> going to become relatively common, and we expect to handle them all by
> performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
> kernel cache requirements for boot) for what we do at boot time.
>

I will confirm this by booting with the L3 cache enabled, rebooting to
u-boot, not flushing L3 from u-boot, and then booting Linux again.

--Arun

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-24  9:36             ` Mark Rutland
  2014-07-24 12:49               ` Arun Chandran
@ 2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:31                 ` Arun Chandran
                                   ` (2 more replies)
  2014-07-25 10:26               ` Arun Chandran
  2 siblings, 3 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-25  0:17 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
 
> > All memory management for the main cpu is done by the arch code.  Kexec
> > and cpu hot plug only work with the secondary cpus, so the problem would
> > be in the arch memory code, either in setup_restart() for shutdown, or
> > in the startup code.
> 
> It's possible that soft_restart and setup_restart are a little dodgy, as
> they also rely on the compiler being smart and not touching the stack
> after setup_restart().
> 
> However, they provide absolutely no guarantee that any data has been
> flushed out to the PoC [1]. If you require any data to be flushed out to the
> PoC so as to be visible to noncacheable accesses, you will need to
> ensure that this is flushed by VA before soft_restart is called. Data
> may have migrated to another cache (e.g. another CPU, or the L3) where
> it is not visible.

OK, kexec's reset routine relocate_new_kernel does use a few global
variables that are set by the main kexec routines.  I added a call to
__flush_dcache_area(), which uses 'dc civac' for those.

I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
flush_cache_all() would be enough.

Arun, I also fixed UP builds.  Could you pull my latest and try with L3
enabled?

-Geoff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Kexec on arm64
  2014-07-25  0:17               ` Geoff Levand
@ 2014-07-25 10:31                 ` Arun Chandran
  2014-07-25 10:36                 ` Mark Rutland
  2014-07-25 11:48                 ` Arun Chandran
  2 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-25 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
>> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>
>> > All memory management for the main cpu is done by the arch code.  Kexec
>> > and cpu hot plug only work with the secondary cpus, so the problem would
>> > be in the arch memory code, either in setup_restart() for shutdown, or
>> > in the startup code.
>>
>> It's possible that soft_restart and setup_restart are a little dodgy, as
>> they also rely on the compiler being smart and not touching the stack
>> after setup_restart().
>>
>> However, they provide absolutely no guarantee that any data has been
>> flushed out to the PoC [1]. If you require any data to be flushed out to the
>> PoC so as to be visible to noncacheable accesses, you will need to
>> ensure that this is flushed by VA before soft_restart is called. Data
>> may have migrated to another cache (e.g. another CPU, or the L3) where
>> it is not visible.
>
> OK, kexec's reset routine relocate_new_kernel does use a few global
> variables that are set by the main kexec routines.  I added a call to
> __flush_dcache_area(), which uses 'dc civac' for those.
>
> I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> flush_cache_all() would be enough.
>
> Arun, I also fixed UP builds.  Could you pull my latest and try with L3
> enabled?

I tried it, but the kexec'ed kernel fails to boot.
I ran the code with and without the call to cpu_cache_off()
in arch/arm64/kernel/process.c. I have put more details
in another mail; please see that.

--Arun


* Kexec on arm64
  2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:31                 ` Arun Chandran
@ 2014-07-25 10:36                 ` Mark Rutland
  2014-07-25 11:48                 ` Arun Chandran
  2 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-25 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 01:17:48AM +0100, Geoff Levand wrote:
> Hi,
> 
> On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
> > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>  
> > > All memory management for the main cpu is done by the arch code.  Kexec
> > > and cpu hot plug only work with the secondary cpus, so the problem would
> > > be in the arch memory code, either in setup_restart() for shutdown, or
> > > in the startup code.
> > 
> > It's possible that soft_restart and setup_restart are a little dodgy, as
> > they also rely on the compiler being smart and not touching the stack
> > after setup_restart().
> > 
> > However, they provide absolutely no guarantee that any data has been
> > flushed out to the PoC [1]. If you require any data to be flushed out to the
> > PoC so as to be visible to noncacheable accesses, you will need to
> > ensure that this is flushed by VA before soft_restart is called. Data
> > may have migrated to another cache (e.g. another CPU, or the L3) where
> > it is not visible.
> 
> OK, kexec's reset routine relocate_new_kernel does use a few global
> variables that are set by the main kexec routines.  I added a call to
> __flush_dcache_area(), which uses 'dc civac' for those.
> 
> I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> flush_cache_all() would be enough.

Unfortunately Set/Way ops don't provide the guarantee you require. While
they may happen to force prior writes out to the PoC on some
implementations, this guarantee is not provided by the architecture. The
only way to guarantee data is out at the PoC per the architecture is to
use cache maintenance by VA (unless you are unlucky enough to be on a
32-bit system with an outer cache that requires MMIO maintenance).

Almost every use of Set/Way operations is dodgy. They only make sense
for IMPLEMENTATION DEFINED power-on / power-off sequences (which the
arm64 kernel won't be dealing with), and some edge cases where you need
to guarantee that the local d-caches are empty (to avoid unexpectedly
hitting in the cache).

The naming of __flush_dcache_all is certainly misleading and I intend to
address that shortly.

Thanks,
Mark.


* Kexec on arm64
  2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:31                 ` Arun Chandran
  2014-07-25 10:36                 ` Mark Rutland
@ 2014-07-25 11:48                 ` Arun Chandran
  2014-07-25 12:14                   ` Mark Rutland
  2014-07-26  0:18                   ` Geoff Levand
  2 siblings, 2 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-25 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
>> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>
>> > All memory management for the main cpu is done by the arch code.  Kexec
>> > and cpu hot plug only work with the secondary cpus, so the problem would
>> > be in the arch memory code, either in setup_restart() for shutdown, or
>> > in the startup code.
>>
>> It's possible that soft_restart and setup_restart are a little dodgy, as
>> they also rely on the compiler being smart and not touching the stack
>> after setup_restart().
>>
>> However, they provide absolutely no guarantee that any data has been
>> flushed out to the PoC [1]. If you require any data to be flushed out to the
>> PoC so as to be visible to noncacheable accesses, you will need to
>> ensure that this is flushed by VA before soft_restart is called. Data
>> may have migrated to another cache (e.g. another CPU, or the L3) where
>> it is not visible.
>
> OK, kexec's reset routine relocate_new_kernel does use a few global
> variables that are set by the main kexec routines.  I added a call to
> __flush_dcache_area(), which uses 'dc civac' for those.
>
> I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> flush_cache_all() would be enough.
>
> Arun, I also fixed UP builds.  Could you pull my latest and try with L3
> enabled?
>
I got this working. As Mark Rutland pointed out in another mail, the
problem could be with flushing the cache, so I read 1GB of data from
the start of RAM into a volatile variable. I assume this will clean and
invalidate everything in the caches (L1=32K, L2=256K, L3=8M).

###################
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 786daa6..90418f3 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {}

 static void setup_restart(void)
 {
+       volatile u64 tmp;
+       volatile u64 *addr;
+
+       addr = (u64 *)(0xffffffc000000000);
        /*
         * Tell the mm system that we are going to reboot -
         * we may need it to insert some 1:1 mappings so that
@@ -75,6 +79,11 @@ static void setup_restart(void)
        /* Clean and invalidate caches */
        flush_cache_all();

+       for ( ;addr < (u64 *)0xffffffc040000000; addr++)
+       {
+               tmp = *addr;
+       }
+
        /* Turn D-cache off */
        cpu_cache_off();

###################

With the above change, the latest kernel at
https://git.linaro.org/people/geoff.levand/linux-kexec.git
is able to kexec with L3 enabled in the UP scenario.

--Arun


* Kexec on arm64
  2014-07-25 11:48                 ` Arun Chandran
@ 2014-07-25 12:14                   ` Mark Rutland
  2014-07-25 15:29                     ` Arun Chandran
  2014-07-26  0:18                   ` Geoff Levand
  1 sibling, 1 reply; 61+ messages in thread
From: Mark Rutland @ 2014-07-25 12:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 12:48:04PM +0100, Arun Chandran wrote:
> Hi Geoff,
> 
> On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
> > Hi,
> >
> > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
> >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
> >
> >> > All memory management for the main cpu is done by the arch code.  Kexec
> >> > and cpu hot plug only work with the secondary cpus, so the problem would
> >> > be in the arch memory code, either in setup_restart() for shutdown, or
> >> > in the startup code.
> >>
> >> It's possible that soft_restart and setup_restart are a little dodgy, as
> >> they also rely on the compiler being smart and not touching the stack
> >> after setup_restart().
> >>
> >> However, they provide absolutely no guarantee that any data has been
> >> flushed out to the PoC [1]. If you require any data to be flushed out to the
> >> PoC so as to be visible to noncacheable accesses, you will need to
> >> ensure that this is flushed by VA before soft_restart is called. Data
> >> may have migrated to another cache (e.g. another CPU, or the L3) where
> >> it is not visible.
> >
> > OK, kexec's reset routine relocate_new_kernel does use a few global
> > variables that are set by the main kexec routines.  I added a call to
> > __flush_dcache_area(), which uses 'dc civac' for those.
> >
> > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> > flush_cache_all() would be enough.
> >
> > Arun, I also fixed UP builds.  Could you pull my latest and try with L3
> > enabled?
> >
> I got this working. As 'Mark Rutland' pointed in another mail that
> it could be problem with flushing the cache; I did a read of
> 1GB data from start of RAM to a volatile var. I assume that
> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)

You've managed to get the cache to evict some lines, which proves my
theory, but this is absolute nonsense and guarantees nothing.

So NAK to this.

If you need to perform cache maintenance to guarantee data is visible to
non-cacheable accesses  you _must_ use the architected mechanism for
cleaning data to the PoC: DC CVAC. We have wrappers for flushing ranges.

Anything else is nonsense and does not provide the guarantee you need.

That said, I still am not sure what guarantee you are attempting to get.
Which data do you need out at the PoC?
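
For illustration only, here is a minimal userspace sketch of what
"maintenance by VA over a range" means: align the start address down to
a cache line and visit every line that overlaps the range. It is a
model, not the kernel's wrapper. The names clean_range_by_va(),
count_line() and lines_cleaned() and the fixed 64-byte line size are
assumptions made for this sketch; real arm64 code derives the line size
from CTR_EL0, issues 'dc cvac' per line, and ends with a barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for this model; real code reads it from CTR_EL0. */
#define MODEL_LINE_SIZE 64u

/* Hypothetical per-line operation, standing in for "dc cvac, <addr>". */
typedef void (*line_op_t)(uintptr_t line_addr, void *ctx);

/* Stub operation: only counts how many lines were visited. */
static void count_line(uintptr_t line_addr, void *ctx)
{
	(void)line_addr;
	(*(size_t *)ctx)++;
}

/* Apply op to every cache line overlapping [start, start + size). */
static void clean_range_by_va(uintptr_t start, size_t size,
			      line_op_t op, void *ctx)
{
	uintptr_t end = start + size;
	uintptr_t line = start & ~(uintptr_t)(MODEL_LINE_SIZE - 1);

	assert(op != NULL);
	for (; line < end; line += MODEL_LINE_SIZE)
		op(line, ctx);
	/* Real code would complete the maintenance with "dsb sy" here. */
}

/* Convenience helper: number of lines a given range touches. */
static size_t lines_cleaned(uintptr_t start, size_t size)
{
	size_t n = 0;

	clean_range_by_va(start, size, count_line, &n);
	return n;
}
```

In this model an 8-byte object that straddles a line boundary touches
two lines, which is why a by-VA wrapper takes a range rather than a
single address.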

Thanks,
Mark.


* Kexec on arm64
  2014-07-25 12:14                   ` Mark Rutland
@ 2014-07-25 15:29                     ` Arun Chandran
  0 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-25 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 5:44 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Jul 25, 2014 at 12:48:04PM +0100, Arun Chandran wrote:
>> Hi Geoff,
>>
>> On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
>> > Hi,
>> >
>> > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
>> >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>> >
>> >> > All memory management for the main cpu is done by the arch code.  Kexec
>> >> > and cpu hot plug only work with the secondary cpus, so the problem would
>> >> > be in the arch memory code, either in setup_restart() for shutdown, or
>> >> > in the startup code.
>> >>
>> >> It's possible that soft_restart and setup_restart are a little dodgy, as
>> >> they also rely on the compiler being smart and not touching the stack
>> >> after setup_restart().
>> >>
>> >> However, they provide absolutely no guarantee that any data has been
>> >> flushed out to the PoC [1]. If you require any data to be flushed out to the
>> >> PoC so as to be visible to noncacheable accesses, you will need to
>> >> ensure that this is flushed by VA before soft_restart is called. Data
>> >> may have migrated to another cache (e.g. another CPU, or the L3) where
>> >> it is not visible.
>> >
>> > OK, kexec's reset routine relocate_new_kernel does use a few global
>> > variables that are set by the main kexec routines.  I added a call to
>> > __flush_dcache_area(), which uses 'dc civac' for those.
>> >
>> > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
>> > flush_cache_all() would be enough.
>> >
>> > Arun, I also fixed UP builds.  Could you pull my latest and try with L3
>> > enabled?
>> >
>> I got this working. As 'Mark Rutland' pointed in another mail that
>> it could be problem with flushing the cache; I did a read of
>> 1GB data from start of RAM to a volatile var. I assume that
>> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)
>
> You've managed to get the cache to evict some lines, which proves my
> theory, but this is absolute nonsense and guarantees nothing.
>
> So NAK to this.
>

Yes, I was just shooting wildly.

> If you need to perform cache maintenance to guarantee data is visible to
> non-cacheable accesses  you _must_ use the architected mechanism for
> cleaning data to the PoC: DC CVAC. We have wrappers for flushing ranges.
>
> Anything else is nonsense and does not provide the guarantee you need.
>
> That said, I still am not sure what guarantee you are attempting to get.
> Which data do you need out at the PoC?
>

I tried flushing the jump address:

##########
 +static unsigned long jump_addr_save;
 void soft_restart(unsigned long addr)
 {
        typedef void (*phys_reset_t)(unsigned long);
        phys_reset_t phys_reset;

+       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
+       jump_addr_save = addr;
+        __flush_dcache_area(&jump_addr_save, 16);
+        __flush_dcache_area(&phys_reset, 16);
        setup_restart();

        /* Switch to the identity mapping */
-       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
-       phys_reset(addr);
+       phys_reset(jump_addr_save);

        /* Should never get here */
        BUG();
###########

and flushing all the source and destination pages of the kexec'ed
kernel:

##########
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 2995c78..3edf567 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -221,6 +221,8 @@ static void _kexec_entry_dump(const char *func, int line,
                                                addr,
                                                (unsigned long)virt_to_phys(dest),
                                                dest);
+                               __flush_dcache_area(addr, PAGE_SIZE);
+                               __flush_dcache_area(dest, PAGE_SIZE);
                                dest += PAGE_SIZE;
                                break;
                        case IND_DONE:
###########

It still fails (it does not reboot into the kexec'ed kernel), which
means I am missing a flush of some other area.

--Arun


* Kexec on arm64
  2014-07-25 11:48                 ` Arun Chandran
  2014-07-25 12:14                   ` Mark Rutland
@ 2014-07-26  0:18                   ` Geoff Levand
  2014-07-28 15:00                     ` Arun Chandran
  1 sibling, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-07-26  0:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote:
> I got this working. As 'Mark Rutland' pointed in another mail that
> it could be problem with flushing the cache; I did a read of
> 1GB data from start of RAM to a volatile var. I assume that
> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)

I wasn't flushing out all the data used by relocate_new_kernel.  I added
a routine that should flush all the pages in the kimage list out to PoC.
Please try a UP build with L3 enabled.
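
As a rough userspace sketch of such a walk (not the routine from the
tree): each kimage entry carries an address plus flag bits, an
indirection entry points at the next page of entries, and a done entry
ends the list. The flag values are the IND_* constants as I recall them
from include/linux/kexec.h; everything else (entry_t,
flush_kimage_list(), demo_flush(), counting instead of flushing, and
storing plain pointers rather than physical addresses) is an assumption
made for this model. Which pages actually need flushing is also an
assumption here: the sketch counts the list pages and the source pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits kept in the low bits of each entry (assumed kernel values). */
#define IND_DESTINATION 0x1u
#define IND_INDIRECTION 0x2u
#define IND_DONE        0x4u
#define IND_SOURCE      0x8u
#define IND_FLAGS       0xFu

typedef uintptr_t entry_t;

/* Counters standing in for real by-VA flushes of PAGE_SIZE bytes. */
struct flush_count {
	size_t list_pages;
	size_t source_pages;
};

static void *entry_addr(entry_t e)
{
	return (void *)(e & ~(entry_t)IND_FLAGS);
}

/* Walk the entry list from *head, "flushing" list and source pages. */
static void flush_kimage_list(const entry_t *head, struct flush_count *cnt)
{
	const entry_t *ptr = head;
	entry_t entry;

	assert(head != NULL && cnt != NULL);
	while ((entry = *ptr) != 0 && !(entry & IND_DONE)) {
		if (entry & IND_INDIRECTION) {
			cnt->list_pages++;	/* the page holding entries */
			ptr = entry_addr(entry);
		} else {
			if (entry & IND_SOURCE)
				cnt->source_pages++;	/* data to be copied */
			ptr++;
		}
	}
}

/* Build a two-page demo list and return what the walk would flush. */
static struct flush_count demo_flush(void)
{
	static _Alignas(16) unsigned char dst[16], src_a[16], src_b[16];
	static _Alignas(16) entry_t page1[3], page2[2];
	struct flush_count cnt = { 0, 0 };
	entry_t head = (entry_t)page1 | IND_INDIRECTION;

	page1[0] = (entry_t)dst | IND_DESTINATION;
	page1[1] = (entry_t)src_a | IND_SOURCE;
	page1[2] = (entry_t)page2 | IND_INDIRECTION;
	page2[0] = (entry_t)src_b | IND_SOURCE;
	page2[1] = IND_DONE;

	flush_kimage_list(&head, &cnt);
	return cnt;
}
```

The point of walking the list itself, rather than only the image data,
is that relocate_new_kernel reads the entries with the caches off, so
the list pages must be visible at the PoC as well.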

Next week I'll check that my smp secondary shutdown code is doing the
correct thing.

-Geoff


* Kexec on arm64
  2014-07-26  0:18                   ` Geoff Levand
@ 2014-07-28 15:00                     ` Arun Chandran
  2014-07-28 15:38                       ` Mark Rutland
  0 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-28 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff

On Sat, Jul 26, 2014 at 5:48 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote:
>> I got this working. As 'Mark Rutland' pointed in another mail that
>> it could be problem with flushing the cache; I did a read of
>> 1GB data from start of RAM to a volatile var. I assume that
>> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)
>
> I wasn't flushing out all the data used by relocate_new_kernel.  I added
> a routine that should flush all the pages in the kimage list out to PoC.
> Please try a UP build with L3 enabled.
>

Yes, I verified this new kernel by stopping just before the jump to the
kexec'ed kernel and taking a memory dump of the kernel
and dtb.

CPU#0>dump 0x0000004000080000 0x7F2000 uImage
CPU#0>rd
GPR00: 0000004000880000 0000000000000000 0000000000000000 0000000000000000
GPR04: 0000004000080004 000000000000001b 0000000000000000 ffffffffffffffff
GPR08: 0000000000000020 ffffffffffffffff 0000000000000004 0000000000000002
GPR12: 00000043ea439fd8 0000004000883000 00000043ea631000 ffffffffffffffff
GPR16: ffffffc0000cc31c 0000000000435260 0000007fe9328b90 00000043eaddc002
GPR20: 0000004000883000 00000043ea632000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 ffffffc000085074 ffffffc3eae53d00

I compared this image with the one taken when booting with
my working version (the one with the 1GB read of RAM). Both are identical,
which means the kexec code now flushes the data properly. Note that I used
the same secondary kernel in both cases.

But it still fails to boot the kexec'ed kernel. If I break into the target I see:
CPU#0>h
    Core number       : 0
    Core state        : debug (AArch64 EL1)
    Debug entry cause : External Debug Request
    Current PC        : 0xffffffc000083200
    Current CPSR      : 0x600003c5 (EL1h)


So I guess something may be wrong with
the boot of the new kernel.

I have made these changes to the code:
1)
#########
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 00cfbd6..da3672b 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -684,7 +691,7 @@ void machine_kexec(struct kimage *image)
        /* Flush the reboot_code_buffer in preparation for its execution. */

        flush_icache_range((unsigned long)reboot_code_buffer,
-               relocate_new_kernel_size);
+               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));

        /*
         * Flush any data used by relocate_new_kernel in preparation for
#########
The second argument passed to flush_icache_range() is wrong:
it expects an end address, not a length.

2)

#######
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9ed7327..e3fc8d6 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c

@@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
 {
        typedef void (*phys_reset_t)(unsigned long);
        phys_reset_t phys_reset;
+       unsigned long jump_addr = addr;
+
+       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
+
+       __flush_dcache_area(&jump_addr, 8);
+       __flush_dcache_area(&phys_reset, 8);

        setup_restart();

        /* Switch to the identity mapping */
-       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
-       phys_reset(addr);
+       phys_reset(jump_addr);

        /* Should never get here */
        BUG();
########

Without this flush it jumps to the wrong address.
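
To make the point in 1) concrete, here is a small userspace model of a
(start, end) range interface. model_flush_range(), flush_buggy() and
flush_fixed() are names invented for this sketch, and the model simply
reports how many bytes the range covers; what the real routine does
with an end address below the start is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a (start, end) range op: returns the number of bytes covered. */
static size_t model_flush_range(uintptr_t start, uintptr_t end)
{
	return end > start ? (size_t)(end - start) : 0;
}

/* Buggy call: passes the length where an end address is expected. */
static size_t flush_buggy(uintptr_t buf, size_t size)
{
	return model_flush_range(buf, (uintptr_t)size);
}

/* Fixed call: the end address is start plus length. */
static size_t flush_fixed(uintptr_t buf, size_t size)
{
	assert(buf + size >= buf);	/* no wrap-around in this model */
	return model_flush_range(buf, buf + size);
}
```

With a buffer at 0x40080000 and a size of 0x200, the buggy call covers
no bytes in this model because the "end" lies far below the start, while
the fixed call covers all 0x200 bytes.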

--Arun


* Kexec on arm64
  2014-07-28 15:00                     ` Arun Chandran
@ 2014-07-28 15:38                       ` Mark Rutland
  2014-07-29  0:09                         ` Geoff Levand
  0 siblings, 1 reply; 61+ messages in thread
From: Mark Rutland @ 2014-07-28 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
> Hi Geoff
> 
> On Sat, Jul 26, 2014 at 5:48 AM, Geoff Levand <geoff@infradead.org> wrote:
> > Hi Arun,
> >
> > On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote:
> >> I got this working. As 'Mark Rutland' pointed in another mail that
> >> it could be problem with flushing the cache; I did a read of
> >> 1GB data from start of RAM to a volatile var. I assume that
> >> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)
> >
> > I wasn't flushing out all the data used by relocate_new_kernel.  I added
> > a routine that should flush all the pages in the kimage list out to PoC.
> > Please try a UP build with L3 enabled.
> >
> 
> Yes I verified this new kernel by stopping just before jumping to the
> kexeced kernel and taking the memory dump for the kernel
> and dtb.
> 
> CPU#0>dump 0x0000004000080000 0x7F2000 uImage
> CPU#0>rd
> GPR00: 0000004000880000 0000000000000000 0000000000000000 0000000000000000
> GPR04: 0000004000080004 000000000000001b 0000000000000000 ffffffffffffffff
> GPR08: 0000000000000020 ffffffffffffffff 0000000000000004 0000000000000002
> GPR12: 00000043ea439fd8 0000004000883000 00000043ea631000 ffffffffffffffff
> GPR16: ffffffc0000cc31c 0000000000435260 0000007fe9328b90 00000043eaddc002
> GPR20: 0000004000883000 00000043ea632000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR28: 0000000000000000 0000000000000000 ffffffc000085074 ffffffc3eae53d00
> 
> And compared this image with the one taken after booting with
> my working one(The one with 1GB read of RAM). Both are Identical.
> That means kexec code now flushes data properly. Note that in both
> cases I used the same secondary kernel.
> 
> But It fails to boot kexeced kernel. If I break the target I can see
> CPU#0>h
>     Core number       : 0
>     Core state        : debug (AArch64 EL1)
>     Debug entry cause : External Debug Request
>     Current PC        : 0xffffffc000083200
>     Current CPSR      : 0x600003c5 (EL1h)
> 
> 
> So I guess there may be something wrong with
> the booting of kernel.
> 
> I have these changes to the code.
> 1)
> #########
> diff --git a/arch/arm64/kernel/machine_kexec.c
> b/arch/arm64/kernel/machine_kexec.c
> index 00cfbd6..da3672b 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -684,7 +691,7 @@ void machine_kexec(struct kimage *image)
>         /* Flush the reboot_code_buffer in preparation for its execution. */
> 
>         flush_icache_range((unsigned long)reboot_code_buffer,
> -               relocate_new_kernel_size);
> +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));
> 
>         /*
>          * Flush any data used by relocate_new_kernel in preparation for
> #########
> Passing of second variable to flush_icache_range() is wrong
> it expects an address not length.

A simpler option would be to nuke the entire icache before branching to
the new image.

> 
> 2)
> 
> #######
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 9ed7327..e3fc8d6 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> 
> @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
>  {
>         typedef void (*phys_reset_t)(unsigned long);
>         phys_reset_t phys_reset;
> +       unsigned long jump_addr = addr;
> +
> +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> +
> +       __flush_dcache_area(&jump_addr, 8);
> +       __flush_dcache_area(&phys_reset, 8);

Are these values really not getting stashed in registers?

If the compiler is spilling, then we have absolutely no guarantee about
any part of the stack. If that's the case, then we can't use the stack
at all. These need to be rewritten in asm if the compiler is spilling.

Thanks,
Mark.

> 
>         setup_restart();
> 
>         /* Switch to the identity mapping */
> -       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> -       phys_reset(addr);
> +       phys_reset(jump_addr);
> 
>         /* Should never get here */
>         BUG();
> ########
> 
> Without this flushing it will jump to wrong addr.
> 
> --Arun
> 


* Kexec on arm64
  2014-07-28 15:38                       ` Mark Rutland
@ 2014-07-29  0:09                         ` Geoff Levand
  2014-07-29  9:10                           ` Mark Rutland
  2014-07-29 12:32                           ` Arun Chandran
  0 siblings, 2 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-29  0:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote:
> On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
> > I have these changes to the code.
> >         flush_icache_range((unsigned long)reboot_code_buffer,
> > -               relocate_new_kernel_size);
> > +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));

Thanks, I introduced this in my last version in an attempt to clean up
the code, but on studying setup_restart(), I wonder if we even need to
do this icache flush here (see below).

> >         /*
> >          * Flush any data used by relocate_new_kernel in preparation for
> > #########
> > Passing of second variable to flush_icache_range() is wrong
> > it expects an address not length.
> 
> A simpler option would be to nuke the entire icache before branching to
> the new image.

flush_cache_all(), which is called by setup_restart(), does a 'ic
ialluis'.  The ARM says that this will invalidate all instruction caches
for the inner shareable domain.  Do we need something more?

> > 2)
> > 
> > #######
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 9ed7327..e3fc8d6 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > 
> > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
> >  {
> >         typedef void (*phys_reset_t)(unsigned long);
> >         phys_reset_t phys_reset;
> > +       unsigned long jump_addr = addr;
> > +
> > +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> > +
> > +       __flush_dcache_area(&jump_addr, 8);
> > +       __flush_dcache_area(&phys_reset, 8);
> 
> Are these values really not getting stashed in registers?

Looking at the disassembled code of soft_restart() from my compiler,
addr is being saved on the stack over the call to setup_restart(), which
I would expect it to do.

> If the compiler is spilling, then we have absolutely no guarantee about
> any part of the stack. If that's the case, then we can't use the stack
> at all. These need to be rewritten in asm if the compiler is spilling.

I think we just need to put the restart addr in a variable and flush
that to the PoC.

Arun, I pushed out a fixed version of soft_restart(), so please try
another UP + L3 boot.

-Geoff


* Kexec on arm64
  2014-07-29  0:09                         ` Geoff Levand
@ 2014-07-29  9:10                           ` Mark Rutland
  2014-07-29 12:32                           ` Arun Chandran
  1 sibling, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-29  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 01:09:08AM +0100, Geoff Levand wrote:
> Hi,

Hi,

> On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote:
> > On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
> > > I have these changes to the code.
> > >         flush_icache_range((unsigned long)reboot_code_buffer,
> > > -               relocate_new_kernel_size);
> > > +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));
> 
> Thanks, I introduced this in my last version in an attempt to clean up
> the code, but on studying setup_restart(), I wonder if we even need to
> do this icache flush here (see below).
> 
> > >         /*
> > >          * Flush any data used by relocate_new_kernel in preparation for
> > > #########
> > > Passing of second variable to flush_icache_range() is wrong
> > > it expects an address not length.
> > 
> > A simpler option would be to nuke the entire icache before branching to
> > the new image.
> 
> flush_cache_all(), which is called by setup_restart(), does a 'ic
> ialluis'.  The ARM says that this will invalidate all instruction caches
> for the inner shareable domain.  Do we need something more?

If we have that before branching to the new image then that should be
ok. The current image should already be visible at the PoC per the boot
protocol.

> > > 2)
> > > 
> > > #######
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index 9ed7327..e3fc8d6 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > 
> > > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
> > >  {
> > >         typedef void (*phys_reset_t)(unsigned long);
> > >         phys_reset_t phys_reset;
> > > +       unsigned long jump_addr = addr;
> > > +
> > > +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> > > +
> > > +       __flush_dcache_area(&jump_addr, 8);
> > > +       __flush_dcache_area(&phys_reset, 8);
> > 
> > Are these values really not getting stashed in registers?
> 
> Looking at the disassembled code of soft_restart() from my compiler,
> addr is being saved on the stack over the call to setup_restart(), which
> I would expect it to do.
 
> > If the compiler is spilling, then we have absolutely no guarantee about
> > any part of the stack. If that's the case, then we can't use the stack
> > at all. These need to be rewritten in asm if the compiler is spilling.
> 
> I think we just need to put the restart addr in a variable and flush
> that to the PoC.

I don't believe that flushing the restart addr variable on the stack out
to the PoC is the correct fix here; it only guarantees that said
variable is visible, not anything else the compiler may have placed on
the stack. I wonder how setup_restart interacts with
CONFIG_CC_STACKPROTECTOR for example.

As far as I can see, the only way to provide the guarantee we require
here is to not use the stack. Anything short of that is not going to be
much more robust than the current code.

Thanks,
Mark.


* Kexec on arm64
  2014-07-29  0:09                         ` Geoff Levand
  2014-07-29  9:10                           ` Mark Rutland
@ 2014-07-29 12:32                           ` Arun Chandran
  2014-07-29 13:35                             ` Mark Rutland
  1 sibling, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-29 12:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Tue, Jul 29, 2014 at 5:39 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote:
>> On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
>> > I have these changes to the code.
>> >         flush_icache_range((unsigned long)reboot_code_buffer,
>> > -               relocate_new_kernel_size);
>> > +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));
>
> Thanks, I introduced this in my last version in an attempt to clean up
> the code, but on studying setup_restart(), I wonder if we even need to
> do this icache flush here (see below).
>
>> >         /*
>> >          * Flush any data used by relocate_new_kernel in preparation for
>> > #########
>> > Passing of second variable to flush_icache_range() is wrong
>> > it expects an address not length.
>>
>> A simpler option would be to nuke the entire icache before branching to
>> the new image.
>
> flush_cache_all(), which is called by setup_restart(), does a 'ic
> ialluis'.  The ARM says that this will invalidate all instruction caches
> for the inner shareable domain.  Do we need something more?
>
>> > 2)
>> >
>> > #######
>> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> > index 9ed7327..e3fc8d6 100644
>> > --- a/arch/arm64/kernel/process.c
>> > +++ b/arch/arm64/kernel/process.c
>> >
>> > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
>> >  {
>> >         typedef void (*phys_reset_t)(unsigned long);
>> >         phys_reset_t phys_reset;
>> > +       unsigned long jump_addr = addr;
>> > +
>> > +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
>> > +
>> > +       __flush_dcache_area(&jump_addr, 8);
>> > +       __flush_dcache_area(&phys_reset, 8);
>>
>> Are these values really not getting stashed in registers?
>
> Looking at the disassembled code of soft_restart() from my compiler,
> addr is being saved on the stack over the call to setup_restart(), which
> I would expect it to do.
>
Yes, my compiler also saves this on the stack.

>> If the compiler is spilling, then we have absolutely no guarantee about
>> any part of the stack. If that's the case, then we can't use the stack
>> at all. These need to be rewritten in asm if the compiler is spilling.
>
> I think we just need to put the restart addr in a variable and flush
> that to the PoC.
>
> Arun, I pushed out a fixed version of soft_restart(), so please try
> another UP + L3 boot.
>

The default code did not work.

It works with the change below:

###############
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 5632473..7c5f859 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
 /**
  * kexec_list_walk - Helper to walk the kimage page list.
  */
-
+static int kexec_kernel_size;
+#define IMG_SIZE_NONE  0
+#define KERN_SIZE_FLAG 1
+#define DTB_SIZE_FLAG  2
 static void kexec_list_walk(void *ctx, unsigned long kimage_head,
        void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
 {
        void *dest;
        unsigned long *entry;
+       int imgsize_flag = IMG_SIZE_NONE;
+

        for (entry = &kimage_head, dest = NULL; ; entry++) {
                unsigned int flag = *entry & IND_FLAGS;
@@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned long kimage_head,
                        cb(ctx, flag, addr, NULL);
                        break;
                case IND_DESTINATION:
+                       if (imgsize_flag == IMG_SIZE_NONE) {
+                               kexec_kernel_size = 0;
+                               imgsize_flag = KERN_SIZE_FLAG;
+                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
+                               imgsize_flag = DTB_SIZE_FLAG;
+                       }
                        dest = addr;
                        cb(ctx, flag, addr, NULL);
                        break;
                case IND_SOURCE:
+                       if (imgsize_flag == KERN_SIZE_FLAG)
+                               kexec_kernel_size++;
                        cb(ctx, flag, addr, dest);
                        dest += PAGE_SIZE;
                        break;
@@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)

        kexec_list_walk(NULL, image->head, kexec_list_flush_cb);

+       /*
+        * Make sure virtual addresses of new kernel are flushed
+        * SZ_512K = TEXT_OFFSET
+        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
+        * Don't know = (SZ_4M + SZ_1M)
+        * SZ_4M = not working
+        * SZ_6M = working
+        * SZ_8M = working
+        *
+        * so chose SZ_4M + SZ_1M; Don't know why this is required
+        * BSS, stack ??
+        *
+        */
+       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
+               (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
+
        soft_restart(reboot_code_buffer_phys);
 }
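
The page counting done by the patch above can be modelled outside the kernel. The following is only a minimal userspace sketch, not the kernel code: the IND_* values are the ones from include/linux/kexec.h, but count_first_segment_pages() is a hypothetical helper, the list is assumed to be a flat array (IND_INDIRECTION entries are not followed), and the first destination segment is assumed to be the kernel Image.

```c
#include <stddef.h>

/* Entry flags, as defined in include/linux/kexec.h. */
#define IND_DESTINATION 0x1UL
#define IND_INDIRECTION 0x2UL
#define IND_DONE        0x4UL
#define IND_SOURCE      0x8UL
#define IND_FLAGS       (IND_DESTINATION | IND_INDIRECTION | \
                         IND_DONE | IND_SOURCE)

/*
 * Hypothetical helper: count the source pages that belong to the first
 * destination segment of a flat kimage entry list of n entries.
 * Indirection pages are ignored in this simplified model.
 */
size_t count_first_segment_pages(const unsigned long *list, size_t n)
{
	size_t pages = 0;
	int segment = 0;	/* number of IND_DESTINATION entries seen */
	size_t i;

	for (i = 0; i < n; i++) {
		switch (list[i] & IND_FLAGS) {
		case IND_DESTINATION:
			segment++;
			break;
		case IND_SOURCE:
			if (segment == 1)
				pages++;
			break;
		case IND_DONE:
			return pages;
		default:
			break;
		}
	}
	return pages;
}
```

Multiplying the result by PAGE_SIZE gives the size of the first segment, which corresponds to the kexec_kernel_size * PAGE_SIZE term in the flush call.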


--Arun


* Kexec on arm64
  2014-07-29 12:32                           ` Arun Chandran
@ 2014-07-29 13:35                             ` Mark Rutland
  2014-07-29 21:19                               ` Geoff Levand
                                                 ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-29 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

[...]

> The default code did not work.
> 
> It is working with the change below
> 
> ###############
> diff --git a/arch/arm64/kernel/machine_kexec.c
> b/arch/arm64/kernel/machine_kexec.c
> index 5632473..7c5f859 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
>  /**
>   * kexec_list_walk - Helper to walk the kimage page list.
>   */
> -
> +static int kexec_kernel_size;
> +#define IMG_SIZE_NONE  0
> +#define KERN_SIZE_FLAG 1
> +#define DTB_SIZE_FLAG  2
>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
>  {
>         void *dest;
>         unsigned long *entry;
> +       int imgsize_flag = IMG_SIZE_NONE;
> +
> 
>         for (entry = &kimage_head, dest = NULL; ; entry++) {
>                 unsigned int flag = *entry & IND_FLAGS;
> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
> long kimage_head,
>                         cb(ctx, flag, addr, NULL);
>                         break;
>                 case IND_DESTINATION:
> +                       if (imgsize_flag == IMG_SIZE_NONE) {
> +                               kexec_kernel_size = 0;
> +                               imgsize_flag = KERN_SIZE_FLAG;
> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
> +                               imgsize_flag = DTB_SIZE_FLAG;
> +                       }
>                         dest = addr;
>                         cb(ctx, flag, addr, NULL);
>                         break;
>                 case IND_SOURCE:
> +                       if (imgsize_flag == KERN_SIZE_FLAG)
> +                               kexec_kernel_size++;
>                         cb(ctx, flag, addr, dest);
>                         dest += PAGE_SIZE;
>                         break;
> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
> 
>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> 
> +       /*
> +        * Make sure virtual addresses of new kernel are flushed
> +        * SZ_512K = TEXT_OFFSET

TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
shouldn't need to be flushed.

Since c218bca74eea (arm64: Relax the kernel cache requirements for
boot), the kernel will flush the cache for anything outside of the Image
that it writes to before enabling the MMU and caches (e.g. the idmap and
swapper page tables). Once caches are up we shouldn't care.

Assuming that the existing kernel code is correct, the only region we
should need to flush out to the PoC is the region from _text to _edata
(i.e. just the contents of the Image).

> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
> +        * Don't know = (SZ_4M + SZ_1M)
> +        * SZ_4M = not working
> +        * SZ_6M = working
> +        * SZ_8M = working
> +        *
> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
> +        * BSS, stack ??
> +        *
> +        */
> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
> +
>         soft_restart(reboot_code_buffer_phys);
>  }

How big exactly is the kernel Image you're trying to kexec?

Mark.


* Kexec on arm64
  2014-07-29 13:35                             ` Mark Rutland
@ 2014-07-29 21:19                               ` Geoff Levand
  2014-07-30  7:22                                 ` Arun Chandran
  2014-07-30  5:46                               ` Arun Chandran
  2014-07-30  7:01                               ` Arun Chandran
  2 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-07-29 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> boot), the kernel will flush the cache for anything outside of the Image
> that it writes to before enabling the MMU and caches (e.g. the idmap and
> swapper page tables). Once caches are up we shouldn't care.
> 
> Assuming that the existing kernel code is correct, the only region we
> should need to flush out to the PoC is the region from _text to _edata
> (i.e. just the contents of the Image).

If the new kernel will overwrite the old one, then we do the final copy
of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
is executed after the dcache is disabled, so that should write it directly
to the PoC.  It seems the protocol expects us to invalidate the dcache
for that range though, so I added code to do this, essentially what Arun
had added.

Arun, please try.

-Geoff


* Kexec on arm64
  2014-07-29 21:19                               ` Geoff Levand
@ 2014-07-30  7:22                                 ` Arun Chandran
  2014-08-01 11:13                                   ` Arun Chandran
  2014-08-04 10:16                                   ` Arun Chandran
  0 siblings, 2 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-30  7:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Mark,
>
> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
>> boot), the kernel will flush the cache for anything outside of the Image
>> that it writes to before enabling the MMU and caches (e.g. the idmap and
>> swapper page tables). Once caches are up we shouldn't care.
>>
>> Assuming that the existing kernel code is correct, the only region we
>> should need to flush out to the PoC is the region from _text to _edata
>> (i.e. just the contents of the Image).
>
> If the new kernel will overwrite the old one, then we do the final copy
> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
> is executed after the dcache is disabled, so that should write it directly
> to the PoC.  It seems the protocol expects us to invalidate the dcache
> for that range though, so I added code to do this, essentially what Arun
> had added.
>
> Arun, please try.
>
It works without any hiccups :)
I have attached the log.

I will try a big-endian UP configuration next.

I just had to wrap reserve_elfcorehdr() in #ifdef CONFIG_CRASH_DUMP to
compile with defconfig.

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 233cd04..6f4ebd8 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -377,6 +377,7 @@ static void __init __maybe_unused reserve_crashkernel(void)
        insert_resource(&iomem_resource, &crashk_res);
 }

+#ifdef CONFIG_CRASH_DUMP
 static void __init __maybe_unused reserve_elfcorehdr(void)
 {
        struct resource res;
@@ -402,6 +403,7 @@ static void __init __maybe_unused reserve_elfcorehdr(void)
        res.end = elfcorehdr_addr + elfcorehdr_size - 1;
        insert_resource(&iomem_resource, &res);
 }
+#endif

 static void __init request_standard_resources(void)
 {

--Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_log
Type: application/octet-stream
Size: 15206 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140730/b4be6f8b/attachment-0001.obj>


* Kexec on arm64
  2014-07-30  7:22                                 ` Arun Chandran
@ 2014-08-01 11:13                                   ` Arun Chandran
  2014-08-03 14:47                                     ` Mark Rutland
  2014-08-04 10:16                                   ` Arun Chandran
  1 sibling, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-08-01 11:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Mark,
>>
>> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
>>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
>>> boot), the kernel will flush the cache for anything outside of the Image
>>> that it writes to before enabling the MMU and caches (e.g. the idmap and
>>> swapper page tables). Once caches are up we shouldn't care.
>>>
>>> Assuming that the existing kernel code is correct, the only region we
>>> should need to flush out to the PoC is the region from _text to _edata
>>> (i.e. just the contents of the Image).
>>
>> If the new kernel will overwrite the old one, then we do the final copy
>> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
>> is executed after the dcache is disabled, so that should write it directly
>> to the PoC.  It seems the protocol expects us to invalidate the dcache
>> for that range though, so I added code to do this, essentially what Arun
>> had added.
>>
>> Arun, please try.
>>
> It works without any hiccups :)..
> I have attached the log.
>
> I will try with big-endian UP configuration next.
>

This question may be unrelated to kexec, and may be a stupid one.

While debugging kexec in the big-endian configuration I see that
__create_page_tables (arch/arm64/kernel/head.S) calls
__inval_cache_range on the addresses from idmap_pg_dir to
swapper_pg_dir,

i.e. from 0x4000D5F000 to 0x4000D61000 + #SWAPPER_DIR_SIZE
in my case.

Is it supposed to invalidate the corresponding virtual addresses too?

Isn't there a chance that the first stage kernel was using VAs
from the same area (i.e. the L3 cache contains valid lines for the
area 0xffffffc000d5f000 to 0xffffffc000d5f000 + 16K)?

Or is it the duty of kexec to flush all the VA regions used by the
first stage kernel and all the VA regions that will be used by the
second stage kernel?

--Arun


> Just had to comment the crashdump code to compile with
> defconfig.
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 233cd04..6f4ebd8 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -377,6 +377,7 @@ static void __init __maybe_unused reserve_crashkernel(void)
>         insert_resource(&iomem_resource, &crashk_res);
>  }
>
> +#ifdef CONFIG_CRASH_DUMP
>  static void __init __maybe_unused reserve_elfcorehdr(void)
>  {
>         struct resource res;
> @@ -402,6 +403,7 @@ static void __init __maybe_unused reserve_elfcorehdr(void)
>         res.end = elfcorehdr_addr + elfcorehdr_size - 1;
>         insert_resource(&iomem_resource, &res);
>  }
> +#endif
>
>  static void __init request_standard_resources(void)
>  {
>
> --Arun


* Kexec on arm64
  2014-08-01 11:13                                   ` Arun Chandran
@ 2014-08-03 14:47                                     ` Mark Rutland
  0 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2014-08-03 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 01, 2014 at 12:13:12PM +0100, Arun Chandran wrote:
> On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> >> Hi Mark,
> >>
> >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
> >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> >>> boot), the kernel will flush the cache for anything outside of the Image
> >>> that it writes to before enabling the MMU and caches (e.g. the idmap and
> >>> swapper page tables). Once caches are up we shouldn't care.
> >>>
> >>> Assuming that the existing kernel code is correct, the only region we
> >>> should need to flush out to the PoC is the region from _text to _edata
> >>> (i.e. just the contents of the Image).
> >>
> >> If the new kernel will overwrite the old one, then we do the final copy
> >> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
> >> is executed after the dcache is disabled, so that should write it directly
> >> to the PoC.  It seems the protocol expects us to invalidate the dcache
> >> for that range though, so I added code to do this, essentially what Arun
> >> had added.
> >>
> >> Arun, please try.
> >>
> > It works without any hiccups :)..
> > I have attached the log.
> >
> > I will try with big-endian UP configuration next.
> >
> 
> This question may be irrelevant to kexec and may be stupid also.
> 
> while debugging kexec in BIG-endian configuration I see
> _create_page_tables (arch/arm64/kernel/head.S) is
> doing __inval_cache_range on the addresses
> from idmap_pg_dir to swapper_pg_dir.
> 
> ie from 0x4000D5F000 to 0x4000D61000 + #SWAPPER_DIR_SIZE
> in my case.
> 
> Is it supposed to clear the corresponding virtual addresses?

The data caches behave in a PIPT-like fashion, and there are no aliases.
Flushing by any VA that maps to a particular PA will flush out the only
entry that can possibly exist for that PA.

While the MMU is off, the VA->PA mapping is an idmap. Given that, we can
flush by PA.

> There might be chance that first stage kernel may be using VA
> from the same area right? (cache (L3)containing valid lines in the
> area 0xffffffc000d5f000 to 0xffffffc000d5f000 + 16K)

The L3 cache should never see the VA, only the PA.
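
As a worked example of why the two address ranges quoted above name the same cache lines, here is a minimal sketch of the linear-map translation. The helper names are hypothetical, and both constants are assumptions: PAGE_OFFSET is the value for a 39-bit VA configuration, and PHYS_OFFSET is inferred from the addresses in this mail rather than read from the hardware.

```c
#include <stdint.h>

/* Assumed: PAGE_OFFSET for a 39-bit VA configuration. */
#define PAGE_OFFSET 0xffffffc000000000ULL
/* Assumed: start of RAM, inferred from the addresses quoted above. */
#define PHYS_OFFSET 0x0000004000000000ULL

/* Hypothetical helper: linear-map VA to PA. */
uint64_t linear_virt_to_phys(uint64_t va)
{
	return va - PAGE_OFFSET + PHYS_OFFSET;
}

/* Hypothetical helper: PA to linear-map VA. */
uint64_t linear_phys_to_virt(uint64_t pa)
{
	return pa - PHYS_OFFSET + PAGE_OFFSET;
}
```

Under those assumptions, 0xffffffc000d5f000 translates to 0x4000d5f000, so maintenance by the idmap address with the MMU off and maintenance by the linear-map VA with the MMU on both act on the same physical lines.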

> Or is it the duty of the kexec to flush the entire
> VA regions used by the first stage kernel and
> the VA regions going to be used by the 2nd
> stage kernel?

We should only need to ensure that the new kernel image and anything we
expect to use before the caches have been enabled are flushed to the
PoC.

Thanks,
Mark.


* Kexec on arm64
  2014-07-30  7:22                                 ` Arun Chandran
  2014-08-01 11:13                                   ` Arun Chandran
@ 2014-08-04 10:16                                   ` Arun Chandran
  2014-08-04 11:35                                     ` Mark Rutland
  2014-08-04 17:21                                     ` Geoff Levand
  1 sibling, 2 replies; 61+ messages in thread
From: Arun Chandran @ 2014-08-04 10:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Mark,
>>
>> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
>>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
>>> boot), the kernel will flush the cache for anything outside of the Image
>>> that it writes to before enabling the MMU and caches (e.g. the idmap and
>>> swapper page tables). Once caches are up we shouldn't care.
>>>
>>> Assuming that the existing kernel code is correct, the only region we
>>> should need to flush out to the PoC is the region from _text to _edata
>>> (i.e. just the contents of the Image).
>>
>> If the new kernel will overwrite the old one, then we do the final copy
>> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
>> is executed after the dcache is disabled, so that should write it directly
>> to the PoC.  It seems the protocol expects us to invalidate the dcache
>> for that range though, so I added code to do this, essentially what Arun
>> had added.
>>
>> Arun, please try.
>>
> It works without any hiccups :)..
> I have attached the log.
>
> I will try with big-endian UP configuration next.
>

The latest kexec code works fine in both LE and BE mode in a UP
configuration.

I had to change the kexec-tools code a bit to make sure that "kexec -l"
does not place the dtb in the area where the kernel builds its initial
page tables.

#########################
diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index ab7a9ac..8f04473 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -526,6 +526,7 @@ int arm64_load_other_segments(struct kexec_info *info)
        off_t dtb_base;
        char *initrd_buf = NULL;
        char command_line[COMMAND_LINE_SIZE] = "";
+       unsigned long dtb_start;

        /* Processing for arm64_opts.dtb and arm64_opts.command_line. */

@@ -554,8 +555,16 @@ int arm64_load_other_segments(struct kexec_info *info)
         * kernel with an alignment of 128 KiB, giving a max supported DTB
         * size of 128 KiB (worst case). */

+       /*
+        * arm64 kernel uses area above kernel image to build
+        * initial page tables. Max required size for this area is 384K. It
+        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
+        * So place dtb 512k above kernel image area.
+        */
+       dtb_start = (unsigned long)info->segment[0].mem +
+               info->segment[0].memsz + 512UL * 1024;
+
        dtb_base = locate_hole(info, dtb_2.size, 128UL * 1024,
-               (unsigned long)info->entry,
+               (unsigned long)dtb_start,
                (unsigned long)info->entry + 512 * 1024 * 1024, 1);

        dbgprintf("dtb:    base %lx, size %lxh (%ld)\n", (unsigned long)dtb_base,

########################

This code works fine with LE as well as BE. I have attached the logs
for both.

Before trying an SMP configuration, I want to know: are the "kexec -e"
scenarios below valid (required)?

1st stage | 2nd stage
----------+----------
LE UP     | BE UP
LE UP     | BE SMP
LE UP     | LE SMP
LE SMP    | LE UP
LE SMP    | BE UP
LE SMP    | BE SMP


--Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_be_log
Type: application/octet-stream
Size: 16869 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140804/8a02a68c/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_le_log
Type: application/octet-stream
Size: 15548 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140804/8a02a68c/attachment-0003.obj>


* Kexec on arm64
  2014-08-04 10:16                                   ` Arun Chandran
@ 2014-08-04 11:35                                     ` Mark Rutland
  2014-08-07  0:40                                       ` Geoff Levand
  2014-08-04 17:21                                     ` Geoff Levand
  1 sibling, 1 reply; 61+ messages in thread
From: Mark Rutland @ 2014-08-04 11:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 04, 2014 at 11:16:02AM +0100, Arun Chandran wrote:
> Hi Geoff,
> 
> On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> >> Hi Mark,
> >>
> >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
> >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> >>> boot), the kernel will flush the cache for anything outside of the Image
> >>> that it writes to before enabling the MMU and caches (e.g. the idmap and
> >>> swapper page tables). Once caches are up we shouldn't care.
> >>>
> >>> Assuming that the existing kernel code is correct, the only region we
> >>> should need to flush out to the PoC is the region from _text to _edata
> >>> (i.e. just the contents of the Image).
> >>
> >> If the new kernel will overwrite the old one, then we do the final copy
> >> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
> >> is executed after the dcache is disabled, so that should write it directly
> >> to the PoC.  It seems the protocol expects us to invalidate the dcache
> >> for that range though, so I added code to do this, essentially what Arun
> >> had added.
> >>
> >> Arun, please try.
> >>
> > It works without any hiccups :)..
> > I have attached the log.
> >
> > I will try with big-endian UP configuration next.
> >
> 
> The latest kexec code is working fine in LE/BE mode in UP configuration.
> 
> I had to change kexec-tools code a bit to make sure that "kexec -l"
> is not putting dtb at an area where kernel is building its initial page
> tables.
> 
> #########################
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index ab7a9ac..8f04473 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -526,6 +526,7 @@ int arm64_load_other_segments(struct kexec_info *info)
>         off_t dtb_base;
>         char *initrd_buf = NULL;
>         char command_line[COMMAND_LINE_SIZE] = "";
> +       unsigned long dtb_start;
> 
>         /* Processing for arm64_opts.dtb and arm64_opts.command_line. */
> 
> @@ -554,8 +555,16 @@ int arm64_load_other_segments(struct kexec_info *info)
>          * kernel with an alignment of 128 KiB, giving a max supported DTB
>          * size of 128 KiB (worst case). */
> 
> +       /*
> +        * arm64 kernel uses area above kernel image to build
> +        * initial page tables. Max required size for this area is 384K. It
> +        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
> +        * So place dtb 512k above kernel image area.
> +        */

The area above the kernel Image holds the BSS too, and from some
experiments I did a while back with an allyesconfig, it was possible to
have many megabytes of BSS.

While 512k might be adequate for a defconfig today, it will become less
so as time goes on.

For kernel Images post v3.17 you can and should read the Image header's
image_size field to figure out how much memory is needed from _text to
_end (i.e. all the memory the kernel will assume is available statically
and will happily clobber). For kernel Images prior to v3.17 the best
thing we can do is guess as you've done here, or place the DTB below the
kernel and page tables in the TEXT_OFFSET area; neither option is
particularly brilliant.
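
Reading the header could look roughly like the sketch below. It assumes the 64-byte Image header layout described in Documentation/arm64/booting.txt (text_offset at byte 8, image_size at byte 16, magic "ARM\x64" at byte 56, all little-endian); arm64_image_extent() is a hypothetical helper name, not an existing kexec-tools function.

```c
#include <stddef.h>
#include <stdint.h>

#define ARM64_IMAGE_MAGIC       0x644d5241U	/* "ARM\x64", little-endian */
#define ARM64_IMAGE_HEADER_SIZE 64

/* Read little-endian fields regardless of host endianness. */
static uint32_t le32(const uint8_t *p)
{
	uint32_t v = 0;
	int i;

	for (i = 3; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static uint64_t le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper: extract text_offset and image_size from the
 * start of an Image file. Returns 0 on success, -1 if the buffer is
 * too short or the magic does not match. An image_size of 0 indicates
 * an older Image without the field, so the caller still has to guess.
 */
int arm64_image_extent(const uint8_t *buf, size_t len,
		       uint64_t *text_offset, uint64_t *image_size)
{
	if (len < ARM64_IMAGE_HEADER_SIZE)
		return -1;
	if (le32(buf + 56) != ARM64_IMAGE_MAGIC)
		return -1;

	*text_offset = le64(buf + 8);
	*image_size = le64(buf + 16);
	return 0;
}
```

With a non-zero image_size, the lowest safe DTB address would be the Image load address plus image_size, rounded up to whatever alignment the loader uses for the DTB.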

Thanks,
Mark.


* Kexec on arm64
  2014-08-04 11:35                                     ` Mark Rutland
@ 2014-08-07  0:40                                       ` Geoff Levand
  2014-08-07  9:59                                         ` Mark Rutland
  0 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-08-07  0:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, 2014-08-04 at 12:35 +0100, Mark Rutland wrote:
> > +       /*
> > +        * arm64 kernel uses area above kernel image to build
> > +        * initial page tables. Max required size for this area is 384K. It
> > +        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
> > +        * So place dtb 512k above kernel image area.
> > +        */
> 
> The area above the kernel Image holds the BSS too, and from some
> experiments I did a while back with an allyesconfig, it was possible to
> have many megabytes of BSS.
> 
> While 512k might be adequate for a defconfig today, it will become less
> so as time goes on.
> 
> For kernel Images post v3.17 you can and should read the Image header's
> image_size field to figure out how much memory is needed from _text to
> _end (i.e. all the memory the kernel will assume is available statically
> and will happily clobber). For kernel Images prior to v3.17 the best
> thing we can do is guess as you've done here, or place the DTB below the
> kernel and page tables in the TEXT_OFFSET area; neither option is
> particularly brilliant.

I haven't looked into it yet, but booting.txt says that the dtb must be
'within the first 512 megabytes from the start of the kernel image'.  If
this restriction still holds we can't just put the dtb at
text_offset + image_size.

-Geoff


* Kexec on arm64
  2014-08-07  0:40                                       ` Geoff Levand
@ 2014-08-07  9:59                                         ` Mark Rutland
  2014-08-07 17:09                                           ` Geoff Levand
  0 siblings, 1 reply; 61+ messages in thread
From: Mark Rutland @ 2014-08-07  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 07, 2014 at 01:40:03AM +0100, Geoff Levand wrote:
> Hi,

Hi Geoff,

> On Mon, 2014-08-04 at 12:35 +0100, Mark Rutland wrote:
> > > +       /*
> > > +        * arm64 kernel uses area above kernel image to build
> > > +        * initial page tables. Max required size for this area is 384K. It
> > > +        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
> > > +        * So place dtb 512k above kernel image area.
> > > +        */
> > 
> > The area above the kernel Image holds the BSS too, and from some
> > experiments I did a while back with an allyesconfig, it was possible to
> > have many megabytes of BSS.
> > 
> > While 512k might be adequate for a defconfig today, it will become less
> > so as time goes on.
> > 
> > For kernel Images post v3.17 you can and should read the Image header's
> > image_size field to figure out how much memory is needed from _text to
> > _end (i.e. all the memory the kernel will assume is available statically
> > and will happily clobber). For kernel Images prior to v3.17 the best
> > thing we can do is guess as you've done here, or place the DTB below the
> > kernel and page tables in the TEXT_OFFSET area; neither option is
> > particularly brilliant.
> 
> I haven't looked into it yet, but booting.txt says that the dtb must be
> 'within the first 512 megabytes from the start of the kernel image'.  If
> this restriction still holds we can't just put the dtb at
> text_offset + image_size.

Isn't that only a problem if text_offset + image_size is huge (510MB+)?

The wording in booting.txt doesn't sound quite right. As I understand
it, the 512MB restriction is because of the way we map the dtb in the
swapper page tables.

So a better wording would be "the kernel and DTB must be placed in the
same naturally-aligned 512MB region of memory".
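
To make the difference from "within 512MB of the start of the image" concrete, the suggested wording can be expressed as a small check. This is only a sketch of my reading of the constraint, with a hypothetical helper name; it does not guard against address overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_512M (512ULL * 1024 * 1024)

/*
 * Hypothetical helper: true if the whole DTB lies in the same
 * naturally-aligned 512MB region as the start of the kernel.
 */
bool in_same_512m_region(uint64_t kernel_pa, uint64_t dtb_pa,
			 uint64_t dtb_size)
{
	uint64_t region = kernel_pa & ~(SZ_512M - 1);

	return dtb_pa >= region && dtb_pa + dtb_size <= region + SZ_512M;
}
```

For example, a kernel at 0x401ff80000 and a DTB at 0x4020000000 are only 512KB apart, yet they fail this check because they sit on either side of a 512MB boundary.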

It should be possible to get rid of the restrictions on the placement of
the image and DTB, but this requires reworking the VA layout (using a
kernel text mapping separate from the linear map as with x86_64, and
similarly having a separate DTB mapping), and unfortunately I haven't
had the time to do that.
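
To make the difference between the two wordings concrete, here is a small sketch of the "same naturally-aligned 512MB region" check. The helper name same_512m_region is hypothetical; it only assumes that the whole DTB has to fall in the region containing the start of the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_512M (UINT64_C(512) << 20)

/*
 * True if every byte of the DTB lies in the naturally-aligned 512MB
 * region that contains kernel_start.
 */
static bool same_512m_region(uint64_t kernel_start, uint64_t dtb_start,
                             uint64_t dtb_size)
{
    uint64_t mask = ~(SZ_512M - 1);
    uint64_t region = kernel_start & mask;
    uint64_t dtb_last;

    assert(dtb_size > 0);
    dtb_last = dtb_start + dtb_size - 1;
    return (dtb_start & mask) == region && (dtb_last & mask) == region;
}
```

With a kernel at 0x5ff00000 and a DTB at 0x60000000 the check fails even though the two are only 1MB apart, which the "within the first 512 megabytes" wording would not catch.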

Thanks,
Mark.


* Kexec on arm64
  2014-08-07  9:59                                         ` Mark Rutland
@ 2014-08-07 17:09                                           ` Geoff Levand
  0 siblings, 0 replies; 61+ messages in thread
From: Geoff Levand @ 2014-08-07 17:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2014-08-07 at 10:59 +0100, Mark Rutland wrote:
> On Thu, Aug 07, 2014 at 01:40:03AM +0100, Geoff Levand wrote:
> > 
> > I haven't looked into it yet, but booting.txt says that the dtb must be
> > 'within the first 512 megabytes from the start of the kernel image'.  If
> > this restriction still holds we can't just put the dtb at
> > text_offset + image_size.
> 
> Isn't that only a problem if text_offset + image_size is huge (510MB+)?

Yes, a limit I don't think we need to be too concerned about at present.

> The wording in booting.txt doesn't sound quite right. As I understand
> it, the 512MB restriction is because of the way we map the dtb in the
> swapper page tables.
> 
> So a better wording would be "the kernel and DTB must be placed in the
> same naturally-aligned 512MB region of memory".
> 
> It should be possible to get rid of the restrictions on the placement of
> the image and DTB, but this requires reworking the VA layout (using a
> kernel text mapping separate from the linear map as with x86_64, and
> similarly having a separate DTB mapping), and unfortunately I haven't
> had the time to do that.

Or maybe just relocate the dtb at startup to within range.

-Geoff


* Kexec on arm64
  2014-08-04 10:16                                   ` Arun Chandran
  2014-08-04 11:35                                     ` Mark Rutland
@ 2014-08-04 17:21                                     ` Geoff Levand
  2014-08-06 13:54                                       ` Arun Chandran
  1 sibling, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-08-04 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote:
> The latest kexec code is working fine in LE/BE mode in UP configuration.
> 
> I had to change kexec-tools code a bit to make sure that "kexec -l"
> is not putting dtb at an area where kernel is building its initial page
> tables.

OK, I'll add in code to handle this.

> Now before trying SMP configuration I want to know whether the below "kexec -e"
> scenarios are valid(required)?
> 
> 1st stage                                  |               2nd stage
> ---------------------------------------------------------------------------------------------
> LE UP                                      |               BE UP
> LE UP                                      |               BE SMP
> LE UP                                      |               LE SMP
> LE SMP                                   |               LE UP
> LE SMP                                   |               BE UP
> LE SMP                                   |               BE SMP

Kexec should work for all combinations of kernel options.  A
bootloader (1st stage) may want to be UP, and custom crash
dump kernels (2nd stage) are generally UP.  UP->UP may have
limited use.  I think all these listed are important to test
though, and going the other way also, so BE-SMP->LE-SMP for
example.

-Geoff


* Kexec on arm64
  2014-08-04 17:21                                     ` Geoff Levand
@ 2014-08-06 13:54                                       ` Arun Chandran
  2014-08-06 15:51                                         ` Arun Chandran
  2014-08-07 20:07                                         ` Geoff Levand
  0 siblings, 2 replies; 61+ messages in thread
From: Arun Chandran @ 2014-08-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, Aug 4, 2014 at 10:51 PM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote:
>> The latest kexec code is working fine in LE/BE mode in UP configuration.
>>
>> I had to change kexec-tools code a bit to make sure that "kexec -l"
>> is not putting dtb at an area where kernel is building its initial page
>> tables.
>
> OK, I'll add in code to handle this.
>
>> Now before trying SMP configuration I want to know whether the below "kexec -e"
>> scenarios are valid(required)?
>>
>> 1st stage                                  |               2nd stage
>> ---------------------------------------------------------------------------------------------
>> LE UP                                      |               BE UP
>> LE UP                                      |               BE SMP
>> LE UP                                      |               LE SMP
>> LE SMP                                   |               LE UP
>> LE SMP                                   |               BE UP
>> LE SMP                                   |               BE SMP
>
I am testing "kexec -e" with the script below (it randomly
reboots the board into either BE or LE mode).

This test was done without the L3 cache. With L3 I have more trouble.
I think it is best to fix the code without L3 first and then move
to testing with L3. That fix may also solve the problem with L3.
###############################
:~$ cat /etc/init.d/S50reboot
#!/bin/sh

sleep 5
i=$RANDOM
j=$(( $i % 2))

if [ $j -eq 0 ] ; then
        mount /dev/mmcblk0p1 /mnt

        count=`cat /mnt/cnt`
        echo "KEXEC rebooting to BE count = $count"
        echo $RANDOM > /mnt/"$count""_BE"

        count=$(( $count + 1 ))
        echo "$count">/mnt/cnt

        kexec -l /mnt/vmlinux_BE.strip \
            --command-line="console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
        umount /mnt
        kexec -e
else
        mount /dev/mmcblk0p1 /mnt

        count=`cat /mnt/cnt`
        echo "KEXEC rebooting to LE count = $count"
        echo $RANDOM > /mnt/"$count""_LE"

        count=$(( $count + 1 ))
        echo "$count">/mnt/cnt

        kexec -l /mnt/vmlinux_LE.strip \
            --command-line="console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
        umount /mnt
        kexec -e
fi

exit $?
#############################

I managed to run this test 72 times with the
changes below.

############################
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 363a246..7de11ee 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag,
  break;
  case IND_SOURCE:
  __flush_dcache_area(addr, PAGE_SIZE);
- __flush_dcache_area(dest, PAGE_SIZE);
  break;
  default:
  break;
@@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
  phys_addr_t reboot_code_buffer_phys;
  void *reboot_code_buffer;
  struct kexec_ctx *ctx = kexec_image_to_ctx(image);
+ unsigned long start, end;
+ int i;

  BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
  BUG_ON(num_online_cpus() > 1);
@@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)

  kexec_list_walk(NULL, image->head, kexec_list_flush_cb);

+ start = image->segment[0].mem;
+ end = image->segment[0].mem + image->segment[0].memsz;
+ for (i = 0; i < image->nr_segments; i++) {
+ if (image->segment[i].mem > end)
+ end = image->segment[i].mem + image->segment[i].memsz;
+ }
+
+ start = (unsigned long)phys_to_virt(start);
+ end = (unsigned long)phys_to_virt(end);
+ pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
+ __flush_dcache_area((void *)start, end - start);
+ //flush_icache_range(start, end);
+ //mdelay(10);
+
  soft_restart(reboot_code_buffer_phys);
 }

diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..a49549e 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -61,13 +61,13 @@ relocate_new_kernel:
  mov x20, x13
  mov x21, x14

- prfm pldl1strm, [x21, #64]
+ /*prfm pldl1strm, [x21, #64] */
 1: ldp x22, x23, [x21]
  ldp x24, x25, [x21, #16]
  ldp x26, x27, [x21, #32]
  ldp x28, x29, [x21, #48]
  add x21, x21, #64
- prfm pldl1strm, [x21, #64]
+ /*prfm pldl1strm, [x21, #64]*/
  stnp x22, x23, [x20]
  stnp x24, x25, [x20, #16]
  stnp x26, x27, [x20, #32]
@@ -115,6 +115,8 @@ relocate_new_kernel:
  mov x3, xzr

  ldr x4, kexec_kimage_start
+ dsb sy
+ isb
  br x4

 .align 3
###########################
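
A side note on the range computation in the machine_kexec() hunk above: the loop compares segment[i].mem against the running end and never lowers start, so it relies on the segments being ordered in a particular way. A minimal sketch of an order-independent min/max walk follows. struct seg is a hypothetical stand-in for the two struct kexec_segment fields used in the patch, and segments_span is an illustrative name; this is not the code that was tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mem/memsz fields of struct kexec_segment. */
struct seg {
    uint64_t mem;    /* physical start address of the segment */
    uint64_t memsz;  /* size of the segment in bytes */
};

struct range {
    uint64_t start;  /* lowest segment start */
    uint64_t end;    /* highest segment end (exclusive) */
};

/* Smallest [start, end) range that covers every segment, in any order. */
static struct range segments_span(const struct seg *segs, size_t n)
{
    struct range r;

    assert(n > 0);
    r.start = segs[0].mem;
    r.end = segs[0].mem + segs[0].memsz;
    for (size_t i = 1; i < n; i++) {
        uint64_t s = segs[i].mem;
        uint64_t e = s + segs[i].memsz;

        if (s < r.start)
            r.start = s;
        if (e > r.end)
            r.end = e;
    }
    return r;
}
```

The resulting range also covers any holes between segments, so a flush over it may touch more memory than the segments themselves.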

Out of these 72 boots, 34 were LE and 38 were BE. It then failed
while switching from BE to LE.

Mark,
Do you have any pointers to find the problem?

--Arun


* Kexec on arm64
  2014-08-06 13:54                                       ` Arun Chandran
@ 2014-08-06 15:51                                         ` Arun Chandran
  2014-08-07 20:07                                         ` Geoff Levand
  1 sibling, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-08-06 15:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 6, 2014 at 7:24 PM, Arun Chandran <achandran@mvista.com> wrote:
> Hi,
>
> On Mon, Aug 4, 2014 at 10:51 PM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Arun,
>>
>> On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote:
>>> The latest kexec code is working fine in LE/BE mode in UP configuration.
>>>
>>> I had to change kexec-tools code a bit to make sure that "kexec -l"
>>> is not putting dtb at an area where kernel is building its initial page
>>> tables.
>>
>> OK, I'll add in code to handle this.
>>
>>> Now before trying SMP configuration I want to know whether the below "kexec -e"
>>> scenarios are valid(required)?
>>>
>>> 1st stage                                  |               2nd stage
>>> ---------------------------------------------------------------------------------------------
>>> LE UP                                      |               BE UP
>>> LE UP                                      |               BE SMP
>>> LE UP                                      |               LE SMP
>>> LE SMP                                   |               LE UP
>>> LE SMP                                   |               BE UP
>>> LE SMP                                   |               BE SMP
>>
> I am testing "kexec -e" with the script below(it will either
> reboot the board to BE or LE mode randomly).
>
> This test is done without L3 cache. With L3 I have more troubles.
> I think it is best to fix the code without L3 then move
> to testing with L3. That fix may solve the problem with L3
>
> ###############################
> :~$ cat /etc/init.d/S50reboot
> #!/bin/sh
>
> sleep 5
> i=$RANDOM
> j=$(( $i % 2))
>
> if [ $j -eq 0 ] ; then
>         mount /dev/mmcblk0p1 /mnt
>
>         count=`cat /mnt/cnt`
>         echo "KEXEC rebooting to BE count = $count"
>         echo $RANDOM > /mnt/"$count""_BE"
>
>         count=$(( $count + 1 ))
>         echo "$count">/mnt/cnt
>
>         kexec -l /mnt/vmlinux_BE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
>         umount /mnt
>         kexec -e
> else
>         mount /dev/mmcblk0p1 /mnt
>
>         count=`cat /mnt/cnt`
>         echo "KEXEC rebooting to LE count = $count"
>         echo $RANDOM > /mnt/"$count""_LE"
>
>         count=$(( $count + 1 ))
>         echo "$count">/mnt/cnt
>
>         kexec -l /mnt/vmlinux_LE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
>         umount /mnt
>         kexec -e
> fi
>
> exit $?
> #############################
>

Some more information:

I am able to run the same test (without L3) for 1.5 hours when there is
no endian switching.

For this test I used the latest code at
https://git.linaro.org/people/geoff.levand/linux-kexec.git
Unlike in the endian switching scenario, no code changes were
made for this test.

It ran 370 times for LE-->LE reboots.
It ran 351 times for BE-->BE reboots.

I had to stop both tests manually, as they were continuing
without any problem.

I will repeat the same test with the L3 cache on.

--Arun


* Kexec on arm64
  2014-08-06 13:54                                       ` Arun Chandran
  2014-08-06 15:51                                         ` Arun Chandran
@ 2014-08-07 20:07                                         ` Geoff Levand
  2014-08-08  5:46                                           ` Arun Chandran
  1 sibling, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-08-07 20:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:

> I have managed to run this test till 72 times with the
> below changes.
> 
> ############################
> diff --git a/arch/arm64/kernel/machine_kexec.c
> b/arch/arm64/kernel/machine_kexec.c
> index 363a246..7de11ee 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
> unsigned int flag,
>   break;
>   case IND_SOURCE:
>   __flush_dcache_area(addr, PAGE_SIZE);
> - __flush_dcache_area(dest, PAGE_SIZE);
>   break;
>   default:
>   break;
> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>   phys_addr_t reboot_code_buffer_phys;
>   void *reboot_code_buffer;
>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
> + unsigned long start, end;
> + int i;
> 
>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>   BUG_ON(num_online_cpus() > 1);
> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
> 
>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> 
> + start = image->segment[0].mem;
> + end = image->segment[0].mem + image->segment[0].memsz;
> + for (i = 0; i < image->nr_segments; i++) {
> + if (image->segment[i].mem > end)
> + end = image->segment[i].mem + image->segment[i].memsz;
> + }
> +
> + start = (unsigned long)phys_to_virt(start);
> + end = (unsigned long)phys_to_virt(end);
> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
> + __flush_dcache_area((void *)start, end - start);
> + //flush_icache_range(start, end);
> + //mdelay(10);
> +
>   soft_restart(reboot_code_buffer_phys);
>  }

Doing the flush in kexec_list_flush_cb() is almost the same
as using the image->segment to flush.  Did you see a
difference on your system?

> diff --git a/arch/arm64/kernel/relocate_kernel.S
> b/arch/arm64/kernel/relocate_kernel.S
> index 4b077e1..a49549e 100644
> --- a/arch/arm64/kernel/relocate_kernel.S
> +++ b/arch/arm64/kernel/relocate_kernel.S

I think these changes are good.  I'll add them in.

-Geoff


* Kexec on arm64
  2014-08-07 20:07                                         ` Geoff Levand
@ 2014-08-08  5:46                                           ` Arun Chandran
  2014-08-08 10:03                                             ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-08-08  5:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:
>
>> I have managed to run this test till 72 times with the
>> below changes.
>>
>> ############################
>> diff --git a/arch/arm64/kernel/machine_kexec.c
>> b/arch/arm64/kernel/machine_kexec.c
>> index 363a246..7de11ee 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
>> unsigned int flag,
>>   break;
>>   case IND_SOURCE:
>>   __flush_dcache_area(addr, PAGE_SIZE);
>> - __flush_dcache_area(dest, PAGE_SIZE);
>>   break;
>>   default:
>>   break;
>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>>   phys_addr_t reboot_code_buffer_phys;
>>   void *reboot_code_buffer;
>>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
>> + unsigned long start, end;
>> + int i;
>>
>>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>>   BUG_ON(num_online_cpus() > 1);
>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
>>
>>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>
>> + start = image->segment[0].mem;
>> + end = image->segment[0].mem + image->segment[0].memsz;
>> + for (i = 0; i < image->nr_segments; i++) {
>> + if (image->segment[i].mem > end)
>> + end = image->segment[i].mem + image->segment[i].memsz;
>> + }
>> +
>> + start = (unsigned long)phys_to_virt(start);
>> + end = (unsigned long)phys_to_virt(end);
>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
>> + __flush_dcache_area((void *)start, end - start);
>> + //flush_icache_range(start, end);
>> + //mdelay(10);
>> +
>>   soft_restart(reboot_code_buffer_phys);
>>  }
>
> Doing the flush in kexec_list_flush_cb() is almost the same
> as using the image->segment to flush.  Did you see a
> difference on your system?
>

Yes I can see the difference. Let me explain it in detail.

I am doing a stress test of "kexec -e" with the below reboot
script.

################################
#!/bin/sh

sleep 5
i=$RANDOM
j=$(( $i % 2))

mount /dev/mmcblk0p1 /mnt
count=`cat /mnt/cnt`

if [ $j -eq 0 ] ; then
    echo "KEXEC rebooting to BE count = $count"
    echo $RANDOM > /mnt/"$count""_BE"
    kexec -l /mnt/vmlinux_BE.strip \
        --command-line="console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
else
    echo "KEXEC rebooting to LE count = $count"
    echo $RANDOM > /mnt/"$count""_LE"
    kexec -l /mnt/vmlinux_LE.strip \
        --command-line="console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
fi

count=$(( $count + 1 ))
echo "$count">/mnt/cnt
umount /mnt
kexec -e
exit $?
###############################

Observations with the default code
@https://git.linaro.org/people/geoff.levand/linux-kexec.git
Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)"

a) LE to LE worked without L3 cache on
b) BE to BE worked without L3 cache on
c) Random endian switching does not work in either case (with or without L3).
    It breaks very early and is unstable.

Now with the below modifications

#############################
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 363a246..571b68d 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag,
                break;
        case IND_SOURCE:
                __flush_dcache_area(addr, PAGE_SIZE);
-               __flush_dcache_area(dest, PAGE_SIZE);
                break;
        default:
                break;
@@ -636,11 +635,13 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag,
  * Called from the core kexec code for a sys_reboot with LINUX_REBOOT_CMD_KEXEC.
  */

+unsigned long dflush_start, dflush_end;
 void machine_kexec(struct kimage *image)
 {
        phys_addr_t reboot_code_buffer_phys;
        void *reboot_code_buffer;
        struct kexec_ctx *ctx = kexec_image_to_ctx(image);
+       int i;

        BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
        BUG_ON(num_online_cpus() > 1);
@@ -698,6 +699,19 @@ void machine_kexec(struct kimage *image)

        kexec_list_walk(NULL, image->head, kexec_list_flush_cb);

+       dflush_start = image->segment[0].mem;
+       dflush_end = image->segment[0].mem + image->segment[0].memsz;
+       for (i = 0; i < image->nr_segments; i++) {
+               if (image->segment[i].mem > dflush_end)
+                       dflush_end = image->segment[i].mem + image->segment[i].memsz;
+       }
+
+       dflush_start = (unsigned long)phys_to_virt(dflush_start);
+       dflush_end = (unsigned long)phys_to_virt(dflush_end);
+
+       __flush_dcache_area((void *)&dflush_start, sizeof(dflush_start));
+       __flush_dcache_area((void *)&dflush_end, sizeof(dflush_end));
+
        soft_restart(reboot_code_buffer_phys);
 }

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index aa13521..b8c58d8 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -57,6 +57,7 @@ unsigned long __stack_chk_guard __read_mostly;
 EXPORT_SYMBOL(__stack_chk_guard);
 #endif

+extern unsigned long dflush_start, dflush_end;
 static void setup_restart(void)
 {
        /*
@@ -78,6 +79,8 @@ static void setup_restart(void)

        /* Push out any further dirty data, and ensure cache is empty */
        flush_cache_all();
+
+       __flush_dcache_area((void*)dflush_start, dflush_end - dflush_start);
 }

 void soft_restart(unsigned long addr)
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..a49549e 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -61,13 +61,13 @@ relocate_new_kernel:
        mov x20, x13
        mov x21, x14

-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64] */
 1:     ldp     x22, x23, [x21]
        ldp     x24, x25, [x21, #16]
        ldp     x26, x27, [x21, #32]
        ldp     x28, x29, [x21, #48]
        add     x21, x21, #64
-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64]*/
        stnp    x22, x23, [x20]
        stnp    x24, x25, [x20, #16]
        stnp    x26, x27, [x20, #32]
@@ -115,6 +115,8 @@ relocate_new_kernel:
        mov     x3, xzr

        ldr     x4, kexec_kimage_start
+       dsb     sy
+       isb
        br      x4

 .align 3
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index f1619c0..7d81b86 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -52,6 +52,12 @@
  */
 ENTRY(cpu_cache_off)
        mrs     x0, sctlr_el1
+       bic     x0, x0, #1 << 12                // clear SCTLR.I
+       msr     sctlr_el1, x0
+       isb
+       dsb     sy
+
+       mrs     x0, sctlr_el1
        bic     x0, x0, #1 << 2                 // clear SCTLR.C
        msr     sctlr_el1, x0
        isb
###########################

a) I was able to run the random endian switching test for
11.5 hours without any failures.

It rebooted 6542 times in total.
total LE boots = 3241
total BE boots = 3301

Of those, 1625 reboots switched endianness ("LE to BE"
or "BE to LE").

One major modification is flushing the Dcache area of
the new Image after turning off CPU caches.

This makes sure that L3 contains no lines in the
new "kernel Image area". I am still not sure what happens
to the other lines still in L3. Does the new kernel have
to do a __flush_dcache_area() on all the new pages
it is going to give to userspace?

Please refer to the discussion at
http://lists.linaro.org/pipermail/linaro-kernel/2013-August/006155.html
for more details.

It says that L3 cache becomes transparent when
lower level caches are OFF. So we need to clean
L3 when it is transparent.

As for adding the barriers and removing the prefetches in
arch/arm64/kernel/relocate_kernel.S, I think they make
sure that the CPU holds no stale data while
performing an endian switch. Am I right here?

There is one more change: turning off the I-cache.
I am not sure why I did this. I will rerun the test
without this change now.
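
For reference, the cpu_cache_off change touches two different SCTLR_EL1 enable bits: bit 2 is C (data/unified cache enable) and bit 12 is I (instruction cache enable). A small sketch of the bit manipulation, where the macro and function names are illustrative rather than the kernel's own definitions:

```c
#include <stdint.h>

/* SCTLR_EL1 enable bits; names chosen for this sketch. */
#define SCTLR_EL1_M (UINT64_C(1) << 0)    /* MMU enable */
#define SCTLR_EL1_C (UINT64_C(1) << 2)    /* data/unified cache enable */
#define SCTLR_EL1_I (UINT64_C(1) << 12)   /* instruction cache enable */

/* Value to write back to SCTLR_EL1 with both cache enables cleared. */
static uint64_t sctlr_caches_off(uint64_t sctlr)
{
    return sctlr & ~(SCTLR_EL1_C | SCTLR_EL1_I);
}

/* Value with only the D-cache enable cleared, leaving the I-cache alone. */
static uint64_t sctlr_dcache_off(uint64_t sctlr)
{
    return sctlr & ~SCTLR_EL1_C;
}
```

This only models the register value; on hardware the write is still followed by the barriers shown in the patch.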

>> diff --git a/arch/arm64/kernel/relocate_kernel.S
>> b/arch/arm64/kernel/relocate_kernel.S
>> index 4b077e1..a49549e 100644
>> --- a/arch/arm64/kernel/relocate_kernel.S
>> +++ b/arch/arm64/kernel/relocate_kernel.S
>
> I think these changes are good.  I'll add them in.

Yes, please. Thank you for taking them in.

--Arun


* Kexec on arm64
  2014-08-08  5:46                                           ` Arun Chandran
@ 2014-08-08 10:03                                             ` Arun Chandran
  2014-08-12  5:42                                               ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-08-08 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 8, 2014 at 11:16 AM, Arun Chandran <achandran@mvista.com> wrote:
> Hi,
>
> On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Arun,
>>
>> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:
>>
>>> I have managed to run this test till 72 times with the
>>> below changes.
>>>
>>> ############################
>>> diff --git a/arch/arm64/kernel/machine_kexec.c
>>> b/arch/arm64/kernel/machine_kexec.c
>>> index 363a246..7de11ee 100644
>>> --- a/arch/arm64/kernel/machine_kexec.c
>>> +++ b/arch/arm64/kernel/machine_kexec.c
>>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
>>> unsigned int flag,
>>>   break;
>>>   case IND_SOURCE:
>>>   __flush_dcache_area(addr, PAGE_SIZE);
>>> - __flush_dcache_area(dest, PAGE_SIZE);
>>>   break;
>>>   default:
>>>   break;
>>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>>>   phys_addr_t reboot_code_buffer_phys;
>>>   void *reboot_code_buffer;
>>>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
>>> + unsigned long start, end;
>>> + int i;
>>>
>>>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>>>   BUG_ON(num_online_cpus() > 1);
>>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
>>>
>>>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>>
>>> + start = image->segment[0].mem;
>>> + end = image->segment[0].mem + image->segment[0].memsz;
>>> + for (i = 0; i < image->nr_segments; i++) {
>>> + if (image->segment[i].mem > end)
>>> + end = image->segment[i].mem + image->segment[i].memsz;
>>> + }
>>> +
>>> + start = (unsigned long)phys_to_virt(start);
>>> + end = (unsigned long)phys_to_virt(end);
>>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
>>> + __flush_dcache_area((void *)start, end - start);
>>> + //flush_icache_range(start, end);
>>> + //mdelay(10);
>>> +
>>>   soft_restart(reboot_code_buffer_phys);
>>>  }
>>
>> Doing the flush in kexec_list_flush_cb() is almost the same
>> as using the image->segment to flush.  Did you see a
>> difference on your system?
>>
>
> Yes I can see the difference. Let me explain it in detail.
>
> I am doing a stress test of "kexec -e" with the below reboot
> script.
>
> ################################
> #!/bin/sh
>
> sleep 5
> i=$RANDOM
> j=$(( $i % 2))
>
> mount /dev/mmcblk0p1 /mnt
> count=`cat /mnt/cnt`
>
> if [ $j -eq 0 ] ; then
>     echo "KEXEC rebooting to BE count = $count"
>     echo $RANDOM > /mnt/"$count""_BE"
>     kexec -l /mnt/vmlinux_BE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
> else
>    echo "KEXEC rebooting to LE count = $count"
>    echo $RANDOM > /mnt/"$count""_LE"
>     kexec -l /mnt/vmlinux_LE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
> fi
>
> count=$(( $count + 1 ))
> echo "$count">/mnt/cnt
> umount /mnt
> kexec -e
> exit $?
> ###############################
>
> Observations with the default code
> @https://git.linaro.org/people/geoff.levand/linux-kexec.git
> Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)"
>
> a) LE to LE worked without L3 cache on
> b) BE to BE worked without L3 cache on
> c) Random endian switching does not work in any case (with L3, No L3)
>     It breaks very early and unstable.
>
> Now with the below modifications
>

I think the cleaner approach is to invalidate
the cache lines from arch/arm64/kernel/relocate_kernel.S,
as this code already knows the destination to which it has
to copy the 2nd stage kernel.


########################
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..6880c1a 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -31,6 +31,13 @@

 .align 3

+.macro dcache_line_size, reg, tmp
+mrs    \tmp, ctr_el0                   // read CTR
+ubfm   \tmp, \tmp, #16, #19            // cache line size encoding
+mov    \reg, #4                        // bytes per word
+lsl    \reg, \reg, \tmp                // actual cache line size
+.endm
+
 .globl relocate_new_kernel
 relocate_new_kernel:

@@ -58,23 +65,46 @@ relocate_new_kernel:

        /* source: copy_page(x20 = dest, x21 = addr) */

+       mov     x0, x13
+       add     x1, x13, #PAGE_SIZE
+
+       /* Invalidate the destination cache area */
+__inval_cache_range:
+       dcache_line_size x2, x3
+       sub     x3, x2, #1
+       tst     x1, x3                          // end cache line aligned?
+       bic     x1, x1, x3
+       b.eq    1f
+       dc      civac, x1                       // clean & invalidate D / U line
+1:     tst     x0, x3                          // start cache line aligned?
+       bic     x0, x0, x3
+       b.eq    2f
+       dc      civac, x0                       // clean & invalidate D / U line
+       b       3f
+2:     dc      ivac, x0                        // invalidate D / U line
+3:     add     x0, x0, x2
+       cmp     x0, x1
+       b.lo    2b
+       dsb     sy
+
        mov x20, x13
        mov x21, x14

-       prfm    pldl1strm, [x21, #64]
-1:     ldp     x22, x23, [x21]
+       /*prfm  pldl1strm, [x21, #64] */
+.Lcopy_data:
+       ldp     x22, x23, [x21]
        ldp     x24, x25, [x21, #16]
        ldp     x26, x27, [x21, #32]
        ldp     x28, x29, [x21, #48]
        add     x21, x21, #64
-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64]*/
        stnp    x22, x23, [x20]
        stnp    x24, x25, [x20, #16]
        stnp    x26, x27, [x20, #32]
        stnp    x28, x29, [x20, #48]
        add     x20, x20, #64
        tst     x21, #(PAGE_SIZE - 1)
-       b.ne    1b
+       b.ne    .Lcopy_data

        /* dest += PAGE_SIZE */

@@ -115,6 +145,8 @@ relocate_new_kernel:
        mov     x3, xzr

        ldr     x4, kexec_kimage_start
+       dsb     sy
+       isb
        br      x4

 .align 3
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index f1619c0..c62cba7 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -52,6 +52,13 @@
  */
 ENTRY(cpu_cache_off)
        mrs     x0, sctlr_el1
+       /* Turn off I-Cache */
+       bic     x0, x0, #1 << 12                // clear SCTLR.I
+       msr     sctlr_el1, x0
+       isb
+       dsb     sy
+
+       mrs     x0, sctlr_el1
        bic     x0, x0, #1 << 2                 // clear SCTLR.C
        msr     sctlr_el1, x0
        isb

#############################
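
For readers following the dcache_line_size macro above: it extracts CTR_EL0 bits [19:16] (DminLine, the log2 of the number of 4-byte words in the smallest D-cache line) and shifts 4 left by that amount. A C model of the same arithmetic, plus a helper that counts how many lines a by-line maintenance loop has to touch for a given range, might look like the sketch below. The function names are illustrative and nothing here executes a cache operation.

```c
#include <assert.h>
#include <stdint.h>

/* Line size in bytes from a CTR_EL0 value: 4 << CTR_EL0[19:16]. */
static uint64_t dcache_line_size(uint64_t ctr_el0)
{
    uint64_t dminline = (ctr_el0 >> 16) & 0xf;

    return UINT64_C(4) << dminline;
}

/*
 * Number of cache lines overlapping [start, end), with start rounded
 * down and end rounded up to a line boundary.
 */
static uint64_t lines_covering(uint64_t start, uint64_t end, uint64_t line)
{
    uint64_t first, last;

    assert(line != 0 && (line & (line - 1)) == 0);
    assert(end >= start);
    first = start & ~(line - 1);
    last = (end + line - 1) & ~(line - 1);
    return (last - first) / line;
}
```

For a 4K page and a 64-byte line this gives 64 maintenance operations per copied page.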

--Arun


* Kexec on arm64
  2014-08-08 10:03                                             ` Arun Chandran
@ 2014-08-12  5:42                                               ` Arun Chandran
  2014-08-13 11:09                                                 ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-08-12  5:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff, Mark,


Sorry for top posting. I hope we have solved almost all the problems
with kexec in the uni-processor scenario, except for converting soft_restart()
to assembly (I will try to do this).

kexec was stress tested with the L3 cache on and the changes below.
It ran for 17 hours and rebooted 8226 times in total without any problem.

Total LE boots - 4122
Total BE boots - 4104

Total LE to BE or BE to LE switching - 4112

##########################
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 363a246..5b15a00 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag,
                break;
        case IND_SOURCE:
                __flush_dcache_area(addr, PAGE_SIZE);
-               __flush_dcache_area(dest, PAGE_SIZE);
                break;
        default:
                break;
diff --git a/arch/arm64/kernel/relocate_kernel.S
b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..e890516 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -31,6 +31,13 @@

 .align 3

+.macro dcache_line_size, reg, tmp
+mrs    \tmp, ctr_el0                   // read CTR
+ubfm   \tmp, \tmp, #16, #19            // cache line size encoding
+mov    \reg, #4                        // bytes per word
+lsl    \reg, \reg, \tmp                // actual cache line size
+.endm
+
 .globl relocate_new_kernel
 relocate_new_kernel:

@@ -56,25 +63,51 @@ relocate_new_kernel:
 .Ltest_source:
        tbz     x10, IND_SOURCE_BIT, .Ltest_indirection

-       /* source: copy_page(x20 = dest, x21 = addr) */
+       mov     x0, x13
+       add     x1, x13, #PAGE_SIZE
+
+       /* Invalidate the destination cache area to make sure that
+        * all the data required for the second stage kernel is
+        * intact at PoC. This is the safest place to do this activity
+        * as we are running with MMU and D-cache off.
+        */
+__inval_cache_range:
+       dcache_line_size x2, x3
+       sub     x3, x2, #1
+       tst     x1, x3                          // end cache line aligned?
+       bic     x1, x1, x3
+       b.eq    1f
+       dc      ivac, x1                        // invalidate D / U line
+1:     tst     x0, x3                          // start cache line aligned?
+       bic     x0, x0, x3
+       b.eq    2f
+       dc      ivac, x0                        // invalidate D / U line
+       b       3f
+2:     dc      ivac, x0                        // invalidate D / U line
+3:     add     x0, x0, x2
+       cmp     x0, x1
+       b.lo    2b
+       dsb     sy

+       /* source: copy_page(x20 = dest, x21 = addr) */
        mov x20, x13
        mov x21, x14

-       prfm    pldl1strm, [x21, #64]
-1:     ldp     x22, x23, [x21]
+       /*prfm  pldl1strm, [x21, #64] */
+.Lcopy_data:
+       ldp     x22, x23, [x21]
        ldp     x24, x25, [x21, #16]
        ldp     x26, x27, [x21, #32]
        ldp     x28, x29, [x21, #48]
        add     x21, x21, #64
-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64]*/
        stnp    x22, x23, [x20]
        stnp    x24, x25, [x20, #16]
        stnp    x26, x27, [x20, #32]
        stnp    x28, x29, [x20, #48]
        add     x20, x20, #64
        tst     x21, #(PAGE_SIZE - 1)
-       b.ne    1b
+       b.ne    .Lcopy_data

        /* dest += PAGE_SIZE */

@@ -115,6 +148,11 @@ relocate_new_kernel:
        mov     x3, xzr

        ldr     x4, kexec_kimage_start
+
+       /* Invalidate entire I-cache */
+       ic      ialluis
+       isb
+       dsb     sy
        br      x4

 .align 3
################################
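
Editorial note: the dcache_line_size macro and the range loop in the patch
above can be sketched in C as below. This is only an illustration under
stated assumptions. The function names are hypothetical, and the CTR_EL0
value is passed in as a plain integer rather than read with mrs.

```c
#include <stdint.h>

/* CTR_EL0.DminLine, bits [19:16], is log2 of the smallest D-cache line
 * size counted in 4-byte words. This mirrors the ubfm/mov/lsl sequence
 * of the macro. (Hypothetical helper, not kernel code.) */
uint64_t dcache_line_size(uint64_t ctr_el0)
{
    uint64_t dminline = (ctr_el0 >> 16) & 0xf;
    return 4ULL << dminline;
}

/* Number of cache lines overlapping [start, end). The start is rounded
 * down and the end rounded up to a line boundary, which matches the set
 * of lines the loop in the patch touches for a non-empty range. */
uint64_t lines_covering(uint64_t start, uint64_t end, uint64_t line)
{
    uint64_t mask = line - 1;
    uint64_t first = start & ~mask;
    uint64_t last = (end + mask) & ~mask;
    return (last - first) / line;
}
```

With a 64-byte line, a page-aligned PAGE_SIZE (4 KiB) destination needs 64
"dc ivac" operations per copied page.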

I have attached the patch for this.
Now I will try kexec in an SMP configuration.

--Arun

On Fri, Aug 8, 2014 at 3:33 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Fri, Aug 8, 2014 at 11:16 AM, Arun Chandran <achandran@mvista.com> wrote:
>> Hi,
>>
>> On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote:
>>> Hi Arun,
>>>
>>> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:
>>>
>>>> I have managed to run this test till 72 times with the
>>>> below changes.
>>>>
>>>> ############################
>>>> diff --git a/arch/arm64/kernel/machine_kexec.c
>>>> b/arch/arm64/kernel/machine_kexec.c
>>>> index 363a246..7de11ee 100644
>>>> --- a/arch/arm64/kernel/machine_kexec.c
>>>> +++ b/arch/arm64/kernel/machine_kexec.c
>>>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
>>>> unsigned int flag,
>>>>   break;
>>>>   case IND_SOURCE:
>>>>   __flush_dcache_area(addr, PAGE_SIZE);
>>>> - __flush_dcache_area(dest, PAGE_SIZE);
>>>>   break;
>>>>   default:
>>>>   break;
>>>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>>>>   phys_addr_t reboot_code_buffer_phys;
>>>>   void *reboot_code_buffer;
>>>>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
>>>> + unsigned long start, end;
>>>> + int i;
>>>>
>>>>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>>>>   BUG_ON(num_online_cpus() > 1);
>>>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
>>>>
>>>>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>>>
>>>> + start = image->segment[0].mem;
>>>> + end = image->segment[0].mem + image->segment[0].memsz;
>>>> + for (i = 0; i < image->nr_segments; i++) {
>>>> + if (image->segment[i].mem > end)
>>>> + end = image->segment[i].mem + image->segment[i].memsz;
>>>> + }
>>>> +
>>>> + start = (unsigned long)phys_to_virt(start);
>>>> + end = (unsigned long)phys_to_virt(end);
>>>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
>>>> + __flush_dcache_area((void *)start, end - start);
>>>> + //flush_icache_range(start, end);
>>>> + //mdelay(10);
>>>> +
>>>>   soft_restart(reboot_code_buffer_phys);
>>>>  }
>>>
>>> Doing the flush in kexec_list_flush_cb() is almost the same
>>> as using the image->segment to flush.  Did you see a
>>> difference on your system?
>>>
>>
>> Yes I can see the difference. Let me explain it in detail.
>>
>> I am doing a stress test of "kexec -e" with the below reboot
>> script.
>>
>> ################################
>> #!/bin/sh
>>
>> sleep 5
>> i=$RANDOM
>> j=$(( $i % 2))
>>
>> mount /dev/mmcblk0p1 /mnt
>> count=`cat /mnt/cnt`
>>
>> if [ $j -eq 0 ] ; then
>>     echo "KEXEC rebootng to BE count = $count"
>>     echo $RANDOM > /mnt/"$count""_BE"
>>     kexec -l /mnt/vmlinux_BE.strip
>> --command-line="console=ttyS0,115200 earlyprintk=uart8
>> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
>> else
>>    echo "KEXEC rebooting to LE count = $count"
>>    echo $RANDOM > /mnt/"$count""_LE"
>>     kexec -l /mnt/vmlinux_LE.strip
>> --command-line="console=ttyS0,115200 earlyprintk=uart8
>> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
>> fi
>>
>> count=$(( $count + 1 ))
>> echo "$count">/mnt/cnt
>> umount /mnt
>> kexec -e
>> exit $?
>> ###############################
>>
>> Observations with the default code
>> @https://git.linaro.org/people/geoff.levand/linux-kexec.git
>> Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)"
>>
>> a) LE to LE worked without L3 cache on
>> b) BE to BE worked without L3 cache on
>> c) Random endian switching does not work in any case (with L3, No L3)
>>     It breaks very early and unstable.
>>
>> Now with the below modifications
>>
>
> I think the more cleaner approach is to invalidate
> the cache lines from arch/arm64/kernel/relocate_kernel.S
> As this code is already aware of the destination it has
> to copy the 2nd stage kernel.
>
>
> ########################
> diff --git a/arch/arm64/kernel/relocate_kernel.S
> b/arch/arm64/kernel/relocate_kernel.S
> index 4b077e1..6880c1a 100644
> --- a/arch/arm64/kernel/relocate_kernel.S
> +++ b/arch/arm64/kernel/relocate_kernel.S
> @@ -31,6 +31,13 @@
>
>  .align 3
>
> +.macro dcache_line_size, reg, tmp
> +mrs    \tmp, ctr_el0                   // read CTR
> +ubfm   \tmp, \tmp, #16, #19            // cache line size encoding
> +mov    \reg, #4                        // bytes per word
> +lsl    \reg, \reg, \tmp                // actual cache line size
> +.endm
> +
>  .globl relocate_new_kernel
>  relocate_new_kernel:
>
> @@ -58,23 +65,46 @@ relocate_new_kernel:
>
>         /* source: copy_page(x20 = dest, x21 = addr) */
>
> +       mov     x0, x13
> +       add     x1, x13, #PAGE_SIZE
> +
> +       /* Invalidate the destination cache area */
> +__inval_cache_range:
> +       dcache_line_size x2, x3
> +       sub     x3, x2, #1
> +       tst     x1, x3                          // end cache line aligned?
> +       bic     x1, x1, x3
> +       b.eq    1f
> +       dc      civac, x1                       // clean & invalidate D / U line
> +1:     tst     x0, x3                          // start cache line aligned?
> +       bic     x0, x0, x3
> +       b.eq    2f
> +       dc      civac, x0                       // clean & invalidate D / U line
> +       b       3f
> +2:     dc      ivac, x0                        // invalidate D / U line
> +3:     add     x0, x0, x2
> +       cmp     x0, x1
> +       b.lo    2b
> +       dsb     sy
> +
>         mov x20, x13
>         mov x21, x14
>
> -       prfm    pldl1strm, [x21, #64]
> -1:     ldp     x22, x23, [x21]
> +       /*prfm  pldl1strm, [x21, #64] */
> +.Lcopy_data:
> +       ldp     x22, x23, [x21]
>         ldp     x24, x25, [x21, #16]
>         ldp     x26, x27, [x21, #32]
>         ldp     x28, x29, [x21, #48]
>         add     x21, x21, #64
> -       prfm    pldl1strm, [x21, #64]
> +       /*prfm  pldl1strm, [x21, #64]*/
>         stnp    x22, x23, [x20]
>         stnp    x24, x25, [x20, #16]
>         stnp    x26, x27, [x20, #32]
>         stnp    x28, x29, [x20, #48]
>         add     x20, x20, #64
>         tst     x21, #(PAGE_SIZE - 1)
> -       b.ne    1b
> +       b.ne    .Lcopy_data
>
>         /* dest += PAGE_SIZE */
>
> @@ -115,6 +145,8 @@ relocate_new_kernel:
>         mov     x3, xzr
>
>         ldr     x4, kexec_kimage_start
> +       dsb     sy
> +       isb
>         br      x4
>
>  .align 3
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index f1619c0..c62cba7 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -52,6 +52,13 @@
>   */
>  ENTRY(cpu_cache_off)
>         mrs     x0, sctlr_el1
> +       /* Turn off I-Cache */
> +       bic     x0, x0, #1 << 12                // clear SCTLR.I
> +       msr     sctlr_el1, x0
> +       isb
> +       dsb     sy
> +
> +       mrs     x0, sctlr_el1
>         bic     x0, x0, #1 << 2                 // clear SCTLR.C
>         msr     sctlr_el1, x0
>         isb
>
> #############################
>
> --Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-LE-BE-switching-worked-with-L3.patch
Type: text/x-patch
Size: 2946 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140812/1d732f96/attachment.bin>


* Kexec on arm64
  2014-08-12  5:42                                               ` Arun Chandran
@ 2014-08-13 11:09                                                 ` Arun Chandran
  2014-08-26 22:32                                                   ` Geoff Levand
  0 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-08-13 11:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Tue, Aug 12, 2014 at 11:12 AM, Arun Chandran <achandran@mvista.com> wrote:
> Hi Geoff, Mark,
>
>
> Sorry for top posting. I hope we have solved almost all the problems
> with kexec in uni-processor scenario except converting soft_restart()
> to assembly (I will give try to do this).
>

I have one more concern regarding the flushing of the D-cache area
corresponding to the kexec_list entries.

Currently kexec_list_walk() does the following:

1) flush_dcache_area() on kexec_list[0], for PAGE_SIZE bytes

2) continue accessing the entries in kexec_list[0], up to PAGE_SIZE

3) switch to the next kexec_list when
   (kexec_list[entry] & flag) == IND_INDIRECTION

4) goto 1

Shouldn't it call flush_dcache_area() only after it has completely
finished using the list?

Something like the change below?
######################
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 5b15a00..ffb3b54 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -155,6 +155,7 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
 {
        void *dest;
        unsigned long *entry;
+       void *last_accessed_dir = NULL;

        for (entry = &kimage_head, dest = NULL; ; entry++) {
                unsigned int flag = *entry & IND_FLAGS;
@@ -163,7 +164,10 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
                switch (flag) {
                case IND_INDIRECTION:
                        entry = (unsigned long *)addr - 1;
-                       cb(ctx, flag, addr, NULL);
+                       if (last_accessed_dir != addr) {
+                               cb(ctx, flag, last_accessed_dir, NULL);
+                               last_accessed_dir = addr;
+                       }
                        break;
                case IND_DESTINATION:
                        dest = addr;
@@ -174,7 +178,7 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
                        dest += PAGE_SIZE;
                        break;
                case IND_DONE:
-                       cb(ctx, flag , NULL, NULL);
+                       cb(ctx, flag , last_accessed_dir, NULL);
                        return;
                default:
                        pr_devel("%s:%d unknown flag %xh\n", __func__, __LINE__,
@@ -617,6 +621,9 @@ on_exit:
 static void kexec_list_flush_cb(void *ctx , unsigned int flag,
        void *addr, void *dest)
 {
+       if (addr == NULL)
+               return;
+
        switch (flag) {
        case IND_INDIRECTION:
                __flush_dcache_area(addr, PAGE_SIZE);
@@ -624,6 +631,9 @@ static void kexec_list_flush_cb(void *ctx ,
unsigned int flag,
        case IND_SOURCE:
                __flush_dcache_area(addr, PAGE_SIZE);
                break;
+       case IND_DONE:
+               __flush_dcache_area(addr, PAGE_SIZE);
+               break;
        default:
                break;
        }

########################
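
Editorial note: the walk described in steps 1) to 4) can be sketched in
user-space C as below. It is a simplified model, not the kernel function.
The IND_* values are the ones from include/linux/kexec.h, but walk_entries(),
struct walk_stats and the use of plain pointers instead of physical
addresses are assumptions made for illustration. The walker only reads the
entries (note the const pointer), which is the property that decides
whether flushing a list page before reading it is safe.

```c
#include <stdint.h>

#define IND_DESTINATION 0x1UL
#define IND_INDIRECTION 0x2UL
#define IND_DONE        0x4UL
#define IND_SOURCE      0x8UL
#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE)

/* Hypothetical counters standing in for the flush callback. */
struct walk_stats {
    unsigned indirection_pages;
    unsigned destinations;
    unsigned source_pages;
};

/* Walk the entry list starting from the head value.
 * Returns 0 when IND_DONE is reached, -1 on an unknown flag. */
int walk_entries(uintptr_t head, struct walk_stats *st)
{
    const uintptr_t *entry = &head;

    for (;;) {
        uintptr_t flag = *entry & IND_FLAGS;
        uintptr_t addr = *entry & ~IND_FLAGS;

        switch (flag) {
        case IND_INDIRECTION:
            /* Jump to the first slot of the next list page. */
            st->indirection_pages++;
            entry = (const uintptr_t *)addr;
            continue;
        case IND_DESTINATION:
            st->destinations++;
            break;
        case IND_SOURCE:
            st->source_pages++;
            break;
        case IND_DONE:
            return 0;
        default:
            return -1;
        }
        entry++;
    }
}
```

Each list page must be aligned so that the low four bits of its address
are free to hold the flag.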

--Arun


* Kexec on arm64
  2014-08-13 11:09                                                 ` Arun Chandran
@ 2014-08-26 22:32                                                   ` Geoff Levand
  2014-08-27  4:56                                                     ` Arun Chandran
  0 siblings, 1 reply; 61+ messages in thread
From: Geoff Levand @ 2014-08-26 22:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Wed, 2014-08-13 at 16:39 +0530, Arun Chandran wrote:
> I have one more concern regarding flushing of D-cache area corresponding
> to the kexec_list entries.
> 
> Currently kexec_list_walk() is doing
> 
> 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE
> 
> 2) continue accessing entries in kexec_list[0] to PAGE_SIZE
> 
> 3) switch to next kexec_list depending upon kexec_list[entry] & flag
> ==  IND_INDIRECTION
> 
> 4) goto 1
> 
> Shouldn't that be doing flush_dcache_area() after completely using the list??

We just want to get any data in the dcache out to the PoC before
disabling the dcache, so as long as there are only reads, and no writes
to those addresses, kexec_list_walk() should work OK.

I will move the flush of the new kernel image to after it is copied in
relocate_new_kernel().  I think that your L3 cache may not work with
what we have now:

  current:  invalidate dcache -> turn off dcache -> write data to PoC
  proposed: turn off dcache -> write data to PoC -> invalidate dcache
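
Editorial note: a toy model of why the two orderings can differ. Everything
here is an assumption made for illustration: a single write-through cache
line, and a speculative refill that happens after the invalidate while the
cache is still on. The thread does not establish that this is what the L3
actually does, and the toy_* names are hypothetical.

```c
#include <stdbool.h>

struct toy_system {
    int mem;          /* value held at the PoC */
    int line;         /* cached copy of that value */
    bool line_valid;
    bool cache_on;
};

void toy_invalidate(struct toy_system *s) { s->line_valid = false; }

/* Assumed speculative refill: only allocates while the cache is on. */
void toy_prefetch(struct toy_system *s)
{
    if (s->cache_on) {
        s->line = s->mem;
        s->line_valid = true;
    }
}

/* With the cache off, the write bypasses the cached copy entirely. */
void toy_write(struct toy_system *s, int v)
{
    if (s->cache_on) {
        s->line = v;
        s->line_valid = true;
    }
    s->mem = v;
}

int toy_read(const struct toy_system *s)
{
    return (s->cache_on && s->line_valid) ? s->line : s->mem;
}

/* current: invalidate -> turn off dcache -> write data to PoC */
int run_current(int old_val, int new_val)
{
    struct toy_system s = { old_val, 0, false, true };
    toy_invalidate(&s);
    toy_prefetch(&s);       /* stale line comes back before cache-off */
    s.cache_on = false;
    toy_write(&s, new_val);
    s.cache_on = true;      /* the new kernel enables caches later */
    return toy_read(&s);
}

/* proposed: turn off dcache -> write data to PoC -> invalidate */
int run_proposed(int old_val, int new_val)
{
    struct toy_system s = { old_val, 0, false, true };
    toy_prefetch(&s);
    s.cache_on = false;
    toy_write(&s, new_val);
    toy_invalidate(&s);
    s.cache_on = true;
    return toy_read(&s);
}
```

In this model run_current() reads back the old value, while run_proposed()
reads back the new one.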

-Geoff


* Kexec on arm64
  2014-08-26 22:32                                                   ` Geoff Levand
@ 2014-08-27  4:56                                                     ` Arun Chandran
  0 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-08-27  4:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Wed, Aug 27, 2014 at 4:02 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-08-13 at 16:39 +0530, Arun Chandran wrote:
>> I have one more concern regarding flushing of D-cache area corresponding
>> to the kexec_list entries.
>>
>> Currently kexec_list_walk() is doing
>>
>> 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE
>>
>> 2) continue accessing entries in kexec_list[0] to PAGE_SIZE
>>
>> 3) switch to next kexec_list depending upon kexec_list[entry] & flag
>> ==  IND_INDIRECTION
>>
>> 4) goto 1
>>
>> Shouldn't that be doing flush_dcache_area() after completely using the list??
>
> We just want to get any data in the dcache out to the PoC before
> disabling the dcache, so as long as there are only reads, and no writes
> to those addresses, kexec_list_walk() should work OK.
>
Yes, I missed that point. If we don't perform any writes, flushing
first works just fine.

> I will move the flush of the new kernel image to after it is copied in
> relocate_new_kernel().  I think that your L3 cache may not work with
> what we have now:
>
>   current:  invalidate dcache -> turn off dcache -> write data to PoC
>   proposed: turn off dcache -> write data to PoC -> invalidate dcache
>

Yes, this is exactly what I have done here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/278857.html
That code should live inside relocate_new_kernel().

I am doing the cache invalidation (invalidation only) in
relocate_new_kernel(). As we run that code after the caches are off (L3
only comes into the picture when the lower-level caches are on), we are
writing the data (the new kernel) to the PoC.

--Arun


* Kexec on arm64
  2014-07-29 13:35                             ` Mark Rutland
  2014-07-29 21:19                               ` Geoff Levand
@ 2014-07-30  5:46                               ` Arun Chandran
  2014-07-30  9:16                                 ` Mark Rutland
  2014-07-30  7:01                               ` Arun Chandran
  2 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-30  5:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> [...]
>
>> The default code did not work.
>>
>> It is working with the change below
>>
>> ###############
>> diff --git a/arch/arm64/kernel/machine_kexec.c
>> b/arch/arm64/kernel/machine_kexec.c
>> index 5632473..7c5f859 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
>>  /**
>>   * kexec_list_walk - Helper to walk the kimage page list.
>>   */
>> -
>> +static int kexec_kernel_size;
>> +#define IMG_SIZE_NONE  0
>> +#define KERN_SIZE_FLAG 1
>> +#define DTB_SIZE_FLAG  2
>>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
>>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
>>  {
>>         void *dest;
>>         unsigned long *entry;
>> +       int imgsize_flag = IMG_SIZE_NONE;
>> +
>>
>>         for (entry = &kimage_head, dest = NULL; ; entry++) {
>>                 unsigned int flag = *entry & IND_FLAGS;
>> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
>> long kimage_head,
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_DESTINATION:
>> +                       if (imgsize_flag == IMG_SIZE_NONE) {
>> +                               kexec_kernel_size = 0;
>> +                               imgsize_flag = KERN_SIZE_FLAG;
>> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
>> +                               imgsize_flag = DTB_SIZE_FLAG;
>> +                       }
>>                         dest = addr;
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_SOURCE:
>> +                       if (imgsize_flag == KERN_SIZE_FLAG)
>> +                               kexec_kernel_size++;
>>                         cb(ctx, flag, addr, dest);
>>                         dest += PAGE_SIZE;
>>                         break;
>> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
>>
>>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>
>> +       /*
>> +        * Make sure virtual addresses of new kernel are flushed
>> +        * SZ_512K = TEXT_OFFSET
>
> TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
> shouldn't need to be flushed.
>
> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> boot), the kernel will flush the cache for anything outside of the Image
> that it writes to before enabling the MMU and caches (e.g. the idmap and
> swapper page tables). Once caches are up we shouldn't care.
>

OK. The TEXT_OFFSET macro is not exported; that's why I used SZ_512K.
Is there any particular reason for not exporting it?

> Assuming that the existing kernel code is correct, the only region we
> should need to flush out to the PoC is the region from _text to _edata
> (i.e. just the contents of the Image).
>
>> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
>> +        * Don't know = (SZ_4M + SZ_1M)
>> +        * SZ_4M = not working
>> +        * SZ_6M = working
>> +        * SZ_8M = working
>> +        *
>> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
>> +        * BSS, stack ??
>> +        *
>> +        */
>> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
>> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
>> +
>>         soft_restart(reboot_code_buffer_phys);
>>  }
>
> How big exactly is the kernel Image you're trying to kexec?

For the first and second stages I use the same kernel; as they
are combined with an initramfs, the final image size varies.

1st stage
-------------
$ls -l arch/arm64/boot/uImage
-rw-rw-r-- 1 arun arun 12895544 Jul 30 10:57 arch/arm64/boot/uImage

It will boot to the initramfs

$ls /
bin  dtb.dtb  home  lib    linuxrc  mnt  proc  run   share  tmp  var
dev  etc      init  lib64  media    opt  root  sbin  sys    usr  vmlinux.strip

kexec will boot vmlinux.strip
$ls -l vmlinux.strip
-rwxrwxr-x 1 arun arun 8194760 Jul 30 10:55 vmlinux.strip

2nd stage
--------------
$ls -l arch/arm64/boot/uImage
-rw-rw-r-- 1 arun arun 8127800 Jul 30 10:54 arch/arm64/boot/uImage

The corresponding vmlinux is converted to vmlinux.strip

$aarch64-linux-gnu-strip vmlinux -o vmlinux.strip
$cp vmlinux.strip /ramfs/aarch64_le_rootfs/

--Arun


* Kexec on arm64
  2014-07-30  5:46                               ` Arun Chandran
@ 2014-07-30  9:16                                 ` Mark Rutland
  0 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-30  9:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 06:46:39AM +0100, Arun Chandran wrote:
> On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > [...]
> >
> >> The default code did not work.
> >>
> >> It is working with the change below
> >>
> >> ###############
> >> diff --git a/arch/arm64/kernel/machine_kexec.c
> >> b/arch/arm64/kernel/machine_kexec.c
> >> index 5632473..7c5f859 100644
> >> --- a/arch/arm64/kernel/machine_kexec.c
> >> +++ b/arch/arm64/kernel/machine_kexec.c
> >> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
> >>  /**
> >>   * kexec_list_walk - Helper to walk the kimage page list.
> >>   */
> >> -
> >> +static int kexec_kernel_size;
> >> +#define IMG_SIZE_NONE  0
> >> +#define KERN_SIZE_FLAG 1
> >> +#define DTB_SIZE_FLAG  2
> >>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
> >>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
> >>  {
> >>         void *dest;
> >>         unsigned long *entry;
> >> +       int imgsize_flag = IMG_SIZE_NONE;
> >> +
> >>
> >>         for (entry = &kimage_head, dest = NULL; ; entry++) {
> >>                 unsigned int flag = *entry & IND_FLAGS;
> >> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
> >> long kimage_head,
> >>                         cb(ctx, flag, addr, NULL);
> >>                         break;
> >>                 case IND_DESTINATION:
> >> +                       if (imgsize_flag == IMG_SIZE_NONE) {
> >> +                               kexec_kernel_size = 0;
> >> +                               imgsize_flag = KERN_SIZE_FLAG;
> >> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
> >> +                               imgsize_flag = DTB_SIZE_FLAG;
> >> +                       }
> >>                         dest = addr;
> >>                         cb(ctx, flag, addr, NULL);
> >>                         break;
> >>                 case IND_SOURCE:
> >> +                       if (imgsize_flag == KERN_SIZE_FLAG)
> >> +                               kexec_kernel_size++;
> >>                         cb(ctx, flag, addr, dest);
> >>                         dest += PAGE_SIZE;
> >>                         break;
> >> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
> >>
> >>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> >>
> >> +       /*
> >> +        * Make sure virtual addresses of new kernel are flushed
> >> +        * SZ_512K = TEXT_OFFSET
> >
> > TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
> > shouldn't need to be flushed.
> >
> > Since c218bca74eea (arm64: Relax the kernel cache requirements for
> > boot), the kernel will flush the cache for anything outside of the Image
> > that it writes to before enabling the MMU and caches (e.g. the idmap and
> > swapper page tables). Once caches are up we shouldn't care.
> >
> 
> Ok. TEXT_OFFSET macro is not exported, that's why I used SZ_512K.
> Any particular reason for not exporting it?

The TEXT_OFFSET of the kernel you intend to execute can be found in its
Image header. Take a look at Documentation/arm64/booting.txt. This may
be different from the TEXT_OFFSET of the current kernel.

As of v3.17 all fields in the header will be little-endian, and
booting.txt has been updated for this. You can check the image_size
field to determine whether the fields are all little-endian. Only when
image_size is zero can TEXT_OFFSET be assumed to be 0x80000.
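
Editorial note: a minimal sketch of the rule above, assuming the 64-byte
Image header layout from Documentation/arm64/booting.txt (text_offset at
byte 8, image_size at byte 16, magic "ARM\x64" at byte 56). The function
and macro names are hypothetical. When image_size is zero the header
predates v3.17, so the sketch falls back to 0x80000 instead of trusting a
text_offset field of unknown endianness.

```c
#include <stddef.h>
#include <stdint.h>

#define ARM64_IMAGE_HEADER_SIZE   64
#define ARM64_DEFAULT_TEXT_OFFSET 0x80000ULL

/* Read a little-endian 64-bit value, independent of host endianness. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 and stores the text offset, or -1 if the buffer is too
 * short or the magic number does not match. */
int image_text_offset(const uint8_t *hdr, size_t len, uint64_t *text_offset)
{
    if (len < ARM64_IMAGE_HEADER_SIZE)
        return -1;
    if (hdr[56] != 'A' || hdr[57] != 'R' || hdr[58] != 'M' || hdr[59] != 0x64)
        return -1;

    if (read_le64(hdr + 16) == 0) {
        /* image_size == 0: pre-v3.17 header, assume the default. */
        *text_offset = ARM64_DEFAULT_TEXT_OFFSET;
        return 0;
    }

    /* image_size != 0: all fields are little-endian. */
    *text_offset = read_le64(hdr + 8);
    return 0;
}
```

A loader would call this on the first 64 bytes of the Image it is about to
kexec, not on the running kernel's own TEXT_OFFSET.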

> 
> > Assuming that the existing kernel code is correct, the only region we
> > should need to flush out to the PoC is the region from _text to _edata
> > (i.e. just the contents of the Image).
> >
> >> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
> >> +        * Don't know = (SZ_4M + SZ_1M)
> >> +        * SZ_4M = not working
> >> +        * SZ_6M = working
> >> +        * SZ_8M = working
> >> +        *
> >> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
> >> +        * BSS, stack ??
> >> +        *
> >> +        */
> >> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
> >> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
> >> +
> >>         soft_restart(reboot_code_buffer_phys);
> >>  }
> >
> > How big exactly is the kernel Image you're trying to kexec?
> 
> For the 1st and second stage I use the same kernel as they
> are combined with intramfs final image size varies.
> 
> 1st stage
> -------------
> $ls -l arch/arm64/boot/uImage
> -rw-rw-r-- 1 arun arun 12895544 Jul 30 10:57 arch/arm64/boot/uImage
> 
> It will boot to intramfs
> 
> $ls /
> bin  dtb.dtb  home  lib    linuxrc  mnt  proc  run   share  tmp  var
> dev  etc      init  lib64  media    opt  root  sbin  sys    usr  vmlinux.strip
> 
> kexec will boot vmlinux.strip
> $ls -l vmlinux.strip
> -rwxrwxr-x 1 arun arun 8194760 Jul 30 10:55 vmlinux.strip
> 
> 2nd stage
> --------------
> $ls -l arch/arm64/boot/uImage
> -rw-rw-r-- 1 arun arun 8127800 Jul 30 10:54 arch/arm64/boot/uImage
> 
> The corresponding vmlinux is converted to vmlinux.strip
> 
> $aarch64-linux-gnu-strip vmlinux -o vmlinux.strip
> $cp vmlinux.strip /ramfs/aarch64_le_rootfs/

OK, thanks for the info.

Cheers,
Mark.


* Kexec on arm64
  2014-07-29 13:35                             ` Mark Rutland
  2014-07-29 21:19                               ` Geoff Levand
  2014-07-30  5:46                               ` Arun Chandran
@ 2014-07-30  7:01                               ` Arun Chandran
  2 siblings, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-30  7:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> [...]
>
>> The default code did not work.
>>
>> It is working with the change below
>>
>> ###############
>> diff --git a/arch/arm64/kernel/machine_kexec.c
>> b/arch/arm64/kernel/machine_kexec.c
>> index 5632473..7c5f859 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
>>  /**
>>   * kexec_list_walk - Helper to walk the kimage page list.
>>   */
>> -
>> +static int kexec_kernel_size;
>> +#define IMG_SIZE_NONE  0
>> +#define KERN_SIZE_FLAG 1
>> +#define DTB_SIZE_FLAG  2
>>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
>>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
>>  {
>>         void *dest;
>>         unsigned long *entry;
>> +       int imgsize_flag = IMG_SIZE_NONE;
>> +
>>
>>         for (entry = &kimage_head, dest = NULL; ; entry++) {
>>                 unsigned int flag = *entry & IND_FLAGS;
>> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
>> long kimage_head,
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_DESTINATION:
>> +                       if (imgsize_flag == IMG_SIZE_NONE) {
>> +                               kexec_kernel_size = 0;
>> +                               imgsize_flag = KERN_SIZE_FLAG;
>> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
>> +                               imgsize_flag = DTB_SIZE_FLAG;
>> +                       }
>>                         dest = addr;
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_SOURCE:
>> +                       if (imgsize_flag == KERN_SIZE_FLAG)
>> +                               kexec_kernel_size++;
>>                         cb(ctx, flag, addr, dest);
>>                         dest += PAGE_SIZE;
>>                         break;
>> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
>>
>>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>
>> +       /*
>> +        * Make sure virtual addresses of new kernel are flushed
>> +        * SZ_512K = TEXT_OFFSET
>
> TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
> shouldn't need to be flushed.
>
> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> boot), the kernel will flush the cache for anything outside of the Image
> that it writes to before enabling the MMU and caches (e.g. the idmap and
> swapper page tables). Once caches are up we shouldn't care.
>
> Assuming that the existing kernel code is correct, the only region we
> should need to flush out to the PoC is the region from _text to _edata
> (i.e. just the contents of the Image).

I think we missed the dtb part. The dtb is placed below the
kernel, so we need to flush that as well.

Geoff's new code handles that too. It is now
working for me.

--Arun
>
>> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
>> +        * Don't know = (SZ_4M + SZ_1M)
>> +        * SZ_4M = not working
>> +        * SZ_6M = working
>> +        * SZ_8M = working
>> +        *
>> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
>> +        * BSS, stack ??
>> +        *
>> +        */
>> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
>> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
>> +
>>         soft_restart(reboot_code_buffer_phys);
>>  }
>
> How big exactly is the kernel Image you're trying to kexec?
>
> Mark.


* Kexec on arm64
  2014-07-24  9:36             ` Mark Rutland
  2014-07-24 12:49               ` Arun Chandran
  2014-07-25  0:17               ` Geoff Levand
@ 2014-07-25 10:26               ` Arun Chandran
  2014-07-25 11:29                 ` Mark Rutland
  2 siblings, 1 reply; 61+ messages in thread
From: Arun Chandran @ 2014-07-25 10:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>> Hi Arun,
>>
>> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
>>
>> > I tried the same dtb with UP configuration. For UP kernel to compile
>> > did the below modifications
>>
>> I'll test and fixup the kexec UP build in the next few days.
>>
>> ...
>>
>> > With the default target configuration "kexec -e" failed to execute
>> > in UP scenario also.
>
> It would be helpful to know _how_ it failed. Do you have any log output?
>
>> >
>> > But I had some luck when I did the same steps with L3 cache
>> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
>> > it has an L3 cache. Luckily I was able to disable it in u-boot.
>> >
>> > With the L3 cache disabled configuration I am able to
>> > do "kexec -e". Please see the log attached.
>
> Hmm. We don't expect the kernel to do any L3 management. It seems that
> memory subsystems with L3 caches respecting cache maintenance by VA are
> going to become relatively common, and we expect to handle them all by
> performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
> kernel cache requirements for boot) for what we do at boot time.
>
>>
>> All memory management for the main cpu is done by the arch code.  Kexec
>> and cpu hot plug only work with the secondary cpus, so the problem would
>> be in the arch memory code, either in setup_restart() for shutdown, or
>> in the startup code.
>
> It's possible that soft_restart and setup_restart are a little dodgy, as
> they also rely on the compiler being smart and not touching the stack
> after setup_restart().
>
Could you please explain why this is required?


This is the disassembly of soft_restart(),
built from the latest code at
https://git.linaro.org/people/geoff.levand/linux-kexec.git

ffffffc000085014 <soft_restart>:
ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
ffffffc000085018:       910003fd        mov     x29, sp
ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
ffffffc000085020:       94003c49        bl      ffffffc000094144 <setup_mm_for_reboot>
ffffffc000085024:       94003a6b        bl      ffffffc0000939d0 <flush_cache_all>
ffffffc000085028:       94003cde        bl      ffffffc0000943a0 <cpu_cache_off>
ffffffc00008502c:       94003a69        bl      ffffffc0000939d0 <flush_cache_all>
ffffffc000085030:       90006201        adrp    x1, ffffffc000cc5000 <reset_devices>
ffffffc000085034:       f9400fa0        ldr     x0, [x29,#24]
ffffffc000085038:       f940fc22        ldr     x2, [x1,#504]
ffffffc00008503c:       f0000061        adrp    x1, ffffffc000094000 <arch_pick_mmap_layout+0x150>
ffffffc000085040:       910f0021        add     x1, x1, #0x3c0
ffffffc000085044:       8b010041        add     x1, x2, x1
ffffffc000085048:       d2c00802        mov     x2, #0x4000000000          // #274877906944
ffffffc00008504c:       8b020021        add     x1, x1, x2
ffffffc000085050:       d63f0020        blr     x1
ffffffc000085054:       f0002940        adrp    x0, ffffffc0005b0000 <kallsyms_token_table+0x200>
ffffffc000085058:       f0002941        adrp    x1, ffffffc0005b0000 <kallsyms_token_table+0x200>
ffffffc00008505c:       90002143        adrp    x3, ffffffc0004ad000 <__start_rodata>
ffffffc000085060:       91128000        add     x0, x0, #0x4a0
ffffffc000085064:       913de021        add     x1, x1, #0xf78
ffffffc000085068:       52800c22        mov     w2, #0x61          // #97
ffffffc00008506c:       91072063        add     x3, x3, #0x1c8
ffffffc000085070:       941071d0        bl      ffffffc0004a17b0 <printk>
ffffffc000085074:       f0002940        adrp    x0, ffffffc0005b0000 <kallsyms_token_table+0x200>
ffffffc000085078:       91134000        add     x0, x0, #0x4d0
ffffffc00008507c:       9410712c        bl      ffffffc0004a152c <panic>
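
As a reading aid for the arithmetic leading up to the blr: the adrp/add
pair forms the kernel virtual address ffffffc0000943c0, and the two
following adds appear to turn it into a physical branch target
(presumably a virt_to_phys() of that routine; the dump does not say so,
this is an assumption). A minimal sketch of the same arithmetic in C.
The helper names are made up for illustration, and the value loaded by
"ldr x2, [x1,#504]" is not visible in the dump, so it is a parameter:

```c
#include <assert.h>
#include <stdint.h>

/* adrp + add: 4 KiB-aligned page base plus a 12-bit offset. */
static uint64_t adrp_add(uint64_t page, uint64_t off)
{
	assert(off < 0x1000);
	return (page & ~0xfffULL) + off;
}

/*
 * Mirrors the three instructions before "blr x1".
 * x2_loaded stands for the value fetched by "ldr x2, [x1,#504]";
 * its real value is not shown in the dump (assumption: a RAM base).
 */
static uint64_t branch_target(uint64_t x2_loaded, uint64_t virt)
{
	uint64_t x1 = x2_loaded + virt;	/* add x1, x2, x1 */

	x1 += 0x4000000000ULL;		/* mov x2, #0x4000000000; add x1, x1, x2 */
	return x1;			/* blr x1 */
}
```

With an assumed loaded value of 0x4000000000, branch_target() yields
0x40000943c0 for the virtual address ffffffc0000943c0: the 64-bit
additions wrap around, which has the effect of subtracting the
ffffffc000000000 base.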

If I single-step the code, this is what my stack looks like
at ffffffc00008501c:
CPU#0>mdd 0xffffffc3eb83fcf0
ffffffc3_eb83fcf0 : ffffffc3eb83fd10  ........
ffffffc3_eb83fcf8 : ffffffc000092778  ......'x
ffffffc3_eb83fd00 : ffffffc000cc9f70  .......p
ffffffc3_eb83fd08 : 00000043eae32000  ...C.. .
ffffffc3_eb83fd10 : ffffffc3eb83fd70  .......p
ffffffc3_eb83fd18 : ffffffc0000fc018  ........
ffffffc3_eb83fd20 : ffffffc000c95000  ......P.
ffffffc3_eb83fd28 : 0000000000000000  ........
ffffffc3_eb83fd30 : ffffffc000cd06a0  ........
ffffffc3_eb83fd38 : 0000000000000000  ........
ffffffc3_eb83fd40 : 0000000080000000  ........
ffffffc3_eb83fd48 : 0000000000000015  ........
ffffffc3_eb83fd50 : 0000000000000115  ........
ffffffc3_eb83fd58 : 000000000000008e  ........
ffffffc3_eb83fd60 : ffffffc000c8b000  ........
ffffffc3_eb83fd68 : ffffffc3eb83c000  ........

And this is what it looks like at ffffffc000085030:
CPU#0>mdd 0xffffffc3eb83fcf0
ffffffc3_eb83fcf0 : 0000000000000115  ........
ffffffc3_eb83fcf8 : 000000000000003f  .......?
ffffffc3_eb83fd00 : ffffffc3eb83fd30  .......0
ffffffc3_eb83fd08 : ffffffc000120360  .......`
ffffffc3_eb83fd10 : 0000000000000002  ........
ffffffc3_eb83fd18 : ffffffbcedb611c0  ........
ffffffc3_eb83fd20 : ffffffbcedb611c0  ........
ffffffc3_eb83fd28 : ffffffc3eae08000  ........
ffffffc3_eb83fd30 : ffffffc3000200d0  ........
ffffffc3_eb83fd38 : ffffffc000120708  ........
ffffffc3_eb83fd40 : 0000000000000000  ........
ffffffc3_eb83fd48 : 72a00040528f8fe0  r..@R...
ffffffc3_eb83fd50 : 540002a16a00003f  T...j..?
ffffffc3_eb83fd58 : 3627fe60f8538260  6'.`.S.`
ffffffc3_eb83fd60 : 97fff80daa1303e0  ........
ffffffc3_eb83fd68 : 97fc01ffaa1403e0  ........

It is clearly getting corrupted.

Now with the caches kept on:
######
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 786daa6..6ff3d9f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -76,10 +76,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+       //cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+       //flush_cache_all();
 }

 void soft_restart(unsigned long addr)
#######

ffffffc000085014 <soft_restart>:
ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
ffffffc000085018:       910003fd        mov     x29, sp
ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
ffffffc000085020:       94003c49        bl      ffffffc000094144 <setup_mm_for_reboot>
ffffffc000085024:       94003a6b        bl      ffffffc0000939d0 <flush_cache_all>
ffffffc000085028:       90006201        adrp    x1, ffffffc000cc5000 <reset_devices>
ffffffc00008502c:       f9400fa0        ldr     x0, [x29,#24]
ffffffc000085030:       f940fc22        ldr     x2, [x1,#504]
ffffffc000085034:       f0000061        adrp    x1, ffffffc000094000 <arch_pick_mmap_layout+0x150>
ffffffc000085038:       910f0021        add     x1, x1, #0x3c0
ffffffc00008503c:       8b010041        add     x1, x2, x1
ffffffc000085040:       d2c00802        mov     x2, #0x4000000000          // #274877906944
ffffffc000085044:       8b020021        add     x1, x1, x2
ffffffc000085048:       d63f0020        blr     x1
ffffffc00008504c:       f0002940        adrp    x0, ffffffc0005b0000 <kallsyms_token_table+0x200>
ffffffc000085050:       f0002941        adrp    x1, ffffffc0005b0000 <kallsyms_token_table+0x200>
ffffffc000085054:       90002143        adrp    x3, ffffffc0004ad000 <__start_rodata>
ffffffc000085058:       91128000        add     x0, x0, #0x4a0
ffffffc00008505c:       913de021        add     x1, x1, #0xf78
ffffffc000085060:       52800c22        mov     w2, #0x61          // #97
ffffffc000085064:       91072063        add     x3, x3, #0x1c8
ffffffc000085068:       941071d2        bl      ffffffc0004a17b0 <printk>
ffffffc00008506c:       f0002940        adrp    x0, ffffffc0005b0000 <kallsyms_token_table+0x200>
ffffffc000085070:       91134000        add     x0, x0, #0x4d0
ffffffc000085074:       9410712e        bl      ffffffc0004a152c <panic>

Now my stack is the same at ffffffc00008501c and ffffffc000085028:

CPU#0>mdd 0xffffffc3eae27cf0
ffffffc3_eae27cf0 : ffffffc3eae27d10  ......}.
ffffffc3_eae27cf8 : ffffffc000092778  ......'x
ffffffc3_eae27d00 : ffffffc000cc9f70  .......p
ffffffc3_eae27d08 : 00000043f0171000  ...C....
ffffffc3_eae27d10 : ffffffc3eae27d70  ......}p
ffffffc3_eae27d18 : ffffffc0000fc018  ........
ffffffc3_eae27d20 : ffffffc000c95000  ......P.
ffffffc3_eae27d28 : 0000000000000000  ........
ffffffc3_eae27d30 : ffffffc000cd06a0  ........
ffffffc3_eae27d38 : 0000000000000000  ........
ffffffc3_eae27d40 : 0000000080000000  ........
ffffffc3_eae27d48 : 0000000000000015  ........
ffffffc3_eae27d50 : 0000000000000115  ........
ffffffc3_eae27d58 : 000000000000008e  ........
ffffffc3_eae27d60 : ffffffc000c8b000  ........
ffffffc3_eae27d68 : ffffffc3eae24000  ......@.

--Arun


* Kexec on arm64
  2014-07-25 10:26               ` Arun Chandran
@ 2014-07-25 11:29                 ` Mark Rutland
  0 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-25 11:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 11:26:46AM +0100, Arun Chandran wrote:
> On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
> >> Hi Arun,
> >>
> >> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
> >>
> >> > I tried the same dtb with UP configuration. For UP kernel to compile
> >> > did the below modifications
> >>
> >> I'll test and fixup the kexec UP build in the next few days.
> >>
> >> ...
> >>
> >> > With the default target configuration "kexec -e" failed to execute
> >> > in UP scenario also.
> >
> > It would be helpful to know _how_ it failed. Do you have any log output?
> >
> >> >
> >> > But I had some luck when I did the same steps with L3 cache
> >> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> >> > it has an L3 cache. Luckily I was able to disable it in u-boot.
> >> >
> >> > With the L3 cache disabled configuration I am able to
> >> > do "kexec -e". Please see the log attached.
> >
> > Hmm. We don't expect the kernel to do any L3 management. It seems that
> > memory subsystems with L3 caches respecting cache maintenance by VA are
> > going to become relatively common, and we expect to handle them all by
> > performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
> > kernel cache requirements for boot) for what we do at boot time.
> >
> >>
> >> All memory management for the main cpu is done by the arch code.  Kexec
> >> and cpu hot plug only work with the secondary cpus, so the problem would
> >> be in the arch memory code, either in setup_restart() for shutdown, or
> >> in the startup code.
> >
> > It's possible that soft_restart and setup_restart are a little dodgy, as
> > they also rely on the compiler being smart and not touching the stack
> > after setup_restart().
> >
> Could you please explain why this is required?

It's a requirement because we have no guarantee that the stack (or any
other memory addresses) will be flushed out to the PoC and become
visible to non-cacheable accesses.

Any use of the stack after we disable the d-cache is a bug. If you need
a VA range visible to non-cacheable accesses (i.e. after the cache is
disabled), you must flush that range to the PoC by VA.

The kernel text _should_ be out at the PoC per the boot protocol
requirements, so we don't have to flush it to execute correctly unless
we've performed some kernel text patching.

Arguably the first __flush_dcache_all call here is unnecessary; it
doesn't provide any guarantee we can rely on. The __flush_dcache_all
_after_ disabling the caches will ensure that the local caches are empty
(avoiding unexpected hits for non-cacheable accesses).

> 
> This is my disassembled output of  soft_restart()
> With the latest code from
> https://git.linaro.org/people/geoff.levand/linux-kexec.git
> 
> ffffffc000085014 <soft_restart>:
> ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
> ffffffc000085018:       910003fd        mov     x29, sp
> ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
> ffffffc000085020:       94003c49        bl      ffffffc000094144
> <setup_mm_for_reboot>
> ffffffc000085024:       94003a6b        bl      ffffffc0000939d0
> <flush_cache_all>
> ffffffc000085028:       94003cde        bl      ffffffc0000943a0 <cpu_cache_off>
> ffffffc00008502c:       94003a69        bl      ffffffc0000939d0
> <flush_cache_all>
> ffffffc000085030:       90006201        adrp    x1, ffffffc000cc5000
> <reset_devices>
> ffffffc000085034:       f9400fa0        ldr     x0, [x29,#24]
> ffffffc000085038:       f940fc22        ldr     x2, [x1,#504]
> ffffffc00008503c:       f0000061        adrp    x1, ffffffc000094000
> <arch_pick_mmap_layout+0x150>
> ffffffc000085040:       910f0021        add     x1, x1, #0x3c0
> ffffffc000085044:       8b010041        add     x1, x2, x1
> ffffffc000085048:       d2c00802        mov     x2, #0x4000000000
>          // #274877906944
> ffffffc00008504c:       8b020021        add     x1, x1, x2
> ffffffc000085050:       d63f0020        blr     x1
> ffffffc000085054:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085058:       f0002941        adrp    x1, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc00008505c:       90002143        adrp    x3, ffffffc0004ad000
> <__start_rodata>
> ffffffc000085060:       91128000        add     x0, x0, #0x4a0
> ffffffc000085064:       913de021        add     x1, x1, #0xf78
> ffffffc000085068:       52800c22        mov     w2, #0x61
>          // #97
> ffffffc00008506c:       91072063        add     x3, x3, #0x1c8
> ffffffc000085070:       941071d0        bl      ffffffc0004a17b0 <printk>
> ffffffc000085074:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085078:       91134000        add     x0, x0, #0x4d0
> ffffffc00008507c:       9410712c        bl      ffffffc0004a152c <panic>
> 
> If I single step the code,
> 
> This is how my stack looks like @ffffffc00008501c
> CPU#0>mdd 0xffffffc3eb83fcf0
> ffffffc3_eb83fcf0 : ffffffc3eb83fd10  ........
> ffffffc3_eb83fcf8 : ffffffc000092778  ......'x
> ffffffc3_eb83fd00 : ffffffc000cc9f70  .......p
> ffffffc3_eb83fd08 : 00000043eae32000  ...C.. .
> ffffffc3_eb83fd10 : ffffffc3eb83fd70  .......p
> ffffffc3_eb83fd18 : ffffffc0000fc018  ........
> ffffffc3_eb83fd20 : ffffffc000c95000  ......P.
> ffffffc3_eb83fd28 : 0000000000000000  ........
> ffffffc3_eb83fd30 : ffffffc000cd06a0  ........
> ffffffc3_eb83fd38 : 0000000000000000  ........
> ffffffc3_eb83fd40 : 0000000080000000  ........
> ffffffc3_eb83fd48 : 0000000000000015  ........
> ffffffc3_eb83fd50 : 0000000000000115  ........
> ffffffc3_eb83fd58 : 000000000000008e  ........
> ffffffc3_eb83fd60 : ffffffc000c8b000  ........
> ffffffc3_eb83fd68 : ffffffc3eb83c000  ........
> 
> And this is how it looks like @ffffffc000085030
> CPU#0>mdd 0xffffffc3eb83fcf0
> ffffffc3_eb83fcf0 : 0000000000000115  ........
> ffffffc3_eb83fcf8 : 000000000000003f  .......?
> ffffffc3_eb83fd00 : ffffffc3eb83fd30  .......0
> ffffffc3_eb83fd08 : ffffffc000120360  .......`
> ffffffc3_eb83fd10 : 0000000000000002  ........
> ffffffc3_eb83fd18 : ffffffbcedb611c0  ........
> ffffffc3_eb83fd20 : ffffffbcedb611c0  ........
> ffffffc3_eb83fd28 : ffffffc3eae08000  ........
> ffffffc3_eb83fd30 : ffffffc3000200d0  ........
> ffffffc3_eb83fd38 : ffffffc000120708  ........
> ffffffc3_eb83fd40 : 0000000000000000  ........
> ffffffc3_eb83fd48 : 72a00040528f8fe0  r..@R...
> ffffffc3_eb83fd50 : 540002a16a00003f  T...j..?
> ffffffc3_eb83fd58 : 3627fe60f8538260  6'.`.S.`
> ffffffc3_eb83fd60 : 97fff80daa1303e0  ........
> ffffffc3_eb83fd68 : 97fc01ffaa1403e0  ........
> 
> It is clearly getting corrupted.

No, it is not getting corrupted.

At ffffffc00008501c the CPU's cache is enabled (i.e. the CPU can make
cacheable accesses). The stack is normal cacheable (WBWA) memory, so
accesses will look-up and allocate in the cache. As the stack is being
accessed often it is warm, and the cache hierarchy does not decide to
evict it and/or write-back to the next level of the cache hierarchy.

At ffffffc000085030 the cache is disabled (i.e. the CPU cannot make
cacheable accesses). The effective attributes for the stack are
non-cacheable, so the CPU will bypass the cache hierarchy and go
straight to the PoC when it tries to read stack addresses. As we have
not flushed the stack by VA, (dirty) cachelines covering the stack might
be anywhere in the cache hierarchy, and aren't guaranteed to have
reached the PoC.

So the CPU reads _stale_ data from the PoC. There is no corruption; the
cache lines are still valid somewhere (we never invalidated them without
cleaning them first), but are simply not visible to non-cacheable
accesses.

Are we actually accessing the stack? If not, then there's no need to
worry. If we are, then the code which is doing so is buggy and either
needs to flush the stack by VA to the PoC, or (better) not use the stack
in this case and just use registers.
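
To make the stale-read case concrete, here is a toy model in plain C.
It is not kernel code: struct toy_mem and the toy_* helpers are invented
for illustration, and the model covers only a write-back cache in front
of memory at the PoC plus a single cache-enable bit. It is a sketch of
the behaviour described above, not of any real cache implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NLINES 4	/* toy memory: four one-word "cache lines" */

struct toy_mem {
	uint64_t poc[NLINES];	/* contents visible at the PoC */
	uint64_t cache[NLINES];	/* cached copy of each line */
	bool dirty[NLINES];	/* line holds data not yet written back */
	bool cache_on;		/* stands in for the D-cache enable bit */
};

/* Cacheable write allocates and dirties a line; otherwise go to the PoC. */
static void toy_write(struct toy_mem *m, unsigned int i, uint64_t v)
{
	assert(i < NLINES);
	if (m->cache_on) {
		m->cache[i] = v;
		m->dirty[i] = true;
	} else {
		m->poc[i] = v;
	}
}

/* Cacheable read hits the dirty line; non-cacheable read bypasses it. */
static uint64_t toy_read(const struct toy_mem *m, unsigned int i)
{
	assert(i < NLINES);
	if (m->cache_on && m->dirty[i])
		return m->cache[i];
	return m->poc[i];
}

/* Clean one line "by VA": write the dirty data back to the PoC. */
static void toy_clean_by_va(struct toy_mem *m, unsigned int i)
{
	assert(i < NLINES);
	if (m->dirty[i]) {
		m->poc[i] = m->cache[i];
		m->dirty[i] = false;
	}
}
```

Write a word with cache_on set, then clear cache_on: toy_read() returns
the old PoC contents even though the line is still valid in the cache.
After toy_clean_by_va() on that line, the same read returns the new
value.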

Does that all make sense? Or do I need to get my ascii-art paintbrush
out?

> 
> Now with keeping caches on
> ######
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 786daa6..6ff3d9f 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -76,10 +76,10 @@ static void setup_restart(void)
>         flush_cache_all();
> 
>         /* Turn D-cache off */
> -       cpu_cache_off();
> +       //cpu_cache_off();
> 
>         /* Push out any further dirty data, and ensure cache is empty */
> -       flush_cache_all();
> +       //flush_cache_all();
>  }
> 
>  void soft_restart(unsigned long addr)
> #######
> 
> ffffffc000085014 <soft_restart>:
> ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
> ffffffc000085018:       910003fd        mov     x29, sp
> ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
> ffffffc000085020:       94003c49        bl      ffffffc000094144
> <setup_mm_for_reboot>
> ffffffc000085024:       94003a6b        bl      ffffffc0000939d0
> <flush_cache_all>
> ffffffc000085028:       90006201        adrp    x1, ffffffc000cc5000
> <reset_devices>
> ffffffc00008502c:       f9400fa0        ldr     x0, [x29,#24]
> ffffffc000085030:       f940fc22        ldr     x2, [x1,#504]
> ffffffc000085034:       f0000061        adrp    x1, ffffffc000094000
> <arch_pick_mmap_layout+0x150>
> ffffffc000085038:       910f0021        add     x1, x1, #0x3c0
> ffffffc00008503c:       8b010041        add     x1, x2, x1
> ffffffc000085040:       d2c00802        mov     x2, #0x4000000000
>          // #274877906944
> ffffffc000085044:       8b020021        add     x1, x1, x2
> ffffffc000085048:       d63f0020        blr     x1
> ffffffc00008504c:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085050:       f0002941        adrp    x1, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085054:       90002143        adrp    x3, ffffffc0004ad000
> <__start_rodata>
> ffffffc000085058:       91128000        add     x0, x0, #0x4a0
> ffffffc00008505c:       913de021        add     x1, x1, #0xf78
> ffffffc000085060:       52800c22        mov     w2, #0x61
>          // #97
> ffffffc000085064:       91072063        add     x3, x3, #0x1c8
> ffffffc000085068:       941071d2        bl      ffffffc0004a17b0 <printk>
> ffffffc00008506c:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085070:       91134000        add     x0, x0, #0x4d0
> ffffffc000085074:       9410712e        bl      ffffffc0004a152c <panic>
> 
> Now my stack @ffffffc00008501c and @ffffffc000085028 are same.

Sure, with the caches on you have the guarantee that prior writes are
still visible.

You either need to perform maintenance by VA to make the stack visible,
or (better) don't access memory once the cache is disabled. Just load
everything you need into registers in advance.

Thanks,
Mark.


* Kexec on arm64
  2014-07-24  0:38           ` Geoff Levand
  2014-07-24  9:36             ` Mark Rutland
@ 2014-07-24 11:50             ` Arun Chandran
  1 sibling, 0 replies; 61+ messages in thread
From: Arun Chandran @ 2014-07-24 11:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Jul 24, 2014 at 6:08 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
>
>> I tried the same dtb with UP configuration. For UP kernel to compile
>> did the below modifications
>
> I'll test and fixup the kexec UP build in the next few days.
>

Ok.

> ...
>
>> With the default target configuration "kexec -e" failed to execute
>> in UP scenario also.
>>
>> But I had some luck when I did the same steps with L3 cache
>> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
>> it has an L3 cache. Luckily I was able to disable it in u-boot.
>>
>> With the L3 cache disabled configuration I am able to
>> do "kexec -e". Please see the log attached.
>
> All memory management for the main cpu is done by the arch code.  Kexec
> and cpu hot plug only work with the secondary cpus, so the problem would
> be in the arch memory code, either in setup_restart() for shutdown, or
> in the startup code.
>
> I guess setup_restart() is not doing something it needs to do for your
> platform.
>
I have run several experiments with L3 enabled in the UP (uniprocessor) scenario.
Please note that in all the experiments the first-stage and second-stage kernels
are the same.

-- Experiment 1 --
The kernel is modified to loop just before jumping to the kexec
"relocate_new_kernel" code, and the D-cache is left on (cpu_cache_off()
and the second flush_cache_all() are commented out).
###############
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 3f7b0a2..e4ea22f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,6 +73,8 @@ ENTRY(cpu_reset)
        bic     x1, x1, #1
        msr     sctlr_el1, x1                   // disable the MMU
        isb
+loop:
+       b       loop

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..888fe3f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+       //cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+       //flush_cache_all();
 }
################

a) Load the second kernel with "kexec -l"
b) Execute "kexec -e"; it is now looping at loop
c) Break into the target using the BDI3000
d) Flush the L3 cache from the BDI3000
e) Jump to relocate_new_kernel

CPU#0>rd
GPR00: 00000043eae0f000 0000000034d5d91c 0000004000000000 0000000000000004

CPU#0>go 0x00000043eae0f000

f) The kexeced kernel boots without any issue.

-- Experiment 2 --
Now revert only the D-cache change, so the D-cache is turned off again:
############
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 888fe3f..6bc85f78 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       //cpu_cache_off();
+       cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       //flush_cache_all();
+       flush_cache_all();
 }
###############

Repeat steps a, b and c above.

d) From the BDI3000 I see a strange value for x0:
CPU#0>rd
GPR00: 000000000000003f 0000000034d5d918 0000004000000000 0000000000000004
e) Flush L3 cache from BDI3000
f) Jump to relocate_new_kernel (kexec -e prints this address)

machine_kexec:584: reboot_code_buffer_phys:  00000043f0381000

CPU#0>go 0x00000043f0381000

g) The kexeced kernel fails to boot

CPU#0>h
    Core number       : 0
    Core state        : debug (AArch64 EL1)
    Debug entry cause : External Debug Request
    Current PC        : 0xffffffc000083200
    Current CPSR      : 0x000003c5 (EL1h)

So if I don't turn off the D-cache and I flush L3 using the
BDI3000, things work.

-- Experiment 3 --
Added L3 flush code to the kernel, with the D-cache again left on:

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 0faa45a..5c546bb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -233,6 +233,20 @@ section_table:

 ENTRY(stext)
        mov     x21, x0                         // x21=FDT
+flush_l3:
+       mov     x2, #0x10
+       mov     w1, #0x1f00
+       movk    x2, #0x7e60, lsl #16
+       movk    w1, #0x1600, lsl #16
+       str     w1, [x2]
+       mov     x4, #0
+wait_flush:
+       ldr     w1, [x2]
+       add     x4, x4 ,#1
+       tbz     w1, #31, wait_done
+       b       wait_flush
+wait_done:
+
         bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
         bl      __calc_phys_offset              // x24=PHYS_OFFSET, x28=PHYS_OFFSET-PAGE_OFFSET
        bl      set_cpu_boot_mode_flag

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..888fe3f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+       //cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+       //flush_cache_all();
 }

 /*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c29dde1..3f7b0a2 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,7 +73,22 @@ ENTRY(cpu_reset)
        bic     x1, x1, #1
        msr     sctlr_el1, x1                   // disable the MMU
        isb
-       bl      secondary_shutdown
+
+flush_l3:
+       mov     x2, #0x10
+       mov     w1, #0x1f00
+       movk    x2, #0x7e60, lsl #16
+       movk    w1, #0x1600, lsl #16
+       str     w1, [x2]
+       mov     x4, #0
+wait_flush:
+       ldr     w1, [x2]
+       add     x4, x4 ,#1
+       tbz     w1, #31, wait_done
+       b       wait_flush
+wait_done:
+
+#      bl      secondary_shutdown
        ret     x0
 ENDPROC(cpu_reset)

With this change the kexeced kernel also boots fine.
If I do the same with the D-cache turn-off left enabled,
booting the kexeced kernel fails.
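
For reference, the constants built by the mov/movk pairs in flush_l3
and the shape of the wait_flush loop can be worked out as below. This is
a sketch in C of what the assembly does, not the platform's documented
programming model: the meaning of the register and of bit 31 is not
stated in this thread, fake_reg_read() is a made-up stand-in for the
hardware, and the max_reads bound is an addition (the assembly loops
without a limit).

```c
#include <assert.h>
#include <stdint.h>

/* "mov rN, #lo" followed by "movk rN, #hi, lsl #16". */
static uint64_t movz_movk16(uint16_t lo, uint16_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 16);
}

/* Hypothetical register accessor, so the loop can run without hardware. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Same exit condition as "tbz w1, #31, wait_done": keep re-reading until
 * bit 31 reads as zero. Returns the number of reads performed.
 */
static unsigned int poll_until_bit31_clear(read_reg_fn read_reg, void *ctx,
					   unsigned int max_reads)
{
	unsigned int reads = 0;

	while (reads < max_reads) {
		uint32_t v = read_reg(ctx);

		reads++;
		if (!(v & (1u << 31)))
			break;
	}
	return reads;
}

/* Stand-in register: reports bit 31 set for the next *ctx reads. */
static uint32_t fake_reg_read(void *ctx)
{
	unsigned int *busy_reads = ctx;

	if (*busy_reads > 0) {
		(*busy_reads)--;
		return 0x96001f00u;	/* written value with bit 31 set */
	}
	return 0x16001f00u;
}
```

movz_movk16(0x10, 0x7e60) gives the address 0x7e600010 held in x2, and
movz_movk16(0x1f00, 0x1600) gives the value 0x16001f00 stored to it.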

--Arun


* Kexec on arm64
  2014-07-22 13:25         ` Arun Chandran
  2014-07-24  0:38           ` Geoff Levand
@ 2014-07-30  3:26           ` Feng Kan
  1 sibling, 0 replies; 61+ messages in thread
From: Feng Kan @ 2014-07-30  3:26 UTC (permalink / raw)
  To: linux-arm-kernel

>
> But I had some luck when I did the same steps with L3 cache
> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> it has an L3 cache. Luckily I was able to disable it in u-boot.
>
> With the L3 cache disabled configuration I am able to
> do "kexec -e". Please see the log attached.
>
> Feng,
> I doubt kernel is unaware of the presence of L3 cache, this subsequently
> makes "kexec -e" to fail.
Yes, L3 is turned on prior to entering Linux. It is used once Linux
enables the cache in the MMU.

>
> Do you have any idea how to make the kernel to take control of L3 cache?
We don't have this code. Using address 0 to get back to the spin address would
require you to disable the cache prior to the jump.


>
> --Arun


* Kexec on arm64
  2014-07-22  9:44       ` Arun Chandran
  2014-07-22 13:25         ` Arun Chandran
@ 2014-07-24  0:10         ` Geoff Levand
  2014-07-24  9:13         ` Mark Rutland
  2 siblings, 0 replies; 61+ messages in thread
From: Geoff Levand @ 2014-07-24  0:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Tue, 2014-07-22 at 15:14 +0530, Arun Chandran wrote:
> # kexec -e

Please use 'kexec -d -e' here to get debug output.

> kexec version: 1kvm: exiting hardware virtualization
> 4.07.17.12.17-gbStarting new kernel
> 6cccb4
>  Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e
>  kernel NULL pointer dereference at virtual address 00000291
> smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u

It is hard to read this, please send the various outputs to different
streams.

> I think some of the secondary CPUs are not behaving as expected;
> As of now I don't have any clues for this.

I guess the cpu-return-addr is not correct.  Try setting cpu-return-addr
to <0x0 0x1>.  This will spin secondaries in secondary_holding_pen.

-Geoff


* Kexec on arm64
  2014-07-22  9:44       ` Arun Chandran
  2014-07-22 13:25         ` Arun Chandran
  2014-07-24  0:10         ` Geoff Levand
@ 2014-07-24  9:13         ` Mark Rutland
  2 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2014-07-24  9:13 UTC (permalink / raw)
  To: linux-arm-kernel

[...]

> 2) Rebooting
> #########################
> # kexec -e
> kexec version: 1kvm: exiting hardware virtualization
> 4.07.17.12.17-gbStarting new kernel
> 6cccb4
>  Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e
>  kernel NULL pointer dereference at virtual address 00000291
> smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u

Hmm. This looks like two threads/CPUs are outputting characters at the
same time; something like "smp_spin_table_die" and "Initial cpu" seem to
be interspersed.

Mark.

> bsys cpu
> smpL_isnpuixn _table_cpu_diev:e1r2s7i:o ni d: 6, hol3d.i1n6g. 0c-orucnt: 0
> 4+ (arun@arun-OptiPlex-9010) (gcc version 4.9.1 20140505 (prerelease)
> (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05) )
> #25 SMP smp_sPpin_tablReE_EcMpPuT_ dTie:127: iude:  J5u, hlo ld2ing
> coun2:
> 37:03 IST 2014
> smp_Cspin_tPabUl:e _AcApruc_die:127: id: h46,4  hPorldiong count:c e0s
>  or [500f0000] revision 0
> smp_speifni_:t aGbeltet_cpu_die:i1n2g7 : ipd:a 2, holdinrga mceount:t e0
> rs from FDT:
> smep_fsip:i nC_atable_cpu_dien:'1t27: i df: 1i, holdinngd  cSoyusntt:e 0
> m Table in device tree!
> macchimne_kexaec:: 572C: smp_pMrAo:c efsasiorl_ied = 0
> d to reserve 16 MiB
> dachine_kexecO:n5 7n4o:
>  e 0 totalpages: 4194304
> a k e xec image inNfoor:m
>  l zone: 57344 pages used for memmap
>     type:        0
> z   sta r tN:o     r  m4a0000800l04
>  one: 4194304 pages, LIFO batch:31
>     head:   P E R   43Cea9bPf002
> U: Embedded 11 pages/cpu @ffffffc3fff7d000 s13120 r8192 d23744 u45056
>     nrp_cspeug-maelnltosc: 2
> : s13120 r8192 d23744 u45056 alloc=11*4096
>    p c p usegment-[a0l]l:o c0:0 0[0000400]00 800 000[ -0 ]0000004000
> 881c 0[000],  280c000h bytes,  [200]6 0 3pages
>  [0] 4 [0] 5 [0] 6 [0] 7
> eexBeuc_is_dtb:1i1l5:t  m1a gizc: 0 : 0 : noon
>  lists in Zone order, mobility grouping on.  Total pages: 4136960
> /  K e r segment[1]: n0e0l0 00c0o400m08a0000 m-a n0d00
> 00040l008ai30n00e,:  3r000ho obty=tes, 3/ dpeagesv
>  nfs rw nfsroot=10.162.103.228:/nfs_root/dora_june_6/apm-image-minimal-mustangbe
> ip=10.162.103.21:10.162.103.228:10.162.103.1:255.255.255.0:mustangk:eextehc_0is:_odtb:1f15f:
>  pmaangic:i 0c =: 0 : 1no
> console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug
> maxcpus=8 swiotlb=65536 log_buf_len=1M
> 8aclhinoeg_kex_ec:582: cobnturfo_ll_ecode_page:        nf:f f1ff0fbc4edb67ee8
>  576
> 6achinee_akrelxyec: 58l4: reobogo tb_ucfo dfe_buffer_physr:e e :0
> 000010435eaffb00007
>  (92%)
> macPhine_IkDexec:58 6h:a srehb otot_acode_bufferb:l e   e n t
> rfifffffce3se:a f4f0b000
> 96 (order: 3, 32768 bytes)
> macDhine_kexeecn:t5r88: ryel occate_neaw_ckheer nelh:a s     fffffhf
> c0t0a0093b18
> ble entries: 2097152 (order: 12, 16777216 bytes)
> machineI_kneoxedc:5e90-: relocate_cnaecwh_ek ehransel_size: b8hh( 1t84)a bytes
> ble entries: 1048576 (order: 11, 8388608 bytes)
> machinMe_keemxoercy::5 913: kexec6_5d0t8b_6addr2:4 K
> 0/0100004600708a0000
> 77216K available (4360K kernel code, 299K rwdata, 1528K rodata, 6556K
> init, 202K bss, 268592K reserved)
> oacVhinei_kretxueacl: 595: kexec_kkeirmnaegle _mhead:     e m o r0y0
> 00l0043eaa9bf002y
>  ut:
>     vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000   (245759 MB)
>     vmemmap : 0xffffffbce0000000 - 0xffffffbcee000000   (   224 MB)
> 0  machine_ kmeoxdecu:l597:e kexecs_ k:i m0age_xsftarft:    f  f
> f0f0b0f0f0c04000080004
>  00000 - 0xffffffc000000000   (    64 MB)
>     memory  : 0xffffffc000000000 - 0xffffffc400000000   ( 16384 MB)
>       .init : 0xffffffc000642000 -machine_ke x0excf:f5f99:f
> kexec_efntfrcy0_0d0ump:
> ca9340   (  6557 kB)
>       .text : 0xffffffc000080000 - 0xffffffc0006411c4   (  5893 kB)
> f     .data : 0xffffffc000caa000 - 0xffffffc000cf4f28     (I
> 43ea9bf002 =  343ea9b0f000 (f0fffffc 3ekaB9)b
>  000)
> d D 40S0LU00B8:0 0H0W1 = a4000080000l i(gfnf=f6f4ffc,00008000 0O)r
> ######################
> 
> This doesn't seem to be working. Random behavior is observed. Sometimes
> it reboots to the u-boot prompt. Sometimes the kernel soft-resets itself
> in an endless loop (the boot log repeats over and over again).
> 
> To debug what is happening, I put a while(1) just before the jump into
> the kexec reboot code.
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 31cba91..8843623 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -85,6 +85,7 @@ void soft_restart(unsigned long addr)
> 
>         smp_secondary_shutdown();
> 
> +       while(1);
>         /* Switch to the identity mapping */
>         phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
>         phys_reset(addr);
> 
> I now break into the target with the BDI3000 and see the output below:
> 
> TARGET#0>state
> Core#0: halted 0xffffffc000085240 External Debug Request
> Core#1: halted 0x0000004000080394 External Debug Request
> Core#2: halted 0x0000004000080394 External Debug Request
> Core#3: halted 0x0000004000080394 External Debug Request
> Core#4: halted 0xffffffc0000802f8 External Debug Request
> Core#5: halted 0x0000004000080394 External Debug Request
> Core#6: halted 0xffffffc0000802f8 External Debug Request
> Core#7: halted 0x0000004000080394 External Debug Request
> 
> I think some of the secondary CPUs are not behaving as expected.
> As of now I don't have any clue why.
> 
> 
> --Arun
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


end of thread, other threads:[~2014-08-27  4:56 UTC | newest]

Thread overview: 61+ messages
2014-07-09 10:13 Kexec on arm64 Arun Chandran
2014-07-09 13:58 ` Arun Chandran
2014-07-09 18:49   ` Geoff Levand
2014-07-11  9:23     ` Arun Chandran
2014-07-11 16:58       ` Geoff Levand
2014-07-11 11:26     ` Arun Chandran
2014-07-12  0:19       ` Geoff Levand
2014-07-14 12:21         ` Arun Chandran
2014-07-11 15:43     ` Arun Chandran
2014-07-14 22:05       ` Geoff Levand
2014-07-15 15:28         ` Arun Chandran
2014-07-09 18:33 ` Geoff Levand
     [not found] <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com>
2014-07-15 17:04 ` Geoff Levand
2014-07-16 17:57   ` Feng Kan
2014-07-16 23:04     ` Geoff Levand
2014-07-22  9:44       ` Arun Chandran
2014-07-22 13:25         ` Arun Chandran
2014-07-24  0:38           ` Geoff Levand
2014-07-24  9:36             ` Mark Rutland
2014-07-24 12:49               ` Arun Chandran
2014-07-25  0:17               ` Geoff Levand
2014-07-25 10:31                 ` Arun Chandran
2014-07-25 10:36                 ` Mark Rutland
2014-07-25 11:48                 ` Arun Chandran
2014-07-25 12:14                   ` Mark Rutland
2014-07-25 15:29                     ` Arun Chandran
2014-07-26  0:18                   ` Geoff Levand
2014-07-28 15:00                     ` Arun Chandran
2014-07-28 15:38                       ` Mark Rutland
2014-07-29  0:09                         ` Geoff Levand
2014-07-29  9:10                           ` Mark Rutland
2014-07-29 12:32                           ` Arun Chandran
2014-07-29 13:35                             ` Mark Rutland
2014-07-29 21:19                               ` Geoff Levand
2014-07-30  7:22                                 ` Arun Chandran
2014-08-01 11:13                                   ` Arun Chandran
2014-08-03 14:47                                     ` Mark Rutland
2014-08-04 10:16                                   ` Arun Chandran
2014-08-04 11:35                                     ` Mark Rutland
2014-08-07  0:40                                       ` Geoff Levand
2014-08-07  9:59                                         ` Mark Rutland
2014-08-07 17:09                                           ` Geoff Levand
2014-08-04 17:21                                     ` Geoff Levand
2014-08-06 13:54                                       ` Arun Chandran
2014-08-06 15:51                                         ` Arun Chandran
2014-08-07 20:07                                         ` Geoff Levand
2014-08-08  5:46                                           ` Arun Chandran
2014-08-08 10:03                                             ` Arun Chandran
2014-08-12  5:42                                               ` Arun Chandran
2014-08-13 11:09                                                 ` Arun Chandran
2014-08-26 22:32                                                   ` Geoff Levand
2014-08-27  4:56                                                     ` Arun Chandran
2014-07-30  5:46                               ` Arun Chandran
2014-07-30  9:16                                 ` Mark Rutland
2014-07-30  7:01                               ` Arun Chandran
2014-07-25 10:26               ` Arun Chandran
2014-07-25 11:29                 ` Mark Rutland
2014-07-24 11:50             ` Arun Chandran
2014-07-30  3:26           ` Feng Kan
2014-07-24  0:10         ` Geoff Levand
2014-07-24  9:13         ` Mark Rutland
