* Kexec on arm64 @ 2014-07-09 10:13 Arun Chandran 2014-07-09 13:58 ` Arun Chandran 2014-07-09 18:33 ` Geoff Levand 0 siblings, 2 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-09 10:13 UTC (permalink / raw) To: kexec; +Cc: Geoff Levand Hi, I found the kexec development on arm64 here http://www.spinics.net/lists/arm-kernel/msg329563.html Went to try it on my armv8 hardware after cloning git://git.linaro.org/people/geoff.levand/linux-kexec.git and https://git.linaro.org/people/geoff.levand/kexec-tools.git Did 1) ~/work/aarch64-kernel/kexec-tools$ ./bootstrap 2)~/work/aarch64-kernel/kexec-tools$ ./configure --prefix=/tmp/kexec_install --host=aarch64-linux-gnu 3) make and got this error elf-arm64.o kexec/arch/arm64/kexec-elf-arm64.c kexec/arch/arm64/kexec-elf-arm64.c: In function ‘elf_arm64_probe’: kexec/arch/arm64/kexec-elf-arm64.c:32:24: error: ‘EM_AARCH64’ undeclared (first use in this function) if (ehdr.e_machine != EM_AARCH64) { ^ kexec/arch/arm64/kexec-elf-arm64.c:32:24: note: each undeclared identifier is reported only once for each function it appears in make: *** [kexec/arch/arm64/kexec-elf-arm64.o] Error 1 Am I missing something here? --Arun _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-09 10:13 Kexec on arm64 Arun Chandran @ 2014-07-09 13:58 ` Arun Chandran 2014-07-09 18:49 ` Geoff Levand 2014-07-09 18:33 ` Geoff Levand 1 sibling, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-09 13:58 UTC (permalink / raw) To: kexec; +Cc: Geoff Levand On Wed, Jul 9, 2014 at 3:43 PM, Arun Chandran <achandran@mvista.com> wrote: > Hi, > > I found the kexec development on arm64 here > http://www.spinics.net/lists/arm-kernel/msg329563.html > > Went to try it on my armv8 hardware after cloning > > git://git.linaro.org/people/geoff.levand/linux-kexec.git and > https://git.linaro.org/people/geoff.levand/kexec-tools.git > > Did > 1) ~/work/aarch64-kernel/kexec-tools$ ./bootstrap > 2)~/work/aarch64-kernel/kexec-tools$ ./configure > --prefix=/tmp/kexec_install --host=aarch64-linux-gnu > 3) make > > and got this error > > elf-arm64.o kexec/arch/arm64/kexec-elf-arm64.c > kexec/arch/arm64/kexec-elf-arm64.c: In function ‘elf_arm64_probe’: > kexec/arch/arm64/kexec-elf-arm64.c:32:24: error: ‘EM_AARCH64’ > undeclared (first use in this function) > if (ehdr.e_machine != EM_AARCH64) { > ^ > kexec/arch/arm64/kexec-elf-arm64.c:32:24: note: each undeclared > identifier is reported only once for each function it appears in > make: *** [kexec/arch/arm64/kexec-elf-arm64.o] Error 1 > I compiled kexec-tools( https://git.linaro.org/people/geoff.levand/kexec-tools.git) by commenting that "machine check" --- a/kexec/arch/arm64/kexec-elf-arm64.c +++ b/kexec/arch/arm64/kexec-elf-arm64.c @@ -28,12 +28,13 @@ int elf_arm64_probe(const char *kernel_buf, off_t kernel_size) dbgprintf("Not an ELF executable.\n"); goto on_exit; } - +#if 0 if (ehdr.e_machine != EM_AARCH64) { dbgprintf("Not an AARCH64 executable.\n"); result = -EINVAL; goto on_exit; } +#endif result = 0; After copying the resulting binaries to my target; I tried loading the kernel Image # kexec -l /Image Modified cmdline: root=/dev/nfs Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled Cannot determine the file type of /Image It failed to load the kernel Image. Any pointers? --Arun > Am I missing something here? > > --Arun _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-09 13:58 ` Arun Chandran @ 2014-07-09 18:49 ` Geoff Levand 2014-07-11 9:23 ` Arun Chandran ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-09 18:49 UTC (permalink / raw) To: Arun Chandran; +Cc: kexec Hi Arun, On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote: > After copying the resulting binaries to my target; I tried loading the > kernel Image > > # kexec -l /Image > Modified cmdline: root=/dev/nfs > Unable to find /proc/device-tree//chosen/linux,stdout-path, printing > from purgatory is diabled > Cannot determine the file type of /Image > > It failed to load the kernel Image. Any pointers? My kexec-tools [1] only supports loading of arm64 elf files, so vmlinux, or a stripped version of it. Image is a raw binary, and is not yet supported. Maybe something like this is what you need: ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb ./kexec -d -e Also, my current master branch will only work reliably with PSCI boot. Spin-table boot has a bug and will be unstable for the next few days. Spin-table will re-boot, but only the primary cpu will come up. [1] https://git.linaro.org/people/geoff.levand/kexec-tools.git https://git.linaro.org/people/geoff.levand/linux-kexec.git -Geoff _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-09 18:49 ` Geoff Levand @ 2014-07-11 9:23 ` Arun Chandran 2014-07-11 16:58 ` Geoff Levand 2014-07-11 11:26 ` Arun Chandran 2014-07-11 15:43 ` Arun Chandran 2 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-11 9:23 UTC (permalink / raw) To: Geoff Levand; +Cc: kexec Hi Geoff, Finally I am able to get my kernel loaded with the patch below. diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c index 894e0e1..bfca40d 100644 --- a/kexec/arch/arm64/kexec-arm64.c +++ b/kexec/arch/arm64/kexec-arm64.c @@ -327,7 +327,7 @@ static int get_memory_ranges_dt(struct memory_range *array, unsigned int *count) dbgprintf("%s: RAM: %016llx - %016llx\n", __func__, r.start, r.end); - if (!arm64_mem.memstart || r.start < arm64_mem.memstart) + if ((region->size) && (!arm64_mem.memstart || r.start < arm64_mem.memstart)) arm64_mem.memstart = r.start; } } I suspect it was due to zero sized memory ranges in my dtb followed by the actual region. get_memory_ranges_dt: node_1676 memory get_memory_ranges_dt: RAM: 0000004000000000 - 0000004400000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 As a result of that arm64_mem.memstart was getting overwritten with zero. and I was stuck at the error kexec: kexec/arch/arm64/kexec-arm64.c:242: virt_to_phys: Assertion `arm64_mem.memstart' failed. So check the validity of region before updating the 'memstart' I will try with kexec -e next. --Arun On Thu, Jul 10, 2014 at 12:19 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote: >> After copying the resulting binaries to my target; I tried loading the >> kernel Image >> >> # kexec -l /Image >> Modified cmdline: root=/dev/nfs >> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing >> from purgatory is diabled >> Cannot determine the file type of /Image >> >> It failed to load the kernel Image. Any pointers? > > My kexec-tools [1] only supports loading of arm64 elf files, so > vmlinux, or a stripped version of it. Image is a raw binary, and > is not yet supported. > > Maybe something like this is what you need: > > ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb > ./kexec -d -e > > Also, my current master branch will only work reliably with PSCI > boot. Spin-table boot has a bug and will be unstable for the next > few days. Spin-table will re-boot, but only the primary cpu will > come up. > > [1] https://git.linaro.org/people/geoff.levand/kexec-tools.git > https://git.linaro.org/people/geoff.levand/linux-kexec.git > > -Geoff > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-11 9:23 ` Arun Chandran @ 2014-07-11 16:58 ` Geoff Levand 0 siblings, 0 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-11 16:58 UTC (permalink / raw) To: Arun Chandran; +Cc: kexec Hi Arun, On Fri, 2014-07-11 at 14:53 +0530, Arun Chandran wrote: > Finally I am able to get my kernel loaded with the patch below. OK, great. > I suspect it was due to zero sized memory ranges in my dtb followed by the > actual region. Could you please send me your dts? If you don't want to post it to the list, could you send it to me in a private mail? -Geoff _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-09 18:49 ` Geoff Levand 2014-07-11 9:23 ` Arun Chandran @ 2014-07-11 11:26 ` Arun Chandran 2014-07-12 0:19 ` Geoff Levand 2014-07-11 15:43 ` Arun Chandran 2 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-11 11:26 UTC (permalink / raw) To: Geoff Levand; +Cc: kexec Hi Geoff, This is a minor "NULL check" issue in the code. kexec fails to load, without kernel command line option (--append). The below code fixes that issue (missing null check. diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c index 894e0e1..0ca0df4 100644 --- a/kexec/arch/arm64/kexec-arm64.c +++ b/kexec/arch/arm64/kexec-arm64.c @@ -135,8 +135,10 @@ int arm64_load_other_segments(struct kexec_info *info) /* Processing for arm64_opts.dtb and arm64_opts.command_line. */ - strncpy(command_line, arm64_opts.command_line, sizeof(command_line)); - command_line[sizeof(command_line) - 1] = 0; + if (arm64_opts.command_line != NULL) { + strncpy(command_line, arm64_opts.command_line, sizeof(command_line)); + command_line[sizeof(command_line) - 1] = 0; + } --Arun On Thu, Jul 10, 2014 at 12:19 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote: >> After copying the resulting binaries to my target; I tried loading the >> kernel Image >> >> # kexec -l /Image >> Modified cmdline: root=/dev/nfs >> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing >> from purgatory is diabled >> Cannot determine the file type of /Image >> >> It failed to load the kernel Image. Any pointers? > > My kexec-tools [1] only supports loading of arm64 elf files, so > vmlinux, or a stripped version of it. Image is a raw binary, and > is not yet supported. > > Maybe something like this is what you need: > > ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb > ./kexec -d -e > > Also, my current master branch will only work reliably with PSCI > boot. Spin-table boot has a bug and will be unstable for the next > few days. Spin-table will re-boot, but only the primary cpu will > come up. > > [1] https://git.linaro.org/people/geoff.levand/kexec-tools.git > https://git.linaro.org/people/geoff.levand/linux-kexec.git > > -Geoff > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-11 11:26 ` Arun Chandran @ 2014-07-12 0:19 ` Geoff Levand 2014-07-14 12:21 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-07-12 0:19 UTC (permalink / raw) To: Arun Chandran; +Cc: kexec Hi Arun, On Fri, 2014-07-11 at 16:56 +0530, Arun Chandran wrote: > The below code fixes that issue (missing null check. I just pushed out a fix for this, slightly different from yours. Please let me know if you still have a problem with it. -Geoff _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-12 0:19 ` Geoff Levand @ 2014-07-14 12:21 ` Arun Chandran 0 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-14 12:21 UTC (permalink / raw) To: Geoff Levand; +Cc: kexec Hi Geoff, On Sat, Jul 12, 2014 at 5:49 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Fri, 2014-07-11 at 16:56 +0530, Arun Chandran wrote: >> The below code fixes that issue (missing null check. > > I just pushed out a fix for this, slightly different > from yours. Please let me know if you still have a > problem with it. > Got it. The loading part is fine with it. kexec -e still giving troubles. --Arun _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-09 18:49 ` Geoff Levand 2014-07-11 9:23 ` Arun Chandran 2014-07-11 11:26 ` Arun Chandran @ 2014-07-11 15:43 ` Arun Chandran 2014-07-14 22:05 ` Geoff Levand 2 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-11 15:43 UTC (permalink / raw) To: Geoff Levand; +Cc: kexec [-- Attachment #1: Type: text/plain, Size: 1793 bytes --] On Thu, Jul 10, 2014 at 12:19 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Wed, 2014-07-09 at 19:28 +0530, Arun Chandran wrote: >> After copying the resulting binaries to my target; I tried loading the >> kernel Image >> >> # kexec -l /Image >> Modified cmdline: root=/dev/nfs >> Unable to find /proc/device-tree//chosen/linux,stdout-path, printing >> from purgatory is diabled >> Cannot determine the file type of /Image >> >> It failed to load the kernel Image. Any pointers? > > My kexec-tools [1] only supports loading of arm64 elf files, so > vmlinux, or a stripped version of it. Image is a raw binary, and > is not yet supported. > > Maybe something like this is what you need: > > ./kexec -d --load /boot/vmlinux.strip --append="console=ttyAMA0 earlyprintk=pl011,0x1c090000 root=/dev/vda rw --verbose" --dtb=/boot/fdt.dtb > ./kexec -d -e > Hi , I tried kexec reboot with my modified kexec-tools and got a nice kernel panic. Please find the error log attached. There is only dtb file related change in the kernel sources, as given below. ########### cpu-release-addr = <0x1 0x0000fff8>; - cpu-return-addr = <0 0> + cpu-return-addr = <0x0 0x0>; ########### I just assigned a random value to "cpu-return-addr" in my dtb file; Do you have any idea how to find out that value for my hardware? > Also, my current master branch will only work reliably with PSCI > boot. Spin-table boot has a bug and will be unstable for the next > few days. Spin-table will re-boot, but only the primary cpu will > come up. > My hardware uses spin-table method for SMP. As you said there is a bug in kexec for spin-table method, what about trying it without CONFIG_SMP? Does your code support that? --Arun [-- Attachment #2: error_log.txt --] [-- Type: text/plain, Size: 11059 bytes --] # kexec -l vmlinux.strip --dtb=dtb.dtb -d --command-line="console=ttyS0,115200" random: nonblocking pool is initialized kexec version: 14.07.10.11.38-g426996b arch_process_options:83: command_line: console=ttyS0,115200 arch_process_options:85: initrd: (null) arch_process_options:86: dtmachine_kexec_prepare:547: b: kexec image info: type: 0 dt start: 4000080000 head: 0 b. nr_segments: 2 dt segment[0]: 0000004000080000 - 000000400084d000, 7cd000h bytes, 1997 pages kexec_is_dtb:136: magic: 10000014 : 14000010 : no segment[1]: 0000004000860000 - 0000004000862000, 2000h bytes, 2 pages kekexec_is_dtb:136: magic: d00dfeed : edfe0dd0 : yes rnkexec_is_dtb:136: magic: 10000014 : 14000010 : no ekexec_is_dtb:136: magic: d00dfeed : edfe0dd0 : yes l:kexec_boot_info_init:459: cpu_count: 8 0x7f7d7kexec_cpu_info_init:422: cpu-0: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 29kexec_cpu_info_init:422: cpu-1: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 01kexec_cpu_info_init:422: cpu-2: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 0 kexec_cpu_info_init:422: cpu-3: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 kekexec_cpu_info_init:422: cpu-4: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 rnkexec_cpu_info_init:422: cpu-5: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 elkexec_cpu_info_init:422: cpu-6: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 _skexec_cpu_info_init:422: cpu-7: 'spin-table', release_addr: 0x000000400000fff8, return_addr: 0x0000000000000000 izkexec_boot_info_init:459: cpu_count: 8 e:kexec_cpu_info_init:422: cpu-0: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 0kexec_cpu_info_init:422: cpu-1: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 x7kexec_cpu_info_init:422: cpu-2: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 adkexec_cpu_info_init:422: cpu-3: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 22kexec_cpu_info_init:422: cpu-4: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 kexec_cpu_info_init:422: cpu-5: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 Mkexec_cpu_info_init:422: cpu-6: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 odkexec_cpu_info_init:422: cpu-7: 'spin-table', release_addr: 0x000000010000fff8, return_addr: 0x0000000000000000 ified cmdline: root=/dev/nfs Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled get_memory_ranges_dt: node_1676 memory get_memory_ranges_dt: RAM: 0000004000000000 - 0000004400000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 get_memory_ranges_dt: RAM: 0000000000000000 - 0000000000000000 get_memory_ranges_dt: Success p_paddr: ffffffc000080000 p_vaddr: ffffffc000080000 p_filesz: 000000000079cc08 p_memsz: 00000000007cce28 p_offset: 0000000000010000 text_offset: 0000000000080000 page_offset: ffffffc000000000 memstart: 0000004000000000 p_vaddr: ffffffc000080000 virt_to_phys: ffffffc000080000 -> 0000004000080000 text_offset: 0000000000080000 page_offset: ffffffc000000000 memstart: 0000004000000000 entry: 0x4000080000 virt_to_phys: ffffffc000080000 -> 0000004000080000 add_segment_phys_virt: 0000007f7d739010 - 0000007f7ded5c18 (0079cc08) -> 0000004000080000 - 000000400084d000 (007cd000) dtb: base 4000860000, size 1c36h (7222) add_segment_phys_virt: 000000002cc78510 - 000000002cc7a146 (00001c36) -> 0000004000860000 - 0000004000862000 (00002000) kexec_load: entry = 0x4000080000 flags = 0xb70000 nr_segments = 2 segment[0].buf = 0x7f7d739010 segment[0].bufsz = 0x79cc08 segment[0].mem = 0x4000080000 segment[0].memsz = 0x7cd000 segment[1].buf = 0x2cc78510 segment[1].bufsz = 0x1c36 segment[1].mem = 0x4000860000 segment[1].memsz = 0x2000 # # # # kexec -e -d kexekvm: exiting hardware virtualization c veStarting new kernel rssmp_spin_table_cpu_die:125: id: 1, holding count: 0 smp_spin_table_cpu_die:125: id: 7, holding count: 0 smp_spin_table_cpu_die:125: id: 6, holding count: 0 smp_spin_table_cpu_die:125: id: 4, holding count: 0 smp_spin_table_cpu_die:125: id: 3, holding count: 0 smp_spin_table_cpu_die:125: id: 2, holding count: 0 smp_spin_table_cpu_die:125: id: 5, holding count: 0 machine_kexec:612: smp_processor_id = 0 Bad mode in Synchronous Abort handler detected, code 0x86000005 machine_kexec:614: CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.15.0-rc4+ #44 kexec image info: task: ffffffc3ef143340 ti: ffffffc3ef160000 task.ti: ffffffc3ef160000 type: 0 start: 4000080000 PC is at 0x4000489f80 head: 43edaef002 LR is at 0xffffffc0000802e8 nr_segments: 2 pc : [<0000004000489f80>] lr : [<ffffffc0000802e8>] pstate: 600001c5 segment[0]: 0000004000080000 - 000000400084d000, 7cd000h bytes, 1997 pages sp : ffffffc3ef163d90 x29: ffffffc3ef163d90 x28: 0000008000000000 kexec_is_dtb:136: magic: 0 : 0 : no segment[1]: 0000004000860000 - 0000004000862000, 2000h bytes, 2 pages x27: ffffffc3ef160000 x26: ffffffc000604000 kexec_is_dtb:136: magic: 0 : 0 : no machine_kexec:622: control_code_page: ffffffbcedbfc6f8 x25: ffffffc000602000 machine_kexec:624: reboot_code_buffer_phys: 00000043eda69000 x24: ffffffc000587000 machine_kexec:626: reboot_code_buffer: ffffffc3eda69000 machine_kexec:628: relocate_new_kernel: ffffffc000093158 x23: ffffffc000c72000 machine_kexec:630: relocate_new_kernel_size: b8h(184) bytes x22: 00000000500f0000 machine_kexec:633: kexec_dtb_addr: 0000004000860000 machine_kexec:635: kexec_kimage_head: 00000043edaef002 x21: 0000000000000000 machine_kexec:637: kexec_kimage_start: 0000004000080000 x20: 0000000000000000 machine_kexec:639: kexec_entry_dump: I 43edaef002 = 43edaef000 (ffffffc3edaef000) x19: ffffffc00048a8b0 D 4000080001 = 4000080000 (ffffffc000080000) x18: 0000007fd31cc6e0 x17: 00000000004a6c60 x16: ffffffc00017a834 I 43ed4f4002 = 43ed4f4000 (ffffffc3ed4f4000) x15: 0000000000000006 x14: 00000000000fdb04 x13: 0000000000000054 x12: 0000004000489f80 I 43ed6f4002 = 43ed6f4000 (ffffffc3ed6f4000) x11: 00000000000024fc x10: ffffffffffffffff x9 : 0000000000000033 x8 : ffffffc3fee8153f I 43ed0f4002 = 43ed0f4000 (ffffffc3ed0f4000) x7 : 0000000000000000 x6 : 00000000000f0000 x5 : 00000000000f0000 D 4000860001 = 4000860000 (ffffffc000860000) x4 : ffffffc000caa208 DONE 0000000004 kexec_entry_dump: 0 pages x3 : ffffffc000c72000 x2 : ffffffc0000805d8 x1 : 0000000000000000 x0 : ffffffc000c72000 dump_cpus: all: 0 1 2 3 4 5 6 7 Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP dump_cpus: possible: 0 1 2 3 4 5 6 7 Modules linked in: dump_cpus: present: 0 1 2 3 4 5 6 7 CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.15.0-rc4+ #44 dump_cpus: active: 0 1 2 3 4 5 6 7 task: ffffffc3ef143340 ti: ffffffc3ef160000 task.ti: ffffffc3ef160000 dump_cpus: online: 0 PC is at 0x4000489f80 LR is at 0xffffffc0000802e8 dump_cpus: not online: 1 2 3 4 5 6 7 pc : [<0000004000489f80>] lr : [<ffffffc0000802e8>] pstate: 600001c5 Bye! sp : ffffffc3ef163d90 x29: ffffffc3ef163d90 x28: 0000008000000000 x27: ffffffc3ef160000 x26: ffffffc000604000 x25: ffffffc000602000 x24: ffffffc000587000 x23: ffffffc000c72000 x22: 00000000500f0000 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffc00048a8b0 x18: 0000007fd31cc6e0 x17: 00000000004a6c60 x16: ffffffc00017a834 x15: 0000000000000006 x14: 00000000000fdb04 x13: 0000000000000054 x12: 0000004000489f80 x11: 00000000000024fc x10: ffffffffffffffff x9 : 0000000000000033 x8 : ffffffc3fee8153f x7 : 0000000000000000 x6 : 00000000000f0000 x5 : 00000000000f0000 x4 : ffffffc000caa208 x3 : ffffffc000c72000 x2 : ffffffc0000805d8 x1 : 0000000000000000 x0 : ffffffc000c72000 Process swapper/7 (pid: 0, stack limit = 0xffffffc3ef160058) Stack: (0xffffffc3ef163d90 to 0xffffffc3ef164000) 3d80: ef163dc0 ffffffc3 0008ed2c ffffffc0 3da0: 00caa170 ffffffc0 00000007 00000000 00c5ea98 ffffffc0 0008ecb8 ffffffc0 3dc0: ef163e10 ffffffc3 000812d0 ffffffc0 0000200c ffffff80 ef163e40 ffffffc3 3de0: 00cc2700 ffffffc0 00002010 ffffff80 60000145 00000000 00ca871a ffffffc0 3e00: ef163e40 ffffffc3 00000003 00000000 ef163f60 ffffffc3 00083da4 ffffffc0 3e20: ef160000 ffffffc3 ef160000 ffffffc3 ef163f60 ffffffc3 000852b4 ffffffc0 3e40: 00000007 00000000 00597e48 ffffffc0 ef163ef0 ffffffc3 00000001 00000000 3e60: 0000004e 00000000 14000000 00000000 25c17d03 00000002 00000000 00000000 3e80: 00000018 00000000 ef163d70 ffffffc3 00000400 00000000 00000400 00000000 3ea0: 00000000 00000000 ffffffff ffffffff ffffffff ffffffff 981e8598 0000007f 3ec0: 0017a834 ffffffc0 004a6c60 00000000 d31cc6e0 0000007f ef160000 ffffffc3 3ee0: ef160000 ffffffc3 00caf500 ffffffc0 0048d000 ffffffc0 00587a68 ffffffc0 3f00: 00ca871a ffffffc0 0058d200 ffffffc0 00000001 00000000 ef160000 ffffffc3 3f20: 00000000 00000080 ef163f60 ffffffc3 000852b0 ffffffc0 ef163f60 ffffffc3 3f40: 000852b4 ffffffc0 60000145 00000000 ef160000 ffffffc3 ef160000 ffffffc3 3f60: ef163f70 ffffffc3 000e10a4 ffffffc0 ef163fd0 ffffffc3 0008e81c ffffffc0 3f80: 00000007 00000000 00000000 00000000 00caa1b0 ffffffc0 500f0000 00000000 3fa0: 00c72000 00000040 00000000 00000040 0007b000 00000040 0007d000 00000040 3fc0: 00080300 ffffffc0 0008e7fc ffffffc0 00000000 00000000 000802e8 00000040 3fe0: 00002000 00000000 00000000 00000000 08d0b004 05101514 a688cdaa 2620a80f Call trace: [<0000004000489f80>] 0x4000489f80 [<ffffffc00008ed28>] handle_IPI+0x1a8/0x1b0 [<ffffffc0000812cc>] gic_handle_irq+0x78/0x80 Exception stack(0xffffffc3ef163e20 to 0xffffffc3ef163f40) 3e20: ef160000 ffffffc3 ef160000 ffffffc3 ef163f60 ffffffc3 000852b4 ffffffc0 3e40: 00000007 00000000 00597e48 ffffffc0 ef163ef0 ffffffc3 00000001 00000000 3e60: 0000004e 00000000 14000000 00000000 25c17d03 00000002 00000000 00000000 3e80: 00000018 00000000 ef163d70 ffffffc3 00000400 00000000 00000400 00000000 3ea0: 00000000 00000000 ffffffff ffffffff ffffffff ffffffff 981e8598 0000007f 3ec0: 0017a834 ffffffc0 004a6c60 00000000 d31cc6e0 0000007f ef160000 ffffffc3 3ee0: ef160000 ffffffc3 00caf500 ffffffc0 0048d000 ffffffc0 00587a68 ffffffc0 3f00: 00ca871a ffffffc0 0058d200 ffffffc0 00000001 00000000 ef160000 ffffffc3 3f20: 00000000 00000080 ef163f60 ffffffc3 000852b0 ffffffc0 ef163f60 ffffffc3 [<ffffffc000083da0>] el1_irq+0x60/0xd0 [<ffffffc0000e10a0>] cpu_startup_entry+0x118/0x180 [<ffffffc00008e818>] secondary_start_kernel+0x10c/0x11c Code: bad PC value ---[ end trace 7467eabc0e0787f2 ]--- Kernel panic - not syncing: Fatal exception in interrupt Rebooting in 1 seconds..Reboot failed -- System halted [-- Attachment #3: Type: text/plain, Size: 143 bytes --] _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-11 15:43 ` Arun Chandran @ 2014-07-14 22:05 ` Geoff Levand 2014-07-15 15:28 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-07-14 22:05 UTC (permalink / raw) To: Arun Chandran; +Cc: kexec Hi Arun, On Fri, 2014-07-11 at 21:13 +0530, Arun Chandran wrote: > I tried kexec reboot with my modified kexec-tools and got a nice kernel panic. > Please find the error log attached. > There is only dtb file related change in the kernel sources, as given below. > > ########### > cpu-release-addr = <0x1 0x0000fff8>; > - cpu-return-addr = <0 0> > + cpu-return-addr = <0x0 0x0>; > ########### > > I just assigned a random value to "cpu-return-addr" in my dtb file; You need the proper cpu-return-addr value. Because cpu-return-addr was zero, kexec sent all your secondary processors back to secondary_startup, and is why you see that in your error log. I put in a hack for zero cpu-return-addr in my latest version. Please try it. If you use the same kernel for both the 1st and 2nd stage it should work. > Do you have any idea how to find out that value for my hardware? It is a property of your bootloader, not hardware. Ask your bootloader maintainer if you can, or check the bootloader sources. For the rtsm_ve_aemv8a model bootwrapper I set cpu-return-addr to <0x0 0x80000000> because the bootwrapper linker script puts _start at PHYS_OFFSET, which is 0x80000000 for that platform. https://git.kernel.org/cgit/linux/kernel/git/mark/boot-wrapper-aarch64.git/tree/model.lds.S#n30 > My hardware uses spin-table method for SMP. As you said there > is a bug in kexec for spin-table method, what about trying it without > CONFIG_SMP? Does your code support that? CONFIG_SMP=n should work, but I haven't tried it in a long time. -Geoff _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-14 22:05 ` Geoff Levand @ 2014-07-15 15:28 ` Arun Chandran 0 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-15 15:28 UTC (permalink / raw) To: Geoff Levand; +Cc: kexec Hi Geoff, I think we should better discuss this in the arm mailing list as it is getting more arm64 specific. I will post my reply there. --Arun On Tue, Jul 15, 2014 at 3:35 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Fri, 2014-07-11 at 21:13 +0530, Arun Chandran wrote: >> I tried kexec reboot with my modified kexec-tools and got a nice kernel panic. >> Please find the error log attached. >> There is only dtb file related change in the kernel sources, as given below. >> >> ########### >> cpu-release-addr = <0x1 0x0000fff8>; >> - cpu-return-addr = <0 0> >> + cpu-return-addr = <0x0 0x0>; >> ########### >> >> I just assigned a random value to "cpu-return-addr" in my dtb file; > > You need the proper cpu-return-addr value. Because cpu-return-addr > was zero, kexec sent all your secondary processors back to > secondary_startup, and is why you see that in your error log. > > I put in a hack for zero cpu-return-addr in my latest version. Please > try it. If you use the same kernel for both the 1st and 2nd stage it > should work. > >> Do you have any idea how to find out that value for my hardware? > > It is a property of your bootloader, not hardware. Ask your > bootloader maintainer if you can, or check the bootloader sources. > > For the rtsm_ve_aemv8a model bootwrapper I set cpu-return-addr to > <0x0 0x80000000> because the bootwrapper linker script puts _start > at PHYS_OFFSET, which is 0x80000000 for that platform. > > https://git.kernel.org/cgit/linux/kernel/git/mark/boot-wrapper-aarch64.git/tree/model.lds.S#n30 > >> My hardware uses spin-table method for SMP. As you said there >> is a bug in kexec for spin-table method, what about trying it without >> CONFIG_SMP? Does your code support that? > > CONFIG_SMP=n should work, but I haven't tried it in a long time. > > -Geoff > > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Kexec on arm64 2014-07-09 10:13 Kexec on arm64 Arun Chandran 2014-07-09 13:58 ` Arun Chandran @ 2014-07-09 18:33 ` Geoff Levand 1 sibling, 0 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-09 18:33 UTC (permalink / raw) To: Arun Chandran; +Cc: kexec Hi Arun, On Wed, 2014-07-09 at 15:43 +0530, Arun Chandran wrote: > and got this error > > elf-arm64.o kexec/arch/arm64/kexec-elf-arm64.c > kexec/arch/arm64/kexec-elf-arm64.c: In function ‘elf_arm64_probe’: > kexec/arch/arm64/kexec-elf-arm64.c:32:24: error: ‘EM_AARCH64’ > undeclared (first use in this function) > if (ehdr.e_machine != EM_AARCH64) { > ^ > kexec/arch/arm64/kexec-elf-arm64.c:32:24: note: each undeclared > identifier is reported only once for each function it appears in > make: *** [kexec/arch/arm64/kexec-elf-arm64.o] Error 1 > > Am I missing something here? Your toolchain is old, or built incorrectly. Try to upgrade it. You need headers that define EM_AARCH64. As a work-around, maybe you can do something like: make CFLAGS=-DEM_AARCH64=183' -Geoff _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
[parent not found: <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com>]
* Kexec on arm64 [not found] <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com> @ 2014-07-15 17:04 ` Geoff Levand 2014-07-16 17:57 ` Feng Kan 0 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-07-15 17:04 UTC (permalink / raw) To: linux-arm-kernel Hi, On Tue, 2014-07-15 at 21:35 +0530, Arun Chandran wrote: > So I am assuming some code at 0x238 is waiting for > secondary_kernel_start address > @000000400000fff8 > > So I modified my dtb file and made 0x238 as 'cpu-return-addr ' > ... > > Is this correct? I doubt it. I recommend you contact whoever you are contracted to do this work for and ask them for the value. As I mentioned in a mail from yesterday, I pushed out a work around for this. Did you try it? For anyone who comes across this thread and wants to follow it back to find that mail I sent out, good luck... -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-15 17:04 ` Geoff Levand @ 2014-07-16 17:57 ` Feng Kan 2014-07-16 23:04 ` Geoff Levand 0 siblings, 1 reply; 61+ messages in thread From: Feng Kan @ 2014-07-16 17:57 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 15, 2014 at 10:04 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi, > > On Tue, 2014-07-15 at 21:35 +0530, Arun Chandran wrote: >> So I am assuming some code at 0x238 is waiting for >> secondary_kernel_start address >> @000000400000fff8 >> >> So I modified my dtb file and made 0x238 as 'cpu-return-addr ' >> > ... >> >> Is this correct? > > I doubt it. I recommend you contact whoever you are contracted to do > this work for and ask them for the value. > > As I mentioned in a mail from yesterday, I pushed out a work around for > this. Did you try it? > > For anyone who comes across this thread and wants to follow it back to > find that mail I sent out, good luck... Hi Guys: Just following up on the conversation. The cpu return address of 0 should work in your case. Since thats the _start of the bootloader, it will run some core init code and then put the core back in wfe. However, I think this functionality should be pushed back into the kernel side to provide some small page of spin code rather than depend on the bootloader. The address 0 could be OCM or NOR flash, in the on chip memory case, the space could potentially be used for other means. > > -Geoff > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-16 17:57 ` Feng Kan @ 2014-07-16 23:04 ` Geoff Levand 2014-07-22 9:44 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-07-16 23:04 UTC (permalink / raw) To: linux-arm-kernel Hi Feng, On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote: > Just following up on the conversation. The cpu return address of 0 should work > in your case. Since thats the _start of the bootloader, it will run > some core init > code and then put the core back in wfe. OK, I fixed up my code so that zero is valid cpu return address. Arun, could you try my latest I pushed out today? > However, I think this > functionality should > be pushed back into the kernel side to provide some small page of spin > code rather > than depend on the bootloader. This method was already discussed in another thread [1] and decided against. With this we would need to set that spin code memory as reserved from kernel use, and so each boot stage would leak some memory. A system would eventually run out of memory over kexec's. [1] http://comments.gmane.org/gmane.linux.ports.arm.kernel/323440 -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-16 23:04 ` Geoff Levand @ 2014-07-22 9:44 ` Arun Chandran 2014-07-22 13:25 ` Arun Chandran ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-22 9:44 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 17, 2014 at 4:34 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Feng, > > On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote: >> Just following up on the conversation. The cpu return address of 0 should work >> in your case. Since thats the _start of the bootloader, it will run >> some core init >> code and then put the core back in wfe. > > OK, I fixed up my code so that zero is valid cpu return address. Arun, > could you try my latest I pushed out today? > Hi Geoff, Sorry for the late reply I was away. Yes I tried the new code. My dts file has the below change. ################ diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi index e0bf91d..b64e549 100644 --- a/arch/arm64/boot/dts/apm-storm.dtsi +++ b/arch/arm64/boot/dts/apm-storm.dtsi @@ -24,64 +24,64 @@ compatible = "apm,potenza", "arm,armv8"; reg = <0x0 0x000>; enable-method = "spin-table"; - cpu-release-addr = <0x1 0x0000fff8>; - cpu-return-addr = <0x0 0x0> /* Updated by bootloader */ + cpu-release-addr = <0x40 0x0000fff8>; + cpu-return-addr = <0x0 0x0>; /* Updated by bootloader */ }; ################# All other cpu nodes have similar change. 1) Loading ( I don't change commandline and dtb; assuming kexec will reuse whatever the booted kernel has as commandline and dtb) # kexec -l vmlinux.strip kexec version: 14.07.17.12.17-gb6cccb4 Modified cmdline: root=/dev/nfs Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled Modified cmdline: root=/dev/nfs Unable to find /proc/device-tree//chosen/linux,stdout-path, printing from purgatory is diabled machine_kexec_prepare:508: kexec image info: type: 0 start: 4000080004 head: 0 nr_segments: 2 segment[0]: 0000004000080000 - 000000400088c000, 80c000h bytes, 2060 pages kexec_is_dtb:115: magic: 4d5a0091 : 91005a4d : no segment[1]: 00000040008a0000 - 00000040008a3000, 3000h bytes, 3 pages kexec_is_dtb:115: magic: d00dfeed : edfe0dd0 : yes kexec_boot_info_init:384: cpu_count: 8 kexec_cpu_info_init:352: cpu-0: hwid-0, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-1: hwid-1, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-2: hwid-100, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-3: hwid-101, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-4: hwid-200, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-5: hwid-201, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-6: hwid-300, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-7: hwid-301, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_is_dtb:115: magic: 4d5a0091 : 91005a4d : no kexec_is_dtb:115: magic: d00dfeed : edfe0dd0 : yes kexec_boot_info_init:384: cpu_count: 8 kexec_cpu_info_init:352: cpu-0: hwid-0, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-1: hwid-1, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-2: hwid-100, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-3: hwid-101, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-4: hwid-200, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-5: hwid-201, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-6: hwid-300, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_info_init:352: cpu-7: hwid-301, 'spin-table', cpu-release-addr 400000fff8, cpu-return-addr 0 kexec_cpu_check:440: hwid-0 OK kexec_cpu_check:440: hwid-1 OK kexec_cpu_check:440: hwid-100 OK kexec_cpu_check:440: hwid-101 OK kexec_cpu_check:440: hwid-200 OK kexec_cpu_check:440: hwid-201 OK kexec_cpu_check:440: hwid-300 OK kexec_cpu_check:440: hwid-301 OK 2) Rebooting ######################### # kexec -e kexec version: 1kvm: exiting hardware virtualization 4.07.17.12.17-gbStarting new kernel 6cccb4 Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e kernel NULL pointer dereference at virtual address 00000291 smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u bsys cpu smpL_isnpuixn _table_cpu_diev:e1r2s7i:o ni d: 6, hol3d.i1n6g. 0c-orucnt: 0 4+ (arun at arun-OptiPlex-9010) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05) ) #25 SMP smp_sPpin_tablReE_EcMpPuT_ dTie:127: iude: J5u, hlo ld2ing coun2: 37:03 IST 2014 smp_Cspin_tPabUl:e _AcApruc_die:127: id: h46,4 hPorldiong count:c e0s or [500f0000] revision 0 smp_speifni_:t aGbeltet_cpu_die:i1n2g7 : ipd:a 2, holdinrga mceount:t e0 rs from FDT: smep_fsip:i nC_atable_cpu_dien:'1t27: i df: 1i, holdinngd cSoyusntt:e 0 m Table in device tree! macchimne_kexaec:: 572C: smp_pMrAo:c efsasiorl_ied = 0 d to reserve 16 MiB dachine_kexecO:n5 7n4o: e 0 totalpages: 4194304 a k e xec image inNfoor:m l zone: 57344 pages used for memmap type: 0 z sta r tN:o r m4a0000800l04 one: 4194304 pages, LIFO batch:31 head: P E R 43Cea9bPf002 U: Embedded 11 pages/cpu @ffffffc3fff7d000 s13120 r8192 d23744 u45056 nrp_cspeug-maelnltosc: 2 : s13120 r8192 d23744 u45056 alloc=11*4096 p c p usegment-[a0l]l:o c0:0 0[0000400]00 800 000[ -0 ]0000004000 881c 0[000], 280c000h bytes, [200]6 0 3pages [0] 4 [0] 5 [0] 6 [0] 7 eexBeuc_is_dtb:1i1l5:t m1a gizc: 0 : 0 : noon lists in Zone order, mobility grouping on. Total pages: 4136960 / K e r segment[1]: n0e0l0 00c0o400m08a0000 m-a n0d00 00040l008ai30n00e,: 3r000ho obty=tes, 3/ dpeagesv nfs rw nfsroot=10.162.103.228:/nfs_root/dora_june_6/apm-image-minimal-mustangbe ip=10.162.103.21:10.162.103.228:10.162.103.1:255.255.255.0:mustangk:eextehc_0is:_odtb:1f15f: pmaangic:i 0c =: 0 : 1no console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug maxcpus=8 swiotlb=65536 log_buf_len=1M 8aclhinoeg_kex_ec:582: cobnturfo_ll_ecode_page: nf:f f1ff0fbc4edb67ee8 576 6achinee_akrelxyec: 58l4: reobogo tb_ucfo dfe_buffer_physr:e e :0 000010435eaffb00007 (92%) macPhine_IkDexec:58 6h:a srehb otot_acode_bufferb:l e e n t rfifffffce3se:a f4f0b000 96 (order: 3, 32768 bytes) macDhine_kexeecn:t5r88: ryel occate_neaw_ckheer nelh:a s fffffhf c0t0a0093b18 ble entries: 2097152 (order: 12, 16777216 bytes) machineI_kneoxedc:5e90-: relocate_cnaecwh_ek ehransel_size: b8hh( 1t84)a bytes ble entries: 1048576 (order: 11, 8388608 bytes) machinMe_keemxoercy::5 913: kexec6_5d0t8b_6addr2:4 K 0/0100004600708a0000 77216K available (4360K kernel code, 299K rwdata, 1528K rodata, 6556K init, 202K bss, 268592K reserved) oacVhinei_kretxueacl: 595: kexec_kkeirmnaegle _mhead: e m o r0y0 00l0043eaa9bf002y ut: vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000 (245759 MB) vmemmap : 0xffffffbce0000000 - 0xffffffbcee000000 ( 224 MB) 0 machine_ kmeoxdecu:l597:e kexecs_ k:i m0age_xsftarft: f f f0f0b0f0f0c04000080004 00000 - 0xffffffc000000000 ( 64 MB) memory : 0xffffffc000000000 - 0xffffffc400000000 ( 16384 MB) .init : 0xffffffc000642000 -machine_ke x0excf:f5f99:f kexec_efntfrcy0_0d0ump: ca9340 ( 6557 kB) .text : 0xffffffc000080000 - 0xffffffc0006411c4 ( 5893 kB) f .data : 0xffffffc000caa000 - 0xffffffc000cf4f28 (I 43ea9bf002 = 343ea9b0f000 (f0fffffc 3ekaB9)b 000) d D 40S0LU00B8:0 0H0W1 = a4000080000l i(gfnf=f6f4ffc,00008000 0O)r ###################### This doesn't seems to be working. Random behaviors are observed. Some times it rebooted to u-boot prompt. Sometimes kernel soft resets itself in an endless loop (bootlog is repeating over and over again) To debug what is happening I put a while(1) just before jumping into kexec reboot code. diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 31cba91..8843623 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -85,6 +85,7 @@ void soft_restart(unsigned long addr) smp_secondary_shutdown(); + while(1); /* Switch to the identity mapping */ phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); phys_reset(addr); I break into target with BDI3000 now; and see the below output TARGET#0>state Core#0: halted 0xffffffc000085240 External Debug Request Core#1: halted 0x0000004000080394 External Debug Request Core#2: halted 0x0000004000080394 External Debug Request Core#3: halted 0x0000004000080394 External Debug Request Core#4: halted 0xffffffc0000802f8 External Debug Request Core#5: halted 0x0000004000080394 External Debug Request Core#6: halted 0xffffffc0000802f8 External Debug Request Core#7: halted 0x0000004000080394 External Debug Request I think some of the secondary CPUs are not behaving as expected; As of now I don't have any clues for this. --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-22 9:44 ` Arun Chandran @ 2014-07-22 13:25 ` Arun Chandran 2014-07-24 0:38 ` Geoff Levand 2014-07-30 3:26 ` Feng Kan 2014-07-24 0:10 ` Geoff Levand 2014-07-24 9:13 ` Mark Rutland 2 siblings, 2 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-22 13:25 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, On Tue, Jul 22, 2014 at 3:14 PM, Arun Chandran <achandran@mvista.com> wrote: > On Thu, Jul 17, 2014 at 4:34 AM, Geoff Levand <geoff@infradead.org> wrote: >> Hi Feng, >> >> On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote: >>> Just following up on the conversation. The cpu return address of 0 should work >>> in your case. Since thats the _start of the bootloader, it will run >>> some core init >>> code and then put the core back in wfe. >> >> OK, I fixed up my code so that zero is valid cpu return address. Arun, >> could you try my latest I pushed out today? >> > Hi Geoff, > > Sorry for the late reply I was away. > > Yes I tried the new code. > My dts file has the below change. > > ################ > diff --git a/arch/arm64/boot/dts/apm-storm.dtsi > b/arch/arm64/boot/dts/apm-storm.dtsi > index e0bf91d..b64e549 100644 > --- a/arch/arm64/boot/dts/apm-storm.dtsi > +++ b/arch/arm64/boot/dts/apm-storm.dtsi > @@ -24,64 +24,64 @@ > compatible = "apm,potenza", "arm,armv8"; > reg = <0x0 0x000>; > enable-method = "spin-table"; > - cpu-release-addr = <0x1 0x0000fff8>; > - cpu-return-addr = <0x0 0x0> /* Updated by bootloader */ > + cpu-release-addr = <0x40 0x0000fff8>; > + cpu-return-addr = <0x0 0x0>; /* Updated by bootloader */ > }; > ################# > All other cpu nodes have similar change. > I tried the same dtb with UP configuration. For UP kernel to compile did the below modifications ############################## diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c index 1ccedb4c..c6a2a7e 100644 --- a/arch/arm64/kernel/cpu_ops.c +++ b/arch/arm64/kernel/cpu_ops.c @@ -89,6 +89,7 @@ void __init cpu_read_bootcpu_ops(void) cpu_read_ops(dn, 0); } +#if 0 int __init cpu_ops_init(void) { int result = 0; @@ -110,3 +111,4 @@ void cpu_ops_shutdown(void) if (cpu_operation_psci.shutdown) cpu_operation_psci.shutdown(); } +#endif diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index fba6d50..c3cf246 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -609,7 +609,7 @@ void machine_kexec(struct kimage *image) flush_icache_range((unsigned long)reboot_code_buffer, (unsigned long)reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE); - dump_cpus(); + //dump_cpus(); pr_info("Bye!\n"); diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 31cba91..6bc85f78 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -83,7 +83,7 @@ void soft_restart(unsigned long addr) setup_restart(); - smp_secondary_shutdown(); + //smp_secondary_shutdown(); /* Switch to the identity mapping */ phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); @@ -130,7 +130,7 @@ void arch_cpu_idle_dead(void) */ void machine_shutdown(void) { - smp_send_stop(); + //smp_send_stop(); } /* diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index c29dde1..14c339c 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -73,7 +73,7 @@ ENTRY(cpu_reset) bic x1, x1, #1 msr sctlr_el1, x1 // disable the MMU isb - bl secondary_shutdown +# bl secondary_shutdown ret x0 ENDPROC(cpu_reset) ##################### With the default target configuration "kexec -e" failed to execute in UP scenario also. But I had some luck when I did the same steps with L3 cache disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html it has an L3 cache. Luckily I was able to disable it in u-boot. With the L3 cache disabled configuration I am able to do "kexec -e". Please see the log attached. Feng, I doubt kernel is unaware of the presence of L3 cache, this subsequently makes "kexec -e" to fail. Do you have any idea how to make the kernel to take control of L3 cache? --Arun -------------- next part -------------- A non-text attachment was scrubbed... Name: kexec_unipro_log Type: application/octet-stream Size: 9227 bytes Desc: not available URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140722/8336d688/attachment-0001.obj> ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-22 13:25 ` Arun Chandran @ 2014-07-24 0:38 ` Geoff Levand 2014-07-24 9:36 ` Mark Rutland 2014-07-24 11:50 ` Arun Chandran 2014-07-30 3:26 ` Feng Kan 1 sibling, 2 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-24 0:38 UTC (permalink / raw) To: linux-arm-kernel Hi Arun, On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote: > I tried the same dtb with UP configuration. For UP kernel to compile > did the below modifications I'll test and fixup the kexec UP build in the next few days. ... > With the default target configuration "kexec -e" failed to execute > in UP scenario also. > > But I had some luck when I did the same steps with L3 cache > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html > it has an L3 cache. Luckily I was able to disable it in u-boot. > > With the L3 cache disabled configuration I am able to > do "kexec -e". Please see the log attached. All memory management for the main cpu is done by the arch code. Kexec and cpu hot plug only work with the secondary cpus, so the problem would be in the arch memory code, either in setup_restart() for shutdown, or in the startup code. I guess setup_restart() is not doing something it needs to do for your platform. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-24 0:38 ` Geoff Levand @ 2014-07-24 9:36 ` Mark Rutland 2014-07-24 12:49 ` Arun Chandran ` (2 more replies) 2014-07-24 11:50 ` Arun Chandran 1 sibling, 3 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-24 9:36 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > Hi Arun, > > On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote: > > > I tried the same dtb with UP configuration. For UP kernel to compile > > did the below modifications > > I'll test and fixup the kexec UP build in the next few days. > > ... > > > With the default target configuration "kexec -e" failed to execute > > in UP scenario also. It would be helpful to know _how_ it failed. Do you have any log output? > > > > But I had some luck when I did the same steps with L3 cache > > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html > > it has an L3 cache. Luckily I was able to disable it in u-boot. > > > > With the L3 cache disabled configuration I am able to > > do "kexec -e". Please see the log attached. Hmm. We don't expect the kernel to do any L3 management. It seems that memory subsystems with L3 caches respecting cache maintenance by VA are going to become relatively common, and we expect to handle them all by performing maintenance by VA. See commit c218bca74eea (arm64: Relax the kernel cache requirements for boot) for what we do at boot time. > > All memory management for the main cpu is done by the arch code. Kexec > and cpu hot plug only work with the secondary cpus, so the problem would > be in the arch memory code, either in setup_restart() for shutdown, or > in the startup code. It's possible that soft_restart and setup_restart are a little dodgy, as they also rely on the compiler being smart and not touching the stack after setup_restart(). However, they provide absolutely no guarantee that any data has been flushed out to the PoC [1]. If you require any data to be flushed out to the PoC so as to be visible to noncacheable accesses, you will need to ensure that this is flushed by VA before soft_restart is called. Data may have migrated to another cache (e.g. another CPU, or the L3) where it is not visible. > > I guess setup_restart() is not doing something it needs to do for your > platform. Unless you can see soft_restart/setup_restart making stack accesses after the first flush_cache_all call, I suspect your code is not flushing some data it requires. I'd been meaning to clean up soft_restart and setup_restart for a while. I'll have a go at cleaning them up along with any remaining flush_cache_all abuse in arm64. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-24 9:36 ` Mark Rutland @ 2014-07-24 12:49 ` Arun Chandran 2014-07-25 0:17 ` Geoff Levand 2014-07-25 10:26 ` Arun Chandran 2 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-24 12:49 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote: > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: >> Hi Arun, >> >> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote: >> >> > I tried the same dtb with UP configuration. For UP kernel to compile >> > did the below modifications >> >> I'll test and fixup the kexec UP build in the next few days. >> >> ... >> >> > With the default target configuration "kexec -e" failed to execute >> > in UP scenario also. > > It would be helpful to know _how_ it failed. Do you have any log output? I don't have any error log for for this. If I loop before jumping to relocate_new_kernel and break with BDI I can see. CPU#0>state Core#0: halted 0x0000004000094230 External Debug Request CPU#0>rd GPR00: 000000000000003f 0000000034d5d918 0000004000000000 0000000000000004 GPR04: 000000000000001f 000000000000001b 0000000000000000 ffffffffffffffff GPR08: 0000000000000014 ffffffffffffffff 0000000000000000 0000000000000002 GPR12: ffffffc000085030 aa0e03ed3600004a 9100218cf940018a ffffffffffffffff GPR16: ffffffc0000cc00c 0000000000434260 0000007ffe514d80 00000043eadd2000 GPR20: ffffffc3eadd2000 ffffffc000cc5000 ffffffc3ea42bf88 00000003eadd2000 GPR24: ffffffc0004ada58 ffffffc0005b2e68 ffffffc0005b2e88 ffffffc0005b2ef0 GPR28: ffffffc000092000 ffffffc3f017bce0 ffffffc000085054 ffffffc3f017bce0 x0 contains a corrupted address in this case If I do the same without dosabling Dcache ####### diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 6bc85f78..61f95a2 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -70,10 +70,10 @@ static void setup_restart(void) flush_cache_all(); /* Turn D-cache off */ - cpu_cache_off(); +// cpu_cache_off(); /* Push out any further dirty data, and ensure cache is empty */ - flush_cache_all(); +// flush_cache_all(); } void soft_restart(unsigned long addr) ########## I can see CPU#0>state Core#0: halted 0x0000004000094230 External Debug Request CPU#0> CPU#0>rd GPR00: 00000043eae90000 0000000034d5d91c 0000004000000000 0000000000000004 GPR04: 000000000000001f 000000000000001b 0000000000000000 ffffffffffffffff GPR08: 0000000000000014 ffffffffffffffff 0000000000000000 0000000000000002 GPR12: ffffffc000085028 aa0e03ed3600004a 9100218cf940018a ffffffffffffffff GPR16: ffffffc0000cc00c 0000000000434260 0000007fe79f31e0 00000043eae90000 GPR20: ffffffc3eae90000 ffffffc000cc5000 ffffffc3ea42ef88 00000003eae90000 GPR24: ffffffc0004ada58 ffffffc0005b2e68 ffffffc0005b2e88 ffffffc0005b2ef0 GPR28: ffffffc000092000 ffffffc3ebad7ce0 ffffffc00008504c ffffffc3ebad7ce0 CPU#0> CPU#0>go 0x00000043eae90000 CPU#0> CPU#0> CPU#0>h Core number : 0 Core state : debug (AArch64 EL1) Debug entry cause : External Debug Request Current PC : 0xffffffc000083200 Current CPSR : 0x600003c5 (EL1h) In this case x0 contais the correct jump address; but kexeced goes to " 0xffffffc000083200". It did not print anything at console. If you have any other experiments to find the root cause please tell me I can do it. > >> > >> > But I had some luck when I did the same steps with L3 cache >> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html >> > it has an L3 cache. Luckily I was able to disable it in u-boot. >> > >> > With the L3 cache disabled configuration I am able to >> > do "kexec -e". Please see the log attached. > > Hmm. We don't expect the kernel to do any L3 management. It seems that > memory subsystems with L3 caches respecting cache maintenance by VA are > going to become relatively common, and we expect to handle them all by > performing maintenance by VA. See commit c218bca74eea (arm64: Relax the > kernel cache requirements for boot) for what we do at boot time. > I will confirm this by booting with L3 cache enabled, reboot to u-boot, don't flush L3 from u-boot and again boot to linux. --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-24 9:36 ` Mark Rutland 2014-07-24 12:49 ` Arun Chandran @ 2014-07-25 0:17 ` Geoff Levand 2014-07-25 10:31 ` Arun Chandran ` (2 more replies) 2014-07-25 10:26 ` Arun Chandran 2 siblings, 3 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-25 0:17 UTC (permalink / raw) To: linux-arm-kernel Hi, On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote: > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > > All memory management for the main cpu is done by the arch code. Kexec > > and cpu hot plug only work with the secondary cpus, so the problem would > > be in the arch memory code, either in setup_restart() for shutdown, or > > in the startup code. > > It's possible that soft_restart and setup_restart are a little dodgy, as > they also rely on the compiler being smart and not touching the stack > after setup_restart(). > > However, they provide absolutely no guarantee that any data has been > flushed out to the PoC [1]. If you require any data to be flushed out to the > PoC so as to be visible to noncacheable accesses, you will need to > ensure that this is flushed by VA before soft_restart is called. Data > may have migrated to another cache (e.g. another CPU, or the L3) where > it is not visible. OK, kexec's reset routine relocate_new_kernel does use a few global variables that are set by the main kexec routines. I added a call to __flush_dcache_area(), which uses 'dc civac' for those. I had thought the call to __flush_dcache_all, which uses 'dc cisw', in flush_cache_all() would be enough. Arun, I also fixed UP builds. Could you pull my latest and try with L3 enabled? -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 0:17 ` Geoff Levand @ 2014-07-25 10:31 ` Arun Chandran 2014-07-25 10:36 ` Mark Rutland 2014-07-25 11:48 ` Arun Chandran 2 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-25 10:31 UTC (permalink / raw) To: linux-arm-kernel On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi, > > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote: >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > >> > All memory management for the main cpu is done by the arch code. Kexec >> > and cpu hot plug only work with the secondary cpus, so the problem would >> > be in the arch memory code, either in setup_restart() for shutdown, or >> > in the startup code. >> >> It's possible that soft_restart and setup_restart are a little dodgy, as >> they also rely on the compiler being smart and not touching the stack >> after setup_restart(). >> >> However, they provide absolutely no guarantee that any data has been >> flushed out to the PoC [1]. If you require any data to be flushed out to the >> PoC so as to be visible to noncacheable accesses, you will need to >> ensure that this is flushed by VA before soft_restart is called. Data >> may have migrated to another cache (e.g. another CPU, or the L3) where >> it is not visible. > > OK, kexec's reset routine relocate_new_kernel does use a few global > variables that are set by the main kexec routines. I added a call to > __flush_dcache_area(), which uses 'dc civac' for those. > > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in > flush_cache_all() would be enough. > > Arun, I also fixed UP builds. Could you pull my latest and try with L3 > enabled? I tried it but kexeced kernel fails to boot. Ran the code with and without the call to cpu_cache_off(); in arch/arm64/kernel/process.c. I have put more details in another mail. Please see that. --Arun > > -Geoff > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 0:17 ` Geoff Levand 2014-07-25 10:31 ` Arun Chandran @ 2014-07-25 10:36 ` Mark Rutland 2014-07-25 11:48 ` Arun Chandran 2 siblings, 0 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-25 10:36 UTC (permalink / raw) To: linux-arm-kernel On Fri, Jul 25, 2014 at 01:17:48AM +0100, Geoff Levand wrote: > Hi, > > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote: > > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > > > > All memory management for the main cpu is done by the arch code. Kexec > > > and cpu hot plug only work with the secondary cpus, so the problem would > > > be in the arch memory code, either in setup_restart() for shutdown, or > > > in the startup code. > > > > It's possible that soft_restart and setup_restart are a little dodgy, as > > they also rely on the compiler being smart and not touching the stack > > after setup_restart(). > > > > However, they provide absolutely no guarantee that any data has been > > flushed out to the PoC [1]. If you require any data to be flushed out to the > > PoC so as to be visible to noncacheable accesses, you will need to > > ensure that this is flushed by VA before soft_restart is called. Data > > may have migrated to another cache (e.g. another CPU, or the L3) where > > it is not visible. > > OK, kexec's reset routine relocate_new_kernel does use a few global > variables that are set by the main kexec routines. I added a call to > __flush_dcache_area(), which uses 'dc civac' for those. > > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in > flush_cache_all() would be enough. Unfortunately Set/Way ops don't provide the guarantee you require. While they may happen to force prior writes out to the PoC on some implementations, this guarantee is not provided by the architecture. The only way to guarantee data is out at the PoC per the architecture is to use cache maintenance by VA (unless you are unlucky enough to be on a 32-bit system with an outer cache that requires MMIO maintenance). Almost every use of Set/Way operations is dodgy. They only make sense for IMPLEMENTATION DEFINED power-on / power-off sequences (which the arm64 kernel won't be dealing with), and some edge cases where you need to guarantee that the local d-caches are empty (to avoid unexpectedly hitting in the cache). The naming of __flush_dcache_all is certainly misleading and I intend to address that shortly. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 0:17 ` Geoff Levand 2014-07-25 10:31 ` Arun Chandran 2014-07-25 10:36 ` Mark Rutland @ 2014-07-25 11:48 ` Arun Chandran 2014-07-25 12:14 ` Mark Rutland 2014-07-26 0:18 ` Geoff Levand 2 siblings, 2 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-25 11:48 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi, > > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote: >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > >> > All memory management for the main cpu is done by the arch code. Kexec >> > and cpu hot plug only work with the secondary cpus, so the problem would >> > be in the arch memory code, either in setup_restart() for shutdown, or >> > in the startup code. >> >> It's possible that soft_restart and setup_restart are a little dodgy, as >> they also rely on the compiler being smart and not touching the stack >> after setup_restart(). >> >> However, they provide absolutely no guarantee that any data has been >> flushed out to the PoC [1]. If you require any data to be flushed out to the >> PoC so as to be visible to noncacheable accesses, you will need to >> ensure that this is flushed by VA before soft_restart is called. Data >> may have migrated to another cache (e.g. another CPU, or the L3) where >> it is not visible. > > OK, kexec's reset routine relocate_new_kernel does use a few global > variables that are set by the main kexec routines. I added a call to > __flush_dcache_area(), which uses 'dc civac' for those. > > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in > flush_cache_all() would be enough. > > Arun, I also fixed UP builds. Could you pull my latest and try with L3 > enabled? > I got this working. As 'Mark Rutland' pointed in another mail that it could be problem with flushing the cache; I did a read of 1GB data from start of RAM to a volatile var. I assume that this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M) ################### diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 786daa6..90418f3 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {} static void setup_restart(void) { + volatile u64 tmp; + volatile u64 *addr; + + addr = (u64 *)(0xffffffc000000000); /* * Tell the mm system that we are going to reboot - * we may need it to insert some 1:1 mappings so that @@ -75,6 +79,11 @@ static void setup_restart(void) /* Clean and invalidate caches */ flush_cache_all(); + for ( ;addr < (u64 *)0xffffffc040000000; addr++) + { + tmp = *addr; + } + /* Turn D-cache off */ cpu_cache_off(); ################### With the above change latest kernel @ https://git.linaro.org/people/geoff.levand/linux-kexec.git is able to do kexec with L3 enabled in UP scenario. --Arun > -Geoff > ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 11:48 ` Arun Chandran @ 2014-07-25 12:14 ` Mark Rutland 2014-07-25 15:29 ` Arun Chandran 2014-07-26 0:18 ` Geoff Levand 1 sibling, 1 reply; 61+ messages in thread From: Mark Rutland @ 2014-07-25 12:14 UTC (permalink / raw) To: linux-arm-kernel On Fri, Jul 25, 2014 at 12:48:04PM +0100, Arun Chandran wrote: > Hi Geoff, > > On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote: > > Hi, > > > > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote: > >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > > > >> > All memory management for the main cpu is done by the arch code. Kexec > >> > and cpu hot plug only work with the secondary cpus, so the problem would > >> > be in the arch memory code, either in setup_restart() for shutdown, or > >> > in the startup code. > >> > >> It's possible that soft_restart and setup_restart are a little dodgy, as > >> they also rely on the compiler being smart and not touching the stack > >> after setup_restart(). > >> > >> However, they provide absolutely no guarantee that any data has been > >> flushed out to the PoC [1]. If you require any data to be flushed out to the > >> PoC so as to be visible to noncacheable accesses, you will need to > >> ensure that this is flushed by VA before soft_restart is called. Data > >> may have migrated to another cache (e.g. another CPU, or the L3) where > >> it is not visible. > > > > OK, kexec's reset routine relocate_new_kernel does use a few global > > variables that are set by the main kexec routines. I added a call to > > __flush_dcache_area(), which uses 'dc civac' for those. > > > > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in > > flush_cache_all() would be enough. > > > > Arun, I also fixed UP builds. Could you pull my latest and try with L3 > > enabled? > > > I got this working. As 'Mark Rutland' pointed in another mail that > it could be problem with flushing the cache; I did a read of > 1GB data from start of RAM to a volatile var. I assume that > this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M) You've managed to get the cache to evict some lines, which proves my theory, but this is absolute nonsense and guarantees nothing. So NAK to this. If you need to perform cache maintenance to guarantee data is visible to non-cacheable accesses you _must_ use the architected mechanism for cleaning data to the PoC: DC CVAC. We have wrappers for flushing ranges. Anything else is nonsense and does not provide the guarantee you need. That said, I still am not sure what guarantee you are attempting to get. Which data do you need out at the PoC? Thanks, Mark. > > ################### > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > index 786daa6..90418f3 100644 > --- a/arch/arm64/kernel/process.c > +++ b/arch/arm64/kernel/process.c > @@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {} > > static void setup_restart(void) > { > + volatile u64 tmp; > + volatile u64 *addr; > + > + addr = (u64 *)(0xffffffc000000000); > /* > * Tell the mm system that we are going to reboot - > * we may need it to insert some 1:1 mappings so that > @@ -75,6 +79,11 @@ static void setup_restart(void) > /* Clean and invalidate caches */ > flush_cache_all(); > > + for ( ;addr < (u64 *)0xffffffc040000000; addr++) > + { > + tmp = *addr; > + } > + > /* Turn D-cache off */ > cpu_cache_off(); > > ################### > > With the above change latest kernel @ > https://git.linaro.org/people/geoff.levand/linux-kexec.git > is able to do kexec with L3 enabled in UP scenario. > > --Arun > > > > > > > > > > > > > > > -Geoff > > > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 12:14 ` Mark Rutland @ 2014-07-25 15:29 ` Arun Chandran 0 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-25 15:29 UTC (permalink / raw) To: linux-arm-kernel On Fri, Jul 25, 2014 at 5:44 PM, Mark Rutland <mark.rutland@arm.com> wrote: > On Fri, Jul 25, 2014 at 12:48:04PM +0100, Arun Chandran wrote: >> Hi Geoff, >> >> On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote: >> > Hi, >> > >> > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote: >> >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: >> > >> >> > All memory management for the main cpu is done by the arch code. Kexec >> >> > and cpu hot plug only work with the secondary cpus, so the problem would >> >> > be in the arch memory code, either in setup_restart() for shutdown, or >> >> > in the startup code. >> >> >> >> It's possible that soft_restart and setup_restart are a little dodgy, as >> >> they also rely on the compiler being smart and not touching the stack >> >> after setup_restart(). >> >> >> >> However, they provide absolutely no guarantee that any data has been >> >> flushed out to the PoC [1]. If you require any data to be flushed out to the >> >> PoC so as to be visible to noncacheable accesses, you will need to >> >> ensure that this is flushed by VA before soft_restart is called. Data >> >> may have migrated to another cache (e.g. another CPU, or the L3) where >> >> it is not visible. >> > >> > OK, kexec's reset routine relocate_new_kernel does use a few global >> > variables that are set by the main kexec routines. I added a call to >> > __flush_dcache_area(), which uses 'dc civac' for those. >> > >> > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in >> > flush_cache_all() would be enough. >> > >> > Arun, I also fixed UP builds. Could you pull my latest and try with L3 >> > enabled? >> > >> I got this working. As 'Mark Rutland' pointed in another mail that >> it could be problem with flushing the cache; I did a read of >> 1GB data from start of RAM to a volatile var. I assume that >> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M) > > You've managed to get the cache to evict some lines, which proves my > theory, but this is absolute nonsense and guarantees nothing. > > So NAK to this. > Yes I was just shooting wildly. > If you need to perform cache maintenance to guarantee data is visible to > non-cacheable accesses you _must_ use the architected mechanism for > cleaning data to the PoC: DC CVAC. We have wrappers for flushing ranges. > > Anything else is nonsense and does not provide the guarantee you need. > > That said, I still am not sure what guarantee you are attempting to get. > Which data do you need out at the PoC? > I tried flushing the jump addr ########## +static unsigned long jump_addr_save; void soft_restart(unsigned long addr) { typedef void (*phys_reset_t)(unsigned long); phys_reset_t phys_reset; + phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); + jump_addr_save = addr; + __flush_dcache_area(&jump_addr_save, 16); + __flush_dcache_area(&phys_reset, 16); setup_restart(); /* Switch to the identity mapping */ - phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); - phys_reset(addr); + phys_reset(jump_addr_save); /* Should never get here */ BUG(); ########### And flushing all the source and destination pages of the kexeced kernel ########## diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 2995c78..3edf567 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -221,6 +221,8 @@ static void _kexec_entry_dump(const char *func, int line, addr, (unsigned long)virt_to_phys(dest), dest); + __flush_dcache_area(addr, PAGE_SIZE); + __flush_dcache_area(dest, PAGE_SIZE); dest += PAGE_SIZE; break; case IND_DONE: ########### It still fails(not reboot to kexeced kernel); That means I miss flushing of some other area. --Arun > Thanks, > Mark. > >> >> ################### >> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c >> index 786daa6..90418f3 100644 >> --- a/arch/arm64/kernel/process.c >> +++ b/arch/arm64/kernel/process.c >> @@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {} >> >> static void setup_restart(void) >> { >> + volatile u64 tmp; >> + volatile u64 *addr; >> + >> + addr = (u64 *)(0xffffffc000000000); >> /* >> * Tell the mm system that we are going to reboot - >> * we may need it to insert some 1:1 mappings so that >> @@ -75,6 +79,11 @@ static void setup_restart(void) >> /* Clean and invalidate caches */ >> flush_cache_all(); >> >> + for ( ;addr < (u64 *)0xffffffc040000000; addr++) >> + { >> + tmp = *addr; >> + } >> + >> /* Turn D-cache off */ >> cpu_cache_off(); >> >> ################### >> >> With the above change latest kernel @ >> https://git.linaro.org/people/geoff.levand/linux-kexec.git >> is able to do kexec with L3 enabled in UP scenario. >> >> --Arun >> >> >> >> >> >> >> >> >> >> >> >> >> >> > -Geoff >> > >> ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 11:48 ` Arun Chandran 2014-07-25 12:14 ` Mark Rutland @ 2014-07-26 0:18 ` Geoff Levand 2014-07-28 15:00 ` Arun Chandran 1 sibling, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-07-26 0:18 UTC (permalink / raw) To: linux-arm-kernel Hi Arun, On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote: > I got this working. As 'Mark Rutland' pointed in another mail that > it could be problem with flushing the cache; I did a read of > 1GB data from start of RAM to a volatile var. I assume that > this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M) I wasn't flushing out all the data used by relocate_new_kernel. I added a routine that should flush all the pages in the kimage list out to PoC. Please try a UP build with L3 enabled. Next week I'll check that my smp secondary shutdown code is doing the correct thing. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-26 0:18 ` Geoff Levand @ 2014-07-28 15:00 ` Arun Chandran 2014-07-28 15:38 ` Mark Rutland 0 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-28 15:00 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff On Sat, Jul 26, 2014 at 5:48 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote: >> I got this working. As 'Mark Rutland' pointed in another mail that >> it could be problem with flushing the cache; I did a read of >> 1GB data from start of RAM to a volatile var. I assume that >> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M) > > I wasn't flushing out all the data used by relocate_new_kernel. I added > a routine that should flush all the pages in the kimage list out to PoC. > Please try a UP build with L3 enabled. > Yes I verified this new kernel by stopping just before jumping to the kexeced kernel and taking the memory dump for the kernel and dtb. CPU#0>dump 0x0000004000080000 0x7F2000 uImage CPU#0>rd GPR00: 0000004000880000 0000000000000000 0000000000000000 0000000000000000 GPR04: 0000004000080004 000000000000001b 0000000000000000 ffffffffffffffff GPR08: 0000000000000020 ffffffffffffffff 0000000000000004 0000000000000002 GPR12: 00000043ea439fd8 0000004000883000 00000043ea631000 ffffffffffffffff GPR16: ffffffc0000cc31c 0000000000435260 0000007fe9328b90 00000043eaddc002 GPR20: 0000004000883000 00000043ea632000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000000 ffffffc000085074 ffffffc3eae53d00 And compared this image with the one taken after booting with my working one(The one with 1GB read of RAM). Both are Identical. That means kexec code now flushes data properly. Note that in both cases I used the same secondary kernel. But It fails to boot kexeced kernel. If I break the target I can see CPU#0>h Core number : 0 Core state : debug (AArch64 EL1) Debug entry cause : External Debug Request Current PC : 0xffffffc000083200 Current CPSR : 0x600003c5 (EL1h) So I guess there may be something wrong with the booting of kernel. I have these changes to the code. 1) ######### diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 00cfbd6..da3672b 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -684,7 +691,7 @@ void machine_kexec(struct kimage *image) /* Flush the reboot_code_buffer in preparation for its execution. */ flush_icache_range((unsigned long)reboot_code_buffer, - relocate_new_kernel_size); + (unsigned long)(reboot_code_buffer + relocate_new_kernel_size)); /* * Flush any data used by relocate_new_kernel in preparation for ######### Passing of second variable to flush_icache_range() is wrong it expects an address not length. 2) ####### diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 9ed7327..e3fc8d6 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr) { typedef void (*phys_reset_t)(unsigned long); phys_reset_t phys_reset; + unsigned long jump_addr = addr; + + phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); + + __flush_dcache_area(&jump_addr, 8); + __flush_dcache_area(&phys_reset, 8); setup_restart(); /* Switch to the identity mapping */ - phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); - phys_reset(addr); + phys_reset(jump_addr); /* Should never get here */ BUG(); ######## Without this flushing it will jump to wrong addr. --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-28 15:00 ` Arun Chandran @ 2014-07-28 15:38 ` Mark Rutland 2014-07-29 0:09 ` Geoff Levand 0 siblings, 1 reply; 61+ messages in thread From: Mark Rutland @ 2014-07-28 15:38 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote: > Hi Geoff > > On Sat, Jul 26, 2014 at 5:48 AM, Geoff Levand <geoff@infradead.org> wrote: > > Hi Arun, > > > > On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote: > >> I got this working. As 'Mark Rutland' pointed in another mail that > >> it could be problem with flushing the cache; I did a read of > >> 1GB data from start of RAM to a volatile var. I assume that > >> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M) > > > > I wasn't flushing out all the data used by relocate_new_kernel. I added > > a routine that should flush all the pages in the kimage list out to PoC. > > Please try a UP build with L3 enabled. > > > > Yes I verified this new kernel by stopping just before jumping to the > kexeced kernel and taking the memory dump for the kernel > and dtb. > > CPU#0>dump 0x0000004000080000 0x7F2000 uImage > CPU#0>rd > GPR00: 0000004000880000 0000000000000000 0000000000000000 0000000000000000 > GPR04: 0000004000080004 000000000000001b 0000000000000000 ffffffffffffffff > GPR08: 0000000000000020 ffffffffffffffff 0000000000000004 0000000000000002 > GPR12: 00000043ea439fd8 0000004000883000 00000043ea631000 ffffffffffffffff > GPR16: ffffffc0000cc31c 0000000000435260 0000007fe9328b90 00000043eaddc002 > GPR20: 0000004000883000 00000043ea632000 0000000000000000 0000000000000000 > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR28: 0000000000000000 0000000000000000 ffffffc000085074 ffffffc3eae53d00 > > And compared this image with the one taken after booting with > my working one(The one with 1GB read of RAM). Both are Identical. > That means kexec code now flushes data properly. Note that in both > cases I used the same secondary kernel. > > But It fails to boot kexeced kernel. If I break the target I can see > CPU#0>h > Core number : 0 > Core state : debug (AArch64 EL1) > Debug entry cause : External Debug Request > Current PC : 0xffffffc000083200 > Current CPSR : 0x600003c5 (EL1h) > > > So I guess there may be something wrong with > the booting of kernel. > > I have these changes to the code. > 1) > ######### > diff --git a/arch/arm64/kernel/machine_kexec.c > b/arch/arm64/kernel/machine_kexec.c > index 00cfbd6..da3672b 100644 > --- a/arch/arm64/kernel/machine_kexec.c > +++ b/arch/arm64/kernel/machine_kexec.c > @@ -684,7 +691,7 @@ void machine_kexec(struct kimage *image) > /* Flush the reboot_code_buffer in preparation for its execution. */ > > flush_icache_range((unsigned long)reboot_code_buffer, > - relocate_new_kernel_size); > + (unsigned long)(reboot_code_buffer + relocate_new_kernel_size)); > > /* > * Flush any data used by relocate_new_kernel in preparation for > ######### > Passing of second variable to flush_icache_range() is wrong > it expects an address not length. A simpler option would be to nuke the entire icache before branching to the new image. > > 2) > > ####### > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > index 9ed7327..e3fc8d6 100644 > --- a/arch/arm64/kernel/process.c > +++ b/arch/arm64/kernel/process.c > > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr) > { > typedef void (*phys_reset_t)(unsigned long); > phys_reset_t phys_reset; > + unsigned long jump_addr = addr; > + > + phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); > + > + __flush_dcache_area(&jump_addr, 8); > + __flush_dcache_area(&phys_reset, 8); Are these values really not getting stashed in registers? If the compiler is spilling, then we have absolutely no guarantee about any part of the stack. If that's the case, then we can't use the stack at all. These need to be rewritten in asm if the compiler is spilling. Thanks, Mark. > > setup_restart(); > > /* Switch to the identity mapping */ > - phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); > - phys_reset(addr); > + phys_reset(jump_addr); > > /* Should never get here */ > BUG(); > ######## > > Without this flushing it will jump to wrong addr. > > --Arun > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-28 15:38 ` Mark Rutland @ 2014-07-29 0:09 ` Geoff Levand 2014-07-29 9:10 ` Mark Rutland 2014-07-29 12:32 ` Arun Chandran 0 siblings, 2 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-29 0:09 UTC (permalink / raw) To: linux-arm-kernel Hi, On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote: > On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote: > > I have these changes to the code. > > flush_icache_range((unsigned long)reboot_code_buffer, > > - relocate_new_kernel_size); > > + (unsigned long)(reboot_code_buffer + relocate_new_kernel_size)); Thanks, I introduced this in my last version in an attempt to clean up the code, but on studying setup_restart(), I wonder if we even need to do this icache flush here (see below). > > /* > > * Flush any data used by relocate_new_kernel in preparation for > > ######### > > Passing of second variable to flush_icache_range() is wrong > > it expects an address not length. > > A simpler option would be to nuke the entire icache before branching to > the new image. flush_cache_all(), which is called by setup_restart(), does a 'ic ialluis'. The ARM says that this will invalidate all instruction caches for the inner shareable domain. Do we need something more? > > 2) > > > > ####### > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > > index 9ed7327..e3fc8d6 100644 > > --- a/arch/arm64/kernel/process.c > > +++ b/arch/arm64/kernel/process.c > > > > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr) > > { > > typedef void (*phys_reset_t)(unsigned long); > > phys_reset_t phys_reset; > > + unsigned long jump_addr = addr; > > + > > + phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); > > + > > + __flush_dcache_area(&jump_addr, 8); > > + __flush_dcache_area(&phys_reset, 8); > > Are these values really not getting stashed in registers? Looking at the disassembled code of soft_restart() from my compiler, addr is being saved on the stack over the call to setup_restart(), which I would expect it to do. > If the compiler is spilling, then we have absolutely no guarantee about > any part of the stack. If that's the case, then we can't use the stack > at all. These need to be rewritten in asm if the compiler is spilling. I think we just need to put the restart addr in a variable and flush that to the PoC. Arun, I pushed out a fixed version of soft_restart(), so please try another UP + L3 boot. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 0:09 ` Geoff Levand @ 2014-07-29 9:10 ` Mark Rutland 2014-07-29 12:32 ` Arun Chandran 1 sibling, 0 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-29 9:10 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 29, 2014 at 01:09:08AM +0100, Geoff Levand wrote: > Hi, Hi, > On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote: > > On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote: > > > I have these changes to the code. > > > flush_icache_range((unsigned long)reboot_code_buffer, > > > - relocate_new_kernel_size); > > > + (unsigned long)(reboot_code_buffer + relocate_new_kernel_size)); > > Thanks, I introduced this in my last version in an attempt to clean up > the code, but on studying setup_restart(), I wonder if we even need to > do this icache flush here (see below). > > > > /* > > > * Flush any data used by relocate_new_kernel in preparation for > > > ######### > > > Passing of second variable to flush_icache_range() is wrong > > > it expects an address not length. > > > > A simpler option would be to nuke the entire icache before branching to > > the new image. > > flush_cache_all(), which is called by setup_restart(), does a 'ic > ialluis'. The ARM says that this will invalidate all instruction caches > for the inner shareable domain. Do we need something more? If we have that before branching to the new image then that should be ok. The current image should already be visible at the PoC per the boot protocol. > > > 2) > > > > > > ####### > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > > > index 9ed7327..e3fc8d6 100644 > > > --- a/arch/arm64/kernel/process.c > > > +++ b/arch/arm64/kernel/process.c > > > > > > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr) > > > { > > > typedef void (*phys_reset_t)(unsigned long); > > > phys_reset_t phys_reset; > > > + unsigned long jump_addr = addr; > > > + > > > + phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); > > > + > > > + __flush_dcache_area(&jump_addr, 8); > > > + __flush_dcache_area(&phys_reset, 8); > > > > Are these values really not getting stashed in registers? > > Looking at the disassembled code of soft_restart() from my compiler, > addr is being saved on the stack over the call to setup_restart(), which > I would expect it to do. > > If the compiler is spilling, then we have absolutely no guarantee about > > any part of the stack. If that's the case, then we can't use the stack > > at all. These need to be rewritten in asm if the compiler is spilling. > > I think we just need to put the restart addr in a variable and flush > that to the PoC. I don't believe that flushing the restart addr variable on the stack out to the PoC is the correct fix here; it only guarantees that said variable is visible, not anything else the compiler may have placed on the stack. I wonder how setup_restart interacts with CONFIG_CC_STACKPROTECTOR for example. As far as I can see, the only way to provide the guarantee we require here is to not use the stack. Anything short of that is not going to be much more robust than the current code. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 0:09 ` Geoff Levand 2014-07-29 9:10 ` Mark Rutland @ 2014-07-29 12:32 ` Arun Chandran 2014-07-29 13:35 ` Mark Rutland 1 sibling, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-29 12:32 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, On Tue, Jul 29, 2014 at 5:39 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi, > > On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote: >> On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote: >> > I have these changes to the code. >> > flush_icache_range((unsigned long)reboot_code_buffer, >> > - relocate_new_kernel_size); >> > + (unsigned long)(reboot_code_buffer + relocate_new_kernel_size)); > > Thanks, I introduced this in my last version in an attempt to clean up > the code, but on studying setup_restart(), I wonder if we even need to > do this icache flush here (see below). > >> > /* >> > * Flush any data used by relocate_new_kernel in preparation for >> > ######### >> > Passing of second variable to flush_icache_range() is wrong >> > it expects an address not length. >> >> A simpler option would be to nuke the entire icache before branching to >> the new image. > > flush_cache_all(), which is called by setup_restart(), does a 'ic > ialluis'. The ARM says that this will invalidate all instruction caches > for the inner shareable domain. Do we need something more? > >> > 2) >> > >> > ####### >> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c >> > index 9ed7327..e3fc8d6 100644 >> > --- a/arch/arm64/kernel/process.c >> > +++ b/arch/arm64/kernel/process.c >> > >> > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr) >> > { >> > typedef void (*phys_reset_t)(unsigned long); >> > phys_reset_t phys_reset; >> > + unsigned long jump_addr = addr; >> > + >> > + phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); >> > + >> > + __flush_dcache_area(&jump_addr, 8); >> > + __flush_dcache_area(&phys_reset, 8); >> >> Are these values really not getting stashed in registers? > > Looking at the disassembled code of soft_restart() from my compiler, > addr is being saved on the stack over the call to setup_restart(), which > I would expect it to do. > Yes my compiler also saves this in stack >> If the compiler is spilling, then we have absolutely no guarantee about >> any part of the stack. If that's the case, then we can't use the stack >> at all. These need to be rewritten in asm if the compiler is spilling. > > I think we just need to put the restart addr in a variable and flush > that to the PoC. > > Arun, I pushed out a fixed version of soft_restart(), so please try > another UP + L3 boot. > The default code did not work. It is working with the change below ############### diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 5632473..7c5f859 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb) /** * kexec_list_walk - Helper to walk the kimage page list. */ - +static int kexec_kernel_size; +#define IMG_SIZE_NONE 0 +#define KERN_SIZE_FLAG 1 +#define DTB_SIZE_FLAG 2 static void kexec_list_walk(void *ctx, unsigned long kimage_head, void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest)) { void *dest; unsigned long *entry; + int imgsize_flag = IMG_SIZE_NONE; + for (entry = &kimage_head, dest = NULL; ; entry++) { unsigned int flag = *entry & IND_FLAGS; @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned long kimage_head, cb(ctx, flag, addr, NULL); break; case IND_DESTINATION: + if (imgsize_flag == IMG_SIZE_NONE) { + kexec_kernel_size = 0; + imgsize_flag = KERN_SIZE_FLAG; + } else if (imgsize_flag == KERN_SIZE_FLAG) { + imgsize_flag = DTB_SIZE_FLAG; + } dest = addr; cb(ctx, flag, addr, NULL); break; case IND_SOURCE: + if (imgsize_flag == KERN_SIZE_FLAG) + kexec_kernel_size++; cb(ctx, flag, addr, dest); dest += PAGE_SIZE; break; @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image) kexec_list_walk(NULL, image->head, kexec_list_flush_cb); + /* + * Make sure virtual addresses of new kernel are flushed + * SZ_512K = TEXT_OFFSET + * kexec_kernel = kexec_kernel_size * PAGE_SIZE + * Don't know = (SZ_4M + SZ_1M) + * SZ_4M = not working + * SZ_6M = working + * SZ_8M = working + * + * so chose SZ_4M + SZ_1M; Don't know why this is required + * BSS, stack ?? + * + */ + __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K + (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M); + soft_restart(reboot_code_buffer_phys); } --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 12:32 ` Arun Chandran @ 2014-07-29 13:35 ` Mark Rutland 2014-07-29 21:19 ` Geoff Levand ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-29 13:35 UTC (permalink / raw) To: linux-arm-kernel [...] > The default code did not work. > > It is working with the change below > > ############### > diff --git a/arch/arm64/kernel/machine_kexec.c > b/arch/arm64/kernel/machine_kexec.c > index 5632473..7c5f859 100644 > --- a/arch/arm64/kernel/machine_kexec.c > +++ b/arch/arm64/kernel/machine_kexec.c > @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb) > /** > * kexec_list_walk - Helper to walk the kimage page list. > */ > - > +static int kexec_kernel_size; > +#define IMG_SIZE_NONE 0 > +#define KERN_SIZE_FLAG 1 > +#define DTB_SIZE_FLAG 2 > static void kexec_list_walk(void *ctx, unsigned long kimage_head, > void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest)) > { > void *dest; > unsigned long *entry; > + int imgsize_flag = IMG_SIZE_NONE; > + > > for (entry = &kimage_head, dest = NULL; ; entry++) { > unsigned int flag = *entry & IND_FLAGS; > @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned > long kimage_head, > cb(ctx, flag, addr, NULL); > break; > case IND_DESTINATION: > + if (imgsize_flag == IMG_SIZE_NONE) { > + kexec_kernel_size = 0; > + imgsize_flag = KERN_SIZE_FLAG; > + } else if (imgsize_flag == KERN_SIZE_FLAG) { > + imgsize_flag = DTB_SIZE_FLAG; > + } > dest = addr; > cb(ctx, flag, addr, NULL); > break; > case IND_SOURCE: > + if (imgsize_flag == KERN_SIZE_FLAG) > + kexec_kernel_size++; > cb(ctx, flag, addr, dest); > dest += PAGE_SIZE; > break; > @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image) > > kexec_list_walk(NULL, image->head, kexec_list_flush_cb); > > + /* > + * Make sure virtual addresses of new kernel are flushed > + * SZ_512K = TEXT_OFFSET TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also shouldn't need to be flushed. Since c218bca74eea (arm64: Relax the kernel cache requirements for boot), the kernel will flush the cache for anything outside of the Image that it writes to before enabling the MMU and caches (e.g. the idmap and swapper page tables). Once caches are up we shouldn't care. Assuming that the existing kernel code is correct, the only region we should need to flush out to the PoC is the region from _text to _edata (i.e. just the contents of the Image). > + * kexec_kernel = kexec_kernel_size * PAGE_SIZE > + * Don't know = (SZ_4M + SZ_1M) > + * SZ_4M = not working > + * SZ_6M = working > + * SZ_8M = working > + * > + * so chose SZ_4M + SZ_1M; Don't know why this is required > + * BSS, stack ?? > + * > + */ > + __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K + > (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M); > + > soft_restart(reboot_code_buffer_phys); > } How big exactly is the kernel Image you're trying to kexec? Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 13:35 ` Mark Rutland @ 2014-07-29 21:19 ` Geoff Levand 2014-07-30 7:22 ` Arun Chandran 2014-07-30 5:46 ` Arun Chandran 2014-07-30 7:01 ` Arun Chandran 2 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-07-29 21:19 UTC (permalink / raw) To: linux-arm-kernel Hi Mark, On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote: > Since c218bca74eea (arm64: Relax the kernel cache requirements for > boot), the kernel will flush the cache for anything outside of the Image > that it writes to before enabling the MMU and caches (e.g. the idmap and > swapper page tables). Once caches are up we shouldn't care. > > Assuming that the existing kernel code is correct, the only region we > should need to flush out to the PoC is the region from _text to _edata > (i.e. just the contents of the Image). If the new kernel will overwrite the old one, then we do the final copy of the new kernel in the relocate_new_kernel routine. relocate_new_kernel is executed after the dcache is disabled, so that should write it directly to the PoC. It seems the protocol expects us to invalidate the dcache for that range though, so I added code to do this, essentially what Arun had added. Arun, please try. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 21:19 ` Geoff Levand @ 2014-07-30 7:22 ` Arun Chandran 2014-08-01 11:13 ` Arun Chandran 2014-08-04 10:16 ` Arun Chandran 0 siblings, 2 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-30 7:22 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Mark, > > On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote: >> Since c218bca74eea (arm64: Relax the kernel cache requirements for >> boot), the kernel will flush the cache for anything outside of the Image >> that it writes to before enabling the MMU and caches (e.g. the idmap and >> swapper page tables). Once caches are up we shouldn't care. >> >> Assuming that the existing kernel code is correct, the only region we >> should need to flush out to the PoC is the region from _text to _edata >> (i.e. just the contents of the Image). > > If the new kernel will overwrite the old one, then we do the final copy > of the new kernel in the relocate_new_kernel routine. relocate_new_kernel > is executed after the dcache is disabled, so that should write it directly > to the PoC. It seems the protocol expects us to invalidate the dcache > for that range though, so I added code to do this, essentially what Arun > had added. > > Arun, please try. > It works without any hiccups :).. I have attached the log. I will try with big-endian UP configuration next. Just had to comment the crashdump code to compile with defconfig. diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 233cd04..6f4ebd8 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -377,6 +377,7 @@ static void __init __maybe_unused reserve_crashkernel(void) insert_resource(&iomem_resource, &crashk_res); } +#ifdef CONFIG_CRASH_DUMP static void __init __maybe_unused reserve_elfcorehdr(void) { struct resource res; @@ -402,6 +403,7 @@ static void __init __maybe_unused reserve_elfcorehdr(void) res.end = elfcorehdr_addr + elfcorehdr_size - 1; insert_resource(&iomem_resource, &res); } +#endif static void __init request_standard_resources(void) { --Arun -------------- next part -------------- A non-text attachment was scrubbed... Name: kexec_log Type: application/octet-stream Size: 15206 bytes Desc: not available URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140730/b4be6f8b/attachment-0001.obj> ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-30 7:22 ` Arun Chandran @ 2014-08-01 11:13 ` Arun Chandran 2014-08-03 14:47 ` Mark Rutland 2014-08-04 10:16 ` Arun Chandran 1 sibling, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-08-01 11:13 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote: > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote: >> Hi Mark, >> >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote: >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for >>> boot), the kernel will flush the cache for anything outside of the Image >>> that it writes to before enabling the MMU and caches (e.g. the idmap and >>> swapper page tables). Once caches are up we shouldn't care. >>> >>> Assuming that the existing kernel code is correct, the only region we >>> should need to flush out to the PoC is the region from _text to _edata >>> (i.e. just the contents of the Image). >> >> If the new kernel will overwrite the old one, then we do the final copy >> of the new kernel in the relocate_new_kernel routine. relocate_new_kernel >> is executed after the dcache is disabled, so that should write it directly >> to the PoC. It seems the protocol expects us to invalidate the dcache >> for that range though, so I added code to do this, essentially what Arun >> had added. >> >> Arun, please try. >> > It works without any hiccups :).. > I have attached the log. > > I will try with big-endian UP configuration next. > This question may be irrelevant to kexec and may be stupid also. while debugging kexec in BIG-endian configuration I see _create_page_tables (arch/arm64/kernel/head.S) is doing __inval_cache_range on the addresses from idmap_pg_dir to swapper_pg_dir. ie from 0x4000D5F000 to 0x4000D61000 + #SWAPPER_DIR_SIZE in my case. Is it supposed to clear the corresponding virtual addresses? There might be chance that first stage kernel may be using VA from the same area right? (cache (L3)containing valid lines in the area 0xffffffc000d5f000 to 0xffffffc000d5f000 + 16K) Or is it the duty of the kexec to flush the entire VA regions used by the first stage kernel and the VA regions going to be used by the 2nd stage kernel? --Arun > Just had to comment the crashdump code to compile with > defconfig. > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c > index 233cd04..6f4ebd8 100644 > --- a/arch/arm64/kernel/setup.c > +++ b/arch/arm64/kernel/setup.c > @@ -377,6 +377,7 @@ static void __init __maybe_unused reserve_crashkernel(void) > insert_resource(&iomem_resource, &crashk_res); > } > > +#ifdef CONFIG_CRASH_DUMP > static void __init __maybe_unused reserve_elfcorehdr(void) > { > struct resource res; > @@ -402,6 +403,7 @@ static void __init __maybe_unused reserve_elfcorehdr(void) > res.end = elfcorehdr_addr + elfcorehdr_size - 1; > insert_resource(&iomem_resource, &res); > } > +#endif > > static void __init request_standard_resources(void) > { > > --Arun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-01 11:13 ` Arun Chandran @ 2014-08-03 14:47 ` Mark Rutland 0 siblings, 0 replies; 61+ messages in thread From: Mark Rutland @ 2014-08-03 14:47 UTC (permalink / raw) To: linux-arm-kernel On Fri, Aug 01, 2014 at 12:13:12PM +0100, Arun Chandran wrote: > On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote: > > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote: > >> Hi Mark, > >> > >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote: > >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for > >>> boot), the kernel will flush the cache for anything outside of the Image > >>> that it writes to before enabling the MMU and caches (e.g. the idmap and > >>> swapper page tables). Once caches are up we shouldn't care. > >>> > >>> Assuming that the existing kernel code is correct, the only region we > >>> should need to flush out to the PoC is the region from _text to _edata > >>> (i.e. just the contents of the Image). > >> > >> If the new kernel will overwrite the old one, then we do the final copy > >> of the new kernel in the relocate_new_kernel routine. relocate_new_kernel > >> is executed after the dcache is disabled, so that should write it directly > >> to the PoC. It seems the protocol expects us to invalidate the dcache > >> for that range though, so I added code to do this, essentially what Arun > >> had added. > >> > >> Arun, please try. > >> > > It works without any hiccups :).. > > I have attached the log. > > > > I will try with big-endian UP configuration next. > > > > This question may be irrelevant to kexec and may be stupid also. > > while debugging kexec in BIG-endian configuration I see > _create_page_tables (arch/arm64/kernel/head.S) is > doing __inval_cache_range on the addresses > from idmap_pg_dir to swapper_pg_dir. > > ie from 0x4000D5F000 to 0x4000D61000 + #SWAPPER_DIR_SIZE > in my case. > > Is it supposed to clear the corresponding virtual addresses? The data caches behave in a PIPT like fashion, and there are no aliases. Flushing by any VA that maps to a particular PA will flush out the only entry that cna possibly exist for that PA. While the MMU is off, the VA->PA mapping is an idmap. Given that, we can flush by PA. > There might be chance that first stage kernel may be using VA > from the same area right? (cache (L3)containing valid lines in the > area 0xffffffc000d5f000 to 0xffffffc000d5f000 + 16K) The L3 cache should never see the VA, only the PA. > Or is it the duty of the kexec to flush the entire > VA regions used by the first stage kernel and > the VA regions going to be used by the 2nd > stage kernel? We should only need ensure that the new kernel image and anything we expect to use before the caches have been eanbled are flushed to the PoC. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-30 7:22 ` Arun Chandran 2014-08-01 11:13 ` Arun Chandran @ 2014-08-04 10:16 ` Arun Chandran 2014-08-04 11:35 ` Mark Rutland 2014-08-04 17:21 ` Geoff Levand 1 sibling, 2 replies; 61+ messages in thread From: Arun Chandran @ 2014-08-04 10:16 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote: > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote: >> Hi Mark, >> >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote: >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for >>> boot), the kernel will flush the cache for anything outside of the Image >>> that it writes to before enabling the MMU and caches (e.g. the idmap and >>> swapper page tables). Once caches are up we shouldn't care. >>> >>> Assuming that the existing kernel code is correct, the only region we >>> should need to flush out to the PoC is the region from _text to _edata >>> (i.e. just the contents of the Image). >> >> If the new kernel will overwrite the old one, then we do the final copy >> of the new kernel in the relocate_new_kernel routine. relocate_new_kernel >> is executed after the dcache is disabled, so that should write it directly >> to the PoC. It seems the protocol expects us to invalidate the dcache >> for that range though, so I added code to do this, essentially what Arun >> had added. >> >> Arun, please try. >> > It works without any hiccups :).. > I have attached the log. > > I will try with big-endian UP configuration next. > The latest kexec code is working fine in LE/BE mode in UP configuration. I had to change kexec-tools code a bit to make sure that "kexec -l" is not putting dtb at an area where kernel is building its initial page tables. ######################### diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c index ab7a9ac..8f04473 100644 --- a/kexec/arch/arm64/kexec-arm64.c +++ b/kexec/arch/arm64/kexec-arm64.c @@ -526,6 +526,7 @@ int arm64_load_other_segments(struct kexec_info *info) off_t dtb_base; char *initrd_buf = NULL; char command_line[COMMAND_LINE_SIZE] = ""; + unsigned long dtb_start; /* Processing for arm64_opts.dtb and arm64_opts.command_line. */ @@ -554,8 +555,16 @@ int arm64_load_other_segments(struct kexec_info *info) * kernel with an alignment of 128 KiB, giving a max supported DTB * size of 128 KiB (worst case). */ + /* + * arm64 kernel uses area above kernel image to build + * initial page tables. Max required size for this area is 384K. It + * happens when CONFIG_ARM64_PGTABLE_LEVELS is set. + * So place dtb 512k above kernel image area. + */ + dtb_start = (unsigned long)info->segment[0].mem + info->segment[0].memsz + 512UL * 1024; + dtb_base = locate_hole(info, dtb_2.size, 128UL * 1024, - (unsigned long)info->entry, + (unsigned long)dtb_start, (unsigned long)info->entry + 512 * 1024 * 1024, 1); dbgprintf("dtb: base %lx, size %lxh (%ld)\n", (unsigned long)dtb_base, ######################## This code works fine with LE as well as BE. I have attached the logs for both. Now before trying SMP configuration I want to know whether the below "kexec -e" scenarios are valid(required)? 1st stage | 2nd stage --------------------------------------------------------------------------------------------- LE UP | BE UP LE UP | BE SMP LE UP | LE SMP LE SMP | LE UP LE SMP | BE UP LE SMP | BE SMP --Arun -------------- next part -------------- A non-text attachment was scrubbed... Name: kexec_be_log Type: application/octet-stream Size: 16869 bytes Desc: not available URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140804/8a02a68c/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: kexec_le_log Type: application/octet-stream Size: 15548 bytes Desc: not available URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140804/8a02a68c/attachment-0003.obj> ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-04 10:16 ` Arun Chandran @ 2014-08-04 11:35 ` Mark Rutland 2014-08-07 0:40 ` Geoff Levand 2014-08-04 17:21 ` Geoff Levand 1 sibling, 1 reply; 61+ messages in thread From: Mark Rutland @ 2014-08-04 11:35 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 04, 2014 at 11:16:02AM +0100, Arun Chandran wrote: > Hi Geoff, > > On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote: > > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote: > >> Hi Mark, > >> > >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote: > >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for > >>> boot), the kernel will flush the cache for anything outside of the Image > >>> that it writes to before enabling the MMU and caches (e.g. the idmap and > >>> swapper page tables). Once caches are up we shouldn't care. > >>> > >>> Assuming that the existing kernel code is correct, the only region we > >>> should need to flush out to the PoC is the region from _text to _edata > >>> (i.e. just the contents of the Image). > >> > >> If the new kernel will overwrite the old one, then we do the final copy > >> of the new kernel in the relocate_new_kernel routine. relocate_new_kernel > >> is executed after the dcache is disabled, so that should write it directly > >> to the PoC. It seems the protocol expects us to invalidate the dcache > >> for that range though, so I added code to do this, essentially what Arun > >> had added. > >> > >> Arun, please try. > >> > > It works without any hiccups :).. > > I have attached the log. > > > > I will try with big-endian UP configuration next. > > > > The latest kexec code is working fine in LE/BE mode in UP configuration. > > I had to change kexec-tools code a bit to make sure that "kexec -l" > is not putting dtb at an area where kernel is building its initial page > tables. > > ######################### > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c > index ab7a9ac..8f04473 100644 > --- a/kexec/arch/arm64/kexec-arm64.c > +++ b/kexec/arch/arm64/kexec-arm64.c > @@ -526,6 +526,7 @@ int arm64_load_other_segments(struct kexec_info *info) > off_t dtb_base; > char *initrd_buf = NULL; > char command_line[COMMAND_LINE_SIZE] = ""; > + unsigned long dtb_start; > > /* Processing for arm64_opts.dtb and arm64_opts.command_line. */ > > @@ -554,8 +555,16 @@ int arm64_load_other_segments(struct kexec_info *info) > * kernel with an alignment of 128 KiB, giving a max supported DTB > * size of 128 KiB (worst case). */ > > + /* > + * arm64 kernel uses area above kernel image to build > + * initial page tables. Max required size for this area is 384K. It > + * happens when CONFIG_ARM64_PGTABLE_LEVELS is set. > + * So place dtb 512k above kernel image area. > + */ The area above the kernel Image holds the BSS too, and from some experiments I did a while back with an allyesconfig, it was possible to have many megabytes of BSS. While 512k might be adequate for a defconfig today, it will become less so as time goes on. For kernel Images post v3.17 you can and should read the Image header's image_size field to figure out how much memory is needed from _text to _end (i.e. all the memory the kernel will assume is available statically and will happily clobber). For kernel Images prior to v3.17 the best thing we can do is guess as you've done here, or place the DTB below the kernel and page tables in the TEXT_OFFSET area; neither option is particularly brilliant. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-04 11:35 ` Mark Rutland @ 2014-08-07 0:40 ` Geoff Levand 2014-08-07 9:59 ` Mark Rutland 0 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-08-07 0:40 UTC (permalink / raw) To: linux-arm-kernel Hi, On Mon, 2014-08-04 at 12:35 +0100, Mark Rutland wrote: > > + /* > > + * arm64 kernel uses area above kernel image to build > > + * initial page tables. Max required size for this area is 384K. It > > + * happens when CONFIG_ARM64_PGTABLE_LEVELS is set. > > + * So place dtb 512k above kernel image area. > > + */ > > The area above the kernel Image holds the BSS too, and from some > experiments I did a while back with an allyesconfig, it was possible to > have many megabytes of BSS. > > While 512k might be adequate for a defconfig today, it will become less > so as time goes on. > > For kernel Images post v3.17 you can and should read the Image header's > image_size field to figure out how much memory is needed from _text to > _end (i.e. all the memory the kernel will assume is available statically > and will happily clobber). For kernel Images prior to v3.17 the best > thing we can do is guess as you've done here, or place the DTB below the > kernel and page tables in the TEXT_OFFSET area; neither option is > particularly brilliant. I haven't looked into it yet, but booting.txt says that the dtb must be 'within the first 512 megabytes from the start of the kernel image'. If this restriction still holds we can't just put the dtb at text_offset + image_size. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-07 0:40 ` Geoff Levand @ 2014-08-07 9:59 ` Mark Rutland 2014-08-07 17:09 ` Geoff Levand 0 siblings, 1 reply; 61+ messages in thread From: Mark Rutland @ 2014-08-07 9:59 UTC (permalink / raw) To: linux-arm-kernel On Thu, Aug 07, 2014 at 01:40:03AM +0100, Geoff Levand wrote: > Hi, Hi Geoff, > On Mon, 2014-08-04 at 12:35 +0100, Mark Rutland wrote: > > > + /* > > > + * arm64 kernel uses area above kernel image to build > > > + * initial page tables. Max required size for this area is 384K. It > > > + * happens when CONFIG_ARM64_PGTABLE_LEVELS is set. > > > + * So place dtb 512k above kernel image area. > > > + */ > > > > The area above the kernel Image holds the BSS too, and from some > > experiments I did a while back with an allyesconfig, it was possible to > > have many megabytes of BSS. > > > > While 512k might be adequate for a defconfig today, it will become less > > so as time goes on. > > > > For kernel Images post v3.17 you can and should read the Image header's > > image_size field to figure out how much memory is needed from _text to > > _end (i.e. all the memory the kernel will assume is available statically > > and will happily clobber). For kernel Images prior to v3.17 the best > > thing we can do is guess as you've done here, or place the DTB below the > > kernel and page tables in the TEXT_OFFSET area; neither option is > > particularly brilliant. > > I haven't looked into it yet, but booting.txt says that the dtb must be > 'within the first 512 megabytes from the start of the kernel image'. If > this restriction still holds we can't just put the dtb at > text_offset + image_size. Isn't that only a problem if text_offset + image_size is huge (510MB+)? The wording in booting.txt doesn't sound quite right. As I understand it, the 512MB restriction is because of the way we map the dtb in the swapper page tables. So a better wording would be "the kernel and DTB must be placed in the same naturally-aligned 512MB region of memory". It should be possible to get rid of the restrictions on the placement of the image and DTB, but this requires reworking the VA layout (using a kernel text mapping separate from the linear map as with x86_64, and similarly having a separate DTB mapping), and unfortunately I haven't had the time to do that. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-07 9:59 ` Mark Rutland @ 2014-08-07 17:09 ` Geoff Levand 0 siblings, 0 replies; 61+ messages in thread From: Geoff Levand @ 2014-08-07 17:09 UTC (permalink / raw) To: linux-arm-kernel On Thu, 2014-08-07 at 10:59 +0100, Mark Rutland wrote: > On Thu, Aug 07, 2014 at 01:40:03AM +0100, Geoff Levand wrote: > > > > I haven't looked into it yet, but booting.txt says that the dtb must be > > 'within the first 512 megabytes from the start of the kernel image'. If > > this restriction still holds we can't just put the dtb at > > text_offset + image_size. > > Isn't that only a problem if text_offset + image_size is huge (510MB+)? Yes, a limit I don't think we need to be too concerned about at present. > The wording in booting.txt doesn't sound quite right. As I understand > it, the 512MB restriction is because of the way we map the dtb in the > swapper page tables. > > So a better wording would be "the kernel and DTB must be placed in the > same naturally-aligned 512MB region of memory". > > It should be possible to get rid of the restrictions on the placement of > the image and DTB, but this requires reworking the VA layout (using a > kernel text mapping separate from the linear map as with x86_64, and > similarly having a separate DTB mapping), and unfortunately I haven't > had the time to do that. Or maybe just relocate the dtb at startup to within range. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-04 10:16 ` Arun Chandran 2014-08-04 11:35 ` Mark Rutland @ 2014-08-04 17:21 ` Geoff Levand 2014-08-06 13:54 ` Arun Chandran 1 sibling, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-08-04 17:21 UTC (permalink / raw) To: linux-arm-kernel Hi Arun, On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote: > The latest kexec code is working fine in LE/BE mode in UP configuration. > > I had to change kexec-tools code a bit to make sure that "kexec -l" > is not putting dtb at an area where kernel is building its initial page > tables. OK, I'll add in code to handle this. > Now before trying SMP configuration I want to know whether the below "kexec -e" > scenarios are valid(required)? > > 1st stage | 2nd stage > --------------------------------------------------------------------------------------------- > LE UP | BE UP > LE UP | BE SMP > LE UP | LE SMP > LE SMP | LE UP > LE SMP | BE UP > LE SMP | BE SMP Kexec should work for all combinations of kernel options. A bootloader (1st stage) may want to be UP, and custom crash dump kernels (2nd stage) are generally UP. UP->UP may have limited use. I think all these listed are important to test though, and going the other way also, so BE-SMP->LE-SMP for example. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-04 17:21 ` Geoff Levand @ 2014-08-06 13:54 ` Arun Chandran 2014-08-06 15:51 ` Arun Chandran 2014-08-07 20:07 ` Geoff Levand 0 siblings, 2 replies; 61+ messages in thread From: Arun Chandran @ 2014-08-06 13:54 UTC (permalink / raw) To: linux-arm-kernel Hi, On Mon, Aug 4, 2014 at 10:51 PM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote: >> The latest kexec code is working fine in LE/BE mode in UP configuration. >> >> I had to change kexec-tools code a bit to make sure that "kexec -l" >> is not putting dtb at an area where kernel is building its initial page >> tables. > > OK, I'll add in code to handle this. > >> Now before trying SMP configuration I want to know whether the below "kexec -e" >> scenarios are valid(required)? >> >> 1st stage | 2nd stage >> --------------------------------------------------------------------------------------------- >> LE UP | BE UP >> LE UP | BE SMP >> LE UP | LE SMP >> LE SMP | LE UP >> LE SMP | BE UP >> LE SMP | BE SMP > I am testing "kexec -e" with the script below(it will either reboot the board to BE or LE mode randomly). This test is done without L3 cache. With L3 I have more troubles. I think it is best to fix the code without L3 then move to testing with L3. That fix may solve the problem with L3 ############################### :~$ cat /etc/init.d/S50reboot #!/bin/sh sleep 5 i=$RANDOM j=$(( $i % 2)) if [ $j -eq 0 ] ; then mount /dev/mmcblk0p1 /mnt count=`cat /mnt/cnt` echo "KEXEC rebootng to BE count = $count" echo $RANDOM > /mnt/"$count""_BE" count=$(( $count + 1 )) echo "$count">/mnt/cnt kexec -l /mnt/vmlinux_BE.strip --command-line="console=ttyS0,115200 earlyprintk=uart8 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M" umount /mnt kexec -e else mount /dev/mmcblk0p1 /mnt count=`cat /mnt/cnt` echo "KEXEC rebooting to LE count = $count" echo $RANDOM > /mnt/"$count""_LE" count=$(( $count + 1 )) echo "$count">/mnt/cnt kexec -l /mnt/vmlinux_LE.strip --command-line="console=ttyS0,115200 earlyprintk=uart8 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M" umount /mnt kexec -e fi exit $? ############################# I have managed to run this test till 72 times with the below changes. ############################ diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 363a246..7de11ee 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag, break; case IND_SOURCE: __flush_dcache_area(addr, PAGE_SIZE); - __flush_dcache_area(dest, PAGE_SIZE); break; default: break; @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image) phys_addr_t reboot_code_buffer_phys; void *reboot_code_buffer; struct kexec_ctx *ctx = kexec_image_to_ctx(image); + unsigned long start, end; + int i; BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE); BUG_ON(num_online_cpus() > 1); @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image) kexec_list_walk(NULL, image->head, kexec_list_flush_cb); + start = image->segment[0].mem; + end = image->segment[0].mem + image->segment[0].memsz; + for (i = 0; i < image->nr_segments; i++) { + if (image->segment[i].mem > end) + end = image->segment[i].mem + image->segment[i].memsz; + } + + start = (unsigned long)phys_to_virt(start); + end = (unsigned long)phys_to_virt(end); + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start); + __flush_dcache_area((void *)start, end - start); + //flush_icache_range(start, end); + //mdelay(10); + soft_restart(reboot_code_buffer_phys); } diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S index 4b077e1..a49549e 100644 --- a/arch/arm64/kernel/relocate_kernel.S +++ b/arch/arm64/kernel/relocate_kernel.S @@ -61,13 +61,13 @@ relocate_new_kernel: mov x20, x13 mov x21, x14 - prfm pldl1strm, [x21, #64] + /*prfm pldl1strm, [x21, #64] */ 1: ldp x22, x23, [x21] ldp x24, x25, [x21, #16] ldp x26, x27, [x21, #32] ldp x28, x29, [x21, #48] add x21, x21, #64 - prfm pldl1strm, [x21, #64] + /*prfm pldl1strm, [x21, #64]*/ stnp x22, x23, [x20] stnp x24, x25, [x20, #16] stnp x26, x27, [x20, #32] @@ -115,6 +115,8 @@ relocate_new_kernel: mov x3, xzr ldr x4, kexec_kimage_start + dsb sy + isb br x4 .align 3 arun at arun-OptiPlex-9010:~/work/aarch64-kernel/linux-kexec_LE$ ########################### Out of this 72 times it booted 34 LE and 38 BE. It failed from switching from BE to LE. Mark, Do you have any pointers to find the problem? --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-06 13:54 ` Arun Chandran @ 2014-08-06 15:51 ` Arun Chandran 2014-08-07 20:07 ` Geoff Levand 1 sibling, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-08-06 15:51 UTC (permalink / raw) To: linux-arm-kernel On Wed, Aug 6, 2014 at 7:24 PM, Arun Chandran <achandran@mvista.com> wrote: > Hi, > > On Mon, Aug 4, 2014 at 10:51 PM, Geoff Levand <geoff@infradead.org> wrote: >> Hi Arun, >> >> On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote: >>> The latest kexec code is working fine in LE/BE mode in UP configuration. >>> >>> I had to change kexec-tools code a bit to make sure that "kexec -l" >>> is not putting dtb at an area where kernel is building its initial page >>> tables. >> >> OK, I'll add in code to handle this. >> >>> Now before trying SMP configuration I want to know whether the below "kexec -e" >>> scenarios are valid(required)? >>> >>> 1st stage | 2nd stage >>> --------------------------------------------------------------------------------------------- >>> LE UP | BE UP >>> LE UP | BE SMP >>> LE UP | LE SMP >>> LE SMP | LE UP >>> LE SMP | BE UP >>> LE SMP | BE SMP >> > I am testing "kexec -e" with the script below(it will either > reboot the board to BE or LE mode randomly). > > This test is done without L3 cache. With L3 I have more troubles. > I think it is best to fix the code without L3 then move > to testing with L3. That fix may solve the problem with L3 > > ############################### > :~$ cat /etc/init.d/S50reboot > #!/bin/sh > > sleep 5 > i=$RANDOM > j=$(( $i % 2)) > > if [ $j -eq 0 ] ; then > mount /dev/mmcblk0p1 /mnt > > count=`cat /mnt/cnt` > echo "KEXEC rebootng to BE count = $count" > echo $RANDOM > /mnt/"$count""_BE" > > count=$(( $count + 1 )) > echo "$count">/mnt/cnt > > kexec -l /mnt/vmlinux_BE.strip > --command-line="console=ttyS0,115200 earlyprintk=uart8 > 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M" > umount /mnt > kexec -e > else > mount /dev/mmcblk0p1 /mnt > > count=`cat /mnt/cnt` > echo "KEXEC rebooting to LE count = $count" > echo $RANDOM > /mnt/"$count""_LE" > > count=$(( $count + 1 )) > echo "$count">/mnt/cnt > > kexec -l /mnt/vmlinux_LE.strip > --command-line="console=ttyS0,115200 earlyprintk=uart8 > 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M" > umount /mnt > kexec -e > fi > > exit $? > ############################# > Some more information: I am able to run the same test (without L3) for 1.5 hours in case of no endian switching. For this testing I used the latest code @ https://git.linaro.org/people/geoff.levand/linux-kexec.git unlike in the endian switching scenario there is no code change made for this test. It ran 370 times for LE-->LE reboots. It ran 351 times for BE-->BE reboots. I had to manually stop both tests as it was continuing without any problem. I will repeat the same test with L3 cache on --Arun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-06 13:54 ` Arun Chandran 2014-08-06 15:51 ` Arun Chandran @ 2014-08-07 20:07 ` Geoff Levand 2014-08-08 5:46 ` Arun Chandran 1 sibling, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-08-07 20:07 UTC (permalink / raw) To: linux-arm-kernel Hi Arun, On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote: > I have managed to run this test till 72 times with the > below changes. > > ############################ > diff --git a/arch/arm64/kernel/machine_kexec.c > b/arch/arm64/kernel/machine_kexec.c > index 363a246..7de11ee 100644 > --- a/arch/arm64/kernel/machine_kexec.c > +++ b/arch/arm64/kernel/machine_kexec.c > @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , > unsigned int flag, > break; > case IND_SOURCE: > __flush_dcache_area(addr, PAGE_SIZE); > - __flush_dcache_area(dest, PAGE_SIZE); > break; > default: > break; > @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image) > phys_addr_t reboot_code_buffer_phys; > void *reboot_code_buffer; > struct kexec_ctx *ctx = kexec_image_to_ctx(image); > + unsigned long start, end; > + int i; > > BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE); > BUG_ON(num_online_cpus() > 1); > @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image) > > kexec_list_walk(NULL, image->head, kexec_list_flush_cb); > > + start = image->segment[0].mem; > + end = image->segment[0].mem + image->segment[0].memsz; > + for (i = 0; i < image->nr_segments; i++) { > + if (image->segment[i].mem > end) > + end = image->segment[i].mem + image->segment[i].memsz; > + } > + > + start = (unsigned long)phys_to_virt(start); > + end = (unsigned long)phys_to_virt(end); > + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start); > + __flush_dcache_area((void *)start, end - start); > + //flush_icache_range(start, end); > + //mdelay(10); > + > soft_restart(reboot_code_buffer_phys); > } Doing the flush in kexec_list_flush_cb() is almost the same as using the image->segment to flush. Did you see a difference on your system? > diff --git a/arch/arm64/kernel/relocate_kernel.S > b/arch/arm64/kernel/relocate_kernel.S > index 4b077e1..a49549e 100644 > --- a/arch/arm64/kernel/relocate_kernel.S > +++ b/arch/arm64/kernel/relocate_kernel.S I think these changes are good. I'll add them in. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-07 20:07 ` Geoff Levand @ 2014-08-08 5:46 ` Arun Chandran 2014-08-08 10:03 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-08-08 5:46 UTC (permalink / raw) To: linux-arm-kernel Hi, On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote: > >> I have managed to run this test till 72 times with the >> below changes. >> >> ############################ >> diff --git a/arch/arm64/kernel/machine_kexec.c >> b/arch/arm64/kernel/machine_kexec.c >> index 363a246..7de11ee 100644 >> --- a/arch/arm64/kernel/machine_kexec.c >> +++ b/arch/arm64/kernel/machine_kexec.c >> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , >> unsigned int flag, >> break; >> case IND_SOURCE: >> __flush_dcache_area(addr, PAGE_SIZE); >> - __flush_dcache_area(dest, PAGE_SIZE); >> break; >> default: >> break; >> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image) >> phys_addr_t reboot_code_buffer_phys; >> void *reboot_code_buffer; >> struct kexec_ctx *ctx = kexec_image_to_ctx(image); >> + unsigned long start, end; >> + int i; >> >> BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE); >> BUG_ON(num_online_cpus() > 1); >> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image) >> >> kexec_list_walk(NULL, image->head, kexec_list_flush_cb); >> >> + start = image->segment[0].mem; >> + end = image->segment[0].mem + image->segment[0].memsz; >> + for (i = 0; i < image->nr_segments; i++) { >> + if (image->segment[i].mem > end) >> + end = image->segment[i].mem + image->segment[i].memsz; >> + } >> + >> + start = (unsigned long)phys_to_virt(start); >> + end = (unsigned long)phys_to_virt(end); >> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start); >> + __flush_dcache_area((void *)start, end - start); >> + //flush_icache_range(start, end); >> + //mdelay(10); >> + >> soft_restart(reboot_code_buffer_phys); >> } > > Doing the flush in kexec_list_flush_cb() is almost the same > as using the image->segment to flush. Did you see a > difference on your system? > Yes I can see the difference. Let me explain it in detail. I am doing a stress test of "kexec -e" with the below reboot script. ################################ #!/bin/sh sleep 5 i=$RANDOM j=$(( $i % 2)) mount /dev/mmcblk0p1 /mnt count=`cat /mnt/cnt` if [ $j -eq 0 ] ; then echo "KEXEC rebootng to BE count = $count" echo $RANDOM > /mnt/"$count""_BE" kexec -l /mnt/vmlinux_BE.strip --command-line="console=ttyS0,115200 earlyprintk=uart8 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M" else echo "KEXEC rebooting to LE count = $count" echo $RANDOM > /mnt/"$count""_LE" kexec -l /mnt/vmlinux_LE.strip --command-line="console=ttyS0,115200 earlyprintk=uart8 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M" fi count=$(( $count + 1 )) echo "$count">/mnt/cnt umount /mnt kexec -e exit $? ############################### Observations with the default code @https://git.linaro.org/people/geoff.levand/linux-kexec.git Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)" a) LE to LE worked without L3 cache on b) BE to BE worked without L3 cache on c) Random endian switching does not work in any case (with L3, No L3) It breaks very early and unstable. Now with the below modifications ############################# diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 363a246..571b68d 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag, break; case IND_SOURCE: __flush_dcache_area(addr, PAGE_SIZE); - __flush_dcache_area(dest, PAGE_SIZE); break; default: break; @@ -636,11 +635,13 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag, * Called from the core kexec code for a sys_reboot with LINUX_REBOOT_CMD_KEXEC. */ +unsigned long dflush_start, dflush_end; void machine_kexec(struct kimage *image) { phys_addr_t reboot_code_buffer_phys; void *reboot_code_buffer; struct kexec_ctx *ctx = kexec_image_to_ctx(image); + int i; BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE); BUG_ON(num_online_cpus() > 1); @@ -698,6 +699,19 @@ void machine_kexec(struct kimage *image) kexec_list_walk(NULL, image->head, kexec_list_flush_cb); + dflush_start = image->segment[0].mem; + dflush_end = image->segment[0].mem + image->segment[0].memsz; + for (i = 0; i < image->nr_segments; i++) { + if (image->segment[i].mem > dflush_end) + dflush_end = image->segment[i].mem + image->segment[i].memsz; + } + + dflush_start = (unsigned long)phys_to_virt(dflush_start); + dflush_end = (unsigned long)phys_to_virt(dflush_end); + + __flush_dcache_area((void *)&dflush_start, sizeof(dflush_start)); + __flush_dcache_area((void *)&dflush_end, sizeof(dflush_end)); + soft_restart(reboot_code_buffer_phys); } diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index aa13521..b8c58d8 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -57,6 +57,7 @@ unsigned long __stack_chk_guard __read_mostly; EXPORT_SYMBOL(__stack_chk_guard); #endif +extern unsigned long dflush_start, dflush_end; static void setup_restart(void) { /* @@ -78,6 +79,8 @@ static void setup_restart(void) /* Push out any further dirty data, and ensure cache is empty */ flush_cache_all(); + + __flush_dcache_area((void*)dflush_start, dflush_end - dflush_start); } void soft_restart(unsigned long addr) diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S index 4b077e1..a49549e 100644 --- a/arch/arm64/kernel/relocate_kernel.S +++ b/arch/arm64/kernel/relocate_kernel.S @@ -61,13 +61,13 @@ relocate_new_kernel: mov x20, x13 mov x21, x14 - prfm pldl1strm, [x21, #64] + /*prfm pldl1strm, [x21, #64] */ 1: ldp x22, x23, [x21] ldp x24, x25, [x21, #16] ldp x26, x27, [x21, #32] ldp x28, x29, [x21, #48] add x21, x21, #64 - prfm pldl1strm, [x21, #64] + /*prfm pldl1strm, [x21, #64]*/ stnp x22, x23, [x20] stnp x24, x25, [x20, #16] stnp x26, x27, [x20, #32] @@ -115,6 +115,8 @@ relocate_new_kernel: mov x3, xzr ldr x4, kexec_kimage_start + dsb sy + isb br x4 .align 3 diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index f1619c0..7d81b86 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -52,6 +52,12 @@ */ ENTRY(cpu_cache_off) mrs x0, sctlr_el1 + bic x0, x0, #1 << 12 // clear SCTLR.C + msr sctlr_el1, x0 + isb + dsb sy + + mrs x0, sctlr_el1 bic x0, x0, #1 << 2 // clear SCTLR.C msr sctlr_el1, x0 isb ########################### a) I am able to run random endian switching test for 11.5 hours without any breaks. It rebooted totally 6542 times. total LE boots = 3241 total BE boots = 3301 Out of that 1625 times it switched from "LE to BE" or "BE to LE" One major modification is flushing the Dcache area of the new Image after turning off CPU caches. This makes sure that L3 contains no lines in the new "kernel Image area". I am still not sure what happens with the other lines still in L3. Does the new kernel has to do a __flush_dcahe_area() on all the new pages it is gonna give to userspace? Please refer to the discussion at http://lists.linaro.org/pipermail/linaro-kernel/2013-August/006155.html for more details. It says that L3 cache becomes transparent when lower level caches are OFF. So we need to clean L3 when it is transparent. And about adding barrier + removing pre-fetching in arch/arm64/kernel/relocate_kernel.S. I think it makes sure that no stale data is present with CPU while performing a endian switching. Am I right here? There is one more change that is turning off Icache. I am not sure why I did this. I will perform the test without this change now. >> diff --git a/arch/arm64/kernel/relocate_kernel.S >> b/arch/arm64/kernel/relocate_kernel.S >> index 4b077e1..a49549e 100644 >> --- a/arch/arm64/kernel/relocate_kernel.S >> +++ b/arch/arm64/kernel/relocate_kernel.S > > I think these changes are good. I'll add them in. Yes Please. Thank you for letting it in. --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-08 5:46 ` Arun Chandran @ 2014-08-08 10:03 ` Arun Chandran 2014-08-12 5:42 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-08-08 10:03 UTC (permalink / raw) To: linux-arm-kernel On Fri, Aug 8, 2014 at 11:16 AM, Arun Chandran <achandran@mvista.com> wrote: > Hi, > > On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote: >> Hi Arun, >> >> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote: >> >>> I have managed to run this test till 72 times with the >>> below changes. >>> >>> ############################ >>> diff --git a/arch/arm64/kernel/machine_kexec.c >>> b/arch/arm64/kernel/machine_kexec.c >>> index 363a246..7de11ee 100644 >>> --- a/arch/arm64/kernel/machine_kexec.c >>> +++ b/arch/arm64/kernel/machine_kexec.c >>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , >>> unsigned int flag, >>> break; >>> case IND_SOURCE: >>> __flush_dcache_area(addr, PAGE_SIZE); >>> - __flush_dcache_area(dest, PAGE_SIZE); >>> break; >>> default: >>> break; >>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image) >>> phys_addr_t reboot_code_buffer_phys; >>> void *reboot_code_buffer; >>> struct kexec_ctx *ctx = kexec_image_to_ctx(image); >>> + unsigned long start, end; >>> + int i; >>> >>> BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE); >>> BUG_ON(num_online_cpus() > 1); >>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image) >>> >>> kexec_list_walk(NULL, image->head, kexec_list_flush_cb); >>> >>> + start = image->segment[0].mem; >>> + end = image->segment[0].mem + image->segment[0].memsz; >>> + for (i = 0; i < image->nr_segments; i++) { >>> + if (image->segment[i].mem > end) >>> + end = image->segment[i].mem + image->segment[i].memsz; >>> + } >>> + >>> + start = (unsigned long)phys_to_virt(start); >>> + end = (unsigned long)phys_to_virt(end); >>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start); >>> + __flush_dcache_area((void *)start, end - start); >>> + //flush_icache_range(start, end); >>> + //mdelay(10); >>> + >>> soft_restart(reboot_code_buffer_phys); >>> } >> >> Doing the flush in kexec_list_flush_cb() is almost the same >> as using the image->segment to flush. Did you see a >> difference on your system? >> > > Yes I can see the difference. Let me explain it in detail. > > I am doing a stress test of "kexec -e" with the below reboot > script. > > ################################ > #!/bin/sh > > sleep 5 > i=$RANDOM > j=$(( $i % 2)) > > mount /dev/mmcblk0p1 /mnt > count=`cat /mnt/cnt` > > if [ $j -eq 0 ] ; then > echo "KEXEC rebootng to BE count = $count" > echo $RANDOM > /mnt/"$count""_BE" > kexec -l /mnt/vmlinux_BE.strip > --command-line="console=ttyS0,115200 earlyprintk=uart8 > 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M" > else > echo "KEXEC rebooting to LE count = $count" > echo $RANDOM > /mnt/"$count""_LE" > kexec -l /mnt/vmlinux_LE.strip > --command-line="console=ttyS0,115200 earlyprintk=uart8 > 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M" > fi > > count=$(( $count + 1 )) > echo "$count">/mnt/cnt > umount /mnt > kexec -e > exit $? > ############################### > > Observations with the default code > @https://git.linaro.org/people/geoff.levand/linux-kexec.git > Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)" > > a) LE to LE worked without L3 cache on > b) BE to BE worked without L3 cache on > c) Random endian switching does not work in any case (with L3, No L3) > It breaks very early and unstable. > > Now with the below modifications > I think the more cleaner approach is to invalidate the cache lines from arch/arm64/kernel/relocate_kernel.S As this code is already aware of the destination it has to copy the 2nd stage kernel. ######################## diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S index 4b077e1..6880c1a 100644 --- a/arch/arm64/kernel/relocate_kernel.S +++ b/arch/arm64/kernel/relocate_kernel.S @@ -31,6 +31,13 @@ .align 3 +.macro dcache_line_size, reg, tmp +mrs \tmp, ctr_el0 // read CTR +ubfm \tmp, \tmp, #16, #19 // cache line size encoding +mov \reg, #4 // bytes per word +lsl \reg, \reg, \tmp // actual cache line size +.endm + .globl relocate_new_kernel relocate_new_kernel: @@ -58,23 +65,46 @@ relocate_new_kernel: /* source: copy_page(x20 = dest, x21 = addr) */ + mov x0, x13 + add x1, x13, #PAGE_SIZE + + /* Invalidate the destination cache area */ +__inval_cache_range: + dcache_line_size x2, x3 + sub x3, x2, #1 + tst x1, x3 // end cache line aligned? + bic x1, x1, x3 + b.eq 1f + dc civac, x1 // clean & invalidate D / U line +1: tst x0, x3 // start cache line aligned? + bic x0, x0, x3 + b.eq 2f + dc civac, x0 // clean & invalidate D / U line + b 3f +2: dc ivac, x0 // invalidate D / U line +3: add x0, x0, x2 + cmp x0, x1 + b.lo 2b + dsb sy + mov x20, x13 mov x21, x14 - prfm pldl1strm, [x21, #64] -1: ldp x22, x23, [x21] + /*prfm pldl1strm, [x21, #64] */ +.Lcopy_data: + ldp x22, x23, [x21] ldp x24, x25, [x21, #16] ldp x26, x27, [x21, #32] ldp x28, x29, [x21, #48] add x21, x21, #64 - prfm pldl1strm, [x21, #64] + /*prfm pldl1strm, [x21, #64]*/ stnp x22, x23, [x20] stnp x24, x25, [x20, #16] stnp x26, x27, [x20, #32] stnp x28, x29, [x20, #48] add x20, x20, #64 tst x21, #(PAGE_SIZE - 1) - b.ne 1b + b.ne .Lcopy_data /* dest += PAGE_SIZE */ @@ -115,6 +145,8 @@ relocate_new_kernel: mov x3, xzr ldr x4, kexec_kimage_start + dsb sy + isb br x4 .align 3 diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index f1619c0..c62cba7 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -52,6 +52,13 @@ */ ENTRY(cpu_cache_off) mrs x0, sctlr_el1 + /* Turn off I-Cache */ + bic x0, x0, #1 << 12 // clear SCTLR.C + msr sctlr_el1, x0 + isb + dsb sy + + mrs x0, sctlr_el1 bic x0, x0, #1 << 2 // clear SCTLR.C msr sctlr_el1, x0 isb ############################# --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-08 10:03 ` Arun Chandran @ 2014-08-12 5:42 ` Arun Chandran 2014-08-13 11:09 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-08-12 5:42 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, Mark, Sorry for top posting. I hope we have solved almost all the problems with kexec in uni-processor scenario except converting soft_restart() to assembly (I will give try to do this). kexec is stress tested with L3 cache on with the below changes. It ran for 17 hours and rebooted totally 8226 times without any problem. Total LE boots - 4122 Total BE boots - 4104 Total LE to BE or BE to LE switching - 4112 ########################## diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 363a246..5b15a00 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag, break; case IND_SOURCE: __flush_dcache_area(addr, PAGE_SIZE); - __flush_dcache_area(dest, PAGE_SIZE); break; default: break; diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S index 4b077e1..e890516 100644 --- a/arch/arm64/kernel/relocate_kernel.S +++ b/arch/arm64/kernel/relocate_kernel.S @@ -31,6 +31,13 @@ .align 3 +.macro dcache_line_size, reg, tmp +mrs \tmp, ctr_el0 // read CTR +ubfm \tmp, \tmp, #16, #19 // cache line size encoding +mov \reg, #4 // bytes per word +lsl \reg, \reg, \tmp // actual cache line size +.endm + .globl relocate_new_kernel relocate_new_kernel: @@ -56,25 +63,51 @@ relocate_new_kernel: .Ltest_source: tbz x10, IND_SOURCE_BIT, .Ltest_indirection - /* source: copy_page(x20 = dest, x21 = addr) */ + mov x0, x13 + add x1, x13, #PAGE_SIZE + + /* Invalidate the destination cache area to make sure that + * all the data required for the second stage kernel is + * intact at PoC. This is the safest place to do this activity + * as we are running with MMU and D-cache off. + */ +__inval_cache_range: + dcache_line_size x2, x3 + sub x3, x2, #1 + tst x1, x3 // end cache line aligned? + bic x1, x1, x3 + b.eq 1f + dc ivac, x1 // invalidate D / U line +1: tst x0, x3 // start cache line aligned? + bic x0, x0, x3 + b.eq 2f + dc ivac, x0 // invalidate D / U line + b 3f +2: dc ivac, x0 // invalidate D / U line +3: add x0, x0, x2 + cmp x0, x1 + b.lo 2b + dsb sy + /* source: copy_page(x20 = dest, x21 = addr) */ mov x20, x13 mov x21, x14 - prfm pldl1strm, [x21, #64] -1: ldp x22, x23, [x21] + /*prfm pldl1strm, [x21, #64] */ +.Lcopy_data: + ldp x22, x23, [x21] ldp x24, x25, [x21, #16] ldp x26, x27, [x21, #32] ldp x28, x29, [x21, #48] add x21, x21, #64 - prfm pldl1strm, [x21, #64] + /*prfm pldl1strm, [x21, #64]*/ stnp x22, x23, [x20] stnp x24, x25, [x20, #16] stnp x26, x27, [x20, #32] stnp x28, x29, [x20, #48] add x20, x20, #64 tst x21, #(PAGE_SIZE - 1) - b.ne 1b + b.ne .Lcopy_data /* dest += PAGE_SIZE */ @@ -115,6 +148,11 @@ relocate_new_kernel: mov x3, xzr ldr x4, kexec_kimage_start + + /* Clean entire I-cache */ + ic ialluis + isb + dsb sy br x4 .align 3 ################################ I have attached the patch for this. Now I will try kexec in SMP configuration. --Arun On Fri, Aug 8, 2014 at 3:33 PM, Arun Chandran <achandran@mvista.com> wrote: > On Fri, Aug 8, 2014 at 11:16 AM, Arun Chandran <achandran@mvista.com> wrote: >> Hi, >> >> On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote: >>> Hi Arun, >>> >>> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote: >>> >>>> I have managed to run this test till 72 times with the >>>> below changes. >>>> >>>> ############################ >>>> diff --git a/arch/arm64/kernel/machine_kexec.c >>>> b/arch/arm64/kernel/machine_kexec.c >>>> index 363a246..7de11ee 100644 >>>> --- a/arch/arm64/kernel/machine_kexec.c >>>> +++ b/arch/arm64/kernel/machine_kexec.c >>>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx , >>>> unsigned int flag, >>>> break; >>>> case IND_SOURCE: >>>> __flush_dcache_area(addr, PAGE_SIZE); >>>> - __flush_dcache_area(dest, PAGE_SIZE); >>>> break; >>>> default: >>>> break; >>>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image) >>>> phys_addr_t reboot_code_buffer_phys; >>>> void *reboot_code_buffer; >>>> struct kexec_ctx *ctx = kexec_image_to_ctx(image); >>>> + unsigned long start, end; >>>> + int i; >>>> >>>> BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE); >>>> BUG_ON(num_online_cpus() > 1); >>>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image) >>>> >>>> kexec_list_walk(NULL, image->head, kexec_list_flush_cb); >>>> >>>> + start = image->segment[0].mem; >>>> + end = image->segment[0].mem + image->segment[0].memsz; >>>> + for (i = 0; i < image->nr_segments; i++) { >>>> + if (image->segment[i].mem > end) >>>> + end = image->segment[i].mem + image->segment[i].memsz; >>>> + } >>>> + >>>> + start = (unsigned long)phys_to_virt(start); >>>> + end = (unsigned long)phys_to_virt(end); >>>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start); >>>> + __flush_dcache_area((void *)start, end - start); >>>> + //flush_icache_range(start, end); >>>> + //mdelay(10); >>>> + >>>> soft_restart(reboot_code_buffer_phys); >>>> } >>> >>> Doing the flush in kexec_list_flush_cb() is almost the same >>> as using the image->segment to flush. Did you see a >>> difference on your system? >>> >> >> Yes I can see the difference. Let me explain it in detail. >> >> I am doing a stress test of "kexec -e" with the below reboot >> script. >> >> ################################ >> #!/bin/sh >> >> sleep 5 >> i=$RANDOM >> j=$(( $i % 2)) >> >> mount /dev/mmcblk0p1 /mnt >> count=`cat /mnt/cnt` >> >> if [ $j -eq 0 ] ; then >> echo "KEXEC rebootng to BE count = $count" >> echo $RANDOM > /mnt/"$count""_BE" >> kexec -l /mnt/vmlinux_BE.strip >> --command-line="console=ttyS0,115200 earlyprintk=uart8 >> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M" >> else >> echo "KEXEC rebooting to LE count = $count" >> echo $RANDOM > /mnt/"$count""_LE" >> kexec -l /mnt/vmlinux_LE.strip >> --command-line="console=ttyS0,115200 earlyprintk=uart8 >> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M" >> fi >> >> count=$(( $count + 1 )) >> echo "$count">/mnt/cnt >> umount /mnt >> kexec -e >> exit $? >> ############################### >> >> Observations with the default code >> @https://git.linaro.org/people/geoff.levand/linux-kexec.git >> Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)" >> >> a) LE to LE worked without L3 cache on >> b) BE to BE worked without L3 cache on >> c) Random endian switching does not work in any case (with L3, No L3) >> It breaks very early and unstable. >> >> Now with the below modifications >> > > I think the more cleaner approach is to invalidate > the cache lines from arch/arm64/kernel/relocate_kernel.S > As this code is already aware of the destination it has > to copy the 2nd stage kernel. > > > ######################## > diff --git a/arch/arm64/kernel/relocate_kernel.S > b/arch/arm64/kernel/relocate_kernel.S > index 4b077e1..6880c1a 100644 > --- a/arch/arm64/kernel/relocate_kernel.S > +++ b/arch/arm64/kernel/relocate_kernel.S > @@ -31,6 +31,13 @@ > > .align 3 > > +.macro dcache_line_size, reg, tmp > +mrs \tmp, ctr_el0 // read CTR > +ubfm \tmp, \tmp, #16, #19 // cache line size encoding > +mov \reg, #4 // bytes per word > +lsl \reg, \reg, \tmp // actual cache line size > +.endm > + > .globl relocate_new_kernel > relocate_new_kernel: > > @@ -58,23 +65,46 @@ relocate_new_kernel: > > /* source: copy_page(x20 = dest, x21 = addr) */ > > + mov x0, x13 > + add x1, x13, #PAGE_SIZE > + > + /* Invalidate the destination cache area */ > +__inval_cache_range: > + dcache_line_size x2, x3 > + sub x3, x2, #1 > + tst x1, x3 // end cache line aligned? > + bic x1, x1, x3 > + b.eq 1f > + dc civac, x1 // clean & invalidate D / U line > +1: tst x0, x3 // start cache line aligned? > + bic x0, x0, x3 > + b.eq 2f > + dc civac, x0 // clean & invalidate D / U line > + b 3f > +2: dc ivac, x0 // invalidate D / U line > +3: add x0, x0, x2 > + cmp x0, x1 > + b.lo 2b > + dsb sy > + > mov x20, x13 > mov x21, x14 > > - prfm pldl1strm, [x21, #64] > -1: ldp x22, x23, [x21] > + /*prfm pldl1strm, [x21, #64] */ > +.Lcopy_data: > + ldp x22, x23, [x21] > ldp x24, x25, [x21, #16] > ldp x26, x27, [x21, #32] > ldp x28, x29, [x21, #48] > add x21, x21, #64 > - prfm pldl1strm, [x21, #64] > + /*prfm pldl1strm, [x21, #64]*/ > stnp x22, x23, [x20] > stnp x24, x25, [x20, #16] > stnp x26, x27, [x20, #32] > stnp x28, x29, [x20, #48] > add x20, x20, #64 > tst x21, #(PAGE_SIZE - 1) > - b.ne 1b > + b.ne .Lcopy_data > > /* dest += PAGE_SIZE */ > > @@ -115,6 +145,8 @@ relocate_new_kernel: > mov x3, xzr > > ldr x4, kexec_kimage_start > + dsb sy > + isb > br x4 > > .align 3 > diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S > index f1619c0..c62cba7 100644 > --- a/arch/arm64/mm/proc.S > +++ b/arch/arm64/mm/proc.S > @@ -52,6 +52,13 @@ > */ > ENTRY(cpu_cache_off) > mrs x0, sctlr_el1 > + /* Turn off I-Cache */ > + bic x0, x0, #1 << 12 // clear SCTLR.C > + msr sctlr_el1, x0 > + isb > + dsb sy > + > + mrs x0, sctlr_el1 > bic x0, x0, #1 << 2 // clear SCTLR.C > msr sctlr_el1, x0 > isb > > ############################# > > --Arun -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-LE-BE-switching-worked-with-L3.patch Type: text/x-patch Size: 2946 bytes Desc: not available URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140812/1d732f96/attachment.bin> ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-12 5:42 ` Arun Chandran @ 2014-08-13 11:09 ` Arun Chandran 2014-08-26 22:32 ` Geoff Levand 0 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-08-13 11:09 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, On Tue, Aug 12, 2014 at 11:12 AM, Arun Chandran <achandran@mvista.com> wrote: > Hi Geoff, Mark, > > > Sorry for top posting. I hope we have solved almost all the problems > with kexec in uni-processor scenario except converting soft_restart() > to assembly (I will give try to do this). > I have one more concern regarding flushing of D-cache area corresponding to the kexec_list entrees. Currently kexec_list_walk() is doing 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE 2) continue accessing entries in kexec_list[0] to PAGE_SIZE 3) switch to next kexec_list depending upon kexec_list[entry] & flag == IND_INDIRECTION 4) goto 1 Shouldn't that be doing flush_dcache_area() after completely using the list?? Like given below? ###################### diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 5b15a00..ffb3b54 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -155,6 +155,7 @@ static void kexec_list_walk(void *ctx, unsigned long kimage_head, { void *dest; unsigned long *entry; + void *last_accessed_dir = NULL; for (entry = &kimage_head, dest = NULL; ; entry++) { unsigned int flag = *entry & IND_FLAGS; @@ -163,7 +164,10 @@ static void kexec_list_walk(void *ctx, unsigned long kimage_head, switch (flag) { case IND_INDIRECTION: entry = (unsigned long *)addr - 1; - cb(ctx, flag, addr, NULL); + if (last_accessed_dir != addr) { + cb(ctx, flag, last_accessed_dir, NULL); + last_accessed_dir = addr; + } break; case IND_DESTINATION: dest = addr; @@ -174,7 +178,7 @@ static void kexec_list_walk(void *ctx, unsigned long kimage_head, dest += PAGE_SIZE; break; case IND_DONE: - cb(ctx, flag , NULL, NULL); + cb(ctx, flag , last_accessed_dir, NULL); return; default: pr_devel("%s:%d unknown flag %xh\n", __func__, __LINE__, @@ -617,6 +621,9 @@ on_exit: static void kexec_list_flush_cb(void *ctx , unsigned int flag, void *addr, void *dest) { + if (addr == NULL) + return; + switch (flag) { case IND_INDIRECTION: __flush_dcache_area(addr, PAGE_SIZE); @@ -624,6 +631,9 @@ static void kexec_list_flush_cb(void *ctx , unsigned int flag, case IND_SOURCE: __flush_dcache_area(addr, PAGE_SIZE); break; + case IND_DONE: + __flush_dcache_area(addr, PAGE_SIZE); + break; default: break; } ######################## --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-13 11:09 ` Arun Chandran @ 2014-08-26 22:32 ` Geoff Levand 2014-08-27 4:56 ` Arun Chandran 0 siblings, 1 reply; 61+ messages in thread From: Geoff Levand @ 2014-08-26 22:32 UTC (permalink / raw) To: linux-arm-kernel Hi Arun, On Wed, 2014-08-13 at 16:39 +0530, Arun Chandran wrote: > I have one more concern regarding flushing of D-cache area corresponding > to the kexec_list entrees. > > Currently kexec_list_walk() is doing > > 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE > > 2) continue accessing entries in kexec_list[0] to PAGE_SIZE > > 3) switch to next kexec_list depending upon kexec_list[entry] & flag > == IND_INDIRECTION > > 4) goto 1 > > Shouldn't that be doing flush_dcache_area() after completely using the list?? We just want to get any data in the dcache out to the PoC before disabling the dcache, so as long as there are only reads, and no writes to those addresses, kexec_list_walk() should work OK. I will move the flush of the new kernel image to after it is copied in relocate_new_kernel(). I think that your L3 cache may not work with what we have now: current: invalidate dcache -> turn off dcache -> write data to PoC proposed: turn off dcache -> write data to PoC -> invalidate dcache -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-08-26 22:32 ` Geoff Levand @ 2014-08-27 4:56 ` Arun Chandran 0 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-08-27 4:56 UTC (permalink / raw) To: linux-arm-kernel Hi Geoff, On Wed, Aug 27, 2014 at 4:02 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Wed, 2014-08-13 at 16:39 +0530, Arun Chandran wrote: >> I have one more concern regarding flushing of D-cache area corresponding >> to the kexec_list entrees. >> >> Currently kexec_list_walk() is doing >> >> 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE >> >> 2) continue accessing entries in kexec_list[0] to PAGE_SIZE >> >> 3) switch to next kexec_list depending upon kexec_list[entry] & flag >> == IND_INDIRECTION >> >> 4) goto 1 >> >> Shouldn't that be doing flush_dcache_area() after completely using the list?? > > We just want to get any data in the dcache out to the PoC before > disabling the dcache, so as long as there are only reads, and no writes > to those addresses, kexec_list_walk() should work OK. > Yes. I missed that point. If we don't perform any writes flushing works just fine. > I will move the flush of the new kernel image to after it is copied in > relocate_new_kernel(). I think that your L3 cache may not work with > what we have now: > > current: invalidate dcache -> turn off dcache -> write data to PoC > proposed: turn off dcache -> write data to PoC -> invalidate dcache > Yes this exactly what I have done here http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/278857.html ; that code should live inside relocate_new_kernel(). I am doing the cache invalidation(only invalidation) in relocate_new_kernel(). As we run that code after cache's are off(L3 only comes to picture when lower level caches are on) we are writing to data (new kernel) to PoC. --Arun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 13:35 ` Mark Rutland 2014-07-29 21:19 ` Geoff Levand @ 2014-07-30 5:46 ` Arun Chandran 2014-07-30 9:16 ` Mark Rutland 2014-07-30 7:01 ` Arun Chandran 2 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-30 5:46 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote: > [...] > >> The default code did not work. >> >> It is working with the change below >> >> ############### >> diff --git a/arch/arm64/kernel/machine_kexec.c >> b/arch/arm64/kernel/machine_kexec.c >> index 5632473..7c5f859 100644 >> --- a/arch/arm64/kernel/machine_kexec.c >> +++ b/arch/arm64/kernel/machine_kexec.c >> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb) >> /** >> * kexec_list_walk - Helper to walk the kimage page list. >> */ >> - >> +static int kexec_kernel_size; >> +#define IMG_SIZE_NONE 0 >> +#define KERN_SIZE_FLAG 1 >> +#define DTB_SIZE_FLAG 2 >> static void kexec_list_walk(void *ctx, unsigned long kimage_head, >> void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest)) >> { >> void *dest; >> unsigned long *entry; >> + int imgsize_flag = IMG_SIZE_NONE; >> + >> >> for (entry = &kimage_head, dest = NULL; ; entry++) { >> unsigned int flag = *entry & IND_FLAGS; >> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned >> long kimage_head, >> cb(ctx, flag, addr, NULL); >> break; >> case IND_DESTINATION: >> + if (imgsize_flag == IMG_SIZE_NONE) { >> + kexec_kernel_size = 0; >> + imgsize_flag = KERN_SIZE_FLAG; >> + } else if (imgsize_flag == KERN_SIZE_FLAG) { >> + imgsize_flag = DTB_SIZE_FLAG; >> + } >> dest = addr; >> cb(ctx, flag, addr, NULL); >> break; >> case IND_SOURCE: >> + if (imgsize_flag == KERN_SIZE_FLAG) >> + kexec_kernel_size++; >> cb(ctx, flag, addr, dest); >> dest += PAGE_SIZE; >> break; >> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image) >> >> kexec_list_walk(NULL, image->head, kexec_list_flush_cb); >> >> + /* >> + * Make sure virtual addresses of new kernel are flushed >> + * SZ_512K = TEXT_OFFSET > > TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also > shouldn't need to be flushed. > > Since c218bca74eea (arm64: Relax the kernel cache requirements for > boot), the kernel will flush the cache for anything outside of the Image > that it writes to before enabling the MMU and caches (e.g. the idmap and > swapper page tables). Once caches are up we shouldn't care. > Ok. TEXT_OFFSET macro is not exported, that's why I used SZ_512K. Any particular reason for not exporting it? > Assuming that the existing kernel code is correct, the only region we > should need to flush out to the PoC is the region from _text to _edata > (i.e. just the contents of the Image). > >> + * kexec_kernel = kexec_kernel_size * PAGE_SIZE >> + * Don't know = (SZ_4M + SZ_1M) >> + * SZ_4M = not working >> + * SZ_6M = working >> + * SZ_8M = working >> + * >> + * so chose SZ_4M + SZ_1M; Don't know why this is required >> + * BSS, stack ?? >> + * >> + */ >> + __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K + >> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M); >> + >> soft_restart(reboot_code_buffer_phys); >> } > > How big exactly is the kernel Image you're trying to kexec? For the 1st and second stage I use the same kernel as they are combined with intramfs final image size varies. 1st stage ------------- $ls -l arch/arm64/boot/uImage -rw-rw-r-- 1 arun arun 12895544 Jul 30 10:57 arch/arm64/boot/uImage It will boot to intramfs $ls / bin dtb.dtb home lib linuxrc mnt proc run share tmp var dev etc init lib64 media opt root sbin sys usr vmlinux.strip kexec will boot vmlinux.strip $ls -l vmlinux.strip -rwxrwxr-x 1 arun arun 8194760 Jul 30 10:55 vmlinux.strip 2nd stage -------------- $ls -l arch/arm64/boot/uImage -rw-rw-r-- 1 arun arun 8127800 Jul 30 10:54 arch/arm64/boot/uImage The corresponding vmlinux is converted to vmlinux.strip $aarch64-linux-gnu-strip vmlinux -o vmlinux.strip $cp vmlinux.strip /ramfs/aarch64_le_rootfs/ --Arun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-30 5:46 ` Arun Chandran @ 2014-07-30 9:16 ` Mark Rutland 0 siblings, 0 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-30 9:16 UTC (permalink / raw) To: linux-arm-kernel On Wed, Jul 30, 2014 at 06:46:39AM +0100, Arun Chandran wrote: > On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote: > > [...] > > > >> The default code did not work. > >> > >> It is working with the change below > >> > >> ############### > >> diff --git a/arch/arm64/kernel/machine_kexec.c > >> b/arch/arm64/kernel/machine_kexec.c > >> index 5632473..7c5f859 100644 > >> --- a/arch/arm64/kernel/machine_kexec.c > >> +++ b/arch/arm64/kernel/machine_kexec.c > >> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb) > >> /** > >> * kexec_list_walk - Helper to walk the kimage page list. > >> */ > >> - > >> +static int kexec_kernel_size; > >> +#define IMG_SIZE_NONE 0 > >> +#define KERN_SIZE_FLAG 1 > >> +#define DTB_SIZE_FLAG 2 > >> static void kexec_list_walk(void *ctx, unsigned long kimage_head, > >> void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest)) > >> { > >> void *dest; > >> unsigned long *entry; > >> + int imgsize_flag = IMG_SIZE_NONE; > >> + > >> > >> for (entry = &kimage_head, dest = NULL; ; entry++) { > >> unsigned int flag = *entry & IND_FLAGS; > >> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned > >> long kimage_head, > >> cb(ctx, flag, addr, NULL); > >> break; > >> case IND_DESTINATION: > >> + if (imgsize_flag == IMG_SIZE_NONE) { > >> + kexec_kernel_size = 0; > >> + imgsize_flag = KERN_SIZE_FLAG; > >> + } else if (imgsize_flag == KERN_SIZE_FLAG) { > >> + imgsize_flag = DTB_SIZE_FLAG; > >> + } > >> dest = addr; > >> cb(ctx, flag, addr, NULL); > >> break; > >> case IND_SOURCE: > >> + if (imgsize_flag == KERN_SIZE_FLAG) > >> + kexec_kernel_size++; > >> cb(ctx, flag, addr, dest); > >> dest += PAGE_SIZE; > >> break; > >> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image) > >> > >> kexec_list_walk(NULL, image->head, kexec_list_flush_cb); > >> > >> + /* > >> + * Make sure virtual addresses of new kernel are flushed > >> + * SZ_512K = TEXT_OFFSET > > > > TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also > > shouldn't need to be flushed. > > > > Since c218bca74eea (arm64: Relax the kernel cache requirements for > > boot), the kernel will flush the cache for anything outside of the Image > > that it writes to before enabling the MMU and caches (e.g. the idmap and > > swapper page tables). Once caches are up we shouldn't care. > > > > Ok. TEXT_OFFSET macro is not exported, that's why I used SZ_512K. > Any particular reason for not exporting it? The TEXT_OFFSET of the kernel you intend to execute can be found in its Image header. Take a look at Documentation/arm64/booting.txt. This may be different from the TEXT_OFFSET of the current kernel. As of v3.17 all fields in the header will be little-endian, and booting.txt has been updated for this. You can check the image_size field to determine whether the fields are all little-endian. Only when image_size is zero can TEXT_OFFSET be assumed to be 0x80000. > > > Assuming that the existing kernel code is correct, the only region we > > should need to flush out to the PoC is the region from _text to _edata > > (i.e. just the contents of the Image). > > > >> + * kexec_kernel = kexec_kernel_size * PAGE_SIZE > >> + * Don't know = (SZ_4M + SZ_1M) > >> + * SZ_4M = not working > >> + * SZ_6M = working > >> + * SZ_8M = working > >> + * > >> + * so chose SZ_4M + SZ_1M; Don't know why this is required > >> + * BSS, stack ?? > >> + * > >> + */ > >> + __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K + > >> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M); > >> + > >> soft_restart(reboot_code_buffer_phys); > >> } > > > > How big exactly is the kernel Image you're trying to kexec? > > For the 1st and second stage I use the same kernel as they > are combined with intramfs final image size varies. > > 1st stage > ------------- > $ls -l arch/arm64/boot/uImage > -rw-rw-r-- 1 arun arun 12895544 Jul 30 10:57 arch/arm64/boot/uImage > > It will boot to intramfs > > $ls / > bin dtb.dtb home lib linuxrc mnt proc run share tmp var > dev etc init lib64 media opt root sbin sys usr vmlinux.strip > > kexec will boot vmlinux.strip > $ls -l vmlinux.strip > -rwxrwxr-x 1 arun arun 8194760 Jul 30 10:55 vmlinux.strip > > 2nd stage > -------------- > $ls -l arch/arm64/boot/uImage > -rw-rw-r-- 1 arun arun 8127800 Jul 30 10:54 arch/arm64/boot/uImage > > The corresponding vmlinux is converted to vmlinux.strip > > $aarch64-linux-gnu-strip vmlinux -o vmlinux.strip > $cp vmlinux.strip /ramfs/aarch64_le_rootfs/ OK, thanks for the info. Cheers, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-29 13:35 ` Mark Rutland 2014-07-29 21:19 ` Geoff Levand 2014-07-30 5:46 ` Arun Chandran @ 2014-07-30 7:01 ` Arun Chandran 2 siblings, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-30 7:01 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote: > [...] > >> The default code did not work. >> >> It is working with the change below >> >> ############### >> diff --git a/arch/arm64/kernel/machine_kexec.c >> b/arch/arm64/kernel/machine_kexec.c >> index 5632473..7c5f859 100644 >> --- a/arch/arm64/kernel/machine_kexec.c >> +++ b/arch/arm64/kernel/machine_kexec.c >> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb) >> /** >> * kexec_list_walk - Helper to walk the kimage page list. >> */ >> - >> +static int kexec_kernel_size; >> +#define IMG_SIZE_NONE 0 >> +#define KERN_SIZE_FLAG 1 >> +#define DTB_SIZE_FLAG 2 >> static void kexec_list_walk(void *ctx, unsigned long kimage_head, >> void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest)) >> { >> void *dest; >> unsigned long *entry; >> + int imgsize_flag = IMG_SIZE_NONE; >> + >> >> for (entry = &kimage_head, dest = NULL; ; entry++) { >> unsigned int flag = *entry & IND_FLAGS; >> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned >> long kimage_head, >> cb(ctx, flag, addr, NULL); >> break; >> case IND_DESTINATION: >> + if (imgsize_flag == IMG_SIZE_NONE) { >> + kexec_kernel_size = 0; >> + imgsize_flag = KERN_SIZE_FLAG; >> + } else if (imgsize_flag == KERN_SIZE_FLAG) { >> + imgsize_flag = DTB_SIZE_FLAG; >> + } >> dest = addr; >> cb(ctx, flag, addr, NULL); >> break; >> case IND_SOURCE: >> + if (imgsize_flag == KERN_SIZE_FLAG) >> + kexec_kernel_size++; >> cb(ctx, flag, addr, dest); >> dest += PAGE_SIZE; >> break; >> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image) >> >> kexec_list_walk(NULL, image->head, kexec_list_flush_cb); >> >> + /* >> + * Make sure virtual addresses of new kernel are flushed >> + * SZ_512K = TEXT_OFFSET > > TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also > shouldn't need to be flushed. > > Since c218bca74eea (arm64: Relax the kernel cache requirements for > boot), the kernel will flush the cache for anything outside of the Image > that it writes to before enabling the MMU and caches (e.g. the idmap and > swapper page tables). Once caches are up we shouldn't care. > > Assuming that the existing kernel code is correct, the only region we > should need to flush out to the PoC is the region from _text to _edata > (i.e. just the contents of the Image). I think we missed the dtb part. dtb is placed below the kernel. We need to flush that also. Geoff's new code manages that also. It is now working for me. --Arun > >> + * kexec_kernel = kexec_kernel_size * PAGE_SIZE >> + * Don't know = (SZ_4M + SZ_1M) >> + * SZ_4M = not working >> + * SZ_6M = working >> + * SZ_8M = working >> + * >> + * so chose SZ_4M + SZ_1M; Don't know why this is required >> + * BSS, stack ?? >> + * >> + */ >> + __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K + >> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M); >> + >> soft_restart(reboot_code_buffer_phys); >> } > > How big exactly is the kernel Image you're trying to kexec? > > Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-24 9:36 ` Mark Rutland 2014-07-24 12:49 ` Arun Chandran 2014-07-25 0:17 ` Geoff Levand @ 2014-07-25 10:26 ` Arun Chandran 2014-07-25 11:29 ` Mark Rutland 2 siblings, 1 reply; 61+ messages in thread From: Arun Chandran @ 2014-07-25 10:26 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote: > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: >> Hi Arun, >> >> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote: >> >> > I tried the same dtb with UP configuration. For UP kernel to compile >> > did the below modifications >> >> I'll test and fixup the kexec UP build in the next few days. >> >> ... >> >> > With the default target configuration "kexec -e" failed to execute >> > in UP scenario also. > > It would be helpful to know _how_ it failed. Do you have any log output? > >> > >> > But I had some luck when I did the same steps with L3 cache >> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html >> > it has an L3 cache. Luckily I was able to disable it in u-boot. >> > >> > With the L3 cache disabled configuration I am able to >> > do "kexec -e". Please see the log attached. > > Hmm. We don't expect the kernel to do any L3 management. It seems that > memory subsystems with L3 caches respecting cache maintenance by VA are > going to become relatively common, and we expect to handle them all by > performing maintenance by VA. See commit c218bca74eea (arm64: Relax the > kernel cache requirements for boot) for what we do at boot time. > >> >> All memory management for the main cpu is done by the arch code. Kexec >> and cpu hot plug only work with the secondary cpus, so the problem would >> be in the arch memory code, either in setup_restart() for shutdown, or >> in the startup code. > > It's possible that soft_restart and setup_restart are a little dodgy, as > they also rely on the compiler being smart and not touching the stack > after setup_restart(). > Could you please explain why this is required? This is my disassembled output of soft_restart() With the latest code from https://git.linaro.org/people/geoff.levand/linux-kexec.git ffffffc000085014 <soft_restart>: ffffffc000085014: a9be7bfd stp x29, x30, [sp,#-32]! ffffffc000085018: 910003fd mov x29, sp ffffffc00008501c: f9000fa0 str x0, [x29,#24] ffffffc000085020: 94003c49 bl ffffffc000094144 <setup_mm_for_reboot> ffffffc000085024: 94003a6b bl ffffffc0000939d0 <flush_cache_all> ffffffc000085028: 94003cde bl ffffffc0000943a0 <cpu_cache_off> ffffffc00008502c: 94003a69 bl ffffffc0000939d0 <flush_cache_all> ffffffc000085030: 90006201 adrp x1, ffffffc000cc5000 <reset_devices> ffffffc000085034: f9400fa0 ldr x0, [x29,#24] ffffffc000085038: f940fc22 ldr x2, [x1,#504] ffffffc00008503c: f0000061 adrp x1, ffffffc000094000 <arch_pick_mmap_layout+0x150> ffffffc000085040: 910f0021 add x1, x1, #0x3c0 ffffffc000085044: 8b010041 add x1, x2, x1 ffffffc000085048: d2c00802 mov x2, #0x4000000000 // #274877906944 ffffffc00008504c: 8b020021 add x1, x1, x2 ffffffc000085050: d63f0020 blr x1 ffffffc000085054: f0002940 adrp x0, ffffffc0005b0000 <kallsyms_token_table+0x200> ffffffc000085058: f0002941 adrp x1, ffffffc0005b0000 <kallsyms_token_table+0x200> ffffffc00008505c: 90002143 adrp x3, ffffffc0004ad000 <__start_rodata> ffffffc000085060: 91128000 add x0, x0, #0x4a0 ffffffc000085064: 913de021 add x1, x1, #0xf78 ffffffc000085068: 52800c22 mov w2, #0x61 // #97 ffffffc00008506c: 91072063 add x3, x3, #0x1c8 ffffffc000085070: 941071d0 bl ffffffc0004a17b0 <printk> ffffffc000085074: f0002940 adrp x0, ffffffc0005b0000 <kallsyms_token_table+0x200> ffffffc000085078: 91134000 add x0, x0, #0x4d0 ffffffc00008507c: 9410712c bl ffffffc0004a152c <panic> If I single step the code, This is how my stack looks like @ffffffc00008501c CPU#0>mdd 0xffffffc3eb83fcf0 ffffffc3_eb83fcf0 : ffffffc3eb83fd10 ........ ffffffc3_eb83fcf8 : ffffffc000092778 ......'x ffffffc3_eb83fd00 : ffffffc000cc9f70 .......p ffffffc3_eb83fd08 : 00000043eae32000 ...C.. . ffffffc3_eb83fd10 : ffffffc3eb83fd70 .......p ffffffc3_eb83fd18 : ffffffc0000fc018 ........ ffffffc3_eb83fd20 : ffffffc000c95000 ......P. ffffffc3_eb83fd28 : 0000000000000000 ........ ffffffc3_eb83fd30 : ffffffc000cd06a0 ........ ffffffc3_eb83fd38 : 0000000000000000 ........ ffffffc3_eb83fd40 : 0000000080000000 ........ ffffffc3_eb83fd48 : 0000000000000015 ........ ffffffc3_eb83fd50 : 0000000000000115 ........ ffffffc3_eb83fd58 : 000000000000008e ........ ffffffc3_eb83fd60 : ffffffc000c8b000 ........ ffffffc3_eb83fd68 : ffffffc3eb83c000 ........ And this is how it looks like @ffffffc000085030 CPU#0>mdd 0xffffffc3eb83fcf0 ffffffc3_eb83fcf0 : 0000000000000115 ........ ffffffc3_eb83fcf8 : 000000000000003f .......? ffffffc3_eb83fd00 : ffffffc3eb83fd30 .......0 ffffffc3_eb83fd08 : ffffffc000120360 .......` ffffffc3_eb83fd10 : 0000000000000002 ........ ffffffc3_eb83fd18 : ffffffbcedb611c0 ........ ffffffc3_eb83fd20 : ffffffbcedb611c0 ........ ffffffc3_eb83fd28 : ffffffc3eae08000 ........ ffffffc3_eb83fd30 : ffffffc3000200d0 ........ ffffffc3_eb83fd38 : ffffffc000120708 ........ ffffffc3_eb83fd40 : 0000000000000000 ........ ffffffc3_eb83fd48 : 72a00040528f8fe0 r.. at R... ffffffc3_eb83fd50 : 540002a16a00003f T...j..? ffffffc3_eb83fd58 : 3627fe60f8538260 6'.`.S.` ffffffc3_eb83fd60 : 97fff80daa1303e0 ........ ffffffc3_eb83fd68 : 97fc01ffaa1403e0 ........ It is clearly getting corrupted. Now with keeping caches on ###### diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 786daa6..6ff3d9f 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -76,10 +76,10 @@ static void setup_restart(void) flush_cache_all(); /* Turn D-cache off */ - cpu_cache_off(); + //cpu_cache_off(); /* Push out any further dirty data, and ensure cache is empty */ - flush_cache_all(); + //flush_cache_all(); } void soft_restart(unsigned long addr) ####### ffffffc000085014 <soft_restart>: ffffffc000085014: a9be7bfd stp x29, x30, [sp,#-32]! ffffffc000085018: 910003fd mov x29, sp ffffffc00008501c: f9000fa0 str x0, [x29,#24] ffffffc000085020: 94003c49 bl ffffffc000094144 <setup_mm_for_reboot> ffffffc000085024: 94003a6b bl ffffffc0000939d0 <flush_cache_all> ffffffc000085028: 90006201 adrp x1, ffffffc000cc5000 <reset_devices> ffffffc00008502c: f9400fa0 ldr x0, [x29,#24] ffffffc000085030: f940fc22 ldr x2, [x1,#504] ffffffc000085034: f0000061 adrp x1, ffffffc000094000 <arch_pick_mmap_layout+0x150> ffffffc000085038: 910f0021 add x1, x1, #0x3c0 ffffffc00008503c: 8b010041 add x1, x2, x1 ffffffc000085040: d2c00802 mov x2, #0x4000000000 // #274877906944 ffffffc000085044: 8b020021 add x1, x1, x2 ffffffc000085048: d63f0020 blr x1 ffffffc00008504c: f0002940 adrp x0, ffffffc0005b0000 <kallsyms_token_table+0x200> ffffffc000085050: f0002941 adrp x1, ffffffc0005b0000 <kallsyms_token_table+0x200> ffffffc000085054: 90002143 adrp x3, ffffffc0004ad000 <__start_rodata> ffffffc000085058: 91128000 add x0, x0, #0x4a0 ffffffc00008505c: 913de021 add x1, x1, #0xf78 ffffffc000085060: 52800c22 mov w2, #0x61 // #97 ffffffc000085064: 91072063 add x3, x3, #0x1c8 ffffffc000085068: 941071d2 bl ffffffc0004a17b0 <printk> ffffffc00008506c: f0002940 adrp x0, ffffffc0005b0000 <kallsyms_token_table+0x200> ffffffc000085070: 91134000 add x0, x0, #0x4d0 ffffffc000085074: 9410712e bl ffffffc0004a152c <panic> Now my stack @ffffffc00008501c and @ffffffc000085028 are same. It is CPU#0>mdd 0xffffffc3eae27cf0 ffffffc3_eae27cf0 : ffffffc3eae27d10 ......}. ffffffc3_eae27cf8 : ffffffc000092778 ......'x ffffffc3_eae27d00 : ffffffc000cc9f70 .......p ffffffc3_eae27d08 : 00000043f0171000 ...C.... ffffffc3_eae27d10 : ffffffc3eae27d70 ......}p ffffffc3_eae27d18 : ffffffc0000fc018 ........ ffffffc3_eae27d20 : ffffffc000c95000 ......P. ffffffc3_eae27d28 : 0000000000000000 ........ ffffffc3_eae27d30 : ffffffc000cd06a0 ........ ffffffc3_eae27d38 : 0000000000000000 ........ ffffffc3_eae27d40 : 0000000080000000 ........ ffffffc3_eae27d48 : 0000000000000015 ........ ffffffc3_eae27d50 : 0000000000000115 ........ ffffffc3_eae27d58 : 000000000000008e ........ ffffffc3_eae27d60 : ffffffc000c8b000 ........ ffffffc3_eae27d68 : ffffffc3eae24000 ...... at . --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-25 10:26 ` Arun Chandran @ 2014-07-25 11:29 ` Mark Rutland 0 siblings, 0 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-25 11:29 UTC (permalink / raw) To: linux-arm-kernel On Fri, Jul 25, 2014 at 11:26:46AM +0100, Arun Chandran wrote: > On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote: > > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote: > >> Hi Arun, > >> > >> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote: > >> > >> > I tried the same dtb with UP configuration. For UP kernel to compile > >> > did the below modifications > >> > >> I'll test and fixup the kexec UP build in the next few days. > >> > >> ... > >> > >> > With the default target configuration "kexec -e" failed to execute > >> > in UP scenario also. > > > > It would be helpful to know _how_ it failed. Do you have any log output? > > > >> > > >> > But I had some luck when I did the same steps with L3 cache > >> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html > >> > it has an L3 cache. Luckily I was able to disable it in u-boot. > >> > > >> > With the L3 cache disabled configuration I am able to > >> > do "kexec -e". Please see the log attached. > > > > Hmm. We don't expect the kernel to do any L3 management. It seems that > > memory subsystems with L3 caches respecting cache maintenance by VA are > > going to become relatively common, and we expect to handle them all by > > performing maintenance by VA. See commit c218bca74eea (arm64: Relax the > > kernel cache requirements for boot) for what we do at boot time. > > > >> > >> All memory management for the main cpu is done by the arch code. Kexec > >> and cpu hot plug only work with the secondary cpus, so the problem would > >> be in the arch memory code, either in setup_restart() for shutdown, or > >> in the startup code. > > > > It's possible that soft_restart and setup_restart are a little dodgy, as > > they also rely on the compiler being smart and not touching the stack > > after setup_restart(). > > > Could you please explain why this is required? It's a requirement because we have no guarantee that the stack (or any other memory addresses) will be flushed out to the PoC and become visible to non-cacheable accesses. Any use of the stack after we disable the d-cache is a bug. If you need a VA range visible to non-cacheable accesses (i.e. after the cache is disabled), you must flush that range to the PoC by VA. The kernel text _should_ be out at the PoC per the boot protocol requirements, so we don't have to flush it to execute correctly unless we've performed some kernel text patching. Arguably the first __flush_dcache_all call here is unnecessary; it doesn't provide any guarantee we can rely on. The __flush_dcache_all _after_ disabling the caches will ensure that the local caches are empty (avoiding unexpected hits for non-cacheable accesses). > > This is my disassembled output of soft_restart() > With the latest code from > https://git.linaro.org/people/geoff.levand/linux-kexec.git > > ffffffc000085014 <soft_restart>: > ffffffc000085014: a9be7bfd stp x29, x30, [sp,#-32]! > ffffffc000085018: 910003fd mov x29, sp > ffffffc00008501c: f9000fa0 str x0, [x29,#24] > ffffffc000085020: 94003c49 bl ffffffc000094144 > <setup_mm_for_reboot> > ffffffc000085024: 94003a6b bl ffffffc0000939d0 > <flush_cache_all> > ffffffc000085028: 94003cde bl ffffffc0000943a0 <cpu_cache_off> > ffffffc00008502c: 94003a69 bl ffffffc0000939d0 > <flush_cache_all> > ffffffc000085030: 90006201 adrp x1, ffffffc000cc5000 > <reset_devices> > ffffffc000085034: f9400fa0 ldr x0, [x29,#24] > ffffffc000085038: f940fc22 ldr x2, [x1,#504] > ffffffc00008503c: f0000061 adrp x1, ffffffc000094000 > <arch_pick_mmap_layout+0x150> > ffffffc000085040: 910f0021 add x1, x1, #0x3c0 > ffffffc000085044: 8b010041 add x1, x2, x1 > ffffffc000085048: d2c00802 mov x2, #0x4000000000 > // #274877906944 > ffffffc00008504c: 8b020021 add x1, x1, x2 > ffffffc000085050: d63f0020 blr x1 > ffffffc000085054: f0002940 adrp x0, ffffffc0005b0000 > <kallsyms_token_table+0x200> > ffffffc000085058: f0002941 adrp x1, ffffffc0005b0000 > <kallsyms_token_table+0x200> > ffffffc00008505c: 90002143 adrp x3, ffffffc0004ad000 > <__start_rodata> > ffffffc000085060: 91128000 add x0, x0, #0x4a0 > ffffffc000085064: 913de021 add x1, x1, #0xf78 > ffffffc000085068: 52800c22 mov w2, #0x61 > // #97 > ffffffc00008506c: 91072063 add x3, x3, #0x1c8 > ffffffc000085070: 941071d0 bl ffffffc0004a17b0 <printk> > ffffffc000085074: f0002940 adrp x0, ffffffc0005b0000 > <kallsyms_token_table+0x200> > ffffffc000085078: 91134000 add x0, x0, #0x4d0 > ffffffc00008507c: 9410712c bl ffffffc0004a152c <panic> > > If I single step the code, > > This is how my stack looks like @ffffffc00008501c > CPU#0>mdd 0xffffffc3eb83fcf0 > ffffffc3_eb83fcf0 : ffffffc3eb83fd10 ........ > ffffffc3_eb83fcf8 : ffffffc000092778 ......'x > ffffffc3_eb83fd00 : ffffffc000cc9f70 .......p > ffffffc3_eb83fd08 : 00000043eae32000 ...C.. . > ffffffc3_eb83fd10 : ffffffc3eb83fd70 .......p > ffffffc3_eb83fd18 : ffffffc0000fc018 ........ > ffffffc3_eb83fd20 : ffffffc000c95000 ......P. > ffffffc3_eb83fd28 : 0000000000000000 ........ > ffffffc3_eb83fd30 : ffffffc000cd06a0 ........ > ffffffc3_eb83fd38 : 0000000000000000 ........ > ffffffc3_eb83fd40 : 0000000080000000 ........ > ffffffc3_eb83fd48 : 0000000000000015 ........ > ffffffc3_eb83fd50 : 0000000000000115 ........ > ffffffc3_eb83fd58 : 000000000000008e ........ > ffffffc3_eb83fd60 : ffffffc000c8b000 ........ > ffffffc3_eb83fd68 : ffffffc3eb83c000 ........ > > And this is how it looks like @ffffffc000085030 > CPU#0>mdd 0xffffffc3eb83fcf0 > ffffffc3_eb83fcf0 : 0000000000000115 ........ > ffffffc3_eb83fcf8 : 000000000000003f .......? > ffffffc3_eb83fd00 : ffffffc3eb83fd30 .......0 > ffffffc3_eb83fd08 : ffffffc000120360 .......` > ffffffc3_eb83fd10 : 0000000000000002 ........ > ffffffc3_eb83fd18 : ffffffbcedb611c0 ........ > ffffffc3_eb83fd20 : ffffffbcedb611c0 ........ > ffffffc3_eb83fd28 : ffffffc3eae08000 ........ > ffffffc3_eb83fd30 : ffffffc3000200d0 ........ > ffffffc3_eb83fd38 : ffffffc000120708 ........ > ffffffc3_eb83fd40 : 0000000000000000 ........ > ffffffc3_eb83fd48 : 72a00040528f8fe0 r.. at R... > ffffffc3_eb83fd50 : 540002a16a00003f T...j..? > ffffffc3_eb83fd58 : 3627fe60f8538260 6'.`.S.` > ffffffc3_eb83fd60 : 97fff80daa1303e0 ........ > ffffffc3_eb83fd68 : 97fc01ffaa1403e0 ........ > > It is clearly getting corrupted. No it is not getting corrupted. At ffffffc00008501c the CPU's cache is enabled (i.e. the CPU can make cacheable accesses). The stack is normal cacheable (WBWA) memory, so accesses will look-up and allocate in the cache. As the stack is being accessed often it is warm, and the cache hierarchy does not decide to evict it and/or write-back to the next level of the cache hierarchy. At ffffffc000085030 the cache is disabled (i.e. the CPU cannot make cacheable accesses). The effective attributes for the stack are non-cacheable, so the CPU will bypass the cache hierarchy and go straight to the PoC when it tried to read stack addresses. As we have not flushed the stack by VA, (dirty) cachelines covering the stack might be anywhere in the cache hierarchy, and aren't guaranteed to have reached to PoC. So the CPU reads _stale_ data from the PoC. There is no corruption; the cache lines are still valid somewhere (we never invalidated them without cleaning them first), but are simply not visible to non-cacheable accesses. Are we actually accessing the stack? If not, then there's no need to worry. If we are, then the code which is doing so is buggy and either needs to flush the stack by VA to the PoC, or (better) not use the stack in this case and just use registers. Does that all make sense? Or do I need to get my ascii-art paintbrush out? > > Now with keeping caches on > ###### > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > index 786daa6..6ff3d9f 100644 > --- a/arch/arm64/kernel/process.c > +++ b/arch/arm64/kernel/process.c > @@ -76,10 +76,10 @@ static void setup_restart(void) > flush_cache_all(); > > /* Turn D-cache off */ > - cpu_cache_off(); > + //cpu_cache_off(); > > /* Push out any further dirty data, and ensure cache is empty */ > - flush_cache_all(); > + //flush_cache_all(); > } > > void soft_restart(unsigned long addr) > ####### > > ffffffc000085014 <soft_restart>: > ffffffc000085014: a9be7bfd stp x29, x30, [sp,#-32]! > ffffffc000085018: 910003fd mov x29, sp > ffffffc00008501c: f9000fa0 str x0, [x29,#24] > ffffffc000085020: 94003c49 bl ffffffc000094144 > <setup_mm_for_reboot> > ffffffc000085024: 94003a6b bl ffffffc0000939d0 > <flush_cache_all> > ffffffc000085028: 90006201 adrp x1, ffffffc000cc5000 > <reset_devices> > ffffffc00008502c: f9400fa0 ldr x0, [x29,#24] > ffffffc000085030: f940fc22 ldr x2, [x1,#504] > ffffffc000085034: f0000061 adrp x1, ffffffc000094000 > <arch_pick_mmap_layout+0x150> > ffffffc000085038: 910f0021 add x1, x1, #0x3c0 > ffffffc00008503c: 8b010041 add x1, x2, x1 > ffffffc000085040: d2c00802 mov x2, #0x4000000000 > // #274877906944 > ffffffc000085044: 8b020021 add x1, x1, x2 > ffffffc000085048: d63f0020 blr x1 > ffffffc00008504c: f0002940 adrp x0, ffffffc0005b0000 > <kallsyms_token_table+0x200> > ffffffc000085050: f0002941 adrp x1, ffffffc0005b0000 > <kallsyms_token_table+0x200> > ffffffc000085054: 90002143 adrp x3, ffffffc0004ad000 > <__start_rodata> > ffffffc000085058: 91128000 add x0, x0, #0x4a0 > ffffffc00008505c: 913de021 add x1, x1, #0xf78 > ffffffc000085060: 52800c22 mov w2, #0x61 > // #97 > ffffffc000085064: 91072063 add x3, x3, #0x1c8 > ffffffc000085068: 941071d2 bl ffffffc0004a17b0 <printk> > ffffffc00008506c: f0002940 adrp x0, ffffffc0005b0000 > <kallsyms_token_table+0x200> > ffffffc000085070: 91134000 add x0, x0, #0x4d0 > ffffffc000085074: 9410712e bl ffffffc0004a152c <panic> > > Now my stack @ffffffc00008501c and @ffffffc000085028 are same. Sure, with the caches on you have the guarantee that prior writes are still visible. You either need to perform maintenance by VA to make the stack visible, or (better) don't access memory once the cache is disabled. Just load everything you need into registers in advance. Thanks, Mark. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-24 0:38 ` Geoff Levand 2014-07-24 9:36 ` Mark Rutland @ 2014-07-24 11:50 ` Arun Chandran 1 sibling, 0 replies; 61+ messages in thread From: Arun Chandran @ 2014-07-24 11:50 UTC (permalink / raw) To: linux-arm-kernel Hi, On Thu, Jul 24, 2014 at 6:08 AM, Geoff Levand <geoff@infradead.org> wrote: > Hi Arun, > > On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote: > >> I tried the same dtb with UP configuration. For UP kernel to compile >> did the below modifications > > I'll test and fixup the kexec UP build in the next few days. > Ok. > ... > >> With the default target configuration "kexec -e" failed to execute >> in UP scenario also. >> >> But I had some luck when I did the same steps with L3 cache >> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html >> it has an L3 cache. Luckily I was able to disable it in u-boot. >> >> With the L3 cache disabled configuration I am able to >> do "kexec -e". Please see the log attached. > > All memory management for the main cpu is done by the arch code. Kexec > and cpu hot plug only work with the secondary cpus, so the problem would > be in the arch memory code, either in setup_restart() for shutdown, or > in the startup code. > > I guess setup_restart() is not doing something it needs to do for your > platform. > I have done different experiments with L3 enabled in UP(uni processor) scenario. Please note that in all the experiments first stage and second stage kernels are same. -- Experiment 1-- Kernel is modified to loop before jumping to the kexec "relocate_new_kernel" code + other modification (disable Dcache turning off) ############### diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 3f7b0a2..e4ea22f 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -73,6 +73,8 @@ ENTRY(cpu_reset) bic x1, x1, #1 msr sctlr_el1, x1 // disable the MMU isb +loop: + b loop diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 31cba91..888fe3f 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -70,10 +70,10 @@ static void setup_restart(void) flush_cache_all(); /* Turn D-cache off */ - cpu_cache_off(); + //cpu_cache_off(); /* Push out any further dirty data, and ensure cache is empty */ - flush_cache_all(); + //flush_cache_all(); } ################ a) Load the second kernel "kexec -l" b) Execute kexec -e; now it is looping @loop c) Break into target using BDI3000 d) Flush L3 cache from BDI3000 c) Jump to relocate_new_kernel CPU#0>rd GPR00: 00000043eae0f000 0000000034d5d91c 0000004000000000 0000000000000004 CPU#0>go 0x00000043eae0f000 e) Kexeced kernel is booted without any issue. --Experiment2-- Now revert only the Dcache disabling change ############ diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 888fe3f..6bc85f78 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -70,10 +70,10 @@ static void setup_restart(void) flush_cache_all(); /* Turn D-cache off */ - //cpu_cache_off(); + cpu_cache_off(); /* Push out any further dirty data, and ensure cache is empty */ - //flush_cache_all(); + flush_cache_all(); } ############### Do the above steps a, b and c. d) >From BDI3000 I see strange value for x0 CPU#0>rd GPR00: 000000000000003f 0000000034d5d918 0000004000000000 0000000000000004 e) Flush L3 cache from BDI3000 f) Jump to relocate_new_kernel (kexec -e prints this address) machine_kexec:584: reboot_code_buffer_phys: 00000043f0381000 CPU#0>go 0x00000043f0381000 g) Kexeced kernel fails to boot CPU#0>h Core number : 0 Core state : debug (AArch64 EL1) Debug entry cause : External Debug Request Current PC : 0xffffffc000083200 Current CPSR : 0x000003c5 (EL1h) So If i don't turn off the dcache and flush L3 using BDI3000 things are working. --Experiment3-- Added L3 flush code to kernel + other modification (disable Dcache turning off) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 0faa45a..5c546bb 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -233,6 +233,20 @@ section_table: ENTRY(stext) mov x21, x0 // x21=FDT +flush_l3: + mov x2, #0x10 + mov w1, #0x1f00 + movk x2, #0x7e60, lsl #16 + movk w1, #0x1600, lsl #16 + str w1, [x2] + mov x4, #0 +wait_flush: + ldr w1, [x2] + add x4, x4 ,#1 + tbz w1, #31, wait_done + b wait_flush +wait_done: + bl el2_setup // Drop to EL1, w20=cpu_boot_mode bl __calc_phys_offset // x24=PHYS_OFFSET, x28=PHYS_OFFSET-PAGE_OFFSET bl set_cpu_boot_mode_flag diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 31cba91..888fe3f 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -70,10 +70,10 @@ static void setup_restart(void) flush_cache_all(); /* Turn D-cache off */ - cpu_cache_off(); + //cpu_cache_off(); /* Push out any further dirty data, and ensure cache is empty */ - flush_cache_all(); + //flush_cache_all(); } /* diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index c29dde1..3f7b0a2 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -73,7 +73,22 @@ ENTRY(cpu_reset) bic x1, x1, #1 msr sctlr_el1, x1 // disable the MMU isb - bl secondary_shutdown + +flush_l3: + mov x2, #0x10 + mov w1, #0x1f00 + movk x2, #0x7e60, lsl #16 + movk w1, #0x1600, lsl #16 + str w1, [x2] + mov x4, #0 +wait_flush: + ldr w1, [x2] + add x4, x4 ,#1 + tbz w1, #31, wait_done + b wait_flush +wait_done: + +# bl secondary_shutdown ret x0 ENDPROC(cpu_reset) Now also kexeced kernel boots fine. If i do the same with "Dcache turning off enabled" booting of kexeced kernel fails. --Arun ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-22 13:25 ` Arun Chandran 2014-07-24 0:38 ` Geoff Levand @ 2014-07-30 3:26 ` Feng Kan 1 sibling, 0 replies; 61+ messages in thread From: Feng Kan @ 2014-07-30 3:26 UTC (permalink / raw) To: linux-arm-kernel > > But I had some luck when I did the same steps with L3 cache > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html > it has an L3 cache. Luckily I was able to disable it in u-boot. > > With the L3 cache disabled configuration I am able to > do "kexec -e". Please see the log attached. > > Feng, > I doubt kernel is unaware of the presence of L3 cache, this subsequently > makes "kexec -e" to fail. Yes, L3 is turned on prior to entering Linux. It is used when Linux enable cache in the MMU. > > Do you have any idea how to make the kernel to take control of L3 cache? We don't have this code. Using address 0 to get back to spin address would require you to disable cache prior to jump. > > --Arun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-22 9:44 ` Arun Chandran 2014-07-22 13:25 ` Arun Chandran @ 2014-07-24 0:10 ` Geoff Levand 2014-07-24 9:13 ` Mark Rutland 2 siblings, 0 replies; 61+ messages in thread From: Geoff Levand @ 2014-07-24 0:10 UTC (permalink / raw) To: linux-arm-kernel Hi Arun, On Tue, 2014-07-22 at 15:14 +0530, Arun Chandran wrote: > # kexec -e Please use 'kexec -d -e' here to get debug output. > kexec version: 1kvm: exiting hardware virtualization > 4.07.17.12.17-gbStarting new kernel > 6cccb4 > Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e > kernel NULL pointer dereference at virtual address 00000291 > smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u It is hard to read this, please send the various outputs to different streams. > I think some of the secondary CPUs are not behaving as expected; > As of now I don't have any clues for this. I guess the cpu-return-addr is not correct. Try setting cpu-return-addr to <0x0 0x1>. This will spin secondaries in secondary_holding_pen. -Geoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Kexec on arm64 2014-07-22 9:44 ` Arun Chandran 2014-07-22 13:25 ` Arun Chandran 2014-07-24 0:10 ` Geoff Levand @ 2014-07-24 9:13 ` Mark Rutland 2 siblings, 0 replies; 61+ messages in thread From: Mark Rutland @ 2014-07-24 9:13 UTC (permalink / raw) To: linux-arm-kernel [...] > 2) Rebooting > ######################### > # kexec -e > kexec version: 1kvm: exiting hardware virtualization > 4.07.17.12.17-gbStarting new kernel > 6cccb4 > Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e > kernel NULL pointer dereference at virtual address 00000291 > smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u Hmm. This looks like two threads/CPUs are outputting characters at the same time, something like "smp_spin_table_die" and "Initial cpu" seem to be interspersed. Mark. > bsys cpu > smpL_isnpuixn _table_cpu_diev:e1r2s7i:o ni d: 6, hol3d.i1n6g. 0c-orucnt: 0 > 4+ (arun at arun-OptiPlex-9010) (gcc version 4.9.1 20140505 (prerelease) > (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05) ) > #25 SMP smp_sPpin_tablReE_EcMpPuT_ dTie:127: iude: J5u, hlo ld2ing > coun2: > 37:03 IST 2014 > smp_Cspin_tPabUl:e _AcApruc_die:127: id: h46,4 hPorldiong count:c e0s > or [500f0000] revision 0 > smp_speifni_:t aGbeltet_cpu_die:i1n2g7 : ipd:a 2, holdinrga mceount:t e0 > rs from FDT: > smep_fsip:i nC_atable_cpu_dien:'1t27: i df: 1i, holdinngd cSoyusntt:e 0 > m Table in device tree! > macchimne_kexaec:: 572C: smp_pMrAo:c efsasiorl_ied = 0 > d to reserve 16 MiB > dachine_kexecO:n5 7n4o: > e 0 totalpages: 4194304 > a k e xec image inNfoor:m > l zone: 57344 pages used for memmap > type: 0 > z sta r tN:o r m4a0000800l04 > one: 4194304 pages, LIFO batch:31 > head: P E R 43Cea9bPf002 > U: Embedded 11 pages/cpu @ffffffc3fff7d000 s13120 r8192 d23744 u45056 > nrp_cspeug-maelnltosc: 2 > : s13120 r8192 d23744 u45056 alloc=11*4096 > p c p usegment-[a0l]l:o c0:0 0[0000400]00 800 000[ -0 ]0000004000 > 881c 0[000], 280c000h bytes, [200]6 0 3pages > [0] 4 [0] 5 [0] 6 [0] 7 > eexBeuc_is_dtb:1i1l5:t m1a gizc: 0 : 0 : noon > lists in Zone order, mobility grouping on. Total pages: 4136960 > / K e r segment[1]: n0e0l0 00c0o400m08a0000 m-a n0d00 > 00040l008ai30n00e,: 3r000ho obty=tes, 3/ dpeagesv > nfs rw nfsroot=10.162.103.228:/nfs_root/dora_june_6/apm-image-minimal-mustangbe > ip=10.162.103.21:10.162.103.228:10.162.103.1:255.255.255.0:mustangk:eextehc_0is:_odtb:1f15f: > pmaangic:i 0c =: 0 : 1no > console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug > maxcpus=8 swiotlb=65536 log_buf_len=1M > 8aclhinoeg_kex_ec:582: cobnturfo_ll_ecode_page: nf:f f1ff0fbc4edb67ee8 > 576 > 6achinee_akrelxyec: 58l4: reobogo tb_ucfo dfe_buffer_physr:e e :0 > 000010435eaffb00007 > (92%) > macPhine_IkDexec:58 6h:a srehb otot_acode_bufferb:l e e n t > rfifffffce3se:a f4f0b000 > 96 (order: 3, 32768 bytes) > macDhine_kexeecn:t5r88: ryel occate_neaw_ckheer nelh:a s fffffhf > c0t0a0093b18 > ble entries: 2097152 (order: 12, 16777216 bytes) > machineI_kneoxedc:5e90-: relocate_cnaecwh_ek ehransel_size: b8hh( 1t84)a bytes > ble entries: 1048576 (order: 11, 8388608 bytes) > machinMe_keemxoercy::5 913: kexec6_5d0t8b_6addr2:4 K > 0/0100004600708a0000 > 77216K available (4360K kernel code, 299K rwdata, 1528K rodata, 6556K > init, 202K bss, 268592K reserved) > oacVhinei_kretxueacl: 595: kexec_kkeirmnaegle _mhead: e m o r0y0 > 00l0043eaa9bf002y > ut: > vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000 (245759 MB) > vmemmap : 0xffffffbce0000000 - 0xffffffbcee000000 ( 224 MB) > 0 machine_ kmeoxdecu:l597:e kexecs_ k:i m0age_xsftarft: f f > f0f0b0f0f0c04000080004 > 00000 - 0xffffffc000000000 ( 64 MB) > memory : 0xffffffc000000000 - 0xffffffc400000000 ( 16384 MB) > .init : 0xffffffc000642000 -machine_ke x0excf:f5f99:f > kexec_efntfrcy0_0d0ump: > ca9340 ( 6557 kB) > .text : 0xffffffc000080000 - 0xffffffc0006411c4 ( 5893 kB) > f .data : 0xffffffc000caa000 - 0xffffffc000cf4f28 (I > 43ea9bf002 = 343ea9b0f000 (f0fffffc 3ekaB9)b > 000) > d D 40S0LU00B8:0 0H0W1 = a4000080000l i(gfnf=f6f4ffc,00008000 0O)r > ###################### > > This doesn't seems to be working. Random behaviors are observed. Some > times it rebooted to u-boot > prompt. Sometimes kernel soft resets itself in an endless loop > (bootlog is repeating over and over again) > > To debug what is happening I put a while(1) just before jumping into > kexec reboot code. > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > index 31cba91..8843623 100644 > --- a/arch/arm64/kernel/process.c > +++ b/arch/arm64/kernel/process.c > @@ -85,6 +85,7 @@ void soft_restart(unsigned long addr) > > smp_secondary_shutdown(); > > + while(1); > /* Switch to the identity mapping */ > phys_reset = (phys_reset_t)virt_to_phys(cpu_reset); > phys_reset(addr); > > I break into target with BDI3000 now; and see the below output > > TARGET#0>state > Core#0: halted 0xffffffc000085240 External Debug Request > Core#1: halted 0x0000004000080394 External Debug Request > Core#2: halted 0x0000004000080394 External Debug Request > Core#3: halted 0x0000004000080394 External Debug Request > Core#4: halted 0xffffffc0000802f8 External Debug Request > Core#5: halted 0x0000004000080394 External Debug Request > Core#6: halted 0xffffffc0000802f8 External Debug Request > Core#7: halted 0x0000004000080394 External Debug Request > > I think some of the secondary CPUs are not behaving as expected; > As of now I don't have any clues for this. > > > --Arun > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > ^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2014-08-27 4:56 UTC | newest] Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-07-09 10:13 Kexec on arm64 Arun Chandran 2014-07-09 13:58 ` Arun Chandran 2014-07-09 18:49 ` Geoff Levand 2014-07-11 9:23 ` Arun Chandran 2014-07-11 16:58 ` Geoff Levand 2014-07-11 11:26 ` Arun Chandran 2014-07-12 0:19 ` Geoff Levand 2014-07-14 12:21 ` Arun Chandran 2014-07-11 15:43 ` Arun Chandran 2014-07-14 22:05 ` Geoff Levand 2014-07-15 15:28 ` Arun Chandran 2014-07-09 18:33 ` Geoff Levand [not found] <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com> 2014-07-15 17:04 ` Geoff Levand 2014-07-16 17:57 ` Feng Kan 2014-07-16 23:04 ` Geoff Levand 2014-07-22 9:44 ` Arun Chandran 2014-07-22 13:25 ` Arun Chandran 2014-07-24 0:38 ` Geoff Levand 2014-07-24 9:36 ` Mark Rutland 2014-07-24 12:49 ` Arun Chandran 2014-07-25 0:17 ` Geoff Levand 2014-07-25 10:31 ` Arun Chandran 2014-07-25 10:36 ` Mark Rutland 2014-07-25 11:48 ` Arun Chandran 2014-07-25 12:14 ` Mark Rutland 2014-07-25 15:29 ` Arun Chandran 2014-07-26 0:18 ` Geoff Levand 2014-07-28 15:00 ` Arun Chandran 2014-07-28 15:38 ` Mark Rutland 2014-07-29 0:09 ` Geoff Levand 2014-07-29 9:10 ` Mark Rutland 2014-07-29 12:32 ` Arun Chandran 2014-07-29 13:35 ` Mark Rutland 2014-07-29 21:19 ` Geoff Levand 2014-07-30 7:22 ` Arun Chandran 2014-08-01 11:13 ` Arun Chandran 2014-08-03 14:47 ` Mark Rutland 2014-08-04 10:16 ` Arun Chandran 2014-08-04 11:35 ` Mark Rutland 2014-08-07 0:40 ` Geoff Levand 2014-08-07 9:59 ` Mark Rutland 2014-08-07 17:09 ` Geoff Levand 2014-08-04 17:21 ` Geoff Levand 2014-08-06 13:54 ` Arun Chandran 2014-08-06 15:51 ` Arun Chandran 2014-08-07 20:07 ` Geoff Levand 2014-08-08 5:46 ` Arun Chandran 2014-08-08 10:03 ` Arun Chandran 2014-08-12 5:42 ` Arun Chandran 2014-08-13 11:09 ` Arun Chandran 2014-08-26 22:32 ` Geoff Levand 2014-08-27 4:56 ` Arun Chandran 2014-07-30 5:46 ` Arun Chandran 2014-07-30 9:16 ` Mark Rutland 2014-07-30 7:01 ` Arun Chandran 2014-07-25 10:26 ` Arun Chandran 2014-07-25 11:29 ` Mark Rutland 2014-07-24 11:50 ` Arun Chandran 2014-07-30 3:26 ` Feng Kan 2014-07-24 0:10 ` Geoff Levand 2014-07-24 9:13 ` Mark Rutland
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.