linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Kexec on arm64
       [not found] <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com>
@ 2014-07-15 17:04 ` Geoff Levand
  2014-07-16 17:57   ` Feng Kan
  0 siblings, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-07-15 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Tue, 2014-07-15 at 21:35 +0530, Arun Chandran wrote:
> So I am assuming some code at 0x238 is waiting for
> secondary_kernel_start address
> @000000400000fff8
> 
> So I modified my dtb file and made 0x238 as 'cpu-return-addr '
> 
...
> 
> Is this correct?

I doubt it.  I recommend you contact whoever you are contracted to do
this work for and ask them for the value.

As I mentioned in a mail from yesterday, I pushed out a work around for
this.  Did you try it?

For anyone who comes across this thread and wants to follow it back to
find that mail I sent out, good luck...

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-15 17:04 ` Kexec on arm64 Geoff Levand
@ 2014-07-16 17:57   ` Feng Kan
  2014-07-16 23:04     ` Geoff Levand
  0 siblings, 1 reply; 49+ messages in thread
From: Feng Kan @ 2014-07-16 17:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 15, 2014 at 10:04 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Tue, 2014-07-15 at 21:35 +0530, Arun Chandran wrote:
>> So I am assuming some code at 0x238 is waiting for
>> secondary_kernel_start address
>> @000000400000fff8
>>
>> So I modified my dtb file and made 0x238 as 'cpu-return-addr '
>>
> ...
>>
>> Is this correct?
>
> I doubt it.  I recommend you contact whoever you are contracted to do
> this work for and ask them for the value.
>
> As I mentioned in a mail from yesterday, I pushed out a work around for
> this.  Did you try it?
>
> For anyone who comes across this thread and wants to follow it back to
> find that mail I sent out, good luck...

Hi Guys:

Just following up on the conversation. The cpu return address of 0 should work
in your case. Since thats the _start of the bootloader, it will run
some core init
code and then put the core back in wfe. However, I think this
functionality should
be pushed back into the kernel side to provide some small page of spin
code rather
than depend on the bootloader. The address 0 could be OCM or NOR flash, in the
on chip memory case, the space could potentially be used for other means.

>
> -Geoff
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-16 17:57   ` Feng Kan
@ 2014-07-16 23:04     ` Geoff Levand
  2014-07-22  9:44       ` Arun Chandran
  0 siblings, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-07-16 23:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Feng,

On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote:
> Just following up on the conversation. The cpu return address of 0 should work
> in your case. Since thats the _start of the bootloader, it will run
> some core init
> code and then put the core back in wfe. 

OK, I fixed up my code so that zero is valid cpu return address.  Arun,
could you try my latest I pushed out today?

> However, I think this
> functionality should
> be pushed back into the kernel side to provide some small page of spin
> code rather
> than depend on the bootloader. 

This method was already discussed in another thread [1] and decided
against.  With this we would need to set that spin code memory as
reserved from kernel use, and so each boot stage would leak some memory.
A system would eventually run out of memory over kexec's.

[1] http://comments.gmane.org/gmane.linux.ports.arm.kernel/323440

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-16 23:04     ` Geoff Levand
@ 2014-07-22  9:44       ` Arun Chandran
  2014-07-22 13:25         ` Arun Chandran
                           ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-22  9:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 4:34 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Feng,
>
> On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote:
>> Just following up on the conversation. The cpu return address of 0 should work
>> in your case. Since thats the _start of the bootloader, it will run
>> some core init
>> code and then put the core back in wfe.
>
> OK, I fixed up my code so that zero is valid cpu return address.  Arun,
> could you try my latest I pushed out today?
>
Hi Geoff,

Sorry for the late reply I was away.

Yes I tried the new code.
My dts file has the below change.

################
diff --git a/arch/arm64/boot/dts/apm-storm.dtsi
b/arch/arm64/boot/dts/apm-storm.dtsi
index e0bf91d..b64e549 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -24,64 +24,64 @@
                        compatible = "apm,potenza", "arm,armv8";
                        reg = <0x0 0x000>;
                        enable-method = "spin-table";
-                       cpu-release-addr = <0x1 0x0000fff8>;
-                       cpu-return-addr = <0x0 0x0> /* Updated by bootloader */
+                       cpu-release-addr = <0x40 0x0000fff8>;
+                       cpu-return-addr = <0x0 0x0>; /* Updated by bootloader */
                };
#################
All other cpu nodes have similar change.

1) Loading  ( I don't change commandline and dtb; assuming kexec will
reuse whatever
the booted kernel has as commandline and dtb)

# kexec -l vmlinux.strip
kexec version: 14.07.17.12.17-gb6cccb4
Modified cmdline: root=/dev/nfs
Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
from purgatory is diabled
Modified cmdline: root=/dev/nfs
Unable to find /proc/device-tree//chosen/linux,stdout-path, printing
from purgatory is diabled
machine_kexec_prepare:508:
  kexec image info:
    type:        0
    start:       4000080004
    head:        0
    nr_segments: 2
      segment[0]: 0000004000080000 - 000000400088c000, 80c000h bytes, 2060 pages
kexec_is_dtb:115: magic: 4d5a0091 : 91005a4d : no
      segment[1]: 00000040008a0000 - 00000040008a3000, 3000h bytes, 3 pages
kexec_is_dtb:115: magic: d00dfeed : edfe0dd0 : yes
kexec_boot_info_init:384: cpu_count: 8
kexec_cpu_info_init:352: cpu-0: hwid-0, 'spin-table', cpu-release-addr
400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-1: hwid-1, 'spin-table', cpu-release-addr
400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-2: hwid-100, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-3: hwid-101, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-4: hwid-200, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-5: hwid-201, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-6: hwid-300, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-7: hwid-301, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_is_dtb:115: magic: 4d5a0091 : 91005a4d : no
kexec_is_dtb:115: magic: d00dfeed : edfe0dd0 : yes
kexec_boot_info_init:384: cpu_count: 8
kexec_cpu_info_init:352: cpu-0: hwid-0, 'spin-table', cpu-release-addr
400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-1: hwid-1, 'spin-table', cpu-release-addr
400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-2: hwid-100, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-3: hwid-101, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-4: hwid-200, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-5: hwid-201, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-6: hwid-300, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_info_init:352: cpu-7: hwid-301, 'spin-table',
cpu-release-addr 400000fff8, cpu-return-addr 0
kexec_cpu_check:440: hwid-0 OK
kexec_cpu_check:440: hwid-1 OK
kexec_cpu_check:440: hwid-100 OK
kexec_cpu_check:440: hwid-101 OK
kexec_cpu_check:440: hwid-200 OK
kexec_cpu_check:440: hwid-201 OK
kexec_cpu_check:440: hwid-300 OK
kexec_cpu_check:440: hwid-301 OK


2) Rebooting
#########################
# kexec -e
kexec version: 1kvm: exiting hardware virtualization
4.07.17.12.17-gbStarting new kernel
6cccb4
 Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e
 kernel NULL pointer dereference at virtual address 00000291
smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u
bsys cpu
smpL_isnpuixn _table_cpu_diev:e1r2s7i:o ni d: 6, hol3d.i1n6g. 0c-orucnt: 0
4+ (arun at arun-OptiPlex-9010) (gcc version 4.9.1 20140505 (prerelease)
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05) )
#25 SMP smp_sPpin_tablReE_EcMpPuT_ dTie:127: iude:  J5u, hlo ld2ing
coun2:
37:03 IST 2014
smp_Cspin_tPabUl:e _AcApruc_die:127: id: h46,4  hPorldiong count:c e0s
 or [500f0000] revision 0
smp_speifni_:t aGbeltet_cpu_die:i1n2g7 : ipd:a 2, holdinrga mceount:t e0
rs from FDT:
smep_fsip:i nC_atable_cpu_dien:'1t27: i df: 1i, holdinngd  cSoyusntt:e 0
m Table in device tree!
macchimne_kexaec:: 572C: smp_pMrAo:c efsasiorl_ied = 0
d to reserve 16 MiB
dachine_kexecO:n5 7n4o:
 e 0 totalpages: 4194304
a k e xec image inNfoor:m
 l zone: 57344 pages used for memmap
    type:        0
z   sta r tN:o     r  m4a0000800l04
 one: 4194304 pages, LIFO batch:31
    head:   P E R   43Cea9bPf002
U: Embedded 11 pages/cpu @ffffffc3fff7d000 s13120 r8192 d23744 u45056
    nrp_cspeug-maelnltosc: 2
: s13120 r8192 d23744 u45056 alloc=11*4096
   p c p usegment-[a0l]l:o c0:0 0[0000400]00 800 000[ -0 ]0000004000
881c 0[000],  280c000h bytes,  [200]6 0 3pages
 [0] 4 [0] 5 [0] 6 [0] 7
eexBeuc_is_dtb:1i1l5:t  m1a gizc: 0 : 0 : noon
 lists in Zone order, mobility grouping on.  Total pages: 4136960
/  K e r segment[1]: n0e0l0 00c0o400m08a0000 m-a n0d00
00040l008ai30n00e,:  3r000ho obty=tes, 3/ dpeagesv
 nfs rw nfsroot=10.162.103.228:/nfs_root/dora_june_6/apm-image-minimal-mustangbe
ip=10.162.103.21:10.162.103.228:10.162.103.1:255.255.255.0:mustangk:eextehc_0is:_odtb:1f15f:
 pmaangic:i 0c =: 0 : 1no
console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug
maxcpus=8 swiotlb=65536 log_buf_len=1M
8aclhinoeg_kex_ec:582: cobnturfo_ll_ecode_page:        nf:f f1ff0fbc4edb67ee8
 576
6achinee_akrelxyec: 58l4: reobogo tb_ucfo dfe_buffer_physr:e e :0
000010435eaffb00007
 (92%)
macPhine_IkDexec:58 6h:a srehb otot_acode_bufferb:l e   e n t
rfifffffce3se:a f4f0b000
96 (order: 3, 32768 bytes)
macDhine_kexeecn:t5r88: ryel occate_neaw_ckheer nelh:a s     fffffhf
c0t0a0093b18
ble entries: 2097152 (order: 12, 16777216 bytes)
machineI_kneoxedc:5e90-: relocate_cnaecwh_ek ehransel_size: b8hh( 1t84)a bytes
ble entries: 1048576 (order: 11, 8388608 bytes)
machinMe_keemxoercy::5 913: kexec6_5d0t8b_6addr2:4 K
0/0100004600708a0000
77216K available (4360K kernel code, 299K rwdata, 1528K rodata, 6556K
init, 202K bss, 268592K reserved)
oacVhinei_kretxueacl: 595: kexec_kkeirmnaegle _mhead:     e m o r0y0
00l0043eaa9bf002y
 ut:
    vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000   (245759 MB)
    vmemmap : 0xffffffbce0000000 - 0xffffffbcee000000   (   224 MB)
0  machine_ kmeoxdecu:l597:e kexecs_ k:i m0age_xsftarft:    f  f
f0f0b0f0f0c04000080004
 00000 - 0xffffffc000000000   (    64 MB)
    memory  : 0xffffffc000000000 - 0xffffffc400000000   ( 16384 MB)
      .init : 0xffffffc000642000 -machine_ke x0excf:f5f99:f
kexec_efntfrcy0_0d0ump:
ca9340   (  6557 kB)
      .text : 0xffffffc000080000 - 0xffffffc0006411c4   (  5893 kB)
f     .data : 0xffffffc000caa000 - 0xffffffc000cf4f28     (I
43ea9bf002 =  343ea9b0f000 (f0fffffc 3ekaB9)b
 000)
d D 40S0LU00B8:0 0H0W1 = a4000080000l i(gfnf=f6f4ffc,00008000 0O)r
######################

This doesn't seems to be working. Random behaviors are observed. Some
times it rebooted to u-boot
prompt. Sometimes kernel soft resets itself in an endless loop
(bootlog is repeating over and over again)

To debug what is happening I put a while(1) just before jumping into
kexec reboot code.

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..8843623 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -85,6 +85,7 @@ void soft_restart(unsigned long addr)

        smp_secondary_shutdown();

+       while(1);
        /* Switch to the identity mapping */
        phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
        phys_reset(addr);

I break into target with BDI3000 now; and see the below output

TARGET#0>state
Core#0: halted 0xffffffc000085240 External Debug Request
Core#1: halted 0x0000004000080394 External Debug Request
Core#2: halted 0x0000004000080394 External Debug Request
Core#3: halted 0x0000004000080394 External Debug Request
Core#4: halted 0xffffffc0000802f8 External Debug Request
Core#5: halted 0x0000004000080394 External Debug Request
Core#6: halted 0xffffffc0000802f8 External Debug Request
Core#7: halted 0x0000004000080394 External Debug Request

I think some of the secondary CPUs are not behaving as expected;
As of now I don't have any clues for this.


--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-22  9:44       ` Arun Chandran
@ 2014-07-22 13:25         ` Arun Chandran
  2014-07-24  0:38           ` Geoff Levand
  2014-07-30  3:26           ` Feng Kan
  2014-07-24  0:10         ` Geoff Levand
  2014-07-24  9:13         ` Mark Rutland
  2 siblings, 2 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-22 13:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Tue, Jul 22, 2014 at 3:14 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Thu, Jul 17, 2014 at 4:34 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Feng,
>>
>> On Wed, 2014-07-16 at 10:57 -0700, Feng Kan wrote:
>>> Just following up on the conversation. The cpu return address of 0 should work
>>> in your case. Since thats the _start of the bootloader, it will run
>>> some core init
>>> code and then put the core back in wfe.
>>
>> OK, I fixed up my code so that zero is valid cpu return address.  Arun,
>> could you try my latest I pushed out today?
>>
> Hi Geoff,
>
> Sorry for the late reply I was away.
>
> Yes I tried the new code.
> My dts file has the below change.
>
> ################
> diff --git a/arch/arm64/boot/dts/apm-storm.dtsi
> b/arch/arm64/boot/dts/apm-storm.dtsi
> index e0bf91d..b64e549 100644
> --- a/arch/arm64/boot/dts/apm-storm.dtsi
> +++ b/arch/arm64/boot/dts/apm-storm.dtsi
> @@ -24,64 +24,64 @@
>                         compatible = "apm,potenza", "arm,armv8";
>                         reg = <0x0 0x000>;
>                         enable-method = "spin-table";
> -                       cpu-release-addr = <0x1 0x0000fff8>;
> -                       cpu-return-addr = <0x0 0x0> /* Updated by bootloader */
> +                       cpu-release-addr = <0x40 0x0000fff8>;
> +                       cpu-return-addr = <0x0 0x0>; /* Updated by bootloader */
>                 };
> #################
> All other cpu nodes have similar change.
>

I tried the same dtb with UP configuration. For UP kernel to compile
did the below modifications

##############################
diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
index 1ccedb4c..c6a2a7e 100644
--- a/arch/arm64/kernel/cpu_ops.c
+++ b/arch/arm64/kernel/cpu_ops.c
@@ -89,6 +89,7 @@ void __init cpu_read_bootcpu_ops(void)
        cpu_read_ops(dn, 0);
 }

+#if 0
 int __init cpu_ops_init(void)
 {
        int result = 0;
@@ -110,3 +111,4 @@ void cpu_ops_shutdown(void)
        if (cpu_operation_psci.shutdown)
                cpu_operation_psci.shutdown();
 }
+#endif
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index fba6d50..c3cf246 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -609,7 +609,7 @@ void machine_kexec(struct kimage *image)
        flush_icache_range((unsigned long)reboot_code_buffer,
                (unsigned long)reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);

-       dump_cpus();
+       //dump_cpus();

        pr_info("Bye!\n");

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..6bc85f78 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -83,7 +83,7 @@ void soft_restart(unsigned long addr)

        setup_restart();

-       smp_secondary_shutdown();
+       //smp_secondary_shutdown();

        /* Switch to the identity mapping */
        phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
@@ -130,7 +130,7 @@ void arch_cpu_idle_dead(void)
  */
 void machine_shutdown(void)
 {
-       smp_send_stop();
+       //smp_send_stop();
 }

 /*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c29dde1..14c339c 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,7 +73,7 @@ ENTRY(cpu_reset)
        bic     x1, x1, #1
        msr     sctlr_el1, x1                   // disable the MMU
        isb
-       bl      secondary_shutdown
+#      bl      secondary_shutdown
        ret     x0
 ENDPROC(cpu_reset)
#####################

With the default target configuration "kexec -e" failed to execute
in UP scenario also.

But I had some luck when I did the same steps with L3 cache
disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
it has an L3 cache. Luckily I was able to disable it in u-boot.

With the L3 cache disabled configuration I am able to
do "kexec -e". Please see the log attached.

Feng,
I doubt kernel is unaware of the presence of L3 cache, this subsequently
makes "kexec -e" to fail.

Do you have any idea how to make the kernel to take control of L3 cache?

--Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_unipro_log
Type: application/octet-stream
Size: 9227 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140722/8336d688/attachment-0001.obj>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-22  9:44       ` Arun Chandran
  2014-07-22 13:25         ` Arun Chandran
@ 2014-07-24  0:10         ` Geoff Levand
  2014-07-24  9:13         ` Mark Rutland
  2 siblings, 0 replies; 49+ messages in thread
From: Geoff Levand @ 2014-07-24  0:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Tue, 2014-07-22 at 15:14 +0530, Arun Chandran wrote:
> # kexec -e

Please use 'kexec -d -e' here to get debug output.

> kexec version: 1kvm: exiting hardware virtualization
> 4.07.17.12.17-gbStarting new kernel
> 6cccb4
>  Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e
>  kernel NULL pointer dereference at virtual address 00000291
> smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u

It is hard to read this, please send the various outputs to different
streams.

> I think some of the secondary CPUs are not behaving as expected;
> As of now I don't have any clues for this.

I guess the cpu-return-addr is not correct.  Try setting cpu-return-addr
to <0x0 0x1>.  This will spin secondaries in secondary_holding_pen.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-22 13:25         ` Arun Chandran
@ 2014-07-24  0:38           ` Geoff Levand
  2014-07-24  9:36             ` Mark Rutland
  2014-07-24 11:50             ` Arun Chandran
  2014-07-30  3:26           ` Feng Kan
  1 sibling, 2 replies; 49+ messages in thread
From: Geoff Levand @ 2014-07-24  0:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:

> I tried the same dtb with UP configuration. For UP kernel to compile
> did the below modifications

I'll test and fixup the kexec UP build in the next few days.

...

> With the default target configuration "kexec -e" failed to execute
> in UP scenario also.
> 
> But I had some luck when I did the same steps with L3 cache
> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> it has an L3 cache. Luckily I was able to disable it in u-boot.
> 
> With the L3 cache disabled configuration I am able to
> do "kexec -e". Please see the log attached.

All memory management for the main cpu is done by the arch code.  Kexec
and cpu hot plug only work with the secondary cpus, so the problem would
be in the arch memory code, either in setup_restart() for shutdown, or
in the startup code.

I guess setup_restart() is not doing something it needs to do for your
platform.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-22  9:44       ` Arun Chandran
  2014-07-22 13:25         ` Arun Chandran
  2014-07-24  0:10         ` Geoff Levand
@ 2014-07-24  9:13         ` Mark Rutland
  2 siblings, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-24  9:13 UTC (permalink / raw)
  To: linux-arm-kernel

[...]

> 2) Rebooting
> #########################
> # kexec -e
> kexec version: 1kvm: exiting hardware virtualization
> 4.07.17.12.17-gbStarting new kernel
> 6cccb4
>  Ump_spin_tanblaeb_lcpeu_d ite:127: oi dh:a n7d,l holding count: 0e
>  kernel NULL pointer dereference at virtual address 00000291
> smp_spin_Itnaibtlei_cpaul_diie:12z7:i nigd :c 3g,r oholding couunpt :s 0u

Hmm. This looks like two threads/CPUs are outputting characters at the
same time, something like "smp_spin_table_die" and "Initial cpu" seem to
be interspersed.

Mark.

> bsys cpu
> smpL_isnpuixn _table_cpu_diev:e1r2s7i:o ni d: 6, hol3d.i1n6g. 0c-orucnt: 0
> 4+ (arun at arun-OptiPlex-9010) (gcc version 4.9.1 20140505 (prerelease)
> (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05) )
> #25 SMP smp_sPpin_tablReE_EcMpPuT_ dTie:127: iude:  J5u, hlo ld2ing
> coun2:
> 37:03 IST 2014
> smp_Cspin_tPabUl:e _AcApruc_die:127: id: h46,4  hPorldiong count:c e0s
>  or [500f0000] revision 0
> smp_speifni_:t aGbeltet_cpu_die:i1n2g7 : ipd:a 2, holdinrga mceount:t e0
> rs from FDT:
> smep_fsip:i nC_atable_cpu_dien:'1t27: i df: 1i, holdinngd  cSoyusntt:e 0
> m Table in device tree!
> macchimne_kexaec:: 572C: smp_pMrAo:c efsasiorl_ied = 0
> d to reserve 16 MiB
> dachine_kexecO:n5 7n4o:
>  e 0 totalpages: 4194304
> a k e xec image inNfoor:m
>  l zone: 57344 pages used for memmap
>     type:        0
> z   sta r tN:o     r  m4a0000800l04
>  one: 4194304 pages, LIFO batch:31
>     head:   P E R   43Cea9bPf002
> U: Embedded 11 pages/cpu @ffffffc3fff7d000 s13120 r8192 d23744 u45056
>     nrp_cspeug-maelnltosc: 2
> : s13120 r8192 d23744 u45056 alloc=11*4096
>    p c p usegment-[a0l]l:o c0:0 0[0000400]00 800 000[ -0 ]0000004000
> 881c 0[000],  280c000h bytes,  [200]6 0 3pages
>  [0] 4 [0] 5 [0] 6 [0] 7
> eexBeuc_is_dtb:1i1l5:t  m1a gizc: 0 : 0 : noon
>  lists in Zone order, mobility grouping on.  Total pages: 4136960
> /  K e r segment[1]: n0e0l0 00c0o400m08a0000 m-a n0d00
> 00040l008ai30n00e,:  3r000ho obty=tes, 3/ dpeagesv
>  nfs rw nfsroot=10.162.103.228:/nfs_root/dora_june_6/apm-image-minimal-mustangbe
> ip=10.162.103.21:10.162.103.228:10.162.103.1:255.255.255.0:mustangk:eextehc_0is:_odtb:1f15f:
>  pmaangic:i 0c =: 0 : 1no
> console=ttyS0,115200 earlyprintk=uart8250-32bit,0x1c020000 debug
> maxcpus=8 swiotlb=65536 log_buf_len=1M
> 8aclhinoeg_kex_ec:582: cobnturfo_ll_ecode_page:        nf:f f1ff0fbc4edb67ee8
>  576
> 6achinee_akrelxyec: 58l4: reobogo tb_ucfo dfe_buffer_physr:e e :0
> 000010435eaffb00007
>  (92%)
> macPhine_IkDexec:58 6h:a srehb otot_acode_bufferb:l e   e n t
> rfifffffce3se:a f4f0b000
> 96 (order: 3, 32768 bytes)
> macDhine_kexeecn:t5r88: ryel occate_neaw_ckheer nelh:a s     fffffhf
> c0t0a0093b18
> ble entries: 2097152 (order: 12, 16777216 bytes)
> machineI_kneoxedc:5e90-: relocate_cnaecwh_ek ehransel_size: b8hh( 1t84)a bytes
> ble entries: 1048576 (order: 11, 8388608 bytes)
> machinMe_keemxoercy::5 913: kexec6_5d0t8b_6addr2:4 K
> 0/0100004600708a0000
> 77216K available (4360K kernel code, 299K rwdata, 1528K rodata, 6556K
> init, 202K bss, 268592K reserved)
> oacVhinei_kretxueacl: 595: kexec_kkeirmnaegle _mhead:     e m o r0y0
> 00l0043eaa9bf002y
>  ut:
>     vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000   (245759 MB)
>     vmemmap : 0xffffffbce0000000 - 0xffffffbcee000000   (   224 MB)
> 0  machine_ kmeoxdecu:l597:e kexecs_ k:i m0age_xsftarft:    f  f
> f0f0b0f0f0c04000080004
>  00000 - 0xffffffc000000000   (    64 MB)
>     memory  : 0xffffffc000000000 - 0xffffffc400000000   ( 16384 MB)
>       .init : 0xffffffc000642000 -machine_ke x0excf:f5f99:f
> kexec_efntfrcy0_0d0ump:
> ca9340   (  6557 kB)
>       .text : 0xffffffc000080000 - 0xffffffc0006411c4   (  5893 kB)
> f     .data : 0xffffffc000caa000 - 0xffffffc000cf4f28     (I
> 43ea9bf002 =  343ea9b0f000 (f0fffffc 3ekaB9)b
>  000)
> d D 40S0LU00B8:0 0H0W1 = a4000080000l i(gfnf=f6f4ffc,00008000 0O)r
> ######################
> 
> This doesn't seems to be working. Random behaviors are observed. Some
> times it rebooted to u-boot
> prompt. Sometimes kernel soft resets itself in an endless loop
> (bootlog is repeating over and over again)
> 
> To debug what is happening I put a while(1) just before jumping into
> kexec reboot code.
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 31cba91..8843623 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -85,6 +85,7 @@ void soft_restart(unsigned long addr)
> 
>         smp_secondary_shutdown();
> 
> +       while(1);
>         /* Switch to the identity mapping */
>         phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
>         phys_reset(addr);
> 
> I break into target with BDI3000 now; and see the below output
> 
> TARGET#0>state
> Core#0: halted 0xffffffc000085240 External Debug Request
> Core#1: halted 0x0000004000080394 External Debug Request
> Core#2: halted 0x0000004000080394 External Debug Request
> Core#3: halted 0x0000004000080394 External Debug Request
> Core#4: halted 0xffffffc0000802f8 External Debug Request
> Core#5: halted 0x0000004000080394 External Debug Request
> Core#6: halted 0xffffffc0000802f8 External Debug Request
> Core#7: halted 0x0000004000080394 External Debug Request
> 
> I think some of the secondary CPUs are not behaving as expected;
> As of now I don't have any clues for this.
> 
> 
> --Arun
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-24  0:38           ` Geoff Levand
@ 2014-07-24  9:36             ` Mark Rutland
  2014-07-24 12:49               ` Arun Chandran
                                 ` (2 more replies)
  2014-07-24 11:50             ` Arun Chandran
  1 sibling, 3 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-24  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
> Hi Arun,
> 
> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
> 
> > I tried the same dtb with UP configuration. For UP kernel to compile
> > did the below modifications
> 
> I'll test and fixup the kexec UP build in the next few days.
> 
> ...
> 
> > With the default target configuration "kexec -e" failed to execute
> > in UP scenario also.

It would be helpful to know _how_ it failed. Do you have any log output?

> > 
> > But I had some luck when I did the same steps with L3 cache
> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> > it has an L3 cache. Luckily I was able to disable it in u-boot.
> > 
> > With the L3 cache disabled configuration I am able to
> > do "kexec -e". Please see the log attached.

Hmm. We don't expect the kernel to do any L3 management. It seems that
memory subsystems with L3 caches respecting cache maintenance by VA are
going to become relatively common, and we expect to handle them all by
performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
kernel cache requirements for boot) for what we do at boot time.

> 
> All memory management for the main cpu is done by the arch code.  Kexec
> and cpu hot plug only work with the secondary cpus, so the problem would
> be in the arch memory code, either in setup_restart() for shutdown, or
> in the startup code.

It's possible that soft_restart and setup_restart are a little dodgy, as
they also rely on the compiler being smart and not touching the stack
after setup_restart().

However, they provide absolutely no guarantee that any data has been
flushed out to the PoC [1]. If you require any data to be flushed out to the
PoC so as to be visible to noncacheable accesses, you will need to
ensure that this is flushed by VA before soft_restart is called. Data
may have migrated to another cache (e.g. another CPU, or the L3) where
it is not visible.

> 
> I guess setup_restart() is not doing something it needs to do for your
> platform.

Unless you can see soft_restart/setup_restart making stack accesses
after the first flush_cache_all call, I suspect your code is not
flushing some data it requires.

I'd been meaning to clean up soft_restart and setup_restart for a while.
I'll have a go at cleaning them up along with any remaining
flush_cache_all abuse in arm64.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-24  0:38           ` Geoff Levand
  2014-07-24  9:36             ` Mark Rutland
@ 2014-07-24 11:50             ` Arun Chandran
  1 sibling, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-24 11:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Jul 24, 2014 at 6:08 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
>
>> I tried the same dtb with UP configuration. For UP kernel to compile
>> did the below modifications
>
> I'll test and fixup the kexec UP build in the next few days.
>

Ok.

> ...
>
>> With the default target configuration "kexec -e" failed to execute
>> in UP scenario also.
>>
>> But I had some luck when I did the same steps with L3 cache
>> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
>> it has an L3 cache. Luckily I was able to disable it in u-boot.
>>
>> With the L3 cache disabled configuration I am able to
>> do "kexec -e". Please see the log attached.
>
> All memory management for the main cpu is done by the arch code.  Kexec
> and cpu hot plug only work with the secondary cpus, so the problem would
> be in the arch memory code, either in setup_restart() for shutdown, or
> in the startup code.
>
> I guess setup_restart() is not doing something it needs to do for your
> platform.
>
I have done different experiments with L3 enabled in UP(uni processor) scenario.
Please note that in all the experiments first stage and second stage kernels
are same.

-- Experiment 1--
Kernel is modified to loop before jumping to the kexec "relocate_new_kernel"
code + other modification (disable Dcache turning off)
###############
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 3f7b0a2..e4ea22f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,6 +73,8 @@ ENTRY(cpu_reset)
        bic     x1, x1, #1
        msr     sctlr_el1, x1                   // disable the MMU
        isb
+loop:
+       b       loop

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..888fe3f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+       //cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+       //flush_cache_all();
 }
################

a) Load the second kernel "kexec -l"
b) Execute kexec -e; now it is looping @loop
c) Break into target using BDI3000
d) Flush L3 cache from BDI3000
c) Jump to relocate_new_kernel

CPU#0>rd
GPR00: 00000043eae0f000 0000000034d5d91c 0000004000000000 0000000000000004

CPU#0>go 0x00000043eae0f000

e) Kexeced kernel is booted without any issue.

--Experiment2--
Now revert only the Dcache disabling change
############
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 888fe3f..6bc85f78 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       //cpu_cache_off();
+       cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       //flush_cache_all();
+       flush_cache_all();
 }
###############

Do the above steps a, b and c.

d)
>From BDI3000 I see strange value for x0
CPU#0>rd
GPR00: 000000000000003f 0000000034d5d918 0000004000000000 0000000000000004
e) Flush L3 cache from BDI3000
f) Jump to relocate_new_kernel (kexec -e prints this address)

machine_kexec:584: reboot_code_buffer_phys:  00000043f0381000

CPU#0>go 0x00000043f0381000

g) Kexeced kernel fails to boot

CPU#0>h
    Core number       : 0
    Core state        : debug (AArch64 EL1)
    Debug entry cause : External Debug Request
    Current PC        : 0xffffffc000083200
    Current CPSR      : 0x000003c5 (EL1h)

So If i don't turn off the dcache and flush L3 using
BDI3000 things are working.

--Experiment3--
Added L3 flush code to kernel +  other modification (disable Dcache turning off)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 0faa45a..5c546bb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -233,6 +233,20 @@ section_table:

 ENTRY(stext)
        mov     x21, x0                         // x21=FDT
+flush_l3:
+       mov     x2, #0x10
+       mov     w1, #0x1f00
+       movk    x2, #0x7e60, lsl #16
+       movk    w1, #0x1600, lsl #16
+       str     w1, [x2]
+       mov     x4, #0
+wait_flush:
+       ldr     w1, [x2]
+       add     x4, x4 ,#1
+       tbz     w1, #31, wait_done
+       b       wait_flush
+wait_done:
+
        bl      el2_setup                       // Drop to EL1,
w20=cpu_boot_mode
        bl      __calc_phys_offset              // x24=PHYS_OFFSET,
x28=PHYS_OFFSET-PAGE_OFFSET
        bl      set_cpu_boot_mode_flag

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 31cba91..888fe3f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+       //cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+       //flush_cache_all();
 }

 /*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c29dde1..3f7b0a2 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -73,7 +73,22 @@ ENTRY(cpu_reset)
        bic     x1, x1, #1
        msr     sctlr_el1, x1                   // disable the MMU
        isb
-       bl      secondary_shutdown
+
+flush_l3:
+       mov     x2, #0x10
+       mov     w1, #0x1f00
+       movk    x2, #0x7e60, lsl #16
+       movk    w1, #0x1600, lsl #16
+       str     w1, [x2]
+       mov     x4, #0
+wait_flush:
+       ldr     w1, [x2]
+       add     x4, x4 ,#1
+       tbz     w1, #31, wait_done
+       b       wait_flush
+wait_done:
+
+#      bl      secondary_shutdown
        ret     x0
 ENDPROC(cpu_reset)

Now also kexeced kernel boots fine.
If i do the same with "Dcache turning off enabled"
booting of kexeced kernel fails.

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-24  9:36             ` Mark Rutland
@ 2014-07-24 12:49               ` Arun Chandran
  2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:26               ` Arun Chandran
  2 siblings, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-24 12:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>> Hi Arun,
>>
>> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
>>
>> > I tried the same dtb with UP configuration. For UP kernel to compile
>> > did the below modifications
>>
>> I'll test and fixup the kexec UP build in the next few days.
>>
>> ...
>>
>> > With the default target configuration "kexec -e" failed to execute
>> > in UP scenario also.
>
> It would be helpful to know _how_ it failed. Do you have any log output?

I don't have any error log for for this.
If I loop before jumping to relocate_new_kernel and break with
BDI I can see.

CPU#0>state
Core#0: halted 0x0000004000094230 External Debug Request

CPU#0>rd
GPR00: 000000000000003f 0000000034d5d918 0000004000000000 0000000000000004
GPR04: 000000000000001f 000000000000001b 0000000000000000 ffffffffffffffff
GPR08: 0000000000000014 ffffffffffffffff 0000000000000000 0000000000000002
GPR12: ffffffc000085030 aa0e03ed3600004a 9100218cf940018a ffffffffffffffff
GPR16: ffffffc0000cc00c 0000000000434260 0000007ffe514d80 00000043eadd2000
GPR20: ffffffc3eadd2000 ffffffc000cc5000 ffffffc3ea42bf88 00000003eadd2000
GPR24: ffffffc0004ada58 ffffffc0005b2e68 ffffffc0005b2e88 ffffffc0005b2ef0
GPR28: ffffffc000092000 ffffffc3f017bce0 ffffffc000085054 ffffffc3f017bce0

x0 contains a corrupted address in this case

If I do the same without dosabling Dcache
#######
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6bc85f78..61f95a2 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -70,10 +70,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+//     cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+//     flush_cache_all();
 }

 void soft_restart(unsigned long addr)
##########
I can see

CPU#0>state
Core#0: halted 0x0000004000094230 External Debug Request

CPU#0>
CPU#0>rd
GPR00: 00000043eae90000 0000000034d5d91c 0000004000000000 0000000000000004
GPR04: 000000000000001f 000000000000001b 0000000000000000 ffffffffffffffff
GPR08: 0000000000000014 ffffffffffffffff 0000000000000000 0000000000000002
GPR12: ffffffc000085028 aa0e03ed3600004a 9100218cf940018a ffffffffffffffff
GPR16: ffffffc0000cc00c 0000000000434260 0000007fe79f31e0 00000043eae90000
GPR20: ffffffc3eae90000 ffffffc000cc5000 ffffffc3ea42ef88 00000003eae90000
GPR24: ffffffc0004ada58 ffffffc0005b2e68 ffffffc0005b2e88 ffffffc0005b2ef0
GPR28: ffffffc000092000 ffffffc3ebad7ce0 ffffffc00008504c ffffffc3ebad7ce0
CPU#0>
CPU#0>go 0x00000043eae90000
CPU#0>
CPU#0>
CPU#0>h
    Core number       : 0
    Core state        : debug (AArch64 EL1)
    Debug entry cause : External Debug Request
    Current PC        : 0xffffffc000083200
    Current CPSR      : 0x600003c5 (EL1h)

In this case x0 contais the correct jump address; but kexeced
goes to " 0xffffffc000083200". It did not print anything at
console.

If you have any other experiments to find the root cause
please tell me I can do it.

>
>> >
>> > But I had some luck when I did the same steps with L3 cache
>> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
>> > it has an L3 cache. Luckily I was able to disable it in u-boot.
>> >
>> > With the L3 cache disabled configuration I am able to
>> > do "kexec -e". Please see the log attached.
>
> Hmm. We don't expect the kernel to do any L3 management. It seems that
> memory subsystems with L3 caches respecting cache maintenance by VA are
> going to become relatively common, and we expect to handle them all by
> performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
> kernel cache requirements for boot) for what we do at boot time.
>

I will confirm this by booting with L3 cache enabled, reboot to u-boot,
don't flush L3 from u-boot and again boot to linux.

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-24  9:36             ` Mark Rutland
  2014-07-24 12:49               ` Arun Chandran
@ 2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:31                 ` Arun Chandran
                                   ` (2 more replies)
  2014-07-25 10:26               ` Arun Chandran
  2 siblings, 3 replies; 49+ messages in thread
From: Geoff Levand @ 2014-07-25  0:17 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
 
> > All memory management for the main cpu is done by the arch code.  Kexec
> > and cpu hot plug only work with the secondary cpus, so the problem would
> > be in the arch memory code, either in setup_restart() for shutdown, or
> > in the startup code.
> 
> It's possible that soft_restart and setup_restart are a little dodgy, as
> they also rely on the compiler being smart and not touching the stack
> after setup_restart().
> 
> However, they provide absolutely no guarantee that any data has been
> flushed out to the PoC [1]. If you require any data to be flushed out to the
> PoC so as to be visible to noncacheable accesses, you will need to
> ensure that this is flushed by VA before soft_restart is called. Data
> may have migrated to another cache (e.g. another CPU, or the L3) where
> it is not visible.

OK, kexec's reset routine relocate_new_kernel does use a few global
variables that are set by the main kexec routines.  I added a call to
__flush_dcache_area(), which uses 'dc civac' for those.

I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
flush_cache_all() would be enough.

Arun, I also fixed UP builds.  Could you pull my latest and try with L3
enabled?

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-24  9:36             ` Mark Rutland
  2014-07-24 12:49               ` Arun Chandran
  2014-07-25  0:17               ` Geoff Levand
@ 2014-07-25 10:26               ` Arun Chandran
  2014-07-25 11:29                 ` Mark Rutland
  2 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-07-25 10:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>> Hi Arun,
>>
>> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
>>
>> > I tried the same dtb with UP configuration. For UP kernel to compile
>> > did the below modifications
>>
>> I'll test and fixup the kexec UP build in the next few days.
>>
>> ...
>>
>> > With the default target configuration "kexec -e" failed to execute
>> > in UP scenario also.
>
> It would be helpful to know _how_ it failed. Do you have any log output?
>
>> >
>> > But I had some luck when I did the same steps with L3 cache
>> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
>> > it has an L3 cache. Luckily I was able to disable it in u-boot.
>> >
>> > With the L3 cache disabled configuration I am able to
>> > do "kexec -e". Please see the log attached.
>
> Hmm. We don't expect the kernel to do any L3 management. It seems that
> memory subsystems with L3 caches respecting cache maintenance by VA are
> going to become relatively common, and we expect to handle them all by
> performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
> kernel cache requirements for boot) for what we do at boot time.
>
>>
>> All memory management for the main cpu is done by the arch code.  Kexec
>> and cpu hot plug only work with the secondary cpus, so the problem would
>> be in the arch memory code, either in setup_restart() for shutdown, or
>> in the startup code.
>
> It's possible that soft_restart and setup_restart are a little dodgy, as
> they also rely on the compiler being smart and not touching the stack
> after setup_restart().
>
Could you please explain why this is required?


This is my disassembled output of  soft_restart()
With the latest code from
https://git.linaro.org/people/geoff.levand/linux-kexec.git

ffffffc000085014 <soft_restart>:
ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
ffffffc000085018:       910003fd        mov     x29, sp
ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
ffffffc000085020:       94003c49        bl      ffffffc000094144
<setup_mm_for_reboot>
ffffffc000085024:       94003a6b        bl      ffffffc0000939d0
<flush_cache_all>
ffffffc000085028:       94003cde        bl      ffffffc0000943a0 <cpu_cache_off>
ffffffc00008502c:       94003a69        bl      ffffffc0000939d0
<flush_cache_all>
ffffffc000085030:       90006201        adrp    x1, ffffffc000cc5000
<reset_devices>
ffffffc000085034:       f9400fa0        ldr     x0, [x29,#24]
ffffffc000085038:       f940fc22        ldr     x2, [x1,#504]
ffffffc00008503c:       f0000061        adrp    x1, ffffffc000094000
<arch_pick_mmap_layout+0x150>
ffffffc000085040:       910f0021        add     x1, x1, #0x3c0
ffffffc000085044:       8b010041        add     x1, x2, x1
ffffffc000085048:       d2c00802        mov     x2, #0x4000000000
         // #274877906944
ffffffc00008504c:       8b020021        add     x1, x1, x2
ffffffc000085050:       d63f0020        blr     x1
ffffffc000085054:       f0002940        adrp    x0, ffffffc0005b0000
<kallsyms_token_table+0x200>
ffffffc000085058:       f0002941        adrp    x1, ffffffc0005b0000
<kallsyms_token_table+0x200>
ffffffc00008505c:       90002143        adrp    x3, ffffffc0004ad000
<__start_rodata>
ffffffc000085060:       91128000        add     x0, x0, #0x4a0
ffffffc000085064:       913de021        add     x1, x1, #0xf78
ffffffc000085068:       52800c22        mov     w2, #0x61
         // #97
ffffffc00008506c:       91072063        add     x3, x3, #0x1c8
ffffffc000085070:       941071d0        bl      ffffffc0004a17b0 <printk>
ffffffc000085074:       f0002940        adrp    x0, ffffffc0005b0000
<kallsyms_token_table+0x200>
ffffffc000085078:       91134000        add     x0, x0, #0x4d0
ffffffc00008507c:       9410712c        bl      ffffffc0004a152c <panic>

If I single step the code,

This is how my stack looks like @ffffffc00008501c
CPU#0>mdd 0xffffffc3eb83fcf0
ffffffc3_eb83fcf0 : ffffffc3eb83fd10  ........
ffffffc3_eb83fcf8 : ffffffc000092778  ......'x
ffffffc3_eb83fd00 : ffffffc000cc9f70  .......p
ffffffc3_eb83fd08 : 00000043eae32000  ...C.. .
ffffffc3_eb83fd10 : ffffffc3eb83fd70  .......p
ffffffc3_eb83fd18 : ffffffc0000fc018  ........
ffffffc3_eb83fd20 : ffffffc000c95000  ......P.
ffffffc3_eb83fd28 : 0000000000000000  ........
ffffffc3_eb83fd30 : ffffffc000cd06a0  ........
ffffffc3_eb83fd38 : 0000000000000000  ........
ffffffc3_eb83fd40 : 0000000080000000  ........
ffffffc3_eb83fd48 : 0000000000000015  ........
ffffffc3_eb83fd50 : 0000000000000115  ........
ffffffc3_eb83fd58 : 000000000000008e  ........
ffffffc3_eb83fd60 : ffffffc000c8b000  ........
ffffffc3_eb83fd68 : ffffffc3eb83c000  ........

And this is how it looks like @ffffffc000085030
CPU#0>mdd 0xffffffc3eb83fcf0
ffffffc3_eb83fcf0 : 0000000000000115  ........
ffffffc3_eb83fcf8 : 000000000000003f  .......?
ffffffc3_eb83fd00 : ffffffc3eb83fd30  .......0
ffffffc3_eb83fd08 : ffffffc000120360  .......`
ffffffc3_eb83fd10 : 0000000000000002  ........
ffffffc3_eb83fd18 : ffffffbcedb611c0  ........
ffffffc3_eb83fd20 : ffffffbcedb611c0  ........
ffffffc3_eb83fd28 : ffffffc3eae08000  ........
ffffffc3_eb83fd30 : ffffffc3000200d0  ........
ffffffc3_eb83fd38 : ffffffc000120708  ........
ffffffc3_eb83fd40 : 0000000000000000  ........
ffffffc3_eb83fd48 : 72a00040528f8fe0  r.. at R...
ffffffc3_eb83fd50 : 540002a16a00003f  T...j..?
ffffffc3_eb83fd58 : 3627fe60f8538260  6'.`.S.`
ffffffc3_eb83fd60 : 97fff80daa1303e0  ........
ffffffc3_eb83fd68 : 97fc01ffaa1403e0  ........

It is clearly getting corrupted.

Now with keeping caches on
######
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 786daa6..6ff3d9f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -76,10 +76,10 @@ static void setup_restart(void)
        flush_cache_all();

        /* Turn D-cache off */
-       cpu_cache_off();
+       //cpu_cache_off();

        /* Push out any further dirty data, and ensure cache is empty */
-       flush_cache_all();
+       //flush_cache_all();
 }

 void soft_restart(unsigned long addr)
#######

ffffffc000085014 <soft_restart>:
ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
ffffffc000085018:       910003fd        mov     x29, sp
ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
ffffffc000085020:       94003c49        bl      ffffffc000094144
<setup_mm_for_reboot>
ffffffc000085024:       94003a6b        bl      ffffffc0000939d0
<flush_cache_all>
ffffffc000085028:       90006201        adrp    x1, ffffffc000cc5000
<reset_devices>
ffffffc00008502c:       f9400fa0        ldr     x0, [x29,#24]
ffffffc000085030:       f940fc22        ldr     x2, [x1,#504]
ffffffc000085034:       f0000061        adrp    x1, ffffffc000094000
<arch_pick_mmap_layout+0x150>
ffffffc000085038:       910f0021        add     x1, x1, #0x3c0
ffffffc00008503c:       8b010041        add     x1, x2, x1
ffffffc000085040:       d2c00802        mov     x2, #0x4000000000
         // #274877906944
ffffffc000085044:       8b020021        add     x1, x1, x2
ffffffc000085048:       d63f0020        blr     x1
ffffffc00008504c:       f0002940        adrp    x0, ffffffc0005b0000
<kallsyms_token_table+0x200>
ffffffc000085050:       f0002941        adrp    x1, ffffffc0005b0000
<kallsyms_token_table+0x200>
ffffffc000085054:       90002143        adrp    x3, ffffffc0004ad000
<__start_rodata>
ffffffc000085058:       91128000        add     x0, x0, #0x4a0
ffffffc00008505c:       913de021        add     x1, x1, #0xf78
ffffffc000085060:       52800c22        mov     w2, #0x61
         // #97
ffffffc000085064:       91072063        add     x3, x3, #0x1c8
ffffffc000085068:       941071d2        bl      ffffffc0004a17b0 <printk>
ffffffc00008506c:       f0002940        adrp    x0, ffffffc0005b0000
<kallsyms_token_table+0x200>
ffffffc000085070:       91134000        add     x0, x0, #0x4d0
ffffffc000085074:       9410712e        bl      ffffffc0004a152c <panic>

Now my stack @ffffffc00008501c and @ffffffc000085028 are same.
It is

CPU#0>mdd 0xffffffc3eae27cf0
ffffffc3_eae27cf0 : ffffffc3eae27d10  ......}.
ffffffc3_eae27cf8 : ffffffc000092778  ......'x
ffffffc3_eae27d00 : ffffffc000cc9f70  .......p
ffffffc3_eae27d08 : 00000043f0171000  ...C....
ffffffc3_eae27d10 : ffffffc3eae27d70  ......}p
ffffffc3_eae27d18 : ffffffc0000fc018  ........
ffffffc3_eae27d20 : ffffffc000c95000  ......P.
ffffffc3_eae27d28 : 0000000000000000  ........
ffffffc3_eae27d30 : ffffffc000cd06a0  ........
ffffffc3_eae27d38 : 0000000000000000  ........
ffffffc3_eae27d40 : 0000000080000000  ........
ffffffc3_eae27d48 : 0000000000000015  ........
ffffffc3_eae27d50 : 0000000000000115  ........
ffffffc3_eae27d58 : 000000000000008e  ........
ffffffc3_eae27d60 : ffffffc000c8b000  ........
ffffffc3_eae27d68 : ffffffc3eae24000  ...... at .

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25  0:17               ` Geoff Levand
@ 2014-07-25 10:31                 ` Arun Chandran
  2014-07-25 10:36                 ` Mark Rutland
  2014-07-25 11:48                 ` Arun Chandran
  2 siblings, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-25 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
>> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>
>> > All memory management for the main cpu is done by the arch code.  Kexec
>> > and cpu hot plug only work with the secondary cpus, so the problem would
>> > be in the arch memory code, either in setup_restart() for shutdown, or
>> > in the startup code.
>>
>> It's possible that soft_restart and setup_restart are a little dodgy, as
>> they also rely on the compiler being smart and not touching the stack
>> after setup_restart().
>>
>> However, they provide absolutely no guarantee that any data has been
>> flushed out to the PoC [1]. If you require any data to be flushed out to the
>> PoC so as to be visible to noncacheable accesses, you will need to
>> ensure that this is flushed by VA before soft_restart is called. Data
>> may have migrated to another cache (e.g. another CPU, or the L3) where
>> it is not visible.
>
> OK, kexec's reset routine relocate_new_kernel does use a few global
> variables that are set by the main kexec routines.  I added a call to
> __flush_dcache_area(), which uses 'dc civac' for those.
>
> I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> flush_cache_all() would be enough.
>
> Arun, I also fixed UP builds.  Could you pull my latest and try with L3
> enabled?

I tried it but kexeced kernel fails to boot.
Ran the code with and without the call to cpu_cache_off();
in arch/arm64/kernel/process.c. I have put more details
in another mail. Please see that.

--Arun









>
> -Geoff
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:31                 ` Arun Chandran
@ 2014-07-25 10:36                 ` Mark Rutland
  2014-07-25 11:48                 ` Arun Chandran
  2 siblings, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-25 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 01:17:48AM +0100, Geoff Levand wrote:
> Hi,
> 
> On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
> > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>  
> > > All memory management for the main cpu is done by the arch code.  Kexec
> > > and cpu hot plug only work with the secondary cpus, so the problem would
> > > be in the arch memory code, either in setup_restart() for shutdown, or
> > > in the startup code.
> > 
> > It's possible that soft_restart and setup_restart are a little dodgy, as
> > they also rely on the compiler being smart and not touching the stack
> > after setup_restart().
> > 
> > However, they provide absolutely no guarantee that any data has been
> > flushed out to the PoC [1]. If you require any data to be flushed out to the
> > PoC so as to be visible to noncacheable accesses, you will need to
> > ensure that this is flushed by VA before soft_restart is called. Data
> > may have migrated to another cache (e.g. another CPU, or the L3) where
> > it is not visible.
> 
> OK, kexec's reset routine relocate_new_kernel does use a few global
> variables that are set by the main kexec routines.  I added a call to
> __flush_dcache_area(), which uses 'dc civac' for those.
> 
> I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> flush_cache_all() would be enough.

Unfortunately Set/Way ops don't provide the guarantee you require. While
they may happen to force prior writes out to the PoC on some
implementations, this guarantee is not provided by the architecture. The
only way to guarantee data is out at the PoC per the architecture is to
use cache maintenance by VA (unless you are unlucky enough to be on a
32-bit system with an outer cache that requires MMIO maintenance).

Almost every use of Set/Way operations is dodgy. They only make sense
for IMPLEMENTATION DEFINED power-on / power-off sequences (which the
arm64 kernel won't be dealing with), and some edge cases where you need
to guarantee that the local d-caches are empty (to avoid unexpectedly
hitting in the cache).

The naming of __flush_dcache_all is certainly misleading and I intend to
address that shortly.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25 10:26               ` Arun Chandran
@ 2014-07-25 11:29                 ` Mark Rutland
  0 siblings, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-25 11:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 11:26:46AM +0100, Arun Chandran wrote:
> On Thu, Jul 24, 2014 at 3:06 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
> >> Hi Arun,
> >>
> >> On Tue, 2014-07-22 at 18:55 +0530, Arun Chandran wrote:
> >>
> >> > I tried the same dtb with UP configuration. For UP kernel to compile
> >> > did the below modifications
> >>
> >> I'll test and fixup the kexec UP build in the next few days.
> >>
> >> ...
> >>
> >> > With the default target configuration "kexec -e" failed to execute
> >> > in UP scenario also.
> >
> > It would be helpful to know _how_ it failed. Do you have any log output?
> >
> >> >
> >> > But I had some luck when I did the same steps with L3 cache
> >> > disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> >> > it has an L3 cache. Luckily I was able to disable it in u-boot.
> >> >
> >> > With the L3 cache disabled configuration I am able to
> >> > do "kexec -e". Please see the log attached.
> >
> > Hmm. We don't expect the kernel to do any L3 management. It seems that
> > memory subsystems with L3 caches respecting cache maintenance by VA are
> > going to become relatively common, and we expect to handle them all by
> > performing maintenance by VA. See commit c218bca74eea (arm64: Relax the
> > kernel cache requirements for boot) for what we do at boot time.
> >
> >>
> >> All memory management for the main cpu is done by the arch code.  Kexec
> >> and cpu hot plug only work with the secondary cpus, so the problem would
> >> be in the arch memory code, either in setup_restart() for shutdown, or
> >> in the startup code.
> >
> > It's possible that soft_restart and setup_restart are a little dodgy, as
> > they also rely on the compiler being smart and not touching the stack
> > after setup_restart().
> >
> Could you please explain why this is required?

It's a requirement because we have no guarantee that the stack (or any
other memory addresses) will be flushed out to the PoC and become
visible to non-cacheable accesses.

Any use of the stack after we disable the d-cache is a bug. If you need
a VA range visible to non-cacheable accesses (i.e. after the cache is
disabled), you must flush that range to the PoC by VA.

The kernel text _should_ be out at the PoC per the boot protocol
requirements, so we don't have to flush it to execute correctly unless
we've performed some kernel text patching.

Arguably the first __flush_dcache_all call here is unnecessary; it
doesn't provide any guarantee we can rely on. The __flush_dcache_all
_after_ disabling the caches will ensure that the local caches are empty
(avoiding unexpected hits for non-cacheable accesses).

> 
> This is my disassembled output of  soft_restart()
> With the latest code from
> https://git.linaro.org/people/geoff.levand/linux-kexec.git
> 
> ffffffc000085014 <soft_restart>:
> ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
> ffffffc000085018:       910003fd        mov     x29, sp
> ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
> ffffffc000085020:       94003c49        bl      ffffffc000094144
> <setup_mm_for_reboot>
> ffffffc000085024:       94003a6b        bl      ffffffc0000939d0
> <flush_cache_all>
> ffffffc000085028:       94003cde        bl      ffffffc0000943a0 <cpu_cache_off>
> ffffffc00008502c:       94003a69        bl      ffffffc0000939d0
> <flush_cache_all>
> ffffffc000085030:       90006201        adrp    x1, ffffffc000cc5000
> <reset_devices>
> ffffffc000085034:       f9400fa0        ldr     x0, [x29,#24]
> ffffffc000085038:       f940fc22        ldr     x2, [x1,#504]
> ffffffc00008503c:       f0000061        adrp    x1, ffffffc000094000
> <arch_pick_mmap_layout+0x150>
> ffffffc000085040:       910f0021        add     x1, x1, #0x3c0
> ffffffc000085044:       8b010041        add     x1, x2, x1
> ffffffc000085048:       d2c00802        mov     x2, #0x4000000000
>          // #274877906944
> ffffffc00008504c:       8b020021        add     x1, x1, x2
> ffffffc000085050:       d63f0020        blr     x1
> ffffffc000085054:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085058:       f0002941        adrp    x1, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc00008505c:       90002143        adrp    x3, ffffffc0004ad000
> <__start_rodata>
> ffffffc000085060:       91128000        add     x0, x0, #0x4a0
> ffffffc000085064:       913de021        add     x1, x1, #0xf78
> ffffffc000085068:       52800c22        mov     w2, #0x61
>          // #97
> ffffffc00008506c:       91072063        add     x3, x3, #0x1c8
> ffffffc000085070:       941071d0        bl      ffffffc0004a17b0 <printk>
> ffffffc000085074:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085078:       91134000        add     x0, x0, #0x4d0
> ffffffc00008507c:       9410712c        bl      ffffffc0004a152c <panic>
> 
> If I single step the code,
> 
> This is how my stack looks like @ffffffc00008501c
> CPU#0>mdd 0xffffffc3eb83fcf0
> ffffffc3_eb83fcf0 : ffffffc3eb83fd10  ........
> ffffffc3_eb83fcf8 : ffffffc000092778  ......'x
> ffffffc3_eb83fd00 : ffffffc000cc9f70  .......p
> ffffffc3_eb83fd08 : 00000043eae32000  ...C.. .
> ffffffc3_eb83fd10 : ffffffc3eb83fd70  .......p
> ffffffc3_eb83fd18 : ffffffc0000fc018  ........
> ffffffc3_eb83fd20 : ffffffc000c95000  ......P.
> ffffffc3_eb83fd28 : 0000000000000000  ........
> ffffffc3_eb83fd30 : ffffffc000cd06a0  ........
> ffffffc3_eb83fd38 : 0000000000000000  ........
> ffffffc3_eb83fd40 : 0000000080000000  ........
> ffffffc3_eb83fd48 : 0000000000000015  ........
> ffffffc3_eb83fd50 : 0000000000000115  ........
> ffffffc3_eb83fd58 : 000000000000008e  ........
> ffffffc3_eb83fd60 : ffffffc000c8b000  ........
> ffffffc3_eb83fd68 : ffffffc3eb83c000  ........
> 
> And this is how it looks like @ffffffc000085030
> CPU#0>mdd 0xffffffc3eb83fcf0
> ffffffc3_eb83fcf0 : 0000000000000115  ........
> ffffffc3_eb83fcf8 : 000000000000003f  .......?
> ffffffc3_eb83fd00 : ffffffc3eb83fd30  .......0
> ffffffc3_eb83fd08 : ffffffc000120360  .......`
> ffffffc3_eb83fd10 : 0000000000000002  ........
> ffffffc3_eb83fd18 : ffffffbcedb611c0  ........
> ffffffc3_eb83fd20 : ffffffbcedb611c0  ........
> ffffffc3_eb83fd28 : ffffffc3eae08000  ........
> ffffffc3_eb83fd30 : ffffffc3000200d0  ........
> ffffffc3_eb83fd38 : ffffffc000120708  ........
> ffffffc3_eb83fd40 : 0000000000000000  ........
> ffffffc3_eb83fd48 : 72a00040528f8fe0  r.. at R...
> ffffffc3_eb83fd50 : 540002a16a00003f  T...j..?
> ffffffc3_eb83fd58 : 3627fe60f8538260  6'.`.S.`
> ffffffc3_eb83fd60 : 97fff80daa1303e0  ........
> ffffffc3_eb83fd68 : 97fc01ffaa1403e0  ........
> 
> It is clearly getting corrupted.

No it is not getting corrupted.

At ffffffc00008501c the CPU's cache is enabled (i.e. the CPU can make
cacheable accesses). The stack is normal cacheable (WBWA) memory, so
accesses will look-up and allocate in the cache. As the stack is being
accessed often it is warm, and the cache hierarchy does not decide to
evict it and/or write-back to the next level of the cache hierarchy.

At ffffffc000085030 the cache is disabled (i.e. the CPU cannot make
cacheable accesses). The effective attributes for the stack are
non-cacheable, so the CPU will bypass the cache hierarchy and go
straight to the PoC when it tried to read stack addresses. As we have
not flushed the stack by VA, (dirty) cachelines covering the stack might
be anywhere in the cache hierarchy, and aren't guaranteed to have
reached to PoC.

So the CPU reads _stale_ data from the PoC. There is no corruption; the
cache lines are still valid somewhere (we never invalidated them without
cleaning them first), but are simply not visible to non-cacheable
accesses.

Are we actually accessing the stack? If not, then there's no need to
worry. If we are, then the code which is doing so is buggy and either
needs to flush the stack by VA to the PoC, or (better) not use the stack
in this case and just use registers.

Does that all make sense? Or do I need to get my ascii-art paintbrush
out?

> 
> Now with keeping caches on
> ######
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 786daa6..6ff3d9f 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -76,10 +76,10 @@ static void setup_restart(void)
>         flush_cache_all();
> 
>         /* Turn D-cache off */
> -       cpu_cache_off();
> +       //cpu_cache_off();
> 
>         /* Push out any further dirty data, and ensure cache is empty */
> -       flush_cache_all();
> +       //flush_cache_all();
>  }
> 
>  void soft_restart(unsigned long addr)
> #######
> 
> ffffffc000085014 <soft_restart>:
> ffffffc000085014:       a9be7bfd        stp     x29, x30, [sp,#-32]!
> ffffffc000085018:       910003fd        mov     x29, sp
> ffffffc00008501c:       f9000fa0        str     x0, [x29,#24]
> ffffffc000085020:       94003c49        bl      ffffffc000094144
> <setup_mm_for_reboot>
> ffffffc000085024:       94003a6b        bl      ffffffc0000939d0
> <flush_cache_all>
> ffffffc000085028:       90006201        adrp    x1, ffffffc000cc5000
> <reset_devices>
> ffffffc00008502c:       f9400fa0        ldr     x0, [x29,#24]
> ffffffc000085030:       f940fc22        ldr     x2, [x1,#504]
> ffffffc000085034:       f0000061        adrp    x1, ffffffc000094000
> <arch_pick_mmap_layout+0x150>
> ffffffc000085038:       910f0021        add     x1, x1, #0x3c0
> ffffffc00008503c:       8b010041        add     x1, x2, x1
> ffffffc000085040:       d2c00802        mov     x2, #0x4000000000
>          // #274877906944
> ffffffc000085044:       8b020021        add     x1, x1, x2
> ffffffc000085048:       d63f0020        blr     x1
> ffffffc00008504c:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085050:       f0002941        adrp    x1, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085054:       90002143        adrp    x3, ffffffc0004ad000
> <__start_rodata>
> ffffffc000085058:       91128000        add     x0, x0, #0x4a0
> ffffffc00008505c:       913de021        add     x1, x1, #0xf78
> ffffffc000085060:       52800c22        mov     w2, #0x61
>          // #97
> ffffffc000085064:       91072063        add     x3, x3, #0x1c8
> ffffffc000085068:       941071d2        bl      ffffffc0004a17b0 <printk>
> ffffffc00008506c:       f0002940        adrp    x0, ffffffc0005b0000
> <kallsyms_token_table+0x200>
> ffffffc000085070:       91134000        add     x0, x0, #0x4d0
> ffffffc000085074:       9410712e        bl      ffffffc0004a152c <panic>
> 
> Now my stack @ffffffc00008501c and @ffffffc000085028 are same.

Sure, with the caches on you have the guarantee that prior writes are
still visible.

You either need to perform maintenance by VA to make the stack visible,
or (better) don't access memory once the cache is disabled. Just load
everything you need into registers in advance.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25  0:17               ` Geoff Levand
  2014-07-25 10:31                 ` Arun Chandran
  2014-07-25 10:36                 ` Mark Rutland
@ 2014-07-25 11:48                 ` Arun Chandran
  2014-07-25 12:14                   ` Mark Rutland
  2014-07-26  0:18                   ` Geoff Levand
  2 siblings, 2 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-25 11:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
>> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>
>> > All memory management for the main cpu is done by the arch code.  Kexec
>> > and cpu hot plug only work with the secondary cpus, so the problem would
>> > be in the arch memory code, either in setup_restart() for shutdown, or
>> > in the startup code.
>>
>> It's possible that soft_restart and setup_restart are a little dodgy, as
>> they also rely on the compiler being smart and not touching the stack
>> after setup_restart().
>>
>> However, they provide absolutely no guarantee that any data has been
>> flushed out to the PoC [1]. If you require any data to be flushed out to the
>> PoC so as to be visible to noncacheable accesses, you will need to
>> ensure that this is flushed by VA before soft_restart is called. Data
>> may have migrated to another cache (e.g. another CPU, or the L3) where
>> it is not visible.
>
> OK, kexec's reset routine relocate_new_kernel does use a few global
> variables that are set by the main kexec routines.  I added a call to
> __flush_dcache_area(), which uses 'dc civac' for those.
>
> I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> flush_cache_all() would be enough.
>
> Arun, I also fixed UP builds.  Could you pull my latest and try with L3
> enabled?
>
I got this working. As 'Mark Rutland' pointed in another mail that
it could be problem with flushing the cache; I did a read of
1GB data from start of RAM to a volatile var. I assume that
this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)

###################
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 786daa6..90418f3 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {}

 static void setup_restart(void)
 {
+       volatile u64 tmp;
+       volatile u64 *addr;
+
+       addr = (u64 *)(0xffffffc000000000);
        /*
         * Tell the mm system that we are going to reboot -
         * we may need it to insert some 1:1 mappings so that
@@ -75,6 +79,11 @@ static void setup_restart(void)
        /* Clean and invalidate caches */
        flush_cache_all();

+       for ( ;addr < (u64 *)0xffffffc040000000; addr++)
+       {
+               tmp = *addr;
+       }
+
        /* Turn D-cache off */
        cpu_cache_off();

###################

With the above change latest kernel @
https://git.linaro.org/people/geoff.levand/linux-kexec.git
is able to do kexec with L3 enabled in UP scenario.

--Arun













> -Geoff
>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25 11:48                 ` Arun Chandran
@ 2014-07-25 12:14                   ` Mark Rutland
  2014-07-25 15:29                     ` Arun Chandran
  2014-07-26  0:18                   ` Geoff Levand
  1 sibling, 1 reply; 49+ messages in thread
From: Mark Rutland @ 2014-07-25 12:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 12:48:04PM +0100, Arun Chandran wrote:
> Hi Geoff,
> 
> On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
> > Hi,
> >
> > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
> >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
> >
> >> > All memory management for the main cpu is done by the arch code.  Kexec
> >> > and cpu hot plug only work with the secondary cpus, so the problem would
> >> > be in the arch memory code, either in setup_restart() for shutdown, or
> >> > in the startup code.
> >>
> >> It's possible that soft_restart and setup_restart are a little dodgy, as
> >> they also rely on the compiler being smart and not touching the stack
> >> after setup_restart().
> >>
> >> However, they provide absolutely no guarantee that any data has been
> >> flushed out to the PoC [1]. If you require any data to be flushed out to the
> >> PoC so as to be visible to noncacheable accesses, you will need to
> >> ensure that this is flushed by VA before soft_restart is called. Data
> >> may have migrated to another cache (e.g. another CPU, or the L3) where
> >> it is not visible.
> >
> > OK, kexec's reset routine relocate_new_kernel does use a few global
> > variables that are set by the main kexec routines.  I added a call to
> > __flush_dcache_area(), which uses 'dc civac' for those.
> >
> > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
> > flush_cache_all() would be enough.
> >
> > Arun, I also fixed UP builds.  Could you pull my latest and try with L3
> > enabled?
> >
> I got this working. As 'Mark Rutland' pointed in another mail that
> it could be problem with flushing the cache; I did a read of
> 1GB data from start of RAM to a volatile var. I assume that
> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)

You've managed to get the cache to evict some lines, which proves my
theory, but this is absolute nonsense and guarantees nothing.

So NAK to this.

If you need to perform cache maintenance to guarantee data is visible to
non-cacheable accesses  you _must_ use the architected mechanism for
cleaning data to the PoC: DC CVAC. We have wrappers for flushing ranges.

Anything else is nonsense and does not provide the guarantee you need.

That said, I still am not sure what guarantee you are attempting to get.
Which data do you need out at the PoC?

Thanks,
Mark.

> 
> ###################
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 786daa6..90418f3 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {}
> 
>  static void setup_restart(void)
>  {
> +       volatile u64 tmp;
> +       volatile u64 *addr;
> +
> +       addr = (u64 *)(0xffffffc000000000);
>         /*
>          * Tell the mm system that we are going to reboot -
>          * we may need it to insert some 1:1 mappings so that
> @@ -75,6 +79,11 @@ static void setup_restart(void)
>         /* Clean and invalidate caches */
>         flush_cache_all();
> 
> +       for ( ;addr < (u64 *)0xffffffc040000000; addr++)
> +       {
> +               tmp = *addr;
> +       }
> +
>         /* Turn D-cache off */
>         cpu_cache_off();
> 
> ###################
> 
> With the above change latest kernel @
> https://git.linaro.org/people/geoff.levand/linux-kexec.git
> is able to do kexec with L3 enabled in UP scenario.
> 
> --Arun
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> > -Geoff
> >
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25 12:14                   ` Mark Rutland
@ 2014-07-25 15:29                     ` Arun Chandran
  0 siblings, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-25 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 25, 2014 at 5:44 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Jul 25, 2014 at 12:48:04PM +0100, Arun Chandran wrote:
>> Hi Geoff,
>>
>> On Fri, Jul 25, 2014 at 5:47 AM, Geoff Levand <geoff@infradead.org> wrote:
>> > Hi,
>> >
>> > On Thu, 2014-07-24 at 10:36 +0100, Mark Rutland wrote:
>> >> On Thu, Jul 24, 2014 at 01:38:07AM +0100, Geoff Levand wrote:
>> >
>> >> > All memory management for the main cpu is done by the arch code.  Kexec
>> >> > and cpu hot plug only work with the secondary cpus, so the problem would
>> >> > be in the arch memory code, either in setup_restart() for shutdown, or
>> >> > in the startup code.
>> >>
>> >> It's possible that soft_restart and setup_restart are a little dodgy, as
>> >> they also rely on the compiler being smart and not touching the stack
>> >> after setup_restart().
>> >>
>> >> However, they provide absolutely no guarantee that any data has been
>> >> flushed out to the PoC [1]. If you require any data to be flushed out to the
>> >> PoC so as to be visible to noncacheable accesses, you will need to
>> >> ensure that this is flushed by VA before soft_restart is called. Data
>> >> may have migrated to another cache (e.g. another CPU, or the L3) where
>> >> it is not visible.
>> >
>> > OK, kexec's reset routine relocate_new_kernel does use a few global
>> > variables that are set by the main kexec routines.  I added a call to
>> > __flush_dcache_area(), which uses 'dc civac' for those.
>> >
>> > I had thought the call to __flush_dcache_all, which uses 'dc cisw', in
>> > flush_cache_all() would be enough.
>> >
>> > Arun, I also fixed UP builds.  Could you pull my latest and try with L3
>> > enabled?
>> >
>> I got this working. As 'Mark Rutland' pointed in another mail that
>> it could be problem with flushing the cache; I did a read of
>> 1GB data from start of RAM to a volatile var. I assume that
>> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)
>
> You've managed to get the cache to evict some lines, which proves my
> theory, but this is absolute nonsense and guarantees nothing.
>
> So NAK to this.
>

Yes I was just shooting wildly.

> If you need to perform cache maintenance to guarantee data is visible to
> non-cacheable accesses  you _must_ use the architected mechanism for
> cleaning data to the PoC: DC CVAC. We have wrappers for flushing ranges.
>
> Anything else is nonsense and does not provide the guarantee you need.
>
> That said, I still am not sure what guarantee you are attempting to get.
> Which data do you need out at the PoC?
>

I tried flushing the jump addr

##########
 +static unsigned long jump_addr_save;
 void soft_restart(unsigned long addr)
 {
        typedef void (*phys_reset_t)(unsigned long);
        phys_reset_t phys_reset;

+       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
+       jump_addr_save = addr;
+        __flush_dcache_area(&jump_addr_save, 16);
+        __flush_dcache_area(&phys_reset, 16);
        setup_restart();

        /* Switch to the identity mapping */
-       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
-       phys_reset(addr);
+       phys_reset(jump_addr_save);

        /* Should never get here */
        BUG();
###########

And flushing all the source and destination pages of the kexeced
kernel

##########
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 2995c78..3edf567 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -221,6 +221,8 @@ static void _kexec_entry_dump(const char *func, int line,
                                                addr,
                                                (unsigned
long)virt_to_phys(dest),
                                                dest);
+                               __flush_dcache_area(addr, PAGE_SIZE);
+                               __flush_dcache_area(dest, PAGE_SIZE);
                                dest += PAGE_SIZE;
                                break;
                        case IND_DONE:
###########

It still fails(not reboot to kexeced kernel); That means I miss
flushing of some other
area.

--Arun

> Thanks,
> Mark.
>
>>
>> ###################
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 786daa6..90418f3 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -63,6 +63,10 @@ static inline void smp_secondary_shutdown(void) {}
>>
>>  static void setup_restart(void)
>>  {
>> +       volatile u64 tmp;
>> +       volatile u64 *addr;
>> +
>> +       addr = (u64 *)(0xffffffc000000000);
>>         /*
>>          * Tell the mm system that we are going to reboot -
>>          * we may need it to insert some 1:1 mappings so that
>> @@ -75,6 +79,11 @@ static void setup_restart(void)
>>         /* Clean and invalidate caches */
>>         flush_cache_all();
>>
>> +       for ( ;addr < (u64 *)0xffffffc040000000; addr++)
>> +       {
>> +               tmp = *addr;
>> +       }
>> +
>>         /* Turn D-cache off */
>>         cpu_cache_off();
>>
>> ###################
>>
>> With the above change latest kernel @
>> https://git.linaro.org/people/geoff.levand/linux-kexec.git
>> is able to do kexec with L3 enabled in UP scenario.
>>
>> --Arun
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> > -Geoff
>> >
>>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-25 11:48                 ` Arun Chandran
  2014-07-25 12:14                   ` Mark Rutland
@ 2014-07-26  0:18                   ` Geoff Levand
  2014-07-28 15:00                     ` Arun Chandran
  1 sibling, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-07-26  0:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote:
> I got this working. As 'Mark Rutland' pointed in another mail that
> it could be problem with flushing the cache; I did a read of
> 1GB data from start of RAM to a volatile var. I assume that
> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)

I wasn't flushing out all the data used by relocate_new_kernel.  I added
a routine that should flush all the pages in the kimage list out to PoC.
Please try a UP build with L3 enabled.

Next week I'll check that my smp secondary shutdown code is doing the
correct thing.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-26  0:18                   ` Geoff Levand
@ 2014-07-28 15:00                     ` Arun Chandran
  2014-07-28 15:38                       ` Mark Rutland
  0 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-07-28 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff

On Sat, Jul 26, 2014 at 5:48 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote:
>> I got this working. As 'Mark Rutland' pointed in another mail that
>> it could be problem with flushing the cache; I did a read of
>> 1GB data from start of RAM to a volatile var. I assume that
>> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)
>
> I wasn't flushing out all the data used by relocate_new_kernel.  I added
> a routine that should flush all the pages in the kimage list out to PoC.
> Please try a UP build with L3 enabled.
>

Yes I verified this new kernel by stopping just before jumping to the
kexeced kernel and taking the memory dump for the kernel
and dtb.

CPU#0>dump 0x0000004000080000 0x7F2000 uImage
CPU#0>rd
GPR00: 0000004000880000 0000000000000000 0000000000000000 0000000000000000
GPR04: 0000004000080004 000000000000001b 0000000000000000 ffffffffffffffff
GPR08: 0000000000000020 ffffffffffffffff 0000000000000004 0000000000000002
GPR12: 00000043ea439fd8 0000004000883000 00000043ea631000 ffffffffffffffff
GPR16: ffffffc0000cc31c 0000000000435260 0000007fe9328b90 00000043eaddc002
GPR20: 0000004000883000 00000043ea632000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 ffffffc000085074 ffffffc3eae53d00

And compared this image with the one taken after booting with
my working one(The one with 1GB read of RAM). Both are Identical.
That means kexec code now flushes data properly. Note that in both
cases I used the same secondary kernel.

But It fails to boot kexeced kernel. If I break the target I can see
CPU#0>h
    Core number       : 0
    Core state        : debug (AArch64 EL1)
    Debug entry cause : External Debug Request
    Current PC        : 0xffffffc000083200
    Current CPSR      : 0x600003c5 (EL1h)


So I guess there may be something wrong with
the booting of kernel.

I have these changes to the code.
1)
#########
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 00cfbd6..da3672b 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -684,7 +691,7 @@ void machine_kexec(struct kimage *image)
        /* Flush the reboot_code_buffer in preparation for its execution. */

        flush_icache_range((unsigned long)reboot_code_buffer,
-               relocate_new_kernel_size);
+               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));

        /*
         * Flush any data used by relocate_new_kernel in preparation for
#########
Passing of second variable to flush_icache_range() is wrong
it expects an address not length.

2)

#######
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9ed7327..e3fc8d6 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c

@@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
 {
        typedef void (*phys_reset_t)(unsigned long);
        phys_reset_t phys_reset;
+       unsigned long jump_addr = addr;
+
+       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
+
+       __flush_dcache_area(&jump_addr, 8);
+       __flush_dcache_area(&phys_reset, 8);

        setup_restart();

        /* Switch to the identity mapping */
-       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
-       phys_reset(addr);
+       phys_reset(jump_addr);

        /* Should never get here */
        BUG();
########

Without this flushing it will jump to wrong addr.

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-28 15:00                     ` Arun Chandran
@ 2014-07-28 15:38                       ` Mark Rutland
  2014-07-29  0:09                         ` Geoff Levand
  0 siblings, 1 reply; 49+ messages in thread
From: Mark Rutland @ 2014-07-28 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
> Hi Geoff
> 
> On Sat, Jul 26, 2014 at 5:48 AM, Geoff Levand <geoff@infradead.org> wrote:
> > Hi Arun,
> >
> > On Fri, 2014-07-25 at 17:18 +0530, Arun Chandran wrote:
> >> I got this working. As 'Mark Rutland' pointed in another mail that
> >> it could be problem with flushing the cache; I did a read of
> >> 1GB data from start of RAM to a volatile var. I assume that
> >> this will clear and invalidate all that in cache (L1=32K, L2=256 K, L3=8M)
> >
> > I wasn't flushing out all the data used by relocate_new_kernel.  I added
> > a routine that should flush all the pages in the kimage list out to PoC.
> > Please try a UP build with L3 enabled.
> >
> 
> Yes I verified this new kernel by stopping just before jumping to the
> kexeced kernel and taking the memory dump for the kernel
> and dtb.
> 
> CPU#0>dump 0x0000004000080000 0x7F2000 uImage
> CPU#0>rd
> GPR00: 0000004000880000 0000000000000000 0000000000000000 0000000000000000
> GPR04: 0000004000080004 000000000000001b 0000000000000000 ffffffffffffffff
> GPR08: 0000000000000020 ffffffffffffffff 0000000000000004 0000000000000002
> GPR12: 00000043ea439fd8 0000004000883000 00000043ea631000 ffffffffffffffff
> GPR16: ffffffc0000cc31c 0000000000435260 0000007fe9328b90 00000043eaddc002
> GPR20: 0000004000883000 00000043ea632000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR28: 0000000000000000 0000000000000000 ffffffc000085074 ffffffc3eae53d00
> 
> And compared this image with the one taken after booting with
> my working one(The one with 1GB read of RAM). Both are Identical.
> That means kexec code now flushes data properly. Note that in both
> cases I used the same secondary kernel.
> 
> But It fails to boot kexeced kernel. If I break the target I can see
> CPU#0>h
>     Core number       : 0
>     Core state        : debug (AArch64 EL1)
>     Debug entry cause : External Debug Request
>     Current PC        : 0xffffffc000083200
>     Current CPSR      : 0x600003c5 (EL1h)
> 
> 
> So I guess there may be something wrong with
> the booting of kernel.
> 
> I have these changes to the code.
> 1)
> #########
> diff --git a/arch/arm64/kernel/machine_kexec.c
> b/arch/arm64/kernel/machine_kexec.c
> index 00cfbd6..da3672b 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -684,7 +691,7 @@ void machine_kexec(struct kimage *image)
>         /* Flush the reboot_code_buffer in preparation for its execution. */
> 
>         flush_icache_range((unsigned long)reboot_code_buffer,
> -               relocate_new_kernel_size);
> +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));
> 
>         /*
>          * Flush any data used by relocate_new_kernel in preparation for
> #########
> Passing of second variable to flush_icache_range() is wrong
> it expects an address not length.

A simpler option would be to nuke the entire icache before branching to
the new image.

> 
> 2)
> 
> #######
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 9ed7327..e3fc8d6 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> 
> @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
>  {
>         typedef void (*phys_reset_t)(unsigned long);
>         phys_reset_t phys_reset;
> +       unsigned long jump_addr = addr;
> +
> +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> +
> +       __flush_dcache_area(&jump_addr, 8);
> +       __flush_dcache_area(&phys_reset, 8);

Are these values really not getting stashed in registers?

If the compiler is spilling, then we have absolutely no guarantee about
any part of the stack. If that's the case, then we can't use the stack
at all. These need to be rewritten in asm if the compiler is spilling.

Thanks,
Mark.

> 
>         setup_restart();
> 
>         /* Switch to the identity mapping */
> -       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> -       phys_reset(addr);
> +       phys_reset(jump_addr);
> 
>         /* Should never get here */
>         BUG();
> ########
> 
> Without this flushing it will jump to wrong addr.
> 
> --Arun
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-28 15:38                       ` Mark Rutland
@ 2014-07-29  0:09                         ` Geoff Levand
  2014-07-29  9:10                           ` Mark Rutland
  2014-07-29 12:32                           ` Arun Chandran
  0 siblings, 2 replies; 49+ messages in thread
From: Geoff Levand @ 2014-07-29  0:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote:
> On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
> > I have these changes to the code.
> >         flush_icache_range((unsigned long)reboot_code_buffer,
> > -               relocate_new_kernel_size);
> > +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));

Thanks, I introduced this in my last version in an attempt to clean up
the code, but on studying setup_restart(), I wonder if we even need to
do this icache flush here (see below).

> >         /*
> >          * Flush any data used by relocate_new_kernel in preparation for
> > #########
> > Passing of second variable to flush_icache_range() is wrong
> > it expects an address not length.
> 
> A simpler option would be to nuke the entire icache before branching to
> the new image.

flush_cache_all(), which is called by setup_restart(), does a 'ic
ialluis'.  The ARM says that this will invalidate all instruction caches
for the inner shareable domain.  Do we need something more?

> > 2)
> > 
> > #######
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 9ed7327..e3fc8d6 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > 
> > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
> >  {
> >         typedef void (*phys_reset_t)(unsigned long);
> >         phys_reset_t phys_reset;
> > +       unsigned long jump_addr = addr;
> > +
> > +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> > +
> > +       __flush_dcache_area(&jump_addr, 8);
> > +       __flush_dcache_area(&phys_reset, 8);
> 
> Are these values really not getting stashed in registers?

Looking at the disassembled code of soft_restart() from my compiler,
addr is being saved on the stack over the call to setup_restart(), which
I would expect it to do.

> If the compiler is spilling, then we have absolutely no guarantee about
> any part of the stack. If that's the case, then we can't use the stack
> at all. These need to be rewritten in asm if the compiler is spilling.

I think we just need to put the restart addr in a variable and flush
that to the PoC.

Arun, I pushed out a fixed version of soft_restart(), so please try
another UP + L3 boot.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29  0:09                         ` Geoff Levand
@ 2014-07-29  9:10                           ` Mark Rutland
  2014-07-29 12:32                           ` Arun Chandran
  1 sibling, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-29  9:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 01:09:08AM +0100, Geoff Levand wrote:
> Hi,

Hi,

> On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote:
> > On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
> > > I have these changes to the code.
> > >         flush_icache_range((unsigned long)reboot_code_buffer,
> > > -               relocate_new_kernel_size);
> > > +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));
> 
> Thanks, I introduced this in my last version in an attempt to clean up
> the code, but on studying setup_restart(), I wonder if we even need to
> do this icache flush here (see below).
> 
> > >         /*
> > >          * Flush any data used by relocate_new_kernel in preparation for
> > > #########
> > > Passing of second variable to flush_icache_range() is wrong
> > > it expects an address not length.
> > 
> > A simpler option would be to nuke the entire icache before branching to
> > the new image.
> 
> flush_cache_all(), which is called by setup_restart(), does a 'ic
> ialluis'.  The ARM says that this will invalidate all instruction caches
> for the inner shareable domain.  Do we need something more?

If we have that before branching to the new image then that should be
ok. The current image should already be visible at the PoC per the boot
protocol.

> > > 2)
> > > 
> > > #######
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index 9ed7327..e3fc8d6 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > 
> > > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
> > >  {
> > >         typedef void (*phys_reset_t)(unsigned long);
> > >         phys_reset_t phys_reset;
> > > +       unsigned long jump_addr = addr;
> > > +
> > > +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
> > > +
> > > +       __flush_dcache_area(&jump_addr, 8);
> > > +       __flush_dcache_area(&phys_reset, 8);
> > 
> > Are these values really not getting stashed in registers?
> 
> Looking at the disassembled code of soft_restart() from my compiler,
> addr is being saved on the stack over the call to setup_restart(), which
> I would expect it to do.
 
> > If the compiler is spilling, then we have absolutely no guarantee about
> > any part of the stack. If that's the case, then we can't use the stack
> > at all. These need to be rewritten in asm if the compiler is spilling.
> 
> I think we just need to put the restart addr in a variable and flush
> that to the PoC.

I don't believe that flushing the restart addr variable on the stack out
to the PoC is the correct fix here; it only guarantees that said
variable is visible, not anything else the compiler may have placed on
the stack. I wonder how setup_restart interacts with
CONFIG_CC_STACKPROTECTOR for example.

As far as I can see, the only way to provide the guarantee we require
here is to not use the stack. Anything short of that is not going to be
much more robust than the current code.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29  0:09                         ` Geoff Levand
  2014-07-29  9:10                           ` Mark Rutland
@ 2014-07-29 12:32                           ` Arun Chandran
  2014-07-29 13:35                             ` Mark Rutland
  1 sibling, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-07-29 12:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Tue, Jul 29, 2014 at 5:39 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi,
>
> On Mon, 2014-07-28 at 16:38 +0100, Mark Rutland wrote:
>> On Mon, Jul 28, 2014 at 04:00:18PM +0100, Arun Chandran wrote:
>> > I have these changes to the code.
>> >         flush_icache_range((unsigned long)reboot_code_buffer,
>> > -               relocate_new_kernel_size);
>> > +               (unsigned long)(reboot_code_buffer + relocate_new_kernel_size));
>
> Thanks, I introduced this in my last version in an attempt to clean up
> the code, but on studying setup_restart(), I wonder if we even need to
> do this icache flush here (see below).
>
>> >         /*
>> >          * Flush any data used by relocate_new_kernel in preparation for
>> > #########
>> > Passing of second variable to flush_icache_range() is wrong
>> > it expects an address not length.
>>
>> A simpler option would be to nuke the entire icache before branching to
>> the new image.
>
> flush_cache_all(), which is called by setup_restart(), does a 'ic
> ialluis'.  The ARM says that this will invalidate all instruction caches
> for the inner shareable domain.  Do we need something more?
>
>> > 2)
>> >
>> > #######
>> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> > index 9ed7327..e3fc8d6 100644
>> > --- a/arch/arm64/kernel/process.c
>> > +++ b/arch/arm64/kernel/process.c
>> >
>> > @@ -84,12 +91,17 @@ void soft_restart(unsigned long addr)
>> >  {
>> >         typedef void (*phys_reset_t)(unsigned long);
>> >         phys_reset_t phys_reset;
>> > +       unsigned long jump_addr = addr;
>> > +
>> > +       phys_reset = (phys_reset_t)virt_to_phys(cpu_reset);
>> > +
>> > +       __flush_dcache_area(&jump_addr, 8);
>> > +       __flush_dcache_area(&phys_reset, 8);
>>
>> Are these values really not getting stashed in registers?
>
> Looking at the disassembled code of soft_restart() from my compiler,
> addr is being saved on the stack over the call to setup_restart(), which
> I would expect it to do.
>
Yes my compiler also saves this in stack

>> If the compiler is spilling, then we have absolutely no guarantee about
>> any part of the stack. If that's the case, then we can't use the stack
>> at all. These need to be rewritten in asm if the compiler is spilling.
>
> I think we just need to put the restart addr in a variable and flush
> that to the PoC.
>
> Arun, I pushed out a fixed version of soft_restart(), so please try
> another UP + L3 boot.
>

The default code did not work.

It is working with the change below

###############
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 5632473..7c5f859 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
 /**
  * kexec_list_walk - Helper to walk the kimage page list.
  */
-
+static int kexec_kernel_size;
+#define IMG_SIZE_NONE  0
+#define KERN_SIZE_FLAG 1
+#define DTB_SIZE_FLAG  2
 static void kexec_list_walk(void *ctx, unsigned long kimage_head,
        void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
 {
        void *dest;
        unsigned long *entry;
+       int imgsize_flag = IMG_SIZE_NONE;
+

        for (entry = &kimage_head, dest = NULL; ; entry++) {
                unsigned int flag = *entry & IND_FLAGS;
@@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
                        cb(ctx, flag, addr, NULL);
                        break;
                case IND_DESTINATION:
+                       if (imgsize_flag == IMG_SIZE_NONE) {
+                               kexec_kernel_size = 0;
+                               imgsize_flag = KERN_SIZE_FLAG;
+                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
+                               imgsize_flag = DTB_SIZE_FLAG;
+                       }
                        dest = addr;
                        cb(ctx, flag, addr, NULL);
                        break;
                case IND_SOURCE:
+                       if (imgsize_flag == KERN_SIZE_FLAG)
+                               kexec_kernel_size++;
                        cb(ctx, flag, addr, dest);
                        dest += PAGE_SIZE;
                        break;
@@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)

        kexec_list_walk(NULL, image->head, kexec_list_flush_cb);

+       /*
+        * Make sure virtual addresses of new kernel are flushed
+        * SZ_512K = TEXT_OFFSET
+        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
+        * Don't know = (SZ_4M + SZ_1M)
+        * SZ_4M = not working
+        * SZ_6M = working
+        * SZ_8M = working
+        *
+        * so chose SZ_4M + SZ_1M; Don't know why this is required
+        * BSS, stack ??
+        *
+        */
+       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
(kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
+
        soft_restart(reboot_code_buffer_phys);
 }


--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29 12:32                           ` Arun Chandran
@ 2014-07-29 13:35                             ` Mark Rutland
  2014-07-29 21:19                               ` Geoff Levand
                                                 ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-29 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

[...]

> The default code did not work.
> 
> It is working with the change below
> 
> ###############
> diff --git a/arch/arm64/kernel/machine_kexec.c
> b/arch/arm64/kernel/machine_kexec.c
> index 5632473..7c5f859 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
>  /**
>   * kexec_list_walk - Helper to walk the kimage page list.
>   */
> -
> +static int kexec_kernel_size;
> +#define IMG_SIZE_NONE  0
> +#define KERN_SIZE_FLAG 1
> +#define DTB_SIZE_FLAG  2
>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
>  {
>         void *dest;
>         unsigned long *entry;
> +       int imgsize_flag = IMG_SIZE_NONE;
> +
> 
>         for (entry = &kimage_head, dest = NULL; ; entry++) {
>                 unsigned int flag = *entry & IND_FLAGS;
> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
> long kimage_head,
>                         cb(ctx, flag, addr, NULL);
>                         break;
>                 case IND_DESTINATION:
> +                       if (imgsize_flag == IMG_SIZE_NONE) {
> +                               kexec_kernel_size = 0;
> +                               imgsize_flag = KERN_SIZE_FLAG;
> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
> +                               imgsize_flag = DTB_SIZE_FLAG;
> +                       }
>                         dest = addr;
>                         cb(ctx, flag, addr, NULL);
>                         break;
>                 case IND_SOURCE:
> +                       if (imgsize_flag == KERN_SIZE_FLAG)
> +                               kexec_kernel_size++;
>                         cb(ctx, flag, addr, dest);
>                         dest += PAGE_SIZE;
>                         break;
> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
> 
>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> 
> +       /*
> +        * Make sure virtual addresses of new kernel are flushed
> +        * SZ_512K = TEXT_OFFSET

TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
shouldn't need to be flushed.

Since c218bca74eea (arm64: Relax the kernel cache requirements for
boot), the kernel will flush the cache for anything outside of the Image
that it writes to before enabling the MMU and caches (e.g. the idmap and
swapper page tables). Once caches are up we shouldn't care.

Assuming that the existing kernel code is correct, the only region we
should need to flush out to the PoC is the region from _text to _edata
(i.e. just the contents of the Image).

> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
> +        * Don't know = (SZ_4M + SZ_1M)
> +        * SZ_4M = not working
> +        * SZ_6M = working
> +        * SZ_8M = working
> +        *
> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
> +        * BSS, stack ??
> +        *
> +        */
> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
> +
>         soft_restart(reboot_code_buffer_phys);
>  }

How big exactly is the kernel Image you're trying to kexec?

Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29 13:35                             ` Mark Rutland
@ 2014-07-29 21:19                               ` Geoff Levand
  2014-07-30  7:22                                 ` Arun Chandran
  2014-07-30  5:46                               ` Arun Chandran
  2014-07-30  7:01                               ` Arun Chandran
  2 siblings, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-07-29 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> boot), the kernel will flush the cache for anything outside of the Image
> that it writes to before enabling the MMU and caches (e.g. the idmap and
> swapper page tables). Once caches are up we shouldn't care.
> 
> Assuming that the existing kernel code is correct, the only region we
> should need to flush out to the PoC is the region from _text to _edata
> (i.e. just the contents of the Image).

If the new kernel will overwrite the old one, then we do the final copy
of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
is executed after the dcache is disabled, so that should write it directly
to the PoC.  It seems the protocol expects us to invalidate the dcache
for that range though, so I added code to do this, essentially what Arun
had added.

Arun, please try.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-22 13:25         ` Arun Chandran
  2014-07-24  0:38           ` Geoff Levand
@ 2014-07-30  3:26           ` Feng Kan
  1 sibling, 0 replies; 49+ messages in thread
From: Feng Kan @ 2014-07-30  3:26 UTC (permalink / raw)
  To: linux-arm-kernel

>
> But I had some luck when I did the same steps with L3 cache
> disabled. According to http://www.spinics.net/lists/arm-kernel/msg329541.html
> it has an L3 cache. Luckily I was able to disable it in u-boot.
>
> With the L3 cache disabled configuration I am able to
> do "kexec -e". Please see the log attached.
>
> Feng,
> I doubt kernel is unaware of the presence of L3 cache, this subsequently
> makes "kexec -e" to fail.
Yes, L3 is turned on prior to entering Linux. It is used when Linux
enable cache in
the MMU.

>
> Do you have any idea how to make the kernel to take control of L3 cache?
We don't have this code. Using address 0 to get back to spin address would
require you to disable cache prior to jump.


>
> --Arun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29 13:35                             ` Mark Rutland
  2014-07-29 21:19                               ` Geoff Levand
@ 2014-07-30  5:46                               ` Arun Chandran
  2014-07-30  9:16                                 ` Mark Rutland
  2014-07-30  7:01                               ` Arun Chandran
  2 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-07-30  5:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> [...]
>
>> The default code did not work.
>>
>> It is working with the change below
>>
>> ###############
>> diff --git a/arch/arm64/kernel/machine_kexec.c
>> b/arch/arm64/kernel/machine_kexec.c
>> index 5632473..7c5f859 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
>>  /**
>>   * kexec_list_walk - Helper to walk the kimage page list.
>>   */
>> -
>> +static int kexec_kernel_size;
>> +#define IMG_SIZE_NONE  0
>> +#define KERN_SIZE_FLAG 1
>> +#define DTB_SIZE_FLAG  2
>>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
>>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
>>  {
>>         void *dest;
>>         unsigned long *entry;
>> +       int imgsize_flag = IMG_SIZE_NONE;
>> +
>>
>>         for (entry = &kimage_head, dest = NULL; ; entry++) {
>>                 unsigned int flag = *entry & IND_FLAGS;
>> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
>> long kimage_head,
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_DESTINATION:
>> +                       if (imgsize_flag == IMG_SIZE_NONE) {
>> +                               kexec_kernel_size = 0;
>> +                               imgsize_flag = KERN_SIZE_FLAG;
>> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
>> +                               imgsize_flag = DTB_SIZE_FLAG;
>> +                       }
>>                         dest = addr;
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_SOURCE:
>> +                       if (imgsize_flag == KERN_SIZE_FLAG)
>> +                               kexec_kernel_size++;
>>                         cb(ctx, flag, addr, dest);
>>                         dest += PAGE_SIZE;
>>                         break;
>> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
>>
>>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>
>> +       /*
>> +        * Make sure virtual addresses of new kernel are flushed
>> +        * SZ_512K = TEXT_OFFSET
>
> TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
> shouldn't need to be flushed.
>
> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> boot), the kernel will flush the cache for anything outside of the Image
> that it writes to before enabling the MMU and caches (e.g. the idmap and
> swapper page tables). Once caches are up we shouldn't care.
>

Ok. TEXT_OFFSET macro is not exported, that's why I used SZ_512K.
Any particular reason for not exporting it?

> Assuming that the existing kernel code is correct, the only region we
> should need to flush out to the PoC is the region from _text to _edata
> (i.e. just the contents of the Image).
>
>> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
>> +        * Don't know = (SZ_4M + SZ_1M)
>> +        * SZ_4M = not working
>> +        * SZ_6M = working
>> +        * SZ_8M = working
>> +        *
>> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
>> +        * BSS, stack ??
>> +        *
>> +        */
>> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
>> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
>> +
>>         soft_restart(reboot_code_buffer_phys);
>>  }
>
> How big exactly is the kernel Image you're trying to kexec?

For the 1st and second stage I use the same kernel as they
are combined with intramfs final image size varies.

1st stage
-------------
$ls -l arch/arm64/boot/uImage
-rw-rw-r-- 1 arun arun 12895544 Jul 30 10:57 arch/arm64/boot/uImage

It will boot to intramfs

$ls /
bin  dtb.dtb  home  lib    linuxrc  mnt  proc  run   share  tmp  var
dev  etc      init  lib64  media    opt  root  sbin  sys    usr  vmlinux.strip

kexec will boot vmlinux.strip
$ls -l vmlinux.strip
-rwxrwxr-x 1 arun arun 8194760 Jul 30 10:55 vmlinux.strip

2nd stage
--------------
$ls -l arch/arm64/boot/uImage
-rw-rw-r-- 1 arun arun 8127800 Jul 30 10:54 arch/arm64/boot/uImage

The corresponding vmlinux is converted to vmlinux.strip

$aarch64-linux-gnu-strip vmlinux -o vmlinux.strip
$cp vmlinux.strip /ramfs/aarch64_le_rootfs/

--Arun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29 13:35                             ` Mark Rutland
  2014-07-29 21:19                               ` Geoff Levand
  2014-07-30  5:46                               ` Arun Chandran
@ 2014-07-30  7:01                               ` Arun Chandran
  2 siblings, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-30  7:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> [...]
>
>> The default code did not work.
>>
>> It is working with the change below
>>
>> ###############
>> diff --git a/arch/arm64/kernel/machine_kexec.c
>> b/arch/arm64/kernel/machine_kexec.c
>> index 5632473..7c5f859 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
>>  /**
>>   * kexec_list_walk - Helper to walk the kimage page list.
>>   */
>> -
>> +static int kexec_kernel_size;
>> +#define IMG_SIZE_NONE  0
>> +#define KERN_SIZE_FLAG 1
>> +#define DTB_SIZE_FLAG  2
>>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
>>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
>>  {
>>         void *dest;
>>         unsigned long *entry;
>> +       int imgsize_flag = IMG_SIZE_NONE;
>> +
>>
>>         for (entry = &kimage_head, dest = NULL; ; entry++) {
>>                 unsigned int flag = *entry & IND_FLAGS;
>> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
>> long kimage_head,
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_DESTINATION:
>> +                       if (imgsize_flag == IMG_SIZE_NONE) {
>> +                               kexec_kernel_size = 0;
>> +                               imgsize_flag = KERN_SIZE_FLAG;
>> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
>> +                               imgsize_flag = DTB_SIZE_FLAG;
>> +                       }
>>                         dest = addr;
>>                         cb(ctx, flag, addr, NULL);
>>                         break;
>>                 case IND_SOURCE:
>> +                       if (imgsize_flag == KERN_SIZE_FLAG)
>> +                               kexec_kernel_size++;
>>                         cb(ctx, flag, addr, dest);
>>                         dest += PAGE_SIZE;
>>                         break;
>> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
>>
>>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>
>> +       /*
>> +        * Make sure virtual addresses of new kernel are flushed
>> +        * SZ_512K = TEXT_OFFSET
>
> TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
> shouldn't need to be flushed.
>
> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> boot), the kernel will flush the cache for anything outside of the Image
> that it writes to before enabling the MMU and caches (e.g. the idmap and
> swapper page tables). Once caches are up we shouldn't care.
>
> Assuming that the existing kernel code is correct, the only region we
> should need to flush out to the PoC is the region from _text to _edata
> (i.e. just the contents of the Image).

I think we missed the dtb part. dtb is placed below the
kernel. We need to flush that also.

Geoff's new code manages that also. It is now
working for me.

--Arun
>
>> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
>> +        * Don't know = (SZ_4M + SZ_1M)
>> +        * SZ_4M = not working
>> +        * SZ_6M = working
>> +        * SZ_8M = working
>> +        *
>> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
>> +        * BSS, stack ??
>> +        *
>> +        */
>> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
>> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
>> +
>>         soft_restart(reboot_code_buffer_phys);
>>  }
>
> How big exactly is the kernel Image you're trying to kexec?
>
> Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-29 21:19                               ` Geoff Levand
@ 2014-07-30  7:22                                 ` Arun Chandran
  2014-08-01 11:13                                   ` Arun Chandran
  2014-08-04 10:16                                   ` Arun Chandran
  0 siblings, 2 replies; 49+ messages in thread
From: Arun Chandran @ 2014-07-30  7:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Mark,
>
> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
>> boot), the kernel will flush the cache for anything outside of the Image
>> that it writes to before enabling the MMU and caches (e.g. the idmap and
>> swapper page tables). Once caches are up we shouldn't care.
>>
>> Assuming that the existing kernel code is correct, the only region we
>> should need to flush out to the PoC is the region from _text to _edata
>> (i.e. just the contents of the Image).
>
> If the new kernel will overwrite the old one, then we do the final copy
> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
> is executed after the dcache is disabled, so that should write it directly
> to the PoC.  It seems the protocol expects us to invalidate the dcache
> for that range though, so I added code to do this, essentially what Arun
> had added.
>
> Arun, please try.
>
It works without any hiccups :)..
I have attached the log.

I will try with big-endian UP configuration next.

Just had to comment the crashdump code to compile with
defconfig.

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 233cd04..6f4ebd8 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -377,6 +377,7 @@ static void __init __maybe_unused reserve_crashkernel(void)
        insert_resource(&iomem_resource, &crashk_res);
 }

+#ifdef CONFIG_CRASH_DUMP
 static void __init __maybe_unused reserve_elfcorehdr(void)
 {
        struct resource res;
@@ -402,6 +403,7 @@ static void __init __maybe_unused reserve_elfcorehdr(void)
        res.end = elfcorehdr_addr + elfcorehdr_size - 1;
        insert_resource(&iomem_resource, &res);
 }
+#endif

 static void __init request_standard_resources(void)
 {

--Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_log
Type: application/octet-stream
Size: 15206 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140730/b4be6f8b/attachment-0001.obj>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-30  5:46                               ` Arun Chandran
@ 2014-07-30  9:16                                 ` Mark Rutland
  0 siblings, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-07-30  9:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 06:46:39AM +0100, Arun Chandran wrote:
> On Tue, Jul 29, 2014 at 7:05 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > [...]
> >
> >> The default code did not work.
> >>
> >> It is working with the change below
> >>
> >> ###############
> >> diff --git a/arch/arm64/kernel/machine_kexec.c
> >> b/arch/arm64/kernel/machine_kexec.c
> >> index 5632473..7c5f859 100644
> >> --- a/arch/arm64/kernel/machine_kexec.c
> >> +++ b/arch/arm64/kernel/machine_kexec.c
> >> @@ -147,12 +147,17 @@ static bool kexec_is_dtb_user(const dtb_t *dtb)
> >>  /**
> >>   * kexec_list_walk - Helper to walk the kimage page list.
> >>   */
> >> -
> >> +static int kexec_kernel_size;
> >> +#define IMG_SIZE_NONE  0
> >> +#define KERN_SIZE_FLAG 1
> >> +#define DTB_SIZE_FLAG  2
> >>  static void kexec_list_walk(void *ctx, unsigned long kimage_head,
> >>         void (*cb)(void *ctx, unsigned int flag, void *addr, void *dest))
> >>  {
> >>         void *dest;
> >>         unsigned long *entry;
> >> +       int imgsize_flag = IMG_SIZE_NONE;
> >> +
> >>
> >>         for (entry = &kimage_head, dest = NULL; ; entry++) {
> >>                 unsigned int flag = *entry & IND_FLAGS;
> >> @@ -164,10 +169,18 @@ static void kexec_list_walk(void *ctx, unsigned
> >> long kimage_head,
> >>                         cb(ctx, flag, addr, NULL);
> >>                         break;
> >>                 case IND_DESTINATION:
> >> +                       if (imgsize_flag == IMG_SIZE_NONE) {
> >> +                               kexec_kernel_size = 0;
> >> +                               imgsize_flag = KERN_SIZE_FLAG;
> >> +                       } else if (imgsize_flag == KERN_SIZE_FLAG) {
> >> +                               imgsize_flag = DTB_SIZE_FLAG;
> >> +                       }
> >>                         dest = addr;
> >>                         cb(ctx, flag, addr, NULL);
> >>                         break;
> >>                 case IND_SOURCE:
> >> +                       if (imgsize_flag == KERN_SIZE_FLAG)
> >> +                               kexec_kernel_size++;
> >>                         cb(ctx, flag, addr, dest);
> >>                         dest += PAGE_SIZE;
> >>                         break;
> >> @@ -693,5 +706,20 @@ void machine_kexec(struct kimage *image)
> >>
> >>         kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> >>
> >> +       /*
> >> +        * Make sure virtual addresses of new kernel are flushed
> >> +        * SZ_512K = TEXT_OFFSET
> >
> > TEXT_OFFSET is not guaranteed to be 512K. The TEXT_OFFSET area also
> > shouldn't need to be flushed.
> >
> > Since c218bca74eea (arm64: Relax the kernel cache requirements for
> > boot), the kernel will flush the cache for anything outside of the Image
> > that it writes to before enabling the MMU and caches (e.g. the idmap and
> > swapper page tables). Once caches are up we shouldn't care.
> >
> 
> Ok. TEXT_OFFSET macro is not exported, that's why I used SZ_512K.
> Any particular reason for not exporting it?

The TEXT_OFFSET of the kernel you intend to execute can be found in its
Image header. Take a look at Documentation/arm64/booting.txt. This may
be different from the TEXT_OFFSET of the current kernel.

As of v3.17 all fields in the header will be little-endian, and
booting.txt has been updated for this. You can check the image_size
field to determine whether the fields are all little-endian. Only when
image_size is zero can TEXT_OFFSET be assumed to be 0x80000.

> 
> > Assuming that the existing kernel code is correct, the only region we
> > should need to flush out to the PoC is the region from _text to _edata
> > (i.e. just the contents of the Image).
> >
> >> +        * kexec_kernel = kexec_kernel_size * PAGE_SIZE
> >> +        * Don't know = (SZ_4M + SZ_1M)
> >> +        * SZ_4M = not working
> >> +        * SZ_6M = working
> >> +        * SZ_8M = working
> >> +        *
> >> +        * so chose SZ_4M + SZ_1M; Don't know why this is required
> >> +        * BSS, stack ??
> >> +        *
> >> +        */
> >> +       __flush_dcache_area((void *)PAGE_OFFSET, SZ_512K +
> >> (kexec_kernel_size * PAGE_SIZE) + SZ_4M + SZ_1M);
> >> +
> >>         soft_restart(reboot_code_buffer_phys);
> >>  }
> >
> > How big exactly is the kernel Image you're trying to kexec?
> 
> For the 1st and second stage I use the same kernel as they
> are combined with intramfs final image size varies.
> 
> 1st stage
> -------------
> $ls -l arch/arm64/boot/uImage
> -rw-rw-r-- 1 arun arun 12895544 Jul 30 10:57 arch/arm64/boot/uImage
> 
> It will boot to intramfs
> 
> $ls /
> bin  dtb.dtb  home  lib    linuxrc  mnt  proc  run   share  tmp  var
> dev  etc      init  lib64  media    opt  root  sbin  sys    usr  vmlinux.strip
> 
> kexec will boot vmlinux.strip
> $ls -l vmlinux.strip
> -rwxrwxr-x 1 arun arun 8194760 Jul 30 10:55 vmlinux.strip
> 
> 2nd stage
> --------------
> $ls -l arch/arm64/boot/uImage
> -rw-rw-r-- 1 arun arun 8127800 Jul 30 10:54 arch/arm64/boot/uImage
> 
> The corresponding vmlinux is converted to vmlinux.strip
> 
> $aarch64-linux-gnu-strip vmlinux -o vmlinux.strip
> $cp vmlinux.strip /ramfs/aarch64_le_rootfs/

OK, thanks for the info.

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-30  7:22                                 ` Arun Chandran
@ 2014-08-01 11:13                                   ` Arun Chandran
  2014-08-03 14:47                                     ` Mark Rutland
  2014-08-04 10:16                                   ` Arun Chandran
  1 sibling, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-08-01 11:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Mark,
>>
>> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
>>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
>>> boot), the kernel will flush the cache for anything outside of the Image
>>> that it writes to before enabling the MMU and caches (e.g. the idmap and
>>> swapper page tables). Once caches are up we shouldn't care.
>>>
>>> Assuming that the existing kernel code is correct, the only region we
>>> should need to flush out to the PoC is the region from _text to _edata
>>> (i.e. just the contents of the Image).
>>
>> If the new kernel will overwrite the old one, then we do the final copy
>> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
>> is executed after the dcache is disabled, so that should write it directly
>> to the PoC.  It seems the protocol expects us to invalidate the dcache
>> for that range though, so I added code to do this, essentially what Arun
>> had added.
>>
>> Arun, please try.
>>
> It works without any hiccups :)..
> I have attached the log.
>
> I will try with big-endian UP configuration next.
>

This question may be irrelevant to kexec and may be stupid also.

while debugging kexec in BIG-endian configuration I see
_create_page_tables (arch/arm64/kernel/head.S) is
doing __inval_cache_range on the addresses
from idmap_pg_dir to swapper_pg_dir.

ie from 0x4000D5F000 to 0x4000D61000 + #SWAPPER_DIR_SIZE
in my case.

Is it supposed to clear the corresponding virtual addresses?

There might be chance that first stage kernel may be using VA
from the same area right? (cache (L3)containing valid lines in the
area 0xffffffc000d5f000 to 0xffffffc000d5f000 + 16K)

Or is it the duty of the kexec to flush the entire
VA regions used by the first stage kernel and
the VA regions going to be used by the 2nd
stage kernel?

--Arun


> Just had to comment the crashdump code to compile with
> defconfig.
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 233cd04..6f4ebd8 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -377,6 +377,7 @@ static void __init __maybe_unused reserve_crashkernel(void)
>         insert_resource(&iomem_resource, &crashk_res);
>  }
>
> +#ifdef CONFIG_CRASH_DUMP
>  static void __init __maybe_unused reserve_elfcorehdr(void)
>  {
>         struct resource res;
> @@ -402,6 +403,7 @@ static void __init __maybe_unused reserve_elfcorehdr(void)
>         res.end = elfcorehdr_addr + elfcorehdr_size - 1;
>         insert_resource(&iomem_resource, &res);
>  }
> +#endif
>
>  static void __init request_standard_resources(void)
>  {
>
> --Arun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-01 11:13                                   ` Arun Chandran
@ 2014-08-03 14:47                                     ` Mark Rutland
  0 siblings, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-08-03 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 01, 2014 at 12:13:12PM +0100, Arun Chandran wrote:
> On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> >> Hi Mark,
> >>
> >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
> >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> >>> boot), the kernel will flush the cache for anything outside of the Image
> >>> that it writes to before enabling the MMU and caches (e.g. the idmap and
> >>> swapper page tables). Once caches are up we shouldn't care.
> >>>
> >>> Assuming that the existing kernel code is correct, the only region we
> >>> should need to flush out to the PoC is the region from _text to _edata
> >>> (i.e. just the contents of the Image).
> >>
> >> If the new kernel will overwrite the old one, then we do the final copy
> >> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
> >> is executed after the dcache is disabled, so that should write it directly
> >> to the PoC.  It seems the protocol expects us to invalidate the dcache
> >> for that range though, so I added code to do this, essentially what Arun
> >> had added.
> >>
> >> Arun, please try.
> >>
> > It works without any hiccups :)..
> > I have attached the log.
> >
> > I will try with big-endian UP configuration next.
> >
> 
> This question may be irrelevant to kexec and may be stupid also.
> 
> while debugging kexec in BIG-endian configuration I see
> _create_page_tables (arch/arm64/kernel/head.S) is
> doing __inval_cache_range on the addresses
> from idmap_pg_dir to swapper_pg_dir.
> 
> ie from 0x4000D5F000 to 0x4000D61000 + #SWAPPER_DIR_SIZE
> in my case.
> 
> Is it supposed to clear the corresponding virtual addresses?

The data caches behave in a PIPT like fashion, and there are no aliases.
Flushing by any VA that maps to a particular PA will flush out the only
entry that cna possibly exist for that PA.

While the MMU is off, the VA->PA mapping is an idmap. Given that, we can
flush by PA.

> There might be chance that first stage kernel may be using VA
> from the same area right? (cache (L3)containing valid lines in the
> area 0xffffffc000d5f000 to 0xffffffc000d5f000 + 16K)

The L3 cache should never see the VA, only the PA.

> Or is it the duty of the kexec to flush the entire
> VA regions used by the first stage kernel and
> the VA regions going to be used by the 2nd
> stage kernel?

We should only need ensure that the new kernel image and anything we
expect to use before the caches have been eanbled are flushed to the
PoC.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-07-30  7:22                                 ` Arun Chandran
  2014-08-01 11:13                                   ` Arun Chandran
@ 2014-08-04 10:16                                   ` Arun Chandran
  2014-08-04 11:35                                     ` Mark Rutland
  2014-08-04 17:21                                     ` Geoff Levand
  1 sibling, 2 replies; 49+ messages in thread
From: Arun Chandran @ 2014-08-04 10:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Mark,
>>
>> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
>>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
>>> boot), the kernel will flush the cache for anything outside of the Image
>>> that it writes to before enabling the MMU and caches (e.g. the idmap and
>>> swapper page tables). Once caches are up we shouldn't care.
>>>
>>> Assuming that the existing kernel code is correct, the only region we
>>> should need to flush out to the PoC is the region from _text to _edata
>>> (i.e. just the contents of the Image).
>>
>> If the new kernel will overwrite the old one, then we do the final copy
>> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
>> is executed after the dcache is disabled, so that should write it directly
>> to the PoC.  It seems the protocol expects us to invalidate the dcache
>> for that range though, so I added code to do this, essentially what Arun
>> had added.
>>
>> Arun, please try.
>>
> It works without any hiccups :)..
> I have attached the log.
>
> I will try with big-endian UP configuration next.
>

The latest kexec code is working fine in LE/BE mode in UP configuration.

I had to change kexec-tools code a bit to make sure that "kexec -l"
is not putting dtb at an area where kernel is building its initial page
tables.

#########################
diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index ab7a9ac..8f04473 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -526,6 +526,7 @@ int arm64_load_other_segments(struct kexec_info *info)
        off_t dtb_base;
        char *initrd_buf = NULL;
        char command_line[COMMAND_LINE_SIZE] = "";
+       unsigned long dtb_start;

        /* Processing for arm64_opts.dtb and arm64_opts.command_line. */

@@ -554,8 +555,16 @@ int arm64_load_other_segments(struct kexec_info *info)
         * kernel with an alignment of 128 KiB, giving a max supported DTB
         * size of 128 KiB (worst case). */

+       /*
+        * arm64 kernel uses area above kernel image to build
+        * initial page tables. Max required size for this area is 384K. It
+        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
+        * So place dtb 512k above kernel image area.
+        */
+       dtb_start = (unsigned long)info->segment[0].mem +
info->segment[0].memsz + 512UL * 1024;
+
        dtb_base = locate_hole(info, dtb_2.size, 128UL * 1024,
-               (unsigned long)info->entry,
+               (unsigned long)dtb_start,
                (unsigned long)info->entry + 512 * 1024 * 1024, 1);

        dbgprintf("dtb:    base %lx, size %lxh (%ld)\n", (unsigned
long)dtb_base,

########################

This code works fine with LE as well as BE. I have attached the logs
for both.

Now before trying SMP configuration I want to know whether the below "kexec -e"
scenarios are valid(required)?

1st stage                                  |               2nd stage
---------------------------------------------------------------------------------------------
LE UP                                      |               BE UP
LE UP                                      |               BE SMP
LE UP                                      |               LE SMP
LE SMP                                   |               LE UP
LE SMP                                   |               BE UP
LE SMP                                   |               BE SMP


--Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_be_log
Type: application/octet-stream
Size: 16869 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140804/8a02a68c/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec_le_log
Type: application/octet-stream
Size: 15548 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140804/8a02a68c/attachment-0003.obj>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-04 10:16                                   ` Arun Chandran
@ 2014-08-04 11:35                                     ` Mark Rutland
  2014-08-07  0:40                                       ` Geoff Levand
  2014-08-04 17:21                                     ` Geoff Levand
  1 sibling, 1 reply; 49+ messages in thread
From: Mark Rutland @ 2014-08-04 11:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 04, 2014 at 11:16:02AM +0100, Arun Chandran wrote:
> Hi Geoff,
> 
> On Wed, Jul 30, 2014 at 12:52 PM, Arun Chandran <achandran@mvista.com> wrote:
> > On Wed, Jul 30, 2014 at 2:49 AM, Geoff Levand <geoff@infradead.org> wrote:
> >> Hi Mark,
> >>
> >> On Tue, 2014-07-29 at 14:35 +0100, Mark Rutland wrote:
> >>> Since c218bca74eea (arm64: Relax the kernel cache requirements for
> >>> boot), the kernel will flush the cache for anything outside of the Image
> >>> that it writes to before enabling the MMU and caches (e.g. the idmap and
> >>> swapper page tables). Once caches are up we shouldn't care.
> >>>
> >>> Assuming that the existing kernel code is correct, the only region we
> >>> should need to flush out to the PoC is the region from _text to _edata
> >>> (i.e. just the contents of the Image).
> >>
> >> If the new kernel will overwrite the old one, then we do the final copy
> >> of the new kernel in the relocate_new_kernel routine.  relocate_new_kernel
> >> is executed after the dcache is disabled, so that should write it directly
> >> to the PoC.  It seems the protocol expects us to invalidate the dcache
> >> for that range though, so I added code to do this, essentially what Arun
> >> had added.
> >>
> >> Arun, please try.
> >>
> > It works without any hiccups :)..
> > I have attached the log.
> >
> > I will try with big-endian UP configuration next.
> >
> 
> The latest kexec code is working fine in LE/BE mode in UP configuration.
> 
> I had to change kexec-tools code a bit to make sure that "kexec -l"
> is not putting dtb at an area where kernel is building its initial page
> tables.
> 
> #########################
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index ab7a9ac..8f04473 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -526,6 +526,7 @@ int arm64_load_other_segments(struct kexec_info *info)
>         off_t dtb_base;
>         char *initrd_buf = NULL;
>         char command_line[COMMAND_LINE_SIZE] = "";
> +       unsigned long dtb_start;
> 
>         /* Processing for arm64_opts.dtb and arm64_opts.command_line. */
> 
> @@ -554,8 +555,16 @@ int arm64_load_other_segments(struct kexec_info *info)
>          * kernel with an alignment of 128 KiB, giving a max supported DTB
>          * size of 128 KiB (worst case). */
> 
> +       /*
> +        * arm64 kernel uses area above kernel image to build
> +        * initial page tables. Max required size for this area is 384K. It
> +        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
> +        * So place dtb 512k above kernel image area.
> +        */

The area above the kernel Image holds the BSS too, and from some
experiments I did a while back with an allyesconfig, it was possible to
have many megabytes of BSS.

While 512k might be adequate for a defconfig today, it will become less
so as time goes on.

For kernel Images post v3.17 you can and should read the Image header's
image_size field to figure out how much memory is needed from _text to
_end (i.e. all the memory the kernel will assume is available statically
and will happily clobber). For kernel Images prior to v3.17 the best
thing we can do is guess as you've done here, or place the DTB below the
kernel and page tables in the TEXT_OFFSET area; neither option is
particularly brilliant.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-04 10:16                                   ` Arun Chandran
  2014-08-04 11:35                                     ` Mark Rutland
@ 2014-08-04 17:21                                     ` Geoff Levand
  2014-08-06 13:54                                       ` Arun Chandran
  1 sibling, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-08-04 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote:
> The latest kexec code is working fine in LE/BE mode in UP configuration.
> 
> I had to change kexec-tools code a bit to make sure that "kexec -l"
> is not putting dtb at an area where kernel is building its initial page
> tables.

OK, I'll add in code to handle this.

> Now before trying SMP configuration I want to know whether the below "kexec -e"
> scenarios are valid(required)?
> 
> 1st stage                                  |               2nd stage
> ---------------------------------------------------------------------------------------------
> LE UP                                      |               BE UP
> LE UP                                      |               BE SMP
> LE UP                                      |               LE SMP
> LE SMP                                   |               LE UP
> LE SMP                                   |               BE UP
> LE SMP                                   |               BE SMP

Kexec should work for all combinations of kernel options.  A
bootloader (1st stage) may want to be UP, and custom crash
dump kernels (2nd stage) are generally UP.  UP->UP may have
limited use.  I think all these listed are important to test
though, and going the other way also, so BE-SMP->LE-SMP for
example.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-04 17:21                                     ` Geoff Levand
@ 2014-08-06 13:54                                       ` Arun Chandran
  2014-08-06 15:51                                         ` Arun Chandran
  2014-08-07 20:07                                         ` Geoff Levand
  0 siblings, 2 replies; 49+ messages in thread
From: Arun Chandran @ 2014-08-06 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, Aug 4, 2014 at 10:51 PM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote:
>> The latest kexec code is working fine in LE/BE mode in UP configuration.
>>
>> I had to change kexec-tools code a bit to make sure that "kexec -l"
>> is not putting dtb at an area where kernel is building its initial page
>> tables.
>
> OK, I'll add in code to handle this.
>
>> Now before trying SMP configuration I want to know whether the below "kexec -e"
>> scenarios are valid(required)?
>>
>> 1st stage                                  |               2nd stage
>> ---------------------------------------------------------------------------------------------
>> LE UP                                      |               BE UP
>> LE UP                                      |               BE SMP
>> LE UP                                      |               LE SMP
>> LE SMP                                   |               LE UP
>> LE SMP                                   |               BE UP
>> LE SMP                                   |               BE SMP
>
I am testing "kexec -e" with the script below(it will either
reboot the board to BE or LE mode randomly).

This test is done without L3 cache. With L3 I have more troubles.
I think it is best to fix the code without L3 then move
to testing with L3. That fix may solve the problem with L3

###############################
:~$ cat /etc/init.d/S50reboot
#!/bin/sh

sleep 5
i=$RANDOM
j=$(( $i % 2))

if [ $j -eq 0 ] ; then
        mount /dev/mmcblk0p1 /mnt

        count=`cat /mnt/cnt`
        echo "KEXEC rebootng to BE count = $count"
        echo $RANDOM > /mnt/"$count""_BE"

        count=$(( $count + 1 ))
        echo "$count">/mnt/cnt

        kexec -l /mnt/vmlinux_BE.strip
--command-line="console=ttyS0,115200 earlyprintk=uart8
250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
        umount /mnt
        kexec -e
else
        mount /dev/mmcblk0p1 /mnt

        count=`cat /mnt/cnt`
        echo "KEXEC rebooting to LE count = $count"
        echo $RANDOM > /mnt/"$count""_LE"

        count=$(( $count + 1 ))
        echo "$count">/mnt/cnt

        kexec -l /mnt/vmlinux_LE.strip
--command-line="console=ttyS0,115200 earlyprintk=uart8
250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
        umount /mnt
        kexec -e
fi

exit $?
#############################

I have managed to run this test till 72 times with the
below changes.

############################
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 363a246..7de11ee 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
unsigned int flag,
  break;
  case IND_SOURCE:
  __flush_dcache_area(addr, PAGE_SIZE);
- __flush_dcache_area(dest, PAGE_SIZE);
  break;
  default:
  break;
@@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
  phys_addr_t reboot_code_buffer_phys;
  void *reboot_code_buffer;
  struct kexec_ctx *ctx = kexec_image_to_ctx(image);
+ unsigned long start, end;
+ int i;

  BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
  BUG_ON(num_online_cpus() > 1);
@@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)

  kexec_list_walk(NULL, image->head, kexec_list_flush_cb);

+ start = image->segment[0].mem;
+ end = image->segment[0].mem + image->segment[0].memsz;
+ for (i = 0; i < image->nr_segments; i++) {
+ if (image->segment[i].mem > end)
+ end = image->segment[i].mem + image->segment[i].memsz;
+ }
+
+ start = (unsigned long)phys_to_virt(start);
+ end = (unsigned long)phys_to_virt(end);
+ pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
+ __flush_dcache_area((void *)start, end - start);
+ //flush_icache_range(start, end);
+ //mdelay(10);
+
  soft_restart(reboot_code_buffer_phys);
 }

diff --git a/arch/arm64/kernel/relocate_kernel.S
b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..a49549e 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -61,13 +61,13 @@ relocate_new_kernel:
  mov x20, x13
  mov x21, x14

- prfm pldl1strm, [x21, #64]
+ /*prfm pldl1strm, [x21, #64] */
 1: ldp x22, x23, [x21]
  ldp x24, x25, [x21, #16]
  ldp x26, x27, [x21, #32]
  ldp x28, x29, [x21, #48]
  add x21, x21, #64
- prfm pldl1strm, [x21, #64]
+ /*prfm pldl1strm, [x21, #64]*/
  stnp x22, x23, [x20]
  stnp x24, x25, [x20, #16]
  stnp x26, x27, [x20, #32]
@@ -115,6 +115,8 @@ relocate_new_kernel:
  mov x3, xzr

  ldr x4, kexec_kimage_start
+ dsb sy
+ isb
  br x4

 .align 3
arun at arun-OptiPlex-9010:~/work/aarch64-kernel/linux-kexec_LE$
###########################

Out of this 72 times it booted 34 LE and 38 BE. It failed
from switching from BE to LE.

Mark,
Do you have any pointers to find the problem?

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-06 13:54                                       ` Arun Chandran
@ 2014-08-06 15:51                                         ` Arun Chandran
  2014-08-07 20:07                                         ` Geoff Levand
  1 sibling, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-08-06 15:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 6, 2014 at 7:24 PM, Arun Chandran <achandran@mvista.com> wrote:
> Hi,
>
> On Mon, Aug 4, 2014 at 10:51 PM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Arun,
>>
>> On Mon, 2014-08-04 at 15:46 +0530, Arun Chandran wrote:
>>> The latest kexec code is working fine in LE/BE mode in UP configuration.
>>>
>>> I had to change kexec-tools code a bit to make sure that "kexec -l"
>>> is not putting dtb at an area where kernel is building its initial page
>>> tables.
>>
>> OK, I'll add in code to handle this.
>>
>>> Now before trying SMP configuration I want to know whether the below "kexec -e"
>>> scenarios are valid(required)?
>>>
>>> 1st stage                                  |               2nd stage
>>> ---------------------------------------------------------------------------------------------
>>> LE UP                                      |               BE UP
>>> LE UP                                      |               BE SMP
>>> LE UP                                      |               LE SMP
>>> LE SMP                                   |               LE UP
>>> LE SMP                                   |               BE UP
>>> LE SMP                                   |               BE SMP
>>
> I am testing "kexec -e" with the script below(it will either
> reboot the board to BE or LE mode randomly).
>
> This test is done without L3 cache. With L3 I have more troubles.
> I think it is best to fix the code without L3 then move
> to testing with L3. That fix may solve the problem with L3
>
> ###############################
> :~$ cat /etc/init.d/S50reboot
> #!/bin/sh
>
> sleep 5
> i=$RANDOM
> j=$(( $i % 2))
>
> if [ $j -eq 0 ] ; then
>         mount /dev/mmcblk0p1 /mnt
>
>         count=`cat /mnt/cnt`
>         echo "KEXEC rebootng to BE count = $count"
>         echo $RANDOM > /mnt/"$count""_BE"
>
>         count=$(( $count + 1 ))
>         echo "$count">/mnt/cnt
>
>         kexec -l /mnt/vmlinux_BE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
>         umount /mnt
>         kexec -e
> else
>         mount /dev/mmcblk0p1 /mnt
>
>         count=`cat /mnt/cnt`
>         echo "KEXEC rebooting to LE count = $count"
>         echo $RANDOM > /mnt/"$count""_LE"
>
>         count=$(( $count + 1 ))
>         echo "$count">/mnt/cnt
>
>         kexec -l /mnt/vmlinux_LE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=2M"
>         umount /mnt
>         kexec -e
> fi
>
> exit $?
> #############################
>

Some more information:

I am able to run the same test (without L3) for 1.5 hours in case of
no endian switching.

For this testing I used the latest code @
https://git.linaro.org/people/geoff.levand/linux-kexec.git
unlike in the endian switching scenario there is no code change
made for this test.

It ran 370 times for LE-->LE reboots.
It ran 351 times for BE-->BE reboots.

I had to manually stop both tests as it was continuing
without any problem.

I will repeat the same test with L3 cache on

--Arun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-04 11:35                                     ` Mark Rutland
@ 2014-08-07  0:40                                       ` Geoff Levand
  2014-08-07  9:59                                         ` Mark Rutland
  0 siblings, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-08-07  0:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Mon, 2014-08-04 at 12:35 +0100, Mark Rutland wrote:
> > +       /*
> > +        * arm64 kernel uses area above kernel image to build
> > +        * initial page tables. Max required size for this area is 384K. It
> > +        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
> > +        * So place dtb 512k above kernel image area.
> > +        */
> 
> The area above the kernel Image holds the BSS too, and from some
> experiments I did a while back with an allyesconfig, it was possible to
> have many megabytes of BSS.
> 
> While 512k might be adequate for a defconfig today, it will become less
> so as time goes on.
> 
> For kernel Images post v3.17 you can and should read the Image header's
> image_size field to figure out how much memory is needed from _text to
> _end (i.e. all the memory the kernel will assume is available statically
> and will happily clobber). For kernel Images prior to v3.17 the best
> thing we can do is guess as you've done here, or place the DTB below the
> kernel and page tables in the TEXT_OFFSET area; neither option is
> particularly brilliant.

I haven't looked into it yet, but booting.txt says that the dtb must be
'within the first 512 megabytes from the start of the kernel image'.  If
this restriction still holds we can't just put the dtb at
text_offset + image_size.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-07  0:40                                       ` Geoff Levand
@ 2014-08-07  9:59                                         ` Mark Rutland
  2014-08-07 17:09                                           ` Geoff Levand
  0 siblings, 1 reply; 49+ messages in thread
From: Mark Rutland @ 2014-08-07  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 07, 2014 at 01:40:03AM +0100, Geoff Levand wrote:
> Hi,

Hi Geoff,

> On Mon, 2014-08-04 at 12:35 +0100, Mark Rutland wrote:
> > > +       /*
> > > +        * arm64 kernel uses area above kernel image to build
> > > +        * initial page tables. Max required size for this area is 384K. It
> > > +        * happens when CONFIG_ARM64_PGTABLE_LEVELS is set.
> > > +        * So place dtb 512k above kernel image area.
> > > +        */
> > 
> > The area above the kernel Image holds the BSS too, and from some
> > experiments I did a while back with an allyesconfig, it was possible to
> > have many megabytes of BSS.
> > 
> > While 512k might be adequate for a defconfig today, it will become less
> > so as time goes on.
> > 
> > For kernel Images post v3.17 you can and should read the Image header's
> > image_size field to figure out how much memory is needed from _text to
> > _end (i.e. all the memory the kernel will assume is available statically
> > and will happily clobber). For kernel Images prior to v3.17 the best
> > thing we can do is guess as you've done here, or place the DTB below the
> > kernel and page tables in the TEXT_OFFSET area; neither option is
> > particularly brilliant.
> 
> I haven't looked into it yet, but booting.txt says that the dtb must be
> 'within the first 512 megabytes from the start of the kernel image'.  If
> this restriction still holds we can't just put the dtb at
> text_offset + image_size.

Isn't that only a problem if text_offset + image_size is huge (510MB+)?

The wording in booting.txt doesn't sound quite right. As I understand
it, the 512MB restriction is because of the way we map the dtb in the
swapper page tables.

So a better wording would be "the kernel and DTB must be placed in the
same naturally-aligned 512MB region of memory".

It should be possible to get rid of the restrictions on the placement of
the image and DTB, but this requires reworking the VA layout (using a
kernel text mapping separate from the linear map as with x86_64, and
similarly having a separate DTB mapping), and unfortunately I haven't
had the time to do that.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-07  9:59                                         ` Mark Rutland
@ 2014-08-07 17:09                                           ` Geoff Levand
  0 siblings, 0 replies; 49+ messages in thread
From: Geoff Levand @ 2014-08-07 17:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2014-08-07 at 10:59 +0100, Mark Rutland wrote:
> On Thu, Aug 07, 2014 at 01:40:03AM +0100, Geoff Levand wrote:
> > 
> > I haven't looked into it yet, but booting.txt says that the dtb must be
> > 'within the first 512 megabytes from the start of the kernel image'.  If
> > this restriction still holds we can't just put the dtb at
> > text_offset + image_size.
> 
> Isn't that only a problem if text_offset + image_size is huge (510MB+)?

Yes, a limit I don't think we need to be too concerned about at present.

> The wording in booting.txt doesn't sound quite right. As I understand
> it, the 512MB restriction is because of the way we map the dtb in the
> swapper page tables.
> 
> So a better wording would be "the kernel and DTB must be placed in the
> same naturally-aligned 512MB region of memory".
> 
> It should be possible to get rid of the restrictions on the placement of
> the image and DTB, but this requires reworking the VA layout (using a
> kernel text mapping separate from the linear map as with x86_64, and
> similarly having a separate DTB mapping), and unfortunately I haven't
> had the time to do that.

Or maybe just relocate the dtb at startup to within range.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-06 13:54                                       ` Arun Chandran
  2014-08-06 15:51                                         ` Arun Chandran
@ 2014-08-07 20:07                                         ` Geoff Levand
  2014-08-08  5:46                                           ` Arun Chandran
  1 sibling, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-08-07 20:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:

> I have managed to run this test till 72 times with the
> below changes.
> 
> ############################
> diff --git a/arch/arm64/kernel/machine_kexec.c
> b/arch/arm64/kernel/machine_kexec.c
> index 363a246..7de11ee 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
> unsigned int flag,
>   break;
>   case IND_SOURCE:
>   __flush_dcache_area(addr, PAGE_SIZE);
> - __flush_dcache_area(dest, PAGE_SIZE);
>   break;
>   default:
>   break;
> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>   phys_addr_t reboot_code_buffer_phys;
>   void *reboot_code_buffer;
>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
> + unsigned long start, end;
> + int i;
> 
>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>   BUG_ON(num_online_cpus() > 1);
> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
> 
>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
> 
> + start = image->segment[0].mem;
> + end = image->segment[0].mem + image->segment[0].memsz;
> + for (i = 0; i < image->nr_segments; i++) {
> + if (image->segment[i].mem > end)
> + end = image->segment[i].mem + image->segment[i].memsz;
> + }
> +
> + start = (unsigned long)phys_to_virt(start);
> + end = (unsigned long)phys_to_virt(end);
> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
> + __flush_dcache_area((void *)start, end - start);
> + //flush_icache_range(start, end);
> + //mdelay(10);
> +
>   soft_restart(reboot_code_buffer_phys);
>  }

Doing the flush in kexec_list_flush_cb() is almost the same
as using the image->segment to flush.  Did you see a
difference on your system?

> diff --git a/arch/arm64/kernel/relocate_kernel.S
> b/arch/arm64/kernel/relocate_kernel.S
> index 4b077e1..a49549e 100644
> --- a/arch/arm64/kernel/relocate_kernel.S
> +++ b/arch/arm64/kernel/relocate_kernel.S

I think these changes are good.  I'll add them in.

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-07 20:07                                         ` Geoff Levand
@ 2014-08-08  5:46                                           ` Arun Chandran
  2014-08-08 10:03                                             ` Arun Chandran
  0 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-08-08  5:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:
>
>> I have managed to run this test till 72 times with the
>> below changes.
>>
>> ############################
>> diff --git a/arch/arm64/kernel/machine_kexec.c
>> b/arch/arm64/kernel/machine_kexec.c
>> index 363a246..7de11ee 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
>> unsigned int flag,
>>   break;
>>   case IND_SOURCE:
>>   __flush_dcache_area(addr, PAGE_SIZE);
>> - __flush_dcache_area(dest, PAGE_SIZE);
>>   break;
>>   default:
>>   break;
>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>>   phys_addr_t reboot_code_buffer_phys;
>>   void *reboot_code_buffer;
>>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
>> + unsigned long start, end;
>> + int i;
>>
>>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>>   BUG_ON(num_online_cpus() > 1);
>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
>>
>>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>
>> + start = image->segment[0].mem;
>> + end = image->segment[0].mem + image->segment[0].memsz;
>> + for (i = 0; i < image->nr_segments; i++) {
>> + if (image->segment[i].mem > end)
>> + end = image->segment[i].mem + image->segment[i].memsz;
>> + }
>> +
>> + start = (unsigned long)phys_to_virt(start);
>> + end = (unsigned long)phys_to_virt(end);
>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
>> + __flush_dcache_area((void *)start, end - start);
>> + //flush_icache_range(start, end);
>> + //mdelay(10);
>> +
>>   soft_restart(reboot_code_buffer_phys);
>>  }
>
> Doing the flush in kexec_list_flush_cb() is almost the same
> as using the image->segment to flush.  Did you see a
> difference on your system?
>

Yes I can see the difference. Let me explain it in detail.

I am doing a stress test of "kexec -e" with the below reboot
script.

################################
#!/bin/sh

sleep 5
i=$RANDOM
j=$(( $i % 2))

mount /dev/mmcblk0p1 /mnt
count=`cat /mnt/cnt`

if [ $j -eq 0 ] ; then
    echo "KEXEC rebootng to BE count = $count"
    echo $RANDOM > /mnt/"$count""_BE"
    kexec -l /mnt/vmlinux_BE.strip
--command-line="console=ttyS0,115200 earlyprintk=uart8
250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
else
   echo "KEXEC rebooting to LE count = $count"
   echo $RANDOM > /mnt/"$count""_LE"
    kexec -l /mnt/vmlinux_LE.strip
--command-line="console=ttyS0,115200 earlyprintk=uart8
250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
fi

count=$(( $count + 1 ))
echo "$count">/mnt/cnt
umount /mnt
kexec -e
exit $?
###############################

Observations with the default code
@https://git.linaro.org/people/geoff.levand/linux-kexec.git
Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)"

a) LE to LE worked without L3 cache on
b) BE to BE worked without L3 cache on
c) Random endian switching does not work in any case (with L3, No L3)
    It breaks very early and unstable.

Now with the below modifications

#############################
 diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 363a246..571b68d 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
unsigned int flag,
                break;
        case IND_SOURCE:
                __flush_dcache_area(addr, PAGE_SIZE);
-               __flush_dcache_area(dest, PAGE_SIZE);
                break;
        default:
                break;
@@ -636,11 +635,13 @@ static void kexec_list_flush_cb(void *ctx ,
unsigned int flag,
  * Called from the core kexec code for a sys_reboot with
LINUX_REBOOT_CMD_KEXEC.
  */

+unsigned long dflush_start, dflush_end;
 void machine_kexec(struct kimage *image)
 {
        phys_addr_t reboot_code_buffer_phys;
        void *reboot_code_buffer;
        struct kexec_ctx *ctx = kexec_image_to_ctx(image);
+       int i;

        BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
        BUG_ON(num_online_cpus() > 1);
@@ -698,6 +699,19 @@ void machine_kexec(struct kimage *image)

        kexec_list_walk(NULL, image->head, kexec_list_flush_cb);

+       dflush_start = image->segment[0].mem;
+       dflush_end = image->segment[0].mem + image->segment[0].memsz;
+       for (i = 0; i < image->nr_segments; i++) {
+               if (image->segment[i].mem > dflush_end)
+                       dflush_end = image->segment[i].mem +
image->segment[i].memsz;
+       }
+
+       dflush_start = (unsigned long)phys_to_virt(dflush_start);
+       dflush_end = (unsigned long)phys_to_virt(dflush_end);
+
+       __flush_dcache_area((void *)&dflush_start, sizeof(dflush_start));
+       __flush_dcache_area((void *)&dflush_end, sizeof(dflush_end));
+
        soft_restart(reboot_code_buffer_phys);
 }

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index aa13521..b8c58d8 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -57,6 +57,7 @@ unsigned long __stack_chk_guard __read_mostly;
 EXPORT_SYMBOL(__stack_chk_guard);
 #endif

+extern unsigned long dflush_start, dflush_end;
 static void setup_restart(void)
 {
        /*
@@ -78,6 +79,8 @@ static void setup_restart(void)

        /* Push out any further dirty data, and ensure cache is empty */
        flush_cache_all();
+
+       __flush_dcache_area((void*)dflush_start, dflush_end - dflush_start);
 }

 void soft_restart(unsigned long addr)
diff --git a/arch/arm64/kernel/relocate_kernel.S
b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..a49549e 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -61,13 +61,13 @@ relocate_new_kernel:
        mov x20, x13
        mov x21, x14

-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64] */
 1:     ldp     x22, x23, [x21]
        ldp     x24, x25, [x21, #16]
        ldp     x26, x27, [x21, #32]
        ldp     x28, x29, [x21, #48]
        add     x21, x21, #64
-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64]*/
        stnp    x22, x23, [x20]
        stnp    x24, x25, [x20, #16]
        stnp    x26, x27, [x20, #32]
@@ -115,6 +115,8 @@ relocate_new_kernel:
        mov     x3, xzr

        ldr     x4, kexec_kimage_start
+       dsb     sy
+       isb
        br      x4

 .align 3
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index f1619c0..7d81b86 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -52,6 +52,12 @@
  */
 ENTRY(cpu_cache_off)
        mrs     x0, sctlr_el1
+       bic     x0, x0, #1 << 12                // clear SCTLR.C
+       msr     sctlr_el1, x0
+       isb
+       dsb     sy
+
+       mrs     x0, sctlr_el1
        bic     x0, x0, #1 << 2                 // clear SCTLR.C
        msr     sctlr_el1, x0
        isb
###########################

a) I am able to run random endian switching test for
11.5 hours without any breaks.

It rebooted totally 6542 times.
total LE boots = 3241
total BE boots = 3301

Out of that 1625 times it switched from "LE to BE"
or "BE to LE"

One major modification is flushing the Dcache area of
the new Image after turning off CPU caches.

This makes sure that L3 contains no lines in the
new "kernel Image area". I am still not sure what happens
with the other lines still in L3. Does the new kernel has
to do a __flush_dcahe_area() on all the new pages
it is gonna give to userspace?

Please refer to the discussion at
http://lists.linaro.org/pipermail/linaro-kernel/2013-August/006155.html
for more details.

It says that L3 cache becomes transparent when
lower level caches are OFF. So we need to clean
L3 when it is transparent.

And about adding barrier + removing pre-fetching  in
arch/arm64/kernel/relocate_kernel.S. I think it makes
sure that no stale data is present with CPU while
performing a endian switching. Am I right here?

There is one more change that is turning off Icache.
I am not sure why I did this. I will perform the test
without this change now.

>> diff --git a/arch/arm64/kernel/relocate_kernel.S
>> b/arch/arm64/kernel/relocate_kernel.S
>> index 4b077e1..a49549e 100644
>> --- a/arch/arm64/kernel/relocate_kernel.S
>> +++ b/arch/arm64/kernel/relocate_kernel.S
>
> I think these changes are good.  I'll add them in.

Yes Please. Thank you for letting it in.

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-08  5:46                                           ` Arun Chandran
@ 2014-08-08 10:03                                             ` Arun Chandran
  2014-08-12  5:42                                               ` Arun Chandran
  0 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-08-08 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 8, 2014 at 11:16 AM, Arun Chandran <achandran@mvista.com> wrote:
> Hi,
>
> On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote:
>> Hi Arun,
>>
>> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:
>>
>>> I have managed to run this test till 72 times with the
>>> below changes.
>>>
>>> ############################
>>> diff --git a/arch/arm64/kernel/machine_kexec.c
>>> b/arch/arm64/kernel/machine_kexec.c
>>> index 363a246..7de11ee 100644
>>> --- a/arch/arm64/kernel/machine_kexec.c
>>> +++ b/arch/arm64/kernel/machine_kexec.c
>>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
>>> unsigned int flag,
>>>   break;
>>>   case IND_SOURCE:
>>>   __flush_dcache_area(addr, PAGE_SIZE);
>>> - __flush_dcache_area(dest, PAGE_SIZE);
>>>   break;
>>>   default:
>>>   break;
>>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>>>   phys_addr_t reboot_code_buffer_phys;
>>>   void *reboot_code_buffer;
>>>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
>>> + unsigned long start, end;
>>> + int i;
>>>
>>>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>>>   BUG_ON(num_online_cpus() > 1);
>>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
>>>
>>>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>>
>>> + start = image->segment[0].mem;
>>> + end = image->segment[0].mem + image->segment[0].memsz;
>>> + for (i = 0; i < image->nr_segments; i++) {
>>> + if (image->segment[i].mem > end)
>>> + end = image->segment[i].mem + image->segment[i].memsz;
>>> + }
>>> +
>>> + start = (unsigned long)phys_to_virt(start);
>>> + end = (unsigned long)phys_to_virt(end);
>>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
>>> + __flush_dcache_area((void *)start, end - start);
>>> + //flush_icache_range(start, end);
>>> + //mdelay(10);
>>> +
>>>   soft_restart(reboot_code_buffer_phys);
>>>  }
>>
>> Doing the flush in kexec_list_flush_cb() is almost the same
>> as using the image->segment to flush.  Did you see a
>> difference on your system?
>>
>
> Yes I can see the difference. Let me explain it in detail.
>
> I am doing a stress test of "kexec -e" with the below reboot
> script.
>
> ################################
> #!/bin/sh
>
> sleep 5
> i=$RANDOM
> j=$(( $i % 2))
>
> mount /dev/mmcblk0p1 /mnt
> count=`cat /mnt/cnt`
>
> if [ $j -eq 0 ] ; then
>     echo "KEXEC rebootng to BE count = $count"
>     echo $RANDOM > /mnt/"$count""_BE"
>     kexec -l /mnt/vmlinux_BE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
> else
>    echo "KEXEC rebooting to LE count = $count"
>    echo $RANDOM > /mnt/"$count""_LE"
>     kexec -l /mnt/vmlinux_LE.strip
> --command-line="console=ttyS0,115200 earlyprintk=uart8
> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
> fi
>
> count=$(( $count + 1 ))
> echo "$count">/mnt/cnt
> umount /mnt
> kexec -e
> exit $?
> ###############################
>
> Observations with the default code
> @https://git.linaro.org/people/geoff.levand/linux-kexec.git
> Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)"
>
> a) LE to LE worked without L3 cache on
> b) BE to BE worked without L3 cache on
> c) Random endian switching does not work in any case (with L3, No L3)
>     It breaks very early and unstable.
>
> Now with the below modifications
>

I think the more cleaner approach is to invalidate
the cache lines from arch/arm64/kernel/relocate_kernel.S
As this code is already aware of the destination it has
to copy the 2nd stage kernel.


########################
diff --git a/arch/arm64/kernel/relocate_kernel.S
b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..6880c1a 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -31,6 +31,13 @@

 .align 3

+.macro dcache_line_size, reg, tmp
+mrs    \tmp, ctr_el0                   // read CTR
+ubfm   \tmp, \tmp, #16, #19            // cache line size encoding
+mov    \reg, #4                        // bytes per word
+lsl    \reg, \reg, \tmp                // actual cache line size
+.endm
+
 .globl relocate_new_kernel
 relocate_new_kernel:

@@ -58,23 +65,46 @@ relocate_new_kernel:

        /* source: copy_page(x20 = dest, x21 = addr) */

+       mov     x0, x13
+       add     x1, x13, #PAGE_SIZE
+
+       /* Invalidate the destination cache area */
+__inval_cache_range:
+       dcache_line_size x2, x3
+       sub     x3, x2, #1
+       tst     x1, x3                          // end cache line aligned?
+       bic     x1, x1, x3
+       b.eq    1f
+       dc      civac, x1                       // clean & invalidate D / U line
+1:     tst     x0, x3                          // start cache line aligned?
+       bic     x0, x0, x3
+       b.eq    2f
+       dc      civac, x0                       // clean & invalidate D / U line
+       b       3f
+2:     dc      ivac, x0                        // invalidate D / U line
+3:     add     x0, x0, x2
+       cmp     x0, x1
+       b.lo    2b
+       dsb     sy
+
        mov x20, x13
        mov x21, x14

-       prfm    pldl1strm, [x21, #64]
-1:     ldp     x22, x23, [x21]
+       /*prfm  pldl1strm, [x21, #64] */
+.Lcopy_data:
+       ldp     x22, x23, [x21]
        ldp     x24, x25, [x21, #16]
        ldp     x26, x27, [x21, #32]
        ldp     x28, x29, [x21, #48]
        add     x21, x21, #64
-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64]*/
        stnp    x22, x23, [x20]
        stnp    x24, x25, [x20, #16]
        stnp    x26, x27, [x20, #32]
        stnp    x28, x29, [x20, #48]
        add     x20, x20, #64
        tst     x21, #(PAGE_SIZE - 1)
-       b.ne    1b
+       b.ne    .Lcopy_data

        /* dest += PAGE_SIZE */

@@ -115,6 +145,8 @@ relocate_new_kernel:
        mov     x3, xzr

        ldr     x4, kexec_kimage_start
+       dsb     sy
+       isb
        br      x4

 .align 3
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index f1619c0..c62cba7 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -52,6 +52,13 @@
  */
 ENTRY(cpu_cache_off)
        mrs     x0, sctlr_el1
+       /* Turn off I-Cache */
+       bic     x0, x0, #1 << 12                // clear SCTLR.C
+       msr     sctlr_el1, x0
+       isb
+       dsb     sy
+
+       mrs     x0, sctlr_el1
        bic     x0, x0, #1 << 2                 // clear SCTLR.C
        msr     sctlr_el1, x0
        isb

#############################

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-08 10:03                                             ` Arun Chandran
@ 2014-08-12  5:42                                               ` Arun Chandran
  2014-08-13 11:09                                                 ` Arun Chandran
  0 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-08-12  5:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff, Mark,


Sorry for top posting. I hope we have solved almost all the problems
with kexec in uni-processor scenario except converting soft_restart()
to assembly (I will give try to do this).

kexec is stress tested with L3 cache on with the below changes.
It ran for 17 hours and rebooted totally 8226 times without any problem.

Total LE boots - 4122
Total BE boots - 4104

Total LE to BE or BE to LE switching - 4112

##########################
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 363a246..5b15a00 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
unsigned int flag,
                break;
        case IND_SOURCE:
                __flush_dcache_area(addr, PAGE_SIZE);
-               __flush_dcache_area(dest, PAGE_SIZE);
                break;
        default:
                break;
diff --git a/arch/arm64/kernel/relocate_kernel.S
b/arch/arm64/kernel/relocate_kernel.S
index 4b077e1..e890516 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -31,6 +31,13 @@

 .align 3

+.macro dcache_line_size, reg, tmp
+mrs    \tmp, ctr_el0                   // read CTR
+ubfm   \tmp, \tmp, #16, #19            // cache line size encoding
+mov    \reg, #4                        // bytes per word
+lsl    \reg, \reg, \tmp                // actual cache line size
+.endm
+
 .globl relocate_new_kernel
 relocate_new_kernel:

@@ -56,25 +63,51 @@ relocate_new_kernel:
 .Ltest_source:
        tbz     x10, IND_SOURCE_BIT, .Ltest_indirection

-       /* source: copy_page(x20 = dest, x21 = addr) */
+       mov     x0, x13
+       add     x1, x13, #PAGE_SIZE
+
+       /* Invalidate the destination cache area to make sure that
+        * all the data required for the second stage kernel is
+        * intact at PoC. This is the safest place to do this activity
+        * as we are running with MMU and D-cache off.
+        */
+__inval_cache_range:
+       dcache_line_size x2, x3
+       sub     x3, x2, #1
+       tst     x1, x3                          // end cache line aligned?
+       bic     x1, x1, x3
+       b.eq    1f
+       dc      ivac, x1                        // invalidate D / U line
+1:     tst     x0, x3                          // start cache line aligned?
+       bic     x0, x0, x3
+       b.eq    2f
+       dc      ivac, x0                        // invalidate D / U line
+       b       3f
+2:     dc      ivac, x0                        // invalidate D / U line
+3:     add     x0, x0, x2
+       cmp     x0, x1
+       b.lo    2b
+       dsb     sy

+       /* source: copy_page(x20 = dest, x21 = addr) */
        mov x20, x13
        mov x21, x14

-       prfm    pldl1strm, [x21, #64]
-1:     ldp     x22, x23, [x21]
+       /*prfm  pldl1strm, [x21, #64] */
+.Lcopy_data:
+       ldp     x22, x23, [x21]
        ldp     x24, x25, [x21, #16]
        ldp     x26, x27, [x21, #32]
        ldp     x28, x29, [x21, #48]
        add     x21, x21, #64
-       prfm    pldl1strm, [x21, #64]
+       /*prfm  pldl1strm, [x21, #64]*/
        stnp    x22, x23, [x20]
        stnp    x24, x25, [x20, #16]
        stnp    x26, x27, [x20, #32]
        stnp    x28, x29, [x20, #48]
        add     x20, x20, #64
        tst     x21, #(PAGE_SIZE - 1)
-       b.ne    1b
+       b.ne    .Lcopy_data

        /* dest += PAGE_SIZE */

@@ -115,6 +148,11 @@ relocate_new_kernel:
        mov     x3, xzr

        ldr     x4, kexec_kimage_start
+
+       /* Clean entire I-cache */
+       ic      ialluis
+       isb
+       dsb     sy
        br      x4

 .align 3
################################

I have attached the patch for this.
Now I will try kexec in SMP configuration.

--Arun

On Fri, Aug 8, 2014 at 3:33 PM, Arun Chandran <achandran@mvista.com> wrote:
> On Fri, Aug 8, 2014 at 11:16 AM, Arun Chandran <achandran@mvista.com> wrote:
>> Hi,
>>
>> On Fri, Aug 8, 2014 at 1:37 AM, Geoff Levand <geoff@infradead.org> wrote:
>>> Hi Arun,
>>>
>>> On Wed, 2014-08-06 at 19:24 +0530, Arun Chandran wrote:
>>>
>>>> I have managed to run this test till 72 times with the
>>>> below changes.
>>>>
>>>> ############################
>>>> diff --git a/arch/arm64/kernel/machine_kexec.c
>>>> b/arch/arm64/kernel/machine_kexec.c
>>>> index 363a246..7de11ee 100644
>>>> --- a/arch/arm64/kernel/machine_kexec.c
>>>> +++ b/arch/arm64/kernel/machine_kexec.c
>>>> @@ -623,7 +623,6 @@ static void kexec_list_flush_cb(void *ctx ,
>>>> unsigned int flag,
>>>>   break;
>>>>   case IND_SOURCE:
>>>>   __flush_dcache_area(addr, PAGE_SIZE);
>>>> - __flush_dcache_area(dest, PAGE_SIZE);
>>>>   break;
>>>>   default:
>>>>   break;
>>>> @@ -641,6 +640,8 @@ void machine_kexec(struct kimage *image)
>>>>   phys_addr_t reboot_code_buffer_phys;
>>>>   void *reboot_code_buffer;
>>>>   struct kexec_ctx *ctx = kexec_image_to_ctx(image);
>>>> + unsigned long start, end;
>>>> + int i;
>>>>
>>>>   BUG_ON(relocate_new_kernel_size > KEXEC_CONTROL_PAGE_SIZE);
>>>>   BUG_ON(num_online_cpus() > 1);
>>>> @@ -698,6 +699,20 @@ void machine_kexec(struct kimage *image)
>>>>
>>>>   kexec_list_walk(NULL, image->head, kexec_list_flush_cb);
>>>>
>>>> + start = image->segment[0].mem;
>>>> + end = image->segment[0].mem + image->segment[0].memsz;
>>>> + for (i = 0; i < image->nr_segments; i++) {
>>>> + if (image->segment[i].mem > end)
>>>> + end = image->segment[i].mem + image->segment[i].memsz;
>>>> + }
>>>> +
>>>> + start = (unsigned long)phys_to_virt(start);
>>>> + end = (unsigned long)phys_to_virt(end);
>>>> + pr_info("flushing from %lx to %lx size = %lx\n", start, end, end - start);
>>>> + __flush_dcache_area((void *)start, end - start);
>>>> + //flush_icache_range(start, end);
>>>> + //mdelay(10);
>>>> +
>>>>   soft_restart(reboot_code_buffer_phys);
>>>>  }
>>>
>>> Doing the flush in kexec_list_flush_cb() is almost the same
>>> as using the image->segment to flush.  Did you see a
>>> difference on your system?
>>>
>>
>> Yes I can see the difference. Let me explain it in detail.
>>
>> I am doing a stress test of "kexec -e" with the below reboot
>> script.
>>
>> ################################
>> #!/bin/sh
>>
>> sleep 5
>> i=$RANDOM
>> j=$(( $i % 2))
>>
>> mount /dev/mmcblk0p1 /mnt
>> count=`cat /mnt/cnt`
>>
>> if [ $j -eq 0 ] ; then
>>     echo "KEXEC rebootng to BE count = $count"
>>     echo $RANDOM > /mnt/"$count""_BE"
>>     kexec -l /mnt/vmlinux_BE.strip
>> --command-line="console=ttyS0,115200 earlyprintk=uart8
>> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
>> else
>>    echo "KEXEC rebooting to LE count = $count"
>>    echo $RANDOM > /mnt/"$count""_LE"
>>     kexec -l /mnt/vmlinux_LE.strip
>> --command-line="console=ttyS0,115200 earlyprintk=uart8
>> 250-32bit,0x1c020000 debug swiotlb=65536 log_buf_len=4M"
>> fi
>>
>> count=$(( $count + 1 ))
>> echo "$count">/mnt/cnt
>> umount /mnt
>> kexec -e
>> exit $?
>> ###############################
>>
>> Observations with the default code
>> @https://git.linaro.org/people/geoff.levand/linux-kexec.git
>> Changed last on "Mon, 4 Aug 2014 23:24:10 +0000 (16:24 -0700)"
>>
>> a) LE to LE worked without L3 cache on
>> b) BE to BE worked without L3 cache on
>> c) Random endian switching does not work in any case (with L3, No L3)
>>     It breaks very early and unstable.
>>
>> Now with the below modifications
>>
>
> I think the more cleaner approach is to invalidate
> the cache lines from arch/arm64/kernel/relocate_kernel.S
> As this code is already aware of the destination it has
> to copy the 2nd stage kernel.
>
>
> ########################
> diff --git a/arch/arm64/kernel/relocate_kernel.S
> b/arch/arm64/kernel/relocate_kernel.S
> index 4b077e1..6880c1a 100644
> --- a/arch/arm64/kernel/relocate_kernel.S
> +++ b/arch/arm64/kernel/relocate_kernel.S
> @@ -31,6 +31,13 @@
>
>  .align 3
>
> +.macro dcache_line_size, reg, tmp
> +mrs    \tmp, ctr_el0                   // read CTR
> +ubfm   \tmp, \tmp, #16, #19            // cache line size encoding
> +mov    \reg, #4                        // bytes per word
> +lsl    \reg, \reg, \tmp                // actual cache line size
> +.endm
> +
>  .globl relocate_new_kernel
>  relocate_new_kernel:
>
> @@ -58,23 +65,46 @@ relocate_new_kernel:
>
>         /* source: copy_page(x20 = dest, x21 = addr) */
>
> +       mov     x0, x13
> +       add     x1, x13, #PAGE_SIZE
> +
> +       /* Invalidate the destination cache area */
> +__inval_cache_range:
> +       dcache_line_size x2, x3
> +       sub     x3, x2, #1
> +       tst     x1, x3                          // end cache line aligned?
> +       bic     x1, x1, x3
> +       b.eq    1f
> +       dc      civac, x1                       // clean & invalidate D / U line
> +1:     tst     x0, x3                          // start cache line aligned?
> +       bic     x0, x0, x3
> +       b.eq    2f
> +       dc      civac, x0                       // clean & invalidate D / U line
> +       b       3f
> +2:     dc      ivac, x0                        // invalidate D / U line
> +3:     add     x0, x0, x2
> +       cmp     x0, x1
> +       b.lo    2b
> +       dsb     sy
> +
>         mov x20, x13
>         mov x21, x14
>
> -       prfm    pldl1strm, [x21, #64]
> -1:     ldp     x22, x23, [x21]
> +       /*prfm  pldl1strm, [x21, #64] */
> +.Lcopy_data:
> +       ldp     x22, x23, [x21]
>         ldp     x24, x25, [x21, #16]
>         ldp     x26, x27, [x21, #32]
>         ldp     x28, x29, [x21, #48]
>         add     x21, x21, #64
> -       prfm    pldl1strm, [x21, #64]
> +       /*prfm  pldl1strm, [x21, #64]*/
>         stnp    x22, x23, [x20]
>         stnp    x24, x25, [x20, #16]
>         stnp    x26, x27, [x20, #32]
>         stnp    x28, x29, [x20, #48]
>         add     x20, x20, #64
>         tst     x21, #(PAGE_SIZE - 1)
> -       b.ne    1b
> +       b.ne    .Lcopy_data
>
>         /* dest += PAGE_SIZE */
>
> @@ -115,6 +145,8 @@ relocate_new_kernel:
>         mov     x3, xzr
>
>         ldr     x4, kexec_kimage_start
> +       dsb     sy
> +       isb
>         br      x4
>
>  .align 3
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index f1619c0..c62cba7 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -52,6 +52,13 @@
>   */
>  ENTRY(cpu_cache_off)
>         mrs     x0, sctlr_el1
> +       /* Turn off I-Cache */
> +       bic     x0, x0, #1 << 12                // clear SCTLR.C
> +       msr     sctlr_el1, x0
> +       isb
> +       dsb     sy
> +
> +       mrs     x0, sctlr_el1
>         bic     x0, x0, #1 << 2                 // clear SCTLR.C
>         msr     sctlr_el1, x0
>         isb
>
> #############################
>
> --Arun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-LE-BE-switching-worked-with-L3.patch
Type: text/x-patch
Size: 2946 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140812/1d732f96/attachment.bin>

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-12  5:42                                               ` Arun Chandran
@ 2014-08-13 11:09                                                 ` Arun Chandran
  2014-08-26 22:32                                                   ` Geoff Levand
  0 siblings, 1 reply; 49+ messages in thread
From: Arun Chandran @ 2014-08-13 11:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Tue, Aug 12, 2014 at 11:12 AM, Arun Chandran <achandran@mvista.com> wrote:
> Hi Geoff, Mark,
>
>
> Sorry for top posting. I hope we have solved almost all the problems
> with kexec in uni-processor scenario except converting soft_restart()
> to assembly (I will give try to do this).
>

I have one more concern regarding flushing of D-cache area corresponding
to the kexec_list entrees.

Currently kexec_list_walk() is doing

1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE

2) continue accessing entries in kexec_list[0] to PAGE_SIZE

3) switch to next kexec_list depending upon kexec_list[entry] & flag
==  IND_INDIRECTION

4) goto 1

Shouldn't that be doing flush_dcache_area() after completely using the list??

Like given below?
######################
diff --git a/arch/arm64/kernel/machine_kexec.c
b/arch/arm64/kernel/machine_kexec.c
index 5b15a00..ffb3b54 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -155,6 +155,7 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
 {
        void *dest;
        unsigned long *entry;
+       void *last_accessed_dir = NULL;

        for (entry = &kimage_head, dest = NULL; ; entry++) {
                unsigned int flag = *entry & IND_FLAGS;
@@ -163,7 +164,10 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
                switch (flag) {
                case IND_INDIRECTION:
                        entry = (unsigned long *)addr - 1;
-                       cb(ctx, flag, addr, NULL);
+                       if (last_accessed_dir != addr) {
+                               cb(ctx, flag, last_accessed_dir, NULL);
+                               last_accessed_dir = addr;
+                       }
                        break;
                case IND_DESTINATION:
                        dest = addr;
@@ -174,7 +178,7 @@ static void kexec_list_walk(void *ctx, unsigned
long kimage_head,
                        dest += PAGE_SIZE;
                        break;
                case IND_DONE:
-                       cb(ctx, flag , NULL, NULL);
+                       cb(ctx, flag , last_accessed_dir, NULL);
                        return;
                default:
                        pr_devel("%s:%d unknown flag %xh\n", __func__, __LINE__,
@@ -617,6 +621,9 @@ on_exit:
 static void kexec_list_flush_cb(void *ctx , unsigned int flag,
        void *addr, void *dest)
 {
+       if (addr == NULL)
+               return;
+
        switch (flag) {
        case IND_INDIRECTION:
                __flush_dcache_area(addr, PAGE_SIZE);
@@ -624,6 +631,9 @@ static void kexec_list_flush_cb(void *ctx ,
unsigned int flag,
        case IND_SOURCE:
                __flush_dcache_area(addr, PAGE_SIZE);
                break;
+       case IND_DONE:
+               __flush_dcache_area(addr, PAGE_SIZE);
+               break;
        default:
                break;
        }

########################

--Arun

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-13 11:09                                                 ` Arun Chandran
@ 2014-08-26 22:32                                                   ` Geoff Levand
  2014-08-27  4:56                                                     ` Arun Chandran
  0 siblings, 1 reply; 49+ messages in thread
From: Geoff Levand @ 2014-08-26 22:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arun,

On Wed, 2014-08-13 at 16:39 +0530, Arun Chandran wrote:
> I have one more concern regarding flushing of D-cache area corresponding
> to the kexec_list entrees.
> 
> Currently kexec_list_walk() is doing
> 
> 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE
> 
> 2) continue accessing entries in kexec_list[0] to PAGE_SIZE
> 
> 3) switch to next kexec_list depending upon kexec_list[entry] & flag
> ==  IND_INDIRECTION
> 
> 4) goto 1
> 
> Shouldn't that be doing flush_dcache_area() after completely using the list??

We just want to get any data in the dcache out to the PoC before
disabling the dcache, so as long as there are only reads, and no writes
to those addresses, kexec_list_walk() should work OK.

I will move the flush of the new kernel image to after it is copied in
relocate_new_kernel().  I think that your L3 cache may not work with
what we have now:

  current:  invalidate dcache -> turn off dcache -> write data to PoC
  proposed: turn off dcache -> write data to PoC -> invalidate dcache

-Geoff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Kexec on arm64
  2014-08-26 22:32                                                   ` Geoff Levand
@ 2014-08-27  4:56                                                     ` Arun Chandran
  0 siblings, 0 replies; 49+ messages in thread
From: Arun Chandran @ 2014-08-27  4:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Geoff,

On Wed, Aug 27, 2014 at 4:02 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Arun,
>
> On Wed, 2014-08-13 at 16:39 +0530, Arun Chandran wrote:
>> I have one more concern regarding flushing of D-cache area corresponding
>> to the kexec_list entrees.
>>
>> Currently kexec_list_walk() is doing
>>
>> 1) flush_dcache_area of the kexec_list[0] till PAGE_SIZE
>>
>> 2) continue accessing entries in kexec_list[0] to PAGE_SIZE
>>
>> 3) switch to next kexec_list depending upon kexec_list[entry] & flag
>> ==  IND_INDIRECTION
>>
>> 4) goto 1
>>
>> Shouldn't that be doing flush_dcache_area() after completely using the list??
>
> We just want to get any data in the dcache out to the PoC before
> disabling the dcache, so as long as there are only reads, and no writes
> to those addresses, kexec_list_walk() should work OK.
>
Yes. I missed that point. If we don't perform any writes flushing
works just fine.

> I will move the flush of the new kernel image to after it is copied in
> relocate_new_kernel().  I think that your L3 cache may not work with
> what we have now:
>
>   current:  invalidate dcache -> turn off dcache -> write data to PoC
>   proposed: turn off dcache -> write data to PoC -> invalidate dcache
>

Yes this exactly what I have done here
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/278857.html
; that code should live inside relocate_new_kernel().

I am doing the cache invalidation(only invalidation) in
relocate_new_kernel(). As we run that code after cache's are off(L3 only
comes to picture when lower level caches are on) we are writing
to data (new kernel) to PoC.

--Arun

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2014-08-27  4:56 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAFdej006OSyhgDcJ2iZdbjt+PtysN=i_+9Dr4GTmr=+t5yg4Kw@mail.gmail.com>
2014-07-15 17:04 ` Kexec on arm64 Geoff Levand
2014-07-16 17:57   ` Feng Kan
2014-07-16 23:04     ` Geoff Levand
2014-07-22  9:44       ` Arun Chandran
2014-07-22 13:25         ` Arun Chandran
2014-07-24  0:38           ` Geoff Levand
2014-07-24  9:36             ` Mark Rutland
2014-07-24 12:49               ` Arun Chandran
2014-07-25  0:17               ` Geoff Levand
2014-07-25 10:31                 ` Arun Chandran
2014-07-25 10:36                 ` Mark Rutland
2014-07-25 11:48                 ` Arun Chandran
2014-07-25 12:14                   ` Mark Rutland
2014-07-25 15:29                     ` Arun Chandran
2014-07-26  0:18                   ` Geoff Levand
2014-07-28 15:00                     ` Arun Chandran
2014-07-28 15:38                       ` Mark Rutland
2014-07-29  0:09                         ` Geoff Levand
2014-07-29  9:10                           ` Mark Rutland
2014-07-29 12:32                           ` Arun Chandran
2014-07-29 13:35                             ` Mark Rutland
2014-07-29 21:19                               ` Geoff Levand
2014-07-30  7:22                                 ` Arun Chandran
2014-08-01 11:13                                   ` Arun Chandran
2014-08-03 14:47                                     ` Mark Rutland
2014-08-04 10:16                                   ` Arun Chandran
2014-08-04 11:35                                     ` Mark Rutland
2014-08-07  0:40                                       ` Geoff Levand
2014-08-07  9:59                                         ` Mark Rutland
2014-08-07 17:09                                           ` Geoff Levand
2014-08-04 17:21                                     ` Geoff Levand
2014-08-06 13:54                                       ` Arun Chandran
2014-08-06 15:51                                         ` Arun Chandran
2014-08-07 20:07                                         ` Geoff Levand
2014-08-08  5:46                                           ` Arun Chandran
2014-08-08 10:03                                             ` Arun Chandran
2014-08-12  5:42                                               ` Arun Chandran
2014-08-13 11:09                                                 ` Arun Chandran
2014-08-26 22:32                                                   ` Geoff Levand
2014-08-27  4:56                                                     ` Arun Chandran
2014-07-30  5:46                               ` Arun Chandran
2014-07-30  9:16                                 ` Mark Rutland
2014-07-30  7:01                               ` Arun Chandran
2014-07-25 10:26               ` Arun Chandran
2014-07-25 11:29                 ` Mark Rutland
2014-07-24 11:50             ` Arun Chandran
2014-07-30  3:26           ` Feng Kan
2014-07-24  0:10         ` Geoff Levand
2014-07-24  9:13         ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).