linux-arm-kernel.lists.infradead.org archive mirror
* arm64 regression in kernel 5.12 related to the (n)VHE
@ 2021-08-11 12:15 Rafał Miłecki
  2021-08-11 12:50 ` Marc Zyngier
  2021-08-12  3:59 ` Rafał Miłecki
  0 siblings, 2 replies; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-11 12:15 UTC
  To: Marc Zyngier, David Brazdil, Catalin Marinas, Will Deacon,
	linux-arm-kernel
  Cc: Mark Rutland, David Brazdil, Ard Biesheuvel, Andrey Konovalov,
	Marco Elver, BCM Kernel Feedback, Florian Fainelli

Hi,

I just tried upgrading from the good old LTS kernel 5.10, and I
discovered that my bcm4908 boards no longer boot with 5.14-rc5.


The problem is that the kernel doesn't seem to start booting at all. I
see the CFE bootloader messages:

Starting program at 0x0000000000080000
/memory = 0x40000000

and then nothing. Normally the first kernel line should follow, e.g.:
Linux version 5.11.0-rc4 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #30 SMP Wed Aug 11 14:01:00 CEST 2021


I have zero knowledge of low-level arm64 or assembler stuff. I also
don't own any bcm4908 development board or bcm4908 datasheets.

All I could do to help debug this regression was to bisect it. The
first bad commit (I verified it after the bisection) is:

commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
Author: Marc Zyngier <maz@kernel.org>
Date:   Mon Feb 8 09:57:14 2021 +0000

     arm64: Initialise as nVHE before switching to VHE

     As we are aiming to be able to control whether we enable VHE or
     not, let's always drop down to EL1 first, and only then upgrade
     to VHE if at all possible.

     This means that if the kernel is booted at EL2, we always start
     with a nVHE init, drop to EL1 to initialise the the kernel, and
     only then upgrade the kernel EL to EL2 if possible (the process
     is obviously shortened for secondary CPUs).

     The resume path is handled similarly to a secondary CPU boot.

     Signed-off-by: Marc Zyngier <maz@kernel.org>
     Acked-by: David Brazdil <dbrazdil@google.com>
     Acked-by: Catalin Marinas <catalin.marinas@arm.com>
     Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
     [will: Avoid calling switch_to_vhe twice on kaslr path]
     Signed-off-by: Will Deacon <will@kernel.org>
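
As a reading aid, here is a toy C model of the boot order the commit message describes, next to the older flow as we understand it. It is a sketch under stated assumptions, not kernel code: `enum el`, `struct boot_result`, `boot_before_commit()` and `boot_after_commit()` are hypothetical names. The model assumes that, after the change, a CPU entered at EL2 always executes an HVC from EL1 to request the VHE upgrade (the EL2 stub then decides whether VHE is available), whereas previously a CPU without VHE only executed HVC once KVM initialised.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the primary-CPU boot path; NOT kernel code.
 * All names are hypothetical. */
enum el { EL1 = 1, EL2 = 2 };

struct boot_result {
	enum el final_el;	/* EL the kernel ends up running at */
	bool hvc_from_el1;	/* did the boot path execute HVC at EL1? */
};

/* Older flow (assumed): a VHE-capable CPU simply stays at EL2, while a
 * non-VHE CPU installs the EL2 stub, drops to EL1 and only issues HVC
 * later if KVM initialises. */
static struct boot_result boot_before_commit(enum el entry_el,
					     bool cpu_has_vhe,
					     bool kvm_enabled)
{
	struct boot_result r = { EL1, false };

	assert(entry_el == EL1 || entry_el == EL2);
	if (entry_el == EL1)
		return r;		/* booted at EL1: EL2 is out of reach */
	if (cpu_has_vhe) {
		r.final_el = EL2;
		return r;
	}
	r.hvc_from_el1 = kvm_enabled;	/* KVM uses HVC to reach EL2 */
	return r;
}

/* Flow after 0c93df9622d4 (assumed): always initialise as nVHE and drop
 * to EL1, then issue HVC to ask the EL2 stub for the VHE upgrade. The
 * stub only performs the upgrade on VHE-capable CPUs. */
static struct boot_result boot_after_commit(enum el entry_el,
					    bool cpu_has_vhe,
					    bool kvm_enabled)
{
	struct boot_result r = { EL1, false };

	assert(entry_el == EL1 || entry_el == EL2);
	(void)kvm_enabled;		/* the upgrade HVC does not depend on KVM */
	if (entry_el == EL1)
		return r;
	r.hvc_from_el1 = true;
	if (cpu_has_vhe)
		r.final_el = EL2;
	return r;
}
```

For a CPU entered at EL2 without VHE and without KVM, the model reports `hvc_from_el1 == false` for the older flow and `true` for the new one; that is the behavioural difference the rest of this thread ends up probing.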


Could you look at this issue, please? I'm happy to test any patches or
provide any extra info I can obtain using kernel 5.11.


My defconfig for bcm4908 is:

CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="${BR_BINARIES_DIR}/rootfs.cpio"
CONFIG_ARCH_BCM4908=y
CONFIG_NR_CPUS=4
CONFIG_CMDLINE="earlycon=bcm63xx_uart,0xff800640"
CONFIG_CMDLINE_FORCE=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIE_BRCMSTB=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_SERIAL_BCM63XX=y
CONFIG_SERIAL_BCM63XX_CONSOLE=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_SLAVE=y
CONFIG_SPI=y
CONFIG_PINCTRL=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_GENERIC_PLATFORM=y
CONFIG_POWER_RESET_SYSCON=y
CONFIG_THERMAL=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_PLATFORM=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_HCD_PLATFORM=y
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PLATFORM=y
CONFIG_DMADEVICES=y
CONFIG_RESET_CONTROLLER=y

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-11 12:15 arm64 regression in kernel 5.12 related to the (n)VHE Rafał Miłecki
@ 2021-08-11 12:50 ` Marc Zyngier
  2021-08-11 16:55   ` Rafał Miłecki
  2021-08-12  3:59 ` Rafał Miłecki
  1 sibling, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2021-08-11 12:50 UTC
  To: Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Andrey Konovalov, Marco Elver,
	BCM Kernel Feedback, Florian Fainelli

Hi Rafał,

On Wed, 11 Aug 2021 13:15:31 +0100,
Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> Hi,
> 
> I just tried upgrading from the good old LTS kernel 5.10, and I
> discovered that my bcm4908 boards no longer boot with 5.14-rc5.
>
>
> The problem is that the kernel doesn't seem to start booting at all. I
> see the CFE bootloader messages:
>
> Starting program at 0x0000000000080000
> /memory = 0x40000000
>
> and then nothing. Normally the first kernel line should follow, e.g.:
> Linux version 5.11.0-rc4 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #30 SMP Wed Aug 11 14:01:00 CEST 2021
>
>
> I have zero knowledge of low-level arm64 or assembler stuff. I also
> don't own any bcm4908 development board or bcm4908 datasheets.
>
> All I could do to help debug this regression was to bisect it. The
> first bad commit (I verified it after the bisection) is:
> 
> commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
> Author: Marc Zyngier <maz@kernel.org>
> Date:   Mon Feb 8 09:57:14 2021 +0000
> 
>     arm64: Initialise as nVHE before switching to VHE
> 
>     As we are aiming to be able to control whether we enable VHE or
>     not, let's always drop down to EL1 first, and only then upgrade
>     to VHE if at all possible.
> 
>     This means that if the kernel is booted at EL2, we always start
>     with a nVHE init, drop to EL1 to initialise the the kernel, and
>     only then upgrade the kernel EL to EL2 if possible (the process
>     is obviously shortened for secondary CPUs).
> 
>     The resume path is handled similarly to a secondary CPU boot.
> 
>     Signed-off-by: Marc Zyngier <maz@kernel.org>
>     Acked-by: David Brazdil <dbrazdil@google.com>
>     Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>     Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
>     [will: Avoid calling switch_to_vhe twice on kaslr path]
>     Signed-off-by: Will Deacon <will@kernel.org>
> 
> 
> Could you look at this issue, please? I'm happy to test any patches or
> provide any extra info I can obtain using kernel 5.11.
> 
> 
> My defconfig for bcm4908 is:

[...]

I don't think the defconfig is that relevant (nothing you quote here
would have an impact that early in the boot process).

On the other hand, a description of the platform (what CPUs does it
have) and how it boots (VHE, non-VHE, booted at EL2 or not) would be
extremely useful. At minimum, a boot log of a working kernel could
help.

Florian: is it something you are seeing on your own systems?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-11 12:50 ` Marc Zyngier
@ 2021-08-11 16:55   ` Rafał Miłecki
  2021-08-12  6:51     ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-11 16:55 UTC
  To: Marc Zyngier
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 11.08.2021 14:50, Marc Zyngier wrote:
> On Wed, 11 Aug 2021 13:15:31 +0100,
> Rafał Miłecki <zajec5@gmail.com> wrote:
>>
>> Hi,
>>
>> I just tried upgrading from the good old LTS kernel 5.10, and I
>> discovered that my bcm4908 boards no longer boot with 5.14-rc5.
>>
>>
>> The problem is that the kernel doesn't seem to start booting at all. I
>> see the CFE bootloader messages:
>>
>> Starting program at 0x0000000000080000
>> /memory = 0x40000000
>>
>> and then nothing. Normally the first kernel line should follow, e.g.:
>> Linux version 5.11.0-rc4 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #30 SMP Wed Aug 11 14:01:00 CEST 2021
>>
>>
>> I have zero knowledge of low-level arm64 or assembler stuff. I also
>> don't own any bcm4908 development board or bcm4908 datasheets.
>>
>> All I could do to help debug this regression was to bisect it. The
>> first bad commit (I verified it after the bisection) is:
>>
>> commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
>> Author: Marc Zyngier <maz@kernel.org>
>> Date:   Mon Feb 8 09:57:14 2021 +0000
>>
>>      arm64: Initialise as nVHE before switching to VHE
>>
>>      As we are aiming to be able to control whether we enable VHE or
>>      not, let's always drop down to EL1 first, and only then upgrade
>>      to VHE if at all possible.
>>
>>      This means that if the kernel is booted at EL2, we always start
>>      with a nVHE init, drop to EL1 to initialise the the kernel, and
>>      only then upgrade the kernel EL to EL2 if possible (the process
>>      is obviously shortened for secondary CPUs).
>>
>>      The resume path is handled similarly to a secondary CPU boot.
>>
>>      Signed-off-by: Marc Zyngier <maz@kernel.org>
>>      Acked-by: David Brazdil <dbrazdil@google.com>
>>      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>>      Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
>>      [will: Avoid calling switch_to_vhe twice on kaslr path]
>>      Signed-off-by: Will Deacon <will@kernel.org>
>>
>>
>> Could you look at this issue, please? I'm happy to test any patches or
>> provide any extra info I can obtain using kernel 5.11.
>>
>>
>> My defconfig for bcm4908 is:
> 
> [...]
> 
> I don't think the defconfig is that relevant (nothing you quote here
> would have an impact that early in the boot process).
> 
> On the other hand, a description of the platform (what CPUs does it
> have) and how it boots (VHE, non-VHE, booted at EL2 or not) would be
> extremely useful. At minimum, a boot log of a working kernel could
> help.

Thank you for your patience & reply.

BCM4908 is Broadcom's 64-bit platform with Broadcom's own Brahma-B53
CPU(s). I don't know how it boots. Is that something I can find out
from a running system?
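
One way to answer this from a running system is to look for the line the arm64 kernel prints once all CPUs are up, which reports the exception level the CPUs were entered at (it appears in the log below as `CPU: All CPU(s) started at EL2`). The following minimal C sketch classifies such a line. `enum boot_el` and `classify_boot_el()` are hypothetical names, and the three message strings are the ones we believe the arm64 `hyp_mode_check()` code prints; they should be checked against the kernel version in use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the arm64 boot-mode message. */
enum boot_el { BOOT_EL_UNKNOWN, BOOT_EL1, BOOT_EL2, BOOT_EL_MIXED };

static enum boot_el classify_boot_el(const char *line)
{
	assert(line != NULL);
	/* Strings assumed to match what the arm64 kernel prints once all
	 * CPUs are up; verify against your kernel version. */
	if (strstr(line, "CPU: All CPU(s) started at EL2"))
		return BOOT_EL2;
	if (strstr(line, "CPU: All CPU(s) started at EL1"))
		return BOOT_EL1;
	if (strstr(line, "CPU: CPUs started in inconsistent modes"))
		return BOOT_EL_MIXED;
	return BOOT_EL_UNKNOWN;
}
```

Running each line of `dmesg` output through it would flag the EL2 line in the log below as `BOOT_EL2`. Note that this message alone does not say whether the kernel then runs with VHE or not.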

For DTS SoC description you can check:
arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi

See below for the boot log and /proc/cpuinfo. Please note that my
console seems to be misconfigured, so the early part of the log
appears twice (nothing really harmful).

Starting program at 0x0000000000080000
/memory = 0x40000000
WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.11.22-g40462c7f0649 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #9 SMP Wed Aug 11 18:39:58 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 17 pages/cpu s37856 r0 d31776 u69632
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1020660K/1048576K available (3584K kernel code, 650K rwdata, 684K rodata, 2368K init, 229K bss, 27916K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel+0x33c/0x524 with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [bcm63xx_uart0] disabled
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.11.22-g40462c7f0649 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #9 SMP Wed Aug 11 18:39:58 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 17 pages/cpu s37856 r0 d31776 u69632
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1020660K/1048576K available (3584K kernel code, 650K rwdata, 684K rodata, 2368K init, 229K bss, 27916K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel+0x33c/0x524 with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [bcm63xx_uart0] disabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
rcu: Hierarchical SRCU implementation.
EFI services will not be available.
smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
Detected VIPT I-cache on CPU2
CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
Detected VIPT I-cache on CPU3
CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
smp: Brought up 1 node, 4 CPUs
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: CRC32 instructions
CPU: All CPU(s) started at EL2
alternatives: patching kernel code
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
pinctrl core: initialized pinctrl subsystem
DMI not present or invalid.
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
thermal_sys: Registered thermal governor 'step_wise'
ASID allocator initialised with 65536 entries
iommu: Default domain type: Translated
vgaarb: loaded
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
clocksource: Switched to clocksource arch_sys_counter
PCI: CLS 0 bytes, default 64
workingset: timestamp_bits=62 max_order=18 bucket_order=0
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler mq-deadline registered
io scheduler kyber registered
basic-mmio-gpio: probe of ff800500.gpio-controller failed with error -22
ff800640.serial: ttyS0 at MMIO 0xff800640 (irq = 17, base_baud = 1562500) is a bcm63xx_uart
printk: console [ttyS0] enabled
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
ehci-platform: EHCI generic platform driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-pci: OHCI PCI platform driver
ohci-platform: OHCI generic platform driver
i2c /dev entries driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
Freeing unused kernel memory: 2368K
Run /init as init process
tmpfs: Unknown parameter 'mode'
tmpfs: Unknown parameter 'mode'
tmpfs: Unknown parameter 'mode'
mount: mounting tmpfs on /dev/shm failed: Invalid argument
mount: mounting tmpfs on /tmp failed: Invalid argument
mount: mounting tmpfs on /run failed: Invalid argument
Starting syslogd: OK
Starting klogd: OK
random: dd: uninitialized urandom read (512 bytes read)
Running sysctl: OK
Saving random seed: OK
Starting network: ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
FAIL

Welcome to Buildroot
buildroot login:

# cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x42
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x100
CPU revision    : 0

processor       : 1
BogoMIPS        : 100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x42
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x100
CPU revision    : 0

processor       : 2
random: fast init done
BogoMIPS        : 100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x42
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x100
CPU revision    : 0

processor       : 3
BogoMIPS        : 100.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x42
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x100
CPU revision    : 0
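
The `[0x420f1000]` printed next to each CPU in the boot log is the MIDR_EL1 value, and the `/proc/cpuinfo` fields above are its bitfields. The sketch below decodes it using the architectural MIDR_EL1 layout (Implementer [31:24], Variant [23:20], Architecture [19:16], PartNum [15:4], Revision [3:0]); `struct midr_fields` and `midr_decode()` are hypothetical names. As far as we know, implementer 0x42 is ASCII 'B' (Broadcom) and part 0x100 is what the kernel's `cputype.h` calls Brahma-B53. One caveat: arm64 `/proc/cpuinfo` appears to print a fixed `CPU architecture: 8` rather than the raw Architecture field, which is 0xf here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the MIDR_EL1 bitfields. */
struct midr_fields {
	unsigned implementer;	/* bits [31:24] */
	unsigned variant;	/* bits [23:20] */
	unsigned architecture;	/* bits [19:16] */
	unsigned partnum;	/* bits [15:4]  */
	unsigned revision;	/* bits [3:0]   */
};

/* Split a raw MIDR_EL1 value into its architectural fields. */
static void midr_decode(uint32_t midr, struct midr_fields *out)
{
	assert(out != NULL);
	out->implementer  = (midr >> 24) & 0xff;
	out->variant      = (midr >> 20) & 0xf;
	out->architecture = (midr >> 16) & 0xf;
	out->partnum      = (midr >> 4) & 0xfff;
	out->revision     = midr & 0xf;
}
```

For `0x420f1000` this yields implementer 0x42, variant 0x0, architecture 0xf, part 0x100 and revision 0, matching the cpuinfo fields above.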

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-11 12:15 arm64 regression in kernel 5.12 related to the (n)VHE Rafał Miłecki
  2021-08-11 12:50 ` Marc Zyngier
@ 2021-08-12  3:59 ` Rafał Miłecki
  1 sibling, 0 replies; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-12  3:59 UTC
  To: Marc Zyngier, David Brazdil, Catalin Marinas, Will Deacon,
	linux-arm-kernel
  Cc: Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 11.08.2021 14:15, Rafał Miłecki wrote:
> All I could do to help debug this regression was to bisect it. The
> first bad commit (I verified it after the bisection) is:
> 
> commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
> Author: Marc Zyngier <maz@kernel.org>
> Date:   Mon Feb 8 09:57:14 2021 +0000
> 
>      arm64: Initialise as nVHE before switching to VHE
> 
>      As we are aiming to be able to control whether we enable VHE or
>      not, let's always drop down to EL1 first, and only then upgrade
>      to VHE if at all possible.
> 
>      This means that if the kernel is booted at EL2, we always start
>      with a nVHE init, drop to EL1 to initialise the the kernel, and
>      only then upgrade the kernel EL to EL2 if possible (the process
>      is obviously shortened for secondary CPUs).
> 
>      The resume path is handled similarly to a secondary CPU boot.
> 
>      Signed-off-by: Marc Zyngier <maz@kernel.org>
>      Acked-by: David Brazdil <dbrazdil@google.com>
>      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>      Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
>      [will: Avoid calling switch_to_vhe twice on kaslr path]
>      Signed-off-by: Will Deacon <will@kernel.org>

FWIW, I confirmed that the above commit is the culprit.

***

On top of 5.12.19 I can do:

git revert \
d077cb3cb90470f8bd7dbe357a474e13589390b9 \
e2df464173f0b585adb958a09536eae2cd1dbefd \
0c93df9622d4d921bcd0dc83f71fed9e98f5119f

to get a booting kernel (the first two commits need to be reverted to
allow a clean revert of 0c93df9622d4).

***

On top of 5.13.9 I can do:

git revert \
2d726d0db6ac479d91bf74490455badd34af6b1d \
d077cb3cb90470f8bd7dbe357a474e13589390b9 \
e2df464173f0b585adb958a09536eae2cd1dbefd \
0c93df9622d4d921bcd0dc83f71fed9e98f5119f

to get a booting kernel too.

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-11 16:55   ` Rafał Miłecki
@ 2021-08-12  6:51     ` Marc Zyngier
  2021-08-12  7:32       ` Rafał Miłecki
  2021-08-12  8:33       ` Florian Fainelli
  0 siblings, 2 replies; 16+ messages in thread
From: Marc Zyngier @ 2021-08-12  6:51 UTC
  To: Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On Wed, 11 Aug 2021 17:55:07 +0100,
Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> On 11.08.2021 14:50, Marc Zyngier wrote:
> > On Wed, 11 Aug 2021 13:15:31 +0100,
> > Rafał Miłecki <zajec5@gmail.com> wrote:
> >> 
> >> Hi,
> >> 
> >> I just tried upgrading from the good old LTS kernel 5.10, and I
> >> discovered that my bcm4908 boards no longer boot with 5.14-rc5.
> >>
> >>
> >> The problem is that the kernel doesn't seem to start booting at all. I
> >> see the CFE bootloader messages:
> >>
> >> Starting program at 0x0000000000080000
> >> /memory = 0x40000000
> >>
> >> and then nothing. Normally the first kernel line should follow, e.g.:
> >> Linux version 5.11.0-rc4 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #30 SMP Wed Aug 11 14:01:00 CEST 2021
> >>
> >>
> >> I have zero knowledge of low-level arm64 or assembler stuff. I also
> >> don't own any bcm4908 development board or bcm4908 datasheets.
> >>
> >> All I could do to help debug this regression was to bisect it. The
> >> first bad commit (I verified it after the bisection) is:
> >> 
> >> commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
> >> Author: Marc Zyngier <maz@kernel.org>
> >> Date:   Mon Feb 8 09:57:14 2021 +0000
> >> 
> >>      arm64: Initialise as nVHE before switching to VHE
> >> 
> >>      As we are aiming to be able to control whether we enable VHE or
> >>      not, let's always drop down to EL1 first, and only then upgrade
> >>      to VHE if at all possible.
> >> 
> >>      This means that if the kernel is booted at EL2, we always start
> >>      with a nVHE init, drop to EL1 to initialise the the kernel, and
> >>      only then upgrade the kernel EL to EL2 if possible (the process
> >>      is obviously shortened for secondary CPUs).
> >> 
> >>      The resume path is handled similarly to a secondary CPU boot.
> >> 
> >>      Signed-off-by: Marc Zyngier <maz@kernel.org>
> >>      Acked-by: David Brazdil <dbrazdil@google.com>
> >>      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> >>      Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
> >>      [will: Avoid calling switch_to_vhe twice on kaslr path]
> >>      Signed-off-by: Will Deacon <will@kernel.org>
> >> 
> >> 
> >> Could you look at this issue, please? I'm happy to test any patches or
> >> provide any extra info I can obtain using kernel 5.11.
> >> 
> >> 
> >> My defconfig for bcm4908 is:
> > 
> > [...]
> > 
> > I don't think the defconfig is that relevant (nothing you quote here
> > would have an impact that early in the boot process).
> > 
> > On the other hand, a description of the platform (what CPUs does it
> > have) and how it boots (VHE, non-VHE, booted at EL2 or not) would be
> > extremely useful. At minimum, a boot log of a working kernel could
> > help.
> 
> Thank you for your patience & reply.
> 
> BCM4908 is Broadcom's 64-bit platform with Broadcom's own Brahma-B53
> CPU(s). I don't know how it boots. Is that something I can find out
> from a running system?
> 
> For DTS SoC description you can check:
> arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi
> 
> See below for the boot log and /proc/cpuinfo. Please note that my
> console seems to be misconfigured, so the early part of the log
> appears twice (nothing really harmful).
> 
> Starting program at 0x0000000000080000
> /memory = 0x40000000
> WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
> WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
> WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
> Linux version 5.11.22-g40462c7f0649 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #9 SMP Wed Aug 11 18:39:58 CEST 2021
> Machine model: Asus GT-AC5300
> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
> printk: bootconsole [bcm63xx_uart0] enabled
> efi: UEFI not found.
> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
> Zone ranges:
>   DMA      [mem 0x0000000000000000-0x000000003fffffff]
>   DMA32    empty
>   Normal   empty
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000000000-0x000000003fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
> percpu: Embedded 17 pages/cpu s37856 r0 d31776 u69632
> Detected VIPT I-cache on CPU0
> CPU features: detected: ARM erratum 843419
> Built 1 zonelists, mobility grouping on.  Total pages: 258048
> Kernel command line: earlycon=bcm63xx_uart,0xff800640
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> mem auto-init: stack:off, heap alloc:off, heap free:off
> Memory: 1020660K/1048576K available (3584K kernel code, 650K rwdata, 684K rodata, 2368K init, 229K bss, 27916K reserved, 0K cma-reserved)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> rcu: Hierarchical RCU implementation.
> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> GIC: Using split EOI/Deactivate mode
> random: get_random_bytes called from start_kernel+0x33c/0x524 with crng_init=0
> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> printk: console [tty0] enabled
> printk: bootconsole [bcm63xx_uart0] disabled
> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
> Linux version 5.11.22-g40462c7f0649 (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #9 SMP Wed Aug 11 18:39:58 CEST 2021
> Machine model: Asus GT-AC5300
> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
> printk: bootconsole [bcm63xx_uart0] enabled
> efi: UEFI not found.
> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
> Zone ranges:
>   DMA      [mem 0x0000000000000000-0x000000003fffffff]
>   DMA32    empty
>   Normal   empty
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000000000-0x000000003fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
> percpu: Embedded 17 pages/cpu s37856 r0 d31776 u69632
> Detected VIPT I-cache on CPU0
> CPU features: detected: ARM erratum 843419
> Built 1 zonelists, mobility grouping on.  Total pages: 258048
> Kernel command line: earlycon=bcm63xx_uart,0xff800640
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> mem auto-init: stack:off, heap alloc:off, heap free:off
> Memory: 1020660K/1048576K available (3584K kernel code, 650K rwdata, 684K rodata, 2368K init, 229K bss, 27916K reserved, 0K cma-reserved)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> rcu: Hierarchical RCU implementation.
> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> GIC: Using split EOI/Deactivate mode
> random: get_random_bytes called from start_kernel+0x33c/0x524 with crng_init=0
> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> printk: console [tty0] enabled
> printk: bootconsole [bcm63xx_uart0] disabled
> Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
> rcu: Hierarchical SRCU implementation.
> EFI services will not be available.
> smp: Bringing up secondary CPUs ...
> Detected VIPT I-cache on CPU1
> CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
> Detected VIPT I-cache on CPU2
> CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
> Detected VIPT I-cache on CPU3
> CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
> smp: Brought up 1 node, 4 CPUs
> SMP: Total of 4 processors activated.
> CPU features: detected: 32-bit EL0 Support
> CPU features: detected: CRC32 instructions
> CPU: All CPU(s) started at EL2

Interestingly, all your CPUs are booting at EL2. Which is great.  Can
you try and enable KVM on your existing 5.10 kernel? Just selecting
CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?

My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
the HVC instruction UNDEFs at EL1. That would be bad news.
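
To make this suspicion concrete: SCR_EL3 is the Secure Configuration Register owned by the EL3 firmware, and its HCE bit (bit 8) enables the HVC instruction; per the Arm architecture as we understand it, HVC is UNDEFINED when HCE is clear. Linux running at EL2 or EL1 cannot read SCR_EL3, so the sketch below only shows how a value obtained on the firmware side (for example with a debugger, which is an assumption here) would be interpreted. `SCR_EL3_HCE`, `hvc_is_enabled()` and `boot_survives()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3.HCE: Hypervisor Call instruction enable, bit 8. */
#define SCR_EL3_HCE (UINT64_C(1) << 8)

static_assert(SCR_EL3_HCE == 0x100, "HCE must be bit 8");

/* Hypothetical helper: given an SCR_EL3 value read by firmware-side
 * means, report whether HVC is enabled (true) or UNDEFINED (false). */
static bool hvc_is_enabled(uint64_t scr_el3)
{
	return (scr_el3 & SCR_EL3_HCE) != 0;
}

/* Under the hypothesis above, a boot path that executes HVC from EL1
 * only survives if HCE is set; paths without HVC are unaffected. */
static bool boot_survives(uint64_t scr_el3, bool path_issues_hvc)
{
	return !path_issues_hvc || hvc_is_enabled(scr_el3);
}
```

The example values in any test of this are arbitrary bit patterns, not values read from a BCM4908.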

Please let me know.

	M.


* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  6:51     ` Marc Zyngier
@ 2021-08-12  7:32       ` Rafał Miłecki
  2021-08-12  7:56         ` Rafał Miłecki
  2021-08-12  7:57         ` Marc Zyngier
  2021-08-12  8:33       ` Florian Fainelli
  1 sibling, 2 replies; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-12  7:32 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 12.08.2021 08:51, Marc Zyngier wrote:
> Interestingly, all your CPUs are booting at EL2. Which is great.  Can
> you try and enable KVM on your existing 5.10 kernel? Just selecting
> CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
> 
> My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
> the HVC instruction UNDEFs at EL1. That would be bad news.

Interesting! I had to enable CONFIG_VIRTUALIZATION and CONFIG_NET first.
First I verified kernel built with those options still boots. It does.

Then I enabled CONFIG_KVM and kernel seems to hang around switching from
bootconsole to the console.

Starting program at 0x0000000000080000
/memory = 0x40000000
WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.11.22-g0453a426c37b (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #8 SMP Thu Aug 12 09:25:55 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 18 pages/cpu s43904 r0 d29824 u73728
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1019556K/1048576K available (4352K kernel code, 678K rwdata, 860K rodata, 2496K init, 232K bss, 29020K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel+0x33c/0x52c with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [bcm63xx_uart0] disabled


(Unless it's a false conclusion and CONFIG_KVM just breaks console
somehow)


Enabling CONFIG_KVM also enabled a few more options, but I believe
it's CONFIG_KVM itself that affects the boot process.

--- config-nokvm        2021-08-12 09:21:50.670046231 +0200
+++ config-kvm  2021-08-12 09:22:35.897103038 +0200
@@ -292,6 +292,7 @@
  CONFIG_ARM64_ERRATUM_824069=y
  CONFIG_ARM64_ERRATUM_819472=y
  CONFIG_ARM64_ERRATUM_832075=y
+CONFIG_ARM64_ERRATUM_834220=y
  CONFIG_ARM64_ERRATUM_843419=y
  CONFIG_ARM64_ERRATUM_1024718=y
  CONFIG_ARM64_WORKAROUND_SPECULATIVE_AT=y
@@ -502,8 +503,21 @@

  CONFIG_ARCH_SUPPORTS_ACPI=y
  # CONFIG_ACPI is not set
+CONFIG_IRQ_BYPASS_MANAGER=y
  CONFIG_VIRTUALIZATION=y
-# CONFIG_KVM is not set
+CONFIG_KVM=y
+CONFIG_HAVE_KVM_IRQCHIP=y
+CONFIG_HAVE_KVM_IRQFD=y
+CONFIG_HAVE_KVM_IRQ_ROUTING=y
+CONFIG_HAVE_KVM_EVENTFD=y
+CONFIG_KVM_MMIO=y
+CONFIG_HAVE_KVM_MSI=y
+CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
+CONFIG_KVM_VFIO=y
+CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL=y
+CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
+CONFIG_HAVE_KVM_IRQ_BYPASS=y
+CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE=y

  #
  # General architecture-dependent options
@@ -616,6 +630,7 @@
  # CONFIG_IOSCHED_BFQ is not set
  # end of IO Schedulers

+CONFIG_PREEMPT_NOTIFIERS=y
  CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
  CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
  CONFIG_ARCH_INLINE_SPIN_LOCK=y
@@ -712,6 +727,7 @@
  CONFIG_MIGRATION=y
  CONFIG_PHYS_ADDR_T_64BIT=y
  CONFIG_BOUNCE=y
+CONFIG_MMU_NOTIFIER=y
  # CONFIG_KSM is not set
  CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
  CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
@@ -2508,6 +2524,7 @@
  CONFIG_DECOMPRESS_LZ4=y
  CONFIG_DECOMPRESS_ZSTD=y
  CONFIG_GENERIC_ALLOCATOR=y
+CONFIG_INTERVAL_TREE=y
  CONFIG_HAS_IOMEM=y
  CONFIG_HAS_IOPORT_MAP=y
  CONFIG_HAS_DMA=y

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  7:32       ` Rafał Miłecki
@ 2021-08-12  7:56         ` Rafał Miłecki
  2021-08-12  8:24           ` Marc Zyngier
  2021-08-12  7:57         ` Marc Zyngier
  1 sibling, 1 reply; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-12  7:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 12.08.2021 09:32, Rafał Miłecki wrote:
> (Unless it's a false conclusion and CONFIG_KVM just breaks console
> somehow)

That was a false conclusion. I modified kernel/printk/printk.c and set
keep_bootcon = 1

A full log with important part below:

Starting program at 0x0000000000080000
/memory = 0x40000000
WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.11.22-g0453a426c37b-dirty (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #12 SMP Thu Aug 12 09:54:11 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 18 pages/cpu s43904 r0 d29824 u73728
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1019556K/1048576K available (4352K kernel code, 678K rwdata, 860K rodata, 2496K init, 232K bss, 29020K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel+0x33c/0x52c with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
rcu: Hierarchical SRCU implementation.
EFI services will not be available.
smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
Detected VIPT I-cache on CPU2
CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
Detected VIPT I-cache on CPU3
CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
smp: Brought up 1 node, 4 CPUs
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: CRC32 instructions
CPU features: detected: 32-bit EL1 Support
CPU: All CPU(s) started at EL2
alternatives: patching kernel code
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
pinctrl core: initialized pinctrl subsystem
DMI not present or invalid.
NET: Registered protocol family 16
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
thermal_sys: Registered thermal governor 'step_wise'
ASID allocator initialised with 65536 entries
iommu: Default domain type: Translated
vgaarb: loaded
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
clocksource: Switched to clocksource arch_sys_counter
PCI: CLS 0 bytes, default 64
kvm [1]: IPA Size Limit: 40 bits
------------[ cut here ]------------
------------[ cut here ]------------
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/traps.c:406!
kernel BUG at arch/arm64/kernel/traps.c:406!
Internal error: Oops - BUG: 0 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.11.22-g0453a426c37b-dirty #12
Hardware name: Asus GT-AC5300 (DT)
pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
pc : do_undefinstr+0x204/0x230
lr : do_undefinstr+0x218/0x230
sp : ffffffc01001bcc0
x29: ffffffc01001bcc0 x28: ffffff80010a0a80
x27: 00000000000000e0 x26: ffffffc01001c000
x25: ffffffc010018000 x24: ffffffc0108a9000
x23: 0000000080000085 x22: 00000000d4000002
x21: ffffffc0108cd0d8 x20: ffffff80010a0a80
x19: ffffffc01001bd40 x18: 0000000000000030
x17: fbfffafbffffffff x16: 0000000000000048
x15: 0000046366960998 x14: 0000000000000017
x13: 0000000000000001 x12: 0000000000000001
x11: 0000000000000000 x10: 0000000000000003
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffffff803fde8600 x6 : 0000000000000001
x5 : 0000000000000000 x4 : ffffff80010a0a80
x3 : 00000000d5300000 x2 : ffffffc01082ce68
x1 : ffffffc0108cd0e8 x0 : 0000000080000085
Call trace:
  do_undefinstr+0x204/0x230
  el1_undef+0x30/0x50
  el1_sync_handler+0x8c/0xd0
  el1_sync+0x78/0x100
  __hyp_reset_vectors+0x4/0x91d0
  _kvm_arch_hardware_enable+0x3c/0x60
  flush_smp_call_function_queue+0x14c/0x260
  generic_smp_call_function_single_interrupt+0x14/0x20
  ipi_handler+0x9c/0xd0
  handle_percpu_devid_irq+0x84/0x150
  generic_handle_irq+0x34/0x50
  __handle_domain_irq+0x64/0xc0
  gic_handle_irq+0x78/0xa0
  el1_irq+0xbc/0x140
  arch_cpu_idle+0x18/0x30
  default_idle_call+0x20/0x70
  do_idle+0xc8/0x130
  cpu_startup_entry+0x24/0x50
  secondary_start_kernel+0x130/0x160
Code: 9a81d000 d50342df 17ffffa3 f9001bf7 (d4210000)
---[ end trace 6a2bdc7bc6eb54af ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
SMP: stopping secondary CPUs
kernel BUG at arch/arm64/kernel/traps.c:406!
SMP: failed to stop secondary CPUs 1-3
Kernel Offset: disabled
CPU features: 0x00240002,24002000
Memory Limit: none
---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
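The register dump is consistent with the SCR_EL3.HCE suspicion. x22 holds 0x00000000d4000002, which is the A64 encoding of HVC #0. The instruction in parentheses on the Code: line (d4210000) is BRK #0x800, the trap that arm64's BUG() emits. The assumption that x22 holds the faulting instruction inside do_undefinstr rests only on its value. Below is a small decoder for the exception-generating instruction class; the helper names are hypothetical.

```c
#include <stdint.h>

/* Exception-generating instructions share bits [31:24] = 0xd4, with
 * opc in [23:21], imm16 in [20:5], op2 in [4:2] and LL in [1:0].
 * Masking out imm16 leaves a fixed pattern for each instruction. */
#define A64_EXC_GEN_MASK UINT32_C(0xffe0001f)

enum a64_exc_gen { A64_NOT_EXC_GEN, A64_SVC, A64_HVC, A64_SMC, A64_BRK };

static enum a64_exc_gen a64_exc_gen_decode(uint32_t insn)
{
	switch (insn & A64_EXC_GEN_MASK) {
	case UINT32_C(0xd4000001): return A64_SVC; /* opc=000, LL=01 */
	case UINT32_C(0xd4000002): return A64_HVC; /* opc=000, LL=10 */
	case UINT32_C(0xd4000003): return A64_SMC; /* opc=000, LL=11 */
	case UINT32_C(0xd4200000): return A64_BRK; /* opc=001, LL=00 */
	default:                   return A64_NOT_EXC_GEN;
	}
}

/* Extract the 16-bit immediate operand (bits [20:5]). */
static uint16_t a64_exc_gen_imm16(uint32_t insn)
{
	return (uint16_t)((insn >> 5) & 0xffffu);
}
```

Applied to the two values above, the decoder gives A64_HVC with imm16 0 for 0xd4000002 and A64_BRK with imm16 0x800 for 0xd4210000.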

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  7:32       ` Rafał Miłecki
  2021-08-12  7:56         ` Rafał Miłecki
@ 2021-08-12  7:57         ` Marc Zyngier
  2021-08-12  8:24           ` Rafał Miłecki
  2021-08-12  8:33           ` Florian Fainelli
  1 sibling, 2 replies; 16+ messages in thread
From: Marc Zyngier @ 2021-08-12  7:57 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On Thu, 12 Aug 2021 08:32:02 +0100,
Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> On 12.08.2021 08:51, Marc Zyngier wrote:
> > Interestingly, all your CPUs are booting at EL2. Which is great.  Can
> > you try and enable KVM on your existing 5.10 kernel? Just selecting
> > CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
> > 
> > My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
> > the HVC instruction UNDEFs at EL1. That would be bad news.
> 
> Interesting! I had to enable CONFIG_VIRTUALIZATION and CONFIG_NET first.
> First I verified kernel built with those options still boots. It does.
> 
> Then I enabled CONFIG_KVM and kernel seems to hang around switching from
> bootconsole to the console.
> 
> Starting program at 0x0000000000080000
> /memory = 0x40000000
> WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
> WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
> WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
> Linux version 5.11.22-g0453a426c37b (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #8 SMP Thu Aug 12 09:25:55 CEST 2021
> Machine model: Asus GT-AC5300
> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
> printk: bootconsole [bcm63xx_uart0] enabled
> efi: UEFI not found.
> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
> Zone ranges:
>   DMA      [mem 0x0000000000000000-0x000000003fffffff]
>   DMA32    empty
>   Normal   empty
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000000000-0x000000003fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
> percpu: Embedded 18 pages/cpu s43904 r0 d29824 u73728
> Detected VIPT I-cache on CPU0
> CPU features: detected: ARM erratum 843419
> Built 1 zonelists, mobility grouping on.  Total pages: 258048
> Kernel command line: earlycon=bcm63xx_uart,0xff800640
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> mem auto-init: stack:off, heap alloc:off, heap free:off
> Memory: 1019556K/1048576K available (4352K kernel code, 678K rwdata, 860K rodata, 2496K init, 232K bss, 29020K reserved, 0K cma-reserved)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> rcu: Hierarchical RCU implementation.
> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> GIC: Using split EOI/Deactivate mode
> random: get_random_bytes called from start_kernel+0x33c/0x52c with crng_init=0
> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> printk: console [tty0] enabled
> printk: bootconsole [bcm63xx_uart0] disabled
> 
> 
> (Unless it's a false conclusion and CONFIG_KVM just breaks console
> somehow)

No, that's because you don't pass the right console to your
kernel. Add something like "console=ttyS0,115200" to the kernel
command line, which will show what you are missing, as well as stop
the double-logging.

Anyway, the fact that it stops booting when you enable KVM confirms my
suspicion. The firmware on this system is probably crap enough not to
enable HVC. Let's confirm it further: please apply the patch below on
top of mainline and tell me that it now boots fine...

Are you in a position where you can actually fix the firmware? Or is
it some closed-source blob?

Broadcom folks: can you please check whether the firmware on this
system correctly configures SCR_EL3.HCE?

Thanks,

	M.

diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 43d212618834..fc95b103ef42 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -238,7 +238,7 @@ SYM_FUNC_START(switch_to_vhe)
 
 	// Turn the world upside down
 	mov	x0, #HVC_VHE_RESTART
-	hvc	#0
+//	hvc	#0
 1:
 	ret
 SYM_FUNC_END(switch_to_vhe)
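The hunk only shows the tail of switch_to_vhe. The model below assumes that the code above it returns early unless the kernel was entered at EL2 and is now running at EL1, which matches the nVHE-first flow described in commit 0c93df9622d4. Under that assumption, the HVC is reached on every CPU that reported starting at EL2, which would explain why commenting it out lets this board boot. The sketch models those assumed guards in C. CurrentEL reports the Exception level in bits [3:2], and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CurrentEL reports the current Exception level in bits [3:2]. */
static unsigned int current_el(uint64_t currentel_reg)
{
	return (unsigned int)((currentel_reg >> 2) & 0x3);
}

/* Hypothetical model of the assumed guards in switch_to_vhe: the HVC
 * is only issued if the kernel was entered at EL2 and has since
 * dropped to EL1 for the nVHE-first initialisation. */
static bool switch_to_vhe_issues_hvc(bool booted_at_el2, uint64_t currentel_reg)
{
	return booted_at_el2 && current_el(currentel_reg) == 1;
}
```

On a CPU that booted at EL2 and whose CurrentEL now reads 0x4 (EL1), the model returns true, meaning the HVC is issued. If the firmware left SCR_EL3.HCE clear, that instruction UNDEFs.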

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  7:57         ` Marc Zyngier
@ 2021-08-12  8:24           ` Rafał Miłecki
  2021-08-12 10:13             ` Marc Zyngier
  2021-08-12  8:33           ` Florian Fainelli
  1 sibling, 1 reply; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-12  8:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 12.08.2021 09:57, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 08:32:02 +0100,
> Rafał Miłecki <zajec5@gmail.com> wrote:
>>
>> On 12.08.2021 08:51, Marc Zyngier wrote:
>>> Interestingly, all your CPUs are booting at EL2. Which is great.  Can
>>> you try and enable KVM on your existing 5.10 kernel? Just selecting
>>> CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
>>>
>>> My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
>>> the HVC instruction UNDEFs at EL1. That would be bad news.
>>
>> Interesting! I had to enable CONFIG_VIRTUALIZATION and CONFIG_NET first.
>> First I verified kernel built with those options still boots. It does.
>>
>> Then I enabled CONFIG_KVM and kernel seems to hang around switching from
>> bootconsole to the console.
>>
>> Starting program at 0x0000000000080000
>> /memory = 0x40000000
>> WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
>> WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
>> WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
>> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
>> Linux version 5.11.22-g0453a426c37b (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #8 SMP Thu Aug 12 09:25:55 CEST 2021
>> Machine model: Asus GT-AC5300
>> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
>> printk: bootconsole [bcm63xx_uart0] enabled
>> efi: UEFI not found.
>> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
>> Zone ranges:
>>    DMA      [mem 0x0000000000000000-0x000000003fffffff]
>>    DMA32    empty
>>    Normal   empty
>> Movable zone start for each node
>> Early memory node ranges
>>    node   0: [mem 0x0000000000000000-0x000000003fffffff]
>> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
>> percpu: Embedded 18 pages/cpu s43904 r0 d29824 u73728
>> Detected VIPT I-cache on CPU0
>> CPU features: detected: ARM erratum 843419
>> Built 1 zonelists, mobility grouping on.  Total pages: 258048
>> Kernel command line: earlycon=bcm63xx_uart,0xff800640
>> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
>> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
>> mem auto-init: stack:off, heap alloc:off, heap free:off
>> Memory: 1019556K/1048576K available (4352K kernel code, 678K rwdata, 860K rodata, 2496K init, 232K bss, 29020K reserved, 0K cma-reserved)
>> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> rcu: Hierarchical RCU implementation.
>> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
>> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
>> GIC: Using split EOI/Deactivate mode
>> random: get_random_bytes called from start_kernel+0x33c/0x52c with crng_init=0
>> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
>> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
>> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
>> Console: colour dummy device 80x25
>> printk: console [tty0] enabled
>> printk: bootconsole [bcm63xx_uart0] disabled
>>
>>
>> (Unless it's a false conclusion and CONFIG_KVM just breaks console
>> somehow)
> 
> No, that's because you don't pass the right console to your
> kernel. Add something like "console=ttyS0,115200" to the kernel
> command line, which will show what you are missing, as well as stop
> the double-logging.
> 
> Anyway, the fact that it stops booting when you enable KVM confirms my
> suspicion. The firmware on this system is probably crap enough not to
> enable HVC. Let's confirm it further: please apply the patch below on
> top of mainline and tell me that it now boots fine...

Thanks for the patch! It works around the issue. See below.


> Are you in a position where you can actually fix the firmware? Or is
> it some closed-source blob?

I'm just an end user with no access to the CFE sources and without any
business contact at Broadcom :(


> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> index 43d212618834..fc95b103ef42 100644
> --- a/arch/arm64/kernel/hyp-stub.S
> +++ b/arch/arm64/kernel/hyp-stub.S
> @@ -238,7 +238,7 @@ SYM_FUNC_START(switch_to_vhe)
>   
>   	// Turn the world upside down
>   	mov	x0, #HVC_VHE_RESTART
> -	hvc	#0
> +//	hvc	#0
>   1:
>   	ret
>   SYM_FUNC_END(switch_to_vhe)

This allows me to boot 5.13.9 and 5.14-rc5 without any reverts!

Enabling CONFIG_KVM still results in:
Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt

Starting program at 0x0000000000080000
/memory = 0x40000000
WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.14.0-rc5-g9c6405c34362-dirty (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #4 SMP Thu Aug 12 10:11:31 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 17 pages/cpu s38112 r0 d31520 u69632
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1020452K/1048576K available (3648K kernel code, 654K rwdata, 708K rodata, 2432K init, 228K bss, 28124K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
Root IRQ handler: gic_handle_irq
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel+0x4a0/0x6dc with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [bcm63xx_uart0] disabled
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.14.0-rc5-g9c6405c34362-dirty (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #4 SMP Thu Aug 12 10:11:31 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 17 pages/cpu s38112 r0 d31520 u69632
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1020452K/1048576K available (3648K kernel code, 654K rwdata, 708K rodata, 2432K init, 228K bss, 28124K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
Root IRQ handler: gic_handle_irq
GIC: Using split EOI/Deactivate mode
random: get_random_bytes called from start_kernel+0x4a0/0x6dc with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
printk: bootconsole [bcm63xx_uart0] disabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
rcu: Hierarchical SRCU implementation.
EFI services will not be available.
smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
Detected VIPT I-cache on CPU2
CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
Detected VIPT I-cache on CPU3
CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
smp: Brought up 1 node, 4 CPUs
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: CRC32 instructions
CPU: All CPU(s) started at EL2
alternatives: patching kernel code
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
pinctrl core: initialized pinctrl subsystem
DMI not present or invalid.
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
thermal_sys: Registered thermal governor 'step_wise'
ASID allocator initialised with 65536 entries
iommu: Default domain type: Translated
vgaarb: loaded
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
clocksource: Switched to clocksource arch_sys_counter
PCI: CLS 0 bytes, default 64
workingset: timestamp_bits=62 max_order=18 bucket_order=0
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler mq-deadline registered
io scheduler kyber registered
basic-mmio-gpio: probe of ff800500.gpio-controller failed with error -22
ff800640.serial: ttyS0 at MMIO 0xff800640 (irq = 24, base_baud = 1562500) is a bcm63xx_uart
printk: console [ttyS0] enabled
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
ehci-platform: EHCI generic platform driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-pci: OHCI PCI platform driver
ohci-platform: OHCI generic platform driver
i2c /dev entries driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
brcmstb-usb-phy 8000c200.usb-phy: Clock not found in Device Tree
brcmstb-usb-phy 8000c200.usb-phy: USB3.0 clock not found in Device Tree
brcmstb-usb-phy 8000c200.usb-phy: Suspend Clock not found in Device Tree
brcmstb-usb-phy 8000c200.usb-phy: IRQ wake not found
brcmstb-usb-phy 8000c200.usb-phy: IRQ wakeup not found
brcmstb-usb-phy 8000c200.usb-phy: Wake interrupt missing, system wake not supported
ehci-platform 8000c300.usb: EHCI Host Controller
ehci-platform 8000c300.usb: new USB bus registered, assigned bus number 1
ehci-platform 8000c300.usb: irq 19, io mem 0x8000c300
ehci-platform 8000c300.usb: USB 2.0 started, EHCI 1.00
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
ohci-platform 8000c400.usb: Generic Platform OHCI controller
ohci-platform 8000c400.usb: new USB bus registered, assigned bus number 2
ohci-platform 8000c400.usb: irq 20, io mem 0x8000c400
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
xhci-hcd 8000d000.usb: xHCI Host Controller
xhci-hcd 8000d000.usb: new USB bus registered, assigned bus number 3
xhci-hcd 8000d000.usb: hcc params 0x0250f17c hci version 0x100 quirks 0x0000000000010010
xhci-hcd 8000d000.usb: irq 21, io mem 0x8000d000
hub 3-0:1.0: USB hub found
hub 3-0:1.0: config failed, hub doesn't have any ports! (err -19)
xhci-hcd 8000d000.usb: xHCI Host Controller
xhci-hcd 8000d000.usb: new USB bus registered, assigned bus number 4
xhci-hcd 8000d000.usb: Host supports USB 3.0 SuperSpeed
usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
Freeing unused kernel memory: 2432K
Run /init as init process
tmpfs: Unknown parameter 'mode'
tmpfs: Unknown parameter 'mode'
tmpfs: Unknown parameter 'mode'
mount: mounting tmpfs on /dev/shm failed: Invalid argument
mount: mounting tmpfs on /tmp failed: Invalid argument
mount: mounting tmpfs on /run failed: Invalid argument
Starting syslogd: OK
Starting klogd: OK
random: dd: uninitialized urandom read (512 bytes read)
Running sysctl: OK
Saving random seed: OK
Starting network: ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
ip: socket: Function not implemented
FAIL

Welcome to Buildroot
buildroot login:

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  7:56         ` Rafał Miłecki
@ 2021-08-12  8:24           ` Marc Zyngier
  0 siblings, 0 replies; 16+ messages in thread
From: Marc Zyngier @ 2021-08-12  8:24 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On Thu, 12 Aug 2021 08:56:57 +0100,
Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> On 12.08.2021 09:32, Rafał Miłecki wrote:
> > (Unless it's a false conclusion and CONFIG_KVM just breaks console
> > somehow)
> 
> That was a false conclusion. I modified kernel/printk/printk.c and set
> keep_bootcon = 1
> 
> A full log with important part below:

[...]

> kernel BUG at arch/arm64/kernel/traps.c:406!
> Internal error: Oops - BUG: 0 [#1] SMP
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.11.22-g0453a426c37b-dirty #12
> Hardware name: Asus GT-AC5300 (DT)
> pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> pc : do_undefinstr+0x204/0x230
> lr : do_undefinstr+0x218/0x230
> sp : ffffffc01001bcc0
> x29: ffffffc01001bcc0 x28: ffffff80010a0a80
> x27: 00000000000000e0 x26: ffffffc01001c000
> x25: ffffffc010018000 x24: ffffffc0108a9000
> x23: 0000000080000085 x22: 00000000d4000002
> x21: ffffffc0108cd0d8 x20: ffffff80010a0a80
> x19: ffffffc01001bd40 x18: 0000000000000030
> x17: fbfffafbffffffff x16: 0000000000000048
> x15: 0000046366960998 x14: 0000000000000017
> x13: 0000000000000001 x12: 0000000000000001
> x11: 0000000000000000 x10: 0000000000000003
> x9 : 0000000000000000 x8 : 0000000000000000
> x7 : ffffff803fde8600 x6 : 0000000000000001
> x5 : 0000000000000000 x4 : ffffff80010a0a80
> x3 : 00000000d5300000 x2 : ffffffc01082ce68
> x1 : ffffffc0108cd0e8 x0 : 0000000080000085
> Call trace:
>  do_undefinstr+0x204/0x230
>  el1_undef+0x30/0x50
>  el1_sync_handler+0x8c/0xd0
>  el1_sync+0x78/0x100
>  __hyp_reset_vectors+0x4/0x91d0
>  _kvm_arch_hardware_enable+0x3c/0x60

And here's the proof. The first HVC we issue ends up generating an
UNDEF, and the kernel legitimately panics. It is just that from 5.12,
we always use HVC even if you don't have KVM enabled. Or kexec. Or
anything else that requires jumping back to EL2, despite having booted
at... EL2. Nonsense.

I'll have to go and think of how to handle this. This may end-up being
a command-line option if we cannot easily handle the UNDEF that early
at boot time.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  7:57         ` Marc Zyngier
  2021-08-12  8:24           ` Rafał Miłecki
@ 2021-08-12  8:33           ` Florian Fainelli
  1 sibling, 0 replies; 16+ messages in thread
From: Florian Fainelli @ 2021-08-12  8:33 UTC (permalink / raw)
  To: Marc Zyngier, Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli



On 8/12/2021 12:57 AM, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 08:32:02 +0100,
> Rafał Miłecki <zajec5@gmail.com> wrote:
>>
>> On 12.08.2021 08:51, Marc Zyngier wrote:
>>> Interestingly, all your CPUs are booting at EL2. Which is great.  Can
>>> you try and enable KVM on your existing 5.10 kernel? Just selecting
>>> CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
>>>
>>> My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
>>> the HVC instruction UNDEFs at EL1. That would be bad news.
>>
>> Interesting! I had to enable CONFIG_VIRTUALIZATION and CONFIG_NET first.
>> First I verified kernel built with those options still boots. It does.
>>
>> Then I enabled CONFIG_KVM and kernel seems to hang around switching from
>> bootconsole to the console.
>>
>> Starting program at 0x0000000000080000
>> /memory = 0x40000000
>> WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
>> WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
>> WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
>> Booting Linux on physical CPU 0x0000000000 [0x420f1000]
>> Linux version 5.11.22-g0453a426c37b (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #8 SMP Thu Aug 12 09:25:55 CEST 2021
>> Machine model: Asus GT-AC5300
>> earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
>> printk: bootconsole [bcm63xx_uart0] enabled
>> efi: UEFI not found.
>> [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
>> Zone ranges:
>>    DMA      [mem 0x0000000000000000-0x000000003fffffff]
>>    DMA32    empty
>>    Normal   empty
>> Movable zone start for each node
>> Early memory node ranges
>>    node   0: [mem 0x0000000000000000-0x000000003fffffff]
>> Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
>> percpu: Embedded 18 pages/cpu s43904 r0 d29824 u73728
>> Detected VIPT I-cache on CPU0
>> CPU features: detected: ARM erratum 843419
>> Built 1 zonelists, mobility grouping on.  Total pages: 258048
>> Kernel command line: earlycon=bcm63xx_uart,0xff800640
>> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
>> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
>> mem auto-init: stack:off, heap alloc:off, heap free:off
>> Memory: 1019556K/1048576K available (4352K kernel code, 678K rwdata, 860K rodata, 2496K init, 232K bss, 29020K reserved, 0K cma-reserved)
>> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> rcu: Hierarchical RCU implementation.
>> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
>> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
>> GIC: Using split EOI/Deactivate mode
>> random: get_random_bytes called from start_kernel+0x33c/0x52c with crng_init=0
>> arch_timer: cp15 timer(s) running at 50.00MHz (phys).
>> clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
>> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
>> Console: colour dummy device 80x25
>> printk: console [tty0] enabled
>> printk: bootconsole [bcm63xx_uart0] disabled
>>
>>
>> (Unless it's a false conclusion and CONFIG_KVM just breaks console
>> somehow)
> 
> No, that's because you don't pass the right console to your
> kernel. Add something like "console=ttyS0,115200" to the kernel
> command line, which will show what you are missing, as well as stop
> the double-logging.
> 
> Anyway, the fact that it stops booting when you enable KVM confirms my
> suspicion. The firmware on this system is probably crap enough not to
> enable HVC. Let's confirm it further: please apply the patch below on
> top of mainline and tell me that it now boots fine...
> 
> Are you in a position where you can actually fix the firmware? Or is
> it some closed-source blob?

This is a closed-source blob, but you ought to be able to load a custom
EL3 stub that could fix things up, in principle at least.

> 
> Broadcom folks: can you please check whether the firmware on this
> system correctly configures SCR_EL3.HCE?

Let me confirm that with the responsible folks.
-- 
Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  6:51     ` Marc Zyngier
  2021-08-12  7:32       ` Rafał Miłecki
@ 2021-08-12  8:33       ` Florian Fainelli
  1 sibling, 0 replies; 16+ messages in thread
From: Florian Fainelli @ 2021-08-12  8:33 UTC (permalink / raw)
  To: Marc Zyngier, Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli



On 8/11/2021 11:51 PM, Marc Zyngier wrote:
[snip]
>> CPU: All CPU(s) started at EL2
> 
> Interestingly, all your CPUs are booting at EL2. Which is great.  Can
> you try and enable KVM on your existing 5.10 kernel? Just selecting
> CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
> 
> My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
> the HVC instruction UNDEFs at EL1. That would be bad news.

Other Brahma-B53 devices that we work with (set-top box and cable modem
chips) boot 5.12 and beyond correctly, and indeed, our EL3 firmware
sets up SCR_EL3.HCE.
-- 
Florian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12  8:24           ` Rafał Miłecki
@ 2021-08-12 10:13             ` Marc Zyngier
  2021-08-12 12:29               ` Rafał Miłecki
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2021-08-12 10:13 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On Thu, 12 Aug 2021 09:24:14 +0100,
Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> On 12.08.2021 09:57, Marc Zyngier wrote:
> > On Thu, 12 Aug 2021 08:32:02 +0100,
> > Rafał Miłecki <zajec5@gmail.com> wrote:
> >> 
> >> On 12.08.2021 08:51, Marc Zyngier wrote:
> >>> Interestingly, all your CPUs are booting at EL2. Which is great.  Can
> >>> you try and enable KVM on your existing 5.10 kernel? Just selecting
> >>> CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
> >>> 
> >>> My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
> >>> the HVC instruction UNDEFs at EL1. That would be bad news.
> >> 
> >> Interesting! I had to enable CONFIG_VIRTUALIZATION and CONFIG_NET first.
> >> First I verified kernel built with those options still boots. It does.
> >> 
> >> Then I enabled CONFIG_KVM and kernel seems to hang around switching from
> >> bootconsole to the console.
> >> 
> >> [...]
> >> 
> >> 
> >> (Unless it's a false conclusion and CONFIG_KVM just breaks console
> >> somehow)
> > 
> > No, that's because you don't pass the right console to your
> > kernel. Add something like "console=ttyS0,115200" to the kernel
> > command line, which will show what you are missing, as well as stop
> > the double-logging.
> > 
> > Anyway, the fact that it stops booting when you enable KVM confirms my
> > suspicion. The firmware on this system is probably crap enough not to
> > enable HVC. Let's confirm it further: please apply the patch below on
> > top of mainline and tell me that it now boots fine...
> 
> Thanks for the patch! It works around the issue. See below.
> 
> 
> > Are you in a position where you can actually fix the firmware? Or is
> > it some closed-source blob?
> 
> I'm just an end-user with no access to CFE sources and without any
> business contact at Broadcom :(

I feared that would be the case. Florian's reply seems to indicate
that the "upstream" firmware implementation is correct, so the OEM
must have fumbled it somehow...

> > diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> > index 43d212618834..fc95b103ef42 100644
> > --- a/arch/arm64/kernel/hyp-stub.S
> > +++ b/arch/arm64/kernel/hyp-stub.S
> > @@ -238,7 +238,7 @@ SYM_FUNC_START(switch_to_vhe)
> >     	// Turn the world upside down
> >   	mov	x0, #HVC_VHE_RESTART
> > -	hvc	#0
> > +//	hvc	#0
> >   1:
> >   	ret
> >   SYM_FUNC_END(switch_to_vhe)
> 
> This allows me to boot 5.13.9 and 5.14-rc5 without any reverts!
> 
> Enabling CONFIG_KVM still results in the:
> Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt

That's expected. Can you please check the patch below? It should
result in a booting kernel which actually survives having KVM compiled
in. It should even display a warning telling you that your setup is
completely buggered.

That's obviously not the final version, but probably a good enough
approximation.

Thanks,

	M.


diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 43d212618834..74a130808b38 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -46,6 +46,14 @@ SYM_CODE_END(__hyp_stub_vectors)
 	.align 11
 
 SYM_CODE_START_LOCAL(elx_sync)
+	mrs	x4, CurrentEL
+	cmp	x4, #CurrentEL_EL1
+	b.ne	0f
+	mrs	x4, esr_el1
+	eor	x4, x4, #ESR_ELx_IL
+	cbz	x4, el1_undef
+
+0:
 	cmp	x0, #HVC_SET_VECTORS
 	b.ne	1f
 	msr	vbar_el2, x1
@@ -71,6 +79,18 @@ SYM_CODE_START_LOCAL(elx_sync)
 
 9:	mov	x0, xzr
 	eret
+
+el1_undef:
+	# Downgrade the sucker to EL1...
+	adr_l	x1, __boot_cpu_mode
+	mov	w0, #BOOT_CPU_MODE_EL1
+	str	w0, [x1]
+
+	mrs	x0, elr_el1
+	add	x0, x0, #4
+	msr	elr_el1, x0
+	mov_q	x0, HVC_STUB_ERR
+	eret
 SYM_CODE_END(elx_sync)
 
 // nVHE? No way! Give me the real thing!
@@ -236,6 +256,19 @@ SYM_FUNC_START(switch_to_vhe)
 	cmp	x0, #CurrentEL_EL1
 	b.ne	1f
 
+	// Check that HVC actually works...
+	adr_l	x1, __hyp_stub_vectors
+	msr	vbar_el1, x1
+	isb
+	mov	x0, #HVC_RESET_VECTORS
+	hvc	#0
+	adr_l	x1, vectors
+	msr	vbar_el1, x1
+	isb
+	mov_q	x1, HVC_STUB_ERR
+	cmp	x0, x1
+	b.eq	1f
+
 	// Turn the world upside down
 	mov	x0, #HVC_VHE_RESTART
 	hvc	#0

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12 10:13             ` Marc Zyngier
@ 2021-08-12 12:29               ` Rafał Miłecki
  2021-08-12 12:57                 ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: Rafał Miłecki @ 2021-08-12 12:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 12.08.2021 12:13, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 09:24:14 +0100,
> Rafał Miłecki <zajec5@gmail.com> wrote:
>>
>> On 12.08.2021 09:57, Marc Zyngier wrote:
>>> On Thu, 12 Aug 2021 08:32:02 +0100,
>>> Rafał Miłecki <zajec5@gmail.com> wrote:
>>>>
>>>> On 12.08.2021 08:51, Marc Zyngier wrote:
>>>>> Interestingly, all your CPUs are booting at EL2. Which is great.  Can
>>>>> you try and enable KVM on your existing 5.10 kernel? Just selecting
>>>>> CONFIG_KVM should be enough. Does it boot correctly with KVM enabled?
>>>>>
>>>>> My suspicion is that the firmware doesn't set SCR_EL3.HCE, and that
>>>>> the HVC instruction UNDEFs at EL1. That would be bad news.
>>>>
>>>> Interesting! I had to enable CONFIG_VIRTUALIZATION and CONFIG_NET first.
>>>> First I verified kernel built with those options still boots. It does.
>>>>
>>>> Then I enabled CONFIG_KVM and kernel seems to hang around switching from
>>>> bootconsole to the console.
>>>>
>>>> [...]
>>>>
>>>>
>>>> (Unless it's a false conclusion and CONFIG_KVM just breaks console
>>>> somehow)
>>>
>>> No, that's because you don't pass the right console to your
>>> kernel. Add something like "console=ttyS0,115200" to the kernel
>>> command line, which will show what you are missing, as well as stop
>>> the double-logging.
>>>
>>> Anyway, the fact that it stops booting when you enable KVM confirms my
>>> suspicion. The firmware on this system is probably crap enough not to
>>> enable HVC. Let's confirm it further: please apply the patch below on
>>> top of mainline and tell me that it now boots fine...
>>
>> Thanks for the patch! It works around the issue. See below.
>>
>>
>>> Are you in a position where you can actually fix the firmware? Or is
>>> it some closed-source blob?
>>
>> I'm just an end-user with no access to CFE sources and without any
>> business contact at Broadcom :(
> 
> I feared that would be the case. Florian's reply seems to indicate
> that the "upstream" firmware implementation is correct, so the OEM
> must have fumbled it somehow...

Please note that Broadcom has many business units, many teams and from
my understanding they often don't cooperate properly.

It's likely that BCM4908 BU screwed something up. Or maybe it's a matter
of CFE vs. U-Boot?

Florian: does your team (set-top box and cable modem devices) use CFE or
U-Boot with kernels 5.12+?

It's very unlikely it's a single OEM that broke CFE with custom
modifications. This problem affects all 3 devices I own:
1. Netgear R8000P
2. TP-Link Archer C2300 V1
3. Asus GT-AC5300


>>> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
>>> index 43d212618834..fc95b103ef42 100644
>>> --- a/arch/arm64/kernel/hyp-stub.S
>>> +++ b/arch/arm64/kernel/hyp-stub.S
>>> @@ -238,7 +238,7 @@ SYM_FUNC_START(switch_to_vhe)
>>>      	// Turn the world upside down
>>>    	mov	x0, #HVC_VHE_RESTART
>>> -	hvc	#0
>>> +//	hvc	#0
>>>    1:
>>>    	ret
>>>    SYM_FUNC_END(switch_to_vhe)
>>
>> This allows me to boot 5.13.9 and 5.14-rc5 without any reverts!
>>
>> Enabling CONFIG_KVM still results in the:
>> Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
> 
> That's expected. Can you please check the patch below? It should
> result in a booting kernel which actually survives having KVM compiled
> in. It should even display a warning telling you that your setup is
> completely buggered.
> 
> That's obviously not the final version, but probably a good enough
> approximation.

It seems to work! Kernel has booted and I saw:
CPU: CPUs started in inconsistent modes
WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:426 smp_cpus_done+0x8c/0xc8
(...)
kvm [1]: HYP mode not available


Starting program at 0x0000000000080000
/memory = 0x40000000
WARNING: Node's property /reserved-memory/dt_reserved_buffer is not defined
WARNING: Node's property /reserved-memory/dt_reserved_flow is not defined
WARNING: Node's property /reserved-memory/dt_reserved_dhd2 is not defined
Booting Linux on physical CPU 0x0000000000 [0x420f1000]
Linux version 5.14.0-rc5-g8bad1731c752-dirty (rmilecki@localhost.localdomain) (aarch64-buildroot-linux-uclibc-gcc.br_real (Buildroot -g91617ed) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #23 SMP Thu Aug 12 14:20:52 CEST 2021
Machine model: Asus GT-AC5300
earlycon: bcm63xx_uart0 at MMIO 0x00000000ff800640 (options '')
printk: bootconsole [bcm63xx_uart0] enabled
efi: UEFI not found.
[Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
Zone ranges:
   DMA      [mem 0x0000000000000000-0x000000003fffffff]
   DMA32    empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x0000000000000000-0x000000003fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
percpu: Embedded 18 pages/cpu s44568 r0 d29160 u73728
Detected VIPT I-cache on CPU0
CPU features: detected: ARM erratum 843419
Built 1 zonelists, mobility grouping on.  Total pages: 258048
Kernel command line: earlycon=bcm63xx_uart,0xff800640 console=ttyS0,115200
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1017496K/1048576K available (5824K kernel code, 762K rwdata, 1100K rodata, 2624K init, 259K bss, 31080K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
rcu: Hierarchical RCU implementation.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
Root IRQ handler: gic_handle_irq
random: get_random_bytes called from start_kernel+0x4a0/0x6dc with crng_init=0
arch_timer: cp15 timer(s) running at 50.00MHz (virt).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=200000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
rcu: Hierarchical SRCU implementation.
EFI services will not be available.
smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
Detected VIPT I-cache on CPU2
CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
Detected VIPT I-cache on CPU3
CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
smp: Brought up 1 node, 4 CPUs
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: 32-bit EL1 Support
CPU features: detected: CRC32 instructions
------------[ cut here ]------------
CPU: CPUs started in inconsistent modes
WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:426 smp_cpus_done+0x8c/0xc8
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5-g8bad1731c752-dirty #23
Hardware name: Asus GT-AC5300 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
pc : smp_cpus_done+0x8c/0xc8
lr : smp_cpus_done+0x8c/0xc8
sp : ffffffc01002be00
x29: ffffffc01002be00 x28: 0000000000000000 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: ffffffc010ab4000 x22: 0000000000000000 x21: 0000000000000000
x20: ffffffc0107b7e74 x19: ffffffc010a78000 x18: 0000000000000001
x17: ffffffc010a9ee40 x16: 0000000000000000 x15: 0000424c5a953180
x14: fffffffffffc0ef7 x13: 0000000000000037 x12: ffffff80010b03b0
x11: 00000000ffffffea x10: ffffffc010a5eb50 x9 : 0000000000000001
x8 : 0000000000000001 x7 : 0000000000017fe8 x6 : c0000000ffffefff
x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 00000000ffffffff x1 : 33e2e90440df2000 x0 : 0000000000000000
Call trace:
  smp_cpus_done+0x8c/0xc8
  smp_init+0x68/0x78
  kernel_init_freeable+0xd0/0x214
  kernel_init+0x24/0x120
  ret_from_fork+0x10/0x18
---[ end trace 773cbee471955c5a ]---
alternatives: patching kernel code
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
pinctrl core: initialized pinctrl subsystem
DMI not present or invalid.
NET: Registered PF_NETLINK/PF_ROUTE protocol family
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
thermal_sys: Registered thermal governor 'step_wise'
ASID allocator initialised with 65536 entries
iommu: Default domain type: Translated
vgaarb: loaded
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
clocksource: Switched to clocksource arch_sys_counter
NET: Registered PF_INET protocol family
IP idents hash table entries: 16384 (order: 5, 131072 bytes, linear)
tcp_listen_portaddr_hash hash table entries: 512 (order: 1, 8192 bytes, linear)
TCP established hash table entries: 8192 (order: 4, 65536 bytes, linear)
TCP bind hash table entries: 8192 (order: 5, 131072 bytes, linear)
TCP: Hash tables configured (established 8192 bind 8192)
UDP hash table entries: 512 (order: 2, 16384 bytes, linear)
UDP-Lite hash table entries: 512 (order: 2, 16384 bytes, linear)
NET: Registered PF_UNIX/PF_LOCAL protocol family
PCI: CLS 0 bytes, default 64
kvm [1]: HYP mode not available
workingset: timestamp_bits=62 max_order=18 bucket_order=0
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler mq-deadline registered
io scheduler kyber registered
console [ttyS0] enabled 0xff800640 (irq = 24, base_baud = 1562500) is a bcm63xx_uart
printk: console [ttyS0] enabled
printk: bootconsole [bcm63xx_uart0] disabled
printk: bootconsole [bcm63xx_uart0] disabled
nand: device found, Manufacturer ID: 0xc8, Chip ID: 0xda
nand: ESMT NAND 256MiB 3,3V 8-bit
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
bcm63138_nand ff801800.nand: detected 256MiB total, 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
Bad block table found at page 131008, version 0x01
Bad block table found at page 130944, version 0x01
3 fixed-partitions partitions found on MTD device brcmnand.0
Creating 3 MTD partitions on "brcmnand.0":
0x000000000000-0x000000100000 : "cferom"
0x000000100000-0x000005800000 : "firmware"
0x000005800000-0x00000af00000 : "backup"
libphy: Fixed MDIO Bus: probed
libphy: unimac MII bus: probed
unimac-mdio 800c05c0.mdio: Broadcom UniMAC MDIO bus
libphy: sf2 slave mii: probed
brcm-sf2 80080000.ethernet-switch: found switch: BCM4908, rev 0
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
ehci-platform: EHCI generic platform driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-pci: OHCI PCI platform driver
ohci-platform: OHCI generic platform driver
i2c /dev entries driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
NET: Registered PF_INET6 protocol family
Segment Routing with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered PF_PACKET protocol family
8021q: 802.1Q VLAN Support v1.8
brcmstb-usb-phy 8000c200.usb-phy: Clock not found in Device Tree
brcmstb-usb-phy 8000c200.usb-phy: USB3.0 clock not found in Device Tree
brcmstb-usb-phy 8000c200.usb-phy: Suspend Clock not found in Device Tree
brcmstb-usb-phy 8000c200.usb-phy: IRQ wake not found
brcmstb-usb-phy 8000c200.usb-phy: IRQ wakeup not found
brcmstb-usb-phy 8000c200.usb-phy: Wake interrupt missing, system wake not supported
libphy: sf2 slave mii: probed
brcm-sf2 80080000.ethernet-switch: found switch: BCM4908, rev 0
brcm-sf2 80080000.ethernet-switch lan2 (uninitialized): PHY [800c05c0.mdio--1:08] driver [Generic PHY] (irq=POLL)
brcm-sf2 80080000.ethernet-switch lan1 (uninitialized): PHY [800c05c0.mdio--1:09] driver [Generic PHY] (irq=POLL)
brcm-sf2 80080000.ethernet-switch lan6 (uninitialized): PHY [800c05c0.mdio--1:0a] driver [Generic PHY] (irq=POLL)
brcm-sf2 80080000.ethernet-switch lan5 (uninitialized): PHY [800c05c0.mdio--1:0b] driver [Generic PHY] (irq=POLL)
brcm-sf2 80080000.ethernet-switch: configuring for fixed/internal link mode
eth0: mtu greater than device maximum
bcm4908_enet 80002000.ethernet eth0: error -22 setting MTU to 1504 to include DSA overhead
DSA: tree 0 setup
brcm-sf2 80080000.ethernet-switch: Starfighter 2 top: 4.07, core: 5.00, IRQs: 22, 23
brcm-sf2 80080000.ethernet-switch: Link is Up - 1Gbps/Full - flow control off
ehci-platform 8000c300.usb: EHCI Host Controller
ehci-platform 8000c300.usb: new USB bus registered, assigned bus number 1
ehci-platform 8000c300.usb: irq 19, io mem 0x8000c300
ehci-platform 8000c300.usb: USB 2.0 started, EHCI 1.00
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
ohci-platform 8000c400.usb: Generic Platform OHCI controller
ohci-platform 8000c400.usb: new USB bus registered, assigned bus number 2
ohci-platform 8000c400.usb: irq 20, io mem 0x8000c400
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
xhci-hcd 8000d000.usb: xHCI Host Controller
xhci-hcd 8000d000.usb: new USB bus registered, assigned bus number 3
xhci-hcd 8000d000.usb: hcc params 0x0250f17c hci version 0x100 quirks 0x0000000000010010
xhci-hcd 8000d000.usb: irq 21, io mem 0x8000d000
hub 3-0:1.0: USB hub found
hub 3-0:1.0: config failed, hub doesn't have any ports! (err -19)
xhci-hcd 8000d000.usb: xHCI Host Controller
xhci-hcd 8000d000.usb: new USB bus registered, assigned bus number 4
xhci-hcd 8000d000.usb: Host supports USB 3.0 SuperSpeed
usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
Freeing unused kernel memory: 2624K
Run /init as init process
tmpfs: Unknown parameter 'mode'
tmpfs: Unknown parameter 'mode'
tmpfs: Unknown parameter 'mode'
mount: mounting tmpfs on /dev/shm failed: Invalid argument
mount: mounting tmpfs on /tmp failed: Invalid argument
mount: mounting tmpfs on /run failed: Invalid argument
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
random: dd: uninitialized urandom read (512 bytes read)
Saving random seed: OK
Starting network: brcm-sf2 80080000.ethernet-switch lan1: configuring for phy/internal link mode
8021q: adding VLAN 0 to HW filter on device lan1
brcm-sf2 80080000.ethernet-switch lan2: configuring for phy/internal link mode
8021q: adding VLAN 0 to HW filter on device lan2
br-lan: port 1(lan1) entered blocking state
br-lan: port 1(lan1) entered disabled state
device lan1 entered promiscuous mode
device eth0 entered promiscuous mode
br-lan: port 2(lan2) entered blocking state
br-lan: port 2(lan2) entered disabled state
device lan2 entered promiscuous mode
OK

Welcome to Buildroot
buildroot login:
brcm-sf2 80080000.ethernet-switch lan1: Link is Up - 1Gbps/Full - flow control rx/tx
IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready
br-lan: port 1(lan1) entered blocking state
br-lan: port 1(lan1) entered forwarding state
IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready

Welcome to Buildroot
buildroot login:

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12 12:29               ` Rafał Miłecki
@ 2021-08-12 12:57                 ` Marc Zyngier
  2021-08-12 18:29                   ` Florian Fainelli
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2021-08-12 12:57 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On Thu, 12 Aug 2021 13:29:56 +0100,
Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> On 12.08.2021 12:13, Marc Zyngier wrote:
> > On Thu, 12 Aug 2021 09:24:14 +0100,
> > Rafał Miłecki <zajec5@gmail.com> wrote:

[...]

> >> I'm just an end-user with no access to CFE sources and without any
> >> business contact at Broadcom :(
> > 
> > I feared that would be the case. Florian's reply seems to indicate
> > that the "upstream" firmware implementation is correct, so the OEM
> > must have fumbled it somehow...
> 
> Please note that Broadcom has many business units and many teams, and from
> my understanding they often don't cooperate properly.

I bet some team sampled an early version of the firmware that included
the bug and never looked back. You can also tell the level of quality
by the fact that it uses spin-tables to boot, that the interrupt
controller node is incomplete...

> It's likely that the BCM4908 BU screwed something up. Or maybe it's a matter
> of CFE vs. U-Boot?

It is a matter of whatever is running at EL3 and doing the basic setup
of the CPUs.
> 
> Florian: does your team (set-top box and cable modem devices) use CFE or
> U-Boot with kernels 5.12+?
> 
> It's very unlikely it's a single OEM that broke CFE with custom
> modifications. This problem affects all 3 devices I own:
> 1. Netgear R8000P
> 2. TP-Link Archer C2300 V1
> 3. Asus GT-AC5300

They probably all use the same pre-cast design with some sort of
value-add on top.

[...]

> > That's expected. Can you please check the patch below? It should
> > result in a booting kernel which actually survives having KVM compiled
> > in. It should even display a warning telling you that your setup is
> > completely buggered.
> > 
> > That's obviously not the final version, but probably a good enough
> > approximation.
> 
> It seems to work! Kernel has booted and I saw:
> CPU: CPUs started in inconsistent modes
> WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:426 smp_cpus_done+0x8c/0xc8
> (...)
> kvm [1]: HYP mode not available

Right. So there is some hope. Maybe. I'm not sure I want to maintain
this crap though.
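
For readers following along: the "CPUs started in inconsistent modes"
warning comes from hyp_mode_check() in arch/arm64/kernel/smp.c, which
compares the exception level each CPU reported when it came up. As far
as I understand the arm64 boot code, the kernel keeps a two-entry record
that CPUs booting at EL1 and CPUs booting at EL2 each overwrite in their
own slot. The Python below is only a simplified model of that
bookkeeping, not kernel code; check_boot_modes(), the slot list and the
"EL1"/"EL2" string values are hypothetical stand-ins for the kernel's
own records and constants.

```python
# Simplified model of the per-CPU boot-mode bookkeeping behind the
# "CPUs started in inconsistent modes" warning. Not kernel code: the
# names and the "EL1"/"EL2" values are stand-ins for illustration.
EL1 = "EL1"
EL2 = "EL2"


def check_boot_modes(cpu_modes):
    """Return the message a hyp_mode_check()-style routine would print.

    slots[0] is overwritten by any CPU that booted at EL1 and slots[1]
    by any CPU that booted at EL2. The initial values are chosen so that,
    as long as at least one CPU reports in, an untouched slot never
    produces a mismatch on its own.
    """
    slots = [EL2, EL1]
    for mode in cpu_modes:
        if mode == EL2:
            slots[1] = EL2
        else:
            slots[0] = EL1
    if slots[0] == EL2 and slots[1] == EL2:
        return "CPU: All CPU(s) started at EL2"
    if slots[0] != slots[1]:
        return "CPU: CPUs started in inconsistent modes"
    return "CPU: All CPU(s) started at EL1"
```

With every CPU at EL2 the model reports EL2, with every CPU at EL1 it
reports EL1, and any mix of the two falls through to the warning seen in
the boot log above.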

[...]

> nand: device found, Manufacturer ID: 0xc8, Chip ID: 0xda
> nand: ESMT NAND 256MiB 3,3V 8-bit
> nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> bcm63138_nand ff801800.nand: detected 256MiB total, 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> Bad block table found at page 131008, version 0x01
> Bad block table found at page 130944, version 0x01
> 3 fixed-partitions partitions found on MTD device brcmnand.0
> Creating 3 MTD partitions on "brcmnand.0":
> 0x000000000000-0x000000100000 : "cferom"
> 0x000000100000-0x000005800000 : "firmware"
> 0x000005800000-0x00000af00000 : "backup"

So here's your chance! You have the firmware image here (I guess
"cferom" is the one). It'd be interesting to disassemble it, find out
where SCR_EL3 is set, patch it and never look back.

Only kidding.
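
Should anyone want to try it anyway, a first step is locating the
instructions that write SCR_EL3. The sketch below assumes the "cferom"
contents are uncompressed, little-endian AArch64 code; a real CFE image
may well be compressed or wrapped in a header and would have to be
unpacked first. It looks for the MSR SCR_EL3, Xt encoding (0xD51E1100,
with the source register number in bits [4:0]). find_msr_scr_el3() is a
hypothetical helper, not an existing tool.

```python
import struct

# AArch64 "MSR SCR_EL3, Xt": op0=3, op1=6, CRn=1, CRm=1, op2=0.
# The low five bits hold Rt, so they are masked out when matching.
MSR_SCR_EL3_BASE = 0xD51E1100
RT_MASK = 0x1F


def find_msr_scr_el3(image: bytes):
    """Return (byte offset, Rt) for every aligned MSR SCR_EL3 write."""
    hits = []
    # Step through the image one 4-byte instruction word at a time.
    for off in range(0, len(image) - 3, 4):
        (insn,) = struct.unpack_from("<I", image, off)
        if (insn & ~RT_MASK) == MSR_SCR_EL3_BASE:
            hits.append((off, insn & RT_MASK))
    return hits
```

A hit only shows where the register is written. The value itself is
usually built a few instructions earlier, so the surrounding code still
has to be disassembled to see whether the HCE bit (bit 8) is ever set.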

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: arm64 regression in kernel 5.12 related to the (n)VHE
  2021-08-12 12:57                 ` Marc Zyngier
@ 2021-08-12 18:29                   ` Florian Fainelli
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Fainelli @ 2021-08-12 18:29 UTC (permalink / raw)
  To: Marc Zyngier, Rafał Miłecki
  Cc: David Brazdil, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Mark Rutland, Ard Biesheuvel, Marco Elver, BCM Kernel Feedback,
	Florian Fainelli

On 8/12/2021 2:57 PM, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 13:29:56 +0100,
> Rafał Miłecki <zajec5@gmail.com> wrote:
>>
>> On 12.08.2021 12:13, Marc Zyngier wrote:
>>> On Thu, 12 Aug 2021 09:24:14 +0100,
>>> Rafał Miłecki <zajec5@gmail.com> wrote:
> 
> [...]
> 
>>>> I'm just an end-user with no access to CFE sources and without any
>>>> business contact at Broadcom :(
>>>
>>> I feared that would be the case. Florian's reply seems to indicate
>>> that the "upstream" firmware implementation is correct, so the OEM
>>> must have fumbled it somehow...
>>
>> Please note that Broadcom has many business units and many teams, and from
>> my understanding they often don't cooperate properly.
> 
> I bet some team sampled an early version of the firmware that included
> the bug and never looked back. You can also tell the level of quality
> by the fact that it uses spin-tables to boot, that the interrupt
> controller node is incomplete...
> 
>> It's likely that the BCM4908 BU screwed something up. Or maybe it's a matter
>> of CFE vs. U-Boot?
> 
> It is a matter of whatever is running at EL3 and doing the basic setup
> of the CPUs.

>>
>> Florian: does your team (set-top box and cable modem devices) use CFE or
>> U-Boot with kernels 5.12+?

Set-top box and cable modem devices use a different boot loader (called
BOLT) and different EL3 firmware than the 4908 implementation. Although
we don't use virtualization, we did pay attention to the register set-up.

Just got confirmation from the team that authored the 4908 CFE that they
*do not* set SCR_EL3.HCE, on the premise that they do not use
virtualization...
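
This is consistent with the hang: with SCR_EL3.HCE (bit 8) clear, HVC is
UNDEFINED, while after commit 0c93df9622d4 a kernel entered at EL2 drops
to EL1 first and relies on HVC calls into its EL2 stub to get back to EL2
(or to let KVM take over EL2). The Python sketch below decodes a few
SCR_EL3 fields from a raw register value to make this concrete.
decode_scr_el3() and hvc_available() are hypothetical helpers; the bit
positions follow the Arm Architecture Reference Manual, and the sample
values in the note after the code are made up for illustration.

```python
# Selected SCR_EL3 fields (bit positions per the Arm ARM).
SCR_EL3_FIELDS = {
    "NS": 0,    # lower Exception levels are in Non-secure state when set
    "HCE": 8,   # HVC instruction enable; HVC is UNDEFINED when clear
    "RW": 10,   # next lower Exception level is AArch64 when set
}


def decode_scr_el3(value: int) -> dict:
    """Return the selected SCR_EL3 fields of a raw value as booleans."""
    return {name: bool((value >> bit) & 1) for name, bit in SCR_EL3_FIELDS.items()}


def hvc_available(value: int) -> bool:
    """True if an HVC from EL1/EL2 would be executed rather than UNDEFINED."""
    return decode_scr_el3(value)["HCE"]
```

For example, 0x401 (NS and RW set, HCE clear) decodes with HCE False,
which matches the situation described above, whereas 0x501 (the same
value with bit 8 added) would leave HVC usable.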

>>
>> It's very unlikely it's a single OEM that broke CFE with custom
>> modifications. This problem affects all 3 devices I own:
>> 1. Netgear R8000P
>> 2. TP-Link Archer C2300 V1
>> 3. Asus GT-AC5300
> 
> They probably all use the same pre-cast design with some sort of
> value-add on top.

Yes indeed.

> 
> [...]
> 
>>> That's expected. Can you please check the patch below? It should
>>> result in a booting kernel which actually survives having KVM compiled
>>> in. It should even display a warning telling you that your setup is
>>> completely buggered.
>>>
>>> That's obviously not the final version, but probably a good enough
>>> approximation.
>>
>> It seems to work! Kernel has booted and I saw:
>> CPU: CPUs started in inconsistent modes
>> WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:426 smp_cpus_done+0x8c/0xc8
>> (...)
>> kvm [1]: HYP mode not available
> 
> Right. So there is some hope. Maybe. I'm not sure I want to maintain
> this crap though.
> 
> [...]
> 
>> nand: device found, Manufacturer ID: 0xc8, Chip ID: 0xda
>> nand: ESMT NAND 256MiB 3,3V 8-bit
>> nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
>> bcm63138_nand ff801800.nand: detected 256MiB total, 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> Bad block table found at page 131008, version 0x01
>> Bad block table found at page 130944, version 0x01
>> 3 fixed-partitions partitions found on MTD device brcmnand.0
>> Creating 3 MTD partitions on "brcmnand.0":
>> 0x000000000000-0x000000100000 : "cferom"
>> 0x000000100000-0x000005800000 : "firmware"
>> 0x000005800000-0x00000af00000 : "backup"
> 
> So here's your chance! You have the firmware image here (I guess
> "cferom" is the one). It'd be interesting to disassemble it, find out
> where SCR_EL3 is set, patch it and never look back.
> 
> Only kidding.
> 
> 	M.
> 

-- 
Florian


end of thread, other threads:[~2021-08-12 18:31 UTC | newest]

Thread overview: 16+ messages
2021-08-11 12:15 arm64 regression in kernel 5.12 related to the (n)VHE Rafał Miłecki
2021-08-11 12:50 ` Marc Zyngier
2021-08-11 16:55   ` Rafał Miłecki
2021-08-12  6:51     ` Marc Zyngier
2021-08-12  7:32       ` Rafał Miłecki
2021-08-12  7:56         ` Rafał Miłecki
2021-08-12  8:24           ` Marc Zyngier
2021-08-12  7:57         ` Marc Zyngier
2021-08-12  8:24           ` Rafał Miłecki
2021-08-12 10:13             ` Marc Zyngier
2021-08-12 12:29               ` Rafał Miłecki
2021-08-12 12:57                 ` Marc Zyngier
2021-08-12 18:29                   ` Florian Fainelli
2021-08-12  8:33           ` Florian Fainelli
2021-08-12  8:33       ` Florian Fainelli
2021-08-12  3:59 ` Rafał Miłecki
