* [next] arm64: boot failed - next-20220606
@ 2022-06-06 11:46 ` Naresh Kamboju
  0 siblings, 0 replies; 46+ messages in thread
From: Naresh Kamboju @ 2022-06-06 11:46 UTC (permalink / raw)
  To: Linux-Next Mailing List, open list, regressions, lkft-triage,
	Linux ARM, linux-mm
  Cc: Stephen Rothwell, Andrew Morton, Ard Biesheuvel, Arnd Bergmann,
	Catalin Marinas, Raghuram Thammiraju, Mark Brown, Will Deacon

Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
I am bisecting this problem.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

The initial investigation shows:

GOOD: next-20220603
BAD:  next-20220606

Boot log:
Starting kernel ...

The recent changes show,

# git log --oneline  next-20220603..next-20220606  -- arch/arm64/
202693ac55e0 (origin/akpm-base, origin/akpm) Merge branch
'mm-everything' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
a83bdd6800e3 Merge branch 'rust-next' of
https://github.com/Rust-for-Linux/linux.git
9daba6cb8145 Merge branch 'for-next' of git://github.com/Xilinx/linux-xlnx.git
582d5ed4caf7 Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
1ec6574a3c0a Merge tag 'kthread-cleanups-for-v5.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
21873bd66b6e Merge tag 'arm64-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
a8fc46f5a417 mm: avoid unnecessary page fault retires on shared memory types
3c59c47d1a6d arm64: Change elfcore for_each_mte_vma() to use VMA iterator
1c826fa748d5 arm64: remove mmap linked list from vdso
54c2cc79194c Merge tag 'usb-5.19-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
09a018176ba2 Merge tag 'arm-late-5.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
96479c09803b Merge tag 'arm-multiplatform-5.19-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc


Test job link,
https://lkft.validation.linaro.org/scheduler/job/5136989#L560


metadata:
  git_ref: master
  git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git_sha: 40b58e42584bf5bd9230481dc8946f714fb387de
  git_describe: next-20220606
  kernel_version: 5.19.0-rc1
  kernel-config: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/config
  build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/556237413
  artifact-location: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy


--
Linaro LKFT
https://lkft.linaro.org

* Re: [next] arm64: boot failed - next-20220606
  2022-06-06 11:46 ` Naresh Kamboju
@ 2022-06-07  5:30   ` Naresh Kamboju
  -1 siblings, 0 replies; 46+ messages in thread
From: Naresh Kamboju @ 2022-06-07  5:30 UTC (permalink / raw)
  To: Linux-Next Mailing List, open list, regressions, lkft-triage,
	Linux ARM, linux-mm
  Cc: Stephen Rothwell, Andrew Morton, Ard Biesheuvel, Arnd Bergmann,
	Catalin Marinas, Raghuram Thammiraju, Mark Brown, Will Deacon,
	Shakeel Butt, Roman Gushchin, Vasily Averin, Qian Cai

On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> I am bisecting this problem.
>
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> The initial investigation shows:
>
> GOOD: next-20220603
> BAD:  next-20220606
>
> Boot log:
> Starting kernel ...

arm64 boot fails on both Linux next-20220606 and next-20220607.
The kernel panic log shows up after earlycon is enabled.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

[    0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
[    0.000000] Linux version 5.19.0-rc1-next-20220606
(tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 11.3.0-3) 11.3.0, GNU
ld (GNU Binutils for Debian) 2.38) #1 SMP PREEMPT @1654490846
[    0.000000] Machine model: ARM Juno development board (r2)
[    0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '')
[    0.000000] printk: bootconsole [pl11] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] earlycon: pl11 at MMIO 0x000000007ff80000 (options '115200n8')
[    0.000000] ------------[ cut here ]------------
[    0.000000] console 'pl11' already registered
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/printk/printk.c:3327
register_console+0x64/0x2ec
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
5.19.0-rc1-next-20220606 #1
[    0.000000] Hardware name: ARM Juno development board (r2) (DT)
[    0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : register_console+0x64/0x2ec
[    0.000000] lr : register_console+0x64/0x2ec
[    0.000000] sp : ffff80000a963c80
[    0.000000] x29: ffff80000a963c80 x28: 00000000820a0018 x27: 0000000000000000
[    0.000000] x26: 00000000fef770dc x25: 0000000000000000 x24: ffff80000acbc000
[    0.000000] x23: 0000000000000000 x22: ffff80000a0b1a30 x21: ffff80000ae39250
[    0.000000] x20: 00000000000050cc x19: ffff80000acbc5e0 x18: ffffffffffffffff
[    0.000000] x17: 0000000000ffa000 x16: 00000009ff006000 x15: ffff80008a963957
[    0.000000] x14: 0000000000000000 x13: 6465726574736967 x12: 6572207964616572
[    0.000000] x11: 6c61202731316c70 x10: ffff80000a9ea6a8 x9 : ffff80000a9926a8
[    0.000000] x8 : 00000000ffffefff x7 : ffff80000a9ea6a8 x6 : 0000000000000000
[    0.000000] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff80000a979e00
[    0.000000] Call trace:
[    0.000000]  register_console+0x64/0x2ec
[    0.000000]  of_setup_earlycon+0x254/0x278
[    0.000000]  early_init_dt_scan_chosen_stdout+0x164/0x1a4
[    0.000000]  acpi_boot_table_init+0x1d8/0x218
[    0.000000]  setup_arch+0x28c/0x5f0
[    0.000000]  start_kernel+0xa4/0x748
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x00000009ffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x9fefd5b40-0x9fefd7fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x00000009ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x00000000feffffff]
[    0.000000]   node   0: [mem 0x0000000880000000-0x00000009ffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000009ffffffff]
[    0.000000] On node 0, zone Normal: 4096 pages in unavailable ranges
[    0.000000] cma: Reserved 32 MiB at 0x00000000fd000000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] percpu: Embedded 30 pages/cpu s82792 r8192 d31896 u122880
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: detected: ARM erratum 843419
[    0.000000] CPU features: detected: ARM erratum 845719
[    0.000000] Fallback order for Node 0: 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2060288
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: console=ttyAMA0,115200n8
root=/dev/nfs rw
nfsroot=10.66.16.125:/var/lib/lava/dispatcher/tmp/5143101/extract-nfsrootfs-i9fmnadt,tcp,hard,vers=3,wsize=65536
earlycon=pl011,0x7ff80000 console_msg_format=syslog earlycon
default_hugepagesz=2M hugepages=256
sky2.mac_address=0x00,0x02,0xF7,0x00,0x67,0x17 ip=dhcp
<6>[    0.000000] HugeTLB: can optimize 7 vmemmap pages for hugepages-2048kB
<6>[    0.000000] Dentry cache hash table entries: 1048576 (order: 11,
8388608 bytes, linear)
<6>[    0.000000] Inode-cache hash table entries: 524288 (order: 10,
4194304 bytes, linear)
<6>[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
<6>[    0.000000] software IO TLB: mapped [mem
0x00000000f9000000-0x00000000fd000000] (64MB)
<6>[    0.000000] Memory: 8062180K/8372224K available (20032K kernel
code, 4884K rwdata, 11148K rodata, 11008K init, 951K bss, 277276K
reserved, 32768K cma-reserved)
<6>[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
<6>[    0.000000] ftrace: allocating 65019 entries in 254 pages
<6>[    0.000000] ftrace: allocated 254 pages with 7 groups
<6>[    0.000000] trace event string verifier disabled
<6>[    0.000000] rcu: Preemptible hierarchical RCU implementation.
<6>[    0.000000] rcu: RCU event tracing is enabled.
<6>[    0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=6.
<6>[    0.000000] Trampoline variant of Tasks RCU enabled.
<6>[    0.000000] Rude variant of Tasks RCU enabled.
<6>[    0.000000] Tracing variant of Tasks RCU enabled.
<6>[    0.000000] rcu: RCU calculated value of scheduler-enlistment
delay is 25 jiffies.
<6>[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
<6>[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
<6>[    0.000000] Root IRQ handler: gic_handle_irq
<6>[    0.000000] GIC: Using split EOI/Deactivate mode
<6>[    0.000000] GICv2m: range[mem 0x2c1c0000-0x2c1cffff], SPI[224:255]
<6>[    0.000000] GICv2m: range[mem 0x2c1d0000-0x2c1dffff], SPI[256:287]
<6>[    0.000000] GICv2m: range[mem 0x2c1e0000-0x2c1effff], SPI[288:319]
<6>[    0.000000] GICv2m: range[mem 0x2c1f0000-0x2c1fffff], SPI[320:351]
<6>[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
<6>[    0.000000] kfence: initialized - using 2097152 bytes for 255
objects at 0x(____ptrval____)-0x(____ptrval____)
<3>[    0.000000] timer_sp804: timer clock not found: -517
<3>[    0.000000] timer_sp804: arm,sp804 clock not found: -2
<3>[    0.000000] Failed to initialize
'/bus@8000000/motherboard-bus@8000000/iofpga-bus@300000000/timer@110000':
-22
<3>[    0.000000] timer_sp804: timer clock not found: -517
<3>[    0.000000] timer_sp804: arm,sp804 clock not found: -2
<3>[    0.000000] Failed to initialize
'/bus@8000000/motherboard-bus@8000000/iofpga-bus@300000000/timer@120000':
-22
<6>[    0.000000] arch_timer: cp15 and mmio timer(s) running at
50.00MHz (phys/phys).
<6>[    0.000000] clocksource: arch_sys_counter: mask:
0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
<6>[    0.000000] sched_clock: 56 bits at 50MHz, resolution 20ns,
wraps every 4398046511100ns
<6>[    0.009801] Console: colour dummy device 80x25
<6>[    0.014654] Calibrating delay loop (skipped), value calculated
using timer frequency.. 100.00 BogoMIPS (lpj=200000)
<6>[    0.025413] pid_max: default: 32768 minimum: 301
<6>[    0.030453] LSM: Security Framework initializing
<1>[    0.035435] Unable to handle kernel paging request at virtual
address fffffe00002bc248
<1>[    0.043654] Mem abort info:
<1>[    0.046719]   ESR = 0x0000000096000004
<1>[    0.050752]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[    0.056355]   SET = 0, FnV = 0
<1>[    0.059683]   EA = 0, S1PTW = 0
<1>[    0.063105]   FSC = 0x04: level 0 translation fault
<1>[    0.068270] Data abort info:
<1>[    0.071421]   ISV = 0, ISS = 0x00000004
<1>[    0.075539]   CM = 0, WnR = 0
<1>[    0.078780] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082090000
<1>[    0.085778] [fffffe00002bc248] pgd=0000000000000000, p4d=0000000000000000
<0>[    0.092881] Internal error: Oops: 96000004 [#1] PREEMPT SMP
<4>[    0.098730] Modules linked in:
<4>[    0.102054] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W
     5.19.0-rc1-next-20220606 #1
<4>[    0.111214] Hardware name: ARM Juno development board (r2) (DT)
<4>[    0.117407] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT
-SSBS BTYPE=--)
<4>[    0.124652] pc : mem_cgroup_from_obj+0x2c/0x120
<4>[    0.129462] lr : register_pernet_operations+0xf0/0x59c
<4>[    0.134878] sp : ffff80000a963d70
<4>[    0.138458] x29: ffff80000a963d70 x28: 00000000820a0018 x27:
0000000000000000
<4>[    0.145886] x26: ffff80000a0c7688 x25: ffff80000a0c7688 x24:
ffff80000ad5e680
<4>[    0.153313] x23: ffff80000a963dd8 x22: ffff80000ad5e818 x21:
ffff80000a979e00
<4>[    0.160739] x20: ffff80000af09740 x19: ffff80000ad5e720 x18:
0000000000000014
<4>[    0.168166] x17: 00000000beabf81a x16: 00000000d8e898a9 x15:
000000005b20ff98
<4>[    0.175594] x14: 00000000032b2301 x13: 00000000c9e39f56 x12:
0000000014288186
<4>[    0.183021] x11: 00000000bcf02680 x10: 000000008d09a8d9 x9 :
ffff800009146254
<4>[    0.190446] x8 : ffff80000a963d48 x7 : 0000000000000000 x6 :
0000000000000002
<4>[    0.197872] x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 :
ffff80000ad5e680
<4>[    0.205299] x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 :
ffff80000af09740
<4>[    0.212726] Call trace:
<4>[    0.215435]  mem_cgroup_from_obj+0x2c/0x120
<4>[    0.219894]  register_pernet_subsys+0x3c/0x60
<4>[    0.224523]  net_ns_init+0xe4/0x13c
<4>[    0.228285]  start_kernel+0x6d4/0x748
<4>[    0.232222]  __primary_switched+0xc0/0xc8
<0>[    0.236513] Code: b25657e4 d34cfc21 d37ae421 8b040022 (f9400443)
<4>[    0.242886] ---[ end trace 0000000000000000 ]---
<0>[    0.247788] Kernel panic - not syncing: Attempted to kill the idle task!
<0>[    0.254772] ---[ end Kernel panic - not syncing: Attempted to
kill the idle task! ]---
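
One possible reading of the oops above, offered as a sketch rather than a
diagnosis. The snippet splits the reported ESR into the fields the kernel
printed, then redoes the address arithmetic that the register dump suggests:
x0 (the object pointer) is treated as a linear-map address, turned into a page
index, and used to index an array of struct page starting at x4. PAGE_OFFSET
and the 64-byte struct page size are assumptions that are not stated in the
log, and the helper names are hypothetical.

```python
# Values copied from the oops above.
ESR = 0x96000004
X0 = 0xFFFF80000AF09740      # x0 at the time of the fault
X4 = 0xFFFFFC0000000000      # x4, the base the faulting code adds to
FAULT_ADDR = 0xFFFFFE00002BC248

# Assumptions, not taken from the log: the linear-map base for 48-bit VAs
# and a 64-byte struct page (i.e. a shift of 6).
PAGE_OFFSET = 0xFFFF000000000000
PAGE_SHIFT = 12              # "4k pages" per the log
STRUCT_PAGE_SHIFT = 6
MASK64 = (1 << 64) - 1


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the data-abort fields the kernel prints."""
    iss = esr & 0x1FFFFFF             # ISS, bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,     # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,      # instruction length, bit 25
        "iss": iss,
        "isv": (iss >> 24) & 0x1,     # syndrome valid, bit 24
        "wnr": (iss >> 6) & 0x1,      # write-not-read, bit 6
        "fsc": iss & 0x3F,            # fault status code, bits [5:0]
    }


def page_struct_addr(vaddr: int) -> int:
    """Index the struct page array as if vaddr were a linear-map address."""
    page_index = ((vaddr - PAGE_OFFSET) & MASK64) >> PAGE_SHIFT
    return (X4 + (page_index << STRUCT_PAGE_SHIFT)) & MASK64


fields = decode_esr(ESR)
print({name: hex(value) for name, value in fields.items()})

page = page_struct_addr(X0)
print(hex(page), hex(page + 8), page + 8 == FAULT_ADDR)
```

Under those assumptions the computed pointer equals x2 in the register dump,
and adding 8 gives the faulting address. x0 sits in the same
ffff80000a...... range as the stack pointer and the other kernel pointers in
the dump, so it may not be a linear-map address at all, which would explain
why the derived struct page pointer is unmapped.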





>
> The recent changes show,
>
> # git log --oneline  next-20220603..next-20220606  -- arch/arm64/
> 202693ac55e0 (origin/akpm-base, origin/akpm) Merge branch
> 'mm-everything' of
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> a83bdd6800e3 Merge branch 'rust-next' of
> https://github.com/Rust-for-Linux/linux.git
> 9daba6cb8145 Merge branch 'for-next' of git://github.com/Xilinx/linux-xlnx.git
> 582d5ed4caf7 Merge branch 'master' of
> git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
> 1ec6574a3c0a Merge tag 'kthread-cleanups-for-v5.19' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
> 21873bd66b6e Merge tag 'arm64-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> a8fc46f5a417 mm: avoid unnecessary page fault retires on shared memory types
> 3c59c47d1a6d arm64: Change elfcore for_each_mte_vma() to use VMA iterator
> 1c826fa748d5 arm64: remove mmap linked list from vdso
> 54c2cc79194c Merge tag 'usb-5.19-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> 09a018176ba2 Merge tag 'arm-late-5.19' of
> git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> 96479c09803b Merge tag 'arm-multiplatform-5.19-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
>
>
> Test job link,
> https://lkft.validation.linaro.org/scheduler/job/5136989#L560
>
>
> metadata:
>   git_ref: master
>   git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git_sha: 40b58e42584bf5bd9230481dc8946f714fb387de
>   git_describe: next-20220606
>   kernel_version: 5.19.0-rc1
>   kernel-config: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/config
>   build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/556237413
>   artifact-location: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy
>
>

 --
 Linaro LKFT
 https://lkft.linaro.org

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07  5:30   ` Naresh Kamboju
@ 2022-06-07  6:25     ` Stephen Rothwell
  -1 siblings, 0 replies; 46+ messages in thread
From: Stephen Rothwell @ 2022-06-07  6:25 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Linux-Next Mailing List, open list, regressions, lkft-triage,
	Linux ARM, linux-mm, Andrew Morton, Ard Biesheuvel,
	Arnd Bergmann, Catalin Marinas, Raghuram Thammiraju, Mark Brown,
	Will Deacon, Shakeel Butt, Roman Gushchin, Vasily Averin,
	Qian Cai

Hi Naresh,

On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >
> > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > I am bisecting this problem.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> >
> > The initial investigation show that,
> >
> > GOOD: next-20220603
> > BAD:  next-20220606
> >
> > Boot log:
> > Starting kernel ...  
> 
> Linux next-20220606 and next-20220607 arm64 boot failed.
> The kernel panic log showing after earlycon.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

Can you test v5.19-rc1, please?  If that does not fail, then you could
bisect between that and next-20220606 ...

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07  6:25     ` Stephen Rothwell
@ 2022-06-07  6:36       ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-07  6:36 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Andrew Morton, Ard Biesheuvel,
	Arnd Bergmann, Catalin Marinas, Raghuram Thammiraju, Mark Brown,
	Will Deacon, Roman Gushchin, Vasily Averin, Qian Cai

On Mon, Jun 6, 2022 at 11:25 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Naresh,
>
> On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >
> > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > >
> > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > I am bisecting this problem.
> > >
> > > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> > >
> > > The initial investigation show that,
> > >
> > > GOOD: next-20220603
> > > BAD:  next-20220606
> > >
> > > Boot log:
> > > Starting kernel ...
> >
> > Linux next-20220606 and next-20220607 arm64 boot failed.
> > The kernel panic log showing after earlycon.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> Can you test v5.19-rc1, please?  If that does not fail, then you could
> bisect between that and next-20220606 ...
>

This is already reported at
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
the underlying issue (which is calling virt_to_page() on a vmalloc
address).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07  6:36       ` Shakeel Butt
@ 2022-06-07  6:44         ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-07  6:44 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Andrew Morton, Ard Biesheuvel,
	Arnd Bergmann, Catalin Marinas, Raghuram Thammiraju, Mark Brown,
	Will Deacon, Roman Gushchin, Vasily Averin, Qian Cai

On Mon, Jun 6, 2022 at 11:36 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 11:25 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Hi Naresh,
> >
> > On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > >
> > > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > > >
> > > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > > I am bisecting this problem.
> > > >
> > > > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> > > >
> > > > The initial investigation show that,
> > > >
> > > > GOOD: next-20220603
> > > > BAD:  next-20220606
> > > >
> > > > Boot log:
> > > > Starting kernel ...
> > >
> > > Linux next-20220606 and next-20220607 arm64 boot failed.
> > > The kernel panic log showing after earlycon.
> > >
> > > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> >
> > Can you test v5.19-rc1, please?  If that does not fail, then you could
> > bisect between that and next-20220606 ...
> >
>
> This is already reported at
> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> the underlying issue (which is calling virt_to_page() on a vmalloc
> address).

Sorry, I might be wrong. Just checked the stacktrace again and it
seems like the failure is happening in early boot in this report.
Though the error "Unable to handle kernel paging request at virtual
address" is happening in the function mem_cgroup_from_obj().

Naresh, can you repro the issue if you revert the patch "net: set
proper memcg for net_init hooks allocations"?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07  6:25     ` Stephen Rothwell
@ 2022-06-07 10:24       ` Naresh Kamboju
  -1 siblings, 0 replies; 46+ messages in thread
From: Naresh Kamboju @ 2022-06-07 10:24 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Linux-Next Mailing List, open list, regressions, lkft-triage,
	Linux ARM, linux-mm, Andrew Morton, Ard Biesheuvel,
	Arnd Bergmann, Catalin Marinas, Raghuram Thammiraju, Mark Brown,
	Will Deacon, Shakeel Butt, Roman Gushchin, Vasily Averin,
	Qian Cai

Hi Stephen,

On Tue, 7 Jun 2022 at 11:55, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Naresh,
>
> On Tue, 7 Jun 2022 11:00:39 +0530 Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >
> > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > >
> > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > I am bisecting this problem.

The bisection identified the first bad commit as:

19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")


After reverting this single commit I am able to boot arm64 successfully.
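
For readers unfamiliar with how a bisection narrows a range such as
next-20220603..next-20220606 down to one commit, here is a toy model in C.
It is only a sketch under simplifying assumptions: history is treated as a
linear array of commits (real git history with merges is not), and the
names first_bad(), is_bad_fn and toy_is_bad() are hypothetical and are not
part of git or of the LKFT tooling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test hook: returns true if the kernel built at commit
 * index i fails to boot. */
typedef bool (*is_bad_fn)(size_t i);

/* Finds the first bad index in (good, bad], assuming commit `good` boots,
 * commit `bad` fails, and every commit after the first bad one also fails. */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad)
{
	assert(good < bad);

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* breakage is at mid or earlier */
		else
			good = mid;	/* breakage is after mid */
	}
	return bad;
}

/* Toy history for illustration: the breakage is introduced at commit 37. */
static bool toy_is_bad(size_t i)
{
	return i >= 37;
}
```

Each step halves the remaining range, so the number of test boots grows
only with the logarithm of the number of commits between the good and the
bad tag.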

- Naresh

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07  6:44         ` Shakeel Butt
@ 2022-06-07 10:27           ` Naresh Kamboju
  -1 siblings, 0 replies; 46+ messages in thread
From: Naresh Kamboju @ 2022-06-07 10:27 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, Linux ARM, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Vasily Averin, Qian Cai

Hi Shakeel,

> > > Can you test v5.19-rc1, please?  If that does not fail, then you could
> > > bisect between that and next-20220606 ...
> > >
> >
> > This is already reported at
> > https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> > the underlying issue (which is calling virt_to_page() on a vmalloc
> > address).
>
> Sorry, I might be wrong. Just checked the stacktrace again and it
> seems like the failure is happening in early boot in this report.
> Though the error "Unable to handle kernel paging request at virtual
> address" is happening in the function mem_cgroup_from_obj().
>
> Naresh, can you repro the issue if you revert the patch "net: set
> proper memcg for net_init hooks allocations"?

Yes, you are right!
19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
After reverting this single commit I am able to boot arm64 successfully.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07 10:27           ` Naresh Kamboju
@ 2022-06-07 14:17             ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-07 14:17 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, Linux ARM, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Vasily Averin, Qian Cai

On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> Hi Shakeel,
>
> > > > Can you test v5.19-rc1, please?  If that does not fail, then you could
> > > > bisect between that and next-20220606 ...
> > > >
> > >
> > > This is already reported at
> > > https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> > > the underlying issue (which is calling virt_to_page() on a vmalloc
> > > address).
> >
> > Sorry, I might be wrong. Just checked the stacktrace again and it
> > seems like the failure is happening in early boot in this report.
> > Though the error "Unable to handle kernel paging request at virtual
> > address" is happening in the function mem_cgroup_from_obj().
> >
> > Naresh, can you repro the issue if you revert the patch "net: set
> > proper memcg for net_init hooks allocations"?
>
> yes. You are right !
> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
> After reverting this single commit I am able to boot arm64 successfully.
>
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>

Can you please run scripts/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07 14:17             ` Shakeel Butt
@ 2022-06-07 15:29               ` Naresh Kamboju
  -1 siblings, 0 replies; 46+ messages in thread
From: Naresh Kamboju @ 2022-06-07 15:29 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, Linux ARM, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Vasily Averin, Qian Cai

On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >
> > Hi Shakeel,
> >
> > > > > Can you test v5.19-rc1, please?  If that does not fail, then you could
> > > > > bisect between that and next-20220606 ...
> > > > >
> > > >
> > > > This is already reported at
> > > > https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
> > > > the underlying issue (which is calling virt_to_page() on a vmalloc
> > > > address).
> > >
> > > Sorry, I might be wrong. Just checked the stacktrace again and it
> > > seems like the failure is happening in early boot in this report.
> > > Though the error "Unable to handle kernel paging request at virtual
> > > address" is happening in the function mem_cgroup_from_obj().
> > >
> > > Naresh, can you repro the issue if you revert the patch "net: set
> > > proper memcg for net_init hooks allocations"?
> >
> > yes. You are right !
> > 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
> > After reverting this single commit I am able to boot arm64 successfully.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> >
>
> Can you please run script/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?

./scripts/faddr2line vmlinux  mem_cgroup_from_obj+0x2c/0x120
mem_cgroup_from_obj+0x2c/0x120:
mem_cgroup_from_obj at ??:?

Here are the artifacts from the build that crashes:

vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map


- Naresh

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07 15:29               ` Naresh Kamboju
@ 2022-06-09  2:49                 ` Vasily Averin
  -1 siblings, 0 replies; 46+ messages in thread
From: Vasily Averin @ 2022-06-09  2:49 UTC (permalink / raw)
  To: Naresh Kamboju, Shakeel Butt, Linux ARM
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Qian Cai

Dear ARM developers,
could you please help me find the cause of this problem?

On 6/7/22 18:29, Naresh Kamboju wrote:
> On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
>>
>> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>>
>>> Hi Shakeel,
>>>
>>>>>> Can you test v5.19-rc1, please?  If that does not fail, then you could
>>>>>> bisect between that and next-20220606 ...
>>>>>>
>>>>>
>>>>> This is already reported at
>>>>> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
>>>>> the underlying issue (which is calling virt_to_page() on a vmalloc
>>>>> address).
>>>>
>>>> Sorry, I might be wrong. Just checked the stacktrace again and it
>>>> seems like the failure is happening in early boot in this report.
>>>> Though the error "Unable to handle kernel paging request at virtual
>>>> address" is happening in the function mem_cgroup_from_obj().
>>>>
>>>> Naresh, can you repro the issue if you revert the patch "net: set
>>>> proper memcg for net_init hooks allocations"?
>>>
>>> yes. You are right !
>>> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
>>> After reverting this single commit I am able to boot arm64 successfully.
>>>
>>> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>>>
>>
>> Can you please run script/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
> 
> ./scripts/faddr2line vmlinux  mem_cgroup_from_obj+0x2c/0x120
> mem_cgroup_from_obj+0x2c/0x120:
> mem_cgroup_from_obj at ??:?
> 
> Please find the following artifacts which are causing kernel crashes.
> 
> vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
> System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map

Dear Naresh,
thank you very much

mem_cgroup_from_obj():
ffff80000836cf40:       d503245f        bti     c
ffff80000836cf44:       d503201f        nop
ffff80000836cf48:       d503201f        nop
ffff80000836cf4c:       d503233f        paciasp
ffff80000836cf50:       d503201f        nop
ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
ffff80000836cf58:       8b010001        add     x1, x0, x1
ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
ffff80000836cf68:       8b040022        add     x2, x1, x4
ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]

x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740

x0 = 0xffff80000af09740 is the argument of mem_cgroup_from_obj();
according to System.map, it is the address of init_net.
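
The arithmetic in the disassembly above can be replayed in user space to
confirm that it produces the x1 and x2 values in the register dump. The
following C sketch does that. The helper names are made up for
illustration, and reading the constants as -PAGE_OFFSET, PAGE_SHIFT,
log2(sizeof(struct page)) and VMEMMAP_START for a 48-bit VA, 4K-page
configuration is an assumption, not something stated in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the disassembly. The names are guesses at what
 * they represent on a 48-bit VA, 4K-page arm64 build (assumption). */
#define NEG_PAGE_OFFSET   UINT64_C(0x0001000000000000) /* mov x1, #0x1000000000000 */
#define PAGE_SHIFT_GUESS  12                           /* lsr x1, x1, #12 */
#define PAGE_STRUCT_SHIFT 6                            /* lsl x1, x1, #6 */
#define VMEMMAP_GUESS     UINT64_C(0xfffffc0000000000) /* mov x4, #0xfffffc0000000000 */

/* Replays add, lsr, lsl and returns the value left in x1. */
static uint64_t replay_x1(uint64_t x0)
{
	uint64_t x1 = x0 + NEG_PAGE_OFFSET;	/* wraps modulo 2^64 */

	x1 >>= PAGE_SHIFT_GUESS;
	x1 <<= PAGE_STRUCT_SHIFT;
	return x1;
}

/* Replays "add x2, x1, x4": the base pointer used by the following ldr. */
static uint64_t replay_x2(uint64_t x0)
{
	return replay_x1(x0) + VMEMMAP_GUESS;
}

/* Compares the replay with the register dump and checks that offset 0x2c
 * from the function start is the address of the ldr instruction. */
static void check_against_dump(void)
{
	const uint64_t x0 = UINT64_C(0xffff80000af09740);

	assert(replay_x1(x0) == UINT64_C(0x00000200002bc240));
	assert(replay_x2(x0) == UINT64_C(0xfffffe00002bc240));
	assert(UINT64_C(0xffff80000836cf40) + 0x2c == UINT64_C(0xffff80000836cf6c));
}
```

If the assumptions hold, the instruction at mem_cgroup_from_obj+0x2c is
the "ldr x3, [x2, #8]", i.e. a load from the struct page slot that
virt_to_page() computed for init_net.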

The issue is caused by calling virt_to_page() on the address of the static variable init_net.
arm64 does not treat the addresses of static variables as valid linear-map virtual addresses,
so virt_to_page() cannot translate them. On x86_64 the same API works without any problem.
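
To illustrate what "not a linear-map address" means here, the following C
sketch models a range check in the spirit of the __is_lm_address() test
that appears in the __virt_to_phys() excerpt further down. The window
bounds LM_START and LM_END are assumptions for a 48-bit VA arm64 kernel,
and in_linear_map() is a hypothetical helper, not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for a 48-bit VA arm64 kernel: the linear map covers
 * [LM_START, LM_END). These values are illustrative assumptions. */
#define LM_START UINT64_C(0xffff000000000000)
#define LM_END   UINT64_C(0xffff800000000000)

/* True if addr falls inside the assumed linear-map window. The unsigned
 * subtraction makes addresses below LM_START wrap to huge values, so one
 * comparison covers both bounds. */
static bool in_linear_map(uint64_t addr)
{
	return (addr - LM_START) < (LM_END - LM_START);
}

/* Applies the check to the two init_net addresses seen in the reports. */
static void check_init_net_addresses(void)
{
	assert(!in_linear_map(UINT64_C(0xffff80000af09740)));	/* this report */
	assert(!in_linear_map(UINT64_C(0xffffd8efe2d2fe00)));	/* Qian's report */
}
```

Under these assumptions both init_net addresses fall outside the window,
which matches the "virt_to_phys used for non-linear address" warning.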

Unfortunately, I do not understand the cause of the problem.
I do not see any bugs in my patch.
I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
to account for the specified object.
In this particular case, I wanted to get the memory cgroup of each
network namespace returned by for_each_net().
The first object in this list is the static structure init_net.

On x86_64 I can translate its address to page:

crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL           PHYSICAL        
ffffffff90c7bdc0  402c7bdc0       

PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
   PUD: 401e15ff0 => 401e16063
   PMD: 401e16430 => 8000000402c000e3
  PAGE: 402c00000  (2MB)

      PTE         PHYSICAL   FLAGS
8000000402c000e3  402c00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff227d00b1ec0 402c7b000                0        0  1 17ffffc0001000 reserved
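
The numbers in the vtop output above are consistent with a plain 2MB-page
translation, which the following C sketch replays. The helper names are
hypothetical; only the addresses are taken from the crash output.

```c
#include <assert.h>
#include <stdint.h>

#define HUGE_2M_MASK UINT64_C(0x1fffff)	/* low 21 bits: offset in a 2MB page */
#define PAGE_4K_MASK UINT64_C(0xfff)	/* low 12 bits: offset in a 4K frame */

/* Physical address = 2MB page base + offset of the VA within that page. */
static uint64_t phys_from_2m_page(uint64_t va, uint64_t page_base)
{
	return page_base + (va & HUGE_2M_MASK);
}

/* Base of the 4K frame that contains a physical address. */
static uint64_t frame_base(uint64_t phys)
{
	return phys & ~PAGE_4K_MASK;
}

/* Checks the helpers against the values printed by "vtop". */
static void check_vtop_output(void)
{
	uint64_t phys = phys_from_2m_page(UINT64_C(0xffffffff90c7bdc0),
					  UINT64_C(0x402c00000));

	assert(phys == UINT64_C(0x402c7bdc0));
	assert(frame_base(phys) == UINT64_C(0x402c7b000));
}
```

The second assertion corresponds to the PHYSICAL column of the final PAGE
line, i.e. the 4K frame whose struct page the crash utility reports.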

However, as far as I understand this does not work for arm64.
Could you please help me to understand what is wrong here?

Below are:
 link to my patch:
https://lore.kernel.org/all/20220603182442.63750C385B8@smtp.kernel.org/
 and the quote of my investigation of similar report:
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/

> virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
>  WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
>  Call trace:
>   __virt_to_phys
>   mem_cgroup_from_obj
>   __register_pernet_operations

@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
  		 * setup_net() and cleanup_net() are not possible.
		 */
		for_each_net(net) {
+			struct mem_cgroup *old, *memcg;
+
+			memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net));   <<<<  Here
+			old = set_active_memcg(memcg);
 			error = ops_init(ops, net);
+			set_active_memcg(old);
+			mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	do {
+		memcg = mem_cgroup_from_obj(p); <<<<
+	} while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
        struct folio *folio;

        if (mem_cgroup_disabled())
                return NULL;

        folio = virt_to_folio(p); <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
        struct page *page = virt_to_page(x); <<< here

... (arm64)
#define virt_to_page(x)         pfn_to_page(virt_to_pfn(x))  
...
#define virt_to_pfn(x)          __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
        WARN(!__is_lm_address(__tag_reset(x)),
             "virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
@ 2022-06-09  2:49                 ` Vasily Averin
  0 siblings, 0 replies; 46+ messages in thread
From: Vasily Averin @ 2022-06-09  2:49 UTC (permalink / raw)
  To: Naresh Kamboju, Shakeel Butt, Linux ARM
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Qian Cai

Dear ARM developers,
could you please help me to find the reason of this problem?

On 6/7/22 18:29, Naresh Kamboju wrote:
> On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
>>
>> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>>
>>> Hi Shakeel,
>>>
>>>>>> Can you test v5.19-rc1, please?  If that does not fail, then you could
>>>>>> bisect between that and next-20220606 ...
>>>>>>
>>>>>
>>>>> This is already reported at
>>>>> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
>>>>> the underlying issue (which is calling virt_to_page() on a vmalloc
>>>>> address).
>>>>
>>>> Sorry, I might be wrong. Just checked the stacktrace again and it
>>>> seems like the failure is happening in early boot in this report.
>>>> Though the error "Unable to handle kernel paging request at virtual
>>>> address" is happening in the function mem_cgroup_from_obj().
>>>>
>>>> Naresh, can you repro the issue if you revert the patch "net: set
>>>> proper memcg for net_init hooks allocations"?
>>>
>>> yes. You are right !
>>> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
>>> After reverting this single commit I am able to boot arm64 successfully.
>>>
>>> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>>>
>>
>> Can you please run scripts/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
> 
> ./scripts/faddr2line vmlinux  mem_cgroup_from_obj+0x2c/0x120
> mem_cgroup_from_obj+0x2c/0x120:
> mem_cgroup_from_obj at ??:?
> 
> Please find the following artifacts which are causing kernel crashes.
> 
> vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
> System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map

Dear Naresh,
thank you very much

mem_cgroup_from_obj():
ffff80000836cf40:       d503245f        bti     c
ffff80000836cf44:       d503201f        nop
ffff80000836cf48:       d503201f        nop
ffff80000836cf4c:       d503233f        paciasp
ffff80000836cf50:       d503201f        nop
ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
ffff80000836cf58:       8b010001        add     x1, x0, x1
ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
ffff80000836cf68:       8b040022        add     x2, x1, x4
ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]

x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740

x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
according to System.map it is init_net
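
As a cross-check, the address arithmetic in the disassembly above can be
replayed in user space. This is only a sketch: the constants are read off the
two mov instructions and the shift amounts of this particular build, the
function names are made up for illustration, and reading the result as a
struct page pointer computed from x0 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the disassembly above; specific to this build. */
#define ADD_CONST  0x0001000000000000ULL /* mov x1, #0x1000000000000 */
#define BASE_CONST 0xfffffc0000000000ULL /* mov x4, #0xfffffc0000000000 */

/*
 * Hypothetical helper: replays the add/lsr/lsl/add sequence on x0.
 * Unsigned 64-bit arithmetic wraps, exactly like the registers do.
 */
uint64_t replay_page_calc(uint64_t x0)
{
	uint64_t x1 = x0 + ADD_CONST; /* add x1, x0, x1 */

	x1 >>= 12;                    /* lsr x1, x1, #12 */
	x1 <<= 6;                     /* lsl x1, x1, #6 */
	return x1 + BASE_CONST;       /* add x2, x1, x4 */
}

/* Self-check against the register dump: x0 goes in, x2 comes out. */
void check_against_register_dump(void)
{
	assert(replay_page_calc(0xffff80000af09740ULL) == 0xfffffe00002bc240ULL);
}
```

With x0 = 0xffff80000af09740 the sequence yields 0xfffffe00002bc240, which
matches x2 in the register dump. The ldr at offset 0x2c therefore reads from
an address derived from init_net.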

This issue is caused by calling virt_to_page() on the address of the static variable init_net.
Arm64 considers that addresses of static variables are not valid virtual addresses.
On x86_64 the same API works without any problem.

Unfortunately I do not understand the cause of the problem.
I do not see any bugs in my patch.
I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
to account for the specified object.
In particular, in the current case, I wanted to get the memory cgroup of the
specified network namespace by the name taken from for_each_net(). 
The first object in this list is the static structure init_net.

On x86_64 I can translate its address to page:

crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL           PHYSICAL        
ffffffff90c7bdc0  402c7bdc0       

PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
   PUD: 401e15ff0 => 401e16063
   PMD: 401e16430 => 8000000402c000e3
  PAGE: 402c00000  (2MB)

      PTE         PHYSICAL   FLAGS
8000000402c000e3  402c00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff227d00b1ec0 402c7b000                0        0  1 17ffffc0001000 reserved

However, as far as I understand this does not work for arm64.
Could you please help me to understand what is wrong here?

Below are:
 link to my patch:
https://lore.kernel.org/all/20220603182442.63750C385B8@smtp.kernel.org/
 and the quote of my investigation of similar report:
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/

> virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
>  WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
>  Call trace:
>   __virt_to_phys
>   mem_cgroup_from_obj
>   __register_pernet_operations

@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
  		 * setup_net() and cleanup_net() are not possible.
		 */
		for_each_net(net) {
+			struct mem_cgroup *old, *memcg;
+
+			memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net));   <<<<  Here
+			old = set_active_memcg(memcg);
 			error = ops_init(ops, net);
+			set_active_memcg(old);
+			mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	do {
+		memcg = mem_cgroup_from_obj(p); <<<<
+	} while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
        struct folio *folio;

        if (mem_cgroup_disabled())
                return NULL;

        folio = virt_to_folio(p); <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
        struct page *page = virt_to_page(x); <<< here

... (arm64)
#define virt_to_page(x)         pfn_to_page(virt_to_pfn(x))  
...
#define virt_to_pfn(x)          __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
        WARN(!__is_lm_address(__tag_reset(x)),
             "virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)

Thank you,
	Vasily Averin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09  2:49                 ` Vasily Averin
@ 2022-06-09  3:44                   ` Kefeng Wang
  -1 siblings, 0 replies; 46+ messages in thread
From: Kefeng Wang @ 2022-06-09  3:44 UTC (permalink / raw)
  To: Vasily Averin, Naresh Kamboju, Shakeel Butt, Linux ARM
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Qian Cai


On 2022/6/9 10:49, Vasily Averin wrote:
> Dear ARM developers,
> could you please help me to find the reason of this problem?
Hi,
> mem_cgroup_from_obj():
> ffff80000836cf40:       d503245f        bti     c
> ffff80000836cf44:       d503201f        nop
> ffff80000836cf48:       d503201f        nop
> ffff80000836cf4c:       d503233f        paciasp
> ffff80000836cf50:       d503201f        nop
> ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
> ffff80000836cf58:       8b010001        add     x1, x0, x1
> ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
> ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
> ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
> ffff80000836cf68:       8b040022        add     x2, x1, x4
> ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]
>
> x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
>
> x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> according to System.map it is init_net
>
> This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> Arm64 considers that addresses of static variables are not valid virtual addresses.
> On x86_64 the same API works without any problem.
>
> Unfortunately I do not understand the cause of the problem.
> I do not see any bugs in my patch.
> I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
> to account for the specified object.
> In particular, in the current case, I wanted to get the memory cgroup of the
> specified network namespace by the name taken from for_each_net().
> The first object in this list is the static structure init_net.

root@test:~# cat /proc/kallsyms |grep -w _data
ffff80000a110000 D _data
root@test:~# cat /proc/kallsyms |grep -w _end
ffff80000a500000 B _end
root@test:~# cat /proc/kallsyms |grep -w init_net
ffff80000a4eb980 B init_net

init_net is located in the data section; on arm64, the kernel image is mapped
in the vmalloc area, see

     map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);

and arm has the same behavior.
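
To make the point concrete, the check below uses the kallsyms values shown
above. It is a user-space sketch, not kernel code: the linear-map bounds
assume a 48-bit VA configuration, the helper names are invented for
illustration, and the range comparison is modelled on what __is_lm_address()
does as far as I understand it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed linear-map bounds for a 48-bit VA arm64 kernel. */
#define LM_START 0xffff000000000000ULL /* PAGE_OFFSET (assumption) */
#define LM_END   0xffff800000000000ULL /* PAGE_END (assumption) */

/* Values copied from the /proc/kallsyms output above. */
#define SYM_DATA     0xffff80000a110000ULL /* _data */
#define SYM_END      0xffff80000a500000ULL /* _end */
#define SYM_INIT_NET 0xffff80000a4eb980ULL /* init_net */

/* True if addr lies inside the assumed linear map. */
bool in_linear_map(uint64_t addr)
{
	return (addr - LM_START) < (LM_END - LM_START);
}

/* True if addr lies in the [_data, _end) part of the kernel image. */
bool in_image_data(uint64_t addr)
{
	return addr >= SYM_DATA && addr < SYM_END;
}

/* init_net sits in the image data range and outside the linear map. */
void check_init_net_placement(void)
{
	assert(in_image_data(SYM_INIT_NET));
	assert(!in_linear_map(SYM_INIT_NET));
}
```

Under these assumptions init_net is inside [_data, _end) but outside the
linear map, which is the condition the __virt_to_phys() warning reports.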

We could allocate init_net dynamically, but I think that would change a lot.

Any better suggestion, Catalin?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09  3:44                   ` Kefeng Wang
@ 2022-06-09  4:43                     ` Kefeng Wang
  -1 siblings, 0 replies; 46+ messages in thread
From: Kefeng Wang @ 2022-06-09  4:43 UTC (permalink / raw)
  To: Vasily Averin, Naresh Kamboju, Shakeel Butt, Linux ARM
  Cc: Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Roman Gushchin,
	Qian Cai


On 2022/6/9 11:44, Kefeng Wang wrote:
>
> On 2022/6/9 10:49, Vasily Averin wrote:
>> Dear ARM developers,
>> could you please help me to find the reason of this problem?
> Hi,
>> mem_cgroup_from_obj():
>> ffff80000836cf40:       d503245f        bti     c
>> ffff80000836cf44:       d503201f        nop
>> ffff80000836cf48:       d503201f        nop
>> ffff80000836cf4c:       d503233f        paciasp
>> ffff80000836cf50:       d503201f        nop
>> ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
>> ffff80000836cf58:       8b010001        add     x1, x0, x1
>> ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
>> ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
>> ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
>> ffff80000836cf68:       8b040022        add     x2, x1, x4
>> ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]
>>
>> x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
>> x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
>>
>> x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
>> according to System.map it is init_net
>>
>> This issue is caused by calling virt_to_page() on the address of the static variable init_net.
>> Arm64 considers that addresses of static variables are not valid virtual addresses.
>> On x86_64 the same API works without any problem.
>>
>> Unfortunately I do not understand the cause of the problem.
>> I do not see any bugs in my patch.
>> I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
>> to account for the specified object.
>> In particular, in the current case, I wanted to get the memory cgroup of the
>> specified network namespace by the name taken from for_each_net().
>> The first object in this list is the static structure init_net.
>
> root@test:~# cat /proc/kallsyms |grep -w _data
> ffff80000a110000 D _data
> root@test:~# cat /proc/kallsyms |grep -w _end
> ffff80000a500000 B _end
> root@test:~# cat /proc/kallsyms |grep -w init_net
> ffff80000a4eb980 B init_net
>
> init_net is located in the data section; on arm64, the kernel image is mapped
> in the vmalloc area, see
>
>     map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
>
> and arm has the same behavior.
>
> We could allocate init_net dynamically, but I think that would change a lot.
>
> Any better suggestion, Catalin?

Or add a vmalloc check in mem_cgroup_from_obj()?

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 27cebaa53472..fb817e5da5f0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2860,7 +2860,10 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
         if (mem_cgroup_disabled())
                 return NULL;

-       folio = virt_to_folio(p);
+       if (unlikely(is_vmalloc_addr(p)))
+               folio = page_folio(vmalloc_to_page(p));
+       else
+               folio = virt_to_folio(p);

         /*
          * Slab objects are accounted individually, not per-page.


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09  4:43                     ` Kefeng Wang
@ 2022-06-09  5:19                       ` Roman Gushchin
  -1 siblings, 0 replies; 46+ messages in thread
From: Roman Gushchin @ 2022-06-09  5:19 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Vasily Averin, Naresh Kamboju, Shakeel Butt, Linux ARM,
	Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Qian Cai

On Thu, Jun 09, 2022 at 12:43:00PM +0800, Kefeng Wang wrote:
> 
> On 2022/6/9 11:44, Kefeng Wang wrote:
> > 
> > On 2022/6/9 10:49, Vasily Averin wrote:
> > > Dear ARM developers,
> > > could you please help me to find the reason of this problem?
> > Hi,
> > > mem_cgroup_from_obj():
> > > ffff80000836cf40:       d503245f        bti     c
> > > ffff80000836cf44:       d503201f        nop
> > > ffff80000836cf48:       d503201f        nop
> > > ffff80000836cf4c:       d503233f        paciasp
> > > ffff80000836cf50:       d503201f        nop
> > > ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
> > > ffff80000836cf58:       8b010001        add     x1, x0, x1
> > > ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
> > > ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
> > > ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
> > > ffff80000836cf68:       8b040022        add     x2, x1, x4
> > > ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]
> > > 
> > > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > > 
> > > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > > according to System.map it is init_net
> > > 
> > > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > > Arm64 considers that addresses of static variables are not valid virtual addresses.
> > > On x86_64 the same API works without any problem.
> > >
> > > Unfortunately I do not understand the cause of the problem.
> > > I do not see any bugs in my patch.
> > > I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
> > > to account for the specified object.
> > > In particular, in the current case, I wanted to get the memory cgroup of the
> > > specified network namespace by the name taken from for_each_net().
> > > The first object in this list is the static structure init_net.
> > 
> > root@test:~# cat /proc/kallsyms |grep -w _data
> > ffff80000a110000 D _data
> > root@test:~# cat /proc/kallsyms |grep -w _end
> > ffff80000a500000 B _end
> > root@test:~# cat /proc/kallsyms |grep -w init_net
> > ffff80000a4eb980 B init_net
> > 
> > init_net is located in the data section; on arm64, the kernel image is mapped
> > in the vmalloc area, see
> >
> >     map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
> >
> > and arm has the same behavior.
> >
> > We could allocate init_net dynamically, but I think that would change a lot.
> >
> > Any better suggestion, Catalin?
> 
> Or add a vmalloc check in mem_cgroup_from_obj()?
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 27cebaa53472..fb817e5da5f0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2860,7 +2860,10 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>         if (mem_cgroup_disabled())
>                 return NULL;
> 
> -       folio = virt_to_folio(p);
> +       if (unlikely(is_vmalloc_addr(p)))
> +               folio = page_folio(vmalloc_to_page(p));
> +       else
> +               folio = virt_to_folio(p);
> 
>         /*
>          * Slab objects are accounted individually, not per-page.
> 

It sounds right. Later we can add something like mem_cgroup_from_slab_obj()
to use on hot paths and avoid this check.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09  3:44                   ` Kefeng Wang
@ 2022-06-09 10:11                     ` Will Deacon
  -1 siblings, 0 replies; 46+ messages in thread
From: Will Deacon @ 2022-06-09 10:11 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Vasily Averin, Naresh Kamboju, Shakeel Butt, Linux ARM,
	Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Roman Gushchin, Qian Cai

On Thu, Jun 09, 2022 at 11:44:09AM +0800, Kefeng Wang wrote:
> On 2022/6/9 10:49, Vasily Averin wrote:
> > mem_cgroup_from_obj():
> > ffff80000836cf40:       d503245f        bti     c
> > ffff80000836cf44:       d503201f        nop
> > ffff80000836cf48:       d503201f        nop
> > ffff80000836cf4c:       d503233f        paciasp
> > ffff80000836cf50:       d503201f        nop
> > ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
> > ffff80000836cf58:       8b010001        add     x1, x0, x1
> > ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
> > ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
> > ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
> > ffff80000836cf68:       8b040022        add     x2, x1, x4
> > ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]
> > 
> > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > 
> > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > according to System.map it is init_net
> > 
> > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > Arm64 considers that addresses of static variables are not valid virtual addresses.
> > On x86_64 the same API works without any problem.

This just depends on whether or not the kernel is running out of the linear
mapping. On arm64, we use the vmalloc area for the kernel image, so
virt_to_page() won't work, just like it won't work for modules on other
architectures.

How are module addresses handled by mem_cgroup_from_obj()?

> > Unfortunately I do not understand the cause of the problem.
> > I do not see any bugs in my patch.
> > I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
> > to account for the specified object.
> > In particular, in the current case, I wanted to get the memory cgroup of the
> > specified network namespace by the name taken from for_each_net().
> > The first object in this list is the static structure init_net.
> 
> root@test:~# cat /proc/kallsyms |grep -w _data
> ffff80000a110000 D _data
> root@test:~# cat /proc/kallsyms |grep -w _end
> ffff80000a500000 B _end
> root@test:~# cat /proc/kallsyms |grep -w init_net
> ffff80000a4eb980 B init_net
> 
> init_net is located in the data section; on arm64, the kernel image is mapped
> in the vmalloc area, see
>
>     map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
>
> and arm has the same behavior.
>
> We could allocate init_net dynamically, but I think that would change a lot.
>
> Any better suggestion, Catalin?

For this specific issue, can you use lm_alias to get a virtual address
suitable for virt_to_page()? My question about modules still applies though.
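
A rough user-space model of what lm_alias() computes is sketched here. As far
as I understand, on arm64 lm_alias(x) expands to __va(__pa_symbol(x)): the
kernel-image address is converted to a physical address and then to its
linear-map alias. The offsets used in the self-check are made-up values for
illustration only (they are not taken from this report), the 48-bit
PAGE_OFFSET is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_48 0xffff000000000000ULL /* assumed 48-bit VA layout */

/*
 * Hypothetical model of lm_alias(): image address -> physical -> linear map.
 * kimage_voffset and phys_offset stand in for the kernel's boot-time
 * offsets; real values are only known on a running system.
 */
uint64_t lm_alias_model(uint64_t sym, uint64_t kimage_voffset,
			uint64_t phys_offset)
{
	uint64_t phys = sym - kimage_voffset;         /* like __pa_symbol() */

	return (phys - phys_offset) | PAGE_OFFSET_48; /* like __va() */
}

/* Self-check with illustrative offsets, not values from a real boot. */
void check_lm_alias_model(void)
{
	uint64_t alias = lm_alias_model(0xffff80000a4eb980ULL,
					0xffff7fffc8000000ULL,
					0x40000000ULL);

	assert(alias == 0xffff0000024eb980ULL);
}
```

The modelled result lies in the linear map, which is the kind of address
virt_to_page() expects. lm_alias() is meant for kernel-image symbols, so this
would probably not help with module addresses.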

Will

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 10:11                     ` Will Deacon
@ 2022-06-09 10:25                       ` Catalin Marinas
  -1 siblings, 0 replies; 46+ messages in thread
From: Catalin Marinas @ 2022-06-09 10:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Kefeng Wang, Vasily Averin, Naresh Kamboju, Shakeel Butt,
	Linux ARM, Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Raghuram Thammiraju, Mark Brown,
	Roman Gushchin, Qian Cai

On Thu, Jun 09, 2022 at 11:11:54AM +0100, Will Deacon wrote:
> On Thu, Jun 09, 2022 at 11:44:09AM +0800, Kefeng Wang wrote:
> > On 2022/6/9 10:49, Vasily Averin wrote:
> > > mem_cgroup_from_obj():
> > > ffff80000836cf40:       d503245f        bti     c
> > > ffff80000836cf44:       d503201f        nop
> > > ffff80000836cf48:       d503201f        nop
> > > ffff80000836cf4c:       d503233f        paciasp
> > > ffff80000836cf50:       d503201f        nop
> > > ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
> > > ffff80000836cf58:       8b010001        add     x1, x0, x1
> > > ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
> > > ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
> > > ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
> > > ffff80000836cf68:       8b040022        add     x2, x1, x4
> > > ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]
> > > 
> > > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > > 
> > > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > > according to System.map it is init_net
> > > 
> > > > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > > > Arm64 considers the addresses of static variables not to be valid virtual addresses.
> > > On x86_64 the same API works without any problem.
> 
> This just depends on whether or not the kernel is running out of the linear
> mapping or not. On arm64, we use the vmalloc area for the kernel image and
> so virt_to_page() won't work, just like it won't work for modules on other
> architectures.
> 
> How are module addresses handled by mem_cgroup_from_obj()?

It doesn't look like they are handled in any way. It just expects the
pointer to be a linear map one. Something like below:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 27cebaa53472..795bf3673fa7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2860,6 +2860,11 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	if (mem_cgroup_disabled())
 		return NULL;
 
+	if (is_module_address((unsigned long)p))
+		return NULL;
+	else if (is_kernel((unsigned long)p))
+		return NULL;
+
 	folio = virt_to_folio(p);
 
 	/*

-- 
Catalin
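
The address arithmetic in the quoted disassembly can be replayed in
user space to see what happens to a pointer that is not in the linear
map. This is only a sketch: the constants are read off the mov
immediates in the listing, the 64-byte entry size is inferred from the
"lsl #6", and the function names are invented for illustration.

```c
#include <stdint.h>

/* Constants taken from the immediates in the quoted disassembly. */
#define REPLAY_BIAS         0x0001000000000000ULL /* mov x1, #0x1000000000000 */
#define REPLAY_VMEMMAP_BASE 0xfffffc0000000000ULL /* mov x4, #0xfffffc0000000000 */
#define REPLAY_PAGE_SHIFT   12                    /* lsr x1, x1, #12 */
#define REPLAY_ENTRY_SHIFT  6                     /* lsl x1, x1, #6 */

/* Hypothetical helper: byte offset into the page array (x1 after the shifts). */
static uint64_t replay_page_offset(uint64_t vaddr)
{
	uint64_t x1 = vaddr + REPLAY_BIAS; /* wraps modulo 2^64 */

	x1 >>= REPLAY_PAGE_SHIFT;
	x1 <<= REPLAY_ENTRY_SHIFT;
	return x1;
}

/* Hypothetical helper: the address that the final ldr dereferences from (x2). */
static uint64_t replay_virt_to_page(uint64_t vaddr)
{
	return replay_page_offset(vaddr) + REPLAY_VMEMMAP_BASE;
}
```

Feeding in x0 = 0xffff80000af09740 gives 0x00000200002bc240 and then
0xfffffe00002bc240, matching x1 and x2 in the register dump. The
computation itself never checks whether the input is a linear map
address.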

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 10:25                       ` Catalin Marinas
@ 2022-06-09 15:23                         ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-09 15:23 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Kefeng Wang, Vasily Averin, Naresh Kamboju,
	Linux ARM, Stephen Rothwell, Linux-Next Mailing List, open list,
	regressions, lkft-triage, linux-mm, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Raghuram Thammiraju, Mark Brown,
	Roman Gushchin, Qian Cai

On Thu, Jun 9, 2022 at 3:26 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Jun 09, 2022 at 11:11:54AM +0100, Will Deacon wrote:
> > On Thu, Jun 09, 2022 at 11:44:09AM +0800, Kefeng Wang wrote:
> > > On 2022/6/9 10:49, Vasily Averin wrote:
> > > > mem_cgroup_from_obj():
> > > > ffff80000836cf40:       d503245f        bti     c
> > > > ffff80000836cf44:       d503201f        nop
> > > > ffff80000836cf48:       d503201f        nop
> > > > ffff80000836cf4c:       d503233f        paciasp
> > > > ffff80000836cf50:       d503201f        nop
> > > > ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
> > > > ffff80000836cf58:       8b010001        add     x1, x0, x1
> > > > ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
> > > > ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
> > > > ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
> > > > ffff80000836cf68:       8b040022        add     x2, x1, x4
> > > > ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]
> > > >
> > > > x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
> > > > x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740
> > > >
> > > > x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
> > > > according to System.map it is init_net
> > > >
> > > > > This issue is caused by calling virt_to_page() on the address of the static variable init_net.
> > > > > Arm64 considers the addresses of static variables not to be valid virtual addresses.
> > > > On x86_64 the same API works without any problem.
> >
> > This just depends on whether or not the kernel is running out of the linear
> > mapping or not. On arm64, we use the vmalloc area for the kernel image and
> > so virt_to_page() won't work, just like it won't work for modules on other
> > architectures.
> >
> > How are module addresses handled by mem_cgroup_from_obj()?
>
> It doesn't look like they are handled in any way. It just expects the
> pointer to be a linear map one.

Yes, that is correct.

> Something like below:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 27cebaa53472..795bf3673fa7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2860,6 +2860,11 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>         if (mem_cgroup_disabled())
>                 return NULL;
>
> +       if (is_module_address((unsigned long)p))
> +               return NULL;
> +       else if (is_kernel((unsigned long)p))
> +               return NULL;
> +

How about just is_vmalloc_addr(p) check? It should cover modules and
also arm64 using vmalloc for kernel image cases.

>         folio = virt_to_folio(p);
>
>         /*
>
> --
> Catalin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-07  5:30   ` Naresh Kamboju
@ 2022-06-09 17:26     ` Roman Gushchin
  -1 siblings, 0 replies; 46+ messages in thread
From: Roman Gushchin @ 2022-06-09 17:26 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Linux-Next Mailing List, open list, regressions, lkft-triage,
	Linux ARM, linux-mm, Stephen Rothwell, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Shakeel Butt,
	Vasily Averin, Qian Cai

On Tue, Jun 07, 2022 at 11:00:39AM +0530, Naresh Kamboju wrote:
> On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >
> > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > I am bisecting this problem.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> >
> > The initial investigation show that,
> >
> > GOOD: next-20220603
> > BAD:  next-20220606
> >
> > Boot log:
> > Starting kernel ...
> 
> Linux next-20220606 and next-20220607 arm64 boot failed.
> The kernel panic log showing after earlycon.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

Naresh, can you, please, check if the following patch resolves the issue?
(completely untested except for building)

--

From 6a454876c9a1886e3cf8e9b66dae19b326f8901a Mon Sep 17 00:00:00 2001
From: Roman Gushchin <roman.gushchin@linux.dev>
Date: Thu, 9 Jun 2022 10:03:20 -0700
Subject: [PATCH] mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe

Currently mem_cgroup_from_obj() does not work properly with objects
allocated using vmalloc(). This creates problems when it is called for
static objects belonging to modules, or more generally for any object
allocated using vmalloc().

This patch makes mem_cgroup_from_obj() safe to be called on objects
allocated using vmalloc().

It also introduces mem_cgroup_from_slab_obj(), a faster version for
places where we know the object is either a slab object or a generic
kernel page (e.g. when adding an object to an LRU list).

Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h |  6 ++++
 mm/list_lru.c              |  2 +-
 mm/memcontrol.c            | 71 +++++++++++++++++++++++++++-----------
 3 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0d7584e2f335..4d31ce55b1c0 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1761,6 +1761,7 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg)
 }
 
 struct mem_cgroup *mem_cgroup_from_obj(void *p);
+struct mem_cgroup *mem_cgroup_from_slab_obj(void *p);
 
 static inline void count_objcg_event(struct obj_cgroup *objcg,
 				     enum vm_event_item idx)
@@ -1858,6 +1859,11 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	return NULL;
 }
 
+static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
+{
+	return NULL;
+}
+
 static inline void count_objcg_event(struct obj_cgroup *objcg,
 				     enum vm_event_item idx)
 {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index ba76428ceece..a05e5bef3b40 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -71,7 +71,7 @@ list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
 	if (!list_lru_memcg_aware(lru))
 		goto out;
 
-	memcg = mem_cgroup_from_obj(ptr);
+	memcg = mem_cgroup_from_slab_obj(ptr);
 	if (!memcg)
 		goto out;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4093062c5c9b..8c408d681377 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -783,7 +783,7 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 	struct lruvec *lruvec;
 
 	rcu_read_lock();
-	memcg = mem_cgroup_from_obj(p);
+	memcg = mem_cgroup_from_slab_obj(p);
 
 	/*
 	 * Untracked pages have no memcg, no lruvec. Update only the
@@ -2833,27 +2833,9 @@ int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
 	return 0;
 }
 
-/*
- * Returns a pointer to the memory cgroup to which the kernel object is charged.
- *
- * A passed kernel object can be a slab object or a generic kernel page, so
- * different mechanisms for getting the memory cgroup pointer should be used.
- * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
- * can not know for sure how the kernel object is implemented.
- * mem_cgroup_from_obj() can be safely used in such cases.
- *
- * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
- * cgroup_mutex, etc.
- */
-struct mem_cgroup *mem_cgroup_from_obj(void *p)
+static __always_inline
+struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
 {
-	struct folio *folio;
-
-	if (mem_cgroup_disabled())
-		return NULL;
-
-	folio = virt_to_folio(p);
-
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in
@@ -2886,6 +2868,53 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	return page_memcg_check(folio_page(folio, 0));
 }
 
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ *
+ * A passed kernel object can be a slab object, vmalloc object or a generic
+ * kernel page, so different mechanisms for getting the memory cgroup pointer
+ * should be used.
+ *
+ * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
+ * can not know for sure how the kernel object is implemented.
+ * mem_cgroup_from_obj() can be safely used in such cases.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+	struct folio *folio;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	if (unlikely(is_vmalloc_addr(p)))
+		folio = page_folio(vmalloc_to_page(p));
+	else
+		folio = virt_to_folio(p);
+
+	return mem_cgroup_from_obj_folio(folio, p);
+}
+
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ * Similar to mem_cgroup_from_obj(), but faster and not suitable for objects,
+ * allocated using vmalloc().
+ *
+ * A passed kernel object must be a slab object or a generic kernel page.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
+{
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	return mem_cgroup_from_obj_folio(virt_to_folio(p), p);
+}
+
 static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 {
 	struct obj_cgroup *objcg = NULL;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 17:26     ` Roman Gushchin
@ 2022-06-09 17:47       ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-09 17:47 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Stephen Rothwell,
	Andrew Morton, Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Vasily Averin,
	Qian Cai

On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
[...]
> +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> +{
> +       struct folio *folio;
> +
> +       if (mem_cgroup_disabled())
> +               return NULL;
> +
> +       if (unlikely(is_vmalloc_addr(p)))
> +               folio = page_folio(vmalloc_to_page(p));

Do we need to check for NULL from vmalloc_to_page(p)?

> +       else
> +               folio = virt_to_folio(p);
> +
> +       return mem_cgroup_from_obj_folio(folio, p);
> +}

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 17:47       ` Shakeel Butt
@ 2022-06-09 17:56         ` Roman Gushchin
  -1 siblings, 0 replies; 46+ messages in thread
From: Roman Gushchin @ 2022-06-09 17:56 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Stephen Rothwell,
	Andrew Morton, Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Vasily Averin,
	Qian Cai

On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> [...]
> > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > +{
> > +       struct folio *folio;
> > +
> > +       if (mem_cgroup_disabled())
> > +               return NULL;
> > +
> > +       if (unlikely(is_vmalloc_addr(p)))
> > +               folio = page_folio(vmalloc_to_page(p));
> 
> Do we need to check for NULL from vmalloc_to_page(p)?

Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
I would be surprised, but maybe I'm missing something.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 17:56         ` Roman Gushchin
@ 2022-06-09 19:12           ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-09 19:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Stephen Rothwell,
	Andrew Morton, Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Vasily Averin,
	Qian Cai

On Thu, Jun 09, 2022 at 10:56:09AM -0700, Roman Gushchin wrote:
> On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> > On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > [...]
> > > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > > +{
> > > +       struct folio *folio;
> > > +
> > > +       if (mem_cgroup_disabled())
> > > +               return NULL;
> > > +
> > > +       if (unlikely(is_vmalloc_addr(p)))
> > > +               folio = page_folio(vmalloc_to_page(p));
> > 
> > Do we need to check for NULL from vmalloc_to_page(p)?
> 
> Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
> I would be surprised, but maybe I'm missing something.

is_vmalloc_addr() only checks the address range, so a buggy caller can
still pass an unmapped address within that range. A VM_BUG_ON() may
be good enough (though no strong opinion either way).

^ permalink raw reply	[flat|nested] 46+ messages in thread
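
The distinction raised in the message above, a range check versus an
actual mapping lookup, can be shown with a small userspace sketch. This is
a toy model, not kernel code: every toy_* name, the window layout and the
page table are assumptions made for illustration, and only the relationship
between the two checks mirrors the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy layout: a 16-page window standing in for the vmalloc range. */
#define TOY_PAGE_SIZE		4096u
#define TOY_VMALLOC_START	0x100000u
#define TOY_VMALLOC_PAGES	16u
#define TOY_VMALLOC_END \
	(TOY_VMALLOC_START + TOY_VMALLOC_PAGES * TOY_PAGE_SIZE)

struct toy_page {
	int id;
};

/* Backing pages; a slot counts as mapped only after toy_map_page(). */
static struct toy_page toy_pages[TOY_VMALLOC_PAGES];
static bool toy_mapped[TOY_VMALLOC_PAGES];

/* Mark one slot of the window as mapped (illustrative helper). */
void toy_map_page(size_t idx)
{
	assert(idx < TOY_VMALLOC_PAGES);
	toy_mapped[idx] = true;
	toy_pages[idx].id = (int)idx;
}

/* Pure range check: says nothing about whether the address is mapped. */
bool toy_is_vmalloc_addr(uintptr_t addr)
{
	return addr >= TOY_VMALLOC_START && addr < TOY_VMALLOC_END;
}

/* Mapping lookup: returns NULL for an in-range but unmapped address. */
struct toy_page *toy_vmalloc_to_page(uintptr_t addr)
{
	size_t idx;

	if (!toy_is_vmalloc_addr(addr))
		return NULL;
	idx = (addr - TOY_VMALLOC_START) / TOY_PAGE_SIZE;
	return toy_mapped[idx] ? &toy_pages[idx] : NULL;
}

/* Defensive caller: checks the lookup result instead of trusting the range. */
int toy_lookup_id(uintptr_t addr)
{
	struct toy_page *page = toy_vmalloc_to_page(addr);

	return page ? page->id : -1;
}
```

In this model an address can pass toy_is_vmalloc_addr() and still yield
NULL from toy_vmalloc_to_page(), which is the case the NULL check (or an
assertion in its place) would catch.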

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 19:12           ` Shakeel Butt
@ 2022-06-09 22:05             ` Roman Gushchin
  -1 siblings, 0 replies; 46+ messages in thread
From: Roman Gushchin @ 2022-06-09 22:05 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Stephen Rothwell,
	Andrew Morton, Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Vasily Averin,
	Qian Cai

On Thu, Jun 09, 2022 at 07:12:21PM +0000, Shakeel Butt wrote:
> On Thu, Jun 09, 2022 at 10:56:09AM -0700, Roman Gushchin wrote:
> > On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> > > On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > [...]
> > > > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > > > +{
> > > > +       struct folio *folio;
> > > > +
> > > > +       if (mem_cgroup_disabled())
> > > > +               return NULL;
> > > > +
> > > > +       if (unlikely(is_vmalloc_addr(p)))
> > > > +               folio = page_folio(vmalloc_to_page(p));
> > > 
> > > Do we need to check for NULL from vmalloc_to_page(p)?
> > 
> > Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
> > I would be surprised, but maybe I'm missing something.
> 
> is_vmalloc_addr() is simply checking the range and some buggy caller can
> provide an unmapped address within the range. Maybe VM_BUG_ON() should
> be good enough (though no strong opinion either way).

No strong opinion here either, but I don't think we need to be too defensive
here. We'll find out anyway: a NULL pointer dereference is unlikely to go
unnoticed. And it's no different from calling mem_cgroup_from_obj() with some
random invalid address today.

Thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 22:05             ` Roman Gushchin
@ 2022-06-09 22:16               ` Shakeel Butt
  -1 siblings, 0 replies; 46+ messages in thread
From: Shakeel Butt @ 2022-06-09 22:16 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Naresh Kamboju, Linux-Next Mailing List, open list, regressions,
	lkft-triage, Linux ARM, linux-mm, Stephen Rothwell,
	Andrew Morton, Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Vasily Averin,
	Qian Cai

On Thu, Jun 09, 2022 at 03:05:08PM -0700, Roman Gushchin wrote:
> On Thu, Jun 09, 2022 at 07:12:21PM +0000, Shakeel Butt wrote:
> > On Thu, Jun 09, 2022 at 10:56:09AM -0700, Roman Gushchin wrote:
> > > On Thu, Jun 09, 2022 at 10:47:35AM -0700, Shakeel Butt wrote:
> > > > On Thu, Jun 9, 2022 at 10:27 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > > >
> > > > [...]
> > > > > +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> > > > > +{
> > > > > +       struct folio *folio;
> > > > > +
> > > > > +       if (mem_cgroup_disabled())
> > > > > +               return NULL;
> > > > > +
> > > > > +       if (unlikely(is_vmalloc_addr(p)))
> > > > > +               folio = page_folio(vmalloc_to_page(p));
> > > > 
> > > > Do we need to check for NULL from vmalloc_to_page(p)?
> > > 
> > > Idk, can it realistically return NULL after is_vmalloc_addr() returned true?
> > > I would be surprised, but maybe I'm missing something.
> > 
> > is_vmalloc_addr() is simply checking the range and some buggy caller can
> > provide an unmapped address within the range. Maybe VM_BUG_ON() should
> > be good enough (though no strong opinion either way).
> 
> No strong opinion here as well, but I think we don't have to be too defensive
> here. Actually we'll know anyway, unlikely a null pointer dereference will be
> unnoticed. And it's not different to calling mem_cgroup_from_obj() with some
> random invalid address now.
> 

Sounds good. You can add my ack when you send the official version of
the patch.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next] arm64: boot failed - next-20220606
  2022-06-09 17:26     ` Roman Gushchin
@ 2022-06-10 10:56       ` Naresh Kamboju
  -1 siblings, 0 replies; 46+ messages in thread
From: Naresh Kamboju @ 2022-06-10 10:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Linux-Next Mailing List, open list, regressions, lkft-triage,
	Linux ARM, linux-mm, Stephen Rothwell, Andrew Morton,
	Ard Biesheuvel, Arnd Bergmann, Catalin Marinas,
	Raghuram Thammiraju, Mark Brown, Will Deacon, Shakeel Butt,
	Vasily Averin, Qian Cai

Hi Roman,

On Thu, 9 Jun 2022 at 22:57, Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Tue, Jun 07, 2022 at 11:00:39AM +0530, Naresh Kamboju wrote:
> > On Mon, 6 Jun 2022 at 17:16, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > >
> > > Linux next-20220606 arm64 boot failed. The kernel boot log is empty.
> > > I am bisecting this problem.
> > >
> > > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> > >
> > > The initial investigation show that,
> > >
> > > GOOD: next-20220603
> > > BAD:  next-20220606
> > >
> > > Boot log:
> > > Starting kernel ...
> >
> > Linux next-20220606 and next-20220607 arm64 boot failed.
> > The kernel panic log showing after earlycon.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> Naresh, can you, please, check if the following patch resolves the issue?
> (completely untested except for building)

I have tested this patch on top of next-20220606 and it boots successfully [1].

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>

> --
>
> From 6a454876c9a1886e3cf8e9b66dae19b326f8901a Mon Sep 17 00:00:00 2001
> From: Roman Gushchin <roman.gushchin@linux.dev>
> Date: Thu, 9 Jun 2022 10:03:20 -0700
> Subject: [PATCH] mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe
>
> Currently mem_cgroup_from_obj() is not working properly with objects
> allocated using vmalloc(). It creates problems in some cases, when
> it's called for static objects belonging to modules or generally
> allocated using vmalloc().
>
> This patch makes mem_cgroup_from_obj() safe to be called on objects
> allocated using vmalloc().
>
> It also introduces mem_cgroup_from_slab_obj(), which is a faster
> version to use in places when we know the object is either a slab
> object or a generic slab page (e.g. when adding an object to a lru
> list).
>
> Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> ---
>  include/linux/memcontrol.h |  6 ++++
>  mm/list_lru.c              |  2 +-
>  mm/memcontrol.c            | 71 +++++++++++++++++++++++++++-----------
>  3 files changed, 57 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0d7584e2f335..4d31ce55b1c0 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1761,6 +1761,7 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg)
>  }
>
>  struct mem_cgroup *mem_cgroup_from_obj(void *p);
> +struct mem_cgroup *mem_cgroup_from_slab_obj(void *p);
>
>  static inline void count_objcg_event(struct obj_cgroup *objcg,
>                                      enum vm_event_item idx)
> @@ -1858,6 +1859,11 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
>         return NULL;
>  }
>
> +static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
> +{
> +       return NULL;
> +}
> +
>  static inline void count_objcg_event(struct obj_cgroup *objcg,
>                                      enum vm_event_item idx)
>  {
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index ba76428ceece..a05e5bef3b40 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -71,7 +71,7 @@ list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
>         if (!list_lru_memcg_aware(lru))
>                 goto out;
>
> -       memcg = mem_cgroup_from_obj(ptr);
> +       memcg = mem_cgroup_from_slab_obj(ptr);
>         if (!memcg)
>                 goto out;
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4093062c5c9b..8c408d681377 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -783,7 +783,7 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
>         struct lruvec *lruvec;
>
>         rcu_read_lock();
> -       memcg = mem_cgroup_from_obj(p);
> +       memcg = mem_cgroup_from_slab_obj(p);
>
>         /*
>          * Untracked pages have no memcg, no lruvec. Update only the
> @@ -2833,27 +2833,9 @@ int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
>         return 0;
>  }
>
> -/*
> - * Returns a pointer to the memory cgroup to which the kernel object is charged.
> - *
> - * A passed kernel object can be a slab object or a generic kernel page, so
> - * different mechanisms for getting the memory cgroup pointer should be used.
> - * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
> - * can not know for sure how the kernel object is implemented.
> - * mem_cgroup_from_obj() can be safely used in such cases.
> - *
> - * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
> - * cgroup_mutex, etc.
> - */
> -struct mem_cgroup *mem_cgroup_from_obj(void *p)
> +static __always_inline
> +struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
>  {
> -       struct folio *folio;
> -
> -       if (mem_cgroup_disabled())
> -               return NULL;
> -
> -       folio = virt_to_folio(p);
> -
>         /*
>          * Slab objects are accounted individually, not per-page.
>          * Memcg membership data for each individual object is saved in
> @@ -2886,6 +2868,53 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>         return page_memcg_check(folio_page(folio, 0));
>  }
>
> +/*
> + * Returns a pointer to the memory cgroup to which the kernel object is charged.
> + *
> + * A passed kernel object can be a slab object, vmalloc object or a generic
> + * kernel page, so different mechanisms for getting the memory cgroup pointer
> + * should be used.
> + *
> + * In certain cases (e.g. kernel stacks or large kmallocs with SLUB) the caller
> + * can not know for sure how the kernel object is implemented.
> + * mem_cgroup_from_obj() can be safely used in such cases.
> + *
> + * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
> + * cgroup_mutex, etc.
> + */
> +struct mem_cgroup *mem_cgroup_from_obj(void *p)
> +{
> +       struct folio *folio;
> +
> +       if (mem_cgroup_disabled())
> +               return NULL;
> +
> +       if (unlikely(is_vmalloc_addr(p)))
> +               folio = page_folio(vmalloc_to_page(p));
> +       else
> +               folio = virt_to_folio(p);
> +
> +       return mem_cgroup_from_obj_folio(folio, p);
> +}
> +
> +/*
> + * Returns a pointer to the memory cgroup to which the kernel object is charged.
> + * Similar to mem_cgroup_from_obj(), but faster and not suitable for objects,
> + * allocated using vmalloc().
> + *
> + * A passed kernel object must be a slab object or a generic kernel page.
> + *
> + * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
> + * cgroup_mutex, etc.
> + */
> +struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
> +{
> +       if (mem_cgroup_disabled())
> +               return NULL;
> +
> +       return mem_cgroup_from_obj_folio(virt_to_folio(p), p);
> +}
> +
>  static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  {
>         struct obj_cgroup *objcg = NULL;
> --
> 2.35.3

[1] https://lkft.validation.linaro.org/scheduler/job/5156201

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 46+ messages in thread
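
The split that the quoted patch makes, one general entry point that
handles vmalloc addresses and one faster entry point for callers that know
the object is not vmalloc-backed, can be modelled in userspace. This is a
hedged sketch under stated assumptions: all toy_* names, both address
windows and the owner field are hypothetical, and the bounds check in
toy_linear_to_page() exists only to keep the toy safe to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_NR_PAGES	8u
#define TOY_LINEAR_BASE	0x10000u	/* page i sits at BASE + i * PAGE_SIZE */
#define TOY_VMAP_BASE	0x800000u	/* separate window, translated via a table */
#define TOY_VMAP_SLOTS	4u

struct toy_page {
	int owner;	/* stand-in for the cgroup a page is charged to */
};

static struct toy_page toy_pages[TOY_NR_PAGES];
static struct toy_page *toy_vmap_table[TOY_VMAP_SLOTS];

void toy_set_owner(size_t pfn, int owner)
{
	assert(pfn < TOY_NR_PAGES);
	toy_pages[pfn].owner = owner;
}

/* Map one slot of the table-translated window onto a backing page. */
void toy_vmap(size_t slot, size_t pfn)
{
	assert(slot < TOY_VMAP_SLOTS && pfn < TOY_NR_PAGES);
	toy_vmap_table[slot] = &toy_pages[pfn];
}

/* Arithmetic translation: only meaningful for linear-window addresses. */
static struct toy_page *toy_linear_to_page(uintptr_t addr)
{
	size_t pfn = (addr - TOY_LINEAR_BASE) / TOY_PAGE_SIZE;

	return pfn < TOY_NR_PAGES ? &toy_pages[pfn] : NULL;
}

static int toy_in_vmap(uintptr_t addr)
{
	return addr >= TOY_VMAP_BASE &&
	       addr < TOY_VMAP_BASE + TOY_VMAP_SLOTS * TOY_PAGE_SIZE;
}

/* Shared tail used by both entry points. */
static int toy_owner_from_page(const struct toy_page *page)
{
	return page ? page->owner : -1;
}

/* General entry point: picks the translation that matches the address. */
int toy_owner_from_obj(uintptr_t addr)
{
	struct toy_page *page;

	if (toy_in_vmap(addr))
		page = toy_vmap_table[(addr - TOY_VMAP_BASE) / TOY_PAGE_SIZE];
	else
		page = toy_linear_to_page(addr);

	return toy_owner_from_page(page);
}

/* Fast entry point: caller promises the address is in the linear window. */
int toy_owner_from_linear_obj(uintptr_t addr)
{
	return toy_owner_from_page(toy_linear_to_page(addr));
}
```

Feeding a table-translated address to toy_owner_from_linear_obj() applies
the wrong translation and finds no owner, which is the toy analogue of
calling the slab-only helper on a vmalloc object.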

end of thread, other threads:[~2022-06-10 10:57 UTC | newest]

Thread overview: 46+ messages
2022-06-06 11:46 [next] arm64: boot failed - next-20220606 Naresh Kamboju
2022-06-06 11:46 ` Naresh Kamboju
2022-06-07  5:30 ` Naresh Kamboju
2022-06-07  5:30   ` Naresh Kamboju
2022-06-07  6:25   ` Stephen Rothwell
2022-06-07  6:25     ` Stephen Rothwell
2022-06-07  6:36     ` Shakeel Butt
2022-06-07  6:36       ` Shakeel Butt
2022-06-07  6:44       ` Shakeel Butt
2022-06-07  6:44         ` Shakeel Butt
2022-06-07 10:27         ` Naresh Kamboju
2022-06-07 10:27           ` Naresh Kamboju
2022-06-07 14:17           ` Shakeel Butt
2022-06-07 14:17             ` Shakeel Butt
2022-06-07 15:29             ` Naresh Kamboju
2022-06-07 15:29               ` Naresh Kamboju
2022-06-09  2:49               ` Vasily Averin
2022-06-09  2:49                 ` Vasily Averin
2022-06-09  3:44                 ` Kefeng Wang
2022-06-09  3:44                   ` Kefeng Wang
2022-06-09  4:43                   ` Kefeng Wang
2022-06-09  4:43                     ` Kefeng Wang
2022-06-09  5:19                     ` Roman Gushchin
2022-06-09  5:19                       ` Roman Gushchin
2022-06-09 10:11                   ` Will Deacon
2022-06-09 10:11                     ` Will Deacon
2022-06-09 10:25                     ` Catalin Marinas
2022-06-09 10:25                       ` Catalin Marinas
2022-06-09 15:23                       ` Shakeel Butt
2022-06-09 15:23                         ` Shakeel Butt
2022-06-07 10:24     ` Naresh Kamboju
2022-06-07 10:24       ` Naresh Kamboju
2022-06-09 17:26   ` Roman Gushchin
2022-06-09 17:26     ` Roman Gushchin
2022-06-09 17:47     ` Shakeel Butt
2022-06-09 17:47       ` Shakeel Butt
2022-06-09 17:56       ` Roman Gushchin
2022-06-09 17:56         ` Roman Gushchin
2022-06-09 19:12         ` Shakeel Butt
2022-06-09 19:12           ` Shakeel Butt
2022-06-09 22:05           ` Roman Gushchin
2022-06-09 22:05             ` Roman Gushchin
2022-06-09 22:16             ` Shakeel Butt
2022-06-09 22:16               ` Shakeel Butt
2022-06-10 10:56     ` Naresh Kamboju
2022-06-10 10:56       ` Naresh Kamboju
