From: Vasily Averin <vvs@openvz.org>
To: Naresh Kamboju <naresh.kamboju@linaro.org>,
	Shakeel Butt <shakeelb@google.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Linux-Next Mailing List <linux-next@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	regressions@lists.linux.dev, lkft-triage@lists.linaro.org,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ard Biesheuvel <ardb@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Raghuram Thammiraju <raghuram.thammiraju@arm.com>,
	Mark Brown <broonie@kernel.org>, Will Deacon <will@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Qian Cai <quic_qiancai@quicinc.com>
Subject: Re: [next] arm64: boot failed - next-20220606
Date: Thu, 9 Jun 2022 05:49:02 +0300
Message-ID: <2a4cc632-c936-1e42-4fdc-572334c58ee1@openvz.org>
In-Reply-To: <CA+G9fYu6mayYrrYK+0Rn1K7HOM6WbaOhnJSx-Wv6CaKBDpaT2g@mail.gmail.com>

Dear ARM developers,

could you please help me find the cause of this problem?

On 6/7/22 18:29, Naresh Kamboju wrote:
> On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
>>
>> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>>
>>> Hi Shakeel,
>>>
>>>>>> Can you test v5.19-rc1, please? If that does not fail, then you could
>>>>>> bisect between that and next-20220606 ...
>>>>>>
>>>>>
>>>>> This is already reported at
>>>>> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
>>>>> the underlying issue (which is calling virt_to_page() on a vmalloc
>>>>> address).
>>>>
>>>> Sorry, I might be wrong. Just checked the stacktrace again and it
>>>> seems like the failure is happening in early boot in this report.
>>>> Though the error "Unable to handle kernel paging request at virtual
>>>> address" is happening in the function mem_cgroup_from_obj().
>>>>
>>>> Naresh, can you repro the issue if you revert the patch "net: set
>>>> proper memcg for net_init hooks allocations"?
>>>
>>> yes. You are right !
>>> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
>>> After reverting this single commit I am able to boot arm64 successfully.
>>>
>>> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>>>
>>
>> Can you please run script/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
>
> ./scripts/faddr2line vmlinux mem_cgroup_from_obj+0x2c/0x120
> mem_cgroup_from_obj+0x2c/0x120:
> mem_cgroup_from_obj at ??:?
>
> Please find the following artifacts which are causing kernel crashes.
>
> vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
> System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map

Dear Naresh, thank you very much.

mem_cgroup_from_obj():
ffff80000836cf40:       d503245f        bti     c
ffff80000836cf44:       d503201f        nop
ffff80000836cf48:       d503201f        nop
ffff80000836cf4c:       d503233f        paciasp
ffff80000836cf50:       d503201f        nop
ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000     // #281474976710656
ffff80000836cf58:       8b010001        add     x1, x0, x1
ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000  // #-4398046511104
ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
ffff80000836cf68:       8b040022        add     x2, x1, x4
ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]

x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740

x0 = 0xffff80000af09740 is the argument of mem_cgroup_from_obj(); according
to System.map, it is init_net.

This issue is caused by calling virt_to_page() on the address of the static
variable init_net. On arm64, the addresses of static variables are not
considered valid (linear-map) virtual addresses, while on x86_64 the same
API works without any problem.

Unfortunately, I do not understand the cause of the problem.
I do not see any bugs in my patch.
I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup
used to account for the specified object. In particular, in this case I
wanted to get the memory cgroup of the network namespace returned by
for_each_net(). The first object in this list is the static structure init_net.

On x86_64 I can translate its address to a page:

crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL           PHYSICAL
ffffffff90c7bdc0  402c7bdc0

PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
   PUD: 401e15ff0 => 401e16063
   PMD: 401e16430 => 8000000402c000e3
  PAGE: 402c00000  (2MB)

      PTE         PHYSICAL   FLAGS
8000000402c000e3  402c00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)

      PAGE        PHYSICAL   MAPPING  INDEX CNT FLAGS
fffff227d00b1ec0  402c7b000        0      0   1 17ffffc0001000 reserved

However, as far as I understand, this does not work on arm64.
Could you please help me understand what is wrong here?

Below are a link to my patch:
https://lore.kernel.org/all/20220603182442.63750C385B8@smtp.kernel.org/
and a quote of my investigation of a similar report:
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/

> virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
> WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
> Call trace:
>  __virt_to_phys
>  mem_cgroup_from_obj
>  __register_pernet_operations

@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
 	 * setup_net() and cleanup_net() are not possible.
 	 */
 	for_each_net(net) {
+		struct mem_cgroup *old, *memcg;
+
+		memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net));  <<<< Here
+		old = set_active_memcg(memcg);
 		error = ops_init(ops, net);
+		set_active_memcg(old);
+		mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	do {
+		memcg = mem_cgroup_from_obj(p);  <<<<
+	} while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
	struct folio *folio;

	if (mem_cgroup_disabled())
		return NULL;

	folio = virt_to_folio(p);  <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
	struct page *page = virt_to_page(x);  <<< here
...
(arm64)
#define virt_to_page(x)		pfn_to_page(virt_to_pfn(x))
...
#define virt_to_pfn(x)		__phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
	WARN(!__is_lm_address(__tag_reset(x)),
	     "virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)

Thank you,
	Vasily Averin