All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vasily Averin <vvs@openvz.org>
To: Naresh Kamboju <naresh.kamboju@linaro.org>,
	Shakeel Butt <shakeelb@google.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Linux-Next Mailing List <linux-next@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	regressions@lists.linux.dev, lkft-triage@lists.linaro.org,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ard Biesheuvel <ardb@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Raghuram Thammiraju <raghuram.thammiraju@arm.com>,
	Mark Brown <broonie@kernel.org>, Will Deacon <will@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Qian Cai <quic_qiancai@quicinc.com>
Subject: Re: [next] arm64: boot failed - next-20220606
Date: Thu, 9 Jun 2022 05:49:02 +0300	[thread overview]
Message-ID: <2a4cc632-c936-1e42-4fdc-572334c58ee1@openvz.org> (raw)
In-Reply-To: <CA+G9fYu6mayYrrYK+0Rn1K7HOM6WbaOhnJSx-Wv6CaKBDpaT2g@mail.gmail.com>

Dear ARM developers,
could you please help me to find the reason of this problem?

On 6/7/22 18:29, Naresh Kamboju wrote:
> On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
>>
>> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>>
>>> Hi Shakeel,
>>>
>>>>>> Can you test v5.19-rc1, please?  If that does not fail, then you could
>>>>>> bisect between that and next-20220606 ...
>>>>>>
>>>>>
>>>>> This is already reported at
>>>>> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
>>>>> the underlying issue (which is calling virt_to_page() on a vmalloc
>>>>> address).
>>>>
>>>> Sorry, I might be wrong. Just checked the stacktrace again and it
>>>> seems like the failure is happening in early boot in this report.
>>>> Though the error "Unable to handle kernel paging request at virtual
>>>> address" is happening in the function mem_cgroup_from_obj().
>>>>
>>>> Naresh, can you repro the issue if you revert the patch "net: set
>>>> proper memcg for net_init hooks allocations"?
>>>
>>> yes. You are right !
>>> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
>>> After reverting this single commit I am able to boot arm64 successfully.
>>>
>>> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>>>
>>
>> Can you please run script/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
> 
> ./scripts/faddr2line vmlinux  mem_cgroup_from_obj+0x2c/0x120
> mem_cgroup_from_obj+0x2c/0x120:
> mem_cgroup_from_obj at ??:?
> 
> Please find the following artifacts which are causing kernel crashes.
> 
> vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
> System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map

Dear Naresh,
thank you very much

mem_cgroup_from_obj():
ffff80000836cf40:       d503245f        bti     c
ffff80000836cf44:       d503201f        nop
ffff80000836cf48:       d503201f        nop
ffff80000836cf4c:       d503233f        paciasp
ffff80000836cf50:       d503201f        nop
ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
ffff80000836cf58:       8b010001        add     x1, x0, x1
ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
ffff80000836cf68:       8b040022        add     x2, x1, x4
ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]

x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740

x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
according to System.map it is init_net

This issue is caused by calling virt_to_page() on address of static variable init_net.
Arm64 consider that addresses of static variables are not valid virtual addresses.
On x86_64 the same API works without any problem.

Unfortunately I do not understand the cause of the problem.
I do not see any bugs in my patch.
I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
to account for the specified object.
In particular, in the current case, I wanted to get the memory cgroup of the
specified network namespace by the name taken from for_each_net(). 
The first object in this list is the static structure unit_net

On x86_64 I can translate its address to page:

crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL           PHYSICAL        
ffffffff90c7bdc0  402c7bdc0       

PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
   PUD: 401e15ff0 => 401e16063
   PMD: 401e16430 => 8000000402c000e3
  PAGE: 402c00000  (2MB)

      PTE         PHYSICAL   FLAGS
8000000402c000e3  402c00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff227d00b1ec0 402c7b000                0        0  1 17ffffc0001000 reserved

However, as far as I understand this does not work for arm64.
Could you please help me to understand what is wrong here?

Below are:
 link to my patch:
https://lore.kernel.org/all/20220603182442.63750C385B8@smtp.kernel.org/
 and the quote of my investigation of similar report:
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/

> virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
>  WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
>  Call trace:
>   __virt_to_phys
>   mem_cgroup_from_obj
>   __register_pernet_operations

@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
  		 * setup_net() and cleanup_net() are not possible.
		 */
		for_each_net(net) {
+			struct mem_cgroup *old, *memcg;
+
+			memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net));   <<<<  Here
+			old = set_active_memcg(memcg);
 			error = ops_init(ops, net);
+			set_active_memcg(old);
+			mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	do {
+		memcg = mem_cgroup_from_obj(p); <<<<
+	} while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
        struct folio *folio;

        if (mem_cgroup_disabled())
                return NULL;

        folio = virt_to_folio(p); <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
        struct page *page = virt_to_page(x); <<< here

... (arm64)
#define virt_to_page(x)         pfn_to_page(virt_to_pfn(x))  
...
#define virt_to_pfn(x)          __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
        WARN(!__is_lm_address(__tag_reset(x)),
             "virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)

Thank you,
	Vasily Averin

WARNING: multiple messages have this Message-ID (diff)
From: Vasily Averin <vvs@openvz.org>
To: Naresh Kamboju <naresh.kamboju@linaro.org>,
	Shakeel Butt <shakeelb@google.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Linux-Next Mailing List <linux-next@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	regressions@lists.linux.dev, lkft-triage@lists.linaro.org,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ard Biesheuvel <ardb@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Raghuram Thammiraju <raghuram.thammiraju@arm.com>,
	Mark Brown <broonie@kernel.org>, Will Deacon <will@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Qian Cai <quic_qiancai@quicinc.com>
Subject: Re: [next] arm64: boot failed - next-20220606
Date: Thu, 9 Jun 2022 05:49:02 +0300	[thread overview]
Message-ID: <2a4cc632-c936-1e42-4fdc-572334c58ee1@openvz.org> (raw)
In-Reply-To: <CA+G9fYu6mayYrrYK+0Rn1K7HOM6WbaOhnJSx-Wv6CaKBDpaT2g@mail.gmail.com>

Dear ARM developers,
could you please help me to find the reason of this problem?

On 6/7/22 18:29, Naresh Kamboju wrote:
> On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
>>
>> On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>>
>>> Hi Shakeel,
>>>
>>>>>> Can you test v5.19-rc1, please?  If that does not fail, then you could
>>>>>> bisect between that and next-20220606 ...
>>>>>>
>>>>>
>>>>> This is already reported at
>>>>> https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know
>>>>> the underlying issue (which is calling virt_to_page() on a vmalloc
>>>>> address).
>>>>
>>>> Sorry, I might be wrong. Just checked the stacktrace again and it
>>>> seems like the failure is happening in early boot in this report.
>>>> Though the error "Unable to handle kernel paging request at virtual
>>>> address" is happening in the function mem_cgroup_from_obj().
>>>>
>>>> Naresh, can you repro the issue if you revert the patch "net: set
>>>> proper memcg for net_init hooks allocations"?
>>>
>>> yes. You are right !
>>> 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations")
>>> After reverting this single commit I am able to boot arm64 successfully.
>>>
>>> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>>>
>>
>> Can you please run script/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
> 
> ./scripts/faddr2line vmlinux  mem_cgroup_from_obj+0x2c/0x120
> mem_cgroup_from_obj+0x2c/0x120:
> mem_cgroup_from_obj at ??:?
> 
> Please find the following artifacts which are causing kernel crashes.
> 
> vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
> System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map

Dear Naresh,
thank you very much

mem_cgroup_from_obj():
ffff80000836cf40:       d503245f        bti     c
ffff80000836cf44:       d503201f        nop
ffff80000836cf48:       d503201f        nop
ffff80000836cf4c:       d503233f        paciasp
ffff80000836cf50:       d503201f        nop
ffff80000836cf54:       d2e00021        mov     x1, #0x1000000000000            // #281474976710656
ffff80000836cf58:       8b010001        add     x1, x0, x1
ffff80000836cf5c:       b25657e4        mov     x4, #0xfffffc0000000000         // #-4398046511104
ffff80000836cf60:       d34cfc21        lsr     x1, x1, #12
ffff80000836cf64:       d37ae421        lsl     x1, x1, #6
ffff80000836cf68:       8b040022        add     x2, x1, x4
ffff80000836cf6c:       f9400443        ldr     x3, [x2, #8]

x5 : ffff80000a96f000 x4 : fffffc0000000000 x3 : ffff80000ad5e680
x2 : fffffe00002bc240 x1 : 00000200002bc240 x0 : ffff80000af09740

x0 = 0xffff80000af09740 is an argument of mem_cgroup_from_obj()
according to System.map it is init_net

This issue is caused by calling virt_to_page() on address of static variable init_net.
Arm64 consider that addresses of static variables are not valid virtual addresses.
On x86_64 the same API works without any problem.

Unfortunately I do not understand the cause of the problem.
I do not see any bugs in my patch.
I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used
to account for the specified object.
In particular, in the current case, I wanted to get the memory cgroup of the
specified network namespace by the name taken from for_each_net(). 
The first object in this list is the static structure unit_net

On x86_64 I can translate its address to page:

crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL           PHYSICAL        
ffffffff90c7bdc0  402c7bdc0       

PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
   PUD: 401e15ff0 => 401e16063
   PMD: 401e16430 => 8000000402c000e3
  PAGE: 402c00000  (2MB)

      PTE         PHYSICAL   FLAGS
8000000402c000e3  402c00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff227d00b1ec0 402c7b000                0        0  1 17ffffc0001000 reserved

However, as far as I understand this does not work for arm64.
Could you please help me to understand what is wrong here?

Below are:
 link to my patch:
https://lore.kernel.org/all/20220603182442.63750C385B8@smtp.kernel.org/
 and the quote of my investigation of similar report:
https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/

> virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
>  WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
>  Call trace:
>   __virt_to_phys
>   mem_cgroup_from_obj
>   __register_pernet_operations

@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
  		 * setup_net() and cleanup_net() are not possible.
		 */
		for_each_net(net) {
+			struct mem_cgroup *old, *memcg;
+
+			memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net));   <<<<  Here
+			old = set_active_memcg(memcg);
 			error = ops_init(ops, net);
+			set_active_memcg(old);
+			mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	do {
+		memcg = mem_cgroup_from_obj(p); <<<<
+	} while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
        struct folio *folio;

        if (mem_cgroup_disabled())
                return NULL;

        folio = virt_to_folio(p); <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
        struct page *page = virt_to_page(x); <<< here

... (arm64)
#define virt_to_page(x)         pfn_to_page(virt_to_pfn(x))  
...
#define virt_to_pfn(x)          __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
        WARN(!__is_lm_address(__tag_reset(x)),
             "virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)

Thank you,
	Vasily Averin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-06-09  2:49 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-06 11:46 [next] arm64: boot failed - next-20220606 Naresh Kamboju
2022-06-06 11:46 ` Naresh Kamboju
2022-06-07  5:30 ` Naresh Kamboju
2022-06-07  5:30   ` Naresh Kamboju
2022-06-07  6:25   ` Stephen Rothwell
2022-06-07  6:25     ` Stephen Rothwell
2022-06-07  6:36     ` Shakeel Butt
2022-06-07  6:36       ` Shakeel Butt
2022-06-07  6:44       ` Shakeel Butt
2022-06-07  6:44         ` Shakeel Butt
2022-06-07 10:27         ` Naresh Kamboju
2022-06-07 10:27           ` Naresh Kamboju
2022-06-07 14:17           ` Shakeel Butt
2022-06-07 14:17             ` Shakeel Butt
2022-06-07 15:29             ` Naresh Kamboju
2022-06-07 15:29               ` Naresh Kamboju
2022-06-09  2:49               ` Vasily Averin [this message]
2022-06-09  2:49                 ` Vasily Averin
2022-06-09  3:44                 ` Kefeng Wang
2022-06-09  3:44                   ` Kefeng Wang
2022-06-09  4:43                   ` Kefeng Wang
2022-06-09  4:43                     ` Kefeng Wang
2022-06-09  5:19                     ` Roman Gushchin
2022-06-09  5:19                       ` Roman Gushchin
2022-06-09 10:11                   ` Will Deacon
2022-06-09 10:11                     ` Will Deacon
2022-06-09 10:25                     ` Catalin Marinas
2022-06-09 10:25                       ` Catalin Marinas
2022-06-09 15:23                       ` Shakeel Butt
2022-06-09 15:23                         ` Shakeel Butt
2022-06-07 10:24     ` Naresh Kamboju
2022-06-07 10:24       ` Naresh Kamboju
2022-06-09 17:26   ` Roman Gushchin
2022-06-09 17:26     ` Roman Gushchin
2022-06-09 17:47     ` Shakeel Butt
2022-06-09 17:47       ` Shakeel Butt
2022-06-09 17:56       ` Roman Gushchin
2022-06-09 17:56         ` Roman Gushchin
2022-06-09 19:12         ` Shakeel Butt
2022-06-09 19:12           ` Shakeel Butt
2022-06-09 22:05           ` Roman Gushchin
2022-06-09 22:05             ` Roman Gushchin
2022-06-09 22:16             ` Shakeel Butt
2022-06-09 22:16               ` Shakeel Butt
2022-06-10 10:56     ` Naresh Kamboju
2022-06-10 10:56       ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a4cc632-c936-1e42-4fdc-572334c58ee1@openvz.org \
    --to=vvs@openvz.org \
    --cc=akpm@linux-foundation.org \
    --cc=ardb@kernel.org \
    --cc=arnd@arndb.de \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-next@vger.kernel.org \
    --cc=lkft-triage@lists.linaro.org \
    --cc=naresh.kamboju@linaro.org \
    --cc=quic_qiancai@quicinc.com \
    --cc=raghuram.thammiraju@arm.com \
    --cc=regressions@lists.linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=sfr@canb.auug.org.au \
    --cc=shakeelb@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.