From: Liam Howlett <liam.howlett@oracle.com>
To: Sven Schnelle <svens@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mapletree-vs-khugepaged
Date: Mon, 16 May 2022 14:02:09 +0000
Message-ID: <20220516140202.pcw2f6gu4kyslmjd@revolver>
In-Reply-To: <yt9dbkvy5zu0.fsf@linux.ibm.com>

* Sven Schnelle <svens@linux.ibm.com> [220515 16:02]:
> Liam Howlett <liam.howlett@oracle.com> writes:
> 
> > * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> >> Starting today we're still seeing the same crash with linux-next from
> >> (next-20220513):
> >>
> >> [  211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
> >> [  211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
> >> [  211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> >> [  211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
> >> [  211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
> >> [  211.937910] Fault in home space mode while using kernel ASCE.
> >> [  211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
> >> [  211.937914]  (mmap_region+0x19e/0x848)
> >> [  211.937929]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >> [  211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> >> [  211.937942]            ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
> >> [  211.937945]            0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
> >> [  211.937947]            000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
> >> [  211.937959] Krnl Code: 0000001ca52f06c6: a7740021            brc     7,0000001ca52f0708
> >> [  211.937959]            0000001ca52f06ca: ec6801b3007c        cgij    %r6,0,8,0000001ca52f0a30
> >> [  211.937959]           #0000001ca52f06d0: e310f0f80004        lg      %r1,248(%r15)
> >> [  211.937959]           >0000001ca52f06d6: e37010000020        cg      %r7,0(%r1)
> >> [  211.937959]            0000001ca52f06dc: a78400ea            brc     8,0000001ca52f08b0
> >> [  211.937959]            0000001ca52f06e0: e310f0f00004        lg      %r1,240(%r15)
> >> [  211.937959]            0000001ca52f06e6: ec180008007c        cgij    %r1,0,8,0000001ca52f06f6
> >> [  211.937959]            0000001ca52f06ec: e39010080020        cg      %r9,8(%r1)
> >> [  211.937973] Call Trace:
> >> [  211.937975]  [<0000001ca52f06d6>] mmap_region+0x19e/0x848
> >> [  211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
> >> [  211.937981]  [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
> >> [  211.937983]  [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
> >> [  211.937987]  [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
> >> [  211.937989]  [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
> >> [  211.937993]  [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
> >> [  211.937999]  [<0000001ca5c5d572>] system_call+0x82/0xb0
> >> [  211.938002] Last Breaking-Event-Address:
> >> [  211.938003]  [<0000001ca5888616>] mas_prev+0xb6/0xc0
> >> [  211.938010] Oops: 0038 ilc:3 [#2]
> >> [  211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
> >> [  211.938012] SMP
> >> [  211.938014] Modules linked in:
> >> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
> >> A50679A6
> >>
> >> Is that issue supposed to be fixed? git bisect pointed me to
> >>
> >> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> >>   'mm-everything' of
> >>   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >>
> >> which isn't really helpful.
> >>
> >> Anything we could help with debugging this?
> >
> > I tested the maple tree on top of the s390 as it was the same crash and
> > it was okay.  I haven't tested the mm-everything branch though.  Can you
> > test mm-unstable?
> 
> Yes, i tested mm-unstable but wasn't able to reproduce the issue.
> 
> > I'll continue setting up a sparc VM for testing here and test
> > mm-everything on that and the s390
> 
> One thing that is different compared to x86 is that both sparc and s390
> are big endian. Not sure whether and where that would make a difference.
> 
> The code to trigger the crash on s390 is rather simple: Just force a
> paging level upgrade to 5 levels by calling mmap() with an address that
> doesn't fit in 3 levels. Haven't tested whether an upgrade to 4 levels
> would be sufficient. I've condensed our test case that triggers this, and
> basically all that is required is:
> 
> --------------------------------8<---------------------------------------
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> #include <stdio.h>
> 
> #define PAGE_SIZE       0x1000
> #define _REGION1_SIZE   (1UL << 54)
> 
> int main(int argc, char *argv[])
> {
>         int pid, status;
>         void *addr;
> 
>         pid = fork();
>         if (pid == 0) {
>                 /*
>                  * Trigger page table level upgrade
>                  */
>                 addr = mmap((void *)_REGION1_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE,
>                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>                 if (addr == MAP_FAILED)
>                         return 1;
>                 *(int *)addr = 1;
>                 return 0;
>         }
>         wait(&status);
>         return 0;
> }
> --------------------------------8<---------------------------------------
> 

I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
but it runs without issue, return code is 0.  Is there something the VM
needs to have for this to trigger?

> I've added a few debug statements to the maple tree code:
> 
> [   27.769641] mas_next_entry: offset=14
> [   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14

Where exactly are you printing this?

> 
> I see in mas_next_nentry() that there's a while loop that iterates over the
> (used?) slots until count is reached.

Yes, mas_next_nentry() looks for the next non-null entry in the current
node.

> After that loop mas_next_entry()
> just picks the next (unused?) entry, which is slot 15 in that case.

mas_next_entry() returns the next non-null entry.  If there isn't one
returned by mas_next_nentry(), then it will advance to the next node by
calling mas_next_node().  There are checks in there for detecting dead
nodes for RCU use and limit checking as well.

> 
> What i noticed while scanning over include/linux/maple_tree.h is:
> 
> struct maple_range_64 {
> 	struct maple_pnode *parent;
> 	unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
> 	union {
> 		void __rcu *slot[MAPLE_RANGE64_SLOTS];
> 		struct {
> 			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
> 			struct maple_metadata meta;
> 		};
> 	};
> };
> 
> and struct maple_metadata is:
> 
> struct maple_metadata {
> 	unsigned char end;
> 	unsigned char gap;
> };
> 
> If i swap the gap and end members 0x0e00000000000000 becomes
> 0x000e000000000000. And 0xe matches our mas->offset 14 above.
> So it looks like mas_next() in mmap_region returns the meta
> data for the node.

If this is the case, then I think any task that has more than 14 VMAs
would have issues.  I also use mas_next_entry() in mas_find() which is
used for the mas_for_each() macro/iterator.  Can you please enable
CONFIG_DEBUG_VM_MAPLE_TREE?  mmap.c tests the tree after pretty much
any change and will dump useful information if there is an issue -
including the entire tree. See validate_mm_mt() for details.

You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees

> 
> So from the lines above you likely already guessed that i have no clue
> how maple tree works, and i didn't have enough time today to read all
> the magic and understand it. But i thought i just drop my observation
> here in case someone has an idea.

Thanks for sharing.  I'm having a hard time recreating the issue so I
cannot fully dig in myself.



I was able to boot sparc64 with mm-unstable.  I did get an error:
[    5.002625] Kernel unaligned access at TPC[59bae8] mmap_region+0x168/0xb00

faddr2line is less than useful though with reported line "at ??:?"

I'll keep digging into that.

Thanks,
Liam
