Re: [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup

* Re: [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
       [not found] ` <bug-198497-200779-43rwxa1kcg@https.bugzilla.kernel.org/>
@ 2018-04-20 13:10   ` Jason Andryuk
  2018-04-20 13:39     ` Matthew Wilcox
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Andryuk @ 2018-04-20 13:10 UTC (permalink / raw)
  To: bugzilla-daemon, willy, akpm, linux-mm, labbott

On Thu, Apr 12, 2018 at 1:28 PM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>
> --- Comment #25 from willy@infradead.org ---
> On Thu, Apr 12, 2018 at 10:12:09AM -0700, Andrew Morton wrote:
>> On Fri, 9 Feb 2018 06:47:26 -0800 Matthew Wilcox <willy@infradead.org> wrote:
>>
>> >
>> > ping?
>> >
>>
>> There have been a bunch of updates to this issue in bugzilla
>> (https://bugzilla.kernel.org/show_bug.cgi?id=198497).  Sigh, I don't
>> know what to do about this - maybe there's some way of getting bugzilla
>> to echo everything to linux-mm or something.
>>
>> Anyway, please take a look - we appear to have a bug here.  Perhaps
>> this bug is sufficiently gnarly for you to prepare a debugging patch
>> which we can add to the mainline kernel so we get (much) more debugging
>> info when people hit it?
>
> I have a few thoughts ...
>
>  - The debugging patch I prepared appears to be doing its job well.
>    People get the message and their machine stays working.
>  - The commonality appears to be Xen running 32-bit kernels.  Maybe we
>    can kick the problem over to them to solve?
>  - If we are seeing corruption purely in the lower bits, *we'll never
>    know*.  The radix tree lookup will simply not find anything, and all
>    will be well.  That said, the bad PTE values reported in that bug have
>    the NX bit and one other bit set; generally bit 32, 33 or 34.  I have
>    an idea for adding a parity bit, but haven't had time to implement it.
>    Anyone have an intern who wants an interesting kernel project to work on?
>
> Given that this is happening on Xen, I wonder if Xen is using some of the
> bits in the page table for its own purposes.

The backtraces include do_swap_page().  While I have a swap partition
configured, I don't think it's being used.  Are we somehow
misidentifying the page as a swap page?  I'm not familiar with the
code, but is there an easy way to query global swap usage?  That way
we can see if the check for a swap page is bogus.

My system works with the band-aid patch.  When that patch sets page =
NULL, does that mean userspace is just going to get a zeroed page?
Userspace still works AFAICT, which makes me think it is a
misidentified page to start with.

Regards,
Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 13:10   ` [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer Jason Andryuk
@ 2018-04-20 13:39     ` Matthew Wilcox
  2018-04-20 15:20         ` Jason Andryuk
  0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2018-04-20 13:39 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: bugzilla-daemon, akpm, linux-mm, labbott

On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
> > Given that this is happening on Xen, I wonder if Xen is using some of the
> > bits in the page table for its own purposes.
> 
> The backtraces include do_swap_page().  While I have a swap partition
> configured, I don't think it's being used.  Are we somehow
> misidentifying the page as a swap page?  I'm not familiar with the
> code, but is there an easy way to query global swap usage?  That way
> we can see if the check for a swap page is bogus.
> 
> My system works with the band-aid patch.  When that patch sets page =
> NULL, does that mean userspace is just going to get a zeroed page?
> Userspace still works AFAICT, which makes me think it is a
> misidentified page to start with.

Here's how this code works.

When we swap out an anonymous page (a page which is not backed by a
file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
to the swap cache.  In order to be able to find it again, we store a
cookie (called a swp_entry_t) in the process' page table (marked with
the 'present' bit clear, so the CPU will fault on it).  When we get a
fault, we look up the cookie in a radix tree and bring that page back
in from swap.

If there's no page found in the radix tree, we put a freshly zeroed
page into the process's address space.  That's because we won't find
a page in the swap cache's radix tree for the first time we fault.
It's not an indication of a bug if there's no page to be found.

What we're seeing for this bug is page table entries of the format
0x8000'0004'0000'0000.  That would be a zeroed entry, except for the
fact that something's stepped on the upper bits.

What is worrying is that potentially Xen might be stepping on the upper
bits of either a present entry (leading to the process loading a page
that belongs to someone else) or an entry which has been swapped out,
leading to the process getting a zeroed page when it should be getting
its page back from swap.

Defending against this kind of corruption would take adding a parity
bit to the page tables.  That's not a project I have time for right now.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 13:39     ` Matthew Wilcox
@ 2018-04-20 15:20         ` Jason Andryuk
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Andryuk @ 2018-04-20 15:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: bugzilla-daemon, akpm, linux-mm, labbott, xen-devel,
	Boris Ostrovsky, Juergen Gross

Adding xen-devel and the Linux Xen maintainers.

Summary: Some Xen users (and maybe others) are hitting a BUG in
__radix_tree_lookup() under do_swap_page() - example backtrace is
provided at the end.  Matthew Wilcox provided a band-aid patch that
prints errors like the following instead of triggering the bug.

Skylake 32bit PAE Dom0:
Bad swp_entry: 80000000
mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)

Ivy Bridge 32bit PAE Dom0:
Bad swp_entry: 40000000
mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)

Other 32bit DomU:
Bad swp_entry: 4000000
mm/swap_state.c:683: bad pte e2187f30(8000000200000000)

Other 32bit:
Bad swp_entry: 2000000
mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)

The Linux bugzilla has more info
https://bugzilla.kernel.org/show_bug.cgi?id=198497

This may not be exclusive to Xen Linux, but most of the reports are on
Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
pte.

On Fri, Apr 20, 2018 at 9:39 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
>> > Given that this is happening on Xen, I wonder if Xen is using some of the
>> > bits in the page table for its own purposes.
>>
>> The backtraces include do_swap_page().  While I have a swap partition
>> configured, I don't think it's being used.  Are we somehow
>> misidentifying the page as a swap page?  I'm not familiar with the
>> code, but is there an easy way to query global swap usage?  That way
>> we can see if the check for a swap page is bogus.
>>
>> My system works with the band-aid patch.  When that patch sets page =
>> NULL, does that mean userspace is just going to get a zeroed page?
>> Userspace still works AFAICT, which makes me think it is a
>> misidentified page to start with.
>
> Here's how this code works.

Thanks for the description.

> When we swap out an anonymous page (a page which is not backed by a
> file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
> to the swap cache.  In order to be able to find it again, we store a
> cookie (called a swp_entry_t) in the process' page table (marked with
> the 'present' bit clear, so the CPU will fault on it).  When we get a
> fault, we look up the cookie in a radix tree and bring that page back
> in from swap.
>
> If there's no page found in the radix tree, we put a freshly zeroed
> page into the process's address space.  That's because we won't find
> a page in the swap cache's radix tree for the first time we fault.
> It's not an indication of a bug if there's no page to be found.

Is "no page found" the case for a lazy, unallocated MAP_ANONYMOUS page?

> What we're seeing for this bug is page table entries of the format
> 0x8000'0004'0000'0000.  That would be a zeroed entry, except for the
> fact that something's stepped on the upper bits.

Does a totally zeroed entry correspond to an unallocated MAP_ANONYMOUS page?

> What is worrying is that potentially Xen might be stepping on the upper
> bits of either a present entry (leading to the process loading a page
> that belongs to someone else) or an entry which has been swapped out,
> leading to the process getting a zeroed page when it should be getting
> its page back from swap.

There was at least one report of non-Xen 32bit being affected.  There
was no backtrace, so it could be something else.  One report doesn't
have any swap configured.

> Defending against this kind of corruption would take adding a parity
> bit to the page tables.  That's not a project I have time for right now.

Understood.  Thanks for the response.

Regards,
Jason


[ 2234.939079] BUG: unable to handle kernel NULL pointer dereference at 00000008
[ 2234.942154] IP: __radix_tree_lookup+0xe/0xa0
[ 2234.945176] *pdpt = 0000000008cd5027 *pde = 0000000000000000
[ 2234.948382] Oops: 0000 [#1] SMP
[ 2234.951410] Modules linked in: hp_wmi sparse_keymap rfkill wmi_bmof
pcspkr i915 wmi hp_accel lis3lv02d input_polldev drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops drm hp_wireless
i2c_algo_bit hid_multitouch sha256_generic xen_netfront v4v(O) psmouse
ecb xts hid_generic xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd
ehci_pci ehci_hcd usbhid hid tpm_tis tpm_tis_core tpm
[ 2234.960816] CPU: 1 PID: 2338 Comm: xenvm Tainted: G           O    4.14.18 #1
[ 2234.963991] Hardware name: Hewlett-Packard HP EliteBook Folio
9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
[ 2234.967186] task: d4370980 task.stack: cf8e8000
[ 2234.970351] EIP: __radix_tree_lookup+0xe/0xa0
[ 2234.973520] EFLAGS: 00010286 CPU: 1
[ 2234.976699] EAX: 00000004 EBX: b5900000 ECX: 00000000 EDX: 00000000
[ 2234.979887] ESI: 00000000 EDI: 00000004 EBP: cf8e9dd0 ESP: cf8e9dc0
[ 2234.983081]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 2234.986233] CR0: 80050033 CR2: 00000008 CR3: 08f12000 CR4: 00042660
[ 2234.989340] Call Trace:
[ 2234.992354]  radix_tree_lookup_slot+0x1d/0x50
[ 2234.995341]  ? xen_irq_disable_direct+0xc/0xc
[ 2234.998288]  find_get_entry+0x1d/0x110
[ 2235.001140]  pagecache_get_page+0x1f/0x240
[ 2235.003948]  ? xen_flush_tlb_others+0x17b/0x260
[ 2235.006784]  lookup_swap_cache+0x32/0xe0
[ 2235.009632]  swap_readahead_detect+0x67/0x2c0
[ 2235.012447]  do_swap_page+0x10a/0x750
[ 2235.015270]  ? wp_page_copy+0x2c4/0x590
[ 2235.018043]  ? xen_pmd_val+0x11/0x20
[ 2235.020729]  handle_mm_fault+0x3f8/0x970
[ 2235.023352]  ? xen_smp_send_reschedule+0xa/0x10
[ 2235.025927]  ? resched_curr+0x68/0xc0
[ 2235.028444]  __do_page_fault+0x1a7/0x480
[ 2235.030883]  do_page_fault+0x33/0x110
[ 2235.033250]  ? do_fast_syscall_32+0xb3/0x200
[ 2235.035567]  ? vmalloc_sync_all+0x290/0x290
[ 2235.037828]  common_exception+0x84/0x8a
[ 2235.040011] EIP: 0xb7c8ddea
[ 2235.042111] EFLAGS: 00010202 CPU: 1
[ 2235.044153] EAX: b7dd38d0 EBX: b7dd2780 ECX: b7dd2000 EDX: b5900010
[ 2235.046176] ESI: 00000000 EDI: b7dd38f0 EBP: b56ff124 ESP: b56ff070
[ 2235.048152]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 2235.050053] Code: 42 14 29 c6 89 f0 c1 f8 02 e9 71 ff ff ff e8 aa
81 aa ff 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 57 89 c7 56 53 83 ec
04 89 4d f0 <8b> 5f 04 89 d8 83 e0 03 83 f8 01 75 67 89 d8 83 e0 fe 0f
b6 08
[ 2235.053998] EIP: __radix_tree_lookup+0xe/0xa0 SS:ESP: 0069:cf8e9dc0
[ 2235.055895] CR2: 0000000000000008

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:20         ` Jason Andryuk
@ 2018-04-20 15:25           ` Andrew Cooper
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Cooper @ 2018-04-20 15:25 UTC (permalink / raw)
  To: Jason Andryuk, Matthew Wilcox
  Cc: Juergen Gross, bugzilla-daemon, xen-devel, linux-mm,
	Boris Ostrovsky, labbott, akpm

On 20/04/18 16:20, Jason Andryuk wrote:
> Adding xen-devel and the Linux Xen maintainers.
>
> Summary: Some Xen users (and maybe others) are hitting a BUG in
> __radix_tree_lookup() under do_swap_page() - example backtrace is
> provided at the end.  Matthew Wilcox provided a band-aid patch that
> prints errors like the following instead of triggering the bug.
>
> Skylake 32bit PAE Dom0:
> Bad swp_entry: 80000000
> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>
> Ivy Bridge 32bit PAE Dom0:
> Bad swp_entry: 40000000
> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>
> Other 32bit DomU:
> Bad swp_entry: 4000000
> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>
> Other 32bit:
> Bad swp_entry: 2000000
> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>
> The Linux bugzilla has more info
> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>
> This may not be exclusive to Xen Linux, but most of the reports are on
> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
> pte.

Yes - Xen does use the upper bits of a PTE, but only 1 in release
builds, and a second in debug builds.  I don't understand where you're
getting the 3rd bit in there.

The use of these bits is dubious and not adequately described in the
ABI, and attempts to improve the state of play have come to nothing in
the past.

~Andrew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:25           ` Andrew Cooper
@ 2018-04-20 15:40             ` Andrew Cooper
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Cooper @ 2018-04-20 15:40 UTC (permalink / raw)
  To: Jason Andryuk, Matthew Wilcox
  Cc: Juergen Gross, bugzilla-daemon, xen-devel, linux-mm,
	Boris Ostrovsky, labbott, akpm

On 20/04/18 16:25, Andrew Cooper wrote:
> On 20/04/18 16:20, Jason Andryuk wrote:
>> Adding xen-devel and the Linux Xen maintainers.
>>
>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>> prints errors like the following instead of triggering the bug.
>>
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 80000000
>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 40000000
>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 4000000
>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>
>> Other 32bit:
>> Bad swp_entry: 2000000
>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>
>> The Linux bugzilla has more info
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>
>> This may not be exclusive to Xen Linux, but most of the reports are on
>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>> pte.
> Yes - Xen does use the upper bits of a PTE, but only 1 in release
> builds, and a second in debug builds.  I don't understand where you're
> getting the 3rd bit in there.
>
> The use of these bits is dubious and not adequately described in the
> ABI, and attempts to improve the state of play have come to nothing in
> the past.

Sorry - hit send too early.  To be rather more helpful:

For 64bit guests only, we use one bit to distinguish between guest
kernel and guest user pages.  This is because both guest user and kernel
run in ring3, and have to have _PAGE_USER set on them.  We use bit 52 to
tag guest kernel mappings, which is seeded from the guest kernel's choice
of _PAGE_USER.

In debug builds of the hypervisor only, we use bit 62 to tag grant
mappings.  This is to help spot API errors in the guest, and results in
an instant crash if we spot misuse.

~Andrew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:25           ` Andrew Cooper
@ 2018-04-20 15:42             ` Jan Beulich
  -1 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2018-04-20 15:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: bugzilla-daemon, Jason Andryuk, Matthew Wilcox, linux-mm, akpm,
	xen-devel, Boris Ostrovsky, labbott, Juergen Gross

>>> On 20.04.18 at 17:25, <andrew.cooper3@citrix.com> wrote:
> On 20/04/18 16:20, Jason Andryuk wrote:
>> Adding xen-devel and the Linux Xen maintainers.
>>
>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>> prints errors like the following instead of triggering the bug.
>>
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 80000000
>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 40000000
>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 4000000
>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>
>> Other 32bit:
>> Bad swp_entry: 2000000
>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>
>> The Linux bugzilla has more info
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497 
>>
>> This may not be exclusive to Xen Linux, but most of the reports are on
>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>> pte.
> 
> Yes - Xen does use the upper bits of a PTE, but only 1 in release
> builds, and a second in debug builds.  I don't understand where you're
> getting the 3rd bit in there.

The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
guests only. The talk above is of 32-bit guests only.

In addition, both this and _PAGE_GNTTAB are used on present PTEs only,
while the talk above is about swap entries.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:42             ` Jan Beulich
@ 2018-04-20 15:52               ` Jason Andryuk
  -1 siblings, 0 replies; 28+ messages in thread
From: Jason Andryuk @ 2018-04-20 15:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, bugzilla-daemon, Matthew Wilcox, linux-mm, akpm,
	xen-devel, Boris Ostrovsky, labbott, Juergen Gross

On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 20.04.18 at 17:25, <andrew.cooper3@citrix.com> wrote:
>> On 20/04/18 16:20, Jason Andryuk wrote:
>>> Adding xen-devel and the Linux Xen maintainers.
>>>
>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>> prints errors like the following instead of triggering the bug.
>>>
>>> Skylake 32bit PAE Dom0:
>>> Bad swp_entry: 80000000
>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>
>>> Ivy Bridge 32bit PAE Dom0:
>>> Bad swp_entry: 40000000
>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>
>>> Other 32bit DomU:
>>> Bad swp_entry: 4000000
>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>
>>> Other 32bit:
>>> Bad swp_entry: 2000000
>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>
>>> The Linux bugzilla has more info
>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>
>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>> pte.
>>
>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>> builds, and a second in debug builds.  I don't understand where you're
>> getting the 3rd bit in there.
>
> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
> guests only. Above talk is of 32-bit guests only.
>
> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
> while above talk is about swap entries.

This hits a BUG going through do_swap_page, but it seems like users
don't think they are actually using swap at the time.  One reporter
didn't have any swap configured.  Some of this information was further
down in my original message.

I'm wondering if somehow we have a PTE that should be empty and should
be lazily filled.  For some reason, the entry has some bits set and is
causing the trouble.  Would Xen mess with the PTEs in that case?
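
The hypothesis can be made concrete with a rough model of how the fault handler picks a path: in mm/memory.c, handle_pte_fault() treats an empty entry as one to populate lazily and sends any other non-present entry to do_swap_page(). The sketch below is a simplification under stated assumptions. It treats only an all-zero entry as "none", lumps PROT_NONE (NUMA-hinting) entries in with present ones, ignores huge pages, and its function names are illustrative.

```python
PAGE_PRESENT = 1 << 0   # x86 _PAGE_PRESENT
PAGE_PROTNONE = 1 << 8  # x86 _PAGE_PROTNONE (shares the _PAGE_GLOBAL bit)


def pte_none(pte):
    """Simplified: only an all-zero entry counts as empty."""
    return pte == 0


def pte_present(pte):
    """Like x86 pte_present(): present or PROT_NONE entries count as present."""
    return bool(pte & (PAGE_PRESENT | PAGE_PROTNONE))


def fault_path(pte, anonymous=True):
    """Rough model of the handler choice made for a faulting PTE."""
    if pte_none(pte):
        # Empty entry: filled lazily on first touch.
        return "do_anonymous_page" if anonymous else "do_fault"
    if not pte_present(pte):
        # Any other non-present entry is decoded as a swap entry.
        return "do_swap_page"
    return "present"


# An entry that should be empty but has stray NX and bit-34 bits set
# takes the swap path in this model, even with no swap configured.
print(fault_path(0))                   # do_anonymous_page
print(fault_path(0x8000000400000000))  # do_swap_page
```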

Thanks,
Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:52               ` Jason Andryuk
@ 2018-04-20 16:00                 ` Andrew Cooper
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Cooper @ 2018-04-20 16:00 UTC (permalink / raw)
  To: Jason Andryuk, Jan Beulich
  Cc: bugzilla-daemon, Matthew Wilcox, linux-mm, akpm, xen-devel,
	Boris Ostrovsky, labbott, Juergen Gross

On 20/04/18 16:52, Jason Andryuk wrote:
> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 20.04.18 at 17:25, <andrew.cooper3@citrix.com> wrote:
>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>
>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>> prints errors like the following instead of triggering the bug.
>>>>
>>>> Skylake 32bit PAE Dom0:
>>>> Bad swp_entry: 80000000
>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>
>>>> Ivy Bridge 32bit PAE Dom0:
>>>> Bad swp_entry: 40000000
>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>
>>>> Other 32bit DomU:
>>>> Bad swp_entry: 4000000
>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>
>>>> Other 32bit:
>>>> Bad swp_entry: 2000000
>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>>
>>>> The Linux bugzilla has more info
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>
>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>> pte.
>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>> builds, and a second in debug builds.  I don't understand where you're
>>> getting the 3rd bit in there.
>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>> guests only. Above talk is of 32-bit guests only.
>>
>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>> while above talk is about swap entries.
> This hits a BUG going through do_swap_page, but it seems like users
> don't think they are actually using swap at the time.  One reporter
> didn't have any swap configured.  Some of this information was further
> down in my original message.
>
> I'm wondering if somehow we have a PTE that should be empty and should
> be lazily filled.  For some reason, the entry has some bits set and is
> causing the trouble.  Would Xen mess with the PTEs in that case?

Any PTE with the present bit clear will be accepted and used
unmodified.  That said, I believe there is some batching of updates for
efficiency reasons in the PVops layer of the kernel, which might end up
causing a disconnect between what the swap system thinks, and what the
actual PTEs show when read.
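
The batching concern can be pictured with a toy model: while updates are queued, a reader of the page table still sees the old value. The class below is hypothetical. It does not reproduce the kernel's lazy MMU mode (arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode()) or Xen's multicall batching; it only shows the kind of window in which what one layer believes and what the table actually holds can differ.

```python
class ToyLazyPageTable:
    """Hypothetical model: updates made in lazy mode are queued until flushed."""

    def __init__(self, size):
        self.entries = [0] * size  # the values a reader of the table sees
        self.pending = []          # queued (index, value) updates
        self.lazy = False

    def enter_lazy(self):
        """Start batching updates instead of applying them immediately."""
        self.lazy = True

    def set_pte(self, index, value):
        if self.lazy:
            self.pending.append((index, value))  # deferred until leave_lazy()
        else:
            self.entries[index] = value

    def leave_lazy(self):
        """Apply queued updates in order, then stop batching."""
        for index, value in self.pending:
            self.entries[index] = value
        self.pending.clear()
        self.lazy = False

    def read_pte(self, index):
        return self.entries[index]


table = ToyLazyPageTable(4)
table.set_pte(2, 0x1234)   # applied immediately
table.enter_lazy()
table.set_pte(2, 0)        # queued: the caller now believes slot 2 is empty
stale = table.read_pte(2)  # still 0x1234 until the batch is flushed
table.leave_lazy()
fresh = table.read_pte(2)  # 0 after the flush
print(hex(stale), hex(fresh))
```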

~Andrew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:52               ` Jason Andryuk
@ 2018-04-20 16:02                 ` Jan Beulich
  -1 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2018-04-20 16:02 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: bugzilla-daemon, Andrew Cooper, Matthew Wilcox, linux-mm, akpm,
	xen-devel, Boris Ostrovsky, labbott, Juergen Gross

>>> On 20.04.18 at 17:52, <jandryuk@gmail.com> wrote:
> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 20.04.18 at 17:25, <andrew.cooper3@citrix.com> wrote:
>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>
>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>> prints errors like the following instead of triggering the bug.
>>>>
>>>> Skylake 32bit PAE Dom0:
>>>> Bad swp_entry: 80000000
>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>
>>>> Ivy Bridge 32bit PAE Dom0:
>>>> Bad swp_entry: 40000000
>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>
>>>> Other 32bit DomU:
>>>> Bad swp_entry: 4000000
>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>
>>>> Other 32bit:
>>>> Bad swp_entry: 2000000
>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>>
>>>> The Linux bugzilla has more info
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497 
>>>>
>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>> pte.
>>>
>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>> builds, and a second in debug builds.  I don't understand where you're
>>> getting the 3rd bit in there.
>>
>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>> guests only. Above talk is of 32-bit guests only.
>>
>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>> while above talk is about swap entries.
> 
> This hits a BUG going through do_swap_page, but it seems like users
> don't think they are actually using swap at the time.  One reporter
> didn't have any swap configured.  Some of this information was further
> down in my original message.
> 
> I'm wondering if somehow we have a PTE that should be empty and should
> be lazily filled.  For some reason, the entry has some bits set and is
> causing the trouble.  Would Xen mess with the PTEs in that case?

As said in my previous reply - both of the bits Andrew has mentioned can
only ever be set when the present bit is also set (which doesn't appear to
be the case here). The set bits above are actually in the range of bits
designated to the address, which Xen wouldn't ever play with.
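
The address-range observation can be checked by splitting a reported entry into fields. The sketch assumes the architectural layout of a 64-bit entry: flag bits 0 to 11, the frame address field from bit 12 up to at most bit 51 (the real width depends on the CPU's physical address size, and 32-bit Linux may limit it further), NX at bit 63, and bits 52 to 62 grouped as "other". The names are illustrative.

```python
FLAGS_MASK = (1 << 12) - 1                 # bits 0..11: flag bits
ADDR_MASK = ((1 << 52) - 1) & ~FLAGS_MASK  # bits 12..51: frame address field
NX_MASK = 1 << 63                          # bit 63: no-execute
OTHER_MASK = ~(FLAGS_MASK | ADDR_MASK | NX_MASK) & ((1 << 64) - 1)  # bits 52..62


def decode_pte(pte):
    """Split a 64-bit entry into flag, address, NX and remaining bits."""
    return {
        "flags": pte & FLAGS_MASK,
        "addr": pte & ADDR_MASK,
        "nx": bool(pte & NX_MASK),
        "other": pte & OTHER_MASK,
    }


for pte in (0x8000000400000000, 0x8000000200000000, 0x8000000100000000):
    fields = decode_pte(pte)
    print(f"{pte:016x}: flags={fields['flags']:#x} addr={fields['addr']:#x} "
          f"nx={fields['nx']} other={fields['other']:#x}")
```

For every reported value the flag bits and the "other" bits are zero, and the only non-NX content sits in the address field.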

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 16:02                 ` Jan Beulich
@ 2018-04-20 19:20                   ` Boris Ostrovsky
  -1 siblings, 0 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2018-04-20 19:20 UTC (permalink / raw)
  To: Jan Beulich, Jason Andryuk
  Cc: bugzilla-daemon, Andrew Cooper, Matthew Wilcox, linux-mm, akpm,
	xen-devel, labbott, Juergen Gross

On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>> On 20.04.18 at 17:52, <jandryuk@gmail.com> wrote:
>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 20.04.18 at 17:25, <andrew.cooper3@citrix.com> wrote:
>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>
>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>>> prints errors like the following instead of triggering the bug.
>>>>>
>>>>> Skylake 32bit PAE Dom0:
>>>>> Bad swp_entry: 80000000
>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>>
>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>> Bad swp_entry: 40000000
>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>>
>>>>> Other 32bit DomU:
>>>>> Bad swp_entry: 4000000
>>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>>
>>>>> Other 32bit:
>>>>> Bad swp_entry: 2000000
>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>>>
>>>>> The Linux bugzilla has more info
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497 
>>>>>
>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>>> pte.
>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>> builds, and a second in debug builds.  I don't understand where you're
>>>> getting the 3rd bit in there.
>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>> guests only. Above talk is of 32-bit guests only.
>>>
>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>> while above talk is about swap entries.
>> This hits a BUG going through do_swap_page, but it seems like users
>> don't think they are actually using swap at the time.  One reporter
>> didn't have any swap configured.  Some of this information was further
>> down in my original message.
>>
>> I'm wondering if somehow we have a PTE that should be empty and should
>> be lazily filled.  For some reason, the entry has some bits set and is
>> causing the trouble.  Would Xen mess with the PTEs in that case?
> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.


The bug description starts with: "On a Xen VM running as pvh"

So is this a PV or a PVH guest?


-boris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 19:20                   ` Boris Ostrovsky
@ 2018-04-21  6:17                     ` Juergen Gross
  -1 siblings, 0 replies; 28+ messages in thread
From: Juergen Gross @ 2018-04-21  6:17 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich, Jason Andryuk
  Cc: bugzilla-daemon, Andrew Cooper, Matthew Wilcox, linux-mm, akpm,
	xen-devel, labbott

On 20/04/18 21:20, Boris Ostrovsky wrote:
> On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>>> On 20.04.18 at 17:52, <jandryuk@gmail.com> wrote:
>>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 20.04.18 at 17:25, <andrew.cooper3@citrix.com> wrote:
>>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>>
>>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>>>> prints errors like the following instead of triggering the bug.
>>>>>>
>>>>>> Skylake 32bit PAE Dom0:
>>>>>> Bad swp_entry: 80000000
>>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>>>
>>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>>> Bad swp_entry: 40000000
>>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>>>
>>>>>> Other 32bit DomU:
>>>>>> Bad swp_entry: 4000000
>>>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>>>
>>>>>> Other 32bit:
>>>>>> Bad swp_entry: 2000000
>>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>>>>
>>>>>> The Linux bugzilla has more info
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497 
>>>>>>
>>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>>>> pte.
>>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>>> builds, and a second in debug builds.  I don't understand where you're
>>>>> getting the 3rd bit in there.
>>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>>> guests only. Above talk is of 32-bit guests only.
>>>>
>>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>>> while above talk is about swap entries.
>>> This hits a BUG going through do_swap_page, but it seems like users
>>> don't think they are actually using swap at the time.  One reporter
>>> didn't have any swap configured.  Some of this information was further
>>> down in my original message.
>>>
>>> I'm wondering if somehow we have a PTE that should be empty and should
>>> be lazily filled.  For some reason, the entry has some bits set and is
>>> causing the trouble.  Would Xen mess with the PTEs in that case?
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here). The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
> 
> 
> The bug description starts with: "On a Xen VM running as pvh"
> 
> So is this a PV or a PVH guest?

The stack backtrace suggests PV.


Juergen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 16:02                 ` Jan Beulich
@ 2018-04-21 14:35                   ` Matthew Wilcox
  -1 siblings, 0 replies; 28+ messages in thread
From: Matthew Wilcox @ 2018-04-21 14:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jason Andryuk, bugzilla-daemon, Andrew Cooper, linux-mm, akpm,
	xen-devel, Boris Ostrovsky, labbott, Juergen Gross

On Fri, Apr 20, 2018 at 10:02:29AM -0600, Jan Beulich wrote:
> >>>> Skylake 32bit PAE Dom0:
> >>>> Bad swp_entry: 80000000
> >>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
> >>>>
> >>>> Ivy Bridge 32bit PAE Dom0:
> >>>> Bad swp_entry: 40000000
> >>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
> >>>>
> >>>> Other 32bit DomU:
> >>>> Bad swp_entry: 4000000
> >>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
> >>>>
> >>>> Other 32bit:
> >>>> Bad swp_entry: 2000000
> >>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)

> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.

Is it relevant that all the crashes we've seen are with PAE in the guest?
Is it possible that Xen thinks the guest is not using PAE?
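For reference, the reported values can be checked against the x86 PAE entry layout with a small standalone C sketch. It assumes the Intel SDM layout: present at bit 0, NX at bit 63, and the physical address in bits 12 to 51 for a MAXPHYADDR of 52. The macro and helper names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the x86 PAE page-table entry format. */
#define PTE_BIT_PRESENT 0
#define PTE_BIT_NX      63
/* Assumption: the physical-address field spans bits 12..51 (MAXPHYADDR = 52). */
#define PTE_ADDR_FIRST  12
#define PTE_ADDR_LAST   51

/* Return 1 if bit n of a 64-bit PAE entry is set, 0 otherwise. */
static int pte_bit(uint64_t pte, unsigned n)
{
    assert(n < 64);
    return (int)((pte >> n) & 1u);
}

/* Return the lowest set bit within the address field, or -1 if it is empty. */
static int pte_lowest_addr_bit(uint64_t pte)
{
    for (unsigned n = PTE_ADDR_FIRST; n <= PTE_ADDR_LAST; n++)
        if (pte_bit(pte, n))
            return (int)n;
    return -1;
}

/* Count the set bits within the address field. */
static int pte_addr_bit_count(uint64_t pte)
{
    int count = 0;
    for (unsigned n = PTE_ADDR_FIRST; n <= PTE_ADDR_LAST; n++)
        count += pte_bit(pte, n);
    return count;
}
```

For 0x8000000400000000 these helpers report present = 0, NX = 1, and bit 34 as the only address bit set. The remaining reports carry bit 33 or bit 32 instead.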

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-21 14:35                   ` Matthew Wilcox
@ 2018-04-22  5:50                     ` Juergen Gross
  -1 siblings, 0 replies; 28+ messages in thread
From: Juergen Gross @ 2018-04-22  5:50 UTC (permalink / raw)
  To: Matthew Wilcox, Jan Beulich
  Cc: Jason Andryuk, bugzilla-daemon, Andrew Cooper, linux-mm, akpm,
	xen-devel, Boris Ostrovsky, labbott

On 21/04/18 16:35, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 10:02:29AM -0600, Jan Beulich wrote:
>>>>>> Skylake 32bit PAE Dom0:
>>>>>> Bad swp_entry: 80000000
>>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>>>
>>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>>> Bad swp_entry: 40000000
>>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>>>
>>>>>> Other 32bit DomU:
>>>>>> Bad swp_entry: 4000000
>>>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>>>
>>>>>> Other 32bit:
>>>>>> Bad swp_entry: 2000000
>>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
> 
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here). The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
> 
> Is it relevant that all the crashes we've seen are with PAE in the guest?
> Is it possible that Xen thinks the guest is not using PAE?
> 

All Xen 32-bit PV guests use PAE. It's part of the PV ABI.


Juergen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-20 15:20         ` Jason Andryuk
@ 2018-04-23  8:17           ` Juergen Gross
  -1 siblings, 0 replies; 28+ messages in thread
From: Juergen Gross @ 2018-04-23  8:17 UTC (permalink / raw)
  To: Jason Andryuk, Matthew Wilcox
  Cc: bugzilla-daemon, akpm, linux-mm, labbott, xen-devel, Boris Ostrovsky

On 20/04/18 17:20, Jason Andryuk wrote:
> Adding xen-devel and the Linux Xen maintainers.
> 
> Summary: Some Xen users (and maybe others) are hitting a BUG in
> __radix_tree_lookup() under do_swap_page() - example backtrace is
> provided at the end.  Matthew Wilcox provided a band-aid patch that
> prints errors like the following instead of triggering the bug.
> 
> Skylake 32bit PAE Dom0:
> Bad swp_entry: 80000000
> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
> 
> Ivy Bridge 32bit PAE Dom0:
> Bad swp_entry: 40000000
> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
> 
> Other 32bit DomU:
> Bad swp_entry: 4000000
> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
> 
> Other 32bit:
> Bad swp_entry: 2000000
> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
> 
> The Linux bugzilla has more info
> https://bugzilla.kernel.org/show_bug.cgi?id=198497
> 
> This may not be exclusive to Xen Linux, but most of the reports are on
> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
> pte.
> 
> On Fri, Apr 20, 2018 at 9:39 AM, Matthew Wilcox <willy@infradead.org> wrote:
>> On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
>>>> Given that this is happening on Xen, I wonder if Xen is using some of the
>>>> bits in the page table for its own purposes.
>>>
>>> The backtraces include do_swap_page().  While I have a swap partition
>>> configured, I don't think it's being used.  Are we somehow
>>> misidentifying the page as a swap page?  I'm not familiar with the
>>> code, but is there an easy way to query global swap usage?  That way
>>> we can see if the check for a swap page is bogus.
>>>
>>> My system works with the band-aid patch.  When that patch sets page =
>>> NULL, does that mean userspace is just going to get a zeroed page?
>>> Userspace still works AFAICT, which makes me think it is a
>>> misidentified page to start with.
>>
>> Here's how this code works.
> 
> Thanks for the description.
> 
>> When we swap out an anonymous page (a page which is not backed by a
>> file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
>> to the swap cache.  In order to be able to find it again, we store a
>> cookie (called a swp_entry_t) in the process' page table (marked with
>> the 'present' bit clear, so the CPU will fault on it).  When we get a
>> fault, we look up the cookie in a radix tree and bring that page back
>> in from swap.
>>
>> If there's no page found in the radix tree, we put a freshly zeroed
>> page into the process's address space.  That's because we won't find
>> a page in the swap cache's radix tree for the first time we fault.
>> It's not an indication of a bug if there's no page to be found.
> 
> Is "no page found" the case for a lazy, unallocated MAP_ANONYMOUS page?
> 
>> What we're seeing for this bug is page table entries of the format
>> 0x8000'0004'0000'0000.  That would be a zeroed entry, except for the
>> fact that something's stepped on the upper bits.
> 
> Does a fully zeroed entry correspond to an unallocated MAP_ANONYMOUS page?
> 
>> What is worrying is that potentially Xen might be stepping on the upper
>> bits of either a present entry (leading to the process loading a page
>> that belongs to someone else) or an entry which has been swapped out,
>> leading to the process getting a zeroed page when it should be getting
>> its page back from swap.
> 
> There was at least one report of a non-Xen 32bit kernel being affected.
> There was no backtrace, so it could be something else.  One report
> doesn't have any swap configured.
> 
>> Defending against this kind of corruption would take adding a parity
>> bit to the page tables.  That's not a project I have time for right now.
> 
> Understood.  Thanks for the response.
> 
> Regards,
> Jason
> 
> 
> [ 2234.939079] BUG: unable to handle kernel NULL pointer dereference at 00000008
> [ 2234.942154] IP: __radix_tree_lookup+0xe/0xa0
> [ 2234.945176] *pdpt = 0000000008cd5027 *pde = 0000000000000000
> [ 2234.948382] Oops: 0000 [#1] SMP
> [ 2234.951410] Modules linked in: hp_wmi sparse_keymap rfkill wmi_bmof pcspkr i915 wmi hp_accel lis3lv02d input_polldev drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm hp_wireless i2c_algo_bit hid_multitouch sha256_generic xen_netfront v4v(O) psmouse ecb xts hid_generic xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd ehci_pci ehci_hcd usbhid hid tpm_tis tpm_tis_core tpm
> [ 2234.960816] CPU: 1 PID: 2338 Comm: xenvm Tainted: G           O    4.14.18 #1
> [ 2234.963991] Hardware name: Hewlett-Packard HP EliteBook Folio 9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
> [ 2234.967186] task: d4370980 task.stack: cf8e8000
> [ 2234.970351] EIP: __radix_tree_lookup+0xe/0xa0
> [ 2234.973520] EFLAGS: 00010286 CPU: 1
> [ 2234.976699] EAX: 00000004 EBX: b5900000 ECX: 00000000 EDX: 00000000
> [ 2234.979887] ESI: 00000000 EDI: 00000004 EBP: cf8e9dd0 ESP: cf8e9dc0
> [ 2234.983081]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> [ 2234.986233] CR0: 80050033 CR2: 00000008 CR3: 08f12000 CR4: 00042660
> [ 2234.989340] Call Trace:
> [ 2234.992354]  radix_tree_lookup_slot+0x1d/0x50
> [ 2234.995341]  ? xen_irq_disable_direct+0xc/0xc
> [ 2234.998288]  find_get_entry+0x1d/0x110
> [ 2235.001140]  pagecache_get_page+0x1f/0x240
> [ 2235.003948]  ? xen_flush_tlb_others+0x17b/0x260
> [ 2235.006784]  lookup_swap_cache+0x32/0xe0
> [ 2235.009632]  swap_readahead_detect+0x67/0x2c0
> [ 2235.012447]  do_swap_page+0x10a/0x750
> [ 2235.015270]  ? wp_page_copy+0x2c4/0x590
> [ 2235.018043]  ? xen_pmd_val+0x11/0x20
> [ 2235.020729]  handle_mm_fault+0x3f8/0x970
> [ 2235.023352]  ? xen_smp_send_reschedule+0xa/0x10
> [ 2235.025927]  ? resched_curr+0x68/0xc0
> [ 2235.028444]  __do_page_fault+0x1a7/0x480
> [ 2235.030883]  do_page_fault+0x33/0x110
> [ 2235.033250]  ? do_fast_syscall_32+0xb3/0x200
> [ 2235.035567]  ? vmalloc_sync_all+0x290/0x290
> [ 2235.037828]  common_exception+0x84/0x8a
> [ 2235.040011] EIP: 0xb7c8ddea
> [ 2235.042111] EFLAGS: 00010202 CPU: 1
> [ 2235.044153] EAX: b7dd38d0 EBX: b7dd2780 ECX: b7dd2000 EDX: b5900010
> [ 2235.046176] ESI: 00000000 EDI: b7dd38f0 EBP: b56ff124 ESP: b56ff070
> [ 2235.048152]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
> [ 2235.050053] Code: 42 14 29 c6 89 f0 c1 f8 02 e9 71 ff ff ff e8 aa 81 aa ff 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 57 89 c7 56 53 83 ec 04 89 4d f0 <8b> 5f 04 89 d8 83 e0 03 83 f8 01 75 67 89 d8 83 e0 fe 0f b6 08
> [ 2235.053998] EIP: __radix_tree_lookup+0xe/0xa0 SS:ESP: 0069:cf8e9dc0
> [ 2235.055895] CR2: 0000000000000008
> 
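The swap-path description quoted above can be condensed into a simplified C sketch of how a faulting entry is classified. It is a model, not kernel code: the enum and function names are hypothetical, and PROT_NONE or NUMA-hinting entries are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Present bit (bit 0) of an x86 page-table entry. */
#define PTE_PRESENT UINT64_C(0x1)

/* Simplified model of the three cases described above. Assumption: PROT_NONE
 * and NUMA-hinting entries, which also have the present bit clear, are ignored. */
enum pte_kind {
    PTE_KIND_NONE,    /* all zero: not populated, so the fault supplies a fresh page */
    PTE_KIND_PRESENT, /* present bit set: the page is mapped */
    PTE_KIND_SWAP,    /* nonzero, present clear: treated as a swp_entry_t cookie */
};

/* Classify an entry the way the fault path is described as doing. */
static enum pte_kind classify_pte(uint64_t pte)
{
    if (pte == 0)
        return PTE_KIND_NONE;
    if (pte & PTE_PRESENT)
        return PTE_KIND_PRESENT;
    return PTE_KIND_SWAP;
}
```

Under this model, a value such as 0x8000000400000000 is neither empty nor present, so it would be routed to the swap lookup, which is consistent with the do_swap_page() frames in the backtrace.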

Could it be that we just have a race regarding pte_clear()? It sets
the low part of the pte to zero first and then the high part.

If pte_clear() is used in interrupt mode, Xen in particular will be
rather slow, as it emulates the two writes to the page table, resulting
in a larger window in which the race might happen.
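A single-threaded C sketch of the window described above. The struct is a hypothetical model of a PAE entry stored as two 32-bit halves. The starting value 0x8000000412345067 used with it is made up for illustration: present, NX, with address bit 34 set. Nothing here is kernel code, and a real race needs a second CPU (or an emulated write) reading the entry at the wrong moment; the sketch only records what such a reader could see.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit PAE entry stored as two 32-bit words,
 * loosely mirroring the pte_low/pte_high halves of the kernel's pte_t. */
struct pae_pte {
    uint32_t low;  /* bits 0..31: present bit, flags, low address bits */
    uint32_t high; /* bits 32..63: high address bits, NX at bit 63 */
};

/* Combine both halves into the 64-bit value a reader would see. */
static uint64_t pae_pte_value(const struct pae_pte *p)
{
    return ((uint64_t)p->high << 32) | p->low;
}

/* Clear the entry with two separate 32-bit stores, low half first, and
 * record in *observed what a concurrent reader could see between them. */
static void pae_pte_clear_two_writes(struct pae_pte *p, uint64_t *observed)
{
    assert(p != 0 && observed != 0);
    p->low = 0;                   /* first store: present bit now clear */
    *observed = pae_pte_value(p); /* window: low half zero, high half stale */
    p->high = 0;                  /* second store: entry fully clear */
}
```

Starting from { .low = 0x12345067, .high = 0x80000004 }, the value visible between the two stores is 0x8000000400000000: present bit clear but not all zero, the same shape as the reported bad PTEs.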


Juergen

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
  2018-04-23  8:17           ` Juergen Gross
@ 2018-09-04 12:54             ` Jason Andryuk
  -1 siblings, 0 replies; 28+ messages in thread
From: Jason Andryuk @ 2018-09-04 12:54 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Matthew Wilcox, bugzilla-daemon, akpm, linux-mm, labbott,
	xen-devel, Boris Ostrovsky

On Mon, Apr 23, 2018 at 4:17 AM Juergen Gross <jgross@suse.com> wrote:
> On 20/04/18 17:20, Jason Andryuk wrote:
> > Adding xen-devel and the Linux Xen maintainers.
> >
> > Summary: Some Xen users (and maybe others) are hitting a BUG in
> > __radix_tree_lookup() under do_swap_page() - example backtrace is
> > provided at the end.  Matthew Wilcox provided a band-aid patch that
> > prints errors like the following instead of triggering the bug.
> >
> > Skylake 32bit PAE Dom0:
> > Bad swp_entry: 80000000
> > mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
> >
> > Ivy Bridge 32bit PAE Dom0:
> > Bad swp_entry: 40000000
> > mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
> >
> > Other 32bit DomU:
> > Bad swp_entry: 4000000
> > mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
> >
> > Other 32bit:
> > Bad swp_entry: 2000000
> > mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
> >
> > The Linux bugzilla has more info
> > https://bugzilla.kernel.org/show_bug.cgi?id=198497
> >
> > This may not be exclusive to Xen Linux, but most of the reports are on
> > Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
> > pte.
> >
<snip>
>
> Could it be that we just have a race regarding pte_clear()? It sets
> the low part of the pte to zero first and then the high part.
>
> If pte_clear() is used in interrupt mode, Xen in particular will be
> rather slow, as it emulates the two writes to the page table, resulting
> in a larger window in which the race might happen.

It looks like Juergen was correct.  With the L1TF vulnerability, the
Xen hypervisor needs to detect vulnerable PTEs.  For 32bit PAE, Xen
would trap on PTEs like 0x8000'0002'0000'0000, the same format as
seen in this bug.  Juergen wrote two patches for Linux, now upstream,
that write PTEs with 64bit operations or hypercalls and so avoid the
invalid intermediate PTEs:
f7c90c2aa400 "x86/xen: don't write ptes directly in 32-bit PV guests"
b2d7a075a1cc "x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear"
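As an illustration of the idea behind the fix only (C11 <stdatomic.h> in userspace, not the code from either commit), clearing an entry with a single 64-bit atomic exchange leaves no half-cleared value for another reader to observe. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Install a full 64-bit entry with one atomic store. */
static void pte_set_atomic(_Atomic uint64_t *slot, uint64_t val)
{
    assert(slot != 0);
    atomic_store(slot, val);
}

/* Clear an entry and return its previous value in a single atomic step,
 * so a concurrent reader sees either the old entry or zero, never a mix. */
static uint64_t pte_get_and_clear_atomic(_Atomic uint64_t *slot)
{
    assert(slot != 0);
    return atomic_exchange(slot, (uint64_t)0);
}
```

The design point is that a reader sees either the complete old entry or zero; the commits above do the equivalent inside the kernel, using 64-bit operations or hypercalls as described.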

With those patches, I have not seen a "Bad swp_entry", so this seems
fixed for me on Xen.

There was also a report of a non-Xen kernel being affected.  Is there
an underlying problem in that native PAE code updates PTEs with two
writes, with no locking to prevent the intermediate PTE from being
used elsewhere in the kernel?

Regards,
Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-09-04 12:54 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <bug-198497-200779@https.bugzilla.kernel.org/>
     [not found] ` <bug-198497-200779-43rwxa1kcg@https.bugzilla.kernel.org/>
2018-04-20 13:10   ` [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer Jason Andryuk
2018-04-20 13:39     ` Matthew Wilcox
2018-04-20 15:20       ` Jason Andryuk
2018-04-20 15:20         ` Jason Andryuk
2018-04-20 15:25         ` [Xen-devel] " Andrew Cooper
2018-04-20 15:25           ` Andrew Cooper
2018-04-20 15:40           ` [Xen-devel] " Andrew Cooper
2018-04-20 15:40             ` Andrew Cooper
2018-04-20 15:42           ` [Xen-devel] " Jan Beulich
2018-04-20 15:42             ` Jan Beulich
2018-04-20 15:52             ` [Xen-devel] " Jason Andryuk
2018-04-20 15:52               ` Jason Andryuk
2018-04-20 16:00               ` [Xen-devel] " Andrew Cooper
2018-04-20 16:00                 ` Andrew Cooper
2018-04-20 16:02               ` [Xen-devel] " Jan Beulich
2018-04-20 16:02                 ` Jan Beulich
2018-04-20 19:20                 ` [Xen-devel] " Boris Ostrovsky
2018-04-20 19:20                   ` Boris Ostrovsky
2018-04-21  6:17                   ` [Xen-devel] " Juergen Gross
2018-04-21  6:17                     ` Juergen Gross
2018-04-21 14:35                 ` [Xen-devel] " Matthew Wilcox
2018-04-21 14:35                   ` Matthew Wilcox
2018-04-22  5:50                   ` [Xen-devel] " Juergen Gross
2018-04-22  5:50                     ` Juergen Gross
2018-04-23  8:17         ` Juergen Gross
2018-04-23  8:17           ` Juergen Gross
2018-09-04 12:54           ` Jason Andryuk
2018-09-04 12:54             ` Jason Andryuk
