* [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
From: Juergen Gross @ 2018-02-16 15:41 UTC
  To: linux-kernel, linux-mm, xen-devel; +Cc: akpm, mhocko, Juergen Gross, stable

Commit f7f99100d8d95dbcf09e0216a143211e79418b9f ("mm: stop zeroing
memory during allocation in vmemmap") broke Xen pv domains in some
configurations, as the "Pinned" information in struct page of early
page tables could get lost. This will lead to the kernel trying to
write directly into the page tables instead of asking the hypervisor
to do so. The result is a crash like the following:

[    0.004000] BUG: unable to handle kernel paging request at ffff8801ead19008
[    0.004000] IP: xen_set_pud+0x4e/0xd0
[    0.004000] PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065
[    0.004000] Oops: 0003 [#1] PREEMPT SMP
[    0.004000] Modules linked in:
[    0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271
[    0.004000] Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
[    0.004000] task: ffffffff81c10480 task.stack: ffffffff81c00000
[    0.004000] RIP: e030:xen_set_pud+0x4e/0xd0
[    0.004000] RSP: e02b:ffffffff81c03cd8 EFLAGS: 00010246
[    0.004000] RAX: 002ffff800000800 RBX: ffff88020fd31000 RCX: 0000000000000000
[    0.004000] RDX: ffffea0000000000 RSI: 00000001b8308067 RDI: ffff8801ead19008
[    0.004000] RBP: ffff8801ead19008 R08: aaaaaaaaaaaaaaaa R09: 00000000063f4c80
[    0.004000] R10: aaaaaaaaaaaaaaaa R11: 0720072007200720 R12: 00000001b8308067
[    0.004000] R13: ffffffff81c8a9cc R14: ffff88018fd31000 R15: 000077ff80000000
[    0.004000] FS:  0000000000000000(0000) GS:ffff88020f600000(0000) knlGS:0000000000000000
[    0.004000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.004000] CR2: ffff8801ead19008 CR3: 0000000001c09000 CR4: 0000000000042660
[    0.004000] Call Trace:
[    0.004000]  __pmd_alloc+0x128/0x140
[    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
[    0.004000]  ioremap_page_range+0x3f4/0x410
[    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
[    0.004000]  __ioremap_caller+0x1c3/0x2e0
[    0.004000]  acpi_os_map_iomem+0x175/0x1b0
[    0.004000]  acpi_tb_acquire_table+0x39/0x66
[    0.004000]  acpi_tb_validate_table+0x44/0x7c
[    0.004000]  acpi_tb_verify_temp_table+0x45/0x304
[    0.004000]  ? acpi_ut_acquire_mutex+0x12a/0x1c2
[    0.004000]  acpi_reallocate_root_table+0x12d/0x141
[    0.004000]  acpi_early_init+0x4d/0x10a
[    0.004000]  start_kernel+0x3eb/0x4a1
[    0.004000]  ? set_init_arg+0x55/0x55
[    0.004000]  xen_start_kernel+0x528/0x532
[    0.004000] Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3
[    0.004000] RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8
[    0.004000] CR2: ffff8801ead19008
[    0.004000] ---[ end trace 38eca2e56f1b642e ]---

Avoid this problem by not deferring struct page initialization when
running as Xen pv guest.

Cc: <stable@vger.kernel.org> #4.15
Fixes: f7f99100d8d95d ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81e18ceef579..681d504b9a40 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 	/* Always populate low zones for address-constrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
+	/* Xen PV domains need page structures early */
+	if (xen_pv_domain())
+		return true;
 	(*nr_initialised)++;
 	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
-- 
2.13.6
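
The effect of the added check can be sketched in user space. The following is a minimal model, not the kernel code: every name prefixed with model_ or MODEL_ is hypothetical, the section size is an assumed value, and the tail of the function past the diff context (recording the first deferred pfn and returning false) as well as the caller loop are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed section size in pages; the real value depends on the architecture. */
#define MODEL_PAGES_PER_SECTION 32768UL

/* Hypothetical stand-in for the few pg_data_t fields the check uses. */
struct model_pgdat {
	unsigned long end_pfn;            /* stands in for pgdat_end_pfn(pgdat) */
	unsigned long static_init_pgcnt;  /* pages that are initialised eagerly */
	unsigned long first_deferred_pfn; /* ULONG_MAX means nothing deferred */
};

/* Stand-in for xen_pv_domain(); set by the caller of the model. */
static bool model_is_xen_pv;

/* Returns true if the struct page for pfn should be initialised now. */
static bool model_update_defer_init(struct model_pgdat *pgdat,
				    unsigned long pfn, unsigned long zone_end,
				    unsigned long *nr_initialised)
{
	/* Always populate low zones for address-constrained allocations */
	if (zone_end < pgdat->end_pfn)
		return true;
	/* The check added by the patch: never defer on a Xen PV domain */
	if (model_is_xen_pv)
		return true;
	(*nr_initialised)++;
	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
	    (pfn & (MODEL_PAGES_PER_SECTION - 1)) == 0) {
		/* Assumed tail: remember where deferral starts, then defer */
		pgdat->first_deferred_pfn = pfn;
		return false;
	}
	return true;
}

/* Count how many pages of a single-zone node are initialised early. */
static unsigned long model_count_early_pages(bool xen_pv,
					     unsigned long nr_pages,
					     unsigned long static_cnt)
{
	struct model_pgdat pgdat = { nr_pages, static_cnt, ULONG_MAX };
	unsigned long nr_initialised = 0, early = 0, pfn;

	assert(nr_pages > 0);
	model_is_xen_pv = xen_pv;
	for (pfn = 0; pfn < nr_pages; pfn++) {
		/* Assumed caller behaviour: stop at the first deferred pfn */
		if (!model_update_defer_init(&pgdat, pfn, nr_pages,
					     &nr_initialised))
			break;
		early++;
	}
	return early;
}
```

In this model a node of four sections with one section's worth of eager pages initialises only the first section when model_is_xen_pv is false, and all four sections when it is true.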

* Re: [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
From: Andrew Morton @ 2018-02-16 20:40 UTC
  To: Juergen Gross
  Cc: linux-kernel, linux-mm, xen-devel, mhocko, stable, Pavel Tatashin

On Fri, 16 Feb 2018 16:41:01 +0100 Juergen Gross <jgross@suse.com> wrote:

> Commit f7f99100d8d95dbcf09e0216a143211e79418b9f ("mm: stop zeroing
> memory during allocation in vmemmap") broke Xen pv domains in some
> configurations, as the "Pinned" information in struct page of early
> page tables could get lost. This will lead to the kernel trying to
> write directly into the page tables instead of asking the hypervisor
> to do so. The result is a crash like the following:

Let's cc Pavel, who authored f7f99100d8d95d.

> [    0.004000] BUG: unable to handle kernel paging request at ffff8801ead19008
> [    0.004000] IP: xen_set_pud+0x4e/0xd0
> [    0.004000] PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065
> [    0.004000] Oops: 0003 [#1] PREEMPT SMP
> [    0.004000] Modules linked in:
> [    0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271
> [    0.004000] Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
> [    0.004000] task: ffffffff81c10480 task.stack: ffffffff81c00000
> [    0.004000] RIP: e030:xen_set_pud+0x4e/0xd0
> [    0.004000] RSP: e02b:ffffffff81c03cd8 EFLAGS: 00010246
> [    0.004000] RAX: 002ffff800000800 RBX: ffff88020fd31000 RCX: 0000000000000000
> [    0.004000] RDX: ffffea0000000000 RSI: 00000001b8308067 RDI: ffff8801ead19008
> [    0.004000] RBP: ffff8801ead19008 R08: aaaaaaaaaaaaaaaa R09: 00000000063f4c80
> [    0.004000] R10: aaaaaaaaaaaaaaaa R11: 0720072007200720 R12: 00000001b8308067
> [    0.004000] R13: ffffffff81c8a9cc R14: ffff88018fd31000 R15: 000077ff80000000
> [    0.004000] FS:  0000000000000000(0000) GS:ffff88020f600000(0000) knlGS:0000000000000000
> [    0.004000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.004000] CR2: ffff8801ead19008 CR3: 0000000001c09000 CR4: 0000000000042660
> [    0.004000] Call Trace:
> [    0.004000]  __pmd_alloc+0x128/0x140
> [    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
> [    0.004000]  ioremap_page_range+0x3f4/0x410
> [    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
> [    0.004000]  __ioremap_caller+0x1c3/0x2e0
> [    0.004000]  acpi_os_map_iomem+0x175/0x1b0
> [    0.004000]  acpi_tb_acquire_table+0x39/0x66
> [    0.004000]  acpi_tb_validate_table+0x44/0x7c
> [    0.004000]  acpi_tb_verify_temp_table+0x45/0x304
> [    0.004000]  ? acpi_ut_acquire_mutex+0x12a/0x1c2
> [    0.004000]  acpi_reallocate_root_table+0x12d/0x141
> [    0.004000]  acpi_early_init+0x4d/0x10a
> [    0.004000]  start_kernel+0x3eb/0x4a1
> [    0.004000]  ? set_init_arg+0x55/0x55
> [    0.004000]  xen_start_kernel+0x528/0x532
> [    0.004000] Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3
> [    0.004000] RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8
> [    0.004000] CR2: ffff8801ead19008
> [    0.004000] ---[ end trace 38eca2e56f1b642e ]---
> 
> Avoid this problem by not deferring struct page initialization when
> running as Xen pv guest.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
>  	/* Always populate low zones for address-constrained allocations */
>  	if (zone_end < pgdat_end_pfn(pgdat))
>  		return true;
> +	/* Xen PV domains need page structures early */
> +	if (xen_pv_domain())
> +		return true;
>  	(*nr_initialised)++;
>  	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
>  	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {

I'm OK with applying the patch as a short-term regression fix but I do
wonder whether it's the correct fix.  What is special about Xen (in
some configurations!) that causes it to find a hole in deferred
initialization?

I'd like us to delve further please.  Because if Xen found a hole in
the implementation, others might do so.  Or perhaps Xen is doing
something naughty.
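
The failure mode described in the patch can be illustrated with a small user-space model. This is only a sketch of one plausible reading of the commit message: it assumes the "Pinned" flag is set on an early page-table page before that page's struct page is initialised, and that the later initialisation starts from a zeroed struct page. All names prefixed with model_ or MODEL_ are hypothetical and the bit position is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag bit standing in for the "Pinned" information. */
#define MODEL_PG_PINNED (1UL << 0)

/* Hypothetical, heavily reduced stand-in for struct page. */
struct model_page {
	unsigned long flags;
};

enum model_write_path {
	MODEL_WRITE_DIRECT,    /* plain store into the page table */
	MODEL_WRITE_HYPERCALL, /* ask the hypervisor to do the update */
};

/* Record that the hypervisor has pinned this page-table page. */
static void model_pin(struct model_page *page)
{
	assert(page != NULL);
	page->flags |= MODEL_PG_PINNED;
}

/* Stand-in for struct page initialisation, which clears all state. */
static void model_init_single_page(struct model_page *page)
{
	memset(page, 0, sizeof(*page));
}

/* Choose the update path the way the commit message describes it. */
static enum model_write_path model_set_pud(const struct model_page *pt_page)
{
	if (pt_page->flags & MODEL_PG_PINNED)
		return MODEL_WRITE_HYPERCALL;
	/* On a PV guest the page is read-only, so this path would fault. */
	return MODEL_WRITE_DIRECT;
}

/* Run the assumed boot ordering with or without deferred initialisation. */
static enum model_write_path model_boot(bool defer_init)
{
	struct model_page pt_page = { 0 };

	if (!defer_init)
		model_init_single_page(&pt_page); /* eager: init, then pin */
	model_pin(&pt_page);
	if (defer_init)
		model_init_single_page(&pt_page); /* deferred: pin, then init */
	return model_set_pud(&pt_page);
}
```

With this ordering, model_boot(false) ends on the hypercall path while model_boot(true) ends on the direct-write path, which corresponds to the faulting store in xen_set_pud shown in the trace.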

* Re: [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
From: Andrew Morton @ 2018-02-16 20:43 UTC
  To: Juergen Gross; +Cc: linux-kernel, linux-mm, xen-devel, mhocko, stable

On Fri, 16 Feb 2018 16:41:01 +0100 Juergen Gross <jgross@suse.com> wrote:

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
>  	/* Always populate low zones for address-constrained allocations */
>  	if (zone_end < pgdat_end_pfn(pgdat))
>  		return true;
> +	/* Xen PV domains need page structures early */
> +	if (xen_pv_domain())
> +		return true;

I'll do this:

--- a/mm/page_alloc.c~mm-dont-defer-struct-page-initialization-for-xen-pv-guests-fix
+++ a/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/stop_machine.h>
 #include <linux/sort.h>
 #include <linux/pfn.h>
+#include <xen/xen.h>
 #include <linux/backing-dev.h>
 #include <linux/fault-inject.h>
 #include <linux/page-isolation.h>

So we're not relying on dumb luck ;)
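
The predicate used by the patch, xen_pv_domain(), is presumably what the added header provides. As a rough illustration of why the patch tests for a PV domain specifically rather than for any Xen guest, here is a small model of a domain-type predicate. It is loosely inspired by the kernel's naming but is not copied from it; every identifier prefixed with model_ or MODEL_ is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the possible domain types. */
enum model_xen_domain_type {
	MODEL_XEN_NATIVE,     /* not running under Xen */
	MODEL_XEN_PV_DOMAIN,  /* paravirtualized guest */
	MODEL_XEN_HVM_DOMAIN, /* hardware-virtualized guest */
};

/* Current domain type; a real system would set this once at boot. */
static enum model_xen_domain_type model_xen_domain_type = MODEL_XEN_NATIVE;

/* True for any kind of Xen guest. */
#define model_xen_domain()    (model_xen_domain_type != MODEL_XEN_NATIVE)
/* True only for paravirtualized guests, the case the patch targets. */
#define model_xen_pv_domain() (model_xen_domain_type == MODEL_XEN_PV_DOMAIN)

/* Would the added check force early struct page initialisation? */
static bool model_forces_early_init(enum model_xen_domain_type type)
{
	assert(type <= MODEL_XEN_HVM_DOMAIN);
	model_xen_domain_type = type;
	return model_xen_pv_domain();
}
```

In this model only MODEL_XEN_PV_DOMAIN makes the check fire; native and HVM configurations keep deferred initialisation.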

* Re: [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
From: Pavel Tatashin @ 2018-02-17 15:32 UTC
  To: Andrew Morton, Juergen Gross
  Cc: linux-kernel, linux-mm, xen-devel, mhocko, stable

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

This is unique to Xen, so this particular issue won't affect other
configurations. I am going to investigate whether there is a way to re-enable
deferred page initialization on Xen guests.

Pavel

On 02/16/2018 03:40 PM, Andrew Morton wrote:
> On Fri, 16 Feb 2018 16:41:01 +0100 Juergen Gross <jgross@suse.com> wrote:
> 
>> Commit f7f99100d8d95dbcf09e0216a143211e79418b9f ("mm: stop zeroing
>> memory during allocation in vmemmap") broke Xen pv domains in some
>> configurations, as the "Pinned" information in struct page of early
>> page tables could get lost. This will lead to the kernel trying to
>> write directly into the page tables instead of asking the hypervisor
>> to do so. The result is a crash like the following:
> 
> Let's cc Pavel, who authored f7f99100d8d95d.
> 
>> [    0.004000] BUG: unable to handle kernel paging request at ffff8801ead19008
>> [    0.004000] IP: xen_set_pud+0x4e/0xd0
>> [    0.004000] PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065
>> [    0.004000] Oops: 0003 [#1] PREEMPT SMP
>> [    0.004000] Modules linked in:
>> [    0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271
>> [    0.004000] Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
>> [    0.004000] task: ffffffff81c10480 task.stack: ffffffff81c00000
>> [    0.004000] RIP: e030:xen_set_pud+0x4e/0xd0
>> [    0.004000] RSP: e02b:ffffffff81c03cd8 EFLAGS: 00010246
>> [    0.004000] RAX: 002ffff800000800 RBX: ffff88020fd31000 RCX: 0000000000000000
>> [    0.004000] RDX: ffffea0000000000 RSI: 00000001b8308067 RDI: ffff8801ead19008
>> [    0.004000] RBP: ffff8801ead19008 R08: aaaaaaaaaaaaaaaa R09: 00000000063f4c80
>> [    0.004000] R10: aaaaaaaaaaaaaaaa R11: 0720072007200720 R12: 00000001b8308067
>> [    0.004000] R13: ffffffff81c8a9cc R14: ffff88018fd31000 R15: 000077ff80000000
>> [    0.004000] FS:  0000000000000000(0000) GS:ffff88020f600000(0000) knlGS:0000000000000000
>> [    0.004000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.004000] CR2: ffff8801ead19008 CR3: 0000000001c09000 CR4: 0000000000042660
>> [    0.004000] Call Trace:
>> [    0.004000]  __pmd_alloc+0x128/0x140
>> [    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
>> [    0.004000]  ioremap_page_range+0x3f4/0x410
>> [    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
>> [    0.004000]  __ioremap_caller+0x1c3/0x2e0
>> [    0.004000]  acpi_os_map_iomem+0x175/0x1b0
>> [    0.004000]  acpi_tb_acquire_table+0x39/0x66
>> [    0.004000]  acpi_tb_validate_table+0x44/0x7c
>> [    0.004000]  acpi_tb_verify_temp_table+0x45/0x304
>> [    0.004000]  ? acpi_ut_acquire_mutex+0x12a/0x1c2
>> [    0.004000]  acpi_reallocate_root_table+0x12d/0x141
>> [    0.004000]  acpi_early_init+0x4d/0x10a
>> [    0.004000]  start_kernel+0x3eb/0x4a1
>> [    0.004000]  ? set_init_arg+0x55/0x55
>> [    0.004000]  xen_start_kernel+0x528/0x532
>> [    0.004000] Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3
>> [    0.004000] RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8
>> [    0.004000] CR2: ffff8801ead19008
>> [    0.004000] ---[ end trace 38eca2e56f1b642e ]---
>>
>> Avoid this problem by not deferring struct page initialization when
>> running as Xen pv guest.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
>>   	/* Always populate low zones for address-constrained allocations */
>>   	if (zone_end < pgdat_end_pfn(pgdat))
>>   		return true;
>> +	/* Xen PV domains need page structures early */
>> +	if (xen_pv_domain())
>> +		return true;
>>   	(*nr_initialised)++;
>>   	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
>>   	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
> 
> I'm OK with applying the patch as a short-term regression fix but I do
> wonder whether it's the correct fix.  What is special about Xen (in
> some configurations!) that causes it to find a hole in deferred
> initialization?
> 
> I'd like us to delve further please.  Because if Xen found a hole in
> the implementation, others might do so.  Or perhaps Xen is doing
> something naughty.
> 

>>
>> Avoid this problem by not deferring struct page initialization when
>> running as Xen pv guest.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
>>   	/* Always populate low zones for address-constrained allocations */
>>   	if (zone_end < pgdat_end_pfn(pgdat))
>>   		return true;
>> +	/* Xen PV domains need page structures early */
>> +	if (xen_pv_domain())
>> +		return true;
>>   	(*nr_initialised)++;
>>   	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
>>   	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
> 
> I'm OK with applying the patch as a short-term regression fix but I do
> wonder whether it's the correct fix.  What is special about Xen (in
> some configurations!) that causes it to find a hole in deferred
> initialization?
> 
> I'd like us to delve further please.  Because if Xen found a hole in
> the implementation, others might do so.  Or perhaps Xen is doing
> something naughty.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RESEND v2] mm: don't defer struct page initialization for Xen pv guests
@ 2018-02-16 15:41 Juergen Gross
  0 siblings, 0 replies; 12+ messages in thread
From: Juergen Gross @ 2018-02-16 15:41 UTC (permalink / raw)
  To: linux-kernel, linux-mm, xen-devel; +Cc: Juergen Gross, akpm, mhocko, stable

Commit f7f99100d8d95dbcf09e0216a143211e79418b9f ("mm: stop zeroing
memory during allocation in vmemmap") broke Xen pv domains in some
configurations, as the "Pinned" information in struct page of early
page tables could get lost. This will lead to the kernel trying to
write directly into the page tables instead of asking the hypervisor
to do so. The result is a crash like the following:

[    0.004000] BUG: unable to handle kernel paging request at ffff8801ead19008
[    0.004000] IP: xen_set_pud+0x4e/0xd0
[    0.004000] PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065
[    0.004000] Oops: 0003 [#1] PREEMPT SMP
[    0.004000] Modules linked in:
[    0.004000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271
[    0.004000] Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
[    0.004000] task: ffffffff81c10480 task.stack: ffffffff81c00000
[    0.004000] RIP: e030:xen_set_pud+0x4e/0xd0
[    0.004000] RSP: e02b:ffffffff81c03cd8 EFLAGS: 00010246
[    0.004000] RAX: 002ffff800000800 RBX: ffff88020fd31000 RCX: 0000000000000000
[    0.004000] RDX: ffffea0000000000 RSI: 00000001b8308067 RDI: ffff8801ead19008
[    0.004000] RBP: ffff8801ead19008 R08: aaaaaaaaaaaaaaaa R09: 00000000063f4c80
[    0.004000] R10: aaaaaaaaaaaaaaaa R11: 0720072007200720 R12: 00000001b8308067
[    0.004000] R13: ffffffff81c8a9cc R14: ffff88018fd31000 R15: 000077ff80000000
[    0.004000] FS:  0000000000000000(0000) GS:ffff88020f600000(0000) knlGS:0000000000000000
[    0.004000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.004000] CR2: ffff8801ead19008 CR3: 0000000001c09000 CR4: 0000000000042660
[    0.004000] Call Trace:
[    0.004000]  __pmd_alloc+0x128/0x140
[    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
[    0.004000]  ioremap_page_range+0x3f4/0x410
[    0.004000]  ? acpi_os_map_iomem+0x175/0x1b0
[    0.004000]  __ioremap_caller+0x1c3/0x2e0
[    0.004000]  acpi_os_map_iomem+0x175/0x1b0
[    0.004000]  acpi_tb_acquire_table+0x39/0x66
[    0.004000]  acpi_tb_validate_table+0x44/0x7c
[    0.004000]  acpi_tb_verify_temp_table+0x45/0x304
[    0.004000]  ? acpi_ut_acquire_mutex+0x12a/0x1c2
[    0.004000]  acpi_reallocate_root_table+0x12d/0x141
[    0.004000]  acpi_early_init+0x4d/0x10a
[    0.004000]  start_kernel+0x3eb/0x4a1
[    0.004000]  ? set_init_arg+0x55/0x55
[    0.004000]  xen_start_kernel+0x528/0x532
[    0.004000] Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3
[    0.004000] RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8
[    0.004000] CR2: ffff8801ead19008
[    0.004000] ---[ end trace 38eca2e56f1b642e ]---

Avoid this problem by not deferring struct page initialization when
running as Xen pv guest.

Cc: <stable@vger.kernel.org> #4.15
Fixes: f7f99100d8d95d ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81e18ceef579..681d504b9a40 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -347,6 +347,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 	/* Always populate low zones for address-constrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
+	/* Xen PV domains need page structures early */
+	if (xen_pv_domain())
+		return true;
 	(*nr_initialised)++;
 	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
-- 
2.13.6
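
As a rough illustration of what the hunk above changes, the following
user-space sketch models the decision made by update_defer_init(). It is
not kernel code: struct node_model, its fields, the is_xen_pv flag and
MODEL_PAGES_PER_SECTION are stand-ins invented for this example, and the
body of the final branch, which the hunk does not show, is assumed here
to record the boundary pfn and report that initialization should be
deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGES_PER_SECTION; the real value is arch-dependent. */
#define MODEL_PAGES_PER_SECTION 32768UL

/* Hypothetical, simplified model of the pg_data_t state used by the hunk. */
struct node_model {
	unsigned long end_pfn;            /* models pgdat_end_pfn(pgdat) */
	unsigned long static_init_pgcnt;  /* pages to initialize eagerly */
	unsigned long first_deferred_pfn; /* assumed: where deferral starts */
	bool is_xen_pv;                   /* models xen_pv_domain() */
};

/* Returns true if the struct page for pfn should be initialized right away. */
static bool model_update_defer_init(struct node_model *node,
				    unsigned long pfn,
				    unsigned long zone_end,
				    unsigned long *nr_initialised)
{
	assert(node != NULL && nr_initialised != NULL);

	/* Always populate low zones for address-constrained allocations. */
	if (zone_end < node->end_pfn)
		return true;

	/* The patch: a Xen PV domain never defers. */
	if (node->is_xen_pv)
		return true;

	(*nr_initialised)++;
	if (*nr_initialised > node->static_init_pgcnt &&
	    (pfn & (MODEL_PAGES_PER_SECTION - 1)) == 0) {
		/* Assumption: remember the boundary and defer from here on. */
		node->first_deferred_pfn = pfn;
		return false;
	}

	return true;
}
```

In this model the Xen check sits before the counter is incremented, as
in the patch, so a PV domain never reaches the section-aligned cutoff
and every struct page is initialized early.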


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-02-17 15:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-16 15:41 [RESEND v2] mm: don't defer struct page initialization for Xen pv guests Juergen Gross
2018-02-16 15:41 ` Juergen Gross
2018-02-16 20:40 ` Andrew Morton
2018-02-16 20:40   ` Andrew Morton
2018-02-17 15:32   ` Pavel Tatashin
2018-02-17 15:32     ` Pavel Tatashin
2018-02-17 15:32   ` Pavel Tatashin
2018-02-16 20:40 ` Andrew Morton
2018-02-16 20:43 ` Andrew Morton
2018-02-16 20:43   ` Andrew Morton
2018-02-16 20:43 ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2018-02-16 15:41 Juergen Gross
