* [PATCH 0/3 v2] Sync unmappings in vmalloc/ioremap areas
@ 2019-07-17  7:14 Joerg Roedel
  2019-07-17  7:14 ` [PATCH 1/3] x86/mm: Check for pfn instead of page in vmalloc_sync_one() Joerg Roedel
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-17  7:14 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

Hi,

here is a small patch-set to sync unmappings in the
vmalloc/ioremap areas between page-tables in the system.

This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
the case on a PAE kernel with PTI enabled.

On affected systems the missing sync lets old mappings
persist in some page-tables, causing data corruption and
other undefined behavior.

Please review.

Thanks,

	Joerg

Changes since v1:
	- Added correct Fixes-tags to all patches

Joerg Roedel (3):
  x86/mm: Check for pfn instead of page in vmalloc_sync_one()
  x86/mm: Sync also unmappings in vmalloc_sync_one()
  mm/vmalloc: Sync unmappings in vunmap_page_range()

 arch/x86/mm/fault.c | 9 +++++----
 mm/vmalloc.c        | 2 ++
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/3] x86/mm: Check for pfn instead of page in vmalloc_sync_one()
  2019-07-17  7:14 [PATCH 0/3 v2] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
@ 2019-07-17  7:14 ` Joerg Roedel
  2019-07-17  7:14 ` [PATCH 2/3] x86/mm: Sync also unmappings " Joerg Roedel
  2019-07-17  7:14 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
  2 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-17  7:14 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Do not require a struct page for the mapped memory location
because it might not exist. This can happen when an
ioremapped region is mapped with 2MB pages.

Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 794f364cb882..4a4049f6d458 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -200,7 +200,7 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	if (!pmd_present(*pmd))
 		set_pmd(pmd, *pmd_k);
 	else
-		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));
+		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
 
 	return pmd_k;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-17  7:14 [PATCH 0/3 v2] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
  2019-07-17  7:14 ` [PATCH 1/3] x86/mm: Check for pfn instead of page in vmalloc_sync_one() Joerg Roedel
@ 2019-07-17  7:14 ` Joerg Roedel
  2019-07-17 21:06   ` Dave Hansen
  2019-07-17 21:43     ` Thomas Gleixner
  2019-07-17  7:14 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
  2 siblings, 2 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-17  7:14 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

With huge-page ioremap areas the unmappings also need to be
synced between all page-tables. Otherwise it can cause data
corruption when a region is unmapped and later re-used.

Make the vmalloc_sync_one() function ready to sync
unmappings.

Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/fault.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4a4049f6d458..d71e167662c3 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -194,11 +194,12 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
-	if (!pmd_present(*pmd_k))
-		return NULL;
 
-	if (!pmd_present(*pmd))
+	if (pmd_present(*pmd) ^ pmd_present(*pmd_k))
 		set_pmd(pmd, *pmd_k);
+
+	if (!pmd_present(*pmd_k))
+		return NULL;
 	else
 		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-17  7:14 [PATCH 0/3 v2] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
  2019-07-17  7:14 ` [PATCH 1/3] x86/mm: Check for pfn instead of page in vmalloc_sync_one() Joerg Roedel
  2019-07-17  7:14 ` [PATCH 2/3] x86/mm: Sync also unmappings " Joerg Roedel
@ 2019-07-17  7:14 ` Joerg Roedel
  2019-07-17 21:24     ` Andy Lutomirski
  2 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-17  7:14 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

On x86-32 with PTI enabled, parts of the kernel page-tables
are not shared between processes. This can cause mappings in
the vmalloc/ioremap area to persist in some page-tables
after the region is unmapped and released.

When the region is re-used the processes with the old
mappings do not fault in the new mappings but still access
the old ones.

This causes undefined behavior; in practice it often leads to
data corruption, kernel oopses and panics, or even spontaneous
reboots.

Fix this problem by actively syncing unmaps in the
vmalloc/ioremap area to all page-tables in the system.

References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 mm/vmalloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4fa8d84599b0..322b11a374fd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
 			continue;
 		vunmap_p4d_range(pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
+
+	vmalloc_sync_all();
 }
 
 static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-17  7:14 ` [PATCH 2/3] x86/mm: Sync also unmappings " Joerg Roedel
@ 2019-07-17 21:06   ` Dave Hansen
  2019-07-18  8:44     ` Joerg Roedel
  2019-07-17 21:43     ` Thomas Gleixner
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2019-07-17 21:06 UTC (permalink / raw)
  To: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

On 7/17/19 12:14 AM, Joerg Roedel wrote:
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 4a4049f6d458..d71e167662c3 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -194,11 +194,12 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_k = pmd_offset(pud_k, address);
> -	if (!pmd_present(*pmd_k))
> -		return NULL;
>  
> -	if (!pmd_present(*pmd))
> +	if (pmd_present(*pmd) ^ pmd_present(*pmd_k))
>  		set_pmd(pmd, *pmd_k);

Wouldn't:

	if (pmd_present(*pmd) != pmd_present(*pmd_k))
		set_pmd(pmd, *pmd_k);

be a bit more intuitive?

But, either way, these look fine.  For the series:

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-17  7:14 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
@ 2019-07-17 21:24     ` Andy Lutomirski
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Lutomirski @ 2019-07-17 21:24 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Andrew Morton, LKML, Linux-MM,
	Joerg Roedel

On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> On x86-32 with PTI enabled, parts of the kernel page-tables
> are not shared between processes. This can cause mappings in
> the vmalloc/ioremap area to persist in some page-tables
> after the region is unmapped and released.
>
> When the region is re-used the processes with the old
> mappings do not fault in the new mappings but still access
> the old ones.
>
> This causes undefined behavior; in practice it often leads to
> data corruption, kernel oopses and panics, or even spontaneous
> reboots.
>
> Fix this problem by actively syncing unmaps in the
> vmalloc/ioremap area to all page-tables in the system.
>
> References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
> Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  mm/vmalloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 4fa8d84599b0..322b11a374fd 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
>                         continue;
>                 vunmap_p4d_range(pgd, addr, next);
>         } while (pgd++, addr = next, addr != end);
> +
> +       vmalloc_sync_all();
>  }

I'm confused.  Shouldn't the code in _vm_unmap_aliases handle this?
As it stands, won't your patch hurt performance on x86_64?  If x86_32
is a special snowflake here, maybe flush_tlb_kernel_range() should
handle this?

Even if your patch is correct, a comment would be nice.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-17  7:14 ` [PATCH 2/3] x86/mm: Sync also unmappings " Joerg Roedel
@ 2019-07-17 21:43     ` Thomas Gleixner
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2019-07-17 21:43 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, Andrew Morton, linux-kernel, linux-mm,
	Joerg Roedel

On Wed, 17 Jul 2019, Joerg Roedel wrote:

> From: Joerg Roedel <jroedel@suse.de>
> 
> With huge-page ioremap areas the unmappings also need to be
> synced between all page-tables. Otherwise it can cause data
> corruption when a region is unmapped and later re-used.
> 
> Make the vmalloc_sync_one() function ready to sync
> unmappings.
> 
> Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/mm/fault.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 4a4049f6d458..d71e167662c3 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -194,11 +194,12 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_k = pmd_offset(pud_k, address);
> -	if (!pmd_present(*pmd_k))
> -		return NULL;
>  
> -	if (!pmd_present(*pmd))
> +	if (pmd_present(*pmd) ^ pmd_present(*pmd_k))
>  		set_pmd(pmd, *pmd_k);
> +
> +	if (!pmd_present(*pmd_k))
> +		return NULL;
>  	else
>  		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));

So in case of unmap, this updates only the first entry in the pgd_list
because vmalloc_sync_all() will break out of the iteration over pgd_list
when NULL is returned from vmalloc_sync_one().

I'm surely missing something, but how is that supposed to sync _all_ page
tables on unmap as the changelog claims?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-17 21:06   ` Dave Hansen
@ 2019-07-18  8:44     ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-18  8:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andrew Morton,
	linux-kernel, linux-mm

Hi Dave,

On Wed, Jul 17, 2019 at 02:06:01PM -0700, Dave Hansen wrote:
> On 7/17/19 12:14 AM, Joerg Roedel wrote:
> > -	if (!pmd_present(*pmd))
> > +	if (pmd_present(*pmd) ^ pmd_present(*pmd_k))
> >  		set_pmd(pmd, *pmd_k);
> 
> Wouldn't:
> 
> 	if (pmd_present(*pmd) != pmd_present(*pmd_k))
> 		set_pmd(pmd, *pmd_k);
> 
> be a bit more intuitive?

Yes, right. That is much better, I changed it in the patch.

> But, either way, these look fine.  For the series:
> 
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

Thanks!


	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-17 21:43     ` Thomas Gleixner
@ 2019-07-18  8:46     ` Joerg Roedel
  2019-07-18  9:04         ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-18  8:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

Hi Thomas,

On Wed, Jul 17, 2019 at 11:43:43PM +0200, Thomas Gleixner wrote:
> On Wed, 17 Jul 2019, Joerg Roedel wrote:
> > +
> > +	if (!pmd_present(*pmd_k))
> > +		return NULL;
> >  	else
> >  		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
> 
> So in case of unmap, this updates only the first entry in the pgd_list
> because vmalloc_sync_all() will break out of the iteration over pgd_list
> when NULL is returned from vmalloc_sync_one().
> 
> I'm surely missing something, but how is that supposed to sync _all_ page
> tables on unmap as the changelog claims?

No, you are right, I missed that. It is a bug in this patch, the code
that breaks out of the loop in vmalloc_sync_all() needs to be removed as
well. Will do that in the next version.


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-18  8:46     ` Joerg Roedel
@ 2019-07-18  9:04         ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2019-07-18  9:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

Joerg,

On Thu, 18 Jul 2019, Joerg Roedel wrote:
> On Wed, Jul 17, 2019 at 11:43:43PM +0200, Thomas Gleixner wrote:
> > On Wed, 17 Jul 2019, Joerg Roedel wrote:
> > > +
> > > +	if (!pmd_present(*pmd_k))
> > > +		return NULL;
> > >  	else
> > >  		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
> > 
> > So in case of unmap, this updates only the first entry in the pgd_list
> > because vmalloc_sync_all() will break out of the iteration over pgd_list
> > when NULL is returned from vmalloc_sync_one().
> > 
> > I'm surely missing something, but how is that supposed to sync _all_ page
> > tables on unmap as the changelog claims?
> 
> No, you are right, I missed that. It is a bug in this patch, the code
> that breaks out of the loop in vmalloc_sync_all() needs to be removed as
> well. Will do that in the next version.

I assume that p4d/pud do not need the pmd treatment, but a comment
explaining why would be appreciated.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-17 21:24     ` Andy Lutomirski
@ 2019-07-18  9:17     ` Joerg Roedel
  2019-07-18 19:04         ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-18  9:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Dave Hansen, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Andrew Morton, LKML, Linux-MM

Hi Andy,

On Wed, Jul 17, 2019 at 02:24:09PM -0700, Andy Lutomirski wrote:
> On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel <joro@8bytes.org> wrote:
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 4fa8d84599b0..322b11a374fd 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
> >                         continue;
> >                 vunmap_p4d_range(pgd, addr, next);
> >         } while (pgd++, addr = next, addr != end);
> > +
> > +       vmalloc_sync_all();
> >  }
> 
> I'm confused.  Shouldn't the code in _vm_unmap_aliases handle this?
> As it stands, won't your patch hurt performance on x86_64?  If x86_32
> is a special snowflake here, maybe flush_tlb_kernel_range() should
> handle this?

Imo this is the logical place to handle this. The code first unmaps the
area from the init_mm page-table and then syncs that page-table to all
other page-tables in the system, so the page-tables are updated in one place.

Performance-wise it makes no difference if we put that into
_vm_unmap_aliases(), as that is called in the vunmap path too. But it
is right that vunmap/iounmap performance on x86-64 will decrease to
some degree. If that is a problem for some workloads, I can also
implement a completely separate code-path which just syncs unmappings and
is only implemented for x86-32 with !SHARED_KERNEL_PMD.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-18  9:04         ` Thomas Gleixner
@ 2019-07-18  9:25         ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-18  9:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

On Thu, Jul 18, 2019 at 11:04:57AM +0200, Thomas Gleixner wrote:
> On Thu, 18 Jul 2019, Joerg Roedel wrote:
> > No, you are right, I missed that. It is a bug in this patch, the code
> > that breaks out of the loop in vmalloc_sync_all() needs to be removed as
> > well. Will do that in the next version.
> 
> I assume that p4d/pud do not need the pmd treatment, but a comment
> explaining why would be appreciated.

Yes, p4d and pud don't need to be handled here, as the code is 32-bit
only and there p4d is folded anyway. Pud is only relevant for PAE and
will already be mapped when the page-table is created (for performance
reasons, because pud is top-level at PAE and mapping it later requires a
TLB flush).
The pud with PAE also never changes during the life-time of the
page-table because we can't map a huge-page there. I will put that into
a comment.

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-18  9:17     ` Joerg Roedel
@ 2019-07-18 19:04         ` Andy Lutomirski
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Lutomirski @ 2019-07-18 19:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Dave Hansen, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andrew Morton,
	LKML, Linux-MM

On Thu, Jul 18, 2019 at 2:17 AM Joerg Roedel <jroedel@suse.de> wrote:
>
> Hi Andy,
>
> On Wed, Jul 17, 2019 at 02:24:09PM -0700, Andy Lutomirski wrote:
> > On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel <joro@8bytes.org> wrote:
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 4fa8d84599b0..322b11a374fd 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
> > >                         continue;
> > >                 vunmap_p4d_range(pgd, addr, next);
> > >         } while (pgd++, addr = next, addr != end);
> > > +
> > > +       vmalloc_sync_all();
> > >  }
> >
> > I'm confused.  Shouldn't the code in _vm_unmap_aliases handle this?
> > As it stands, won't your patch hurt performance on x86_64?  If x86_32
> > is a special snowflake here, maybe flush_tlb_kernel_range() should
> > handle this?
>
> Imo this is the logical place to handle this. The code first unmaps the
> area from the init_mm page-table and then syncs that page-table to all
> other page-tables in the system, so the page-tables are updated in one place.


I find it problematic that there is no meaningful documentation as to
what vmalloc_sync_all() is supposed to do.  The closest I can find is
this comment by following the x86_64 code, which calls
sync_global_pgds(), which says:

/*
 * When memory was added make sure all the processes MM have
 * suitable PGD entries in the local PGD level page.
 */
void sync_global_pgds(unsigned long start, unsigned long end)
{

Which is obviously entirely inapplicable.  If I'm understanding
correctly, the underlying issue here is that the vmalloc fault
mechanism can propagate PGD entry *addition*, but nothing (not even
flush_tlb_kernel_range()) propagates PGD entry *removal*.

I find it suspicious that only x86 has this.  How do other
architectures handle this?

At the very least, I think this series needs a comment in
vmalloc_sync_all() explaining exactly what the function promises to
do.  But maybe a better fix is to add code to flush_tlb_kernel_range()
to sync the vmalloc area if the flushed range overlaps the vmalloc
area.  Or, even better, improve x86_32 the way we did x86_64: adjust
the memory mapping code such that top-level paging entries are never
deleted in the first place.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-18 19:04         ` Andy Lutomirski
@ 2019-07-19 12:21         ` Joerg Roedel
  2019-07-19 12:24             ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-19 12:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Dave Hansen, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Andrew Morton, LKML, Linux-MM

On Thu, Jul 18, 2019 at 12:04:49PM -0700, Andy Lutomirski wrote:
> I find it problematic that there is no meaningful documentation as to
> what vmalloc_sync_all() is supposed to do.

Yeah, I found that too, there is no real design around
vmalloc_sync_all(). It looks like it was just added to fit the purpose
on x86-32. That also makes it hard to find all necessary call-sites.

> Which is obviously entirely inapplicable.  If I'm understanding
> correctly, the underlying issue here is that the vmalloc fault
> mechanism can propagate PGD entry *addition*, but nothing (not even
> flush_tlb_kernel_range()) propagates PGD entry *removal*.

Close: the underlying issue is not about PGD but about PMD entry
addition/removal on x86-32 PAE systems.

> I find it suspicious that only x86 has this.  How do other
> architectures handle this?

The problem on x86-PAE arises from the !SHARED_KERNEL_PMD case, which was
introduced by the Xen-PV patches and then re-used for the PTI-x32
enablement to be able to map the LDT into user-space at a fixed address.

Other architectures probably don't have the !SHARED_KERNEL_PMD case (or
do unsharing of kernel page-tables on any level where a huge-page could
be mapped).
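The difference between the shared and !SHARED_KERNEL_PMD cases can be modelled as pointer sharing versus private copies of the kernel PMD page. This is a simplified user-space sketch. All sizes, types and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, tiny sizes chosen for illustration only. */
#define SKETCH_NR_PMDS 4   /* kernel PMD entries per PMD page */
#define SKETCH_NR_PGDS 3   /* number of process page-tables */

struct sketch_pmd_page {
    unsigned long entry[SKETCH_NR_PMDS];   /* 0 means "not present" */
};

static struct sketch_pmd_page reference_pmd;               /* init_mm's copy */
static struct sketch_pmd_page private_pmd[SKETCH_NR_PGDS]; /* !shared case */
static struct sketch_pmd_page *kernel_pmd_of[SKETCH_NR_PGDS];

/* Point each process's kernel slot at the one shared PMD page, or at a
 * private snapshot of it. */
static void sketch_setup(bool shared_kernel_pmd)
{
    for (size_t i = 0; i < SKETCH_NR_PGDS; i++) {
        if (shared_kernel_pmd) {
            kernel_pmd_of[i] = &reference_pmd;
        } else {
            private_pmd[i] = reference_pmd;   /* copied once, then diverges */
            kernel_pmd_of[i] = &private_pmd[i];
        }
    }
}

/* Unmapping only touches the reference table. */
static void sketch_unmap(size_t idx)
{
    assert(idx < SKETCH_NR_PMDS);
    reference_pmd.entry[idx] = 0;
}
```

With a shared page every process sees the cleared entry immediately. With private copies the old entry survives until something explicitly syncs it.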

> At the very least, I think this series needs a comment in
> vmalloc_sync_all() explaining exactly what the function promises to
> do.

Okay, as it stands, it promises to sync mappings for the vmalloc area
between all PGDs in the system. I will add that as a comment.
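Stated as an invariant, that promise can be checked as shown below. This is a user-space sketch with assumed sizes. NR_TABLES, NR_VMALLOC_PMDS, reference[], tables[] and both functions are hypothetical. The real code walks pgd_list and compares kernel PMD entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TABLES       3   /* hypothetical number of page-tables */
#define NR_VMALLOC_PMDS 4   /* hypothetical PMD slots covering vmalloc */

/* 0 = not present; any other value models a PMD entry. */
static unsigned long reference[NR_VMALLOC_PMDS];          /* init_mm */
static unsigned long tables[NR_TABLES][NR_VMALLOC_PMDS];  /* all PGDs */

/* The promise: vmalloc-area entries are identical in every page-table. */
static bool vmalloc_area_in_sync(void)
{
    for (size_t t = 0; t < NR_TABLES; t++)
        for (size_t i = 0; i < NR_VMALLOC_PMDS; i++)
            if (tables[t][i] != reference[i])
                return false;
    return true;
}

/* Copy present *and* cleared entries, so unmappings propagate too. */
static void sketch_vmalloc_sync_all(void)
{
    for (size_t t = 0; t < NR_TABLES; t++)
        for (size_t i = 0; i < NR_VMALLOC_PMDS; i++)
            tables[t][i] = reference[i];
    assert(vmalloc_area_in_sync());   /* postcondition */
}
```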

> But maybe a better fix is to add code to flush_tlb_kernel_range()
> to sync the vmalloc area if the flushed range overlaps the vmalloc
> area.

That would also cause needless overhead on x86-64 because the vmalloc
area doesn't need syncing there. I can make it x86-32 only, but that is
not a clean solution imo.

> Or, even better, improve x86_32 the way we did x86_64: adjust
> the memory mapping code such that top-level paging entries are never
> deleted in the first place.

There is not enough address space on x86-32 to partition it like on
x86-64. In the default PAE configuration there are _four_ PGD entries,
usually one for the kernel, and then 512 PMD entries. Partitioning
happens on the PMD level, for example there is one entry (2MB of address
space) reserved for the user-space LDT mapping.
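These numbers follow from how PAE splits a 32-bit virtual address: 2 bits of PGD index, 9 bits of PMD index, 9 bits of PTE index and 12 bits of offset. Below is a small sketch of that decomposition. The pae_* helpers are illustrative and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* PAE splits a 32-bit virtual address 2/9/9/12. */
#define PAE_PGD_SHIFT 30   /* bits 31..30: one of 4 PGD (PDPT) entries */
#define PAE_PMD_SHIFT 21   /* bits 29..21: one of 512 PMD entries */
#define PAE_PTE_SHIFT 12   /* bits 20..12: one of 512 PTE entries */
#define PAE_IDX_MASK  0x1ffu

static unsigned pae_pgd_index(uint32_t va) { return va >> PAE_PGD_SHIFT; }
static unsigned pae_pmd_index(uint32_t va) { return (va >> PAE_PMD_SHIFT) & PAE_IDX_MASK; }
static unsigned pae_pte_index(uint32_t va) { return (va >> PAE_PTE_SHIFT) & PAE_IDX_MASK; }

/* Address space covered by one entry at each upper level. */
static uint64_t pae_pgd_span(void) { return 1ull << PAE_PGD_SHIFT; }  /* 1 GiB */
static uint64_t pae_pmd_span(void) { return 1ull << PAE_PMD_SHIFT; }  /* 2 MiB */

/* Rebuild an address from its parts, checking each field's range. */
static uint32_t pae_compose(unsigned pgd, unsigned pmd, unsigned pte, unsigned off)
{
    assert(pgd < 4 && pmd < 512 && pte < 512 && off < 4096);
    return ((uint32_t)pgd << PAE_PGD_SHIFT) | ((uint32_t)pmd << PAE_PMD_SHIFT) |
           ((uint32_t)pte << PAE_PTE_SHIFT) | off;
}
```

For example, with the common 3G/1G split the kernel half starting at 0xc0000000 falls entirely under PGD index 3. Any partitioning of the kernel area therefore has to happen in that entry's 512 PMD slots of 2 MiB each.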

Regards,

	Joerg

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-19 12:21         ` Joerg Roedel
@ 2019-07-19 12:24             ` Andy Lutomirski
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Lutomirski @ 2019-07-19 12:24 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Dave Hansen, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andrew Morton,
	LKML, Linux-MM

On Fri, Jul 19, 2019 at 5:21 AM Joerg Roedel <jroedel@suse.de> wrote:
>
> On Thu, Jul 18, 2019 at 12:04:49PM -0700, Andy Lutomirski wrote:
> > I find it problematic that there is no meaningful documentation as to
> > what vmalloc_sync_all() is supposed to do.
>
> Yeah, I found that too, there is no real design around
> vmalloc_sync_all(). It looks like it was just added to fit the purpose
> on x86-32. That also makes it hard to find all necessary call-sites.
>
> > Which is obviously entirely inapplicable.  If I'm understanding
> > correctly, the underlying issue here is that the vmalloc fault
> > mechanism can propagate PGD entry *addition*, but nothing (not even
> > flush_tlb_kernel_range()) propagates PGD entry *removal*.
>
> Close, the underlying issue is not about PGD, but PMD entry
> addition/removal on x86-32 pae systems.
>
> > I find it suspicious that only x86 has this.  How do other
> > architectures handle this?
>
> The problem on x86-PAE arises from the !SHARED_KERNEL_PMD case, which was
> introduced by the  Xen-PV patches and then re-used for the PTI-x32
> enablement to be able to map the LDT into user-space at a fixed address.
>
> Other architectures probably don't have the !SHARED_KERNEL_PMD case (or
> do unsharing of kernel page-tables on any level where a huge-page could
> be mapped).
>
> > At the very least, I think this series needs a comment in
> > vmalloc_sync_all() explaining exactly what the function promises to
> > do.
>
> Okay, as it stands, it promises to sync mappings for the vmalloc area
> between all PGDs in the system. I will add that as a comment.
>
> > But maybe a better fix is to add code to flush_tlb_kernel_range()
> > to sync the vmalloc area if the flushed range overlaps the vmalloc
> > area.
>
> That would also cause needless overhead on x86-64 because the vmalloc
> area doesn't need syncing there. I can make it x86-32 only, but that is
> not a clean solution imo.

Could you move the vmalloc_sync_all() call to the lazy purge path,
though?  If nothing else, it will cause it to be called fewer times
under any given workload, and it looks like it could be rather slow on
x86_32.
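The argument is about call frequency. An eager sync runs once per unmap, while a sync on the lazy purge path runs once per batch. Below is a toy counter model. SKETCH_LAZY_MAX loosely stands in for the lazy_max_pages() threshold; it and the sketch functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LAZY_MAX 32   /* hypothetical batch threshold */

static size_t sync_calls;     /* how often the expensive sync ran */
static size_t lazy_pending;   /* unmapped but not yet purged areas */

static void sketch_sync_all(void) { sync_calls++; }

/* Eager placement: sync on every unmap. */
static void unmap_eager(void)
{
    sketch_sync_all();
}

/* Lazy placement: one sync per purge, before the areas can be reused. */
static void purge_lazy(void)
{
    if (lazy_pending == 0)
        return;
    sketch_sync_all();
    lazy_pending = 0;
}

static void unmap_lazy(void)
{
    if (++lazy_pending >= SKETCH_LAZY_MAX)
        purge_lazy();
    assert(lazy_pending < SKETCH_LAZY_MAX);
}
```

With a batch size of 32, 100 lazy unmaps trigger 3 syncs instead of 100. The remaining 4 pending areas cost one more sync when they are finally purged.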

>
> > Or, even better, improve x86_32 the way we did x86_64: adjust
> > the memory mapping code such that top-level paging entries are never
> > deleted in the first place.
>
> There is not enough address space on x86-32 to partition it like on
> x86-64. In the default PAE configuration there are _four_ PGD entries,
> usually one for the kernel, and then 512 PMD entries. Partitioning
> happens on the PMD level, for example there is one entry (2MB of address
> space) reserved for the user-space LDT mapping.

Ugh, fair enough.

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-19 12:24             ` Andy Lutomirski
@ 2019-07-19 13:00             ` Joerg Roedel
  -1 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-19 13:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Dave Hansen, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Andrew Morton, LKML, Linux-MM

On Fri, Jul 19, 2019 at 05:24:03AM -0700, Andy Lutomirski wrote:
> Could you move the vmalloc_sync_all() call to the lazy purge path,
> though?  If nothing else, it will cause it to be called fewer times
> under any given workload, and it looks like it could be rather slow on
> x86_32.

Okay, I will move it to __purge_vmap_area_lazy(). That looks like the right
place.


Thanks,

	Joerg

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-18  9:04         ` Thomas Gleixner
@ 2019-07-19 14:01         ` Joerg Roedel
  2019-07-19 21:10             ` Thomas Gleixner
  -1 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-19 14:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

On Thu, Jul 18, 2019 at 11:04:57AM +0200, Thomas Gleixner wrote:
> Joerg,
> 
> On Thu, 18 Jul 2019, Joerg Roedel wrote:
> > On Wed, Jul 17, 2019 at 11:43:43PM +0200, Thomas Gleixner wrote:
> > > On Wed, 17 Jul 2019, Joerg Roedel wrote:
> > > > +
> > > > +	if (!pmd_present(*pmd_k))
> > > > +		return NULL;
> > > >  	else
> > > >  		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
> > > 
> > > So in case of unmap, this updates only the first entry in the pgd_list
> > > because vmalloc_sync_all() will break out of the iteration over pgd_list
> > > when NULL is returned from vmalloc_sync_one().
> > > 
> > > I'm surely missing something, but how is that supposed to sync _all_ page
> > > tables on unmap as the changelog claims?
> > 
> > No, you are right, I missed that. It is a bug in this patch, the code
> > that breaks out of the loop in vmalloc_sync_all() needs to be removed as
> > well. Will do that in the next version.
> 
> I assume that p4d/pud do not need the pmd treatment, but a comment
> explaining why would be appreciated.

Actually there is already a comment in this function explaining why p4d
and pud don't need any treatment:

        /*
         * set_pgd(pgd, *pgd_k); here would be useless on PAE
         * and redundant with the set_pmd() on non-PAE. As would
         * set_p4d/set_pud.
         */ 

I couldn't say it with less words :)
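The early-exit problem in the quoted hunk can be reproduced in a few lines. This model uses one PMD slot per page-table. sync_one() and both loops are simplified, hypothetical stand-ins for vmalloc_sync_one() and the pgd_list walk in vmalloc_sync_all().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PGDS 3   /* hypothetical number of entries on pgd_list */

static unsigned long ref_pmd;            /* reference (init_mm) entry */
static unsigned long pgd_pmd[NR_PGDS];   /* same slot in each PGD */

/* Copies the entry, then reports "not present" (NULL in the patched
 * code) when the reference entry is clear. */
static bool sync_one(size_t i)
{
    assert(i < NR_PGDS);
    pgd_pmd[i] = ref_pmd;
    return ref_pmd != 0;
}

/* Buggy walk: stops at the first "not present" result, so an unmap
 * only reaches the first PGD on the list. */
static void sync_all_with_break(void)
{
    for (size_t i = 0; i < NR_PGDS; i++)
        if (!sync_one(i))
            break;
}

/* Fixed walk: visits every PGD regardless of the result. */
static void sync_all_no_break(void)
{
    for (size_t i = 0; i < NR_PGDS; i++)
        (void)sync_one(i);
}
```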


Regards,

	Joerg

* Re: [PATCH 2/3] x86/mm: Sync also unmappings in vmalloc_sync_one()
  2019-07-19 14:01         ` Joerg Roedel
@ 2019-07-19 21:10             ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2019-07-19 21:10 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

On Fri, 19 Jul 2019, Joerg Roedel wrote:
> On Thu, Jul 18, 2019 at 11:04:57AM +0200, Thomas Gleixner wrote:
> > Joerg,
> > 
> > On Thu, 18 Jul 2019, Joerg Roedel wrote:
> > > On Wed, Jul 17, 2019 at 11:43:43PM +0200, Thomas Gleixner wrote:
> > > > On Wed, 17 Jul 2019, Joerg Roedel wrote:
> > > > > +
> > > > > +	if (!pmd_present(*pmd_k))
> > > > > +		return NULL;
> > > > >  	else
> > > > >  		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
> > > > 
> > > > So in case of unmap, this updates only the first entry in the pgd_list
> > > > because vmalloc_sync_all() will break out of the iteration over pgd_list
> > > > when NULL is returned from vmalloc_sync_one().
> > > > 
> > > > I'm surely missing something, but how is that supposed to sync _all_ page
> > > > tables on unmap as the changelog claims?
> > > 
> > > No, you are right, I missed that. It is a bug in this patch, the code
> > > that breaks out of the loop in vmalloc_sync_all() needs to be removed as
> > > well. Will do that in the next version.
> > 
> > I assume that p4d/pud do not need the pmd treatment, but a comment
> > explaining why would be appreciated.
> 
> Actually there is already a comment in this function explaining why p4d
> and pud don't need any treatment:
> 
>         /*
>          * set_pgd(pgd, *pgd_k); here would be useless on PAE
>          * and redundant with the set_pmd() on non-PAE. As would
>          * set_p4d/set_pud.
>          */ 

Indeed. Why did I think there was none?

> I couldn't say it with less words :)

It's perfectly fine.

Thanks,

	tglx

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-22  8:19       ` Thomas Gleixner
@ 2019-07-22  8:29       ` Joerg Roedel
  -1 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-22  8:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

On Mon, Jul 22, 2019 at 10:19:32AM +0200, Thomas Gleixner wrote:
> On Mon, 22 Jul 2019, Joerg Roedel wrote:
> 
> > Srewed up the subject :(, it needs to be
> 
> Un-Srewed it :)

Thanks a lot :)

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-22  8:11   ` Joerg Roedel
@ 2019-07-22  8:19       ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2019-07-22  8:19 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

On Mon, 22 Jul 2019, Joerg Roedel wrote:

> Srewed up the subject :(, it needs to be

Un-Srewed it :)
   ^^^^^^

* Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-19 18:46 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
@ 2019-07-22  8:11   ` Joerg Roedel
  2019-07-22  8:19       ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-22  8:11 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Andrew Morton, linux-kernel,
	linux-mm

Srewed up the subject :(, it needs to be

	"mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()"

of course.

On Fri, Jul 19, 2019 at 08:46:52PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> On x86-32 with PTI enabled, parts of the kernel page-tables
> are not shared between processes. This can cause mappings in
> the vmalloc/ioremap area to persist in some page-tables
> after the region is unmapped and released.
> 
> When the region is re-used the processes with the old
> mappings do not fault in the new mappings but still access
> the old ones.
> 
> This causes undefined behavior, in reality often data
> corruption, kernel oopses and panics and even spontaneous
> reboots.
> 
> Fix this problem by actively syncing unmaps in the
> vmalloc/ioremap area to all page-tables in the system before
> the regions can be re-used.
> 
> References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  mm/vmalloc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 4fa8d84599b0..e0fc963acc41 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1258,6 +1258,12 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>  
> +	/*
> +	 * First make sure the mappings are removed from all page-tables
> +	 * before they are freed.
> +	 */
> +	vmalloc_sync_all();
> +
>  	/*
>  	 * TODO: to calculate a flush range without looping.
>  	 * The list can be up to lazy_max_pages() elements.
> @@ -3038,6 +3044,9 @@ EXPORT_SYMBOL(remap_vmalloc_range);
>  /*
>   * Implement a stub for vmalloc_sync_all() if the architecture chose not to
>   * have one.
> + *
> + * The purpose of this function is to make sure the vmalloc area
> + * mappings are identical in all page-tables in the system.
>   */
>  void __weak vmalloc_sync_all(void)
>  {
> -- 
> 2.17.1

* [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-19 18:46 [PATCH 0/3 v3] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
@ 2019-07-19 18:46 ` Joerg Roedel
  2019-07-22  8:11   ` Joerg Roedel
  0 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2019-07-19 18:46 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

On x86-32 with PTI enabled, parts of the kernel page-tables
are not shared between processes. This can cause mappings in
the vmalloc/ioremap area to persist in some page-tables
after the region is unmapped and released.

When the region is re-used the processes with the old
mappings do not fault in the new mappings but still access
the old ones.

This causes undefined behavior, in reality often data
corruption, kernel oopses and panics and even spontaneous
reboots.

Fix this problem by actively syncing unmaps in the
vmalloc/ioremap area to all page-tables in the system before
the regions can be re-used.

References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 mm/vmalloc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4fa8d84599b0..e0fc963acc41 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1258,6 +1258,12 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	if (unlikely(valist == NULL))
 		return false;
 
+	/*
+	 * First make sure the mappings are removed from all page-tables
+	 * before they are freed.
+	 */
+	vmalloc_sync_all();
+
 	/*
 	 * TODO: to calculate a flush range without looping.
 	 * The list can be up to lazy_max_pages() elements.
@@ -3038,6 +3044,9 @@ EXPORT_SYMBOL(remap_vmalloc_range);
 /*
  * Implement a stub for vmalloc_sync_all() if the architecture chose not to
  * have one.
+ *
+ * The purpose of this function is to make sure the vmalloc area
+ * mappings are identical in all page-tables in the system.
  */
 void __weak vmalloc_sync_all(void)
 {
-- 
2.17.1
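The __weak stub in the second hunk relies on weak linkage. The generic definition is used only if no architecture provides an ordinary (strong) definition of the same symbol. Below is a GCC/Clang sketch of that mechanism with hypothetical names. It also mirrors the patch's order of syncing before the areas are released.

```c
/*
 * Generic fallback, analogous to the __weak vmalloc_sync_all() stub.
 * If another object file in the link defines this symbol normally,
 * the linker picks that definition and this body is discarded.
 */
static int generic_stub_calls;   /* counts calls that hit the fallback */

__attribute__((weak)) void sketch_vmalloc_sync_all(void)
{
    generic_stub_calls++;   /* the kernel's generic stub is simply empty */
}

static int areas_freed;   /* stands in for freeing the vmap areas */

/* Same ordering as the hunk: sync first, then release. */
static void sketch_purge(void)
{
    sketch_vmalloc_sync_all();
    areas_freed = 1;
}
```

Adding an ordinary definition of sketch_vmalloc_sync_all() in another object file would replace the fallback at link time without any #ifdef.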


* [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
  2019-07-15 11:02 [PATCH 0/3] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
@ 2019-07-15 11:02 ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2019-07-15 11:02 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov
  Cc: Andrew Morton, linux-kernel, linux-mm, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

On x86-32 with PTI enabled, parts of the kernel page-tables
are not shared between processes. This can cause mappings in
the vmalloc/ioremap area to persist in some page-tables
after the region is unmapped and released.

When the region is re-used the processes with the old
mappings do not fault in the new mappings but still access
the old ones.

This causes undefined behavior, in reality often data
corruption, kernel oopses and panics and even spontaneous
reboots.

Fix this problem by actively syncing unmaps in the
vmalloc/ioremap area to all page-tables in the system.

References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 7757d607c6b3 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 mm/vmalloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4fa8d84599b0..322b11a374fd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
 			continue;
 		vunmap_p4d_range(pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
+
+	vmalloc_sync_all();
 }
 
 static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
-- 
2.17.1
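The failure described in the changelog comes from an asymmetry. The vmalloc fault path fills in missing entries, but nothing clears stale ones. Below is a single-slot model of that. 0 means "not present" and other values name a page-table page. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long ref_slot;    /* reference (init_mm) entry */
static unsigned long proc_slot;   /* one process's private copy */

/* The fault path only propagates *missing* entries. */
static unsigned long proc_access(void)
{
    if (proc_slot == 0)
        proc_slot = ref_slot;
    return proc_slot;   /* what the process actually uses */
}

static void sketch_unmap(bool sync_after)
{
    ref_slot = 0;
    if (sync_after)
        proc_slot = ref_slot;   /* propagate the removal too */
}

static void sketch_map(unsigned long pmd_page)
{
    assert(pmd_page != 0);
    ref_slot = pmd_page;
}
```

Without the sync, re-using the region after a remap still resolves through the old entry. With it, the next access faults in the new one.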


end of thread, other threads:[~2019-07-22  8:29 UTC | newest]

Thread overview: 30+ messages
2019-07-17  7:14 [PATCH 0/3 v2] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
2019-07-17  7:14 ` [PATCH 1/3] x86/mm: Check for pfn instead of page in vmalloc_sync_one() Joerg Roedel
2019-07-17  7:14 ` [PATCH 2/3] x86/mm: Sync also unmappings " Joerg Roedel
2019-07-17 21:06   ` Dave Hansen
2019-07-18  8:44     ` Joerg Roedel
2019-07-17 21:43   ` Thomas Gleixner
2019-07-17 21:43     ` Thomas Gleixner
2019-07-18  8:46     ` Joerg Roedel
2019-07-18  9:04       ` Thomas Gleixner
2019-07-18  9:04         ` Thomas Gleixner
2019-07-18  9:25         ` Joerg Roedel
2019-07-19 14:01         ` Joerg Roedel
2019-07-19 21:10           ` Thomas Gleixner
2019-07-19 21:10             ` Thomas Gleixner
2019-07-17  7:14 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
2019-07-17 21:24   ` Andy Lutomirski
2019-07-17 21:24     ` Andy Lutomirski
2019-07-18  9:17     ` Joerg Roedel
2019-07-18 19:04       ` Andy Lutomirski
2019-07-18 19:04         ` Andy Lutomirski
2019-07-19 12:21         ` Joerg Roedel
2019-07-19 12:24           ` Andy Lutomirski
2019-07-19 12:24             ` Andy Lutomirski
2019-07-19 13:00             ` Joerg Roedel
  -- strict thread matches above, loose matches on Subject: below --
2019-07-19 18:46 [PATCH 0/3 v3] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
2019-07-19 18:46 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
2019-07-22  8:11   ` Joerg Roedel
2019-07-22  8:19     ` Thomas Gleixner
2019-07-22  8:19       ` Thomas Gleixner
2019-07-22  8:29       ` Joerg Roedel
2019-07-15 11:02 [PATCH 0/3] Sync unmappings in vmalloc/ioremap areas Joerg Roedel
2019-07-15 11:02 ` [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range() Joerg Roedel
