linux-mips.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
@ 2019-02-19 12:32 Lars Persson
  2019-02-21 17:10 ` Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Lars Persson @ 2019-02-19 12:32 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: linux-mips, Lars Persson

Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
and SIGSEGV that could not be traced back to a userspace code
bug. They had all the magic signs of an I/D cache coherency issue.

Now recently we noticed that the /proc/sys/vm/compact_memory interface
was quite efficient at provoking this class of userspace crashes.

Studying the code in mm/migrate.c there is a distinction made between
migrating a page that is mapped at the instant of migration and one
that is not mapped. Our problem turned out to be the non-mapped pages.

For the non-mapped page the code performs a copy of the page content
and all relevant meta-data of the page without doing the required
D-cache maintenance. This leaves dirty data in the D-cache of the CPU
and on the 1004K cores this data is not visible to the I-cache. A
subsequent page-fault that triggers a mapping of the page will happily
serve the process with potentially stale code.

What about ARM then, this bug should have seen greater exposure? Well
ARM became immune to this flaw back in 2010, see commit c01778001a4f
("ARM: 6379/1: Assume new page cache pages have dirty D-cache").

My proposed fix moves the D-cache maintenance inside move_to_new_page
to make it common for both cases.

Signed-off-by: Lars Persson <larper@axis.com>
---
 mm/migrate.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d4fd680be3b0..80fc19e610b5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -248,10 +248,8 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 				pte = swp_entry_to_pte(entry);
 			} else if (is_device_public_page(new)) {
 				pte = pte_mkdevmap(pte);
-				flush_dcache_page(new);
 			}
-		} else
-			flush_dcache_page(new);
+		}
 
 #ifdef CONFIG_HUGETLB_PAGE
 		if (PageHuge(new)) {
@@ -995,6 +993,13 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 		 */
 		if (!PageMappingFlags(page))
 			page->mapping = NULL;
+
+		if (unlikely(is_zone_device_page(newpage))) {
+			if (is_device_public_page(newpage))
+				flush_dcache_page(newpage);
+		} else
+			flush_dcache_page(newpage);
+
 	}
 out:
 	return rc;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-19 12:32 [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate Lars Persson
@ 2019-02-21 17:10 ` Mel Gorman
  2019-02-21 20:36 ` Paul Burton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2019-02-21 17:10 UTC (permalink / raw)
  To: Lars Persson; +Cc: linux-mm, linux-kernel, linux-mips, Lars Persson

On Tue, Feb 19, 2019 at 01:32:12PM +0100, Lars Persson wrote:
> Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> and SIGSEGV that could not be traced back to a userspace code
> bug. They had all the magic signs of an I/D cache coherency issue.
> 
> Now recently we noticed that the /proc/sys/vm/compact_memory interface
> was quite efficient at provoking this class of userspace crashes.
> 
> Studying the code in mm/migrate.c there is a distinction made between
> migrating a page that is mapped at the instant of migration and one
> that is not mapped. Our problem turned out to be the non-mapped pages.
> 
> For the non-mapped page the code performs a copy of the page content
> and all relevant meta-data of the page without doing the required
> D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> and on the 1004K cores this data is not visible to the I-cache. A
> subsequent page-fault that triggers a mapping of the page will happily
> serve the process with potentially stale code.
> 
> What about ARM then, this bug should have seen greater exposure? Well
> ARM became immune to this flaw back in 2010, see commit c01778001a4f
> ("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
> 
> My proposed fix moves the D-cache maintenance inside move_to_new_page
> to make it common for both cases.
> 
> Signed-off-by: Lars Persson <larper@axis.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-19 12:32 [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate Lars Persson
  2019-02-21 17:10 ` Mel Gorman
@ 2019-02-21 20:36 ` Paul Burton
  2019-02-25 15:07 ` Vlastimil Babka
  2019-02-26  9:23 ` Anshuman Khandual
  3 siblings, 0 replies; 10+ messages in thread
From: Paul Burton @ 2019-02-21 20:36 UTC (permalink / raw)
  To: Lars Persson; +Cc: linux-mm, linux-kernel, linux-mips, Lars Persson

Hi Lars,

On Tue, Feb 19, 2019 at 01:32:12PM +0100, Lars Persson wrote:
> Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> and SIGSEGV that could not be traced back to a userspace code
> bug. They had all the magic signs of an I/D cache coherency issue.
> 
> Now recently we noticed that the /proc/sys/vm/compact_memory interface
> was quite efficient at provoking this class of userspace crashes.
> 
> Studying the code in mm/migrate.c there is a distinction made between
> migrating a page that is mapped at the instant of migration and one
> that is not mapped. Our problem turned out to be the non-mapped pages.
> 
> For the non-mapped page the code performs a copy of the page content
> and all relevant meta-data of the page without doing the required
> D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> and on the 1004K cores this data is not visible to the I-cache. A
> subsequent page-fault that triggers a mapping of the page will happily
> serve the process with potentially stale code.
> 
> What about ARM then, this bug should have seen greater exposure? Well
> ARM became immune to this flaw back in 2010, see commit c01778001a4f
> ("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
> 
> My proposed fix moves the D-cache maintenance inside move_to_new_page
> to make it common for both cases.
> 
> Signed-off-by: Lars Persson <larper@axis.com>

Reviewed-by: Paul Burton <paul.burton@mips.com>

Thanks,
    Paul

> ---
>  mm/migrate.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-19 12:32 [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate Lars Persson
  2019-02-21 17:10 ` Mel Gorman
  2019-02-21 20:36 ` Paul Burton
@ 2019-02-25 15:07 ` Vlastimil Babka
  2019-02-26  8:40   ` Lars Persson
  2019-02-26  9:23 ` Anshuman Khandual
  3 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-02-25 15:07 UTC (permalink / raw)
  To: Lars Persson, linux-mm, linux-kernel; +Cc: linux-mips, Lars Persson

On 2/19/19 1:32 PM, Lars Persson wrote:
> Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> and SIGSEGV that could not be traced back to a userspace code
> bug. They had all the magic signs of an I/D cache coherency issue.
> 
> Now recently we noticed that the /proc/sys/vm/compact_memory interface
> was quite efficient at provoking this class of userspace crashes.
> 
> Studying the code in mm/migrate.c there is a distinction made between
> migrating a page that is mapped at the instant of migration and one
> that is not mapped. Our problem turned out to be the non-mapped pages.
> 
> For the non-mapped page the code performs a copy of the page content
> and all relevant meta-data of the page without doing the required
> D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> and on the 1004K cores this data is not visible to the I-cache. A
> subsequent page-fault that triggers a mapping of the page will happily
> serve the process with potentially stale code.
> 
> What about ARM then, this bug should have seen greater exposure? Well
> ARM became immune to this flaw back in 2010, see commit c01778001a4f
> ("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
> 
> My proposed fix moves the D-cache maintenance inside move_to_new_page
> to make it common for both cases.
> 
> Signed-off-by: Lars Persson <larper@axis.com>

What about CC stable and a Fixes tag, would it be applicable here?

> ---
>  mm/migrate.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index d4fd680be3b0..80fc19e610b5 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -248,10 +248,8 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
>  				pte = swp_entry_to_pte(entry);
>  			} else if (is_device_public_page(new)) {
>  				pte = pte_mkdevmap(pte);
> -				flush_dcache_page(new);
>  			}
> -		} else
> -			flush_dcache_page(new);
> +		}
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  		if (PageHuge(new)) {
> @@ -995,6 +993,13 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  		 */
>  		if (!PageMappingFlags(page))
>  			page->mapping = NULL;
> +
> +		if (unlikely(is_zone_device_page(newpage))) {
> +			if (is_device_public_page(newpage))
> +				flush_dcache_page(newpage);
> +		} else
> +			flush_dcache_page(newpage);
> +
>  	}
>  out:
>  	return rc;
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-25 15:07 ` Vlastimil Babka
@ 2019-02-26  8:40   ` Lars Persson
  2019-02-26 10:07     ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Persson @ 2019-02-26  8:40 UTC (permalink / raw)
  To: Vlastimil Babka, Lars Persson, linux-mm, linux-kernel; +Cc: linux-mips


On 2/25/19 4:07 PM, Vlastimil Babka wrote:
> On 2/19/19 1:32 PM, Lars Persson wrote:
>> Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
>> and SIGSEGV that could not be traced back to a userspace code
>> bug. They had all the magic signs of an I/D cache coherency issue.
>>
>> Now recently we noticed that the /proc/sys/vm/compact_memory interface
>> was quite efficient at provoking this class of userspace crashes.
>>
>> Studying the code in mm/migrate.c there is a distinction made between
>> migrating a page that is mapped at the instant of migration and one
>> that is not mapped. Our problem turned out to be the non-mapped pages.
>>
>> For the non-mapped page the code performs a copy of the page content
>> and all relevant meta-data of the page without doing the required
>> D-cache maintenance. This leaves dirty data in the D-cache of the CPU
>> and on the 1004K cores this data is not visible to the I-cache. A
>> subsequent page-fault that triggers a mapping of the page will happily
>> serve the process with potentially stale code.
>>
>> What about ARM then, this bug should have seen greater exposure? Well
>> ARM became immune to this flaw back in 2010, see commit c01778001a4f
>> ("ARM: 6379/1: Assume new page cache pages have dirty D-cache").
>>
>> My proposed fix moves the D-cache maintenance inside move_to_new_page
>> to make it common for both cases.
>>
>> Signed-off-by: Lars Persson <larper@axis.com>
> 
> What about CC stable and a Fixes tag, would it be applicable here?
> 

Yes this is candidate for stable so let's add:
Cc: <stable@vger.kernel.org>

I do not find a good candidate for a Fixes tag.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-19 12:32 [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate Lars Persson
                   ` (2 preceding siblings ...)
  2019-02-25 15:07 ` Vlastimil Babka
@ 2019-02-26  9:23 ` Anshuman Khandual
  2019-02-26  9:46   ` Lars Persson
  3 siblings, 1 reply; 10+ messages in thread
From: Anshuman Khandual @ 2019-02-26  9:23 UTC (permalink / raw)
  To: Lars Persson, linux-mm, linux-kernel; +Cc: linux-mips, Lars Persson



On 02/19/2019 06:02 PM, Lars Persson wrote:
> Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> and SIGSEGV that could not be traced back to a userspace code
> bug. They had all the magic signs of an I/D cache coherency issue.
> 
> Now recently we noticed that the /proc/sys/vm/compact_memory interface
> was quite efficient at provoking this class of userspace crashes.
> 
> Studying the code in mm/migrate.c there is a distinction made between
> migrating a page that is mapped at the instant of migration and one
> that is not mapped. Our problem turned out to be the non-mapped pages.
> 
> For the non-mapped page the code performs a copy of the page content
> and all relevant meta-data of the page without doing the required
> D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> and on the 1004K cores this data is not visible to the I-cache. A
> subsequent page-fault that triggers a mapping of the page will happily
> serve the process with potentially stale code.

Just curious. Is not the code path which tries to map this page should
do the invalidation just before setting it up in the page table via
set_pte_at() or other similar variants ? How it maps without doing the
necessary flush.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-26  9:23 ` Anshuman Khandual
@ 2019-02-26  9:46   ` Lars Persson
  0 siblings, 0 replies; 10+ messages in thread
From: Lars Persson @ 2019-02-26  9:46 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Lars Persson, linux-mm, linux-kernel, linux-mips, Lars Persson

On Tue, Feb 26, 2019 at 10:23 AM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
> On 02/19/2019 06:02 PM, Lars Persson wrote:
> > Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> > and SIGSEGV that could not be traced back to a userspace code
> > bug. They had all the magic signs of an I/D cache coherency issue.
> >
> > Now recently we noticed that the /proc/sys/vm/compact_memory interface
> > was quite efficient at provoking this class of userspace crashes.
> >
> > Studying the code in mm/migrate.c there is a distinction made between
> > migrating a page that is mapped at the instant of migration and one
> > that is not mapped. Our problem turned out to be the non-mapped pages.
> >
> > For the non-mapped page the code performs a copy of the page content
> > and all relevant meta-data of the page without doing the required
> > D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> > and on the 1004K cores this data is not visible to the I-cache. A
> > subsequent page-fault that triggers a mapping of the page will happily
> > serve the process with potentially stale code.
>
> Just curious. Is not the code path which tries to map this page should
> do the invalidation just before setting it up in the page table via
> set_pte_at() or other similar variants ? How it maps without doing the
> necessary flush.

In fact this is what happens when the flush_dcache_page API was used
correctly, but it is an arch implementation detail. All kernel code
that writes to a page cage page must also call flush_dcache_page
before the page becomes eligible for mapping. The arch code has the
option to postpone the actual flush until set_pte_at maps the page.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-26  8:40   ` Lars Persson
@ 2019-02-26 10:07     ` Vlastimil Babka
  2019-02-26 11:57       ` Lars Persson
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-02-26 10:07 UTC (permalink / raw)
  To: Lars Persson, Lars Persson, linux-mm, linux-kernel; +Cc: linux-mips

On 2/26/19 9:40 AM, Lars Persson wrote:
>> What about CC stable and a Fixes tag, would it be applicable here?
>>
> 
> Yes this is candidate for stable so let's add:
> Cc: <stable@vger.kernel.org>
> 
> I do not find a good candidate for a Fixes tag.

How bout a version range where the bug needs to be fixed then?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-26 10:07     ` Vlastimil Babka
@ 2019-02-26 11:57       ` Lars Persson
  2019-03-07 14:17         ` Lars Persson
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Persson @ 2019-02-26 11:57 UTC (permalink / raw)
  To: Vlastimil Babka, Lars Persson, linux-mm, linux-kernel; +Cc: linux-mips



On 2/26/19 11:07 AM, Vlastimil Babka wrote:
> On 2/26/19 9:40 AM, Lars Persson wrote:
>>> What about CC stable and a Fixes tag, would it be applicable here?
>>>
>>
>> Yes this is candidate for stable so let's add:
>> Cc: <stable@vger.kernel.org>
>>
>> I do not find a good candidate for a Fixes tag.
> 
> How bout a version range where the bug needs to be fixed then?
> 

The distinction between mapped and non-mapped old page was introduced in 2ebba6b7e1d9 ("mm: unmapped page migration avoid unmap+remap overhead") so at least it applies to stable 4.4+.

Before that patch there was always a call to remove_migration_ptes() but I cannot conclude if those earlier versions actually will reach the flush_dcache_page call if the old page was unmapped.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate
  2019-02-26 11:57       ` Lars Persson
@ 2019-03-07 14:17         ` Lars Persson
  0 siblings, 0 replies; 10+ messages in thread
From: Lars Persson @ 2019-03-07 14:17 UTC (permalink / raw)
  To: Vlastimil Babka, Lars Persson, linux-mm, linux-kernel; +Cc: linux-mips



On 2/26/19 12:57 PM, Lars Persson wrote:
> 
> 
> On 2/26/19 11:07 AM, Vlastimil Babka wrote:
>> On 2/26/19 9:40 AM, Lars Persson wrote:
>>>> What about CC stable and a Fixes tag, would it be applicable here?
>>>>
>>>
>>> Yes this is candidate for stable so let's add:
>>> Cc: <stable@vger.kernel.org>
>>>
>>> I do not find a good candidate for a Fixes tag.
>>
>> How bout a version range where the bug needs to be fixed then?
>>
> 
> The distinction between mapped and non-mapped old page was introduced in 2ebba6b7e1d9 ("mm: unmapped page migration avoid unmap+remap overhead") so at least it applies to stable 4.4+.
> 
> Before that patch there was always a call to remove_migration_ptes() but I cannot conclude if those earlier versions actually will reach the flush_dcache_page call if the old page was unmapped.
> 

Should I submit a V2 patch with CC stable for v4.4+ ?

- Lars

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-03-07 14:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-19 12:32 [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped page migrate Lars Persson
2019-02-21 17:10 ` Mel Gorman
2019-02-21 20:36 ` Paul Burton
2019-02-25 15:07 ` Vlastimil Babka
2019-02-26  8:40   ` Lars Persson
2019-02-26 10:07     ` Vlastimil Babka
2019-02-26 11:57       ` Lars Persson
2019-03-07 14:17         ` Lars Persson
2019-02-26  9:23 ` Anshuman Khandual
2019-02-26  9:46   ` Lars Persson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).