linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] mm: add rss counters consistency check
@ 2012-01-06 17:38 Konstantin Khlebnikov
  2012-01-06 17:38 ` [PATCH 2/3] mm: postpone migrated page mapping reset Konstantin Khlebnikov
  2012-01-06 17:38 ` [PATCH 3/3] mm: adjust rss counters for migration entiries Konstantin Khlebnikov
  0 siblings, 2 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2012-01-06 17:38 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel

This patch warn about non-zero rss counters at final mmdrop.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 kernel/fork.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index da4a6a1..67a79dd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -507,6 +507,23 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	return NULL;
 }
 
+static void check_mm(struct mm_struct *mm)
+{
+	int i;
+
+	for (i = 0; i < NR_MM_COUNTERS; i++) {
+		long x = atomic_long_read(&mm->rss_stat.count[i]);
+
+		if (unlikely(x))
+			printk(KERN_ALERT "BUG: Bad rss-counter state "
+					  "mm:%p idx:%d val:%ld\n", mm, i, x);
+	}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	VM_BUG_ON(mm->pmd_huge_pte);
+#endif
+}
+
 /*
  * Allocate and initialize an mm_struct.
  */
@@ -534,9 +551,7 @@ void __mmdrop(struct mm_struct *mm)
 	mm_free_pgd(mm);
 	destroy_context(mm);
 	mmu_notifier_mm_destroy(mm);
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	VM_BUG_ON(mm->pmd_huge_pte);
-#endif
+	check_mm(mm);
 	free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] mm: postpone migrated page mapping reset
  2012-01-06 17:38 [PATCH 1/3] mm: add rss counters consistency check Konstantin Khlebnikov
@ 2012-01-06 17:38 ` Konstantin Khlebnikov
  2012-01-06 17:38 ` [PATCH 3/3] mm: adjust rss counters for migration entiries Konstantin Khlebnikov
  1 sibling, 0 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2012-01-06 17:38 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel

Postpone resetting page->mapping till final remove_migration_ptes(),
otherwise expression PageAnon(migration_entry_to_page(entry)) does not work.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/migrate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 177aca4..f59cd76 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -386,7 +386,6 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 	ClearPageSwapCache(page);
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
-	page->mapping = NULL;
 
 	/*
 	 * If any waiters have accumulated on the new page then
@@ -614,6 +613,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 	} else {
 		if (remap_swapcache)
 			remove_migration_ptes(page, newpage);
+		page->mapping = NULL;
 	}
 
 	unlock_page(newpage);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-06 17:38 [PATCH 1/3] mm: add rss counters consistency check Konstantin Khlebnikov
  2012-01-06 17:38 ` [PATCH 2/3] mm: postpone migrated page mapping reset Konstantin Khlebnikov
@ 2012-01-06 17:38 ` Konstantin Khlebnikov
  2012-01-06 17:45   ` Konstantin Khlebnikov
  2012-01-11  5:41   ` KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2012-01-06 17:38 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel

Memory migration fill pte with migration entry and it didn't update rss counters.
Then it replace migration entry with new page (or old one if migration was failed).
But between this two passes this pte can be unmaped, or task can fork child and
it will get copy of this migration entry. Nobody account this into rss counters.

This patch properly adjust rss counters for migration entries in zap_pte_range()
and copy_one_pte(). Thus we avoid extra atomic operations on migration fast-path.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/memory.c |   37 ++++++++++++++++++++++++++++---------
 1 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 829d437..2f96ffc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -878,15 +878,24 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			}
 			if (likely(!non_swap_entry(entry)))
 				rss[MM_SWAPENTS]++;
-			else if (is_write_migration_entry(entry) &&
-					is_cow_mapping(vm_flags)) {
-				/*
-				 * COW mappings require pages in both parent
-				 * and child to be set to read.
-				 */
-				make_migration_entry_read(&entry);
-				pte = swp_entry_to_pte(entry);
-				set_pte_at(src_mm, addr, src_pte, pte);
+			else if (is_migration_entry(entry)) {
+				page = migration_entry_to_page(entry);
+
+				if (PageAnon(page))
+					rss[MM_ANONPAGES]++;
+				else
+					rss[MM_FILEPAGES]++;
+
+				if (is_write_migration_entry(entry) &&
+				    is_cow_mapping(vm_flags)) {
+					/*
+					 * COW mappings require pages in both
+					 * parent and child to be set to read.
+					 */
+					make_migration_entry_read(&entry);
+					pte = swp_entry_to_pte(entry);
+					set_pte_at(src_mm, addr, src_pte, pte);
+				}
 			}
 		}
 		goto out_set_pte;
@@ -1191,6 +1200,16 @@ again:
 
 			if (!non_swap_entry(entry))
 				rss[MM_SWAPENTS]--;
+			else if (is_migration_entry(entry)) {
+				struct page *page;
+
+				page = migration_entry_to_page(entry);
+
+				if (PageAnon(page))
+					rss[MM_ANONPAGES]--;
+				else
+					rss[MM_FILEPAGES]--;
+			}
 			if (unlikely(!free_swap_and_cache(entry)))
 				print_bad_pte(vma, addr, ptent, NULL);
 		}


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-06 17:38 ` [PATCH 3/3] mm: adjust rss counters for migration entiries Konstantin Khlebnikov
@ 2012-01-06 17:45   ` Konstantin Khlebnikov
  2012-01-11  5:41   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2012-01-06 17:45 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 121 bytes --]

This bug can be easily triggered by test-tool in attachment.
I run it with with this arguments: ./mm-thp-torture 100 100

[-- Attachment #2: mm-thp-torture.c --]
[-- Type: text/x-csrc, Size: 1193 bytes --]

/*
 * torture-test for transparent huge pages,
 * memory migration and memory compaction =)
 *
 * usage: ./mm-thp-tortire <threads> <pages>
 *
 * to eat all avaliable huge pages:
 * threads * pages >= ram[mb] / 2[mb]
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>


#define PAGE_SIZE (4096)
#define HPAGE_SIZE (4096*512)

int main(int argc, char **argv)
{
	long pages, threads;
	char *buf, *ptr;

	if (argc < 3)
		return 2;

	signal(SIGCHLD, SIG_IGN);

	threads = atol(argv[1]);
	pages = atol(argv[2]);

	buf = mmap(NULL, (pages + 1) * HPAGE_SIZE, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
	if (buf == MAP_FAILED)
		return 3;

	buf += HPAGE_SIZE - ((long)buf & (HPAGE_SIZE-1));

	while (--threads > 0 && !fork());

	while (1) {
		for ( ptr = buf ; ptr < buf + pages * HPAGE_SIZE ;
				ptr += HPAGE_SIZE ) {
			if (mmap(ptr, HPAGE_SIZE, PROT_READ|PROT_WRITE,
			    MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE|MAP_FIXED,
			    -1, 0) != ptr)
				return 4;
			*ptr = 0;
			munmap(ptr + PAGE_SIZE, HPAGE_SIZE - PAGE_SIZE);
		}
		if (!fork())
			exit(0);
		munmap(buf, pages * HPAGE_SIZE);
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-06 17:38 ` [PATCH 3/3] mm: adjust rss counters for migration entiries Konstantin Khlebnikov
  2012-01-06 17:45   ` Konstantin Khlebnikov
@ 2012-01-11  5:41   ` KAMEZAWA Hiroyuki
  2012-01-11  8:23     ` Konstantin Khlebnikov
  1 sibling, 1 reply; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-11  5:41 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Andrew Morton, Hugh Dickins, linux-mm, linux-kernel

On Fri, 06 Jan 2012 21:38:56 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Memory migration fill pte with migration entry and it didn't update rss counters.
> Then it replace migration entry with new page (or old one if migration was failed).
> But between this two passes this pte can be unmaped, or task can fork child and
> it will get copy of this migration entry. Nobody account this into rss counters.
> 
> This patch properly adjust rss counters for migration entries in zap_pte_range()
> and copy_one_pte(). Thus we avoid extra atomic operations on migration fast-path.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

It's better to show wheter this is a bug-fix or not in changelog.

IIUC, the bug-fix is the 1st harf of this patch + patch [2/3].
Your new bug-check code is in patch[1/3] and 2nd half of this patch.

I think it's better to do bug-fix 1st and add bug-check later.

So, could you reorder patches to bug-fix and new-bug-check ?

To the logic itself,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Please CC when you repost.



> ---
>  mm/memory.c |   37 ++++++++++++++++++++++++++++---------
>  1 files changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 829d437..2f96ffc 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -878,15 +878,24 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  			}
>  			if (likely(!non_swap_entry(entry)))
>  				rss[MM_SWAPENTS]++;
> -			else if (is_write_migration_entry(entry) &&
> -					is_cow_mapping(vm_flags)) {
> -				/*
> -				 * COW mappings require pages in both parent
> -				 * and child to be set to read.
> -				 */
> -				make_migration_entry_read(&entry);
> -				pte = swp_entry_to_pte(entry);
> -				set_pte_at(src_mm, addr, src_pte, pte);
> +			else if (is_migration_entry(entry)) {
> +				page = migration_entry_to_page(entry);
> +
> +				if (PageAnon(page))
> +					rss[MM_ANONPAGES]++;
> +				else
> +					rss[MM_FILEPAGES]++;
> +
> +				if (is_write_migration_entry(entry) &&
> +				    is_cow_mapping(vm_flags)) {
> +					/*
> +					 * COW mappings require pages in both
> +					 * parent and child to be set to read.
> +					 */
> +					make_migration_entry_read(&entry);
> +					pte = swp_entry_to_pte(entry);
> +					set_pte_at(src_mm, addr, src_pte, pte);
> +				}
>  			}
>  		}
>  		goto out_set_pte;
> @@ -1191,6 +1200,16 @@ again:
>  
>  			if (!non_swap_entry(entry))
>  				rss[MM_SWAPENTS]--;
> +			else if (is_migration_entry(entry)) {
> +				struct page *page;
> +
> +				page = migration_entry_to_page(entry);
> +
> +				if (PageAnon(page))
> +					rss[MM_ANONPAGES]--;
> +				else
> +					rss[MM_FILEPAGES]--;
> +			}
>  			if (unlikely(!free_swap_and_cache(entry)))
>  				print_bad_pte(vma, addr, ptent, NULL);
>  		}
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-11  5:41   ` KAMEZAWA Hiroyuki
@ 2012-01-11  8:23     ` Konstantin Khlebnikov
  2012-01-11  8:41       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2012-01-11  8:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Andrew Morton, Hugh Dickins, linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Fri, 06 Jan 2012 21:38:56 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Memory migration fill pte with migration entry and it didn't update rss counters.
>> Then it replace migration entry with new page (or old one if migration was failed).
>> But between this two passes this pte can be unmaped, or task can fork child and
>> it will get copy of this migration entry. Nobody account this into rss counters.
>>
>> This patch properly adjust rss counters for migration entries in zap_pte_range()
>> and copy_one_pte(). Thus we avoid extra atomic operations on migration fast-path.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>
> It's better to show wheter this is a bug-fix or not in changelog.
>
> IIUC, the bug-fix is the 1st harf of this patch + patch [2/3].
> Your new bug-check code is in patch[1/3] and 2nd half of this patch.
>

No, there only one new bug-check in 1st patch, this is non-fatal warning.
I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
rss counters covers whole page-table management, this is very good invariant.
Currently I can trigger this warning only on this rare race -- extremely loaded
memory compaction catches this every several seconds.

1/3 bug-check
2/3 fix preparation
3/3 bugfix in two places:
     do rss++ in copy_one_pte()
     do rss-- in zap_pte_range()

> I think it's better to do bug-fix 1st and add bug-check later.
>
> So, could you reorder patches to bug-fix and new-bug-check ?

Patches didn't share any context, so they can be applied in any order.

>
> To the logic itself,
> Acked-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Please CC when you repost.
>
>
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-11  8:23     ` Konstantin Khlebnikov
@ 2012-01-11  8:41       ` KAMEZAWA Hiroyuki
  2012-01-18 23:21         ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-11  8:41 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Andrew Morton, Hugh Dickins, linux-mm, linux-kernel

On Wed, 11 Jan 2012 12:23:11 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Fri, 06 Jan 2012 21:38:56 +0400
> > Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
> >
> >> Memory migration fill pte with migration entry and it didn't update rss counters.
> >> Then it replace migration entry with new page (or old one if migration was failed).
> >> But between this two passes this pte can be unmaped, or task can fork child and
> >> it will get copy of this migration entry. Nobody account this into rss counters.
> >>
> >> This patch properly adjust rss counters for migration entries in zap_pte_range()
> >> and copy_one_pte(). Thus we avoid extra atomic operations on migration fast-path.
> >>
> >> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> >
> > It's better to show wheter this is a bug-fix or not in changelog.
> >
> > IIUC, the bug-fix is the 1st harf of this patch + patch [2/3].
> > Your new bug-check code is in patch[1/3] and 2nd half of this patch.
> >
> 
> No, there only one new bug-check in 1st patch, this is non-fatal warning.
> I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
> rss counters covers whole page-table management, this is very good invariant.
> Currently I can trigger this warning only on this rare race -- extremely loaded
> memory compaction catches this every several seconds.
> 
> 1/3 bug-check
> 2/3 fix preparation
> 3/3 bugfix in two places:
>      do rss++ in copy_one_pte()
>      do rss-- in zap_pte_range()
> 

Hmm, ok, I read wrong.

So, I think you should post the patch with [BUGFIX] and
report 'what happens' and 'what is the bug' , 'what you fixed' explicitly.

As...
==
  This patch series fixes per-mm rss counter accounting bug. When pages are
  heavily migrated, the rss counters will go wrong by fork() and unmap()
  because they ignores migration_pte_entries.
  This rarelly happens but will make rss counter incorrect.

  This seires of patches will fix the issue by adding proper accounting of
  migration_pte_entries in unmap() and fork(). This series includes
  bug check code, too.
==

If [BUGFIX], people will have more interests.

Anyway, thank you for bugfix.

-Kame


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-11  8:41       ` KAMEZAWA Hiroyuki
@ 2012-01-18 23:21         ` Andrew Morton
  2012-01-24  1:42           ` Hugh Dickins
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2012-01-18 23:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Konstantin Khlebnikov, Hugh Dickins, linux-mm, linux-kernel

On Wed, 11 Jan 2012 17:41:26 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 11 Jan 2012 12:23:11 +0400
> Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> 
> > KAMEZAWA Hiroyuki wrote:
> > > On Fri, 06 Jan 2012 21:38:56 +0400
> > > Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
> > >
> > >> Memory migration fill pte with migration entry and it didn't update rss counters.
> > >> Then it replace migration entry with new page (or old one if migration was failed).
> > >> But between this two passes this pte can be unmaped, or task can fork child and
> > >> it will get copy of this migration entry. Nobody account this into rss counters.
> > >>
> > >> This patch properly adjust rss counters for migration entries in zap_pte_range()
> > >> and copy_one_pte(). Thus we avoid extra atomic operations on migration fast-path.
> > >>
> > >> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > >
> > > It's better to show wheter this is a bug-fix or not in changelog.
> > >
> > > IIUC, the bug-fix is the 1st harf of this patch + patch [2/3].
> > > Your new bug-check code is in patch[1/3] and 2nd half of this patch.
> > >
> > 
> > No, there only one new bug-check in 1st patch, this is non-fatal warning.
> > I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
> > rss counters covers whole page-table management, this is very good invariant.
> > Currently I can trigger this warning only on this rare race -- extremely loaded
> > memory compaction catches this every several seconds.
> > 
> > 1/3 bug-check
> > 2/3 fix preparation
> > 3/3 bugfix in two places:
> >      do rss++ in copy_one_pte()
> >      do rss-- in zap_pte_range()
> > 
> 
> Hmm, ok, I read wrong.
> 
> So, I think you should post the patch with [BUGFIX] and
> report 'what happens' and 'what is the bug' , 'what you fixed' explicitly.
> 
> As...
> ==
>   This patch series fixes per-mm rss counter accounting bug. When pages are
>   heavily migrated, the rss counters will go wrong by fork() and unmap()
>   because they ignores migration_pte_entries.
>   This rarelly happens but will make rss counter incorrect.
> 
>   This seires of patches will fix the issue by adding proper accounting of
>   migration_pte_entries in unmap() and fork(). This series includes
>   bug check code, too.

Putting "fix" in the patch title text is a good way of handling this.

I renamed [3/3] to "mm: fix rss count leakage during migration" and
shall queue it for 3.3.  If people think we should also backport it
into -stable then please let me know.

I reordered the patches and worked the chagnelogs quite a bit.  I now
have:

: From: Konstantin Khlebnikov <khlebnikov@openvz.org>
: Subject: mm: fix rss count leakage during migration
: 
: Memory migration fills a pte with a migration entry and it doesn't update
: the rss counters.  Then it replaces the migration entry with the new page
: (or the old one if migration failed).  But between these two passes this
: pte can be unmaped, or a task can fork a child and it will get a copy of
: this migration entry.  Nobody accounts for this in the rss counters.
: 
: This patch properly adjust rss counters for migration entries in
: zap_pte_range() and copy_one_pte().  Thus we avoid extra atomic operations
: on the migration fast-path.
: 
: Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
: Cc: Hugh Dickins <hughd@google.com>
: Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

and

: From: Konstantin Khlebnikov <khlebnikov@openvz.org>
: Subject: mm: add rss counters consistency check
: 
: Warn about non-zero rss counters at final mmdrop.
: 
: This check will prevent reoccurences of bugs such as that fixed in "mm:
: fix rss count leakage during migration".
: 
: I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
: rss counters cover whole page-table management, so this is a good
: invariant.
: 
: Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
: Cc: Hugh Dickins <hughd@google.com>
: Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

and

: From: Konstantin Khlebnikov <khlebnikov@openvz.org>
: Subject: mm: postpone migrated page mapping reset
: 
: Postpone resetting page->mapping until the final remove_migration_ptes(). 
: Otherwise the expression PageAnon(migration_entry_to_page(entry)) does not
: work.
: 
: Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
: Cc: Hugh Dickins <hughd@google.com>
: Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-18 23:21         ` Andrew Morton
@ 2012-01-24  1:42           ` Hugh Dickins
  2012-01-24  7:08             ` Konstantin Khlebnikov
  0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2012-01-24  1:42 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-mm, linux-kernel

On Wed, 18 Jan 2012, Andrew Morton wrote:
> On Wed, 11 Jan 2012 17:41:26 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Wed, 11 Jan 2012 12:23:11 +0400
> > Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

I only just got around to looking at these, sorry.

> 
> Putting "fix" in the patch title text is a good way of handling this.
> 
> I renamed [3/3] to "mm: fix rss count leakage during migration" and
> shall queue it for 3.3.  If people think we should also backport it
> into -stable then please let me know.

I don't think it needs backporting to stable: unless I'm forgetting
something, the only thing that actually uses these rss counters is the
OOM killer, and I don't think that will be greatly affected by the bug.

> 
> I reordered the patches and worked the chagnelogs quite a bit.  I now
> have:
> 
> : From: Konstantin Khlebnikov <khlebnikov@openvz.org>
> : Subject: mm: fix rss count leakage during migration
> : 
> : Memory migration fills a pte with a migration entry and it doesn't update
> : the rss counters.  Then it replaces the migration entry with the new page
> : (or the old one if migration failed).  But between these two passes this
> : pte can be unmaped, or a task can fork a child and it will get a copy of
> : this migration entry.  Nobody accounts for this in the rss counters.
> : 
> : This patch properly adjust rss counters for migration entries in
> : zap_pte_range() and copy_one_pte().  Thus we avoid extra atomic operations
> : on the migration fast-path.
> : 
> : Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> : Cc: Hugh Dickins <hughd@google.com>
> : Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

That was a good find, Konstantin: thank you.

> 
> and
> 
> : From: Konstantin Khlebnikov <khlebnikov@openvz.org>
> : Subject: mm: add rss counters consistency check
> : 
> : Warn about non-zero rss counters at final mmdrop.
> : 
> : This check will prevent reoccurences of bugs such as that fixed in "mm:
> : fix rss count leakage during migration".
> : 
> : I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
> : rss counters cover whole page-table management, so this is a good
> : invariant.
> : 
> : Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> : Cc: Hugh Dickins <hughd@google.com>
> : Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I'd be happier with this one if you do hide the check under
CONFIG_VM_DEBUG - or even under CONFIG_DEBUG_VM if you want it to
be compiled in sometimes ;)  I suppose NR_MM_COUNTERS is only 3,
so it isn't a huge overhead; but I think you're overestimating the
importance of these counters, and it would look better under DEBUG_VM.

> 
> and
> 
> : From: Konstantin Khlebnikov <khlebnikov@openvz.org>
> : Subject: mm: postpone migrated page mapping reset
> : 
> : Postpone resetting page->mapping until the final remove_migration_ptes(). 
> : Otherwise the expression PageAnon(migration_entry_to_page(entry)) does not
> : work.
> : 
> : Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> : Cc: Hugh Dickins <hughd@google.com>
> : Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Isn't this one actually an essential part of the fix?  It should have
been part of the same patch, but you split them apart, now Andrew has
reordered them and pushed one part to 3.3, but this needs to go in too?

Hugh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-24  1:42           ` Hugh Dickins
@ 2012-01-24  7:08             ` Konstantin Khlebnikov
  2012-01-25 23:01               ` Hugh Dickins
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2012-01-24  7:08 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-mm, linux-kernel

Hugh Dickins wrote:
> On Wed, 18 Jan 2012, Andrew Morton wrote:
>> On Wed, 11 Jan 2012 17:41:26 +0900
>> KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>> On Wed, 11 Jan 2012 12:23:11 +0400
>>> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
> I only just got around to looking at these, sorry.
>
>>
>> Putting "fix" in the patch title text is a good way of handling this.
>>
>> I renamed [3/3] to "mm: fix rss count leakage during migration" and
>> shall queue it for 3.3.  If people think we should also backport it
>> into -stable then please let me know.
>
> I don't think it needs backporting to stable: unless I'm forgetting
> something, the only thing that actually uses these rss counters is the
> OOM killer, and I don't think that will be greatly affected by the bug.
>

Yes, there are only counters. But, I can imagine local DoS attack via this race.

>>
>> I reordered the patches and worked the chagnelogs quite a bit.  I now
>> have:
>>
>> : From: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> : Subject: mm: fix rss count leakage during migration
>> :
>> : Memory migration fills a pte with a migration entry and it doesn't update
>> : the rss counters.  Then it replaces the migration entry with the new page
>> : (or the old one if migration failed).  But between these two passes this
>> : pte can be unmaped, or a task can fork a child and it will get a copy of
>> : this migration entry.  Nobody accounts for this in the rss counters.
>> :
>> : This patch properly adjust rss counters for migration entries in
>> : zap_pte_range() and copy_one_pte().  Thus we avoid extra atomic operations
>> : on the migration fast-path.
>> :
>> : Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> : Cc: Hugh Dickins<hughd@google.com>
>> : Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> That was a good find, Konstantin: thank you.
>
>>
>> and
>>
>> : From: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> : Subject: mm: add rss counters consistency check
>> :
>> : Warn about non-zero rss counters at final mmdrop.
>> :
>> : This check will prevent reoccurences of bugs such as that fixed in "mm:
>> : fix rss count leakage during migration".
>> :
>> : I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
>> : rss counters cover whole page-table management, so this is a good
>> : invariant.
>> :
>> : Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> : Cc: Hugh Dickins<hughd@google.com>
>> : Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> I'd be happier with this one if you do hide the check under
> CONFIG_VM_DEBUG - or even under CONFIG_DEBUG_VM if you want it to
> be compiled in sometimes ;)  I suppose NR_MM_COUNTERS is only 3,
> so it isn't a huge overhead; but I think you're overestimating the
> importance of these counters, and it would look better under DEBUG_VM.

Theoretically, some drivers can touch page tables,
for example if they do that outside of vma we can get some kind of strange memory leaks.

>
>>
>> and
>>
>> : From: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> : Subject: mm: postpone migrated page mapping reset
>> :
>> : Postpone resetting page->mapping until the final remove_migration_ptes().
>> : Otherwise the expression PageAnon(migration_entry_to_page(entry)) does not
>> : work.
>> :
>> : Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> : Cc: Hugh Dickins<hughd@google.com>
>> : Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Isn't this one actually an essential part of the fix?  It should have
> been part of the same patch, but you split them apart, now Andrew has
> reordered them and pushed one part to 3.3, but this needs to go in too?
>

Oops. I missed that. Yes. race-fix does not work for anon-memory without that patch.
But this is non-fatal, there are no new bugs.

> Hugh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-24  7:08             ` Konstantin Khlebnikov
@ 2012-01-25 23:01               ` Hugh Dickins
  2012-01-26  0:20                 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2012-01-25 23:01 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-mm, linux-kernel

On Tue, 24 Jan 2012, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > On Wed, 18 Jan 2012, Andrew Morton wrote:
> > > : From: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > > : Subject: mm: add rss counters consistency check
> > > :
> > > : Warn about non-zero rss counters at final mmdrop.
> > > :
> > > : This check will prevent reoccurences of bugs such as that fixed in "mm:
> > > : fix rss count leakage during migration".
> > > :
> > > : I didn't hide this check under CONFIG_VM_DEBUG because it rather small
> > > and
> > > : rss counters cover whole page-table management, so this is a good
> > > : invariant.
> > > :
> > > : Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > > : Cc: Hugh Dickins<hughd@google.com>
> > > : Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > I'd be happier with this one if you do hide the check under
> > CONFIG_VM_DEBUG - or even under CONFIG_DEBUG_VM if you want it to
> > be compiled in sometimes ;)  I suppose NR_MM_COUNTERS is only 3,
> > so it isn't a huge overhead; but I think you're overestimating the
> > importance of these counters, and it would look better under DEBUG_VM.
> 
> Theoretically, some drivers can touch page tables,
> for example if they do that outside of vma we can get some kind of strange
> memory leaks.

I don't understand you on that.  Sure, drivers could do all kinds of
damage, but if they're touching pagetables outside of the vmas, then
this check on rss at exit isn't going to catch them.

But the message I get is that you want to leave the check (which would
have been better at the end of exit_mmap, I think, but never mind)
outside of CONFIG_DEBUG_VM: okay, I don't feel strongly enough.

> > > : From: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > > : Subject: mm: postpone migrated page mapping reset
> > > :
> > > : Postpone resetting page->mapping until the final
> > > remove_migration_ptes().
> > > : Otherwise the expression PageAnon(migration_entry_to_page(entry)) does
> > > not
> > > : work.
> > > :
> > > : Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > > : Cc: Hugh Dickins<hughd@google.com>
> > > : Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Isn't this one actually an essential part of the fix?  It should have
> > been part of the same patch, but you split them apart, now Andrew has
> > reordered them and pushed one part to 3.3, but this needs to go in too?
> > 
> 
> Oops. I missed that. Yes. race-fix does not work for anon-memory without that
> patch.
> But this is non-fatal, there are no new bugs.

Non-fatal and no new bug, yes, but it makes the fix which has already
gone in rather less of a fix than was intended (it'll get the total right,
but misreport anon as file).  Andrew, please add this one to your next
push to Linus - thanks.

Hugh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm: adjust rss counters for migration entiries
  2012-01-25 23:01               ` Hugh Dickins
@ 2012-01-26  0:20                 ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2012-01-26  0:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Konstantin Khlebnikov, KAMEZAWA Hiroyuki, linux-mm, linux-kernel

On Wed, 25 Jan 2012 15:01:38 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> > > > : From: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > > > : Subject: mm: postpone migrated page mapping reset
> > > > :
> > > > : Postpone resetting page->mapping until the final
> > > > remove_migration_ptes().
> > > > : Otherwise the expression PageAnon(migration_entry_to_page(entry)) does
> > > > not
> > > > : work.
> > > > :
> > > > : Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
> > > > : Cc: Hugh Dickins<hughd@google.com>
> > > > : Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > Isn't this one actually an essential part of the fix?  It should have
> > > been part of the same patch, but you split them apart, now Andrew has
> > > reordered them and pushed one part to 3.3, but this needs to go in too?
> > > 
> > 
> > Oops. I missed that. Yes. race-fix does not work for anon-memory without that
> > patch.
> > But this is non-fatal, there are no new bugs.
> 
> Non-fatal and no new bug, yes, but it makes the fix which has already
> gone in rather less of a fix than was intended (it'll get the total right,
> but misreport anon as file).  Andrew, please add this one to your next
> push to Linus - thanks.

Shall do.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-01-26  0:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-06 17:38 [PATCH 1/3] mm: add rss counters consistency check Konstantin Khlebnikov
2012-01-06 17:38 ` [PATCH 2/3] mm: postpone migrated page mapping reset Konstantin Khlebnikov
2012-01-06 17:38 ` [PATCH 3/3] mm: adjust rss counters for migration entiries Konstantin Khlebnikov
2012-01-06 17:45   ` Konstantin Khlebnikov
2012-01-11  5:41   ` KAMEZAWA Hiroyuki
2012-01-11  8:23     ` Konstantin Khlebnikov
2012-01-11  8:41       ` KAMEZAWA Hiroyuki
2012-01-18 23:21         ` Andrew Morton
2012-01-24  1:42           ` Hugh Dickins
2012-01-24  7:08             ` Konstantin Khlebnikov
2012-01-25 23:01               ` Hugh Dickins
2012-01-26  0:20                 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).