All of lore.kernel.org
 help / color / mirror / Atom feed
* hugetlb: preserve hugetlb pte dirty state
@ 2007-02-06 21:06 Ken Chen
  2007-02-06 21:36 ` Nish Aravamudan
  2007-02-07  0:35 ` Andrew Morton
  0 siblings, 2 replies; 7+ messages in thread
From: Ken Chen @ 2007-02-06 21:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

__unmap_hugepage_range() is buggy that it does not preserve dirty
state of huge_pte when unmapping hugepage range.  It causes data
corruption in the event of dop_caches being used by sys admin.
For example, an application creates a hugetlb file, modify pages,
then unmap it.  While leaving the hugetlb file alive, comes along
sys admin doing a "echo 3 > /proc/sys/vm/drop_caches".
drop_pagecache_sb() will happily frees all pages that isn't marked
dirty if there are no active mapping. Later when application remaps
the hugetlb file back and all data are gone, triggering catastrophic
flip over on application.

Not only that, the internal resv_huge_pages count will also get all
messed up.  Fix it up by marking page dirty appropriately.

Signed-off-by: Ken Chen <kenchen@google.com>

--- ./mm/hugetlb.c.orig	2007-02-06 08:28:33.000000000 -0800
+++ ./mm/hugetlb.c	2007-02-06 08:29:47.000000000 -0800
@@ -389,6 +389,8 @@
 			continue;

 		page = pte_page(pte);
+		if (pte_dirty(pte))
+			set_page_dirty(page);
 		list_add(&page->lru, &page_list);
 	}
 	spin_unlock(&mm->page_table_lock);
--- ./fs/hugetlbfs/inode.c.orig	2007-02-06 08:29:56.000000000 -0800
+++ ./fs/hugetlbfs/inode.c	2007-02-06 08:40:44.000000000 -0800
@@ -449,10 +449,13 @@
 }

 /*
- * For direct-IO reads into hugetlb pages
+ * mark the head page dirty
  */
 static int hugetlbfs_set_page_dirty(struct page *page)
 {
+	struct page *head = (struct page *) page_private(page);
+
+	SetPageDirty(head);
 	return 0;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hugetlb: preserve hugetlb pte dirty state
  2007-02-06 21:06 hugetlb: preserve hugetlb pte dirty state Ken Chen
@ 2007-02-06 21:36 ` Nish Aravamudan
  2007-02-06 21:43   ` Ken Chen
  2007-02-07  0:35 ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Nish Aravamudan @ 2007-02-06 21:36 UTC (permalink / raw)
  To: Ken Chen; +Cc: Andrew Morton, linux-mm

On 2/6/07, Ken Chen <kenchen@google.com> wrote:
> __unmap_hugepage_range() is buggy that it does not preserve dirty
> state of huge_pte when unmapping hugepage range.  It causes data
> corruption in the event of dop_caches being used by sys admin.
> For example, an application creates a hugetlb file, modify pages,
> then unmap it.  While leaving the hugetlb file alive, comes along
> sys admin doing a "echo 3 > /proc/sys/vm/drop_caches".
> drop_pagecache_sb() will happily frees all pages that isn't marked
> dirty if there are no active mapping. Later when application remaps
> the hugetlb file back and all data are gone, triggering catastrophic
> flip over on application.
>
> Not only that, the internal resv_huge_pages count will also get all
> messed up.  Fix it up by marking page dirty appropriately.
>
> Signed-off-by: Ken Chen <kenchen@google.com>

This fixes my bug with HugePages_Rsvd going to 2^64 - 1.
("Hugepages_Rsvd goes huge in 2.6.20-rc7" is the subject on linux-mm).
Stable material, too, I would say.

Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>

Thanks,
Nish

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hugetlb: preserve hugetlb pte dirty state
  2007-02-06 21:36 ` Nish Aravamudan
@ 2007-02-06 21:43   ` Ken Chen
  0 siblings, 0 replies; 7+ messages in thread
From: Ken Chen @ 2007-02-06 21:43 UTC (permalink / raw)
  To: Nish Aravamudan; +Cc: Andrew Morton, linux-mm

On 2/6/07, Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
> This fixes my bug with HugePages_Rsvd going to 2^64 - 1.
> ("Hugepages_Rsvd goes huge in 2.6.20-rc7" is the subject on linux-mm).
> Stable material, too, I would say.

Wow, we hit the same bug in different ways, nice to hear that this
patch fixed the problem you observed.

- Ken

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hugetlb: preserve hugetlb pte dirty state
  2007-02-06 21:06 hugetlb: preserve hugetlb pte dirty state Ken Chen
  2007-02-06 21:36 ` Nish Aravamudan
@ 2007-02-07  0:35 ` Andrew Morton
  2007-02-07  0:47   ` Ken Chen
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2007-02-07  0:35 UTC (permalink / raw)
  To: Ken Chen; +Cc: linux-mm

On Tue, 6 Feb 2007 13:06:39 -0800
"Ken Chen" <kenchen@google.com> wrote:

> --- ./mm/hugetlb.c.orig	2007-02-06 08:28:33.000000000 -0800
> +++ ./mm/hugetlb.c	2007-02-06 08:29:47.000000000 -0800
> @@ -389,6 +389,8 @@
>  			continue;
> 
>  		page = pte_page(pte);
> +		if (pte_dirty(pte))
> +			set_page_dirty(page);
>  		list_add(&page->lru, &page_list);
>  	}
>  	spin_unlock(&mm->page_table_lock);

I guess we really should be setting these pages dirty at fault-time, as we're
now doing with regular pages.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hugetlb: preserve hugetlb pte dirty state
  2007-02-07  0:35 ` Andrew Morton
@ 2007-02-07  0:47   ` Ken Chen
  2007-02-07  1:08     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Ken Chen @ 2007-02-07  0:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

On 2/6/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > --- ./mm/hugetlb.c.orig       2007-02-06 08:28:33.000000000 -0800
> > +++ ./mm/hugetlb.c    2007-02-06 08:29:47.000000000 -0800
> > @@ -389,6 +389,8 @@
> >                       continue;
> >
> >               page = pte_page(pte);
> > +             if (pte_dirty(pte))
> > +                     set_page_dirty(page);
> >               list_add(&page->lru, &page_list);
> >       }
> >       spin_unlock(&mm->page_table_lock);
>
> I guess we really should be setting these pages dirty at fault-time, as we're
> now doing with regular pages.

yeah, I wonder why I didn't do that :-P  Especially after I asked you
a similar question the other day.  I will redo it and retest.

- Ken

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hugetlb: preserve hugetlb pte dirty state
  2007-02-07  0:47   ` Ken Chen
@ 2007-02-07  1:08     ` Andrew Morton
  2007-02-07  1:22       ` Ken Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2007-02-07  1:08 UTC (permalink / raw)
  To: Ken Chen; +Cc: linux-mm

On Tue, 6 Feb 2007 16:47:21 -0800
"Ken Chen" <kenchen@google.com> wrote:

> On 2/6/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > --- ./mm/hugetlb.c.orig       2007-02-06 08:28:33.000000000 -0800
> > > +++ ./mm/hugetlb.c    2007-02-06 08:29:47.000000000 -0800
> > > @@ -389,6 +389,8 @@
> > >                       continue;
> > >
> > >               page = pte_page(pte);
> > > +             if (pte_dirty(pte))
> > > +                     set_page_dirty(page);
> > >               list_add(&page->lru, &page_list);
> > >       }
> > >       spin_unlock(&mm->page_table_lock);
> >
> > I guess we really should be setting these pages dirty at fault-time, as we're
> > now doing with regular pages.
> 
> yeah, I wonder why I didn't do that :-P  Especially after I asked you
> a similar question the other day.  I will redo it and retest.
> 

No rush - the code should work OK as-is and given the total catastrophe
which that change caused in core mm, making the same change to hugepages
should be done with some care and thought and maintainer-poking.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hugetlb: preserve hugetlb pte dirty state
  2007-02-07  1:08     ` Andrew Morton
@ 2007-02-07  1:22       ` Ken Chen
  0 siblings, 0 replies; 7+ messages in thread
From: Ken Chen @ 2007-02-07  1:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

On 2/6/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > yeah, I wonder why I didn't do that :-P  Especially after I asked you
> > a similar question the other day.  I will redo it and retest.
>
> No rush - the code should work OK as-is and given the total catastrophe
> which that change caused in core mm, making the same change to hugepages
> should be done with some care and thought and maintainer-poking.

OK.  That's comforting.  I was thinking along the same line that
hugetlb should be in-sync with core mm code and should continue to do
the same thing like core mm in the unmap path (to transfer dirty bit
into page struct); even if we dirty a page in the fault path and that
dirty state will be on for the life of the page.  I will play with the
fault path for a while then.

- Ken

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-02-07  1:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-06 21:06 hugetlb: preserve hugetlb pte dirty state Ken Chen
2007-02-06 21:36 ` Nish Aravamudan
2007-02-06 21:43   ` Ken Chen
2007-02-07  0:35 ` Andrew Morton
2007-02-07  0:47   ` Ken Chen
2007-02-07  1:08     ` Andrew Morton
2007-02-07  1:22       ` Ken Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.