All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fix hugepage coredump
@ 2013-03-28 15:42 Naoya Horiguchi
  2013-03-28 15:42   ` Naoya Horiguchi
  2013-03-28 15:42   ` Naoya Horiguchi
  0 siblings, 2 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro, stable,
	linux-mm, linux-kernel

Hi,

This small patch series fixes problems on hugepage coredump,
where we cannot include any data on hugepages into coredump.
See individual patches for more details.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 15:42 [PATCH 0/2] fix hugepage coredump Naoya Horiguchi
@ 2013-03-28 15:42   ` Naoya Horiguchi
  2013-03-28 15:42   ` Naoya Horiguchi
  1 sibling, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro, stable,
	linux-mm, linux-kernel, Konstantin Khlebnikov

Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks to me a serious regression,
so let's fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 fs/hugetlbfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 * way when do_mmap_pgoff unwinds (may be important on powerpc
 	 * and ia64).
 	 */
-	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 15:42   ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro, stable,
	linux-mm, linux-kernel, Konstantin Khlebnikov

Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks to me a serious regression,
so let's fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 fs/hugetlbfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 * way when do_mmap_pgoff unwinds (may be important on powerpc
 	 * and ia64).
 	 */
-	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
  2013-03-28 15:42 [PATCH 0/2] fix hugepage coredump Naoya Horiguchi
@ 2013-03-28 15:42   ` Naoya Horiguchi
  2013-03-28 15:42   ` Naoya Horiguchi
  1 sibling, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro, stable,
	linux-mm, linux-kernel

With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access
the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().

The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit layout
between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
which is called in get_dump_page() returns a wrong page from a given address.

We need to filter out only hwpoison hugepages to have data on healthy
hugepages in coredump. So this patch makes follow_hugetlb_page() avoid
trying to get page when a pmd is in swap entry like format.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..8462e2c 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * first, for the page indexing below to work.
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
+			is_swap_pte(huge_ptep_get(pte));
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
@ 2013-03-28 15:42   ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro, stable,
	linux-mm, linux-kernel

With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access
the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().

The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit layout
between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
which is called in get_dump_page() returns a wrong page from a given address.

We need to filter out only hwpoison hugepages to have data on healthy
hugepages in coredump. So this patch makes follow_hugetlb_page() avoid
trying to get page when a pmd is in swap entry like format.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..8462e2c 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * first, for the page indexing below to work.
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
+			is_swap_pte(huge_ptep_get(pte));
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 15:42   ` Naoya Horiguchi
@ 2013-03-28 15:51     ` Greg KH
  -1 siblings, 0 replies; 33+ messages in thread
From: Greg KH @ 2013-03-28 15:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  fs/hugetlbfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 15:51     ` Greg KH
  0 siblings, 0 replies; 33+ messages in thread
From: Greg KH @ 2013-03-28 15:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  fs/hugetlbfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 15:51     ` Greg KH
@ 2013-03-28 16:04       ` Naoya Horiguchi
  -1 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 16:04 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 08:51:09AM -0700, Greg KH wrote:
> On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> > Currently we fail to include any data on hugepages into coredump,
> > because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> > introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> > mm->reserved_vm counter". This looks to me a serious regression,
> > so let's fix it.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> > ---
> >  fs/hugetlbfs/inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
> 
> </formletter>

I guess you mean this patch violates one/both of these rules:

 - It must fix a problem that causes a build error (but not for things
   marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
   security issue, or some "oh, that's not good" issue.  In short, something
   critical.
 - It or an equivalent fix must already exist in Linus' tree (upstream).

I'm not sure if the problem "we can't get any hugepage in coredump"
is considered as 'some "oh, that's not good" issue'.
But yes, it's not a critical one.
If you mean I violated the second rule, sorry, I'll get it into upstream first.

Thanks,
Naoya

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 16:04       ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 16:04 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 08:51:09AM -0700, Greg KH wrote:
> On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> > Currently we fail to include any data on hugepages into coredump,
> > because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> > introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> > mm->reserved_vm counter". This looks to me a serious regression,
> > so let's fix it.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> > ---
> >  fs/hugetlbfs/inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
> 
> </formletter>

I guess you mean this patch violates one/both of these rules:

 - It must fix a problem that causes a build error (but not for things
   marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
   security issue, or some "oh, that's not good" issue.  In short, something
   critical.
 - It or an equivalent fix must already exist in Linus' tree (upstream).

I'm not sure if the problem "we can't get any hugepage in coredump"
is considered as 'some "oh, that's not good" issue'.
But yes, it's not a critical one.
If you mean I violated the second rule, sorry, I'll get it into upstream first.

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 15:42   ` Naoya Horiguchi
@ 2013-03-28 17:03     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2013-03-28 17:03 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.

That was introduced in my patch? Really?
Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.

>
> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
> Cc: Konstantin Khlebnikov<khlebnikov@openvz.org>
> ---
>   fs/hugetlbfs/inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>   	 * way when do_mmap_pgoff unwinds (may be important on powerpc
>   	 * and ia64).
>   	 */
> -	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> +	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>   	vma->vm_ops =&hugetlb_vm_ops;
>
>   	if (vma->vm_pgoff&  (~huge_page_mask(h)>>  PAGE_SHIFT))


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 17:03     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2013-03-28 17:03 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.

That was introduced in my patch? Really?
Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.

>
> Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
> Cc: Konstantin Khlebnikov<khlebnikov@openvz.org>
> ---
>   fs/hugetlbfs/inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>   	 * way when do_mmap_pgoff unwinds (may be important on powerpc
>   	 * and ia64).
>   	 */
> -	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> +	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>   	vma->vm_ops =&hugetlb_vm_ops;
>
>   	if (vma->vm_pgoff&  (~huge_page_mask(h)>>  PAGE_SHIFT))

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 17:03     ` Konstantin Khlebnikov
@ 2013-03-28 18:29       ` Naoya Horiguchi
  -1 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 18:29 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
> >Currently we fail to include any data on hugepages into coredump,
> >because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> >introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> >mm->reserved_vm counter". This looks to me a serious regression,
> >so let's fix it.
> 
> That was introduced in my patch? Really?
> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.

vma_dump_size() does like this (the diff is the one in 314e51b98):

   static unsigned long vma_dump_size(struct vm_area_struct *vma,
   				   unsigned long mm_flags)
   {
   #define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
   
   	/* always dump the vdso and vsyscall sections */
   	if (always_dump_vma(vma))
   		goto whole;
  
  	if (vma->vm_flags & VM_DONTDUMP)
   		return 0;
   
   	/* Hugetlb memory check */
   	if (vma->vm_flags & VM_HUGETLB) {
   		if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
   			goto whole;
   		if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
   			goto whole;
   	}
   
   	/* Do not dump I/O mapped devices or special mappings */
  -	if (vma->vm_flags & (VM_IO | VM_RESERVED))
  +	if (vma->vm_flags & VM_IO)
   		return 0;

We have hugetlb memory check after VM_DONTDUMP check, so the following
changed the behavior.

  --- a/fs/hugetlbfs/inode.c
  +++ b/fs/hugetlbfs/inode.c
  @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
           * way when do_mmap_pgoff unwinds (may be important on powerpc
           * and ia64).
           */
  -       vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
  +       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
          vma->vm_ops = &hugetlb_vm_ops;
   
          if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))

I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.

Thanks,
Naoya

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 18:29       ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 18:29 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
> >Currently we fail to include any data on hugepages into coredump,
> >because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> >introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> >mm->reserved_vm counter". This looks to me a serious regression,
> >so let's fix it.
> 
> That was introduced in my patch? Really?
> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.

vma_dump_size() does like this (the diff is the one in 314e51b98):

   static unsigned long vma_dump_size(struct vm_area_struct *vma,
   				   unsigned long mm_flags)
   {
   #define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
   
   	/* always dump the vdso and vsyscall sections */
   	if (always_dump_vma(vma))
   		goto whole;
  
  	if (vma->vm_flags & VM_DONTDUMP)
   		return 0;
   
   	/* Hugetlb memory check */
   	if (vma->vm_flags & VM_HUGETLB) {
   		if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
   			goto whole;
   		if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
   			goto whole;
   	}
   
   	/* Do not dump I/O mapped devices or special mappings */
  -	if (vma->vm_flags & (VM_IO | VM_RESERVED))
  +	if (vma->vm_flags & VM_IO)
   		return 0;

We have hugetlb memory check after VM_DONTDUMP check, so the following
changed the behavior.

  --- a/fs/hugetlbfs/inode.c
  +++ b/fs/hugetlbfs/inode.c
  @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
           * way when do_mmap_pgoff unwinds (may be important on powerpc
           * and ia64).
           */
  -       vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
  +       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
          vma->vm_ops = &hugetlb_vm_ops;
   
          if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))

I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 16:04       ` Naoya Horiguchi
@ 2013-03-28 19:39         ` Ben Hutchings
  -1 siblings, 0 replies; 33+ messages in thread
From: Ben Hutchings @ 2013-03-28 19:39 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Greg KH, Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
[...]
> I guess you mean this patch violates one/both of these rules:
> 
>  - It must fix a problem that causes a build error (but not for things
>    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
>    security issue, or some "oh, that's not good" issue.  In short, something
>    critical.
>  - It or an equivalent fix must already exist in Linus' tree (upstream).
> 
> I'm not sure if the problem "we can't get any hugepage in coredump"
> is considered as 'some "oh, that's not good" issue'.
> But yes, it's not a critical one.
> If you mean I violated the second rule, sorry, I'll get it into upstream first.
 
The second rule is the clear one.  If you are submitting a patch to
a subsystem maintainer and you want it to go into stable branches as
well, you must put 'Cc: stable@vger.kernel.org' in the commit message,
not just the mail header.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 19:39         ` Ben Hutchings
  0 siblings, 0 replies; 33+ messages in thread
From: Ben Hutchings @ 2013-03-28 19:39 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Greg KH, Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
[...]
> I guess you mean this patch violates one/both of these rules:
> 
>  - It must fix a problem that causes a build error (but not for things
>    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
>    security issue, or some "oh, that's not good" issue.  In short, something
>    critical.
>  - It or an equivalent fix must already exist in Linus' tree (upstream).
> 
> I'm not sure if the problem "we can't get any hugepage in coredump"
> is considered as 'some "oh, that's not good" issue'.
> But yes, it's not a critical one.
> If you mean I violated the second rule, sorry, I'll get it into upstream first.
 
The second rule is the clear one.  If you are submitting a patch to
a subsystem maintainer and you want it to go into stable branches as
well, you must put 'Cc: stable@vger.kernel.org' in the commit message,
not just the mail header.

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
                                                              - Albert Camus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 19:39         ` Ben Hutchings
@ 2013-03-28 19:47           ` Naoya Horiguchi
  -1 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 19:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg KH, Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 07:39:01PM +0000, Ben Hutchings wrote:
> On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
> [...]
> > I guess you mean this patch violates one/both of these rules:
> > 
> >  - It must fix a problem that causes a build error (but not for things
> >    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
> >    security issue, or some "oh, that's not good" issue.  In short, something
> >    critical.
> >  - It or an equivalent fix must already exist in Linus' tree (upstream).
> > 
> > I'm not sure if the problem "we can't get any hugepage in coredump"
> > is considered as 'some "oh, that's not good" issue'.
> > But yes, it's not a critical one.
> > If you mean I violated the second rule, sorry, I'll get it into upstream first.
>  
> The second rule is the clear one.  If you are submitting a patch to
> a subsystem maintainer and you want it to go into stable branches as
> well, you must put 'Cc: stable@vger.kernel.org' in the commit message,
> not just the mail header.

Got it. Thank you.
Naoya

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-28 19:47           ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-28 19:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg KH, Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu, Mar 28, 2013 at 07:39:01PM +0000, Ben Hutchings wrote:
> On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
> [...]
> > I guess you mean this patch violates one/both of these rules:
> > 
> >  - It must fix a problem that causes a build error (but not for things
> >    marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
> >    security issue, or some "oh, that's not good" issue.  In short, something
> >    critical.
> >  - It or an equivalent fix must already exist in Linus' tree (upstream).
> > 
> > I'm not sure if the problem "we can't get any hugepage in coredump"
> > is considered as 'some "oh, that's not good" issue'.
> > But yes, it's not a critical one.
> > If you mean I violated the second rule, sorry, I'll get it into upstream first.
>  
> The second rule is the clear one.  If you are submitting a patch to
> a subsystem maintainer and you want it to go into stable branches as
> well, you must put 'Cc: stable@vger.kernel.org' in the commit message,
> not just the mail header.

Got it. Thank you.
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 18:29       ` Naoya Horiguchi
@ 2013-03-29  5:30         ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2013-03-29  5:30 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

Naoya Horiguchi wrote:
> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>> Naoya Horiguchi wrote:
>>> Currently we fail to include any data on hugepages into coredump,
>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>> mm->reserved_vm counter". This looks to me a serious regression,
>>> so let's fix it.
>>
>> That was introduced in my patch? Really?
>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
> 
> vma_dump_size() does like this (the diff is the one in 314e51b98):
> 
>     static unsigned long vma_dump_size(struct vm_area_struct *vma,
>     				   unsigned long mm_flags)
>     {
>     #define FILTER(type)	(mm_flags&  (1UL<<  MMF_DUMP_##type))
> 
>     	/* always dump the vdso and vsyscall sections */
>     	if (always_dump_vma(vma))
>     		goto whole;
> 
>    	if (vma->vm_flags&  VM_DONTDUMP)
>     		return 0;
> 
>     	/* Hugetlb memory check */
>     	if (vma->vm_flags&  VM_HUGETLB) {
>     		if ((vma->vm_flags&  VM_SHARED)&&  FILTER(HUGETLB_SHARED))
>     			goto whole;
>     		if (!(vma->vm_flags&  VM_SHARED)&&  FILTER(HUGETLB_PRIVATE))
>     			goto whole;
>     	}
> 
>     	/* Do not dump I/O mapped devices or special mappings */
>    -	if (vma->vm_flags&  (VM_IO | VM_RESERVED))
>    +	if (vma->vm_flags&  VM_IO)
>     		return 0;
> 
> We have hugetlb memory check after VM_DONTDUMP check, so the following
> changed the behavior.

Ok, I missed this in my patch.

> 
>    --- a/fs/hugetlbfs/inode.c
>    +++ b/fs/hugetlbfs/inode.c
>    @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>             * way when do_mmap_pgoff unwinds (may be important on powerpc
>             * and ia64).
>             */
>    -       vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>    +       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>            vma->vm_ops =&hugetlb_vm_ops;
> 
>            if (vma->vm_pgoff&  (~huge_page_mask(h)>>  PAGE_SHIFT))
> 
> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.

Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

> 
> Thanks,
> Naoya


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-29  5:30         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2013-03-29  5:30 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

Naoya Horiguchi wrote:
> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>> Naoya Horiguchi wrote:
>>> Currently we fail to include any data on hugepages into coredump,
>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>> mm->reserved_vm counter". This looks to me a serious regression,
>>> so let's fix it.
>>
>> That was introduced in my patch? Really?
>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
> 
> vma_dump_size() does like this (the diff is the one in 314e51b98):
> 
>     static unsigned long vma_dump_size(struct vm_area_struct *vma,
>     				   unsigned long mm_flags)
>     {
>     #define FILTER(type)	(mm_flags&  (1UL<<  MMF_DUMP_##type))
> 
>     	/* always dump the vdso and vsyscall sections */
>     	if (always_dump_vma(vma))
>     		goto whole;
> 
>    	if (vma->vm_flags&  VM_DONTDUMP)
>     		return 0;
> 
>     	/* Hugetlb memory check */
>     	if (vma->vm_flags&  VM_HUGETLB) {
>     		if ((vma->vm_flags&  VM_SHARED)&&  FILTER(HUGETLB_SHARED))
>     			goto whole;
>     		if (!(vma->vm_flags&  VM_SHARED)&&  FILTER(HUGETLB_PRIVATE))
>     			goto whole;
>     	}
> 
>     	/* Do not dump I/O mapped devices or special mappings */
>    -	if (vma->vm_flags&  (VM_IO | VM_RESERVED))
>    +	if (vma->vm_flags&  VM_IO)
>     		return 0;
> 
> We have hugetlb memory check after VM_DONTDUMP check, so the following
> changed the behavior.

Ok, I missed this in my patch.

> 
>    --- a/fs/hugetlbfs/inode.c
>    +++ b/fs/hugetlbfs/inode.c
>    @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>             * way when do_mmap_pgoff unwinds (may be important on powerpc
>             * and ia64).
>             */
>    -       vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>    +       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>            vma->vm_ops =&hugetlb_vm_ops;
> 
>            if (vma->vm_pgoff&  (~huge_page_mask(h)>>  PAGE_SHIFT))
> 
> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.

Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>

> 
> Thanks,
> Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-29  5:30         ` Konstantin Khlebnikov
@ 2013-03-29 12:09           ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2013-03-29 12:09 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
>> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>>> Naoya Horiguchi wrote:
>>>> Currently we fail to include any data on hugepages into coredump,
>>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>>> mm->reserved_vm counter". This looks to me a serious regression,
>>>> so let's fix it.
>>>
>>> That was introduced in my patch? Really?
>>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>>
>> vma_dump_size() does like this (the diff is the one in 314e51b98):
>>
>>      static unsigned long vma_dump_size(struct vm_area_struct *vma,
>>      				   unsigned long mm_flags)
>>      {
>>      #define FILTER(type)	(mm_flags&   (1UL<<   MMF_DUMP_##type))
>>
>>      	/* always dump the vdso and vsyscall sections */
>>      	if (always_dump_vma(vma))
>>      		goto whole;
>>
>>     	if (vma->vm_flags&   VM_DONTDUMP)
>>      		return 0;
>>
>>      	/* Hugetlb memory check */
>>      	if (vma->vm_flags&   VM_HUGETLB) {
>>      		if ((vma->vm_flags&   VM_SHARED)&&   FILTER(HUGETLB_SHARED))
>>      			goto whole;
>>      		if (!(vma->vm_flags&   VM_SHARED)&&   FILTER(HUGETLB_PRIVATE))
>>      			goto whole;
>>      	}
>>
>>      	/* Do not dump I/O mapped devices or special mappings */
>>     -	if (vma->vm_flags&   (VM_IO | VM_RESERVED))
>>     +	if (vma->vm_flags&   VM_IO)
>>      		return 0;
>>
>> We have hugetlb memory check after VM_DONTDUMP check, so the following
>> changed the behavior.
> 
> Ok, I missed this in my patch.
> 
>>
>>     --- a/fs/hugetlbfs/inode.c
>>     +++ b/fs/hugetlbfs/inode.c
>>     @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>              * way when do_mmap_pgoff unwinds (may be important on powerpc
>>              * and ia64).
>>              */
>>     -       vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>>     +       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>>             vma->vm_ops =&hugetlb_vm_ops;
>>
>>             if (vma->vm_pgoff&   (~huge_page_mask(h)>>   PAGE_SHIFT))
>>
>> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.
> 
> Acked-by: Konstantin Khlebnikov<khlebnikov@openvz.org>

hugetlb coredump filter also should be fixed in this way:

--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
                        goto whole;
                if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
                        goto whole;
+               return 0;
        }

        /* Do not dump I/O mapped devices or special mappings */

> 
>>
>> Thanks,
>> Naoya
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-29 12:09           ` Konstantin Khlebnikov
  0 siblings, 0 replies; 33+ messages in thread
From: Konstantin Khlebnikov @ 2013-03-29 12:09 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
>> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>>> Naoya Horiguchi wrote:
>>>> Currently we fail to include any data on hugepages into coredump,
>>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>>> mm->reserved_vm counter". This looks to me a serious regression,
>>>> so let's fix it.
>>>
>>> That was introduced in my patch? Really?
>>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>>
>> vma_dump_size() does like this (the diff is the one in 314e51b98):
>>
>>      static unsigned long vma_dump_size(struct vm_area_struct *vma,
>>      				   unsigned long mm_flags)
>>      {
>>      #define FILTER(type)	(mm_flags&   (1UL<<   MMF_DUMP_##type))
>>
>>      	/* always dump the vdso and vsyscall sections */
>>      	if (always_dump_vma(vma))
>>      		goto whole;
>>
>>     	if (vma->vm_flags&   VM_DONTDUMP)
>>      		return 0;
>>
>>      	/* Hugetlb memory check */
>>      	if (vma->vm_flags&   VM_HUGETLB) {
>>      		if ((vma->vm_flags&   VM_SHARED)&&   FILTER(HUGETLB_SHARED))
>>      			goto whole;
>>      		if (!(vma->vm_flags&   VM_SHARED)&&   FILTER(HUGETLB_PRIVATE))
>>      			goto whole;
>>      	}
>>
>>      	/* Do not dump I/O mapped devices or special mappings */
>>     -	if (vma->vm_flags&   (VM_IO | VM_RESERVED))
>>     +	if (vma->vm_flags&   VM_IO)
>>      		return 0;
>>
>> We have hugetlb memory check after VM_DONTDUMP check, so the following
>> changed the behavior.
> 
> Ok, I missed this in my patch.
> 
>>
>>     --- a/fs/hugetlbfs/inode.c
>>     +++ b/fs/hugetlbfs/inode.c
>>     @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>              * way when do_mmap_pgoff unwinds (may be important on powerpc
>>              * and ia64).
>>              */
>>     -       vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>>     +       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>>             vma->vm_ops =&hugetlb_vm_ops;
>>
>>             if (vma->vm_pgoff&   (~huge_page_mask(h)>>   PAGE_SHIFT))
>>
>> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.
> 
> Acked-by: Konstantin Khlebnikov<khlebnikov@openvz.org>

hugetlb coredump filter also should be fixed in this way:

--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
                        goto whole;
                if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
                        goto whole;
+               return 0;
        }

        /* Do not dump I/O mapped devices or special mappings */

> 
>>
>> Thanks,
>> Naoya
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-28 15:42   ` Naoya Horiguchi
@ 2013-03-29 13:47     ` Michal Hocko
  -1 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-03-29 13:47 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu 28-03-13 11:42:37, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  fs/hugetlbfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	 * way when do_mmap_pgoff unwinds (may be important on powerpc
>  	 * and ia64).
>  	 */
> -	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> +	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-29 13:47     ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-03-29 13:47 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel,
	Konstantin Khlebnikov

On Thu 28-03-13 11:42:37, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  fs/hugetlbfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	 * way when do_mmap_pgoff unwinds (may be important on powerpc
>  	 * and ia64).
>  	 */
> -	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> +	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
>  	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
  2013-03-28 15:42   ` Naoya Horiguchi
@ 2013-03-29 13:57     ` Michal Hocko
  -1 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-03-29 13:57 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
[...]
> @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		 * first, for the page indexing below to work.
>  		 */
>  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> +			is_swap_pte(huge_ptep_get(pte));

is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
instead?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
@ 2013-03-29 13:57     ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-03-29 13:57 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
[...]
> @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		 * first, for the page indexing below to work.
>  		 */
>  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> +			is_swap_pte(huge_ptep_get(pte));

is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
instead?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-03-29 12:09           ` Konstantin Khlebnikov
@ 2013-03-29 16:59             ` Naoya Horiguchi
  -1 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-29 16:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

On Fri, Mar 29, 2013 at 04:09:36PM +0400, Konstantin Khlebnikov wrote:
...
> hugetlb coredump filter also should be fixed in this way:
> 
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
>                         goto whole;
>                 if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
>                         goto whole;
> +               return 0;
>         }

OK, I'll add it.

Thanks,
Naoya

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
@ 2013-03-29 16:59             ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-29 16:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, linux-mm, linux-kernel

On Fri, Mar 29, 2013 at 04:09:36PM +0400, Konstantin Khlebnikov wrote:
...
> hugetlb coredump filter also should be fixed in this way:
> 
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
>                         goto whole;
>                 if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
>                         goto whole;
> +               return 0;
>         }

OK, I'll add it.

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
  2013-03-29 13:57     ` Michal Hocko
@ 2013-03-29 17:23       ` Naoya Horiguchi
  -1 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-29 17:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

Hi,

On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> [...]
> > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		 * first, for the page indexing below to work.
> >  		 */
> >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > +			is_swap_pte(huge_ptep_get(pte));
> 
> is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> instead?

I tested only hwpoisoned hugepage, but the same can happen for hugepages
under migration. So I intended to filter out all types of swap entries.
The local variable 'absent' seems to mean whether data on the address
is immediately available, so swap type entry isn't included in it.

Thanks,
Naoya

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
@ 2013-03-29 17:23       ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-03-29 17:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

Hi,

On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> [...]
> > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		 * first, for the page indexing below to work.
> >  		 */
> >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > +			is_swap_pte(huge_ptep_get(pte));
> 
> is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> instead?

I tested only hwpoisoned hugepage, but the same can happen for hugepages
under migration. So I intended to filter out all types of swap entries.
The local variable 'absent' seems to mean whether data on the address
is immediately available, so swap type entry isn't included in it.

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
  2013-03-29 17:23       ` Naoya Horiguchi
@ 2013-04-02  9:24         ` Michal Hocko
  -1 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-04-02  9:24 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> Hi,
> 
> On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > [...]
> > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > >  		 * first, for the page indexing below to work.
> > >  		 */
> > >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > +			is_swap_pte(huge_ptep_get(pte));
> > 
> > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > instead?
> 
> I tested only hwpoisoned hugepage, but the same can happen for hugepages
> under migration. So I intended to filter out all types of swap entries.
> The local variable 'absent' seems to mean whether data on the address
> is immediately available, so swap type entry isn't included in it.

OK, I didn't consider huge pages under migration and I was merely worried
that is_hugetlb_entry_hwpoisoned sounds more appropriate than
is_swap_pte.

Could you add a comment which would clarify that is_swap_pte covers both
migration and hwpoison pages, please? Something like:

		/*
		 * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
		 * and hugepages under migration in which case
		 * hugetlb_fault waits for the migration and bails out
		 * properly for HWPosined pages.
		 */
		 absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
		 	 is_swap_pte(huge_ptep_get(pte));

Other than that feel free to add
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
@ 2013-04-02  9:24         ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2013-04-02  9:24 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> Hi,
> 
> On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > [...]
> > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > >  		 * first, for the page indexing below to work.
> > >  		 */
> > >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > +			is_swap_pte(huge_ptep_get(pte));
> > 
> > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > instead?
> 
> I tested only hwpoisoned hugepage, but the same can happen for hugepages
> under migration. So I intended to filter out all types of swap entries.
> The local variable 'absent' seems to mean whether data on the address
> is immediately available, so swap type entry isn't included in it.

OK, I didn't consider huge pages under migration and I was merely worried
that is_hugetlb_entry_hwpoisoned sounds more appropriate than
is_swap_pte.

Could you add a comment which would clarify that is_swap_pte covers both
migration and hwpoison pages, please? Something like:

		/*
		 * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
		 * and hugepages under migration in which case
		 * hugetlb_fault waits for the migration and bails out
		 * properly for HWPosined pages.
		 */
		 absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
		 	 is_swap_pte(huge_ptep_get(pte));

Other than that feel free to add
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
  2013-04-02  9:24         ` Michal Hocko
@ 2013-04-02 14:35           ` Naoya Horiguchi
  -1 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-04-02 14:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

On Tue, Apr 02, 2013 at 11:24:41AM +0200, Michal Hocko wrote:
> On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> > Hi,
> > 
> > On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > > [...]
> > > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > > >  		 * first, for the page indexing below to work.
> > > >  		 */
> > > >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > > +			is_swap_pte(huge_ptep_get(pte));
> > > 
> > > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > > instead?
> > 
> > I tested only hwpoisoned hugepage, but the same can happen for hugepages
> > under migration. So I intended to filter out all types of swap entries.
> > The local variable 'absent' seems to mean whether data on the address
> > is immediately available, so swap type entry isn't included in it.
> 
> OK, I didn't consider huge pages under migration and I was merely worried
> that is_hugetlb_entry_hwpoisoned sounds more appropriate than
> is_swap_pte.
> 
> Could you add a comment which would clarify that is_swap_pte covers both
> migration and hwpoison pages, please? Something like:
> 
> 		/*
> 		 * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
> 		 * and hugepages under migration in which case
> 		 * hugetlb_fault waits for the migration and bails out
> 		 * properly for HWPosined pages.
> 		 */
> 		 absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> 		 	 is_swap_pte(huge_ptep_get(pte));

OK, I'll add this.

> Other than that feel free to add
> Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thank you!
Naoya

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
@ 2013-04-02 14:35           ` Naoya Horiguchi
  0 siblings, 0 replies; 33+ messages in thread
From: Naoya Horiguchi @ 2013-04-02 14:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, stable, linux-mm, linux-kernel

On Tue, Apr 02, 2013 at 11:24:41AM +0200, Michal Hocko wrote:
> On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> > Hi,
> > 
> > On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > > [...]
> > > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > > >  		 * first, for the page indexing below to work.
> > > >  		 */
> > > >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > > +			is_swap_pte(huge_ptep_get(pte));
> > > 
> > > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > > instead?
> > 
> > I tested only hwpoisoned hugepage, but the same can happen for hugepages
> > under migration. So I intended to filter out all types of swap entries.
> > The local variable 'absent' seems to mean whether data on the address
> > is immediately available, so swap type entry isn't included in it.
> 
> OK, I didn't consider huge pages under migration and I was merely worried
> that is_hugetlb_entry_hwpoisoned sounds more appropriate than
> is_swap_pte.
> 
> Could you add a comment which would clarify that is_swap_pte covers both
> migration and hwpoison pages, please? Something like:
> 
> 		/*
> 		 * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
> 		 * and hugepages under migration in which case
> 		 * hugetlb_fault waits for the migration and bails out
> 		 * properly for HWPosined pages.
> 		 */
> 		 absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> 		 	 is_swap_pte(huge_ptep_get(pte));

OK, I'll add this.

> Other than that feel free to add
> Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thank you!
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2013-04-02 14:37 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-28 15:42 [PATCH 0/2] fix hugepage coredump Naoya Horiguchi
2013-03-28 15:42 ` [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Naoya Horiguchi
2013-03-28 15:42   ` Naoya Horiguchi
2013-03-28 15:51   ` Greg KH
2013-03-28 15:51     ` Greg KH
2013-03-28 16:04     ` Naoya Horiguchi
2013-03-28 16:04       ` Naoya Horiguchi
2013-03-28 19:39       ` Ben Hutchings
2013-03-28 19:39         ` Ben Hutchings
2013-03-28 19:47         ` Naoya Horiguchi
2013-03-28 19:47           ` Naoya Horiguchi
2013-03-28 17:03   ` Konstantin Khlebnikov
2013-03-28 17:03     ` Konstantin Khlebnikov
2013-03-28 18:29     ` Naoya Horiguchi
2013-03-28 18:29       ` Naoya Horiguchi
2013-03-29  5:30       ` Konstantin Khlebnikov
2013-03-29  5:30         ` Konstantin Khlebnikov
2013-03-29 12:09         ` Konstantin Khlebnikov
2013-03-29 12:09           ` Konstantin Khlebnikov
2013-03-29 16:59           ` Naoya Horiguchi
2013-03-29 16:59             ` Naoya Horiguchi
2013-03-29 13:47   ` Michal Hocko
2013-03-29 13:47     ` Michal Hocko
2013-03-28 15:42 ` [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
2013-03-28 15:42   ` Naoya Horiguchi
2013-03-29 13:57   ` Michal Hocko
2013-03-29 13:57     ` Michal Hocko
2013-03-29 17:23     ` Naoya Horiguchi
2013-03-29 17:23       ` Naoya Horiguchi
2013-04-02  9:24       ` Michal Hocko
2013-04-02  9:24         ` Michal Hocko
2013-04-02 14:35         ` Naoya Horiguchi
2013-04-02 14:35           ` Naoya Horiguchi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.