* [PATCH v2 0/2] fix hugepage coredump
@ 2013-04-01 17:21 ` Naoya Horiguchi
  0 siblings, 0 replies; 11+ messages in thread
From: Naoya Horiguchi @ 2013-04-01 17:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
	Konstantin Khlebnikov, Michal Hocko, linux-mm, linux-kernel

Hi,

Here is the 2nd version of the hugepage coredump fix.
See individual patches for more details.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-04-01 17:21 ` Naoya Horiguchi
@ 2013-04-01 17:21   ` Naoya Horiguchi
  -1 siblings, 0 replies; 11+ messages in thread
From: Naoya Horiguchi @ 2013-04-01 17:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
	Konstantin Khlebnikov, Michal Hocko, linux-mm, linux-kernel,
	stable

Currently no data on hugepages is included in coredumps, because
VM_DONTDUMP is set on hugetlbfs vmas. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks like a serious regression to me,
so let's fix it.

ChangeLog v2:
 - add 'return 0' in hugepage memory check

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
---
 fs/binfmt_elf.c      | 1 +
 fs/hugetlbfs/inode.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/fs/binfmt_elf.c v3.9-rc3/fs/binfmt_elf.c
index 3939829..86af964 100644
--- v3.9-rc3.orig/fs/binfmt_elf.c
+++ v3.9-rc3/fs/binfmt_elf.c
@@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
 			goto whole;
 		if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
 			goto whole;
+		return 0;
 	}
 
 	/* Do not dump I/O mapped devices or special mappings */
diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 * way when do_mmap_pgoff unwinds (may be important on powerpc
 	 * and ia64).
 	 */
-	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
-- 
1.7.11.7
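
The effect of the two hunks above can be sketched in userspace C. This is
only a toy model, not kernel code: the TOY_* flag values and the function
toy_vma_is_dumped() are hypothetical names made up for illustration, and the
sketch assumes the VM_DONTDUMP test runs before the hugetlb filter test, as
the commit message implies. Bits 5 and 6 are the hugetlb bits of
coredump_filter described in Documentation/filesystems/proc.txt.

```c
#include <stdbool.h>

/* Toy flag values for illustration only; NOT the kernel's definitions. */
#define TOY_VM_SHARED    (1UL << 0)
#define TOY_VM_HUGETLB   (1UL << 1)
#define TOY_VM_DONTDUMP  (1UL << 2)

/* coredump_filter bits 5 (hugetlb private) and 6 (hugetlb shared). */
#define TOY_DUMP_HUGETLB_PRIVATE (1UL << 5)
#define TOY_DUMP_HUGETLB_SHARED  (1UL << 6)

/* Returns true if the modelled vma would be written to the core file. */
static bool toy_vma_is_dumped(unsigned long vm_flags, unsigned long filter)
{
	/* Assumed ordering: VM_DONTDUMP wins before any filter bit is read. */
	if (vm_flags & TOY_VM_DONTDUMP)
		return false;

	if (vm_flags & TOY_VM_HUGETLB) {
		if ((vm_flags & TOY_VM_SHARED) &&
		    (filter & TOY_DUMP_HUGETLB_SHARED))
			return true;
		if (!(vm_flags & TOY_VM_SHARED) &&
		    (filter & TOY_DUMP_HUGETLB_PRIVATE))
			return true;
		return false;	/* models the 'return 0' added by the patch */
	}

	/* Non-hugetlb mappings are out of scope for this sketch. */
	return true;
}
```

With TOY_VM_DONTDUMP set on a hugetlb vma the function returns false whatever
the filter says, which models the regression; with the flag dropped, the
result depends only on bits 5 and 6.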


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()
  2013-04-01 17:21 ` Naoya Horiguchi
@ 2013-04-01 17:21   ` Naoya Horiguchi
  -1 siblings, 0 replies; 11+ messages in thread
From: Naoya Horiguchi @ 2013-04-01 17:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Hugh Dickins, Rik van Riel, KOSAKI Motohiro,
	Konstantin Khlebnikov, Michal Hocko, linux-mm, linux-kernel,
	stable

After applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to re-enable hugepage coredumps, if a memory
error happens on a hugepage and the affected processes try to access
the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().

The reason for this bug is that the coredump-related code doesn't recognise
the "hugepage hwpoison entry" that replaces a pmd entry when a memory
error occurs on a hugepage.
In other words, physical address information is stored in a different bit
layout in a hugepage hwpoison entry than in a pmd entry, so
follow_hugetlb_page(), which is called from get_dump_page(), returns a wrong
page for a given address.

We need to filter out only hwpoison hugepages so that data on healthy
hugepages still ends up in the coredump. So this patch makes
follow_hugetlb_page() avoid trying to get a page when a pmd is in a
swap-entry-like format.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..8462e2c 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * first, for the page indexing below to work.
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
+			is_swap_pte(huge_ptep_get(pte));
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
-- 
1.7.11.7
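
The change to the 'absent' test can be illustrated with a small userspace
model. Everything here is an assumption made for the sketch: struct toy_pte,
the TOY_PRESENT bit and the toy_* helpers are hypothetical stand-ins, not the
kernel's pte_t or its accessors. The model treats an entry that is non-empty
but not present as swap-entry-like, which is how a hwpoison entry would look
to the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page-table entry; NOT the kernel's pte_t. Bit 0 means "present". */
struct toy_pte { unsigned long val; };
#define TOY_PRESENT 0x1UL

static bool toy_pte_none(struct toy_pte p)
{
	return p.val == 0;
}

static bool toy_pte_present(struct toy_pte p)
{
	return (p.val & TOY_PRESENT) != 0;
}

/* Non-empty but not present: swap-entry-like, e.g. a hwpoison marker. */
static bool toy_is_swap_pte(struct toy_pte p)
{
	return !toy_pte_none(p) && !toy_pte_present(p);
}

/* 'absent' as computed before the patch. */
static bool toy_absent_old(const struct toy_pte *pte)
{
	return !pte || toy_pte_none(*pte);
}

/* 'absent' as computed after the patch. */
static bool toy_absent_new(const struct toy_pte *pte)
{
	return !pte || toy_pte_none(*pte) || toy_is_swap_pte(*pte);
}
```

In this model a poisoned entry (non-zero, present bit clear) is not absent
under the old test, so the caller would go on to treat it as a mapped page;
under the new test it is absent and is skipped, while a present entry is
still followed.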


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/2] fix hugepage coredump
  2013-04-01 17:21 ` Naoya Horiguchi
@ 2013-04-02  5:34   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 11+ messages in thread
From: Konstantin Khlebnikov @ 2013-04-02  5:34 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, Michal Hocko, linux-mm, linux-kernel

Naoya Horiguchi wrote:
> Hi,
>
> Here is 2nd version of hugepage coredump fix.
> See individual patches for more details.
>
> Thanks,
> Naoya Horiguchi

ACK to both patches


The VM_* bits cleanup patchset was merged into v3.7, so only the two most recent stable kernels need this fix.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-04-01 17:21   ` Naoya Horiguchi
  (?)
@ 2013-04-02 11:32   ` HATAYAMA Daisuke
  2013-04-02 14:07       ` Naoya Horiguchi
  -1 siblings, 1 reply; 11+ messages in thread
From: HATAYAMA Daisuke @ 2013-04-02 11:32 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, Konstantin Khlebnikov, Michal Hocko, linux-mm,
	linux-kernel, stable

2013/4/2 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
>
> ChangeLog v2:
>  - add 'return 0' in hugepage memory check
>
<cut>

> @@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
>                         goto whole;
>                 if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
>                         goto whole;
> +               return 0;
>         }
>

You should split this part into another patch. This fix is orthogonal to
the bug this patch tries to fix.

The bug you're implicitly fixing here is that the filtering behaviour
doesn't follow the description in Documentation/filesystems/proc.txt:

  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
  effected by bit 5-6.

Right?

Thanks.
HATAYAMA, Daisuke

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
  2013-04-02 11:32   ` HATAYAMA Daisuke
@ 2013-04-02 14:07       ` Naoya Horiguchi
  0 siblings, 0 replies; 11+ messages in thread
From: Naoya Horiguchi @ 2013-04-02 14:07 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Rik van Riel,
	KOSAKI Motohiro, Konstantin Khlebnikov, Michal Hocko, linux-mm,
	linux-kernel, stable

On Tue, Apr 02, 2013 at 08:32:33PM +0900, HATAYAMA Daisuke wrote:
> 2013/4/2 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> > Currently we fail to include any data on hugepages into coredump,
> > because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> > introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> > mm->reserved_vm counter". This looks to me a serious regression,
> > so let's fix it.
> >
> > ChangeLog v2:
> >  - add 'return 0' in hugepage memory check
> >
> <cut>
> 
> > @@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
> >                         goto whole;
> >                 if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
> >                         goto whole;
> > +               return 0;
> >         }
> >
> 
> You should split this part into another patch. This fix is orthogonal to
> the bug this patch tries to fix.

Fair enough, thanks.

> The bug you're trying to fix implicitly here is the filtering behaviour
> that doesn't follow
> the description in Documentation/filesystems/proc.txt that:
> 
>   Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
>   effected by bit 5-6.
> 
> Right?

Right. Without this return, we will go into the subsequent flag checks
of bit 0-4 for vma(VM_HUGETLB).

Thanks,
Naoya Horiguchi
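
The fall-through discussed in this exchange can be sketched as a toy model in
C. It is deliberately simplified and rests on assumptions: the function
toy_private_hugetlb_dumped() and its 'patched' switch are hypothetical, and
the single bit 0 test stands in for the whole series of bit 0-4 checks that
follow the hugetlb block. The bit numbers are the ones given in
Documentation/filesystems/proc.txt.

```c
#include <stdbool.h>

/* coredump_filter bits as numbered in proc.txt. */
#define FILTER_ANON_PRIVATE     (1UL << 0)
#define FILTER_HUGETLB_PRIVATE  (1UL << 5)

/*
 * Toy model of the dump decision for a private hugetlb mapping that the
 * later checks would also accept as anonymous private memory. 'patched'
 * selects whether the early 'return 0' is present.
 */
static bool toy_private_hugetlb_dumped(unsigned long filter, bool patched)
{
	if (filter & FILTER_HUGETLB_PRIVATE)
		return true;
	if (patched)
		return false;	/* bits 0-4 are never consulted */

	/* Simplified stand-in for the subsequent bit 0-4 checks. */
	return (filter & FILTER_ANON_PRIVATE) != 0;
}
```

With bit 5 cleared and bit 0 set, the unpatched path still reports the
mapping as dumped, which contradicts the documented behaviour; the patched
path does not.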

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-04-02 14:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-01 17:21 [PATCH v2 0/2] fix hugepage coredump Naoya Horiguchi
2013-04-01 17:21 ` Naoya Horiguchi
2013-04-01 17:21 ` [PATCH v2 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Naoya Horiguchi
2013-04-01 17:21   ` Naoya Horiguchi
2013-04-02 11:32   ` HATAYAMA Daisuke
2013-04-02 14:07     ` Naoya Horiguchi
2013-04-02 14:07       ` Naoya Horiguchi
2013-04-01 17:21 ` [PATCH v2 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page() Naoya Horiguchi
2013-04-01 17:21   ` Naoya Horiguchi
2013-04-02  5:34 ` [PATCH v2 0/2] fix hugepage coredump Konstantin Khlebnikov
2013-04-02  5:34   ` Konstantin Khlebnikov
