All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] numa: fix /proc/<pid>/numa_maps on s390
@ 2016-01-20 18:17 Michael Holzheu
  2016-01-20 19:32 ` kbuild test robot
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Holzheu @ 2016-01-20 18:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Schwidefsky, Heiko Carstens, Gerald Schaefer, linux-mm,
	linux-kernel

When working with huge page pmds in general is not valid to directly
use pte functions like pte_present() because the hardware bit layout
of pmds and ptes can be different. This is the case on s390. Therefore
we have to convert the pmds first into a valid pte encoding with
huge_ptep_get(). So add the two missing functions calls to do this.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 fs/proc/task_mmu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 65a1b6c..e287e32 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1520,7 +1520,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	pte_t *pte;
 
 	if (pmd_trans_huge_lock(pmd, vma, &ptl)) {
-		pte_t huge_pte = *(pte_t *)pmd;
+		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
 		struct page *page;
 
 		page = can_gather_numa_stats(huge_pte, vma, addr);
@@ -1548,18 +1548,19 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 		unsigned long addr, unsigned long end, struct mm_walk *walk)
 {
+	pte_t huge_pte = huge_ptep_get(pte);
 	struct numa_maps *md;
 	struct page *page;
 
-	if (!pte_present(*pte))
+	if (!pte_present(huge_pte))
 		return 0;
 
-	page = pte_page(*pte);
+	page = pte_page(huge_pte);
 	if (!page)
 		return 0;
 
 	md = walk->private;
-	gather_stats(page, md, pte_dirty(*pte), 1);
+	gather_stats(page, md, pte_dirty(huge_pte), 1);
 	return 0;
 }
 
-- 
2.3.9

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
  2016-01-20 18:17 [PATCH] numa: fix /proc/<pid>/numa_maps on s390 Michael Holzheu
@ 2016-01-20 19:32 ` kbuild test robot
  2016-01-20 20:26   ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: kbuild test robot @ 2016-01-20 19:32 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: kbuild-all, Andrew Morton, Martin Schwidefsky, Heiko Carstens,
	Gerald Schaefer, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1540 bytes --]

Hi Michael,

[auto build test ERROR on next-20160120]
[cannot apply to v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Michael-Holzheu/numa-fix-proc-pid-numa_maps-on-s390/20160121-022313
config: ia64-alldefconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   fs/proc/task_mmu.c: In function 'gather_pte_stats':
>> fs/proc/task_mmu.c:1523:3: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
      pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
      ^
>> fs/proc/task_mmu.c:1523:3: error: invalid initializer
   cc1: some warnings being treated as errors

vim +/huge_ptep_get +1523 fs/proc/task_mmu.c

  1517		struct vm_area_struct *vma = walk->vma;
  1518		spinlock_t *ptl;
  1519		pte_t *orig_pte;
  1520		pte_t *pte;
  1521	
  1522		if (pmd_trans_huge_lock(pmd, vma, &ptl)) {
> 1523			pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
  1524			struct page *page;
  1525	
  1526			page = can_gather_numa_stats(huge_pte, vma, addr);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 7310 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
  2016-01-20 19:32 ` kbuild test robot
@ 2016-01-20 20:26   ` Andrew Morton
  2016-01-21 14:57     ` Gerald Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2016-01-20 20:26 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Michael Holzheu, kbuild-all, Martin Schwidefsky, Heiko Carstens,
	Gerald Schaefer, linux-mm, linux-kernel

On Thu, 21 Jan 2016 03:32:41 +0800 kbuild test robot <lkp@intel.com> wrote:

> Hi Michael,
> 
> [auto build test ERROR on next-20160120]
> [cannot apply to v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Michael-Holzheu/numa-fix-proc-pid-numa_maps-on-s390/20160121-022313
> config: ia64-alldefconfig (attached as .config)
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=ia64 
> 
> All errors (new ones prefixed by >>):
> 
>    fs/proc/task_mmu.c: In function 'gather_pte_stats':
> >> fs/proc/task_mmu.c:1523:3: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
>       pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
>       ^

hm.

That whole code block in gather_pte_stats() just shouldn't exist when
CONFIG_TRANSPARENT_HUGEPAGE=n.  And because pmd_trans_huge_lock()
returns NULL, the compiler will nuke it all anyway.

So instead of doing the obvious:

--- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
+++ a/fs/proc/task_mmu.c
@@ -1524,7 +1524,11 @@ static int gather_pte_stats(pmd_t *pmd,
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
+#ifdef CONFIG_HUGETLB_PAGE
 		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
+#else
+		pte_t huge_pte = *(pte_t *)pmd;
+#endif
 		struct page *page;
 
 		page = can_gather_numa_stats(huge_pte, vma, addr);

I'm thinking we should do

--- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
+++ a/fs/proc/task_mmu.c
@@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
 	pte_t *pte;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (ptl) {
 		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
 		struct page *page;
@@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
 		spin_unlock(ptl);
 		return 0;
 	}
+#endif
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
_

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
  2016-01-20 20:26   ` Andrew Morton
@ 2016-01-21 14:57     ` Gerald Schaefer
  2016-01-21 19:19       ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Gerald Schaefer @ 2016-01-21 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kbuild test robot, Michael Holzheu, kbuild-all,
	Martin Schwidefsky, Heiko Carstens, linux-mm, linux-kernel

On Wed, 20 Jan 2016 12:26:31 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 21 Jan 2016 03:32:41 +0800 kbuild test robot <lkp@intel.com> wrote:
> 
> > Hi Michael,
> > 
> > [auto build test ERROR on next-20160120]
> > [cannot apply to v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
> > 
> > url:    https://github.com/0day-ci/linux/commits/Michael-Holzheu/numa-fix-proc-pid-numa_maps-on-s390/20160121-022313
> > config: ia64-alldefconfig (attached as .config)
> > reproduce:
> >         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         make.cross ARCH=ia64 
> > 
> > All errors (new ones prefixed by >>):
> > 
> >    fs/proc/task_mmu.c: In function 'gather_pte_stats':
> > >> fs/proc/task_mmu.c:1523:3: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
> >       pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> >       ^
> 
> hm.
> 
> That whole code block in gather_pte_stats() just shouldn't exist when
> CONFIG_TRANSPARENT_HUGEPAGE=n.  And because pmd_trans_huge_lock()
> returns NULL, the compiler will nuke it all anyway.
> 
> So instead of doing the obvious:
> 
> --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> +++ a/fs/proc/task_mmu.c
> @@ -1524,7 +1524,11 @@ static int gather_pte_stats(pmd_t *pmd,
> 
>  	ptl = pmd_trans_huge_lock(pmd, vma);
>  	if (ptl) {
> +#ifdef CONFIG_HUGETLB_PAGE
>  		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> +#else
> +		pte_t huge_pte = *(pte_t *)pmd;
> +#endif
>  		struct page *page;
> 
>  		page = can_gather_numa_stats(huge_pte, vma, addr);
> 
> I'm thinking we should do
> 
> --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> +++ a/fs/proc/task_mmu.c
> @@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
>  	pte_t *pte;
> 
>  	ptl = pmd_trans_huge_lock(pmd, vma);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	if (ptl) {
>  		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
>  		struct page *page;
> @@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
>  		spin_unlock(ptl);
>  		return 0;
>  	}
> +#endif
> 
>  	if (pmd_trans_unstable(pmd))
>  		return 0;

Hi Andrew,

Unfortunately this seems to be a lot more complicated than we thought.
huge_ptep_get() is only defined when CONFIG_HUGETLB_PAGE=y, independent
from CONFIG_TRANSPARENT_HUGEPAGE. This is because asm/hugetlb.h is only
included from linux/hugetlb.h when CONFIG_HUGETLB_PAGE=y.

So this fix won't fix the build error when CONFIG_HUGETLBFS=n and
CONFIG_TRANSPARENT_HUGEPAGE=y.

Since the THP code did not repeat the flaws of the hugtelbfs code, i.e.
it is actually working on PMD entries and not PTE entries, there was
no need for huge_ptep_get() for THP so far.

Now it seems that the THP code in gather_pte_stats() is an exception to
this, as it is not working on a PMD like the rest of the THP code, but
also on a fake "PTE" like the hugetlbfs code.

I guess this needs more thinking, two options are crossing my mind:
- Fix the THP code in gather_pte_stats() to properly use a PMD instead of
  PTE. This would probably require something like a "_pmd" version of
  "can_gather_numa_stats()" and a pmd_dirty() check for the
  gather_stats() parameter.
- Make huge_ptep_get() also available for CONFIG_HUGETLBFS=n, perhaps
  by introducing something like HAVE_ARCH_HUGE_PTEP_GET and implementing
  the default NOP version in linux/hugetlbfs.h instead of the individual
  asm/hugetlbfs.h files for all archs.

The first option seems more correct, but it might entail other problems.
The second option would also introduce new problems on s390, where the
implementation of huge_ptep_get() in arch/s390/mm/hugetlbpage.c is currently
only built with CONFIG_HUGETLBFS=y, but I guess we could handle that.

Any thoughts / more ideas?

Regards,
Gerald

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
  2016-01-21 14:57     ` Gerald Schaefer
@ 2016-01-21 19:19       ` Andrew Morton
  2016-01-22 13:37         ` Gerald Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2016-01-21 19:19 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: kbuild test robot, Michael Holzheu, kbuild-all,
	Martin Schwidefsky, Heiko Carstens, linux-mm, linux-kernel

On Thu, 21 Jan 2016 15:57:32 +0100 Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> > --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> > +++ a/fs/proc/task_mmu.c
> > @@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
> >  	pte_t *pte;
> > 
> >  	ptl = pmd_trans_huge_lock(pmd, vma);
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  	if (ptl) {
> >  		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> >  		struct page *page;
> > @@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
> >  		spin_unlock(ptl);
> >  		return 0;
> >  	}
> > +#endif
> > 
> >  	if (pmd_trans_unstable(pmd))
> >  		return 0;
> 
> Hi Andrew,
> 
> Unfortunately this seems to be a lot more complicated than we thought.
> huge_ptep_get() is only defined when CONFIG_HUGETLB_PAGE=y, independent
> from CONFIG_TRANSPARENT_HUGEPAGE. This is because asm/hugetlb.h is only
> included from linux/hugetlb.h when CONFIG_HUGETLB_PAGE=y.
> 
> So this fix won't fix the build error when CONFIG_HUGETLBFS=n and
> CONFIG_TRANSPARENT_HUGEPAGE=y.
> 
> Since the THP code did not repeat the flaws of the hugtelbfs code, i.e.
> it is actually working on PMD entries and not PTE entries, there was
> no need for huge_ptep_get() for THP so far.
> 
> Now it seems that the THP code in gather_pte_stats() is an exception to
> this, as it is not working on a PMD like the rest of the THP code, but
> also on a fake "PTE" like the hugetlbfs code.
> 
> I guess this needs more thinking, two options are crossing my mind:
> - Fix the THP code in gather_pte_stats() to properly use a PMD instead of
>   PTE. This would probably require something like a "_pmd" version of
>   "can_gather_numa_stats()" and a pmd_dirty() check for the
>   gather_stats() parameter.
> - Make huge_ptep_get() also available for CONFIG_HUGETLBFS=n, perhaps
>   by introducing something like HAVE_ARCH_HUGE_PTEP_GET and implementing
>   the default NOP version in linux/hugetlbfs.h instead of the individual
>   asm/hugetlbfs.h files for all archs.
> 
> The first option seems more correct, but it might entail other problems.
> The second option would also introduce new problems on s390, where the
> implementation of huge_ptep_get() in arch/s390/mm/hugetlbpage.c is currently
> only built with CONFIG_HUGETLBFS=y, but I guess we could handle that.
> 
> Any thoughts / more ideas?
> 

The first option does of course sound better.

But you need numa_maps fixed, presumably in 4.5 and possibly backported
into -stable? (The changelog doesn't describe the end-user-visible
effects of the bug.  Naughty changelog!)

So is there some minimal thing we can do for now to get things working
properly and fix it for-real in 4.6?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
  2016-01-21 19:19       ` Andrew Morton
@ 2016-01-22 13:37         ` Gerald Schaefer
  0 siblings, 0 replies; 6+ messages in thread
From: Gerald Schaefer @ 2016-01-22 13:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kbuild test robot, Michael Holzheu, kbuild-all,
	Martin Schwidefsky, Heiko Carstens, linux-mm, linux-kernel

On Thu, 21 Jan 2016 11:19:19 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 21 Jan 2016 15:57:32 +0100 Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> 
> > > --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> > > +++ a/fs/proc/task_mmu.c
> > > @@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
> > >  	pte_t *pte;
> > > 
> > >  	ptl = pmd_trans_huge_lock(pmd, vma);
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > >  	if (ptl) {
> > >  		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> > >  		struct page *page;
> > > @@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
> > >  		spin_unlock(ptl);
> > >  		return 0;
> > >  	}
> > > +#endif
> > > 
> > >  	if (pmd_trans_unstable(pmd))
> > >  		return 0;
> > 
> > Hi Andrew,
> > 
> > Unfortunately this seems to be a lot more complicated than we thought.
> > huge_ptep_get() is only defined when CONFIG_HUGETLB_PAGE=y, independent
> > from CONFIG_TRANSPARENT_HUGEPAGE. This is because asm/hugetlb.h is only
> > included from linux/hugetlb.h when CONFIG_HUGETLB_PAGE=y.
> > 
> > So this fix won't fix the build error when CONFIG_HUGETLBFS=n and
> > CONFIG_TRANSPARENT_HUGEPAGE=y.
> > 
> > Since the THP code did not repeat the flaws of the hugtelbfs code, i.e.
> > it is actually working on PMD entries and not PTE entries, there was
> > no need for huge_ptep_get() for THP so far.
> > 
> > Now it seems that the THP code in gather_pte_stats() is an exception to
> > this, as it is not working on a PMD like the rest of the THP code, but
> > also on a fake "PTE" like the hugetlbfs code.
> > 
> > I guess this needs more thinking, two options are crossing my mind:
> > - Fix the THP code in gather_pte_stats() to properly use a PMD instead of
> >   PTE. This would probably require something like a "_pmd" version of
> >   "can_gather_numa_stats()" and a pmd_dirty() check for the
> >   gather_stats() parameter.
> > - Make huge_ptep_get() also available for CONFIG_HUGETLBFS=n, perhaps
> >   by introducing something like HAVE_ARCH_HUGE_PTEP_GET and implementing
> >   the default NOP version in linux/hugetlbfs.h instead of the individual
> >   asm/hugetlbfs.h files for all archs.
> > 
> > The first option seems more correct, but it might entail other problems.
> > The second option would also introduce new problems on s390, where the
> > implementation of huge_ptep_get() in arch/s390/mm/hugetlbpage.c is currently
> > only built with CONFIG_HUGETLBFS=y, but I guess we could handle that.
> > 
> > Any thoughts / more ideas?
> > 
> 
> The first option does of course sound better.
> 
> But you need numa_maps fixed, presumably in 4.5 and possibly backported
> into -stable? (The changelog doesn't describe the end-user-visible
> effects of the bug.  Naughty changelog!)
> 
> So is there some minimal thing we can do for now to get things working
> properly and fix it for-real in 4.6?
> 

Right, the effects are worth a closer look. NUMA accounting is affected
for THP and hugetlbfs mappings on s390, and only the latter can be fixed
with huge_ptep_get() as in Michaels patch.

On current kernel levels, at least since we have NUMA on s390, both
accountings should actually work by chance for most pages, apart from
PROT_NONE PMDs where pte_present() will return 0. Dirty page accounting is
also broken, as pte_dirty() will always return 0 for PMDs.

Initially, I also had concerns about a potentially faulty memory access,
caused by falsely calculated pfn values and struct page addresses, but it
looks like we can only get "offsets" within one large page frame, where
all struct pages should exist.

So, the minimal thing to do for now would probably be to fix only
gather_hugetlb_stats(), i.e. take only the second hunk of Michaels patch.
I guess it would be best if we send a new patch, maybe even with a better
description, and revert what has been added so far.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-01-22 13:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-20 18:17 [PATCH] numa: fix /proc/<pid>/numa_maps on s390 Michael Holzheu
2016-01-20 19:32 ` kbuild test robot
2016-01-20 20:26   ` Andrew Morton
2016-01-21 14:57     ` Gerald Schaefer
2016-01-21 19:19       ` Andrew Morton
2016-01-22 13:37         ` Gerald Schaefer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.