linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/3] pgtable bytes mis-accounting v2
@ 2018-10-15 16:42 Martin Schwidefsky
  2018-10-15 16:42 ` [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded Martin Schwidefsky
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-15 16:42 UTC (permalink / raw)
  To: Li Wang, Guenter Roeck, Janosch Frank
  Cc: Kirill A. Shutemov, Heiko Carstens, linux-kernel, Linux-MM,
	Martin Schwidefsky

Greetings,

the first test patch to fix the pgtable_bytes mis-accounting on s390
still had a few problems. For one, it did not work on x86.

Changes v1 -> v2:

 - Split the patch into three parts: one patch to add the mm_pxd_folded
   helpers, one patch to use the helpers in mm_[dec|inc]_nr_[pmds|puds],
   and finally the fix for s390.

 - Drop the use of __is_defined; it does not work with the
   __PAGETABLE_PxD_FOLDED defines.

 - Do not change the basic #ifdef'ery in mm.h, just add the calls
   to mm_pxd_folded to the pgtable_bytes accounting functions. This
   fixes the compile error on alpha (and potentially on other archs).

Martin Schwidefsky (3):
  mm: introduce mm_[p4d|pud|pmd]_folded
  mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
  s390/mm: fix mis-accounting of pgtable_bytes

 arch/s390/include/asm/mmu_context.h |  5 ----
 arch/s390/include/asm/pgalloc.h     |  6 ++---
 arch/s390/include/asm/pgtable.h     | 18 ++++++++++++++
 arch/s390/include/asm/tlb.h         |  6 ++---
 include/linux/mm.h                  | 48 +++++++++++++++++++++++++++++++++++++
 5 files changed, 72 insertions(+), 11 deletions(-)

-- 
2.16.4



* [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded
  2018-10-15 16:42 [RFC][PATCH 0/3] pgtable bytes mis-accounting v2 Martin Schwidefsky
@ 2018-10-15 16:42 ` Martin Schwidefsky
  2018-10-31  9:02   ` Kirill A. Shutemov
  2018-10-15 16:42 ` [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions Martin Schwidefsky
  2018-10-15 16:42 ` [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes Martin Schwidefsky
  2 siblings, 1 reply; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-15 16:42 UTC (permalink / raw)
  To: Li Wang, Guenter Roeck, Janosch Frank
  Cc: Kirill A. Shutemov, Heiko Carstens, linux-kernel, Linux-MM,
	Martin Schwidefsky

Add three architecture-overridable functions to test whether the
p4d, pud, or pmd layer of a page table is folded.
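
The override mechanism can be illustrated with a small userspace sketch.
It is not the kernel code: struct mm_model, MODEL_REGION3_SIZE,
MODEL_PAGETABLE_PUD_FOLDED and override_demo() are hypothetical names
chosen for illustration. The "architecture" part defines a function plus
a macro of the same name, and the "generic" part only supplies a default
when that macro is absent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mm_struct; only what the sketch needs. */
struct mm_model {
	unsigned long long asce_limit;	/* size of the address space in bytes */
};

/* Illustrative limit, assumed here to be 2 GB. */
#define MODEL_REGION3_SIZE (1ULL << 31)

/*
 * "Architecture header": supplies its own test and defines a macro of the
 * same name so that the generic header can detect the override.
 */
static inline bool mm_pmd_folded(struct mm_model *mm)
{
	return mm->asce_limit <= MODEL_REGION3_SIZE;
}
#define mm_pmd_folded(mm) mm_pmd_folded(mm)

/* "Generic header": each default is compiled only if no override exists. */
#ifndef mm_pmd_folded
static inline bool mm_pmd_folded(struct mm_model *mm)
{
	(void)mm;
	return false;
}
#endif

#ifndef mm_pud_folded
static inline bool mm_pud_folded(struct mm_model *mm)
{
	(void)mm;
#ifdef MODEL_PAGETABLE_PUD_FOLDED
	return true;
#else
	return false;
#endif
}
#endif

/* The pmd test follows the mm; the pud test uses the compile-time default. */
void override_demo(void)
{
	struct mm_model small = { .asce_limit = 1ULL << 31 };
	struct mm_model large = { .asce_limit = 1ULL << 42 };

	assert(mm_pmd_folded(&small));
	assert(!mm_pmd_folded(&large));
	assert(!mm_pud_folded(&small));
}
```

In this sketch the pmd answer depends on the individual mm, while the pud
answer stays a compile-time constant because nothing overrides it.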

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 include/linux/mm.h | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0416a7204be3..d1029972541c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -105,6 +105,46 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
 #endif
 
+/*
+ * On some architectures it depends on the mm if the p4d/pud or pmd
+ * layer of the page table hierarchy is folded or not.
+ */
+#ifndef mm_p4d_folded
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+#ifdef __PAGETABLE_P4D_FOLDED
+	return 1;
+#else
+	return 0;
+#endif
+}
+#endif
+
+#ifndef mm_pud_folded
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+#ifdef __PAGETABLE_PUD_FOLDED
+	return 1;
+#else
+	return 0;
+#endif
+}
+#endif
+
+#ifndef mm_pmd_folded
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+#ifdef __PAGETABLE_PMD_FOLDED
+	return 1;
+#else
+	return 0;
+#endif
+}
+#endif
+
 /*
  * Default maximum number of active map areas, this limits the number of vmas
  * per mm struct. Users can overwrite this number by sysctl but there is a
-- 
2.16.4



* [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
  2018-10-15 16:42 [RFC][PATCH 0/3] pgtable bytes mis-accounting v2 Martin Schwidefsky
  2018-10-15 16:42 ` [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded Martin Schwidefsky
@ 2018-10-15 16:42 ` Martin Schwidefsky
  2018-10-31  9:04   ` Kirill A. Shutemov
  2018-10-15 16:42 ` [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes Martin Schwidefsky
  2 siblings, 1 reply; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-15 16:42 UTC (permalink / raw)
  To: Li Wang, Guenter Roeck, Janosch Frank
  Cc: Kirill A. Shutemov, Heiko Carstens, linux-kernel, Linux-MM,
	Martin Schwidefsky

The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds()
in free_pgtables() if the address range spans a full pud or pmd.
If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration
settings, they blindly subtract the size of the pmd or pud table from
pgtable_bytes even if the pud or pmd page table layer is folded.

Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes
accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds
and mm_dec_nr_pmds. As the check for folded page tables can be
overridden by the architecture, this makes it possible to keep a
correct pgtable_bytes value for platforms that use a dynamic number
of page table levels.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 include/linux/mm.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d1029972541c..67f55c71e59a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1764,11 +1764,15 @@ int __pud_alloc(struct mm_struct *mm, p4d_t *p4d, unsigned long address);
 
 static inline void mm_inc_nr_puds(struct mm_struct *mm)
 {
+	if (mm_pud_folded(mm))
+		return;
 	atomic_long_add(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
 }
 
 static inline void mm_dec_nr_puds(struct mm_struct *mm)
 {
+	if (mm_pud_folded(mm))
+		return;
 	atomic_long_sub(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
 }
 #endif
@@ -1788,11 +1792,15 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
 
 static inline void mm_inc_nr_pmds(struct mm_struct *mm)
 {
+	if (mm_pmd_folded(mm))
+		return;
 	atomic_long_add(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
 }
 
 static inline void mm_dec_nr_pmds(struct mm_struct *mm)
 {
+	if (mm_pmd_folded(mm))
+		return;
 	atomic_long_sub(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
 }
 #endif
-- 
2.16.4



* [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-10-15 16:42 [RFC][PATCH 0/3] pgtable bytes mis-accounting v2 Martin Schwidefsky
  2018-10-15 16:42 ` [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded Martin Schwidefsky
  2018-10-15 16:42 ` [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions Martin Schwidefsky
@ 2018-10-15 16:42 ` Martin Schwidefsky
       [not found]   ` <CAEemH2cHNFsiDqPF32K6TNn-XoXCRT0wP4ccAeah4bKHt=FKFA@mail.gmail.com>
  2 siblings, 1 reply; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-15 16:42 UTC (permalink / raw)
  To: Li Wang, Guenter Roeck, Janosch Frank
  Cc: Kirill A. Shutemov, Heiko Carstens, linux-kernel, Linux-MM,
	Martin Schwidefsky

If a fork or clone system call fails in copy_process and the error
handling does the mmput() at the bad_fork_cleanup_mm label, the
following warning message appears on the console:

  BUG: non-zero pgtables_bytes on freeing mm: 16384

The reason for that is the tricks we play with mm_inc_nr_puds() and
mm_inc_nr_pmds() in init_new_context().

A normal 64-bit process has 3 levels of page table; the p4d level and
the pud level are folded. On process termination the free_pud_range()
function in mm/memory.c will subtract 16KB from pgtable_bytes with a
mm_dec_nr_puds() call, even though there is no actual pud table.

One issue with this is the fact that pgtable_bytes is usually off
by a few kilobytes, but the more severe problem is that for a failed
fork or clone the free_pgtables() function is not called. In this case
there is no mm_dec_nr_puds() or mm_dec_nr_pmds() call to balance
the mm_inc_nr_puds() and mm_inc_nr_pmds() calls in init_new_context().
The pgtable_bytes will be off by 16384 or 32768 bytes and we get the
BUG message. The message itself is purely cosmetic, but annoying.

To fix this, override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
functions to check for the true size of the address space.
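
The imbalance described above can be reproduced with a deliberately
simplified userspace model. It is a sketch under stated assumptions, not
the kernel code: struct mm_model, MODEL_PUD_TABLE_BYTES and
pgtables_bytes_at_mmput() are hypothetical names, the mm is assumed to be
a 3-level one with a folded pud, and only the pud accounting is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PUD_TABLE_BYTES 16384L	/* the 16KB quoted in the message */

/* Hypothetical model of a 3-level mm: the pud level is folded. */
struct mm_model {
	long pgtables_bytes;
};

/*
 * legacy == true:  init_new_context() adds 16KB up front and the teardown
 *                  path subtracts it again without checking for folding.
 * legacy == false: no up-front accounting, and the folded check turns the
 *                  teardown decrement into a no-op.
 * Returns the value check_mm() would find in pgtables_bytes.
 */
long pgtables_bytes_at_mmput(bool legacy, bool fork_fails)
{
	struct mm_model mm = { .pgtables_bytes = 0 };
	const bool pud_folded = true;

	if (legacy)
		mm.pgtables_bytes += MODEL_PUD_TABLE_BYTES;

	/* free_pgtables() only runs if the process actually got started. */
	if (!fork_fails) {
		if (legacy || !pud_folded)
			mm.pgtables_bytes -= MODEL_PUD_TABLE_BYTES;
	}
	return mm.pgtables_bytes;
}

/* Only the legacy scheme combined with a failed fork leaves a remainder. */
void failed_fork_demo(void)
{
	assert(pgtables_bytes_at_mmput(true, false) == 0);
	assert(pgtables_bytes_at_mmput(true, true) == 16384);
	assert(pgtables_bytes_at_mmput(false, false) == 0);
	assert(pgtables_bytes_at_mmput(false, true) == 0);
}
```

In the model, the legacy scheme only balances when teardown runs, which
is exactly the case a failed fork skips.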

Reported-by: Li Wang <liwang@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/mmu_context.h |  5 -----
 arch/s390/include/asm/pgalloc.h     |  6 +++---
 arch/s390/include/asm/pgtable.h     | 18 ++++++++++++++++++
 arch/s390/include/asm/tlb.h         |  6 +++---
 4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 0717ee76885d..f1ab9420ccfb 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -45,8 +45,6 @@ static inline int init_new_context(struct task_struct *tsk,
 		mm->context.asce_limit = STACK_TOP_MAX;
 		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
 				   _ASCE_USER_BITS | _ASCE_TYPE_REGION3;
-		/* pgd_alloc() did not account this pud */
-		mm_inc_nr_puds(mm);
 		break;
 	case -PAGE_SIZE:
 		/* forked 5-level task, set new asce with new_mm->pgd */
@@ -62,9 +60,6 @@ static inline int init_new_context(struct task_struct *tsk,
 		/* forked 2-level compat task, set new asce with new mm->pgd */
 		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
 				   _ASCE_USER_BITS | _ASCE_TYPE_SEGMENT;
-		/* pgd_alloc() did not account this pmd */
-		mm_inc_nr_pmds(mm);
-		mm_inc_nr_puds(mm);
 	}
 	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
 	return 0;
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index f0f9bcf94c03..5ee733720a57 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -36,11 +36,11 @@ static inline void crst_table_init(unsigned long *crst, unsigned long entry)
 
 static inline unsigned long pgd_entry_type(struct mm_struct *mm)
 {
-	if (mm->context.asce_limit <= _REGION3_SIZE)
+	if (mm_pmd_folded(mm))
 		return _SEGMENT_ENTRY_EMPTY;
-	if (mm->context.asce_limit <= _REGION2_SIZE)
+	if (mm_pud_folded(mm))
 		return _REGION3_ENTRY_EMPTY;
-	if (mm->context.asce_limit <= _REGION1_SIZE)
+	if (mm_p4d_folded(mm))
 		return _REGION2_ENTRY_EMPTY;
 	return _REGION1_ENTRY_EMPTY;
 }
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0e7cb0dc9c33..de05466ce50c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -485,6 +485,24 @@ static inline int is_module_addr(void *addr)
 				   _REGION_ENTRY_PROTECT | \
 				   _REGION_ENTRY_NOEXEC)
 
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return mm->context.asce_limit <= _REGION1_SIZE;
+}
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+	return mm->context.asce_limit <= _REGION2_SIZE;
+}
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+	return mm->context.asce_limit <= _REGION3_SIZE;
+}
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+
 static inline int mm_has_pgste(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 457b7ba0fbb6..b31c779cf581 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -136,7 +136,7 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 				unsigned long address)
 {
-	if (tlb->mm->context.asce_limit <= _REGION3_SIZE)
+	if (mm_pmd_folded(tlb->mm))
 		return;
 	pgtable_pmd_page_dtor(virt_to_page(pmd));
 	tlb_remove_table(tlb, pmd);
@@ -152,7 +152,7 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 				unsigned long address)
 {
-	if (tlb->mm->context.asce_limit <= _REGION1_SIZE)
+	if (mm_p4d_folded(tlb->mm))
 		return;
 	tlb_remove_table(tlb, p4d);
 }
@@ -167,7 +167,7 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 				unsigned long address)
 {
-	if (tlb->mm->context.asce_limit <= _REGION2_SIZE)
+	if (mm_pud_folded(tlb->mm))
 		return;
 	tlb_remove_table(tlb, pud);
 }
-- 
2.16.4



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
       [not found]   ` <CAEemH2cHNFsiDqPF32K6TNn-XoXCRT0wP4ccAeah4bKHt=FKFA@mail.gmail.com>
@ 2018-10-31  6:31     ` Martin Schwidefsky
       [not found]       ` <CAEemH2f2gW22PJYpVrh7p5zJyHOVRfVawJWD+kN3+8LmApePbw@mail.gmail.com>
  2018-10-31 10:09       ` Heiko Carstens
  0 siblings, 2 replies; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-31  6:31 UTC (permalink / raw)
  To: Li Wang
  Cc: Guenter Roeck, Janosch Frank, Kirill A. Shutemov, Heiko Carstens,
	linux-kernel, Linux-MM

On Wed, 31 Oct 2018 14:18:33 +0800
Li Wang <liwang@redhat.com> wrote:

> On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky <schwidefsky@de.ibm.com
> > wrote:  
> 
> > In case a fork or a clone system fails in copy_process and the error
> > handling does the mmput() at the bad_fork_cleanup_mm label, the
> > following warning messages will appear on the console:
> >
> >   BUG: non-zero pgtables_bytes on freeing mm: 16384
> >
> > The reason for that is the tricks we play with mm_inc_nr_puds() and
> > mm_inc_nr_pmds() in init_new_context().
> >
> > A normal 64-bit process has 3 levels of page table, the p4d level and
> > the pud level are folded. On process termination the free_pud_range()
> > function in mm/memory.c will subtract 16KB from pgtable_bytes with a
> > mm_dec_nr_puds() call, but there actually is not really a pud table.
> >
> > One issue with this is the fact that pgtable_bytes is usually off
> > by a few kilobytes, but the more severe problem is that for a failed
> > fork or clone the free_pgtables() function is not called. In this case
> > there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with
> > the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context().
> > The pgtable_bytes will be off by 16384 or 32768 bytes and we get the
> > BUG message. The message itself is purely cosmetic, but annoying.
> >
> > To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
> > function to check for the true size of the address space.
> >  
> 
> I can confirm that it works to the problem, the warning message is gone
> after applying this patch on s390x. And I also done ltp syscalls/cve test
> for the patch set on x86_64 arch, there has no new regression.
> 
> Tested-by: Li Wang <liwang@redhat.com>

Thanks for testing. Unfortunately Heiko reported another issue yesterday
with the patch applied. This time the other way around:

BUG: non-zero pgtables_bytes on freeing mm: -16384

I am trying to understand how this can happen. For now I would like to
keep the patches on hold in case they need another change.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
       [not found]       ` <CAEemH2f2gW22PJYpVrh7p5zJyHOVRfVawJWD+kN3+8LmApePbw@mail.gmail.com>
@ 2018-10-31  6:46         ` Martin Schwidefsky
  2018-10-31  9:39           ` Martin Schwidefsky
  0 siblings, 1 reply; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-31  6:46 UTC (permalink / raw)
  To: Li Wang
  Cc: Guenter Roeck, Janosch Frank, Kirill A. Shutemov, Heiko Carstens,
	linux-kernel, Linux-MM

On Wed, 31 Oct 2018 14:43:38 +0800
Li Wang <liwang@redhat.com> wrote:

> On Wed, Oct 31, 2018 at 2:31 PM, Martin Schwidefsky <schwidefsky@de.ibm.com>
> wrote:
> 
> > On Wed, 31 Oct 2018 14:18:33 +0800
> > Li Wang <liwang@redhat.com> wrote:
> >  
> > > On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky <  
> > schwidefsky@de.ibm.com  
> > > > wrote:  
> > >  
> > > > In case a fork or a clone system fails in copy_process and the error
> > > > handling does the mmput() at the bad_fork_cleanup_mm label, the
> > > > following warning messages will appear on the console:
> > > >
> > > >   BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > >
> > > > The reason for that is the tricks we play with mm_inc_nr_puds() and
> > > > mm_inc_nr_pmds() in init_new_context().
> > > >
> > > > A normal 64-bit process has 3 levels of page table, the p4d level and
> > > > the pud level are folded. On process termination the free_pud_range()
> > > > function in mm/memory.c will subtract 16KB from pgtable_bytes with a
> > > > mm_dec_nr_puds() call, but there actually is not really a pud table.
> > > >
> > > > One issue with this is the fact that pgtable_bytes is usually off
> > > > by a few kilobytes, but the more severe problem is that for a failed
> > > > fork or clone the free_pgtables() function is not called. In this case
> > > > there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with
> > > > the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context().
> > > > The pgtable_bytes will be off by 16384 or 32768 bytes and we get the
> > > > BUG message. The message itself is purely cosmetic, but annoying.
> > > >
> > > > To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
> > > > function to check for the true size of the address space.
> > > >  
> > >
> > > I can confirm that it works to the problem, the warning message is gone
> > > after applying this patch on s390x. And I also done ltp syscalls/cve test
> > > for the patch set on x86_64 arch, there has no new regression.
> > >
> > > Tested-by: Li Wang <liwang@redhat.com>  
> >
> > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > with the patch applied. This time the other way around:
> >
> > BUG: non-zero pgtables_bytes on freeing mm: -16384
> >  
> 
> Okay, the problem is still triggered by LTP/cve-2017-17052.c?

No, unfortunately we do not have a simple testcase to trigger this new bug.
It happened once with one of our test kernels, the path that leads to this
is completely unclear.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded
  2018-10-15 16:42 ` [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded Martin Schwidefsky
@ 2018-10-31  9:02   ` Kirill A. Shutemov
  2018-10-31  9:35     ` Martin Schwidefsky
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2018-10-31  9:02 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Li Wang, Guenter Roeck, Janosch Frank, Kirill A. Shutemov,
	Heiko Carstens, linux-kernel, Linux-MM

On Mon, Oct 15, 2018 at 06:42:37PM +0200, Martin Schwidefsky wrote:
> Add three architecture overrideable function to test if the
> p4d, pud, or pmd layer of a page table is folded or not.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> ---
>  include/linux/mm.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0416a7204be3..d1029972541c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h

Shouldn't it be somewhere in asm-generic/pgtable*?

> @@ -105,6 +105,46 @@ extern int mmap_rnd_compat_bits __read_mostly;
>  #define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
>  #endif
>  
> +/*
> + * On some architectures it depends on the mm if the p4d/pud or pmd
> + * layer of the page table hierarchy is folded or not.
> + */
> +#ifndef mm_p4d_folded
> +#define mm_p4d_folded(mm) mm_p4d_folded(mm)

Do we need to define it in generic header?

> +static inline bool mm_p4d_folded(struct mm_struct *mm)
> +{
> +#ifdef __PAGETABLE_P4D_FOLDED
> +	return 1;
> +#else
> +	return 0;
> +#endif

Maybe
	return __is_defined(__PAGETABLE_P4D_FOLDED);

?

-- 
 Kirill A. Shutemov


* Re: [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
  2018-10-15 16:42 ` [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions Martin Schwidefsky
@ 2018-10-31  9:04   ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2018-10-31  9:04 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Li Wang, Guenter Roeck, Janosch Frank, Kirill A. Shutemov,
	Heiko Carstens, linux-kernel, Linux-MM

On Mon, Oct 15, 2018 at 06:42:38PM +0200, Martin Schwidefsky wrote:
> The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds()
> in free_pgtables() if the address range spans a full pud or pmd.
> If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration
> settings they blindly subtract the size of the pmd or pud table from
> pgtable_bytes even if the pud or pmd page table layer is folded.
> 
> Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes
> accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds
> and mm_dec_nr_pmds. As the check for folded page tables can be
> overwritten by the architecture, this allows to keep a correct
> pgtable_bytes value for platforms that use a dynamic number of
> page table levels.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Looks fine to me.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded
  2018-10-31  9:02   ` Kirill A. Shutemov
@ 2018-10-31  9:35     ` Martin Schwidefsky
  2018-10-31  9:48       ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-31  9:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Li Wang, Guenter Roeck, Janosch Frank, Kirill A. Shutemov,
	Heiko Carstens, linux-kernel, Linux-MM

On Wed, 31 Oct 2018 12:02:55 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Mon, Oct 15, 2018 at 06:42:37PM +0200, Martin Schwidefsky wrote:
> > Add three architecture overrideable function to test if the
> > p4d, pud, or pmd layer of a page table is folded or not.
> > 
> > Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > ---
> >  include/linux/mm.h | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 0416a7204be3..d1029972541c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h  
> 
> Shouldn't it be somewhere in asm-generic/pgtable*?

If you prefer the definitions in asm-generic that is fine with me.
I'll give it a try to see if it still compiles.

> > @@ -105,6 +105,46 @@ extern int mmap_rnd_compat_bits __read_mostly;
> >  #define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
> >  #endif
> >  
> > +/*
> > + * On some architectures it depends on the mm if the p4d/pud or pmd
> > + * layer of the page table hierarchy is folded or not.
> > + */
> > +#ifndef mm_p4d_folded
> > +#define mm_p4d_folded(mm) mm_p4d_folded(mm)  
> 
> Do we need to define it in generic header?

That is true, it should work without the #define in the generic header.

> > +static inline bool mm_p4d_folded(struct mm_struct *mm)
> > +{
> > +#ifdef __PAGETABLE_P4D_FOLDED
> > +	return 1;
> > +#else
> > +	return 0;
> > +#endif  
> 
> Maybe
> 	return __is_defined(__PAGETABLE_P4D_FOLDED);
> 
> ?
 
I have tried that, it doesn't work. The reason is that the
__PAGETABLE_xxx_FOLDED defines do not have a value.

#define __PAGETABLE_P4D_FOLDED
#define __PAGETABLE_PMD_FOLDED
#define __PAGETABLE_PUD_FOLDED

While the definition of CONFIG_xxx symbols looks like this

#define CONFIG_xxx 1

The __is_defined needs the value for the __take_second_arg trick.
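
A small userspace sketch makes the difference visible. The macros below
are modelled on the helpers in include/linux/kconfig.h but renamed to
avoid reserved identifiers, and a trailing dummy argument is added so
that the variadic call is strictly conforming; treat them as an
approximation rather than a verbatim copy. The MODEL_* symbols and the
three functions are hypothetical.

```c
#include <assert.h>

/* Only a symbol whose value is the token 1 expands to "0," here. */
#define ARG_PLACEHOLDER_1 0,
#define take_second_arg(ignored, val, ...) val

#define is_defined(x)			is_defined_2(x)
#define is_defined_2(val)		is_defined_3(ARG_PLACEHOLDER_##val)
/* The last 0 is a dummy so that the variadic part is never empty. */
#define is_defined_3(arg1_or_junk)	take_second_arg(arg1_or_junk 1, 0, 0)

#define MODEL_FOLDED_NO_VALUE		/* like #define __PAGETABLE_P4D_FOLDED */
#define MODEL_CONFIG_WITH_VALUE 1	/* like #define CONFIG_xxx 1 */
/* MODEL_NOT_DEFINED is intentionally left undefined. */

/* Pastes to ARG_PLACEHOLDER_, which is not a macro, so the result is 0. */
int folded_no_value(void)
{
	return is_defined(MODEL_FOLDED_NO_VALUE);
}

/* Pastes to ARG_PLACEHOLDER_1, which injects "0," and shifts 1 into place. */
int config_with_value(void)
{
	return is_defined(MODEL_CONFIG_WITH_VALUE);
}

/* Pastes to ARG_PLACEHOLDER_MODEL_NOT_DEFINED, so the result is 0. */
int not_defined(void)
{
	return is_defined(MODEL_NOT_DEFINED);
}

void is_defined_demo(void)
{
	assert(folded_no_value() == 0);
	assert(config_with_value() == 1);
	assert(not_defined() == 0);
}
```

A define without a value is therefore indistinguishable from an
undefined symbol for this kind of test, which is why the plain #ifdef
is kept in the helpers.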

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-10-31  6:46         ` Martin Schwidefsky
@ 2018-10-31  9:39           ` Martin Schwidefsky
  0 siblings, 0 replies; 19+ messages in thread
From: Martin Schwidefsky @ 2018-10-31  9:39 UTC (permalink / raw)
  To: Li Wang
  Cc: Guenter Roeck, Janosch Frank, Kirill A. Shutemov, Heiko Carstens,
	linux-kernel, Linux-MM

On Wed, 31 Oct 2018 07:46:47 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Wed, 31 Oct 2018 14:43:38 +0800
> Li Wang <liwang@redhat.com> wrote:
> 
> > On Wed, Oct 31, 2018 at 2:31 PM, Martin Schwidefsky <schwidefsky@de.ibm.com>
> > wrote:
> >   
> > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > >    
> > 
> > Okay, the problem is still triggered by LTP/cve-2017-17052.c?  
> 
> No, unfortunately we do not have a simple testcase to trigger this new bug.
> It happened once with one of our test kernels, the path that leads to this
> is completely unclear.
 
Ok, got it. There is a mm_inc_nr_puds(mm) missing in the s390 code:

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 76d89ee8b428..814f26520aa2 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -101,6 +101,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
                        mm->context.asce_limit = _REGION1_SIZE;
                        mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
                                _ASCE_USER_BITS | _ASCE_TYPE_REGION2;
+                       mm_inc_nr_puds(mm);
                } else {
                        crst_table_init(table, _REGION1_ENTRY_EMPTY);
                        pgd_populate(mm, (pgd_t *) table, (p4d_t *) pgd);

One of our test-cases did an upgrade of a 3-level page table.
I'll update the patch and send a v3.
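
How the upgrade path produces the negative value can be shown with a
simplified userspace model of the diff above. This is a sketch under
stated assumptions: struct mm_model, the model_* helpers and
upgrade_then_exit() are hypothetical, the table size is the 16384 bytes
from the BUG message, and teardown is reduced to a single pud decrement.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PUD_TABLE_BYTES 16384L	/* value taken from the BUG message */

/* Hypothetical model of an mm with a dynamic number of page table levels. */
struct mm_model {
	long pgtables_bytes;
	int levels;		/* 3 or 4 in this sketch */
};

static bool model_pud_folded(const struct mm_model *mm)
{
	return mm->levels <= 3;
}

static void model_inc_nr_puds(struct mm_model *mm)
{
	if (model_pud_folded(mm))
		return;
	mm->pgtables_bytes += MODEL_PUD_TABLE_BYTES;
}

static void model_dec_nr_puds(struct mm_model *mm)
{
	if (model_pud_folded(mm))
		return;
	mm->pgtables_bytes -= MODEL_PUD_TABLE_BYTES;
}

/* Upgrade from 3 to 4 levels; with_fix adds the call from the diff. */
static void model_crst_table_upgrade(struct mm_model *mm, bool with_fix)
{
	mm->levels = 4;
	if (with_fix)
		model_inc_nr_puds(mm);
}

/* Returns the counter as seen when the mm is finally freed. */
long upgrade_then_exit(bool with_fix)
{
	struct mm_model mm = { .pgtables_bytes = 0, .levels = 3 };

	model_crst_table_upgrade(&mm, with_fix);
	model_dec_nr_puds(&mm);		/* teardown: the pud is no longer folded */
	return mm.pgtables_bytes;
}

void upgrade_demo(void)
{
	assert(upgrade_then_exit(false) == -16384);
	assert(upgrade_then_exit(true) == 0);
}
```

Without the extra increment the teardown decrement has nothing to
balance against, which matches the -16384 in the report.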

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded
  2018-10-31  9:35     ` Martin Schwidefsky
@ 2018-10-31  9:48       ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2018-10-31  9:48 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Li Wang, Guenter Roeck, Janosch Frank, Kirill A. Shutemov,
	Heiko Carstens, linux-kernel, Linux-MM

On Wed, Oct 31, 2018 at 10:35:36AM +0100, Martin Schwidefsky wrote:
> > Maybe
> > 	return __is_defined(__PAGETABLE_P4D_FOLDED);
> > 
> > ?
>  
> I have tried that, doesn't work. The reason is that the
> __PAGETABLE_xxx_FOLDED defines to not have a value.
> 
> #define __PAGETABLE_P4D_FOLDED
> #define __PAGETABLE_PMD_FOLDED
> #define __PAGETABLE_PUD_FOLDED
> 
> While the definition of CONFIG_xxx symbols looks like this
> 
> #define CONFIG_xxx 1
> 
> The __is_defined needs the value for the __take_second_arg trick.

I guess this is easily fixable :)

-- 
 Kirill A. Shutemov


* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-10-31  6:31     ` Martin Schwidefsky
       [not found]       ` <CAEemH2f2gW22PJYpVrh7p5zJyHOVRfVawJWD+kN3+8LmApePbw@mail.gmail.com>
@ 2018-10-31 10:09       ` Heiko Carstens
  2018-10-31 10:36         ` Kirill A. Shutemov
  1 sibling, 1 reply; 19+ messages in thread
From: Heiko Carstens @ 2018-10-31 10:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Li Wang, Guenter Roeck, Janosch Frank, linux-kernel, Linux-MM,
	Martin Schwidefsky

On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> Thanks for testing. Unfortunately Heiko reported another issue yesterday
> with the patch applied. This time the other way around:
> 
> BUG: non-zero pgtables_bytes on freeing mm: -16384
> 
> I am trying to understand how this can happen. For now I would like to
> keep the patch on hold in case they need another change.

FWIW, Kirill: is there a reason why this "BUG:" output is done with
pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?

That would make it possible to get more information with DEBUG_VM
and / or panic_on_warn=1 set. At least for automated testing it would
be nice to have such triggers.



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-10-31 10:09       ` Heiko Carstens
@ 2018-10-31 10:36         ` Kirill A. Shutemov
  2018-11-27  7:34           ` Heiko Carstens
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2018-10-31 10:36 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Kirill A. Shutemov, Li Wang, Guenter Roeck, Janosch Frank,
	linux-kernel, Linux-MM, Martin Schwidefsky

On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > with the patch applied. This time the other way around:
> > 
> > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > 
> > I am trying to understand how this can happen. For now I would like to
> > keep the patch on hold in case they need another change.
> 
> FWIW, Kirill: is there a reason why this "BUG:" output is done with
> pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> 
> That would to get more information with DEBUG_VM and / or
> panic_on_warn=1 set. At least for automated testing it would be nice
> to have such triggers.

A stack trace is not helpful there. It will always show the exit path,
which is useless.

-- 
 Kirill A. Shutemov


* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-10-31 10:36         ` Kirill A. Shutemov
@ 2018-11-27  7:34           ` Heiko Carstens
  2018-11-27  8:05             ` Kirill A. Shutemov
                               ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Heiko Carstens @ 2018-11-27  7:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Li Wang, Guenter Roeck, Janosch Frank,
	linux-kernel, Linux-MM, Martin Schwidefsky

On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> > On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > > with the patch applied. This time the other way around:
> > > 
> > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > > 
> > > I am trying to understand how this can happen. For now I would like to
> > > keep the patch on hold in case they need another change.
> > 
> > FWIW, Kirill: is there a reason why this "BUG:" output is done with
> > pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> > 
> > That would to get more information with DEBUG_VM and / or
> > panic_on_warn=1 set. At least for automated testing it would be nice
> > to have such triggers.
> 
> Stack trace is not helpful there. It will always show the exit path which
> is useless.

So, even with the updated version of these patches I can flood dmesg
and the console with

BUG: non-zero pgtables_bytes on freeing mm: 16384

messages with this complex reproducer on s390:

echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat

Apart from the fact that this needs to be fixed, I'd really like to
see this changed to either a printk_once() or a WARN_ON_ONCE() within
check_mm() so that an arbitrary user cannot flood the console.

E.g. something like the below. If there aren't any objections, I will
provide a proper patch with changelog, etc.

diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff89c7b..d7aeec03c57f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
 	}
 
 	if (mm_pgtables_bytes(mm))
-		pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
-				mm_pgtables_bytes(mm));
+		printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
+			    mm_pgtables_bytes(mm));
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
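
For readers not familiar with the "once" helpers: the effect being asked for
can be modelled in userspace roughly as below. This is only a sketch, not the
kernel implementation. alert_once(), check_mm_model(), flood_demo() and
alert_count are made-up names for illustration, and the real printk_once()
is a macro whose details may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter: how often the message was really emitted. */
static int alert_count;

/* Userspace stand-in for a print-once helper: only the first call prints. */
static void alert_once(long bytes)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	alert_count++;
	fprintf(stderr, "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
		bytes);
}

/* Model of the check in check_mm(): complain if the counter is not zero. */
static void check_mm_model(long pgtables_bytes)
{
	if (pgtables_bytes)
		alert_once(pgtables_bytes);
}

/* Simulate 1000 exiting compat processes, each with a stale 16384 bytes. */
static int flood_demo(void)
{
	for (int i = 0; i < 1000; i++)
		check_mm_model(16384);
	assert(alert_count == 1);
	return alert_count;
}
```

With a plain print in place of alert_once() the same loop would emit the
message on every iteration, which is the flooding described above.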



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-11-27  7:34           ` Heiko Carstens
@ 2018-11-27  8:05             ` Kirill A. Shutemov
  2018-11-27  8:13               ` Heiko Carstens
  2018-11-27 11:47             ` Guenter Roeck
  2018-11-27 14:31             ` Martin Schwidefsky
  2 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2018-11-27  8:05 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Kirill A. Shutemov, Li Wang, Guenter Roeck, Janosch Frank,
	linux-kernel, Linux-MM, Martin Schwidefsky

On Tue, Nov 27, 2018 at 08:34:12AM +0100, Heiko Carstens wrote:
> On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
> > > On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
> > > > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > > > with the patch applied. This time the other way around:
> > > > 
> > > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > > > 
> > > > I am trying to understand how this can happen. For now I would like to
> > > > keep the patch on hold in case they need another change.
> > > 
> > > FWIW, Kirill: is there a reason why this "BUG:" output is done with
> > > pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> > > 
> > > > That would help to get more information with DEBUG_VM and / or
> > > panic_on_warn=1 set. At least for automated testing it would be nice
> > > to have such triggers.
> > 
> > Stack trace is not helpful there. It will always show the exit path which
> > is useless.
> 
> So, even with the updated version of these patches I can flood dmesg
> and the console with
> 
> BUG: non-zero pgtables_bytes on freeing mm: 16384
> 
> messages with this complex reproducer on s390:
> 
> echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat
> 
> Besides that this needs to be fixed, I'd really like to see this
> changed to either a printk_once() or a WARN_ON_ONCE() within
> check_mm() so that an arbitrary user cannot flood the console.
> 
> E.g. something like the below. If there aren't any objections, I will
> provide a proper patch with changelog, etc.
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 07cddff89c7b..d7aeec03c57f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
>  	}
>  
>  	if (mm_pgtables_bytes(mm))
> -		pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> -				mm_pgtables_bytes(mm));
> +		printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> +			    mm_pgtables_bytes(mm));

You can be the first user of pr_alert_once(). Don't miss a chance! ;)

-- 
 Kirill A. Shutemov


* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-11-27  8:05             ` Kirill A. Shutemov
@ 2018-11-27  8:13               ` Heiko Carstens
  0 siblings, 0 replies; 19+ messages in thread
From: Heiko Carstens @ 2018-11-27  8:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Li Wang, Guenter Roeck, Janosch Frank,
	linux-kernel, Linux-MM, Martin Schwidefsky

On Tue, Nov 27, 2018 at 11:05:15AM +0300, Kirill A. Shutemov wrote:
> > E.g. something like the below. If there aren't any objections, I will
> > provide a proper patch with changelog, etc.
> > 
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 07cddff89c7b..d7aeec03c57f 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
> >  	}
> >  
> >  	if (mm_pgtables_bytes(mm))
> > -		pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> > -				mm_pgtables_bytes(mm));
> > +		printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> > +			    mm_pgtables_bytes(mm));
> 
> You can be the first user of pr_alert_once(). Don't miss a chance! ;)

I didn't expect that that one exists. ;) Will do.



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-11-27  7:34           ` Heiko Carstens
  2018-11-27  8:05             ` Kirill A. Shutemov
@ 2018-11-27 11:47             ` Guenter Roeck
  2018-11-27 11:52               ` Heiko Carstens
  2018-11-27 14:31             ` Martin Schwidefsky
  2 siblings, 1 reply; 19+ messages in thread
From: Guenter Roeck @ 2018-11-27 11:47 UTC (permalink / raw)
  To: Heiko Carstens, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Li Wang, Janosch Frank, linux-kernel,
	Linux-MM, Martin Schwidefsky

On 11/26/18 11:34 PM, Heiko Carstens wrote:
> On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
>> On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:
>>> On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:
>>>> Thanks for testing. Unfortunately Heiko reported another issue yesterday
>>>> with the patch applied. This time the other way around:
>>>>
>>>> BUG: non-zero pgtables_bytes on freeing mm: -16384
>>>>
>>>> I am trying to understand how this can happen. For now I would like to
>>>> keep the patch on hold in case they need another change.
>>>
>>> FWIW, Kirill: is there a reason why this "BUG:" output is done with
>>> pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
>>>
>>> That would help to get more information with DEBUG_VM and / or
>>> panic_on_warn=1 set. At least for automated testing it would be nice
>>> to have such triggers.
>>
>> Stack trace is not helpful there. It will always show the exit path which
>> is useless.
> 
> So, even with the updated version of these patches I can flood dmesg
> and the console with
> 
> BUG: non-zero pgtables_bytes on freeing mm: 16384
> 
> messages with this complex reproducer on s390:
> 
> echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat
> 
> Besides that this needs to be fixed, I'd really like to see this
> changed to either a printk_once() or a WARN_ON_ONCE() within
> check_mm() so that an arbitrary user cannot flood the console.
> 
> E.g. something like the below. If there aren't any objections, I will
> provide a proper patch with changelog, etc.
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 07cddff89c7b..d7aeec03c57f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
>   	}
>   
>   	if (mm_pgtables_bytes(mm))
> -		pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> -				mm_pgtables_bytes(mm));
> +		printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> +			    mm_pgtables_bytes(mm));
>   

pr_alert_once ?

Guenter

>   #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
>   	VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
> 
> 



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-11-27 11:47             ` Guenter Roeck
@ 2018-11-27 11:52               ` Heiko Carstens
  0 siblings, 0 replies; 19+ messages in thread
From: Heiko Carstens @ 2018-11-27 11:52 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Li Wang, Janosch Frank,
	linux-kernel, Linux-MM, Martin Schwidefsky

On Tue, Nov 27, 2018 at 03:47:13AM -0800, Guenter Roeck wrote:
> >E.g. something like the below. If there aren't any objections, I will
> >provide a proper patch with changelog, etc.
> >
> >diff --git a/kernel/fork.c b/kernel/fork.c
> >index 07cddff89c7b..d7aeec03c57f 100644
> >--- a/kernel/fork.c
> >+++ b/kernel/fork.c
> >@@ -647,8 +647,8 @@ static void check_mm(struct mm_struct *mm)
> >  	}
> >  	if (mm_pgtables_bytes(mm))
> >-		pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> >-				mm_pgtables_bytes(mm));
> >+		printk_once(KERN_ALERT "BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
> >+			    mm_pgtables_bytes(mm));
> 
> pr_alert_once ?

Already changed and posted:

https://lore.kernel.org/lkml/20181127083603.39041-1-heiko.carstens@de.ibm.com/



* Re: [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes
  2018-11-27  7:34           ` Heiko Carstens
  2018-11-27  8:05             ` Kirill A. Shutemov
  2018-11-27 11:47             ` Guenter Roeck
@ 2018-11-27 14:31             ` Martin Schwidefsky
  2 siblings, 0 replies; 19+ messages in thread
From: Martin Schwidefsky @ 2018-11-27 14:31 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, Li Wang, Guenter Roeck,
	Janosch Frank, linux-kernel, Linux-MM

On Tue, 27 Nov 2018 08:34:12 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Wed, Oct 31, 2018 at 01:36:23PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 31, 2018 at 11:09:44AM +0100, Heiko Carstens wrote:  
> > > On Wed, Oct 31, 2018 at 07:31:49AM +0100, Martin Schwidefsky wrote:  
> > > > Thanks for testing. Unfortunately Heiko reported another issue yesterday
> > > > with the patch applied. This time the other way around:
> > > > 
> > > > BUG: non-zero pgtables_bytes on freeing mm: -16384
> > > > 
> > > > I am trying to understand how this can happen. For now I would like to
> > > > keep the patch on hold in case they need another change.  
> > > 
> > > FWIW, Kirill: is there a reason why this "BUG:" output is done with
> > > pr_alert() and not with VM_BUG_ON() or one of the WARN*() variants?
> > > 
> > > > That would help to get more information with DEBUG_VM and / or
> > > panic_on_warn=1 set. At least for automated testing it would be nice
> > > to have such triggers.  
> > 
> > Stack trace is not helpful there. It will always show the exit path which
> > is useless.  
> 
> So, even with the updated version of these patches I can flood dmesg
> and the console with
> 
> BUG: non-zero pgtables_bytes on freeing mm: 16384
> 
> messages with this complex reproducer on s390:
> 
> echo "void main(void) {}" | gcc -m31 -xc -o compat - && ./compat

Forgot a hunk in the fix.. I claim not enough coffee :-/
Patch is queued and I will send a please pull by the end of the week.
--
From c0499f2aa853939984ecaf0d393012486e56c7ce Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Date: Tue, 27 Nov 2018 14:04:04 +0100
Subject: [PATCH] s390/mm: correct pgtable_bytes on page table downgrade

The downgrade of a page table from 3 levels to 2 levels for a 31-bit compat
process removes a pmd table which has to be counted against pgtable_bytes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/pgalloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 814f26520aa2..6791562779ee 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -131,6 +131,7 @@ void crst_table_downgrade(struct mm_struct *mm)
 	}
 
 	pgd = mm->pgd;
+	mm_dec_nr_pmds(mm);
 	mm->pgd = (pgd_t *) (pgd_val(*pgd) & _REGION_ENTRY_ORIGIN);
 	mm->context.asce_limit = _REGION3_SIZE;
 	mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
-- 
2.16.4
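
To illustrate why the missing decrement shows up as exactly 16384 bytes, here
is a simplified userspace model of the accounting. It is a sketch under
assumptions, not kernel code: struct mm_model and the model_* helpers are
hypothetical, and the table geometry (2048 entries of 8 bytes per pmd table)
is assumed for s390.

```c
#include <assert.h>

/* Assumed s390 geometry: one pmd table = 2048 entries * 8 bytes. */
#define MODEL_PTRS_PER_PMD	2048
#define MODEL_PMD_ENTRY_SIZE	8

/* Hypothetical stand-in for the pgtables_bytes counter in mm_struct. */
struct mm_model {
	long pgtables_bytes;
};

static void model_inc_nr_pmds(struct mm_model *mm)
{
	mm->pgtables_bytes += MODEL_PTRS_PER_PMD * MODEL_PMD_ENTRY_SIZE;
}

static void model_dec_nr_pmds(struct mm_model *mm)
{
	mm->pgtables_bytes -= MODEL_PTRS_PER_PMD * MODEL_PMD_ENTRY_SIZE;
}

/*
 * Model of the 3-level to 2-level downgrade: the pmd table no longer
 * counts as a pmd afterwards, so it has to be subtracted here.
 * with_fix selects whether the added hunk is applied.
 */
static void model_downgrade(struct mm_model *mm, int with_fix)
{
	if (with_fix)
		model_dec_nr_pmds(mm);
}

/* Returns what the check on freeing the mm would see for a compat task. */
static long model_compat_exec(int with_fix)
{
	struct mm_model mm = { 0 };

	model_inc_nr_pmds(&mm);		/* counted while still 3 levels */
	model_downgrade(&mm, with_fix);
	return mm.pgtables_bytes;
}
```

In this model model_compat_exec(0) leaves 16384 bytes behind, matching the
value in the reported message, while model_compat_exec(1) returns to zero.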
-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



end of thread, other threads:[~2018-11-27 14:31 UTC | newest]

Thread overview: 19+ messages
2018-10-15 16:42 [RFC][PATCH 0/3] pgtable bytes mis-accounting v2 Martin Schwidefsky
2018-10-15 16:42 ` [PATCH 1/3] mm: introduce mm_[p4d|pud|pmd]_folded Martin Schwidefsky
2018-10-31  9:02   ` Kirill A. Shutemov
2018-10-31  9:35     ` Martin Schwidefsky
2018-10-31  9:48       ` Kirill A. Shutemov
2018-10-15 16:42 ` [PATCH 2/3] mm: add mm_pxd_folded checks to pgtable_bytes accounting functions Martin Schwidefsky
2018-10-31  9:04   ` Kirill A. Shutemov
2018-10-15 16:42 ` [PATCH 3/3] s390/mm: fix mis-accounting of pgtable_bytes Martin Schwidefsky
     [not found]   ` <CAEemH2cHNFsiDqPF32K6TNn-XoXCRT0wP4ccAeah4bKHt=FKFA@mail.gmail.com>
2018-10-31  6:31     ` Martin Schwidefsky
     [not found]       ` <CAEemH2f2gW22PJYpVrh7p5zJyHOVRfVawJWD+kN3+8LmApePbw@mail.gmail.com>
2018-10-31  6:46         ` Martin Schwidefsky
2018-10-31  9:39           ` Martin Schwidefsky
2018-10-31 10:09       ` Heiko Carstens
2018-10-31 10:36         ` Kirill A. Shutemov
2018-11-27  7:34           ` Heiko Carstens
2018-11-27  8:05             ` Kirill A. Shutemov
2018-11-27  8:13               ` Heiko Carstens
2018-11-27 11:47             ` Guenter Roeck
2018-11-27 11:52               ` Heiko Carstens
2018-11-27 14:31             ` Martin Schwidefsky
