On Wed, Oct 31, 2018 at 2:31 PM, Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Wed, 31 Oct 2018 14:18:33 +0800
> Li Wang <liwang@redhat.com> wrote:

> On Tue, Oct 16, 2018 at 12:42 AM, Martin Schwidefsky <schwidefsky@de.ibm.com>
> wrote:
>
> > In case a fork or a clone system call fails in copy_process() and the error
> > handling does the mmput() at the bad_fork_cleanup_mm label, the
> > following warning message will appear on the console:
> >
> > BUG: non-zero pgtables_bytes on freeing mm: 16384
> >
> > The reason for that is the tricks we play with mm_inc_nr_puds() and
> > mm_inc_nr_pmds() in init_new_context().
> >
> > A normal 64-bit process has 3 levels of page table; the p4d level and
> > the pud level are folded. On process termination the free_pud_range()
> > function in mm/memory.c will subtract 16KB from pgtables_bytes with a
> > mm_dec_nr_puds() call, although there is no actual pud table.
> >
> > One issue with this is that pgtables_bytes is usually off
> > by a few kilobytes, but the more severe problem is that for a failed
> > fork or clone the free_pgtables() function is not called. In this case
> > there is no mm_dec_nr_puds() or mm_dec_nr_pmds() call to balance
> > the mm_inc_nr_puds() and mm_inc_nr_pmds() calls in init_new_context().
> > pgtables_bytes will then be off by 16384 or 32768 bytes and we get the
> > BUG message. The message itself is purely cosmetic, but annoying.
> >
> > To fix this, override the mm_pmd_folded(), mm_pud_folded() and
> > mm_p4d_folded() functions to check for the true size of the address space.
> >
>
> I can confirm that it fixes the problem; the warning message is gone
> after applying this patch on s390x. I also ran the LTP syscalls/cve tests
> with the patch set on x86_64, and there are no new regressions.
>
> Tested-by: Li Wang <liwang@redhat.com>

> Thanks for testing. Unfortunately Heiko reported another issue yesterday
> with the patch applied. This time the other way around:
>
> BUG: non-zero pgtables_bytes on freeing mm: -16384

Okay. Is the problem still triggered by LTP/cve-2017-17052.c?

I tried this patch on my platform and it works! My test environment is as follows:

# lscpu
Architecture: s390x
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s) per book: 1
Book(s) per drawer: 1
Drawer(s): 2
Vendor ID: IBM/S390
Machine type: 2827
CPU dynamic MHz: 5504
CPU static MHz: 5504
BogoMIPS: 2913.00
Hypervisor vendor: vertical
Virtualization type: full
Dispatching mode: horizontal
L1d cache: 96K
L1i cache: 64K
L2d cache: 1024K
L2i cache: 1024K
L3 cache: 49152K
L4 cache: 393216K
Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te sie

> I am trying to understand how this can happen. For now I would like to
> keep the patches on hold in case they need another change.

Sure.


Regards,

Li Wang