linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: Linux 4.18-rc7
       [not found] <CA+55aFxpFefwVdTGVML99PSFUqwpJXPx5LVCA3D=g2t2_QLNsA@mail.gmail.com>
@ 2018-07-30  6:47 ` Amit Pundir
  2018-07-30 13:01   ` Kirill A. Shutemov
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Pundir @ 2018-07-30  6:47 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Linus Torvalds
  Cc: Greg Kroah-Hartman, John Stultz, linux-mm, lkml

On Mon, 30 Jul 2018 at 03:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So unless something odd happens, this should be the last rc for 4.18.
>
> Nothing particularly odd happened this last week - we got the usual
> random set of various minor fixes all over. About two thirds of it is
> drivers - networking, staging and usb stands out, but there's a little
> bit of stuff all over (clk, block, gpu, nvme..).
>
> Outside of drivers, the bulk is some core networking stuff, with
> random changes elsewhere (minor arch updates, filesystems, core
> kernel, test scripts).
>
> The appended shortlog gives a flavor of the details.
>
>                   Linus
>
> ---
> Kirill A. Shutemov (3):
>       mm: introduce vma_init()
>       mm: use vma_init() to initialize VMAs on stack and data segments
>       mm: fix vma_is_anonymous() false-positives

Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to
above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous()
false-positives") to be specific. The same userspace is working fine
with v4.18-rc6.

I didn't yet look into what is going wrong from userspace point of
view, but I just wanted to give you a heads up on this. I'll be happy
to assist in further debugging/diagnosis if required.

Here is the crash log from logcat, if it helps:
F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
F DEBUG   : Build fingerprint:
'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
F DEBUG   : Revision: '0'
F DEBUG   : ABI: 'arm'
F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
.. <snip> ..
F DEBUG   : backtrace:
F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
(create_mspace_with_base+82)
F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
unsigned int)+40)
F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
std::__1::basic_string<char, std::__
1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
unsigned int, unsigned int, unsigned int, bool)+36)
F DEBUG   :     #04 pc 0013c9ab  /system/lib/libart.so
(art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int,
unsigned int, double, double, unsigned int, unsigned int,
std::__1::basic_string<char, std::__1::char_traits<char>,
std::__1::allocator<char>> const&, art::InstructionSet,
art::gc::CollectorType, art::gc::CollectorType,
art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int,
unsigned int, bool, unsigned int, unsigned int, bool, bool, bool,
bool, bool, bool, bool, bool, bool, bool, bool, unsigned long
long)+1674)
DEBUG   :     #05 pc 00318201  /system/lib/libart.so
(art::Runtime::Init(art::RuntimeArgumentMap&&)+7036)
DEBUG   :     #06 pc 0031af19  /system/lib/libart.so
(art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void
const*>>> const&, bool)+68)
F DEBUG   :     #07 pc 0023c353  /system/lib/libart.so (JNI_CreateJavaVM+658)
F DEBUG   :     #08 pc 0000205f  /system/lib/libandroid_runtime.so
(android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038)
F DEBUG   :     #09 pc 00002381  /system/lib/libandroid_runtime.so
(android::AndroidRuntime::start(char const*,
android::Vector<android::String8> const&, bool)+196)
F DEBUG   :     #10 pc 0000046b  /system/bin/app_process32 (main+702)

Regards,
Amit Pundir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30  6:47 ` Linux 4.18-rc7 Amit Pundir
@ 2018-07-30 13:01   ` Kirill A. Shutemov
  2018-07-30 13:34     ` Amit Pundir
  2018-07-30 17:32     ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-30 13:01 UTC (permalink / raw)
  To: Amit Pundir
  Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Linus Torvalds, Greg Kroah-Hartman, John Stultz,
	linux-mm, lkml, youling 257

On Mon, Jul 30, 2018 at 12:17:46PM +0530, Amit Pundir wrote:
> On Mon, 30 Jul 2018 at 03:39, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So unless something odd happens, this should be the last rc for 4.18.
> >
> > Nothing particularly odd happened this last week - we got the usual
> > random set of various minor fixes all over. About two thirds of it is
> > drivers - networking, staging and usb stands out, but there's a little
> > bit of stuff all over (clk, block, gpu, nvme..).
> >
> > Outside of drivers, the bulk is some core networking stuff, with
> > random changes elsewhere (minor arch updates, filesystems, core
> > kernel, test scripts).
> >
> > The appended shortlog gives a flavor of the details.
> >
> >                   Linus
> >
> > ---
> > Kirill A. Shutemov (3):
> >       mm: introduce vma_init()
> >       mm: use vma_init() to initialize VMAs on stack and data segments
> >       mm: fix vma_is_anonymous() false-positives
> 
> Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to
> above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous()
> false-positives") to be specific. The same userspace is working fine
> with v4.18-rc6.
> 
> I didn't yet look into what is going wrong from userspace point of
> view, but I just wanted to give you a heads up on this. I'll be happy
> to assist in further debugging/diagnosis if required.

Youling reported basically the same bug with zygote crashing, but on
x86-64.

I think I missed vma_set_anonymous() somewhere, but I fail to see where.

Could you check if removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init
makes the problem go away?

Any chance the code that crashes can be run under strace?

> Here is the crash log from logcat, if it helps:
> F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> F DEBUG   : Build fingerprint:
> 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
> F DEBUG   : Revision: '0'
> F DEBUG   : ABI: 'arm'
> F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
> F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
> .. <snip> ..
> F DEBUG   : backtrace:
> F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
> F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
> (create_mspace_with_base+82)
> F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
> (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
> unsigned int)+40)
> F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
> (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
> std::__1::basic_string<char, std::__
> 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
> unsigned int, unsigned int, unsigned int, bool)+36)
> F DEBUG   :     #04 pc 0013c9ab  /system/lib/libart.so
> (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int,
> unsigned int, double, double, unsigned int, unsigned int,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char>> const&, art::InstructionSet,
> art::gc::CollectorType, art::gc::CollectorType,
> art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int,
> unsigned int, bool, unsigned int, unsigned int, bool, bool, bool,
> bool, bool, bool, bool, bool, bool, bool, bool, unsigned long
> long)+1674)
> DEBUG   :     #05 pc 00318201  /system/lib/libart.so
> (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036)
> DEBUG   :     #06 pc 0031af19  /system/lib/libart.so
> (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
> std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char>>, void
> const*>>> const&, bool)+68)
> F DEBUG   :     #07 pc 0023c353  /system/lib/libart.so (JNI_CreateJavaVM+658)
> F DEBUG   :     #08 pc 0000205f  /system/lib/libandroid_runtime.so
> (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038)
> F DEBUG   :     #09 pc 00002381  /system/lib/libandroid_runtime.so
> (android::AndroidRuntime::start(char const*,
> android::Vector<android::String8> const&, bool)+196)
> F DEBUG   :     #10 pc 0000046b  /system/bin/app_process32 (main+702)
> 
> Regards,
> Amit Pundir
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 13:01   ` Kirill A. Shutemov
@ 2018-07-30 13:34     ` Amit Pundir
  2018-07-30 17:32     ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Amit Pundir @ 2018-07-30 13:34 UTC (permalink / raw)
  To: kirill
  Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Linus Torvalds, Greg Kroah-Hartman, John Stultz,
	linux-mm, lkml, youling 257

On Mon, 30 Jul 2018 at 18:31, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Mon, Jul 30, 2018 at 12:17:46PM +0530, Amit Pundir wrote:
> > On Mon, 30 Jul 2018 at 03:39, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > So unless something odd happens, this should be the last rc for 4.18.
> > >
> > > Nothing particularly odd happened this last week - we got the usual
> > > random set of various minor fixes all over. About two thirds of it is
> > > drivers - networking, staging and usb stands out, but there's a little
> > > bit of stuff all over (clk, block, gpu, nvme..).
> > >
> > > Outside of drivers, the bulk is some core networking stuff, with
> > > random changes elsewhere (minor arch updates, filesystems, core
> > > kernel, test scripts).
> > >
> > > The appended shortlog gives a flavor of the details.
> > >
> > >                   Linus
> > >
> > > ---
> > > Kirill A. Shutemov (3):
> > >       mm: introduce vma_init()
> > >       mm: use vma_init() to initialize VMAs on stack and data segments
> > >       mm: fix vma_is_anonymous() false-positives
> >
> > Hi, I have run into AOSP userspace crash with v4.18-rc7, leading to
> > above mm patches. bfd40eaff5ab ("mm: fix vma_is_anonymous()
> > false-positives") to be specific. The same userspace is working fine
> > with v4.18-rc6.
> >
> > I didn't yet look into what is going wrong from userspace point of
> > view, but I just wanted to give you a heads up on this. I'll be happy
> > to assist in further debugging/diagnosis if required.
>
> Youling reported basically the same bug with zygote crashing, but on
> x86-64.
>
> I think I missed vma_set_anonymous() somewhere, but I fail to see where.
>
> Could you check if removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init
> makes the problem go away?

Yes removing 'vma->vm_ops = &dummy_vm_ops;" from vma_init() works.
Crash is gone with that change.

>
> Any chance the code that crashes can be run under strace?

Running strace on zygote is going to be a pain. I can check logcat
again and see if any other relatively less complex process is crashing
with similar backtrace and try to run that with strace if that is
still required.

Regards,
Amit Pundir

>
> > Here is the crash log from logcat, if it helps:
> > F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
> > F DEBUG   : Build fingerprint:
> > 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
> > F DEBUG   : Revision: '0'
> > F DEBUG   : ABI: 'arm'
> > F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
> > F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
> > .. <snip> ..
> > F DEBUG   : backtrace:
> > F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
> > F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
> > (create_mspace_with_base+82)
> > F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
> > (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
> > unsigned int)+40)
> > F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
> > (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
> > std::__1::basic_string<char, std::__
> > 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
> > unsigned int, unsigned int, unsigned int, bool)+36)
> > F DEBUG   :     #04 pc 0013c9ab  /system/lib/libart.so
> > (art::gc::Heap::Heap(unsigned int, unsigned int, unsigned int,
> > unsigned int, double, double, unsigned int, unsigned int,
> > std::__1::basic_string<char, std::__1::char_traits<char>,
> > std::__1::allocator<char>> const&, art::InstructionSet,
> > art::gc::CollectorType, art::gc::CollectorType,
> > art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int,
> > unsigned int, bool, unsigned int, unsigned int, bool, bool, bool,
> > bool, bool, bool, bool, bool, bool, bool, bool, unsigned long
> > long)+1674)
> > DEBUG   :     #05 pc 00318201  /system/lib/libart.so
> > (art::Runtime::Init(art::RuntimeArgumentMap&&)+7036)
> > DEBUG   :     #06 pc 0031af19  /system/lib/libart.so
> > (art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
> > std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
> > std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
> > std::__1::char_traits<char>, std::__1::allocator<char>>, void
> > const*>>> const&, bool)+68)
> > F DEBUG   :     #07 pc 0023c353  /system/lib/libart.so (JNI_CreateJavaVM+658)
> > F DEBUG   :     #08 pc 0000205f  /system/lib/libandroid_runtime.so
> > (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+5038)
> > F DEBUG   :     #09 pc 00002381  /system/lib/libandroid_runtime.so
> > (android::AndroidRuntime::start(char const*,
> > android::Vector<android::String8> const&, bool)+196)
> > F DEBUG   :     #10 pc 0000046b  /system/bin/app_process32 (main+702)
> >
> > Regards,
> > Amit Pundir
> >
>
> --
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 13:01   ` Kirill A. Shutemov
  2018-07-30 13:34     ` Amit Pundir
@ 2018-07-30 17:32     ` Linus Torvalds
  2018-07-30 21:53       ` Hugh Dickins
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2018-07-30 17:32 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins, Matthew Wilcox
  Cc: Amit Pundir, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, John Stultz,
	linux-mm, Linux Kernel Mailing List, youling257

On Mon, Jul 30, 2018 at 6:01 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> I think I missed vma_set_anonymous() somewhere, but I fail to see where.

Honestly, by now we just need to revert that commit.

It's not even clear that it was a good idea to begin with. The rest of
the commits were cleanups, this one was driven by a incorrect
VM_BUG_ON() that triggered, and that checked "vma_is_anonymous(vma)"
without any explanations of wht it should matter.

I think the biggest problem with vma_is_anonymous() may be its name,
not what it does.

What the code historically *did* (and what vma_is_anonymous() checks)
is not "is this anonymous", but rather "does this have any special
operations associated with it".

The two are similar. But people have grown opinions about exactly what
"anonymous" means. If we had named it just "no_vm_ops()", we wouldn't
have random crazy checks for "vma_is_anonymous()" in places where it
makes no sense.

So what I think we want a real explanation for is why people who use
"vma_is_anonymous()" care. Instead of trying to change its very
historical meaning, we should look at the users, and perhaps change
its name.

In this case, for example, I think the *real* problem was described by
commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when
truncation splits pmd"), and the problem is that an existing check
that required that mmap_sem was held was changed to say "only for
anonymous mappings".

But the fact is, you can truncate mappings that don't have any ops just *fine*.

So maybe that original BUG() was entirely bogus to begin with, and it
shouldn't exist at all?

Or maybe the code should test "do I have a vm_file" instead of testing
"do I have vm_ops"?

What's the problem with just doing split_huge_pmd() there when it's a
pmd_trans_huge or pmd_devmap pmd? Why is that VM_BUG_ON_VMA() there in
the first place? Why are allegedly "anonymous" mappings so special
here for locking?

Adding a few more people to the cc, they were involved the last that
time VM_BUG_ON_VMA() was modified.

New people: see commit bfd40eaff5ab ("mm: fix vma_is_anonymous()
false-positives") for details. Right now I think it's getting
reverted, but the oops explanation in the commit is about that

            kernel BUG at mm/memory.c:1422!

which was/is debatable and seems to make no sense (and definitely is
still triggerable despite that commit 684283988f70 ("huge pagecache:
mmap_sem is unlocked when truncation splits pmd") that limited it a
bit - but I think it didn't limit it enough.

               Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 17:32     ` Linus Torvalds
@ 2018-07-30 21:53       ` Hugh Dickins
  2018-07-31  1:01         ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-07-30 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Hugh Dickins, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, 30 Jul 2018, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 6:01 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > I think I missed vma_set_anonymous() somewhere, but I fail to see where.
> 
> Honestly, by now we just need to revert that commit.
> 
> It's not even clear that it was a good idea to begin with. The rest of
> the commits were cleanups, this one was driven by a incorrect
> VM_BUG_ON() that triggered, and that checked "vma_is_anonymous(vma)"
> without any explanations of wht it should matter.
> 
> I think the biggest problem with vma_is_anonymous() may be its name,
> not what it does.
> 
> What the code historically *did* (and what vma_is_anonymous() checks)
> is not "is this anonymous", but rather "does this have any special
> operations associated with it".
> 
> The two are similar. But people have grown opinions about exactly what
> "anonymous" means. If we had named it just "no_vm_ops()", we wouldn't
> have random crazy checks for "vma_is_anonymous()" in places where it
> makes no sense.
> 
> So what I think we want a real explanation for is why people who use
> "vma_is_anonymous()" care. Instead of trying to change its very
> historical meaning, we should look at the users, and perhaps change
> its name.

You make a very good point on the naming. Especially confusing when
layered on top of "we call shmem 'file' here, but 'anon' there".

I have no problem with reverting -rc7's vma_is_anonymous() series.

> 
> In this case, for example, I think the *real* problem was described by
> commit 684283988f70 ("huge pagecache: mmap_sem is unlocked when
> truncation splits pmd"), and the problem is that an existing check
> that required that mmap_sem was held was changed to say "only for
> anonymous mappings".
> 
> But the fact is, you can truncate mappings that don't have any ops just *fine*.
> 
> So maybe that original BUG() was entirely bogus to begin with, and it
> shouldn't exist at all?
> 
> Or maybe the code should test "do I have a vm_file" instead of testing
> "do I have vm_ops"?
> 
> What's the problem with just doing split_huge_pmd() there when it's a
> pmd_trans_huge or pmd_devmap pmd? Why is that VM_BUG_ON_VMA() there in
> the first place? Why are allegedly "anonymous" mappings so special
> here for locking?
> 
> Adding a few more people to the cc, they were involved the last that
> time VM_BUG_ON_VMA() was modified.
> 
> New people: see commit bfd40eaff5ab ("mm: fix vma_is_anonymous()
> false-positives") for details. Right now I think it's getting
> reverted, but the oops explanation in the commit is about that
> 
>             kernel BUG at mm/memory.c:1422!
> 
> which was/is debatable and seems to make no sense (and definitely is
> still triggerable despite that commit 684283988f70 ("huge pagecache:
> mmap_sem is unlocked when truncation splits pmd") that limited it a
> bit - but I think it didn't limit it enough.

I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
just a compromise with those who wanted to keep something there;
I don't think we even need a WARN_ON_ONCE() now.

It's historical: back in the day when only (hugetlbfs which never
gets there, and) anon THP used huge pmds for userspace, it did half-
document an obscure assumption underlying __split_huge_pmd(), and
IIRC was added in response to a trinity bug catch in that area.

(It remains quite interesting how exit_mmap() does not come that way,
and most syscalls split the vma beforehand in vma_adjust(): it's mostly
about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes
without prior splitting.)

Once DAX and huge tmpfs came in, the BUG had to be scrapped or weakened
because truncation and hole-punching can come that way without mmap_sem.
And perhaps other drivers since are taking advantage of huge pmds now,
with other assumptions.

But I have not yet taken a look to see where the -rc7 changes actually
go wrong: I'll spend a little while looking, but don't expect to find it -
certainly don't wait on me, I'll only speak up if I find something.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-30 21:53       ` Hugh Dickins
@ 2018-07-31  1:01         ` Linus Torvalds
  2018-07-31  3:26           ` Hugh Dickins
  2018-07-31  6:29           ` Kirill A. Shutemov
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-31  1:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
>
> I have no problem with reverting -rc7's vma_is_anonymous() series.

I don't think we need to revert the whole series: I think the rest are
all fairly obvious cleanups, and shouldn't really have any semantic
changes.

It's literally only that last patch in the series that then changes
that meaning of "vm_ops". And I don't really _mind_ that last step
either, but since we don't know exactly what it was that it broke, and
we're past rc7, I don't think we really have any option but the revert
it.

And if we revert it, I think we need to just remove the
VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
it is quite likely that the real bug is that overzealous BUG_ON(),
since I can't see any reason why anonymous mappings should be special
there.

But I'm certainly also ok with re-visiting that commit later.  I just
think that right _now_ the above is my preferred plan.

Kirill?

> I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> just a compromise with those who wanted to keep something there;
> I don't think we even need a WARN_ON_ONCE() now.

So to me it looks like a historical check that simply doesn't
"normally" trigger, but there's no reason I can see why we should care
about the case it tests against.

> (It remains quite interesting how exit_mmap() does not come that way,
> and most syscalls split the vma beforehand in vma_adjust(): it's mostly
> about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes
> without prior splitting.)

Well, in this case it's the ftruncate() path, which fundamentally
doesn't split the vma at all (prior *or* later). But yes, madvise() is
in the same boat - it doesn't change the vma at all, it just changes
the contents of the vma.

And exit_mmap() is special because it just tears down everything.

So we do have three very distinct cases:

 (a) changing and thus splitting the vma itself (mprotect, munmap/mmap, mlock),

 (b) not changing the vma, but changing the underlying mapping
(truncate and madvise(MADV_DONTNEED)

 (c) tearing down everything, and no locking needed because it's the
last user (exit_mmap).

that are different for what I think are good reasons.

                   Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  1:01         ` Linus Torvalds
@ 2018-07-31  3:26           ` Hugh Dickins
  2018-07-31  4:25             ` John Stultz
  2018-07-31  6:29           ` Kirill A. Shutemov
  1 sibling, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-07-31  3:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, 30 Jul 2018, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> 
> I don't think we need to revert the whole series: I think the rest are
> all fairly obvious cleanups, and shouldn't really have any semantic
> changes.

Okay.

> 
> It's literally only that last patch in the series that then changes
> that meaning of "vm_ops". And I don't really _mind_ that last step
> either, but since we don't know exactly what it was that it broke, and
> we're past rc7, I don't think we really have any option but the revert
> it.

It took me a long time to grasp what was happening, that that last
patch bfd40eaff5ab was fixing. Not quite explained in the commit.

I think it was that by mistakenly passing the vma_is_anonymous() test,
create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
COWing pages from kcov); which the truncate then had to split, and in
going to do so, again hit the mistaken vma_is_anonymous() test, BUG.

> 
> And if we revert it, I think we need to just remove the
> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> it is quite likely that the real bug is that overzealous BUG_ON(),
> since I can't see any reason why anonymous mappings should be special
> there.

Yes, that probably has to go: but it's not clear what state it leaves
us in, with an anon THP being split by a truncate, without the expected
locking; I don't remember offhand, probably a subtler bug than that BUG,
which you may or may not consider an improvement.

I fear that Kirill has not missed inserting a vma_set_anonymous() from
somewhere that it should be, but rather that zygote is working with some
special mapping which used to satisfy vma_is_anonymous(), faults supplying
backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
finds !dummy_vm_ops.fault hence SIGBUS.

If that's so, perhaps dummy_vm_ops needs to be given a back-compatible
fault handler; or the driver(?) in question given vm_ops and that fault
handler. But when I say "back-compatible", I don't think it should ever
go so far as to supply a THP.

> 
> But I'm certainly also ok with re-visiting that commit later.  I just
> think that right _now_ the above is my preferred plan.
> 
> Kirill?
> 
> > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> > just a compromise with those who wanted to keep something there;
> > I don't think we even need a WARN_ON_ONCE() now.
> 
> So to me it looks like a historical check that simply doesn't
> "normally" trigger, but there's no reason I can see why we should care
> about the case it tests against.
> 
> > (It remains quite interesting how exit_mmap() does not come that way,
> > and most syscalls split the vma beforehand in vma_adjust(): it's mostly
> > about madvise(,,MADV_DONTNEED), perhaps others now, which zap ptes
> > without prior splitting.)
> 
> Well, in this case it's the ftruncate() path, which fundamentally
> doesn't split the vma at all (prior *or* later). But yes, madvise() is
> in the same boat - it doesn't change the vma at all, it just changes
> the contents of the vma.
> 
> And exit_mmap() is special because it just tears down everything.
> 
> So we do have three very distinct cases:
> 
>  (a) changing and thus splitting the vma itself (mprotect, munmap/mmap, mlock),

Yes.

> 
>  (b) not changing the vma, but changing the underlying mapping
> (truncate and madvise(MADV_DONTNEED)

Yes, though I think I would distinguish the truncate & hole-punch case
from the madvise zap case, they have different characteristics in other
ways (or did, before that awkward case of truncating an anon THP surfaced).

> 
>  (c) tearing down everything, and no locking needed because it's the
> last user (exit_mmap).

Yes; and it goes linearly from start to finish, never jumping into
the middle of a vma, so never needing to split a THP.

> 
> that are different for what I think are good reasons.
> 
>                    Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  3:26           ` Hugh Dickins
@ 2018-07-31  4:25             ` John Stultz
  2018-07-31  6:40               ` Amit Pundir
  0 siblings, 1 reply; 32+ messages in thread
From: John Stultz @ 2018-07-31  4:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Kirill A. Shutemov, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling257, Joel Fernandes,
	Colin Cross

On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 30 Jul 2018, Linus Torvalds wrote:
>> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
>> >
>> > I have no problem with reverting -rc7's vma_is_anonymous() series.
>>
>> I don't think we need to revert the whole series: I think the rest are
>> all fairly obvious cleanups, and shouldn't really have any semantic
>> changes.
>
> Okay.
>
>>
>> It's literally only that last patch in the series that then changes
>> that meaning of "vm_ops". And I don't really _mind_ that last step
>> either, but since we don't know exactly what it was that it broke, and
>> we're past rc7, I don't think we really have any option but the revert
>> it.
>
> It took me a long time to grasp what was happening, that that last
> patch bfd40eaff5ab was fixing. Not quite explained in the commit.
>
> I think it was that by mistakenly passing the vma_is_anonymous() test,
> create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
> COWing pages from kcov); which the truncate then had to split, and in
> going to do so, again hit the mistaken vma_is_anonymous() test, BUG.
>
>>
>> And if we revert it, I think we need to just remove the
>> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
>> it is quite likely that the real bug is that overzealous BUG_ON(),
>> since I can't see any reason why anonymous mappings should be special
>> there.
>
> Yes, that probably has to go: but it's not clear what state it leaves
> us in, with an anon THP being split by a truncate, without the expected
> locking; I don't remember offhand, probably a subtler bug than that BUG,
> which you may or may not consider an improvement.
>
> I fear that Kirill has not missed inserting a vma_set_anonymous() from
> somewhere that it should be, but rather that zygote is working with some
> special mapping which used to satisfy vma_is_anonymous(), faults supplying
> backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
> finds !dummy_vm_ops.fault hence SIGBUS.

I've been only casually following this thread (mostly just glad Amit
caught it and I could avoid having to bisect the issue in my own
Android testing), but this bit starting to shake a few old cobwebs
loose in my brain.

I'm wondering if Zygote is utilizing ashmem here, and we're somehow
traversing ashmem purged memory, or due to some setup issue the
initial traverse isn't being zero-filled as expected?

ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup()
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377


If we purge pages, it punches it out with:
vfs_fallocate(range->asma->file,
     FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
     start, end - start);
here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447

But in ashmem_pin(), we don't do anything other then returning if we
purged any page in the range.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577

And I believe the future assumption is the if we traverse those pages
they will be zero filled (if purged or even during the initial
traversal after mmap)

Its been a long time, and I've not looked at the code in question but
it sounds from Hugh's comments above that we might instead get a
SIGBUS here.

Looking more at the problematic patch..
Amit: Does adding something like (whitespace damaged, apologies):

index a1a0025..1af6915 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct
vm_area_struct *vma)
                        fput(asma->file);
                        goto out;
                }
-       }
+       } else
+               vma_set_anonymous(vma);

        if (vma->vm_file)
                fput(vma->vm_file);


Seem to resolve it? (Sorry, I'd test it myself, but I'm away from my
desk for the night).
thanks
-john

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  1:01         ` Linus Torvalds
  2018-07-31  3:26           ` Hugh Dickins
@ 2018-07-31  6:29           ` Kirill A. Shutemov
  2018-07-31 14:57             ` Kirill A. Shutemov
  1 sibling, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31  6:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> 
> I don't think we need to revert the whole series: I think the rest are
> all fairly obvious cleanups, and shouldn't really have any semantic
> changes.
> 
> It's literally only that last patch in the series that then changes
> that meaning of "vm_ops". And I don't really _mind_ that last step
> either, but since we don't know exactly what it was that it broke, and
> we're past rc7, I don't think we really have any option but the revert
> it.
> 
> And if we revert it, I think we need to just remove the
> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> it is quite likely that the real bug is that overzealous BUG_ON(),
> since I can't see any reason why anonymous mappings should be special
> there.
> 
> But I'm certainly also ok with re-visiting that commit later.  I just
> think that right _now_ the above is my preferred plan.
> 
> Kirill?

Considering the timing, I'm okay with reverting the last patch with
dropping the VM_BUG_ON_VMA().

But in the end I would like to see strong vma_is_anonymous().

The VM_BUG_ON_VMA() is only triggerable by the test case because
vma_is_anonymous() false-positive in fault path and we get anon-THP
allocated in file-private mapping.

I don't see immediately how this may trigger other crashes.
But it definitely looks wrong.

> > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> > just a compromise with those who wanted to keep something there;
> > I don't think we even need a WARN_ON_ONCE() now.
> 
> So to me it looks like a historical check that simply doesn't
> "normally" trigger, but there's no reason I can see why we should care
> about the case it tests against.

I'll think more on what could go wrong with __split_huge_pmd() called on
anon-THP page without mmap_sem(). It's not yet clear cut to me.


-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  4:25             ` John Stultz
@ 2018-07-31  6:40               ` Amit Pundir
  2018-07-31  6:56                 ` Kirill A. Shutemov
  2018-07-31 16:29                 ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Amit Pundir @ 2018-07-31  6:40 UTC (permalink / raw)
  To: John Stultz
  Cc: Hugh Dickins, Linus Torvalds, kirill, willy, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange,
	Greg Kroah-Hartman, linux-mm, lkml, youling 257, Joel Fernandes,
	ccross

On Tue, 31 Jul 2018 at 09:55, John Stultz <john.stultz@linaro.org> wrote:
>
> On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote:
> > On Mon, 30 Jul 2018, Linus Torvalds wrote:
> >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> >> >
> >> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> >>
> >> I don't think we need to revert the whole series: I think the rest are
> >> all fairly obvious cleanups, and shouldn't really have any semantic
> >> changes.
> >
> > Okay.
> >
> >>
> >> It's literally only that last patch in the series that then changes
> >> that meaning of "vm_ops". And I don't really _mind_ that last step
> >> either, but since we don't know exactly what it was that it broke, and
> >> we're past rc7, I don't think we really have any option but the revert
> >> it.
> >
> > It took me a long time to grasp what was happening, that that last
> > patch bfd40eaff5ab was fixing. Not quite explained in the commit.
> >
> > I think it was that by mistakenly passing the vma_is_anonymous() test,
> > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
> > COWing pages from kcov); which the truncate then had to split, and in
> > going to do so, again hit the mistaken vma_is_anonymous() test, BUG.
> >
> >>
> >> And if we revert it, I think we need to just remove the
> >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> >> it is quite likely that the real bug is that overzealous BUG_ON(),
> >> since I can't see any reason why anonymous mappings should be special
> >> there.
> >
> > Yes, that probably has to go: but it's not clear what state it leaves
> > us in, with an anon THP being split by a truncate, without the expected
> > locking; I don't remember offhand, probably a subtler bug than that BUG,
> > which you may or may not consider an improvement.
> >
> > I fear that Kirill has not missed inserting a vma_set_anonymous() from
> > somewhere that it should be, but rather that zygote is working with some
> > special mapping which used to satisfy vma_is_anonymous(), faults supplying
> > backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
> > finds !dummy_vm_ops.fault hence SIGBUS.
>
> I've been only casually following this thread (mostly just glad Amit
> caught it and I could avoid having to bisect the issue in my own
> Android testing), but this bit starting to shake a few old cobwebs
> loose in my brain.
>
> I'm wondering if Zygote is utilizing ashmem here, and we're somehow
> traversing ashmem purged memory, or due to some setup issue the
> initial traverse isn't being zero-filled as expected?
>
> ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup()
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377
>
>
> If we purge pages, it punches it out with:
> vfs_fallocate(range->asma->file,
>      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>      start, end - start);
> here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447
>
> But in ashmem_pin(), we don't do anything other then returning if we
> purged any page in the range.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577
>
> And I believe the future assumption is the if we traverse those pages
> they will be zero filled (if purged or even during the initial
> traversal after mmap)
>
> Its been a long time, and I've not looked at the code in question but
> it sounds from Hugh's comments above that we might instead get a
> SIGBUS here.
>
> Looking more at the problematic patch..
> Amit: Does adding something like (whitespace damaged, apologies):
>
> index a1a0025..1af6915 100644
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct
> vm_area_struct *vma)
>                         fput(asma->file);
>                         goto out;
>                 }
> -       }
> +       } else
> +               vma_set_anonymous(vma);
>
>         if (vma->vm_file)
>                 fput(vma->vm_file);
>

This ashmem change ^^ worked too.

Regards,
Amit Pundir

>
> Seem to resolve it? (Sorry, I'd test it myself, but I'm away from my
> desk for the night).
> thanks
> -john

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  6:40               ` Amit Pundir
@ 2018-07-31  6:56                 ` Kirill A. Shutemov
  2018-07-31 16:29                 ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31  6:56 UTC (permalink / raw)
  To: Amit Pundir
  Cc: John Stultz, Hugh Dickins, Linus Torvalds, willy,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257,
	Joel Fernandes, ccross

On Tue, Jul 31, 2018 at 12:10:06PM +0530, Amit Pundir wrote:
> On Tue, 31 Jul 2018 at 09:55, John Stultz <john.stultz@linaro.org> wrote:
> >
> > On Mon, Jul 30, 2018 at 8:26 PM, Hugh Dickins <hughd@google.com> wrote:
> > > On Mon, 30 Jul 2018, Linus Torvalds wrote:
> > >> On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> > >> >
> > >> > I have no problem with reverting -rc7's vma_is_anonymous() series.
> > >>
> > >> I don't think we need to revert the whole series: I think the rest are
> > >> all fairly obvious cleanups, and shouldn't really have any semantic
> > >> changes.
> > >
> > > Okay.
> > >
> > >>
> > >> It's literally only that last patch in the series that then changes
> > >> that meaning of "vm_ops". And I don't really _mind_ that last step
> > >> either, but since we don't know exactly what it was that it broke, and
> > >> we're past rc7, I don't think we really have any option but the revert
> > >> it.
> > >
> > > It took me a long time to grasp what was happening, that that last
> > > patch bfd40eaff5ab was fixing. Not quite explained in the commit.
> > >
> > > I think it was that by mistakenly passing the vma_is_anonymous() test,
> > > create_huge_pmd() gave the MAP_PRIVATE kcov mapping a THP (instead of
> > > COWing pages from kcov); which the truncate then had to split, and in
> > > going to do so, again hit the mistaken vma_is_anonymous() test, BUG.
> > >
> > >>
> > >> And if we revert it, I think we need to just remove the
> > >> VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> > >> it is quite likely that the real bug is that overzealous BUG_ON(),
> > >> since I can't see any reason why anonymous mappings should be special
> > >> there.
> > >
> > > Yes, that probably has to go: but it's not clear what state it leaves
> > > us in, with an anon THP being split by a truncate, without the expected
> > > locking; I don't remember offhand, probably a subtler bug than that BUG,
> > > which you may or may not consider an improvement.
> > >
> > > I fear that Kirill has not missed inserting a vma_set_anonymous() from
> > > somewhere that it should be, but rather that zygote is working with some
> > > special mapping which used to satisfy vma_is_anonymous(), faults supplying
> > > backing pages, but now comes out as !vma_is_anonymous(), so do_fault()
> > > finds !dummy_vm_ops.fault hence SIGBUS.
> >
> > I've been only casually following this thread (mostly just glad Amit
> > caught it and I could avoid having to bisect the issue in my own
> > Android testing), but this bit starting to shake a few old cobwebs
> > loose in my brain.
> >
> > I'm wondering if Zygote is utilizing ashmem here, and we're somehow
> > traversing ashmem purged memory, or due to some setup issue the
> > initial traverse isn't being zero-filled as expected?
> >
> > ashmem ranges are created using: shmem_file_setup() and shmem_zero_setup()
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n377
> >
> >
> > If we purge pages, it punches it out with:
> > vfs_fallocate(range->asma->file,
> >      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> >      start, end - start);
> > here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n447
> >
> > But in ashmem_pin(), we don't do anything other then returning if we
> > purged any page in the range.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/android/ashmem.c#n577
> >
> > And I believe the future assumption is the if we traverse those pages
> > they will be zero filled (if purged or even during the initial
> > traversal after mmap)
> >
> > Its been a long time, and I've not looked at the code in question but
> > it sounds from Hugh's comments above that we might instead get a
> > SIGBUS here.
> >
> > Looking more at the problematic patch..
> > Amit: Does adding something like (whitespace damaged, apologies):
> >
> > index a1a0025..1af6915 100644
> > --- a/drivers/staging/android/ashmem.c
> > +++ b/drivers/staging/android/ashmem.c
> > @@ -402,7 +402,8 @@ static int ashmem_mmap(struct file *file, struct
> > vm_area_struct *vma)
> >                         fput(asma->file);
> >                         goto out;
> >                 }
> > -       }
> > +       } else
> > +               vma_set_anonymous(vma);
> >
> >         if (vma->vm_file)
> >                 fput(vma->vm_file);
> >
> 
> This ashmem change ^^ worked too.

Okay. It makes sense.

But I'm not convinced that's a legitimate way to get an anonymous mapping.

I don't know how ashmem suppose to work. Looks like we get a shmem file
associated with the mapping, even if user asked for private mapping.

Shouldn't in this case vm_ops point to shmem_vm_ops?

Note, we have only one other case when MAP_PRIVATE on a file gets you an
anonymous mapping -- /dev/zero.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  6:29           ` Kirill A. Shutemov
@ 2018-07-31 14:57             ` Kirill A. Shutemov
  2018-08-01  0:09               ` Hugh Dickins
  0 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31 14:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Matthew Wilcox, Amit Pundir, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Tue, Jul 31, 2018 at 09:29:27AM +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote:
> > On Mon, Jul 30, 2018 at 2:53 PM Hugh Dickins <hughd@google.com> wrote:
> > >
> > > I have no problem with reverting -rc7's vma_is_anonymous() series.
> > 
> > I don't think we need to revert the whole series: I think the rest are
> > all fairly obvious cleanups, and shouldn't really have any semantic
> > changes.
> > 
> > It's literally only that last patch in the series that then changes
> > that meaning of "vm_ops". And I don't really _mind_ that last step
> > either, but since we don't know exactly what it was that it broke, and
> > we're past rc7, I don't think we really have any option but the revert
> > it.
> > 
> > And if we revert it, I think we need to just remove the
> > VM_BUG_ON_VMA() that it was supposed to fix. Because I do think that
> > it is quite likely that the real bug is that overzealous BUG_ON(),
> > since I can't see any reason why anonymous mappings should be special
> > there.
> > 
> > But I'm certainly also ok with re-visiting that commit later.  I just
> > think that right _now_ the above is my preferred plan.
> > 
> > Kirill?
> 
> Considering the timing, I'm okay with reverting the last patch with
> dropping the VM_BUG_ON_VMA().
> 
> But in the end I would like to see strong vma_is_anonymous().
> 
> The VM_BUG_ON_VMA() is only triggerable by the test case because
> vma_is_anonymous() false-positive in fault path and we get anon-THP
> allocated in file-private mapping.
> 
> I don't see immediately how this may trigger other crashes.
> But it definitely looks wrong.
> 
> > > I'm all for deleting that VM_BUG_ON_VMA() in zap_pmd_range(), it was
> > > just a compromise with those who wanted to keep something there;
> > > I don't think we even need a WARN_ON_ONCE() now.
> > 
> > So to me it looks like a historical check that simply doesn't
> > "normally" trigger, but there's no reason I can see why we should care
> > about the case it tests against.
> 
> I'll think more on what could go wrong with __split_huge_pmd() called on
> anon-THP page without mmap_sem(). It's not yet clear cut to me.

I think not having mmap_sem taken at least on read when we call
__split_huge_pmd() opens possiblity of race with khugepaged:
khugepaged can collapse the page back to THP as soon as we drop ptl.
As result pmd_none_or_trans_huge_or_clear_bad() would return true and we
basically leave the THP behind, not zapped.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31  6:40               ` Amit Pundir
  2018-07-31  6:56                 ` Kirill A. Shutemov
@ 2018-07-31 16:29                 ` Linus Torvalds
  2018-07-31 16:56                   ` John Stultz
                                     ` (4 more replies)
  1 sibling, 5 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-31 16:29 UTC (permalink / raw)
  To: Amit Pundir
  Cc: John Stultz, Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote:
>
> This ashmem change ^^ worked too.

Ok, let's go for that one and hope it's the only one.

John, can I get a proper commit message and sign-off for that ashmem change?

Kirill - you mentioned that somebody reproduced a problem on x86-64
too. I didn't see that report. Was that some odd x86 Android setup
with Ashmem too, or is there something else pending?

                       Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 16:29                 ` Linus Torvalds
@ 2018-07-31 16:56                   ` John Stultz
  2018-07-31 17:03                   ` Kirill A. Shutemov
                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2018-07-31 16:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Amit Pundir, Hugh Dickins, Kirill A. Shutemov, Matthew Wilcox,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 9:29 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote:
>>
>> This ashmem change ^^ worked too.
>
> Ok, let's go for that one and hope it's the only one.
>
> John, can I get a proper commit message and sign-off for that ashmem change?

Will do. Just doing some local testing myself to make sure all is well.

> Kirill - you mentioned that somebody reproduced a problem on x86-64
> too. I didn't see that report. Was that some odd x86 Android setup
> with Ashmem too, or is there something else pending?

Krill mentioned "zygote crashing, but on x86-64" and zygote is Android
so I assume it is the same issue.

thanks
-john

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 16:29                 ` Linus Torvalds
  2018-07-31 16:56                   ` John Stultz
@ 2018-07-31 17:03                   ` Kirill A. Shutemov
  2018-07-31 17:43                     ` Luck, Tony
  2018-07-31 17:17                   ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz
                                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-07-31 17:03 UTC (permalink / raw)
  To: Linus Torvalds, Luck, Tony
  Cc: Amit Pundir, John Stultz, Hugh Dickins, Matthew Wilcox,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 09:29:22AM -0700, Linus Torvalds wrote:
> On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org> wrote:
> >
> > This ashmem change ^^ worked too.
> 
> Ok, let's go for that one and hope it's the only one.
> 
> John, can I get a proper commit message and sign-off for that ashmem change?
> 
> Kirill - you mentioned that somebody reproduced a problem on x86-64
> too. I didn't see that report. Was that some odd x86 Android setup
> with Ashmem too, or is there something else pending?

I've got report from youling privately:

"mm: fix vma_is_anonymous() false-positives" cause my userspace boot failedi 1/4 ?
our Androidx86 userspace can running on linux mainline kerneli 1/4 ?
revert it boot succeed with 4.18rc7 kernel.

"mm: fix vma_is_anonymous() false-positives" cause these

07-30 11:04:19.556 1609 1609 F DEBUG : pid: 1304, tid: 1304, name: zygote
>>> zygote <<<
07-30 11:04:19.556 1609 1609 F DEBUG : signal 7 (SIGBUS), code 2
(BUS_ADRERR), fault addr 0x7494d008
07-30 11:04:19.556 1609 1609 F DEBUG : eax 00000000 ebx f337bb68 ecx
000001e0 edx 7494d008
07-30 11:04:19.556 1609 1609 F DEBUG : esi 7494d000 edi 00000000
07-30 11:04:19.556 1609 1609 F DEBUG : xcs 00000023 xds 0000002b xes
0000002b xfs 00000003 xss 0000002b
07-30 11:04:19.556 1609 1609 F DEBUG : eip f40f5c76 ebp ffa8d288 esp
ffa8d238 flags 00010202
07-30 11:04:19.581 1609 1609 F DEBUG :

-------------------------------------------------------------------------

The report also had screenshot attached about system info. It's a Baytrail
tablet with LinageOS, so I believe it's the same issue.

But it's not the only issue unfortunately. Tony reported issue with
booting ia64 with the patch. I have no clue why. I rechecked everything
ia64-specific and looks fine to me. :-/

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages
  2018-07-31 16:29                 ` Linus Torvalds
  2018-07-31 16:56                   ` John Stultz
  2018-07-31 17:03                   ` Kirill A. Shutemov
@ 2018-07-31 17:17                   ` John Stultz
  2018-07-31 22:57                   ` Linux 4.18-rc7 youling 257
  2018-07-31 23:07                   ` youling 257
  4 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2018-07-31 17:17 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Amit Pundir, Kirill A. Shutemov, Kirill A. Shutemov,
	Andrew Morton, Dmitry Vyukov, Oleg Nesterov, aarcange,
	Linus Torvalds, Greg Kroah-Hartman, Hugh Dickins, Joel Fernandes,
	Colin Cross, Matthew Wilcox, linux-mm, youling 257

Amit Pundir and Youling in parallel reported crashes with recent
mainline kernels running Android:

F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
F DEBUG   : Build fingerprint:
'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
F DEBUG   : Revision: '0'
F DEBUG   : ABI: 'arm'
F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
... <snip> ...
F DEBUG   : backtrace:
F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so
(create_mspace_with_base+82)
F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int,
unsigned int)+40)
F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
std::__1::basic_string<char, std::__
1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int,
unsigned int, unsigned int, unsigned int, bool)+36)
...

This was bisected back to commit bfd40eaff5ab ("mm: fix vma_is_anonymous()
false-positives").

create_mspace_with_base() in the trace above, utilizes ashmem, and
with ashmem, for shared mappings we use shmem_zero_setup(), which sets
the vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings
nothing sets the vma->vm_ops.

Looking at the problematic patch, it seems to add a requirement that
one call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will
be used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing
unmapped pages.

Thus, this patch adds a call to vma_set_anonymous() for ashmem private
mappings and seems to avoid the reported problem.

Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: aarcange@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Colin Cross <ccross@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: youling 257 <youling257@gmail.com>
Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Reported-by: Youling 257 <youling257@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---

Hopefully my explanation make sense here. Please let me know if it
needs corrections.
thanks
-john

---
 drivers/staging/android/ashmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index a1a0025..d5d33e1 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -402,6 +402,8 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
 			fput(asma->file);
 			goto out;
 		}
+	} else {
+		vma_set_anonymous(vma);
 	}
 
 	if (vma->vm_file)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 17:03                   ` Kirill A. Shutemov
@ 2018-07-31 17:43                     ` Luck, Tony
  2018-07-31 19:02                       ` Linus Torvalds
  2018-08-01 17:15                       ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Luck, Tony @ 2018-07-31 17:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote:
> But it's not the only issue unfortunately. Tony reported issue with
> booting ia64 with the patch. I have no clue why. I rechecked everything
> ia64-specific and looks fine to me. :-/

If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
then ia64 boots again.

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 17:43                     ` Luck, Tony
@ 2018-07-31 19:02                       ` Linus Torvalds
  2018-08-01 17:15                       ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-07-31 19:02 UTC (permalink / raw)
  To: Tony Luck
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
> then ia64 boots again.

Ok, so it's not just the ashmem thing.

I think I'll do an rc8 with the revert, just so that we'll have some
time to figure this out. It's only Tuesday, but I already have 90
commits since rc7, so this isn't the only issue we're having.

I _prefer_ just the regular cadence of releases, but when I have a
reason to delay, I'll delay.

                 Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 16:29                 ` Linus Torvalds
                                     ` (2 preceding siblings ...)
  2018-07-31 17:17                   ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz
@ 2018-07-31 22:57                   ` youling 257
  2018-07-31 23:07                   ` youling 257
  4 siblings, 0 replies; 32+ messages in thread
From: youling 257 @ 2018-07-31 22:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Amit Pundir, John Stultz, Hugh Dickins, Kirill A. Shutemov,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, Joel Fernandes, Colin Cross

[-- Attachment #1: Type: text/plain, Size: 2508 bytes --]

Build fingerprint:
'Android-x86/android_x86/x86:8.1.0/OPM6.171019.030.B1/cwhuang0618:userdebug/test-keys'
Revision: '0'
ABI: 'x86'
pid: 2899, tid: 2899, name: zygote >>> zygote <<<
signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
    eax 00000000 ebx f0274a40 ecx 000001e0 edx 0ec00008
    esi 00000000 edi 0ec00000
    xcs 00000023 xds 0000002b xes 0000002b xfs 00000003 xss 0000002b
    eip f24a4996 ebp ffc4eaa8 esp ffc4ea68 flags 00010202

backtrace:
    #00 pc 0001a996 /system/lib/libc.so (memset+150)
    #01 pc 0022b2be /system/lib/libart.so (create_mspace_with_base+222)
    #02 pc 002b2a05 /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned
int)+69)
    #03 pc 002b2637 /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
std::__1::basic_string<char, std::__1::char_traits<char>,
std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned
int, unsigned int, bool)+55)
    #04 pc 0027f6af /system/lib/libart.so (art::gc::Heap::Heap(unsigned
int, unsigned int, unsigned int, unsigned int, double, double, unsigned
int, unsigned int, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>> const&,
art::InstructionSet, art::gc::CollectorType, art::gc::CollectorType,
art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int, unsigned
int, bool, unsigned int, unsigned int, bool, bool, bool, bool, bool, bool,
bool, bool, bool, bool, bool, unsig #05 pc 0055a17a /system/lib/libart.so
(_ZN3art7Runtime4InitEONS_18RuntimeArgumentMapE+11434)
    #06 pc 0055e928 /system/lib/libart.so
(art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>>>
const&, bool)+184)

2018-08-01 0:29 GMT+08:00 Linus Torvalds <torvalds@linux-foundation.org>:

> On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org>
> wrote:
> >
> > This ashmem change ^^ worked too.
>
> Ok, let's go for that one and hope it's the only one.
>
> John, can I get a proper commit message and sign-off for that ashmem
> change?
>
> Kirill - you mentioned that somebody reproduced a problem on x86-64
> too. I didn't see that report. Was that some odd x86 Android setup
> with Ashmem too, or is there something else pending?
>
>                        Linus
>

[-- Attachment #2: Type: text/html, Size: 3364 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 16:29                 ` Linus Torvalds
                                     ` (3 preceding siblings ...)
  2018-07-31 22:57                   ` Linux 4.18-rc7 youling 257
@ 2018-07-31 23:07                   ` youling 257
  4 siblings, 0 replies; 32+ messages in thread
From: youling 257 @ 2018-07-31 23:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Amit Pundir, John Stultz, Hugh Dickins, Kirill A. Shutemov,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, Joel Fernandes, Colin Cross

[-- Attachment #1: Type: text/plain, Size: 3632 bytes --]

my x86 report

isPrevious: true
Build:
Android-x86/android_x86/x86:8.1.0/OPM6.171019.030.B1/cwhuang0618:userdebug/test-keys
Hardware: unknown
Revision: 0
Bootloader: unknown
Radio: unknown
Kernel: Linux version 4.18.0-rc7-android-x86_64+ (root@localhost) (gcc
version 8.2.0 (Ubuntu 8.2.0-1ubuntu2)) #1 SMP PREEMPT Mon Jul 30 12:26:29
CST 2018

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint:
'Android-x86/android_x86/x86:8.1.0/OPM6.171019.030.B1/cwhuang0618:userdebug/test-keys'
Revision: '0'
ABI: 'x86'
pid: 2899, tid: 2899, name: zygote >>> zygote <<<
signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
    eax 00000000 ebx f0274a40 ecx 000001e0 edx 0ec00008
    esi 00000000 edi 0ec00000
    xcs 00000023 xds 0000002b xes 0000002b xfs 00000003 xss 0000002b
    eip f24a4996 ebp ffc4eaa8 esp ffc4ea68 flags 00010202

backtrace:
    #00 pc 0001a996 /system/lib/libc.so (memset+150)
    #01 pc 0022b2be /system/lib/libart.so (create_mspace_with_base+222)
    #02 pc 002b2a05 /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned
int)+69)
    #03 pc 002b2637 /system/lib/libart.so
(art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*,
std::__1::basic_string<char, std::__1::char_traits<char>,
std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned
int, unsigned int, bool)+55)
    #04 pc 0027f6af /system/lib/libart.so (art::gc::Heap::Heap(unsigned
int, unsigned int, unsigned int, unsigned int, double, double, unsigned
int, unsigned int, std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>> const&,
art::InstructionSet, art::gc::CollectorType, art::gc::CollectorType,
art::gc::space::LargeObjectSpaceType, unsigned int, unsigned int, unsigned
int, bool, unsigned int, unsigned int, bool, bool, bool, bool, bool, bool,
bool, bool, bool, bool, bool, unsig #05 pc 0055a17a /system/lib/libart.so
(_ZN3art7Runtime4InitEONS_18RuntimeArgumentMapE+11434)
    #06 pc 0055e928 /system/lib/libart.so
(art::Runtime::Create(std::__1::vector<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>,
std::__1::allocator<std::__1::pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char>>, void const*>>>
const&, bool)+184)
    #07 pc 003a1117 /system/lib/libart.so (JNI_CreateJavaVM+647)
    #08 pc 000044ae /system/lib/libnativehelper.so (JNI_CreateJavaVM+46)
    #09 pc 00078d1d /system/lib/libandroid_runtime.so
(android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool)+7581)
    #10 pc 000791fb /system/lib/libandroid_runtime.so
(android::AndroidRuntime::start(char const*,
android::Vector<android::String8> const&, bool)+395)
    #11 pc 00003119 /system/bin/app_process32 (main+1689)
    #12 pc 000b6a34 /system/lib/libc.so (__libc_init+100)
    #13 pc 000029dd /system/bin/app_process32 (_start_main+80)
    #14 pc 000029e8 /system/bin/app_process32 (_start+10)
    #15 pc 00000004 <unknown>
    #16 pc 00020b69 [stack:ffc31000]

2018-08-01 0:29 GMT+08:00 Linus Torvalds <torvalds@linux-foundation.org>:

> On Mon, Jul 30, 2018 at 11:40 PM Amit Pundir <amit.pundir@linaro.org>
> wrote:
> >
> > This ashmem change ^^ worked too.
>
> Ok, let's go for that one and hope it's the only one.
>
> John, can I get a proper commit message and sign-off for that ashmem
> change?
>
> Kirill - you mentioned that somebody reproduced a problem on x86-64
> too. I didn't see that report. Was that some odd x86 Android setup
> with Ashmem too, or is there something else pending?
>
>                        Linus
>

[-- Attachment #2: Type: text/html, Size: 4752 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 14:57             ` Kirill A. Shutemov
@ 2018-08-01  0:09               ` Hugh Dickins
  0 siblings, 0 replies; 32+ messages in thread
From: Hugh Dickins @ 2018-08-01  0:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Hugh Dickins, Matthew Wilcox, Amit Pundir,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	Andrea Arcangeli, Greg Kroah-Hartman, John Stultz, linux-mm,
	Linux Kernel Mailing List, youling257

On Tue, 31 Jul 2018, Kirill A. Shutemov wrote:
> On Tue, Jul 31, 2018 at 09:29:27AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jul 30, 2018 at 06:01:26PM -0700, Linus Torvalds wrote:
> > > 
> > > So to me it looks like a historical check that simply doesn't
> > > "normally" trigger, but there's no reason I can see why we should care
> > > about the case it tests against.
> > 
> > I'll think more on what could go wrong with __split_huge_pmd() called on
> > anon-THP page without mmap_sem(). It's not yet clear cut to me.
> 
> I think not having mmap_sem taken at least on read when we call
> __split_huge_pmd() opens possiblity of race with khugepaged:
> khugepaged can collapse the page back to THP as soon as we drop ptl.
> As result pmd_none_or_trans_huge_or_clear_bad() would return true and we
> basically leave the THP behind, not zapped.

I think we don't care deeply about the POSIX truncate semantics on the
kind of "file" that has managed to get to this point: in the unlikely
event that a THP is immediately recreated there, never mind, so long as
we don't crash or leak memory or suchlike (the surplus THP would get
freed at exit).

I think we're altogether better off just deleting that VM_BUG_ON_VMA();
but I do find it very very hard to arrive at a firm conclusion on the
absolute safety of splitting a pmd without mmap_sem there (though any
problem unlikely even if real, and more likely a figment of my paranoia).

I believe the VM_BUG_ON is a relic from the old days, when anon_vma_lock
played a big part in guarding the pmd+page split: remember how mmap_sem
is one of the ways you can guarantee that the anon_vma will not vanish
beneath you (page_get_anon_vma was added later than anon THP).

I'm a little more worried by the nearby zap_huge_pmd() (which could
never be covered by a suitable VM_BUG_ON): the way that frees a
previously deposited page table, and you have no guarantee of when
and where that page table was last used. Again I can't point to an
actual problem, just the recollection that it's been found subtly
safe in the past, but any change in the conditions might affect that.

And a little worried to see how split_huge_page_to_list() uses
anon_vma_lock on PageAnon versus i_mmap_lock on !PageAnon: which
makes complete sense in itself, but won't protect against a PageAnon
THP being concurrently split from the truncate_pagecache() direction,
where unmap_mapping_range() uses i_mmap_lock. (simple_setattr() the
default setattr: that's a bit of a worry too.)

I feel I'm moaning and crying at shadows, rather than providing any
useful suggestions or patches; but thought I ought to report back.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-07-31 17:43                     ` Luck, Tony
  2018-07-31 19:02                       ` Linus Torvalds
@ 2018-08-01 17:15                       ` Linus Torvalds
  2018-08-01 18:31                         ` Hugh Dickins
                                           ` (3 more replies)
  1 sibling, 4 replies; 32+ messages in thread
From: Linus Torvalds @ 2018-08-01 17:15 UTC (permalink / raw)
  To: Tony Luck
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote:
> > But it's not the only issue unfortunately. Tony reported issue with
> > booting ia64 with the patch. I have no clue why. I rechecked everything
> > ia64-specific and looks fine to me. :-/
>
> If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
> then ia64 boots again.

Ok, I'd love to have more information about this, but I'm assuming
that Tony isn't running some odd ia64 version of Android, so there's
definitely something else than just the ashmem thing going on. Either
it's some odd ia64-specific special vma, or it's just something
triggered on an ia64 boot that nobody else noticed or cared about.

And I was just going to do the final revert and started this email to
say so, when I looked at the obvious candidate: the
ia64_init_addr_space() function. Trivially fixed.

But as I was doing that, I also noticed another problem with the vma
series: the vma_init() conversion of automatic variables is buggy.
Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack
and data segments") is really bad, because it never grew the memset()
that was discussed, and the patch that was applied was the original
one - so vma_init() only initializes a couple of fields. As a result,
doing things like this:

-       struct vm_area_struct vma = { .vm_mm = mm };
+       struct vm_area_struct vma;

+       vma_init(&vma, mm);

is just completely wrong, because it actually initializes much less
than it used to, and leaves most of the vma filled with random stack
garbage. In particular, it now fills with garbage the fields that TLB
flushing really can care about: things like vm_flags that says "is
this perhaps an executable-only mapping that only needs to flush the
ITLB?"

I don't actually believe that we should do vma_init() on those
on-stack vma's anyway, since they aren't "real" vma's. They are
literally crafted just for TLB flushing - filling in the vm_mm (and
sometimes vm_flags) pointers so that the TLB flushing knows what to
do.

So using "vma_init()" on them is actively detrimental as things stand
right now. The reason I looked at them was that I was trying to see
who actually uses "vm_area_alloc()" and "vma_init()" right now that
would be affected by that commit bfd40eaff5ab ("mm: fix
vma_is_anonymous() false-positives") outside of actual
honest-to-goodness device file mmaps.

Anyway, the upshot of all this is that I think I know what the ia64
problem was, and John sent the patch for the ashmem case, and I'm
going to hold off reverting that vma_is_anonymous() false-positives
commit after all.

I'm still unhappy about the vma_init() ones, and I have not decided
how to go with those. Either the memset() in vma_init(), or just
reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
comments?

Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark
special ia64 memory areas anonymous") fixes things for  you? I didn't
even compile it, but it really looks so obvious that I just committed
it directly. It's not pushed out yet (I'm doing the normal full build
test because of the ashmem commit first), but it should be out in
about 20 minutes when my testing has finished.

I'd like to get this sorted out asap, although at this point I still
think that I'll have to do an rc8 even though I feel like we may have
caught everything.

                        Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
@ 2018-08-01 18:31                         ` Hugh Dickins
  2018-08-01 20:58                           ` Kirill A. Shutemov
  2018-08-01 18:36                         ` Luck, Tony
                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-08-01 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Kirill A. Shutemov, Amit Pundir, John Stultz,
	Hugh Dickins, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List,
	youling 257, Joel Fernandes, Colin Cross

On Wed, 1 Aug 2018, Linus Torvalds wrote:
> 
> Anyway, the upshot of all this is that I think I know what the ia64
> problem was, and John sent the patch for the ashmem case, and I'm
> going to hold off reverting that vma_is_anonymous() false-positives
> commit after all.

I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
(but I've no proprietorial interest, if you prefer to do your own).

John's patch is good, and originally I thought it was safe from that
VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
insists on VM_SHARED. But afterwards read John's earlier mail,
drawing attention to the vfs_fallocate() in there: I may be wrong,
and I don't know if Android has THP in the config anyway, but it looks
to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().

(I'm not familiar with ashmem, and I certainly don't understand the
role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
should end up leaving any anon pages in place; but the presence of
the BUG is requiring us all to understand too much too quickly.)


[PATCH] mm: delete historical BUG from zap_pmd_range()

Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted
that mmap_sem must be held when splitting an "anonymous" vma there.
Whether that's still strictly true nowadays is not entirely clear,
but the danger of sometimes crashing on the BUG is now fairly clear.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/memory.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- 4.18-rc7/mm/memory.c	2018-06-16 18:48:22.041173422 -0700
+++ linux/mm/memory.c	2018-08-01 11:01:21.397286507 -0700
@@ -1417,11 +1417,9 @@ static inline unsigned long zap_pmd_rang
 	do {
 		next = pmd_addr_end(addr, end);
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE) {
-				VM_BUG_ON_VMA(vma_is_anonymous(vma) &&
-				    !rwsem_is_locked(&tlb->mm->mmap_sem), vma);
+			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
-			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
+			else if (zap_huge_pmd(tlb, vma, pmd, addr))
 				goto next;
 			/* fall through */
 		}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
  2018-08-01 18:31                         ` Hugh Dickins
@ 2018-08-01 18:36                         ` Luck, Tony
  2018-08-01 20:05                         ` Linus Torvalds
  2018-08-02  6:59                         ` Amit Pundir
  3 siblings, 0 replies; 32+ messages in thread
From: Luck, Tony @ 2018-08-01 18:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 10:15:05AM -0700, Linus Torvalds wrote:
> Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark
> special ia64 memory areas anonymous") fixes things for  you? I didn't
> even compile it, but it really looks so obvious that I just committed
> it directly. It's not pushed out yet (I'm doing the normal full build
> test because of the ashmem commit first), but it should be out in
> about 20 minutes when my testing has finished.
> 
> I'd like to get this sorted out asap, although at this point I still
> think that I'll have to do an rc8 even though I feel like we may have
> caught everything.

I pulled and got HEAD = 44960f2a7b63e224b1091b3e1d6f60e0cdf4be0c which
includes ebad825cdd4e.

ia64 boots fine with no issues.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
  2018-08-01 18:31                         ` Hugh Dickins
  2018-08-01 18:36                         ` Luck, Tony
@ 2018-08-01 20:05                         ` Linus Torvalds
  2018-08-01 20:51                           ` Kirill A. Shutemov
  2018-08-02  6:59                         ` Amit Pundir
  3 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2018-08-01 20:05 UTC (permalink / raw)
  To: Tony Luck
  Cc: Kirill A. Shutemov, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

[-- Attachment #1: Type: text/plain, Size: 1614 bytes --]

On Wed, Aug 1, 2018 at 10:15 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm still unhappy about the vma_init() ones, and I have not decided
> how to go with those. Either the memset() in vma_init(), or just
> reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
> comments?

Ugh. Adding a memset looks simple, but screws up some places that have
other initialization. It also requires adding a new include of
<linux/string.h>, or we'd need to uninline vma_init() and put it
somewhere else.

But just reverting commit 2c4541e24c55 ("mm: use vma_init() to
initialize VMAs on stack and data segments") entirely isn't good
either, because some of the cases aren't about the TLB flush
interface, and call down to "real" VM functions. The 'pseudo_vma' use
of remove_inode_hugepages() and hugetlbfs_fallocate() in particular is
odd, but using vma_init() looks good there. And those places had the
memset() already.

So I'm inclined to simply mark the TLB-related vma_init() cases
special, and use something like this:

  #define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) }

to make it very obvious when we're doing that vma initialization for
flush_tlb_range(). It's done as an initializer, exactly so that the
only valid syntax is to do somethin glike this:

        struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC);

That leaves vma_init() users to be just the actual real allocation
path, and a few very specific specual vmas (the hugetlbfs and
mempolicy pseudo-vma, and a couple of "gate" vmas).

Suggested patch attached. Comments?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4025 bytes --]

 arch/arm/mach-rpc/ecard.c    |  5 +----
 arch/arm64/include/asm/tlb.h |  4 +---
 arch/arm64/mm/hugetlbpage.c  | 10 ++++------
 arch/ia64/include/asm/tlb.h  |  7 +++----
 include/linux/mm.h           |  3 +++
 5 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
index 8db62cc54a6a..04b2f22c2739 100644
--- a/arch/arm/mach-rpc/ecard.c
+++ b/arch/arm/mach-rpc/ecard.c
@@ -212,7 +212,7 @@ static DEFINE_MUTEX(ecard_mutex);
  */
 static void ecard_init_pgtables(struct mm_struct *mm)
 {
-	struct vm_area_struct vma;
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC);
 
 	/* We want to set up the page tables for the following mapping:
 	 *  Virtual	Physical
@@ -237,9 +237,6 @@ static void ecard_init_pgtables(struct mm_struct *mm)
 
 	memcpy(dst_pgd, src_pgd, sizeof(pgd_t) * (EASI_SIZE / PGDIR_SIZE));
 
-	vma_init(&vma, mm);
-	vma.vm_flags = VM_EXEC;
-
 	flush_tlb_range(&vma, IO_START, IO_START + IO_SIZE);
 	flush_tlb_range(&vma, EASI_START, EASI_START + EASI_SIZE);
 }
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index d87f2d646caa..0ad1cf233470 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -37,9 +37,7 @@ static inline void __tlb_remove_table(void *_table)
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	struct vm_area_struct vma;
-
-	vma_init(&vma, tlb->mm);
+	struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
 
 	/*
 	 * The ASID allocator will either invalidate the ASID or mark
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 1854e49aa18a..192b3ba07075 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -108,13 +108,10 @@ static pte_t get_clear_flush(struct mm_struct *mm,
 			     unsigned long pgsize,
 			     unsigned long ncontig)
 {
-	struct vm_area_struct vma;
 	pte_t orig_pte = huge_ptep_get(ptep);
 	bool valid = pte_valid(orig_pte);
 	unsigned long i, saddr = addr;
 
-	vma_init(&vma, mm);
-
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
 		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
 
@@ -127,8 +124,10 @@ static pte_t get_clear_flush(struct mm_struct *mm,
 			orig_pte = pte_mkdirty(orig_pte);
 	}
 
-	if (valid)
+	if (valid) {
+		struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
 		flush_tlb_range(&vma, saddr, addr);
+	}
 	return orig_pte;
 }
 
@@ -147,10 +146,9 @@ static void clear_flush(struct mm_struct *mm,
 			     unsigned long pgsize,
 			     unsigned long ncontig)
 {
-	struct vm_area_struct vma;
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
 	unsigned long i, saddr = addr;
 
-	vma_init(&vma, mm);
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		pte_clear(mm, addr, ptep);
 
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index db89e7306081..516355a774bf 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -115,12 +115,11 @@ ia64_tlb_flush_mmu_tlbonly(struct mmu_gather *tlb, unsigned long start, unsigned
 		flush_tlb_all();
 	} else {
 		/*
-		 * XXX fix me: flush_tlb_range() should take an mm pointer instead of a
-		 * vma pointer.
+		 * flush_tlb_range() takes a vma instead of a mm pointer because
+		 * some architectures want the vm_flags for ITLB/DTLB flush.
 		 */
-		struct vm_area_struct vma;
+		struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
 
-		vma_init(&vma, tlb->mm);
 		/* flush the address range from the tlb: */
 		flush_tlb_range(&vma, start, end);
 		/* now flush the virt. page-table area mapping the address range: */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ba6d356d18f..68a5121694ef 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -466,6 +466,9 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma)
 	vma->vm_ops = NULL;
 }
 
+/* flush_tlb_range() takes a vma, not a mm, and can care about flags */
+#define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) }
+
 struct mmu_gather;
 struct inode;
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:05                         ` Linus Torvalds
@ 2018-08-01 20:51                           ` Kirill A. Shutemov
  2018-08-01 20:56                             ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-08-01 20:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 01:05:48PM -0700, Linus Torvalds wrote:
> On Wed, Aug 1, 2018 at 10:15 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I'm still unhappy about the vma_init() ones, and I have not decided
> > how to go with those. Either the memset() in vma_init(), or just
> > reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
> > comments?
> 
> Ugh. Adding a memset looks simple, but screws up some places that have
> other initialization. It also requires adding a new include of
> <linux/string.h>, or we'd need to uninline vma_init() and put it
> somewhere else.
> 
> But just reverting commit 2c4541e24c55 ("mm: use vma_init() to
> initialize VMAs on stack and data segments") entirely isn't good
> either, because some of the cases aren't about the TLB flush
> interface, and call down to "real" VM functions. The 'pseudo_vma' use
> of remove_inode_hugepages() and hugetlbfs_fallocate() in particular is
> odd, but using vma_init() looks good there. And those places had the
> memset() already.
> 
> So I'm inclined to simply mark the TLB-related vma_init() cases
> special, and use something like this:
> 
>   #define TLB_FLUSH_VMA(mm,flags) { .vm_mm = (mm), .vm_flags = (flags) }
> 
> to make it very obvious when we're doing that vma initialization for
> flush_tlb_range(). It's done as an initializer, exactly so that the
> only valid syntax is to do somethin glike this:
> 
>         struct vm_area_struct vma = TLB_FLUSH_VMA(mm, VM_EXEC);
> 
> That leaves vma_init() users to be just the actual real allocation
> path, and a few very specific specual vmas (the hugetlbfs and
> mempolicy pseudo-vma, and a couple of "gate" vmas).
> 
> Suggested patch attached. Comments?

Is there a reason why we pass vma to flush_tlb_range?

It's not obvious to me what information from VMA can be useful for an
implementation. I see that ecard.c initialize vm_flags too, but it seems
unused by flush_tlb_range.

Maybe it's cleaner to have generic helper flush_tlb_range_mm() or
something?

In longer term we can change the interface to take mm instead of vma.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:51                           ` Kirill A. Shutemov
@ 2018-08-01 20:56                             ` Linus Torvalds
  2018-08-01 21:25                               ` Kirill A. Shutemov
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2018-08-01 20:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 1, 2018 at 1:52 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> Is there a reason why we pass vma to flush_tlb_range?

Yes. It's even in that patch.

The fact is, real MM users *have* a vma, and passing it in to the TLB
flushing is the right thing to do. That allows architectures that care
(mainly powerpc, I think) to notice that "hey, this range only had
execute permissions, so I only need to flush the ITLB".

The people who use tlb_flush_range() any other way are doing an
arch-specific hack.  It's not how tlb_flush_range() was defined, and
it's not how you can use it in general.

> It's not obvious to me what information from VMA can be useful for an
> implementation.

See the patch I sent, which had this as part of it:

-                * XXX fix me: flush_tlb_range() should take an mm
pointer instead of a
-                * vma pointer.
+                * flush_tlb_range() takes a vma instead of a mm pointer because
+                * some architectures want the vm_flags for ITLB/DTLB flush.

because I wanted to educate people about why the interface was what it
was, and the "fixme" was bogus shit.

> In longer term we can change the interface to take mm instead of vma.

FUCK NO!

Goddammit, read the code, or read the patch. The places ytou added
those broken vma_init() calls to were architecture-specific hacks.

Those architecture-specific hacks do not get to screw up the design
for everybody else.

                     Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 18:31                         ` Hugh Dickins
@ 2018-08-01 20:58                           ` Kirill A. Shutemov
  2018-08-01 21:55                             ` Hugh Dickins
  0 siblings, 1 reply; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-08-01 20:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Tony Luck, Amit Pundir, John Stultz,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote:
> On Wed, 1 Aug 2018, Linus Torvalds wrote:
> > 
> > Anyway, the upshot of all this is that I think I know what the ia64
> > problem was, and John sent the patch for the ashmem case, and I'm
> > going to hold off reverting that vma_is_anonymous() false-positives
> > commit after all.
> 
> I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
> (but I've no proprietorial interest, if you prefer to do your own).

Agreed.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> John's patch is good, and originally I thought it was safe from that
> VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
> disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
> insists on VM_SHARED. But afterwards read John's earlier mail,
> drawing attention to the vfs_fallocate() in there: I may be wrong,
> and I don't know if Android has THP in the config anyway, but it looks
> to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
> could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
> 
> (I'm not familiar with ashmem, and I certainly don't understand the
> role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
> should end up leaving any anon pages in place; but the presence of
> the BUG is requiring us all to understand too much too quickly.)

Hugh, do you see any reason why ashmem shouldn't have vm_ops ==
shmem_vm_ops?

I don't understand ashmem, but I feel uncomfortable that we have this
sneaky way to create an anonymous VMA. It feels wrong to me.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:56                             ` Linus Torvalds
@ 2018-08-01 21:25                               ` Kirill A. Shutemov
  0 siblings, 0 replies; 32+ messages in thread
From: Kirill A. Shutemov @ 2018-08-01 21:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Amit Pundir, John Stultz, Hugh Dickins,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 01, 2018 at 01:56:19PM -0700, Linus Torvalds wrote:
> On Wed, Aug 1, 2018 at 1:52 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > Is there a reason why we pass vma to flush_tlb_range?
> 
> Yes. It's even in that patch.
> 
> The fact is, real MM users *have* a vma, and passing it in to the TLB
> flushing is the right thing to do. That allows architectures that care
> (mainly powerpc, I think) to notice that "hey, this range only had
> execute permissions, so I only need to flush the ITLB".
> 
> The people who use tlb_flush_range() any other way are doing an
> arch-specific hack.  It's not how tlb_flush_range() was defined, and
> it's not how you can use it in general.

Okay, I see.

ARM, unicore32 and xtensa avoid iTLB flush for non-executable VMAs.

> 
> > It's not obvious to me what information from VMA can be useful for an
> > implementation.
> 
> See the patch I sent, which had this as part of it:
> 
> -                * XXX fix me: flush_tlb_range() should take an mm
> pointer instead of a
> -                * vma pointer.
> +                * flush_tlb_range() takes a vma instead of a mm pointer because
> +                * some architectures want the vm_flags for ITLB/DTLB flush.
> 
> because I wanted to educate people about why the interface was what it
> was, and the "fixme" was bogus shit.

I didn't noticied this. Sorry.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 20:58                           ` Kirill A. Shutemov
@ 2018-08-01 21:55                             ` Hugh Dickins
  2018-08-02 19:12                               ` John Stultz
  0 siblings, 1 reply; 32+ messages in thread
From: Hugh Dickins @ 2018-08-01 21:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Dickins, Linus Torvalds, Tony Luck, Amit Pundir,
	John Stultz, Matthew Wilcox, Kirill A. Shutemov, Andrew Morton,
	Dmitry Vyukov, Oleg Nesterov, Andrea Arcangeli,
	Greg Kroah-Hartman, linux-mm, Linux Kernel Mailing List,
	youling 257, Joel Fernandes, Colin Cross

On Wed, 1 Aug 2018, Kirill A. Shutemov wrote:
> On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote:
> > On Wed, 1 Aug 2018, Linus Torvalds wrote:
> > > 
> > > Anyway, the upshot of all this is that I think I know what the ia64
> > > problem was, and John sent the patch for the ashmem case, and I'm
> > > going to hold off reverting that vma_is_anonymous() false-positives
> > > commit after all.
> > 
> > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
> > (but I've no proprietorial interest, if you prefer to do your own).
> 
> Agreed.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Thanks.

> 
> > John's patch is good, and originally I thought it was safe from that
> > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
> > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
> > insists on VM_SHARED. But afterwards read John's earlier mail,
> > drawing attention to the vfs_fallocate() in there: I may be wrong,
> > and I don't know if Android has THP in the config anyway, but it looks
> > to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
> > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
> > 
> > (I'm not familiar with ashmem, and I certainly don't understand the
> > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
> > should end up leaving any anon pages in place; but the presence of
> > the BUG is requiring us all to understand too much too quickly.)
> 
> Hugh, do you see any reason why ashmem shouldn't have vm_ops ==
> shmem_vm_ops?

I cannot immediately think of an absolute reason why not, but I'm
not giving it much thought; and that might turn it into a stranger
beast than it already is.

> 
> I don't understand ashmem, but I feel uncomfortable that we have this
> sneaky way to create an anonymous VMA. It feels wrong to me.

I agree it's odd, but in this respect it's no odder than /dev/zero:
that has exactly the same pattern of shmem_zero_setup() for VM_SHARED,
else vma_set_anonymous(): which made me comfortable with John's patch,
restoring the way it worked before.

Admittedly, the subsequent vfs_fallocate() might generate surprises;
and the business of doing a shmem_file_setup() first, and then undoing
it with a shmem_zero_setup(), looks weird - from John's old XXX comment,
I think it was a quick hack to piece together some functionality they
needed in a hurry, which never got revisited (they wanted a name for
the area? maybe memfd would be good for that now).

But if what's in there is working now, I do not want to mess with it:
I'd be adding bugs faster than removing them.

Hugh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 17:15                       ` Linus Torvalds
                                           ` (2 preceding siblings ...)
  2018-08-01 20:05                         ` Linus Torvalds
@ 2018-08-02  6:59                         ` Amit Pundir
  3 siblings, 0 replies; 32+ messages in thread
From: Amit Pundir @ 2018-08-02  6:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: tony.luck, kirill, John Stultz, Hugh Dickins, willy,
	Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, Oleg Nesterov,
	aarcange, Greg Kroah-Hartman, linux-mm, lkml, youling 257,
	Joel Fernandes, Colin Cross

On Wed, 1 Aug 2018 at 22:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'd like to get this sorted out asap, although at this point I still
> think that I'll have to do an rc8 even though I feel like we may have
> caught everything.

No AOSP regressions in my limited smoke testing so far with
current HEAD: 6b4703768268 ("Merge branch 'fixes' of
git://git.armlinux.org.uk/~rmk/linux-arm"). Thanks.

Regards,
Amit Pundir

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Linux 4.18-rc7
  2018-08-01 21:55                             ` Hugh Dickins
@ 2018-08-02 19:12                               ` John Stultz
  0 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2018-08-02 19:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kirill A. Shutemov, Linus Torvalds, Tony Luck, Amit Pundir,
	Matthew Wilcox, Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov,
	Oleg Nesterov, Andrea Arcangeli, Greg Kroah-Hartman, linux-mm,
	Linux Kernel Mailing List, youling 257, Joel Fernandes,
	Colin Cross

On Wed, Aug 1, 2018 at 2:55 PM, Hugh Dickins <hughd@google.com> wrote:
> On Wed, 1 Aug 2018, Kirill A. Shutemov wrote:
>> On Wed, Aug 01, 2018 at 11:31:52AM -0700, Hugh Dickins wrote:
>> > On Wed, 1 Aug 2018, Linus Torvalds wrote:
>> > >
>> > > Anyway, the upshot of all this is that I think I know what the ia64
>> > > problem was, and John sent the patch for the ashmem case, and I'm
>> > > going to hold off reverting that vma_is_anonymous() false-positives
>> > > commit after all.
>> >
>> > I'd better send deletion of zap_pmd_range()'s VM_BUG_ON_VMA(): below
>> > (but I've no proprietorial interest, if you prefer to do your own).
>>
>> Agreed.
>>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> Thanks.
>
>>
>> > John's patch is good, and originally I thought it was safe from that
>> > VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
>> > disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
>> > insists on VM_SHARED. But afterwards read John's earlier mail,
>> > drawing attention to the vfs_fallocate() in there: I may be wrong,
>> > and I don't know if Android has THP in the config anyway, but it looks
>> > to me like an unmap_mapping_range() from ashmem's vfs_fallocate()
>> > could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
>> >
>> > (I'm not familiar with ashmem, and I certainly don't understand the
>> > role of MAP_PRIVATE ashmem mappings - hole-punch's zap_pte_range()
>> > should end up leaving any anon pages in place; but the presence of
>> > the BUG is requiring us all to understand too much too quickly.)
>>
>> Hugh, do you see any reason why ashmem shouldn't have vm_ops ==
>> shmem_vm_ops?
>
> I cannot immediately think of an absolute reason why not, but I'm
> not giving it much thought; and that might turn it into a stranger
> beast than it already is.
>
>>
>> I don't understand ashmem, but I feel uncomfortable that we have this
>> sneaky way to create an anonymous VMA. It feels wrong to me.
>
> I agree it's odd, but in this respect it's no odder than /dev/zero:
> that has exactly the same pattern of shmem_zero_setup() for VM_SHARED,
> else vma_set_anonymous(): which made me comfortable with John's patch,
> restoring the way it worked before.
>
> Admittedly, the subsequent vfs_fallocate() might generate surprises;
> and the business of doing a shmem_file_setup() first, and then undoing
> it with a shmem_zero_setup(), looks weird - from John's old XXX comment,

Yes, it is weird. :)

> I think it was a quick hack to piece together some functionality they
> needed in a hurry, which never got revisited (they wanted a name for
> the area? maybe memfd would be good for that now).

So my XXX comment is to do with a change I made to ashmem in order for
it to go into staging, compared with the original Android
implementation. They still carry the patch to undo it here:
https://android.googlesource.com/kernel/common.git/+/ebfc8d6476a66dc91a1b30cbfc18e67d4401af09%5E%21/

But it has more to do w/ in the shared mapping case, providing a
cleaner way of setting the vma->vm_ops = &shmem_vm_ops without having
to use shmem_zero_setup(), and doesn't change the behavior in the
private mapping case.

This discussion has spurred Joel and I to chat a bit about reviving
the effort to "properly" upstream ashmem. We're still in discussion
but I'm thinking we might just try to add the pin/unpin functionality
to memfd. We'll see how the discussion evolves.

thanks
-john

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-08-02 19:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CA+55aFxpFefwVdTGVML99PSFUqwpJXPx5LVCA3D=g2t2_QLNsA@mail.gmail.com>
2018-07-30  6:47 ` Linux 4.18-rc7 Amit Pundir
2018-07-30 13:01   ` Kirill A. Shutemov
2018-07-30 13:34     ` Amit Pundir
2018-07-30 17:32     ` Linus Torvalds
2018-07-30 21:53       ` Hugh Dickins
2018-07-31  1:01         ` Linus Torvalds
2018-07-31  3:26           ` Hugh Dickins
2018-07-31  4:25             ` John Stultz
2018-07-31  6:40               ` Amit Pundir
2018-07-31  6:56                 ` Kirill A. Shutemov
2018-07-31 16:29                 ` Linus Torvalds
2018-07-31 16:56                   ` John Stultz
2018-07-31 17:03                   ` Kirill A. Shutemov
2018-07-31 17:43                     ` Luck, Tony
2018-07-31 19:02                       ` Linus Torvalds
2018-08-01 17:15                       ` Linus Torvalds
2018-08-01 18:31                         ` Hugh Dickins
2018-08-01 20:58                           ` Kirill A. Shutemov
2018-08-01 21:55                             ` Hugh Dickins
2018-08-02 19:12                               ` John Stultz
2018-08-01 18:36                         ` Luck, Tony
2018-08-01 20:05                         ` Linus Torvalds
2018-08-01 20:51                           ` Kirill A. Shutemov
2018-08-01 20:56                             ` Linus Torvalds
2018-08-01 21:25                               ` Kirill A. Shutemov
2018-08-02  6:59                         ` Amit Pundir
2018-07-31 17:17                   ` [PATCH] staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages John Stultz
2018-07-31 22:57                   ` Linux 4.18-rc7 youling 257
2018-07-31 23:07                   ` youling 257
2018-07-31  6:29           ` Kirill A. Shutemov
2018-07-31 14:57             ` Kirill A. Shutemov
2018-08-01  0:09               ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).