* Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
From: Chulmin Kim @ 2018-09-21 15:01 UTC
To: linux-mm
Hi all,
I am developing an Android smartphone and am facing a problem where a thread
loops in the page fault routine forever.
(The kernel version is around v4.4, though it may differ slightly from
mainline because the problem occurs on a device being developed at my company.)
The pte corresponding to the fault address has PTE_PROT_NONE set and
PTE_VALID clear.
(By the way, the pte maps an anon page (ashmem).)
The weird thing, in my opinion, is that
the VMA covering the fault address is not PROT_NONE but
PROT_READ | PROT_WRITE.
So the page fault routine (handle_pte_fault()) returns 0 and the fault loops
forever.
I don't think this is a normal situation.
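To confirm from userspace which protection the VMA covering an address actually has, /proc/self/maps can be consulted. The following is a minimal sketch, not taken from the original report; the helper name vma_perms_of() is hypothetical, and it assumes a Linux system with /proc mounted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the mapping that contains addr in
 * /proc/self/maps and copy its permission string (for example "rw-p")
 * into perms. Returns 0 on success, -1 if no mapping contains addr or
 * the file cannot be read.
 */
static int vma_perms_of(const void *addr, char perms[5])
{
    unsigned long target = (unsigned long)addr;
    char line[512];
    int ret = -1;
    FILE *f;

    assert(perms != NULL);

    f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;
        char p[5];

        /* Each entry begins with "start-end perms", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) != 3)
            continue;
        if (target >= start && target < end) {
            memcpy(perms, p, sizeof(p));
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

A result beginning with "rw" for the faulting address would match the observation above that the VMA itself is readable and writable, so the mismatch would have to be in the pte.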
As I have not enabled NUMA, a pte with PTE_PROT_NONE and !PTE_VALID was most
likely set by mprotect():
1. mprotect(PROT_NONE) -> vma split & set pte to PROT_NONE
2. mprotect(PROT_READ | PROT_WRITE) -> vma merge & revert pte
I suspect that the pte revert in #2 somehow did not work,
but I have no clue why.
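The two steps above can be sketched from userspace as follows. This is only an illustration of the call sequence, not the code running on the device; the function name toggle_middle_page() and the three-page layout are assumptions chosen so that the middle page has neighbours the kernel could split from and merge with.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: revoke and restore access to the middle page of a
 * three-page anonymous mapping, then touch it.
 * Returns 0 if the page is readable and writable again, -1 otherwise.
 */
static int toggle_middle_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *base, *mid;
    size_t len;
    int ok;

    assert(page > 0);
    len = 3 * (size_t)page;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    mid = base + page;

    /* Fault in all three pages so that ptes exist before mprotect(). */
    memset(base, 0xaa, len);

    /* Step 1: the middle page loses all access (VMA split expected). */
    if (mprotect(mid, (size_t)page, PROT_NONE) != 0)
        goto fail;

    /* Step 2: access is restored (VMA merge expected). */
    if (mprotect(mid, (size_t)page, PROT_READ | PROT_WRITE) != 0)
        goto fail;

    /* On a healthy kernel this write faults at most once and succeeds. */
    mid[0] = 0x55;
    ok = (mid[0] == 0x55 && mid[1] == 0xaa);

    munmap(base, len);
    return ok ? 0 : -1;

fail:
    munmap(base, len);
    return -1;
}
```

A single-threaded run like this is not expected to reproduce the looping fault; it only shows the sequence that is expected to leave the pte accessible again.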
I googled and found a similar situation
(http://linux-kernel.2935.n7.nabble.com/pipe-page-fault-oddness-td953839.html)
that is related to the NUMA and huge page table configs,
but my device has nothing to do with those configs.
Am I missing a possible scenario? Or is this an already known bug?
I would appreciate any ideas about this problem.
Thanks.
Chulmin Kim
* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
From: Chulmin Kim @ 2018-09-22 4:38 UTC
To: Chulmin Kim, linux-mm, aarcange
Dear Arcangeli,
I think this problem is closely related to
the race condition described in the commit below:
(e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk)
I checked that
the thread and its child threads repeatedly call mprotect(PROT_{NONE or
R|W}),
although I have not reproduced the problem yet.
Do you think this is one of the symptoms you expected
from the race condition described in the above commit?
Thanks.
Chulmin Kim
* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
From: Andrea Arcangeli @ 2018-09-24 21:08 UTC
To: Chulmin Kim; +Cc: Chulmin Kim, linux-mm
Hello,
On Sat, Sep 22, 2018 at 01:38:07PM +0900, Chulmin Kim wrote:
> Dear Arcangeli,
>
> I think this problem is very much related with
> the race condition shown in the below commit.
> (e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
> against rmap_walk)
>
> I checked that
> the thread and its child threads are doing mprotect(PROT_{NONE or
> R|W}) things repeatedly
> while I didn't reproduce the problem yet.
>
> Do you think this is one of the phenomenon you expected
> from the race condition shown in the above commit?
Yes, that commit will fix your problem in a v4.4-based tree that lacks
that fix. You just need to cherry-pick that commit.
Page migration sets the pte to PROT_NONE by mistake because it runs
concurrently with the mprotect that transitions an adjacent vma from
PROT_NONE to PROT_READ|WRITE. vma_merge (before the fix) temporarily
showed an erratic PROT_NONE vma prot for the virtual range under page
migration.
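The interleaving described above can be replayed in a small userspace model. This is a deliberately simplified sketch and not kernel code: every name in it (model_vma, model_pte, model_replay() and so on) is hypothetical, and it only assumes what the paragraph states, namely that migration rebuilds the pte from the protection the vma shows at that moment. Because the migrated page sits in the range adjacent to the one mprotect is changing, the model assumes nothing rewrites the pte afterwards.

```c
#include <assert.h>
#include <stdbool.h>

enum model_prot { MODEL_PROT_NONE, MODEL_PROT_RW };

struct model_vma {
    enum model_prot page_prot;   /* stands in for vma->vm_page_prot */
};

struct model_pte {
    enum model_prot prot;
    bool migrating;              /* stands in for a migration entry */
};

/* Migration finished: rebuild the pte from what the vma shows right now. */
static void model_remove_migration_pte(struct model_pte *pte,
                                       const struct model_vma *vma)
{
    assert(pte->migrating);
    pte->prot = vma->page_prot;
    pte->migrating = false;
}

/*
 * Replay one interleaving. When restore_in_window is true, migration
 * restores the pte while the vma covering the page still shows the
 * transient PROT_NONE protection. Returns the protection left in the pte.
 */
static enum model_prot model_replay(bool restore_in_window)
{
    /* During the merge the covering vma briefly shows PROT_NONE. */
    struct model_vma vma = { .page_prot = MODEL_PROT_NONE };
    struct model_pte pte = { .prot = MODEL_PROT_RW, .migrating = true };

    if (restore_in_window)
        model_remove_migration_pte(&pte, &vma);   /* reads transient prot */

    vma.page_prot = MODEL_PROT_RW;                /* merge completes */

    if (!restore_in_window)
        model_remove_migration_pte(&pte, &vma);   /* reads final prot */

    return pte.prot;
}

/* A fault cannot make progress if the vma allows access but the pte does not. */
static bool model_fault_would_loop(enum model_prot vma_prot,
                                   enum model_prot pte_prot)
{
    return vma_prot == MODEL_PROT_RW && pte_prot == MODEL_PROT_NONE;
}
```

In this model, model_replay(true) leaves a PROT_NONE pte under a read/write vma, which is the state reported at the start of the thread, while model_replay(false) leaves a read/write pte.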
With NUMA disabled, it's likely compaction that triggered page migration
for you. Disabling compaction at build time would likely have hidden
the problem. Compaction uses migration and you almost certainly have
CONFIG_COMPACTION=y (rightfully so).
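One way to check whether compaction and migration have been active on a running device is to read the event counters in /proc/vmstat. The sketch below is an assumption-laden illustration: the helper name vmstat_counter() is hypothetical, and it assumes the kernel exposes counters such as compact_stall (built with CONFIG_COMPACTION) and pgmigrate_success (built with CONFIG_MIGRATION) as "name value" lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one counter from a vmstat-style stream that
 * holds one "name value" pair per line.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int vmstat_counter(FILE *f, const char *name,
                          unsigned long long *value)
{
    unsigned long long v;
    char key[128];

    assert(f != NULL && name != NULL && value != NULL);

    rewind(f);
    while (fscanf(f, "%127s %llu", key, &v) == 2) {
        if (strcmp(key, name) == 0) {
            *value = v;
            return 0;
        }
    }
    return -1;
}
```

Passing a stream opened with fopen("/proc/vmstat", "r") and the name "pgmigrate_success" would report how many pages have been migrated; a missing "compact_stall" entry would suggest that compaction is not built in.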
On a side note, I suggest cherry-picking the latest upstream commit to
mm/vmacache.c too.
Hope this helps,
Andrea
* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
From: Chulmin Kim @ 2018-09-27 5:10 UTC
To: Andrea Arcangeli, linux-mm, Chulmin Kim
Hello.
Thanks for the reply.
We are running a test (a kind of aging test for 3 days) to prove that this is
the fix for the problem.
I will let you know when the test is done.
On 09/25/2018 06:08 AM, Andrea Arcangeli wrote:
> Hello,
>
> On Sat, Sep 22, 2018 at 01:38:07PM +0900, Chulmin Kim wrote:
>> Dear Arcangeli,
>>
>> I think this problem is very much related with
>> the race condition shown in the below commit.
>> (e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
>> against rmap_walk)
>>
>> I checked that
>> the thread and its child threads are doing mprotect(PROT_{NONE or
>> R|W}) things repeatedly
>> while I didn't reproduce the problem yet.
>>
>> Do you think this is one of the phenomenon you expected
>> from the race condition shown in the above commit?
> Yes that commit will fix your problem in a v4.4 based tree that misses
> that fix. You just need to cherry-pick that commit to fix the problem.
>
> Page migrate sets the pte to PROT_NONE by mistake because it runs
> concurrently with the mprotect that transitions an adjacent vma from
> PROT_NONE to PROT_READ|WRITE. vma_merge (before the fix) temporarily
> shown an erratic PROT_NONE vma prot for the virtual range under page
> migration.
>
> With NUMA disabled, it's likely compaction that triggered page migrate
> for you. Disabling compaction at build time would have likely hidden
> the problem. Compaction uses migration and you most certainly have
> CONFIG_COMPACTION=y (rightfully so).
>
> On a side note, I suggest to cherry pick the last upstream commit of
> mm/vmacache.c too.
Sorry, but I didn't fully understand this line.
Do you mean the commit 7a9cdebdc (mm: get rid of vmacache_flush_all()
entirely)?
Could you elaborate on the point?
Are you saying there is another scenario that could cause the problem I am
seeing?
> Hope this helps,
> Andrea
Thanks.
Chulmin Kim
* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
From: Chulmin Kim @ 2018-10-05 6:26 UTC
To: Andrea Arcangeli; +Cc: Chulmin Kim, linux-mm
Dear all,
Using the problem scenario (repeated execution of Android
apps for 2~3 days), we have verified that
the problem is gone after applying the commit:
- e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk
Thanks!
Chulmin Kim