Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT

* Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
@ 2018-09-21 15:01 ` Chulmin Kim
  2018-09-22  4:38   ` Chulmin Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Chulmin Kim @ 2018-09-21 15:01 UTC (permalink / raw)
  To: linux-mm

Hi all,

I am working on an Android smartphone, and I am seeing a thread loop in
the page fault path forever.
(The kernel is based on v4.4, though it may differ slightly from mainline
because the problem occurs on a device under development at my company.)

The pte for the faulting address has PTE_PROT_NONE set and PTE_VALID clear.
(By the way, the pte maps an anon page (ashmem).)
The odd part, in my opinion, is that the VMA covering the faulting address
is not PROT_NONE but PROT_READ | PROT_WRITE.
As a result, the page fault handler (handle_pte_fault()) returns 0 and the
fault repeats forever.

I don't think this is a normal situation.

Since NUMA is not enabled, a pte with PTE_PROT_NONE and !PTE_VALID was most
likely set by mprotect():
1. mprotect(PROT_NONE) -> vma split & pte set to PROT_NONE
2. mprotect(PROT_READ | PROT_WRITE) -> vma merge & pte reverted
I suspect that the pte revert in step 2 somehow did not take effect,
but I have no clue why.

I searched and found a similar report
(http://linux-kernel.2935.n7.nabble.com/pipe-page-fault-oddness-td953839.html),
but it involves NUMA and huge page table configs,
which my device does not use.

Am I missing a possible scenario, or is this an already known bug?
I would appreciate any ideas about this problem.

Thanks.
Chulmin Kim


* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
  2018-09-21 15:01 ` Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma Chulmin Kim
@ 2018-09-22  4:38   ` Chulmin Kim
  2018-09-24 21:08     ` Andrea Arcangeli
  0 siblings, 1 reply; 5+ messages in thread
From: Chulmin Kim @ 2018-09-22  4:38 UTC (permalink / raw)
  To: Chulmin Kim, linux-mm, aarcange

Dear Arcangeli,

I think this problem is closely related to the race condition described
in the commit below:

(e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk)

I confirmed that the thread and its child threads repeatedly call
mprotect() with PROT_NONE or PROT_READ | PROT_WRITE,
although I have not been able to reproduce the problem yet.

Do you think this is one of the symptoms you would expect
from the race condition described in that commit?

Thanks.
Chulmin Kim





* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
  2018-09-22  4:38   ` Chulmin Kim
@ 2018-09-24 21:08     ` Andrea Arcangeli
  2018-09-27  5:10       ` Chulmin Kim
  2018-10-05  6:26       ` Chulmin Kim
  0 siblings, 2 replies; 5+ messages in thread
From: Andrea Arcangeli @ 2018-09-24 21:08 UTC (permalink / raw)
  To: Chulmin Kim; +Cc: Chulmin Kim, linux-mm

Hello,

On Sat, Sep 22, 2018 at 01:38:07PM +0900, Chulmin Kim wrote:
> Dear Arcangeli,
>
> I think this problem is closely related to the race condition described
> in the commit below:
>
> (e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
> against rmap_walk)
>
> I confirmed that the thread and its child threads repeatedly call
> mprotect() with PROT_NONE or PROT_READ | PROT_WRITE,
> although I have not been able to reproduce the problem yet.
>
> Do you think this is one of the symptoms you would expect
> from the race condition described in that commit?

Yes, that commit will fix your problem in a v4.4-based tree that is missing
it. You just need to cherry-pick that commit.

Page migration sets the pte to PROT_NONE by mistake because it runs
concurrently with the mprotect that transitions an adjacent vma from
PROT_NONE to PROT_READ|PROT_WRITE. Before the fix, vma_merge temporarily
exposed an erroneous PROT_NONE vma protection for the virtual range under
page migration.

With NUMA disabled, it was likely compaction that triggered page migration
for you. Disabling compaction at build time would likely have hidden
the problem. Compaction uses migration, and you almost certainly have
CONFIG_COMPACTION=y (rightfully so).

On a side note, I suggest cherry-picking the latest upstream commit to
mm/vmacache.c too.

Hope this helps,
Andrea


* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
  2018-09-24 21:08     ` Andrea Arcangeli
@ 2018-09-27  5:10       ` Chulmin Kim
  2018-10-05  6:26       ` Chulmin Kim
  1 sibling, 0 replies; 5+ messages in thread
From: Chulmin Kim @ 2018-09-27  5:10 UTC (permalink / raw)
  To: Andrea Arcangeli, linux-mm, Chulmin Kim

Hello,

Thanks for the reply.

We are running a test (a kind of aging test, for 3 days) to confirm that
this commit fixes the problem.

I will let you know when the test is done.


On 09/25/2018 06:08 AM, Andrea Arcangeli wrote:
> On a side note, I suggest to cherry pick the last upstream commit of
> mm/vmacache.c too.
Sorry, but I did not fully understand this line.

Do you mean commit 7a9cdebdc (mm: get rid of vmacache_flush_all()
entirely)?
Could you elaborate on the reason?
Are you saying there is another scenario that could cause the problem I am seeing?

> Hope this helps,
> Andrea

Thanks.
Chulmin Kim


* Re: Question about a pte with PTE_PROT_NONE and !PTE_VALID on !PROT_NONE vma
  2018-09-24 21:08     ` Andrea Arcangeli
  2018-09-27  5:10       ` Chulmin Kim
@ 2018-10-05  6:26       ` Chulmin Kim
  1 sibling, 0 replies; 5+ messages in thread
From: Chulmin Kim @ 2018-10-05  6:26 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Chulmin Kim, linux-mm

Dear all,

Using the problem scenario (repeatedly launching Android apps for 2 to 3
days), we have verified that the problem is gone after applying this commit:

- e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk

Thanks!
Chulmin Kim




