* [Question] ksm: rmap_item pointing to some stale vmas
@ 2015-04-09 14:05 ` Susheel Khiani
  0 siblings, 0 replies; 14+ messages in thread
From: Susheel Khiani @ 2015-04-09 14:05 UTC (permalink / raw)
  To: akpm, peterz, neilb, dhowells, hughd, paulmcquad, linux-mm, linux-kernel

Hi,

We are seeing an issue in try_to_unmap_ksm where the call to
try_to_unmap_one fails.

In this particular case, try_to_unmap_ksm walks the vmas associated
with each rmap_item->anon_vma. What we see is that the page is not
mapped in any of the vmas associated with the 2 rmap_items.

Each rmap_item appears to point to a valid vma, but the page is not
found to be mapped under it, so try_to_unmap_one fails to find valid
ptes for these vmas.

At the same time we can see that the page actually is mapped in 2
separate and different vmas which do not belong to any rmap_item
associated with the page.

So is the rmap_item pointing to stale vmas because the mapping has since
changed? Or is something else going on here?

Any pointers would be appreciated.

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-09 14:05 ` Susheel Khiani
@ 2015-04-10 17:56   ` Hugh Dickins
  -1 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2015-04-10 17:56 UTC (permalink / raw)
  To: Susheel Khiani
  Cc: akpm, peterz, neilb, dhowells, hughd, paulmcquad, linux-mm, linux-kernel

On Thu, 9 Apr 2015, Susheel Khiani wrote:

> Hi,
> 
> We are seeing an issue during try_to_unmap_ksm where in call to
> try_to_unmap_one is failing.
> 
> try_to_unmap_ksm in this particular case is trying to go through vmas
> associated with each rmap_item->anon_vma. What we see is this that the
> corresponding page is not mapped to any of the vmas associated with 2
> rmap_item.
> 
> The associated rmap_item in this case looks like pointing to some valid vma
> but the said page is not found to be mapped under it. try_to_unmap_one thus
> fails to find valid ptes for these vmas.
> 
> At the same time we can see that the page actually is mapped in 2 separate
> and different vmas which are not part of rmap_item associated with page.
> 
> So whether rmap_item is pointing to some stale vmas and now the mapping has
> changed? Or there is something else going on here.
>
> Any pointer would be appreciated.

I expected to be able to argue this away, but no: I think you've found
a bug, and I think I get it too.  I have no idea what's wrong at this
point, will set aside some time to investigate, and report back.

Which kernel are you using?  try_to_unmap_ksm says v3.13 or earlier.
Probably doesn't affect the bug, but may affect the patch you'll need.

Hugh

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-10 17:56   ` Hugh Dickins
@ 2015-04-14  7:01     ` Susheel Khiani
  -1 siblings, 0 replies; 14+ messages in thread
From: Susheel Khiani @ 2015-04-14  7:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 04/10/15 23:26, Hugh Dickins wrote:
> On Thu, 9 Apr 2015, Susheel Khiani wrote:
>
>> Hi,
>>
>> We are seeing an issue during try_to_unmap_ksm where in call to
>> try_to_unmap_one is failing.
>>
>> try_to_unmap_ksm in this particular case is trying to go through vmas
>> associated with each rmap_item->anon_vma. What we see is this that the
>> corresponding page is not mapped to any of the vmas associated with 2
>> rmap_item.
>>
>> The associated rmap_item in this case looks like pointing to some valid vma
>> but the said page is not found to be mapped under it. try_to_unmap_one thus
>> fails to find valid ptes for these vmas.
>>
>> At the same time we can see that the page actually is mapped in 2 separate
>> and different vmas which are not part of rmap_item associated with page.
>>
>> So whether rmap_item is pointing to some stale vmas and now the mapping has
>> changed? Or there is something else going on here.
>>
>> Any pointer would be appreciated.
>
> I expected to be able to argue this away, but no: I think you've found
> a bug, and I think I get it too.  I have no idea what's wrong at this
> point, will set aside some time to investigate, and report back.
>
> Which kernel are you using?  try_to_unmap_ksm says v3.13 or earlier.
> Probably doesn't affect the bug, but may affect the patch you'll need.
>
> Hugh

We are using kernel-3.10.49. I have gone through the KSM patches merged
after this kernel version but didn't find anything relevant to this
issue. The latest KSM patch we have in our tree is

668f9abb: mm: close PageTail race

The issue is difficult to reproduce and appears only after days of
testing on a 512MB Android platform. What I am not able to figure out is
which code path in KSM could land us in a situation where the
stable_node still holds stale rmap_items with old vmas which are now
unmapped.

In the dumps we can see the new vmas mapping the page, but rmap_items
for these new vmas have still not been added to the stable_node.


-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-14  7:01     ` Susheel Khiani
@ 2015-04-15  6:22       ` Hugh Dickins
  -1 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2015-04-15  6:22 UTC (permalink / raw)
  To: Susheel Khiani
  Cc: Hugh Dickins, akpm, peterz, neilb, dhowells, paulmcquad,
	linux-mm, linux-kernel

On Tue, 14 Apr 2015, Susheel Khiani wrote:
> On 04/10/15 23:26, Hugh Dickins wrote:
> > On Thu, 9 Apr 2015, Susheel Khiani wrote:
> > > 
> > > We are seeing an issue during try_to_unmap_ksm where in call to
> > > try_to_unmap_one is failing.
> > > 
> > > try_to_unmap_ksm in this particular case is trying to go through vmas
> > > associated with each rmap_item->anon_vma. What we see is this that the
> > > corresponding page is not mapped to any of the vmas associated with 2
> > > rmap_item.
> > > 
> > > The associated rmap_item in this case looks like pointing to some valid
> > > vma
> > > but the said page is not found to be mapped under it. try_to_unmap_one
> > > thus
> > > fails to find valid ptes for these vmas.
> > > 
> > > At the same time we can see that the page actually is mapped in 2
> > > separate
> > > and different vmas which are not part of rmap_item associated with page.
> > > 
> > > So whether rmap_item is pointing to some stale vmas and now the mapping
> > > has
> > > changed? Or there is something else going on here.
> > >
> > > Any pointer would be appreciated.
> > 
> > I expected to be able to argue this away, but no: I think you've found
> > a bug, and I think I get it too.  I have no idea what's wrong at this
> > point, will set aside some time to investigate, and report back.
> > 
> > Which kernel are you using?  try_to_unmap_ksm says v3.13 or earlier.
> > Probably doesn't affect the bug, but may affect the patch you'll need.
> > 
> 
> We are using kernel-3.10.49 and I have gone through patches of ksm above this
> kernel version but didn't find anything relevant w.r.t issue. The latest
> patch which we have for KSM on our tree is
> 
> 668f9abb: mm: close PageTail race

I agree, I don't think 3.10.49 would be missing any relevant fix -
unless there's a later fix to some "random" corruption which happens
to hit you here in KSM.

I wonder how you identified that this issue of un-unmappable pages
is peculiar to KSM.  Have you established that ordinary anon pages
(we need not worry about file pages here) are always successfully
unmappable?  KSM is reliant upon anon_vmas working as intended
(but then makes use of them in its own peculiar way).

> 
> The issue otherwise is difficult to reproduce and is appearing after days of
> testing on 512MB Android platform. What I am not able to figure out is which
> code path in ksm could actually land us in situation where in stable_node we
> still have stale rmap_items with old vmas which are now unmapped.

Whether that's something to worry about depends on what you mean.

It's normal for a stable_node to have some stale rmap_items attached,
now pointing to pages different from the stable page, or pointing to none.
That's in the nature of KSM, the way ksmd builds up its structures by
peeking at what's in each mm, moving on, and coming back a cycle later
to discover what's changed.

But the anon_vma which such a stale rmap_item points to should remain
valid (KSM holds an additional reference to it), even if its interval
tree is now empty, or none of the vmas that it holds now cover this
mm,address (but any vmas held should still be valid vmas).

I was concerned, not that the stable_node has stale rmap_items attached,
but that you know the page to be mapped, yet try_to_unmap_ksm is unable
to locate its mappings.
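
The walk described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel's
code: the toy_* types and toy_unmap_ksm() are hypothetical names, the
anon_vma interval tree is reduced to a linked list, and a single
page_addr field stands in for the pte check done by try_to_unmap_one.
It shows that a mapping is found only if some rmap_item on the
stable_node leads, through its anon_vma, to the vma that holds it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; names and fields are simplified. */
struct toy_vma {
    unsigned long start, end;   /* covers [start, end) */
    unsigned long page_addr;    /* where this vma maps the KSM page, 0 if nowhere */
    struct toy_vma *next;       /* next vma attached to the same anon_vma */
};

struct toy_anon_vma {
    struct toy_vma *vmas;       /* may be empty, but stays valid while referenced */
};

struct toy_rmap_item {
    unsigned long address;          /* address ksmd recorded for this mapping */
    struct toy_anon_vma *anon_vma;  /* assumed pinned, so always safe to walk */
    struct toy_rmap_item *next;
};

struct toy_stable_node {
    struct toy_rmap_item *items;
};

/*
 * Walk every rmap_item of the stable node and clear each mapping that can
 * be reached through it. Returns the number of mappings removed.
 */
static int toy_unmap_ksm(struct toy_stable_node *node)
{
    int unmapped = 0;

    assert(node != NULL);
    for (struct toy_rmap_item *item = node->items; item; item = item->next) {
        for (struct toy_vma *vma = item->anon_vma->vmas; vma; vma = vma->next) {
            if (item->address < vma->start || item->address >= vma->end)
                continue;   /* vma does not cover the recorded address */
            if (vma->page_addr != item->address)
                continue;   /* stale: the page is not mapped here */
            vma->page_addr = 0;
            unmapped++;
        }
    }
    return unmapped;
}
```

With a single stale rmap_item whose vma covers the address but does not
map the page, toy_unmap_ksm() removes nothing, even though the page is
mapped in a vma on some other anon_vma. Once an rmap_item pointing at
that other anon_vma is attached, the same walk succeeds.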

> 
> In the dumps we can see the new vmas mapping to the page but the new
> rmap_items with these new vmas which maps the page are still not updated in
> stable_node.

"still not updated" after how long?
I assume you to mean that, however long you wait (but at least
one full scan), the stable_node is not updated with an rmap_item
pointing to an anon_vma whose interval tree contains one of these
new vmas which maps the page?

(When setting up a new stable node, it will take several scans to
establish, and can be delayed by various races, such as shifts in
the unstable tree, and the trylock_page in try_to_merge_one_page.
But I think that once you can see a stable ksm page mapped somewhere,
all pointers to it should be captured within a single scan.)

That's bad, but I have no idea of the cause.  I mention corruption
above, because that would be one possibility; though unlikely if
it always hits you here in KSM only.

Whereas if you mean that a new mapping of the stable page may not
be unmapped until ksmd has completed a full scan, that is also
wrong, but not so serious.  Or would even that be a serious issue
for you?  Please describe how this comes to be a problem for you.

I believe I have found two bugs that would explain the latter case;
but both of them require fork, and legend has it that Android avoids
fork (correct me if wrong); so I doubt they're responsible for your
case, and expect both to be corrected within one full scan.

The lesser of the bugs is this: KSM reclaim (dependent on anon_vmas)
was introduced in 2.6.33, but then anon_vma_chains were introduced
in 2.6.34, and I suspect that the conversion ought to have updated
try_to_merge_with_ksm_page, to take rmap_item->anon_vma from page
instead of from vma.  I believe that some fork-connected mappings
may be missed for a scan because of that.

But fixing it doesn't help much: because the greater bug (mine) is
that the search_new_forks code is not working as well as intended.
It relies on using one rmap_item's anon_vma to locate the page in
newer mappings forked from it, before ksmd reaches them to create
their own rmap_items; but we're doing nothing to prevent that
earlier rmap_item from being removed too soon.
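
The dependence on the earlier rmap_item can be sketched as follows.
This is a hypothetical userspace model, not kernel code: the toy_*
names are invented, lists replace the interval tree, and a maps_page
flag replaces the pte lookup. It assumes, as described above, that a
forked child's vma is reachable from the parent's anon_vma, and that a
second pass looks only at vmas in mms other than the rmap_item's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
    int mm_id;                  /* which mm this vma belongs to */
    unsigned long start, end;   /* covers [start, end) */
    bool maps_page;             /* true while the KSM page is mapped here */
    struct toy_vma *next;       /* next vma reachable from the same anon_vma */
};

struct toy_anon_vma {
    struct toy_vma *vmas;
};

struct toy_rmap_item {
    int mm_id;                      /* mm that ksmd scanned */
    unsigned long address;
    struct toy_anon_vma *anon_vma;
    struct toy_rmap_item *next;
};

/* One pass over the rmap_items; returns the number of mappings removed. */
static int toy_unmap_pass(struct toy_rmap_item *items, bool search_new_forks)
{
    int unmapped = 0;

    for (struct toy_rmap_item *item = items; item; item = item->next) {
        for (struct toy_vma *vma = item->anon_vma->vmas; vma; vma = vma->next) {
            if (item->address < vma->start || item->address >= vma->end)
                continue;
            /* Pass 1 takes only the item's own mm, pass 2 only other mms. */
            if ((item->mm_id == vma->mm_id) == search_new_forks)
                continue;
            if (!vma->maps_page)
                continue;
            vma->maps_page = false;
            unmapped++;
        }
    }
    return unmapped;
}

/* First the scanned mms, then mms forked since ksmd last passed. */
static int toy_unmap_ksm(struct toy_rmap_item *items)
{
    return toy_unmap_pass(items, false) + toy_unmap_pass(items, true);
}
```

While the parent's rmap_item is present, both the parent's and the
forked child's mappings are found. If that rmap_item has already been
removed and ksmd has not yet created one for the child, the list is
empty and the child's mapping cannot be reached at all.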

I would much rather be sending a patch, than trying to describe
this so obscurely; but I have not succeeded and time has run out.

I got far enough, I think, to confirm that this happens for me,
and can be fixed by delaying the removal of such rmap_items.
But I did not get far enough to stop them from leaking wildly;
and although I've searched for quick and easy ways to do it,
have come to the conclusion that fixing it safely without leaks
will require more time and care than I can afford at present.

(And even with those fixed, there would still be rare cases when
a new mapping could not immediately be unmapped: for example,
replace_page increments kpage's mapcount, but a racing
try_to_unmap_ksm may hold kpage's page lock, preventing the
relevant rmap_item from being appended to the stable tree.)

I do hate to put down half-finished work, and would have liked
to send you a patch, even if only to confirm that my problem
is actually not your problem.  But I now see no alternative to
merely informing you of this, and wishing you luck in your own
investigation: I'm sorry, I just don't know.

But if I've misunderstood, and you think that what you're seeing
fits with the transient forking bugs I've (not quite) described,
and you can explain why even the transient case is important for
you to have fixed, then I really ought to redouble my efforts.

Hugh

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-15  6:22       ` Hugh Dickins
@ 2015-04-30  6:07         ` Susheel Khiani
  -1 siblings, 0 replies; 14+ messages in thread
From: Susheel Khiani @ 2015-04-30  6:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 04/15/15 11:52, Hugh Dickins wrote:
>> We are using kernel-3.10.49 and I have gone through patches of ksm above this
>> kernel version but didn't find anything relevant w.r.t issue. The latest
>> patch which we have for KSM on our tree is
>>
>> 668f9abb: mm: close PageTail race
> I agree, I don't think 3.10.49 would be missing any relevant fix -
> unless there's a later fix to some "random" corruption which happens
> to hit you here in KSM.
>
> I wonder how you identified that this issue of un-unmappable pages
> is peculiar to KSM.  Have you established that ordinary anon pages
> (we need not worry about file pages here) are always successfully
> unmappable?  KSM is reliant upon anon_vmas working as intended
> (but then makes use of them in its own peculiar way).
>


We identified the issue in try_to_unmap_ksm while debugging CMA
allocation failures. During alloc_contig_range we call migrate_pages,
which failed to migrate a specific page even after all the retries made
in the migrate_pages function. Digging deeper, we concluded that the
failure was in try_to_unmap_ksm, which failed to find valid ptes.


>>
>> The issue otherwise is difficult to reproduce and is appearing after days of
>> testing on 512MB Android platform. What I am not able to figure out is which
>> code path in ksm could actually land us in situation where in stable_node we
>> still have stale rmap_items with old vmas which are now unmapped.
> Whether that's something to worry about depends on what you mean.
>
> It's normal for a stable_node to have some stale rmap_items attached,
> now pointing to pages different from the stable page, or pointing to none.
> That's in the nature of KSM, the way ksmd builds up its structures by
> peeking at what's in each mm, moving on, and coming back a cycle later
> to discover what's changed.
>
> But the anon_vma which such a stale rmap_item points to should remain
> valid (KSM holds an additional reference to it), even if its interval
> tree is now empty, or none of the vmas that it holds now cover this
> mm,address (but any vmas held should still be valid vmas).
>
> I was concerned, not that the stable_node has stale rmap_items attached,
> but that you know the page to be mapped, yet try_to_unmap_ksm is unable
> to locate its mappings.
>
>>
>> In the dumps we can see the new vmas mapping to the page but the new
>> rmap_items with these new vmas which maps the page are still not updated in
>> stable_node.
> "still not updated" after how long?
> I assume you to mean that, how ever long you wait (but at least
> one full scan), the stable_node is not updated with an rmap_item
> pointing to an anon_vma whose interval tree contains one of these
> new vmas which maps the page?


I have not yet established whether we are waiting for one full scan or
not. Since I was debugging this as a CMA allocation failure, by "still
not updated" I meant that even after all the retries which we make in
the CMA allocation path to migrate pages, the stable_node was not
updated with the rmap_item. But now I understand that we need to wait
for at least one full ksm scan to see the update.
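
To get a feel for how long "one full ksm scan" can take compared with a
burst of migration retries, here is a rough back-of-the-envelope helper.
It is a sketch, not a measurement: toy_full_scan_ms() is a hypothetical
name, it assumes ksmd simply alternates between scanning pages_to_scan
pages and sleeping sleep_millisecs (the tunables under
/sys/kernel/mm/ksm/), and it ignores the time spent scanning, so the
result is only a lower bound.

```c
#include <assert.h>

/*
 * Lower bound, in milliseconds, on one full ksmd pass over
 * mergeable_pages pages: one sleep per batch of pages_to_scan pages.
 */
static unsigned long toy_full_scan_ms(unsigned long mergeable_pages,
                                      unsigned long pages_to_scan,
                                      unsigned long sleep_millisecs)
{
    assert(pages_to_scan > 0);

    /* Round up: a partial final batch still costs one sleep. */
    unsigned long batches =
        (mergeable_pages + pages_to_scan - 1) / pages_to_scan;

    return batches * sleep_millisecs;
}
```

As an illustration only: with pages_to_scan=100 and sleep_millisecs=20
(the upstream defaults, which a vendor tree may well change) and an
assumed 131072 mergeable pages (all of 512MB in 4KB pages, a deliberate
over-estimate), toy_full_scan_ms(131072, 100, 20) gives 26220 ms. A
retry loop that completes in far less time than that cannot rely on
ksmd having revisited the mm in between.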


>
> (When setting up a new stable node, it will take several scans to
> establish, and can be delayed by various races, such as shifts in
> the unstable tree, and the trylock_page in try_to_merge_one_page.
> But I think that once you can see a stable ksm page mapped somewhere,
> all pointers to it should be captured within a single scan.)


I now think the reason for my issue could be that we did not wait long
enough for a ksm scan to run once. I was able to track down the mm_slot
structure which we create in __ksm_enter, and it contained the mm_struct
which has the vma where our page is mapped. But the rmap_list of this
mm_slot was still NULL, which I guess would get populated once
ksm_do_scan runs.
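
The state described here, a registered mm_slot whose rmap_list is still
empty, can be pictured with a minimal userspace model. It is a sketch
under assumptions, not the kernel implementation: all toy_* names are
hypothetical, registration is reduced to initialising the slot, and a
"scan" simply records one rmap_item per address handed to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_rmap_item {
    unsigned long address;
    struct toy_rmap_item *next;
};

struct toy_mm_slot {
    int mm_id;
    struct toy_rmap_item *rmap_list;   /* NULL until the mm has been scanned */
};

/* Registration only creates the slot; no rmap_items exist yet. */
static void toy_ksm_enter(struct toy_mm_slot *slot, int mm_id)
{
    assert(slot != NULL);
    slot->mm_id = mm_id;
    slot->rmap_list = NULL;
}

/*
 * One scan visit: append an rmap_item for each address found in the mm.
 * Returns the number of items added, or -1 if an allocation fails.
 */
static int toy_scan_mm(struct toy_mm_slot *slot,
                       const unsigned long *addrs, size_t n)
{
    struct toy_rmap_item **tail = &slot->rmap_list;
    int added = 0;

    while (*tail)
        tail = &(*tail)->next;

    for (size_t i = 0; i < n; i++) {
        struct toy_rmap_item *item = malloc(sizeof(*item));

        if (!item)
            return -1;
        item->address = addrs[i];
        item->next = NULL;
        *tail = item;
        tail = &item->next;
        added++;
    }
    return added;
}

/* Release every rmap_item and leave the slot empty again. */
static void toy_free_slot(struct toy_mm_slot *slot)
{
    while (slot->rmap_list) {
        struct toy_rmap_item *next = slot->rmap_list->next;

        free(slot->rmap_list);
        slot->rmap_list = next;
    }
}
```

Between toy_ksm_enter() and the first toy_scan_mm(), the slot knows its
mm but has nothing on rmap_list, which is the window in which a reverse
walk has no rmap_item to start from for that mm.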


>
> That's bad, but I have no idea of the cause.  I mention corruption
> above, because that would be one possibility; though unlikely if
> it always hits you here in KSM only.


Yes, we have also ruled out corruption, since we have now seen multiple
instances with similar symptoms.


>
> Whereas if you mean that a new mapping of the stable page may not
> be unmapped until ksmd has completed a full scan, that is also
> wrong, but not so serious.  Or would even that be a serious issue
> for you?  Please describe how this comes to be a problem for you.


Right now I don't have enough data points to claim that a new mapping of 
the stable page may not be unmapped until ksmd has completed a full 
scan. But I am debugging in this direction and will get back once I 
have sufficient data.


>
> I believe I have found two bugs that would explain the latter case;
> but both of them require fork, and legend has it that Android avoids
> fork (correct me if wrong); so I doubt they're responsible for your
> case, and expect both to be corrected within one full scan.
>
> The lesser of the bugs is this: KSM reclaim (dependent on anon_vmas)
> was introduced in 2.6.33, but then anon_vma_chains were introduced
> in 2.6.34, and I suspect that the conversion ought to have updated
> try_to_merge_with_ksm_page, to take rmap_item->anon_vma from page
> instead of from vma.  I believe that some fork-connected mappings
> may be missed for a scan because of that.
>
> But fixing it doesn't help much: because the greater bug (mine) is
> that the search_new_forks code is not working as well as intended.
> It relies on using one rmap_item's anon_vma to locate the page in
> newer mappings forked from it, before ksmd reaches them to create
> their own rmap_items; but we're doing nothing to prevent that
> earlier rmap_item from being removed too soon.
>
> I would much rather be sending a patch, than trying to describe
> this so obscurely; but I have not succeeded and time has run out.
>
> I got far enough, I think, to confirm that this happens for me,
> and can be fixed by delaying the removal of such rmap_items.
> But I did not get far enough to stop them from leaking wildly;
> and although I've searched for quick and easy ways to do it,
> have come to the conclusion that fixing it safely without leaks
> will require more time and care than I can afford at present.
>
> (And even with those fixed, there would still be rare cases when
> a new mapping could not immediately be unmapped: for example,
> replace_page increments kpage's mapcount, but a racing
> try_to_unmap_ksm may hold kpage's page lock, preventing the
> relevant rmap_item from being appended to the stable tree.)
>
> I do hate to put down half-finished work, and would have liked
> to send you a patch, even if only to confirm that my problem
> is actually not your problem.  But I now see no alternative to
> merely informing you of this, and wishing you luck in your own
> investigation: I'm sorry, I just don't know.
>
> But if I've misunderstood, and you think that what you're seeing
> fits with the transient forking bugs I've (not quite) described,
> and you can explain why even the transient case is important for
> you to have fixed, then I really ought to redouble my efforts.
>
> Hugh


-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-30  6:07         ` Susheel Khiani
@ 2015-06-09 18:26           ` Susheel Khiani
  -1 siblings, 0 replies; 14+ messages in thread
From: Susheel Khiani @ 2015-06-09 18:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 4/30/2015 11:37 AM, Susheel Khiani wrote:
>> But if I've misunderstood, and you think that what you're seeing
>> fits with the transient forking bugs I've (not quite) described,
>> and you can explain why even the transient case is important for
>> you to have fixed, then I really ought to redouble my efforts.
>>
>> Hugh

I was able to root cause the issue, as we got a few instances of it and 
it was frequently reproducible on stress tests. It was important for us 
because the failure to unmap the ksm page was resulting in CMA 
allocation failures.

For cases like fork, what we observed is that for private mapped file 
pages, the stable_node pointed to by the KSM page won't cover all the 
mappings until ksmd completes one full scan. Only after the ksmd scan do 
new rmap_items pointing to the mappings in the child process come into 
existence. So in cases like CMA allocation, where we can't wait for ksmd 
to complete one full cycle, we can traverse the anon_vma tree from the 
parent's anon_vma to find all the vmas where the CMA page is mapped.
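
To make the idea concrete, here is a small userspace sketch (not kernel 
code). It assumes a simplified model in which fork links the child's vma 
into its own new anon_vma and into every ancestor anon_vma, so that only 
the root's tree covers the whole fork family. All toy_* names are made 
up for illustration, and a flat array stands in for the interval tree.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VMAS 8

/* Toy stand-in for a vm_area_struct: we only track which process owns it. */
struct toy_vma {
	int pid;
};

/* Toy stand-in for an anon_vma: a flat array replaces the interval tree. */
struct toy_anon_vma {
	struct toy_anon_vma *root;	/* top of the fork hierarchy */
	struct toy_anon_vma *parent;	/* NULL for the root itself */
	const struct toy_vma *vmas[TOY_MAX_VMAS];
	size_t nr_vmas;
};

/* Attach one vma to one anon_vma (hypothetical helper). */
void toy_link(struct toy_anon_vma *av, const struct toy_vma *vma)
{
	assert(av->nr_vmas < TOY_MAX_VMAS);
	av->vmas[av->nr_vmas++] = vma;
}

/* The first process in the family: its anon_vma is its own root. */
void toy_init_root(struct toy_anon_vma *av, const struct toy_vma *vma)
{
	av->root = av;
	av->parent = NULL;
	av->nr_vmas = 0;
	toy_link(av, vma);
}

/*
 * Assumed model of fork: the child's vma is linked into its own new
 * anon_vma and into every ancestor's anon_vma up to the root.
 */
void toy_fork(struct toy_anon_vma *parent_av, struct toy_anon_vma *child_av,
	      const struct toy_vma *child_vma)
{
	struct toy_anon_vma *av;

	child_av->root = parent_av->root;
	child_av->parent = parent_av;
	child_av->nr_vmas = 0;
	for (av = child_av; av != NULL; av = av->parent)
		toy_link(av, child_vma);
}

/* Does a walk that starts at av reach a vma owned by pid? */
int toy_reaches(const struct toy_anon_vma *av, int pid)
{
	size_t i;

	for (i = 0; i < av->nr_vmas; i++)
		if (av->vmas[i]->pid == pid)
			return 1;
	return 0;
}
```

In this model, if a parent forks two children A and B, a walk that starts 
from A's own anon_vma reaches only A's vma, while a walk that starts from 
A's root also reaches the parent and B. That is the gap which searching 
from anon_vma->root is meant to close.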

I have tested the following patch on a 3.10 kernel, and with this change 
I am able to avoid the CMA allocation failures which we were otherwise 
frequently seeing because we were not able to unmap the KSM page.

Please review and let me know your feedback.



[PATCH] ksm: Traverse through parent's anon_vma while unmapping

While doing try_to_unmap_ksm, we traverse the rmap_item
list to find all the anon_vmas from which the page needs
to be unmapped.

By design, KSM builds up its data structures by looking
into each mm, and comes back a cycle later to find out
which of them are now outdated and need to be updated.
So, for cases like fork, what we observe is that for
private mapped file pages, the stable_node pointed to by
the KSM page won't cover all the mappings until ksmd
completes one full scan. Only after the ksmd scan do new
rmap_items pointing to the mappings in the child process
come into existence.

As a result, a stable page can't be fully unmapped until
ksmd has completed one full scan. This becomes an issue
for CMA, where we need to unmap and move a CMA page and
can't wait for ksmd to complete one cycle. Because the
rmap_items for the new mappings have not been created
yet, we can't unmap the CMA page from all the vmas where
it is mapped. This results in frequent CMA allocation
failures.

So instead of relying only on the rmap_item list, which
we know can be incomplete, we also scan the anon_vma tree
from the parent's anon_vma to find all the vmas where the
CMA page is mapped, and thereby successfully unmap the
page and move it to a new page.

Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
---
  mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 11f6293..10d5266 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
  	unsigned int mapcount = page_mapcount(page);
  	int referenced = 0;
  	int search_new_forks = 0;
+	int search_from_root = 0;

  	VM_BUG_ON(!PageKsm(page));
  	VM_BUG_ON(!PageLocked(page));
@@ -1968,9 +1969,20 @@ again:
  		struct anon_vma *anon_vma = rmap_item->anon_vma;
  		struct anon_vma_chain *vmac;
  		struct vm_area_struct *vma;
+		struct rb_root rb_root;
+
+		if (!search_from_root) {
+			if (anon_vma)
+				rb_root = anon_vma->rb_root;
+		}
+		else {
+			if (anon_vma && anon_vma->root) {
+				rb_root = anon_vma->root->rb_root;
+			}
+		}

  		anon_vma_lock_read(anon_vma);
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_vma_interval_tree_foreach(vmac, &rb_root,
  					       0, ULONG_MAX) {
  			vma = vmac->vma;
  			if (rmap_item->address < vma->vm_start ||
@@ -1999,6 +2011,11 @@ again:
  	}
  	if (!search_new_forks++)
  		goto again;
+
+	if (!search_from_root++) {
+		search_new_forks = 0;
+		goto again;
+	}
  out:
  	return referenced;
  }
@@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum ttu_flags flags,
  	struct rmap_item *rmap_item;
  	int ret = SWAP_AGAIN;
  	int search_new_forks = 0;
+	int search_from_root = 0;

  	VM_BUG_ON(!PageKsm(page));
  	VM_BUG_ON(!PageLocked(page));
@@ -2028,9 +2046,20 @@ again:
  		struct anon_vma *anon_vma = rmap_item->anon_vma;
  		struct anon_vma_chain *vmac;
  		struct vm_area_struct *vma;
+		struct rb_root rb_root;
+
+		if (!search_from_root) {
+			if (anon_vma)
+				rb_root = anon_vma->rb_root;
+		}
+		else {
+			if (anon_vma && anon_vma->root) {
+				rb_root = anon_vma->root->rb_root;
+			}
+		}

  		anon_vma_lock_read(anon_vma);
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_vma_interval_tree_foreach(vmac, &rb_root,
  					       0, ULONG_MAX) {
  			vma = vmac->vma;
  			if (rmap_item->address < vma->vm_start ||
@@ -2056,6 +2085,11 @@ again:
  	}
  	if (!search_new_forks++)
  		goto again;
+
+	if (!search_from_root++) {
+		search_new_forks = 0;
+		goto again;
+	}
  out:
  	return ret;
  }
@@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
  	struct rmap_item *rmap_item;
  	int ret = SWAP_AGAIN;
  	int search_new_forks = 0;
+	int search_from_root = 0;

  	VM_BUG_ON(!PageKsm(page));
  	VM_BUG_ON(!PageLocked(page));
@@ -2080,9 +2115,21 @@ again:
  		struct anon_vma *anon_vma = rmap_item->anon_vma;
  		struct anon_vma_chain *vmac;
  		struct vm_area_struct *vma;
+		struct rb_root rb_root;
+
+		if (!search_from_root) {
+			if (anon_vma)
+				rb_root = anon_vma->rb_root;
+		}
+		else {
+			if (anon_vma && anon_vma->root) {
+				rb_root = anon_vma->root->rb_root;
+			}
+		}
+

  		anon_vma_lock_read(anon_vma);
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_vma_interval_tree_foreach(vmac, &rb_root,
  					       0, ULONG_MAX) {
  			vma = vmac->vma;
  			if (rmap_item->address < vma->vm_start ||
@@ -2107,6 +2154,11 @@ again:
  	}
  	if (!search_new_forks++)
  		goto again;
+
+	if (!search_from_root++) {
+		search_new_forks = 0;
+		goto again;
+	}
  out:
  	return ret;
  }
-- 
1.8.2.1
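
The retry logic which the patch adds at the end of each walker can be 
checked in isolation. Below is a small userspace sketch that copies only 
the two counters and the goto structure from the hunks above and records 
the flag values seen at the start of every pass. toy_record_passes and 
struct toy_pass are hypothetical names, not part of mm/ksm.c.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values observed at the top of one pass over the rmap_item list. */
struct toy_pass {
	int search_new_forks;
	int search_from_root;
};

/*
 * Replays the control flow from the patch with the list walk itself
 * replaced by bookkeeping. Returns the number of passes recorded in out.
 */
size_t toy_record_passes(struct toy_pass *out, size_t cap)
{
	int search_new_forks = 0;
	int search_from_root = 0;
	size_t n = 0;

again:
	assert(n < cap);
	out[n].search_new_forks = search_new_forks;
	out[n].search_from_root = search_from_root;
	n++;

	/* Existing logic: rerun once with search_new_forks set. */
	if (!search_new_forks++)
		goto again;

	/* Added by the patch: repeat both passes from the root anon_vma. */
	if (!search_from_root++) {
		search_new_forks = 0;
		goto again;
	}
	return n;
}
```

With this structure the list is walked four times in the worst case, with 
(search_new_forks, search_from_root) equal to (0,0), (1,0), (0,1) and 
(1,1) in that order. In the real functions an early "goto out" can end 
the walk sooner, which the sketch does not model.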



-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-06-09 18:26           ` Susheel Khiani
@ 2015-06-22  5:19             ` Susheel Khiani
  -1 siblings, 0 replies; 14+ messages in thread
From: Susheel Khiani @ 2015-06-22  5:19 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 6/9/2015 11:56 PM, Susheel Khiani wrote:
> On 4/30/2015 11:37 AM, Susheel Khiani wrote:
>>> But if I've misunderstood, and you think that what you're seeing
>>> fits with the transient forking bugs I've (not quite) described,
>>> and you can explain why even the transient case is important for
>>> you to have fixed, then I really ought to redouble my efforts.
>>>
>>> Hugh
>
> I was able to root cause the issue as we got few instances of same and
> was frequently getting reproducible on stress tests. The reason why it
> was important was because failure to unmap ksm page was resulting into
> CMA allocation failure for us.
>
> For cases like fork, what we observed is for private mapped file pages,
> stable_node pointed by KSM page won't cover all the mappings until ksmd
> completes one full scan. Only after ksmd scan, new rmap_items pointing
> to mappings in child process would come into existence. So in cases like
> CMA allocations where we can't wait for ksmd to complete one full cycle,
> we can traverse anon_vma tree from parent's anon_vma to find out all the
> pages wheres CMA is mapped.
>
> I have tested the following patch on 3.10 kernel and with this change I
> am able to avoid CMA allocation failure which we were otherwise
> frequently seeing because of not able to unmap KSM page.
>
> Please review and let me know the feedback.
>
>
>
> [PATCH] ksm: Traverse through parent's anon_vma while unmapping
>
> While doing try_to_unmap_ksm, we traverse through
> rmap_item list to find out all the anon_vmas from which
> page needs to be unmapped.
>
> Now as per the design of KSM, it builds up its data
> structures by looking into each mm, and comes back a cycle
> later to find out which data structures are now outdated and
> needs to be updated. So, for cases like fork, what we
> observe is for private mapped file pages stable_node
> pointed by KSM page won't cover all the mappings until
> ksmd completes one full scan. Only after ksmd scan, new
> rmap_items pointing to mappings in child process would come
> into existence.
>
> As a result unmapping of a stable page can't be done until
> ksmd has completed one full scan. This becomes an issue in
> case of CMA where we need to unmap and move a CMA page and
> can't wait for ksmd to complete one cycle. Because of
> new rmap_items for new mapping still not created we won't be
> able to unmap CMA page from all the vmas where it is mapped.
> This would result in frequent CMA allocation failures.
>
> So instead of just relying on rmap_items list which we know
> can contain incomplete list, we also scan anon_vma tree from
> parent's anon_vma to find out all the vmas where CMA page is
> mapped and thereby successfully unmap the page and move it
> to new page.
>
> Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
> Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
> ---
>   mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 11f6293..10d5266 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct
> mem_cgroup *memcg,
>       unsigned int mapcount = page_mapcount(page);
>       int referenced = 0;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -1968,9 +1969,20 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -1999,6 +2011,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return referenced;
>   }
> @@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum
> ttu_flags flags,
>       struct rmap_item *rmap_item;
>       int ret = SWAP_AGAIN;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -2028,9 +2046,20 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -2056,6 +2085,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if(!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return ret;
>   }
> @@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int
> (*rmap_one)(struct page *,
>       struct rmap_item *rmap_item;
>       int ret = SWAP_AGAIN;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -2080,9 +2115,21 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
> +
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -2107,6 +2154,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return ret;
>   }

Reminder ping: did you get a chance to look into
the previous mail?

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation


* Re: [Question] ksm: rmap_item pointing to some stale vmas
From: Susheel Khiani @ 2015-06-22  5:19 UTC
  To: Hugh Dickins
  Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 6/9/2015 11:56 PM, Susheel Khiani wrote:
> On 4/30/2015 11:37 AM, Susheel Khiani wrote:
>>> But if I've misunderstood, and you think that what you're seeing
>>> fits with the transient forking bugs I've (not quite) described,
>>> and you can explain why even the transient case is important for
>>> you to have fixed, then I really ought to redouble my efforts.
>>>
>>> Hugh
>
> I was able to root-cause the issue, as we got a few instances of it and
> it was frequently reproducible in stress tests. It was important
> because failure to unmap a KSM page was resulting in CMA allocation
> failures for us.
>
> In cases like fork, what we observed is that for privately mapped file
> pages, the stable_node pointed to by the KSM page won't cover all the
> mappings until ksmd completes one full scan. Only after the ksmd scan do
> new rmap_items pointing to mappings in the child process come into
> existence. So in cases like CMA allocation, where we can't wait for ksmd
> to complete one full cycle, we can traverse the anon_vma tree from the
> parent's anon_vma to find all the vmas where the CMA page is mapped.
>
> I have tested the following patch on a 3.10 kernel, and with this change
> I am able to avoid the CMA allocation failures which we were otherwise
> frequently seeing because we were not able to unmap the KSM page.
>
> Please review and let me know your feedback.
>
>
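As a rough illustration of the idea described above (this is not kernel code), the toy user-space model below treats an anon_vma as a flat array of vmas instead of an interval tree and ignores locking entirely. All `toy_` names are hypothetical. It assumes that the root anon_vma links every vma in the fork hierarchy, and shows how a walk starting at the root may then reach vmas that a walk over `rmap_item->anon_vma` alone would miss.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; every name here is hypothetical. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

struct toy_anon_vma {
	struct toy_anon_vma *root;		/* top of the fork hierarchy */
	const struct toy_vma *const *vmas;	/* vmas linked to this anon_vma */
	size_t nr_vmas;
};

/*
 * Count the vmas that cover addr. With from_root == 0 only the given
 * anon_vma is walked; otherwise the walk starts at its root, which in
 * this model is assumed to link all vmas of the hierarchy.
 */
size_t toy_count_covering(const struct toy_anon_vma *av,
			  unsigned long addr, int from_root)
{
	const struct toy_anon_vma *walk = av;
	size_t i, hits = 0;

	if (!av)
		return 0;
	if (from_root && av->root)
		walk = av->root;

	for (i = 0; i < walk->nr_vmas; i++) {
		const struct toy_vma *vma = walk->vmas[i];

		/* Same range test as the rmap_item->address check in the patch. */
		if (addr < vma->vm_start || addr >= vma->vm_end)
			continue;
		hits++;
	}
	return hits;
}
```

In this model, a child anon_vma that links only the child's vma reports one covering vma, while the same query with `from_root` set also counts the parent's vma.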
>
> [PATCH] ksm: Traverse through parent's anon_vma while unmapping
>
> While doing try_to_unmap_ksm, we traverse the rmap_item
> list to find all the anon_vmas from which the page needs
> to be unmapped.
>
> By design, KSM builds up its data structures by looking
> into each mm, and comes back a cycle later to find out
> which data structures are now outdated and need to be
> updated. So, in cases like fork, what we observe is that
> for privately mapped file pages, the stable_node pointed
> to by the KSM page won't cover all the mappings until
> ksmd completes one full scan. Only after the ksmd scan
> do new rmap_items pointing to mappings in the child
> process come into existence.
>
> As a result, unmapping of a stable page can't be completed
> until ksmd has finished one full scan. This becomes an
> issue for CMA, where we need to unmap and move a CMA page
> and can't wait for ksmd to complete one cycle. Because the
> rmap_items for the new mappings have not yet been created,
> we can't unmap the CMA page from all the vmas where it is
> mapped. This results in frequent CMA allocation failures.
>
> So instead of relying only on the rmap_item list, which we
> know can be incomplete, we also scan the anon_vma tree from
> the parent's anon_vma to find all the vmas where the CMA
> page is mapped, and thereby successfully unmap the page and
> move it to a new page.
>
> Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
> Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
> ---
>   mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 11f6293..10d5266 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
>       unsigned int mapcount = page_mapcount(page);
>       int referenced = 0;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -1968,9 +1969,20 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -1999,6 +2011,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return referenced;
>   }
> @@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum ttu_flags flags,
>       struct rmap_item *rmap_item;
>       int ret = SWAP_AGAIN;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -2028,9 +2046,20 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -2056,6 +2085,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return ret;
>   }
> @@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
>       struct rmap_item *rmap_item;
>       int ret = SWAP_AGAIN;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -2080,9 +2115,21 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
> +
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -2107,6 +2154,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return ret;
>   }
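
The retry logic that the quoted hunks add after each list walk can be replayed in user space. The sketch below is only a model: `walk_passes` and `struct pass` are hypothetical names, and the actual walk over the rmap_item list is elided. It records the flag values that each pass would see.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of the flags in effect during one pass. */
struct pass {
	bool new_forks;	/* search_new_forks was non-zero */
	bool from_root;	/* search_from_root was non-zero */
};

/*
 * Replay the "goto again" flow of the patched walkers. The flag state of
 * each pass is stored in out[] (up to max entries); the return value is
 * the total number of passes made.
 */
size_t walk_passes(struct pass *out, size_t max)
{
	int search_new_forks = 0;
	int search_from_root = 0;
	size_t n = 0;

again:
	if (n < max) {
		out[n].new_forks = search_new_forks != 0;
		out[n].from_root = search_from_root != 0;
	}
	n++;
	/* ... the real code walks the stable_node's rmap_item list here ... */

	if (!search_new_forks++)
		goto again;

	if (!search_from_root++) {
		/* Restart both fork passes, this time from the root anon_vma. */
		search_new_forks = 0;
		goto again;
	}
	return n;
}
```

In this model `walk_passes(buf, 8)` returns 4: the original two passes, followed by the same two passes repeated with `from_root` set.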

Reminder ping: did you get a chance to look into
the previous mail?

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation



Thread overview: 14+ messages
2015-04-09 14:05 [Question] ksm: rmap_item pointing to some stale vmas Susheel Khiani
2015-04-09 14:05 ` Susheel Khiani
2015-04-10 17:56 ` Hugh Dickins
2015-04-10 17:56   ` Hugh Dickins
2015-04-14  7:01   ` Susheel Khiani
2015-04-14  7:01     ` Susheel Khiani
2015-04-15  6:22     ` Hugh Dickins
2015-04-15  6:22       ` Hugh Dickins
2015-04-30  6:07       ` Susheel Khiani
2015-04-30  6:07         ` Susheel Khiani
2015-06-09 18:26         ` Susheel Khiani
2015-06-09 18:26           ` Susheel Khiani
2015-06-22  5:19           ` Susheel Khiani
2015-06-22  5:19             ` Susheel Khiani