* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-12 20:16 ` Minchan Kim
@ 2020-03-12 20:26 ` Dave Hansen
2020-03-12 20:41 ` Michal Hocko
` (2 subsequent siblings)
3 siblings, 0 replies; 34+ messages in thread
From: Dave Hansen @ 2020-03-12 20:26 UTC (permalink / raw)
To: Minchan Kim, Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione,
Joel Fernandes (Google),
Andrew Morton
On 3/12/20 1:16 PM, Minchan Kim wrote:
> On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> I't likde to wait Jann's reply since Dave gave his opinion about the vulnerability.
> https://lore.kernel.org/linux-mm/cf95db88-968d-fee5-1c15-10d024c09d8a@intel.com/
> Jann, could you give your insigh about that practically it's possible?
FWIW, just checking for mapcount>=1 seems like a pretty sane fix to me.
I went looking at doing it another way, but Michal was quite correct.
We'd probably end up having to special-case something underneath
shrink_page_list().
> A real dumb question to understand vulnerability:
>
> The attacker would be able to trigger heavy memory consumption so that he
> could make paging them out without MADV_PAGEOUT. I know MADV_PAGEOUT makes
> it easier but he still could do without MADV_PAGEOUT.
> What makes difference here?
Causing memory pressure is quite a bit more disruptive than
MADV_PAGEOUT. It's a much more blunt instrument and is likely to result
in a lot of collateral damage and a lot of I/O.
MADV_PAGEOUT is *surgical*. You can target one very specific page if,
for instance, you think that your victim is reading it in a way that is
vulnerable. You can also do it with zero I/O (after the initial pageout).
> To clarify how MADV_PAGEWORK works:
> If other process has accessed the page so that his page table has access
> bit marked, MADV_PAGEOUT couldn't page it out.
The attacker doesn't need to get the victim to get a major fault, it
just needs to induce *a* fault. I actually did an experiment to see how
this would work in practice.
1. Allocate some memory(), touch it
2. fork()
3. In the parent: Loop reading the memory
4. In the child: loop running MADV_PAGEOUT
The pages stayed in the swap cache and the parent reading the memory saw
a constant stream of faults.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-12 20:16 ` Minchan Kim
2020-03-12 20:26 ` Dave Hansen
@ 2020-03-12 20:41 ` Michal Hocko
2020-03-13 2:08 ` Minchan Kim
2020-03-12 21:41 ` Dave Hansen
2020-03-12 23:29 ` Jann Horn
3 siblings, 1 reply; 34+ messages in thread
From: Michal Hocko @ 2020-03-12 20:41 UTC (permalink / raw)
To: Minchan Kim
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
[...]
> > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> >
> > Jann has brought up a very interesting point [1]. While shared pages are
> > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > that way. This can lead to all sorts of hard to debug problems. E.g.
> > performance problems outlined by Daniel [2]. There are runtime
> > environments where there is a substantial memory shared among security
> > domains via CoW memory and a easy to reclaim way of that memory, which
> > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > in for the parent process which might be more privileged or even open
> > side channel attacks. The feasibility of the later is not really clear
>
> I am not sure it's a good idea to mention performance stuff because
> it's rather arguble. You and Johannes already pointed it out when I sbumit
> early draft which had shared page filtering out logic due to performance
> reason. You guys suggested the shared pages has higher chance to be touched
> so that if it's really hot pages, that whould keep in the memory. I agree.
Yes, the hot memory is likely to be referenced but the point was an
unexpected latency because of the major fault. I have to say that I have
underestimated the issue because I was not aware of runtimes mentioned
in the referenced links. Essentially a lot of CoW memory shared over
security domains.
> I think the only reason at this moment is just vulnerability.
>
> > to me TBH but there is no real reason for exposure at this stage. It
> > seems there is no real use case to depend on reclaiming CoW memory via
> > madvise at this stage so it is much easier to simply disallow it and
> > this is what this patch does. Put it simply MADV_{PAGEOUT,COLD} can
> > operate only on the exclusively owned memory which is a straightforward
> > semantic.
> >
> > [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
> > [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> > mm/madvise.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 43b47d3fae02..4bb30ed6c8d2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -335,12 +335,14 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> > }
> >
> > page = pmd_page(orig_pmd);
> > +
> > + /* Do not interfere with other mappings of this page */
>
>
> How about this?
> /*
> * paging out only single mapped private pages for anonymous mapping,
> * otherwise, it opens a side channel.
> */
I am not sure this is much more helpful without a larger context. I
would stick with the wording unless you insist.
> Otherwise, looks good to me.
Thanks for the review.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-12 20:41 ` Michal Hocko
@ 2020-03-13 2:08 ` Minchan Kim
2020-03-13 8:05 ` Michal Hocko
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2020-03-13 2:08 UTC (permalink / raw)
To: Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> [...]
> > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > >
> > > Jann has brought up a very interesting point [1]. While shared pages are
> > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > performance problems outlined by Daniel [2]. There are runtime
> > > environments where there is a substantial memory shared among security
> > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > in for the parent process which might be more privileged or even open
> > > side channel attacks. The feasibility of the later is not really clear
> >
> > I am not sure it's a good idea to mention performance stuff because
> > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > early draft which had shared page filtering out logic due to performance
> > reason. You guys suggested the shared pages has higher chance to be touched
> > so that if it's really hot pages, that whould keep in the memory. I agree.
>
> Yes, the hot memory is likely to be referenced but the point was an
> unexpected latency because of the major fault. I have to say that I have
I don't understand your point here. If it's likely to be referenced
among several processes, it doesn't have the major fault latency.
What's your point here?
> underestimated the issue because I was not aware of runtimes mentioned
> in the referenced links. Essentially a lot of CoW memory shared over
> security domains.
I tend to agree about security part in the description, but still don't
agree with performance concern in the description so I'd like to remove
it in the description. Current situation is caused by security concern
unfortunately, not performance reason.
>
> > I think the only reason at this moment is just vulnerability.
> >
> > > to me TBH but there is no real reason for exposure at this stage. It
> > > seems there is no real use case to depend on reclaiming CoW memory via
> > > madvise at this stage so it is much easier to simply disallow it and
> > > this is what this patch does. Put it simply MADV_{PAGEOUT,COLD} can
> > > operate only on the exclusively owned memory which is a straightforward
> > > semantic.
> > >
> > > [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
> > > [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > > mm/madvise.c | 12 +++++++++---
> > > 1 file changed, 9 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index 43b47d3fae02..4bb30ed6c8d2 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -335,12 +335,14 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> > > }
> > >
> > > page = pmd_page(orig_pmd);
> > > +
> > > + /* Do not interfere with other mappings of this page */
> >
> >
> > How about this?
> > /*
> > * paging out only single mapped private pages for anonymous mapping,
> > * otherwise, it opens a side channel.
> > */
>
> I am not sure this is much more helpful without a larger context. I
> would stick with the wording unless you insist.
The comment you provides explain what code does, not *why*.
Comment is always lack of explaining the whole story but we try to
demonstrate a certain context to make reader careful.
Havind said, I will not insist on it.
Thanks.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-13 2:08 ` Minchan Kim
@ 2020-03-13 8:05 ` Michal Hocko
2020-03-13 20:59 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Michal Hocko @ 2020-03-13 8:05 UTC (permalink / raw)
To: Minchan Kim
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Thu 12-03-20 19:08:51, Minchan Kim wrote:
> On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> > On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > [...]
> > > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > > >
> > > > Jann has brought up a very interesting point [1]. While shared pages are
> > > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > > performance problems outlined by Daniel [2]. There are runtime
> > > > environments where there is a substantial memory shared among security
> > > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > > in for the parent process which might be more privileged or even open
> > > > side channel attacks. The feasibility of the later is not really clear
> > >
> > > I am not sure it's a good idea to mention performance stuff because
> > > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > > early draft which had shared page filtering out logic due to performance
> > > reason. You guys suggested the shared pages has higher chance to be touched
> > > so that if it's really hot pages, that whould keep in the memory. I agree.
> >
> > Yes, the hot memory is likely to be referenced but the point was an
> > unexpected latency because of the major fault. I have to say that I have
>
> I don't understand your point here. If it's likely to be referenced
> among several processes, it doesn't have the major fault latency.
> What's your point here?
a) the particular CoW page might be cold enough to be reclaimed and b)
nothing really prevents the MADV_PAGEOUT to be called faster than the
reference bit being readded.
> > underestimated the issue because I was not aware of runtimes mentioned
> > in the referenced links. Essentially a lot of CoW memory shared over
> > security domains.
>
> I tend to agree about security part in the description, but still don't
> agree with performance concern in the description so I'd like to remove
> it in the description. Current situation is caused by security concern
> unfortunately, not performance reason.
Well, I have to admit that I haven't seen any actual numbers here but
considering zygote like workload I would rather not even risk it. Even
if the risk is theoretical I would rather put the restriction and
mention it in the changelog. If somebody would like to drop this
restriction it is at least more clear what to test for.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-13 8:05 ` Michal Hocko
@ 2020-03-13 20:59 ` Minchan Kim
2020-03-16 9:20 ` Michal Hocko
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2020-03-13 20:59 UTC (permalink / raw)
To: Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Fri, Mar 13, 2020 at 09:05:46AM +0100, Michal Hocko wrote:
> On Thu 12-03-20 19:08:51, Minchan Kim wrote:
> > On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> > > On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > > > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > > [...]
> > > > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > > > >
> > > > > Jann has brought up a very interesting point [1]. While shared pages are
> > > > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > > > performance problems outlined by Daniel [2]. There are runtime
> > > > > environments where there is a substantial memory shared among security
> > > > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > > > in for the parent process which might be more privileged or even open
> > > > > side channel attacks. The feasibility of the later is not really clear
> > > >
> > > > I am not sure it's a good idea to mention performance stuff because
> > > > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > > > early draft which had shared page filtering out logic due to performance
> > > > reason. You guys suggested the shared pages has higher chance to be touched
> > > > so that if it's really hot pages, that whould keep in the memory. I agree.
> > >
> > > Yes, the hot memory is likely to be referenced but the point was an
> > > unexpected latency because of the major fault. I have to say that I have
> >
> > I don't understand your point here. If it's likely to be referenced
> > among several processes, it doesn't have the major fault latency.
> > What's your point here?
>
> a) the particular CoW page might be cold enough to be reclaimed and b)
If it is, that means it's *cold* so it's really worth to be reclaimed.
> nothing really prevents the MADV_PAGEOUT to be called faster than the
> reference bit being readded.
Yeb, that's undesirable. I should admit it was not intended when I implemented
PAGEOUT. The thing is page_check_references clears access bit of pte for every
process are sharing the page so that two times MADV_PAGEOUT from a process could
evict the page. That's the really bug. It shouldn't have cleared the
access bit for other processes. What I wanted to do with MADV_PAGEOUT is just
check the reference from the other processes without clearing access bit of pte
and if it found other process pte has access bit, just bail out.
Okay so you're right. Current implementation could cause performance impact
since MADV_PAGEOUT unconditionally clear the access bit of other processes so
your patch will fix it.
Thanks.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-13 20:59 ` Minchan Kim
@ 2020-03-16 9:20 ` Michal Hocko
2020-03-17 1:43 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Michal Hocko @ 2020-03-16 9:20 UTC (permalink / raw)
To: Minchan Kim
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Fri 13-03-20 13:59:41, Minchan Kim wrote:
> On Fri, Mar 13, 2020 at 09:05:46AM +0100, Michal Hocko wrote:
> > On Thu 12-03-20 19:08:51, Minchan Kim wrote:
> > > On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> > > > On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > > > > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > > > [...]
> > > > > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > > > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > > > > >
> > > > > > Jann has brought up a very interesting point [1]. While shared pages are
> > > > > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > > > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > > > > performance problems outlined by Daniel [2]. There are runtime
> > > > > > environments where there is a substantial memory shared among security
> > > > > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > > > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > > > > in for the parent process which might be more privileged or even open
> > > > > > side channel attacks. The feasibility of the later is not really clear
> > > > >
> > > > > I am not sure it's a good idea to mention performance stuff because
> > > > > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > > > > early draft which had shared page filtering out logic due to performance
> > > > > reason. You guys suggested the shared pages has higher chance to be touched
> > > > > so that if it's really hot pages, that whould keep in the memory. I agree.
> > > >
> > > > Yes, the hot memory is likely to be referenced but the point was an
> > > > unexpected latency because of the major fault. I have to say that I have
> > >
> > > I don't understand your point here. If it's likely to be referenced
> > > among several processes, it doesn't have the major fault latency.
> > > What's your point here?
> >
> > a) the particular CoW page might be cold enough to be reclaimed and b)
>
> If it is, that means it's *cold* so it's really worth to be reclaimed.
>
> > nothing really prevents the MADV_PAGEOUT to be called faster than the
> > reference bit being readded.
>
> Yeb, that's undesirable. I should admit it was not intended when I implemented
> PAGEOUT. The thing is page_check_references clears access bit of pte for every
> process are sharing the page so that two times MADV_PAGEOUT from a process could
> evict the page. That's the really bug.
I do not really think this is a bug. This is a side effect of the
reclaim process and we do not really want MADV_{PAGEOUT,COLD} behave
differently here because then the behavior would be even harder to
understand.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-16 9:20 ` Michal Hocko
@ 2020-03-17 1:43 ` Minchan Kim
2020-03-17 7:12 ` Michal Hocko
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2020-03-17 1:43 UTC (permalink / raw)
To: Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Mon, Mar 16, 2020 at 10:20:52AM +0100, Michal Hocko wrote:
> On Fri 13-03-20 13:59:41, Minchan Kim wrote:
> > On Fri, Mar 13, 2020 at 09:05:46AM +0100, Michal Hocko wrote:
> > > On Thu 12-03-20 19:08:51, Minchan Kim wrote:
> > > > On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> > > > > On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > > > > > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > > > > [...]
> > > > > > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > > > > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > > > > > >
> > > > > > > Jann has brought up a very interesting point [1]. While shared pages are
> > > > > > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > > > > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > > > > > performance problems outlined by Daniel [2]. There are runtime
> > > > > > > environments where there is a substantial memory shared among security
> > > > > > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > > > > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > > > > > in for the parent process which might be more privileged or even open
> > > > > > > side channel attacks. The feasibility of the later is not really clear
> > > > > >
> > > > > > I am not sure it's a good idea to mention performance stuff because
> > > > > > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > > > > > early draft which had shared page filtering out logic due to performance
> > > > > > reason. You guys suggested the shared pages has higher chance to be touched
> > > > > > so that if it's really hot pages, that whould keep in the memory. I agree.
> > > > >
> > > > > Yes, the hot memory is likely to be referenced but the point was an
> > > > > unexpected latency because of the major fault. I have to say that I have
> > > >
> > > > I don't understand your point here. If it's likely to be referenced
> > > > among several processes, it doesn't have the major fault latency.
> > > > What's your point here?
> > >
> > > a) the particular CoW page might be cold enough to be reclaimed and b)
> >
> > If it is, that means it's *cold* so it's really worth to be reclaimed.
> >
> > > nothing really prevents the MADV_PAGEOUT to be called faster than the
> > > reference bit being readded.
> >
> > Yeb, that's undesirable. I should admit it was not intended when I implemented
> > PAGEOUT. The thing is page_check_references clears access bit of pte for every
> > process are sharing the page so that two times MADV_PAGEOUT from a process could
> > evict the page. That's the really bug.
>
> I do not really think this is a bug. This is a side effect of the
> reclaim process and we do not really want MADV_{PAGEOUT,COLD} behave
No, that's the bug since we didn't consider the side effect.
> differently here because then the behavior would be even harder to
No, I do want to have difference because it's per-process hint. IOW,
what he know is for only his context, not others so it shouldn't clean
others' pte. That makes difference between LRU aging and the hint.
> understand.
It's not hard to understand.. MADV_PAGEOUT should consider only his
context since it's per-process hint(Even, he couldn't know others'
context) so it shouldn't bother others.
Actually, Dave's suggestion is correct to fix the issue if there
was no isse with side channel attack. However, due to the attack
issue, page_mapcount could prevent the problem effectively.
That's why I am not against of the patch now since it fixes
the bug as well as vulnerability.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-17 1:43 ` Minchan Kim
@ 2020-03-17 7:12 ` Michal Hocko
2020-03-17 15:00 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Michal Hocko @ 2020-03-17 7:12 UTC (permalink / raw)
To: Minchan Kim
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Mon 16-03-20 18:43:40, Minchan Kim wrote:
> On Mon, Mar 16, 2020 at 10:20:52AM +0100, Michal Hocko wrote:
> > On Fri 13-03-20 13:59:41, Minchan Kim wrote:
> > > On Fri, Mar 13, 2020 at 09:05:46AM +0100, Michal Hocko wrote:
> > > > On Thu 12-03-20 19:08:51, Minchan Kim wrote:
> > > > > On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> > > > > > On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > > > > > > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > > > > > [...]
> > > > > > > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > > > > > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > > > > > > >
> > > > > > > > Jann has brought up a very interesting point [1]. While shared pages are
> > > > > > > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > > > > > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > > > > > > performance problems outlined by Daniel [2]. There are runtime
> > > > > > > > environments where there is a substantial memory shared among security
> > > > > > > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > > > > > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > > > > > > in for the parent process which might be more privileged or even open
> > > > > > > > side channel attacks. The feasibility of the later is not really clear
> > > > > > >
> > > > > > > I am not sure it's a good idea to mention performance stuff because
> > > > > > > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > > > > > > early draft which had shared page filtering out logic due to performance
> > > > > > > reason. You guys suggested the shared pages has higher chance to be touched
> > > > > > > so that if it's really hot pages, that whould keep in the memory. I agree.
> > > > > >
> > > > > > Yes, the hot memory is likely to be referenced but the point was an
> > > > > > unexpected latency because of the major fault. I have to say that I have
> > > > >
> > > > > I don't understand your point here. If it's likely to be referenced
> > > > > among several processes, it doesn't have the major fault latency.
> > > > > What's your point here?
> > > >
> > > > a) the particular CoW page might be cold enough to be reclaimed and b)
> > >
> > > If it is, that means it's *cold* so it's really worth to be reclaimed.
> > >
> > > > nothing really prevents the MADV_PAGEOUT to be called faster than the
> > > > reference bit being readded.
> > >
> > > Yeb, that's undesirable. I should admit it was not intended when I implemented
> > > PAGEOUT. The thing is page_check_references clears access bit of pte for every
> > > process are sharing the page so that two times MADV_PAGEOUT from a process could
> > > evict the page. That's the really bug.
> >
> > I do not really think this is a bug. This is a side effect of the
> > reclaim process and we do not really want MADV_{PAGEOUT,COLD} behave
>
> No, that's the bug since we didn't consider the side effect.
>
> > differently here because then the behavior would be even harder to
>
> No, I do want to have difference because it's per-process hint. IOW,
> what he know is for only his context, not others so it shouldn't clean
> others' pte. That makes difference between LRU aging and the hint.
Just to make it clear, are you really suggesting to special case
page_check_references for madvise path?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-17 7:12 ` Michal Hocko
@ 2020-03-17 15:00 ` Minchan Kim
2020-03-17 15:58 ` Michal Hocko
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2020-03-17 15:00 UTC (permalink / raw)
To: Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Tue, Mar 17, 2020 at 08:12:39AM +0100, Michal Hocko wrote:
> On Mon 16-03-20 18:43:40, Minchan Kim wrote:
> > On Mon, Mar 16, 2020 at 10:20:52AM +0100, Michal Hocko wrote:
> > > On Fri 13-03-20 13:59:41, Minchan Kim wrote:
> > > > On Fri, Mar 13, 2020 at 09:05:46AM +0100, Michal Hocko wrote:
> > > > > On Thu 12-03-20 19:08:51, Minchan Kim wrote:
> > > > > > On Thu, Mar 12, 2020 at 09:41:55PM +0100, Michal Hocko wrote:
> > > > > > > On Thu 12-03-20 13:16:02, Minchan Kim wrote:
> > > > > > > > On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > > > > > > [...]
> > > > > > > > > From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
> > > > > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > > > > Date: Thu, 12 Mar 2020 09:04:35 +0100
> > > > > > > > > Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
> > > > > > > > >
> > > > > > > > > Jann has brought up a very interesting point [1]. While shared pages are
> > > > > > > > > excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
> > > > > > > > > that way. This can lead to all sorts of hard to debug problems. E.g.
> > > > > > > > > performance problems outlined by Daniel [2]. There are runtime
> > > > > > > > > environments where there is a substantial memory shared among security
> > > > > > > > > domains via CoW memory and a easy to reclaim way of that memory, which
> > > > > > > > > MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
> > > > > > > > > in for the parent process which might be more privileged or even open
> > > > > > > > > side channel attacks. The feasibility of the later is not really clear
> > > > > > > >
> > > > > > > > I am not sure it's a good idea to mention performance stuff because
> > > > > > > > it's rather arguble. You and Johannes already pointed it out when I sbumit
> > > > > > > > early draft which had shared page filtering out logic due to performance
> > > > > > > > reason. You guys suggested the shared pages has higher chance to be touched
> > > > > > > > so that if it's really hot pages, that whould keep in the memory. I agree.
> > > > > > >
> > > > > > > Yes, the hot memory is likely to be referenced but the point was an
> > > > > > > unexpected latency because of the major fault. I have to say that I have
> > > > > >
> > > > > > I don't understand your point here. If it's likely to be referenced
> > > > > > among several processes, it doesn't have the major fault latency.
> > > > > > What's your point here?
> > > > >
> > > > > a) the particular CoW page might be cold enough to be reclaimed and b)
> > > >
> > > > If it is, that means it's *cold* so it's really worth to be reclaimed.
> > > >
> > > > > nothing really prevents the MADV_PAGEOUT to be called faster than the
> > > > > reference bit being readded.
> > > >
> > > > Yeb, that's undesirable. I should admit it was not intended when I implemented
> > > > PAGEOUT. The thing is page_check_references clears access bit of pte for every
> > > > process are sharing the page so that two times MADV_PAGEOUT from a process could
> > > > evict the page. That's the really bug.
> > >
> > > I do not really think this is a bug. This is a side effect of the
> > > reclaim process and we do not really want MADV_{PAGEOUT,COLD} behave
> >
> > No, that's the bug since we didn't consider the side effect.
> >
> > > differently here because then the behavior would be even harder to
> >
> > No, I do want to have difference because it's per-process hint. IOW,
> > what he know is for only his context, not others so it shouldn't clean
> > others' pte. That makes difference between LRU aging and the hint.
>
> Just to make it clear, are you really suggesting to special case
> page_check_references for madvise path?
>
No, (page_mapcount() > 1) checks *effectively* fixes the performance
bug as well as vulnerability issue.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-17 15:00 ` Minchan Kim
@ 2020-03-17 15:58 ` Michal Hocko
2020-03-17 17:20 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Michal Hocko @ 2020-03-17 15:58 UTC (permalink / raw)
To: Minchan Kim
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Tue 17-03-20 08:00:55, Minchan Kim wrote:
> On Tue, Mar 17, 2020 at 08:12:39AM +0100, Michal Hocko wrote:
[...]
> > Just to make it clear, are you really suggesting to special case
> > page_check_references for madvise path?
> >
>
> No, (page_mapcount() > 1) checks *effectively* fixes the performance
> bug as well as vulnerability issue.
Ahh, ok then we are on the same page. You were replying to the part
where I have pointed out that you can control aging by these calls
and your response suggested that this is somehow undesirable behavior or
even a bug.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-17 15:58 ` Michal Hocko
@ 2020-03-17 17:20 ` Minchan Kim
0 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2020-03-17 17:20 UTC (permalink / raw)
To: Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione, Dave Hansen,
Joel Fernandes (Google),
Andrew Morton
On Tue, Mar 17, 2020 at 04:58:55PM +0100, Michal Hocko wrote:
> On Tue 17-03-20 08:00:55, Minchan Kim wrote:
> > On Tue, Mar 17, 2020 at 08:12:39AM +0100, Michal Hocko wrote:
> [...]
> > > Just to make it clear, are you really suggesting to special case
> > > page_check_references for madvise path?
> > >
> >
> > No, (page_mapcount() > 1) checks *effectively* fixes the performance
> > bug as well as vulnerability issue.
>
> Ahh, ok then we are on the same page. You were replying to the part
> where I have pointed out that you can control aging by these calls
> and your response suggested that this is somehow undesirable behavior or
> even a bug.
Sorry about the confusing.
I want to clarify my speaking.
If we don't have vulnerability issue Jann raised, the performance issue
Daniel pointed should be fixed by introducing a special flag in
page_check_references from madvise path to avoid cleaning of access bit
from other processes's pte. With it, we don't need to limit semantic of
MADV_PAGEOUT as "exclusive page only" so that MADV_PAGEOUT will work
*cold* shared pages as well as exclusive one.
However, since we have the vulnerability issue, *unfortunately*, we need
to make MADV_PAGEOUT's semantic working with only exclusive page.
Thus, page_mapcount check in madvise patch will fix both issues
*effectively*.
Thanks.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-12 20:16 ` Minchan Kim
2020-03-12 20:26 ` Dave Hansen
2020-03-12 20:41 ` Michal Hocko
@ 2020-03-12 21:41 ` Dave Hansen
2020-03-13 2:00 ` Minchan Kim
2020-03-12 23:29 ` Jann Horn
3 siblings, 1 reply; 34+ messages in thread
From: Dave Hansen @ 2020-03-12 21:41 UTC (permalink / raw)
To: Minchan Kim, Michal Hocko
Cc: Jann Horn, Linux-MM, kernel list, Daniel Colascione,
Joel Fernandes (Google),
Andrew Morton
[-- Attachment #1: Type: text/plain, Size: 694 bytes --]
One other fun thing. I have a "victim" thread sitting in a loop doing:
sleep(1)
memcpy(&garbage, buffer, sz);
The "attacker" is doing
madvise(buffer, sz, MADV_PAGEOUT);
in a loop. That, oddly enough doesn't cause the victim to page fault.
But, if I do:
memcpy(&garbage, buffer, sz);
madvise(buffer, sz, MADV_PAGEOUT);
It *does* cause the memory to get paged out. The MADV_PAGEOUT code
actually has a !pte_present() check. It will punt on a PTE if it sees
it. In other words, if a page is in the swap cache but not mapped by a
pte_present() PTE, MADV_PAGEOUT won't touch it.
Shouldn't MADV_PAGEOUT be able to find and reclaim those pages? Patch
attached.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: madv-pageout-find-swap-cache.patch --]
[-- Type: text/x-patch; name="madv-pageout-find-swap-cache.patch", Size: 1692 bytes --]
---
b/mm/madvise.c | 38 +++++++++++++++++++++++++++++++-------
1 file changed, 31 insertions(+), 7 deletions(-)
diff -puN mm/madvise.c~madv-pageout-find-swap-cache mm/madvise.c
--- a/mm/madvise.c~madv-pageout-find-swap-cache 2020-03-12 14:24:45.178775035 -0700
+++ b/mm/madvise.c 2020-03-12 14:35:49.706773378 -0700
@@ -248,6 +248,36 @@ static void force_shm_swapin_readahead(s
#endif /* CONFIG_SWAP */
/*
+ * Given a PTE, find the corresponding 'struct page'. Also handles
+ * non-present swap PTEs.
+ */
+struct page *pte_to_reclaim_page(struct vm_area_struct *vma,
+ unsigned long addr, pte_t ptent)
+{
+ swp_entry_t entry;
+
+ /* Totally empty PTE: */
+ if (pte_none(ptent))
+ return NULL;
+
+ /* A normal, present page is mapped: */
+ if (pte_present(ptent))
+ return vm_normal_page(vma, addr, ptent);
+
+ entry = pte_to_swp_entry(vmf->orig_pte);
+ /* Is it one of the "swap PTEs" that's not really swap? */
+ if (non_swap_entry(entry))
+ return false;
+
+ /*
+ * The PTE was a true swap entry. The page may be in the
+ * swap cache. If so, find it and return it so it may be
+ * reclaimed.
+ */
+ return lookup_swap_cache(entry, vma, addr);
+}
+
+/*
* Schedule all required I/O operations. Do not wait for completion.
*/
static long madvise_willneed(struct vm_area_struct *vma,
@@ -389,13 +419,7 @@ regular_page:
for (; addr < end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
- if (pte_none(ptent))
- continue;
-
- if (!pte_present(ptent))
- continue;
-
- page = vm_normal_page(vma, addr, ptent);
+ page = pte_to_reclaim_page(vma, addr, ptent);
if (!page)
continue;
_
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-12 21:41 ` Dave Hansen
@ 2020-03-13 2:00 ` Minchan Kim
2020-03-13 16:59 ` Dave Hansen
0 siblings, 1 reply; 34+ messages in thread
From: Minchan Kim @ 2020-03-13 2:00 UTC (permalink / raw)
To: Dave Hansen
Cc: Michal Hocko, Jann Horn, Linux-MM, kernel list,
Daniel Colascione, Joel Fernandes (Google),
Andrew Morton
On Thu, Mar 12, 2020 at 02:41:07PM -0700, Dave Hansen wrote:
> One other fun thing. I have a "victim" thread sitting in a loop doing:
>
> sleep(1)
> memcpy(&garbage, buffer, sz);
>
> The "attacker" is doing
>
> madvise(buffer, sz, MADV_PAGEOUT);
>
> in a loop. That, oddly enough doesn't cause the victim to page fault.
> But, if I do:
>
> memcpy(&garbage, buffer, sz);
> madvise(buffer, sz, MADV_PAGEOUT);
>
> It *does* cause the memory to get paged out. The MADV_PAGEOUT code
> actually has a !pte_present() check. It will punt on a PTE if it sees
> it. In other words, if a page is in the swap cache but not mapped by a
> pte_present() PTE, MADV_PAGEOUT won't touch it.
>
> Shouldn't MADV_PAGEOUT be able to find and reclaim those pages? Patch
> attached.
>
>
> ---
>
> b/mm/madvise.c | 38 +++++++++++++++++++++++++++++++-------
> 1 file changed, 31 insertions(+), 7 deletions(-)
>
> diff -puN mm/madvise.c~madv-pageout-find-swap-cache mm/madvise.c
> --- a/mm/madvise.c~madv-pageout-find-swap-cache 2020-03-12 14:24:45.178775035 -0700
> +++ b/mm/madvise.c 2020-03-12 14:35:49.706773378 -0700
> @@ -248,6 +248,36 @@ static void force_shm_swapin_readahead(s
> #endif /* CONFIG_SWAP */
>
> /*
> + * Given a PTE, find the corresponding 'struct page'. Also handles
> + * non-present swap PTEs.
> + */
> +struct page *pte_to_reclaim_page(struct vm_area_struct *vma,
> + unsigned long addr, pte_t ptent)
> +{
> + swp_entry_t entry;
> +
> + /* Totally empty PTE: */
> + if (pte_none(ptent))
> + return NULL;
> +
> + /* A normal, present page is mapped: */
> + if (pte_present(ptent))
> + return vm_normal_page(vma, addr, ptent);
> +
Please check is_swap_pte first.
> + entry = pte_to_swp_entry(vmf->orig_pte);
> + /* Is it one of the "swap PTEs" that's not really swap? */
> + if (non_swap_entry(entry))
> + return false;
> +
> + /*
> + * The PTE was a true swap entry. The page may be in the
> + * swap cache. If so, find it and return it so it may be
> + * reclaimed.
> + */
> + return lookup_swap_cache(entry, vma, addr);
If we go with handling only exclusived owned page for anon,
I think we should apply the rule to swap cache, too.
Do you mind posting it as formal patch?
Thanks for the explain about vulnerability and the patch, Dave!
> +}
> +
> +/*
> * Schedule all required I/O operations. Do not wait for completion.
> */
> static long madvise_willneed(struct vm_area_struct *vma,
> @@ -389,13 +419,7 @@ regular_page:
> for (; addr < end; pte++, addr += PAGE_SIZE) {
> ptent = *pte;
>
> - if (pte_none(ptent))
> - continue;
> -
> - if (!pte_present(ptent))
> - continue;
> -
> - page = vm_normal_page(vma, addr, ptent);
> + page = pte_to_reclaim_page(vma, addr, ptent);
> if (!page)
> continue;
>
> _
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-13 2:00 ` Minchan Kim
@ 2020-03-13 16:59 ` Dave Hansen
2020-03-13 21:13 ` Minchan Kim
0 siblings, 1 reply; 34+ messages in thread
From: Dave Hansen @ 2020-03-13 16:59 UTC (permalink / raw)
To: Minchan Kim
Cc: Michal Hocko, Jann Horn, Linux-MM, kernel list,
Daniel Colascione, Joel Fernandes (Google),
Andrew Morton
On 3/12/20 7:00 PM, Minchan Kim wrote:
> On Thu, Mar 12, 2020 at 02:41:07PM -0700, Dave Hansen wrote:
>> One other fun thing. I have a "victim" thread sitting in a loop doing:
>>
>> sleep(1)
>> memcpy(&garbage, buffer, sz);
>>
>> The "attacker" is doing
>>
>> madvise(buffer, sz, MADV_PAGEOUT);
>>
>> in a loop. That, oddly enough doesn't cause the victim to page fault.
>> But, if I do:
>>
>> memcpy(&garbage, buffer, sz);
>> madvise(buffer, sz, MADV_PAGEOUT);
>>
>> It *does* cause the memory to get paged out. The MADV_PAGEOUT code
>> actually has a !pte_present() check. It will punt on a PTE if it sees
>> it. In other words, if a page is in the swap cache but not mapped by a
>> pte_present() PTE, MADV_PAGEOUT won't touch it.
>>
>> Shouldn't MADV_PAGEOUT be able to find and reclaim those pages? Patch
>> attached.
>
>>
>>
>> ---
>>
>> b/mm/madvise.c | 38 +++++++++++++++++++++++++++++++-------
>> 1 file changed, 31 insertions(+), 7 deletions(-)
>>
>> diff -puN mm/madvise.c~madv-pageout-find-swap-cache mm/madvise.c
>> --- a/mm/madvise.c~madv-pageout-find-swap-cache 2020-03-12 14:24:45.178775035 -0700
>> +++ b/mm/madvise.c 2020-03-12 14:35:49.706773378 -0700
>> @@ -248,6 +248,36 @@ static void force_shm_swapin_readahead(s
>> #endif /* CONFIG_SWAP */
>>
>> /*
>> + * Given a PTE, find the corresponding 'struct page'. Also handles
>> + * non-present swap PTEs.
>> + */
>> +struct page *pte_to_reclaim_page(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t ptent)
>> +{
>> + swp_entry_t entry;
>> +
>> + /* Totally empty PTE: */
>> + if (pte_none(ptent))
>> + return NULL;
>> +
>> + /* A normal, present page is mapped: */
>> + if (pte_present(ptent))
>> + return vm_normal_page(vma, addr, ptent);
>> +
>
> Please check is_swap_pte first.
Why?
is_swap_pte() duplicates the first two checks. But, I need an explicit
pte_present() check somewhere because I need to call vm_normal_page()
only on present PTEs.
I guess the pte_present() check could be:
if (!is_swap_pte(ptent))
return vm_normal_page(...);
*after* the pte_none() check.
>> + entry = pte_to_swp_entry(vmf->orig_pte);
>> + /* Is it one of the "swap PTEs" that's not really swap? */
>> + if (non_swap_entry(entry))
>> + return false;
>> +
>> + /*
>> + * The PTE was a true swap entry. The page may be in the
>> + * swap cache. If so, find it and return it so it may be
>> + * reclaimed.
>> + */
>> + return lookup_swap_cache(entry, vma, addr);
>
> If we go with handling only exclusived owned page for anon,
> I think we should apply the rule to swap cache, too.
I'm going back and forth on it. If we're just trying to avoid causing
faults in other processes, we could add a mapcount>0 check here in
addition to the mapcount>1 checks that were added in the other patch.
But, if we want a check for true exclusivity: no other swap entries or
mappings, we need to check swap_count() too. It's getting quite a bit
uglier as I add that it, but I guess we'll see how it looks in the end.
> Do you mind posting it as formal patch?
Yeah, I'll send something out.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-13 16:59 ` Dave Hansen
@ 2020-03-13 21:13 ` Minchan Kim
0 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2020-03-13 21:13 UTC (permalink / raw)
To: Dave Hansen
Cc: Michal Hocko, Jann Horn, Linux-MM, kernel list,
Daniel Colascione, Joel Fernandes (Google),
Andrew Morton
On Fri, Mar 13, 2020 at 09:59:50AM -0700, Dave Hansen wrote:
> On 3/12/20 7:00 PM, Minchan Kim wrote:
> > On Thu, Mar 12, 2020 at 02:41:07PM -0700, Dave Hansen wrote:
> >> One other fun thing. I have a "victim" thread sitting in a loop doing:
> >>
> >> sleep(1)
> >> memcpy(&garbage, buffer, sz);
> >>
> >> The "attacker" is doing
> >>
> >> madvise(buffer, sz, MADV_PAGEOUT);
> >>
> >> in a loop. That, oddly enough doesn't cause the victim to page fault.
> >> But, if I do:
> >>
> >> memcpy(&garbage, buffer, sz);
> >> madvise(buffer, sz, MADV_PAGEOUT);
> >>
> >> It *does* cause the memory to get paged out. The MADV_PAGEOUT code
> >> actually has a !pte_present() check. It will punt on a PTE if it sees
> >> it. In other words, if a page is in the swap cache but not mapped by a
> >> pte_present() PTE, MADV_PAGEOUT won't touch it.
> >>
> >> Shouldn't MADV_PAGEOUT be able to find and reclaim those pages? Patch
> >> attached.
> >
> >>
> >>
> >> ---
> >>
> >> b/mm/madvise.c | 38 +++++++++++++++++++++++++++++++-------
> >> 1 file changed, 31 insertions(+), 7 deletions(-)
> >>
> >> diff -puN mm/madvise.c~madv-pageout-find-swap-cache mm/madvise.c
> >> --- a/mm/madvise.c~madv-pageout-find-swap-cache 2020-03-12 14:24:45.178775035 -0700
> >> +++ b/mm/madvise.c 2020-03-12 14:35:49.706773378 -0700
> >> @@ -248,6 +248,36 @@ static void force_shm_swapin_readahead(s
> >> #endif /* CONFIG_SWAP */
> >>
> >> /*
> >> + * Given a PTE, find the corresponding 'struct page'. Also handles
> >> + * non-present swap PTEs.
> >> + */
> >> +struct page *pte_to_reclaim_page(struct vm_area_struct *vma,
> >> + unsigned long addr, pte_t ptent)
> >> +{
> >> + swp_entry_t entry;
> >> +
> >> + /* Totally empty PTE: */
> >> + if (pte_none(ptent))
> >> + return NULL;
> >> +
> >> + /* A normal, present page is mapped: */
> >> + if (pte_present(ptent))
> >> + return vm_normal_page(vma, addr, ptent);
> >> +
> >
> > Please check is_swap_pte first.
>
> Why?
>
> is_swap_pte() duplicates the first two checks. But, I need an explicit
> pte_present() check somewhere because I need to call vm_normal_page()
> only on present PTEs.
>
> I guess the pte_present() check could be:
>
> if (!is_swap_pte(ptent))
> return vm_normal_page(...);
>
> *after* the pte_none() check.
Yub, I thought is_swap_pte looks more readable and maintainable for
the change of pte encoding in future. Anyway, I am not insisting.
>
> >> + entry = pte_to_swp_entry(vmf->orig_pte);
> >> + /* Is it one of the "swap PTEs" that's not really swap? */
> >> + if (non_swap_entry(entry))
> >> + return false;
> >> +
> >> + /*
> >> + * The PTE was a true swap entry. The page may be in the
> >> + * swap cache. If so, find it and return it so it may be
> >> + * reclaimed.
> >> + */
> >> + return lookup_swap_cache(entry, vma, addr);
> >
> > If we go with handling only exclusived owned page for anon,
> > I think we should apply the rule to swap cache, too.
>
> I'm going back and forth on it. If we're just trying to avoid causing
> faults in other processes, we could add a mapcount>0 check here in
> addition to the mapcount>1 checks that were added in the other patch.
>
> But, if we want a check for true exclusivity: no other swap entries or
> mappings, we need to check swap_count() too. It's getting quite a bit
> uglier as I add that it, but I guess we'll see how it looks in the end.
If we go to the map_count > 1 check here and follows the Daniel's suggestion
of MADV_PAGEOUT_ALL to make shared page paging out, that means it clearly
makes semantic change for MADV_PAGEOUT: "paging out only exclusive owned page"
so it would be rather weired if we reclaim swap_count() > 1 of swap cache.
>
> > Do you mind posting it as formal patch?
>
> Yeah, I'll send something out.
Thanks for bring up the issue, Dave!
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
2020-03-12 20:16 ` Minchan Kim
` (2 preceding siblings ...)
2020-03-12 21:41 ` Dave Hansen
@ 2020-03-12 23:29 ` Jann Horn
3 siblings, 0 replies; 34+ messages in thread
From: Jann Horn @ 2020-03-12 23:29 UTC (permalink / raw)
To: Minchan Kim
Cc: Michal Hocko, Linux-MM, kernel list, Daniel Colascione,
Dave Hansen, Joel Fernandes (Google),
Andrew Morton
On Thu, Mar 12, 2020 at 9:16 PM Minchan Kim <minchan@kernel.org> wrote:
> On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> > [Cc akpm]
> >
> > So what about this?
>
> Thanks, Michal.
>
> I't likde to wait Jann's reply since Dave gave his opinion about the vulnerability.
> https://lore.kernel.org/linux-mm/cf95db88-968d-fee5-1c15-10d024c09d8a@intel.com/
> Jann, could you give your insigh about that practically it's possible?
Since I don't really know much about the whole LVI attack, and in
particular don't have any practical experience with it, I don't think
I can help with insight on that. So since Dave said that he doesn't
think it's practical, I'll defer to him on that.
^ permalink raw reply [flat|nested] 34+ messages in thread