Linux-api Archive on lore.kernel.org
 help / color / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Daniel Colascione <dancol@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER
Date: Tue, 28 May 2019 14:32:08 +0200
Message-ID: <20190528123208.GC1658@dhcp22.suse.cz> (raw)
In-Reply-To: <CAKOZueuerHTCPbQqowSxi-_sRsqxYQQqgyi1UOh7EkZcS3DCnA@mail.gmail.com>

On Tue 28-05-19 05:11:16, Daniel Colascione wrote:
> On Tue, May 28, 2019 at 4:49 AM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > > We have various system calls that provide hints for open files, but
> > > the memory operations are distinct. Modeling anonymous memory as a
> > > kind of file-backed memory for purposes of VMA manipulation would also
> > > be a departure from existing practice. Can you help me understand why
> > > you seem to favor the FD-per-VMA approach so heavily? I don't see any
> > > arguments *for* an FD-per-VMA model for remove memory manipulation and
> > > I see a lot of arguments against it. Is there some compelling
> > > advantage I'm missing?
> >
> > First and foremost it provides an easy cookie to the userspace to
> > guarantee time-to-check-time-to-use consistency.
> 
> But only for one VMA at a time.

Which is the unit we operate on, right?

> > It also naturally
> > extend an existing fadvise interface that achieves madvise semantic on
> > files.
> 
> There are lots of things that madvise can do that fadvise can't and
> that don't even really make sense for fadvise, e.g., MADV_FREE. It
> seems odd to me to duplicate much of the madvise interface into
> fadvise so that we can use file APIs to give madvise hints. It seems
> simpler to me to just provide a mechanism to put the madvise hints
> where they're needed.

I do not see why we would duplicate. I confess I haven't tried to
implement this so I might be overlooking something but it seems to me
that we could simply reuse the same functionality from both APIs.

> > I am not really pushing hard for this particular API but I really
> > do care about a programming model that would be sane.
> 
> You've used "sane" twice so far in this message. Can you specify more
> precisely what you mean by that word?

Well, I would consider a model which would prevent from unintended side
effects (e.g. working on a completely different object) without a tricky
synchronization sane.

> I agree that there needs to be
> some defense against TOCTOU races when doing remote memory management,
> but I don't think providing this robustness via a file descriptor is
> any more sane than alternative approaches. A file descriptor comes
> with a lot of other features --- e.g., SCM_RIGHTS, fstat, and a
> concept of owning a resource --- that aren't needed to achieve
> robustness.
> 
> Normally, a file descriptor refers to some resource that the kernel
> holds as long as the file descriptor (well, the open file description
> or struct file) lives -- things like graphics buffers, files, and
> sockets. If we're using an FD *just* as a cookie and not a resource,
> I'd rather just expose the cookie directly.

You are absolutely right. But doesn't that apply to any other
revalidation method that would be tracking VMA status as well. As I've
said I am not married to this approach as long as there are better
alternatives. So far we are in a discussion what should be the actual
semantic of the operation and how much do we want to tolerate races. And
it seems that we are diving into implementation details rather than
landing with a firm decision that the current proposed API is suitable
or not.

> > If we have a
> > different means to achieve the same then all fine by me but so far I
> > haven't heard any sound arguments to invent something completely new
> > when we have established APIs to use.
> 
> Doesn't the next sentence describe something profoundly new? :-)
> 
> > Exporting anonymous mappings via
> > proc the same way we do for file mappings doesn't seem to be stepping
> > outside of the current practice way too much.
> 
> It seems like a radical departure from existing practice to provide
> filesystem interfaces to anonymous memory regions, e.g., anon_vma.
> You've never been able to refer to those memory regions with file
> descriptors.
> 
> All I'm suggesting is that we take the existing madvise mechanism,
> make it work cross-process, and make it robust against TOCTOU
> problems, all one step at a time. Maybe my sense of API "size" is
> miscalibrated, but adding a new type of FD to refer to anonymous VMA
> regions feels like a bigger departure and so requires stronger
> justification, especially if the result of the FD approach is probably
> something less efficient than a cookie-based one.

Feel free to propose the way to achieve that in the respective email
thread.
 
> > and we should focus on discussing whether this is a
> > sane model. And I think it would be much better to discuss that under
> > the respective patch which introduces that API rather than here.
> 
> I think it's important to discuss what that API should look like. :-)

It will be fun to follow this discussion and make some sense of
different parallel threads.

-- 
Michal Hocko
SUSE Labs

  reply index

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190520035254.57579-1-minchan@kernel.org>
     [not found] ` <20190520035254.57579-2-minchan@kernel.org>
2019-05-20  8:16   ` [RFC 1/7] mm: introduce MADV_COOL Michal Hocko
2019-05-20  8:19     ` Michal Hocko
2019-05-20 15:08       ` Suren Baghdasaryan
2019-05-20 22:55       ` Minchan Kim
2019-05-20 22:54     ` Minchan Kim
2019-05-21  6:04       ` Michal Hocko
2019-05-21  9:11         ` Minchan Kim
2019-05-21 10:05           ` Michal Hocko
     [not found] ` <20190520035254.57579-4-minchan@kernel.org>
2019-05-20  8:27   ` [RFC 3/7] mm: introduce MADV_COLD Michal Hocko
2019-05-20 23:00     ` Minchan Kim
2019-05-21  6:08       ` Michal Hocko
2019-05-21  9:13         ` Minchan Kim
     [not found] ` <20190520035254.57579-6-minchan@kernel.org>
2019-05-20  9:18   ` [RFC 5/7] mm: introduce external memory hinting API Michal Hocko
2019-05-21  2:41     ` Minchan Kim
2019-05-21  6:17       ` Michal Hocko
2019-05-21 10:32         ` Minchan Kim
     [not found] ` <20190520035254.57579-7-minchan@kernel.org>
2019-05-20  9:22   ` [RFC 6/7] mm: extend process_madvise syscall to support vector arrary Michal Hocko
2019-05-21  2:48     ` Minchan Kim
2019-05-21  6:24       ` Michal Hocko
2019-05-21 10:26         ` Minchan Kim
2019-05-21 10:37           ` Michal Hocko
2019-05-27  7:49             ` Minchan Kim
2019-05-29 10:08               ` Daniel Colascione
2019-05-29 10:33                 ` Michal Hocko
2019-05-30  2:17                   ` Minchan Kim
2019-05-30  6:57                     ` Michal Hocko
2019-05-30  8:02                       ` Minchan Kim
2019-05-30 16:19                         ` Daniel Colascione
2019-05-30 18:47                         ` Michal Hocko
     [not found] ` <20190520035254.57579-8-minchan@kernel.org>
2019-05-20  9:28   ` [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER Michal Hocko
2019-05-21  2:55     ` Minchan Kim
2019-05-21  6:26       ` Michal Hocko
2019-05-27  7:58         ` Minchan Kim
2019-05-27 12:44           ` Michal Hocko
2019-05-28  3:26             ` Minchan Kim
2019-05-28  6:29               ` Michal Hocko
2019-05-28  8:13                 ` Minchan Kim
2019-05-28  8:31                   ` Daniel Colascione
2019-05-28  8:49                     ` Minchan Kim
2019-05-28  9:08                       ` Michal Hocko
2019-05-28  9:39                         ` Daniel Colascione
2019-05-28 10:33                           ` Michal Hocko
2019-05-28 11:21                             ` Daniel Colascione
2019-05-28 11:49                               ` Michal Hocko
2019-05-28 12:11                                 ` Daniel Colascione
2019-05-28 12:32                                   ` Michal Hocko [this message]
2019-05-28 10:32                         ` Minchan Kim
2019-05-28 10:41                           ` Michal Hocko
2019-05-28 11:12                             ` Minchan Kim
2019-05-28 11:28                               ` Michal Hocko
2019-05-28 11:42                                 ` Daniel Colascione
2019-05-28 11:56                                   ` Michal Hocko
2019-05-28 12:18                                     ` Daniel Colascione
2019-05-28 12:38                                       ` Michal Hocko
2019-05-28 12:10                                   ` Minchan Kim
2019-05-28 11:44                                 ` Minchan Kim
2019-05-28 11:51                                   ` Daniel Colascione
2019-05-28 12:06                                   ` Michal Hocko
2019-05-28 12:22                                     ` Minchan Kim
2019-05-28 11:28                             ` Daniel Colascione
2019-05-21 15:33       ` Johannes Weiner
2019-05-22  1:50         ` Minchan Kim
2019-05-20  9:28 ` [RFC 0/7] introduce memory hinting API for external process Michal Hocko
     [not found] ` <20190520164605.GA11665@cmpxchg.org>
     [not found]   ` <20190521043950.GJ10039@google.com>
2019-05-21  6:32     ` Michal Hocko
     [not found] ` <20190521014452.GA6738@bombadil.infradead.org>
2019-05-21  6:34   ` Michal Hocko
2019-05-21 12:53 ` Shakeel Butt
     [not found] ` <dbe801f0-4bbe-5f6e-9053-4b7deb38e235@arm.com>
     [not found]   ` <CAEe=Sxka3Q3vX+7aWUJGKicM+a9Px0rrusyL+5bB1w4ywF6N4Q@mail.gmail.com>
     [not found]     ` <1754d0ef-6756-d88b-f728-17b1fe5d5b07@arm.com>
2019-05-21 12:56       ` Shakeel Butt
2019-05-22  4:23         ` Brian Geffon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190528123208.GC1658@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=shakeelb@google.com \
    --cc=sonnyrao@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-api Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-api/0 linux-api/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-api linux-api/ https://lore.kernel.org/linux-api \
		linux-api@vger.kernel.org
	public-inbox-index linux-api

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-api


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git