Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Daniel Colascione <dancol@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [RFC 6/7] mm: extend process_madvise syscall to support vector arrary
Date: Thu, 30 May 2019 11:17:48 +0900
Message-ID: <20190530021748.GE229459@google.com> (raw)
In-Reply-To: <20190529103352.GD18589@dhcp22.suse.cz>

On Wed, May 29, 2019 at 12:33:52PM +0200, Michal Hocko wrote:
> On Wed 29-05-19 03:08:32, Daniel Colascione wrote:
> > On Mon, May 27, 2019 at 12:49 AM Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > On Tue, May 21, 2019 at 12:37:26PM +0200, Michal Hocko wrote:
> > > > On Tue 21-05-19 19:26:13, Minchan Kim wrote:
> > > > > On Tue, May 21, 2019 at 08:24:21AM +0200, Michal Hocko wrote:
> > > > > > On Tue 21-05-19 11:48:20, Minchan Kim wrote:
> > > > > > > On Mon, May 20, 2019 at 11:22:58AM +0200, Michal Hocko wrote:
> > > > > > > > [Cc linux-api]
> > > > > > > >
> > > > > > > > On Mon 20-05-19 12:52:53, Minchan Kim wrote:
> > > > > > > > > Currently, process_madvise syscall works for only one address range
> > > > > > > > > so user should call the syscall several times to give hints to
> > > > > > > > > multiple address range.
> > > > > > > >
> > > > > > > > Is that a problem? How big of a problem? Any numbers?
> > > > > > >
> > > > > > > We easily have 2000+ vma so it's not trivial overhead. I will come up
> > > > > > > with number in the description at respin.
> > > > > >
> > > > > > Does this really have to be a fast operation? I would expect the monitor
> > > > > > is by no means a fast path. The system call overhead is not what it used
> > > > > > to be, sigh, but still for something that is not a hot path it should be
> > > > > > tolerable, especially when the whole operation is quite expensive on its
> > > > > > own (wrt. the syscall entry/exit).
> > > > >
> > > > > What's different with process_vm_[readv|writev] and vmsplice?
> > > > > If the range needed to be covered is a lot, vector operation makes senese
> > > > > to me.
> > > >
> > > > I am not saying that the vector API is wrong. All I am trying to say is
> > > > that the benefit is not really clear so far. If you want to push it
> > > > through then you should better get some supporting data.
> > >
> > > I measured 1000 madvise syscall vs. a vector range syscall with 1000
> > > ranges on ARM64 mordern device. Even though I saw 15% improvement but
> > > absoluate gain is just 1ms so I don't think it's worth to support.
> > > I will drop vector support at next revision.
> > 
> > Please do keep the vector support. Absolute timing is misleading,
> > since in a tight loop, you're not going to contend on mmap_sem. We've
> > seen tons of improvements in things like camera start come from
> > coalescing mprotect calls, with the gains coming from taking and
> > releasing various locks a lot less often and bouncing around less on
> > the contended lock paths. Raw throughput doesn't tell the whole story,
> > especially on mobile.
> 
> This will always be a double edge sword. Taking a lock for longer can
> improve a throughput of a single call but it would make a latency for
> anybody contending on the lock much worse.
> 
> Besides that, please do not overcomplicate the thing from the early
> beginning please. Let's start with a simple and well defined remote
> madvise alternative first and build a vector API on top with some
> numbers based on _real_ workloads.

First time, I didn't think about atomicity about address range race
because MADV_COLD/PAGEOUT is not critical for the race.
However you raised the atomicity issue because people would extend
hints to destructive ones easily. I agree with that and that's why
we discussed how to guarantee the race and Daniel comes up with good idea.

  - vma configuration seq number via process_getinfo(2).

We discussed the race issue without _read_ workloads/requests because
it's common sense that people might extend the syscall later.

Here is same. For current workload, we don't need to support vector
for perfomance point of view based on my experiment. However, it's
rather limited experiment. Some configuration might have 10000+ vmas
or really slow CPU. 

Furthermore, I want to have vector support due to atomicity issue
if it's really the one we should consider.
With vector support of the API and vma configuration sequence number
from Daniel, we could support address ranges operations's atomicity.
However, since we don't introduce vector at this moment, we need to
introduce *another syscall* later to be able to handle multile ranges
all at once atomically if it's okay.

Other thought:
Maybe we could extend address range batch syscall covers other MM
syscall like mmap/munmap/madvise/mprotect and so on because there
are multiple users that would benefit from this general batching
mechanism.


  reply index

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-20  3:52 [RFC 0/7] introduce memory hinting API for external process Minchan Kim
2019-05-20  3:52 ` [RFC 1/7] mm: introduce MADV_COOL Minchan Kim
2019-05-20  8:16   ` Michal Hocko
2019-05-20  8:19     ` Michal Hocko
2019-05-20 15:08       ` Suren Baghdasaryan
2019-05-20 22:55       ` Minchan Kim
2019-05-20 22:54     ` Minchan Kim
2019-05-21  6:04       ` Michal Hocko
2019-05-21  9:11         ` Minchan Kim
2019-05-21 10:05           ` Michal Hocko
2019-05-28  8:53   ` Hillf Danton
2019-05-28 10:58   ` Minchan Kim
2019-05-20  3:52 ` [RFC 2/7] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM Minchan Kim
2019-05-20 16:50   ` Johannes Weiner
2019-05-20 22:57     ` Minchan Kim
2019-05-20  3:52 ` [RFC 3/7] mm: introduce MADV_COLD Minchan Kim
2019-05-20  8:27   ` Michal Hocko
2019-05-20 23:00     ` Minchan Kim
2019-05-21  6:08       ` Michal Hocko
2019-05-21  9:13         ` Minchan Kim
2019-05-28 14:54   ` Hillf Danton
2019-05-30  0:45   ` Minchan Kim
2019-05-20  3:52 ` [RFC 4/7] mm: factor out madvise's core functionality Minchan Kim
2019-05-20 14:26   ` Oleksandr Natalenko
2019-05-21  1:26     ` Minchan Kim
2019-05-21  6:36       ` Oleksandr Natalenko
2019-05-21  6:50         ` Michal Hocko
2019-05-21  7:06           ` Oleksandr Natalenko
2019-05-21 10:52             ` Minchan Kim
2019-05-21 11:00               ` Michal Hocko
2019-05-21 11:24                 ` Minchan Kim
2019-05-21 11:32                   ` Michal Hocko
2019-05-21 10:49         ` Minchan Kim
2019-05-21 10:55           ` Michal Hocko
2019-05-20  3:52 ` [RFC 5/7] mm: introduce external memory hinting API Minchan Kim
2019-05-20  9:18   ` Michal Hocko
2019-05-21  2:41     ` Minchan Kim
2019-05-21  6:17       ` Michal Hocko
2019-05-21 10:32         ` Minchan Kim
2019-05-21  9:01   ` Christian Brauner
2019-05-21 11:35     ` Minchan Kim
2019-05-21 11:51       ` Christian Brauner
2019-05-21 15:31   ` Oleg Nesterov
2019-05-27  7:43     ` Minchan Kim
2019-05-27 15:12       ` Oleg Nesterov
2019-05-27 23:33         ` Minchan Kim
2019-05-28  7:23           ` Michal Hocko
2019-05-29  3:41   ` Hillf Danton
2019-05-30  0:38   ` Minchan Kim
2019-05-20  3:52 ` [RFC 6/7] mm: extend process_madvise syscall to support vector arrary Minchan Kim
2019-05-20  9:22   ` Michal Hocko
2019-05-21  2:48     ` Minchan Kim
2019-05-21  6:24       ` Michal Hocko
2019-05-21 10:26         ` Minchan Kim
2019-05-21 10:37           ` Michal Hocko
2019-05-27  7:49             ` Minchan Kim
2019-05-29 10:08               ` Daniel Colascione
2019-05-29 10:33                 ` Michal Hocko
2019-05-30  2:17                   ` Minchan Kim [this message]
2019-05-30  6:57                     ` Michal Hocko
2019-05-30  8:02                       ` Minchan Kim
2019-05-30 16:19                         ` Daniel Colascione
2019-05-30 18:47                         ` Michal Hocko
2019-05-29  4:14   ` Hillf Danton
2019-05-30  0:35   ` Minchan Kim
2019-05-20  3:52 ` [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER Minchan Kim
2019-05-20  9:28   ` Michal Hocko
2019-05-21  2:55     ` Minchan Kim
2019-05-21  6:26       ` Michal Hocko
2019-05-27  7:58         ` Minchan Kim
2019-05-27 12:44           ` Michal Hocko
2019-05-28  3:26             ` Minchan Kim
2019-05-28  6:29               ` Michal Hocko
2019-05-28  8:13                 ` Minchan Kim
2019-05-28  8:31                   ` Daniel Colascione
2019-05-28  8:49                     ` Minchan Kim
2019-05-28  9:08                       ` Michal Hocko
2019-05-28  9:39                         ` Daniel Colascione
2019-05-28 10:33                           ` Michal Hocko
2019-05-28 11:21                             ` Daniel Colascione
2019-05-28 11:49                               ` Michal Hocko
2019-05-28 12:11                                 ` Daniel Colascione
2019-05-28 12:32                                   ` Michal Hocko
2019-05-28 10:32                         ` Minchan Kim
2019-05-28 10:41                           ` Michal Hocko
2019-05-28 11:12                             ` Minchan Kim
2019-05-28 11:28                               ` Michal Hocko
2019-05-28 11:42                                 ` Daniel Colascione
2019-05-28 11:56                                   ` Michal Hocko
2019-05-28 12:18                                     ` Daniel Colascione
2019-05-28 12:38                                       ` Michal Hocko
2019-05-28 12:10                                   ` Minchan Kim
2019-05-28 11:44                                 ` Minchan Kim
2019-05-28 11:51                                   ` Daniel Colascione
2019-05-28 12:06                                   ` Michal Hocko
2019-05-28 12:22                                     ` Minchan Kim
2019-05-28 11:28                             ` Daniel Colascione
2019-05-21 15:33       ` Johannes Weiner
2019-05-22  1:50         ` Minchan Kim
2019-05-29  4:36   ` Hillf Danton
2019-05-30  1:00   ` Minchan Kim
2019-05-20  6:37 ` [RFC 0/7] introduce memory hinting API for external process Anshuman Khandual
2019-05-20 16:59   ` Tim Murray
2019-05-21  2:55     ` Anshuman Khandual
2019-05-21  5:14       ` Minchan Kim
2019-05-21 10:34       ` Michal Hocko
2019-05-28 10:50         ` Anshuman Khandual
2019-05-21 12:56       ` Shakeel Butt
2019-05-22  4:15         ` Brian Geffon
2019-05-22  4:23         ` Brian Geffon
2019-05-20  9:28 ` Michal Hocko
2019-05-20 14:42 ` Oleksandr Natalenko
2019-05-21  2:56   ` Minchan Kim
2019-05-20 16:46 ` Johannes Weiner
2019-05-21  4:39   ` Minchan Kim
2019-05-21  6:32     ` Michal Hocko
2019-05-21  1:44 ` Matthew Wilcox
2019-05-21  5:01   ` Minchan Kim
2019-05-21  6:34   ` Michal Hocko
2019-05-21  8:42 ` Christian Brauner
2019-05-21 11:05   ` Minchan Kim
2019-05-21 11:30     ` Christian Brauner
2019-05-21 11:39       ` Christian Brauner
2019-05-22  5:11         ` Daniel Colascione
2019-05-22  8:22           ` Christian Brauner
2019-05-22 13:16             ` Daniel Colascione
2019-05-22 14:52               ` Christian Brauner
2019-05-22 15:17                 ` Daniel Colascione
2019-05-22 15:48                   ` Christian Brauner
2019-05-22 15:57                     ` Daniel Colascione
2019-05-22 16:01                       ` Christian Brauner
2019-05-22 16:01                         ` Daniel Colascione
2019-05-23 13:07                           ` Minchan Kim
2019-05-27  8:06                             ` Minchan Kim
2019-05-21 11:41       ` Minchan Kim
2019-05-21 12:04         ` Christian Brauner
2019-05-21 12:15           ` Oleksandr Natalenko
2019-05-21 12:53 ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190530021748.GE229459@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=sonnyrao@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git