Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tim Murray <timmurray@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Daniel Colascione <dancol@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>
Subject: Re: [RFC 4/7] mm: factor out madvise's core functionality
Date: Tue, 21 May 2019 19:49:49 +0900
Message-ID: <20190521104949.GE219653@google.com> (raw)
In-Reply-To: <20190521063628.x2npirvs75jxjilx@butterfly.localdomain>

On Tue, May 21, 2019 at 08:36:28AM +0200, Oleksandr Natalenko wrote:
> Hi.
> 
> On Tue, May 21, 2019 at 10:26:49AM +0900, Minchan Kim wrote:
> > On Mon, May 20, 2019 at 04:26:33PM +0200, Oleksandr Natalenko wrote:
> > > Hi.
> > > 
> > > On Mon, May 20, 2019 at 12:52:51PM +0900, Minchan Kim wrote:
> > > > This patch factor out madvise's core functionality so that upcoming
> > > > patch can reuse it without duplication.
> > > > 
> > > > It shouldn't change any behavior.
> > > > 
> > > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > > ---
> > > >  mm/madvise.c | 168 +++++++++++++++++++++++++++------------------------
> > > >  1 file changed, 89 insertions(+), 79 deletions(-)
> > > > 
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 9a6698b56845..119e82e1f065 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -742,7 +742,8 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -static long madvise_dontneed_free(struct vm_area_struct *vma,
> > > > +static long madvise_dontneed_free(struct task_struct *tsk,
> > > > +				  struct vm_area_struct *vma,
> > > >  				  struct vm_area_struct **prev,
> > > >  				  unsigned long start, unsigned long end,
> > > >  				  int behavior)
> > > > @@ -754,8 +755,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> > > >  	if (!userfaultfd_remove(vma, start, end)) {
> > > >  		*prev = NULL; /* mmap_sem has been dropped, prev is stale */
> > > >  
> > > > -		down_read(&current->mm->mmap_sem);
> > > > -		vma = find_vma(current->mm, start);
> > > > +		down_read(&tsk->mm->mmap_sem);
> > > > +		vma = find_vma(tsk->mm, start);
> > > >  		if (!vma)
> > > >  			return -ENOMEM;
> > > >  		if (start < vma->vm_start) {
> > > > @@ -802,7 +803,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> > > >   * Application wants to free up the pages and associated backing store.
> > > >   * This is effectively punching a hole into the middle of a file.
> > > >   */
> > > > -static long madvise_remove(struct vm_area_struct *vma,
> > > > +static long madvise_remove(struct task_struct *tsk,
> > > > +				struct vm_area_struct *vma,
> > > >  				struct vm_area_struct **prev,
> > > >  				unsigned long start, unsigned long end)
> > > >  {
> > > > @@ -836,13 +838,13 @@ static long madvise_remove(struct vm_area_struct *vma,
> > > >  	get_file(f);
> > > >  	if (userfaultfd_remove(vma, start, end)) {
> > > >  		/* mmap_sem was not released by userfaultfd_remove() */
> > > > -		up_read(&current->mm->mmap_sem);
> > > > +		up_read(&tsk->mm->mmap_sem);
> > > >  	}
> > > >  	error = vfs_fallocate(f,
> > > >  				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> > > >  				offset, end - start);
> > > >  	fput(f);
> > > > -	down_read(&current->mm->mmap_sem);
> > > > +	down_read(&tsk->mm->mmap_sem);
> > > >  	return error;
> > > >  }
> > > >  
> > > > @@ -916,12 +918,13 @@ static int madvise_inject_error(int behavior,
> > > >  #endif
> > > 
> > > What about madvise_inject_error() and get_user_pages_fast() in it
> > > please?
> > 
> > Good point. Maybe, there more places where assume context is "current" so
> > I'm thinking to limit hints we could allow from external process.
> > It would be better for maintainance point of view in that we could know
> > the workload/usecases when someone ask new advises from external process
> > without making every hints works both contexts.
> 
> Well, for madvise_inject_error() we still have a remote variant of
> get_user_pages(), and that should work, no?

Regardless of madvise_inject_error, it seems to be risky to expose all
of hints for external process, I think. For example, MADV_DONTNEED with
race, it's critical for stability. So, until we could get the way to
prevent the race, I want to restrict hints.

> 
> Regarding restricting the hints, I'm definitely interested in having
> remote MADV_MERGEABLE/MADV_UNMERGEABLE. But, OTOH, doing it via remote
> madvise() introduces another issue with traversing remote VMAs reliably.

How is it signifiact when the race happens? It could waste CPU cycle
and make unncessary break of that merged pages but expect it should be
rare so such non-desruptive hint could be exposed via process_madvise, I think.

If the hint is critical for the race, yes, as Michal suggested, we need a way
to close it and I guess non-cooperative userfaultfd with synchronous support
would help private anonymous vma.

> IIUC, one can do this via userspace by parsing [s]maps file only, which
> is not very consistent, and once some range is parsed, and then it is
> immediately gone, a wrong hint will be sent.
> 
> Isn't this a problem we should worry about?

I think it depends on the hint and usecase.

> 
> > 
> > Thanks.
> 
> -- 
>   Best regards,
>     Oleksandr Natalenko (post-factum)
>     Senior Software Maintenance Engineer


  parent reply index

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-20  3:52 [RFC 0/7] introduce memory hinting API for external process Minchan Kim
2019-05-20  3:52 ` [RFC 1/7] mm: introduce MADV_COOL Minchan Kim
2019-05-20  8:16   ` Michal Hocko
2019-05-20  8:19     ` Michal Hocko
2019-05-20 15:08       ` Suren Baghdasaryan
2019-05-20 22:55       ` Minchan Kim
2019-05-20 22:54     ` Minchan Kim
2019-05-21  6:04       ` Michal Hocko
2019-05-21  9:11         ` Minchan Kim
2019-05-21 10:05           ` Michal Hocko
2019-05-28  8:53   ` Hillf Danton
2019-05-28 10:58   ` Minchan Kim
2019-05-20  3:52 ` [RFC 2/7] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM Minchan Kim
2019-05-20 16:50   ` Johannes Weiner
2019-05-20 22:57     ` Minchan Kim
2019-05-20  3:52 ` [RFC 3/7] mm: introduce MADV_COLD Minchan Kim
2019-05-20  8:27   ` Michal Hocko
2019-05-20 23:00     ` Minchan Kim
2019-05-21  6:08       ` Michal Hocko
2019-05-21  9:13         ` Minchan Kim
2019-05-28 14:54   ` Hillf Danton
2019-05-30  0:45   ` Minchan Kim
2019-05-20  3:52 ` [RFC 4/7] mm: factor out madvise's core functionality Minchan Kim
2019-05-20 14:26   ` Oleksandr Natalenko
2019-05-21  1:26     ` Minchan Kim
2019-05-21  6:36       ` Oleksandr Natalenko
2019-05-21  6:50         ` Michal Hocko
2019-05-21  7:06           ` Oleksandr Natalenko
2019-05-21 10:52             ` Minchan Kim
2019-05-21 11:00               ` Michal Hocko
2019-05-21 11:24                 ` Minchan Kim
2019-05-21 11:32                   ` Michal Hocko
2019-05-21 10:49         ` Minchan Kim [this message]
2019-05-21 10:55           ` Michal Hocko
2019-05-20  3:52 ` [RFC 5/7] mm: introduce external memory hinting API Minchan Kim
2019-05-20  9:18   ` Michal Hocko
2019-05-21  2:41     ` Minchan Kim
2019-05-21  6:17       ` Michal Hocko
2019-05-21 10:32         ` Minchan Kim
2019-05-21  9:01   ` Christian Brauner
2019-05-21 11:35     ` Minchan Kim
2019-05-21 11:51       ` Christian Brauner
2019-05-21 15:31   ` Oleg Nesterov
2019-05-27  7:43     ` Minchan Kim
2019-05-27 15:12       ` Oleg Nesterov
2019-05-27 23:33         ` Minchan Kim
2019-05-28  7:23           ` Michal Hocko
2019-05-29  3:41   ` Hillf Danton
2019-05-30  0:38   ` Minchan Kim
2019-05-20  3:52 ` [RFC 6/7] mm: extend process_madvise syscall to support vector arrary Minchan Kim
2019-05-20  9:22   ` Michal Hocko
2019-05-21  2:48     ` Minchan Kim
2019-05-21  6:24       ` Michal Hocko
2019-05-21 10:26         ` Minchan Kim
2019-05-21 10:37           ` Michal Hocko
2019-05-27  7:49             ` Minchan Kim
2019-05-29 10:08               ` Daniel Colascione
2019-05-29 10:33                 ` Michal Hocko
2019-05-30  2:17                   ` Minchan Kim
2019-05-30  6:57                     ` Michal Hocko
2019-05-30  8:02                       ` Minchan Kim
2019-05-30 16:19                         ` Daniel Colascione
2019-05-30 18:47                         ` Michal Hocko
2019-05-29  4:14   ` Hillf Danton
2019-05-30  0:35   ` Minchan Kim
2019-05-20  3:52 ` [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER Minchan Kim
2019-05-20  9:28   ` Michal Hocko
2019-05-21  2:55     ` Minchan Kim
2019-05-21  6:26       ` Michal Hocko
2019-05-27  7:58         ` Minchan Kim
2019-05-27 12:44           ` Michal Hocko
2019-05-28  3:26             ` Minchan Kim
2019-05-28  6:29               ` Michal Hocko
2019-05-28  8:13                 ` Minchan Kim
2019-05-28  8:31                   ` Daniel Colascione
2019-05-28  8:49                     ` Minchan Kim
2019-05-28  9:08                       ` Michal Hocko
2019-05-28  9:39                         ` Daniel Colascione
2019-05-28 10:33                           ` Michal Hocko
2019-05-28 11:21                             ` Daniel Colascione
2019-05-28 11:49                               ` Michal Hocko
2019-05-28 12:11                                 ` Daniel Colascione
2019-05-28 12:32                                   ` Michal Hocko
2019-05-28 10:32                         ` Minchan Kim
2019-05-28 10:41                           ` Michal Hocko
2019-05-28 11:12                             ` Minchan Kim
2019-05-28 11:28                               ` Michal Hocko
2019-05-28 11:42                                 ` Daniel Colascione
2019-05-28 11:56                                   ` Michal Hocko
2019-05-28 12:18                                     ` Daniel Colascione
2019-05-28 12:38                                       ` Michal Hocko
2019-05-28 12:10                                   ` Minchan Kim
2019-05-28 11:44                                 ` Minchan Kim
2019-05-28 11:51                                   ` Daniel Colascione
2019-05-28 12:06                                   ` Michal Hocko
2019-05-28 12:22                                     ` Minchan Kim
2019-05-28 11:28                             ` Daniel Colascione
2019-05-21 15:33       ` Johannes Weiner
2019-05-22  1:50         ` Minchan Kim
2019-05-29  4:36   ` Hillf Danton
2019-05-30  1:00   ` Minchan Kim
2019-05-20  6:37 ` [RFC 0/7] introduce memory hinting API for external process Anshuman Khandual
2019-05-20 16:59   ` Tim Murray
2019-05-21  2:55     ` Anshuman Khandual
2019-05-21  5:14       ` Minchan Kim
2019-05-21 10:34       ` Michal Hocko
2019-05-28 10:50         ` Anshuman Khandual
2019-05-21 12:56       ` Shakeel Butt
2019-05-22  4:15         ` Brian Geffon
2019-05-22  4:23         ` Brian Geffon
2019-05-20  9:28 ` Michal Hocko
2019-05-20 14:42 ` Oleksandr Natalenko
2019-05-21  2:56   ` Minchan Kim
2019-05-20 16:46 ` Johannes Weiner
2019-05-21  4:39   ` Minchan Kim
2019-05-21  6:32     ` Michal Hocko
2019-05-21  1:44 ` Matthew Wilcox
2019-05-21  5:01   ` Minchan Kim
2019-05-21  6:34   ` Michal Hocko
2019-05-21  8:42 ` Christian Brauner
2019-05-21 11:05   ` Minchan Kim
2019-05-21 11:30     ` Christian Brauner
2019-05-21 11:39       ` Christian Brauner
2019-05-22  5:11         ` Daniel Colascione
2019-05-22  8:22           ` Christian Brauner
2019-05-22 13:16             ` Daniel Colascione
2019-05-22 14:52               ` Christian Brauner
2019-05-22 15:17                 ` Daniel Colascione
2019-05-22 15:48                   ` Christian Brauner
2019-05-22 15:57                     ` Daniel Colascione
2019-05-22 16:01                       ` Christian Brauner
2019-05-22 16:01                         ` Daniel Colascione
2019-05-23 13:07                           ` Minchan Kim
2019-05-27  8:06                             ` Minchan Kim
2019-05-21 11:41       ` Minchan Kim
2019-05-21 12:04         ` Christian Brauner
2019-05-21 12:15           ` Oleksandr Natalenko
2019-05-21 12:53 ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190521104949.GE219653@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksandr@redhat.com \
    --cc=shakeelb@google.com \
    --cc=sonnyrao@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git