All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	linux-mm <linux-mm@kvack.org>,
	linux-api@vger.kernel.org, oleksandr@redhat.com,
	Suren Baghdasaryan <surenb@google.com>,
	Tim Murray <timmurray@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Jann Horn <jannh@google.com>,
	alexander.h.duyck@linux.intel.com, sj38.park@gmail.com,
	Arjun Roy <arjunroy@google.com>, Vlastimil Babka <vbabka@suse.cz>,
	Christian Brauner <christian@brauner.io>,
	Daniel Colascione <dancol@google.com>,
	Jens Axboe <axboe@kernel.dk>, Kirill Tkhai <ktkhai@virtuozzo.com>,
	SeongJae Park <sjpark@amazon.de>,
	linux-man@vger.kernel.org
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
Date: Thu, 25 Jun 2020 13:38:52 -0700	[thread overview]
Message-ID: <20200625203852.GA55572@google.com> (raw)
In-Reply-To: <alpine.DEB.2.22.394.2006241251080.35388@chino.kir.corp.google.com>

On Wed, Jun 24, 2020 at 01:00:14PM -0700, David Rientjes wrote:
> On Mon, 22 Jun 2020, Minchan Kim wrote:
> 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 551ed816eefe..23abca3f93fa 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/falloc.h>
> >  #include <linux/fadvise.h>
> >  #include <linux/sched.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/ksm.h>
> >  #include <linux/fs.h>
> >  #include <linux/file.h>
> > @@ -995,6 +996,18 @@ madvise_behavior_valid(int behavior)
> >  	}
> >  }
> >  
> > +static bool
> > +process_madvise_behavior_valid(int behavior)
> > +{
> > +	switch (behavior) {
> > +	case MADV_COLD:
> > +	case MADV_PAGEOUT:
> > +		return true;
> > +	default:
> > +		return false;
> > +	}
> > +}
> > +
> >  /*
> >   * The madvise(2) system call.
> >   *
> > @@ -1042,6 +1055,11 @@ madvise_behavior_valid(int behavior)
> >   *  MADV_DONTDUMP - the application wants to prevent pages in the given range
> >   *		from being included in its core dump.
> >   *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
> > + *  MADV_COLD - the application is not expected to use this memory soon,
> > + *		deactivate pages in this range so that they can be reclaimed
> > + *		easily if memory pressure hanppens.
> > + *  MADV_PAGEOUT - the application is not expected to use this memory soon,
> > + *		page out the pages in this range immediately.
> >   *
> >   * return values:
> >   *  zero    - success
> > @@ -1176,3 +1194,106 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> >  {
> >  	return do_madvise(current, current->mm, start, len_in, behavior);
> >  }
> > +
> > +static int process_madvise_vec(struct task_struct *target_task,
> > +		struct mm_struct *mm, struct iov_iter *iter, int behavior)
> > +{
> > +	struct iovec iovec;
> > +	int ret = 0;
> > +
> > +	while (iov_iter_count(iter)) {
> > +		iovec = iov_iter_iovec(iter);
> > +		ret = do_madvise(target_task, mm, (unsigned long)iovec.iov_base,
> > +					iovec.iov_len, behavior);
> > +		if (ret < 0)
> > +			break;
> > +		iov_iter_advance(iter, iovec.iov_len);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
> > +				int behavior, unsigned int flags)
> > +{
> > +	ssize_t ret;
> > +	struct pid *pid;
> > +	struct task_struct *task;
> > +	struct mm_struct *mm;
> > +	size_t total_len = iov_iter_count(iter);
> > +
> > +	if (flags != 0)
> > +		return -EINVAL;
> > +
> > +	pid = pidfd_get_pid(pidfd);
> > +	if (IS_ERR(pid))
> > +		return PTR_ERR(pid);
> > +
> > +	task = get_pid_task(pid, PIDTYPE_PID);
> > +	if (!task) {
> > +		ret = -ESRCH;
> > +		goto put_pid;
> > +	}
> > +
> > +	if (task->mm != current->mm &&
> > +			!process_madvise_behavior_valid(behavior)) {
> > +		ret = -EINVAL;
> > +		goto release_task;
> > +	}
> > +
> > +	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > +	if (IS_ERR_OR_NULL(mm)) {
> > +		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > +		goto release_task;
> > +	}
> > 
> 
> mm is always task->mm right?  I'm wondering if it would be better to find 
> the mm directly in process_madvise_vec() rather than passing it into the 
> function.  I'm not sure why we'd pass both task and mm here.

That's because of hint Jann provided in the past version.
https://lore.kernel.org/linux-api/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com/

Thanks for the review, David.

  reply	other threads:[~2020-06-25 20:39 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-22 19:28 [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-22 19:28 ` [PATCH v8 1/4] mm/madvise: pass task and mm to do_madvise Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 2/4] pid: move pidfd_get_pid() to pid.c Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-22 19:28 ` [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-25 20:38     ` Minchan Kim [this message]
2020-08-28 17:40   ` Arnd Bergmann
2020-08-28 17:40     ` Arnd Bergmann
2020-08-28 18:24     ` Jens Axboe
2020-08-28 18:25       ` Christian Brauner
2020-08-28 18:25         ` Christian Brauner
2020-08-28 19:04         ` Minchan Kim
2020-08-28 19:26       ` Jens Axboe
2020-08-28 20:15         ` Arnd Bergmann
2020-08-28 20:15           ` Arnd Bergmann
2020-08-29  7:02     ` Christoph Hellwig
2020-06-22 19:29 ` [PATCH v8 4/4] mm/madvise: check fatal signal pending of target process Minchan Kim
2020-06-24 20:00   ` David Rientjes
2020-06-24 20:00     ` David Rientjes
2020-06-22 19:36 ` [PATCH v8 0/4] introduce memory hinting API for external process Minchan Kim
2020-06-23  8:59   ` Oleksandr Natalenko
2020-06-23  9:07 ` Oleksandr Natalenko
2020-06-24  1:31   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200625203852.GA55572@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=arjunroy@google.com \
    --cc=axboe@kernel.dk \
    --cc=bgeffon@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=dancol@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=joaodias@google.com \
    --cc=joel@joelfernandes.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleksandr@redhat.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=sj38.park@gmail.com \
    --cc=sjpark@amazon.de \
    --cc=sonnyrao@google.com \
    --cc=sspatil@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.