IO-Uring Archive on lore.kernel.org
* Re: [PATCH v5 1/7] mm: pass task and mm to do_madvise
       [not found] ` <20200214170520.160271-2-minchan@kernel.org>
@ 2020-02-14 17:25   ` Jann Horn
  2020-02-14 18:22     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jann Horn @ 2020-02-14 17:25 UTC
  To: Minchan Kim, Jens Axboe, io-uring
  Cc: Andrew Morton, LKML, linux-mm, Linux API, Oleksandr Natalenko,
	Suren Baghdasaryan, Tim Murray, Daniel Colascione, Sandeep Patil,
	Sonny Rao, Brian Geffon, Michal Hocko, Johannes Weiner,
	Shakeel Butt, John Dias, Joel Fernandes, sj38.park,
	Alexander Duyck

+Jens and io-uring list

On Fri, Feb 14, 2020 at 6:06 PM Minchan Kim <minchan@kernel.org> wrote:
> In upcoming patches, do_madvise will be called from external process
> context so we shouldn't assume "current" is always hinted process's
> task_struct.
[...]
> [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com
[...]
> diff --git a/fs/io_uring.c b/fs/io_uring.c
[...]
> @@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
>         if (force_nonblock)
>                 return -EAGAIN;
>
> -       ret = do_madvise(ma->addr, ma->len, ma->advice);
> +       ret = do_madvise(current, current->mm, ma->addr, ma->len, ma->advice);
>         if (ret < 0)
>                 req_set_fail_links(req);
>         io_cqring_add_event(req, ret);

Jens, can you have a look at this change and the following patch
<https://lore.kernel.org/linux-mm/20200214170520.160271-4-minchan@kernel.org/>
("[PATCH v5 3/7] mm: check fatal signal pending of target process")?
Basically Minchan's patch tries to plumb through the identity of the
target task so that if that task gets killed in the middle of the
operation, the (potentially long-running and costly) madvise operation
can be cancelled. Just passing in "current" instead (which in this
case is the uring worker thread AFAIK) doesn't really break anything,
other than making the optimization not work, but I wonder whether this
couldn't be done more cleanly - maybe by passing in NULL to mean "we
don't know who the target task is", since I think we don't know that
here?


* Re: [PATCH v5 1/7] mm: pass task and mm to do_madvise
  2020-02-14 17:25   ` [PATCH v5 1/7] mm: pass task and mm to do_madvise Jann Horn
@ 2020-02-14 18:22     ` Jens Axboe
  2020-02-14 18:45       ` Minchan Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2020-02-14 18:22 UTC
  To: Jann Horn, Minchan Kim, io-uring
  Cc: Andrew Morton, LKML, linux-mm, Linux API, Oleksandr Natalenko,
	Suren Baghdasaryan, Tim Murray, Daniel Colascione, Sandeep Patil,
	Sonny Rao, Brian Geffon, Michal Hocko, Johannes Weiner,
	Shakeel Butt, John Dias, Joel Fernandes, sj38.park,
	Alexander Duyck

On 2/14/20 10:25 AM, Jann Horn wrote:
> +Jens and io-uring list
> 
> On Fri, Feb 14, 2020 at 6:06 PM Minchan Kim <minchan@kernel.org> wrote:
>> In upcoming patches, do_madvise will be called from external process
>> context so we shouldn't assume "current" is always hinted process's
>> task_struct.
> [...]
>> [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com
> [...]
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
> [...]
>> @@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
>>         if (force_nonblock)
>>                 return -EAGAIN;
>>
>> -       ret = do_madvise(ma->addr, ma->len, ma->advice);
>> +       ret = do_madvise(current, current->mm, ma->addr, ma->len, ma->advice);
>>         if (ret < 0)
>>                 req_set_fail_links(req);
>>         io_cqring_add_event(req, ret);
> 
> Jens, can you have a look at this change and the following patch
> <https://lore.kernel.org/linux-mm/20200214170520.160271-4-minchan@kernel.org/>
> ("[PATCH v5 3/7] mm: check fatal signal pending of target process")?
> Basically Minchan's patch tries to plumb through the identity of the
> target task so that if that task gets killed in the middle of the
> operation, the (potentially long-running and costly) madvise operation
> can be cancelled. Just passing in "current" instead (which in this
> case is the uring worker thread AFAIK) doesn't really break anything,
> other than making the optimization not work, but I wonder whether this
> couldn't be done more cleanly - maybe by passing in NULL to mean "we
> don't know who the target task is", since I think we don't know that
> here?

Thanks for bringing this to my attention, patches that touch io_uring
(or anything else) really should be CC'ed to the maintainer(s) of those
areas...

Yeah, the change above won't do the right thing for io_uring, in fact
it'll always be the wrong task. So I'd second Jann's question, and ask
if we really need the actual task, or if NULL could be used? For
cancelation purposes, I'm guessing you want the task that's actually
doing the operation, even if it's on behalf of someone else. That makes
the interface a bit weird, as you'd assume the task/mm passed in would
be related to the madvise itself, not just for cancelation.

Would be nice with some clarification, so we can figure out an approach
that would actually work.

-- 
Jens Axboe



* Re: [PATCH v5 1/7] mm: pass task and mm to do_madvise
  2020-02-14 18:22     ` Jens Axboe
@ 2020-02-14 18:45       ` Minchan Kim
  2020-02-14 19:09         ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Minchan Kim @ 2020-02-14 18:45 UTC
  To: Jens Axboe
  Cc: Jann Horn, io-uring, Andrew Morton, LKML, linux-mm, Linux API,
	Oleksandr Natalenko, Suren Baghdasaryan, Tim Murray,
	Daniel Colascione, Sandeep Patil, Sonny Rao, Brian Geffon,
	Michal Hocko, Johannes Weiner, Shakeel Butt, John Dias,
	Joel Fernandes, sj38.park, Alexander Duyck

On Fri, Feb 14, 2020 at 11:22:08AM -0700, Jens Axboe wrote:
> On 2/14/20 10:25 AM, Jann Horn wrote:
> > +Jens and io-uring list
> > 
> > On Fri, Feb 14, 2020 at 6:06 PM Minchan Kim <minchan@kernel.org> wrote:
> >> In upcoming patches, do_madvise will be called from external process
> >> context so we shouldn't assume "current" is always hinted process's
> >> task_struct.
> > [...]
> >> [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com
> > [...]
> >> diff --git a/fs/io_uring.c b/fs/io_uring.c
> > [...]
> >> @@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
> >>         if (force_nonblock)
> >>                 return -EAGAIN;
> >>
> >> -       ret = do_madvise(ma->addr, ma->len, ma->advice);
> >> +       ret = do_madvise(current, current->mm, ma->addr, ma->len, ma->advice);
> >>         if (ret < 0)
> >>                 req_set_fail_links(req);
> >>         io_cqring_add_event(req, ret);
> > 
> > Jens, can you have a look at this change and the following patch
> > <https://lore.kernel.org/linux-mm/20200214170520.160271-4-minchan@kernel.org/>
> > ("[PATCH v5 3/7] mm: check fatal signal pending of target process")?
> > Basically Minchan's patch tries to plumb through the identity of the
> > target task so that if that task gets killed in the middle of the
> > operation, the (potentially long-running and costly) madvise operation
> > can be cancelled. Just passing in "current" instead (which in this
> > case is the uring worker thread AFAIK) doesn't really break anything,
> > other than making the optimization not work, but I wonder whether this
> > couldn't be done more cleanly - maybe by passing in NULL to mean "we
> > don't know who the target task is", since I think we don't know that
> > here?
> 
> Thanks for bringing this to my attention, patches that touch io_uring
> (or anything else) really should be CC'ed to the maintainer(s) of those
> areas...

Hi Jens, it was my mistake. Sorry for that.

> 
> Yeah, the change above won't do the right thing for io_uring, in fact
> it'll always be the wrong task. So I'd second Jann's question, and ask
> if we really need the actual task, or if NULL could be used? For
> cancelation purposes, I'm guessing you want the task that's actually
> doing the operation, even if it's on behalf of someone else. That makes
> the interface a bit weird, as you'd assume the task/mm passed in would
> be related to the madvise itself, not just for cancelation.
> 
> Would be nice with some clarification, so we can figure out an approach
> that would actually work.

MADV_(COLD|PAGEOUT) checks both the caller and the callee, and this part
is aimed at the callee (i.e., the target task). So io_madvise can pass
NULL when it cannot know who the target is, and we add a NULL check
before fatal_signal_pending(). I will put the following check in [3/7].

	if (private->target_task &&
			fatal_signal_pending(private->target_task))
		return -EINTR;
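
For illustration only, and not part of the patch: a minimal userspace
sketch of the NULL-check idea above. The names model_task,
model_fatal_signal_pending and model_madvise_walk are hypothetical
stand-ins for the kernel's task_struct, fatal_signal_pending() and the
madvise page walk. A NULL target is assumed to mean "target unknown", so
the cancellation check is skipped.

```c
/*
 * Userspace model only: nothing here is kernel API. The types and
 * helpers are hypothetical stand-ins used to show the control flow.
 */
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

struct model_task {
	bool fatal_signal;	/* stands in for a pending fatal signal */
};

static bool model_fatal_signal_pending(const struct model_task *t)
{
	return t->fatal_signal;
}

/*
 * Walk nr_pages "pages". If a target task is known and has a fatal
 * signal pending, stop early with -EINTR. A NULL target skips the
 * check. Returns the number of pages walked, or -EINTR.
 */
static long model_madvise_walk(const struct model_task *target_task,
			       size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (target_task && model_fatal_signal_pending(target_task))
			return -EINTR;
		/* per-page work would happen here */
	}
	return (long)nr_pages;
}
```

In this model, a NULL target always lets the walk run to completion,
while a target whose flag is set makes the walk return -EINTR before
any page is processed.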

From d008a5a1049b03b3e0eeef7121faead2b6555f49 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 14 Feb 2020 07:29:58 -0800
Subject: [PATCH] mm: pass task and mm to do_madvise

In upcoming patches, do_madvise will be called from external process
context, so we shouldn't assume "current" is always the hinted process's
task_struct. Furthermore, we can't access the mm_struct via task->mm
once it has been verified by access_mm, which will be introduced in the
next patch[1]. So let's pass *current* and current->mm as arguments of
do_madvise; this shouldn't change existing behavior but prepares for
the next patch and makes review easier.

Note: io_madvise passes NULL as the target_task argument of do_madvise
because it can't know who the target is.

[1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 fs/io_uring.c      |  2 +-
 include/linux/mm.h |  3 ++-
 mm/madvise.c       | 34 +++++++++++++++++++---------------
 3 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 63beda9bafc5..1c7e9cd6c8ce 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (force_nonblock)
 		return -EAGAIN;
 
-	ret = do_madvise(ma->addr, ma->len, ma->advice);
+	ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..beb9259f9ed1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2323,7 +2323,8 @@ extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 		     struct list_head *uf);
-extern int do_madvise(unsigned long start, size_t len_in, int behavior);
+extern int do_madvise(struct task_struct *task, struct mm_struct *mm,
+		unsigned long start, size_t len_in, int behavior);
 
 static inline unsigned long
 do_mmap_pgoff(struct file *file, unsigned long addr,
diff --git a/mm/madvise.c b/mm/madvise.c
index 43b47d3fae02..f75c86b6c463 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -254,6 +254,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
 			     struct vm_area_struct **prev,
 			     unsigned long start, unsigned long end)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct file *file = vma->vm_file;
 	loff_t offset;
 
@@ -288,12 +289,12 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	 */
 	*prev = NULL;	/* tell sys_madvise we drop mmap_sem */
 	get_file(file);
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	offset = (loff_t)(start - vma->vm_start)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 	vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
 	fput(file);
-	down_read(&current->mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	return 0;
 }
 
@@ -676,7 +677,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (nr_swap) {
 		if (current->mm == mm)
 			sync_mm_rss(mm);
-
 		add_mm_counter(mm, MM_SWAPENTS, nr_swap);
 	}
 	arch_leave_lazy_mmu_mode();
@@ -756,6 +756,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  unsigned long start, unsigned long end,
 				  int behavior)
 {
+	struct mm_struct *mm = vma->vm_mm;
+
 	*prev = vma;
 	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
@@ -763,8 +765,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	if (!userfaultfd_remove(vma, start, end)) {
 		*prev = NULL; /* mmap_sem has been dropped, prev is stale */
 
-		down_read(&current->mm->mmap_sem);
-		vma = find_vma(current->mm, start);
+		down_read(&mm->mmap_sem);
+		vma = find_vma(mm, start);
 		if (!vma)
 			return -ENOMEM;
 		if (start < vma->vm_start) {
@@ -818,6 +820,7 @@ static long madvise_remove(struct vm_area_struct *vma,
 	loff_t offset;
 	int error;
 	struct file *f;
+	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = NULL;	/* tell sys_madvise we drop mmap_sem */
 
@@ -845,13 +848,13 @@ static long madvise_remove(struct vm_area_struct *vma,
 	get_file(f);
 	if (userfaultfd_remove(vma, start, end)) {
 		/* mmap_sem was not released by userfaultfd_remove() */
-		up_read(&current->mm->mmap_sem);
+		up_read(&mm->mmap_sem);
 	}
 	error = vfs_fallocate(f,
 				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
 				offset, end - start);
 	fput(f);
-	down_read(&current->mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	return error;
 }
 
@@ -1044,7 +1047,8 @@ madvise_behavior_valid(int behavior)
  *  -EBADF  - map exists, but area maps something that isn't a file.
  *  -EAGAIN - a kernel resource was temporarily unavailable.
  */
-int do_madvise(unsigned long start, size_t len_in, int behavior)
+int do_madvise(struct task_struct *target_task, struct mm_struct *mm,
+		unsigned long start, size_t len_in, int behavior)
 {
 	unsigned long end, tmp;
 	struct vm_area_struct *vma, *prev;
@@ -1082,10 +1086,10 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
 
 	write = madvise_need_mmap_write(behavior);
 	if (write) {
-		if (down_write_killable(&current->mm->mmap_sem))
+		if (down_write_killable(&mm->mmap_sem))
 			return -EINTR;
 	} else {
-		down_read(&current->mm->mmap_sem);
+		down_read(&mm->mmap_sem);
 	}
 
 	/*
@@ -1093,7 +1097,7 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
 	 * ranges, just ignore them, but return -ENOMEM at the end.
 	 * - different from the way of handling in mlock etc.
 	 */
-	vma = find_vma_prev(current->mm, start, &prev);
+	vma = find_vma_prev(mm, start, &prev);
 	if (vma && start > vma->vm_start)
 		prev = vma;
 
@@ -1130,19 +1134,19 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
 		if (prev)
 			vma = prev->vm_next;
 		else	/* madvise_remove dropped mmap_sem */
-			vma = find_vma(current->mm, start);
+			vma = find_vma(mm, start);
 	}
 out:
 	blk_finish_plug(&plug);
 	if (write)
-		up_write(&current->mm->mmap_sem);
+		up_write(&mm->mmap_sem);
 	else
-		up_read(&current->mm->mmap_sem);
+		up_read(&mm->mmap_sem);
 
 	return error;
 }
 
 SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 {
-	return do_madvise(start, len_in, behavior);
+	return do_madvise(current, current->mm, start, len_in, behavior);
 }
-- 
2.25.0.265.gbab2e86ba0-goog


* Re: [PATCH v5 1/7] mm: pass task and mm to do_madvise
  2020-02-14 18:45       ` Minchan Kim
@ 2020-02-14 19:09         ` Jens Axboe
  2020-02-14 19:31           ` Minchan Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2020-02-14 19:09 UTC
  To: Minchan Kim
  Cc: Jann Horn, io-uring, Andrew Morton, LKML, linux-mm, Linux API,
	Oleksandr Natalenko, Suren Baghdasaryan, Tim Murray,
	Daniel Colascione, Sandeep Patil, Sonny Rao, Brian Geffon,
	Michal Hocko, Johannes Weiner, Shakeel Butt, John Dias,
	Joel Fernandes, sj38.park, Alexander Duyck

On 2/14/20 11:45 AM, Minchan Kim wrote:
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 63beda9bafc5..1c7e9cd6c8ce 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
>  	if (force_nonblock)
>  		return -EAGAIN;
>  
> -	ret = do_madvise(ma->addr, ma->len, ma->advice);
> +	ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
>  	if (ret < 0)
>  		req_set_fail_links(req);
>  	io_cqring_add_event(req, ret);

I think we want to use req->work.mm here - it'll be the same as
current->mm at this point, but it makes it clear that we're using a
grabbed mm.

-- 
Jens Axboe



* Re: [PATCH v5 1/7] mm: pass task and mm to do_madvise
  2020-02-14 19:09         ` Jens Axboe
@ 2020-02-14 19:31           ` Minchan Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Minchan Kim @ 2020-02-14 19:31 UTC
  To: Jens Axboe
  Cc: Jann Horn, io-uring, Andrew Morton, LKML, linux-mm, Linux API,
	Oleksandr Natalenko, Suren Baghdasaryan, Tim Murray,
	Daniel Colascione, Sandeep Patil, Sonny Rao, Brian Geffon,
	Michal Hocko, Johannes Weiner, Shakeel Butt, John Dias,
	Joel Fernandes, sj38.park, Alexander Duyck

On Fri, Feb 14, 2020 at 12:09:50PM -0700, Jens Axboe wrote:
> On 2/14/20 11:45 AM, Minchan Kim wrote:
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 63beda9bafc5..1c7e9cd6c8ce 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
> >  	if (force_nonblock)
> >  		return -EAGAIN;
> >  
> > -	ret = do_madvise(ma->addr, ma->len, ma->advice);
> > +	ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
> >  	if (ret < 0)
> >  		req_set_fail_links(req);
> >  	io_cqring_add_event(req, ret);
> 
> I think we want to use req->work.mm here - it'll be the same as
> current->mm at this point, but it makes it clear that we're using a
> grabbed mm.

Will fix at respin. Thanks for the review!

