linux-kernel.vger.kernel.org archive mirror
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Jens Axboe <axboe@kernel.dk>
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-bcachefs@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [GIT PULL] bcachefs
Date: Thu, 6 Jul 2023 16:15:10 -0400
Message-ID: <20230706201510.sh5ukzfsf5vdxvrf@moria.home.lan>
In-Reply-To: <2e635579-37ba-ddfc-a2ab-e6c080ab4971@kernel.dk>

On Wed, Jun 28, 2023 at 03:17:43PM -0600, Jens Axboe wrote:
> On 6/28/23 2:44 PM, Jens Axboe wrote:
> > On 6/28/23 11:52 AM, Kent Overstreet wrote:
> >> On Wed, Jun 28, 2023 at 10:57:02AM -0600, Jens Axboe wrote:
> >>> I discussed this with Christian offline. I have a patch that is pretty
> >>> simple, but it does mean that you'd wait for delayed fput flush off
> >>> umount. Which seems kind of iffy.
> >>>
> >>> I think we need to back up a bit and consider if the kill && umount
> >>> really is sane. If you kill a task that has open files, then any fput
> >>> from that task will end up being delayed. This means that the umount may
> >>> very well fail.
> >>>
> >>> It'd be handy if we could have umount wait for that to finish, but I'm
> >>> not at all confident this is a sane solution for all cases. And as
> >>> discussed, we have no way to even identify which files we'd need to
> >>> flush out of the delayed list.
> >>>
> >>> Maybe the test case just needs fixing? Christian suggested lazy/detach
> >>> umount and wait for sb release. There's an fsnotify hook for that,
> >>> fsnotify_sb_delete(). Obviously this is a bit more involved, but seems
> >>> to me that this would be the way to make it more reliable when killing
> >>> tasks with open files is involved.
> >>
> >> No, this is a real breakage. Any time we introduce unexpected
> >> asynchrony there's the potential for breakage: case in point, there was
> >> a filesystem that made rm asynchronous, then there were scripts out
> >> there that deleted until df showed under some threshold.. whoops...
> > 
> > This is nothing new - any fput done from an exiting task will end up
> > being deferred. The window may be a bit wider now or a bit different,
> > but it's the same window. If an application assumes it can kill && wait
> > on a task and be guaranteed that the files are released as soon as wait
> > returns, it is mistaken. That is NOT the case.
> 
> Case in point, just changed my reproducer to use aio instead of
> io_uring. Here's the full script:
> 
> #!/bin/bash
> 
> DEV=/dev/nvme1n1
> MNT=/data
> ITER=0
> 
> while true; do
> 	echo loop $ITER
> 	sudo mount $DEV $MNT
> 	fio --name=test --ioengine=aio --iodepth=2 --filename=$MNT/foo --size=1g --buffered=1 --overwrite=0 --numjobs=12 --minimal --rw=randread --output=/dev/null &
> 	Y=$(($RANDOM % 3))
> 	X=$(($RANDOM % 10))
> 	VAL="$Y.$X"
> 	sleep $VAL
> 	ps -e | grep fio > /dev/null 2>&1
> 	while [ $? -eq 0 ]; do
> 		killall -9 fio > /dev/null 2>&1
> 		echo will wait
> 		wait > /dev/null 2>&1
> 		echo done waiting
> 		ps -e | grep "fio " > /dev/null 2>&1
> 	done
> 	sudo umount /data
> 	if [ $? -ne 0 ]; then
> 		break
> 	fi
> 	((ITER++))
> done
> 
> and if I run that, fails on the first umount attempt in that loop:
> 
> axboe@m1max-kvm ~> bash test2.sh
> loop 0
> will wait
> done waiting
> umount: /data: target is busy.

Your test fails because fio by default spawns off multiple processes,
and just calling wait does not wait for the subprocesses.

When I pass --thread to fio, your test passes.
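
To illustrate the wait behavior in isolation, here is a minimal sketch that
does not use fio at all. It assumes only bash and coreutils; the marker file
and the variable names are hypothetical, chosen for the demo. A direct child
forks a grandchild, the child is killed, and wait returns while the
grandchild should still be running:

```shell
#!/bin/bash
# Hypothetical demo (not from the thread): bash's `wait` only reaps the
# shell's direct children, so a process forked by a child can outlive it.

marker=$(mktemp)

# Direct child: start a grandchild in the background, record the
# grandchild's PID in the marker file, then block until it finishes.
bash -c 'sleep 5 & echo $! > "$0"; wait' "$marker" &
child=$!

# Wait until the grandchild's PID has been written.
until [ -s "$marker" ]; do
	sleep 0.1
done
grandchild=$(cat "$marker")

# Kill only the direct child, then wait for it, as the reproducer does.
kill -9 "$child"
wait "$child" 2>/dev/null

# wait has returned; check whether the grandchild is still around.
if kill -0 "$grandchild" 2>/dev/null; then
	grandchild_alive=yes
else
	grandchild_alive=no
fi
echo "grandchild still running after wait: $grandchild_alive"

# Clean up the demo's leftovers.
kill "$grandchild" 2>/dev/null
rm -f "$marker"
```

Whether fio's job processes map exactly onto this picture is an assumption
on my part, but it matches what I see: with one thread group per fio
invocation there is nothing left running once wait returns.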

I have a patch to avoid use of the delayed_fput list in the aio path,
but curiously it seems not to be needed - perhaps there's some other
synchronization I haven't found yet. I'm including the patch below in
case the technique is useful for io_uring:

diff --git a/fs/aio.c b/fs/aio.c
index b3e14a9fe3..00cb953efa 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -211,6 +211,7 @@ struct aio_kiocb {
 						 * for cancellation */
 	refcount_t		ki_refcnt;
 
+	struct task_struct	*ki_task;
 	/*
 	 * If the aio_resfd field of the userspace iocb is not zero,
 	 * this is the underlying eventfd context to deliver events to.
@@ -321,7 +322,7 @@ static void put_aio_ring_file(struct kioctx *ctx)
 		ctx->aio_ring_file = NULL;
 		spin_unlock(&i_mapping->private_lock);
 
-		fput(aio_ring_file);
+		__fput_sync(aio_ring_file);
 	}
 }
 
@@ -1068,6 +1069,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 2);
 	req->ki_eventfd = NULL;
+	req->ki_task = get_task_struct(current);
 	return req;
 }
 
@@ -1104,8 +1106,9 @@ static inline void iocb_destroy(struct aio_kiocb *iocb)
 	if (iocb->ki_eventfd)
 		eventfd_ctx_put(iocb->ki_eventfd);
 	if (iocb->ki_filp)
-		fput(iocb->ki_filp);
+		fput_for_task(iocb->ki_filp, iocb->ki_task);
 	percpu_ref_put(&iocb->ki_ctx->reqs);
+	put_task_struct(iocb->ki_task);
 	kmem_cache_free(kiocb_cachep, iocb);
 }
 
diff --git a/fs/file_table.c b/fs/file_table.c
index 372653b926..137f87f55e 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -367,12 +367,13 @@ EXPORT_SYMBOL_GPL(flush_delayed_fput);
 
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
-void fput(struct file *file)
+void fput_for_task(struct file *file, struct task_struct *task)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
-		struct task_struct *task = current;
+		if (!task && likely(!in_interrupt() && !(current->flags & PF_KTHREAD)))
+			task = current;
 
-		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+		if (task) {
 			init_task_work(&file->f_rcuhead, ____fput);
 			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
 				return;
@@ -388,6 +389,11 @@ void fput(struct file *file)
 	}
 }
 
+void fput(struct file *file)
+{
+	fput_for_task(file, NULL);
+}
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
@@ -405,6 +411,7 @@ void __fput_sync(struct file *file)
 	}
 }
 
+EXPORT_SYMBOL(fput_for_task);
 EXPORT_SYMBOL(fput);
 EXPORT_SYMBOL(__fput_sync);
 
diff --git a/include/linux/file.h b/include/linux/file.h
index 39704eae83..667a68f477 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,7 +12,9 @@
 #include <linux/errno.h>
 
 struct file;
+struct task_struct;
 
+extern void fput_for_task(struct file *, struct task_struct *);
 extern void fput(struct file *);
 
 struct file_operations;
