All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.0 44/52] aio: simplify - and fix - fget/fput for io_submit()
Date: Tue, 26 Mar 2019 15:30:31 +0900	[thread overview]
Message-ID: <20190326042703.322415113@linuxfoundation.org> (raw)
In-Reply-To: <20190326042700.963224437@linuxfoundation.org>

5.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

 "In more details - normally IOCB_CMD_POLL handling looks so:

   1) io_submit(2) allocates aio_kiocb instance and passes it to
      aio_poll()

   2) aio_poll() resolves the descriptor to struct file by req->file =
      fget(iocb->aio_fildes)

   3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
      aio_kiocb to 2 (bumps by 1, that is).

   4) aio_poll() calls vfs_poll(). After sanity checks (basically,
      "poll_wait() had been called and only once") it locks the queue.
      That's what the extra reference to iocb had been for - we know we
      can safely access it.

   5) With queue locked, we check if ->woken has already been set to
      true (by aio_poll_wake()) and, if it had been, we unlock the
      queue, drop a reference to aio_kiocb and bugger off - at that
      point it's a responsibility to aio_poll_wake() and the stuff
      called/scheduled by it. That code will drop the reference to file
      in req->file, along with the other reference to our aio_kiocb.

   6) otherwise, we see whether we need to wait. If we do, we unlock the
      queue, drop one reference to aio_kiocb and go away - eventual
      wakeup (or cancel) will deal with the reference to file and with
      the other reference to aio_kiocb

   7) otherwise we remove ourselves from waitqueue (still under the
      queue lock), so that wakeup won't get us. No async activity will
      be happening, so we can safely drop req->file and iocb ourselves.

  If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
  won't get freed under us, so we can do all the checks and locking
  safely. And we don't touch ->file if we detect that case.

  However, vfs_poll() most certainly *does* touch the file it had been
  given. So wakeup coming while we are still in ->poll() might end up
  doing fput() on that file. That case is not too rare, and usually we
  are saved by the still present reference from descriptor table - that
  fput() is not the final one.

  But if another thread closes that descriptor right after our fget()
  and wakeup does happen before ->poll() returns, we are in trouble -
  final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/aio.c           |   72 +++++++++++++++++++++--------------------------------
 include/linux/fs.h |    8 +++++
 2 files changed, 36 insertions(+), 44 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -167,9 +167,13 @@ struct kioctx {
 	unsigned		id;
 };
 
+/*
+ * First field must be the file pointer in all the
+ * iocb unions! See also 'struct kiocb' in <linux/fs.h>
+ */
 struct fsync_iocb {
-	struct work_struct	work;
 	struct file		*file;
+	struct work_struct	work;
 	bool			datasync;
 };
 
@@ -183,8 +187,15 @@ struct poll_iocb {
 	struct work_struct	work;
 };
 
+/*
+ * NOTE! Each of the iocb union members has the file pointer
+ * as the first entry in their struct definition. So you can
+ * access the file pointer through any of the sub-structs,
+ * or directly as just 'ki_filp' in this struct.
+ */
 struct aio_kiocb {
 	union {
+		struct file		*ki_filp;
 		struct kiocb		rw;
 		struct fsync_iocb	fsync;
 		struct poll_iocb	poll;
@@ -1060,6 +1071,8 @@ static inline void iocb_put(struct aio_k
 {
 	if (refcount_read(&iocb->ki_refcnt) == 0 ||
 	    refcount_dec_and_test(&iocb->ki_refcnt)) {
+		if (iocb->ki_filp)
+			fput(iocb->ki_filp);
 		percpu_ref_put(&iocb->ki_ctx->reqs);
 		kmem_cache_free(kiocb_cachep, iocb);
 	}
@@ -1424,7 +1437,6 @@ static void aio_complete_rw(struct kiocb
 		file_end_write(kiocb->ki_filp);
 	}
 
-	fput(kiocb->ki_filp);
 	aio_complete(iocb, res, res2);
 }
 
@@ -1432,9 +1444,6 @@ static int aio_prep_rw(struct kiocb *req
 {
 	int ret;
 
-	req->ki_filp = fget(iocb->aio_fildes);
-	if (unlikely(!req->ki_filp))
-		return -EBADF;
 	req->ki_complete = aio_complete_rw;
 	req->private = NULL;
 	req->ki_pos = iocb->aio_offset;
@@ -1451,7 +1460,7 @@ static int aio_prep_rw(struct kiocb *req
 		ret = ioprio_check_cap(iocb->aio_reqprio);
 		if (ret) {
 			pr_debug("aio ioprio check cap error: %d\n", ret);
-			goto out_fput;
+			return ret;
 		}
 
 		req->ki_ioprio = iocb->aio_reqprio;
@@ -1460,14 +1469,10 @@ static int aio_prep_rw(struct kiocb *req
 
 	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
 	if (unlikely(ret))
-		goto out_fput;
+		return ret;
 
 	req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */
 	return 0;
-
-out_fput:
-	fput(req->ki_filp);
-	return ret;
 }
 
 static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec,
@@ -1521,24 +1526,19 @@ static ssize_t aio_read(struct kiocb *re
 	if (ret)
 		return ret;
 	file = req->ki_filp;
-
-	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_READ)))
-		goto out_fput;
+		return -EBADF;
 	ret = -EINVAL;
 	if (unlikely(!file->f_op->read_iter))
-		goto out_fput;
+		return -EINVAL;
 
 	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
 	if (ret)
-		goto out_fput;
+		return ret;
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret)
 		aio_rw_done(req, call_read_iter(file, req, &iter));
 	kfree(iovec);
-out_fput:
-	if (unlikely(ret))
-		fput(file);
 	return ret;
 }
 
@@ -1555,16 +1555,14 @@ static ssize_t aio_write(struct kiocb *r
 		return ret;
 	file = req->ki_filp;
 
-	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_WRITE)))
-		goto out_fput;
-	ret = -EINVAL;
+		return -EBADF;
 	if (unlikely(!file->f_op->write_iter))
-		goto out_fput;
+		return -EINVAL;
 
 	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
 	if (ret)
-		goto out_fput;
+		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
 		/*
@@ -1582,9 +1580,6 @@ static ssize_t aio_write(struct kiocb *r
 		aio_rw_done(req, call_write_iter(file, req, &iter));
 	}
 	kfree(iovec);
-out_fput:
-	if (unlikely(ret))
-		fput(file);
 	return ret;
 }
 
@@ -1594,7 +1589,6 @@ static void aio_fsync_work(struct work_s
 	int ret;
 
 	ret = vfs_fsync(req->file, req->datasync);
-	fput(req->file);
 	aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
 }
 
@@ -1605,13 +1599,8 @@ static int aio_fsync(struct fsync_iocb *
 			iocb->aio_rw_flags))
 		return -EINVAL;
 
-	req->file = fget(iocb->aio_fildes);
-	if (unlikely(!req->file))
-		return -EBADF;
-	if (unlikely(!req->file->f_op->fsync)) {
-		fput(req->file);
+	if (unlikely(!req->file->f_op->fsync))
 		return -EINVAL;
-	}
 
 	req->datasync = datasync;
 	INIT_WORK(&req->work, aio_fsync_work);
@@ -1621,10 +1610,7 @@ static int aio_fsync(struct fsync_iocb *
 
 static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
 {
-	struct file *file = iocb->poll.file;
-
 	aio_complete(iocb, mangle_poll(mask), 0);
-	fput(file);
 }
 
 static void aio_poll_complete_work(struct work_struct *work)
@@ -1749,9 +1735,6 @@ static ssize_t aio_poll(struct aio_kiocb
 
 	INIT_WORK(&req->work, aio_poll_complete_work);
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
-	req->file = fget(iocb->aio_fildes);
-	if (unlikely(!req->file))
-		return -EBADF;
 
 	req->head = NULL;
 	req->woken = false;
@@ -1794,10 +1777,8 @@ static ssize_t aio_poll(struct aio_kiocb
 	spin_unlock_irq(&ctx->ctx_lock);
 
 out:
-	if (unlikely(apt.error)) {
-		fput(req->file);
+	if (unlikely(apt.error))
 		return apt.error;
-	}
 
 	if (mask)
 		aio_poll_complete(aiocb, mask);
@@ -1835,6 +1816,11 @@ static int __io_submit_one(struct kioctx
 	if (unlikely(!req))
 		goto out_put_reqs_available;
 
+	req->ki_filp = fget(iocb->aio_fildes);
+	ret = -EBADF;
+	if (unlikely(!req->ki_filp))
+		goto out_put_req;
+
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -304,13 +304,19 @@ enum rw_hint {
 
 struct kiocb {
 	struct file		*ki_filp;
+
+	/* The 'ki_filp' pointer is shared in a union for aio */
+	randomized_struct_fields_start
+
 	loff_t			ki_pos;
 	void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
 	void			*private;
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-} __randomize_layout;
+
+	randomized_struct_fields_end
+};
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
 {



  parent reply	other threads:[~2019-03-26  6:41 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-26  6:29 [PATCH 5.0 00/52] 5.0.5-stable review Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 01/52] ALSA: hda - add Lenovo IdeaCentre B550 to the power_save_blacklist Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 02/52] ALSA: firewire-motu: use version field of unit directory to identify model Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 03/52] mmc: pxamci: fix enum type confusion Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 04/52] mmc: alcor: fix DMA reads Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 05/52] mmc: mxcmmc: "Revert mmc: mxcmmc: handle highmem pages" Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 06/52] mmc: renesas_sdhi: limit block count to 16 bit for old revisions Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 07/52] drm/amdgpu: fix invalid use of change_bit Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 08/52] drm/vmwgfx: Dont double-free the mode stored in par->set_mode Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 09/52] drm/vmwgfx: Return 0 when gmrid::get_node runs out of IDs Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 10/52] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 11/52] iommu/iova: Fix tracking of recently failed iova address Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 5.0 12/52] libceph: wait for latest osdmap in ceph_monc_blacklist_add() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 13/52] udf: Fix crash on IO error during truncate Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 14/52] mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 15/52] MIPS: Ensure ELF appended dtb is relocated Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 16/52] MIPS: Fix kernel crash for R6 in jump label branch function Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 17/52] powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038 Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 18/52] powerpc/security: Fix spectre_v2 reporting Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 19/52] net/mlx5: Fix DCT creation bad flow Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 20/52] scsi: core: Avoid that a kernel warning appears during system resume Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 21/52] scsi: qla2xxx: Fix FC-AL connection target discovery Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 22/52] scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 23/52] scsi: ibmvscsi: Fix empty event pool access during host removal Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 24/52] futex: Ensure that futex address is aligned in handle_futex_death() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 25/52] cifs: allow guest mounts to work for smb3.11 Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 26/52] perf probe: Fix getting the kernel map Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 27/52] objtool: Move objtool_file struct off the stack Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 28/52] irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 29/52] clocksource/drivers/riscv: Fix clocksource mask Greg Kroah-Hartman
2019-03-26  6:30   ` Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 30/52] SMB3: Fix SMB3.1.1 guest mounts to Samba Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 31/52] ALSA: hda - Dont trigger jackpoll_work in azx_resume Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 32/52] ALSA: ac97: Fix of-node refcount unbalance Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 33/52] ext4: fix NULL pointer dereference while journal is aborted Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 34/52] ext4: fix data corruption caused by unaligned direct AIO Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 35/52] ext4: brelse all indirect buffer in ext4_ind_remove_space() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 36/52] media: v4l2-ctrls.c/uvc: zero v4l2_event Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 37/52] Bluetooth: hci_uart: Check if socket buffer is ERR_PTR in h4_recv_buf() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 38/52] Bluetooth: Fix decrementing reference count twice in releasing socket Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 39/52] Bluetooth: hci_ldisc: Initialize hci_dev before open() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 40/52] Bluetooth: hci_ldisc: Postpone HCI_UART_PROTO_READY bit set in hci_uart_set_proto() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 41/52] drm/vkms: Fix flush_work() without INIT_WORK() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 42/52] RDMA/cma: Rollback source IP address if failing to acquire device Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 43/52] f2fs: fix to avoid deadlock of atomic file operations Greg Kroah-Hartman
2019-03-26  6:30 ` Greg Kroah-Hartman [this message]
2019-03-26  6:30 ` [PATCH 5.0 45/52] netfilter: ebtables: remove BUGPRINT messages Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 46/52] loop: access lo_backing_file only when the loop device is Lo_bound Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 47/52] x86/unwind: Handle NULL pointer calls better in frame unwinder Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 48/52] x86/unwind: Add hardcoded ORC entry for NULL Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 49/52] locking/lockdep: Add debug_locks check in __lock_downgrade() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 50/52] mm, mempolicy: fix uninit memory access Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 51/52] ALSA: hda - Record the current power state before suspend/resume calls Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 5.0 52/52] ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec Greg Kroah-Hartman
2019-03-26 15:20 ` [PATCH 5.0 00/52] 5.0.5-stable review Jon Hunter
2019-03-26 15:20   ` Jon Hunter
2019-03-27  0:56   ` Greg Kroah-Hartman
2019-03-26 17:50 ` Guenter Roeck
2019-03-27  0:59   ` Greg Kroah-Hartman
2019-03-26 23:18 ` shuah
2019-03-27  0:55   ` Greg Kroah-Hartman
2019-03-27  4:06 ` Naresh Kamboju
2019-03-27  5:06   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190326042703.322415113@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.