All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Eric Biggers <ebiggers@google.com>
Subject: [PATCH 4.19 37/74] aio: keep poll requests on waitqueue until completed
Date: Mon, 13 Dec 2021 10:30:08 +0100	[thread overview]
Message-ID: <20211213092932.060590322@linuxfoundation.org> (raw)
In-Reply-To: <20211213092930.763200615@linuxfoundation.org>

From: Eric Biggers <ebiggers@google.com>

commit 363bee27e25804d8981dd1c025b4ad49dc39c530 upstream.

Currently, aio_poll_wake() will always remove the poll request from the
waitqueue.  Then, if aio_poll_complete_work() sees that none of the
polled events are ready and the request isn't cancelled, it re-adds the
request to the waitqueue.  (This can easily happen when polling a file
that doesn't pass an event mask when waking up its waitqueue.)

This is fundamentally broken for two reasons:

  1. If a wakeup occurs between vfs_poll() and the request being
     re-added to the waitqueue, it will be missed because the request
     wasn't on the waitqueue at the time.  Therefore, IOCB_CMD_POLL
     might never complete even if the polled file is ready.

  2. When the request isn't on the waitqueue, there is no way to be
     notified that the waitqueue is being freed (which happens when its
     lifetime is shorter than the struct file's).  This is supposed to
     happen via the waitqueue entries being woken up with POLLFREE.

Therefore, leave the requests on the waitqueue until they are actually
completed (or cancelled).  To keep track of when aio_poll_complete_work
needs to be scheduled, use new fields in struct poll_iocb.  Remove the
'done' field which is now redundant.

Note that this is consistent with how sys_poll() and eventpoll work;
their wakeup functions do *not* remove the waitqueue entries.

Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/aio.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 63 insertions(+), 20 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -176,8 +176,9 @@ struct poll_iocb {
 	struct file		*file;
 	struct wait_queue_head	*head;
 	__poll_t		events;
-	bool			done;
 	bool			cancelled;
+	bool			work_scheduled;
+	bool			work_need_resched;
 	struct wait_queue_entry	wait;
 	struct work_struct	work;
 };
@@ -1635,14 +1636,26 @@ static void aio_poll_complete_work(struc
 	 * avoid further branches in the fast path.
 	 */
 	spin_lock_irq(&ctx->ctx_lock);
+	spin_lock(&req->head->lock);
 	if (!mask && !READ_ONCE(req->cancelled)) {
-		add_wait_queue(req->head, &req->wait);
+		/*
+		 * The request isn't actually ready to be completed yet.
+		 * Reschedule completion if another wakeup came in.
+		 */
+		if (req->work_need_resched) {
+			schedule_work(&req->work);
+			req->work_need_resched = false;
+		} else {
+			req->work_scheduled = false;
+		}
+		spin_unlock(&req->head->lock);
 		spin_unlock_irq(&ctx->ctx_lock);
 		return;
 	}
+	list_del_init(&req->wait.entry);
+	spin_unlock(&req->head->lock);
 	list_del_init(&iocb->ki_list);
 	iocb->ki_res.res = mangle_poll(mask);
-	req->done = true;
 	spin_unlock_irq(&ctx->ctx_lock);
 
 	iocb_put(iocb);
@@ -1656,9 +1669,9 @@ static int aio_poll_cancel(struct kiocb
 
 	spin_lock(&req->head->lock);
 	WRITE_ONCE(req->cancelled, true);
-	if (!list_empty(&req->wait.entry)) {
-		list_del_init(&req->wait.entry);
+	if (!req->work_scheduled) {
 		schedule_work(&aiocb->poll.work);
+		req->work_scheduled = true;
 	}
 	spin_unlock(&req->head->lock);
 
@@ -1677,20 +1690,26 @@ static int aio_poll_wake(struct wait_que
 	if (mask && !(mask & req->events))
 		return 0;
 
-	list_del_init(&req->wait.entry);
-
-	if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
+	/*
+	 * Complete the request inline if possible.  This requires that three
+	 * conditions be met:
+	 *   1. An event mask must have been passed.  If a plain wakeup was done
+	 *	instead, then mask == 0 and we have to call vfs_poll() to get
+	 *	the events, so inline completion isn't possible.
+	 *   2. The completion work must not have already been scheduled.
+	 *   3. ctx_lock must not be busy.  We have to use trylock because we
+	 *	already hold the waitqueue lock, so this inverts the normal
+	 *	locking order.  Use irqsave/irqrestore because not all
+	 *	filesystems (e.g. fuse) call this function with IRQs disabled,
+	 *	yet IRQs have to be disabled before ctx_lock is obtained.
+	 */
+	if (mask && !req->work_scheduled &&
+	    spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
 		struct kioctx *ctx = iocb->ki_ctx;
 
-		/*
-		 * Try to complete the iocb inline if we can. Use
-		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
-		 * call this function with IRQs disabled and because IRQs
-		 * have to be disabled before ctx_lock is obtained.
-		 */
+		list_del_init(&req->wait.entry);
 		list_del(&iocb->ki_list);
 		iocb->ki_res.res = mangle_poll(mask);
-		req->done = true;
 		if (iocb->ki_eventfd && eventfd_signal_count()) {
 			iocb = NULL;
 			INIT_WORK(&req->work, aio_poll_put_work);
@@ -1700,7 +1719,20 @@ static int aio_poll_wake(struct wait_que
 		if (iocb)
 			iocb_put(iocb);
 	} else {
-		schedule_work(&req->work);
+		/*
+		 * Schedule the completion work if needed.  If it was already
+		 * scheduled, record that another wakeup came in.
+		 *
+		 * Don't remove the request from the waitqueue here, as it might
+		 * not actually be complete yet (we won't know until vfs_poll()
+		 * is called), and we must not miss any wakeups.
+		 */
+		if (req->work_scheduled) {
+			req->work_need_resched = true;
+		} else {
+			schedule_work(&req->work);
+			req->work_scheduled = true;
+		}
 	}
 	return 1;
 }
@@ -1747,8 +1779,9 @@ static ssize_t aio_poll(struct aio_kiocb
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
 
 	req->head = NULL;
-	req->done = false;
 	req->cancelled = false;
+	req->work_scheduled = false;
+	req->work_need_resched = false;
 
 	apt.pt._qproc = aio_poll_queue_proc;
 	apt.pt._key = req->events;
@@ -1763,17 +1796,27 @@ static ssize_t aio_poll(struct aio_kiocb
 	spin_lock_irq(&ctx->ctx_lock);
 	if (likely(req->head)) {
 		spin_lock(&req->head->lock);
-		if (unlikely(list_empty(&req->wait.entry))) {
-			if (apt.error)
+		if (list_empty(&req->wait.entry) || req->work_scheduled) {
+			/*
+			 * aio_poll_wake() already either scheduled the async
+			 * completion work, or completed the request inline.
+			 */
+			if (apt.error) /* unsupported case: multiple queues */
 				cancel = true;
 			apt.error = 0;
 			mask = 0;
 		}
 		if (mask || apt.error) {
+			/* Steal to complete synchronously. */
 			list_del_init(&req->wait.entry);
 		} else if (cancel) {
+			/* Cancel if possible (may be too late though). */
 			WRITE_ONCE(req->cancelled, true);
-		} else if (!req->done) { /* actually waiting for an event */
+		} else if (!list_empty(&req->wait.entry)) {
+			/*
+			 * Actually waiting for an event, so add the request to
+			 * active_reqs so that it can be cancelled if needed.
+			 */
 			list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
 			aiocb->ki_cancel = aio_poll_cancel;
 		}



  parent reply	other threads:[~2021-12-13  9:44 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-13  9:29 [PATCH 4.19 00/74] 4.19.221-rc1 review Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 01/74] HID: google: add eel USB id Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 02/74] HID: add hid_is_usb() function to make it simpler for USB detection Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 03/74] HID: add USB_HID dependancy to hid-prodikeys Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 04/74] HID: add USB_HID dependancy to hid-chicony Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 05/74] HID: add USB_HID dependancy on some USB HID drivers Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 06/74] HID: wacom: fix problems when device is not a valid USB device Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 07/74] HID: check for valid USB device for many HID drivers Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 08/74] can: kvaser_usb: get CAN clock frequency from device Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 09/74] can: sja1000: fix use after free in ems_pcmcia_add_card() Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 10/74] net: core: netlink: add helper refcount dec and lock function Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 11/74] net: sched: rename qdisc_destroy() to qdisc_put() Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 12/74] net: sched: extend Qdisc with rcu Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 13/74] net: sched: add helper function to take reference to Qdisc Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 14/74] net: sched: use Qdisc rcu API instead of relying on rtnl lock Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 15/74] nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 16/74] bpf: Fix the off-by-two error in range markings Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 17/74] ice: ignore dropped packets during init Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 18/74] bonding: make tx_rebalance_counter an atomic Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 19/74] nfp: Fix memory leak in nfp_cpp_area_cache_add() Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 20/74] seg6: fix the iif in the IPv6 socket control block Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 21/74] udp: using datalen to cap max gso segments Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 22/74] IB/hfi1: Correct guard on eager buffer deallocation Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 23/74] mm: bdi: initialize bdi_min_ratio when bdi is unregistered Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 24/74] ALSA: ctl: Fix copy of updated id with element read/write Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 25/74] ALSA: pcm: oss: Fix negative period/buffer sizes Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 26/74] ALSA: pcm: oss: Limit the period size to 16MB Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 27/74] ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*() Greg Kroah-Hartman
2021-12-13  9:29 ` [PATCH 4.19 28/74] tracefs: Have new files inherit the ownership of their parent Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 29/74] clk: qcom: regmap-mux: fix parent clock lookup Greg Kroah-Hartman
2021-12-15  9:16   ` Pavel Machek
2021-12-15 10:37     ` Dmitry Baryshkov
2021-12-15 17:54       ` Pavel Machek
2021-12-13  9:30 ` [PATCH 4.19 30/74] can: pch_can: pch_can_rx_normal: fix use after free Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 31/74] can: m_can: Disable and ignore ELO interrupt Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 32/74] x86/sme: Explicitly map new EFI memmap table as encrypted Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 33/74] libata: add horkage for ASMedia 1092 Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 34/74] wait: add wake_up_pollfree() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 35/74] binder: use wake_up_pollfree() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 36/74] signalfd: " Greg Kroah-Hartman
2021-12-13  9:30 ` Greg Kroah-Hartman [this message]
2021-12-13  9:30 ` [PATCH 4.19 38/74] aio: fix use-after-free due to missing POLLFREE handling Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 39/74] tracefs: Set all files to the same group ownership as the mount option Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 40/74] block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 41/74] qede: validate non LSO skb length Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 42/74] ASoC: qdsp6: q6routing: Fix return value from msm_routing_put_audio_mixer Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 43/74] i40e: Fix pre-set max number of queues for VF Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 44/74] mtd: rawnand: fsmc: Take instruction delay into account Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 45/74] tools build: Remove needless libpython-version feature check that breaks test-all fast path Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 46/74] net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 47/74] net: altera: set a couple error code in probe() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 48/74] net: fec: only clear interrupt of handling queue in fec_enet_rx_queue() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 49/74] net, neigh: clear whole pneigh_entry at alloc time Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 50/74] net/qla3xxx: fix an error code in ql_adapter_up() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 51/74] USB: gadget: detect too-big endpoint 0 requests Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 52/74] USB: gadget: zero allocate endpoint 0 buffers Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 53/74] usb: core: config: fix validation of wMaxPacketValue entries Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 54/74] xhci: Remove CONFIG_USB_DEFAULT_PERSIST to prevent xHCI from runtime suspending Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 55/74] usb: core: config: using bit mask instead of individual bits Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 56/74] xhci: avoid race between disable slot command and host runtime suspend Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 57/74] iio: trigger: Fix reference counting Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 58/74] iio: trigger: stm32-timer: fix MODULE_ALIAS Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 59/74] iio: stk3310: Dont return error code in interrupt handler Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 60/74] iio: mma8452: Fix trigger reference couting Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 61/74] iio: ltr501: Dont return error code in trigger handler Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 62/74] iio: kxsd9: " Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 63/74] iio: itg3200: Call iio_trigger_notify_done() on error Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 64/74] iio: dln2-adc: Fix lockdep complaint Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 65/74] iio: dln2: Check return value of devm_iio_trigger_register() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 66/74] iio: at91-sama5d2: Fix incorrect sign extension Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 67/74] iio: adc: axp20x_adc: fix charging current reporting on AXP22x Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 68/74] iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 69/74] irqchip/armada-370-xp: Fix return value of armada_370_xp_msi_alloc() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 70/74] irqchip/armada-370-xp: Fix support for Multi-MSI interrupts Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 71/74] irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 72/74] irqchip: nvic: Fix offset for Interrupt Priority Offsets Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 73/74] net_sched: fix a crash in tc_new_tfilter() Greg Kroah-Hartman
2021-12-13  9:30 ` [PATCH 4.19 74/74] net: sched: make function qdisc_free_cb() static Greg Kroah-Hartman
2021-12-13 10:33 ` [PATCH 4.19 00/74] 4.19.221-rc1 review Pavel Machek
2021-12-13 14:43 ` Jon Hunter
2021-12-13 16:27 ` Sudip Mukherjee
2021-12-13 18:58   ` Sudip Mukherjee
2021-12-13 19:05     ` Linus Torvalds
2021-12-13 22:23       ` Sudip Mukherjee
2021-12-13 19:55 ` Guenter Roeck
2021-12-13 20:30 ` Shuah Khan
2021-12-14  5:04 ` Naresh Kamboju
2021-12-14 12:40 ` Sudip Mukherjee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211213092932.060590322@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=ebiggers@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.