mm-commits Archive on lore.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, dbueso@suse.de, edumazet@google.com,
	guantaol@google.com, khazhy@google.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, soheil@google.com,
	torvalds@linux-foundation.org, willemb@google.com
Subject: [patch 06/78] epoll: check for events when removing a timed out thread from the wait queue
Date: Fri, 18 Dec 2020 14:01:44 -0800
Message-ID: <20201218220144.2NepJeilo%akpm@linux-foundation.org>
In-Reply-To: <20201218140046.497484741326828e5b5d46ec@linux-foundation.org>

From: Soheil Hassas Yeganeh <soheil@google.com>
Subject: epoll: check for events when removing a timed out thread from the wait queue

Patch series "simplify ep_poll".

This patch series is a followup based on the suggestions and feedback by
Linus:
https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com

The first patch in the series fixes the epoll race in the presence of
timeouts, so that it can be cleanly backported to all affected stable
kernels.

The rest of the patch series simplifies the ep_poll() implementation.  Some
of these simplifications also yield minor performance improvements.  We have
kept these changes under self tests and internal benchmarks for a few days,
and they show minor (1-2%) performance improvements.


This patch (of 8):

After abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2)
timeout"), we break out of the ep_poll loop upon timeout, without checking
whether any new events are available.  Prior to that patch series we
always called ep_events_available() after exiting the loop.

This can cause races and missed wakeups.  For example, consider the
following scenario reported by Guantao Liu:

Suppose we have an eventfd added using EPOLLET to an epollfd.

Thread 1: Sleeps for just below 5 ms and then writes to an eventfd.
Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
          event of the eventfd, it will write back on that fd.
Thread 3: Calls epoll_wait with a negative timeout.

Prior to abc610e01c66, it is guaranteed that Thread 3 will be woken up by
either Thread 1 or Thread 2.  After abc610e01c66, Thread 3 can be blocked
indefinitely if Thread 2 sees a timeout right before the write to the
eventfd by Thread 1.  Thread 2 will be woken up from
schedule_hrtimeout_range and, with eavail 0, it will not call
ep_send_events(), so it never writes back to the eventfd and Thread 3 is
never woken.
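
The scenario depends on edge-triggered semantics: each write to the eventfd
raises one new edge, and an edge that has been reported is not reported
again.  The following is a minimal single-threaded userspace sketch of that
setup, not a reproducer for the race (which needs the three threads and
tight timing described above).  The helper names setup_et_eventfd(),
signal_eventfd() and poll_once() are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Create an epoll instance that watches a fresh eventfd in edge-triggered
 * mode.  Returns the epoll fd and stores the eventfd in *efd_out.
 */
int setup_et_eventfd(int *efd_out)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
	int efd = eventfd(0, EFD_NONBLOCK);
	int epfd = epoll_create1(0);
	int rc;

	assert(efd >= 0 && epfd >= 0);
	ev.data.fd = efd;
	rc = epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
	assert(rc == 0);
	(void)rc;
	*efd_out = efd;
	return epfd;
}

/* Add 1 to the eventfd counter, as Thread 1 and Thread 2 do. */
void signal_eventfd(int efd)
{
	uint64_t one = 1;
	ssize_t n = write(efd, &one, sizeof(one));

	assert(n == (ssize_t)sizeof(one));
	(void)n;
}

/* Non-blocking epoll_wait(): returns the number of ready events. */
int poll_once(int epfd)
{
	struct epoll_event out;

	return epoll_wait(epfd, &out, 1, 0);
}
```

After one signal_eventfd() call, the first poll_once() reports the event and
a second poll_once() reports nothing until the eventfd is written again.
This is why Thread 3 depends on Thread 2 writing back once Thread 2 has
consumed the edge.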

To fix this issue:
1) Simplify the timed_out case as suggested by Linus.
2) While holding the lock, recheck whether the thread was woken up
   after its timeout expired.

Note that (2) is different from Linus' original suggestion: it does not set
"eavail = ep_events_available(ep)", to avoid unnecessary contention (when
there are many timed-out threads and a small number of events), as well as
the races mentioned in the discussion thread.
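
The recheck in (2) can be modeled in userspace as follows.  This is only a
sketch under stated assumptions: the list helpers are simplified stand-ins
for the kernel's list API, a wakeup is modeled as unlinking the waiter's
entry from the wait queue, and eavail_after_timeout() and woken_late are
hypothetical names that do not exist in fs/eventpoll.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
bool list_empty(const struct list_head *h) { return h->next == h; }

void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink and reinitialize, so that list_empty(entry) becomes true. */
void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/*
 * Model of the exit path for a thread whose sleep timed out.
 * woken_late: a waker unlinks the entry after the lockless check but
 * before the (modeled) lock is taken.  Returns the final eavail value.
 */
bool eavail_after_timeout(bool woken_late)
{
	struct list_head wq, entry;
	bool timed_out = true;	/* the sleep reported a timeout */
	bool eavail = true;	/* set unconditionally after the sleep */

	init_list_head(&wq);
	init_list_head(&entry);
	list_add(&entry, &wq);
	assert(!list_empty(&wq));

	if (!list_empty(&entry)) {		/* lockless: still queued */
		if (woken_late)
			list_del_init(&entry);	/* wakeup wins the race */
		/* "under the lock": trust only the wait queue state */
		if (timed_out)
			eavail = list_empty(&entry);
		list_del_init(&entry);		/* leave the wait queue */
	}
	return eavail;
}
```

In this model eavail_after_timeout(true) returns true, so the late-woken
thread goes on to harvest events, while eavail_after_timeout(false) returns
false and the thread returns without touching the ready list.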

This is the first patch in the series so that the backport to stable
releases is straightforward.

Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com
Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com
Fixes: abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Guantao Liu <guantaol@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Guantao Liu <guantaol@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/eventpoll.c |   27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

--- a/fs/eventpoll.c~epoll-check-for-events-when-removing-a-timed-out-thread-from-the-wait-queue
+++ a/fs/eventpoll.c
@@ -1817,23 +1817,30 @@ fetch_events:
 		}
 		write_unlock_irq(&ep->lock);
 
-		if (eavail || res)
-			break;
-
-		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
-			timed_out = 1;
-			break;
-		}
-
-		/* We were woken up, thus go and try to harvest some events */
+		if (!eavail && !res)
+			timed_out = !schedule_hrtimeout_range(to, slack,
+							      HRTIMER_MODE_ABS);
+
+		/*
+		 * We were woken up, thus go and try to harvest some events.
+		 * If timed out and still on the wait queue, recheck eavail
+		 * carefully under lock, below.
+		 */
 		eavail = 1;
-
 	} while (0);
 
 	__set_current_state(TASK_RUNNING);
 
 	if (!list_empty_careful(&wait.entry)) {
 		write_lock_irq(&ep->lock);
+		/*
+		 * If the thread timed out and is not on the wait queue, it
+		 * means that the thread was woken up after its timeout expired
+		 * before it could reacquire the lock. Thus, when wait.entry is
+		 * empty, it needs to harvest events.
+		 */
+		if (timed_out)
+			eavail = list_empty(&wait.entry);
 		__remove_wait_queue(&ep->wq, &wait);
 		write_unlock_irq(&ep->lock);
 	}
_

