linux-kernel.vger.kernel.org archive mirror
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	kernel test robot <oliver.sang@intel.com>,
	Sandeep Patil <sspatil@android.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 68/98] pipe: avoid unnecessary EPOLLET wakeups under normal loads
Date: Tue, 24 Aug 2021 12:58:38 -0400
Message-ID: <20210824165908.709932-69-sashal@kernel.org>
In-Reply-To: <20210824165908.709932-1-sashal@kernel.org>

From: Linus Torvalds <torvalds@linux-foundation.org>

[ Upstream commit 3b844826b6c6affa80755254da322b017358a2f4 ]

I had forgotten just how sensitive hackbench is to extra pipe wakeups,
and commit 3a34b13a88ca ("pipe: make pipe writes always wake up
readers") ended up causing a quite noticeable regression on larger
machines.

Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
not clear that this matters in real life all that much, but as Mel
points out, it's used often enough when comparing kernels and so the
performance regression shows up like a sore thumb.

It's easy enough to fix at least for the common cases where pipes are
used purely for data transfer, and you never have any exciting poll
usage at all.  So set a special 'poll_usage' flag when there is polling
activity, and make the ugly "EPOLLET has crazy legacy expectations"
semantics explicit to only that case.

I would love to limit it to just the broken EPOLLET case, but the pipe
code can't see the difference between epoll and regular select/poll, so
any non-read/write waiting will trigger the extra wakeup behavior.  That
is sufficient for at least the hackbench case.

Apart from making the odd extra wakeup cases more explicitly about
EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
write, not at the first write chunk.  That is actually much saner
semantics (as much as you can call any of the legacy edge-triggered
expectations for EPOLLET "sane") since it means that you know the wakeup
will happen once the write is done, rather than possibly in the middle
of one.

[ For stable people: I'm putting a "Fixes" tag on this, but I leave it
  up to you to decide whether you actually want to backport it or not.
  It likely has no impact outside of synthetic benchmarks  - Linus ]

Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Sandeep Patil <sspatil@android.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
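
Illustration only, not part of the patch: the wakeup rule described above
can be sketched as a small user-space model. Everything named model_* and
struct pipe_model is hypothetical and exists only for this sketch; it is
not kernel code. It assumes each write simply advances head by some number
of buffers, and it ignores the ring size, merging into the last buffer,
and locking. The only thing it tries to show is the condition
"was_empty || poll_usage", evaluated once at the end of a write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the fields the patch cares about. */
struct pipe_model {
	unsigned int head;       /* next slot the writer fills */
	unsigned int tail;       /* next slot the reader drains */
	unsigned int poll_usage; /* set once anything has polled the pipe */
};

/* Same test as the kernel's pipe_empty(): no buffers between tail and head. */
static bool model_pipe_empty(unsigned int head, unsigned int tail)
{
	return head == tail;
}

/* Models the pipe_poll() hunk: any select/poll/epoll use marks the pipe. */
static void model_poll(struct pipe_model *p)
{
	p->poll_usage = 1;
}

/* Models a reader draining everything that is currently queued. */
static void model_read_all(struct pipe_model *p)
{
	p->tail = p->head;
}

/*
 * Models one write of nbufs buffers. Returns true if readers would be
 * woken when the write finishes: always when the pipe was empty at the
 * start, and additionally on every write once poll_usage is set.
 */
static bool model_write(struct pipe_model *p, unsigned int nbufs)
{
	assert(nbufs > 0);

	bool was_empty = model_pipe_empty(p->head, p->tail);

	p->head += nbufs;
	return was_empty || p->poll_usage;
}
```

In this model a pipe that is never polled only reports a wakeup for a
write into an empty pipe, which is the case hackbench exercises. After
model_poll() has been called once, every write reports a wakeup, which
corresponds to the legacy EPOLLET expectation.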
 fs/pipe.c                 | 15 +++++++++------
 include/linux/pipe_fs_i.h |  2 ++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 28b2e973f10e..48abe65333c4 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -444,9 +444,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 #endif
 
 	/*
-	 * Epoll nonsensically wants a wakeup whether the pipe
-	 * was already empty or not.
-	 *
 	 * If it wasn't empty we try to merge new data into
 	 * the last buffer.
 	 *
@@ -455,9 +452,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 	 * spanning multiple pages.
 	 */
 	head = pipe->head;
-	was_empty = true;
+	was_empty = pipe_empty(head, pipe->tail);
 	chars = total_len & (PAGE_SIZE-1);
-	if (chars && !pipe_empty(head, pipe->tail)) {
+	if (chars && !was_empty) {
 		unsigned int mask = pipe->ring_size - 1;
 		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
 		int offset = buf->offset + buf->len;
@@ -590,8 +587,11 @@ out:
 	 * This is particularly important for small writes, because of
 	 * how (for example) the GNU make jobserver uses small writes to
 	 * wake up pending jobs
+	 *
+	 * Epoll nonsensically wants a wakeup whether the pipe
+	 * was already empty or not.
 	 */
-	if (was_empty) {
+	if (was_empty || pipe->poll_usage) {
 		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 	}
@@ -654,6 +654,9 @@ pipe_poll(struct file *filp, poll_table *wait)
 	struct pipe_inode_info *pipe = filp->private_data;
 	unsigned int head, tail;
 
+	/* Epoll has some historical nasty semantics, this enables them */
+	pipe->poll_usage = 1;
+
 	/*
 	 * Reading pipe state only -- no need for acquiring the semaphore.
 	 *
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 5d2705f1d01c..fc5642431b92 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -48,6 +48,7 @@ struct pipe_buffer {
  *	@files: number of struct file referring this pipe (protected by ->i_lock)
  *	@r_counter: reader counter
  *	@w_counter: writer counter
+ *	@poll_usage: is this pipe used for epoll, which has crazy wakeups?
  *	@fasync_readers: reader side fasync
  *	@fasync_writers: writer side fasync
  *	@bufs: the circular array of pipe buffers
@@ -70,6 +71,7 @@ struct pipe_inode_info {
 	unsigned int files;
 	unsigned int r_counter;
 	unsigned int w_counter;
+	unsigned int poll_usage;
 	struct page *tmp_page;
 	struct fasync_struct *fasync_readers;
 	struct fasync_struct *fasync_writers;
-- 
2.30.2


