stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.4 15/75] net: fix sk_page_frag() recursion from memory reclaim
Date: Fri,  8 Nov 2019 19:49:32 +0100	[thread overview]
Message-ID: <20191108174722.021907258@linuxfoundation.org> (raw)
In-Reply-To: <20191108174708.135680837@linuxfoundation.org>

From: Tejun Heo <tj@kernel.org>

[ Upstream commit 20eb4f29b60286e0d6dc01d9c260b4bd383c58fb ]

sk_page_frag() optimizes skb_frag allocations by using per-task
skb_frag cache when it knows it's the only user.  The condition is
determined by seeing whether the socket allocation mask allows
blocking - if the allocation may block, it obviously owns the task's
context and ergo exclusively owns current->task_frag.

Unfortunately, this misses recursion through memory reclaim path.
Please take a look at the following backtrace.

 [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
     ...
     tcp_sendmsg+0x27/0x40
     sock_sendmsg+0x30/0x40
     sock_xmit.isra.24+0xa1/0x170 [nbd]
     nbd_send_cmd+0x1d2/0x690 [nbd]
     nbd_queue_rq+0x1b5/0x3b0 [nbd]
     __blk_mq_try_issue_directly+0x108/0x1b0
     blk_mq_request_issue_directly+0xbd/0xe0
     blk_mq_try_issue_list_directly+0x41/0xb0
     blk_mq_sched_insert_requests+0xa2/0xe0
     blk_mq_flush_plug_list+0x205/0x2a0
     blk_flush_plug_list+0xc3/0xf0
 [1] blk_finish_plug+0x21/0x2e
     _xfs_buf_ioapply+0x313/0x460
     __xfs_buf_submit+0x67/0x220
     xfs_buf_read_map+0x113/0x1a0
     xfs_trans_read_buf_map+0xbf/0x330
     xfs_btree_read_buf_block.constprop.42+0x95/0xd0
     xfs_btree_lookup_get_block+0x95/0x170
     xfs_btree_lookup+0xcc/0x470
     xfs_bmap_del_extent_real+0x254/0x9a0
     __xfs_bunmapi+0x45c/0xab0
     xfs_bunmapi+0x15/0x30
     xfs_itruncate_extents_flags+0xca/0x250
     xfs_free_eofblocks+0x181/0x1e0
     xfs_fs_destroy_inode+0xa8/0x1b0
     destroy_inode+0x38/0x70
     dispose_list+0x35/0x50
     prune_icache_sb+0x52/0x70
     super_cache_scan+0x120/0x1a0
     do_shrink_slab+0x120/0x290
     shrink_slab+0x216/0x2b0
     shrink_node+0x1b6/0x4a0
     do_try_to_free_pages+0xc6/0x370
     try_to_free_mem_cgroup_pages+0xe3/0x1e0
     try_charge+0x29e/0x790
     mem_cgroup_charge_skmem+0x6a/0x100
     __sk_mem_raise_allocated+0x18e/0x390
     __sk_mem_schedule+0x2a/0x40
 [0] tcp_sendmsg_locked+0x8eb/0xe10
     tcp_sendmsg+0x27/0x40
     sock_sendmsg+0x30/0x40
     ___sys_sendmsg+0x26d/0x2b0
     __sys_sendmsg+0x57/0xa0
     do_syscall_64+0x42/0x100
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

In [0], tcp_send_msg_locked() was using current->page_frag when it
called sk_wmem_schedule().  It already calculated how many bytes can
be fit into current->page_frag.  Due to memory pressure,
sk_wmem_schedule() called into memory reclaim path which called into
xfs and then IO issue path.  Because the filesystem in question is
backed by nbd, the control goes back into the tcp layer - back into
tcp_sendmsg_locked().

nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
sense - it's in the process of freeing memory and wants to be able to,
e.g., drop clean pages to make forward progress.  However, this
confused sk_page_frag() called from [2].  Because it only tests
whether the allocation allows blocking which it does, it now thinks
current->page_frag can be used again although it already was being
used in [0].

After [2] used current->page_frag, the offset would be increased by
the used amount.  When the control returns to [0],
current->page_frag's offset is increased and the previously calculated
number of bytes now may overrun the end of allocated memory leading to
silent memory corruptions.

Fix it by adding gfpflags_normal_context() which tests sleepable &&
!reclaim and use it to determine whether to use current->task_frag.

v2: Eric didn't like gfp flags being tested twice.  Introduce a new
    helper gfpflags_normal_context() and combine the two tests.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/gfp.h |   23 +++++++++++++++++++++++
 include/net/sock.h  |   11 ++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -274,6 +274,29 @@ static inline bool gfpflags_allow_blocki
 	return (bool __force)(gfp_flags & __GFP_DIRECT_RECLAIM);
 }
 
+/**
+ * gfpflags_normal_context - is gfp_flags a normal sleepable context?
+ * @gfp_flags: gfp_flags to test
+ *
+ * Test whether @gfp_flags indicates that the allocation is from the
+ * %current context and allowed to sleep.
+ *
+ * An allocation being allowed to block doesn't mean it owns the %current
+ * context.  When direct reclaim path tries to allocate memory, the
+ * allocation context is nested inside whatever %current was doing at the
+ * time of the original allocation.  The nested allocation may be allowed
+ * to block but modifying anything %current owns can corrupt the outer
+ * context's expectations.
+ *
+ * %true result from this function indicates that the allocation context
+ * can sleep and use anything that's associated with %current.
+ */
+static inline bool gfpflags_normal_context(const gfp_t gfp_flags)
+{
+	return (gfp_flags & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC)) ==
+		__GFP_DIRECT_RECLAIM;
+}
+
 #ifdef CONFIG_HIGHMEM
 #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
 #else
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2077,12 +2077,17 @@ struct sk_buff *sk_stream_alloc_skb(stru
  * sk_page_frag - return an appropriate page_frag
  * @sk: socket
  *
- * If socket allocation mode allows current thread to sleep, it means its
- * safe to use the per task page_frag instead of the per socket one.
+ * Use the per task page_frag instead of the per socket one for
+ * optimization when we know that we're in the normal context and owns
+ * everything that's associated with %current.
+ *
+ * gfpflags_allow_blocking() isn't enough here as direct reclaim may nest
+ * inside other socket operations and end up recursing into sk_page_frag()
+ * while it's already in use.
  */
 static inline struct page_frag *sk_page_frag(struct sock *sk)
 {
-	if (gfpflags_allow_blocking(sk->sk_allocation))
+	if (gfpflags_normal_context(sk->sk_allocation))
 		return &current->task_frag;
 
 	return &sk->sk_frag;



  parent reply	other threads:[~2019-11-08 19:24 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08 18:49 [PATCH 4.4 00/75] 4.4.200-stable review Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 01/75] kbuild: add -fcf-protection=none when using retpoline flags Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 02/75] regulator: ti-abb: Fix timeout in ti_abb_wait_txdone/ti_abb_clear_all_txdone Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 03/75] regulator: pfuze100-regulator: Variable "val" in pfuze100_regulator_probe() could be uninitialized Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 04/75] ASoc: rockchip: i2s: Fix RPM imbalance Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 05/75] ARM: dts: logicpd-torpedo-som: Remove twl_keypad Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 06/75] ARM: mm: fix alignment handler faults under memory pressure Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 07/75] scsi: sni_53c710: fix compilation error Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 08/75] scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 09/75] perf kmem: Fix memory leak in compact_gfp_flags() Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 10/75] scsi: target: core: Do not overwrite CDB byte 1 Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 11/75] of: unittest: fix memory leak in unittest_data_add Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 12/75] MIPS: bmips: mark exception vectors as char arrays Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 13/75] cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 14/75] dccp: do not leak jiffies on the wire Greg Kroah-Hartman
2019-11-08 18:49 ` Greg Kroah-Hartman [this message]
2019-11-08 18:49 ` [PATCH 4.4 16/75] net: hisilicon: Fix ping latency when deal with high throughput Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 17/75] net: Zeroing the structure ethtool_wolinfo in ethtool_get_wol() Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 18/75] net: add READ_ONCE() annotation in __skb_wait_for_more_packets() Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 19/75] vxlan: check tun_info options_len properly Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 20/75] net/mlx4_core: Dynamically set guaranteed amount of counters per VF Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 21/75] inet: stop leaking jiffies on the wire Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 22/75] net/flow_dissector: switch to siphash Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 23/75] dmaengine: qcom: bam_dma: Fix resource leak Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 24/75] ARM: 8051/1: put_user: fix possible data corruption in put_user Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 25/75] ARM: 8478/2: arm/arm64: add arm-smccc Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 26/75] ARM: 8479/2: add implementation for arm-smccc Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 27/75] ARM: 8480/2: arm64: " Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 28/75] ARM: 8481/2: drivers: psci: replace psci firmware calls Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 29/75] ARM: uaccess: remove put_user() code duplication Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 30/75] ARM: Move system register accessors to asm/cp15.h Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 31/75] arm/arm64: KVM: Advertise SMCCC v1.1 Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 32/75] arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 33/75] firmware/psci: Expose PSCI conduit Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 34/75] firmware/psci: Expose SMCCC version through psci_ops Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 35/75] arm/arm64: smccc: Make function identifiers an unsigned quantity Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 36/75] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 37/75] arm/arm64: smccc: Add SMCCC-specific return codes Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 38/75] arm/arm64: smccc-1.1: Make return values unsigned long Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 39/75] arm/arm64: smccc-1.1: Handle function result as parameters Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 40/75] ARM: add more CPU part numbers for Cortex and Brahma B15 CPUs Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 41/75] ARM: bugs: prepare processor bug infrastructure Greg Kroah-Hartman
2019-11-08 18:49 ` [PATCH 4.4 42/75] ARM: bugs: hook processor bug checking into SMP and suspend paths Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 43/75] ARM: bugs: add support for per-processor bug checking Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 44/75] ARM: spectre: add Kconfig symbol for CPUs vulnerable to Spectre Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 45/75] ARM: spectre-v2: harden branch predictor on context switches Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 46/75] ARM: spectre-v2: add Cortex A8 and A15 validation of the IBE bit Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 47/75] ARM: spectre-v2: harden user aborts in kernel space Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 48/75] ARM: spectre-v2: add firmware based hardening Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 49/75] ARM: spectre-v2: warn about incorrect context switching functions Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 50/75] ARM: spectre-v1: add speculation barrier (csdb) macros Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 51/75] ARM: spectre-v1: add array_index_mask_nospec() implementation Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 52/75] ARM: spectre-v1: fix syscall entry Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 53/75] ARM: signal: copy registers using __copy_from_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 54/75] ARM: vfp: use __copy_from_user() when restoring VFP state Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 55/75] ARM: oabi-compat: copy semops using __copy_from_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 56/75] ARM: use __inttype() in get_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 57/75] ARM: spectre-v1: use get_user() for __get_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 58/75] ARM: spectre-v1: mitigate user accesses Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 59/75] ARM: 8789/1: signal: copy registers using __copy_to_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 60/75] ARM: 8791/1: vfp: use __copy_to_user() when saving VFP state Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 61/75] ARM: 8792/1: oabi-compat: copy oabi events using __copy_to_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 62/75] ARM: 8793/1: signal: replace __put_user_error with __put_user Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 63/75] ARM: 8794/1: uaccess: Prevent speculative use of the current addr_limit Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 64/75] ARM: 8795/1: spectre-v1.1: use put_user() for __put_user() Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 65/75] ARM: 8796/1: spectre-v1,v1.1: provide helpers for address sanitization Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 66/75] ARM: 8810/1: vfp: Fix wrong assignement to ufp_exc Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 67/75] ARM: make lookup_processor_type() non-__init Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 68/75] ARM: split out processor lookup Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 69/75] ARM: clean up per-processor check_bugs method call Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 70/75] ARM: add PROC_VTABLE and PROC_TABLE macros Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 71/75] ARM: spectre-v2: per-CPU vtables to work around big.Little systems Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 72/75] ARM: ensure that processor vtables is not lost after boot Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 73/75] ARM: fix the cockup in the previous patch Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 74/75] alarmtimer: Change remaining ENOTSUPP to EOPNOTSUPP Greg Kroah-Hartman
2019-11-08 18:50 ` [PATCH 4.4 75/75] fs/dcache: move security_d_instantiate() behind attaching dentry to inode Greg Kroah-Hartman
2019-11-09  1:17 ` [PATCH 4.4 00/75] 4.4.200-stable review kernelci.org bot
2019-11-09 10:32 ` Naresh Kamboju
2019-11-09 15:38 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191108174722.021907258@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).