linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Rientjes <rientjes@google.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Douglas Anderson <dianders@chromium.org>,
	Mike Snitzer <snitzer@redhat.com>
Subject: [PATCH 4.9 44/52] dm bufio: avoid sleeping while holding the dm_bufio lock
Date: Tue, 10 Jul 2018 20:25:12 +0200	[thread overview]
Message-ID: <20180710182453.462571488@linuxfoundation.org> (raw)
In-Reply-To: <20180710182449.285532226@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Douglas Anderson <dianders@chromium.org>

commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream.

We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced on fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from
     http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
     nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Login to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and stack trace always show's we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.  You want to
  pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
  mutex that can be contended by a concurrent slab shrinker (if
  count_objects didn't use a trylock, this pattern would trivially
  deadlock).

This change significantly increases responsiveness of the system while
in this state.  It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:

   kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
   Call trace:
   [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
   [<ffffffc00090b794>] __schedule+0x440/0x6d8
   [<ffffffc00090bac0>] schedule+0x94/0xb4
   [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
   [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
   [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
   [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
   [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
   [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
   [<ffffffc00030e578>] balance_pgdat+0x328/0x508
   [<ffffffc00030eb7c>] kswapd+0x424/0x51c
   [<ffffffc00023f06c>] kthread+0x10c/0x114
   [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

By unblocking kswapd memory pressure should be reduced.

Suggested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-bufio.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -824,7 +824,8 @@ static struct dm_buffer *__alloc_buffer_
 	 * dm-bufio is resistant to allocation failures (it just keeps
 	 * one buffer reserved in cases all the allocations fail).
 	 * So set flags to not try too hard:
-	 *	GFP_NOIO: don't recurse into the I/O layer
+	 *	GFP_NOWAIT: don't wait; if we need to sleep we'll release our
+	 *		    mutex and wait ourselves.
 	 *	__GFP_NORETRY: don't retry and rather return failure
 	 *	__GFP_NOMEMALLOC: don't use emergency reserves
 	 *	__GFP_NOWARN: don't print a warning in case of failure
@@ -834,7 +835,7 @@ static struct dm_buffer *__alloc_buffer_
 	 */
 	while (1) {
 		if (dm_bufio_cache_size_latch != 1) {
-			b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 			if (b)
 				return b;
 		}



  parent reply	other threads:[~2018-07-10 18:30 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-10 18:24 [PATCH 4.9 00/52] 4.9.112-stable review Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 01/52] usb: cdc_acm: Add quirk for Uniden UBC125 scanner Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 02/52] USB: serial: cp210x: add CESINEL device ids Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 03/52] USB: serial: cp210x: add Silicon Labs IDs for Windows Update Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 04/52] usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 05/52] n_tty: Fix stall at n_tty_receive_char_special() Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 06/52] n_tty: Access echo_* variables carefully Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 07/52] staging: android: ion: Return an ERR_PTR in ion_map_kernel Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 08/52] vt: prevent leaking uninitialized data to userspace via /dev/vcs* Greg Kroah-Hartman
2018-07-10 18:30   ` syzbot
2018-07-10 18:24 ` [PATCH 4.9 09/52] i2c: rcar: fix resume by always initializing registers before transfer Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 10/52] ipv4: Fix error return value in fib_convert_metrics() Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 11/52] kprobes/x86: Do not modify singlestep buffer while resuming Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 12/52] netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain() Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 13/52] Revert "sit: reload iphdr in ipip6_rcv" Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 14/52] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 15/52] ARM: dts: imx6q: Use correct SDMA script for SPI5 core Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 16/52] IB/hfi1: Fix user context tail allocation for DMA_RTAIL Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 17/52] x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 18/52] x86/cpu: Re-apply forced caps every time CPU caps are re-read Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 19/52] mm: hugetlb: yield when prepping struct pages Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 20/52] tracing: Fix missing return symbol in function_graph output Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 21/52] scsi: sg: mitigate read/write abuse Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 22/52] s390: Correct register corruption in critical section cleanup Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 23/52] drbd: fix access after free Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 24/52] cifs: Fix infinite loop when using hard mount option Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 25/52] drm/udl: fix display corruption of the last line Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 26/52] jbd2: dont mark block as modified if the handle is out of credits Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 27/52] ext4: make sure bitmaps and the inode table dont overlap with bg descriptors Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 28/52] ext4: always check block group bounds in ext4_init_block_bitmap() Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 29/52] ext4: only look at the bg_flags field if it is valid Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 30/52] ext4: verify the depth of extent tree in ext4_find_extent() Greg Kroah-Hartman
2018-07-10 18:24 ` [PATCH 4.9 31/52] ext4: include the illegal physical block in the bad map ext4_error msg Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 32/52] ext4: clear i_data in ext4_inode_info when removing inline data Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 33/52] ext4: add more inode number paranoia checks Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 34/52] ext4: add more mount time checks of the superblock Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 35/52] ext4: check superblock mapped prior to committing Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 36/52] mlxsw: spectrum: Forbid linking of VLAN devices to devices that have uppers Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 37/52] HID: i2c-hid: Fix "incomplete report" noise Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 38/52] HID: hiddev: fix potential Spectre v1 Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 39/52] HID: debug: check length before copy_to_user() Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 40/52] PM / OPP: Update voltage in case freq == old_freq Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 41/52] Kbuild: fix # escaping in .cmd files for future Make Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 42/52] media: cx25840: Use subdev host data for PLL override Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 43/52] mm, page_alloc: do not break __GFP_THISNODE by zonelist reset Greg Kroah-Hartman
2018-07-10 18:25 ` Greg Kroah-Hartman [this message]
2018-07-10 18:25 ` [PATCH 4.9 45/52] dm bufio: drop the lock when doing GFP_NOIO allocation Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 46/52] mtd: rawnand: mxc: set spare area size register explicitly Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 47/52] dm bufio: dont take the lock in dm_bufio_shrink_count Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 48/52] mtd: cfi_cmdset_0002: Change definition naming to retry write operation Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 49/52] mtd: cfi_cmdset_0002: Change erase functions to retry for error Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 50/52] mtd: cfi_cmdset_0002: Change erase functions to check chip good only Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 51/52] netfilter: nf_log: dont hold nf_log_mutex during user access Greg Kroah-Hartman
2018-07-10 18:25 ` [PATCH 4.9 52/52] staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write() Greg Kroah-Hartman
2018-07-10 19:09 ` [PATCH 4.9 00/52] 4.9.112-stable review Nathan Chancellor
2018-07-11 11:04   ` Greg Kroah-Hartman
2018-07-11 11:17 ` Naresh Kamboju
2018-07-11 20:12   ` Dan Rue
2018-07-11 13:40 ` Guenter Roeck
2018-07-11 15:12 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180710182453.462571488@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dianders@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=rientjes@google.com \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).