linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Nikos Tsironis <ntsironis@arrikto.com>,
	Ilias Tsitsimpis <iliastsi@arrikto.com>,
	Mike Snitzer <snitzer@redhat.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 28/39] dm kcopyd: Fix bug causing workqueue stalls
Date: Thu, 24 Jan 2019 20:20:31 +0100	[thread overview]
Message-ID: <20190124190449.249066142@linuxfoundation.org> (raw)
In-Reply-To: <20190124190448.232316246@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ]

When using kcopyd to run callbacks through dm_kcopyd_do_callback() or
submitting copy jobs with a source size of 0, the jobs are pushed
directly to the complete_jobs list, which could be under processing by
the kcopyd thread. As a result, the kcopyd thread can continue running
completed jobs indefinitely, without releasing the CPU, as long as
someone keeps submitting new completed jobs through the aforementioned
paths. Processing of work items, queued for execution on the same CPU as
the currently running kcopyd thread, is thus stalled for excessive
amounts of time, hurting performance.

Running the following test, from the device mapper test suite [1],

  dmtest run --suite snapshot -n parallel_io_to_many_snaps_N

, with 8 active snapshots, we get, in dmesg, messages like the
following:

[68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s!
[68899.949282] Showing busy workqueues and worker pools:
[68899.949288] workqueue events: flags=0x0
[68899.949295]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[68899.949306]     pending: vmstat_shepherd, cache_reap
[68899.949331] workqueue mm_percpu_wq: flags=0x8
[68899.949337]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949345]     pending: vmstat_update
[68899.949387] workqueue dm_bufio_cache: flags=0x8
[68899.949392]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[68899.949400]     pending: work_fn [dm_bufio]
[68899.949423] workqueue kcopyd: flags=0x8
[68899.949429]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949437]     pending: do_work [dm_mod]
[68899.949452] workqueue kcopyd: flags=0x8
[68899.949458]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[68899.949466]     in-flight: 13:do_work [dm_mod]
[68899.949474]     pending: do_work [dm_mod]
[68899.949487] workqueue kcopyd: flags=0x8
[68899.949493]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949501]     pending: do_work [dm_mod]
[68899.949515] workqueue kcopyd: flags=0x8
[68899.949521]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949529]     pending: do_work [dm_mod]
[68899.949541] workqueue kcopyd: flags=0x8
[68899.949547]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[68899.949555]     pending: do_work [dm_mod]
[68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084

Fix this by splitting the complete_jobs list into two parts: A user
facing part, named callback_jobs, and one used internally by kcopyd,
retaining the name complete_jobs. dm_kcopyd_do_callback() and
dispatch_job() now push their jobs to the callback_jobs list, which is
spliced to the complete_jobs list once, every time the kcopyd thread
wakes up. This prevents kcopyd from hogging the CPU indefinitely and
causing workqueue stalls.

Re-running the aforementioned test:

  * Workqueue stalls are eliminated
  * The maximum writing time among all targets is reduced from 09m37.10s
    to 06m04.85s and the total run time of the test is reduced from
    10m43.591s to 7m19.199s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/md/dm-kcopyd.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 56fcccc30554..e0cfde3501e0 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -55,15 +55,17 @@ struct dm_kcopyd_client {
 	struct dm_kcopyd_throttle *throttle;
 
 /*
- * We maintain three lists of jobs:
+ * We maintain four lists of jobs:
  *
  * i)   jobs waiting for pages
  * ii)  jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that have completed.
+ * iii) jobs that don't need to do any IO and just run a callback
+ * iv) jobs that have completed.
  *
- * All three of these are protected by job_lock.
+ * All four of these are protected by job_lock.
  */
 	spinlock_t job_lock;
+	struct list_head callback_jobs;
 	struct list_head complete_jobs;
 	struct list_head io_jobs;
 	struct list_head pages_jobs;
@@ -584,6 +586,7 @@ static void do_work(struct work_struct *work)
 	struct dm_kcopyd_client *kc = container_of(work,
 					struct dm_kcopyd_client, kcopyd_work);
 	struct blk_plug plug;
+	unsigned long flags;
 
 	/*
 	 * The order that these are called is *very* important.
@@ -592,6 +595,10 @@ static void do_work(struct work_struct *work)
 	 * list.  io jobs call wake when they complete and it all
 	 * starts again.
 	 */
+	spin_lock_irqsave(&kc->job_lock, flags);
+	list_splice_tail_init(&kc->callback_jobs, &kc->complete_jobs);
+	spin_unlock_irqrestore(&kc->job_lock, flags);
+
 	blk_start_plug(&plug);
 	process_jobs(&kc->complete_jobs, kc, run_complete_job);
 	process_jobs(&kc->pages_jobs, kc, run_pages_job);
@@ -609,7 +616,7 @@ static void dispatch_job(struct kcopyd_job *job)
 	struct dm_kcopyd_client *kc = job->kc;
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
-		push(&kc->complete_jobs, job);
+		push(&kc->callback_jobs, job);
 	else if (job->pages == &zero_page_list)
 		push(&kc->io_jobs, job);
 	else
@@ -796,7 +803,7 @@ void dm_kcopyd_do_callback(void *j, int read_err, unsigned long write_err)
 	job->read_err = read_err;
 	job->write_err = write_err;
 
-	push(&kc->complete_jobs, job);
+	push(&kc->callback_jobs, job);
 	wake(kc);
 }
 EXPORT_SYMBOL(dm_kcopyd_do_callback);
@@ -826,6 +833,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
 		return ERR_PTR(-ENOMEM);
 
 	spin_lock_init(&kc->job_lock);
+	INIT_LIST_HEAD(&kc->callback_jobs);
 	INIT_LIST_HEAD(&kc->complete_jobs);
 	INIT_LIST_HEAD(&kc->io_jobs);
 	INIT_LIST_HEAD(&kc->pages_jobs);
@@ -875,6 +883,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)
 	/* Wait for completion of all jobs submitted by this client. */
 	wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs));
 
+	BUG_ON(!list_empty(&kc->callback_jobs));
 	BUG_ON(!list_empty(&kc->complete_jobs));
 	BUG_ON(!list_empty(&kc->io_jobs));
 	BUG_ON(!list_empty(&kc->pages_jobs));
-- 
2.19.1




  parent reply	other threads:[~2019-01-24 20:05 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-24 19:20 [PATCH 4.9 00/39] 4.9.153-stable review Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 01/39] r8169: Add support for new Realtek Ethernet Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 02/39] ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 03/39] ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 04/39] platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 05/39] e1000e: allow non-monotonic SYSTIM readings Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 06/39] writeback: dont decrement wb->refcnt if !wb->bdi Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 07/39] serial: set suppress_bind_attrs flag only if builtin Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 08/39] ALSA: oxfw: add support for APOGEE duet FireWire Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 09/39] MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 10/39] arm64: perf: set suppress_bind_attrs flag to true Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 11/39] selinux: always allow mounting submounts Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 12/39] rxe: IB_WR_REG_MR does not capture MRs iova field Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 13/39] jffs2: Fix use of uninitialized delayed_work, lockdep breakage Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 14/39] pstore/ram: Do not treat empty buffers as valid Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 15/39] powerpc/xmon: Fix invocation inside lock region Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 16/39] powerpc/pseries/cpuidle: Fix preempt warning Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 17/39] media: firewire: Fix app_info parameter type in avc_ca{,_app}_info Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 18/39] net: call sk_dst_reset when set SO_DONTROUTE Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 19/39] scsi: target: use consistent left-aligned ASCII INQUIRY data Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 20/39] clk: imx6q: reset exclusive gates on init Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 21/39] kconfig: fix file name and line number of warn_ignored_character() Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 22/39] kconfig: fix memory leak when EOF is encountered in quotation Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 23/39] mmc: atmel-mci: do not assume idle after atmci_request_end Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 24/39] tty/serial: do not free trasnmit buffer page under port lock Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 25/39] perf intel-pt: Fix error with config term "pt=0" Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 26/39] perf svghelper: Fix unchecked usage of strncpy() Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 27/39] perf parse-events: " Greg Kroah-Hartman
2019-01-24 19:20 ` Greg Kroah-Hartman [this message]
2019-01-24 19:20 ` [PATCH 4.9 29/39] tools lib subcmd: Dont add the kernel sources to the include path Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 30/39] dm snapshot: Fix excessive memory usage and workqueue stalls Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 31/39] ALSA: bebob: fix model-id of unit for Apogee Ensemble Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 32/39] sysfs: Disable lockdep for driver bind/unbind files Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 33/39] scsi: smartpqi: correct lun reset issues Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 34/39] scsi: megaraid: fix out-of-bound array accesses Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 35/39] ocfs2: fix panic due to unrecovered local alloc Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 36/39] mm/page-writeback.c: dont break integrity writeback on ->writepage() error Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 37/39] mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 38/39] ipmi:ssif: Fix handling of multi-part return messages Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 39/39] locking/qspinlock: Pull in asm/byteorder.h to ensure correct endianness Greg Kroah-Hartman
2019-01-25 14:43 ` [PATCH 4.9 00/39] 4.9.153-stable review shuah
2019-01-25 16:28 ` Naresh Kamboju
2019-01-25 23:17 ` Guenter Roeck
2019-01-26 12:07 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190124190449.249066142@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=iliastsi@arrikto.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ntsironis@arrikto.com \
    --cc=sashal@kernel.org \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).