All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Fix zone lock deadlock
@ 2020-02-28  7:12 Naohiro Aota
  2020-02-28  7:12 ` [PATCH 1/6] zbd: avoid initializing swd when unnecessary Naohiro Aota
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

With zonemode=zbd and asynchronous ioengine, a thread takes a zone lock
before an I/O submission (in zbd_adjust_block() or
zbd_convert_to_open_zone()) and releases the lock after the I/O is put (in
zbd_put_io()).  With a small number of open zones and/or a large number of
jobs, threads can easily end up circular lock dependency and deadlocks. For
example, thread A sends an I/O to zone 0, so thread A holds a zone lock #0.
Then, thread A continues on zone 1 and try to acquire zone lock #1. At the
same time, thread B held zone lock #1, sent I/O to zone 1, and try to
acquire zone #0. Now, both threads are waiting for each other's lock, which
is never released.

This series fixes three problems to eliminate the deadlock. First, taking
all the zone locks should be avoided. zbd_process_swd() and
zbd_reset_zones() take the lock for all zones of the specified device,
preventing other threads from accessing different zones in parallel. While
it is not the root cause of the deadlock, such all zone locking easily
trigger a deadlock. So, this series reduces such contentions by (1)
eliminating unnecessary invocation of zbd_process_swd() and (2) changing to
take single zone at at time in zbd_reset_zones().

Secondly, zbd's I/O issuing path should expect lock contention with other
threads and handle the case properly. Commit 6f0c608564c3 ("zbd: Avoid
async I/O multi-job workload deadlock") also addressed this issue by using
pthread_mutex_try_lock() and io_u_quiesce(). However, there are more
pthread_mutex_lock() left to be fixed in the same way.

Finally, fio should clean up I/Os properly on an error case. Currently,
cleanup_pending_aio() and io_u_quiesce() fail to clean up I/Os with an
error. As a result, zone locks, which are held by an erroneous thread, are
kept held and blocks other threads to acquire the locks.

This series also add a test case to cause the deadlock with unpatched fio.

Patches 1 and 2 avoid long range lock holding to reduce lock contentions.

Patch 3 introduces zone_lock and use it to handle the lock contention case.

Patches 4 and 5 fix error path to clean up all the pending I/Os left.

Patch 6 adds the test.

Naohiro Aota (6):
  zbd: avoid initializing swd when unnecessary
  zbd: reset one zone at a time
  zbd: use zone_lock to lock a zone
  backend: always clean up pending aios
  io_u: ensure io_u_quiesce() to process all the IOs
  zbd: add test for stressing zone locking

 backend.c              |  5 ---
 io_u.c                 |  6 +--
 t/zbd/test-zbd-support | 30 +++++++++++++++
 zbd.c                  | 84 ++++++++++++++++--------------------------
 4 files changed, 65 insertions(+), 60 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] zbd: avoid initializing swd when unnecessary
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
@ 2020-02-28  7:12 ` Naohiro Aota
  2020-02-28  7:12 ` [PATCH 2/6] zbd: reset one zone at a time Naohiro Aota
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

When enable_check_swd == false, there is no use to initialize swd. Just
disable it in this case.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 zbd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/zbd.c b/zbd.c
index ee8bcb30416a..ddf1e6f3e008 100644
--- a/zbd.c
+++ b/zbd.c
@@ -847,6 +847,9 @@ static void zbd_init_swd(struct fio_file *f)
 {
 	uint64_t swd;
 
+	if (!enable_check_swd)
+		return;
+
 	swd = zbd_process_swd(f, SET_SWD);
 	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n", __func__, f->file_name,
 	       swd);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/6] zbd: reset one zone at a time
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
  2020-02-28  7:12 ` [PATCH 1/6] zbd: avoid initializing swd when unnecessary Naohiro Aota
@ 2020-02-28  7:12 ` Naohiro Aota
  2020-02-28  7:12 ` [PATCH 3/6] zbd: use zone_lock to lock a zone Naohiro Aota
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

zbd_rest_zones() currently scans over device zones and try to reset as much
zones as possible at a time. However, this routine takes all the lock on
the range and causes a lot of lock contentions with other threads.

This commit change the behavior to hold the lock and examine one zone at a
time. While it will increase the number of ioctl() call when it need to
reset contiguous, the overhead of increased number of ioctl()s are anyway
amortized by device side's reset performance.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 zbd.c | 42 +++++++-----------------------------------
 1 file changed, 7 insertions(+), 35 deletions(-)

diff --git a/zbd.c b/zbd.c
index ddf1e6f3e008..18a55ea46ef9 100644
--- a/zbd.c
+++ b/zbd.c
@@ -707,7 +707,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			   struct fio_zone_info *const zb,
 			   struct fio_zone_info *const ze, bool all_zones)
 {
-	struct fio_zone_info *z, *start_z = ze;
+	struct fio_zone_info *z;
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
 	bool reset_wp;
 	int res = 0;
@@ -717,48 +717,20 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	assert(f->fd != -1);
 	for (z = zb; z < ze; z++) {
 		pthread_mutex_lock(&z->mutex);
-		switch (z->type) {
-		case BLK_ZONE_TYPE_SEQWRITE_REQ:
+		if (z->type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
 			reset_wp = all_zones ? z->wp != z->start :
 					(td->o.td_ddir & TD_DDIR_WRITE) &&
 					z->wp % min_bs != 0;
-			if (start_z == ze && reset_wp) {
-				start_z = z;
-			} else if (start_z < ze && !reset_wp) {
-				dprint(FD_ZBD,
-				       "%s: resetting zones %u .. %u\n",
+			if (reset_wp) {
+				dprint(FD_ZBD, "%s: resetting zone %u\n",
 				       f->file_name,
-					zbd_zone_nr(f->zbd_info, start_z),
-					zbd_zone_nr(f->zbd_info, z));
-				if (zbd_reset_range(td, f, start_z->start,
-						z->start - start_z->start) < 0)
+				       zbd_zone_nr(f->zbd_info, z));
+				if (zbd_reset_zone(td, f, z) < 0)
 					res = 1;
-				start_z = ze;
 			}
-			break;
-		default:
-			if (start_z == ze)
-				break;
-			dprint(FD_ZBD, "%s: resetting zones %u .. %u\n",
-			       f->file_name, zbd_zone_nr(f->zbd_info, start_z),
-			       zbd_zone_nr(f->zbd_info, z));
-			if (zbd_reset_range(td, f, start_z->start,
-					    z->start - start_z->start) < 0)
-				res = 1;
-			start_z = ze;
-			break;
 		}
-	}
-	if (start_z < ze) {
-		dprint(FD_ZBD, "%s: resetting zones %u .. %u\n", f->file_name,
-			zbd_zone_nr(f->zbd_info, start_z),
-			zbd_zone_nr(f->zbd_info, z));
-		if (zbd_reset_range(td, f, start_z->start,
-				    z->start - start_z->start) < 0)
-			res = 1;
-	}
-	for (z = zb; z < ze; z++)
 		pthread_mutex_unlock(&z->mutex);
+	}
 
 	return res;
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/6] zbd: use zone_lock to lock a zone
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
  2020-02-28  7:12 ` [PATCH 1/6] zbd: avoid initializing swd when unnecessary Naohiro Aota
  2020-02-28  7:12 ` [PATCH 2/6] zbd: reset one zone at a time Naohiro Aota
@ 2020-02-28  7:12 ` Naohiro Aota
  2020-02-28  7:12 ` [PATCH 4/6] backend: always clean up pending aios Naohiro Aota
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

commit 6f0c608564c3 ("zbd: Avoid async I/O multi-job workload deadlock")
introduced io_u_quiesce() when it failed to lock a zone to avoid deadlock.
This situation can happen on the other locking place like
zbd_convert_to_open_zone(). Thus, introduce common helper "zone_lock" to
lock a zone.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 zbd.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/zbd.c b/zbd.c
index 18a55ea46ef9..b2d9424972da 100644
--- a/zbd.c
+++ b/zbd.c
@@ -927,6 +927,24 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 	f->zbd_info->zone_info[zone_idx].open = 0;
 }
 
+static void zone_lock(struct thread_data *td, struct fio_zone_info *z)
+{
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&z->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&z->mutex);
+	}
+}
+
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
@@ -969,7 +987,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	for (;;) {
 		z = &f->zbd_info->zone_info[zone_idx];
 
-		pthread_mutex_lock(&z->mutex);
+		zone_lock(td, z);
 		pthread_mutex_lock(&f->zbd_info->mutex);
 		if (td->o.max_open_zones == 0)
 			goto examine_zone;
@@ -1017,7 +1035,7 @@ examine_zone:
 			z = &f->zbd_info->zone_info[zone_idx];
 		}
 		assert(is_valid_offset(f, z->start));
-		pthread_mutex_lock(&z->mutex);
+		zone_lock(td, z);
 		if (z->open)
 			continue;
 		if (zbd_open_zone(td, io_u, zone_idx))
@@ -1035,7 +1053,7 @@ examine_zone:
 
 		z = &f->zbd_info->zone_info[zone_idx];
 
-		pthread_mutex_lock(&z->mutex);
+		zone_lock(td, z);
 		if (z->wp + min_bs <= (z+1)->start)
 			goto out;
 		pthread_mutex_lock(&f->zbd_info->mutex);
@@ -1321,20 +1339,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	zbd_check_swd(f);
 
-	/*
-	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
-	 * is changed or when io_u completes and zbd_put_io() executed.
-	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
-	 * other waiting for zone locks when building an io_u batch, first
-	 * only trylock the zone. If the zone is already locked by another job,
-	 * process the currently queued I/Os so that I/O progress is made and
-	 * zones unlocked.
-	 */
-	if (pthread_mutex_trylock(&zb->mutex) != 0) {
-		if (!td_ioengine_flagged(td, FIO_SYNCIO))
-			io_u_quiesce(td);
-		pthread_mutex_lock(&zb->mutex);
-	}
+	zone_lock(td, zb);
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/6] backend: always clean up pending aios
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
                   ` (2 preceding siblings ...)
  2020-02-28  7:12 ` [PATCH 3/6] zbd: use zone_lock to lock a zone Naohiro Aota
@ 2020-02-28  7:12 ` Naohiro Aota
  2020-02-28  7:12 ` [PATCH 5/6] io_u: ensure io_u_quiesce() to process all the IOs Naohiro Aota
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

cleanup_pending_aios() is called when a thread exits with error, so all the
call site of this function is under "if (td->error)". However, commit
d28174f0189c ("workqueue: ensure we see deferred error for IOs"), for some
reason, added "if (td->error) return" at the head of this function, making
this function practically void. Revert this part to ensure cleaning up
pending aios.

Besides, cleanup_pending_aios() should not return even when
io_u_queued_complete() failed. Because, it keeps in-flight aios left.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 backend.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/backend.c b/backend.c
index 936203dcb1c4..feb34e51382f 100644
--- a/backend.c
+++ b/backend.c
@@ -237,15 +237,10 @@ static void cleanup_pending_aio(struct thread_data *td)
 {
 	int r;
 
-	if (td->error)
-		return;
-
 	/*
 	 * get immediately available events, if any
 	 */
 	r = io_u_queued_complete(td, 0);
-	if (r < 0)
-		return;
 
 	/*
 	 * now cancel remaining active events
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/6] io_u: ensure io_u_quiesce() to process all the IOs
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
                   ` (3 preceding siblings ...)
  2020-02-28  7:12 ` [PATCH 4/6] backend: always clean up pending aios Naohiro Aota
@ 2020-02-28  7:12 ` Naohiro Aota
  2020-02-28  7:12 ` [PATCH 6/6] zbd: add test for stressing zone locking Naohiro Aota
  2020-03-17  2:50 ` [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
  6 siblings, 0 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

Currently, when IO have an error io_u_quiesce() stops processing
in-flight IOs there and leaves other IOs non-completed. This is not a
desired behavior for io_u_quiesce(). Fix it by continuing even on
error.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 io_u.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/io_u.c b/io_u.c
index bcb893c5abad..5d62a76cb5a7 100644
--- a/io_u.c
+++ b/io_u.c
@@ -606,7 +606,7 @@ static inline enum fio_ddir get_rand_ddir(struct thread_data *td)
 
 int io_u_quiesce(struct thread_data *td)
 {
-	int ret = 0, completed = 0;
+	int ret = 0, completed = 0, err = 0;
 
 	/*
 	 * We are going to sleep, ensure that we flush anything pending as
@@ -625,7 +625,7 @@ int io_u_quiesce(struct thread_data *td)
 		if (ret > 0)
 			completed += ret;
 		else if (ret < 0)
-			break;
+			err = ret;
 	}
 
 	if (td->flags & TD_F_REGROW_LOGS)
@@ -634,7 +634,7 @@ int io_u_quiesce(struct thread_data *td)
 	if (completed)
 		return completed;
 
-	return ret;
+	return err;
 }
 
 static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/6] zbd: add test for stressing zone locking
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
                   ` (4 preceding siblings ...)
  2020-02-28  7:12 ` [PATCH 5/6] io_u: ensure io_u_quiesce() to process all the IOs Naohiro Aota
@ 2020-02-28  7:12 ` Naohiro Aota
  2020-03-17  2:50 ` [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
  6 siblings, 0 replies; 11+ messages in thread
From: Naohiro Aota @ 2020-02-28  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Naohiro Aota

Add a test to stress zone locking mechanism by having a large number of
threads with a small number of max_open_zones. Run 30 seconds time-based
fio under the timeout command. After 45 seconds, "timeout" kill -KILL the
fio process. If a zone lock deadlocks, fio is killed by the timeout
command, and this test fails. If not, fio runs to the end and this test
success.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 t/zbd/test-zbd-support | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 5d079a8b7873..bd41fffb3298 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -755,6 +755,36 @@ test47() {
     grep -q 'zoneskip 1 is not a multiple of the device zone size' "${logfile}.${test_number}"
 }
 
+# Multiple overlapping random write jobs for the same drive and with a
+# limited number of open zones. This is similar to test29, but uses libaio
+# to stress test zone locking.
+test48() {
+    local i jobs=16 off opts=()
+
+    off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
+    size=$((16*zone_size))
+    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
+    opts+=("--debug=zbd")
+    opts+=("--ioengine=libaio" "--rw=randwrite" "--direct=1")
+    opts+=("--time_based" "--runtime=30")
+    opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+    opts+=("--max_open_zones=4")
+    for ((i=0;i<jobs;i++)); do
+	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
+	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
+	opts+=("--group_reporting=1")
+    done
+
+    fio=$(dirname "$0")/../../fio
+
+    { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
+
+    timeout -v -s KILL 45s \
+	    "${dynamic_analyzer[@]}" "$fio" "${opts[@]}" \
+	    >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] Fix zone lock deadlock
  2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
                   ` (5 preceding siblings ...)
  2020-02-28  7:12 ` [PATCH 6/6] zbd: add test for stressing zone locking Naohiro Aota
@ 2020-03-17  2:50 ` Naohiro Aota
  2020-03-18  1:48   ` Jens Axboe
  6 siblings, 1 reply; 11+ messages in thread
From: Naohiro Aota @ 2020-03-17  2:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

On Fri, Feb 28, 2020 at 04:12:42PM +0900, Naohiro Aota wrote:
>With zonemode=zbd and asynchronous ioengine, a thread takes a zone lock
>before an I/O submission (in zbd_adjust_block() or
>zbd_convert_to_open_zone()) and releases the lock after the I/O is put (in
>zbd_put_io()).  With a small number of open zones and/or a large number of
>jobs, threads can easily end up circular lock dependency and deadlocks. For
>example, thread A sends an I/O to zone 0, so thread A holds a zone lock #0.
>Then, thread A continues on zone 1 and try to acquire zone lock #1. At the
>same time, thread B held zone lock #1, sent I/O to zone 1, and try to
>acquire zone #0. Now, both threads are waiting for each other's lock, which
>is never released.
>
>This series fixes three problems to eliminate the deadlock. First, taking
>all the zone locks should be avoided. zbd_process_swd() and
>zbd_reset_zones() take the lock for all zones of the specified device,
>preventing other threads from accessing different zones in parallel. While
>it is not the root cause of the deadlock, such all zone locking easily
>trigger a deadlock. So, this series reduces such contentions by (1)
>eliminating unnecessary invocation of zbd_process_swd() and (2) changing to
>take single zone at at time in zbd_reset_zones().
>
>Secondly, zbd's I/O issuing path should expect lock contention with other
>threads and handle the case properly. Commit 6f0c608564c3 ("zbd: Avoid
>async I/O multi-job workload deadlock") also addressed this issue by using
>pthread_mutex_try_lock() and io_u_quiesce(). However, there are more
>pthread_mutex_lock() left to be fixed in the same way.
>
>Finally, fio should clean up I/Os properly on an error case. Currently,
>cleanup_pending_aio() and io_u_quiesce() fail to clean up I/Os with an
>error. As a result, zone locks, which are held by an erroneous thread, are
>kept held and blocks other threads to acquire the locks.
>
>This series also add a test case to cause the deadlock with unpatched fio.
>
>Patches 1 and 2 avoid long range lock holding to reduce lock contentions.
>
>Patch 3 introduces zone_lock and use it to handle the lock contention case.
>
>Patches 4 and 5 fix error path to clean up all the pending I/Os left.
>
>Patch 6 adds the test.
>
>Naohiro Aota (6):
>  zbd: avoid initializing swd when unnecessary
>  zbd: reset one zone at a time
>  zbd: use zone_lock to lock a zone
>  backend: always clean up pending aios
>  io_u: ensure io_u_quiesce() to process all the IOs
>  zbd: add test for stressing zone locking
>
> backend.c              |  5 ---
> io_u.c                 |  6 +--
> t/zbd/test-zbd-support | 30 +++++++++++++++
> zbd.c                  | 84 ++++++++++++++++--------------------------
> 4 files changed, 65 insertions(+), 60 deletions(-)
>
>-- 
>2.25.1

Ping on this series (also on this
https://www.spinics.net/lists/fio/msg08322.html ).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] Fix zone lock deadlock
  2020-03-17  2:50 ` [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
@ 2020-03-18  1:48   ` Jens Axboe
  2020-03-18  2:00     ` Damien Le Moal
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2020-03-18  1:48 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: fio, Damien Le Moal

On 3/16/20 8:50 PM, Naohiro Aota wrote:
> On Fri, Feb 28, 2020 at 04:12:42PM +0900, Naohiro Aota wrote:
>> With zonemode=zbd and asynchronous ioengine, a thread takes a zone lock
>> before an I/O submission (in zbd_adjust_block() or
>> zbd_convert_to_open_zone()) and releases the lock after the I/O is put (in
>> zbd_put_io()).  With a small number of open zones and/or a large number of
>> jobs, threads can easily end up circular lock dependency and deadlocks. For
>> example, thread A sends an I/O to zone 0, so thread A holds a zone lock #0.
>> Then, thread A continues on zone 1 and try to acquire zone lock #1. At the
>> same time, thread B held zone lock #1, sent I/O to zone 1, and try to
>> acquire zone #0. Now, both threads are waiting for each other's lock, which
>> is never released.
>>
>> This series fixes three problems to eliminate the deadlock. First, taking
>> all the zone locks should be avoided. zbd_process_swd() and
>> zbd_reset_zones() take the lock for all zones of the specified device,
>> preventing other threads from accessing different zones in parallel. While
>> it is not the root cause of the deadlock, such all zone locking easily
>> trigger a deadlock. So, this series reduces such contentions by (1)
>> eliminating unnecessary invocation of zbd_process_swd() and (2) changing to
>> take single zone at at time in zbd_reset_zones().
>>
>> Secondly, zbd's I/O issuing path should expect lock contention with other
>> threads and handle the case properly. Commit 6f0c608564c3 ("zbd: Avoid
>> async I/O multi-job workload deadlock") also addressed this issue by using
>> pthread_mutex_try_lock() and io_u_quiesce(). However, there are more
>> pthread_mutex_lock() left to be fixed in the same way.
>>
>> Finally, fio should clean up I/Os properly on an error case. Currently,
>> cleanup_pending_aio() and io_u_quiesce() fail to clean up I/Os with an
>> error. As a result, zone locks, which are held by an erroneous thread, are
>> kept held and blocks other threads to acquire the locks.
>>
>> This series also add a test case to cause the deadlock with unpatched fio.
>>
>> Patches 1 and 2 avoid long range lock holding to reduce lock contentions.
>>
>> Patch 3 introduces zone_lock and use it to handle the lock contention case.
>>
>> Patches 4 and 5 fix error path to clean up all the pending I/Os left.
>>
>> Patch 6 adds the test.
>>
>> Naohiro Aota (6):
>>  zbd: avoid initializing swd when unnecessary
>>  zbd: reset one zone at a time
>>  zbd: use zone_lock to lock a zone
>>  backend: always clean up pending aios
>>  io_u: ensure io_u_quiesce() to process all the IOs
>>  zbd: add test for stressing zone locking
>>
>> backend.c              |  5 ---
>> io_u.c                 |  6 +--
>> t/zbd/test-zbd-support | 30 +++++++++++++++
>> zbd.c                  | 84 ++++++++++++++++--------------------------
>> 4 files changed, 65 insertions(+), 60 deletions(-)
>>
>> -- 
>> 2.25.1
> 
> Ping on this series (also on this
> https://www.spinics.net/lists/fio/msg08322.html ).

Damien, can you take a look please?

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] Fix zone lock deadlock
  2020-03-18  1:48   ` Jens Axboe
@ 2020-03-18  2:00     ` Damien Le Moal
  2020-03-18  2:06       ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Damien Le Moal @ 2020-03-18  2:00 UTC (permalink / raw)
  To: Jens Axboe, Naohiro Aota; +Cc: fio

On 2020/03/18 10:49, Jens Axboe wrote:
> On 3/16/20 8:50 PM, Naohiro Aota wrote:
>> On Fri, Feb 28, 2020 at 04:12:42PM +0900, Naohiro Aota wrote:
>>> With zonemode=zbd and asynchronous ioengine, a thread takes a zone lock
>>> before an I/O submission (in zbd_adjust_block() or
>>> zbd_convert_to_open_zone()) and releases the lock after the I/O is put (in
>>> zbd_put_io()).  With a small number of open zones and/or a large number of
>>> jobs, threads can easily end up circular lock dependency and deadlocks. For
>>> example, thread A sends an I/O to zone 0, so thread A holds a zone lock #0.
>>> Then, thread A continues on zone 1 and try to acquire zone lock #1. At the
>>> same time, thread B held zone lock #1, sent I/O to zone 1, and try to
>>> acquire zone #0. Now, both threads are waiting for each other's lock, which
>>> is never released.
>>>
>>> This series fixes three problems to eliminate the deadlock. First, taking
>>> all the zone locks should be avoided. zbd_process_swd() and
>>> zbd_reset_zones() take the lock for all zones of the specified device,
>>> preventing other threads from accessing different zones in parallel. While
>>> it is not the root cause of the deadlock, such all zone locking easily
>>> trigger a deadlock. So, this series reduces such contentions by (1)
>>> eliminating unnecessary invocation of zbd_process_swd() and (2) changing to
>>> take single zone at at time in zbd_reset_zones().
>>>
>>> Secondly, zbd's I/O issuing path should expect lock contention with other
>>> threads and handle the case properly. Commit 6f0c608564c3 ("zbd: Avoid
>>> async I/O multi-job workload deadlock") also addressed this issue by using
>>> pthread_mutex_try_lock() and io_u_quiesce(). However, there are more
>>> pthread_mutex_lock() left to be fixed in the same way.
>>>
>>> Finally, fio should clean up I/Os properly on an error case. Currently,
>>> cleanup_pending_aio() and io_u_quiesce() fail to clean up I/Os with an
>>> error. As a result, zone locks, which are held by an erroneous thread, are
>>> kept held and blocks other threads to acquire the locks.
>>>
>>> This series also add a test case to cause the deadlock with unpatched fio.
>>>
>>> Patches 1 and 2 avoid long range lock holding to reduce lock contentions.
>>>
>>> Patch 3 introduces zone_lock and use it to handle the lock contention case.
>>>
>>> Patches 4 and 5 fix error path to clean up all the pending I/Os left.
>>>
>>> Patch 6 adds the test.
>>>
>>> Naohiro Aota (6):
>>>  zbd: avoid initializing swd when unnecessary
>>>  zbd: reset one zone at a time
>>>  zbd: use zone_lock to lock a zone
>>>  backend: always clean up pending aios
>>>  io_u: ensure io_u_quiesce() to process all the IOs
>>>  zbd: add test for stressing zone locking
>>>
>>> backend.c              |  5 ---
>>> io_u.c                 |  6 +--
>>> t/zbd/test-zbd-support | 30 +++++++++++++++
>>> zbd.c                  | 84 ++++++++++++++++--------------------------
>>> 4 files changed, 65 insertions(+), 60 deletions(-)
>>>
>>> -- 
>>> 2.25.1
>>
>> Ping on this series (also on this
>> https://www.spinics.net/lists/fio/msg08322.html ).
> 
> Damien, can you take a look please?
> 

My apologies. I should have sent a reviewed-by tag earlier.
I worked on these with Naohiro (who did all the heavy lifting).
Tested too and at least for SMR disks workloads, no problems detected.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] Fix zone lock deadlock
  2020-03-18  2:00     ` Damien Le Moal
@ 2020-03-18  2:06       ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2020-03-18  2:06 UTC (permalink / raw)
  To: Damien Le Moal, Naohiro Aota; +Cc: fio

On 3/17/20 8:00 PM, Damien Le Moal wrote:
> On 2020/03/18 10:49, Jens Axboe wrote:
>> On 3/16/20 8:50 PM, Naohiro Aota wrote:
>>> On Fri, Feb 28, 2020 at 04:12:42PM +0900, Naohiro Aota wrote:
>>>> With zonemode=zbd and asynchronous ioengine, a thread takes a zone lock
>>>> before an I/O submission (in zbd_adjust_block() or
>>>> zbd_convert_to_open_zone()) and releases the lock after the I/O is put (in
>>>> zbd_put_io()).  With a small number of open zones and/or a large number of
>>>> jobs, threads can easily end up circular lock dependency and deadlocks. For
>>>> example, thread A sends an I/O to zone 0, so thread A holds a zone lock #0.
>>>> Then, thread A continues on zone 1 and try to acquire zone lock #1. At the
>>>> same time, thread B held zone lock #1, sent I/O to zone 1, and try to
>>>> acquire zone #0. Now, both threads are waiting for each other's lock, which
>>>> is never released.
>>>>
>>>> This series fixes three problems to eliminate the deadlock. First, taking
>>>> all the zone locks should be avoided. zbd_process_swd() and
>>>> zbd_reset_zones() take the lock for all zones of the specified device,
>>>> preventing other threads from accessing different zones in parallel. While
>>>> it is not the root cause of the deadlock, such all zone locking easily
>>>> trigger a deadlock. So, this series reduces such contentions by (1)
>>>> eliminating unnecessary invocation of zbd_process_swd() and (2) changing to
>>>> take single zone at at time in zbd_reset_zones().
>>>>
>>>> Secondly, zbd's I/O issuing path should expect lock contention with other
>>>> threads and handle the case properly. Commit 6f0c608564c3 ("zbd: Avoid
>>>> async I/O multi-job workload deadlock") also addressed this issue by using
>>>> pthread_mutex_try_lock() and io_u_quiesce(). However, there are more
>>>> pthread_mutex_lock() left to be fixed in the same way.
>>>>
>>>> Finally, fio should clean up I/Os properly on an error case. Currently,
>>>> cleanup_pending_aio() and io_u_quiesce() fail to clean up I/Os with an
>>>> error. As a result, zone locks, which are held by an erroneous thread, are
>>>> kept held and blocks other threads to acquire the locks.
>>>>
>>>> This series also add a test case to cause the deadlock with unpatched fio.
>>>>
>>>> Patches 1 and 2 avoid long range lock holding to reduce lock contentions.
>>>>
>>>> Patch 3 introduces zone_lock and use it to handle the lock contention case.
>>>>
>>>> Patches 4 and 5 fix error path to clean up all the pending I/Os left.
>>>>
>>>> Patch 6 adds the test.
>>>>
>>>> Naohiro Aota (6):
>>>>  zbd: avoid initializing swd when unnecessary
>>>>  zbd: reset one zone at a time
>>>>  zbd: use zone_lock to lock a zone
>>>>  backend: always clean up pending aios
>>>>  io_u: ensure io_u_quiesce() to process all the IOs
>>>>  zbd: add test for stressing zone locking
>>>>
>>>> backend.c              |  5 ---
>>>> io_u.c                 |  6 +--
>>>> t/zbd/test-zbd-support | 30 +++++++++++++++
>>>> zbd.c                  | 84 ++++++++++++++++--------------------------
>>>> 4 files changed, 65 insertions(+), 60 deletions(-)
>>>>
>>>> -- 
>>>> 2.25.1
>>>
>>> Ping on this series (also on this
>>> https://www.spinics.net/lists/fio/msg08322.html ).
>>
>> Damien, can you take a look please?
>>
> 
> My apologies. I should have sent a reviewed-by tag earlier.
> I worked on these with Naohiro (who did all the heavy lifting).
> Tested too and at least for SMR disks workloads, no problems detected.
> 
> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
> Tested-by: Damien Le Moal <damien.lemoal@wdc.com>

Ah great, just wanted to double check. Thanks, series applied.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-03-18  2:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28  7:12 [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
2020-02-28  7:12 ` [PATCH 1/6] zbd: avoid initializing swd when unnecessary Naohiro Aota
2020-02-28  7:12 ` [PATCH 2/6] zbd: reset one zone at a time Naohiro Aota
2020-02-28  7:12 ` [PATCH 3/6] zbd: use zone_lock to lock a zone Naohiro Aota
2020-02-28  7:12 ` [PATCH 4/6] backend: always clean up pending aios Naohiro Aota
2020-02-28  7:12 ` [PATCH 5/6] io_u: ensure io_u_quiesce() to process all the IOs Naohiro Aota
2020-02-28  7:12 ` [PATCH 6/6] zbd: add test for stressing zone locking Naohiro Aota
2020-03-17  2:50 ` [PATCH 0/6] Fix zone lock deadlock Naohiro Aota
2020-03-18  1:48   ` Jens Axboe
2020-03-18  2:00     ` Damien Le Moal
2020-03-18  2:06       ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.