All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes
@ 2021-01-19  4:46 Benjamin Marzinski
  2021-01-19  4:46 ` [dm-devel] [PATCH v3 1/1] multipathd: cleanup logging for marginal paths Benjamin Marzinski
  2021-01-19 10:10 ` [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes Martin Wilck
  0 siblings, 2 replies; 3+ messages in thread
From: Benjamin Marzinski @ 2021-01-19  4:46 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

I found an ABBA deadlock in the io_err_stat marginal path code, and in
the process of fixing it, noticed a potential crash on shutdown. This
patchset addresses both of the issues.

Changes from v2:

0001: This is patch 0006 from v2. It is exactly the same code. But the
commit message now mentions that check_path() no longer switches paths
from delayed to pending.

Changes from v1:

0002: use cleanup_mutex instead of cleanup_unlock as suggested by
Martin

0003: add pthread_testcancel and use cleanup_mutex instead of
cleanup_unlock as suggested by Martin. Also, make tmp_pathvec a constant
pointer, since it should always equal _pathvec.

0004-0006 are new patches to deal with io_err_stat issues from Martin's
review


Benjamin Marzinski (1):
  multipathd: cleanup logging for marginal paths

 libmultipath/io_err_stat.c |  7 +++----
 multipathd/main.c          | 25 ++++++++++++++-----------
 2 files changed, 17 insertions(+), 15 deletions(-)

-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [dm-devel] [PATCH v3 1/1] multipathd: cleanup logging for marginal paths
  2021-01-19  4:46 [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes Benjamin Marzinski
@ 2021-01-19  4:46 ` Benjamin Marzinski
  2021-01-19 10:10 ` [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes Martin Wilck
  1 sibling, 0 replies; 3+ messages in thread
From: Benjamin Marzinski @ 2021-01-19  4:46 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

io_err_stat logged at level 2 whenever it enqueued a path to check,
which could happen multiple times while a path was marginal.  On the
other hand if marginal_pathgroups wasn't set, multipathd didn't log when
paths were set to marginal. Now io_err_stat only logs at level 2 when
something unexpected happens, but multipathd will always log when a
path switches its marginal state.

This patch also fixes an issue where paths in the delayed state could
get set to the pending state if they could not be checked in time.
Aside from going against the idea the paths should not be set to pending
if they already have a valid state, this caused multipathd to log a
message whenever the path state switched to from delayed to pending and
then back.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c |  7 +++----
 multipathd/main.c          | 25 ++++++++++++++-----------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index bf78a236..abdd0b4f 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -252,7 +252,7 @@ static int enqueue_io_err_stat_by_path(struct path *path)
 	vector_set_slot(io_err_pathvec, p);
 	pthread_mutex_unlock(&io_err_pathvec_lock);
 
-	io_err_stat_log(2, "%s: enqueue path %s to check",
+	io_err_stat_log(3, "%s: enqueue path %s to check",
 			path->mpp->alias, path->dev);
 	return 0;
 
@@ -343,7 +343,7 @@ int need_io_err_check(struct path *pp)
 	if (uatomic_read(&io_err_thread_running) == 0)
 		return 0;
 	if (count_active_paths(pp->mpp) <= 0) {
-		io_err_stat_log(2, "%s: recover path early", pp->dev);
+		io_err_stat_log(2, "%s: no paths. recovering early", pp->dev);
 		goto recover;
 	}
 	if (pp->io_err_pathfail_cnt != PATH_IO_ERR_WAITING_TO_CHECK)
@@ -361,8 +361,7 @@ int need_io_err_check(struct path *pp)
 		 * Or else,  return 1 to set path state to PATH_SHAKY
 		 */
 		if (r == 1) {
-			io_err_stat_log(3, "%s: enqueue fails, to recover",
-					pp->dev);
+			io_err_stat_log(2, "%s: enqueue failed. recovering early", pp->dev);
 			goto recover;
 		} else
 			pp->io_err_pathfail_cnt = PATH_IO_ERR_IN_CHECKING;
diff --git a/multipathd/main.c b/multipathd/main.c
index 92c45d44..99a89a69 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2132,8 +2132,8 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 		pathinfo(pp, conf, 0);
 		pthread_cleanup_pop(1);
 		return 1;
-	} else if ((newstate != PATH_UP && newstate != PATH_GHOST) &&
-			(pp->state == PATH_DELAYED)) {
+	} else if ((newstate != PATH_UP && newstate != PATH_GHOST &&
+		    newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
 		/* If path state become failed again cancel path delay state */
 		pp->state = newstate;
 		/*
@@ -2200,8 +2200,9 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
 	    (san_path_check_enabled(pp->mpp) ||
 	     marginal_path_check_enabled(pp->mpp))) {
-		int was_marginal = pp->marginal;
 		if (should_skip_path(pp)) {
+			if (!pp->marginal && pp->state != PATH_DELAYED)
+				condlog(2, "%s: path is now marginal", pp->dev);
 			if (!marginal_pathgroups) {
 				if (marginal_path_check_enabled(pp->mpp))
 					/* to reschedule as soon as possible,
@@ -2211,13 +2212,18 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 				pp->state = PATH_DELAYED;
 				return 1;
 			}
-			if (!was_marginal) {
+			if (!pp->marginal) {
 				pp->marginal = 1;
 				marginal_changed = 1;
 			}
-		} else if (marginal_pathgroups && was_marginal) {
-			pp->marginal = 0;
-			marginal_changed = 1;
+		} else {
+			if (pp->marginal || pp->state == PATH_DELAYED)
+				condlog(2, "%s: path is no longer marginal",
+					pp->dev);
+			if (marginal_pathgroups && pp->marginal) {
+				pp->marginal = 0;
+				marginal_changed = 1;
+			}
 		}
 	}
 
@@ -2343,11 +2349,8 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 	 */
 	condlog(4, "path prio refresh");
 
-	if (marginal_changed) {
-		condlog(2, "%s: path is %s marginal", pp->dev,
-			(pp->marginal)? "now" : "no longer");
+	if (marginal_changed)
 		reload_and_sync_map(pp->mpp, vecs, 1);
-	}
 	else if (update_prio(pp, new_path_up) &&
 	    (pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio) &&
 	     pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes
  2021-01-19  4:46 [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes Benjamin Marzinski
  2021-01-19  4:46 ` [dm-devel] [PATCH v3 1/1] multipathd: cleanup logging for marginal paths Benjamin Marzinski
@ 2021-01-19 10:10 ` Martin Wilck
  1 sibling, 0 replies; 3+ messages in thread
From: Martin Wilck @ 2021-01-19 10:10 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Mon, 2021-01-18 at 22:46 -0600, Benjamin Marzinski wrote:
> I found an ABBA deadlock in the io_err_stat marginal path code, and
> in
> the process of fixing it, noticed a potential crash on shutdown. This
> patchset addresses both of the issues.
> 
> Changes from v2:
> 
> 0001: This is patch 0006 from v2. It is exactly the same code. But
> the
> commit message now mentions that check_path() no longer switches
> paths
> from delayed to pending.
> 
> Changes from v1:
> 
> 0002: use cleanup_mutex instead of cleanup_unlock as suggested by
> Martin
> 
> 0003: add pthread_testcancel and use cleanup_mutex instead of
> cleanup_unlock as suggested by Martin. Also, make tmp_pathvec a
> constant
> pointer, since it should always equal _pathvec.
> 
> 0004-0006 are new patches to deal with io_err_stat issues from
> Martin's
> review
> 
> 
> Benjamin Marzinski (1):
>   multipathd: cleanup logging for marginal paths
> 
>  libmultipath/io_err_stat.c |  7 +++----
>  multipathd/main.c          | 25 ++++++++++++++-----------
>  2 files changed, 17 insertions(+), 15 deletions(-)
> 

Thanks a lot for adding the explanation.
For the whole 6-part series:

Reviewed-by: Martin Wilck <mwilck@suse.com>

Pushed to "queue" branch.

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-01-19 10:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-19  4:46 [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes Benjamin Marzinski
2021-01-19  4:46 ` [dm-devel] [PATCH v3 1/1] multipathd: cleanup logging for marginal paths Benjamin Marzinski
2021-01-19 10:10 ` [dm-devel] [PATCH v3 0/1] Multipath io_err_stat fixes Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.