All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 3/5] libmultipath: drop mpp->nr_active field
       [not found] ` <20191115143403.31692-4-martin.wilck@suse.com>
@ 2019-11-19 22:22   ` Benjamin Marzinski
  2019-11-20  8:24     ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Marzinski @ 2019-11-19 22:22 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Xose Vazquez Perez, Chongyun Wu, dm-devel, Bart Van Assche

On Fri, Nov 15, 2019 at 02:41:50PM +0000, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> The tracking of nr_active has turned out to be error prone and hard
> to verify. Calculating it on the fly is a quick operation, so
> do this rather than trying to track nr_active. Use a boolean
> field instead to track whether a map is currently in recovery mode.
> 
> Moreover, clean up the recovery logic by making set_no_path_retry()
> the place that checks the current configuration and state, sets
> "queue_if_no_path" if necessary, and enters or leaves recovery
> mode if necessary. This ensures that consistent logic is applied.

Thanks. This looks better. The only thing I'm am sorta worried about is
that when we call the cli_handler functions, we haven't called
update_multipath_strings() to sync the state with the kernel first. This
could mean that multipathd is wrong about what the queue_if_no_path state
is, which is possible since it doesn't update mpp->features whenever it
calls dm_queue_if_no_path(). However, in the worst case scenario, this
would only cause multipathd to need to wait for the next call to
check_path to fix this up. Alternatively, we might to call
update_multipath_strings() here, or even replace the calls to
set_no_path_retry() with something like setup_multipath() or
update_multipath().

Any thoughts? I might just be being overly paranoid here, since I can't
really come up with a good explanation for how this could even get to be
in a problem state, and like I said, even if it does occur, it will just
get resolved on the next call to check_path.

-Ben


> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/configure.c   |  5 ++-
>  libmultipath/devmapper.c   |  2 +-
>  libmultipath/io_err_stat.c |  4 +--
>  libmultipath/print.c       |  5 +--
>  libmultipath/structs.c     | 19 +++++++++++
>  libmultipath/structs.h     |  4 ++-
>  libmultipath/structs_vec.c | 69 ++++++++++++++++++++++++++------------
>  libmultipath/structs_vec.h |  1 -
>  multipathd/cli_handlers.c  | 39 ++++++++++-----------
>  multipathd/main.c          | 19 ++++-------
>  10 files changed, 101 insertions(+), 66 deletions(-)
> 
> diff --git a/libmultipath/configure.c b/libmultipath/configure.c
> index 5ac7d903..c95848a0 100644
> --- a/libmultipath/configure.c
> +++ b/libmultipath/configure.c
> @@ -401,7 +401,6 @@ int setup_map(struct multipath *mpp, char *params, int params_size,
>  			condlog(2, "%s: setting up map with %d/%d path checkers pending",
>  				mpp->alias, n_pending, n_paths);
>  	}
> -	mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
>  
>  	/*
>  	 * ponders each path group and determine highest prio pg
> @@ -934,8 +933,8 @@ int domap(struct multipath *mpp, char *params, int is_daemon)
>  		}
>  
>  		sysfs_set_max_sectors_kb(mpp, 0);
> -		if (is_daemon && mpp->ghost_delay > 0 && mpp->nr_active &&
> -		    pathcount(mpp, PATH_GHOST) == mpp->nr_active)
> +		if (is_daemon && mpp->ghost_delay > 0 && count_active_paths(mpp) &&
> +		    pathcount(mpp, PATH_UP) == 0)
>  			mpp->ghost_delay_tick = mpp->ghost_delay;
>  		r = dm_addmap_create(mpp, params);
>  
> diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> index acf576aa..bed8ddc6 100644
> --- a/libmultipath/devmapper.c
> +++ b/libmultipath/devmapper.c
> @@ -403,7 +403,7 @@ static uint16_t build_udev_flags(const struct multipath *mpp, int reload)
>  	/* DM_UDEV_DISABLE_LIBRARY_FALLBACK is added in dm_addmap */
>  	return	(mpp->skip_kpartx == SKIP_KPARTX_ON ?
>  		 MPATH_UDEV_NO_KPARTX_FLAG : 0) |
> -		((mpp->nr_active == 0 || mpp->ghost_delay_tick > 0)?
> +		((count_active_paths(mpp) == 0 || mpp->ghost_delay_tick > 0) ?
>  		 MPATH_UDEV_NO_PATHS_FLAG : 0) |
>  		(reload && !mpp->force_udev_reload ?
>  		 MPATH_UDEV_RELOAD_FLAG : 0);
> diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
> index dcc8690d..1b9cd6c0 100644
> --- a/libmultipath/io_err_stat.c
> +++ b/libmultipath/io_err_stat.c
> @@ -383,7 +383,7 @@ int need_io_err_check(struct path *pp)
>  
>  	if (uatomic_read(&io_err_thread_running) == 0)
>  		return 0;
> -	if (pp->mpp->nr_active <= 0) {
> +	if (count_active_paths(pp->mpp) <= 0) {
>  		io_err_stat_log(2, "%s: recover path early", pp->dev);
>  		goto recover;
>  	}
> @@ -481,7 +481,7 @@ static int poll_io_err_stat(struct vectors *vecs, struct io_err_stat_path *pp)
>  		 */
>  		path->tick = 1;
>  
> -	} else if (path->mpp && path->mpp->nr_active > 0) {
> +	} else if (path->mpp && count_active_paths(path->mpp) > 0) {
>  		io_err_stat_log(3, "%s: keep failing the dm path %s",
>  				path->mpp->alias, path->dev);
>  		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
> diff --git a/libmultipath/print.c b/libmultipath/print.c
> index b98e9bda..e1ef4d3f 100644
> --- a/libmultipath/print.c
> +++ b/libmultipath/print.c
> @@ -181,9 +181,10 @@ snprint_queueing (char * buff, size_t len, const struct multipath * mpp)
>  		return snprintf(buff, len, "-");
>  	else if (mpp->no_path_retry > 0) {
>  		if (mpp->retry_tick > 0)
> +
>  			return snprintf(buff, len, "%i sec",
>  					mpp->retry_tick);
> -		else if (mpp->retry_tick == 0 && mpp->nr_active > 0)
> +		else if (mpp->retry_tick == 0 && count_active_paths(mpp) > 0)
>  			return snprintf(buff, len, "%i chk",
>  					mpp->no_path_retry);
>  		else
> @@ -195,7 +196,7 @@ snprint_queueing (char * buff, size_t len, const struct multipath * mpp)
>  static int
>  snprint_nb_paths (char * buff, size_t len, const struct multipath * mpp)
>  {
> -	return snprint_int(buff, len, mpp->nr_active);
> +	return snprint_int(buff, len, count_active_paths(mpp));
>  }
>  
>  static int
> diff --git a/libmultipath/structs.c b/libmultipath/structs.c
> index f420ee5c..cc931e4e 100644
> --- a/libmultipath/structs.c
> +++ b/libmultipath/structs.c
> @@ -478,6 +478,25 @@ int pathcount(const struct multipath *mpp, int state)
>  	return count;
>  }
>  
> +int count_active_paths(const struct multipath *mpp)
> +{
> +	struct pathgroup *pgp;
> +	struct path *pp;
> +	int count = 0;
> +	int i, j;
> +
> +	if (!mpp->pg)
> +		return 0;
> +
> +	vector_foreach_slot (mpp->pg, pgp, i) {
> +		vector_foreach_slot (pgp->paths, pp, j) {
> +			if (pp->state == PATH_UP || pp->state == PATH_GHOST)
> +				count++;
> +		}
> +	}
> +	return count;
> +}
> +
>  int pathcmp(const struct pathgroup *pgp, const struct pathgroup *cpgp)
>  {
>  	int i, j;
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 3665b89a..da4b6ca0 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -3,6 +3,7 @@
>  
>  #include <sys/types.h>
>  #include <inttypes.h>
> +#include <stdbool.h>
>  
>  #include "prio.h"
>  #include "byteorder.h"
> @@ -309,7 +310,6 @@ struct multipath {
>  	int pgfailback;
>  	int failback_tick;
>  	int rr_weight;
> -	int nr_active;     /* current available(= not known as failed) paths */
>  	int no_path_retry; /* number of retries after all paths are down */
>  	int retry_tick;    /* remaining times for retries */
>  	int disable_queueing;
> @@ -319,6 +319,7 @@ struct multipath {
>  	int fast_io_fail;
>  	int retain_hwhandler;
>  	int deferred_remove;
> +	bool in_recovery;
>  	int san_path_err_threshold;
>  	int san_path_err_forget_rate;
>  	int san_path_err_recovery_time;
> @@ -449,6 +450,7 @@ struct path * first_path (const struct multipath *mpp);
>  
>  int pathcountgr (const struct pathgroup *, int);
>  int pathcount (const struct multipath *, int);
> +int count_active_paths(const struct multipath *);
>  int pathcmp (const struct pathgroup *, const struct pathgroup *);
>  int add_feature (char **, const char *);
>  int remove_feature (char **, const char *);
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index fbe97662..0c5a3a81 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -290,10 +290,15 @@ update_multipath_strings(struct multipath *mpp, vector pathvec, int is_daemon)
>  	return 0;
>  }
>  
> -void enter_recovery_mode(struct multipath *mpp)
> +static void enter_recovery_mode(struct multipath *mpp)
>  {
>  	unsigned int checkint;
> -	struct config *conf = get_multipath_config();
> +	struct config *conf;
> +
> +	if (mpp->in_recovery || mpp->no_path_retry <= 0)
> +		return;
> +
> +	conf = get_multipath_config();
>  	checkint = conf->checkint;
>  	put_multipath_config(conf);
>  
> @@ -302,17 +307,37 @@ void enter_recovery_mode(struct multipath *mpp)
>  	 * meaning of +1: retry_tick may be decremented in checkerloop before
>  	 * starting retry.
>  	 */
> +	mpp->in_recovery = true;
>  	mpp->stat_queueing_timeouts++;
>  	mpp->retry_tick = mpp->no_path_retry * checkint + 1;
>  	condlog(1, "%s: Entering recovery mode: max_retries=%d",
>  		mpp->alias, mpp->no_path_retry);
>  }
>  
> +static void leave_recovery_mode(struct multipath *mpp)
> +{
> +	bool recovery = mpp->in_recovery;
> +
> +	mpp->in_recovery = false;
> +	mpp->retry_tick = 0;
> +
> +	/*
> +	 * in_recovery is only ever set if mpp->no_path_retry > 0
> +	 * (see enter_recovery_mode()). But no_path_retry may have been
> +	 * changed while the map was recovering, so test it here again.
> +	 */
> +	if (recovery && (mpp->no_path_retry == NO_PATH_RETRY_QUEUE ||
> +			 mpp->no_path_retry > 0)) {
> +		dm_queue_if_no_path(mpp->alias, 1);
> +		condlog(2, "%s: queue_if_no_path enabled", mpp->alias);
> +		condlog(1, "%s: Recovered to normal mode", mpp->alias);
> +	}
> +}
> +
>  void set_no_path_retry(struct multipath *mpp)
>  {
> -	char is_queueing = 0;
> +	bool is_queueing = 0;
>  
> -	mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
>  	if (mpp->features && strstr(mpp->features, "queue_if_no_path"))
>  		is_queueing = 1;
>  
> @@ -328,11 +353,15 @@ void set_no_path_retry(struct multipath *mpp)
>  			dm_queue_if_no_path(mpp->alias, 1);
>  		break;
>  	default:
> -		if (mpp->nr_active > 0) {
> -			mpp->retry_tick = 0;
> -			if (!is_queueing)
> +		if (count_active_paths(mpp) > 0) {
> +			/*
> +			 * If in_recovery is set, leave_recovery_mode() takes
> +			 * care of dm_queue_if_no_path. Otherwise, do it here.
> +			 */
> +			if (!is_queueing && !mpp->in_recovery)
>  				dm_queue_if_no_path(mpp->alias, 1);
> -		} else if (is_queueing && mpp->retry_tick == 0)
> +			leave_recovery_mode(mpp);
> +		} else
>  			enter_recovery_mode(mpp);
>  		break;
>  	}
> @@ -480,25 +509,23 @@ int verify_paths(struct multipath *mpp, struct vectors *vecs)
>   */
>  void update_queue_mode_del_path(struct multipath *mpp)
>  {
> -	if (--mpp->nr_active == 0) {
> -		if (mpp->no_path_retry > 0)
> -			enter_recovery_mode(mpp);
> -		else if (mpp->no_path_retry != NO_PATH_RETRY_QUEUE)
> +	int active = count_active_paths(mpp);
> +
> +	if (active == 0) {
> +		enter_recovery_mode(mpp);
> +		if (mpp->no_path_retry != NO_PATH_RETRY_QUEUE)
>  			mpp->stat_map_failures++;
>  	}
> -	condlog(2, "%s: remaining active paths: %d", mpp->alias, mpp->nr_active);
> +	condlog(2, "%s: remaining active paths: %d", mpp->alias, active);
>  }
>  
>  void update_queue_mode_add_path(struct multipath *mpp)
>  {
> -	if (mpp->nr_active++ == 0 && mpp->no_path_retry > 0) {
> -		/* come back to normal mode from retry mode */
> -		mpp->retry_tick = 0;
> -		dm_queue_if_no_path(mpp->alias, 1);
> -		condlog(2, "%s: queue_if_no_path enabled", mpp->alias);
> -		condlog(1, "%s: Recovered to normal mode", mpp->alias);
> -	}
> -	condlog(2, "%s: remaining active paths: %d", mpp->alias, mpp->nr_active);
> +	int active = count_active_paths(mpp);
> +
> +	if (active > 0)
> +		leave_recovery_mode(mpp);
> +	condlog(2, "%s: remaining active paths: %d", mpp->alias, active);
>  }
>  
>  vector get_used_hwes(const struct _vector *pathvec)
> diff --git a/libmultipath/structs_vec.h b/libmultipath/structs_vec.h
> index d3219278..678efe4d 100644
> --- a/libmultipath/structs_vec.h
> +++ b/libmultipath/structs_vec.h
> @@ -12,7 +12,6 @@ struct vectors {
>  };
>  
>  void set_no_path_retry(struct multipath *mpp);
> -void enter_recovery_mode(struct multipath *mpp);
>  
>  int adopt_paths (vector pathvec, struct multipath * mpp);
>  void orphan_paths(vector pathvec, struct multipath *mpp,
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index 371b0a79..7d878c88 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -1024,16 +1024,17 @@ cli_restore_queueing(void *v, char **reply, int *len, void *data)
>  	select_no_path_retry(conf, mpp);
>  	pthread_cleanup_pop(1);
>  
> +	/*
> +	 * Don't call set_no_path_retry() for the NO_PATH_RETRY_FAIL case.
> +	 * That would disable queueing when "restorequeueing" is called,
> +	 * and the code never behaved that way. Users might not expect it.
> +	 * In almost all cases, queueing will be disabled anyway when we
> +	 * are here.
> +	 */
>  	if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
> -			mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
> -		dm_queue_if_no_path(mpp->alias, 1);
> -		if (mpp->no_path_retry > 0) {
> -			if (mpp->nr_active > 0)
> -				mpp->retry_tick = 0;
> -			else
> -				enter_recovery_mode(mpp);
> -		}
> -	}
> +	    mpp->no_path_retry != NO_PATH_RETRY_FAIL)
> +		set_no_path_retry(mpp);
> +
>  	return 0;
>  }
>  
> @@ -1051,16 +1052,10 @@ cli_restore_all_queueing(void *v, char **reply, int *len, void *data)
>  		pthread_cleanup_push(put_multipath_config, conf);
>  		select_no_path_retry(conf, mpp);
>  		pthread_cleanup_pop(1);
> +		/* See comment in cli_restore_queueing() */
>  		if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
> -		    mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
> -			dm_queue_if_no_path(mpp->alias, 1);
> -			if (mpp->no_path_retry > 0) {
> -				if (mpp->nr_active > 0)
> -					mpp->retry_tick = 0;
> -				else
> -					enter_recovery_mode(mpp);
> -			}
> -		}
> +		    mpp->no_path_retry != NO_PATH_RETRY_FAIL)
> +			set_no_path_retry(mpp);
>  	}
>  	return 0;
>  }
> @@ -1085,12 +1080,12 @@ cli_disable_queueing(void *v, char **reply, int *len, void *data)
>  		return 1;
>  	}
>  
> -	if (mpp->nr_active == 0)
> +	if (count_active_paths(mpp) == 0)
>  		mpp->stat_map_failures++;
>  	mpp->retry_tick = 0;
>  	mpp->no_path_retry = NO_PATH_RETRY_FAIL;
>  	mpp->disable_queueing = 1;
> -	dm_queue_if_no_path(mpp->alias, 0);
> +	set_no_path_retry(mpp);
>  	return 0;
>  }
>  
> @@ -1103,12 +1098,12 @@ cli_disable_all_queueing(void *v, char **reply, int *len, void *data)
>  
>  	condlog(2, "disable queueing (operator)");
>  	vector_foreach_slot(vecs->mpvec, mpp, i) {
> -		if (mpp->nr_active == 0)
> +		if (count_active_paths(mpp) == 0)
>  			mpp->stat_map_failures++;
>  		mpp->retry_tick = 0;
>  		mpp->no_path_retry = NO_PATH_RETRY_FAIL;
>  		mpp->disable_queueing = 1;
> -		dm_queue_if_no_path(mpp->alias, 0);
> +		set_no_path_retry(mpp);
>  	}
>  	return 0;
>  }
> diff --git a/multipathd/main.c b/multipathd/main.c
> index a21d96e4..c0423602 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1616,7 +1616,7 @@ fail_path (struct path * pp, int del_active)
>   * caller must have locked the path list before calling that function
>   */
>  static int
> -reinstate_path (struct path * pp, int add_active)
> +reinstate_path (struct path * pp)
>  {
>  	int ret = 0;
>  
> @@ -1628,8 +1628,7 @@ reinstate_path (struct path * pp, int add_active)
>  		ret = 1;
>  	} else {
>  		condlog(2, "%s: reinstated", pp->dev_t);
> -		if (add_active)
> -			update_queue_mode_add_path(pp->mpp);
> +		update_queue_mode_add_path(pp->mpp);
>  	}
>  	return ret;
>  }
> @@ -1861,7 +1860,7 @@ static int check_path_reinstate_state(struct path * pp) {
>  
>  	if (pp->disable_reinstate) {
>  		/* If there are no other usable paths, reinstate the path */
> -		if (pp->mpp->nr_active == 0) {
> +		if (count_active_paths(pp->mpp) == 0) {
>  			condlog(2, "%s : reinstating path early", pp->dev);
>  			goto reinstate_path;
>  		}
> @@ -1960,7 +1959,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
>  	int newstate;
>  	int new_path_up = 0;
>  	int chkr_new_path_up = 0;
> -	int add_active;
>  	int disable_reinstate = 0;
>  	int oldchkrstate = pp->chkrstate;
>  	int retrigger_tries, verbosity;
> @@ -2134,7 +2132,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
>  	 * paths if there are no other active paths in map.
>  	 */
>  	disable_reinstate = (newstate == PATH_GHOST &&
> -			     pp->mpp->nr_active == 0 &&
> +			     count_active_paths(pp->mpp) == 0 &&
>  			     path_get_tpgs(pp) == TPGS_IMPLICIT) ? 1 : 0;
>  
>  	pp->chkrstate = newstate;
> @@ -2185,12 +2183,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
>  		/*
>  		 * reinstate this path
>  		 */
> -		if (oldstate != PATH_UP &&
> -		    oldstate != PATH_GHOST)
> -			add_active = 1;
> -		else
> -			add_active = 0;
> -		if (!disable_reinstate && reinstate_path(pp, add_active)) {
> +		if (!disable_reinstate && reinstate_path(pp)) {
>  			condlog(3, "%s: reload map", pp->dev);
>  			ev_add_path(pp, vecs, 1);
>  			pp->tick = 1;
> @@ -2213,7 +2206,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
>  		    pp->dmstate == PSTATE_UNDEF) &&
>  		    !disable_reinstate) {
>  			/* Clear IO errors */
> -			if (reinstate_path(pp, 0)) {
> +			if (reinstate_path(pp)) {
>  				condlog(3, "%s: reload map", pp->dev);
>  				ev_add_path(pp, vecs, 1);
>  				pp->tick = 1;
> -- 
> 2.24.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] libmultipath: fix ALUA autodetection when paths are down
       [not found] ` <20191115143403.31692-6-martin.wilck@suse.com>
@ 2019-11-19 22:29   ` Benjamin Marzinski
  2019-11-20 12:33     ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Marzinski @ 2019-11-19 22:29 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Xose Vazquez Perez, Chongyun Wu, dm-devel, Bart Van Assche

On Fri, Nov 15, 2019 at 02:41:54PM +0000, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> If a single path was offline when detect_alua() was called,
> multipathd would assume ALUA was generally unsupported.
> 
> Fix that by assuming that if at least one path has ALUA support and
> no path explicitly does not have it, ALUA is supported.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/discovery.c | 22 +++++++++++++++++++++-
>  libmultipath/propsel.c   | 20 +++++++++++++++++---
>  2 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 4288c9fd..5f41dcb7 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -871,6 +871,10 @@ get_serial (char * str, int maxlen, int fd)
>  	return 1;
>  }
>  
> +/*
> + * Side effect: sets pp->tpgs if it could be determined.
> + * If ALUA calls fail because paths are unreachable, pp->tpgs remains unchanged.
> + */
>  static void
>  detect_alua(struct path * pp)
>  {
> @@ -881,12 +885,28 @@ detect_alua(struct path * pp)
>  	if (sysfs_get_timeout(pp, &timeout) <= 0)
>  		timeout = DEF_TIMEOUT;
>  
> -	if ((tpgs = get_target_port_group_support(pp, timeout)) <= 0) {
> +	tpgs = get_target_port_group_support(pp, timeout);
> +	if (tpgs == -RTPG_INQUIRY_FAILED)
> +		return;
> +	else if (tpgs <= 0) {
>  		pp->tpgs = TPGS_NONE;
>  		return;
>  	}
> +
> +	if (pp->fd == -1 || pp->offline)
> +		return;
> +
 
This is just a nitpick, but won't tpgs already be -RTPG_INQUIRY_FAILED
if pp->fd == -1. This check makes more sense before
get_target_port_group_support().

-Ben
 
>  	ret = get_target_port_group(pp, timeout);
>  	if (ret < 0 || get_asymmetric_access_state(pp, ret, timeout) < 0) {
> +		int state;
> +
> +		if (ret == -RTPG_INQUIRY_FAILED)
> +			return;
> +
> +		state = path_offline(pp);
> +		if (state == PATH_DOWN || state == PATH_PENDING)
> +			return;
> +
>  		pp->tpgs = TPGS_NONE;
>  		return;
>  	}
> diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> index 27e8d68a..a5fc6ba0 100644
> --- a/libmultipath/propsel.c
> +++ b/libmultipath/propsel.c
> @@ -432,12 +432,26 @@ int select_hwhandler(struct config *conf, struct multipath *mp)
>  	static const char tpgs_origin[]= "(setting: autodetected from TPGS)";
>  	char *dh_state;
>  	int i;
> -	bool all_tpgs = true;
> +	bool all_tpgs = true, one_tpgs = false;
>  
>  	dh_state = &handler[2];
>  
> -	vector_foreach_slot(mp->paths, pp, i)
> -		all_tpgs = all_tpgs && (path_get_tpgs(pp) > 0);
> +	/*
> +	 * TPGS_UNDEF means that ALUA support couldn't determined either way
> +	 * yet, probably because the path was always down.
> +	 * If at least one path does have TPGS support, and no path has
> +	 * TPGS_NONE, assume that TPGS would be supported by all paths if
> +	 * all were up.
> +	 */
> +	vector_foreach_slot(mp->paths, pp, i) {
> +		int tpgs = path_get_tpgs(pp);
> +
> +		all_tpgs = all_tpgs && tpgs != TPGS_NONE;
> +		one_tpgs = one_tpgs ||
> +			(tpgs != TPGS_NONE && tpgs != TPGS_UNDEF);
> +	}
> +	all_tpgs = all_tpgs && one_tpgs;
> +
>  	if (mp->retain_hwhandler != RETAIN_HWHANDLER_OFF) {
>  		vector_foreach_slot(mp->paths, pp, i) {
>  			if (get_dh_state(pp, dh_state, sizeof(handler) - 2) > 0
> -- 
> 2.24.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/5] multipathd: move set_no_path_retry() back to libmultipath
       [not found] ` <20191115143403.31692-2-martin.wilck@suse.com>
@ 2019-11-19 22:30   ` Benjamin Marzinski
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2019-11-19 22:30 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Xose Vazquez Perez, Chongyun Wu, dm-devel, Bart Van Assche

On Fri, Nov 15, 2019 at 02:41:46PM +0000, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> This function is useful elsewhere, too. No code changes except
> for changing the linkage.
> 
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/structs_vec.c | 30 ++++++++++++++++++++++++++++++
>  libmultipath/structs_vec.h |  1 +
>  multipathd/main.c          | 30 ------------------------------
>  3 files changed, 31 insertions(+), 30 deletions(-)
> 
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index 6991f9ac..fbe97662 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -308,6 +308,36 @@ void enter_recovery_mode(struct multipath *mpp)
>  		mpp->alias, mpp->no_path_retry);
>  }
>  
> +void set_no_path_retry(struct multipath *mpp)
> +{
> +	char is_queueing = 0;
> +
> +	mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
> +	if (mpp->features && strstr(mpp->features, "queue_if_no_path"))
> +		is_queueing = 1;
> +
> +	switch (mpp->no_path_retry) {
> +	case NO_PATH_RETRY_UNDEF:
> +		break;
> +	case NO_PATH_RETRY_FAIL:
> +		if (is_queueing)
> +			dm_queue_if_no_path(mpp->alias, 0);
> +		break;
> +	case NO_PATH_RETRY_QUEUE:
> +		if (!is_queueing)
> +			dm_queue_if_no_path(mpp->alias, 1);
> +		break;
> +	default:
> +		if (mpp->nr_active > 0) {
> +			mpp->retry_tick = 0;
> +			if (!is_queueing)
> +				dm_queue_if_no_path(mpp->alias, 1);
> +		} else if (is_queueing && mpp->retry_tick == 0)
> +			enter_recovery_mode(mpp);
> +		break;
> +	}
> +}
> +
>  void
>  sync_map_state(struct multipath *mpp)
>  {
> diff --git a/libmultipath/structs_vec.h b/libmultipath/structs_vec.h
> index f8b9f63e..d3219278 100644
> --- a/libmultipath/structs_vec.h
> +++ b/libmultipath/structs_vec.h
> @@ -11,6 +11,7 @@ struct vectors {
>  	vector mpvec;
>  };
>  
> +void set_no_path_retry(struct multipath *mpp);
>  void enter_recovery_mode(struct multipath *mpp);
>  
>  int adopt_paths (vector pathvec, struct multipath * mpp);
> diff --git a/multipathd/main.c b/multipathd/main.c
> index bb5c1f1d..a21d96e4 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -409,36 +409,6 @@ set_multipath_wwid (struct multipath * mpp)
>  	dm_get_uuid(mpp->alias, mpp->wwid, WWID_SIZE);
>  }
>  
> -static void set_no_path_retry(struct multipath *mpp)
> -{
> -	char is_queueing = 0;
> -
> -	mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
> -	if (mpp->features && strstr(mpp->features, "queue_if_no_path"))
> -		is_queueing = 1;
> -
> -	switch (mpp->no_path_retry) {
> -	case NO_PATH_RETRY_UNDEF:
> -		break;
> -	case NO_PATH_RETRY_FAIL:
> -		if (is_queueing)
> -			dm_queue_if_no_path(mpp->alias, 0);
> -		break;
> -	case NO_PATH_RETRY_QUEUE:
> -		if (!is_queueing)
> -			dm_queue_if_no_path(mpp->alias, 1);
> -		break;
> -	default:
> -		if (mpp->nr_active > 0) {
> -			mpp->retry_tick = 0;
> -			if (!is_queueing)
> -				dm_queue_if_no_path(mpp->alias, 1);
> -		} else if (is_queueing && mpp->retry_tick == 0)
> -			enter_recovery_mode(mpp);
> -		break;
> -	}
> -}
> -
>  int __setup_multipath(struct vectors *vecs, struct multipath *mpp,
>  		      int reset)
>  {
> -- 
> 2.24.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/5] libmultipath: dict.c: rename duplicate set_no_path_retry()
       [not found] ` <20191115143403.31692-3-martin.wilck@suse.com>
@ 2019-11-19 22:30   ` Benjamin Marzinski
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2019-11-19 22:30 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Xose Vazquez Perez, Chongyun Wu, dm-devel, Bart Van Assche

On Fri, Nov 15, 2019 at 02:41:48PM +0000, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> We have two functions called set_no_path_retry(). Fix that by
> renaming the function in dict.c to no_path_retry_helper().
> 
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/dict.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index a90690fa..e8a5ecb2 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1099,7 +1099,7 @@ declare_mp_handler(pgfailback, set_pgfailback)
>  declare_mp_snprint(pgfailback, print_pgfailback)
>  
>  static int
> -set_no_path_retry(vector strvec, void *ptr)
> +no_path_retry_helper(vector strvec, void *ptr)
>  {
>  	int *int_ptr = (int *)ptr;
>  	char * buff;
> @@ -1134,13 +1134,13 @@ print_no_path_retry(char * buff, int len, long v)
>  	}
>  }
>  
> -declare_def_handler(no_path_retry, set_no_path_retry)
> +declare_def_handler(no_path_retry, no_path_retry_helper)
>  declare_def_snprint(no_path_retry, print_no_path_retry)
> -declare_ovr_handler(no_path_retry, set_no_path_retry)
> +declare_ovr_handler(no_path_retry, no_path_retry_helper)
>  declare_ovr_snprint(no_path_retry, print_no_path_retry)
> -declare_hw_handler(no_path_retry, set_no_path_retry)
> +declare_hw_handler(no_path_retry, no_path_retry_helper)
>  declare_hw_snprint(no_path_retry, print_no_path_retry)
> -declare_mp_handler(no_path_retry, set_no_path_retry)
> +declare_mp_handler(no_path_retry, no_path_retry_helper)
>  declare_mp_snprint(no_path_retry, print_no_path_retry)
>  
>  static int
> -- 
> 2.24.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 4/5] libmultipath: fix (max_)polling_interval setting logic
       [not found] ` <20191115143403.31692-5-martin.wilck@suse.com>
@ 2019-11-19 22:31   ` Benjamin Marzinski
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2019-11-19 22:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Xose Vazquez Perez, Chongyun Wu, dm-devel, Bart Van Assche

On Fri, Nov 15, 2019 at 02:41:52PM +0000, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> checkint should never be larger than max_checkint. The WATCHDOG_USEC
> setting should be honored on "reconfigure", too, not only on startup.
> Pathological values for checkint and integer overflows should be avoided.
> The logic should work reasonably well if both polling_interval and
> max_polling_interval, just one of them, or neither is set.
> 
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/config.c   | 40 +++++++++++++++++++++++++++++++++++++---
>  libmultipath/config.h   |  1 +
>  libmultipath/defaults.h |  3 +--
>  multipathd/main.c       | 26 ++++++--------------------
>  4 files changed, 45 insertions(+), 25 deletions(-)
> 
> diff --git a/libmultipath/config.c b/libmultipath/config.c
> index 49723add..0bf7bb40 100644
> --- a/libmultipath/config.c
> +++ b/libmultipath/config.c
> @@ -683,6 +683,27 @@ process_config_dir(struct config *conf, char *dir)
>  	pthread_cleanup_pop(1);
>  }
>  
> +static void set_max_checkint_from_watchdog(struct config *conf)
> +{
> +#ifdef USE_SYSTEMD
> +	char *envp = getenv("WATCHDOG_USEC");
> +	unsigned long checkint;
> +
> +	if (envp && sscanf(envp, "%lu", &checkint) == 1) {
> +		/* Value is in microseconds */
> +		checkint /= 1000000;
> +		if (checkint < 1 || checkint > UINT_MAX) {
> +			condlog(1, "invalid value for WatchdogSec: \"%s\"", envp);
> +			return;
> +		}
> +		if (conf->max_checkint == 0 || conf->max_checkint > checkint)
> +			conf->max_checkint = checkint;
> +		condlog(3, "enabling watchdog, interval %ld", checkint);
> +		conf->use_watchdog = true;
> +	}
> +#endif
> +}
> +
>  struct config *
>  load_config (char * file)
>  {
> @@ -703,7 +724,8 @@ load_config (char * file)
>  	conf->multipath_dir = set_default(DEFAULT_MULTIPATHDIR);
>  	conf->attribute_flags = 0;
>  	conf->reassign_maps = DEFAULT_REASSIGN_MAPS;
> -	conf->checkint = DEFAULT_CHECKINT;
> +	conf->checkint = CHECKINT_UNDEF;
> +	conf->use_watchdog = false;
>  	conf->max_checkint = 0;
>  	conf->force_sync = DEFAULT_FORCE_SYNC;
>  	conf->partition_delim = (default_partition_delim != NULL ?
> @@ -754,8 +776,20 @@ load_config (char * file)
>  	/*
>  	 * fill the voids left in the config file
>  	 */
> -	if (conf->max_checkint == 0)
> -		conf->max_checkint = MAX_CHECKINT(conf->checkint);
> +	set_max_checkint_from_watchdog(conf);
> +	if (conf->max_checkint == 0) {
> +		if (conf->checkint == CHECKINT_UNDEF)
> +			conf->checkint = DEFAULT_CHECKINT;
> +		conf->max_checkint = (conf->checkint < UINT_MAX / 4 ?
> +				      conf->checkint * 4 : UINT_MAX);
> +	} else if (conf->checkint == CHECKINT_UNDEF)
> +		conf->checkint = (conf->max_checkint >= 4 ?
> +				  conf->max_checkint / 4 : 1);
> +	else if (conf->checkint > conf->max_checkint)
> +		conf->checkint = conf->max_checkint;
> +	condlog(3, "polling interval: %d, max: %d",
> +		conf->checkint, conf->max_checkint);
> +
>  	if (conf->blist_devnode == NULL) {
>  		conf->blist_devnode = vector_alloc();
>  
> diff --git a/libmultipath/config.h b/libmultipath/config.h
> index 2ab7b42c..a078947c 100644
> --- a/libmultipath/config.h
> +++ b/libmultipath/config.h
> @@ -139,6 +139,7 @@ struct config {
>  	int minio_rq;
>  	unsigned int checkint;
>  	unsigned int max_checkint;
> +	bool use_watchdog;
>  	int pgfailback;
>  	int remove;
>  	int rr_weight;
> diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> index d7034655..e5ee6afe 100644
> --- a/libmultipath/defaults.h
> +++ b/libmultipath/defaults.h
> @@ -53,9 +53,8 @@
>  /* Enable all foreign libraries by default */
>  #define DEFAULT_ENABLE_FOREIGN ""
>  
> -#define CHECKINT_UNDEF		(~0U)
> +#define CHECKINT_UNDEF		UINT_MAX
>  #define DEFAULT_CHECKINT	5
> -#define MAX_CHECKINT(a)		(a << 2)
>  
>  #define MAX_DEV_LOSS_TMO	UINT_MAX
>  #define DEFAULT_PIDFILE		"/" RUN_DIR "/multipathd.pid"
> diff --git a/multipathd/main.c b/multipathd/main.c
> index c0423602..95426d3d 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -33,10 +33,6 @@
>   */
>  #include "checkers.h"
>  
> -#ifdef USE_SYSTEMD
> -static int use_watchdog;
> -#endif
> -
>  /*
>   * libmultipath
>   */
> @@ -2284,6 +2280,7 @@ checkerloop (void *ap)
>  	struct timespec last_time;
>  	struct config *conf;
>  	int foreign_tick = 0;
> +	bool use_watchdog;
>  
>  	pthread_cleanup_push(rcu_unregister, NULL);
>  	rcu_register_thread();
> @@ -2295,6 +2292,11 @@ checkerloop (void *ap)
>  	get_monotonic_time(&last_time);
>  	last_time.tv_sec -= 1;
>  
> +	/* use_watchdog is set from process environment and never changes */
> +	conf = get_multipath_config();
> +	use_watchdog = conf->use_watchdog;
> +	put_multipath_config(conf);
> +
>  	while (1) {
>  		struct timespec diff_time, start_time, end_time;
>  		int num_paths = 0, strict_timing, rc = 0;
> @@ -2766,7 +2768,6 @@ child (__attribute__((unused)) void *param)
>  	struct multipath * mpp;
>  	int i;
>  #ifdef USE_SYSTEMD
> -	unsigned long checkint;
>  	int startup_done = 0;
>  #endif
>  	int rc;
> @@ -2843,21 +2844,6 @@ child (__attribute__((unused)) void *param)
>  	setscheduler();
>  	set_oom_adj();
>  
> -#ifdef USE_SYSTEMD
> -	envp = getenv("WATCHDOG_USEC");
> -	if (envp && sscanf(envp, "%lu", &checkint) == 1) {
> -		/* Value is in microseconds */
> -		conf->max_checkint = checkint / 1000000;
> -		/* Rescale checkint */
> -		if (conf->checkint > conf->max_checkint)
> -			conf->checkint = conf->max_checkint;
> -		else
> -			conf->checkint = conf->max_checkint / 4;
> -		condlog(3, "enabling watchdog, interval %d max %d",
> -			conf->checkint, conf->max_checkint);
> -		use_watchdog = conf->checkint;
> -	}
> -#endif
>  	/*
>  	 * Startup done, invalidate configuration
>  	 */
> -- 
> 2.24.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/5] libmultipath: drop mpp->nr_active field
  2019-11-19 22:22   ` [PATCH 3/5] libmultipath: drop mpp->nr_active field Benjamin Marzinski
@ 2019-11-20  8:24     ` Martin Wilck
  0 siblings, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2019-11-20  8:24 UTC (permalink / raw)
  To: bmarzins; +Cc: xose.vazquez, wu.chongyun, dm-devel, Bart.VanAssche

On Tue, 2019-11-19 at 16:22 -0600, Benjamin Marzinski wrote:
> On Fri, Nov 15, 2019 at 02:41:50PM +0000, Martin Wilck wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > The tracking of nr_active has turned out to be error prone and hard
> > to verify. Calculating it on the fly is a quick operation, so
> > do this rather than trying to track nr_active. Use a boolean
> > field instead to track whether a map is currently in recovery mode.
> > 
> > Moreover, clean up the recovery logic by making set_no_path_retry()
> > the place that checks the current configuration and state, sets
> > "queue_if_no_path" if necessary, and enters or leaves recovery
> > mode if necessary. This ensures that consistent logic is applied.
> 
> Thanks. This looks better. The only thing I'm am sorta worried about
> is
> that when we call the cli_handler functions, we haven't called
> update_multipath_strings() to sync the state with the kernel first.
> This
> could mean that multipathd is wrong about what the queue_if_no_path
> state
> is, which is possible since it doesn't update mpp->features whenever
> it
> calls dm_queue_if_no_path(). However, in the worst case scenario,
> this
> would only cause multipathd to need to wait for the next call to
> check_path to fix this up. Alternatively, we might to call
> update_multipath_strings() here, or even replace the calls to
> set_no_path_retry() with something like setup_multipath() or
> update_multipath().
> 
> Any thoughts? I might just be being overly paranoid here, since I
> can't
> really come up with a good explanation for how this could even get to
> be
> in a problem state, and like I said, even if it does occur, it will
> just
> get resolved on the next call to check_path.

Having to call setup_multipath() in this code path seems too much to
me. One tempting thought would be to simply keep mpp->features up-to-
date, but we're trying to achieve consistent state with this patch set,
and that doesn't go well together with trying to track kernel state in
user space. After all, in theory at least, users could run "dmsetup
message /dev/dm-$X 0 fail_if_no_path" any time, and there's no way to
get notified about this. The way multipathd currently works, we would
find out in check_path(), no sooner.

So, I believe the cli_handler code path should not look at the features
string. I'll send a v2.

Thanks for the review and regards,
Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] libmultipath: fix ALUA autodetection when paths are down
  2019-11-19 22:29   ` [PATCH 5/5] libmultipath: fix ALUA autodetection when paths are down Benjamin Marzinski
@ 2019-11-20 12:33     ` Martin Wilck
  2019-11-20 17:01       ` Benjamin Marzinski
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2019-11-20 12:33 UTC (permalink / raw)
  To: bmarzins; +Cc: xose.vazquez, wu.chongyun, dm-devel, Bart.VanAssche

On Tue, 2019-11-19 at 16:29 -0600, Benjamin Marzinski wrote:
> On Fri, Nov 15, 2019 at 02:41:54PM +0000, Martin Wilck wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > If a single path was offline when detect_alua() was called,
> > multipathd would assume ALUA was generally unsupported.
> > 
> > Fix that by assuming that if at least one path has ALUA support and
> > no path explicitly does not have it, ALUA is supported.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  libmultipath/discovery.c | 22 +++++++++++++++++++++-
> >  libmultipath/propsel.c   | 20 +++++++++++++++++---
> >  2 files changed, 38 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 4288c9fd..5f41dcb7 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -871,6 +871,10 @@ get_serial (char * str, int maxlen, int fd)
> >  	return 1;
> >  }
> >  
> > /*
> > + * Side effect: sets pp->tpgs if it could be determined.
> > + * If ALUA calls fail because paths are unreachable, pp->tpgs
> > remains unchanged.
> > + */
> >  static void
> >  detect_alua(struct path * pp)
> >  {
> > @@ -881,12 +885,28 @@ detect_alua(struct path * pp)
> >  	if (sysfs_get_timeout(pp, &timeout) <= 0)
> >  		timeout = DEF_TIMEOUT;
> >  
> > -	if ((tpgs = get_target_port_group_support(pp, timeout)) <= 0) {
> > +	tpgs = get_target_port_group_support(pp, timeout);
> > +	if (tpgs == -RTPG_INQUIRY_FAILED)
> > +		return;
> > +	else if (tpgs <= 0) {
> >  		pp->tpgs = TPGS_NONE;
> >  		return;
> >  	}
> > +
> > +	if (pp->fd == -1 || pp->offline)
> > +		return;
> > +
>  
> This is just a nitpick, but won't tpgs already be
> -RTPG_INQUIRY_FAILED
> if pp->fd == -1. This check makes more sense before
> get_target_port_group_support().

Not really, because get_target_port_group_support() normally obtains
INQUIRY data from sysfs, which can return something reasonable even
if fd == -1 (e.g. if the path is temporarily offline). In particular,
it could indicate that the device has no TPGS support. That's why I
call it first.

Regards,
Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5/5] libmultipath: fix ALUA autodetection when paths are down
  2019-11-20 12:33     ` Martin Wilck
@ 2019-11-20 17:01       ` Benjamin Marzinski
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2019-11-20 17:01 UTC (permalink / raw)
  To: Martin Wilck; +Cc: xose.vazquez, wu.chongyun, dm-devel, Bart.VanAssche

On Wed, Nov 20, 2019 at 12:33:15PM +0000, Martin Wilck wrote:
> On Tue, 2019-11-19 at 16:29 -0600, Benjamin Marzinski wrote:
> > On Fri, Nov 15, 2019 at 02:41:54PM +0000, Martin Wilck wrote:
> > > From: Martin Wilck <mwilck@suse.com>
> > > 
> > > If a single path was offline when detect_alua() was called,
> > > multipathd would assume ALUA was generally unsupported.
> > > 
> > > Fix that by assuming that if at least one path has ALUA support and
> > > no path explicitly does not have it, ALUA is supported.
> > > 
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > >  libmultipath/discovery.c | 22 +++++++++++++++++++++-
> > >  libmultipath/propsel.c   | 20 +++++++++++++++++---
> > >  2 files changed, 38 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > > index 4288c9fd..5f41dcb7 100644
> > > --- a/libmultipath/discovery.c
> > > +++ b/libmultipath/discovery.c
> > > @@ -871,6 +871,10 @@ get_serial (char * str, int maxlen, int fd)
> > >  	return 1;
> > >  }
> > >  
> > > /*
> > > + * Side effect: sets pp->tpgs if it could be determined.
> > > + * If ALUA calls fail because paths are unreachable, pp->tpgs
> > > remains unchanged.
> > > + */
> > >  static void
> > >  detect_alua(struct path * pp)
> > >  {
> > > @@ -881,12 +885,28 @@ detect_alua(struct path * pp)
> > >  	if (sysfs_get_timeout(pp, &timeout) <= 0)
> > >  		timeout = DEF_TIMEOUT;
> > >  
> > > -	if ((tpgs = get_target_port_group_support(pp, timeout)) <= 0) {
> > > +	tpgs = get_target_port_group_support(pp, timeout);
> > > +	if (tpgs == -RTPG_INQUIRY_FAILED)
> > > +		return;
> > > +	else if (tpgs <= 0) {
> > >  		pp->tpgs = TPGS_NONE;
> > >  		return;
> > >  	}
> > > +
> > > +	if (pp->fd == -1 || pp->offline)
> > > +		return;
> > > +
> >  
> > This is just a nitpick, but won't tpgs already be
> > -RTPG_INQUIRY_FAILED
> > if pp->fd == -1. This check makes more sense before
> > get_target_port_group_support().
> 
> Not really, because get_target_port_group_support() normally obtains
> INQUIRY data from sysfs, which can return something reasonable even
> if fd == -1 (e.g. if the path is temporarily offline). In particular,
> it could indicate that the device has no TPGS support. That's why I
> call it first.

Oops. My bad.

Reviewed-by: Benjamin Marzinsk <bmarzins@redhat.com>

-Ben

> 
> Regards,
> Martin
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-11-20 17:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191115143403.31692-1-martin.wilck@suse.com>
     [not found] ` <20191115143403.31692-4-martin.wilck@suse.com>
2019-11-19 22:22   ` [PATCH 3/5] libmultipath: drop mpp->nr_active field Benjamin Marzinski
2019-11-20  8:24     ` Martin Wilck
     [not found] ` <20191115143403.31692-6-martin.wilck@suse.com>
2019-11-19 22:29   ` [PATCH 5/5] libmultipath: fix ALUA autodetection when paths are down Benjamin Marzinski
2019-11-20 12:33     ` Martin Wilck
2019-11-20 17:01       ` Benjamin Marzinski
     [not found] ` <20191115143403.31692-2-martin.wilck@suse.com>
2019-11-19 22:30   ` [PATCH 1/5] multipathd: move set_no_path_retry() back to libmultipath Benjamin Marzinski
     [not found] ` <20191115143403.31692-3-martin.wilck@suse.com>
2019-11-19 22:30   ` [PATCH 2/5] libmultipath: dict.c: rename duplicate set_no_path_retry() Benjamin Marzinski
     [not found] ` <20191115143403.31692-5-martin.wilck@suse.com>
2019-11-19 22:31   ` [PATCH 4/5] libmultipath: fix (max_)polling_interval setting logic Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.