All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC for multipath queue_if_no_path timeout.
@ 2013-09-26 17:14 Frank Mayhar
  2013-09-26 17:24 ` Alasdair G Kergon
  2013-09-26 17:41 ` Mike Snitzer
  0 siblings, 2 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-26 17:14 UTC (permalink / raw)
  To: dm-devel

Hey, folks.  We're using multipath as an in-kernel failover mechanism,
so that if an underlying device dies, multipath will switch to another
in its list.  Further, we use queue_if_no_path so that a daemon can get
involved and replace the list if the kernel runs out of alternatives.
In testing, however, we ran into a problem.

Obviously, if queue_if_no_path is on and multipath runs out of good
paths, the I/Os will sit there queued forever barring user intervention.
I was doing a lot of failure testing and encountered a daemon bug in
which it would abandon its recovery in the middle, leaving the list
intact and the I/Os queued, forever.  We fixed the daemon but the
problem is potentially still there if for some reason the daemon dies
and is not restarted.  This is a problem not solely (or even primarily)
for the queued I/O, but also because things like slab shrink can get
stuck behind that I/O and then other stuff becomes stuck behind _that_
(since tries to get locks held by shrink and may itself hold
semaphores), bringing the whole system to its knees in fairly short
order, to the point that it's impossible to even get in via the network
and reboot it.  I have an existence proof that this is the case. :-)

My idea to deal with this in the kernel was to introduce a timeout on
queue_if_no_path and make it settable either kernel-wide or per-table.
By default it's disabled and is only armed when multipath runs out of
valid paths and queue_if_no_path is on.  It's disabled again on table
load.  If the timeout ever fires, all that happens is that the handler
turns off queue_if_no_path; this causes all the outstanding I/O to get
EIO and unsticks things all the way up the chain.  Losing those I/Os is
far better than losing the entire system.

I've actually implemented this and it works.  I've debated about talking
with you folks about it but figured it was worth a shot.  I can post the
patch if you're interested.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:14 RFC for multipath queue_if_no_path timeout Frank Mayhar
@ 2013-09-26 17:24 ` Alasdair G Kergon
  2013-09-26 17:31   ` Frank Mayhar
  2013-09-26 17:41 ` Mike Snitzer
  1 sibling, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-09-26 17:24 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

Timeouts were discussed when we added queue_if_no_path. and we made the
decision to implement this in userspace not the kernel.

Make sure that your daemon doesn't die - or that something monitors it
and relaunches it if it does.

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:24 ` Alasdair G Kergon
@ 2013-09-26 17:31   ` Frank Mayhar
  2013-09-26 17:38     ` Alasdair G Kergon
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-09-26 17:31 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

On Thu, 2013-09-26 at 18:24 +0100, Alasdair G Kergon wrote:
> Timeouts were discussed when we added queue_if_no_path. and we made the
> decision to implement this in userspace not the kernel.
> 
> Make sure that your daemon doesn't die - or that something monitors it
> and relaunches it if it does.

Uh, huh.  And what about when (not if) _that_ fails?  (For one thing,
what if the stuckness caused by the queued I/O prevents the binary from
being successfully pulled in from storage?)

Granted, this is a belt-and-suspenders kind of thing that most won't
need, but when you need it, you _need_ it.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:31   ` Frank Mayhar
@ 2013-09-26 17:38     ` Alasdair G Kergon
  2013-09-26 17:47       ` Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-09-26 17:38 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Sep 26, 2013 at 10:31:56AM -0700, Frank Mayhar wrote:
> Uh, huh.  And what about when (not if) _that_ fails?  (For one thing,
> what if the stuckness caused by the queued I/O prevents the binary from
> being successfully pulled in from storage?)
 
Lock the daemon in memory (or launch from ramdisk), don't allocate any new
memory while it's doing critical monitoring, tell the OOM killer not to kill
it, set high/real-time priority etc.

lvm2 and multipath-tools use some of these techniques and seem to cope OK.

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:14 RFC for multipath queue_if_no_path timeout Frank Mayhar
  2013-09-26 17:24 ` Alasdair G Kergon
@ 2013-09-26 17:41 ` Mike Snitzer
  2013-09-26 17:55   ` Frank Mayhar
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-09-26 17:41 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Sep 26 2013 at  1:14pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> Hey, folks.  We're using multipath as an in-kernel failover mechanism,
> so that if an underlying device dies, multipath will switch to another
> in its list.  Further, we use queue_if_no_path so that a daemon can get
> involved and replace the list if the kernel runs out of alternatives.
> In testing, however, we ran into a problem.
> 
> Obviously, if queue_if_no_path is on and multipath runs out of good
> paths, the I/Os will sit there queued forever barring user intervention.
> I was doing a lot of failure testing and encountered a daemon bug in
> which it would abandon its recovery in the middle, leaving the list
> intact and the I/Os queued, forever.  We fixed the daemon

Did you share the fix upstream yet?  If not please do ;)

> but the
> problem is potentially still there if for some reason the daemon dies
> and is not restarted.  This is a problem not solely (or even primarily)
> for the queued I/O, but also because things like slab shrink can get
> stuck behind that I/O and then other stuff becomes stuck behind _that_
> (since tries to get locks held by shrink and may itself hold
> semaphores), bringing the whole system to its knees in fairly short
> order, to the point that it's impossible to even get in via the network
> and reboot it.  I have an existence proof that this is the case. :-)
> 
> My idea to deal with this in the kernel was to introduce a timeout on
> queue_if_no_path and make it settable either kernel-wide or per-table.
> By default it's disabled and is only armed when multipath runs out of
> valid paths and queue_if_no_path is on.  It's disabled again on table
> load.  If the timeout ever fires, all that happens is that the handler
> turns off queue_if_no_path; this causes all the outstanding I/O to get
> EIO and unsticks things all the way up the chain.  Losing those I/Os is
> far better than losing the entire system.
> 
> I've actually implemented this and it works.  I've debated about talking
> with you folks about it but figured it was worth a shot.  I can post the
> patch if you're interested.

A timeout is always going to be racey.  But obviously with enough
testing you could arrive at a timeout that is reasonable for your
needs.. but in general I just don't think a timeout to release the
queuing is the right way to go.

And I understand Alasdair's point about hardening multipathd and using a
watchdog to restart it if it fails.  Ultimately that is ideal.  But if
multipathd does have a bug that makes it incapable of handling a case
(like the one you just fixed) it doesn't help to restart the daemon.

Therefore I'm not opposed to some solution in kernel.  But I'd think it
would be the kernel equivalent to multipathd's "queue_without_daemon".
AFAIK we currently don't have a way for the kernel to _know_ multipathd
is running; but that doesn't mean such a mechanism couldn't be
implemented.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:38     ` Alasdair G Kergon
@ 2013-09-26 17:47       ` Frank Mayhar
  2013-09-26 17:52         ` Mike Snitzer
  2013-09-26 23:22         ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon
  0 siblings, 2 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-26 17:47 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

On Thu, 2013-09-26 at 18:38 +0100, Alasdair G Kergon wrote:
> On Thu, Sep 26, 2013 at 10:31:56AM -0700, Frank Mayhar wrote:
> > Uh, huh.  And what about when (not if) _that_ fails?  (For one thing,
> > what if the stuckness caused by the queued I/O prevents the binary from
> > being successfully pulled in from storage?)
>  
> Lock the daemon in memory (or launch from ramdisk), don't allocate any new
> memory while it's doing critical monitoring, tell the OOM killer not to kill
> it, set high/real-time priority etc.
> 
> lvm2 and multipath-tools use some of these techniques and seem to cope OK.

Launching it from ramdisk won't help, particularly, since it still goes
through the block layer.  The other stuff won't help if a (potentially
unrelated) bug in the daemon happens to be being tickled at the same
time, or if some dependency happens to be broken and _that's_ what's
preventing the daemon from making progress.

And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
of environments most people have, but that's not the kind of environment
(or scale) we have to deal with.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:47       ` Frank Mayhar
@ 2013-09-26 17:52         ` Mike Snitzer
  2013-09-26 20:36           ` [PATCH 1/1] dm mpath: Add timeout mechanism for queue_if_no_path Frank Mayhar
  2013-09-26 23:22         ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-09-26 17:52 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel, Alasdair G Kergon

On Thu, Sep 26 2013 at  1:47pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> On Thu, 2013-09-26 at 18:38 +0100, Alasdair G Kergon wrote:
> > On Thu, Sep 26, 2013 at 10:31:56AM -0700, Frank Mayhar wrote:
> > > Uh, huh.  And what about when (not if) _that_ fails?  (For one thing,
> > > what if the stuckness caused by the queued I/O prevents the binary from
> > > being successfully pulled in from storage?)
> >  
> > Lock the daemon in memory (or launch from ramdisk), don't allocate any new
> > memory while it's doing critical monitoring, tell the OOM killer not to kill
> > it, set high/real-time priority etc.
> > 
> > lvm2 and multipath-tools use some of these techniques and seem to cope OK.
> 
> Launching it from ramdisk won't help, particularly, since it still goes
> through the block layer.  The other stuff won't help if a (potentially
> unrelated) bug in the daemon happens to be being tickled at the same
> time, or if some dependency happens to be broken and _that's_ what's
> preventing the daemon from making progress.
> 
> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
> of environments most people have, but that's not the kind of environment
> (or scale) we have to deal with.

Fine, please see the post I made earlier in this thread and let me know
what you think.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:41 ` Mike Snitzer
@ 2013-09-26 17:55   ` Frank Mayhar
  2013-09-26 18:41     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-09-26 17:55 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

On Thu, 2013-09-26 at 13:41 -0400, Mike Snitzer wrote:
> On Thu, Sep 26 2013 at  1:14pm -0400,
> Frank Mayhar <fmayhar@google.com> wrote:
> > Obviously, if queue_if_no_path is on and multipath runs out of good
> > paths, the I/Os will sit there queued forever barring user intervention.
> > I was doing a lot of failure testing and encountered a daemon bug in
> > which it would abandon its recovery in the middle, leaving the list
> > intact and the I/Os queued, forever.  We fixed the daemon
> Did you share the fix upstream yet?  If not please do ;)

It's a daemon we wrote so the fix only really applies to us, sorry.

> A timeout is always going to be racey.  But obviously with enough
> testing you could arrive at a timeout that is reasonable for your
> needs.. but in general I just don't think a timeout to release the
> queuing is the right way to go.

Having it as an admin-settable option seems reasonable to me, though.  I
agree you don't want one by default.  I expect that the timeout that's
actually used to be on the order of single-digit minutes.

> And I understand Alasdair's point about hardening multipathd and using a
> watchdog to restart it if it fails.  Ultimately that is ideal.  But if
> multipathd does have a bug that makes it incapable of handling a case
> (like the one you just fixed) it doesn't help to restart the daemon.

Believe me, we're hardening out daemon as much as possible, but the
reality is that there's always going to be some situation that wasn't
anticipated.  In our environment, that kind of stuff happens almost
constantly.  No matter how hardened the daemon, _something_ can keep it
from doing its job.

> Therefore I'm not opposed to some solution in kernel.  But I'd think it
> would be the kernel equivalent to multipathd's "queue_without_daemon".
> AFAIK we currently don't have a way for the kernel to _know_ multipathd
> is running; but that doesn't mean such a mechanism couldn't be
> implemented.

If you have a reasonable alternative I'm all ears.  However instantly
failing the I/O if the daemon isn't present and we run out of paths
isn't a good answer for us.  Setting a timeout only if the daemon isn't
present is functionally equivalent to setting a timeout regardless and
having a running daemon nearly instantly reload the table (thereby
turning off the timeout).
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:55   ` Frank Mayhar
@ 2013-09-26 18:41     ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-09-26 18:41 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Sep 26 2013 at  1:55pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> On Thu, 2013-09-26 at 13:41 -0400, Mike Snitzer wrote:
> > On Thu, Sep 26 2013 at  1:14pm -0400,
> > Frank Mayhar <fmayhar@google.com> wrote:
> > > Obviously, if queue_if_no_path is on and multipath runs out of good
> > > paths, the I/Os will sit there queued forever barring user intervention.
> > > I was doing a lot of failure testing and encountered a daemon bug in
> > > which it would abandon its recovery in the middle, leaving the list
> > > intact and the I/Os queued, forever.  We fixed the daemon
> > Did you share the fix upstream yet?  If not please do ;)
> 
> It's a daemon we wrote so the fix only really applies to us, sorry.

Ah, unfortunate to hear multipath-tools wasn't suitable for your needs.
We don't need to get in an elaborate tangent on why it wasn't used but
if there is something that would help improve multipath-tools in general
please share.

> > A timeout is always going to be racey.  But obviously with enough
> > testing you could arrive at a timeout that is reasonable for your
> > needs.. but in general I just don't think a timeout to release the
> > queuing is the right way to go.
> 
> Having it as an admin-settable option seems reasonable to me, though.  I
> agree you don't want one by default.  I expect that the timeout that's
> actually used to be on the order of single-digit minutes.
> 
> > And I understand Alasdair's point about hardening multipathd and using a
> > watchdog to restart it if it fails.  Ultimately that is ideal.  But if
> > multipathd does have a bug that makes it incapable of handling a case
> > (like the one you just fixed) it doesn't help to restart the daemon.
> 
> Believe me, we're hardening out daemon as much as possible, but the
> reality is that there's always going to be some situation that wasn't
> anticipated.  In our environment, that kind of stuff happens almost
> constantly.  No matter how hardened the daemon, _something_ can keep it
> from doing its job.
> 
> > Therefore I'm not opposed to some solution in kernel.  But I'd think it
> > would be the kernel equivalent to multipathd's "queue_without_daemon".
> > AFAIK we currently don't have a way for the kernel to _know_ multipathd
> > is running; but that doesn't mean such a mechanism couldn't be
> > implemented.
> 
> If you have a reasonable alternative I'm all ears.  However instantly
> failing the I/O if the daemon isn't present and we run out of paths
> isn't a good answer for us.

I see your point, and making a decision based on the availability of the
registered daemon (your custom thing or multipathd) is racey if the
daemon has a bug that causes it to crash and the watchdog restarts it
repeatedly; could result in false positive that makes the kernel think
the daemon is there (only for it to fail).

> Setting a timeout only if the daemon isn't
> present is functionally equivalent to setting a timeout regardless and
> having a running daemon nearly instantly reload the table (thereby
> turning off the timeout).

Not really following you here but the big point is you don't want
immediate release of the queued IO (even if you were armed with the
insight that there is no daemon to recover the paths)... you want that
release to be time delayed.

Pretty sure multipathd doesn't have the ability to dequeue multipath
devices after a timeout; so while Alasdair said it was a conscious
decision to have that feature be done in a userspace daemon I can easily
see that a bug in said daemon could prevent it from staying up long
enough to hit the configured timeout and then reload tables.

It is moot considering you aren't using multipath-tools, but your daemon
could also suffer from inability to reliably staying up to reload the
tables on timeout.

So all said, please post your dm-mpath patch that implements this
queue_if_no_path_timeout feature for further consideration.  I'd hope 0
disables it (the default) and that the unit of the timeout is in
seconds.  Also, I don't think it should be dm-mpath (kernel)
wide... best to just have it be per multipath table.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/1] dm mpath: Add timeout mechanism for queue_if_no_path.
  2013-09-26 17:52         ` Mike Snitzer
@ 2013-09-26 20:36           ` Frank Mayhar
  0 siblings, 0 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-26 20:36 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair G Kergon

Add a configurable timeout mechanism to the queue_if_no_path function
of multipath.  If set and if queue_if_no_path is set, the timeout is
started when there are no active paths on a multipath device.  The
timeout is reset if an active path is introduced or, of course, if a
new table (and therefore a new multipath definition) is loaded.  If
the timeout ever fires, the handler simply turns queue_if_no_path off.
This allows I/O queued in multipath to be errored, possibly releasing
locks and semaphores that may be being held waiting for that I/O to
complete.

The implementation dynamically allocates the timeout as needed, so
as to add as little overhead as possible when it is not being used.

This mechanism is not turned on by default (the default timeout is zero).
It can be turned on by either adding the queue_if_no_path_timeout
parameter to the table definition or sending the parameter via the
DM message mechanism (sets a timeout for only that multipath instance;
note that this doesn't survive a table reload unless the parameter is
included in the new table).

Signed-off-by: Frank Mayhar <fmayhar@google.com>

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index de570a5..a746306 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -27,6 +27,8 @@
 #define DM_PG_INIT_DELAY_MSECS 2000
 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1)
 
+#define DEFAULT_NOPATH_TIMEOUT_SECS 0
+
 /* Path properties */
 struct pgpath {
 	struct list_head list;
@@ -105,6 +107,17 @@ struct multipath {
 	mempool_t *mpio_pool;
 
 	struct mutex work_mutex;
+
+	/*
+	 * When a context loses its last path, queue_if_no_path is set and
+	 * nopath_timeout > 0, we start this timer.  If it fires, we clear
+	 * queue_if_no_path.  If we get a new path before it fires, we stop
+	 * the timer.
+	 *
+	 * The timeout is given in seconds.
+	 */
+	unsigned nopath_timeout;
+	struct timer_list *nopath_timer;
 };
 
 /*
@@ -117,12 +130,19 @@ struct dm_mpath_io {
 
 typedef int (*action_fn) (struct pgpath *pgpath);
 
+static unsigned long dm_mpath_nopath_timeout = DEFAULT_NOPATH_TIMEOUT_SECS;
+module_param_named(queue_if_no_path_timeout_secs,
+		   dm_mpath_nopath_timeout, ulong, S_IRUGO | S_IWUSR);
+
 static struct kmem_cache *_mpio_cache;
 
 static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
 static void process_queued_ios(struct work_struct *work);
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
+static void handle_nopath_timeout(unsigned long data);
+static void activate_nopath_timeout_l(struct multipath *m);
+static int activate_nopath_timeout(struct multipath *m, unsigned to);
 
 
 /*-----------------------------------------------
@@ -193,6 +213,8 @@ static struct multipath *alloc_multipath(struct dm_target *ti)
 
 	m = kzalloc(sizeof(*m), GFP_KERNEL);
 	if (m) {
+		int r;
+
 		INIT_LIST_HEAD(&m->priority_groups);
 		INIT_LIST_HEAD(&m->queued_ios);
 		spin_lock_init(&m->lock);
@@ -207,6 +229,14 @@ static struct multipath *alloc_multipath(struct dm_target *ti)
 			kfree(m);
 			return NULL;
 		}
+		m->nopath_timer = NULL;
+		r = activate_nopath_timeout(m, DEFAULT_NOPATH_TIMEOUT_SECS);
+		if (r == -ENOMEM) {
+			DMWARN("Couldn't allocate timer list!");
+			mempool_destroy(m->mpio_pool);
+			kfree(m);
+			return NULL;
+		}
 		m->ti = ti;
 		ti->private = m;
 	}
@@ -790,6 +820,8 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 		{0, 6, "invalid number of feature args"},
 		{1, 50, "pg_init_retries must be between 1 and 50"},
 		{0, 60000, "pg_init_delay_msecs must be between 0 and 60000"},
+		{0, 31557600, "queue_if_no_path_timeout_secs must be 0 or"
+			      " a positive number of seconds"},
 	};
 
 	r = dm_read_arg_group(_args, as, &argc, &ti->error);
@@ -827,6 +859,16 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strcasecmp(arg_name, "queue_if_no_path_timeout_secs") &&
+		    (argc >= 1)) {
+			unsigned to;
+
+			r = dm_read_arg(_args+3, as, &to, &ti->error);
+			activate_nopath_timeout(m, to);
+			argc--;
+			continue;
+		}
+
 		ti->error = "Unrecognised multipath feature request";
 		r = -EINVAL;
 	} while (argc && !r);
@@ -952,6 +994,8 @@ static void multipath_dtr(struct dm_target *ti)
 {
 	struct multipath *m = ti->private;
 
+	if (m->nopath_timer)
+		del_timer_sync(m->nopath_timer);
 	flush_multipath_work(m);
 	free_multipath(m);
 }
@@ -978,6 +1022,17 @@ static int multipath_map(struct dm_target *ti, struct request *clone,
 }
 
 /*
+ * If the queue_if_no_path timeout fires, turn off queue_if_no_path and
+ * process any queued I/O.
+ */
+static void handle_nopath_timeout(unsigned long data)
+{
+	struct multipath *m = (struct multipath *)data;
+
+	(void)queue_if_no_path(m, 0, 0);
+}
+
+/*
  * Take a path out of use.
  */
 static int fail_path(struct pgpath *pgpath)
@@ -1006,6 +1061,8 @@ static int fail_path(struct pgpath *pgpath)
 
 	schedule_work(&m->trigger_event);
 
+	activate_nopath_timeout_l(m);
+
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
 
@@ -1055,6 +1112,9 @@ static int reinstate_path(struct pgpath *pgpath)
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	if (m->nopath_timer)
+		del_timer_sync(m->nopath_timer);
+
 	return r;
 }
 
@@ -1079,6 +1139,79 @@ static int action_dev(struct multipath *m, struct dm_dev *dev,
 }
 
 /*
+ * Activate the queue_if_no_path timeout if necessary.  Called with m->lock
+ * held.
+ */
+static void activate_nopath_timeout_l(struct multipath *m)
+{
+	if (m->nr_valid_paths == 0 && m->nopath_timer &&
+	    m->nopath_timeout > 0 && m->queue_if_no_path)
+		mod_timer(m->nopath_timer, jiffies + m->nopath_timeout * HZ);
+}
+
+/*
+ * Set the queue_if_no_path timeout and activate it.
+ */
+static int activate_nopath_timeout(struct multipath *m, unsigned to)
+{
+	unsigned long flags;
+	struct timer_list *nopath_timer = NULL;
+
+	spin_lock_irqsave(&m->lock, flags);
+	/*
+	 * If the user is turning the timeout on, allocate the timer list.
+	 */
+	if (to && !m->nopath_timer) {
+		spin_unlock_irqrestore(&m->lock, flags);
+		nopath_timer = kzalloc(sizeof(struct timer_list), GFP_KERNEL);
+		if (!nopath_timer)
+			return -ENOMEM;
+		init_timer(nopath_timer);
+		nopath_timer->data = (unsigned long)m;
+		nopath_timer->function = handle_nopath_timeout;
+		spin_lock_irqsave(&m->lock, flags);
+		/* If we raced with an allocate, drop the new one. */
+		if (m->nopath_timer)
+			kfree(nopath_timer);
+		else
+			m->nopath_timer = nopath_timer;
+		nopath_timer = NULL;
+	}
+	m->nopath_timeout = to;
+	if (to)
+		activate_nopath_timeout_l(m);
+	else {
+		/*
+		 * If the user is turning the timeout off, null out the timer
+		 * list pointer so we can free it below.
+		 */
+		nopath_timer = m->nopath_timer;
+		m->nopath_timer = NULL;
+	}
+	spin_unlock_irqrestore(&m->lock, flags);
+	/* Now turn off the timer and free the timer list. */
+	if (nopath_timer) {
+		del_timer_sync(nopath_timer);
+		kfree(nopath_timer);
+	}
+	return 0;
+}
+
+/*
+ * Get the queue_if_no_path timeout, set it and activate it.
+ */
+static int set_nopath_timeout(struct multipath *m, const char *timestr)
+{
+	unsigned long to;
+
+	if (!timestr || (sscanf(timestr, "%lu", &to) != 1)) {
+		DMWARN("invalid timeout supplied to set_nopath_timeout");
+		return -EINVAL;
+	}
+	return activate_nopath_timeout(m, to);
+}
+
+/*
  * Temporarily try to avoid having to use the specified PG
  */
 static void bypass_pg(struct multipath *m, struct priority_group *pg,
@@ -1423,7 +1556,8 @@ static void multipath_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("%u ", m->queue_if_no_path +
 			      (m->pg_init_retries > 0) * 2 +
 			      (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2 +
-			      m->retain_attached_hw_handler);
+			      m->retain_attached_hw_handler) +
+			      (m->nopath_timeout > 0) * 2;
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
 		if (m->pg_init_retries)
@@ -1432,6 +1566,9 @@ static void multipath_status(struct dm_target *ti, status_type_t type,
 			DMEMIT("pg_init_delay_msecs %u ", m->pg_init_delay_msecs);
 		if (m->retain_attached_hw_handler)
 			DMEMIT("retain_attached_hw_handler ");
+		if (m->nopath_timeout)
+			DMEMIT("queue_if_no_path_timeout_secs %u ",
+				   m->nopath_timeout);
 	}
 
 	if (!m->hw_handler_name || type == STATUSTYPE_INFO)
@@ -1550,6 +1687,9 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv)
 	} else if (!strcasecmp(argv[0], "switch_group")) {
 		r = switch_pg_num(m, argv[1]);
 		goto out;
+	} else if (!strcasecmp(argv[0], "queue_if_no_path_timeout_secs")) {
+		r = set_nopath_timeout(m, argv[1]);
+		goto out;
 	} else if (!strcasecmp(argv[0], "reinstate_path"))
 		action = reinstate_path;
 	else if (!strcasecmp(argv[0], "fail_path"))

-- 
Frank Mayhar
310-460-4042

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 17:47       ` Frank Mayhar
  2013-09-26 17:52         ` Mike Snitzer
@ 2013-09-26 23:22         ` Alasdair G Kergon
  2013-09-26 23:49           ` Mike Snitzer
  2013-09-27 16:27           ` Frank Mayhar
  1 sibling, 2 replies; 49+ messages in thread
From: Alasdair G Kergon @ 2013-09-26 23:22 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
> Launching it from ramdisk won't help, particularly, since it still goes
> through the block layer.  The other stuff won't help if a (potentially
> unrelated) bug in the daemon happens to be being tickled at the same
> time, or if some dependency happens to be broken and _that's_ what's
> preventing the daemon from making progress.
 
Then put more effort into debugging your daemon so it doesn't have
bugs that make it die?  Implement the timeout in a robust independent
daemon if it's other code there that's unreliable?

> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
> of environments most people have, but that's not the kind of environment
> (or scale) we have to deal with.

In what way are your requirements so different that a locked-into-memory
monitoring daemon cannot implement this timeout?

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 23:22         ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon
@ 2013-09-26 23:49           ` Mike Snitzer
  2013-09-27  6:07             ` Hannes Reinecke
                               ` (2 more replies)
  2013-09-27 16:27           ` Frank Mayhar
  1 sibling, 3 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-09-26 23:49 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Sep 26 2013 at  7:22pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
> > Launching it from ramdisk won't help, particularly, since it still goes
> > through the block layer.  The other stuff won't help if a (potentially
> > unrelated) bug in the daemon happens to be being tickled at the same
> > time, or if some dependency happens to be broken and _that's_ what's
> > preventing the daemon from making progress.
>  
> Then put more effort into debugging your daemon so it doesn't have
> bugs that make it die?  Implement the timeout in a robust independent
> daemon if it's other code there that's unreliable?
> 
> > And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
> > of environments most people have, but that's not the kind of environment
> > (or scale) we have to deal with.
> 
> In what way are your requirements so different that a locked-into-memory
> monitoring daemon cannot implement this timeout?

Frank, I had a look at your patch.  It leaves a lot to be desired, I was
starting to clean it up but ultimately found myself agreeing with
Alasdair's original point: that this policy should be implemented in the
userspace daemon.

Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 23:49           ` Mike Snitzer
@ 2013-09-27  6:07             ` Hannes Reinecke
  2013-09-27  8:06               ` Hannes Reinecke
  2013-09-27 16:29             ` Frank Mayhar
  2013-10-17 19:03             ` Frank Mayhar
  2 siblings, 1 reply; 49+ messages in thread
From: Hannes Reinecke @ 2013-09-27  6:07 UTC (permalink / raw)
  To: dm-devel

On 09/27/2013 01:49 AM, Mike Snitzer wrote:
> On Thu, Sep 26 2013 at  7:22pm -0400,
> Alasdair G Kergon <agk@redhat.com> wrote:
> 
>> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
>>> Launching it from ramdisk won't help, particularly, since it still goes
>>> through the block layer.  The other stuff won't help if a (potentially
>>> unrelated) bug in the daemon happens to be being tickled at the same
>>> time, or if some dependency happens to be broken and _that's_ what's
>>> preventing the daemon from making progress.
>>  
>> Then put more effort into debugging your daemon so it doesn't have
>> bugs that make it die?  Implement the timeout in a robust independent
>> daemon if it's other code there that's unreliable?
>>
>>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
>>> of environments most people have, but that's not the kind of environment
>>> (or scale) we have to deal with.
>>
>> In what way are your requirements so different that a locked-into-memory
>> monitoring daemon cannot implement this timeout?
> 
> Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> starting to clean it up but ultimately found myself agreeing with
> Alasdair's original point: that this policy should be implemented in the
> userspace daemon.
> 
_Actually_ there is a way how this could be implemented properly:
implement a blk_timeout function.

Thing is, every request_queue might have a timeout function
implemented, whose goal is to abort requests which are beyond that
timeout. EG SCSI uses that for the dev_loss_tmo mechanism.

Multipath what with it being request-based could easily implement
the same mechanism, namely have to blk_timeout function which would
just re-arm the timeout in the default case, but abort any queued
I/O (after a timeout) if all paths are down.

Hmm. I see to draft up a PoC.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-27  6:07             ` Hannes Reinecke
@ 2013-09-27  8:06               ` Hannes Reinecke
  2013-09-27  8:37                 ` Alasdair G Kergon
  2013-09-27 16:32                 ` Frank Mayhar
  0 siblings, 2 replies; 49+ messages in thread
From: Hannes Reinecke @ 2013-09-27  8:06 UTC (permalink / raw)
  To: dm-devel; +Cc: Frank Mayhar, Mike Snitzer

[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]

On 09/27/2013 08:07 AM, Hannes Reinecke wrote:
> On 09/27/2013 01:49 AM, Mike Snitzer wrote:
>> On Thu, Sep 26 2013 at  7:22pm -0400,
>> Alasdair G Kergon <agk@redhat.com> wrote:
>>
>>> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
>>>> Launching it from ramdisk won't help, particularly, since it still goes
>>>> through the block layer.  The other stuff won't help if a (potentially
>>>> unrelated) bug in the daemon happens to be being tickled at the same
>>>> time, or if some dependency happens to be broken and _that's_ what's
>>>> preventing the daemon from making progress.
>>>  
>>> Then put more effort into debugging your daemon so it doesn't have
>>> bugs that make it die?  Implement the timeout in a robust independent
>>> daemon if it's other code there that's unreliable?
>>>
>>>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
>>>> of environments most people have, but that's not the kind of environment
>>>> (or scale) we have to deal with.
>>>
>>> In what way are your requirements so different that a locked-into-memory
>>> monitoring daemon cannot implement this timeout?
>>
>> Frank, I had a look at your patch.  It leaves a lot to be desired, I was
>> starting to clean it up but ultimately found myself agreeing with
>> Alasdair's original point: that this policy should be implemented in the
>> userspace daemon.
>>
> _Actually_ there is a way how this could be implemented properly:
> implement a blk_timeout function.
> 
> Thing is, every request_queue might have a timeout function
> implemented, whose goal is to abort requests which are beyond that
> timeout. EG SCSI uses that for the dev_loss_tmo mechanism.
> 
> Multipath what with it being request-based could easily implement
> the same mechanism, namely have to blk_timeout function which would
> just re-arm the timeout in the default case, but abort any queued
> I/O (after a timeout) if all paths are down.
> 
> Hmm. I see to draft up a PoC.
> 
And indeed, here it is.

Completely untested, just to give you an idea what I was going on
about. Let's see if I can put this to test somewhere...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

[-- Attachment #2: dm-mpath-no-path-timeout.patch --]
[-- Type: text/x-patch, Size: 4589 bytes --]

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 5adede1..6801ac3 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -444,6 +444,61 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	return 0;
 }
 
+/*
+ * Block timeout callback, called from the block layer
+ *
+ * request_queue lock is held on entry.
+ *
+ * Return values:
+ * BLK_EH_RESET_TIMER if the request should be left running
+ * BLK_EH_NOT_HANDLED if the request is handled or terminated
+ *                    by the driver.
+ */
+enum blk_eh_timer_return abort_if_no_path(struct request *req)
+{
+	union map_info *info;
+	struct dm_mpath_io *mpio;
+	struct multipath *m;
+	unsigned long flags;
+	int rc = BLK_EH_RESET_TIMER;
+	int flush_ios = 0;
+
+	info = dm_get_rq_mapinfo(req);
+	if (!info || !info->ptr)
+		return BLK_EH_NOT_HANDLED;
+
+	mpio = info->ptr;
+	m = mpio->pgpath->pg->m;
+	/*
+	 * Only abort request if:
+	 * - queued_ios is not empty
+	 *   (protect against races with process_queued_ios)
+	 * - queue_io is not set
+	 * - no valid paths are found
+	 */
+	spin_lock_irqsave(&m->lock, flags);
+	if (!list_empty(&m->queued_ios) &&
+	    !m->queue_io &&
+	    !m->nr_valid_paths) {
+		list_del_init(&req->queuelist);
+		m->queue_size--;
+		m->queue_if_no_path = 0;
+		if (m->queue_size)
+			flush_ios = 1;
+		rc = BLK_EH_NOT_HANDLED;
+	}
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	if (rc == BLK_EH_NOT_HANDLED) {
+		mempool_free(mpio, m->mpio_pool);
+		dm_kill_unmapped_request(clone, -ETIMEOUT);
+	}
+	if (flush_ios)
+		queue_work(kmultipathd, &m->process_queue_ios);
+
+	return rc;
+}
+
 /*-----------------------------------------------------------------
  * The multipath daemon is responsible for resubmitting queued ios.
  *---------------------------------------------------------------*/
@@ -790,6 +845,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 		{0, 6, "invalid number of feature args"},
 		{1, 50, "pg_init_retries must be between 1 and 50"},
 		{0, 60000, "pg_init_delay_msecs must be between 0 and 60000"},
+		{0, 65535, "no_path_timeout must be between 0 and 65535"},
 	};
 
 	r = dm_read_arg_group(_args, as, &argc, &ti->error);
@@ -827,6 +883,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strcasecmp(arg_name, "no_path_timeout") &&
+		    (argc >= 1)) {
+			r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error);
+			argc--;
+			continue;
+		}
+
 		ti->error = "Unrecognised multipath feature request";
 		r = -EINVAL;
 	} while (argc && !r);
@@ -905,6 +968,12 @@ static int multipath_ctr(struct dm_target *ti, unsigned int argc,
 		goto bad;
 	}
 
+	if (m->no_path_timeout)
+		dm_set_queue_timeout(ti, abort_if_no_path,
+				     m->no_path_timeout * HZ);
+	else
+		dm_set_queue_timeout(ti, NULL, 0)
+
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->num_write_same_bios = 1;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 9e39d2b..26bfad6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1881,6 +1881,16 @@ static void dm_init_md_queue(struct mapped_device *md)
 	blk_queue_merge_bvec(md->queue, dm_merge_bvec);
 }
 
+static void dm_set_md_timeout(struct mapped_device *md,
+			      rq_timed_out_fn *fn, unsigned int timeout)
+{
+	if (dm_get_md_type(md) != DM_TYPE_REQUEST_BASED)
+		return;
+
+	blk_queue_rq_timed_out(md->queue, fn);
+	blk_queue_rq_timeout(md->queue, timeout);
+}
+
 /*
  * Allocate and initialise a blank device with a given minor.
  */
@@ -2790,6 +2800,13 @@ int dm_noflush_suspending(struct dm_target *ti)
 }
 EXPORT_SYMBOL_GPL(dm_noflush_suspending);
 
+int dm_set_queue_timeout(struct dm_target *ti, rq_timed_out_fn *fn,
+			 unsigned int timeout)
+{
+	return dm_set_md_timeout(dm_table_get_md(ti->table), fn, timeout);
+}
+EXPORT_SYMBOL_GPL(dm_set_queue_timeout);
+
 struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, unsigned per_bio_data_size)
 {
 	struct dm_md_mempools *pools = kzalloc(sizeof(*pools), GFP_KERNEL);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 45b97da..c8df1ef 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -80,6 +80,8 @@ void dm_unlock_md_type(struct mapped_device *md);
 void dm_set_md_type(struct mapped_device *md, unsigned type);
 unsigned dm_get_md_type(struct mapped_device *md);
 struct target_type *dm_get_immutable_target_type(struct mapped_device *md);
+void dm_set_queue_timeout(struct dm_table *t, rq_timed_out_fn *fn,
+			  unsigned int timeout);
 
 int dm_setup_md_queue(struct mapped_device *md);
 

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-27  8:06               ` Hannes Reinecke
@ 2013-09-27  8:37                 ` Alasdair G Kergon
  2013-09-27 13:52                   ` Hannes Reinecke
  2013-09-27 16:32                 ` Frank Mayhar
  1 sibling, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-09-27  8:37 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Frank Mayhar, Mike Snitzer

But this still dodges the fundamental problem:

  What is the right value to use for the timeout?
  - How long should you wait for a path to (re)appear?
    - In the current model, reinstating a path is a userspace 
      responsibility.

The timeout, as proposed, is being used in two conflicting ways:
  - How long to wait for path recovery when all paths went down
  - How long to wait when the system locks without enough free
    memory even to reinstate a path (because of broken userspace
    code) before having multipath fail queued I/O in a desperate
    attempt at releasing memory to assist recovery
 
The second case should point to a very short timeout.
The first case probably wants a longer one.

In my view the correct approach for the case Frank is discussing is to
use a different trigger to detect the (approaching?) locking up of the
system.   E.g.  should something related to the handling of an out
of memory condition have a hook to instruct multipath to release such
queued I/O?

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-27  8:37                 ` Alasdair G Kergon
@ 2013-09-27 13:52                   ` Hannes Reinecke
  2013-09-27 16:37                     ` Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Hannes Reinecke @ 2013-09-27 13:52 UTC (permalink / raw)
  To: dm-devel, Frank Mayhar, Mike Snitzer

On 09/27/2013 10:37 AM, Alasdair G Kergon wrote:
> But this still dodges the fundamental problem:
> 
>   What is the right value to use for the timeout?
>   - How long should you wait for a path to (re)appear?
>     - In the current model, reinstating a path is a userspace 
>       responsibility.
> 
And with my proposed patch it would still be userspace which is
setting the timeout.
Currently, no_path_retry is not a proper measure anyway, as it's
depending on the time multipathd takes to complete a path check
round. Which depends on the number of device, the state of those etc.

> The timeout, as proposed, is being used in two conflicting ways:
>   - How long to wait for path recovery when all paths went down

That would be set via the new 'no_path_timeout' feature, which would
be set instead of the (multipath-internal) no_path_retry
setting.

>   - How long to wait when the system locks without enough free
>     memory even to reinstate a path (because of broken userspace
>     code) before having multipath fail queued I/O in a desperate
>     attempt at releasing memory to assist recovery
>  
Do we even handle that case currently?
Methinks this is precisely the use-case this is supposed to address.
When currently 'no_path_retry' is set _and_ we're running under a
low-mem condition there is a quite large likelyhood that the
multipath daemon will be killed by the OOM-killer or not able to
send any dm messages down to the kernel, as the latter most likely
require some memory allocations.

So in the current 'no_path_retry' scenario the maps would have been
created with 'queue_if_no_path', and the daemon would have to reset
the 'queue_if_no_path' flag if the no_path_retry value expires.
Which it might not be able to do so due to the above scenario.

So with the proposed 'no_path_timeout' we would enable the dm-mpath
module to terminate all outstanding I/O, irrespective on all
userland conditions. Which seems like an improvement to me ...

> The second case should point to a very short timeout.
> The first case probably wants a longer one.
> 
> In my view the correct approach for the case Frank is discussing is to
> use a different trigger to detect the (approaching?) locking up of the
> system.   E.g.  should something related to the handling of an out
> of memory condition have a hook to instruct multipath to release such
> queued I/O?
> 
Yeah, that was what I had planned for quite some time.
But thinking it over the no_path_timeout seems like a better
approach here.

(Plus we're hooking into the generic 'blk_timeout' mechanism, which
then would allow to blk_abort_request() to work)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 23:22         ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon
  2013-09-26 23:49           ` Mike Snitzer
@ 2013-09-27 16:27           ` Frank Mayhar
  1 sibling, 0 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-27 16:27 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

On Fri, 2013-09-27 at 00:22 +0100, Alasdair G Kergon wrote:
> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
> > Launching it from ramdisk won't help, particularly, since it still goes
> > through the block layer.  The other stuff won't help if a (potentially
> > unrelated) bug in the daemon happens to be being tickled at the same
> > time, or if some dependency happens to be broken and _that's_ what's
> > preventing the daemon from making progress.
>  
> Then put more effort into debugging your daemon so it doesn't have
> bugs that make it die?  Implement the timeout in a robust independent
> daemon if it's other code there that's unreliable?

I'm not sure how to respond to this.  Some fifty years of people
programming computers appears to show unequivocally that you can't rely
on code not having bugs no matter _how_ much effort you put into it.
It's just the nature of the beast.

> > And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
> > of environments most people have, but that's not the kind of environment
> > (or scale) we have to deal with.
> In what way are your requirements so different that a locked-into-memory
> monitoring daemon cannot implement this timeout?

If we could _have_ an independent, locked-into-memory monitoring daemon
just for this purpose, we might be able to get by.  It would still be
iffy, for the reason I mention above; at our scale anything that _can_
fail _will_ fail, at least occasionally and often many times per day.
Unfortunately, that's a non-starter for a number of reasons, including
but not limited to the fact that the environment the daemon is running
in is memory-constrained.  Add to that the fact that the daemon we
actually have depends on stuff that my team has no direct control over
and we end up really needing an in-kernel way to deal with this.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 23:49           ` Mike Snitzer
  2013-09-27  6:07             ` Hannes Reinecke
@ 2013-09-27 16:29             ` Frank Mayhar
  2013-10-17 19:03             ` Frank Mayhar
  2 siblings, 0 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-27 16:29 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:Frank, I had a
look at your patch.  It leaves a lot to be desired, I was
> starting to clean it up but ultimately found myself agreeing with
> Alasdair's original point: that this policy should be implemented in the
> userspace daemon.

If you could quickly summarize your issues with it I would be more than
happy to clean it up myself.  On the other hand, it appears that Hannes
may have something more palatable that solves the same problem?
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-27  8:06               ` Hannes Reinecke
  2013-09-27  8:37                 ` Alasdair G Kergon
@ 2013-09-27 16:32                 ` Frank Mayhar
  1 sibling, 0 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-27 16:32 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On Fri, 2013-09-27 at 10:06 +0200, Hannes Reinecke wrote:
> On 09/27/2013 08:07 AM, Hannes Reinecke wrote:
> > On 09/27/2013 01:49 AM, Mike Snitzer wrote:
> >> On Thu, Sep 26 2013 at  7:22pm -0400,
> >> Alasdair G Kergon <agk@redhat.com> wrote:
> >>
> >>> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
> >>>> Launching it from ramdisk won't help, particularly, since it still goes
> >>>> through the block layer.  The other stuff won't help if a (potentially
> >>>> unrelated) bug in the daemon happens to be being tickled at the same
> >>>> time, or if some dependency happens to be broken and _that's_ what's
> >>>> preventing the daemon from making progress.
> >>>  
> >>> Then put more effort into debugging your daemon so it doesn't have
> >>> bugs that make it die?  Implement the timeout in a robust independent
> >>> daemon if it's other code there that's unreliable?
> >>>
> >>>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
> >>>> of environments most people have, but that's not the kind of environment
> >>>> (or scale) we have to deal with.
> >>>
> >>> In what way are your requirements so different that a locked-into-memory
> >>> monitoring daemon cannot implement this timeout?
> >>
> >> Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> >> starting to clean it up but ultimately found myself agreeing with
> >> Alasdair's original point: that this policy should be implemented in the
> >> userspace daemon.
> >>
> > _Actually_ there is a way how this could be implemented properly:
> > implement a blk_timeout function.
> > 
> > Thing is, every request_queue might have a timeout function
> > implemented, whose goal is to abort requests which are beyond that
> > timeout. EG SCSI uses that for the dev_loss_tmo mechanism.
> > 
> > Multipath what with it being request-based could easily implement
> > the same mechanism, namely have to blk_timeout function which would
> > just re-arm the timeout in the default case, but abort any queued
> > I/O (after a timeout) if all paths are down.
> > 
> > Hmm. I see to draft up a PoC.
> > 
> And indeed, here it is.
> 
> Completely untested, just to give you an idea what I was going on
> about. Let's see if I can put this to test somewhere...

Thanks, Hannes!  I'll grab this and test it today.  I clearly don't know
enough about the block layer, since using blk_timeout never even crossed
my mind.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-27 13:52                   ` Hannes Reinecke
@ 2013-09-27 16:37                     ` Frank Mayhar
  0 siblings, 0 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-09-27 16:37 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On Fri, 2013-09-27 at 15:52 +0200, Hannes Reinecke wrote:
> On 09/27/2013 10:37 AM, Alasdair G Kergon wrote:
> > But this still dodges the fundamental problem:
> > 
> >   What is the right value to use for the timeout?
> >   - How long should you wait for a path to (re)appear?
> >     - In the current model, reinstating a path is a userspace 
> >       responsibility.
> > 
> And with my proposed patch it would still be userspace which is
> setting the timeout.
> Currently, no_path_retry is not a proper measure anyway, as it's
> depending on the time multipathd takes to complete a path check
> round. Which depends on the number of device, the state of those etc.
> 
> > The timeout, as proposed, is being used in two conflicting ways:
> >   - How long to wait for path recovery when all paths went down
> 
> That would be set via the new 'no_path_timeout' feature, which would
> be set instead of the (multipath-internal) no_path_retry
> setting.

Yes, this matches our setup as well.

> >   - How long to wait when the system locks without enough free
> >     memory even to reinstate a path (because of broken userspace
> >     code) before having multipath fail queued I/O in a desperate
> >     attempt at releasing memory to assist recovery
> Do we even handle that case currently?

My understanding is that the current code doesn't, no, but if it does I
would love to know how.

> Methinks this is precisely the use-case this is supposed to address.

Yes, exactly.

> When currently 'no_path_retry' is set _and_ we're running under a
> low-mem condition there is a quite large likelyhood that the
> multipath daemon will be killed by the OOM-killer or not able to
> send any dm messages down to the kernel, as the latter most likely
> require some memory allocations.
> 
> So in the current 'no_path_retry' scenario the maps would have been
> created with 'queue_if_no_path', and the daemon would have to reset
> the 'queue_if_no_path' flag if the no_path_retry value expires.
> Which it might not be able to do so due to the above scenario.
> 
> So with the proposed 'no_path_timeout' we would enable the dm-mpath
> module to terminate all outstanding I/O, irrespective on all
> userland conditions. Which seems like an improvement to me ...

And to me, which is why I went in this direction in the first place.  I
could see no dependable way to deal with outside of the kernel; if I
had, I would have taken it, since userspace changes are _much_ easier
for us to deal with than kernel changes.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-09-26 23:49           ` Mike Snitzer
  2013-09-27  6:07             ` Hannes Reinecke
  2013-09-27 16:29             ` Frank Mayhar
@ 2013-10-17 19:03             ` Frank Mayhar
  2013-10-17 19:15               ` Mike Snitzer
  2013-10-21 16:05               ` RFC for multipath " Benjamin Marzinski
  2 siblings, 2 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-10-17 19:03 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

Dragging this back up into the light...

On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> starting to clean it up but ultimately found myself agreeing with
> Alasdair's original point: that this policy should be implemented in the
> userspace daemon.

I've found and fixed a couple of bugs but I would still like to know
what issues you had with the patch.  As I said before, I would be more
than happy to clean it up.

In the time since we had this discussion, by the way, we ran into a
problem that a userspace daemon can't solve:  That of shutdown.  We ran
into a number of failures in which systems were hung for hours.  It
turned out that they were caused by a regular system shutdown.  Our
backing store is network-based and networking was getting killed before
applications (as is usually the case), leaving I/O outstanding on the
device.  Since queue_if_no_path was set, the I/O wasn't dumped and our
daemon was killed by shutdown very shortly thereafter so it couldn't
recover (otherwise it would have cleaned things up).

With those I/Os sitting queued in multipath, with no network and no
daemon to turn off queue_if_no_path, the systems just sat.  When we
finally diagnosed this, we realized that the timeout would work
perfectly to solve the problem, automatically turning queue_if_no_path
off shortly after the network went away without depending on the
intervention of the no-longer-running daemon.

So how do you guys deal with this failure scenario?
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-17 19:03             ` Frank Mayhar
@ 2013-10-17 19:15               ` Mike Snitzer
  2013-10-17 20:45                 ` Frank Mayhar
  2013-10-21 16:05               ` RFC for multipath " Benjamin Marzinski
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-17 19:15 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Oct 17 2013 at  3:03pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> Dragging this back up into the light...
> 
> On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> > Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> > starting to clean it up but ultimately found myself agreeing with
> > Alasdair's original point: that this policy should be implemented in the
> > userspace daemon.
> 
> I've found and fixed a couple of bugs but I would still like to know
> what issues you had with the patch.  As I said before, I would be more
> than happy to clean it up.

I don't recall, will let you know if/when I do have time to look again.
 
> In the time since we had this discussion, by the way, we ran into a
> problem that a userspace daemon can't solve:  That of shutdown.  We ran
> into a number of failures in which systems were hung for hours.  It
> turned out that they were caused by a regular system shutdown.  Our
> backing store is network-based and networking was getting killed before
> applications (as is usually the case), leaving I/O outstanding on the
> device.  Since queue_if_no_path was set, the I/O wasn't dumped and our
> daemon was killed by shutdown very shortly thereafter so it couldn't
> recover (otherwise it would have cleaned things up).
> 
> With those I/Os sitting queued in multipath, with no network and no
> daemon to turn off queue_if_no_path, the systems just sat.  When we
> finally diagnosed this, we realized that the timeout would work
> perfectly to solve the problem, automatically turning queue_if_no_path
> off shortly after the network went away without depending on the
> intervention of the no-longer-running daemon.
> 
> So how do you guys deal with this failure scenario?

Shouldn't you wait for the application to shutdown before ripping the
network out?  Seems odd to just throw away queued IO.

A proper shutdown sequence really should avoid this problem in general,
the multipath daemon would only be shutdown once all mpath devices are
deactivated.

Then, if you still want to gracefully handle the case where there is no
network (and hence no paths) on shutdown the multipathd would still be
around to transition to a table that doesn't have queue_if_no_path.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-17 19:15               ` Mike Snitzer
@ 2013-10-17 20:45                 ` Frank Mayhar
  2013-10-17 21:13                   ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-10-17 20:45 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

On Thu, 2013-10-17 at 15:15 -0400, Mike Snitzer wrote:
> Shouldn't you wait for the application to shutdown before ripping the
> network out?  Seems odd to just throw away queued IO.

In general, yes, and we do that, but there's still the occasional
shutdown that doesn't go through those extra steps.

> A proper shutdown sequence really should avoid this problem in general,
> the multipath daemon would only be shutdown once all mpath devices are
> deactivated.
> 
> Then, if you still want to gracefully handle the case where there is no
> network (and hence no paths) on shutdown the multipathd would still be
> around to transition to a table that doesn't have queue_if_no_path.

Of course, this still depends on the daemon surviving everything else
circumstances can throw at it.

In our particular case, changing the shutdown sequence is far more
difficult than using the kernel mechanism I've already provided.  This
would also simplify things for other folks who use queue_if_no_path.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-17 20:45                 ` Frank Mayhar
@ 2013-10-17 21:13                   ` Mike Snitzer
  2013-10-18 20:51                     ` Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-17 21:13 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel

On Thu, Oct 17 2013 at  4:45pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> On Thu, 2013-10-17 at 15:15 -0400, Mike Snitzer wrote:
> > Shouldn't you wait for the application to shutdown before ripping the
> > network out?  Seems odd to just throw away queued IO.
> 
> In general, yes, and we do that, but there's still the occasional
> shutdown that doesn't go through those extra steps.
> 
> > A proper shutdown sequence really should avoid this problem in general,
> > the multipath daemon would only be shutdown once all mpath devices are
> > deactivated.
> > 
> > Then, if you still want to gracefully handle the case where there is no
> > network (and hence no paths) on shutdown the multipathd would still be
> > around to transition to a table that doesn't have queue_if_no_path.
> 
> Of course, this still depends on the daemon surviving everything else
> circumstances can throw at it.
> 
> In our particular case, changing the shutdown sequence is far more
> difficult than using the kernel mechanism I've already provided.  This
> would also simplify things for other folks who use queue_if_no_path.

Cannot say that argument wins me over but I will say that if you intend
to take the approach to have the kernel have a timeout; please pursue
the approach Hannes offered:

https://patchwork.kernel.org/patch/2953231/

It is much cleaner and if it works for your needs we can see about
getting a tested version upstream.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-17 21:13                   ` Mike Snitzer
@ 2013-10-18 20:51                     ` Frank Mayhar
  2013-10-18 21:47                       ` Alasdair G Kergon
  2013-10-18 22:53                       ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer
  0 siblings, 2 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-10-18 20:51 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

On Thu, 2013-10-17 at 17:13 -0400, Mike Snitzer wrote:
> Cannot say that argument wins me over but I will say that if you intend
> to take the approach to have the kernel have a timeout; please pursue
> the approach Hannes offered:
> 
> https://patchwork.kernel.org/patch/2953231/
> 
> It is much cleaner and if it works for your needs we can see about
> getting a tested version upstream.

Unfortunately his patch doesn't work as-is; it turns out that it tries
to set the timeout only if the target is request-based but at the time
he tries to set it the table type hasn't yet been set.

I'm looking into fixing it.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-18 20:51                     ` Frank Mayhar
@ 2013-10-18 21:47                       ` Alasdair G Kergon
  2013-10-18 22:53                       ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer
  1 sibling, 0 replies; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-18 21:47 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: Jun'ichi Nomura, dm-devel, Mike Snitzer

On Fri, Oct 18, 2013 at 01:51:55PM -0700, Frank Mayhar wrote:
> I'm looking into fixing it.

Do consider whether some of the code fits more naturally into the core
part of dm rather than a particular target.

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-18 20:51                     ` Frank Mayhar
  2013-10-18 21:47                       ` Alasdair G Kergon
@ 2013-10-18 22:53                       ` Mike Snitzer
  2013-10-30  1:02                         ` Mike Snitzer
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-18 22:53 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel, Alasdair G. Kergon

On Fri, Oct 18 2013 at  4:51pm -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> On Thu, 2013-10-17 at 17:13 -0400, Mike Snitzer wrote:
> > Cannot say that argument wins me over but I will say that if you intend
> > to take the approach to have the kernel have a timeout; please pursue
> > the approach Hannes offered:
> > 
> > https://patchwork.kernel.org/patch/2953231/
> > 
> > It is much cleaner and if it works for your needs we can see about
> > getting a tested version upstream.
> 
> Unfortunately his patch doesn't work as-is; it turns out that it tries
> to set the timeout only if the target is request-based but at the time
> he tries to set it the table type hasn't yet been set.
> 
> I'm looking into fixing it.

Ouch, yeah, can't access the DM device's queue from .ctr()
There were other issues with Hannes RFC patch, wouldn't compile.

Anyway, looks like we need a new target_type hook (e.g. .init_queue)
that is called from dm_init_request_based_queue().

Request-based DM only allows a single DM target per device so we don't
need the usual multi DM-target iterators.

But, unfortunately, at the time we call dm_init_request_based_queue()
the mapped_device isn't yet connected to the inactive table that is
being loaded (but the table is connected to the mapped_device).

In dm-ioctl.c:table_load(), the inactive table could be passed directly
into dm_setup_md_queue().

Please give the following revised patch a try, if it works we can clean
it up further (think multipath_status needs updating, we also may want
to constrain .init_queue to only being called if the target is a
singleton, which dm-mpath should be, but isn't flagged as such yet).

It compiles, but I haven't tested it...

---
 drivers/md/dm-ioctl.c         |    2 +-
 drivers/md/dm-mpath.c         |   77 +++++++++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |   12 +++++--
 drivers/md/dm.h               |    2 +-
 include/linux/device-mapper.h |    4 ++
 5 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index afe0814..74d1ab4 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1289,7 +1289,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 	}
 
 	/* setup md->queue to reflect md's type (may block) */
-	r = dm_setup_md_queue(md);
+	r = dm_setup_md_queue(md, t);
 	if (r) {
 		DMWARN("unable to set up device queue for new table.");
 		goto err_unlock_md_type;
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index de570a5..2c3e427 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -105,6 +105,8 @@ struct multipath {
 	mempool_t *mpio_pool;
 
 	struct mutex work_mutex;
+
+	unsigned no_path_timeout;
 };
 
 /*
@@ -444,6 +446,61 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	return 0;
 }
 
+/*
+ * Block timeout callback, called from the block layer
+ *
+ * request_queue lock is held on entry.
+ *
+ * Return values:
+ * BLK_EH_RESET_TIMER if the request should be left running
+ * BLK_EH_NOT_HANDLED if the request is handled or terminated
+ *                    by the driver.
+ */
+enum blk_eh_timer_return abort_if_no_path(struct request *rq)
+{
+	union map_info *info;
+	struct dm_mpath_io *mpio;
+	struct multipath *m;
+	unsigned long flags;
+	int rc = BLK_EH_RESET_TIMER;
+	int flush_ios = 0;
+
+	info = dm_get_rq_mapinfo(rq);
+	if (!info || !info->ptr)
+		return BLK_EH_NOT_HANDLED;
+
+	mpio = info->ptr;
+	m = mpio->pgpath->pg->m;
+	/*
+	 * Only abort request if:
+	 * - queued_ios is not empty
+	 *   (protect against races with process_queued_ios)
+	 * - queue_io is not set
+	 * - no valid paths are found
+	 */
+	spin_lock_irqsave(&m->lock, flags);
+	if (!list_empty(&m->queued_ios) &&
+	    !m->queue_io &&
+	    !m->nr_valid_paths) {
+		list_del_init(&rq->queuelist);
+		m->queue_size--;
+		m->queue_if_no_path = 0;
+		if (m->queue_size)
+			flush_ios = 1;
+		rc = BLK_EH_NOT_HANDLED;
+	}
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	if (rc == BLK_EH_NOT_HANDLED) {
+		mempool_free(mpio, m->mpio_pool);
+		dm_kill_unmapped_request(rq, -ETIMEDOUT);
+	}
+	if (flush_ios)
+		queue_work(kmultipathd, &m->process_queued_ios);
+
+	return rc;
+}
+
 /*-----------------------------------------------------------------
  * The multipath daemon is responsible for resubmitting queued ios.
  *---------------------------------------------------------------*/
@@ -790,6 +847,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 		{0, 6, "invalid number of feature args"},
 		{1, 50, "pg_init_retries must be between 1 and 50"},
 		{0, 60000, "pg_init_delay_msecs must be between 0 and 60000"},
+		{0, 65535, "no_path_timeout must be between 0 and 65535"},
 	};
 
 	r = dm_read_arg_group(_args, as, &argc, &ti->error);
@@ -827,6 +885,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strcasecmp(arg_name, "no_path_timeout") &&
+		    (argc >= 1)) {
+			r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error);
+			argc--;
+			continue;
+		}
+
 		ti->error = "Unrecognised multipath feature request";
 		r = -EINVAL;
 	} while (argc && !r);
@@ -1709,6 +1774,17 @@ out:
 	return busy;
 }
 
+static void multipath_init_queue(struct dm_target *ti,
+				 struct request_queue *q)
+{
+	struct multipath *m = ti->private;
+
+	if (m->no_path_timeout) {
+		blk_queue_rq_timed_out(q, abort_if_no_path);
+		blk_queue_rq_timeout(q, m->no_path_timeout * HZ);
+	}
+}
+
 /*-----------------------------------------------------------------
  * Module setup
  *---------------------------------------------------------------*/
@@ -1728,6 +1804,7 @@ static struct target_type multipath_target = {
 	.ioctl  = multipath_ioctl,
 	.iterate_devices = multipath_iterate_devices,
 	.busy = multipath_busy,
+	.init_queue = multipath_init_queue,
 };
 
 static int __init dm_multipath_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b3e26c7..ce87b8a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2336,8 +2336,10 @@ EXPORT_SYMBOL_GPL(dm_get_queue_limits);
 /*
  * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
  */
-static int dm_init_request_based_queue(struct mapped_device *md)
+static int dm_init_request_based_queue(struct mapped_device *md,
+				       struct dm_table *table)
 {
+	struct dm_target *ti = NULL;
 	struct request_queue *q = NULL;
 
 	if (md->queue->elevator)
@@ -2356,16 +2358,20 @@ static int dm_init_request_based_queue(struct mapped_device *md)
 
 	elv_register_queue(md->queue);
 
+	ti = dm_table_get_target(table, 0);
+	if (ti->type->init_queue)
+		ti->type->init_queue(ti, md->queue);
+
 	return 1;
 }
 
 /*
  * Setup the DM device's queue based on md's type
  */
-int dm_setup_md_queue(struct mapped_device *md)
+int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 {
 	if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) &&
-	    !dm_init_request_based_queue(md)) {
+	    !dm_init_request_based_queue(md, t)) {
 		DMWARN("Cannot initialize queue for request-based mapped device");
 		return -EINVAL;
 	}
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 1d1ad7b..55cb207 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -83,7 +83,7 @@ void dm_set_md_type(struct mapped_device *md, unsigned type);
 unsigned dm_get_md_type(struct mapped_device *md);
 struct target_type *dm_get_immutable_target_type(struct mapped_device *md);
 
-int dm_setup_md_queue(struct mapped_device *md);
+int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t);
 
 /*
  * To check the return value from dm_table_find_target().
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ed419c6..650c575 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -107,6 +107,9 @@ typedef int (*dm_iterate_devices_fn) (struct dm_target *ti,
 typedef void (*dm_io_hints_fn) (struct dm_target *ti,
 				struct queue_limits *limits);
 
+typedef void (*dm_init_queue_fn) (struct dm_target *ti,
+				  struct request_queue *q);
+
 /*
  * Returns:
  *    0: The target can handle the next I/O immediately.
@@ -162,6 +165,7 @@ struct target_type {
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
+	dm_init_queue_fn init_queue;
 
 	/* For internal device-mapper use. */
 	struct list_head list;

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-17 19:03             ` Frank Mayhar
  2013-10-17 19:15               ` Mike Snitzer
@ 2013-10-21 16:05               ` Benjamin Marzinski
  2013-10-21 16:17                 ` Frank Mayhar
  1 sibling, 1 reply; 49+ messages in thread
From: Benjamin Marzinski @ 2013-10-21 16:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Frank Mayhar

On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote:
> Dragging this back up into the light...
> 
> On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> > Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> > starting to clean it up but ultimately found myself agreeing with
> > Alasdair's original point: that this policy should be implemented in the
> > userspace daemon.
> 
> I've found and fixed a couple of bugs but I would still like to know
> what issues you had with the patch.  As I said before, I would be more
> than happy to clean it up.
> 
> In the time since we had this discussion, by the way, we ran into a
> problem that a userspace daemon can't solve:  That of shutdown.  We ran
> into a number of failures in which systems were hung for hours.  It
> turned out that they were caused by a regular system shutdown.  Our
> backing store is network-based and networking was getting killed before
> applications (as is usually the case), leaving I/O outstanding on the
> device.  Since queue_if_no_path was set, the I/O wasn't dumped and our
> daemon was killed by shutdown very shortly thereafter so it couldn't
> recover (otherwise it would have cleaned things up).
> 

Was multipathd force killed? What was the default configuration
parameter "queue_without_daemon" set to?

If "queue_without_daemon" is set to "no", multipathd should disable
queueing when it is stopped. This was added specifically to avoid this
issue.

-Ben

> With those I/Os sitting queued in multipath, with no network and no
> daemon to turn off queue_if_no_path, the systems just sat.  When we
> finally diagnosed this, we realized that the timeout would work
> perfectly to solve the problem, automatically turning queue_if_no_path
> off shortly after the network went away without depending on the
> intervention of the no-longer-running daemon.
> 
> So how do you guys deal with this failure scenario?
> -- 
> Frank Mayhar
> 310-460-4042
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-21 16:05               ` RFC for multipath " Benjamin Marzinski
@ 2013-10-21 16:17                 ` Frank Mayhar
  2013-10-23 13:39                   ` Benjamin Marzinski
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-10-21 16:17 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

On Mon, 2013-10-21 at 11:05 -0500, Benjamin Marzinski wrote:
> On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote:
> > Dragging this back up into the light...
> > 
> > On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> > > Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> > > starting to clean it up but ultimately found myself agreeing with
> > > Alasdair's original point: that this policy should be implemented in the
> > > userspace daemon.
> > 
> > I've found and fixed a couple of bugs but I would still like to know
> > what issues you had with the patch.  As I said before, I would be more
> > than happy to clean it up.
> > 
> > In the time since we had this discussion, by the way, we ran into a
> > problem that a userspace daemon can't solve:  That of shutdown.  We ran
> > into a number of failures in which systems were hung for hours.  It
> > turned out that they were caused by a regular system shutdown.  Our
> > backing store is network-based and networking was getting killed before
> > applications (as is usually the case), leaving I/O outstanding on the
> > device.  Since queue_if_no_path was set, the I/O wasn't dumped and our
> > daemon was killed by shutdown very shortly thereafter so it couldn't
> > recover (otherwise it would have cleaned things up).
> > 
> 
> Was multipathd force killed? What was the default configuration
> parameter "queue_without_daemon" set to?
> 
> If "queue_without_daemon" is set to "no", multipathd should disable
> queueing when it is stopped. This was added specifically to avoid this
> issue.

We don't run multipathd (we need our own daemon for various reasons) and
we're also running an older kernel (based on 3.3), so we don't have
"queue_without_daemon" yet.  (In fact, I don't see it on a cursory
glance at dm-mpath.c; where is it implemented?)
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: RFC for multipath queue_if_no_path timeout.
  2013-10-21 16:17                 ` Frank Mayhar
@ 2013-10-23 13:39                   ` Benjamin Marzinski
  0 siblings, 0 replies; 49+ messages in thread
From: Benjamin Marzinski @ 2013-10-23 13:39 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: device-mapper development

On Mon, Oct 21, 2013 at 09:17:31AM -0700, Frank Mayhar wrote:
> On Mon, 2013-10-21 at 11:05 -0500, Benjamin Marzinski wrote:
> > On Thu, Oct 17, 2013 at 12:03:10PM -0700, Frank Mayhar wrote:
> > > Dragging this back up into the light...
> > > 
> > > On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> > > > Frank, I had a look at your patch.  It leaves a lot to be desired, I was
> > > > starting to clean it up but ultimately found myself agreeing with
> > > > Alasdair's original point: that this policy should be implemented in the
> > > > userspace daemon.
> > > 
> > > I've found and fixed a couple of bugs but I would still like to know
> > > what issues you had with the patch.  As I said before, I would be more
> > > than happy to clean it up.
> > > 
> > > In the time since we had this discussion, by the way, we ran into a
> > > problem that a userspace daemon can't solve:  That of shutdown.  We ran
> > > into a number of failures in which systems were hung for hours.  It
> > > turned out that they were caused by a regular system shutdown.  Our
> > > backing store is network-based and networking was getting killed before
> > > applications (as is usually the case), leaving I/O outstanding on the
> > > device.  Since queue_if_no_path was set, the I/O wasn't dumped and our
> > > daemon was killed by shutdown very shortly thereafter so it couldn't
> > > recover (otherwise it would have cleaned things up).
> > > 
> > 
> > Was multipathd force killed? What was the default configuration
> > parameter "queue_without_daemon" set to?
> > 
> > If "queue_without_daemon" is set to "no", multipathd should disable
> > queueing when it is stopped. This was added specifically to avoid this
> > issue.
> 
> We don't run multipathd (we need our own daemon for various reasons) and
> we're also running an older kernel (based on 3.3), so we don't have
> "queue_without_daemon" yet.  (In fact, I don't see it on a cursory
> glance at dm-mpath.c; where is it implemented?)

It's implemented in multipathd, not the kernel, so won't be any help to
you then.

-Ben

> -- 
> Frank Mayhar
> 310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-18 22:53                       ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer
@ 2013-10-30  1:02                         ` Mike Snitzer
  2013-10-30 15:08                           ` Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-30  1:02 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: dm-devel, Alasdair G. Kergon

On Fri, Oct 18 2013 at  6:53pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Fri, Oct 18 2013 at  4:51pm -0400,
> Frank Mayhar <fmayhar@google.com> wrote:
> 
> > On Thu, 2013-10-17 at 17:13 -0400, Mike Snitzer wrote:
> > > Cannot say that argument wins me over but I will say that if you intend
> > > to take the approach to have the kernel have a timeout; please pursue
> > > the approach Hannes offered:
> > > 
> > > https://patchwork.kernel.org/patch/2953231/
> > > 
> > > It is much cleaner and if it works for your needs we can see about
> > > getting a tested version upstream.
> > 
> > Unfortunately his patch doesn't work as-is; it turns out that it tries
> > to set the timeout only if the target is request-based but at the time
> > he tries to set it the table type hasn't yet been set.
> > 
> > I'm looking into fixing it.
> 
> Ouch, yeah, can't access the DM device's queue from .ctr()
> There were other issues with Hannes RFC patch, wouldn't compile.
> 
> Anyway, looks like we need a new target_type hook (e.g. .init_queue)
> that is called from dm_init_request_based_queue().
> 
> Request-based DM only allows a single DM target per device so we don't
> need the usual multi DM-target iterators.
> 
> But, unfortunately, at the time we call dm_init_request_based_queue()
> the mapped_device isn't yet connected to the inactive table that is
> being loaded (but the table is connected to the mapped_device).
> 
> In dm-ioctl.c:table_load(), the inactive table could be passed directly
> into dm_setup_md_queue().
> 
> Please give the following revised patch a try, if it works we can clean
> it up further (think multipath_status needs updating, we also may want
> to constrain .init_queue to only being called if the target is a
> singleton, which dm-mpath should be, but isn't flagged as such yet).
> 
> It compiles, but I haven't tested it...

Frank,

Any interest in this or should I just table it for >= v3.14?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-30  1:02                         ` Mike Snitzer
@ 2013-10-30 15:08                           ` Frank Mayhar
  2013-10-30 15:43                             ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-10-30 15:08 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair G. Kergon

On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> Any interest in this or should I just table it for >= v3.14?

Sorry, I've been busy putting out another fire.  Yes, there's definitely
still interest.  I grabbed your revised patch and tested with it.
Unfortunately the timeout doesn't actually fire when requests are queued
due to queue_if_no_path; IIRC the block request queue timeout logic
wasn't triggering.  I planned to look into it more deeply figure out why
but I had to spend all last week fixing a nasty race and hadn't gotten
back to it yet.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-30 15:08                           ` Frank Mayhar
@ 2013-10-30 15:43                             ` Mike Snitzer
  2013-10-30 18:09                               ` Frank Mayhar
  2013-10-31 14:59                               ` [RFC PATCH v2] " Hannes Reinecke
  0 siblings, 2 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-30 15:43 UTC (permalink / raw)
  To: Frank Mayhar, hare; +Cc: dm-devel, Alasdair G. Kergon

On Wed, Oct 30 2013 at 11:08am -0400,
Frank Mayhar <fmayhar@google.com> wrote:

> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> > Any interest in this or should I just table it for >= v3.14?
> 
> Sorry, I've been busy putting out another fire.  Yes, there's definitely
> still interest.  I grabbed your revised patch and tested with it.
> Unfortunately the timeout doesn't actually fire when requests are queued
> due to queue_if_no_path; IIRC the block request queue timeout logic
> wasn't triggering.  I planned to look into it more deeply figure out why
> but I had to spend all last week fixing a nasty race and hadn't gotten
> back to it yet.

OK, Hannes, any idea why this might be happening?  The patch in question
is here: https://patchwork.kernel.org/patch/3070391/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-30 15:43                             ` Mike Snitzer
@ 2013-10-30 18:09                               ` Frank Mayhar
  2013-10-31  9:36                                 ` Junichi Nomura
  2013-10-31 14:59                               ` [RFC PATCH v2] " Hannes Reinecke
  1 sibling, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-10-30 18:09 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Alasdair G. Kergon

On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
> On Wed, Oct 30 2013 at 11:08am -0400,
> Frank Mayhar <fmayhar@google.com> wrote:
> 
> > On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> > > Any interest in this or should I just table it for >= v3.14?
> > 
> > Sorry, I've been busy putting out another fire.  Yes, there's definitely
> > still interest.  I grabbed your revised patch and tested with it.
> > Unfortunately the timeout doesn't actually fire when requests are queued
> > due to queue_if_no_path; IIRC the block request queue timeout logic
> > wasn't triggering.  I planned to look into it more deeply figure out why
> > but I had to spend all last week fixing a nasty race and hadn't gotten
> > back to it yet.
> 
> OK, Hannes, any idea why this might be happening?  The patch in question
> is here: https://patchwork.kernel.org/patch/3070391/

I got to this today and so far the most interesting I see is that the
cloned request that's queued in multipath has no queue associated with
it when it's queued; a printk reveals:

[  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)

When it's eventually dequeued, it gets a queue from the destination
device (in the pgpath) via bdev_get_queue().

Because of this and from just looking at the code, blk_start_request()
(and therefore blk_add_timer()) isn't being called for those requests,
so there's never a chance that the timeout would happen.

Does this make sense?  Or am I totally off-base?
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-30 18:09                               ` Frank Mayhar
@ 2013-10-31  9:36                                 ` Junichi Nomura
  2013-10-31 14:16                                   ` Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Junichi Nomura @ 2013-10-31  9:36 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer, Frank Mayhar; +Cc: Alasdair G. Kergon

On 10/31/13 03:09, Frank Mayhar wrote:
> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
>> On Wed, Oct 30 2013 at 11:08am -0400,
>> Frank Mayhar <fmayhar@google.com> wrote:
>>
>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
>>>> Any interest in this or should I just table it for >= v3.14?
>>>
>>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
>>> still interest.  I grabbed your revised patch and tested with it.
>>> Unfortunately the timeout doesn't actually fire when requests are queued
>>> due to queue_if_no_path; IIRC the block request queue timeout logic
>>> wasn't triggering.  I planned to look into it more deeply figure out why
>>> but I had to spend all last week fixing a nasty race and hadn't gotten
>>> back to it yet.
>>
>> OK, Hannes, any idea why this might be happening?  The patch in question
>> is here: https://patchwork.kernel.org/patch/3070391/
> 
> I got to this today and so far the most interesting I see is that the
> cloned request that's queued in multipath has no queue associated with
> it when it's queued; a printk reveals:
> 
> [  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)
> 
> When it's eventually dequeued, it gets a queue from the destination
> device (in the pgpath) via bdev_get_queue().
> 
> Because of this and from just looking at the code, blk_start_request()
> (and therefore blk_add_timer()) isn't being called for those requests,
> so there's never a chance that the timeout would happen.
> 
> Does this make sense?  Or am I totally off-base?

Hi,

I haven't checked the above patch in detail but there is a problem;
abort_if_no_path() treats "rq" as a clone request, which it isn't.
"rq" is an original request.

It shouldn't be a correct fix but just for testing purpose, you can try
changing:
  info = dm_get_rq_mapinfo(rq);
to
  info = dm_get_rq_mapinfo(rq->special);
and see what happens.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-31  9:36                                 ` Junichi Nomura
@ 2013-10-31 14:16                                   ` Frank Mayhar
  2013-10-31 14:31                                     ` Alasdair G Kergon
  2013-11-01  1:58                                     ` Junichi Nomura
  0 siblings, 2 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-10-31 14:16 UTC (permalink / raw)
  To: Junichi Nomura
  Cc: device-mapper development, Alasdair G. Kergon, Mike Snitzer

On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote:
> On 10/31/13 03:09, Frank Mayhar wrote:
> > On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
> >> On Wed, Oct 30 2013 at 11:08am -0400,
> >> Frank Mayhar <fmayhar@google.com> wrote:
> >>
> >>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> >>>> Any interest in this or should I just table it for >= v3.14?
> >>>
> >>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
> >>> still interest.  I grabbed your revised patch and tested with it.
> >>> Unfortunately the timeout doesn't actually fire when requests are queued
> >>> due to queue_if_no_path; IIRC the block request queue timeout logic
> >>> wasn't triggering.  I planned to look into it more deeply figure out why
> >>> but I had to spend all last week fixing a nasty race and hadn't gotten
> >>> back to it yet.
> >>
> >> OK, Hannes, any idea why this might be happening?  The patch in question
> >> is here: https://patchwork.kernel.org/patch/3070391/
> > 
> > I got to this today and so far the most interesting I see is that the
> > cloned request that's queued in multipath has no queue associated with
> > it when it's queued; a printk reveals:
> > 
> > [  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)
> > 
> > When it's eventually dequeued, it gets a queue from the destination
> > device (in the pgpath) via bdev_get_queue().
> > 
> > Because of this and from just looking at the code, blk_start_request()
> > (and therefore blk_add_timer()) isn't being called for those requests,
> > so there's never a chance that the timeout would happen.
> > 
> > Does this make sense?  Or am I totally off-base?
> 
> Hi,
> 
> I haven't checked the above patch in detail but there is a problem;
> abort_if_no_path() treats "rq" as a clone request, which it isn't.
> "rq" is an original request.
> 
> It shouldn't be a correct fix but just for testing purpose, you can try
> changing:
>   info = dm_get_rq_mapinfo(rq);
> to
>   info = dm_get_rq_mapinfo(rq->special);
> and see what happens.

Well, at the moment this is kind of moot since abort_if_no_path() isn't
being called.  But, regardless, don't we want to time out the clone
request?  That is, after all, what is being queued in map_io().
Unfortunately the clones don't appear to be associated with a request
queue; they're just put on multipath's internal queue.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-31 14:16                                   ` Frank Mayhar
@ 2013-10-31 14:31                                     ` Alasdair G Kergon
  2013-10-31 17:17                                       ` Frank Mayhar
  2013-11-01  1:58                                     ` Junichi Nomura
  1 sibling, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-31 14:31 UTC (permalink / raw)
  To: Frank Mayhar, Junichi Nomura, device-mapper development,
	Mike Snitzer, Alasdair G. Kergon

On Thu, Oct 31, 2013 at 07:16:51AM -0700, Frank Mayhar wrote:
> Unfortunately the clones don't appear to be associated with a request
> queue; they're just put on multipath's internal queue.

(And also remember to test table swap/push back.)

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-30 15:43                             ` Mike Snitzer
  2013-10-30 18:09                               ` Frank Mayhar
@ 2013-10-31 14:59                               ` Hannes Reinecke
  1 sibling, 0 replies; 49+ messages in thread
From: Hannes Reinecke @ 2013-10-31 14:59 UTC (permalink / raw)
  To: Mike Snitzer, Frank Mayhar; +Cc: dm-devel, Alasdair G. Kergon

On 10/30/2013 04:43 PM, Mike Snitzer wrote:
> On Wed, Oct 30 2013 at 11:08am -0400,
> Frank Mayhar <fmayhar@google.com> wrote:
>
>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
>>> Any interest in this or should I just table it for >= v3.14?
>>
>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
>> still interest.  I grabbed your revised patch and tested with it.
>> Unfortunately the timeout doesn't actually fire when requests are queued
>> due to queue_if_no_path; IIRC the block request queue timeout logic
>> wasn't triggering.  I planned to look into it more deeply figure out why
>> but I had to spend all last week fixing a nasty race and hadn't gotten
>> back to it yet.
>
> OK, Hannes, any idea why this might be happening?  The patch in question
> is here: https://patchwork.kernel.org/patch/3070391/
>
Not yet; currently I'm on vacation, but will be looking into it on Monday.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-31 14:31                                     ` Alasdair G Kergon
@ 2013-10-31 17:17                                       ` Frank Mayhar
  2013-11-01  1:23                                         ` Junichi Nomura
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-10-31 17:17 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: Junichi Nomura, device-mapper development, Mike Snitzer

On Thu, 2013-10-31 at 14:31 +0000, Alasdair G Kergon wrote:
> On Thu, Oct 31, 2013 at 07:16:51AM -0700, Frank Mayhar wrote:
> > Unfortunately the clones don't appear to be associated with a request
> > queue; they're just put on multipath's internal queue.
> 
> (And also remember to test table swap/push back.)

That brings up something I wanted to ask.  I've dug through the code and
this particular thing isn't clear to me.  So how does it handle the
queued I/Os when switching tables?  I see nothing in the table_load()
path that would deal with this.  I'm guessing that the requests are
pushed back to the block layer and are later resubmitted and requeued on
the new multipath queue, but I don't see how that works.

Code references would be very welcome.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-31 17:17                                       ` Frank Mayhar
@ 2013-11-01  1:23                                         ` Junichi Nomura
  0 siblings, 0 replies; 49+ messages in thread
From: Junichi Nomura @ 2013-11-01  1:23 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: device-mapper development, Mike Snitzer, Alasdair G Kergon

On 11/01/13 02:17, Frank Mayhar wrote:
> On Thu, 2013-10-31 at 14:31 +0000, Alasdair G Kergon wrote:
>> (And also remember to test table swap/push back.)
> 
> That brings up something I wanted to ask.  I've dug through the code and
> this particular thing isn't clear to me.  So how does it handle the
> queued I/Os when switching tables?  I see nothing in the table_load()
> path that would deal with this.  I'm guessing that the requests are
> pushed back to the block layer and are later resubmitted and requeued on
> the new multipath queue, but I don't see how that works.
> 
> Code references would be very welcome.

Relevant piece of codes is:
  - multipath_presuspend() temporarily disables "queue_if_no_path"
  - during the suspend process, __must_push_back() catches (otherwise
    failing) requests and requeues them back to the block layer queue
  - upon resume, dm starts processing requests in the block layer queue
    as usual

Hope this helps.
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-10-31 14:16                                   ` Frank Mayhar
  2013-10-31 14:31                                     ` Alasdair G Kergon
@ 2013-11-01  1:58                                     ` Junichi Nomura
  2013-11-01  4:17                                       ` Junichi Nomura
  1 sibling, 1 reply; 49+ messages in thread
From: Junichi Nomura @ 2013-11-01  1:58 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: device-mapper development, Alasdair G. Kergon, Mike Snitzer

On 10/31/13 23:16, Frank Mayhar wrote:
> On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote:
>> On 10/31/13 03:09, Frank Mayhar wrote:
>>> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
>>>> On Wed, Oct 30 2013 at 11:08am -0400,
>>>> Frank Mayhar <fmayhar@google.com> wrote:
>>>>
>>>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
>>>>>> Any interest in this or should I just table it for >= v3.14?
>>>>>
>>>>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
>>>>> still interest.  I grabbed your revised patch and tested with it.
>>>>> Unfortunately the timeout doesn't actually fire when requests are queued
>>>>> due to queue_if_no_path; IIRC the block request queue timeout logic
>>>>> wasn't triggering.  I planned to look into it more deeply figure out why
>>>>> but I had to spend all last week fixing a nasty race and hadn't gotten
>>>>> back to it yet.
>>>>
>>>> OK, Hannes, any idea why this might be happening?  The patch in question
>>>> is here: https://patchwork.kernel.org/patch/3070391/
>>>
>>> I got to this today and so far the most interesting I see is that the
>>> cloned request that's queued in multipath has no queue associated with
>>> it when it's queued; a printk reveals:
>>>
>>> [  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)
>>>
>>> When it's eventually dequeued, it gets a queue from the destination
>>> device (in the pgpath) via bdev_get_queue().
>>>
>>> Because of this and from just looking at the code, blk_start_request()
>>> (and therefore blk_add_timer()) isn't being called for those requests,
>>> so there's never a chance that the timeout would happen.
>>>
>>> Does this make sense?  Or am I totally off-base?
>>
>> Hi,
>>
>> I haven't checked the above patch in detail but there is a problem;
>> abort_if_no_path() treats "rq" as a clone request, which it isn't.
>> "rq" is an original request.
>>
>> It shouldn't be a correct fix but just for testing purpose, you can try
>> changing:
>>   info = dm_get_rq_mapinfo(rq);
>> to
>>   info = dm_get_rq_mapinfo(rq->special);
>> and see what happens.
> 
> Well, at the moment this is kind of moot since abort_if_no_path() isn't
> being called.  But, regardless, don't we want to time out the clone
> request?  That is, after all, what is being queued in map_io().
> Unfortunately the clones don't appear to be associated with a request
> queue; they're just put on multipath's internal queue.

Hmm, "isn't being called" is strange.
If the clone is in multipath's internal queue, the original
should have been "started" from request queue point of view
and timeout should fire.

As for the "clone or original" question, if you are to use the block timer,
you have to use it for the original request (then perhaps let the handler
find its clone and kill it).
That's because (as you already see) clones are not associated with
request queue when it's queued in multipath internal queue
and when it's associated, it belongs to the lower device's queue.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-11-01  1:58                                     ` Junichi Nomura
@ 2013-11-01  4:17                                       ` Junichi Nomura
  2013-11-05 15:18                                         ` Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Junichi Nomura @ 2013-11-01  4:17 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: device-mapper development, Mike Snitzer, Alasdair G. Kergon

On 11/01/13 10:58, Junichi Nomura wrote:
> On 10/31/13 23:16, Frank Mayhar wrote:
>> On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote:
>>> On 10/31/13 03:09, Frank Mayhar wrote:
>>>> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
>>>>> On Wed, Oct 30 2013 at 11:08am -0400,
>>>>> Frank Mayhar <fmayhar@google.com> wrote:
>>>>>
>>>>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
>>>>>>> Any interest in this or should I just table it for >= v3.14?
>>>>>>
>>>>>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
>>>>>> still interest.  I grabbed your revised patch and tested with it.
>>>>>> Unfortunately the timeout doesn't actually fire when requests are queued
>>>>>> due to queue_if_no_path; IIRC the block request queue timeout logic
>>>>>> wasn't triggering.  I planned to look into it more deeply figure out why
>>>>>> but I had to spend all last week fixing a nasty race and hadn't gotten
>>>>>> back to it yet.
>>>>>
>>>>> OK, Hannes, any idea why this might be happening?  The patch in question
>>>>> is here: https://patchwork.kernel.org/patch/3070391/
>>>>
>>>> I got to this today and so far the most interesting I see is that the
>>>> cloned request that's queued in multipath has no queue associated with
>>>> it when it's queued; a printk reveals:
>>>>
>>>> [  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)
>>>>
>>>> When it's eventually dequeued, it gets a queue from the destination
>>>> device (in the pgpath) via bdev_get_queue().
>>>>
>>>> Because of this and from just looking at the code, blk_start_request()
>>>> (and therefore blk_add_timer()) isn't being called for those requests,
>>>> so there's never a chance that the timeout would happen.
>>>>
>>>> Does this make sense?  Or am I totally off-base?
>>>
>>> Hi,
>>>
>>> I haven't checked the above patch in detail but there is a problem;
>>> abort_if_no_path() treats "rq" as a clone request, which it isn't.
>>> "rq" is an original request.
>>>
>>> It shouldn't be a correct fix but just for testing purpose, you can try
>>> changing:
>>>   info = dm_get_rq_mapinfo(rq);
>>> to
>>>   info = dm_get_rq_mapinfo(rq->special);
>>> and see what happens.
>>
>> Well, at the moment this is kind of moot since abort_if_no_path() isn't
>> being called.  But, regardless, don't we want to time out the clone
>> request?  That is, after all, what is being queued in map_io().
>> Unfortunately the clones don't appear to be associated with a request
>> queue; they're just put on multipath's internal queue.
> 
> Hmm, "isn't being called" is strange.
> If the clone is in multipath's internal queue, the original
> should have been "started" from request queue point of view
> and timeout should fire.
> 
> As for the "clone or original" question, if you are to use the block timer,
> you have to use it for the original request (then perhaps let the handler
> find its clone and kill it).
> That's because (as you already see) clones are not associated with
> request queue when it's queued in multipath internal queue
> and when it's associated, it belongs to the lower device's queue.

I slightly modified the patch:
  - fixed the timeout handler to correctly find
    clone request and "struct multipath"
  - the timeout handler just disables "queue_if_no_path"
    instead of killing the request directly
  - "dmsetup status" to show the parameter
  - changed an interface between dm core and target
  - added some debugging printk (you can remove them)
and checked the timeout occurs, at least.

I'm not sure if this feature is good or not though.
(The timer behavior is not intuitive, I think)

---
Jun'ichi Nomura, NEC Corporation


diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 65f1035..7ea06b0 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -170,7 +170,7 @@ void blk_add_timer(struct request *req)
 	struct request_queue *q = req->q;
 	unsigned long expiry;
 
-	if (!q->rq_timed_out_fn)
+	if (!q->rq_timed_out_fn || !q->rq_timeout)
 		return;
 
 	BUG_ON(!list_empty(&req->timeout_list));
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index de570a5..c345fef 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -105,6 +105,8 @@ struct multipath {
 	mempool_t *mpio_pool;
 
 	struct mutex work_mutex;
+
+	unsigned no_path_timeout;
 };
 
 /*
@@ -444,6 +446,48 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	return 0;
 }
 
+/*
+ * Block timeout callback, called from the block layer
+ *
+ * request_queue lock is held on entry.
+ *
+ * Return values:
+ * BLK_EH_RESET_TIMER if the request should be left running
+ * BLK_EH_NOT_HANDLED if the request is handled or terminated
+ *                    by the driver.
+ */
+enum blk_eh_timer_return multipath_timed_out(struct dm_target *ti,
+					     struct request *clone)
+{
+	unsigned long flags;
+	struct multipath *m = ti->private;
+	int rc = BLK_EH_RESET_TIMER;
+	int flush_ios = 0;
+
+	printk(KERN_DEBUG "[%lu] multipath_timed_out %p %p\n", jiffies, clone, m);
+
+	/*
+	 * Only abort request if:
+	 * - queued_ios is not empty
+	 *   (protect against races with process_queued_ios)
+	 * - no valid paths are found
+	 */
+	spin_lock_irqsave(&m->lock, flags);
+	if (m->no_path_timeout &&
+	    !list_empty(&m->queued_ios) && !m->nr_valid_paths) {
+		flush_ios = 1;
+	}
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	if (flush_ios) {
+		DMWARN("disabling queue_if_no_path as I/O timed out\n");
+		queue_if_no_path(m, 0, 0);
+		queue_work(kmultipathd, &m->process_queued_ios);
+	}
+
+	return rc;
+}
+
 /*-----------------------------------------------------------------
  * The multipath daemon is responsible for resubmitting queued ios.
  *---------------------------------------------------------------*/
@@ -790,6 +834,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 		{0, 6, "invalid number of feature args"},
 		{1, 50, "pg_init_retries must be between 1 and 50"},
 		{0, 60000, "pg_init_delay_msecs must be between 0 and 60000"},
+		{0, 65535, "no_path_timeout must be between 0 and 65535"},
 	};
 
 	r = dm_read_arg_group(_args, as, &argc, &ti->error);
@@ -827,6 +872,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strcasecmp(arg_name, "no_path_timeout") &&
+		    (argc >= 1)) {
+			r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error);
+			argc--;
+			continue;
+		}
+
 		ti->error = "Unrecognised multipath feature request";
 		r = -EINVAL;
 	} while (argc && !r);
@@ -1384,6 +1436,9 @@ static void multipath_resume(struct dm_target *ti)
 
 	spin_lock_irqsave(&m->lock, flags);
 	m->queue_if_no_path = m->saved_queue_if_no_path;
+	if (m->no_path_timeout)
+		printk("Setting no_path_timeout %d\n", m->no_path_timeout);
+	dm_set_timeout(dm_table_get_md(m->ti->table), m->no_path_timeout);
 	spin_unlock_irqrestore(&m->lock, flags);
 }
 
@@ -1421,11 +1476,14 @@ static void multipath_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
+			      (m->no_path_timeout > 0) * 2 +
 			      (m->pg_init_retries > 0) * 2 +
 			      (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2 +
 			      m->retain_attached_hw_handler);
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
+		if (m->no_path_timeout)
+			DMEMIT("no_path_timeout %u ", m->no_path_timeout);
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
 		if (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT)
@@ -1728,6 +1786,7 @@ static struct target_type multipath_target = {
 	.ioctl  = multipath_ioctl,
 	.iterate_devices = multipath_iterate_devices,
 	.busy = multipath_busy,
+	.timed_out = multipath_timed_out,
 };
 
 static int __init dm_multipath_init(void)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 8f87835..32c5399 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -26,6 +26,8 @@
 #define KEYS_PER_NODE (NODE_SIZE / sizeof(sector_t))
 #define CHILDREN_PER_NODE (KEYS_PER_NODE + 1)
 
+enum blk_eh_timer_return dm_rq_timed_out(struct request *rq);
+
 struct dm_table {
 	struct mapped_device *md;
 	unsigned type;
@@ -1519,6 +1521,7 @@ static void suspend_targets(struct dm_table *t, unsigned postsuspend)
 
 		ti++;
 	}
+	blk_queue_rq_timed_out(dm_disk(t->md)->queue, NULL);
 }
 
 void dm_table_presuspend_targets(struct dm_table *t)
@@ -1540,10 +1543,14 @@ void dm_table_postsuspend_targets(struct dm_table *t)
 int dm_table_resume_targets(struct dm_table *t)
 {
 	int i, r = 0;
+	int has_timeout = 0;
 
 	for (i = 0; i < t->num_targets; i++) {
 		struct dm_target *ti = t->targets + i;
 
+		if (ti->type->timed_out)
+			has_timeout = 1;
+
 		if (!ti->type->preresume)
 			continue;
 
@@ -1552,6 +1559,9 @@ int dm_table_resume_targets(struct dm_table *t)
 			return r;
 	}
 
+	if (has_timeout)
+		blk_queue_rq_timed_out(dm_disk(t->md)->queue, dm_rq_timed_out);
+
 	for (i = 0; i < t->num_targets; i++) {
 		struct dm_target *ti = t->targets + i;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b3e26c7..701d3d2 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1026,6 +1026,13 @@ static void end_clone_request(struct request *clone, int error)
 	dm_complete_request(clone, error);
 }
 
+int dm_set_timeout(struct mapped_device *md, unsigned int tmo)
+{
+	blk_queue_rq_timeout(md->queue, tmo * HZ);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dm_set_timeout);
+
 /*
  * Return maximum size of I/O possible at the supplied sector up to the current
  * target boundary.
@@ -1829,6 +1836,23 @@ out:
 	dm_put_live_table(md, srcu_idx);
 }
 
+enum blk_eh_timer_return dm_rq_timed_out(struct request *rq)
+{
+	struct request *clone = rq->special;
+	struct dm_rq_target_io *tio = clone->end_io_data;
+
+	printk(KERN_DEBUG "[%lu] dm_timed_out %p : %p\n", jiffies, rq, tio);
+
+	if (!tio)
+		goto out; /* not mapped */
+
+	if (tio->ti->type->timed_out)
+		return tio->ti->type->timed_out(tio->ti, clone);
+
+out:
+	return BLK_EH_RESET_TIMER;
+}
+
 int dm_underlying_device_busy(struct request_queue *q)
 {
 	return blk_lld_busy(q);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ed419c6..84c0368 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -107,6 +107,8 @@ typedef int (*dm_iterate_devices_fn) (struct dm_target *ti,
 typedef void (*dm_io_hints_fn) (struct dm_target *ti,
 				struct queue_limits *limits);
 
+typedef enum blk_eh_timer_return (*dm_rq_timed_out_fn) (struct dm_target *ti,
+							struct request *clone);
 /*
  * Returns:
  *    0: The target can handle the next I/O immediately.
@@ -162,6 +164,7 @@ struct target_type {
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
+	dm_rq_timed_out_fn timed_out;
 
 	/* For internal device-mapper use. */
 	struct list_head list;
@@ -604,5 +607,6 @@ void dm_dispatch_request(struct request *rq);
 void dm_requeue_unmapped_request(struct request *rq);
 void dm_kill_unmapped_request(struct request *rq, int error);
 int dm_underlying_device_busy(struct request_queue *q);
+int dm_set_timeout(struct mapped_device *md, unsigned int tmo);
 
 #endif	/* _LINUX_DEVICE_MAPPER_H */

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
  2013-11-01  4:17                                       ` Junichi Nomura
@ 2013-11-05 15:18                                         ` Frank Mayhar
  2013-11-05 16:02                                           ` [RFC PATCH v3] " Frank Mayhar
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-11-05 15:18 UTC (permalink / raw)
  To: Junichi Nomura
  Cc: device-mapper development, Mike Snitzer, Alasdair G. Kergon

On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
> I slightly modified the patch:
>   - fixed the timeout handler to correctly find
>     clone request and "struct multipath"
>   - the timeout handler just disables "queue_if_no_path"
>     instead of killing the request directly
>   - "dmsetup status" to show the parameter
>   - changed an interface between dm core and target
>   - added some debugging printk (you can remove them)
> and checked the timeout occurs, at least.
> 
> I'm not sure if this feature is good or not though.
> (The timer behavior is not intuitive, I think)
> 
> ---
> Jun'ichi Nomura, NEC Corporation
> 
[patch trimmed]

Thanks!  I integrated your new patch and tested it.  Sure enough, it
seems to work.  I've made a few tweaks (added a module tunable and
support for setting the timer in multipath_message(), removed your debug
printks) and will submit the modified patch for discussion shortly.
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout
  2013-11-05 15:18                                         ` Frank Mayhar
@ 2013-11-05 16:02                                           ` Frank Mayhar
  2013-11-05 16:53                                             ` Mike Snitzer
  2013-11-06  6:54                                             ` Hannes Reinecke
  0 siblings, 2 replies; 49+ messages in thread
From: Frank Mayhar @ 2013-11-05 16:02 UTC (permalink / raw)
  To: Junichi Nomura
  Cc: device-mapper development, Mike Snitzer, Alasdair G. Kergon

This is the patch submitted by Jun'ichi Nomura, originally based on
Mike's patch with some small changes by me.  Jun'ichi's description
follows, along with my changes:

On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote:
> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
> > I slightly modified the patch:
> >   - fixed the timeout handler to correctly find
> >     clone request and "struct multipath"
> >   - the timeout handler just disables "queue_if_no_path"
> >     instead of killing the request directly
> >   - "dmsetup status" to show the parameter
> >   - changed an interface between dm core and target
> >   - added some debugging printk (you can remove them)
> > and checked the timeout occurs, at least.
> > 
> > I'm not sure if this feature is good or not though.
> > (The timer behavior is not intuitive, I think)
> Thanks!  I integrated your new patch and tested it.  Sure enough, it
> seems to work.  I've made a few tweaks (added a module tunable and
> support for setting the timer in multipath_message(), removed your debug
> printks) and will submit the modified patch for discussion shortly.

Comments?

---
 block/blk-timeout.c           |    2 
 drivers/md/dm-mpath.c         |   85 ++++++++++++++++++++++++++++++++++++++++++
 drivers/md/dm-table.c         |   10 ++++
 drivers/md/dm.c               |   22 ++++++++++
 include/linux/device-mapper.h |    4 +
 5 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 65f1035..7ea06b0 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -170,7 +170,7 @@ void blk_add_timer(struct request *req)
 	struct request_queue *q = req->q;
 	unsigned long expiry;
 
-	if (!q->rq_timed_out_fn)
+	if (!q->rq_timed_out_fn || !q->rq_timeout)
 		return;
 
 	BUG_ON(!list_empty(&req->timeout_list));
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index de570a5..6f127b7 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -27,6 +27,8 @@
 #define DM_PG_INIT_DELAY_MSECS 2000
 #define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1)
 
+#define DEFAULT_NO_PATH_TIMEOUT_SECS 0
+
 /* Path properties */
 struct pgpath {
 	struct list_head list;
@@ -105,6 +107,8 @@ struct multipath {
 	mempool_t *mpio_pool;
 
 	struct mutex work_mutex;
+
+	unsigned no_path_timeout;
 };
 
 /*
@@ -117,6 +121,10 @@ struct dm_mpath_io {
 
 typedef int (*action_fn) (struct pgpath *pgpath);
 
+static unsigned long global_no_path_timeout = DEFAULT_NO_PATH_TIMEOUT_SECS;
+module_param_named(queue_if_no_path_timeout_secs,
+		   global_no_path_timeout, ulong, S_IRUGO | S_IWUSR);
+
 static struct kmem_cache *_mpio_cache;
 
 static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
@@ -207,6 +215,7 @@ static struct multipath *alloc_multipath(struct dm_target *ti)
 			kfree(m);
 			return NULL;
 		}
+		m->no_path_timeout = global_no_path_timeout;
 		m->ti = ti;
 		ti->private = m;
 	}
@@ -444,6 +453,45 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	return 0;
 }
 
+/*
+ * Block timeout callback, called from the block layer
+ *
+ * request_queue lock is held on entry.
+ *
+ * Return values:
+ * BLK_EH_RESET_TIMER if the request should be left running
+ * BLK_EH_NOT_HANDLED if the request is handled or terminated
+ *                    by the driver.
+ */
+enum blk_eh_timer_return multipath_timed_out(struct dm_target *ti,
+					     struct request *clone)
+{
+	unsigned long flags;
+	struct multipath *m = ti->private;
+	int rc = BLK_EH_RESET_TIMER;
+	int flush_ios = 0;
+
+	/*
+	 * Only abort request if:
+	 * - queued_ios is not empty
+	 *   (protect against races with process_queued_ios)
+	 * - no valid paths are found
+	 */
+	spin_lock_irqsave(&m->lock, flags);
+	if (m->no_path_timeout &&
+	    !list_empty(&m->queued_ios) && !m->nr_valid_paths) {
+		flush_ios = 1;
+	}
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	if (flush_ios) {
+		queue_if_no_path(m, 0, 0);
+		queue_work(kmultipathd, &m->process_queued_ios);
+	}
+
+	return rc;
+}
+
 /*-----------------------------------------------------------------
  * The multipath daemon is responsible for resubmitting queued ios.
  *---------------------------------------------------------------*/
@@ -790,6 +838,7 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 		{0, 6, "invalid number of feature args"},
 		{1, 50, "pg_init_retries must be between 1 and 50"},
 		{0, 60000, "pg_init_delay_msecs must be between 0 and 60000"},
+		{0, 65535, "queue_if_no_path_timeout_secs must be between 0 and 65535"},
 	};
 
 	r = dm_read_arg_group(_args, as, &argc, &ti->error);
@@ -827,6 +876,13 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strcasecmp(arg_name, "queue_if_no_path_timeout_secs") &&
+		    (argc >= 1)) {
+			r = dm_read_arg(_args + 3, as, &m->no_path_timeout, &ti->error);
+			argc--;
+			continue;
+		}
+
 		ti->error = "Unrecognised multipath feature request";
 		r = -EINVAL;
 	} while (argc && !r);
@@ -978,6 +1034,27 @@ static int multipath_map(struct dm_target *ti, struct request *clone,
 }
 
 /*
+ * Set the no_path_timeout on behalf of a message.  Note that this only
+ * affects new requests; already-queued requests are unaffected.
+ */
+static int set_no_path_timeout(struct multipath *m, const char *timestr)
+{
+	unsigned long to;
+	unsigned long flags;
+
+	if (!timestr || (sscanf(timestr, "%lu", &to) != 1)) {
+		DMWARN("invalid timeout supplied to set_no_path_timeout");
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&m->lock, flags);
+	m->no_path_timeout = to;
+	dm_set_timeout(dm_table_get_md(m->ti->table), m->no_path_timeout);
+	spin_unlock_irqrestore(&m->lock, flags);
+	return 0;
+}
+
+/*
  * Take a path out of use.
  */
 static int fail_path(struct pgpath *pgpath)
@@ -1384,6 +1461,7 @@ static void multipath_resume(struct dm_target *ti)
 
 	spin_lock_irqsave(&m->lock, flags);
 	m->queue_if_no_path = m->saved_queue_if_no_path;
+	dm_set_timeout(dm_table_get_md(m->ti->table), m->no_path_timeout);
 	spin_unlock_irqrestore(&m->lock, flags);
 }
 
@@ -1421,11 +1499,14 @@ static void multipath_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
+				  (m->no_path_timeout > 0) * 2 +
 			      (m->pg_init_retries > 0) * 2 +
 			      (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2 +
 			      m->retain_attached_hw_handler);
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
+		if (m->no_path_timeout)
+			DMEMIT("queue_if_no_path_timeout_secs %u ", m->no_path_timeout);
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
 		if (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT)
@@ -1550,6 +1631,9 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv)
 	} else if (!strcasecmp(argv[0], "switch_group")) {
 		r = switch_pg_num(m, argv[1]);
 		goto out;
+	} else if (!strcasecmp(argv[0], "queue_if_no_path_timeout_secs")) {
+		r = set_no_path_timeout(m, argv[1]);
+		goto out;
 	} else if (!strcasecmp(argv[0], "reinstate_path"))
 		action = reinstate_path;
 	else if (!strcasecmp(argv[0], "fail_path"))
@@ -1728,6 +1812,7 @@ static struct target_type multipath_target = {
 	.ioctl  = multipath_ioctl,
 	.iterate_devices = multipath_iterate_devices,
 	.busy = multipath_busy,
+	.timed_out = multipath_timed_out,
 };
 
 static int __init dm_multipath_init(void)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 8f87835..32c5399 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -26,6 +26,8 @@
 #define KEYS_PER_NODE (NODE_SIZE / sizeof(sector_t))
 #define CHILDREN_PER_NODE (KEYS_PER_NODE + 1)
 
+enum blk_eh_timer_return dm_rq_timed_out(struct request *rq);
+
 struct dm_table {
 	struct mapped_device *md;
 	unsigned type;
@@ -1519,6 +1521,7 @@ static void suspend_targets(struct dm_table *t, unsigned postsuspend)
 
 		ti++;
 	}
+	blk_queue_rq_timed_out(dm_disk(t->md)->queue, NULL);
 }
 
 void dm_table_presuspend_targets(struct dm_table *t)
@@ -1540,10 +1543,14 @@ void dm_table_postsuspend_targets(struct dm_table *t)
 int dm_table_resume_targets(struct dm_table *t)
 {
 	int i, r = 0;
+	int has_timeout = 0;
 
 	for (i = 0; i < t->num_targets; i++) {
 		struct dm_target *ti = t->targets + i;
 
+		if (ti->type->timed_out)
+			has_timeout = 1;
+
 		if (!ti->type->preresume)
 			continue;
 
@@ -1552,6 +1559,9 @@ int dm_table_resume_targets(struct dm_table *t)
 			return r;
 	}
 
+	if (has_timeout)
+		blk_queue_rq_timed_out(dm_disk(t->md)->queue, dm_rq_timed_out);
+
 	for (i = 0; i < t->num_targets; i++) {
 		struct dm_target *ti = t->targets + i;
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b3e26c7..bb7a29c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1026,6 +1026,13 @@ static void end_clone_request(struct request *clone, int error)
 	dm_complete_request(clone, error);
 }
 
+int dm_set_timeout(struct mapped_device *md, unsigned int tmo)
+{
+	blk_queue_rq_timeout(md->queue, tmo * HZ);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dm_set_timeout);
+
 /*
  * Return maximum size of I/O possible at the supplied sector up to the current
  * target boundary.
@@ -1829,6 +1836,21 @@ out:
 	dm_put_live_table(md, srcu_idx);
 }
 
+enum blk_eh_timer_return dm_rq_timed_out(struct request *rq)
+{
+	struct request *clone = rq->special;
+	struct dm_rq_target_io *tio = clone->end_io_data;
+
+	if (!tio)
+		goto out; /* not mapped */
+
+	if (tio->ti->type->timed_out)
+		return tio->ti->type->timed_out(tio->ti, clone);
+
+out:
+	return BLK_EH_RESET_TIMER;
+}
+
 int dm_underlying_device_busy(struct request_queue *q)
 {
 	return blk_lld_busy(q);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ed419c6..84c0368 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -107,6 +107,8 @@ typedef int (*dm_iterate_devices_fn) (struct dm_target *ti,
 typedef void (*dm_io_hints_fn) (struct dm_target *ti,
 				struct queue_limits *limits);
 
+typedef enum blk_eh_timer_return (*dm_rq_timed_out_fn) (struct dm_target *ti,
+							struct request *clone);
 /*
  * Returns:
  *    0: The target can handle the next I/O immediately.
@@ -162,6 +164,7 @@ struct target_type {
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
+	dm_rq_timed_out_fn timed_out;
 
 	/* For internal device-mapper use. */
 	struct list_head list;
@@ -604,5 +607,6 @@ void dm_dispatch_request(struct request *rq);
 void dm_requeue_unmapped_request(struct request *rq);
 void dm_kill_unmapped_request(struct request *rq, int error);
 int dm_underlying_device_busy(struct request_queue *q);
+int dm_set_timeout(struct mapped_device *md, unsigned int tmo);
 
 #endif	/* _LINUX_DEVICE_MAPPER_H */

-- 
Frank Mayhar
310-460-4042

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout
  2013-11-05 16:02                                           ` [RFC PATCH v3] " Frank Mayhar
@ 2013-11-05 16:53                                             ` Mike Snitzer
  2013-11-06  6:54                                             ` Hannes Reinecke
  1 sibling, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-11-05 16:53 UTC (permalink / raw)
  To: Frank Mayhar
  Cc: Junichi Nomura, device-mapper development, Alasdair G. Kergon

On Tue, Nov 05 2013 at 11:02am -0500,
Frank Mayhar <fmayhar@google.com> wrote:

> This is the patch submitted by Jun'ichi Nomura, originally based on
> Mike's patch with some small changes by me.  Jun'ichi's description
> follows, along with my changes:
> 
> On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote:
> > On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
> > > I slightly modified the patch:
> > >   - fixed the timeout handler to correctly find
> > >     clone request and "struct multipath"
> > >   - the timeout handler just disables "queue_if_no_path"
> > >     instead of killing the request directly
> > >   - "dmsetup status" to show the parameter
> > >   - changed an interface between dm core and target
> > >   - added some debugging printk (you can remove them)
> > > and checked the timeout occurs, at least.
> > > 
> > > I'm not sure if this feature is good or not though.
> > > (The timer behavior is not intuitive, I think)
> > Thanks!  I integrated your new patch and tested it.  Sure enough, it
> > seems to work.  I've made a few tweaks (added a module tunable and
> > support for setting the timer in multipath_message(), removed your debug
> > printks) and will submit the modified patch for discussion shortly.
> 
> Comments?

My primary concern is that this patch is always establishing a timed_out
method via blk_queue_rq_timed_out() regardless of whether the mpath
device needs it.  I'm also not a fan of adding such a specialized
rq-based only hook (dm_rq_timed_out_fn timed_out).

Could be we'll need to do other things to the queue (be it bio-based or
rq-based) in the future.

So I prefer the approach I took to arming the queue (via new .init_queue
hook) in this patch: https://patchwork.kernel.org/patch/3070391/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout
  2013-11-05 16:02                                           ` [RFC PATCH v3] " Frank Mayhar
  2013-11-05 16:53                                             ` Mike Snitzer
@ 2013-11-06  6:54                                             ` Hannes Reinecke
  2013-11-06 15:43                                               ` Frank Mayhar
  1 sibling, 1 reply; 49+ messages in thread
From: Hannes Reinecke @ 2013-11-06  6:54 UTC (permalink / raw)
  To: Frank Mayhar
  Cc: Jun'ichi Nomura, dm-devel, Alasdair G Kergon, Mike Snitzer

On 11/05/2013 05:02 PM, Frank Mayhar wrote:
> This is the patch submitted by Jun'ichi Nomura, originally based on
> Mike's patch with some small changes by me.  Jun'ichi's description
> follows, along with my changes:
> 
> On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote:
>> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
>>> I slightly modified the patch:
>>>   - fixed the timeout handler to correctly find
>>>     clone request and "struct multipath"
>>>   - the timeout handler just disables "queue_if_no_path"
>>>     instead of killing the request directly
>>>   - "dmsetup status" to show the parameter
>>>   - changed an interface between dm core and target
>>>   - added some debugging printk (you can remove them)
>>> and checked the timeout occurs, at least.
>>>
>>> I'm not sure if this feature is good or not though.
>>> (The timer behavior is not intuitive, I think)
>> Thanks!  I integrated your new patch and tested it.  Sure enough, it
>> seems to work.  I've made a few tweaks (added a module tunable and
>> support for setting the timer in multipath_message(), removed your debug
>> printks) and will submit the modified patch for discussion shortly.
> 
> Comments?
> 
Yeah. Seems to be my eternal fate; initiating fixes and not getting
mentioned at all.
Sigh.

I dimly remember having sent the original patch for the blk timeout
function ... hence a short notice would've been nice.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout
  2013-11-06  6:54                                             ` Hannes Reinecke
@ 2013-11-06 15:43                                               ` Frank Mayhar
  2013-11-06 19:21                                                 ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Frank Mayhar @ 2013-11-06 15:43 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jun'ichi Nomura, dm-devel, Alasdair G Kergon, Mike, Snitzer

On Wed, 2013-11-06 at 07:54 +0100, Hannes Reinecke wrote:
> On 11/05/2013 05:02 PM, Frank Mayhar wrote:
> > This is the patch submitted by Jun'ichi Nomura, originally based on
> > Mike's patch with some small changes by me.  Jun'ichi's description
> > follows, along with my changes:
> > 
> > On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote:
> >> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
> >>> I slightly modified the patch:
> >>>   - fixed the timeout handler to correctly find
> >>>     clone request and "struct multipath"
> >>>   - the timeout handler just disables "queue_if_no_path"
> >>>     instead of killing the request directly
> >>>   - "dmsetup status" to show the parameter
> >>>   - changed an interface between dm core and target
> >>>   - added some debugging printk (you can remove them)
> >>> and checked the timeout occurs, at least.
> >>>
> >>> I'm not sure if this feature is good or not though.
> >>> (The timer behavior is not intuitive, I think)
> >> Thanks!  I integrated your new patch and tested it.  Sure enough, it
> >> seems to work.  I've made a few tweaks (added a module tunable and
> >> support for setting the timer in multipath_message(), removed your debug
> >> printks) and will submit the modified patch for discussion shortly.
> > 
> > Comments?
> > 
> Yeah. Seems to be my eternal fate; initiating fixes and not getting
> mentioned at all.
> Sigh.
> 
> I dimly remember having sent the original patch for the blk timeout
> function ... hence a short notice would've been nice.

Sorry, I did ding you early on (and I think Mike dinged you as well),
but you were apparently busy with other things at the time.

I also sent email last Thursday, quoted below:


On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
> On Wed, Oct 30 2013 at 11:08am -0400,
> Frank Mayhar <fmayhar@google.com> wrote:
> 
> > On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> > > Any interest in this or should I just table it for >= v3.14?
> > 
> > Sorry, I've been busy putting out another fire.  Yes, there's
definitely
> > still interest.  I grabbed your revised patch and tested with it.
> > Unfortunately the timeout doesn't actually fire when requests are
queued
> > due to queue_if_no_path; IIRC the block request queue timeout logic
> > wasn't triggering.  I planned to look into it more deeply figure out
why
> > but I had to spend all last week fixing a nasty race and hadn't
gotten
> > back to it yet.
> 
> OK, Hannes, any idea why this might be happening?  The patch in
question
> is here: https://patchwork.kernel.org/patch/3070391/

I got to this today and so far the most interesting I see is that the
cloned request that's queued in multipath has no queue associated with
it when it's queued; a printk reveals:

[  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)

When it's eventually dequeued, it gets a queue from the destination
device (in the pgpath) via bdev_get_queue().

Because of this and from just looking at the code, blk_start_request()
(and therefore blk_add_timer()) isn't being called for those requests,
so there's never a chance that the timeout would happen.

Does this make sense?  Or am I totally off-base?
-- 
Frank Mayhar
310-460-4042

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout
  2013-11-06 15:43                                               ` Frank Mayhar
@ 2013-11-06 19:21                                                 ` Mike Snitzer
  2013-11-07  1:03                                                   ` Junichi Nomura
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-11-06 19:21 UTC (permalink / raw)
  To: Frank Mayhar; +Cc: Jun'ichi Nomura, dm-devel, Mike, Alasdair G Kergon

On Wed, Nov 06 2013 at 10:43am -0500,
Frank Mayhar <fmayhar@google.com> wrote:

> On Wed, 2013-11-06 at 07:54 +0100, Hannes Reinecke wrote:
> > On 11/05/2013 05:02 PM, Frank Mayhar wrote:
> > > This is the patch submitted by Jun'ichi Nomura, originally based on
> > > Mike's patch with some small changes by me.  Jun'ichi's description
> > > follows, along with my changes:
> > > 
> > > On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote:
> > >> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
> > >>> I slightly modified the patch:
> > >>>   - fixed the timeout handler to correctly find
> > >>>     clone request and "struct multipath"
> > >>>   - the timeout handler just disables "queue_if_no_path"
> > >>>     instead of killing the request directly
> > >>>   - "dmsetup status" to show the parameter
> > >>>   - changed an interface between dm core and target
> > >>>   - added some debugging printk (you can remove them)
> > >>> and checked the timeout occurs, at least.
> > >>>
> > >>> I'm not sure if this feature is good or not though.
> > >>> (The timer behavior is not intuitive, I think)
> > >> Thanks!  I integrated your new patch and tested it.  Sure enough, it
> > >> seems to work.  I've made a few tweaks (added a module tunable and
> > >> support for setting the timer in multipath_message(), removed your debug
> > >> printks) and will submit the modified patch for discussion shortly.
> > > 
> > > Comments?
> > > 
> > Yeah. Seems to be my eternal fate; initiating fixes and not getting
> > mentioned at all.
> > Sigh.
> > 
> > I dimly remember having sent the original patch for the blk timeout
> > function ... hence a short notice would've been nice.
> 
> Sorry, I did ding you early on (and I think Mike dinged you as well),
> but you were apparently busy with other things at the time.

Hi Frank,

I wouldn't worry about this.  You didn't even supply a patch header.. so
it isn't like Hannes was obviously left out.  Fact is this patch has had
4 iterations, the first of which from Hannes didn't compile or even make
sense.  Anyway, he'll get attribution through Suggested-by unless he
wins the race to produces the first upstream-worthy variant of this line
of work.

So far it has all been RFC-style patches.. your most recent one that
builds on Jun'ichi's patch with a module param and timeout default:  We
generally don't add module params to targets (but I can appreciate why
you might want that.. just not seeing the need).  And having a message
to change the timeout conflicts with my desire to conditionally
establish a timed_out method only if a timeout was specified (like my
last reply in this thread suggested).

Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v3] dm mpath: add a queue_if_no_path timeout
  2013-11-06 19:21                                                 ` Mike Snitzer
@ 2013-11-07  1:03                                                   ` Junichi Nomura
  0 siblings, 0 replies; 49+ messages in thread
From: Junichi Nomura @ 2013-11-07  1:03 UTC (permalink / raw)
  To: device-mapper development; +Cc: Alasdair G Kergon, Frank Mayhar, Mike Snitzer

On 11/07/13 04:21, Mike Snitzer wrote:
> On Wed, Nov 06 2013 at 10:43am -0500,
> Frank Mayhar <fmayhar@google.com> wrote:
> 
>> On Wed, 2013-11-06 at 07:54 +0100, Hannes Reinecke wrote:
>>> On 11/05/2013 05:02 PM, Frank Mayhar wrote:
>>>> This is the patch submitted by Jun'ichi Nomura, originally based on
>>>> Mike's patch with some small changes by me.  Jun'ichi's description
>>>> follows, along with my changes:
>>>>
>>>> On Tue, 2013-11-05 at 07:18 -0800, Frank Mayhar wrote:
>>>>> On Fri, 2013-11-01 at 04:17 +0000, Junichi Nomura wrote:
>>>>>> I slightly modified the patch:
>>>>>>   - fixed the timeout handler to correctly find
>>>>>>     clone request and "struct multipath"
>>>>>>   - the timeout handler just disables "queue_if_no_path"
>>>>>>     instead of killing the request directly
>>>>>>   - "dmsetup status" to show the parameter
>>>>>>   - changed an interface between dm core and target
>>>>>>   - added some debugging printk (you can remove them)
>>>>>> and checked the timeout occurs, at least.
>>>>>>
>>>>>> I'm not sure if this feature is good or not though.
>>>>>> (The timer behavior is not intuitive, I think)
>>>>> Thanks!  I integrated your new patch and tested it.  Sure enough, it
>>>>> seems to work.  I've made a few tweaks (added a module tunable and
>>>>> support for setting the timer in multipath_message(), removed your debug
>>>>> printks) and will submit the modified patch for discussion shortly.
>>>>
>>>> Comments?
>>>>
>>> Yeah. Seems to be my eternal fate; initiating fixes and not getting
>>> mentioned at all.
>>> Sigh.
>>>
>>> I dimly remember having sent the original patch for the blk timeout
>>> function ... hence a short notice would've been nice.
>>
>> Sorry, I did ding you early on (and I think Mike dinged you as well),
>> but you were apparently busy with other things at the time.
> 
> Hi Frank,
> 
> I wouldn't worry about this.  You didn't even supply a patch header.. so
> it isn't like Hannes was obviously left out.  Fact is this patch has had
> 4 iterations, the first of which from Hannes didn't compile or even make
> sense.  Anyway, he'll get attribution through Suggested-by unless he
> wins the race to produces the first upstream-worthy variant of this line
> of work.
> 
> So far it has all been RFC-style patches.. your most recent one that
> builds on Jun'ichi's patch with a module param and timeout default:  We
> generally don't add module params to targets (but I can appreciate why
> you might want that.. just not seeing the need).  And having a message
> to change the timeout conflicts with my desire to conditionally
> establish a timed_out method only if a timeout was specified (like my
> last reply in this thread suggested).

BTW, this approach of using blk timer might conflict with
possible future removal of multipath internal queue:
  https://www.redhat.com/archives/dm-devel/2013-November/msg00025.html
because the timer is stopped if the request is queued to block layer.
To cope with that, we might have to extend block layer to run timer
for queued request, etc.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2013-11-07  1:03 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-26 17:14 RFC for multipath queue_if_no_path timeout Frank Mayhar
2013-09-26 17:24 ` Alasdair G Kergon
2013-09-26 17:31   ` Frank Mayhar
2013-09-26 17:38     ` Alasdair G Kergon
2013-09-26 17:47       ` Frank Mayhar
2013-09-26 17:52         ` Mike Snitzer
2013-09-26 20:36           ` [PATCH 1/1] dm mpath: Add timeout mechanism for queue_if_no_path Frank Mayhar
2013-09-26 23:22         ` RFC for multipath queue_if_no_path timeout Alasdair G Kergon
2013-09-26 23:49           ` Mike Snitzer
2013-09-27  6:07             ` Hannes Reinecke
2013-09-27  8:06               ` Hannes Reinecke
2013-09-27  8:37                 ` Alasdair G Kergon
2013-09-27 13:52                   ` Hannes Reinecke
2013-09-27 16:37                     ` Frank Mayhar
2013-09-27 16:32                 ` Frank Mayhar
2013-09-27 16:29             ` Frank Mayhar
2013-10-17 19:03             ` Frank Mayhar
2013-10-17 19:15               ` Mike Snitzer
2013-10-17 20:45                 ` Frank Mayhar
2013-10-17 21:13                   ` Mike Snitzer
2013-10-18 20:51                     ` Frank Mayhar
2013-10-18 21:47                       ` Alasdair G Kergon
2013-10-18 22:53                       ` [RFC PATCH v2] dm mpath: add a " Mike Snitzer
2013-10-30  1:02                         ` Mike Snitzer
2013-10-30 15:08                           ` Frank Mayhar
2013-10-30 15:43                             ` Mike Snitzer
2013-10-30 18:09                               ` Frank Mayhar
2013-10-31  9:36                                 ` Junichi Nomura
2013-10-31 14:16                                   ` Frank Mayhar
2013-10-31 14:31                                     ` Alasdair G Kergon
2013-10-31 17:17                                       ` Frank Mayhar
2013-11-01  1:23                                         ` Junichi Nomura
2013-11-01  1:58                                     ` Junichi Nomura
2013-11-01  4:17                                       ` Junichi Nomura
2013-11-05 15:18                                         ` Frank Mayhar
2013-11-05 16:02                                           ` [RFC PATCH v3] " Frank Mayhar
2013-11-05 16:53                                             ` Mike Snitzer
2013-11-06  6:54                                             ` Hannes Reinecke
2013-11-06 15:43                                               ` Frank Mayhar
2013-11-06 19:21                                                 ` Mike Snitzer
2013-11-07  1:03                                                   ` Junichi Nomura
2013-10-31 14:59                               ` [RFC PATCH v2] " Hannes Reinecke
2013-10-21 16:05               ` RFC for multipath " Benjamin Marzinski
2013-10-21 16:17                 ` Frank Mayhar
2013-10-23 13:39                   ` Benjamin Marzinski
2013-09-27 16:27           ` Frank Mayhar
2013-09-26 17:41 ` Mike Snitzer
2013-09-26 17:55   ` Frank Mayhar
2013-09-26 18:41     ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.