All of lore.kernel.org
 help / color / mirror / Atom feed
* dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
@ 2016-09-21  9:14 jiangyiwen
  2016-09-21 15:04 ` Mike Snitzer
  0 siblings, 1 reply; 11+ messages in thread
From: jiangyiwen @ 2016-09-21  9:14 UTC (permalink / raw)
  To: dm-devel; +Cc: wangyibin, Mike Snitzer

Hi guys,

I'm sorry if someone else has already asked the same question before, but
here's what we are facing with generic multipath.

We were evaluating SCSI-3 PR and hit the following issue:

Our IPSAN has two controllers (C1/C2). Our host is connected to both
controllers. So we have 2 I_T nexus. The same key is registered on each I_T
nexus. When we issue SCSI-3 PR RESERE through generic multipath, the
reservation will be placed on one of the controllers (say, C1). Now if we
issue SCSI-3 PR RELEASE, generic multipath will select _only_ one of its
paths based on user defined path selection policy. The problem is that, if
the selected path leads to controller C2, which does not hold the
reservation, the reservation won't be released - and what's more, since the
reservation is done by the same host, the RELEASE command returns
successfully. So this gives us the illusion that the reservation is released
but in fact it is still there.

I know that there's a user space tool mpathpersist that can handle this
problem by issuing release command on all available paths, but here I am
talking about doing same thing in kernel space. So my question is, is there
anyone who is working on supporting SCSI-3 PR in a multiple controller
environment?

Please advise. Thanks in advance!

Best regards,
Yiwen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-09-21  9:14 dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment jiangyiwen
@ 2016-09-21 15:04 ` Mike Snitzer
  2016-09-21 17:14   ` Christoph Hellwig
  2016-09-22  8:14   ` jiangyiwen
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Snitzer @ 2016-09-21 15:04 UTC (permalink / raw)
  To: jiangyiwen; +Cc: wangyibin, dm-devel, hch

On Wed, Sep 21 2016 at  5:14am -0400,
jiangyiwen <jiangyiwen@huawei.com> wrote:

> Hi guys,
> 
> I'm sorry if someone else has already asked the same question before, but
> here's what we are facing with generic multipath.
> 
> We were evaluating SCSI-3 PR and hit the following issue:
> 
> Our IPSAN has two controllers (C1/C2). Our host is connected to both
> controllers. So we have 2 I_T nexus. The same key is registered on each I_T
> nexus. When we issue SCSI-3 PR RESERE through generic multipath, the
> reservation will be placed on one of the controllers (say, C1). Now if we
> issue SCSI-3 PR RELEASE, generic multipath will select _only_ one of its
> paths based on user defined path selection policy. The problem is that, if
> the selected path leads to controller C2, which does not hold the
> reservation, the reservation won't be released - and what's more, since the
> reservation is done by the same host, the RELEASE command returns
> successfully. So this gives us the illusion that the reservation is released
> but in fact it is still there.
> 
> I know that there's a user space tool mpathpersist that can handle this
> problem by issuing release command on all available paths, but here I am
> talking about doing same thing in kernel space. So my question is, is there
> anyone who is working on supporting SCSI-3 PR in a multiple controller
> environment?
> 
> Please advise. Thanks in advance!

Which kernel are you using?  Christoph (now cc'd) has introduced a new
PR api.  I merged a fix from him relatively recently (commit 9c72bad1f3
"dm: call PR reserve/unreserve on each underlying device")

Mike

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-09-21 15:04 ` Mike Snitzer
@ 2016-09-21 17:14   ` Christoph Hellwig
  2016-09-22  8:14   ` jiangyiwen
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2016-09-21 17:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: wangyibin, jiangyiwen, dm-devel, hch

On Wed, Sep 21, 2016 at 11:04:27AM -0400, Mike Snitzer wrote:
> Which kernel are you using?  Christoph (now cc'd) has introduced a new
> PR api.  I merged a fix from him relatively recently (commit 9c72bad1f3
> "dm: call PR reserve/unreserve on each underlying device")

The PR API will handle this fine, but mpathpersist isn't using it yet.
Switching mpathpersist to use the API if available would be a useful
improvement.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-09-21 15:04 ` Mike Snitzer
  2016-09-21 17:14   ` Christoph Hellwig
@ 2016-09-22  8:14   ` jiangyiwen
  2016-09-26 16:03     ` Christoph Hellwig
  1 sibling, 1 reply; 11+ messages in thread
From: jiangyiwen @ 2016-09-22  8:14 UTC (permalink / raw)
  To: Mike Snitzer, hch; +Cc: wangyibin, dm-devel

On 2016/9/21 23:04, Mike Snitzer wrote:
> On Wed, Sep 21 2016 at  5:14am -0400,
> jiangyiwen <jiangyiwen@huawei.com> wrote:
> 
>> Hi guys,
>>
>> I'm sorry if someone else has already asked the same question before, but
>> here's what we are facing with generic multipath.
>>
>> We were evaluating SCSI-3 PR and hit the following issue:
>>
>> Our IPSAN has two controllers (C1/C2). Our host is connected to both
>> controllers. So we have 2 I_T nexus. The same key is registered on each I_T
>> nexus. When we issue SCSI-3 PR RESERE through generic multipath, the
>> reservation will be placed on one of the controllers (say, C1). Now if we
>> issue SCSI-3 PR RELEASE, generic multipath will select _only_ one of its
>> paths based on user defined path selection policy. The problem is that, if
>> the selected path leads to controller C2, which does not hold the
>> reservation, the reservation won't be released - and what's more, since the
>> reservation is done by the same host, the RELEASE command returns
>> successfully. So this gives us the illusion that the reservation is released
>> but in fact it is still there.
>>
>> I know that there's a user space tool mpathpersist that can handle this
>> problem by issuing release command on all available paths, but here I am
>> talking about doing same thing in kernel space. So my question is, is there
>> anyone who is working on supporting SCSI-3 PR in a multiple controller
>> environment?
>>
>> Please advise. Thanks in advance!
> 
> Which kernel are you using?  Christoph (now cc'd) has introduced a new
> PR api.  I merged a fix from him relatively recently (commit 9c72bad1f3
> "dm: call PR reserve/unreserve on each underlying device")
> 
> Mike
> 
> .
> 
Hello Mike and Christoph,

Thanks for the feedback! We are using ancient 3.0.x kernel so I guess we need to
backport the PR API.

I briefly reviewed the PR API. If I understand correctly, a bunch of PR
dedicated operations (pr_ops) are defined in block_device_operations, which
includes register, reserve, release, preempt and clear operations. But among
these operations, only register() calls dm_call_pr(), which in turns iterates
over the device for each path and does the registeration.

My point is, we also need to do this in release operation, so that we can
eliminate the possibility of trying to release the reservation on the wrong
controller and get a fake success.

Also, since device mapper or multipath doesn't have any idea of which path
(and/or I_T nexus) has registered its key or holds the reservation, it's quite
difficult for the APIs to handle failures like path breakage. Right now, it is
up to the API user to do the error handling, which, IMHO, is painful. I am
wondering whether this could be done within dm/multipath. Please advise.

Thanks,
Yiwen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-09-22  8:14   ` jiangyiwen
@ 2016-09-26 16:03     ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2016-09-26 16:03 UTC (permalink / raw)
  To: jiangyiwen; +Cc: wangyibin, dm-devel, hch, Mike Snitzer

On Thu, Sep 22, 2016 at 04:14:50PM +0800, jiangyiwen wrote:
> I briefly reviewed the PR API. If I understand correctly, a bunch of PR
> dedicated operations (pr_ops) are defined in block_device_operations, which
> includes register, reserve, release, preempt and clear operations. But among
> these operations, only register() calls dm_call_pr(), which in turns iterates
> over the device for each path and does the registeration.

Register and unregister in fact, but they both multiple throught the
same method.

> My point is, we also need to do this in release operation, so that we can
> eliminate the possibility of trying to release the reservation on the wrong
> controller and get a fake success.

dm_grab_bdev_for_ioctl ensures that we have a working path.  At least
that's the theory and my testing confirms it.  If you know a case where
the release doesn't work using this path we'll need to fix.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-10-13  9:19   ` Yibin Wang
  2016-10-13 11:16     ` Yibin Wang
@ 2016-10-14 14:51     ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-14 14:51 UTC (permalink / raw)
  To: Yibin Wang; +Cc: Jiangyiwen, dm-devel, Christoph Hellwig, Mike Snitzer

Hi Yibin,

>>> 	- For reserve/query/preempt/clear, we should return success once an
>>> 	 iteration returns successfully.
>> That's what the dm_grab_bdev_for_ioctl path does.
>
> If I understand correctly, dm_grab_bdev_for_ioctl() select a working path, 
> and
> pr_*() uses that path to do the actually work.
>
> This works for reserve/query/preempt/clear, but it may not work for 
> release.

You're right - if we had a path mapping change inbetween we might
have a problem with the unregister.  That beeing said we have an even
worse problem if the path we registered on disappeared permanently,
so we'll probably need a slightly more complex loop than the one
we have right now.

> I meant the API that callers from above block layer can use - They can not 
> call
> dm_pr_*() directly. So adding blkdev_pr_ops_{register/reserve/etc}() would 
> be
> great.

Take a look at the pNFS SCSI layout driver under fs/nfs/blocklayout
which is the currently existing in-kernel user of the API, it just
calls the block device methods directly.

What additional work would you place in the ops?  Also can you please
send me a link to your user of the API?

>
>>> 5. Support for multiple targets devices.
>>> An md device might have multiple targets. Current implementation only supports
>>> single target device.
>> That's because it is so far only intended for dm-multipath, which always
>> uses as single target.  I'm not against multi-target support, but we'll
>> need a detailed explanation of the use case.
>
> OK. Fair enough. That's rather a "good-to-have" than a "must-have".

As said I'm fine with adding as an enhancement, especially if you submit
an in-kernel user for it, but a useful opensource application also
would be more than enough of a justification

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-10-13 11:16     ` Yibin Wang
@ 2016-10-14 13:48       ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-14 13:48 UTC (permalink / raw)
  To: Yibin Wang; +Cc: Jiangyiwen, dm-devel, Christoph Hellwig, Mike Snitzer

Hi Yibin,

On Thu, Oct 13, 2016 at 07:16:13PM +0800, Yibin Wang wrote:
> BTW, did you mean using dm_pr::fail_early as the indication whether
> multipath_iterate_devices() should fail early upon failure? If that's the 
> case,
> I can't see it is used anywhere except in dm_pr_register().
>
> Please correct me if I am wrong.  Thanks!

Yes, fail_early was intended to ensure we don't break out of the
loop.  But it's obviously not checked in the current code, so we'll need
to fix it indeed.  If you have existing patches for that please send
them out, else I'm going to look into it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-10-13  9:19   ` Yibin Wang
@ 2016-10-13 11:16     ` Yibin Wang
  2016-10-14 13:48       ` Christoph Hellwig
  2016-10-14 14:51     ` Christoph Hellwig
  1 sibling, 1 reply; 11+ messages in thread
From: Yibin Wang @ 2016-10-13 11:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jiangyiwen, dm-devel, Mike Snitzer



On 2016/10/13 17:19, Yibin Wang wrote:
> Hi Christoph,
>
> Thanks for your reply. Comments inline...
>
> On 2016/10/9 23:16, Christoph Hellwig wrote:
> ......
>>
>>>     - For regisetr, we should stop iteration on failure, and 
>>> followed by a
>>>      non-stopping unregister operation.
>> What do you mean with "non-stopping"?  Currently once register failed
>> for a path we then ungerigster all paths, and ignore failures (e.g. due
>> to a down path or an not already registered path).
>

BTW, did you mean using dm_pr::fail_early as the indication whether
multipath_iterate_devices() should fail early upon failure? If that's 
the case,
I can't see it is used anywhere except in dm_pr_register().

Please correct me if I am wrong.  Thanks!

> That's exactly what I meant - do not stop on failures while doing 
> unregistration.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-10-09 15:16 ` Christoph Hellwig
@ 2016-10-13  9:19   ` Yibin Wang
  2016-10-13 11:16     ` Yibin Wang
  2016-10-14 14:51     ` Christoph Hellwig
  0 siblings, 2 replies; 11+ messages in thread
From: Yibin Wang @ 2016-10-13  9:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jiangyiwen, dm-devel, Mike Snitzer

Hi Christoph,

Thanks for your reply. Comments inline...

On 2016/10/9 23:16, Christoph Hellwig wrote:
> Hi Yibin,
>
> On Sun, Oct 09, 2016 at 01:12:41PM +0000, wangyibin wrote:
>> 1. Improper dm_pr_ops implementation.
>> The dm_pr_ops functions, except register/unregister, all result in
>> infinite loop by recursively calling themselves, which will definitely cause
>> stack overflow. In fact, they should be implemented the same way as
>> register/unregister by calling iterate_devices().
> How do they recurse into themselves?  All of them do indeed call
> another method of the same name, but that only happens after
> the target ->prepare_ioctl ioctl method redirected them to a different
> device.
>
> If you can reproduce a deadlock please post the exact table setup,
> as this should not happen.

Sorry, the dead lock was caused by incomplete backport of patches. So 
this is not
an issue any more.

>> 2. Multipath device iteration policy is needed.
>> Iteration policy should be added to multipath for PR operations.
>> 	- For unregister, we should iterate on all devices.
> That's what we do.
>
>> 	- For regisetr, we should stop iteration on failure, and followed by a
>> 	 non-stopping unregister operation.
> What do you mean with "non-stopping"?  Currently once register failed
> for a path we then ungerigster all paths, and ignore failures (e.g. due
> to a down path or an not already registered path).

That's exactly what I meant - do not stop on failures while doing 
unregistration.

>
>> 	- For reserve/query/preempt/clear, we should return success once an
>> 	 iteration returns successfully.
> That's what the dm_grab_bdev_for_ioctl path does.

If I understand correctly, dm_grab_bdev_for_ioctl() select a working 
path, and
pr_*() uses that path to do the actually work.

This works for reserve/query/preempt/clear, but it may not work for 
release.

For example, if we have a device (/dev/dm-2) that is connected to two
controllers, and we have one path for each controller, we got the 
following with
"multipath -ll":

3690b11c000283cd00000112557e1b3c2 dm-2 .....
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
   |- 5:0:0:2 sda 8:112 active ready running
     `- 6:0:0:2 sdb 8:208 active ready running

Suppose we have registered the same keys (because all paths are on the same
host) on all available paths.  And then we reserved the device on path 
/dev/sda.

Now if we want to release the reservation, dm_pr_release() will grab one 
of the
paths according to path selection policy. And the grabbed path _COULD_ be
/dev/sdb. In this case, sd_pr_release() would be called on /dev/sdb 
which will
return 0  since the there's no reservation on this path. So this gives 
up the
illusion that the reservation is released while it's still placed on path
/dev/sda.

So, for dm_pr_release(), we need to use .iterate_devices() - and only 
returns 0
if one of the paths returns 0.

>
>> 3. Lack of query function.
>> Sometimes we need to query the reservation key or registered keys.
> If you have a use case for it feel free to add it.  My current user
> doesn't need it.
>
>> 4. Lack of kernel space API.
>> Currently there's only API for ioctl, which is meant to be called by user space
>> utils. I know we can still call them anyway in kernel space via the help of
>> {set,get}_fs(), but it looks ugly and unnatrual in every aspect.
> The API is currently used from kernel space, that's why I added it.
> If you point me to the code that you plan to submit which wants to use
> it I'd be happy to help you in using it.

I meant the API that callers from above block layer can use - They can 
not call
dm_pr_*() directly. So adding blkdev_pr_ops_{register/reserve/etc}() 
would be
great.

>> 5. Support for multiple targets devices.
>> An md device might have multiple targets. Current implementation only supports
>> single target device.
> That's because it is so far only intended for dm-multipath, which always
> uses as single target.  I'm not against multi-target support, but we'll
> need a detailed explanation of the use case.

OK. Fair enough. That's rather a "good-to-have" than a "must-have".

Thanks again for your reply!

Best regards,
Yibin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
  2016-10-09 13:12 wangyibin
@ 2016-10-09 15:16 ` Christoph Hellwig
  2016-10-13  9:19   ` Yibin Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-09 15:16 UTC (permalink / raw)
  To: wangyibin; +Cc: Jiangyiwen, dm-devel, Christoph Hellwig, Mike Snitzer

Hi Yibin,

On Sun, Oct 09, 2016 at 01:12:41PM +0000, wangyibin wrote:
> 1. Improper dm_pr_ops implementation.
> The dm_pr_ops functions, except register/unregister, all result in
> infinite loop by recursively calling themselves, which will definitely cause
> stack overflow. In fact, they should be implemented the same way as
> register/unregister by calling iterate_devices().

How do they recurse into themselves?  All of them do indeed call
another method of the same name, but that only happens after
the target ->prepare_ioctl ioctl method redirected them to a different
device.

If you can reproduce a deadlock please post the exact table setup,
as this should not happen.

> 
> 2. Multipath device iteration policy is needed.
> Iteration policy should be added to multipath for PR operations.
> 	- For unregister, we should iterate on all devices.

That's what we do.

> 	- For regisetr, we should stop iteration on failure, and followed by a
> 	 non-stopping unregister operation.

What do you mean with "non-stopping"?  Currently once register failed
for a path we then ungerigster all paths, and ignore failures (e.g. due
to a down path or an not already registered path).

> 	- For reserve/query/preempt/clear, we should return success once an
> 	 iteration returns successfully.

That's what the dm_grab_bdev_for_ioctl path does.

> 3. Lack of query function.
> Sometimes we need to query the reservation key or registered keys.

If you have a use case for it feel free to add it.  My current user
doesn't need it.

> 4. Lack of kernel space API.
> Currently there's only API for ioctl, which is meant to be called by user space
> utils. I know we can still call them anyway in kernel space via the help of
> {set,get}_fs(), but it looks ugly and unnatrual in every aspect.

The API is currently used from kernel space, that's why I added it.
If you point me to the code that you plan to submit which wants to use
it I'd be happy to help you in using it.

> 5. Support for multiple targets devices.
> An md device might have multiple targets. Current implementation only supports
> single target device.

That's because it is so far only intended for dm-multipath, which always
uses as single target.  I'm not against multi-target support, but we'll
need a detailed explanation of the use case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment
@ 2016-10-09 13:12 wangyibin
  2016-10-09 15:16 ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: wangyibin @ 2016-10-09 13:12 UTC (permalink / raw)
  To: Christoph Hellwig, Jiangyiwen; +Cc: dm-devel, Mike Snitzer

Hello Christoph,

There are several issues with current PR implementation:

1. Improper dm_pr_ops implementation.
The dm_pr_ops functions, except register/unregister, all result in
infinite loop by recursively calling themselves, which will definitely cause
stack overflow. In fact, they should be implemented the same way as
register/unregister by calling iterate_devices().

2. Multipath device iteration policy is needed.
Iteration policy should be added to multipath for PR operations.
	- For unregister, we should iterate on all devices.
	- For regisetr, we should stop iteration on failure, and followed by a
	 non-stopping unregister operation.
	- For reserve/query/preempt/clear, we should return success once an
	 iteration returns successfully.
	- For release, we should stop iteraton on failure.

3. Lack of query function.
Sometimes we need to query the reservation key or registered keys.

4. Lack of kernel space API.
Currently there's only API for ioctl, which is meant to be called by user space
utils. I know we can still call them anyway in kernel space via the help of
{set,get}_fs(), but it looks ugly and unnatrual in every aspect.

5. Support for multiple targets devices.
An md device might have multiple targets. Current implementation only supports
single target device.

Please comment on the issues I have written above. We have cooked some patches
for evaluation purposes. And if anyone is interested in the patches, we can post 
them in the mailing list for review.

Thanks,
	Yibin

On Sep 27, 2016 at 00:03, Christoph Hellwig Wrote:
>On Thu, Sep 22, 2016 at 04:14:50PM +0800, jiangyiwen wrote:
>> I briefly reviewed the PR API. If I understand correctly, a bunch of 
>> PR dedicated operations (pr_ops) are defined in 
>> block_device_operations, which includes register, reserve, release, 
>> preempt and clear operations. But among these operations, only 
>> register() calls dm_call_pr(), which in turns iterates over the device for each path and does the registeration.
>
>Register and unregister in fact, but they both multiple throught the same method.
>
>> My point is, we also need to do this in release operation, so that we 
>> can eliminate the possibility of trying to release the reservation on 
>> the wrong controller and get a fake success.
>
>dm_grab_bdev_for_ioctl ensures that we have a working path.  At least that's the theory and my testing confirms it.  If you know a case where the release doesn't work using this path we'll need to fix.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-10-14 14:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-21  9:14 dm-mpath: Handling SCSI-3 PR RELEASE in multi-controller environment jiangyiwen
2016-09-21 15:04 ` Mike Snitzer
2016-09-21 17:14   ` Christoph Hellwig
2016-09-22  8:14   ` jiangyiwen
2016-09-26 16:03     ` Christoph Hellwig
2016-10-09 13:12 wangyibin
2016-10-09 15:16 ` Christoph Hellwig
2016-10-13  9:19   ` Yibin Wang
2016-10-13 11:16     ` Yibin Wang
2016-10-14 13:48       ` Christoph Hellwig
2016-10-14 14:51     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.