All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [PATCH] libmultipath: fix max_sectors_kb on adding path
@ 2023-08-23 16:24 Etienne Aujames
  2023-08-23 19:51 ` Martin Wilck
  2023-08-29 19:33 ` [dm-devel] " Martin Wilck
  0 siblings, 2 replies; 7+ messages in thread
From: Etienne Aujames @ 2023-08-23 16:24 UTC (permalink / raw)
  To: dm-devel; +Cc: mwilck

From: Etienne AUJAMES <eaujames@ddn.com>

A user can change the value of max_sectors_kb on the multipath device
and its path devices.
So when a path is deleted and then re-added, its value will not be the
same as the multipath device. In that case the IOs could be mis-sized.

On reload, this patch re-apply max_sectors_kb value of the multipath
device on its path devices.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
---
 libmultipath/configure.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 4a1c28bb..639c0905 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -593,11 +593,12 @@ sysfs_set_max_sectors_kb(struct multipath *mpp,
int is_reload)
 	ssize_t len;
 	int i, j, ret, err = 0;
 	struct udev_device *udd;
-	int max_sectors_kb;
+	int max_sectors_kb = mpp->max_sectors_kb;
 
-	if (mpp->max_sectors_kb == MAX_SECTORS_KB_UNDEF)
+	/* by default, do not initialize max_sectors_kb on the device
*/
+	if (max_sectors_kb == MAX_SECTORS_KB_UNDEF && !is_reload)
 		return 0;
-	max_sectors_kb = mpp->max_sectors_kb;
+	/* on reload, re-apply the user tuning on all the path devices
*/
 	if (is_reload) {
 		if (!has_dm_info(mpp) &&
 		    dm_get_info(mpp->alias, &mpp->dmi) != 0) {
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [dm-devel] [PATCH] libmultipath: fix max_sectors_kb on adding path
  2023-08-23 16:24 [dm-devel] [PATCH] libmultipath: fix max_sectors_kb on adding path Etienne Aujames
@ 2023-08-23 19:51 ` Martin Wilck
  2024-04-11 17:41   ` Problem with " Martin Wilck
  2023-08-29 19:33 ` [dm-devel] " Martin Wilck
  1 sibling, 1 reply; 7+ messages in thread
From: Martin Wilck @ 2023-08-23 19:51 UTC (permalink / raw)
  To: Etienne Aujames, dm-devel

On Wed, 2023-08-23 at 16:24 +0000, Etienne Aujames wrote:
> From: Etienne AUJAMES <eaujames@ddn.com>
> 
> A user can change the value of max_sectors_kb on the multipath device
> and its path devices.
> So when a path is deleted and then re-added, its value will not be
> the
> same as the multipath device. In that case the IOs could be mis-
> sized.
> 
> On reload, this patch re-apply max_sectors_kb value of the multipath
> device on its path devices.
> 
> Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>

Looks good to me.
Reviewed-by: Martin Wilck <mwilck@suse.com>



> ---
>  libmultipath/configure.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/libmultipath/configure.c b/libmultipath/configure.c
> index 4a1c28bb..639c0905 100644
> --- a/libmultipath/configure.c
> +++ b/libmultipath/configure.c
> @@ -593,11 +593,12 @@ sysfs_set_max_sectors_kb(struct multipath *mpp,
> int is_reload)
>         ssize_t len;
>         int i, j, ret, err = 0;
>         struct udev_device *udd;
> -       int max_sectors_kb;
> +       int max_sectors_kb = mpp->max_sectors_kb;
>  
> -       if (mpp->max_sectors_kb == MAX_SECTORS_KB_UNDEF)
> +       /* by default, do not initialize max_sectors_kb on the device
> */
> +       if (max_sectors_kb == MAX_SECTORS_KB_UNDEF && !is_reload)
>                 return 0;
> -       max_sectors_kb = mpp->max_sectors_kb;
> +       /* on reload, re-apply the user tuning on all the path
> devices
> */
>         if (is_reload) {
>                 if (!has_dm_info(mpp) &&
>                     dm_get_info(mpp->alias, &mpp->dmi) != 0) {

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dm-devel] [PATCH] libmultipath: fix max_sectors_kb on adding path
  2023-08-23 16:24 [dm-devel] [PATCH] libmultipath: fix max_sectors_kb on adding path Etienne Aujames
  2023-08-23 19:51 ` Martin Wilck
@ 2023-08-29 19:33 ` Martin Wilck
  1 sibling, 0 replies; 7+ messages in thread
From: Martin Wilck @ 2023-08-29 19:33 UTC (permalink / raw)
  To: Etienne Aujames, dm-devel

On Wed, 2023-08-23 at 16:24 +0000, Etienne Aujames wrote:
> From: Etienne AUJAMES <eaujames@ddn.com>
> 
> A user can change the value of max_sectors_kb on the multipath device
> and its path devices.
> So when a path is deleted and then re-added, its value will not be
> the
> same as the multipath device. In that case the IOs could be mis-
> sized.
> 
> On reload, this patch re-apply max_sectors_kb value of the multipath
> device on its path devices.
> 
> Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>

Next time, please send your patch with an email software that doesn't
corrupt the patch content by inserting line breaks.

Regards
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Problem with [PATCH] libmultipath: fix max_sectors_kb on adding path
  2023-08-23 19:51 ` Martin Wilck
@ 2024-04-11 17:41   ` Martin Wilck
  2024-04-12  6:06     ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Wilck @ 2024-04-11 17:41 UTC (permalink / raw)
  To: bmarzins, Etienne Aujames
  Cc: Hannes Reinecke, Christophe Varoqui, nilesh.javali, DM-DEVEL ML

Hello Ben, hello Etienne, 

On Wed, 2023-08-23 at 21:51 +0200, Martin Wilck wrote:
> On Wed, 2023-08-23 at 16:24 +0000, Etienne Aujames wrote:
> > From: Etienne AUJAMES <eaujames@ddn.com>
> > 
> > A user can change the value of max_sectors_kb on the multipath
> > device
> > and its path devices.
> > So when a path is deleted and then re-added, its value will not be
> > the
> > same as the multipath device. In that case the IOs could be mis-
> > sized.
> > 
> > On reload, this patch re-apply max_sectors_kb value of the
> > multipath
> > device on its path devices.
> > 
> > Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
> 
> Looks good to me.
> Reviewed-by: Martin Wilck <mwilck@suse.com>

We've seen issues in our beta-testing with this patch.

Consider a device with an odd max_sectors number (in 512 byte blocks).
If you look through the kernel sources, you'll see that there are quite
a few devices with odd max_sectors defaults. In our testing, we used
bnx2i (max_sectors = 127). 

When a multipath map is set up, the kernel will call
blk_stack_limits(), making sure that the map has a max_sectors value
that doesn't exceed the max of all path devices.
The map will have max_sectors = 127 in the bnx2i case.

When a device is added, multipathd will read max_sectors_kb from sysfs,
and obtain (127 >> 1) == 63. It will then write this value to the path
devices, causing the kernel-internal max_sectors value to be set to 
(63 << 1) = 126. But the map's max_sectors is still 127. The path
devices have now a lower limit than the map. This leads to errors like
this:

[  254.652934][  T820] blk_insert_cloned_request: over max size limit. (127 > 126)

If the map was reloaded after that, the kernel would fix this
discrepancy autmatically through the blk_stack_limits() algorithm. But
as long as this doesn't happen, the blk_insert_cloned_request() error
will repeat and possibly cause the system to hang, and even if a reload
happens, if the map is currently active, there will be a race window
during which IO may fail.

I'm working on a fix for this. My first idea was just to revert
Etienne's patch bbb77f3 ("libmultipath: fix max_sectors_kb on adding
path"). The code that exhibits the above issue is actually from 8fd4868
("libmultipath: don't set max_sectors_kb on reloads").

My thoughts so far:

1) We must strictly avoid max_sectors(path) < max_sectors(map).
2) max_sectors(path) > max_sectors(map) doesn't hurt.
3) We currently have no way to tell if the kernel value for max_sectors
of a path device is even or odd. 
4) As soon as we or someone else writes anything to max_sectors_kb, the
kernel internal value will be even. This applies to both path and map
devices.
5) The max_sectors value of a map device will be equal to the lowest
max_sectors of all path devices after each reload.
6) Therefore, if max_sectors_kb is set in multipath.conf and we have
set it at least once in a reload, we're on the safe side wrt the
scenario above. This is why we see no issue with 8fd4868
("libmultipath: don't set max_sectors_kb on reloads"), even though that
commit introduced the code that rounds the max_sectors value down.
7) Arguably, the kernel should be patched to provide the raw
max_sectors value to user space, but that will take time.
8) If users change max_sectors_kb, either for path devices or for map
devices or other stacked devices, after the mapping has been set up,
they are asking for trouble. We can't avoid errors resulting from such
user actions, and I believe it's futile to try. This is not a
multipath-specific problem, it applies to any stacked block device.

AFAIU, bbb77f3 ("libmultipath: fix max_sectors_kb on adding path") just
"fixes" a scenario where a user had manipulated max_sectors_kb, which
is a situation we can't correctly deal with anyway (see (8)). So
perhaps just reverting bbb77f3 is enough.

OTOH, 8fd4868 is somewhat inconsistent. If it's correct to set
max_sectors_kb when path devices are added to make the value consistent
with the map, it should also be correct if max_sectors_kb was not
configured. This is what bbb77f3 tried to fix.

In light of (2) above, I think we shouldn't change a path devices'
max_sectors_kb unless it's smaller than the value of the map.
If we do it this way, the scenario above couldn't occur.

Thoughts / comments?

Martin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem with [PATCH] libmultipath: fix max_sectors_kb on adding path
  2024-04-11 17:41   ` Problem with " Martin Wilck
@ 2024-04-12  6:06     ` Hannes Reinecke
  2024-04-12  7:21       ` Martin Wilck
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2024-04-12  6:06 UTC (permalink / raw)
  To: Martin Wilck, bmarzins, Etienne Aujames
  Cc: Christophe Varoqui, nilesh.javali, DM-DEVEL ML

On 4/11/24 19:41, Martin Wilck wrote:
> Hello Ben, hello Etienne,
> 
> On Wed, 2023-08-23 at 21:51 +0200, Martin Wilck wrote:
>> On Wed, 2023-08-23 at 16:24 +0000, Etienne Aujames wrote:
>>> From: Etienne AUJAMES <eaujames@ddn.com>
>>>
>>> A user can change the value of max_sectors_kb on the multipath
>>> device
>>> and its path devices.
>>> So when a path is deleted and then re-added, its value will not be
>>> the
>>> same as the multipath device. In that case the IOs could be mis-
>>> sized.
>>>
>>> On reload, this patch re-apply max_sectors_kb value of the
>>> multipath
>>> device on its path devices.
>>>
>>> Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
>>
>> Looks good to me.
>> Reviewed-by: Martin Wilck <mwilck@suse.com>
> 
> We've seen issues in our beta-testing with this patch.
> 
> Consider a device with an odd max_sectors number (in 512 byte blocks).
> If you look through the kernel sources, you'll see that there are quite
> a few devices with odd max_sectors defaults. In our testing, we used
> bnx2i (max_sectors = 127).
> 
> When a multipath map is set up, the kernel will call
> blk_stack_limits(), making sure that the map has a max_sectors value
> that doesn't exceed the max of all path devices.
> The map will have max_sectors = 127 in the bnx2i case.
> 
> When a device is added, multipathd will read max_sectors_kb from sysfs,
> and obtain (127 >> 1) == 63. It will then write this value to the path
> devices, causing the kernel-internal max_sectors value to be set to
> (63 << 1) = 126. But the map's max_sectors is still 127. The path
> devices have now a lower limit than the map. This leads to errors like
> this:
> 
> [  254.652934][  T820] blk_insert_cloned_request: over max size limit. (127 > 126)
> 
> If the map was reloaded after that, the kernel would fix this
> discrepancy autmatically through the blk_stack_limits() algorithm. But
> as long as this doesn't happen, the blk_insert_cloned_request() error
> will repeat and possibly cause the system to hang, and even if a reload
> happens, if the map is currently active, there will be a race window
> during which IO may fail.
> 
> I'm working on a fix for this. My first idea was just to revert
> Etienne's patch bbb77f3 ("libmultipath: fix max_sectors_kb on adding
> path"). The code that exhibits the above issue is actually from 8fd4868
> ("libmultipath: don't set max_sectors_kb on reloads").
> 
> My thoughts so far:
> 
> 1) We must strictly avoid max_sectors(path) < max_sectors(map).
> 2) max_sectors(path) > max_sectors(map) doesn't hurt.
> 3) We currently have no way to tell if the kernel value for max_sectors
> of a path device is even or odd.
> 4) As soon as we or someone else writes anything to max_sectors_kb, the
> kernel internal value will be even. This applies to both path and map
> devices.
> 5) The max_sectors value of a map device will be equal to the lowest
> max_sectors of all path devices after each reload.
> 6) Therefore, if max_sectors_kb is set in multipath.conf and we have
> set it at least once in a reload, we're on the safe side wrt the
> scenario above. This is why we see no issue with 8fd4868
> ("libmultipath: don't set max_sectors_kb on reloads"), even though that
> commit introduced the code that rounds the max_sectors value down.
> 7) Arguably, the kernel should be patched to provide the raw
> max_sectors value to user space, but that will take time.
> 8) If users change max_sectors_kb, either for path devices or for map
> devices or other stacked devices, after the mapping has been set up,
> they are asking for trouble. We can't avoid errors resulting from such
> user actions, and I believe it's futile to try. This is not a
> multipath-specific problem, it applies to any stacked block device.
> 
> AFAIU, bbb77f3 ("libmultipath: fix max_sectors_kb on adding path") just
> "fixes" a scenario where a user had manipulated max_sectors_kb, which
> is a situation we can't correctly deal with anyway (see (8)). So
> perhaps just reverting bbb77f3 is enough.
> 
> OTOH, 8fd4868 is somewhat inconsistent. If it's correct to set
> max_sectors_kb when path devices are added to make the value consistent
> with the map, it should also be correct if max_sectors_kb was not
> configured. This is what bbb77f3 tried to fix.
> 
> In light of (2) above, I think we shouldn't change a path devices'
> max_sectors_kb unless it's smaller than the value of the map.
> If we do it this way, the scenario above couldn't occur.
> 
We have gone into great pains in the kernel to ensure the queue limits
are sane, and updated correctly. Even for stacking devices.
I sinserely doubt we need this patch from multipath anymore.
Having to adjust max_sectors_kb really should be reserved for
corner-cases where the user has a dodgy hardware which doesn't
report correct limits.
But even that should rather be handled by blacklisting.
Can't we just set max_sectors_kb to readonly in the kernel and
be done with it?

BTW: Brilliant analysis!

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                               +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem with [PATCH] libmultipath: fix max_sectors_kb on adding path
  2024-04-12  6:06     ` Hannes Reinecke
@ 2024-04-12  7:21       ` Martin Wilck
  2024-04-12  8:25         ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Wilck @ 2024-04-12  7:21 UTC (permalink / raw)
  To: Hannes Reinecke, bmarzins, Etienne Aujames
  Cc: Christophe Varoqui, nilesh.javali, DM-DEVEL ML

On Fri, 2024-04-12 at 08:06 +0200, Hannes Reinecke wrote:
> > 
> We have gone into great pains in the kernel to ensure the queue
> limits
> are sane, and updated correctly. Even for stacking devices.

This is true, but only for the creation of stacked devices (table
activation, as far as device mapper is concerned). Admins are free to
change max_sectors_kb any time; there's no propagation of changed
settings along the device stack, and no sanity checking in the kernel
prevents them from setting values that will cause I/O errors.

> I sinserely doubt we need this patch from multipath anymore.
> Having to adjust max_sectors_kb really should be reserved for
> corner-cases where the user has a dodgy hardware which doesn't
> report correct limits.

Right. We've seen a couple of cases where decreasing max_sectors_kb
from the default value was the only remedy for weird I/O failures. This
happened with remote storage reporting wrong limits, misbehaving
elements in the fabric, and even with virtualized IO stacks.

> But even that should rather be handled by blacklisting.
> Can't we just set max_sectors_kb to readonly in the kernel and
> be done with it?

Personally, I think this goes a bit too far. I believe the kernel
should disallow changing (more specifically, decreasing) the
max_sectors_kb sysfs attribute for block devices that are either in use
(bd_openers > 0) or held by other block devices (bd_holder != NULL).
That would eliminate a large portion of bad cases, AFAICS. Admins could
still increase max_sectors_kb at the top of the device stack, but that
would arguably count as shooting oneself into the foot.

Errors in valid configurations are possible, even without changing
max_sectors_kb in sysfs. Consider a multipath map consisting of devices
with different max_sectors (for example mixed iSCSI/tcp and
iSCSI/bnx2i). If only the paths with large max_sectors are initially
detected, and others are added later, the map's max_sectors will be
decreased while in use, and the change will not be propagated to
stacked block layers above multipath: bummer. The only way to avoid
this in general is implementing limit propagation. I assume that the
implementation of block limit propagation in the kernel would be a
major effort with lots of possible race conditions. It's far easier to
have admins simply impose max_sectors_kb on multipath maps in corner
case scenarios like this.

Regards,
Martin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem with [PATCH] libmultipath: fix max_sectors_kb on adding path
  2024-04-12  7:21       ` Martin Wilck
@ 2024-04-12  8:25         ` Hannes Reinecke
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2024-04-12  8:25 UTC (permalink / raw)
  To: Martin Wilck, bmarzins, Etienne Aujames
  Cc: Christophe Varoqui, nilesh.javali, DM-DEVEL ML

On 4/12/24 09:21, Martin Wilck wrote:
> On Fri, 2024-04-12 at 08:06 +0200, Hannes Reinecke wrote:
>>>
>> We have gone into great pains in the kernel to ensure the queue
>> limits
>> are sane, and updated correctly. Even for stacking devices.
> 
> This is true, but only for the creation of stacked devices (table
> activation, as far as device mapper is concerned). Admins are free to
> change max_sectors_kb any time; there's no propagation of changed
> settings along the device stack, and no sanity checking in the kernel
> prevents them from setting values that will cause I/O errors.
> 
>> I sinserely doubt we need this patch from multipath anymore.
>> Having to adjust max_sectors_kb really should be reserved for
>> corner-cases where the user has a dodgy hardware which doesn't
>> report correct limits.
> 
> Right. We've seen a couple of cases where decreasing max_sectors_kb
> from the default value was the only remedy for weird I/O failures. This
> happened with remote storage reporting wrong limits, misbehaving
> elements in the fabric, and even with virtualized IO stacks.
> 
>> But even that should rather be handled by blacklisting.
>> Can't we just set max_sectors_kb to readonly in the kernel and
>> be done with it?
> 
> Personally, I think this goes a bit too far. I believe the kernel
> should disallow changing (more specifically, decreasing) the
> max_sectors_kb sysfs attribute for block devices that are either in use
> (bd_openers > 0) or held by other block devices (bd_holder != NULL).
> That would eliminate a large portion of bad cases, AFAICS. Admins could
> still increase max_sectors_kb at the top of the device stack, but that
> would arguably count as shooting oneself into the foot.
> 
> Errors in valid configurations are possible, even without changing
> max_sectors_kb in sysfs. Consider a multipath map consisting of devices
> with different max_sectors (for example mixed iSCSI/tcp and
> iSCSI/bnx2i). If only the paths with large max_sectors are initially
> detected, and others are added later, the map's max_sectors will be
> decreased while in use, and the change will not be propagated to
> stacked block layers above multipath: bummer. The only way to avoid
> this in general is implementing limit propagation. I assume that the
> implementation of block limit propagation in the kernel would be a
> major effort with lots of possible race conditions. It's far easier to
> have admins simply impose max_sectors_kb on multipath maps in corner
> case scenarios like this.
> 
Indeed; changing the max_sectors_kb while the queue is live should
be preceeded with a quiesce, to ensure that we're not changing request
limits in flight.

And disabling setting of max_sectors_kb for stacked devices is a good
idea, too.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-04-12  8:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-23 16:24 [dm-devel] [PATCH] libmultipath: fix max_sectors_kb on adding path Etienne Aujames
2023-08-23 19:51 ` Martin Wilck
2024-04-11 17:41   ` Problem with " Martin Wilck
2024-04-12  6:06     ` Hannes Reinecke
2024-04-12  7:21       ` Martin Wilck
2024-04-12  8:25         ` Hannes Reinecke
2023-08-29 19:33 ` [dm-devel] " Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.