All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi_lib: Align max_sectors to kb
@ 2024-04-18  7:00 Hannes Reinecke
  2024-04-18  7:03 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2024-04-18  7:00 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke,
	Martin Wilck

max_sectors can be modified via sysfs, but only in kb units.
Which leads to a misalignment on stacked devices if the original
max_sector size is an odd number. So align the max_sectors setting
to kb to avoid this issue.

Reported-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
 drivers/scsi/scsi_lib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2e28e2360c85..aad2ac1353d1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1983,7 +1983,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 		blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
 	}
 
-	blk_queue_max_hw_sectors(q, shost->max_sectors);
+	/* Align to kb to avoid conflicts with Sysfs settings */
+	blk_queue_max_hw_sectors(q, shost->max_sectors & ~0x1);
 	blk_queue_segment_boundary(q, shost->dma_boundary);
 	dma_set_seg_boundary(dev, shost->dma_boundary);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-18  7:00 [PATCH] scsi_lib: Align max_sectors to kb Hannes Reinecke
@ 2024-04-18  7:03 ` Christoph Hellwig
  2024-04-18  7:42   ` Hannes Reinecke
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-04-18  7:03 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	linux-scsi, Martin Wilck

On Thu, Apr 18, 2024 at 09:00:15AM +0200, Hannes Reinecke wrote:
> max_sectors can be modified via sysfs, but only in kb units.

Yes.

> Which leads to a misalignment on stacked devices if the original
> max_sector size is an odd number.

How?

Note that we really should not stack max_sectors anyway, as it's only
used for splitting in the lower device to start with.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-18  7:03 ` Christoph Hellwig
@ 2024-04-18  7:42   ` Hannes Reinecke
  2024-04-18  9:00     ` Martin Wilck
  2024-04-18 14:51     ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Hannes Reinecke @ 2024-04-18  7:42 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi, Martin Wilck

On 4/18/24 09:03, Christoph Hellwig wrote:
> On Thu, Apr 18, 2024 at 09:00:15AM +0200, Hannes Reinecke wrote:
>> max_sectors can be modified via sysfs, but only in kb units.
> 
> Yes.
> 
>> Which leads to a misalignment on stacked devices if the original
>> max_sector size is an odd number.
> 
> How?
> 

That's an issue we've been seeing during testing:
https://lore.kernel.org/dm-devel/7742003e19b5a49398067dc0c59dfa8ddeffc3d7.camel@suse.com/

While this can be fixed in userspace (Martin Wilck provided a patchset
to multipath-tools), what I find irritating is that we will always
display the max_sectors setting in kb, even if the actual value is not
kb aligned.
_And_ we allow to modify that value (again in kb units). Which means 
that we _never_ are able to reset it to its original value.

I would vastly prefer to have the actual values displayed in sysfs,
ie either have a max_sectors_kb value for the block limits or have an
max_sectors setting in sysfs, but we really should avoid this 'shift by 
1' when displaying the value.

> Note that we really should not stack max_sectors anyway, as it's only
> used for splitting in the lower device to start with.

If that's the case, why don't we inhibit the modification for 
max_sectors on the lower devices?

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-18  7:42   ` Hannes Reinecke
@ 2024-04-18  9:00     ` Martin Wilck
  2024-04-18 14:51     ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Martin Wilck @ 2024-04-18  9:00 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig, Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi

On Thu, 2024-04-18 at 09:42 +0200, Hannes Reinecke wrote:
> On 4/18/24 09:03, Christoph Hellwig wrote:
> > On Thu, Apr 18, 2024 at 09:00:15AM +0200, Hannes Reinecke wrote:
> > > max_sectors can be modified via sysfs, but only in kb units.
> > 
> > Yes.
> > 
> > > Which leads to a misalignment on stacked devices if the original
> > > max_sector size is an odd number.
> > 
> > How?
> > 
> 
> That's an issue we've been seeing during testing:
> https://lore.kernel.org/dm-devel/7742003e19b5a49398067dc0c59dfa8ddeffc3d7.camel@suse.com/
> 
> While this can be fixed in userspace (Martin Wilck provided a
> patchset
> to multipath-tools), what I find irritating is that we will always
> display the max_sectors setting in kb, even if the actual value is
> not
> kb aligned.
> _And_ we allow to modify that value (again in kb units). Which means 
> that we _never_ are able to reset it to its original value.

User space has no way to determine whether the actual max_sectors value
in the kernel is even or odd. By reading max_sectors_kb and writing it
back, one may actually change the kernel-internal value by rounding it
down to the next even number. This can cause breakage if the device
being changed is a multipath path device.

Wrt Hannes' patch: It would fix the issue on the kernel side, but user
space would still have no means to determine whether this patch applied
or not, except for checking the kernel version, which is unreliable.
For user space, it would be more helpful to add a "max_sectors" sysfs
attribute that exposes the actual value in blocks.

> > Note that we really should not stack max_sectors anyway, as it's
> > only
> > used for splitting in the lower device to start with.
> 
> If that's the case, why don't we inhibit the modification for 
> max_sectors on the lower devices?

I vote for allowing changes to max_sectors_kb only for devices that
don't have anything stacked on top, even though my late testing
indicates that it's only a real problem with request-based dm aka
multipath. After all, max_sectors only needs to be manipulated in rare
situations, and would be generally recommended to do this in an udev
rule early during boot.

Regards,
Martin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-18  7:42   ` Hannes Reinecke
  2024-04-18  9:00     ` Martin Wilck
@ 2024-04-18 14:51     ` Christoph Hellwig
  2024-04-18 16:46       ` Martin Wilck
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-04-18 14:51 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Hannes Reinecke, Martin K. Petersen,
	James Bottomley, linux-scsi, Martin Wilck

On Thu, Apr 18, 2024 at 09:42:06AM +0200, Hannes Reinecke wrote:
> While this can be fixed in userspace (Martin Wilck provided a patchset
> to multipath-tools), what I find irritating is that we will always
> display the max_sectors setting in kb, even if the actual value is not
> kb aligned.

The problem you are running into it exactly a problem of pointlessly
inheriting the value when we should not.

Other secondary issues are that we:

 a) should allow the modification in the granularity that we actually
    internally use, that is sectors
 b) apparently some drivers still use the silly BLK_SAFE_MAX_SECTORS
    limits that is too low for just about any modern hardwre.

>> Note that we really should not stack max_sectors anyway, as it's only
>> used for splitting in the lower device to start with.
>
> If that's the case, why don't we inhibit the modification for max_sectors 
> on the lower devices?

Why would we?  It makes absolutly no sense to inherit these limits,
the lower device will split anyway which is very much the point of
the immutable bio_vec work.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-18 14:51     ` Christoph Hellwig
@ 2024-04-18 16:46       ` Martin Wilck
  2024-04-19  6:20         ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Wilck @ 2024-04-18 16:46 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke
  Cc: Hannes Reinecke, Martin K. Petersen, James Bottomley, linux-scsi

On Thu, 2024-04-18 at 16:51 +0200, Christoph Hellwig wrote:
> 
> > > Note that we really should not stack max_sectors anyway, as it's
> > > only
> > > used for splitting in the lower device to start with.
> > 
> > If that's the case, why don't we inhibit the modification for
> > max_sectors 
> > on the lower devices?
> 
> Why would we?  It makes absolutly no sense to inherit these limits,
> the lower device will split anyway which is very much the point of
> the immutable bio_vec work.
> 

Sorry, I don't follow. With (request-based) dm-multipath on top of
SCSI, we hit the "over max size limit" condition in
blk_insert_cloned_request() [1], which will cause IO to fail at the dm
level. So at least in this configuration, it's crucial that the upper
device inherit the lower device's limits.

Regards,
Martin

[1] https://elixir.bootlin.com/linux/latest/source/block/blk-mq.c#L3048


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-18 16:46       ` Martin Wilck
@ 2024-04-19  6:20         ` Christoph Hellwig
  2024-04-19  8:27           ` Martin Wilck
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-04-19  6:20 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Christoph Hellwig, Hannes Reinecke, Hannes Reinecke,
	Martin K. Petersen, James Bottomley, linux-scsi

On Thu, Apr 18, 2024 at 06:46:06PM +0200, Martin Wilck wrote:
> > Why would we?  It makes absolutly no sense to inherit these limits,
> > the lower device will split anyway which is very much the point of
> > the immutable bio_vec work.
> > 
> 
> Sorry, I don't follow. With (request-based) dm-multipath on top of
> SCSI, we hit the "over max size limit" condition in
> blk_insert_cloned_request() [1], which will cause IO to fail at the dm
> level. So at least in this configuration, it's crucial that the upper
> device inherit the lower device's limits.

Oh, indeed.  Request based multipath is different from everyone else.
I really wish we could spend the effort to convert it to bio based
and remove this special case..


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-19  6:20         ` Christoph Hellwig
@ 2024-04-19  8:27           ` Martin Wilck
  2024-04-22  7:28             ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Wilck @ 2024-04-19  8:27 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Marzinski, Hannes Reinecke, Mike Snitzer
  Cc: Hannes Reinecke, Martin K. Petersen, James Bottomley, linux-scsi,
	dm-devel

(added Mike, Ben, and dm-devel)

On Fri, 2024-04-19 at 08:20 +0200, Christoph Hellwig wrote:
> On Thu, Apr 18, 2024 at 06:46:06PM +0200, Martin Wilck wrote:
> > > Why would we?  It makes absolutly no sense to inherit these
> > > limits,
> > > the lower device will split anyway which is very much the point
> > > of
> > > the immutable bio_vec work.
> > > 
> > 
> > Sorry, I don't follow. With (request-based) dm-multipath on top of
> > SCSI, we hit the "over max size limit" condition in
> > blk_insert_cloned_request() [1], which will cause IO to fail at the
> > dm
> > level. So at least in this configuration, it's crucial that the
> > upper
> > device inherit the lower device's limits.
> 
> Oh, indeed.  Request based multipath is different from everyone else.
> I really wish we could spend the effort to convert it to bio based
> and remove this special case..
> 

Well, it's just a matter of setting "queue_mode" in the multipath
configuration. But request-based has been the default since kernel v3.0
(f40c67f0f7e2 ("dm mpath: change to be request based")). While we
got bio-based dm-multipath back in v4.8 (76e33fe4e2c4 ("dm mpath:
reinstate bio-based support")), making it the default or even
removing request-based dm (in order to free the block layer from limit-
stacking requirements) will obviously require a broad discussion.

Mike has pointed out in 76e33fe4e2c4 that many of the original reasons
for introducing request-based dm-multipath have been obsoleted by
faster hardware and improvements in the generic block layer. 

I am open for discussion on this subject, but with my distribution 
maintainer hat on, I don't expect the default being changed for end
customers soon. Actually, with NVMe rising, I wonder if a major
technology switch for dm-multipath (which is effectively SCSI multipath
these days) makes sense at this point in time.

Thanks,
Martin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi_lib: Align max_sectors to kb
  2024-04-19  8:27           ` Martin Wilck
@ 2024-04-22  7:28             ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-04-22  7:28 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Christoph Hellwig, Benjamin Marzinski, Hannes Reinecke,
	Mike Snitzer, Hannes Reinecke, Martin K. Petersen,
	James Bottomley, linux-scsi, dm-devel

On Fri, Apr 19, 2024 at 10:27:53AM +0200, Martin Wilck wrote:
> Well, it's just a matter of setting "queue_mode" in the multipath
> configuration.

Yes, and no.  We'd probably at very last want a blk_plug callback to
allow making decisions for entire large I/Os instead of on a per-bio
basis.

> Mike has pointed out in 76e33fe4e2c4 that many of the original reasons
> for introducing request-based dm-multipath have been obsoleted by
> faster hardware and improvements in the generic block layer. 

Yes.

> I am open for discussion on this subject, but with my distribution 
> maintainer hat on, I don't expect the default being changed for end
> customers soon. Actually, with NVMe rising, I wonder if a major
> technology switch for dm-multipath (which is effectively SCSI multipath
> these days) makes sense at this point in time.

Well, it'll take ages until anything upstream gets into the enterprise
distros, which array users tend to use anyway.  But I'd really love
to get the ball rolling upstream rather sooner than later even if there
is no need for immediate action.  The reason is that request based
dm multipath causes a lot of special casing in the block core and
dm, and that has become a significant maintenance burden.

So if you have a chance to test the current bio based path, or help
with testing a plug callback (I can try to write a patch for that,
but I don't really have a good test setup) we can start to see where
we are in practice in terms of still needing the request based stacking,
and if there is a feasible path to move away from it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-04-22  7:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-18  7:00 [PATCH] scsi_lib: Align max_sectors to kb Hannes Reinecke
2024-04-18  7:03 ` Christoph Hellwig
2024-04-18  7:42   ` Hannes Reinecke
2024-04-18  9:00     ` Martin Wilck
2024-04-18 14:51     ` Christoph Hellwig
2024-04-18 16:46       ` Martin Wilck
2024-04-19  6:20         ` Christoph Hellwig
2024-04-19  8:27           ` Martin Wilck
2024-04-22  7:28             ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.