* Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
@ 2020-12-22 17:24 ` Hannes Reinecke
0 siblings, 0 replies; 12+ messages in thread
From: Hannes Reinecke @ 2020-12-22 17:24 UTC (permalink / raw)
To: Mike Snitzer, Christoph Hellwig, linux-block, dm-devel
Cc: Jens Axboe, Stephen Rothwell, Linux Next Mailing List,
Linux Kernel Mailing List, Alasdair G Kergon
On 12/22/20 3:53 PM, Mike Snitzer wrote:
> [added linux-block and dm-devel, if someone replies to this email to
> continue "proper discussion" _please_ at least drop sfr and linux-next
> from Cc]
>
> On Tue, Dec 22 2020 at 8:15am -0500,
> Christoph Hellwig <hch@lst.de> wrote:
>
>> Mike, Hannes,
>>
>> I think this patch is rather harmful. Why does device mapper even
>> mix file system path with a dev_t and all the other weird forms
>> parsed by name_to_dev_t, which was supposed to be be for the early
>> init code where no file system is available.
>
> OK, I'll need to revisit (unless someone beats me to it) because this
> could've easily been a blind-spot for me when the dm-init code went in.
> Any dm-init specific enabling interface shouldn't be used by more
> traditional DM interfaces. So Hannes' change might be treating symptom
> rather than the core problem (which would be better treated by factoring
> out dm-init requirements for a name_to_dev_t()-like interface?).
>
> DM has supported passing maj:min and blockdev names on DM table lines
> forever... so we'll need to be very specific about where/why things
> regressed.
>
Ok. The problem from my perspective is that device-mapper needs to
a) ensure that the arbitrary string passed in with the table definition
refers to a valid block device
and
b) the block device can be opened with O_EXCL, so that device-mapper can
then use it.
Originally (ie prior to commit 644bda6f3460) dm_get_device() just
converted the string to a 'dev_t' representation, and then the block
device itself was checked and opened in dm_get_table_device().
'lookup_bdev' was just being used to convert the path if the string was
not in the canonical major:minor format, as then it was assumed that it
referred to a block device node, and then lookup_bdev kinda makes sense.
However, lookup_bdev() now always recurses into the filesystem, causing
multipath to stall in an all-paths-down scenario.
So, the real issue is the table definiton; as it also accepts a device
to be specified by the block device _node_ name, we need to have a way
of converting that into a dev_t.
If lookup_bdev() is the wrong interface for that, by all means, please,
do tell me. I'd be happy to draft up a patch.
Alternatively, if Mike says that only major:minor is the valid format
for a table definition we can kill that code completely. But clearly _I_
cannot make the call here.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
2020-12-22 17:24 ` [dm-devel] " Hannes Reinecke
@ 2020-12-22 21:06 ` Alasdair G Kergon
-1 siblings, 0 replies; 12+ messages in thread
From: Alasdair G Kergon @ 2020-12-22 21:06 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Mike Snitzer, Christoph Hellwig, linux-block, dm-devel,
Jens Axboe, Linux Kernel Mailing List, Alasdair G Kergon
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> However, lookup_bdev() now always recurses into the filesystem, causing
> multipath to stall in an all-paths-down scenario.
I have not read the background about whatever the new problem is - I'm
jumping in cold seeing this message - but from the very beginning of
device-mapper we have strongly recommended that userspace supplies the
block device in the form MAJOR:MINOR and all our own tools do that. We
guarantee not to deadlock in these places when this is done.
We also accept the device in the form of a path name as we know there
are times when this is safe and convenient, but then we offer no
guarantees - we place the responsibility upon userspace only to do this
when it knows it is safe to do so i.e. no race and no deadlock.
Alasdair
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
@ 2020-12-22 21:06 ` Alasdair G Kergon
0 siblings, 0 replies; 12+ messages in thread
From: Alasdair G Kergon @ 2020-12-22 21:06 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Jens Axboe, Mike Snitzer, Linux Kernel Mailing List, linux-block,
dm-devel, Christoph Hellwig, Alasdair G Kergon
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> However, lookup_bdev() now always recurses into the filesystem, causing
> multipath to stall in an all-paths-down scenario.
I have not read the background about whatever the new problem is - I'm
jumping in cold seeing this message - but from the very beginning of
device-mapper we have strongly recommended that userspace supplies the
block device in the form MAJOR:MINOR and all our own tools do that. We
guarantee not to deadlock in these places when this is done.
We also accept the device in the form of a path name as we know there
are times when this is safe and convenient, but then we offer no
guarantees - we place the responsibility upon userspace only to do this
when it knows it is safe to do so i.e. no race and no deadlock.
Alasdair
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
2020-12-22 21:06 ` Alasdair G Kergon
@ 2020-12-23 8:06 ` Christoph Hellwig
-1 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-12-23 8:06 UTC (permalink / raw)
To: Hannes Reinecke, Mike Snitzer, Christoph Hellwig, linux-block,
dm-devel, Jens Axboe, Linux Kernel Mailing List,
Alasdair G Kergon
On Tue, Dec 22, 2020 at 09:06:04PM +0000, Alasdair G Kergon wrote:
> I have not read the background about whatever the new problem is - I'm
> jumping in cold seeing this message - but from the very beginning of
> device-mapper we have strongly recommended that userspace supplies the
> block device in the form MAJOR:MINOR and all our own tools do that. We
> guarantee not to deadlock in these places when this is done.
>
> We also accept the device in the form of a path name as we know there
> are times when this is safe and convenient, but then we offer no
> guarantees - we place the responsibility upon userspace only to do this
> when it knows it is safe to do so i.e. no race and no deadlock.
644bda6f346038bce7ad3ed48f7044c10dde6d47 changes that by accepting all
kinds of weirdo formats :(
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
@ 2020-12-23 8:06 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-12-23 8:06 UTC (permalink / raw)
To: Hannes Reinecke, Mike Snitzer, Christoph Hellwig, linux-block,
dm-devel, Jens Axboe, Linux Kernel Mailing List,
Alasdair G Kergon
On Tue, Dec 22, 2020 at 09:06:04PM +0000, Alasdair G Kergon wrote:
> I have not read the background about whatever the new problem is - I'm
> jumping in cold seeing this message - but from the very beginning of
> device-mapper we have strongly recommended that userspace supplies the
> block device in the form MAJOR:MINOR and all our own tools do that. We
> guarantee not to deadlock in these places when this is done.
>
> We also accept the device in the form of a path name as we know there
> are times when this is safe and convenient, but then we offer no
> guarantees - we place the responsibility upon userspace only to do this
> when it knows it is safe to do so i.e. no race and no deadlock.
644bda6f346038bce7ad3ed48f7044c10dde6d47 changes that by accepting all
kinds of weirdo formats :(
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
2020-12-22 17:24 ` [dm-devel] " Hannes Reinecke
@ 2020-12-23 8:05 ` Christoph Hellwig
-1 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-12-23 8:05 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Mike Snitzer, Christoph Hellwig, linux-block, dm-devel,
Stephen Rothwell, Alasdair G Kergon, Jens Axboe,
Linux Kernel Mailing List, Linux Next Mailing List
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> Ok. The problem from my perspective is that device-mapper needs to
> a) ensure that the arbitrary string passed in with the table definition
> refers to a valid block device
> and
> b) the block device can be opened with O_EXCL, so that device-mapper can
> then use it.
>
> Originally (ie prior to commit 644bda6f3460) dm_get_device() just converted
> the string to a 'dev_t' representation, and then the block device itself
> was checked and opened in dm_get_table_device().
> 'lookup_bdev' was just being used to convert the path if the string was not
> in the canonical major:minor format, as then it was assumed that it
> referred to a block device node, and then lookup_bdev kinda makes sense.
Yes, 644bda6f3460 is the cause of all evil, as it uses an API in a place
where it should not be used. It and the prep patch
(e6e20a7a5f3f49bfee518d5c6849107398d83912) which did the grave mistake
of making name_to_dev_t available outside of the early init code really
need to be reverted.
> However, lookup_bdev() now always recurses into the filesystem, causing
> multipath to stall in an all-paths-down scenario.
lookup_bdev always did a file system lookup, and always only accepted
a valid name in the file system.
> Alternatively, if Mike says that only major:minor is the valid format for a
> table definition we can kill that code completely. But clearly _I_ cannot
> make the call here.
Before 644bda6f3460 the table definitions only accepted a valid name in
the file system. Which is the proper interface.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
@ 2020-12-23 8:05 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-12-23 8:05 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Jens Axboe, Stephen Rothwell, Mike Snitzer,
Linux Kernel Mailing List, linux-block, dm-devel,
Linux Next Mailing List, Christoph Hellwig, Alasdair G Kergon
On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> Ok. The problem from my perspective is that device-mapper needs to
> a) ensure that the arbitrary string passed in with the table definition
> refers to a valid block device
> and
> b) the block device can be opened with O_EXCL, so that device-mapper can
> then use it.
>
> Originally (ie prior to commit 644bda6f3460) dm_get_device() just converted
> the string to a 'dev_t' representation, and then the block device itself
> was checked and opened in dm_get_table_device().
> 'lookup_bdev' was just being used to convert the path if the string was not
> in the canonical major:minor format, as then it was assumed that it
> referred to a block device node, and then lookup_bdev kinda makes sense.
Yes, 644bda6f3460 is the cause of all evil, as it uses an API in a place
where it should not be used. It and the prep patch
(e6e20a7a5f3f49bfee518d5c6849107398d83912) which did the grave mistake
of making name_to_dev_t available outside of the early init code really
need to be reverted.
> However, lookup_bdev() now always recurses into the filesystem, causing
> multipath to stall in an all-paths-down scenario.
lookup_bdev always did a file system lookup, and always only accepted
a valid name in the file system.
> Alternatively, if Mike says that only major:minor is the valid format for a
> table definition we can kill that code completely. But clearly _I_ cannot
> make the call here.
Before 644bda6f3460 the table definitions only accepted a valid name in
the file system. Which is the proper interface.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 12+ messages in thread