All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-02 16:45 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-05-02 16:45 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

On Wed, 2018-05-02 at 13:40 +0000, Andrey Kuzmin wrote:
> So how do we move this forward? Should I submit a patch based on the below
> approach, or just remove claimed check (and, presumably, leave bdev_open
> interface as is so that the caller can still limit access to read-only if
> desired)?

My vote (and Jim's) is to remove the check in spdk_bdev_claim. Don't change any
of the public APIs. If you submit a patch for this, we'll review and merge.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-09 21:57 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-05-09 21:57 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3312 bytes --]

On Wed, 2018-05-09 at 17:58 +0000, Andrey Kuzmin wrote:
> 
> 
> On Wed, May 9, 2018, 20:32 Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> > On Wed, 2018-05-09 at 08:12 +0000, Andrey Kuzmin wrote:
> > > Ben,
> > > 
> > > the patch hs been submitted a week back, and then updated to account for
> > > review coments, but I still don't see it merged. Is there anything I can
> > do to
> > > make this happen?
> > 
> > Hi Andrey,
> > 
> > The patch is failing one of the tests. The test output is linked in the
> > comments
> > of the review.
> 
> If you're referring to bdev unit tests, I have fixed that already in the
> updated patch.
> 
> > I re-ran it a few times and it consistently fails. I don't think
> > the problem is going to end up being your change, but rather your change
> > invalidates some assumptions or expected behavior in one of the tests. That
> > needs to get debugged and resolved before we can merge.
> 
> If you detail on what test is failing (I'm not sure as all unitests pass
> locally), I could give it a shot as well.

The patch is passing the unit tests. It's a functional test that's failing. Here
 is the dashboard for the test results for your patch:

https://ci.spdk.io/spdk/builds/review/14e40f86993e67f4107231fc25e239491d89ef08.1
525882846/

If you click fedora-08, which is the system that failed, and then click
'build.log' you'll see that a test failed because an assert was hit in
blobstore. It happens to hit when it is opening a bdev to check if it has a
blobstore present on it - precisely the area of code that you changed.

The failing test kicks off in test/lvol/lvol.sh, but for whatever reason that
set of tests is calling a fairly sizable python script to execute the tests
(test/lvol/lvol_test.py). Our general policy is that functional tests should be
written as bash scripts calling small tools and the applications under test, but
this set of tests seems to have slipped through the cracks. That's going to make
this a bit more annoying to debug, unfortunately.

The failing test in test/lvol/lvol_test.py is #700. Again, numbering tests is
not our general policy and pattern, but this specific python script numbers
them. It's hitting an assert when it tries to load up a blobstore from an NVMe
device - the metadata on the disk is invalid.

I suspected that there is some other problem that your changes are simply
uncovering, so I put in a few print statements that trigger any time a
spdk_bdev_open is called but fails due to the write permission check in the
current code. Sure enough, this hits in this lvol test all over. It turns out
that there is a code path that is relying on spdk_bdev_open to get blocked by a
previous claim in that test. We need to deal with that in a separate patch, then
your patch will pass the tests. The problem is the same one identified here:

https://github.com/spdk/spdk/issues/293


> 
> Thanks,
> A.
> > Thanks,
> > Ben
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> 
> -- 
> Regards,
> Andrey
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-09 17:58 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-09 17:58 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1177 bytes --]

On Wed, May 9, 2018, 20:32 Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Wed, 2018-05-09 at 08:12 +0000, Andrey Kuzmin wrote:
> > Ben,
> >
> > the patch hs been submitted a week back, and then updated to account for
> > review coments, but I still don't see it merged. Is there anything I can
> do to
> > make this happen?
>
> Hi Andrey,
>
> The patch is failing one of the tests. The test output is linked in the
> comments
> of the review.


If you're referring to bdev unit tests, I have fixed that already in the
updated patch.

I re-ran it a few times and it consistently fails. I don't think
> the problem is going to end up being your change, but rather your change
> invalidates some assumptions or expected behavior in one of the tests. That
> needs to get debugged and resolved before we can merge.
>

If you detail on what test is failing (I'm not sure as all unitests pass
locally), I could give it a shot as well.

Thanks,
A.

>
> Thanks,
> Ben
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 2206 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-09 17:32 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-05-09 17:32 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

On Wed, 2018-05-09 at 08:12 +0000, Andrey Kuzmin wrote:
> Ben,
> 
> the patch hs been submitted a week back, and then updated to account for
> review coments, but I still don't see it merged. Is there anything I can do to
> make this happen?

Hi Andrey,

The patch is failing one of the tests. The test output is linked in the comments
of the review. I re-ran it a few times and it consistently fails. I don't think
the problem is going to end up being your change, but rather your change
invalidates some assumptions or expected behavior in one of the tests. That
needs to get debugged and resolved before we can merge.

Thanks,
Ben

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-09  8:12 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-09  8:12 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]

On Thu, May 3, 2018, 01:07 Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com> wrote:

>
>
> On Wed, May 2, 2018, 19:45 Walker, Benjamin <benjamin.walker(a)intel.com>
> wrote:
>
>> On Wed, 2018-05-02 at 13:40 +0000, Andrey Kuzmin wrote:
>> > So how do we move this forward? Should I submit a patch based on the
>> below
>> > approach, or just remove claimed check (and, presumably, leave bdev_open
>> > interface as is so that the caller can still limit access to read-only
>> if
>> > desired)?
>>
>> My vote (and Jim's) is to remove the check in spdk_bdev_claim. Don't
>> change any
>> of the public APIs. If you submit a patch for this, we'll review and
>> merge.
>>
>
Ben,

the patch hs been submitted a week back, and then updated to account for
review coments, but I still don't see it merged. Is there anything I can do
to make this happen?

Regards,
Andrey


> Ok, will do tomorrow.
>
> Regards,
> A.
>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
> --
>
> Regards,
> Andrey
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 2524 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-02 22:07 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-02 22:07 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

On Wed, May 2, 2018, 19:45 Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Wed, 2018-05-02 at 13:40 +0000, Andrey Kuzmin wrote:
> > So how do we move this forward? Should I submit a patch based on the
> below
> > approach, or just remove claimed check (and, presumably, leave bdev_open
> > interface as is so that the caller can still limit access to read-only if
> > desired)?
>
> My vote (and Jim's) is to remove the check in spdk_bdev_claim. Don't
> change any
> of the public APIs. If you submit a patch for this, we'll review and merge.
>

Ok, will do tomorrow.

Regards,
A.

> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 1470 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-02 13:58 Luse, Paul E
  0 siblings, 0 replies; 17+ messages in thread
From: Luse, Paul E @ 2018-05-02 13:58 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 8132 bytes --]

Awesome Andrey!  Given the quick preview of the changes below I’m sort of leaning towards just removing the check and adding some comments in the code and an entry in the changelog. I don’t feel like the additional code below is really worth what it buys us.  You can see what Jim or Ben think though as they’ve both chimed in on this.

And, of course, you never need to ask before submitting a patch…GerritHub is meant to discuss these kinds of things.  Understand if you’d rather hold off but for small stuff like this it’s usually best just to submit the patch and let the review process work through it.

Thx
Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
Sent: Wednesday, May 2, 2018 6:41 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Bdev claim and single writer semantics

So how do we move this forward? Should I submit a patch based on the below approach, or just remove claimed check (and, presumably, leave bdev_open interface as is so that the caller can still limit access to read-only if desired)?

Regards,
Andrey
On Mon, Apr 30, 2018, 20:26 Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com<mailto:andrey.v.kuzmin(a)gmail.com>> wrote:
It looks like the resistance/flexibility sweet spot is in delegating
the semantic decision to the module's author by adding a boolean to
the module definition, to be set in the module's declaration if single
writer semantics is desired. This seems to deliver both the least
resistance path toward changing the current default and some
flexibility further down the road.

I have attached the patch sketch inline below so as to get the go/no-go input.

Regards,
Andrey

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 46a845c..aa8a9e7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,28 @@
 # Changelog

+### bdev
+
+Default block device sharing mode has been changed.
+
+Previously, multiple write descriptors could be used only for block
devices that
+didn't have any claims (and thus no virtual block devices stacked
upon it). This
+is now the default behavior for ANY block device, including claimed ones.
+
+To revert to the previos behavior, virtual block device module must explicitly
+claim single-writer semantics in the module's declaration as in the
example below.
+
+```C
+static struct spdk_bdev_module single_writer_module_if = {
+ .name = "single-writer-virtual-block-device",
+ .single_writer = true,
+ /* Other module declarations */
+};
+```
+
+
+
 ## v18.04: Logical Volume Snapshot/Clone, iSCSI Initiator, Bdev QoS,
VPP Userspace TCP/IP

 ### vhost

 The SPDK vhost-scsi, vhost-blk and vhost-nvme applications have fixes
to address the
diff --git a/include/spdk_internal/bdev.h b/include/spdk_internal/bdev.h

index 9988d4d..2d37cdd 100644
--- a/include/spdk_internal/bdev.h
+++ b/include/spdk_internal/bdev.h
@@ -131,10 +131,17 @@
  * create their own vbdevs.
  */
  void (*examine)(struct spdk_bdev *bdev);

  /**
+ * Specifies if the module enforces single-writer semantics for bdevs
it claims.
+ * If set, any bdev claimed by this module will be writable only via a bdev
+ * descriptor obtained by opening this module's bdev.
+ */
+ bool single_writer;
+
+ /**
  * Count of bdev inits/examinations in progress. Used by generic bdev
  * layer and must not be modified by bdev modules.
  *
  * \note Used internally by bdev subsystem, don't change this value
in bdev module.
  */
diff --git a/lib/bdev/bdev.c b/lib/bdev/bdev.c
index 93b222f..a730e45 100644
--- a/lib/bdev/bdev.c
+++ b/lib/bdev/bdev.c
@@ -2700,12 +2700,14 @@
  return -ENOMEM;
  }

  pthread_mutex_lock(&bdev->mutex);

- if (write && bdev->claim_module) {
- SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s - already claimed\n",
bdev->name);
+ if (write && bdev->claim_module && bdev->claim_module->single_writer) {
+ SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s as writable - "
+ "claimed by single-writer module %s\n", bdev->name,
+ bdev->claim_module->name);
  free(desc);
  pthread_mutex_unlock(&bdev->mutex);
  return -EPERM;
  }

Regards,
Andrey


On Sun, Apr 29, 2018 at 12:35 AM, Andrey Kuzmin
<andrey.v.kuzmin(a)gmail.com<mailto:andrey.v.kuzmin(a)gmail.com>> wrote:
>
>
> On Sun, Apr 29, 2018, 00:14 Luse, Paul E <paul.e.luse(a)intel.com<mailto:paul.e.luse(a)intel.com>> wrote:
>>
>> Maybe make the check a compilation option and default to off? That makes
>> it very clear that it’s a consideration at least and easy to flip it however
>> depending...
>
>
> Whatever the mechanism - compile- or configuration time - there still will
> be some coding involved, either making some code conditional or checking the
> config option at runtime. Personally, I think compile-time is best suited
> for configuring the build rather than the installation. Just my $.02.
>
> Regards,
> Andrey
>>
>>
>> Thx
>> Paul
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>] On Behalf Of Harris, James R
>> Sent: Friday, April 27, 2018 4:06 PM
>> To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
>> Subject: Re: [SPDK] Bdev claim and single writer semantics
>>
>>
>>
>> On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin"
>> <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org> on behalf of andrey.v.kuzmin(a)gmail.com<mailto:andrey.v.kuzmin(a)gmail.com>> wrote:
>>
>>     On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
>>     <benjamin.walker(a)intel.com<mailto:benjamin.walker(a)intel.com>> wrote:
>>     > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
>>     >> Hi all,
>>   >>
>>
>> <snip>
>>
>>     >> While adding support for the shared access semantics seems to be
>>     >> pretty straightforward, I thought it reasonable to bring the issue
>> up
>>     >> here first, looking for insights, comments, and objections. Please
>> let
>>     >> me know if there are any or I can give a shot to a shared access
>>     >> patch.
>>     >
>>     > Your analysis is all correct. There are two access control
>> mechanisms in the
>>     > bdev layer for different purposes; claiming, which is for stacking
>> bdevs
>>     > together to make I/O pipelines, and descriptors, which is standard
>> read/write
>>     > access control for consumers of bdevs. We thought it made the most
>> sense to
>>     > allow write access to a bdev from only a single module for safety
>> reasons.
>>
>>     What seems to be a fairly reasonable safety measure for a
>> general-purpose
>>     I/O stack can easily become an unwarranted constraint in a
>> tightly-controlled
>>     (say, appliance) settings. Turning single writer into a stack-level
>>     configurable
>>     option could buy us the best of both worlds at once.
>>
>> I’m wondering if we should just remove the safety check.  The original
>> concern was to prevent the overwriting of metadata.  For example, an NVMe
>> bdev contains a logical volume store – the lvol bdev module has claimed the
>> bdev, opening and writing directly to the NVMe bdev could corrupt the
>> metadata.
>>
>> But I agree there are a lot of use cases where you may still want to allow
>> this for one reason or another.  And the Linux kernel doesn’t try to prevent
>> this – you can put an LVM PV/VG on a block device and then write zeroes to
>> the underlying block device.  Same with using parted to put down a GPT and
>> creating a partition.
>>
>> So basically, it’s up to the user to know what they are doing.
>>
>> -Jim
>>
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
>> https://lists.01.org/mailman/listinfo/spdk
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
>> https://lists.01.org/mailman/listinfo/spdk
>
> --
>
> Regards,
> Andrey
--

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 14329 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-05-02 13:40 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-02 13:40 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7373 bytes --]

So how do we move this forward? Should I submit a patch based on the below
approach, or just remove claimed check (and, presumably, leave bdev_open
interface as is so that the caller can still limit access to read-only if
desired)?

Regards,
Andrey

On Mon, Apr 30, 2018, 20:26 Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com> wrote:

> It looks like the resistance/flexibility sweet spot is in delegating
> the semantic decision to the module's author by adding a boolean to
> the module definition, to be set in the module's declaration if single
> writer semantics is desired. This seems to deliver both the least
> resistance path toward changing the current default and some
> flexibility further down the road.
>
> I have attached the patch sketch inline below so as to get the go/no-go
> input.
>
> Regards,
> Andrey
>
> diff --git a/CHANGELOG.md b/CHANGELOG.md
> index 46a845c..aa8a9e7 100644
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -1,7 +1,28 @@
>  # Changelog
>
> +### bdev
> +
> +Default block device sharing mode has been changed.
> +
> +Previously, multiple write descriptors could be used only for block
> devices that
> +didn't have any claims (and thus no virtual block devices stacked
> upon it). This
> +is now the default behavior for ANY block device, including claimed ones.
> +
> +To revert to the previos behavior, virtual block device module must
> explicitly
> +claim single-writer semantics in the module's declaration as in the
> example below.
> +
> +```C
> +static struct spdk_bdev_module single_writer_module_if = {
> + .name = "single-writer-virtual-block-device",
> + .single_writer = true,
> + /* Other module declarations */
> +};
> +```
> +
> +
> +
>  ## v18.04: Logical Volume Snapshot/Clone, iSCSI Initiator, Bdev QoS,
> VPP Userspace TCP/IP
>
>  ### vhost
>
>  The SPDK vhost-scsi, vhost-blk and vhost-nvme applications have fixes
> to address the
> diff --git a/include/spdk_internal/bdev.h b/include/spdk_internal/bdev.h
>
> index 9988d4d..2d37cdd 100644
> --- a/include/spdk_internal/bdev.h
> +++ b/include/spdk_internal/bdev.h
> @@ -131,10 +131,17 @@
>   * create their own vbdevs.
>   */
>   void (*examine)(struct spdk_bdev *bdev);
>
>   /**
> + * Specifies if the module enforces single-writer semantics for bdevs
> it claims.
> + * If set, any bdev claimed by this module will be writable only via a
> bdev
> + * descriptor obtained by opening this module's bdev.
> + */
> + bool single_writer;
> +
> + /**
>   * Count of bdev inits/examinations in progress. Used by generic bdev
>   * layer and must not be modified by bdev modules.
>   *
>   * \note Used internally by bdev subsystem, don't change this value
> in bdev module.
>   */
> diff --git a/lib/bdev/bdev.c b/lib/bdev/bdev.c
> index 93b222f..a730e45 100644
> --- a/lib/bdev/bdev.c
> +++ b/lib/bdev/bdev.c
> @@ -2700,12 +2700,14 @@
>   return -ENOMEM;
>   }
>
>   pthread_mutex_lock(&bdev->mutex);
>
> - if (write && bdev->claim_module) {
> - SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s - already claimed\n",
> bdev->name);
> + if (write && bdev->claim_module && bdev->claim_module->single_writer) {
> + SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s as writable - "
> + "claimed by single-writer module %s\n", bdev->name,
> + bdev->claim_module->name);
>   free(desc);
>   pthread_mutex_unlock(&bdev->mutex);
>   return -EPERM;
>   }
>
> Regards,
> Andrey
>
>
> On Sun, Apr 29, 2018 at 12:35 AM, Andrey Kuzmin
> <andrey.v.kuzmin(a)gmail.com> wrote:
> >
> >
> > On Sun, Apr 29, 2018, 00:14 Luse, Paul E <paul.e.luse(a)intel.com> wrote:
> >>
> >> Maybe make the check a compilation option and default to off? That makes
> >> it very clear that it’s a consideration at least and easy to flip it
> however
> >> depending...
> >
> >
> > Whatever the mechanism - compile- or configuration time - there still
> will
> > be some coding involved, either making some code conditional or checking
> the
> > config option at runtime. Personally, I think compile-time is best suited
> > for configuring the build rather than the installation. Just my $.02.
> >
> > Regards,
> > Andrey
> >>
> >>
> >> Thx
> >> Paul
> >>
> >> -----Original Message-----
> >> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris,
> James R
> >> Sent: Friday, April 27, 2018 4:06 PM
> >> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> >> Subject: Re: [SPDK] Bdev claim and single writer semantics
> >>
> >>
> >>
> >> On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin"
> >> <spdk-bounces(a)lists.01.org on behalf of andrey.v.kuzmin(a)gmail.com>
> wrote:
> >>
> >>     On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
> >>     <benjamin.walker(a)intel.com> wrote:
> >>     > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
> >>     >> Hi all,
> >>   >>
> >>
> >> <snip>
> >>
> >>     >> While adding support for the shared access semantics seems to be
> >>     >> pretty straightforward, I thought it reasonable to bring the
> issue
> >> up
> >>     >> here first, looking for insights, comments, and objections.
> Please
> >> let
> >>     >> me know if there are any or I can give a shot to a shared access
> >>     >> patch.
> >>     >
> >>     > Your analysis is all correct. There are two access control
> >> mechanisms in the
> >>     > bdev layer for different purposes; claiming, which is for stacking
> >> bdevs
> >>     > together to make I/O pipelines, and descriptors, which is standard
> >> read/write
> >>     > access control for consumers of bdevs. We thought it made the most
> >> sense to
> >>     > allow write access to a bdev from only a single module for safety
> >> reasons.
> >>
> >>     What seems to be a fairly reasonable safety measure for a
> >> general-purpose
> >>     I/O stack can easily become an unwarranted constraint in a
> >> tightly-controlled
> >>     (say, appliance) settings. Turning single writer into a stack-level
> >>     configurable
> >>     option could buy us the best of both worlds at once.
> >>
> >> I’m wondering if we should just remove the safety check.  The original
> >> concern was to prevent the overwriting of metadata.  For example, an
> NVMe
> >> bdev contains a logical volume store – the lvol bdev module has claimed
> the
> >> bdev, opening and writing directly to the NVMe bdev could corrupt the
> >> metadata.
> >>
> >> But I agree there are a lot of use cases where you may still want to
> allow
> >> this for one reason or another.  And the Linux kernel doesn’t try to
> prevent
> >> this – you can put an LVM PV/VG on a block device and then write zeroes
> to
> >> the underlying block device.  Same with using parted to put down a GPT
> and
> >> creating a partition.
> >>
> >> So basically, it’s up to the user to know what they are doing.
> >>
> >> -Jim
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> SPDK mailing list
> >> SPDK(a)lists.01.org
> >> https://lists.01.org/mailman/listinfo/spdk
> >> _______________________________________________
> >> SPDK mailing list
> >> SPDK(a)lists.01.org
> >> https://lists.01.org/mailman/listinfo/spdk
> >
> > --
> >
> > Regards,
> > Andrey
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 9450 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-30 17:26 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-04-30 17:26 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6566 bytes --]

It looks like the resistance/flexibility sweet spot is in delegating
the semantic decision to the module's author by adding a boolean to
the module definition, to be set in the module's declaration if single
writer semantics is desired. This seems to deliver both the least
resistance path toward changing the current default and some
flexibility further down the road.

I have attached the patch sketch inline below so as to get the go/no-go input.

Regards,
Andrey

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 46a845c..aa8a9e7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,28 @@
 # Changelog

+### bdev
+
+Default block device sharing mode has been changed.
+
+Previously, multiple write descriptors could be used only for block
devices that
+didn't have any claims (and thus no virtual block devices stacked
upon it). This
+is now the default behavior for ANY block device, including claimed ones.
+
+To revert to the previos behavior, virtual block device module must explicitly
+claim single-writer semantics in the module's declaration as in the
example below.
+
+```C
+static struct spdk_bdev_module single_writer_module_if = {
+ .name = "single-writer-virtual-block-device",
+ .single_writer = true,
+ /* Other module declarations */
+};
+```
+
+
+
 ## v18.04: Logical Volume Snapshot/Clone, iSCSI Initiator, Bdev QoS,
VPP Userspace TCP/IP

 ### vhost

 The SPDK vhost-scsi, vhost-blk and vhost-nvme applications have fixes
to address the
diff --git a/include/spdk_internal/bdev.h b/include/spdk_internal/bdev.h

index 9988d4d..2d37cdd 100644
--- a/include/spdk_internal/bdev.h
+++ b/include/spdk_internal/bdev.h
@@ -131,10 +131,17 @@
  * create their own vbdevs.
  */
  void (*examine)(struct spdk_bdev *bdev);

  /**
+ * Specifies if the module enforces single-writer semantics for bdevs
it claims.
+ * If set, any bdev claimed by this module will be writable only via a bdev
+ * descriptor obtained by opening this module's bdev.
+ */
+ bool single_writer;
+
+ /**
  * Count of bdev inits/examinations in progress. Used by generic bdev
  * layer and must not be modified by bdev modules.
  *
  * \note Used internally by bdev subsystem, don't change this value
in bdev module.
  */
diff --git a/lib/bdev/bdev.c b/lib/bdev/bdev.c
index 93b222f..a730e45 100644
--- a/lib/bdev/bdev.c
+++ b/lib/bdev/bdev.c
@@ -2700,12 +2700,14 @@
  return -ENOMEM;
  }

  pthread_mutex_lock(&bdev->mutex);

- if (write && bdev->claim_module) {
- SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s - already claimed\n",
bdev->name);
+ if (write && bdev->claim_module && bdev->claim_module->single_writer) {
+ SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s as writable - "
+ "claimed by single-writer module %s\n", bdev->name,
+ bdev->claim_module->name);
  free(desc);
  pthread_mutex_unlock(&bdev->mutex);
  return -EPERM;
  }

Regards,
Andrey


On Sun, Apr 29, 2018 at 12:35 AM, Andrey Kuzmin
<andrey.v.kuzmin(a)gmail.com> wrote:
>
>
> On Sun, Apr 29, 2018, 00:14 Luse, Paul E <paul.e.luse(a)intel.com> wrote:
>>
>> Maybe make the check a compilation option and default to off? That makes
>> it very clear that it’s a consideration at least and easy to flip it however
>> depending...
>
>
> Whatever the mechanism - compile- or configuration time - there still will
> be some coding involved, either making some code conditional or checking the
> config option at runtime. Personally, I think compile-time is best suited
> for configuring the build rather than the installation. Just my $.02.
>
> Regards,
> Andrey
>>
>>
>> Thx
>> Paul
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
>> Sent: Friday, April 27, 2018 4:06 PM
>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> Subject: Re: [SPDK] Bdev claim and single writer semantics
>>
>>
>>
>> On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin"
>> <spdk-bounces(a)lists.01.org on behalf of andrey.v.kuzmin(a)gmail.com> wrote:
>>
>>     On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
>>     <benjamin.walker(a)intel.com> wrote:
>>     > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
>>     >> Hi all,
>>   >>
>>
>> <snip>
>>
>>     >> While adding support for the shared access semantics seems to be
>>     >> pretty straightforward, I thought it reasonable to bring the issue
>> up
>>     >> here first, looking for insights, comments, and objections. Please
>> let
>>     >> me know if there are any or I can give a shot to a shared access
>>     >> patch.
>>     >
>>     > Your analysis is all correct. There are two access control
>> mechanisms in the
>>     > bdev layer for different purposes; claiming, which is for stacking
>> bdevs
>>     > together to make I/O pipelines, and descriptors, which is standard
>> read/write
>>     > access control for consumers of bdevs. We thought it made the most
>> sense to
>>     > allow write access to a bdev from only a single module for safety
>> reasons.
>>
>>     What seems to be a fairly reasonable safety measure for a
>> general-purpose
>>     I/O stack can easily become an unwarranted constraint in a
>> tightly-controlled
>>     (say, appliance) settings. Turning single writer into a stack-level
>>     configurable
>>     option could buy us the best of both worlds at once.
>>
>> I’m wondering if we should just remove the safety check.  The original
>> concern was to prevent the overwriting of metadata.  For example, an NVMe
>> bdev contains a logical volume store – the lvol bdev module has claimed the
>> bdev, opening and writing directly to the NVMe bdev could corrupt the
>> metadata.
>>
>> But I agree there are a lot of use cases where you may still want to allow
>> this for one reason or another.  And the Linux kernel doesn’t try to prevent
>> this – you can put an LVM PV/VG on a block device and then write zeroes to
>> the underlying block device.  Same with using parted to put down a GPT and
>> creating a partition.
>>
>> So basically, it’s up to the user to know what they are doing.
>>
>> -Jim
>>
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>
> --
>
> Regards,
> Andrey

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-30 17:24 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-04-30 17:24 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

On Sat, 2018-04-28 at 21:30 +0000, Andrey Kuzmin wrote:
> While adding support for the shared access semantics seems to be
> pretty straightforward, I thought it reasonable to bring the issue up
> here first, looking for insights, comments, and objections. Please let
> me know if there are any or I can give a shot to a shared access
> patch.

On Sat, Apr 28, 2018, 02:06 Harris, James R <james.r.harris(a)intel.com> wrote:
> I’m wondering if we should just remove the safety check.  The original concern
> was to prevent the overwriting of metadata.  For example, an NVMe bdev
> contains a logical volume store – the lvol bdev module has claimed the bdev,
> opening and writing directly to the NVMe bdev could corrupt the metadata.
> 
> But I agree there are a lot of use cases where you may still want to allow
> this for one reason or another.  And the Linux kernel doesn’t try to prevent
> this – you can put an LVM PV/VG on a block device and then write zeroes to the
> underlying block device.  Same with using parted to put down a GPT and
> creating a partition.

Based on the behavior of other similar block stacks, I'm on board with just
entirely removing the check. I vote we entirely decouple claiming (building the
block device dependency tree) from descriptors (read/write permission checks).

Thanks,
Ben

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-28 21:35 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-04-28 21:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3421 bytes --]

On Sun, Apr 29, 2018, 00:14 Luse, Paul E <paul.e.luse(a)intel.com> wrote:

> Maybe make the check a compilation option and default to off? That makes
> it very clear that it’s a consideration at least and easy to flip it
> however depending...
>

Whatever the mechanism - compile- or configuration time - there still will
be some coding involved, either making some code conditional or checking
the config option at runtime. Personally, I think compile-time is best
suited for configuring the build rather than the installation. Just my $.02.

Regards,
Andrey

>
> Thx
> Paul
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Friday, April 27, 2018 4:06 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Bdev claim and single writer semantics
>
>
>
> On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin" <
> spdk-bounces(a)lists.01.org on behalf of andrey.v.kuzmin(a)gmail.com> wrote:
>
>     On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
>     <benjamin.walker(a)intel.com> wrote:
>     > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
>     >> Hi all,
>   >>
>
> <snip>
>
>     >> While adding support for the shared access semantics seems to be
>     >> pretty straightforward, I thought it reasonable to bring the issue
> up
>     >> here first, looking for insights, comments, and objections. Please
> let
>     >> me know if there are any or I can give a shot to a shared access
>     >> patch.
>     >
>     > Your analysis is all correct. There are two access control
> mechanisms in the
>     > bdev layer for different purposes; claiming, which is for stacking
> bdevs
>     > together to make I/O pipelines, and descriptors, which is standard
> read/write
>     > access control for consumers of bdevs. We thought it made the most
> sense to
>     > allow write access to a bdev from only a single module for safety
> reasons.
>
>     What seems to be a fairly reasonable safety measure for a
> general-purpose
>     I/O stack can easily become an unwarranted constraint in a
> tightly-controlled
>     (say, appliance) settings. Turning single writer into a stack-level
>     configurable
>     option could buy us the best of both worlds at once.
>
> I’m wondering if we should just remove the safety check.  The original
> concern was to prevent the overwriting of metadata.  For example, an NVMe
> bdev contains a logical volume store – the lvol bdev module has claimed the
> bdev, opening and writing directly to the NVMe bdev could corrupt the
> metadata.
>
> But I agree there are a lot of use cases where you may still want to allow
> this for one reason or another.  And the Linux kernel doesn’t try to
> prevent this – you can put an LVM PV/VG on a block device and then write
> zeroes to the underlying block device.  Same with using parted to put down
> a GPT and creating a partition.
>
> So basically, it’s up to the user to know what they are doing.
>
> -Jim
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 4670 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-28 21:30 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-04-28 21:30 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

On Sat, Apr 28, 2018, 02:06 Harris, James R <james.r.harris(a)intel.com>
wrote:

>
>
> On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin" <
> spdk-bounces(a)lists.01.org on behalf of andrey.v.kuzmin(a)gmail.com> wrote:
>
>     On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
>     <benjamin.walker(a)intel.com> wrote:
>     > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
>     >> Hi all,
>   >>
>
> <snip>
>
>     >> While adding support for the shared access semantics seems to be
>     >> pretty straightforward, I thought it reasonable to bring the issue
> up
>     >> here first, looking for insights, comments, and objections. Please
> let
>     >> me know if there are any or I can give a shot to a shared access
>     >> patch.
>     >
>     > Your analysis is all correct. There are two access control
> mechanisms in the
>     > bdev layer for different purposes; claiming, which is for stacking
> bdevs
>     > together to make I/O pipelines, and descriptors, which is standard
> read/write
>     > access control for consumers of bdevs. We thought it made the most
> sense to
>     > allow write access to a bdev from only a single module for safety
> reasons.
>
>     What seems to be a fairly reasonable safety measure for a
> general-purpose
>     I/O stack can easily become an unwarranted constraint in a
> tightly-controlled
>     (say, appliance) settings. Turning single writer into a stack-level
>     configurable
>     option could buy us the best of both worlds at once.
>
> I’m wondering if we should just remove the safety check.  The original
> concern was to prevent the overwriting of metadata.  For example, an NVMe
> bdev contains a logical volume store – the lvol bdev module has claimed the
> bdev, opening and writing directly to the NVMe bdev could corrupt the
> metadata.
>
> But I agree there are a lot of use cases where you may still want to allow
> this for one reason or another.  And the Linux kernel doesn’t try to
> prevent this – you can put an LVM PV/VG on a block device and then write
> zeroes to the underlying block device.  Same with using parted to put down
> a GPT and creating a partition.
>
> So basically, it’s up to the user to know what they are doing.
>

Agreed.

Regards,
Andrey


> -Jim
>
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 3515 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-28 21:13 Luse, Paul E
  0 siblings, 0 replies; 17+ messages in thread
From: Luse, Paul E @ 2018-04-28 21:13 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2700 bytes --]

Maybe make the check a compilation option and default to off? That makes it very clear that it’s a consideration at least and easy to flip it however depending...

Thx
Paul

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
Sent: Friday, April 27, 2018 4:06 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Bdev claim and single writer semantics



On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin" <spdk-bounces(a)lists.01.org on behalf of andrey.v.kuzmin(a)gmail.com> wrote:

    On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
    <benjamin.walker(a)intel.com> wrote:
    > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
    >> Hi all,
  >>

<snip>

    >> While adding support for the shared access semantics seems to be
    >> pretty straightforward, I thought it reasonable to bring the issue up
    >> here first, looking for insights, comments, and objections. Please let
    >> me know if there are any or I can give a shot to a shared access
    >> patch.
    >
    > Your analysis is all correct. There are two access control mechanisms in the
    > bdev layer for different purposes; claiming, which is for stacking bdevs
    > together to make I/O pipelines, and descriptors, which is standard read/write
    > access control for consumers of bdevs. We thought it made the most sense to
    > allow write access to a bdev from only a single module for safety reasons.
    
    What seems to be a fairly reasonable safety measure for a general-purpose
    I/O stack can easily become an unwarranted constraint in a tightly-controlled
    (say, appliance) settings. Turning single writer into a stack-level
    configurable
    option could buy us the best of both worlds at once.
    
I’m wondering if we should just remove the safety check.  The original concern was to prevent the overwriting of metadata.  For example, an NVMe bdev contains a logical volume store – the lvol bdev module has claimed the bdev, opening and writing directly to the NVMe bdev could corrupt the metadata.

But I agree there are a lot of use cases where you may still want to allow this for one reason or another.  And the Linux kernel doesn’t try to prevent this – you can put an LVM PV/VG on a block device and then write zeroes to the underlying block device.  Same with using parted to put down a GPT and creating a partition.

So basically, it’s up to the user to know what they are doing.

-Jim




_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-27 23:06 Harris, James R
  0 siblings, 0 replies; 17+ messages in thread
From: Harris, James R @ 2018-04-27 23:06 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2117 bytes --]



On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin" <spdk-bounces(a)lists.01.org on behalf of andrey.v.kuzmin(a)gmail.com> wrote:

    On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
    <benjamin.walker(a)intel.com> wrote:
    > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
    >> Hi all,
  >>

<snip>

    >> While adding support for the shared access semantics seems to be
    >> pretty straightforward, I thought it reasonable to bring the issue up
    >> here first, looking for insights, comments, and objections. Please let
    >> me know if there are any or I can give a shot to a shared access
    >> patch.
    >
    > Your analysis is all correct. There are two access control mechanisms in the
    > bdev layer for different purposes; claiming, which is for stacking bdevs
    > together to make I/O pipelines, and descriptors, which is standard read/write
    > access control for consumers of bdevs. We thought it made the most sense to
    > allow write access to a bdev from only a single module for safety reasons.
    
    What seems to be a fairly reasonable safety measure for a general-purpose
    I/O stack can easily become an unwarranted constraint in a tightly-controlled
    (say, appliance) settings. Turning single writer into a stack-level
    configurable
    option could buy us the best of both worlds at once.
    
I’m wondering if we should just remove the safety check.  The original concern was to prevent the overwriting of metadata.  For example, an NVMe bdev contains a logical volume store – the lvol bdev module has claimed the bdev, opening and writing directly to the NVMe bdev could corrupt the metadata.

But I agree there are a lot of use cases where you may still want to allow this for one reason or another.  And the Linux kernel doesn’t try to prevent this – you can put an LVM PV/VG on a block device and then write zeroes to the underlying block device.  Same with using parted to put down a GPT and creating a partition.

So basically, it’s up to the user to know what they are doing.

-Jim





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-27 15:04 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-04-27 15:04 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4520 bytes --]

On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
<benjamin.walker(a)intel.com> wrote:
> On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
>> Hi all,
>>
>> while looking at the block device subsystem code, I noticed bdev
>> claims being used for two purposes at the same time. First (and a
>> primary one, as far as I understood it) is a perfectly reasonable idea
>> to provide the bdev subsystem with dependency information so that
>> block devices could be brought up and shut down in the proper order
>> (which doesn't seem to be in place yet, though).
>>
>> On the second hand, claims are somewhat counter-intuitively also used
>> to ensure single writer semantics where only a module that has claimed
>> the block device has write access. While it may be considered a safety
>> measure of sorts, it doesn't go well with the case where a
>> higher-level application is interested in shared access to the
>> underlying block device(s), in particular when there are
>> application-level means to ensure access consistency under multiple
>> writers scenario.
>>
>> While adding support for the shared access semantics seems to be
>> pretty straightforward, I thought it reasonable to bring the issue up
>> here first, looking for insights, comments, and objections. Please let
>> me know if there are any or I can give a shot to a shared access
>> patch.
>
> Your analysis is all correct. There are two access control mechanisms in the
> bdev layer for different purposes; claiming, which is for stacking bdevs
> together to make I/O pipelines, and descriptors, which is standard read/write
> access control for consumers of bdevs. We thought it made the most sense to
> allow write access to a bdev from only a single module for safety reasons.

What seems to be a fairly reasonable safety measure for a general-purpose
I/O stack can easily become an unwarranted constraint in a tightly-controlled
(say, appliance) settings. Turning single writer into a stack-level
configurable
option could buy us the best of both worlds at once.

> That's actually a fairly narrow limitation and I'm not sure it is precluding you
> from doing what you are looking to do.
>
> The rule, as it is written today, is that if a bdev isn't claimed by a bdev
> module (i.e. it doesn't have a bdev stacked on top of it), then multiple write
> descriptors can be used. If it is claimed by a bdev module (i.e. another bdev is
>  stacked on top of it), then other consumers may only open it as read only.
>
> Is your use case that you have some stack of bdevs, and then at some point your
> application wants to write to one of the bdevs that's not at the top of the
> stack? As a concrete example, this would be like having a RAID volume that you
> normally write to, but at some point your application bypasses the RAID logic
> and writes directly to one of the disks that back it. That sounds fishy to me,
> but I'm open to legitimate use cases here.
>

The above logic is impeccable in a single system settings, but makes
far less sense
if one considers a distributed, e.g. cluster, environment where
multiple instances
of the same stack are being active simultaneously.

If write request in such a system is allowed to enter the local stack
instance anywhere
below the top-level bdev, it would hit the single writer wall while
being perfectly legitimate
in the sense that it comes from the write-capable top-level bdev, just
another instance :).

> If you application wants to use a bdev that is at the top of a stack from
> multiple places, then you should be able to open multiple write descriptors
> today. Just don't have your application claim the bdev.

Please correct me if I'm wrong, but even this isn't the case with
virtual bdevs where
one needs to claim underlying bdevs for the virtual bdev even to get
assembled at
startup.

Regards,
Andrey
>
> We've tried to make this as clear as possible, but it could really use some
> better documentation at a minimum. And discussion about how best to present
> these concepts to users and make it easier to use is always very welcome. Let me
> know what your thoughts are.
>
>>
>> Best regards,
>> Andrey
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Bdev claim and single writer semantics
@ 2018-04-26 23:26 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-04-26 23:26 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3199 bytes --]

On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
> Hi all,
> 
> while looking at the block device subsystem code, I noticed bdev
> claims being used for two purposes at the same time. First (and a
> primary one, as far as I understood it) is a perfectly reasonable idea
> to provide the bdev subsystem with dependency information so that
> block devices could be brought up and shut down in the proper order
> (which doesn't seem to be in place yet, though).
> 
> On the second hand, claims are somewhat counter-intuitively also used
> to ensure single writer semantics where only a module that has claimed
> the block device has write access. While it may be considered a safety
> measure of sorts, it doesn't go well with the case where a
> higher-level application is interested in shared access to the
> underlying block device(s), in particular when there are
> application-level means to ensure access consistency under multiple
> writers scenario.
> 
> While adding support for the shared access semantics seems to be
> pretty straightforward, I thought it reasonable to bring the issue up
> here first, looking for insights, comments, and objections. Please let
> me know if there are any or I can give a shot to a shared access
> patch.

Your analysis is all correct. There are two access control mechanisms in the
bdev layer for different purposes; claiming, which is for stacking bdevs
together to make I/O pipelines, and descriptors, which is standard read/write
access control for consumers of bdevs. We thought it made the most sense to
allow write access to a bdev from only a single module for safety reasons.
That's actually a fairly narrow limitation and I'm not sure it is precluding you
from doing what you are looking to do.

The rule, as it is written today, is that if a bdev isn't claimed by a bdev
module (i.e. it doesn't have a bdev stacked on top of it), then multiple write
descriptors can be used. If it is claimed by a bdev module (i.e. another bdev is
 stacked on top of it), then other consumers may only open it as read only.

Is your use case that you have some stack of bdevs, and then at some point your
application wants to write to one of the bdevs that's not at the top of the
stack? As a concrete example, this would be like having a RAID volume that you
normally write to, but at some point your application bypasses the RAID logic
and writes directly to one of the disks that back it. That sounds fishy to me,
but I'm open to legitimate use cases here.

If you application wants to use a bdev that is at the top of a stack from
multiple places, then you should be able to open multiple write descriptors
today. Just don't have your application claim the bdev.

We've tried to make this as clear as possible, but it could really use some
better documentation at a minimum. And discussion about how best to present
these concepts to users and make it easier to use is always very welcome. Let me
know what your thoughts are.

> 
> Best regards,
> Andrey
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [SPDK] Bdev claim and single writer semantics
@ 2018-04-26 12:25 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-04-26 12:25 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1199 bytes --]

Hi all,

while looking at the block device subsystem code, I noticed bdev
claims being used for two purposes at the same time. First (and a
primary one, as far as I understood it) is a perfectly reasonable idea
to provide the bdev subsystem with dependency information so that
block devices could be brought up and shut down in the proper order
(which doesn't seem to be in place yet, though).

On the second hand, claims are somewhat counter-intuitively also used
to ensure single writer semantics where only a module that has claimed
the block device has write access. While it may be considered a safety
measure of sorts, it doesn't go well with the case where a
higher-level application is interested in shared access to the
underlying block device(s), in particular when there are
application-level means to ensure access consistency under multiple
writers scenario.

While adding support for the shared access semantics seems to be
pretty straightforward, I thought it reasonable to bring the issue up
here first, looking for insights, comments, and objections. Please let
me know if there are any or I can give a shot to a shared access
patch.

Best regards,
Andrey

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-05-09 21:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-02 16:45 [SPDK] Bdev claim and single writer semantics Walker, Benjamin
  -- strict thread matches above, loose matches on Subject: below --
2018-05-09 21:57 Walker, Benjamin
2018-05-09 17:58 Andrey Kuzmin
2018-05-09 17:32 Walker, Benjamin
2018-05-09  8:12 Andrey Kuzmin
2018-05-02 22:07 Andrey Kuzmin
2018-05-02 13:58 Luse, Paul E
2018-05-02 13:40 Andrey Kuzmin
2018-04-30 17:26 Andrey Kuzmin
2018-04-30 17:24 Walker, Benjamin
2018-04-28 21:35 Andrey Kuzmin
2018-04-28 21:30 Andrey Kuzmin
2018-04-28 21:13 Luse, Paul E
2018-04-27 23:06 Harris, James R
2018-04-27 15:04 Andrey Kuzmin
2018-04-26 23:26 Walker, Benjamin
2018-04-26 12:25 Andrey Kuzmin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.