Awesome Andrey!  Given the quick preview of the changes below I’m sort of leaning towards just removing the check and adding some comments in the code and an entry in the changelog. I don’t feel like the additional code below is really worth what it buys us.  You can see what Jim or Ben think though as they’ve both chimed in on this. 

 

And, of course, you never need to ask before submitting a patch…GerritHub is meant to discuss these kinds of things.  Understand if you’d rather hold off but for small stuff like this it’s usually best just to submit the patch and let the review process work through it.

 

Thx

Paul

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Andrey Kuzmin
Sent: Wednesday, May 2, 2018 6:41 AM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Bdev claim and single writer semantics

 

So how do we move this forward? Should I submit a patch based on the below approach, or just remove claimed check (and, presumably, leave bdev_open interface as is so that the caller can still limit access to read-only if desired)?

 

Regards,

Andrey

On Mon, Apr 30, 2018, 20:26 Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:

It looks like the resistance/flexibility sweet spot is in delegating
the semantic decision to the module's author by adding a boolean to
the module definition, to be set in the module's declaration if single
writer semantics is desired. This seems to deliver both the least
resistance path toward changing the current default and some
flexibility further down the road.

I have attached the patch sketch inline below so as to get the go/no-go input.

Regards,
Andrey

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 46a845c..aa8a9e7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,28 @@
 # Changelog

+### bdev
+
+Default block device sharing mode has been changed.
+
+Previously, multiple write descriptors could be used only for block
devices that
+didn't have any claims (and thus no virtual block devices stacked
upon it). This
+is now the default behavior for ANY block device, including claimed ones.
+
+To revert to the previos behavior, virtual block device module must explicitly
+claim single-writer semantics in the module's declaration as in the
example below.
+
+```C
+static struct spdk_bdev_module single_writer_module_if = {
+ .name = "single-writer-virtual-block-device",
+ .single_writer = true,
+ /* Other module declarations */
+};
+```
+
+
+
 ## v18.04: Logical Volume Snapshot/Clone, iSCSI Initiator, Bdev QoS,
VPP Userspace TCP/IP

 ### vhost

 The SPDK vhost-scsi, vhost-blk and vhost-nvme applications have fixes
to address the
diff --git a/include/spdk_internal/bdev.h b/include/spdk_internal/bdev.h

index 9988d4d..2d37cdd 100644
--- a/include/spdk_internal/bdev.h
+++ b/include/spdk_internal/bdev.h
@@ -131,10 +131,17 @@
  * create their own vbdevs.
  */
  void (*examine)(struct spdk_bdev *bdev);

  /**
+ * Specifies if the module enforces single-writer semantics for bdevs
it claims.
+ * If set, any bdev claimed by this module will be writable only via a bdev
+ * descriptor obtained by opening this module's bdev.
+ */
+ bool single_writer;
+
+ /**
  * Count of bdev inits/examinations in progress. Used by generic bdev
  * layer and must not be modified by bdev modules.
  *
  * \note Used internally by bdev subsystem, don't change this value
in bdev module.
  */
diff --git a/lib/bdev/bdev.c b/lib/bdev/bdev.c
index 93b222f..a730e45 100644
--- a/lib/bdev/bdev.c
+++ b/lib/bdev/bdev.c
@@ -2700,12 +2700,14 @@
  return -ENOMEM;
  }

  pthread_mutex_lock(&bdev->mutex);

- if (write && bdev->claim_module) {
- SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s - already claimed\n",
bdev->name);
+ if (write && bdev->claim_module && bdev->claim_module->single_writer) {
+ SPDK_INFOLOG(SPDK_LOG_BDEV, "Could not open %s as writable - "
+ "claimed by single-writer module %s\n", bdev->name,
+ bdev->claim_module->name);
  free(desc);
  pthread_mutex_unlock(&bdev->mutex);
  return -EPERM;
  }

Regards,
Andrey


On Sun, Apr 29, 2018 at 12:35 AM, Andrey Kuzmin
<andrey.v.kuzmin@gmail.com> wrote:
>
>
> On Sun, Apr 29, 2018, 00:14 Luse, Paul E <paul.e.luse@intel.com> wrote:
>>
>> Maybe make the check a compilation option and default to off? That makes
>> it very clear that it’s a consideration at least and easy to flip it however
>> depending...
>
>
> Whatever the mechanism - compile- or configuration time - there still will
> be some coding involved, either making some code conditional or checking the
> config option at runtime. Personally, I think compile-time is best suited
> for configuring the build rather than the installation. Just my $.02.
>
> Regards,
> Andrey
>>
>>
>> Thx
>> Paul
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Harris, James R
>> Sent: Friday, April 27, 2018 4:06 PM
>> To: Storage Performance Development Kit <spdk@lists.01.org>
>> Subject: Re: [SPDK] Bdev claim and single writer semantics
>>
>>
>>
>> On 4/27/18, 8:04 AM, "SPDK on behalf of Andrey Kuzmin"
>> <spdk-bounces@lists.01.org on behalf of andrey.v.kuzmin@gmail.com> wrote:
>>
>>     On Fri, Apr 27, 2018 at 2:26 AM, Walker, Benjamin
>>     <benjamin.walker@intel.com> wrote:
>>     > On Thu, 2018-04-26 at 15:25 +0300, Andrey Kuzmin wrote:
>>     >> Hi all,
>>   >>
>>
>> <snip>
>>
>>     >> While adding support for the shared access semantics seems to be
>>     >> pretty straightforward, I thought it reasonable to bring the issue
>> up
>>     >> here first, looking for insights, comments, and objections. Please
>> let
>>     >> me know if there are any or I can give a shot to a shared access
>>     >> patch.
>>     >
>>     > Your analysis is all correct. There are two access control
>> mechanisms in the
>>     > bdev layer for different purposes; claiming, which is for stacking
>> bdevs
>>     > together to make I/O pipelines, and descriptors, which is standard
>> read/write
>>     > access control for consumers of bdevs. We thought it made the most
>> sense to
>>     > allow write access to a bdev from only a single module for safety
>> reasons.
>>
>>     What seems to be a fairly reasonable safety measure for a
>> general-purpose
>>     I/O stack can easily become an unwarranted constraint in a
>> tightly-controlled
>>     (say, appliance) settings. Turning single writer into a stack-level
>>     configurable
>>     option could buy us the best of both worlds at once.
>>
>> I’m wondering if we should just remove the safety check.  The original
>> concern was to prevent the overwriting of metadata.  For example, an NVMe
>> bdev contains a logical volume store – the lvol bdev module has claimed the
>> bdev, opening and writing directly to the NVMe bdev could corrupt the
>> metadata.
>>
>> But I agree there are a lot of use cases where you may still want to allow
>> this for one reason or another.  And the Linux kernel doesn’t try to prevent
>> this – you can put an LVM PV/VG on a block device and then write zeroes to
>> the underlying block device.  Same with using parted to put down a GPT and
>> creating a partition.
>>
>> So basically, it’s up to the user to know what they are doing.
>>
>> -Jim
>>
>>
>>
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK@lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>> _______________________________________________
>> SPDK mailing list
>> SPDK@lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>
> --
>
> Regards,
> Andrey

--

Regards,
Andrey