* [SPDK] Re: a proposal of raid1
@ 2019-10-12 21:56 Luse, Paul E
  0 siblings, 0 replies; 4+ messages in thread
From: Luse, Paul E @ 2019-10-12 21:56 UTC (permalink / raw)
  To: spdk

Hi Yu,

Wrt item #1, that hasn't been designed yet, but yes, it will need to be. I suspect the framework for that will come as part of the raid5 effort.

On item #2, yes, your help would be much appreciated.  I'm sure you saw the full board at https://trello.com/b/4HEkWVvF/raid and you should feel free to volunteer for any of those cards. Artur is mainly coordinating the work; he's in Poland, so depending on your time zone, either email or our Slack channel (and of course adding yourself to a Trello card and updating it with more details about how you see the implementation going) are all good ways to get started.  Our community meetings https://spdk.io/community/ are also a great place to collaborate in real time.

That said, a few pretty common-sense guidelines for success in contributing to SPDK, especially when collaborating with others:

1) Communicate frequently.  I mentioned the communication channels above; use what makes sense.
2) We're not big on design docs, but with many of these things on the trello board for raid, it'd be good to post some informal design notes prior to coding to make sure what you're about to do fits in well with SPDK and the other raid activities.
3) Whatever you're working on, use small patches in series (see https://spdk.io/development/) when proposing code.  It helps reviewers tremendously to have a dozen small patches broken up logically for review, as opposed to one big one with multiple areas being touched.
4) Get your patches up for review quickly, even if the series isn't complete.  This way your implementation can be sanity checked before you complete a ton of work that may not be in the best direction.

Looking forward to your participation!

-Paul

On 10/11/19, 9:53 PM, "peng yu" <yupeng0921(a)gmail.com> wrote:

    Hi, Paul and Ziye
    Thanks for your responses and suggestions. I have two follow-up questions:
    1. Will the raid1e support resync and record the sync status
    somewhere (e.g. a bitmap)?
    2. I found there is a refactor backlog:
    https://trello.com/c/V5lGkXVY/27-refactor-the-bdevraid-module
    Could I have an opportunity to contribute to this backlog or other
    raid-related backlogs? I hope to get raid1 ready for spdk quickly, no
    matter whether the raid1 is my implementation or not. I need this
    feature for one of my personal projects; raid1 will be used for both
    data HA and cloning data online.
    
    Best regards
    
    On Fri, Oct 11, 2019 at 5:57 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:
    >
    > Hi Yu, I'm glad you brought this up as there's been a lot of activity in this area lately.  I'll summarize and then address your points below:
    >
    > * RAID1 has come up a few times, but there's never been a strong pull for it, so it's never been fully completed.
    > * The older patch below is from Ziye and is a standalone basic RAID1 that doesn't have some features that most would consider required (rebuild, remembering state between boots, etc). It is unlikely to be carried forward, but it's good to still have around for discussion.
    > * RAID0 came in later and is what is in the tree now.  It is considered production quality and is what we'd like to use as the foundation for adding RAID levels in the future.
    > * I started to refactor the RAID0 code to accommodate RAID1E and have the mappings working here https://review.gerrithub.io/c/spdk/spdk/+/468187 however...
    > * In a recent community meeting, some other folks from Intel proposed a RAID5 solution backed by a number of strong developers, where the first step would be to further refactor what's there in order to accommodate more advanced features. So I've placed my RAID1E mapping work on hold until that effort takes off (I'll participate in it as well, and you are more than welcome to join too!)
    >
    >
    > On 10/11/19, 12:25 AM, "peng yu" <yupeng0921(a)gmail.com> wrote:
    >
    >     Hi, I'm trying to implement raid1 in SPDK. I researched the online
    >     resources of spdk and found that it currently only supports raid
    >     level 0. I found some discussions about raid1. Here is a raid1 patch:
    >     https://review.gerrithub.io/c/spdk/spdk/+/385341/21/lib/bdev/raid1/vbdev_raid1.c
    >     It looks like it only dispatches IO to two disks, with no resync, hot
    >     replace, or bitmap, and it is not merged into the spdk mainline.
    >
    > PL> Correct, this is the first one I mentioned above.
    >
    >     I also found that Paul has a raid1 proposal:
    >     https://trello.com/c/ywQOX109/43-add-mirroring-raid-1-support-to-raid-bdev-module
    >     But I can't find any progress on it.
    >
    > PL> I put the link to the latest RAID1E work above.
    >
    >     I wrote a raid1 implementation here:
    >     https://github.com/yupeng0921/spdk/tree/raid1/module/bdev/raid1
    >     It is not complete yet. Below are my ideas:
    >     (1) use a bitmap to record the sync-up status of the two data disks
    >     (2) the bitmap and superblock are stored on separate metadata disks
    >     (3) every data disk has a corresponding metadata disk, similar to
    >     the linux kernel dm-raid
    >     (4) every raid1 bdev is managed by a single thread; all IOs will be
    >     handled by this thread.
    >     The IO write logic:
    >     To optimize IO performance, the bitmap uses batched writes and delayed
    >     clearing. The bitmap is grouped into regions; a region is one PAGE of
    >     the bitmap. When the bdev receives an IO, it puts the IO into a queue
    >     and sets the bitmap bit, waits for a while, then writes a whole PAGE
    >     of the bitmap at once, and then delivers the queued IOs to the data
    >     disks. There is a counter for every bit in the bitmap, recording how
    >     many write IOs are in flight for that bit. When the counter drops to
    >     0, the bit is cleared in the bitmap and its region is put onto a clear
    >     queue; the bits in that region will be written to disk later.
    >     The resync logic:
    >     Resync uses two bitmaps: needed_bm and active_bm. The needed_bm
    >     indicates which bits need to be synced up; the active_bm indicates
    >     which bits are syncing up now. The master thread of the raid1 bdev
    >     scans the needed_bm; if a bit is 1 and there is no write IO on that
    >     bit, it issues a resync (read from one disk and write to the other)
    >     and sets the active_bm bit to 1. If there are write IOs on the bit,
    >     the resync command is put onto a queue; when an IO completes and the
    >     IO counter of the bit drops to 0, the resync command is picked up
    >     from the queue and issued.
    >     About hot swapping:
    >     I don't want to implement hot swapping (or hot spare) in the raid1
    >     code itself. Instead, I hope to write another vbdev with the
    >     following features:
    >     generally, it works as a passthrough device
    >     it can be suspended and resumed
    >     while suspended, the underlying device can be replaced with a new one
    >     Thus, we could create this vbdev on top of a raid1 device; then, to
    >     replace a data disk, we suspend the vbdev, destruct the raid1 device,
    >     and rebuild the raid1 with a new data disk.
    >     Below are the tests I ran:
    >     (1) run a spdk app
    >     sudo ./app/spdk_tgt/spdk_tgt
    >     (2) create meta disks and data disks:
    >     sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta0 20 4096
    >     sudo python3 ./scripts/rpc.py bdev_malloc_create -b data0 64 4096
    >     sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta1 20 4096
    >     sudo python3 ./scripts/rpc.py bdev_malloc_create -b data1 64 4096
    >     (3) create raid1 device:
    >     sudo python3 ./scripts/rpc.py create_raid1_bdev -n my_raid1
    >     --meta0-name meta0 --data0-name data0 --meta1-name meta1 --data1-name
    >     data1 --data-size-mb 64
    >     (4) create a test file for io:
    >     dmesg > /tmp/data
    >     (5) write 4k data from filesystem to raid1:
    >     sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
    >     /tmp/data --rw=1 --offset=0 --nbytes=4096
    >     (6) read 4k data from raid1 to file system:
    >     sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
    >     /tmp/data1 --rw=0 --offset=0 --nbytes=4096
    >     (7) delete the raid1 device:
    >     sudo python3 ./scripts/rpc.py delete_raid1_bdev -n my_raid1
    >
    >     Currently it only supports creating a new raid1 device. I'm writing
    >     the rebuild logic so that a raid1 device can be reconstructed after
    >     destruct/delete. Please let me know whether such an idea could be
    >     accepted by spdk.
    >
    > Wow, you've done a decent amount of work on this! I haven't looked at the code yet, but your ideas all sound sane; I'd have to think more about the hot-plug notion, though.  Either way, here's what I would suggest:
    >
    > * Get your code up on GerritHub, as that's how we review things in the SPDK community: https://spdk.io/development/
    > * Look for an email soon from Artur; he's going to set up a Trello board with a task list for the current RAID0 module to begin getting it more generic and ready to be extended.
    >
    > I think my best advice now would be to help with the effort I just mentioned and then see how your ideas fit. The SPDK community is unlikely to upstream multiple RAID modules (it just makes things too confusing and hard to maintain), but there's plenty of time to make significant contributions.  Or, if you never intended to upstream yours, you can of course continue development on your own, knowing the community is always here to help with general SPDK interfacing and architecture questions.
    >
    > I would also suggest our community meetings as a place to discuss this with the key SPDK contributors and maintainers: https://spdk.io/community/
    >
    > Thanks for your email!!!
    > Paul

* [SPDK] Re: a proposal of raid1
@ 2019-10-12  4:53 peng yu
  0 siblings, 0 replies; 4+ messages in thread
From: peng yu @ 2019-10-12  4:53 UTC (permalink / raw)
  To: spdk

Hi, Paul and Ziye
Thanks for your responses and suggestions. I have two follow-up questions:
1. Will the raid1e support resync and record the sync status
somewhere (e.g. a bitmap)?
2. I found there is a refactor backlog:
https://trello.com/c/V5lGkXVY/27-refactor-the-bdevraid-module
Could I have an opportunity to contribute to this backlog or other
raid-related backlogs? I hope to get raid1 ready for spdk quickly, no
matter whether the raid1 is my implementation or not. I need this
feature for one of my personal projects; raid1 will be used for both
data HA and cloning data online.

Best regards

On Fri, Oct 11, 2019 at 5:57 AM Luse, Paul E <paul.e.luse(a)intel.com> wrote:
>
> Hi Yu, I'm glad you brought this up as there's been a lot of activity in this area lately.  I'll summarize and then address your points below:
>
> * RAID1 has come up a few times, but there's never been a strong pull for it, so it's never been fully completed.
> * The older patch below is from Ziye and is a standalone basic RAID1 that doesn't have some features that most would consider required (rebuild, remembering state between boots, etc). It is unlikely to be carried forward, but it's good to still have around for discussion.
> * RAID0 came in later and is what is in the tree now.  It is considered production quality and is what we'd like to use as the foundation for adding RAID levels in the future.
> * I started to refactor the RAID0 code to accommodate RAID1E and have the mappings working here https://review.gerrithub.io/c/spdk/spdk/+/468187 however...
> * In a recent community meeting, some other folks from Intel proposed a RAID5 solution backed by a number of strong developers, where the first step would be to further refactor what's there in order to accommodate more advanced features. So I've placed my RAID1E mapping work on hold until that effort takes off (I'll participate in it as well, and you are more than welcome to join too!)
>
>
> On 10/11/19, 12:25 AM, "peng yu" <yupeng0921(a)gmail.com> wrote:
>
>     Hi, I'm trying to implement raid1 in SPDK. I researched the online
>     resources of spdk and found that it currently only supports raid
>     level 0. I found some discussions about raid1. Here is a raid1 patch:
>     https://review.gerrithub.io/c/spdk/spdk/+/385341/21/lib/bdev/raid1/vbdev_raid1.c
>     It looks like it only dispatches IO to two disks, with no resync, hot
>     replace, or bitmap, and it is not merged into the spdk mainline.
>
> PL> Correct, this is the first one I mentioned above.
>
>     I also found that Paul has a raid1 proposal:
>     https://trello.com/c/ywQOX109/43-add-mirroring-raid-1-support-to-raid-bdev-module
>     But I can't find any progress on it.
>
> PL> I put the link to the latest RAID1E work above.
>
>     I wrote a raid1 implementation here:
>     https://github.com/yupeng0921/spdk/tree/raid1/module/bdev/raid1
>     It is not complete yet. Below are my ideas:
>     (1) use a bitmap to record the sync-up status of the two data disks
>     (2) the bitmap and superblock are stored on separate metadata disks
>     (3) every data disk has a corresponding metadata disk, similar to
>     the linux kernel dm-raid
>     (4) every raid1 bdev is managed by a single thread; all IOs will be
>     handled by this thread.
>     The IO write logic:
>     To optimize IO performance, the bitmap uses batched writes and delayed
>     clearing. The bitmap is grouped into regions; a region is one PAGE of
>     the bitmap. When the bdev receives an IO, it puts the IO into a queue
>     and sets the bitmap bit, waits for a while, then writes a whole PAGE
>     of the bitmap at once, and then delivers the queued IOs to the data
>     disks. There is a counter for every bit in the bitmap, recording how
>     many write IOs are in flight for that bit. When the counter drops to
>     0, the bit is cleared in the bitmap and its region is put onto a clear
>     queue; the bits in that region will be written to disk later.
>     The resync logic:
>     Resync uses two bitmaps: needed_bm and active_bm. The needed_bm
>     indicates which bits need to be synced up; the active_bm indicates
>     which bits are syncing up now. The master thread of the raid1 bdev
>     scans the needed_bm; if a bit is 1 and there is no write IO on that
>     bit, it issues a resync (read from one disk and write to the other)
>     and sets the active_bm bit to 1. If there are write IOs on the bit,
>     the resync command is put onto a queue; when an IO completes and the
>     IO counter of the bit drops to 0, the resync command is picked up
>     from the queue and issued.
>     About hot swapping:
>     I don't want to implement hot swapping (or hot spare) in the raid1
>     code itself. Instead, I hope to write another vbdev with the
>     following features:
>     generally, it works as a passthrough device
>     it can be suspended and resumed
>     while suspended, the underlying device can be replaced with a new one
>     Thus, we could create this vbdev on top of a raid1 device; then, to
>     replace a data disk, we suspend the vbdev, destruct the raid1 device,
>     and rebuild the raid1 with a new data disk.
>     Below are the tests I ran:
>     (1) run a spdk app
>     sudo ./app/spdk_tgt/spdk_tgt
>     (2) create meta disks and data disks:
>     sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta0 20 4096
>     sudo python3 ./scripts/rpc.py bdev_malloc_create -b data0 64 4096
>     sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta1 20 4096
>     sudo python3 ./scripts/rpc.py bdev_malloc_create -b data1 64 4096
>     (3) create raid1 device:
>     sudo python3 ./scripts/rpc.py create_raid1_bdev -n my_raid1
>     --meta0-name meta0 --data0-name data0 --meta1-name meta1 --data1-name
>     data1 --data-size-mb 64
>     (4) create a test file for io:
>     dmesg > /tmp/data
>     (5) write 4k data from filesystem to raid1:
>     sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
>     /tmp/data --rw=1 --offset=0 --nbytes=4096
>     (6) read 4k data from raid1 to file system:
>     sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
>     /tmp/data1 --rw=0 --offset=0 --nbytes=4096
>     (7) delete the raid1 device:
>     sudo python3 ./scripts/rpc.py delete_raid1_bdev -n my_raid1
>
>     Currently it only supports creating a new raid1 device. I'm writing
>     the rebuild logic so that a raid1 device can be reconstructed after
>     destruct/delete. Please let me know whether such an idea could be
>     accepted by spdk.
>
> Wow, you've done a decent amount of work on this! I haven't looked at the code yet, but your ideas all sound sane; I'd have to think more about the hot-plug notion, though.  Either way, here's what I would suggest:
>
> * Get your code up on GerritHub, as that's how we review things in the SPDK community: https://spdk.io/development/
> * Look for an email soon from Artur; he's going to set up a Trello board with a task list for the current RAID0 module to begin getting it more generic and ready to be extended.
>
> I think my best advice now would be to help with the effort I just mentioned and then see how your ideas fit. The SPDK community is unlikely to upstream multiple RAID modules (it just makes things too confusing and hard to maintain), but there's plenty of time to make significant contributions.  Or, if you never intended to upstream yours, you can of course continue development on your own, knowing the community is always here to help with general SPDK interfacing and architecture questions.
>
> I would also suggest our community meetings as a place to discuss this with the key SPDK contributors and maintainers: https://spdk.io/community/
>
> Thanks for your email!!!
> Paul

* [SPDK] Re: a proposal of raid1
@ 2019-10-11 14:11 Yang, Ziye
  0 siblings, 0 replies; 4+ messages in thread
From: Yang, Ziye @ 2019-10-11 14:11 UTC (permalink / raw)
  To: spdk

Hi Yu,

My patch was written a very long time ago. At that time I found that some of the data structures were only suitable for implementing Raid0, so I wrote that code to implement Raid1 as a reference for a real solution. It is very simple and lacks rebuild and other features. I am glad to see you working on a real raid1 and posting your patches. But if you want your patch to be merged into spdk, please do provide your design doc and send your code for review (refer to the development guide at http://spdk.io).

Thanks.

Best Regards
Ziye Yang 


-----Original Message-----
From: peng yu [mailto:yupeng0921(a)gmail.com] 
Sent: Friday, October 11, 2019 12:25 AM
To: spdk(a)lists.01.org
Subject: [SPDK] a proposal of raid1

Hi, I'm trying to implement raid1 in SPDK. I researched the online resources of spdk and found that it currently only supports raid level 0. I found some discussions about raid1. Here is a raid1 patch:
https://review.gerrithub.io/c/spdk/spdk/+/385341/21/lib/bdev/raid1/vbdev_raid1.c
It looks like it only dispatches IO to two disks, with no resync, hot replace, or bitmap, and it is not merged into the spdk mainline.
I also found that Paul has a raid1 proposal:
https://trello.com/c/ywQOX109/43-add-mirroring-raid-1-support-to-raid-bdev-module
But I can't find any progress on it.
I wrote a raid1 implementation here:
https://github.com/yupeng0921/spdk/tree/raid1/module/bdev/raid1
It is not complete yet. Below are my ideas:
(1) use a bitmap to record the sync-up status of the two data disks
(2) the bitmap and superblock are stored on separate metadata disks
(3) every data disk has a corresponding metadata disk, similar to the linux kernel dm-raid
(4) every raid1 bdev is managed by a single thread; all IOs will be handled by this thread.
The IO write logic:
To optimize IO performance, the bitmap uses batched writes and delayed clearing. The bitmap is grouped into regions; a region is one PAGE of the bitmap. When the bdev receives an IO, it puts the IO into a queue and sets the bitmap bit, waits for a while, then writes a whole PAGE of the bitmap at once, and then delivers the queued IOs to the data disks. There is a counter for every bit in the bitmap, recording how many write IOs are in flight for that bit. When the counter drops to 0, the bit is cleared in the bitmap and its region is put onto a clear queue; the bits in that region will be written to disk later.
The resync logic:
Resync uses two bitmaps: needed_bm and active_bm. The needed_bm indicates which bits need to be synced up; the active_bm indicates which bits are syncing up now. The master thread of the raid1 bdev scans the needed_bm; if a bit is 1 and there is no write IO on that bit, it issues a resync (read from one disk and write to the other) and sets the active_bm bit to 1. If there are write IOs on the bit, the resync command is put onto a queue; when an IO completes and the IO counter of the bit drops to 0, the resync command is picked up from the queue and issued.
About hot swapping:
I don't want to implement hot swapping (or hot spare) in the raid1 code itself. Instead, I hope to write another vbdev with the following features:
generally, it works as a passthrough device
it can be suspended and resumed
while suspended, the underlying device can be replaced with a new one
Thus, we could create this vbdev on top of a raid1 device; then, to replace a data disk, we suspend the vbdev, destruct the raid1 device, and rebuild the raid1 with a new data disk.
Below are the tests I ran:
(1) run a spdk app
sudo ./app/spdk_tgt/spdk_tgt
(2) create meta disks and data disks:
sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta0 20 4096
sudo python3 ./scripts/rpc.py bdev_malloc_create -b data0 64 4096
sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta1 20 4096
sudo python3 ./scripts/rpc.py bdev_malloc_create -b data1 64 4096
(3) create raid1 device:
sudo python3 ./scripts/rpc.py create_raid1_bdev -n my_raid1 --meta0-name meta0 --data0-name data0 --meta1-name meta1 --data1-name
data1 --data-size-mb 64
(4) create a test file for io:
dmesg > /tmp/data
(5) write 4k data from filesystem to raid1:
sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path /tmp/data --rw=1 --offset=0 --nbytes=4096
(6) read 4k data from raid1 to file system:
sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
/tmp/data1 --rw=0 --offset=0 --nbytes=4096
(7) delete the raid1 device:
sudo python3 ./scripts/rpc.py delete_raid1_bdev -n my_raid1

Currently it only supports creating a new raid1 device. I'm writing the rebuild logic so that a raid1 device can be reconstructed after destruct/delete. Please let me know whether such an idea could be accepted by spdk.

* [SPDK] Re: a proposal of raid1
@ 2019-10-11 12:57 Luse, Paul E
  0 siblings, 0 replies; 4+ messages in thread
From: Luse, Paul E @ 2019-10-11 12:57 UTC (permalink / raw)
  To: spdk

Hi Yu, I'm glad you brought this up as there's been a lot of activity in this area lately.  I'll summarize and then address your points below:

* RAID1 has come up a few times, but there's never been a strong pull for it, so it's never been fully completed.
* The older patch below is from Ziye and is a standalone basic RAID1 that doesn't have some features that most would consider required (rebuild, remembering state between boots, etc). It is unlikely to be carried forward, but it's good to still have around for discussion.
* RAID0 came in later and is what is in the tree now.  It is considered production quality and is what we'd like to use as the foundation for adding RAID levels in the future.
* I started to refactor the RAID0 code to accommodate RAID1E and have the mappings working here https://review.gerrithub.io/c/spdk/spdk/+/468187 however...
* In a recent community meeting, some other folks from Intel proposed a RAID5 solution backed by a number of strong developers, where the first step would be to further refactor what's there in order to accommodate more advanced features. So I've placed my RAID1E mapping work on hold until that effort takes off (I'll participate in it as well, and you are more than welcome to join too!)


On 10/11/19, 12:25 AM, "peng yu" <yupeng0921(a)gmail.com> wrote:

    Hi, I'm trying to implement raid1 in SPDK. I researched the online
    resources of spdk and found that it currently only supports raid
    level 0. I found some discussions about raid1. Here is a raid1 patch:
    https://review.gerrithub.io/c/spdk/spdk/+/385341/21/lib/bdev/raid1/vbdev_raid1.c
    It looks like it only dispatches IO to two disks, with no resync, hot
    replace, or bitmap, and it is not merged into the spdk mainline.

PL> Correct, this is the first one I mentioned above.

    I also found that Paul has a raid1 proposal:
    https://trello.com/c/ywQOX109/43-add-mirroring-raid-1-support-to-raid-bdev-module
    But I can't find any progress on it.

PL> I put the link to the latest RAID1E work above.

    I wrote a raid1 implementation here:
    https://github.com/yupeng0921/spdk/tree/raid1/module/bdev/raid1
    It is not complete yet. Below are my ideas:
    (1) use a bitmap to record the sync-up status of the two data disks
    (2) the bitmap and superblock are stored on separate metadata disks
    (3) every data disk has a corresponding metadata disk, similar to
    the linux kernel dm-raid
    (4) every raid1 bdev is managed by a single thread; all IOs will be
    handled by this thread.
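    As a rough illustration of items (1)-(3), the layout of one metadata
    disk could look like the C sketch below (field names are hypothetical,
    not the actual code in my branch):

        /* Illustrative on-disk layout of one metadata disk: a superblock
         * followed by the write bitmap, similar to dm-raid's metadata
         * device. */
        #include <stdint.h>

        #define RAID1_MAGIC     0x52414944u  /* "RAID" */
        #define RAID1_BM_OFFSET 4096u        /* bitmap follows the superblock */

        struct raid1_sb {
            uint32_t magic;         /* identifies a raid1 metadata disk */
            uint32_t version;
            uint8_t  uuid[16];      /* same on both metadata disks of a pair */
            uint64_t data_size_mb;  /* size of the mirrored data area */
            uint64_t region_size;   /* bytes of data covered by one bitmap bit */
            uint64_t bitmap_bytes;  /* length of the bitmap that follows */
        };
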
    The IO write logic:
    To optimize IO performance, the bitmap uses batched writes and delayed
    clearing. The bitmap is grouped into regions; a region is one PAGE of
    the bitmap. When the bdev receives an IO, it puts the IO into a queue
    and sets the bitmap bit, waits for a while, then writes a whole PAGE of
    the bitmap at once, and then delivers the queued IOs to the data disks.
    There is a counter for every bit in the bitmap, recording how many
    write IOs are in flight for that bit. When the counter drops to 0, the
    bit is cleared in the bitmap and its region is put onto a clear queue;
    the bits in that region will be written to disk later.
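    A minimal sketch of the per-bit accounting described above (again
    illustrative; the structure and helper names are hypothetical):

        /* One region is one PAGE of the bitmap; a dirty region is flushed
         * to the metadata disk as a whole page. */
        #include <stdbool.h>
        #include <stdint.h>

        #define PAGE_SIZE     4096u
        #define BITS_PER_PAGE (PAGE_SIZE * 8u)

        struct bm_region {
            uint8_t  bits[PAGE_SIZE];        /* one PAGE of the bitmap */
            uint32_t io_cnt[BITS_PER_PAGE];  /* in-flight writes per bit */
            bool     dirty;                  /* page must be rewritten */
        };

        /* A write IO targeting `bit` has been queued. */
        static void region_io_start(struct bm_region *r, uint32_t bit)
        {
            r->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
            r->io_cnt[bit]++;
            r->dirty = true;  /* the page is flushed before the queued IOs
                               * are delivered to the data disks */
        }

        /* A write IO targeting `bit` has completed on both data disks. */
        static void region_io_done(struct bm_region *r, uint32_t bit)
        {
            if (--r->io_cnt[bit] == 0) {
                /* delayed clearing: clear the bit in memory and mark the
                 * region; it reaches the metadata disk only when the page
                 * is rewritten later from the clear queue */
                r->bits[bit / 8] &= (uint8_t)~(1u << (bit % 8));
                r->dirty = true;
            }
        }
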
    The resync logic:
    Resync uses two bitmaps: needed_bm and active_bm. The needed_bm
    indicates which bits need to be synced up; the active_bm indicates
    which bits are syncing up now. The master thread of the raid1 bdev
    scans the needed_bm; if a bit is 1 and there is no write IO on that
    bit, it issues a resync (read from one disk and write to the other)
    and sets the active_bm bit to 1. If there are write IOs on the bit,
    the resync command is put onto a queue; when an IO completes and the
    IO counter of the bit drops to 0, the resync command is picked up
    from the queue and issued.
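    The scan by the master thread could look roughly like this
    (illustrative; issue_resync() and queue_resync() are hypothetical
    placeholders, not real SPDK APIs):

        #include <stdbool.h>
        #include <stdint.h>

        static bool bm_test(const uint8_t *bm, uint32_t bit)
        {
            return bm[bit / 8] & (1u << (bit % 8));
        }

        static void bm_set(uint8_t *bm, uint32_t bit)
        {
            bm[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }

        /* Placeholders: a real issue_resync() reads the range from one
         * disk and writes it to the other; queue_resync() parks the
         * command until the bit's IO counter drops to 0. */
        static void issue_resync(uint32_t bit) { (void)bit; }
        static void queue_resync(uint32_t bit) { (void)bit; }

        /* Run by the master thread of the raid1 bdev. */
        static void resync_scan(const uint8_t *needed_bm, uint8_t *active_bm,
                                const uint32_t *io_cnt, uint32_t nbits)
        {
            for (uint32_t bit = 0; bit < nbits; bit++) {
                if (!bm_test(needed_bm, bit) || bm_test(active_bm, bit)) {
                    continue;  /* no sync needed, or already in progress */
                }
                if (io_cnt[bit] == 0) {
                    bm_set(active_bm, bit);
                    issue_resync(bit);
                } else {
                    queue_resync(bit);  /* reissued when io_cnt hits 0 */
                }
            }
        }
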
    About hot swapping:
    I don't want to implement hot swapping (or hot spare) in the raid1
    code itself. Instead, I hope to write another vbdev with the following
    features:
    generally, it works as a passthrough device
    it can be suspended and resumed
    while suspended, the underlying device can be replaced with a new one
    Thus, we could create this vbdev on top of a raid1 device; then, to
    replace a data disk, we suspend the vbdev, destruct the raid1 device,
    and rebuild the raid1 with a new data disk.
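    In rough C terms, the suspend/resume passthrough could be as simple
    as the sketch below (illustrative; deliver_to_base() is a hypothetical
    stand-in for submitting to the underlying bdev):

        #include <stddef.h>

        enum pt_state { PT_RUNNING, PT_SUSPENDED };

        struct pt_io {
            struct pt_io *next;  /* plus the usual request fields */
        };

        struct pt_vbdev {
            enum pt_state state;
            void *base;             /* underlying bdev, e.g. the raid1 */
            struct pt_io *pending;  /* IOs held while suspended */
        };

        /* Placeholder for submitting an IO to the base device. */
        static void deliver_to_base(void *base, struct pt_io *io)
        {
            (void)base;
            (void)io;
        }

        static void pt_submit(struct pt_vbdev *pt, struct pt_io *io)
        {
            if (pt->state == PT_SUSPENDED) {
                io->next = pt->pending;  /* hold until resume */
                pt->pending = io;
                return;
            }
            deliver_to_base(pt->base, io);  /* normal passthrough */
        }

        /* Resume with a possibly swapped base device (e.g. a rebuilt
         * raid1), then replay any IOs held while suspended.  Note this
         * simple list replays in reverse arrival order; a real version
         * would use a FIFO queue. */
        static void pt_resume(struct pt_vbdev *pt, void *new_base)
        {
            pt->base = new_base;
            pt->state = PT_RUNNING;
            while (pt->pending != NULL) {
                struct pt_io *io = pt->pending;
                pt->pending = io->next;
                deliver_to_base(pt->base, io);
            }
        }
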
    Below are the tests I ran:
    (1) run a spdk app
    sudo ./app/spdk_tgt/spdk_tgt
    (2) create meta disks and data disks:
    sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta0 20 4096
    sudo python3 ./scripts/rpc.py bdev_malloc_create -b data0 64 4096
    sudo python3 ./scripts/rpc.py bdev_malloc_create -b meta1 20 4096
    sudo python3 ./scripts/rpc.py bdev_malloc_create -b data1 64 4096
    (3) create raid1 device:
    sudo python3 ./scripts/rpc.py create_raid1_bdev -n my_raid1
    --meta0-name meta0 --data0-name data0 --meta1-name meta1 --data1-name
    data1 --data-size-mb 64
    (4) create a test file for io:
    dmesg > /tmp/data
    (5) write 4k data from filesystem to raid1:
    sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
    /tmp/data --rw=1 --offset=0 --nbytes=4096
    (6) read 4k data from raid1 to file system:
    sudo python3 ./scripts/rpc.py test_raid1_io -n my_raid1 --file-path
    /tmp/data1 --rw=0 --offset=0 --nbytes=4096
    (7) delete the raid1 device:
    sudo python3 ./scripts/rpc.py delete_raid1_bdev -n my_raid1
    
    Currently it only supports creating a new raid1 device. I'm writing
    the rebuild logic so that a raid1 device can be reconstructed after
    destruct/delete. Please let me know whether such an idea could be
    accepted by spdk.

Wow, you've done a decent amount of work on this! I haven't looked at the code yet, but your ideas all sound sane; I'd have to think more about the hot-plug notion, though.  Either way, here's what I would suggest:

* Get your code up on GerritHub, as that's how we review things in the SPDK community: https://spdk.io/development/
* Look for an email soon from Artur; he's going to set up a Trello board with a task list for the current RAID0 module to begin getting it more generic and ready to be extended.

I think my best advice now would be to help with the effort I just mentioned and then see how your ideas fit. The SPDK community is unlikely to upstream multiple RAID modules (it just makes things too confusing and hard to maintain), but there's plenty of time to make significant contributions.  Or, if you never intended to upstream yours, you can of course continue development on your own, knowing the community is always here to help with general SPDK interfacing and architecture questions.

I would also suggest our community meetings as a place to discuss this with the key SPDK contributors and maintainers: https://spdk.io/community/

Thanks for your email!!!
Paul