* Re: [SPDK] Basic design of linear bdev (aggregating multi bdevs)
@ 2018-09-17 11:00 Senthil Kumar Veluswamy
From: Senthil Kumar Veluswamy @ 2018-09-17 11:00 UTC
  To: spdk

Hi Shuhei,
     Here are my thoughts.

1. As you said, "linear RAID" is not actually RAID; it is just a concatenation of bdevs. One use case could be that a "linear RAID" allows mixing and matching bdevs with different block counts.
    This is where it differs from the current RAID 0 implementation. It is not meant to improve performance or to provide data protection.

2. If an LVS (lvol store) is constructed on this "linear RAID" bdev, logical volumes can be created on it, and the I/O distribution depends on how the lvols are created and presented to hosts.

3. I/O splitting happens when the range from the I/O offset to "offset + length" crosses two or more base bdevs, so a single I/O may be split across two or more base bdevs. The current I/O splitting and wait-queue logic can be leveraged for this.
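A minimal C sketch of point 3, assuming the base bdevs are laid out back to back starting at LBA 0. struct concat_region, struct child_io and concat_split() are hypothetical names used only for illustration; they are not SPDK APIs. It shows that when a small base bdev sits in the middle, one I/O can end up split across three base bdevs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one base bdev covers parent LBAs [start, start + num_blocks). */
struct concat_region {
	uint64_t start;
	uint64_t num_blocks;
};

/* Hypothetical child I/O produced by splitting a parent I/O. */
struct child_io {
	size_t   base_index;  /* which base bdev receives this piece */
	uint64_t base_offset; /* offset inside that base bdev */
	uint64_t num_blocks;  /* length of this piece */
};

/*
 * Split the parent range [offset, offset + num_blocks) at base bdev
 * boundaries. Regions must be sorted, contiguous and start at LBA 0.
 * Returns the number of children written to out, or 0 if the range runs
 * past the end of the concatenation or out has fewer than needed slots.
 */
static size_t
concat_split(const struct concat_region *regions, size_t num_regions,
	     uint64_t offset, uint64_t num_blocks,
	     struct child_io *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < num_regions && num_blocks > 0; i++) {
		uint64_t end = regions[i].start + regions[i].num_blocks;

		if (offset >= end) {
			continue; /* the range starts after this base bdev */
		}
		if (n == max_out) {
			return 0; /* not enough room for another child */
		}
		uint64_t len = end - offset;
		if (len > num_blocks) {
			len = num_blocks;
		}
		out[n].base_index = i;
		out[n].base_offset = offset - regions[i].start;
		out[n].num_blocks = len;
		n++;
		offset += len;
		num_blocks -= len;
	}
	/* Anything left over means the I/O ran past the last base bdev. */
	return num_blocks == 0 ? n : 0;
}
```

For example, with base bdevs of 100, 8 and 1000 blocks, an I/O at offset 96 with length 20 becomes three children: 4 blocks on the first base bdev, 8 on the second and 8 on the third.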

Regards,
Senthil Kumar V.
 

On 14/09/18, 4:39 AM, "SPDK on behalf of 松本周平 / MATSUMOTO,SHUUHEI" <spdk-bounces(a)lists.01.org on behalf of shuhei.matsumoto.xt(a)hitachi.com> wrote:


* Re: [SPDK] Basic design of linear bdev (aggregating multi bdevs)
@ 2018-09-13 23:09 松本周平 / MATSUMOTO,SHUUHEI
From: 松本周平 / MATSUMOTO,SHUUHEI @ 2018-09-13 23:09 UTC
  To: spdk

Hi Ben,


Thank you so much for your feedback.


> Just to clarify a bit more - this bdev concatenates bdevs, placing one after the
> other, right?


Yes.


> Are you able to share use cases for this type of volume?


> Often, a RAID 0 with very large strips
> (1MB) ends up working better. There are other considerations, such as what
> happens when a disk fails, that may make RAID 0 infeasible, but it's something
> to consider.

I talked with my colleagues earlier, and what we want for practical use cases is higher RAID levels and their variants.

Not a linear bdev. A linear bdev is not actually RAID.


Please share with us if anyone has a specific idea.


> My primary concern is that higher layers in the software stack often are designed
> to prefer to write to lower LBAs, so concatenation results in hammering the
> first disk and hardly touching the rest.

Your concern makes sense to me.


> There are other considerations, such as what
> happens when a disk fails, that may make RAID 0 infeasible, but it's something
> to consider.

Yes.

> For a bdev that is made as a concatenation of other bdevs there won't be much
> splitting necessary and I think the algorithm is unique enough that you could do
> something specific to the concatenation code instead of relying on generic
> splitting in the bdev layer. I think that's a simpler approach overall.

OK. As you say, the number of splits in a linear bdev will be two at most.

The raid bdev is on the right track, and going directly to higher RAID levels, or to RAID 1, may be reasonable.
A linear bdev may be helpful to SPDK users as introductory material, but RAID 0 in the raid bdev is well thought out and sufficiently simple.

So I couldn't make up my mind and asked the mailing list.
Now that I have received valuable feedback, I will stop working on the linear bdev unless something changes significantly.

Thanks,
Shuhei


________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Walker, Benjamin <benjamin.walker(a)intel.com>
Sent: September 14, 2018 2:05:58
To: spdk(a)lists.01.org
Subject: [!]Re: [SPDK] Basic design of linear bdev (aggregating multi bdevs)


* Re: [SPDK] Basic design of linear bdev (aggregating multi bdevs)
@ 2018-09-13 17:05 Walker, Benjamin
From: Walker, Benjamin @ 2018-09-13 17:05 UTC
  To: spdk

On Thu, 2018-09-13 at 02:46 +0000, 松本周平 / MATSUMOTO,SHUUHEI wrote:
> Hi All,
> 
> 
> As I mentioned briefly in the community before, I'm working on a linear bdev
> (aggregating multiple bdevs) as part of my work.

Just to clarify a bit more - this bdev concatenates bdevs, placing one after the
other, right? Are you able to share use cases for this type of volume? My
primary concern is that higher layers in the software stack often are designed
to prefer to write to lower LBAs, so concatenation results in hammering the
first disk and hardly touching the rest. Often, a RAID 0 with very large strips
(1MB) ends up working better. There are other considerations, such as what
happens when a disk fails, that may make RAID 0 infeasible, but it's something
to consider.
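To illustrate the concern about low LBAs, the sketch below compares where a parent LBA lands under concatenation and under RAID 0. It is a hypothetical illustration (concat_map() and raid0_map() are not SPDK functions) and assumes equally sized disks and 512-byte blocks, so a 1MB strip is 2048 blocks:

```c
#include <assert.h>
#include <stdint.h>

/* Where a parent LBA lands: which disk, and which LBA on that disk. */
struct lba_location {
	uint32_t disk;
	uint64_t disk_lba;
};

/* Concatenation of equally sized disks: disk 0 holds LBAs
 * [0, blocks_per_disk), disk 1 holds the next range, and so on. */
static struct lba_location
concat_map(uint64_t lba, uint64_t blocks_per_disk)
{
	struct lba_location loc = {
		.disk = (uint32_t)(lba / blocks_per_disk),
		.disk_lba = lba % blocks_per_disk,
	};
	return loc;
}

/* RAID 0: consecutive strips of strip_blocks are dealt round-robin
 * across num_disks disks. */
static struct lba_location
raid0_map(uint64_t lba, uint64_t strip_blocks, uint32_t num_disks)
{
	uint64_t strip = lba / strip_blocks;
	struct lba_location loc = {
		.disk = (uint32_t)(strip % num_disks),
		.disk_lba = (strip / num_disks) * strip_blocks + lba % strip_blocks,
	};
	return loc;
}
```

Under those assumptions, with four disks of 1,000,000 blocks each, LBAs 0-16383 (the first 8 MiB) all land on disk 0 with concatenation, while RAID 0 with 1MB strips spreads them across all four disks.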

> 
> 
> The following is my current basic design for the linear bdev.
> 
> 
> A linear bdev isn't really RAID, but I want to add it as another level of
> the raid bdev.
> 
> The raid bdev and the I/O splitting in the bdev layer can serve as a good
> foundation.

For a bdev that is made as a concatenation of other bdevs there won't be much
splitting necessary and I think the algorithm is unique enough that you could do
something specific to the concatenation code instead of relying on generic
splitting in the bdev layer. I think that's a simpler approach overall.
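One way to read this suggestion: for a concatenation, the module only needs to find the base bdev that holds the starting LBA and how many blocks remain before that base bdev ends. A minimal sketch under that assumption, using a sorted table of per-base start offsets and a binary search; concat_find_base() and concat_blocks_to_boundary() are hypothetical names, not SPDK functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * starts[i] is the first parent LBA served by base bdev i. starts[] must be
 * strictly ascending with starts[0] == 0. Returns the index of the base
 * bdev containing lba, i.e. the largest i with starts[i] <= lba.
 */
static size_t
concat_find_base(const uint64_t *starts, size_t num_bases, uint64_t lba)
{
	size_t lo = 0, hi = num_bases; /* the answer lies in [lo, hi) */

	assert(num_bases > 0 && starts[0] == 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;
		if (starts[mid] <= lba) {
			lo = mid;
		} else {
			hi = mid;
		}
	}
	return lo;
}

/* Blocks from lba up to the end of the base bdev that contains it. */
static uint64_t
concat_blocks_to_boundary(const uint64_t *starts, size_t num_bases,
			  uint64_t total_blocks, uint64_t lba)
{
	assert(lba < total_blocks);

	size_t i = concat_find_base(starts, num_bases, lba);
	uint64_t end = (i + 1 < num_bases) ? starts[i + 1] : total_blocks;

	return end - lba;
}
```

If the returned count covers the whole I/O, it can be forwarded to one base bdev unchanged; otherwise the module can issue the first piece and repeat from the next base bdev.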

> 
> 
> bdev:
> - Reuse the current I/O splitting as much as possible.
> - In a linear bdev, the I/O boundary is not constant, so the linear bdev
> can't use the current implementation based on optimal_io_boundary.
> 
> - Add an array of (start, length) pairs to struct spdk_bdev instead of
> optimal_io_boundary.
> - Abstract the following APIs:
>   - bool _spdk_bdev_io_should_split(struct spdk_bdev_io *bdev_io)
>   - uint32_t _to_next_boundary(uint64_t offset, uint32_t boundary)
> - The (start, length) array and optimal_io_boundary are mutually exclusive.
> 
> 
> raid_bdev:
> - Reuse the current implementation as much as possible.
> - Abstract the following:
>  - raid_bdev_configure()
>    - Check consistency among base bdevs, calculate the total block count,
> and add split info to spdk_bdev.
>  - raid_bdev_start_rw_request() and the functions that follow it
> - Use -1 as the level of the linear bdev.
> 
> 
> rpc:
> - Add a new parameter, linear (bool), to construct_raid_bdev.
> - raid_level and linear are mutually exclusive.
> 
> 
> Any feedback is very welcome.
> 
> 
> Thanks,
> 
> Shuhei
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk



* [SPDK] Basic design of linear bdev (aggregating multi bdevs)
@ 2018-09-13  2:46 松本周平 / MATSUMOTO,SHUUHEI
From: 松本周平 / MATSUMOTO,SHUUHEI @ 2018-09-13  2:46 UTC
  To: spdk

Hi All,


As I mentioned briefly in the community before, I'm working on a linear bdev (aggregating multiple bdevs) as part of my work.


The following is my current basic design for the linear bdev.


A linear bdev isn't really RAID, but I want to add it as another level of the raid bdev.

The raid bdev and the I/O splitting in the bdev layer can serve as a good foundation.


bdev:
- Reuse the current I/O splitting as much as possible.
- In a linear bdev, the I/O boundary is not constant, so the linear bdev can't use the current implementation based on optimal_io_boundary.

- Add an array of (start, length) pairs to struct spdk_bdev instead of optimal_io_boundary.
- Abstract the following APIs:
  - bool _spdk_bdev_io_should_split(struct spdk_bdev_io *bdev_io)
  - uint32_t _to_next_boundary(uint64_t offset, uint32_t boundary)
- The (start, length) array and optimal_io_boundary are mutually exclusive.
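The bdev part of the proposal can be sketched as follows, assuming a split description that holds either a constant optimal_io_boundary or a (start, length) array, never both. struct split_range, struct split_info, next_boundary() and should_split() are hypothetical stand-ins for the abstracted _to_next_boundary() and _spdk_bdev_io_should_split(), not their actual SPDK definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (start, length) pair describing one split region. */
struct split_range {
	uint64_t start;
	uint64_t length;
};

/*
 * Hypothetical split description: either optimal_io_boundary is nonzero
 * (constant boundary) or ranges/num_ranges is set (linear bdev), never
 * both, mirroring the "mutually exclusive" rule in the proposal.
 */
struct split_info {
	uint32_t optimal_io_boundary;
	const struct split_range *ranges;
	size_t num_ranges;
};

/* Blocks from offset to the next split point; 0 if offset is in no range. */
static uint64_t
next_boundary(const struct split_info *si, uint64_t offset)
{
	assert((si->optimal_io_boundary != 0) != (si->ranges != NULL));

	if (si->optimal_io_boundary != 0) {
		/* Constant boundary: distance to the next multiple. */
		return si->optimal_io_boundary - (offset % si->optimal_io_boundary);
	}
	for (size_t i = 0; i < si->num_ranges; i++) {
		uint64_t end = si->ranges[i].start + si->ranges[i].length;
		if (offset >= si->ranges[i].start && offset < end) {
			return end - offset; /* distance to the end of this range */
		}
	}
	return 0;
}

/* An I/O must be split if it extends past the next split point. */
static bool
should_split(const struct split_info *si, uint64_t offset, uint64_t num_blocks)
{
	uint64_t to_boundary = next_boundary(si, offset);

	return to_boundary != 0 && num_blocks > to_boundary;
}
```

Keeping both representations behind one next_boundary() call is what would let the existing split path stay unchanged for bdevs that use a constant boundary.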


raid_bdev:
- Reuse the current implementation as much as possible.
- Abstract the following:
 - raid_bdev_configure()
   - Check consistency among base bdevs, calculate the total block count, and add split info to spdk_bdev.
 - raid_bdev_start_rw_request() and the functions that follow it
- Use -1 as the level of the linear bdev.
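For the raid_bdev part, the linear variant of raid_bdev_configure() would mainly compute a layout. A hypothetical sketch (linear_configure(), struct base_info and struct split_range are illustrative names, not SPDK's) that requires all base bdevs to share one block size, allows their block counts to differ, and records each base bdev's (start, length):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal view of a base bdev. */
struct base_info {
	uint32_t blocklen;
	uint64_t num_blocks;
};

/* Hypothetical (start, length) pair, one per base bdev. */
struct split_range {
	uint64_t start;
	uint64_t length;
};

/*
 * Hypothetical linear-mode configure step: verify that all base bdevs
 * share one block size and are non-empty, fill ranges[] with each base
 * bdev's position in the concatenation, and return the total block
 * count, or 0 on error.
 */
static uint64_t
linear_configure(const struct base_info *bases, size_t num_bases,
		 struct split_range *ranges)
{
	uint64_t total = 0;

	assert(ranges != NULL);
	if (num_bases == 0) {
		return 0;
	}
	for (size_t i = 0; i < num_bases; i++) {
		if (bases[i].blocklen != bases[0].blocklen || bases[i].num_blocks == 0) {
			return 0; /* inconsistent block size or empty base bdev */
		}
		ranges[i].start = total;
		ranges[i].length = bases[i].num_blocks;
		total += bases[i].num_blocks;
	}
	return total;
}
```

Because the block counts may differ between base bdevs, a layout like this would allow the mix-and-match use case for a linear bdev.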


rpc:
- Add a new parameter, linear (bool), to construct_raid_bdev.
- raid_level and linear are mutually exclusive.
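The rpc part can be sketched as a small validation step after the parameters are decoded. This uses hypothetical names (struct construct_raid_params, resolve_raid_level(), RAID_LEVEL_LINEAR) and additionally assumes that supplying neither raid_level nor linear is an error; only the mutual exclusion and the -1 level come from the proposal:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded construct_raid_bdev parameters. */
struct construct_raid_params {
	bool has_raid_level; /* was raid_level supplied? */
	int  raid_level;
	bool linear;         /* proposed new boolean parameter */
};

/* Level the proposal assigns to the linear bdev. */
#define RAID_LEVEL_LINEAR (-1)

/*
 * Resolve the effective level. Returns 0 and stores it in *level, or -1
 * if both raid_level and linear were requested, or neither (assumption).
 */
static int
resolve_raid_level(const struct construct_raid_params *p, int *level)
{
	if (p->linear && p->has_raid_level) {
		return -1; /* mutually exclusive */
	}
	if (p->linear) {
		*level = RAID_LEVEL_LINEAR;
		return 0;
	}
	if (!p->has_raid_level) {
		return -1; /* assumed: one of the two is required */
	}
	*level = p->raid_level;
	return 0;
}
```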


Any feedback is very welcome.


Thanks,

Shuhei
