All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH RFC] target/user: Add double ring buffers support.
       [not found] <1486709283-15535-1-git-send-email-lixiubo@cmss.chinamobile.com>
@ 2017-02-10 18:02 ` Andy Grover
  2017-02-13  1:38   ` Xiubo Li
  2017-02-13  9:50   ` Xiubo Li
  0 siblings, 2 replies; 9+ messages in thread
From: Andy Grover @ 2017-02-10 18:02 UTC (permalink / raw)
  To: lixiubo, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

On 02/09/2017 10:48 PM, lixiubo@cmss.chinamobile.com wrote:
> For now the tcmu is based on UIO framework and only using the
> map0 with fixed ring buffer size. This will work fine mostly,
> but when we are using the 10GBASE NIC, the ring buffer is too
> small to deal with the incoming iscsi cmds.
>
> We can resolve the above issue by increasingt the ring buffer
> size larger, but we don't know the exactly size to all cases.
>
> This patch will add double ring buffers support by adding map1
> support through UIO device.

Hi Xiubo,

Yes I think it's now clear we need more buffer space to avoid 
bottlenecks for high iops. The initial design kept it simple with the 
1MB vmalloc'd space but anticipated greater would be needed. It should 
not be necessary to change userspace or the TCMU ABI to handle growing 
the buffer for fast devices:

1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB 
to 1GB or larger
2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring 
portion, and dynamically alloc pages for the data area as needed and map 
them into the data area.
3. Upgrade the current fixed-size bitmap-based tracking of data area to 
handle the new scheme
4. Implement an algorithm to keep allocated pages mapped into the data 
area for reuse, and maybe a heuristic to keep extreme burstiness from 
over-allocating pages

This should allow TCMU to allocate more data area as needed, not waste 
memory for slower devices, and avoid userspace ABI changes. Could we 
prototype this approach and see if it is workable?

Thanks -- Regards -- Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-10 18:02 ` [PATCH RFC] target/user: Add double ring buffers support Andy Grover
@ 2017-02-13  1:38   ` Xiubo Li
  2017-02-13 17:42     ` Andy Grover
  2017-02-13  9:50   ` Xiubo Li
  1 sibling, 1 reply; 9+ messages in thread
From: Xiubo Li @ 2017-02-13  1:38 UTC (permalink / raw)
  To: Andy Grover, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

Hi Andy,

Thanks for your quick reply.

Yes, It looks much better than the current.


On 2017年02月11日 02:02, Andy Grover wrote:
> On 02/09/2017 10:48 PM, lixiubo@cmss.chinamobile.com wrote:
>> For now the tcmu is based on UIO framework and only using the
>> map0 with fixed ring buffer size. This will work fine mostly,
>> but when we are using the 10GBASE NIC, the ring buffer is too
>> small to deal with the incoming iscsi cmds.
>>
>> We can resolve the above issue by increasingt the ring buffer
>> size larger, but we don't know the exactly size to all cases.
>>
>> This patch will add double ring buffers support by adding map1
>> support through UIO device.
>
> Hi Xiubo,
>
> Yes I think it's now clear we need more buffer space to avoid 
> bottlenecks for high iops. The initial design kept it simple with the 
> 1MB vmalloc'd space but anticipated greater would be needed. It should 
> not be necessary to change userspace or the TCMU ABI to handle growing 
> the buffer for fast devices:
>
> 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB 
> to 1GB or larger
> 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring 
> portion, and dynamically alloc pages for the data area as needed and 
> map them into the data area.
Should the data area mapping is dynamical or add one new API to 
userspace to trigger it ?

> 3. Upgrade the current fixed-size bitmap-based tracking of data area 
> to handle the new scheme
> 4. Implement an algorithm to keep allocated pages mapped into the data 
> area for reuse, and maybe a heuristic to keep extreme burstiness from 
> over-allocating pages
>
Does a little like slab ?

> This should allow TCMU to allocate more data area as needed, not waste 
> memory for slower devices, and avoid userspace ABI changes. Could we 
> prototype this approach and see if it is workable?
>
I will do more investigate about this.


> Thanks -- Regards -- Andy
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-10 18:02 ` [PATCH RFC] target/user: Add double ring buffers support Andy Grover
  2017-02-13  1:38   ` Xiubo Li
@ 2017-02-13  9:50   ` Xiubo Li
  2017-02-13 17:58     ` Andy Grover
  1 sibling, 1 reply; 9+ messages in thread
From: Xiubo Li @ 2017-02-13  9:50 UTC (permalink / raw)
  To: Andy Grover, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

Hi Andy,

There is one new scheme in my mind:

>
> Yes I think it's now clear we need more buffer space to avoid 
> bottlenecks for high iops. The initial design kept it simple with the 
> 1MB vmalloc'd space but anticipated greater would be needed. It should 
> not be necessary to change userspace or the TCMU ABI to handle growing 
> the buffer for fast devices:
>
> 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB 
> to 1GB or larger
For the cmd area, set the size to fixed 512M, and data area's size to 
fixed 1G, is that okay ?

> 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring 
> portion, and dynamically alloc pages for the data area as needed and 
> map them into the data area.
TCMU will just vmalloc() the 512M cmd area, and let the data area memory 
allocate and map later when needed.
The userspace runner will mmap() all the (512M + 1G) when initialising, 
and will return 1.5G virtual address space. The cmd area will be mapped 
to actual physical addresses here, while the data area will be mapped in 
page fault hook when using....


> 3. Upgrade the current fixed-size bitmap-based tracking of data area 
> to handle the new scheme
The Radix tree will be used to keep the block's index(0 ~ 
1G/DATA_BLOCK_SIZE) and physical page mapping relations. Each leaf is 
one data block(the size is DATA_BLOCK_SIZE).

For non-leaf nodes, use the radix tags[0][SLOTs] to indicate wether 
slot[SLOTs]'s branch has free(reused the old one or NULL leafs) block 
leafs or not.
This could speed the search of the free blocks in data area.


> 4. Implement an algorithm to keep allocated pages mapped into the data 
> area for reuse, and maybe a heuristic to keep extreme burstiness from 
> over-allocating pages
>
For leaf nodes:
if one leaf node is exist and its tags[0][SLOTs] = 0 meaning that some 
older cmds have already touched the block leafs and then "freed" it, if 
so, we will reuse it.
if the leaf node is exist and its tag[0][SLOTs] = 1 meaning that this is 
still used.
if the leaf node is non-exist, this is the first time to touch this 
leaf, will allocate memory and then insert it here setting the 
tag[0][SLOTs] to 1.

> This should allow TCMU to allocate more data area as needed, not waste 
> memory for slower devices, and avoid userspace ABI changes. Could we 
> prototype this approach and see if it is workable?
>
For slower devices, it will save memory.
Could also avoid changing the userspace ABI.

Thanks,

BRs
Xiubo



> Thanks -- Regards -- Andy
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-13  1:38   ` Xiubo Li
@ 2017-02-13 17:42     ` Andy Grover
  2017-02-14  5:03       ` Xiubo Li
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Grover @ 2017-02-13 17:42 UTC (permalink / raw)
  To: Xiubo Li, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

On 02/12/2017 05:38 PM, Xiubo Li wrote:
> On 2017年02月11日 02:02, Andy Grover wrote:
>> 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB
>> to 1GB or larger
>> 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring
>> portion, and dynamically alloc pages for the data area as needed and
>> map them into the data area.
> Should the data area mapping is dynamical or add one new API to
> userspace to trigger it ?

I would recommend having the kernel make the decision to allocate/map 
more memory, or not (i.e. wait for existing I/O to complete and free 
pages) because this avoids the need for a new API, and also the kernel 
may better decide how to allocate total pages used across multiple TCMU 
devices.

>> 4. Implement an algorithm to keep allocated pages mapped into the data
>> area for reuse, and maybe a heuristic to keep extreme burstiness from
>> over-allocating pages
>>
> Does a little like slab ?

It's definitely becoming less of a storage issue and more of a 
page-management issue. It might make sense to look at mm code and ask mm 
experts for their advice, yep.

>> This should allow TCMU to allocate more data area as needed, not waste
>> memory for slower devices, and avoid userspace ABI changes. Could we
>> prototype this approach and see if it is workable?
>>
> I will do more investigate about this.

Thank you for working to improve this area!

Regards -- Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-13  9:50   ` Xiubo Li
@ 2017-02-13 17:58     ` Andy Grover
  2017-02-14  5:50       ` Xiubo Li
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Grover @ 2017-02-13 17:58 UTC (permalink / raw)
  To: Xiubo Li, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

On 02/13/2017 01:50 AM, Xiubo Li wrote:
> There is one new scheme in my mind:

>> Yes I think it's now clear we need more buffer space to avoid
>> bottlenecks for high iops. The initial design kept it simple with the
>> 1MB vmalloc'd space but anticipated greater would be needed. It should
>> not be necessary to change userspace or the TCMU ABI to handle growing
>> the buffer for fast devices:
>>
>> 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB
>> to 1GB or larger
> For the cmd area, set the size to fixed 512M, and data area's size to
> fixed 1G, is that okay ?

512M seems a little big for the cmd area... use of the cmd ring should 
be much smaller than the data area. Each cmd has a minimum size, but 
then to describe an additional page in the iovec[] should be just one 
struct iovec (16 bytes?) for each 4K page.

>> 3. Upgrade the current fixed-size bitmap-based tracking of data area
>> to handle the new scheme
> The Radix tree will be used to keep the block's index(0 ~
> 1G/DATA_BLOCK_SIZE) and physical page mapping relations. Each leaf is
> one data block(the size is DATA_BLOCK_SIZE).
>
> For non-leaf nodes, use the radix tags[0][SLOTs] to indicate wether
> slot[SLOTs]'s branch has free(reused the old one or NULL leafs) block
> leafs or not.
> This could speed the search of the free blocks in data area.

Yes I think radix tree is a good place to start.

Regards -- Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-13 17:42     ` Andy Grover
@ 2017-02-14  5:03       ` Xiubo Li
  0 siblings, 0 replies; 9+ messages in thread
From: Xiubo Li @ 2017-02-14  5:03 UTC (permalink / raw)
  To: Andy Grover, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix



On 2017年02月14日 01:42, Andy Grover wrote:
> On 02/12/2017 05:38 PM, Xiubo Li wrote:
>> On 2017年02月11日 02:02, Andy Grover wrote:
>>> 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB
>>> to 1GB or larger
>>> 2. Don't vmalloc() the whole thing, instead vmalloc for the cmd ring
>>> portion, and dynamically alloc pages for the data area as needed and
>>> map them into the data area.
>> Should the data area mapping is dynamical or add one new API to
>> userspace to trigger it ?
>
> I would recommend having the kernel make the decision to allocate/map 
> more memory, or not (i.e. wait for existing I/O to complete and free 
> pages) because this avoids the need for a new API, and also the kernel 
> may better decide how to allocate total pages used across multiple 
> TCMU devices.
>
Yes, agreed.


>>> 4. Implement an algorithm to keep allocated pages mapped into the data
>>> area for reuse, and maybe a heuristic to keep extreme burstiness from
>>> over-allocating pages
>>>
>> Does a little like slab ?
>
> It's definitely becoming less of a storage issue and more of a 
> page-management issue. It might make sense to look at mm code and ask 
> mm experts for their advice, yep.
>
Sure, i will have a deep investigate about this.

Thanks very much.

BRs
Xiubo


>>> This should allow TCMU to allocate more data area as needed, not waste
>>> memory for slower devices, and avoid userspace ABI changes. Could we
>>> prototype this approach and see if it is workable?
>>>
>> I will do more investigate about this.
>
> Thank you for working to improve this area!
>
> Regards -- Andy
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-13 17:58     ` Andy Grover
@ 2017-02-14  5:50       ` Xiubo Li
  2017-02-15  0:44         ` Andy Grover
  0 siblings, 1 reply; 9+ messages in thread
From: Xiubo Li @ 2017-02-14  5:50 UTC (permalink / raw)
  To: Andy Grover, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix


>>> Yes I think it's now clear we need more buffer space to avoid
>>> bottlenecks for high iops. The initial design kept it simple with the
>>> 1MB vmalloc'd space but anticipated greater would be needed. It should
>>> not be necessary to change userspace or the TCMU ABI to handle growing
>>> the buffer for fast devices:
>>>
>>> 1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB
>>> to 1GB or larger
>> For the cmd area, set the size to fixed 512M, and data area's size to
>> fixed 1G, is that okay ?
>
> 512M seems a little big for the cmd area... use of the cmd ring should 
> be much smaller than the data area. Each cmd has a minimum size, but 
> then to describe an additional page in the iovec[] should be just one 
> struct iovec (16 bytes?) for each 4K page.
>

The struct tcmu_cmd_entry {} size is fixed 44 bytes without iovec[], and 
the size of struct iovec[N] is about 16 bytes * N.

The cmd entry size will be [44B, N *16 + 44B], and the data size will be 
[0, N * 4096], so the ratio of sizeof(cmd entry): sizeof(entry datas) == 
(N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) + 
44/(N*4096) == 1/256 + 11/(N * 1024).
When N is bigger, the ratio will be smaller. If N >= 1, the ratio will 
be [15/1024, 4/1024).

So, that's right, 512M is a little bigger than actually needed, for the 
safe ratio i think 16/1024 is enough, if the data area is fixed 1G, so 
the cmd area could be 16M+.
Or cmd area(64M) + data area(960M) == 1G ?


>>> 3. Upgrade the current fixed-size bitmap-based tracking of data area
>>> to handle the new scheme
>> The Radix tree will be used to keep the block's index(0 ~
>> 1G/DATA_BLOCK_SIZE) and physical page mapping relations. Each leaf is
>> one data block(the size is DATA_BLOCK_SIZE).
>>
>> For non-leaf nodes, use the radix tags[0][SLOTs] to indicate wether
>> slot[SLOTs]'s branch has free(reused the old one or NULL leafs) block
>> leafs or not.
>> This could speed the search of the free blocks in data area.
>
> Yes I think radix tree is a good place to start.
>
Okay.

Thanks,

BRs
Xiubo


> Regards -- Andy
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-14  5:50       ` Xiubo Li
@ 2017-02-15  0:44         ` Andy Grover
  2017-02-15  5:46           ` Xiubo Li
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Grover @ 2017-02-15  0:44 UTC (permalink / raw)
  To: Xiubo Li, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

On 02/13/2017 09:50 PM, Xiubo Li wrote:
> The struct tcmu_cmd_entry {} size is fixed 44 bytes without iovec[], and
> the size of struct iovec[N] is about 16 bytes * N.
> 
> The cmd entry size will be [44B, N *16 + 44B], and the data size will be
> [0, N * 4096], so the ratio of sizeof(cmd entry): sizeof(entry datas) ==
> (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) +
> 44/(N*4096) == 1/256 + 11/(N * 1024).
> When N is bigger, the ratio will be smaller. If N >= 1, the ratio will
> be [15/1024, 4/1024).
> 
> So, that's right, 512M is a little bigger than actually needed, for the
> safe ratio i think 16/1024 is enough, if the data area is fixed 1G, so
> the cmd area could be 16M+.
> Or cmd area(64M) + data area(960M) == 1G ?

Seems like a good ratio. You could look at growing the cmd ring when
deciding to allocate more pages for the data area. But, growing the cmd
ring is a little trickier (see below) so maybe have a different policy
for deciding when & how much to grow.

Changing the cmd ring size: Unlike the data area where you can just
allocate & map more pages, I think the cmd ring you probably want to
complete all I/O, get cmd_head and cmd_tail back to the top with a PAD
op, and *then* reallocate/remap the cmd ring and update
tcmu_mailbox.cmdr_size.

Regards -- Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC] target/user: Add double ring buffers support.
  2017-02-15  0:44         ` Andy Grover
@ 2017-02-15  5:46           ` Xiubo Li
  0 siblings, 0 replies; 9+ messages in thread
From: Xiubo Li @ 2017-02-15  5:46 UTC (permalink / raw)
  To: Andy Grover, mchristi; +Cc: nab, sheng, linux-scsi, target-devel, namei.unix

>> The struct tcmu_cmd_entry {} size is fixed 44 bytes without iovec[], and
>> the size of struct iovec[N] is about 16 bytes * N.
>>
>> The cmd entry size will be [44B, N *16 + 44B], and the data size will be
>> [0, N * 4096], so the ratio of sizeof(cmd entry): sizeof(entry datas) ==
>> (N * 16 + 44)Bytes : (N * 4096)Bytes == (N * 16)/(N * 4096) +
>> 44/(N*4096) == 1/256 + 11/(N * 1024).
>> When N is bigger, the ratio will be smaller. If N >= 1, the ratio will
>> be [15/1024, 4/1024).
>>
>> So, that's right, 512M is a little bigger than actually needed, for the
>> safe ratio i think 16/1024 is enough, if the data area is fixed 1G, so
>> the cmd area could be 16M+.
>> Or cmd area(64M) + data area(960M) == 1G ?
> Seems like a good ratio. You could look at growing the cmd ring when
> deciding to allocate more pages for the data area. But, growing the cmd
> ring is a little trickier (see below) so maybe have a different policy
> for deciding when & how much to grow.
>
> Changing the cmd ring size: Unlike the data area where you can just
> allocate & map more pages, I think the cmd ring you probably want to
> complete all I/O, get cmd_head and cmd_tail back to the top with a PAD
> op, and *then* reallocate/remap the cmd ring and update
> tcmu_mailbox.cmdr_size.
Yes, that sounds nice. Growing the cmd area will be much more
complex, and this could be implemented in the future.

I will try to implement the following simple new scheme firstly:
- Fixed cmd area using the vmalloc(), total size will be 64M or 128M(?)
- Growing data area using kmalloc(DATA_BLOCK_SIZE, GFP_ATOMIC),
   the maximum total size will be 1G.

Thanks,

BRs
Xiubo

> Regards -- Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-02-15  5:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1486709283-15535-1-git-send-email-lixiubo@cmss.chinamobile.com>
2017-02-10 18:02 ` [PATCH RFC] target/user: Add double ring buffers support Andy Grover
2017-02-13  1:38   ` Xiubo Li
2017-02-13 17:42     ` Andy Grover
2017-02-14  5:03       ` Xiubo Li
2017-02-13  9:50   ` Xiubo Li
2017-02-13 17:58     ` Andy Grover
2017-02-14  5:50       ` Xiubo Li
2017-02-15  0:44         ` Andy Grover
2017-02-15  5:46           ` Xiubo Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.