* tcmu data area double copy overhead
@ 2021-12-08 12:43 Xiaoguang Wang
  2021-12-08 13:32 ` Xiubo Li
  2021-12-09 19:11 ` Bodo Stroesser
  0 siblings, 2 replies; 7+ messages in thread
From: Xiaoguang Wang @ 2021-12-08 12:43 UTC (permalink / raw)
  To: target-devel; +Cc: martin.petersen

hi,

I'm a newcomer to the tcmu and iSCSI subsystems and have only spent a few
days learning them, so please forgive me if my question sounds naive :)

One of our customers uses tcmu to access a remote distributed filesystem
and sees noticeable copy overhead in tcmu during read operations, so I
spent some time looking for the cause and for possible optimizations.
As far as I understand the tcmu kernel code, tcmu allocates internal data
pages for the data area and uses them as temporary storage between the
user-space backstore and tcmu. For an iSCSI initiator's write request,
tcmu first copies the sg pages' content into the internal data pages, and
the user-space backstore then reads that data from the mmapped data area
and writes it to the backing storage; for an iSCSI initiator's read
request, tcmu again allocates internal data pages, the backstore copies
the distributed filesystem's data into them, and tcmu later copies the
data pages' content into the sg pages. That means both read and write
requests incur one extra data copy.
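
To illustrate, a minimal self-contained sketch of that flow (plain
buffers stand in for the sg pages and the data-area pages, and the
function names are invented for illustration; the real kernel code works
on struct scatterlist and tcmu's data area):

    #include <string.h>

    /* WRITE: initiator data arrives in sg pages; tcmu copies it into the
     * data area that the user-space backstore has mmapped. */
    static void sketch_queue_write(void *data_area, const void *sg_page,
                                   size_t len)
    {
            memcpy(data_area, sg_page, len);   /* extra copy on write */
            /* the backstore then reads data_area and writes it out to
             * the distributed filesystem */
    }

    /* READ: backstore fills the mmapped data area from the distributed
     * filesystem; tcmu copies it back into the sg pages. */
    static void sketch_complete_read(void *sg_page, const void *data_area,
                                     size_t len)
    {
            memcpy(sg_page, data_area, len);   /* extra copy on read */
    }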

So my question is: could we avoid allocating internal data pages in tcmu
and instead mmap the sg pages themselves into the data area? That would
remove the extra copy, which I think would improve throughput. Or are
there security issues that prevent us from doing it this way? Thanks.


Regards,
Xiaoguang Wang


* Re: tcmu data area double copy overhead
  2021-12-08 12:43 tcmu data area double copy overhead Xiaoguang Wang
@ 2021-12-08 13:32 ` Xiubo Li
  2021-12-09 19:19   ` Bodo Stroesser
  2021-12-09 19:11 ` Bodo Stroesser
  1 sibling, 1 reply; 7+ messages in thread
From: Xiubo Li @ 2021-12-08 13:32 UTC (permalink / raw)
  To: Xiaoguang Wang, target-devel; +Cc: martin.petersen


On 12/8/21 8:43 PM, Xiaoguang Wang wrote:
> [...]
>
Yeah, this is a good starting point to improve this. Currently tcmu-runner
can benefit from the mapped temporary data pages and won't do the extra
copy in userspace.

I think you can dynamically map/unmap the data pages for each SCSI cmd in
LIO's ring buffer, but this should be opaque to the user-space daemons. At
the same time, LIO needs to tell tcmu-runner, via each tcmu cmd entry, the
offset at which it should read/write the data contents in the mapped ring
buffer. Currently each tcmu cmd entry is followed by its data contents.

But I didn't investigate whether we could use the sg's pages directly; I'm
not sure whether there are any limitations that prevent this.


BRs

Xiubo






* Re: tcmu data area double copy overhead
  2021-12-08 12:43 tcmu data area double copy overhead Xiaoguang Wang
  2021-12-08 13:32 ` Xiubo Li
@ 2021-12-09 19:11 ` Bodo Stroesser
  2021-12-10  6:42   ` Xiaoguang Wang
  2021-12-21 13:26   ` Xiaoguang Wang
  1 sibling, 2 replies; 7+ messages in thread
From: Bodo Stroesser @ 2021-12-09 19:11 UTC (permalink / raw)
  To: Xiaoguang Wang, target-devel; +Cc: martin.petersen

On 08.12.21 13:43, Xiaoguang Wang wrote:
> [...]

You are right, tcmu currently copies data between the sg-pages and tcmu
data pages.

But I'm not sure the solution you suggest would really show the improved
throughput you expect, because we would have to map all data pages of the
sgl(s) of a new cmd into user space and unmap them again when the cmd is
processed.

Mapping a page means that we store the struct page pointer in tcmu's data
xarray. If userspace tries to read or write that page, a page fault occurs
and the kernel calls tcmu_vma_fault, which returns the page pointer.
Unmapping means that tcmu has to remove the page pointer and call
unmap_mapping_range. So I'm not sure that copying the content of one page
is more expensive than mapping and unmapping one page.
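
A minimal sketch of what that map/fault/unmap cycle amounts to
(illustrative helper names only; the real code in
drivers/target/target_core_user.c additionally handles locking,
refcounting and the data-block index arithmetic):

    #include <linux/mm.h>
    #include <linux/xarray.h>

    /* "Map": remember the page so a later fault can find it. */
    static int sketch_map_page(struct xarray *data_pages, unsigned long dbi,
                               struct page *page)
    {
            return xa_err(xa_store(data_pages, dbi, page, GFP_KERNEL));
    }

    /* Fault: userspace touched the mmapped data area; hand back the page. */
    static vm_fault_t sketch_fault(struct vm_fault *vmf,
                                   struct xarray *data_pages)
    {
            struct page *page = xa_load(data_pages, vmf->pgoff);

            if (!page)
                    return VM_FAULT_SIGBUS;
            get_page(page);
            vmf->page = page;
            return 0;
    }

    /* Unmap: forget the page and shoot down any existing user mapping. */
    static void sketch_unmap_page(struct xarray *data_pages,
                                  struct address_space *mapping,
                                  unsigned long dbi)
    {
            xa_erase(data_pages, dbi);
            unmap_mapping_range(mapping, (loff_t)dbi << PAGE_SHIFT,
                                PAGE_SIZE, 1);
    }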

Additionally, if tcmu mapped the sg-pages, it would have to unmap them
immediately when userspace completes the cmd, because tcmu is not the
owner of those pages. So the recently added "KEEP_BUF" feature would have
to be removed again. But that feature was added to avoid the need for a
data copy in userspace in some situations.

Finally, if tcmu times out a cmd that is waiting on the ring for
completion from userspace, tcmu sends the cmd completion to tcm core.
Before doing so, it would have to unmap the sg-pages. If userspace later
tried to access one of these pages, tcmu_vma_fault would have nothing to
map; instead it would return VM_FAULT_SIGBUS and userspace would receive
SIGBUS.

I already started another attempt to avoid the data copy in tcmu. The idea
is to optionally allow backend drivers to provide callbacks for sg
allocation and free. That way the pages in an sg allocated by tcm core can
be pages from tcmu's data area. Thus no map/unmap is needed, and the fabric
driver directly writes/reads data to/from those pages, which are visible to
userspace.
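
In rough outline, such optional callbacks might look like the following
(purely illustrative: the struct name, the hook placement and the error
handling are assumptions about the shape of that unmerged patchset, not
mainline API):

    #include <linux/errno.h>
    #include <target/target_core_base.h>    /* struct se_cmd */

    /* Hypothetical optional ops a backend such as tcmu could provide. */
    struct sketch_sgl_ops {
            int  (*alloc_sgl)(struct se_cmd *cmd, struct scatterlist **sgl,
                              unsigned int *nents, u32 length);
            void (*free_sgl)(struct se_cmd *cmd, struct scatterlist *sgl,
                             unsigned int nents);
    };

    /* Called from transport_generic_new_cmd(): prefer the backend's
     * allocator so the sg pages can come straight from tcmu's data area. */
    static int sketch_alloc_cmd_data(struct se_cmd *cmd,
                                     const struct sketch_sgl_ops *ops)
    {
            if (ops && ops->alloc_sgl)
                    return ops->alloc_sgl(cmd, &cmd->t_data_sg,
                                          &cmd->t_data_nents,
                                          cmd->data_length);

            /* otherwise fall back to the normal target_alloc_sgl() path,
             * omitted in this sketch */
            return -ENOMEM;
    }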

In a high-performance scenario the method already lowers CPU load and
improves throughput very well with the qla2xxx fabric. Unfortunately that
patchset works only for fabrics using target_submit_cmd or calling
target_submit_prep without allocated sgls, which iscsi does not :(

Currently I'm working on another tuning measure in tcmu. After that I'll
go back to my no-data-copy patches. Maybe I can make them work with most
fabric drivers, including iscsi.

Regards,
Bodo


* Re: tcmu data area double copy overhead
  2021-12-08 13:32 ` Xiubo Li
@ 2021-12-09 19:19   ` Bodo Stroesser
  0 siblings, 0 replies; 7+ messages in thread
From: Bodo Stroesser @ 2021-12-09 19:19 UTC (permalink / raw)
  To: Xiubo Li, Xiaoguang Wang, target-devel; +Cc: martin.petersen

On 08.12.21 14:32, Xiubo Li wrote:
> 
> On 12/8/21 8:43 PM, Xiaoguang Wang wrote:
>> [...]
>
> Yeah, this is a good starting point to improve this. Currently tcmu-runner
> can benefit from the mapped temporary data pages and won't do the extra
> copy in userspace.

I think the idea is to avoid data copy in tcmu, not userspace.
What extra copy in userspace are you talking about?

> 
> I think you can dynamically map/unmap the data pages for each SCSI cmd in
> LIO's ring buffer, but this should be opaque to the user-space daemons. At
> the same time, LIO needs to tell tcmu-runner, via each tcmu cmd entry, the
> offset at which it should read/write the data contents in the mapped ring
> buffer. Currently each tcmu cmd entry is followed by its data contents.

Unfortunately that's not true. tcmu does not store data in the cmd ring.
The data area lies behind the cmd ring in tcmu's uio device mapping. So a
cmd entry in the cmd ring already contains an array of struct iovec that
specifies the pieces of the data area used for the cmd's in or out data.
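
For reference, this is roughly how a user-space daemon resolves those
iovecs (a simplified sketch along the lines of what tcmu-runner does; the
exact field names should be checked against
include/uapi/linux/target_core_user.h):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/uio.h>
    #include <linux/target_core_user.h>

    /* The iov_base values in a cmd entry are offsets into the single uio
     * mapping (cmd ring + data area), not real pointers. */
    static void *iov_to_ptr(void *mmap_base, const struct iovec *iov)
    {
            return (char *)mmap_base + (uintptr_t)iov->iov_base;
    }

    /* Walk the data pieces of one command. */
    static void for_each_data_piece(void *mmap_base,
                                    struct tcmu_cmd_entry *ent,
                                    void (*fn)(void *buf, size_t len))
    {
            uint32_t i;

            for (i = 0; i < ent->req.iov_cnt; i++)
                    fn(iov_to_ptr(mmap_base, &ent->req.iov[i]),
                       ent->req.iov[i].iov_len);
    }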



* Re: tcmu data area double copy overhead
  2021-12-09 19:11 ` Bodo Stroesser
@ 2021-12-10  6:42   ` Xiaoguang Wang
  2021-12-21 13:26   ` Xiaoguang Wang
  1 sibling, 0 replies; 7+ messages in thread
From: Xiaoguang Wang @ 2021-12-10  6:42 UTC (permalink / raw)
  To: Bodo Stroesser, target-devel; +Cc: martin.petersen

hi,

> On 08.12.21 13:43, Xiaoguang Wang wrote:
>> [...]
>
> You are right, tcmu currently copies data between the sg-pages and tcmu
> data pages.
>
> But I'm not sure the solution you suggest would really show the improved
> throughput you expect, because we would have to map all data pages of the
> sgl(s) of a new cmd into user space and unmap them again when the cmd is
> processed.
>
> Mapping a page means that we store the struct page pointer in tcmu's data
> xarray. If userspace tries to read or write that page, a page fault occurs
> and the kernel calls tcmu_vma_fault, which returns the page pointer.
> Unmapping means that tcmu has to remove the page pointer and call
> unmap_mapping_range. So I'm not sure that copying the content of one page
> is more expensive than mapping and unmapping one page.
OK, I see, thanks for your detailed explanations.

>
> Additionally, if tcmu mapped the sg-pages, it would have to unmap them
> immediately when userspace completes the cmd, because tcmu is not the
> owner of those pages. So the recently added "KEEP_BUF" feature would have
> to be removed again. But that feature was added to avoid the need for a
> data copy in userspace in some situations.
>
> Finally, if tcmu times out a cmd that is waiting on the ring for
> completion from userspace, tcmu sends the cmd completion to tcm core.
> Before doing so, it would have to unmap the sg-pages. If userspace later
> tried to access one of these pages, tcmu_vma_fault would have nothing to
> map; instead it would return VM_FAULT_SIGBUS and userspace would receive
> SIGBUS.
OK, I see.

>
> I already started another attempt to avoid the data copy in tcmu. The idea
> is to optionally allow backend drivers to provide callbacks for sg
> allocation and free. That way the pages in an sg allocated by tcm core can
> be pages from tcmu's data area. Thus no map/unmap is needed, and the fabric
> driver directly writes/reads data to/from those pages, which are visible to
> userspace.
Yeah, if we can eliminate the map/unmap operations and let the userspace
backstore operate on the sg pages directly, tcmu can deliver much higher
throughput.

Regards,
Xiaoguang Wang

> [...]



* Re: tcmu data area double copy overhead
  2021-12-09 19:11 ` Bodo Stroesser
  2021-12-10  6:42   ` Xiaoguang Wang
@ 2021-12-21 13:26   ` Xiaoguang Wang
  2021-12-23 12:05     ` Bodo Stroesser
  1 sibling, 1 reply; 7+ messages in thread
From: Xiaoguang Wang @ 2021-12-21 13:26 UTC (permalink / raw)
  To: Bodo Stroesser, target-devel; +Cc: martin.petersen

hi,

> On 08.12.21 13:43, Xiaoguang Wang wrote:
>> [...]
>
> [...]
>
> I already started another attempt to avoid the data copy in tcmu. The idea
> is to optionally allow backend drivers to provide callbacks for sg
> allocation and free.
Does the "backend drivers" here mean the user-space tcmu backstore? If yes,
it seems that LIO uses target_alloc_sgl() to allocate sg pages, so how would
we pass this info to the user-space tcmu backstore? Thanks.

One of our customers uses LIO to export a local SCSI device using tcm_loop
and a tcmu userspace backstore. For this usage, because of the call chain
==> tcm_loop_target_queue_cmd
====> target_submit_prep
tcm_loop_target_queue_cmd() is called with a valid sg, so your optimization
won't work here, right? Thanks.

Regards,
Xiaoguang Wang

> [...]



* Re: tcmu data area double copy overhead
  2021-12-21 13:26   ` Xiaoguang Wang
@ 2021-12-23 12:05     ` Bodo Stroesser
  0 siblings, 0 replies; 7+ messages in thread
From: Bodo Stroesser @ 2021-12-23 12:05 UTC (permalink / raw)
  To: Xiaoguang Wang, target-devel; +Cc: martin.petersen

On 21.12.21 14:26, Xiaoguang Wang wrote:
> hi,
> 
>> On 08.12.21 13:43, Xiaoguang Wang wrote:
>>> [...]
>>
>> [...]
>>
>> I already started another attempt to avoid the data copy in tcmu. The idea
>> is to optionally allow backend drivers to provide callbacks for sg
>> allocation and free.
> Does the "backend drivers" here mean the user-space tcmu backstore? If yes,
> it seems that LIO uses target_alloc_sgl() to allocate sg pages, so how would
> we pass this info to the user-space tcmu backstore? Thanks.

I just added a new optional backend callback that was called by
transport_generic_new_cmd. Only tcmu implemented that callback.

> 
> One of our customers uses LIO to export a local SCSI device using tcm_loop
> and a tcmu userspace backstore. For this usage, because of the call chain
> ==> tcm_loop_target_queue_cmd
> ====> target_submit_prep
> tcm_loop_target_queue_cmd() is called with a valid sg, so your optimization
> won't work here, right? Thanks.

Yes. The way I implemented my first attempt, it didn't work for drivers
setting SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC. Of course, for a final solution
it would be great to speed up _all_ fabrics. Unfortunately I fear it will
not be possible. E.g. for tcm_loop I see no chance, because the sgls in
that case come directly from the initiator side.

I don't know yet whether we can find a solution for iscsi. If you have
suggestions, please let me know.

> 
> Regards,
> Xiaoguang Wang
> 
>> [...]


end of thread, other threads:[~2021-12-23 12:05 UTC | newest]

Thread overview: 7+ messages
2021-12-08 12:43 tcmu data area double copy overhead Xiaoguang Wang
2021-12-08 13:32 ` Xiubo Li
2021-12-09 19:19   ` Bodo Stroesser
2021-12-09 19:11 ` Bodo Stroesser
2021-12-10  6:42   ` Xiaoguang Wang
2021-12-21 13:26   ` Xiaoguang Wang
2021-12-23 12:05     ` Bodo Stroesser
