From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
To: Bodo Stroesser <bostroesser@gmail.com>,
	linux-mm@kvack.org, target-devel@vger.kernel.org,
	linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org, xuyu@linux.alibaba.com
Subject: Re: [RFC 0/3] Add zero copy feature for tcmu
Date: Tue, 22 Mar 2022 21:17:07 +0800	[thread overview]
Message-ID: <36b5a8e5-c8e9-6a1f-834c-6bf9bf920f4c@linux.alibaba.com> (raw)
In-Reply-To: <abbe51c4-873f-e96e-d421-85906689a55a@gmail.com>

hi,

> On 18.03.22 10:55, Xiaoguang Wang wrote:
>> The core idea behind the tcmu zero copy feature is really
>> straightforward: we simply map a block device io request's sgl pages
>> into the tcmu user space backstore, so we avoid the extra copy
>> between the sgl pages and tcmu's internal data area (which really
>> hurts io throughput). Please see
>> https://www.spinics.net/lists/target-devel/msg21121.html for detailed
>> info.
>>
>
> Can you please tell us how big the performance improvement is and
> which configuration you are using for measurements?
Sorry, I should have attached test results here. Initially I tried to use
the tcmu user:fbo backstore to evaluate the performance improvement, but
it only showed about a 10%~15% io throughput gain with a fio config of
numjobs=1, iodepth=8, bs=256k, which isn't very impressive. The reason is
that the user:fbo backstore does buffered reads, which consume most of
the cpu.
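
For reference, the fio invocation was roughly the following (the device
path is a placeholder for the tcmu-backed block device, and the rw mode
and direct flag are assumptions, not the exact command used):

    fio --name=test --filename=/dev/sdX --ioengine=libaio --direct=1 \
        --rw=read --numjobs=1 --iodepth=8 --bs=256k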

Then I tested this zero copy feature with our real workload, whose
backstore is a multi-threaded network program accessing a distributed
file system. With 4 jobs, iodepth=8 and a 256kb io size, the write
throughput improves from 3.6GB/s to 10GB/s.

Regards,
Xiaoguang Wang

>
>> Initially I used remap_pfn_range or vm_insert_pages to map sgl pages
>> to user space, but both of them have limits:
>> 1) Use vm_insert_pages
>> This is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there are two
>> restrictions:
>>    1. anonymous pages can not be mmapped to user space.
>>      ==> vm_insert_pages
>>      ====> insert_pages
>>      ======> insert_page_in_batch_locked
>>      ========> validate_page_before_insert
>>      validate_page_before_insert() shows that an anonymous page can
>>      not be mapped to user space, and we know that when direct io is
>>      issued to a block device, the io request's sgl pages mostly come
>>      from anonymous pages:
>>          if (PageAnon(page) || PageSlab(page) || page_has_type(page))
>>              return -EINVAL;
>>      I'm not sure why there is such a restriction. For safety reasons?
>>
>>    2. warn_on triggered in __folio_mark_dirty
>>      When calling zap_page_range in the tcmu user space backstore as
>>      io completes, a warn_on is triggered in __folio_mark_dirty:
>>         if (folio->mapping) {   /* Race with truncate? */
>>             WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
>>
>>      I'm not familiar with folio yet, but I think the reason is that
>>      when issuing a buffered read to a tcmu block device, its page
>>      cache is mapped to user space; the backstore writes those pages
>>      and the ptes get dirtied, but the pages are newly allocated, so
>>      their uptodate flag is not yet set. In zap_pte_range() there is
>>      this code:
>>         if (!PageAnon(page)) {
>>             if (pte_dirty(ptent)) {
>>                 force_flush = 1;
>>                 set_page_dirty(page);
>>             }
>>     So this warn_on is reasonable.
>>     Indeed, all I want is to map the io request's sgl pages to the
>>     tcmu user space backstore so the backstore can read or write data
>>     in the mapped area; I don't want to care about the pages or their
>>     mapping status, so I chose to use remap_pfn_range.
>>
>> 2) Use remap_pfn_range()
>>    remap_pfn_range works well, but it has fairly obvious overhead. A
>>    512kb io request has 128 pages, and usually these 128 pages' pfns
>>    are not consecutive, so in the worst case I'd need to issue 128
>>    remap_pfn_range calls for a single 512kb io request, which is
>>    horrible; a sketch of this loop follows below. Moreover, inside
>>    remap_pfn_range, if the x86 page attribute table feature is
>>    enabled, lookup_memtype() called by track_pfn_remap() also
>>    introduces noticeable overhead.
>>
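To make the per-page cost concrete, here is a minimal sketch (hypothetical
helper name, simplified error handling, not the actual patch) of mapping
one request's sgl pages with remap_pfn_range when the pfns are not
consecutive:

    #include <linux/mm.h>

    /*
     * Hypothetical sketch: each non-consecutive page needs its own
     * remap_pfn_range() call, so a 512kb request (128 pages) can take
     * 128 calls, and each call may go through track_pfn_remap() and
     * lookup_memtype() when the x86 PAT feature is enabled.
     */
    static int map_sgl_with_remap(struct vm_area_struct *vma,
                                  unsigned long uaddr,
                                  struct page **pages, int nr_pages)
    {
            int i, ret;

            for (i = 0; i < nr_pages; i++) {
                    ret = remap_pfn_range(vma, uaddr + i * PAGE_SIZE,
                                          page_to_pfn(pages[i]), PAGE_SIZE,
                                          vma->vm_page_prot);
                    if (ret)
                            return ret;
            }
            return 0;
    }
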
>> Finally, in order to solve these problems, Xu Yu helped to implement a
>> new helper which accepts an array of pages as a parameter. Anonymous
>> pages can be mapped to user space, and the pages are treated as
>> special ptes (pte_special returns true), so vm_normal_page returns
>> NULL and the folio warn_on above won't trigger.
>>
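As a rough illustration, and assuming the new helper mirrors the
vm_insert_pages() signature (the actual interface is defined in patch
1/3; the function and variable names below are only a sketch), the
backstore-side flow would be:

    #include <linux/mm.h>

    /*
     * Hypothetical usage sketch: batch-map the request's sgl pages as
     * special ptes, let the user space backstore process the data,
     * then tear the mapping down at io completion with
     * zap_page_range() (which is why patch 2/3 exports it).
     */
    static int map_and_complete(struct vm_area_struct *vma,
                                unsigned long uaddr,
                                struct page **pages,
                                unsigned long nr_pages)
    {
            unsigned long nr = nr_pages;
            int ret;

            ret = vm_insert_pages_mkspecial(vma, uaddr, pages, &nr);
            if (ret)
                    return ret;

            /* ... backstore reads/writes the mapped area ... */

            zap_page_range(vma, uaddr, nr_pages * PAGE_SIZE);
            return 0;
    }
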
>> Thanks.
>>
>> Xiaoguang Wang (2):
>>    mm: export zap_page_range()
>>    scsi: target: tcmu: Support zero copy
>>
>> Xu Yu (1):
>>    mm/memory.c: introduce vm_insert_page(s)_mkspecial
>>
>>   drivers/target/target_core_user.c | 257 +++++++++++++++++++++++++++++++++-----
>>   include/linux/mm.h                |   2 +
>>   mm/memory.c                       | 183 +++++++++++++++++++++++++++
>>   3 files changed, 414 insertions(+), 28 deletions(-)
>>



Thread overview: 19+ messages
2022-03-18  9:55 [RFC 0/3] Add zero copy feature for tcmu Xiaoguang Wang
2022-03-18  9:55 ` [RFC 1/3] mm/memory.c: introduce vm_insert_page(s)_mkspecial Xiaoguang Wang
2022-03-23 16:45   ` Christoph Hellwig
2022-03-24  7:27     ` Xiaoguang Wang
2022-03-18  9:55 ` [RFC 2/3] mm: export zap_page_range() Xiaoguang Wang
2022-03-21 12:01   ` David Hildenbrand
2022-03-22 13:02     ` Xiaoguang Wang
2022-03-22 13:08       ` David Hildenbrand
2022-03-23 13:59         ` Xiaoguang Wang
2022-03-23 16:48         ` Christoph Hellwig
2022-03-23 16:47   ` Christoph Hellwig
2022-03-24  9:16   ` Ming Lei
2022-03-18  9:55 ` [RFC 3/3] scsi: target: tcmu: Support zero copy Xiaoguang Wang
2022-03-22 14:01   ` Bodo Stroesser
2022-03-23 14:33     ` Xiaoguang Wang
2022-03-25  9:06       ` Bodo Stroesser
2022-03-22 12:40 ` [RFC 0/3] Add zero copy feature for tcmu Bodo Stroesser
2022-03-22 13:17   ` Xiaoguang Wang [this message]
2022-03-22 14:05     ` Bodo Stroesser
