All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Stodden <daniel.stodden@citrix.com>
To: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Cc: Fitzhardinge <jeremy@goop.org>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Jeremy
Subject: Re: blktap: Sync with XCP, dropping zero-copy.
Date: Wed, 17 Nov 2010 15:42:47 -0800	[thread overview]
Message-ID: <1290037367.11102.1663.camel@agari.van.xensource.com> (raw)
In-Reply-To: <E4E889D2-D5B2-435B-8833-BC01C86506B2@lagarcavilla.org>

On Wed, 2010-11-17 at 11:36 -0500, Andres Lagar-Cavilla wrote:
> I'll throw an idea there and you educate me why it's lame.
> 
> Going back to the primary issue of dropping zero-copy, you want the block backend (tapdev w/AIO or otherwise) to operate on regular dom0 pages, because you run into all sorts of quirkiness otherwise: magical VM_FOREIGN incantations to back granted mfn's with fake page structs that make get_user_pages happy, quirky grant PTEs, etc.
> 
> Ok, so how about something along the lines of GNTTABOP_swap? Eerily reminiscent of (maligned?) GNTTABOP_transfer, but hear me out.
> 
> The observation is that for a blkfront read, you could do the read all along on a regular dom0 frame, and when stuffing the response into the ring, swap the dom0 frame (mfn) you used with the domU frame provided as a buffer. Then the algorithm folds out:
> 
> 1. Block backend, instead of get_empty_pages_and_pagevec at init time, creates a pool of reserved regular pages via get_free_page(s). These pages have their refcount pumped, no one in dom0 will ever touch them.
> 
> 2. When extracting a blkfront write from the ring, call GNTTABOP_swap immediately. One of the backend-reserved mfn's is swapped with the domU mfn. Pfn's and page struct's on both ends remain untouched.
> 
> 3. For blkfront reads, call swap when stuffing the response back into the ring
> 
> 4. Because of 1, dom0 can a) calmly fix its p2m (and kvaddr) after swap, much like balloon and others do, without fear of races. More importantly, b) you don't have a weirdo granted PTE, or work with a frame from other domain. It's your page all along, dom0
> 
> 5. One assumption for domU is that pages allocated as blkfront buffers won't be touched by anybody, so a) it's safe for them to swap async with another frame with undef contents and b) domU can fix its p2m (and kvaddr) when pulling responses from the ring (the new mfn should be put on the response by dom0 directly or through an opaque handle)
> 
> 6. Scatter-gather vectors in ring requests give you a natural multicall batching for these GNTTABOP_swap's. I.e. all these hypercalls won't happen as often and at the granularity as skbuff's demanded for GNTTABOP_transfer
> 
> 7. Potentially domU may want to use the contents in a blkfront write buffer later for something else. So it's not really zero-copy. But the approach opens a window to async memcpy . From the point of swap when pulling the req to the point of pushing the response, you can do memcpy at any time. Don't know about how practical that is though.
> 
> Problems at first glance:
> 1. To support GNTTABOP_swap you need to add more if(version) to blkfront and blkback.
> 2. The kernel vaddr will need to be managed as well by dom0/U. Much like balloon or others: hypercall, fix p2m, and fix kvaddr all need to be taken care of. domU will probably need to neuter its kvaddr before granting, and then re-establish it when the response arrives. Weren't all these hypercalls ultimately more expensive than memcpy for GNTABOP_transfer for netback?
> 3. Managing the pool of backend reserved pages may be a problem?

I guess GNT_transfer for network I/O died because of the double-ended
TLB fallout?

Still liked the general direction, nice shot.

Cheers,
Daniel

  parent reply	other threads:[~2010-11-17 23:42 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20101116215621.59FC2CF782@homiemail-mx7.g.dreamhost.com>
2010-11-17 16:36 ` blktap: Sync with XCP, dropping zero-copy Andres Lagar-Cavilla
2010-11-17 17:52   ` Jeremy Fitzhardinge
2010-11-17 19:47     ` Andres Lagar-Cavilla
2010-11-17 23:42   ` Daniel Stodden [this message]
2010-11-12 23:31 Daniel Stodden
2010-11-13  0:50 ` Jeremy Fitzhardinge
2010-11-13  3:56   ` Daniel Stodden
     [not found]   ` <1289620544.11102.373.camel@agari.van.xensource.com>
2010-11-15 18:27     ` Jeremy Fitzhardinge
2010-11-16  9:13       ` Daniel Stodden
2010-11-16 17:56         ` Jeremy Fitzhardinge
2010-11-16 21:28           ` Daniel Stodden
2010-11-17 18:00             ` Jeremy Fitzhardinge
2010-11-17 20:21               ` Daniel Stodden
2010-11-17 21:02                 ` Jeremy Fitzhardinge
2010-11-17 21:57                   ` Daniel Stodden
2010-11-17 22:14                     ` Jeremy Fitzhardinge
     [not found]                       ` <1290035201.11102.1577.camel@agari.van.xensource.com>
     [not found]                         ` <4CE46A03.3010104@goop.org>
     [not found]                           ` <1290040898.11102.1709.camel@agari.van.xensource.com>
2010-11-18  2:29                             ` Jeremy Fitzhardinge
2010-11-17 23:32                     ` Daniel Stodden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1290037367.11102.1663.camel@agari.van.xensource.com \
    --to=daniel.stodden@citrix.com \
    --cc=andres@lagarcavilla.org \
    --cc=jeremy@goop.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.