From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Daniel Stodden <daniel.stodden@citrix.com>
Cc: "Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>
Subject: Re: blktap: Sync with XCP, dropping zero-copy.
Date: Tue, 16 Nov 2010 09:56:01 -0800
Message-ID: <4CE2C5B1.1050806@goop.org>
In-Reply-To: <1289898792.23890.214.camel@ramone>

On 11/16/2010 01:13 AM, Daniel Stodden wrote:
> On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
>> On 11/12/2010 07:55 PM, Daniel Stodden wrote:
>>>> Surely this can be dealt with by replacing the mapped granted page with
>>>> a local copy if the refcount is elevated?
>>> Yeah. We briefly discussed this when the problem started to pop up
>>> (again).
>>>
>>> I had a patch, for blktap1 in XS 5.5 iirc, which would fill the mapping
>>> with a dummy page. You wouldn't need a copy, a R/O zero map easily
>>> does the job.
>> Hm, I'd be a bit concerned that that might cause problems if used
>> generically. 
> Yeah. It wasn't a problem because all the network backends are on TCP,
> where one can be rather sure that the dups are going to be properly
> dropped.
>
> Does this hold everywhere ..? -- As mentioned below, the problem is
> rather in AIO/DIO than being Xen-specific, so you can see the same
> behavior on bare metal kernels too. A userspace app seeing an AIO
> complete and then reusing that buffer elsewhere will occasionally
> resend garbage over the network.

Yeah, that sounds like a generic security problem.  I presume the
protocol will just discard the excess retransmit data, but it might mean
a usermode program ends up transmitting secrets it never intended to...

> There are some important parts which would go missing. Such as
> ratelimiting gntdev accesses -- 200 thundering tapdisks each trying to
> gntmap 352 pages simultaneously isn't so good, so there still needs to
> be some bridge arbitrating them. I'd rather keep that in kernel space,
> okay to cram stuff like that into gntdev? It'd be much more
> straightforward than IPC.

What's the problem?  If you do nothing then it will appear to the kernel
as a bunch of processes doing memory allocations, and they'll get
blocked/rate-limited accordingly if memory is getting short.  There's
plenty of existing mechanisms to control that sort of thing (cgroups,
etc) without adding anything new to the kernel.  Or are you talking
about something other than simple memory pressure?

And there's plenty of existing IPC mechanisms if you want them to
explicitly coordinate with each other, but I'd tend to think that's
premature unless you have something specific in mind.

> Also, I was absolutely certain I once saw VM_FOREIGN support in gntdev..
> Can't find it now, what happened? Without, there's presently still no
> zero-copy.

gntdev doesn't need VM_FOREIGN any more - it uses the (relatively
new-ish) mmu notifier infrastructure which is intended to allow a device
to sync an external MMU with usermode mappings.  We're not using it in
precisely that way, but it allows us to wrangle grant mappings before
the generic code tries to do normal pte ops on them.

> Once the issues were solved, it'd be kinda nice. Simplifies stuff like
> memshr for blktap, which depends on getting hold of original grefs.
>
> We'd presumably still need the tapdev nodes, for qemu, etc. But those
> can stay non-xen aware then.
>
>>>> The only caveat is the stray unmapping problem, but I think gntdev can
>>>> be modified to deal with that pretty easily.
>>> Not easier than anything else in kernel space, but when dealing only
>>> with the refcounts, that's as good a place as anywhere else, yes.
>> I think the refcount test is pretty straightforward - if the refcount is
>> 1, then we're the sole owner of the page and we don't need to worry
>> about any other users.  If it's > 1, then somebody else has it, and we
>> need to make sure it no longer refers to a granted page (which is just a
>> matter of doing a set_pte_atomic() to remap from present to present).
> [set_pte_atomic over grant ptes doesn't work, or does it?]

No, I forgot about grant ptes' magic properties.  But there is the hypercall.
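
To make the refcount test concrete, here is a minimal userspace sketch in C.
It models only the decision, not the remap itself, which would have to go
through the grant-table hypercall. Every name in it (struct model_page,
choose_unmap, teardown) is hypothetical and is not a kernel or Xen API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are kernel or Xen APIs. */
enum unmap_action {
    UNMAP_PLAIN,       /* sole owner: an ordinary grant unmap is enough */
    UNMAP_AND_REPLACE  /* someone else holds a ref: swap in a local page */
};

struct model_page {
    int  refcount;     /* stand-in for the page's reference count */
    bool granted;      /* true while the frame still backs a foreign grant */
};

/* The refcount test: 1 means we are the only user of the page. */
static enum unmap_action choose_unmap(const struct model_page *pg)
{
    assert(pg->refcount >= 1);  /* we hold at least our own reference */
    return pg->refcount > 1 ? UNMAP_AND_REPLACE : UNMAP_PLAIN;
}

/*
 * Tear down one mapping. Returns true if the caller still owns the frame
 * afterwards, false if it was handed over to the remaining holders.
 */
static bool teardown(struct model_page *pg)
{
    enum unmap_action act = choose_unmap(pg);

    pg->granted = false;        /* either way, the grant mapping is gone */
    if (act == UNMAP_PLAIN)
        return true;

    pg->refcount--;             /* drop our ref; others keep the local page */
    return false;
}
```

A page with refcount 1 comes back still owned by the caller; a page with
refcount 3 comes back with refcount 2 and no longer belongs to us.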

>> Then we'd have a set of frames whose lifetimes are being determined by
>> some other subsystem.  We can either maintain a list of them and poll
>> waiting for them to become free, or just release them and let them be
>> managed by the normal kernel lifetime rules (which requires that the
>> memory attached to them be completely normal, of course).
> The latter sounds like a good alternative to polling. So an
> unmap_and_replace, and giving up ownership thereafter. Next run of the
> dispatcher thread can just refill the foreign pfn range via
> alloc_empty_pages(), to rebalance.

Do we actually need a "foreign page range"?  Won't any pfn do?  If we
start with a specific range of foreign pfns and then start freeing those
pfns back to the kernel, we won't have one for long...
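
A toy illustration of that point, as a userspace C sketch: a pool that starts
out as one contiguous pfn range stops being a range as soon as a single slot
is released and refilled with whatever pfn comes back. The pool structure and
function names are hypothetical, and the replacement pfn is just passed in by
the caller as a stand-in for a real allocation.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical model: a pool of pfns that begins as one contiguous range. */
struct pfn_pool {
    unsigned long pfn[POOL_SIZE];
};

/* Fill the pool with base, base + 1, ..., base + POOL_SIZE - 1. */
static void pool_init(struct pfn_pool *p, unsigned long base)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->pfn[i] = base + i;
}

/* Slot i was given back to the kernel; refill it with an arbitrary pfn. */
static void pool_replace(struct pfn_pool *p, size_t i, unsigned long new_pfn)
{
    p->pfn[i] = new_pfn;
}

/* True only if the pool still reads pfn[0], pfn[0] + 1, ... in order. */
static bool pool_is_contiguous(const struct pfn_pool *p)
{
    for (size_t i = 1; i < POOL_SIZE; i++)
        if (p->pfn[i] != p->pfn[0] + i)
            return false;
    return true;
}
```

After pool_init() the pool is contiguous; one pool_replace() with an unrelated
pfn is enough to break that, so nothing should depend on the range property.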

    J
