From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: blktap: Sync with XCP, dropping zero-copy.
Date: Wed, 17 Nov 2010 13:02:34 -0800
Message-ID: <4CE442EA.1090708@goop.org>
References: <1289604707-13378-1-git-send-email-daniel.stodden@citrix.com>
 <4CDDE0DA.2070303@goop.org>
 <1289620544.11102.373.camel@agari.van.xensource.com>
 <4CE17B80.7080606@goop.org>
 <1289898792.23890.214.camel@ramone>
 <4CE2C5B1.1050806@goop.org>
 <1289942932.11102.802.camel@agari.van.xensource.com>
 <4CE41853.1010000@goop.org>
 <1290025317.11102.1216.camel@agari.van.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
In-Reply-To: <1290025317.11102.1216.camel@agari.van.xensource.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Daniel Stodden
Cc: "Xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On 11/17/2010 12:21 PM, Daniel Stodden wrote:
> And, like all granted frames, not owning them implies they are not
> resolvable via mfn_to_pfn, thereby failing in follow_page, and hence
> in gup(), without the VM_FOREIGN hack.

Hm, I see.  Well, I wonder if using _PAGE_SPECIAL would help (it is put
on usermode ptes which don't have a backing struct page).  After all,
there's no fundamental reason why it would need a pfn; the mfn in the
pte is what's actually needed to ultimately generate a DMA descriptor.

> Correct me if I'm mistaken.  I used to be quicker looking up stuff on
> arch-xen kernels, but I think the fundamental constants of the Xen
> universe haven't changed since last time.

No, but Linux has.

> [
> Part of the reason why blktap *never* frees those pages, apart from
> being slightly greedy, is the deadlock hazard when writing those nodes
> in dom0 through the pagecache, as dom0 might.  You need memory pools
> on the datapath to guarantee progress under pressure.  That got pretty
> ugly after 2.6.27, btw.
> ]

That's what mempools are intended to solve.
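Something along these lines - a sketch only, not actual blktap code;
the pool size and the names are made up:

#include <linux/mempool.h>
#include <linux/gfp.h>

#define POOL_MIN_PAGES 32	/* arbitrary reserve; size to the ring */

static mempool_t *request_pool;

static int __init request_pool_init(void)
{
	/* Pre-reserves POOL_MIN_PAGES order-0 pages; mempool_alloc()
	 * falls back on that reserve when the page allocator fails,
	 * so the datapath can always make forward progress. */
	request_pool = mempool_create_page_pool(POOL_MIN_PAGES, 0);
	return request_pool ? 0 : -ENOMEM;
}

static struct page *get_request_page(void)
{
	/* GFP_NOIO: this sits on the writeout path, so reclaim must
	 * not recurse back into the I/O stack. */
	return mempool_alloc(request_pool, GFP_NOIO);
}

static void put_request_page(struct page *page)
{
	/* Refills the reserve first if it's below the minimum. */
	mempool_free(page, request_pool);
}

Once the reserve is sized to the ring, a stuck allocation stalls only
that ring instead of deadlocking writeout.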
> In any case, let's skip trying what happens if a thundering herd of
> several hundred userspace disks tries gfp()ing their grant slots out
> of dom0 without arbitration.

I'm not against arbitration, but I don't think that's something that
should be implemented as part of a Xen driver.

>>> I guess we've been meaning the same thing here, unless I'm
>>> misunderstanding you.  Any pfn does, and the balloon pagevec
>>> allocations default to order 0 entries indeed.  Sorry, you're right,
>>> that's not a 'range'.  With a pending re-xmit, the backend can find
>>> a couple (or all) of the request frames have count>1.  It can flip
>>> and abandon those as normal memory.  But it will need those lost
>>> memory slots back, straight away or next time it's running out of
>>> frames.  As order-0 allocations.
>> Right.  GFP_KERNEL order 0 allocations are pretty reliable; they only
>> fail if the system is under extreme memory pressure.  And they have
>> the nice property that if those allocations block or fail they
>> rate-limit IO ingress from domains rather than being crushed by
>> memory pressure at the backend (ie, the problem with trying to
>> allocate memory in the writeout path).
>>
>> Also the cgroup mechanism looks like an extremely powerful way to
>> control the allocations for a process or group of processes to stop
>> them from dominating the whole machine.
> Ah.  In case it can be put to work to bind processes allocating
> pagecache entries for dirtying to some boundary, I'd be really
> interested.  I think I came across it once but didn't take the time
> to read the docs thoroughly.  Can it?

I'm not sure about dirtiness - it seems like something that should be
within the memory controller's remit, even if it doesn't currently
handle it.

The cgroup mechanism is extremely powerful, now that I look at it.  You
can do everything from setting block IO priorities and QoS parameters
to CPU limits.

    J
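P.S.  For concreteness, confining a tapdisk process would look roughly
like the below.  A sketch only: the /cgroup mount point, the group
names and the values are assumptions, blkio.weight needs CFQ, and note
the memory controller caps overall usage, not dirtying specifically.

#include <stdio.h>
#include <sys/types.h>

static int cg_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

/* Assumes the groups already exist, e.g. via
 * mkdir /cgroup/blkio/tapdisk /cgroup/memory/tapdisk. */
int confine_tapdisk(pid_t pid)
{
	char pidstr[16];

	snprintf(pidstr, sizeof(pidstr), "%d", (int)pid);

	/* Proportional I/O weight (100..1000 under CFQ). */
	cg_write("/cgroup/blkio/tapdisk/blkio.weight", "500");
	cg_write("/cgroup/blkio/tapdisk/tasks", pidstr);

	/* Cap total memory, pagecache included, at 256M. */
	cg_write("/cgroup/memory/tapdisk/memory.limit_in_bytes",
		 "268435456");
	cg_write("/cgroup/memory/tapdisk/tasks", pidstr);

	return 0;
}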