Date: Wed, 18 May 2011 18:23:43 +0300
From: Avi Kivity
To: Jan Kiszka
Cc: qemu-devel
Subject: Re: [Qemu-devel] [RFC] Memory API
Message-ID: <4DD3E47F.9060104@redhat.com>
In-Reply-To: <4DD3E1B3.3020405@siemens.com>
References: <4DD3C5B9.1080908@redhat.com> <4DD3D236.90708@siemens.com>
 <4DD3D95E.2060105@redhat.com> <4DD3E1B3.3020405@siemens.com>

On 05/18/2011 06:11 PM, Jan Kiszka wrote:
> On 2011-05-18 16:36, Avi Kivity wrote:
> >> I would add another drawback:
> >>
> >> - Inability to identify the origin of region accesses and handle them
> >>   differently based on the source.
> >>
> >> That is at least problematic for the x86 APIC, which is CPU local. Our
> >> current way to deal with it is, well, very creative, and falls to
> >> dust if a guest actually tries to remap the APIC.
> >>
> >> However, I'm unsure if that can easily be addressed. As long as only x86
> >> is affected, it's tricky to ask for a big infrastructure to handle this
> >> special case. Maybe there are some other use cases, I don't know.
> >
> > We could implement it with a per-cpu MemoryRegion, with each cpu's
> > MemoryRegion populated by a different APIC sub-region.
>
> The tricky part is wiring this up efficiently for TCG, i.e. in QEMU's
> softmmu. I played with passing the issuing CPUState (or NULL for
> devices) down the MMIO handler chain. Not totally beautiful, as
> decentralized dispatching was still required, but at least only
> moderately invasive. Maybe your API allows for cleaning up the
> management and dispatching part; need to rethink...

My suggestion is the opposite - have a different MemoryRegion for each
cpu (e.g. CPUState::memory).  Then the TLBs will resolve to a different
ram_addr_t for the same physical address, for the local APIC range.

> >
> > There is nothing we can do with a return code.  You can't fail an mmio
> > that causes an overlapping physical memory map.
>
> We must fail such requests to make progress with the API. That may
> happen either on the caller side or in cpu_register_memory_region itself
> (hwerror). Otherwise the new API will just be a shiny new facade for an
> old and still fragile building.
>
> >>
> >>>    void cpu_unregister_memory_region(MemoryRegion *mr);
> >
> > Instead, we need cpu_unregister_memory_region() to restore any
> > previously hidden ranges.
>
> I disagree. Both approaches, rejecting overlaps or restoring them, imply
> subtle semantic changes that existing device models have to deal with.
> We can't use either of them without some review and conversion work. So
> better head for the clearer and, thus, cleaner approach.

We need to head for the more hardware-like approach.  What happens when
you program overlapping BARs?  I imagine the result is
implementation-defined, but ends up with one region decoded in
preference to the other.  There is simply no way to reject an
overlapping mapping.
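
Something along these lines, as a toy model only - ToyRegion, the
priority field and toy_decode() below are invented for illustration and
are not part of the proposed API:

/* Toy model of priority-based decoding for overlapping regions.
 * Nothing below is QEMU code; ToyRegion, the priority field and
 * toy_decode() are invented purely to illustrate "one region is
 * decoded in preference to the other" instead of rejecting the
 * registration. */

#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

typedef struct ToyRegion {
    const char *name;
    uint64_t addr;        /* start address in the physical map */
    uint64_t size;
    int priority;         /* higher priority wins on overlap */
} ToyRegion;

/* Return the region that decodes the access, or NULL for an
 * unassigned address.  Overlaps are resolved, never rejected. */
static const ToyRegion *toy_decode(const ToyRegion *regions, size_t n,
                                   uint64_t addr)
{
    const ToyRegion *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const ToyRegion *r = &regions[i];
        if (addr >= r->addr && addr - r->addr < r->size) {
            if (!best || r->priority > best->priority) {
                best = r;
            }
        }
    }
    return best;
}

int main(void)
{
    /* A BAR programmed on top of RAM: both registrations succeed,
     * the BAR simply shadows RAM while it is mapped there. */
    ToyRegion map[] = {
        { "ram",     0x00000000, 0x10000000, 0 },
        { "pci-bar", 0x000c0000, 0x00020000, 1 },
    };
    uint64_t addrs[] = { 0x1000, 0xc8000, 0xf0000 };

    for (size_t i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++) {
        const ToyRegion *r = toy_decode(map, 2, addrs[i]);
        printf("0x%08" PRIx64 " -> %s\n", addrs[i], r ? r->name : "<hole>");
    }
    return 0;
}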

> >> - Get rid of wasteful PhysPageDesc at this chance?
> >
> > That's the intent, but not at this chance, rather later on.
>
> The features you expose to the users somehow have to be mapped onto
> data structures internally. Those need to support both
> registration/deregistration and lookup efficiently. By postponing that
> internal design to the point when we have already switched to the
> facade, the risk arises that a suboptimal interface was exposed and the
> conversion was done in vain.

That is true, and one of the reasons this is posted as an [RFC] (I
usually prefer [PATCH]).  However, I don't see a way to change the
internals without changing the API.

> > But I want
> > the API to be compatible with the goal so we don't have to touch all
> > devices again.
>
> We can't perform any proper change in this area without touching all
> users, some a bit more, some only minimally.

Right.  I want to touch the devices *once*, to convert them to this API,
then work on the internals.

> >
> >> - How to hook into the region maintenance (CPUPhysMemoryClient,
> >>   listening vs. filtering or modifying changes)? How to simplify
> >>   memory clients this way?
> >
> > I'd leave things as is, at least for the beginning.  CPUPhysMemoryClient
> > is global in nature, whereas MemoryRegion is local (offsets are relative
> > to the containing region).
>
> See [1]: We really need to get rid of slot management on the
> CPUPhysMemoryClient side. Your API provides a perfect opportunity to
> establish the infrastructure for slot tracking in a central place. We
> can then switch from reporting cpu_registering_memory events to
> reporting coalesced changes to slots, those slots that the core also
> uses. So a new CPUPhysMemoryClient API needs to be considered in this
> API change as well - or we change twice in the end.

The kvm memory slots (and hopefully future qemu memory slots) are a
flattened view of the MemoryRegion tree.  There is no 1:1 mapping.

-- 
error compiling committee.c: too many arguments to function
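
To make that last point concrete - a tree of regions with parent-relative
offsets flattening into a list of absolute-address slots - here is a toy
sketch; ToyNode, toy_flatten() and the example map are invented for
illustration and are neither the proposed API nor the kvm slot code:

/* Toy illustration of flattening a region tree (offsets relative to
 * the containing region) into a flat list of slots with absolute
 * addresses. */

#include <inttypes.h>
#include <stdio.h>

#define TOY_MAX_CHILDREN 4

typedef struct ToyNode {
    const char *name;
    uint64_t offset;      /* relative to the containing region */
    uint64_t size;
    int is_leaf;          /* only leaves become slots */
    const struct ToyNode *child[TOY_MAX_CHILDREN];
} ToyNode;

/* Walk the tree and emit one slot per leaf, with an absolute start
 * address.  A leaf reachable through two containers would produce two
 * slots (overlap clipping is left out of this sketch), so there is no
 * 1:1 mapping between regions and slots. */
static void toy_flatten(const ToyNode *node, uint64_t base)
{
    uint64_t start = base + node->offset;

    if (node->is_leaf) {
        printf("slot: 0x%08" PRIx64 " size 0x%" PRIx64 " (%s)\n",
               start, node->size, node->name);
        return;
    }
    for (int i = 0; i < TOY_MAX_CHILDREN && node->child[i]; i++) {
        toy_flatten(node->child[i], start);
    }
}

int main(void)
{
    ToyNode ram  = { "ram",  0x00000000, 0x08000000, 1, { NULL } };
    ToyNode bar0 = { "bar0", 0x00100000, 0x00001000, 1, { NULL } };
    ToyNode pci  = { "pci",  0xe0000000, 0x10000000, 0, { &bar0 } };
    ToyNode sys  = { "system", 0, 0x100000000ULL, 0, { &ram, &pci } };

    toy_flatten(&sys, 0);   /* ram at 0x0, bar0 at 0xe0100000 */
    return 0;
}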