From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH] expand x86 arch_shared_info to support >3 level p2m tree
Date: Tue, 9 Sep 2014 10:36:06 +0100
Message-ID: <540ECA06.9070300@citrix.com>
References: <1410184138-11277-1-git-send-email-jgross@suse.com>
 <1410184138-11277-2-git-send-email-jgross@suse.com>
 <540DB65F.30601@citrix.com> <540DD52D.3030307@suse.com>
 <540EE46B0200007800032730@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <540EE46B0200007800032730@mail.emea.novell.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich , Juergen Gross
Cc: keir@xen.org, tim@xen.org, ian.jackson@eu.citrix.com,
 ian.campbell@citrix.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 09/09/14 10:28, Jan Beulich wrote:
>>>> On 08.09.14 at 18:11, wrote:
>> On 09/08/2014 03:59 PM, Andrew Cooper wrote:
>>> On 08/09/14 14:48, Juergen Gross wrote:
>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>> currently contains the mfn of the top level page frame of the 3 level
>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>> it is possible to support up to 512 GB of RAM for a pv domain.
>>> Specifically only 64bit PV domains have the 512GB limit.
>>>
>>> 32bit PV domains have a far larger supported RAM as they can fit twice
>>> as many mfns in each p2m page.
>>>
>>>> To be able to support more RAM an additional level is to be added.
>>>>
>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>> and the number of levels of the p2m tree. The new information is
>>>> indicated by the domain to be valid by storing ~0UL into
>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>> three levels are needed, of course).
>>>>
>>>> Signed-off-by: Juergen Gross
>>>> ---
>>>>  xen/include/public/arch-x86/xen.h | 8 +++++++-
>>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
>>>> index f35804b..b7fa2b6 100644
>>>> --- a/xen/include/public/arch-x86/xen.h
>>>> +++ b/xen/include/public/arch-x86/xen.h
>>>> @@ -224,7 +224,13 @@ struct arch_shared_info {
>>>>      /* Frame containing list of mfns containing list of mfns containing p2m. */
>>>>      xen_pfn_t     pfn_to_mfn_frame_list_list;
>>>>      unsigned long nmi_reason;
>>>> -    uint64_t pad[32];
>>>> +    /*
>>>> +     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
>>>> +     * ~0UL.
>>>> +     */
>>>> +    unsigned long p2m_levels;     /* number of levels of p2m tree */
>>>> +    xen_pfn_t     p2m_root;       /* p2m tree top level mfn */
>>>> +    uint64_t pad[30];
>>> This padding is now wrong, as unsigned long is only 4 bytes in 32bit,
>>> and it is adjacent to another unsigned long.
>> The padding in this case is a nightmare.
>>
>> For 32 bits arch_shared_info_t is 64 bit aligned due to uint64_t pad[].
> No - uint64_t is 4-byte aligned on 32-bit.
>
>> This enforces a 4 byte hole before the structure (there are 3 4-byte
>> fields before it in shared_info_t). And before pad[] there is another
>> 4-byte hole.
>>
>> And don't forget: on 32 bits xen_pfn_t is 4 bytes, too. I could either
>> align each new variable explicitly to 8 bytes or I could use a union.
>>
>> As arch_shared_info_t is at the end of shared_info_t I could just ignore
>> the alignment/padding ...
>>
>> What is your preference?
> As I view it, the exact padding size here doesn't really matter: The
> shared info lives in a separate page anyway.
>
> Jan

Which begs the question why the padding exists in the first place?

arch_shared_info is the final field in shared_info, and shared_info is
explicitly documented not to necessarily have a consistent size across
different versions of Xen.

Frankly, I think the padding can just be removed.  It serves no purpose,
and attempting to maintain it correctly is proving very difficult.

~Andrew