From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759460Ab2HXNdI (ORCPT ); Fri, 24 Aug 2012 09:33:08 -0400 Received: from smtp.ctxuk.citrix.com ([62.200.22.115]:56470 "EHLO SMTP.EU.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755090Ab2HXNdF (ORCPT ); Fri, 24 Aug 2012 09:33:05 -0400 X-IronPort-AV: E=Sophos;i="4.80,304,1344211200"; d="scan'208";a="14169532" Date: Fri, 24 Aug 2012 14:32:38 +0100 From: Stefano Stabellini X-X-Sender: sstabellini@kaball.uk.xensource.com To: Thomas Gleixner CC: Konrad Rzeszutek Wilk , Borislav Petkov , Attilio Rao , "linux-kernel@vger.kernel.org" , "mingo@redhat.com" , "hpa@zytor.com" , "x86@kernel.org" , Stefano Stabellini Subject: Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve In-Reply-To: Message-ID: References: <1345648122-11935-1-git-send-email-attilio.rao@citrix.com> <1345648122-11935-2-git-send-email-attilio.rao@citrix.com> <50364FE5.1070608@citrix.com> <503664C3.7010301@citrix.com> <20120824100309.GG3019@liondog.tnic> <20120824113644.GE11007@konrad-lan.dumpdata.com> User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 24 Aug 2012, Thomas Gleixner wrote: > I looked at the whole pgt_buf_* mess and it's amazingly stupid. We > could avoid all that dance and make all of that pgt_buf_* stuff static > and provide proper accessor functions and hand start, end, top to the > reserve function instead of fiddling with global variables all over > the place. That'd be a real cleanup and progress. > > But we can't do that easily. And why? Because XEN is making magic > decisions based on those globals in mask_rw_pte(). > > /* > * If the new pfn is within the range of the newly allocated > * kernel pagetable, and it isn't being mapped into an > * early_ioremap fixmap slot as a freshly allocated page, make sure > * it is RO. > */ > if (((!is_early_ioremap_ptep(ptep) && > pfn >= pgt_buf_start && pfn < pgt_buf_top)) || > (is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1))) > > This comment along with the implementation is really a master piece of > obfuscation. Let's see what this is doing. RO is enforced when: > > This is not an early ioreamp AND > > pfn >= pgt_buf_start && pfn < pgt_buf_top > > So why is this checking pgt_buf_top? The early stuff is installed > within pgt_buf_start and pgt_buf_end. Anything which is >= > pgt_buf_end at this point is completely wrong. Unfortunately pgt_buf_end only marks the current end of the pagetable pages (pgt_buf_end keeps increasing during kernel_physical_mapping_init). However at some point kernel_physical_mapping_init is going to start mapping the pagetable pages themselves, when that happens some of them are not pagetable pages yet (pgt_buf_end <= page < pgt_buf_top) but they are going to be in the near future. On Xen they all need to be mapped RO because otherwise as soon as they are hooked into the pagetable the kernel goes boom. In fact this problem only happens if the pagetable pages are allocated in high memory, we started to see it after: commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e Author: Yinghai Lu Date: Fri Dec 17 16:58:28 2010 -0800 x86-64, mm: Put early page table high > Now the second check is even more interesting: > > If this is an early ioremap AND > > pfn != (pgt_buf_end -1 ) > > then it's forced RO as well. > > So this checks whether the early ioremap is happening on the last > allocated pfn from the pgt_buf range. > > OMG, really great design! And the comment above that if() obfuscation > is not really helping much. > > If anything is missing a semantic documentation and analysis then > definitely code like this which is just a cobbled together steaming > pile of .... I agree it is terrible and fragile, I would be happy to get rid of it. However it is a difficult problem to solve and even if we had lengthy discussions on the LKML we couldn't find any better solutions at the time. The comment is not much, but there quite a bit on context in the commit message: commit 279b706bf800b5967037f492dbe4fc5081ad5d0f Author: Stefano Stabellini Date: Thu Apr 14 15:49:41 2011 +0100 x86,xen: introduce x86_init.mapping.pagetable_reserve