From: Andrew Cooper
Subject: Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option
Date: Thu, 30 Jul 2015 19:54:25 +0100
To: Andy Lutomirski
Cc: "security@kernel.org", Peter Zijlstra, X86 ML,
 "linux-kernel@vger.kernel.org", Steven Rostedt, xen-devel, David Vrabel,
 Borislav Petkov, Jan Beulich, Sasha Levin, Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On 30/07/15 19:30, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper wrote:
>> On 30/07/2015 00:13, Andy Lutomirski wrote:
>>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper wrote:
>>>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>>>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>>>>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>>>>>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper wrote:
>>>>>>>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky wrote:
>>>>>>>>>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>>>>>>>>>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>>>>>>>>>>> FYI, I have got a repro now and am investigating.
>>>>>>>>>>>> Good and bad news. This bug has nothing to do with LDTs
>>>>>>>>>>>> themselves.
>>>>>>>>>>>>
>>>>>>>>>>>> I have worked out what is going on, but this:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>>>>>>>>>>> index 5abeaac..7e1a82e 100644
>>>>>>>>>>>> --- a/arch/x86/xen/enlighten.c
>>>>>>>>>>>> +++ b/arch/x86/xen/enlighten.c
>>>>>>>>>>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>>>>>>>>>>>      pte = pfn_pte(pfn, prot);
>>>>>>>>>>>> +    (void)*(volatile int*)v;
>>>>>>>>>>>>      if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>>>>>>>>>>>          pr_err("set_aliased_prot va update failed w/ lazy mode %u\n",
>>>>>>>>>>>>                 paravirt_get_lazy_mode());
>>>>>>>>>>>>          BUG();
>>>>>>>>>>>>
>>>>>>>>>>>> is perhaps not the fix we are looking for, and every use of
>>>>>>>>>>>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>>>>>>>>>> I think in most cases we know that the page is mapped, so hopefully
>>>>>>>>>>> this is the only site that we need to be careful about.
>>>>>>>>>> Is there any chance we can get some kind of quick-and-dirty fix that
>>>>>>>>>> can go to x86/urgent in the next few days even if a clean fix isn't
>>>>>>>>>> available yet?
>>>>>>>>> Quick and dirty?
>>>>>>>>>
>>>>>>>>> Reading from v is the most obvious and quick way, for areas where we
>>>>>>>>> are certain v exists, is kernel memory and is expected to have a
>>>>>>>>> backing page.
>>>>>>>>> I don't know offhand how many of the current
>>>>>>>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>>>>>>>> __get_user(tmp, (char *)v), perhaps, unless there's something better
>>>>>>>> in the wings. Keep in mind that we need this for -stable, and it's
>>>>>>>> likely to get backported quite quickly due to CVE-2015-5157.
>>>>>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>>>>>> would probably work, and certainly be minimal hassle for -stable.
>>>>>>>
>>>>>>> Altering the hypercall used is certainly not something to backport,
>>>>>>> nor are we sure it is a viable fix at this time.
>>>>>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>>>>>> is the correct fix to unblock this LDT series. I see no reason why this
>>>>>> cannot be backported.
>>>>> A proper fix should include batching, and that is not something that I
>>>>> think we should target for stable.
>>>> Batching is absolutely not necessary to switch update_va_mapping to
>>>> mmu_update_normal_pt. After all, update_va_mapping isn't batched.
>>>>
>>>> However, this isn't the first issue we have had with lazy MMU faulting,
>>>> and I doubt it is the last. There are not many callsites of
>>>> update_va_mapping - I will audit them tomorrow and see if any similar
>>>> issues are lurking elsewhere.
>>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>>> yet I haven't been able to get xen_alloc_ldt to fail or a subsequent
>>> LDT access to fault. Is this something we should be worried about?
>> Yes. update_va_mapping() will function perfectly well when taking one
>> RW mapping to RO even if there is a second RW mapping. In such a case,
>> the next LDT access will fault.
> Which is a problem because that alias might still exist, and also
> because Linux really doesn't expect that fault.
>
>> On closer inspection, Xen is rather unhelpful with the fault. Xen's
>> lazy #PF will be bounced back to the guest with cr2 adjusted to appear
>> in the range passed to set_ldt(). The error code, however, will be
>> unmodified (limited only by not-user and not-reserved), so it will
>> appear as a non-present read or write supervisor access to an address
>> which the kernel has a valid read mapping of.
> More yuck.
>
> I think I'm just going to stick an unconditional vm_unmap_aliases() in
> alloc_ldt.
>
>> Therefore, set_ldt() needs to be confident that there are no writeable
>> mappings to the frames used to make up the LDT. It could proactively
>> fault them in by accessing one descriptor in each page inside the
>> limit, but by the time a fault is received it is probably too late to
>> work out where the other mapping is which prevented the typechange (or
>> indeed, whether Xen objected to one of the descriptors instead).
> This seems like overkill.
>
> I'm still a bit confused, though: the failure is in xen_free_ldt. How
> do we make it all the way to xen_free_ldt without the vmapped page
> existing in the guest's page tables? After all, we had to survive
> xen_alloc_ldt first, and ISTM that should fail in exactly the same way.

(Summarising part of a discussion which has just occurred on IRC)

I presume that xen_free_ldt() is called while in the context of an mm
which does not have that particular area of the vmalloc() space faulted
in. This is (I presume) why reading 'v' (which occasionally causes a
pagefault) fixes the issue.

~Andrew
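
For reference, the "read from v" / __get_user() pre-touch being discussed
amounts to touching the vmalloc alias so that the kernel's page-fault
handling syncs the relevant page-table entries into the current mm before
the hypercall is issued. Below is a minimal sketch of that idea only; it
assumes v points at kernel memory with a backing page, the helper name
probe_vmalloc_alias() is illustrative rather than anything from this
thread, and it is not meant to stand in for whatever fix is eventually
applied.

#include <linux/uaccess.h>  /* __get_user(), pagefault_{disable,enable}() */

/*
 * Sketch only: pre-touch the vmalloc alias 'v' so that any page-table
 * entries for the vmalloc area which have not yet been propagated into
 * the current mm are populated before HYPERVISOR_update_va_mapping()
 * is used on that address.
 */
static void probe_vmalloc_alias(void *v)  /* hypothetical helper */
{
        unsigned char dummy;

        /*
         * A fault on a kernel vmalloc address is handled by the vmalloc
         * fault path even with pagefaults disabled; disabling pagefaults
         * here just keeps __get_user() from sleeping or warning if 'v'
         * turns out not to be mapped at all, in which case it returns
         * -EFAULT instead of oopsing.
         */
        pagefault_disable();
        __get_user(dummy, (unsigned char __user __force *)v);
        pagefault_enable();
}

Whether a probe like this belongs in set_aliased_prot() or inside
HYPERVISOR_update_va_mapping() itself is exactly the question raised in
the thread above.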