From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Subject: Re: [PATCH v4 0/3] x86: modify_ldt improvement, test, and config option
Date: Thu, 30 Jul 2015 11:30:25 -0700
References: <55B75993.90909@citrix.com> <55B7AE39.7000101@citrix.com>
 <55B7B791.2050208@oracle.com> <55B822B8.3090608@citrix.com>
 <55B841FF.2000102@oracle.com> <55B8E16C.2050406@citrix.com>
 <55B8E68B.2030305@oracle.com> <55B9236B.9090507@citrix.com>
 <55B94451.8040600@oracle.com> <55B947AF.7020404@citrix.com>
 <55B94F9D.3000405@citrix.com> <55B957DE.60405@cantab.net>
 <55B95863.2000102@oracle.com> <55B95B70.8010902@citrix.com>
 <55B96FE0.6010600@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <55B96FE0.6010600@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper
Cc: "security@kernel.org", Peter Zijlstra, X86 ML,
 "linux-kernel@vger.kernel.org", Steven Rostedt, xen-devel, David Vrabel,
 Borislav Petkov, David Vrabel, Jan Beulich, Sasha Levin, Boris Ostrovsky
List-Id: xen-devel@lists.xenproject.org

On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper wrote:
> On 30/07/2015 00:13, Andy Lutomirski wrote:
>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper wrote:
>>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>>>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>>>>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper wrote:
>>>>>>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>>>>>>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky wrote:
>>>>>>>>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>>>>>>>>>> FYI, I have got a repro now and am investigating.
>>>>>>>>>>> Good and bad news.
>>>>>>>>>>> This bug has nothing to do with LDTs themselves.
>>>>>>>>>>>
>>>>>>>>>>> I have worked out what is going on, but this:
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>>>>>>>>>> index 5abeaac..7e1a82e 100644
>>>>>>>>>>> --- a/arch/x86/xen/enlighten.c
>>>>>>>>>>> +++ b/arch/x86/xen/enlighten.c
>>>>>>>>>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v,
>>>>>>>>>>> pgprot_t prot)
>>>>>>>>>>>     pte = pfn_pte(pfn, prot);
>>>>>>>>>>> +   (void)*(volatile int*)v;
>>>>>>>>>>>     if (HYPERVISOR_update_va_mapping((unsigned long)v,
>>>>>>>>>>> pte, 0)) {
>>>>>>>>>>>         pr_err("set_aliased_prot va update failed w/
>>>>>>>>>>> lazy mode
>>>>>>>>>>> %u\n", paravirt_get_lazy_mode());
>>>>>>>>>>>         BUG();
>>>>>>>>>>>
>>>>>>>>>>> Is perhaps not the fix we are looking for, and every use of
>>>>>>>>>>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>>>>>>>>> I think in most cases we know that page is mapped so hopefully
>>>>>>>>>> this is the only site that we need to be careful about.
>>>>>>>>> Is there any chance we can get some kind of quick-and-dirty fix that
>>>>>>>>> can go to x86/urgent in the next few days even if a clean fix isn't
>>>>>>>>> available yet?
>>>>>>>> Quick and dirty?
>>>>>>>>
>>>>>>>> Reading from v is the most obvious and quick way, for areas where
>>>>>>>> we are certain v exists, is kernel memory and is expected to have
>>>>>>>> a backing page. I don't know offhand how many of current
>>>>>>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>>>>>>> __get_user((char *)v, tmp), perhaps, unless there's something better
>>>>>>> in the wings. Keep in mind that we need this for -stable, and it's
>>>>>>> likely to get backported quite quickly due to CVE-2015-5157.
>>>>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>>>>> would probably work, and certainly be minimal hassle for -stable.
>>>>>>
>>>>>> Altering the hypercall used is certainly not something to backport, nor
>>>>>> are we sure it is a viable fix at this time.
>>>>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>>>>> is the correct fix to unblock this LDT series. I see no reason why this
>>>>> cannot be backported.
>>>> To properly fix it should include batching and that is not something
>>>> that I think we should target for stable.
>>> Batching is absolutely not necessary to alter update_va_mapping to
>>> mmu_update_normal_pt. After all, update_va_mapping isn't batched.
>>>
>>> However this isn't the first issue we have had lazy mmu faulting,
>>> and I doubt it is the last. There are not many callsites of
>>> update_va_mapping - I will audit them tomorrow and see if any similar
>>> issues are lurking elsewhere.
>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
>> access to fault. Is this something we should be worried about?
>
> Yes. update_va_mapping() will function perfectly well taking one RW
> mapping to RO even if there is a second RW mapping. In such a case, the
> next LDT access will fault.

Which is a problem because that alias might still exist, and also
because Linux really doesn't expect that fault.

>
> On closer inspection, Xen is rather unhelpful with the fault. Xen's
> lazy #PF will be bounced back to the guest with cr2 adjusted to appear
> in the range passed to set_ldt(). The error code however will be
> unmodified (and limited only by not-user and not-reserved), so will
> appear as a non-present read or write supervisor access to an address
> which the kernel has a valid read mapping of.

More yuck. I think I'm just going to stick an unconditional
vm_unmap_aliases() in alloc_ldt.

> Therefore, set_ldt() needs to be confident that there are no writeable
> mappings to the frames used to make up the LDT.
> It could proactively fault them in by accessing one descriptor in each
> page inside the limit, but by the time a fault is received it is
> probably too late to work out where the other mapping is which
> prevented the typechange (or indeed, whether Xen objected to one of
> the descriptors instead).

This seems like overkill.

I'm still a bit confused, though: the failure is in xen_free_ldt. How
do we make it all the way to xen_free_ldt without the vmapped page
existing in the guest's page tables? After all, we had to survive
xen_alloc_ldt first, and ISTM that should fail in exactly the same
way.

Anyway, I'll send v6.

--Andy
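
For reference, the "access one descriptor in each page inside the limit" idea quoted above could look roughly like the userspace sketch below. It is an illustration only, not kernel code and not what any posted patch does. The names ldt_pages() and touch_ldt_pages() are made up for this example, and the 4096-byte page and 8-byte descriptor sizes are assumptions. The volatile read plays the same role as the (void)*(volatile int*)v probe in the quoted diff: it forces a real access, so a missing mapping would fault at a known point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE      4096u /* assumption: x86 4 KiB pages */
#define SKETCH_LDT_ENTRY_SIZE 8u    /* assumption: one descriptor is 8 bytes */

/* Hypothetical helper: pages spanned by `entries` descriptors (0 if empty). */
static size_t ldt_pages(size_t entries)
{
    size_t bytes = entries * SKETCH_LDT_ENTRY_SIZE;

    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: read the first byte of every page inside the limit.
 * The volatile qualifier keeps the compiler from dropping the reads.
 * Returns the number of pages touched.
 */
static size_t touch_ldt_pages(const void *ldt, size_t entries)
{
    const volatile uint8_t *p = ldt;
    size_t pages = ldt_pages(entries);

    assert(ldt != NULL || pages == 0);

    for (size_t i = 0; i < pages; i++)
        (void)p[i * SKETCH_PAGE_SIZE];

    return pages;
}
```

Calling touch_ldt_pages(buf, 513) on a two-page buffer reads offsets 0 and 4096. As noted in the quoted text, this only forces the fault to happen early; it does not say where a conflicting writable alias lives.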