From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S933132AbZFOSK6 (ORCPT ); Mon, 15 Jun 2009 14:10:58 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S1763674AbZFOSKt (ORCPT ); Mon, 15 Jun 2009 14:10:49 -0400
Received: from tomts40.bellnexxia.net ([209.226.175.97]:50552
    "EHLO tomts40-srv.bellnexxia.net" rhost-flags-OK-OK-OK-FAIL)
    by vger.kernel.org with ESMTP id S1758950AbZFOSKs (ORCPT );
    Mon, 15 Jun 2009 14:10:48 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AokFAIAlNkpMQWQl/2dsb2JhbACBT9VZhA0F
Date: Mon, 15 Jun 2009 14:05:27 -0400
From: Mathieu Desnoyers
To: Linus Torvalds
Cc: Ingo Molnar, mingo@redhat.com, hpa@zytor.com, paulus@samba.org,
    acme@redhat.com, linux-kernel@vger.kernel.org, a.p.zijlstra@chello.nl,
    penberg@cs.helsinki.fi, vegard.nossum@gmail.com, efault@gmx.de,
    jeremy@goop.org, npiggin@suse.de, tglx@linutronix.de,
    linux-tip-commits@vger.kernel.org
Subject: Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain
    support to use NMI-safe methods
Message-ID: <20090615180527.GB4201@Krystal>
References: <20090615171845.GA7664@elte.hu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To:
X-Editor: vi
X-Info: http://krystal.dyndns.org:8080
X-Operating-System: Linux/2.6.21.3-grsec (i686)
X-Uptime: 14:01:29 up 107 days, 14:27, 3 users, load average: 0.89, 0.48, 0.40
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
>
>
> On Mon, 15 Jun 2009, Ingo Molnar wrote:
> >
> > A simple cr2 corruption would explain all those cc1 SIGSEGVs and
> > other user-space crashes i saw, with sufficiently intense sampling -
> > easily.
>
> Note that we could work around the %cr2 issue, since any corruption is
> always nicely "nested" (ie there are never any SMP issues with async
> writes to the register).
>
> So what we _could_ do is to have a magic value for %cr2, along with a "NMI
> sequence count", and if we see that value, we just return (without doing
> anything) from the page fault handler.
>
> Then, the NMI handler would be changed to always write that value to %cr2
> after it has done the operation that could fault, and do an atomic
> increment of the NMI sequence count. Then, we can do something like this
> in the page fault handler:
>
>     if (cr2 == MAGIC_CR2) {
>         static unsigned long my_seqno = -1;
>         if (my_seqno != nmi_seqno) {
>             my_seqno = nmi_seqno;
>             return;
>         }
>     }
>
> where the whole (and only) point of that "seqno" is to protect against
> user space doing something like
>
>     int i = *(int *)MAGIC_CR2;
>
> and causing infinite faults.
>
> If a real NMI happens, then nmi_seqno will always be different, and we'll
> just retry the fault (the NMI handler would do something like
>
>     write_cr2(MAGIC_CR2);
>     atomic_inc(&nmi_seqno);
>
> to set it all up).
>
> Anyway, I do think that the _correct_ solution is to not do page faults
> from within NMI's, but the above is an outline of how we could _try_ to
> handle it if we really really wanted to. IOW, the fact that cr2 gets
> corrupted is not insurmountable, exactly because we _could_ always just
> retrigger the page fault, and thus "re-create' the corrupted %cr2 value.
>
> Hacky, hacky. And I'm not sure how happy CPU's even are to have %cr2
> written to, so we could hit CPU issues.
>

Hrm, would it be possible to save the cr2 register upon NMI handler entry
and restore it before iret instead? This would ensure that an
NMI-interrupted page fault handler continues what it was doing with a
non-corrupted cr2 register after returning from the NMI. Plus, it involves
no modification to the page fault handler fast path.
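
To make the idea concrete, here is a rough user-space model of it, not
kernel code. Everything in it is made up for illustration: fake_cr2 is a
plain variable standing in for the real register, and read_cr2_sim(),
write_cr2_sim(), nmi_body_that_faults(), nmi_handler_sim(),
page_fault_sim() and NMI_FAULT_ADDR are hypothetical names. It only shows
the save/restore ordering; it says nothing about whether real CPUs are
happy with %cr2 being written.

```c
#include <assert.h>

/* Hypothetical stand-in for %cr2; a real handler would use the CPU register. */
static unsigned long fake_cr2;

/* Arbitrary address that the simulated NMI code faults on. */
#define NMI_FAULT_ADDR 0xdead0000UL

unsigned long read_cr2_sim(void)
{
    return fake_cr2;
}

void write_cr2_sim(unsigned long value)
{
    fake_cr2 = value;
}

/* Models a page fault taken inside the NMI handler: the CPU overwrites cr2. */
void nmi_body_that_faults(void)
{
    write_cr2_sim(NMI_FAULT_ADDR);
}

/* NMI handler with the proposed save on entry and restore before iret. */
void nmi_handler_sim(void)
{
    unsigned long saved_cr2 = read_cr2_sim();   /* save on entry */

    nmi_body_that_faults();                     /* may clobber cr2 */

    write_cr2_sim(saved_cr2);                   /* restore before "iret" */
    assert(read_cr2_sim() == saved_cr2);
}

/*
 * Models a page fault that is interrupted by an NMI after the hardware
 * has set cr2 but before the fault handler has read it. Returns the
 * address the fault handler would then see.
 */
unsigned long page_fault_sim(unsigned long fault_addr, int nmi_saves_cr2)
{
    write_cr2_sim(fault_addr);      /* hardware records the faulting address */

    if (nmi_saves_cr2)
        nmi_handler_sim();          /* proposed behaviour */
    else
        nmi_body_that_faults();     /* unprotected NMI: cr2 gets corrupted */

    return read_cr2_sim();
}
```

With nmi_saves_cr2 set, page_fault_sim() hands back the original faulting
address; with it cleared, the fault handler sees the address the NMI
faulted on, which is the corruption discussed above.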

But I fear I might be missing something totally obvious.

Mathieu

>         Linus

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68