From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756228AbcETRAD (ORCPT ); Fri, 20 May 2016 13:00:03 -0400 Received: from mail-oi0-f51.google.com ([209.85.218.51]:35386 "EHLO mail-oi0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755083AbcETQ77 (ORCPT ); Fri, 20 May 2016 12:59:59 -0400 MIME-Version: 1.0 In-Reply-To: <20160520164128.4rypa3pnkkyq6jvz@treble> References: <20160429212546.t26mvthtvh7543ff@treble> <20160429224112.kl3jlk7ccvfceg2r@treble> <20160502135243.jkbnonaesv7zfios@treble> <20160519231546.yvtqz5wacxvykvn2@treble> <20160520140513.dsgfb5sei52qge3w@treble> <20160520164128.4rypa3pnkkyq6jvz@treble> From: Andy Lutomirski Date: Fri, 20 May 2016 09:59:38 -0700 Message-ID: Subject: Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking To: Josh Poimboeuf Cc: Jiri Kosina , Ingo Molnar , X86 ML , Heiko Carstens , "linux-s390@vger.kernel.org" , live-patching@vger.kernel.org, Michael Ellerman , Chris J Arges , linuxppc-dev@lists.ozlabs.org, Jessica Yu , Petr Mladek , Jiri Slaby , Vojtech Pavlik , "linux-kernel@vger.kernel.org" , Miroslav Benes , Peter Zijlstra Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 20, 2016 at 9:41 AM, Josh Poimboeuf wrote: > On Fri, May 20, 2016 at 08:41:00AM -0700, Andy Lutomirski wrote: >> On Fri, May 20, 2016 at 7:05 AM, Josh Poimboeuf wrote: >> > On Thu, May 19, 2016 at 04:39:51PM -0700, Andy Lutomirski wrote: >> >> On Thu, May 19, 2016 at 4:15 PM, Josh Poimboeuf wrote: >> >> > Note this example is with today's unwinder. It could be made smarter to >> >> > get the RIP from the pt_regs so the '?' could be removed from >> >> > copy_page_to_iter(). >> >> > >> >> > Thoughts? >> >> >> >> I think we should do that. The silly sample patch I sent you (or at >> >> least that I think I sent you) did that, and it worked nicely. >> > >> > Yeah, we can certainly do something similar to make the unwinder >> > smarter. It should be very simple with this approach: if it finds the >> > pt_regs() function on the stack, the (struct pt_regs *) pointer will be >> > right after it. >> >> That seems barely easier than checking if it finds a function in >> .entry that's marked on the stack, > > Don't forget you'd also have to look up the function's pt_regs offset in > the table. > >> and the latter has no runtime cost. > > Well, I had been assuming the three extra pushes and one extra pop for > interrupts would be negligible, but I'm definitely no expert there. Any > idea how I can measure it? I think it would be negligible, at least for interrupts, since interrupts are already extremely expensive. But I don't love adding assembly code that makes them even slower. The real thing I dislike about this approach is that it's not a normal stack frame, so you need code in the unwinder to unwind through it correctly, which makes me think that you're not saving much complexity by adding the pushes. But maybe I'm wrong. > >> > I'm not sure about the idea of a table. I get the feeling it would add >> > more complexity to both the entry code and the unwinder. How would you >> > specify the pt_regs location when it's on a different stack? (See the >> > interrupt macro: non-nested interrupts will place pt_regs on the task >> > stack before switching to the irq stack.) >> >> Hmm. I need to think about the interrupt stack case a bit. Although >> the actual top of the interrupt stack has a nearly fixed format, and I >> have old patches to clean it up and make it actually be fixed. I'll >> try to dust those off and resend them soon. > > How would that solve the problem? Would pt_regs be moved or copied to > the irq stack? Hmm. Maybe the right way would be to unwind off the IRQ stack in two steps. Step one would be to figure out that you're on the IRQ stack and pop your way off it. Step two would be to find pt_regs. But that could be rather nasty to implement. Maybe what we actually want to do is this: First, apply this thing: https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1 that gives us a common format for the IRQ stack. Second, use my idea of making a table of offsets to pt_regs, so we'd have, roughly: ENTER_IRQ_STACK old_rsp=%r11 call __do_softirq ANNOTATE_IRQSTACK_PTREGS_CALL offset=0 LEAVE_IRQ_STACK the idea here is that offset=0 means that there is no offset beyond that implied by ENTER_IRQ_STACK. What actually gets written to the table is offset 8, because ENTER_IRQ_STACK itself does one push. So far, this has no runtime overhead at all. Then, in the unwinder, the logic becomes: If the return address is annotated in the ptregs entry table, look up the offset. If the offset is in bounds, then use it. If the offset is out of bounds and we're on the IRQ stack, then unwind the ENTER_IRQ_STACK as well. Does that seem reasonable? I can try to implement it and see what it looks like. --Andy > >> > If you're worried about performance, I can remove the syscall >> > annotations. They're optional anyway, since the pt_regs is always at >> > the same place on the stack for syscalls. >> > >> > I think three extra pushes wouldn't be a performance issue for >> > interrupts/exceptions. And they'll go away when we finally bury >> > CONFIG_FRAME_POINTER. >> >> I bet we'll always need to support CONFIG_FRAME_POINTER for some >> embedded systems. > > Yeah, probably. > >> I'll play with this a bit. > > Thanks, looking forward to seeing what you come up with. > > -- > Josh -- Andy Lutomirski AMA Capital Management, LLC