From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752989AbXC1Vk1 (ORCPT ); Wed, 28 Mar 2007 17:40:27 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753156AbXC1Vk1 (ORCPT ); Wed, 28 Mar 2007 17:40:27 -0400 Received: from mx1.redhat.com ([66.187.233.31]:56056 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752992AbXC1Vk0 (ORCPT ); Wed, 28 Mar 2007 17:40:26 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit From: Roland McGrath To: Alan Stern X-Fcc: ~/Mail/utrace Cc: Prasanna S Panchamukhi , Kernel development list Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch) In-Reply-To: Alan Stern's message of Wednesday, 14 March 2007 15:11:10 -0400 Emacs: well, why *shouldn't* you pay property taxes on your editor? Message-Id: <20070328213951.BCD5B1801C4@magilla.sf.frob.com> Date: Wed, 28 Mar 2007 14:39:51 -0700 (PDT) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Sorry I've been slow in responding to your most recent version. I fell into a large hole and couldn't get out until I fixed some bugs. > No, I'm not confused and neither are you. I realize there's no functional > difference between the two sets of enable bits, since Linux doesn't use > hardware task-switching. I just like to keep things neatly separated, > that's all. I don't really know what more to say. I like things neatly separated too, most especially between clearly meaningful and misleading yet apparently meaningful. Overloading unrelated hardware bits for no material purpose will never make sense to me. > > How would that happen? This would mean that some user process has been > > allowed to enable ioperm for some io port that kernel drivers also send to > > from interrupt handlers. Can that ever happen? > > I haven't checked the ioperm code to be certain, but it seems like the > sort of thing somebody might want to do on occasion. I checked the ioperm code. As far as I can tell, if you have CAP_SYS_RAWIO then you can use ioperm to enable any port you want, so this will always be possible. (That seems a bit nutty to me, hence my earlier reaction.) > > > It gives drivers a way to tell whether or not the breakpoint is currently > > > installed without having to do explicit tracking of installed() and > > > uninstalled() callbacks. > > > > How could that ever be used that would not be racy and thus buggy? A > > registration call on another CPU could cause a change and callback just > > after you fetched the value. > > Not if you have interrupts disabled. Debug register settings are > disseminated from one CPU to the others by means of an IPI. But the callback is not per-CPU, it is per-registration. When a new registration displaces an old one, or a deregistration stops displacing one, isn't the status change in the data structure and the callback made right away, by the thread doing the registration/deregistration call? Another CPU with interrupts disabled won't have its breakpoints actually change, but the data structure will have changed. Isn't that right? > I implemented most of the changes we discussed. Ignoring the length for > execute breakpoints turned out not to be a good idea because it would > affect the way ptrace works, so the code verifies it just like any other > kind of breakpoint. I think that's perfectly fine. > I also decided against adding a .bits member. It doesn't really gain very > much; the savings in encoding the breakpoint values is trivial -- one line > of code on i386. I think this is a mistake. On other machines, there is no need for more than one field and .len will go unused. There is no reason not to encode ahead of time. If it's useful for callers to be able to extract the type and length from a struct hw_breakpoint, it's easy to define macros or inlines in asm/hw_breakpoint.h that go the other way. > And it helps to have the original length and type values available for > use by the ptrace routines. There's no reason __modify_user_hw_breakpoint can't use an encoded value. modify_user_hw_breakpoint can do checking and encoding before taking the lock. > In fact, I decided to add a superfluous bit to the type code. That's to > help disambiguate between length and type values; it's easy to mix the > two of them up. It doesn't hurt to add some constant bits to the API values that you just mask out (after checking) as part of the encoding convsersion. In fact, it's a good way to make sure people are using the right macros. > Likewise, the length macros don't give the encoded values; that's so > people can just specify the length directly instead of using the macro. I object to this. The purpose of having macros is to give a constrained set of values that can be used at compile time. Letting people use literal numbers is just asking for people to write their calls unportably. For machines that support only LEN8, I'd planned to define it to some magic bit pattern just so it could be checked and rule out cavalier users who don't pay attention to the macros. > There's a small question about the value of the error_code argument for > send_sigtrap(). The value passed into do_debug() isn't available in > ptrace_triggered() -- but since it is always 0, that's what I'm using. > I'm not sure what it's supposed to mean anyway. task->thread.trap_no and task->thread.error_code usually store the values from the hardware trap frame when a trap is turned into a signal (e.g. see do_trap). This is the hardware trap number, which is 1 for do_debug. The error code is a word of the hardware trap frame whose meaning is specified differently for each trap number. For debug traps, it's always zero. For other kinds of traps, it is sometimes useful to have the value. My review comments below are about your patch of 3/22 (described as "actually works with 2.6.21-rc4"). > - if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code, > - SIGTRAP) == NOTIFY_STOP) > + args.regs = regs; > + args.str = "debug"; > + get_debugreg(args.err, 6); > + set_debugreg(0, 6); /* DR6 is never cleared by the CPU */ > + args.trapnr = error_code; > + args.signr = SIGTRAP; > + if (atomic_notifier_call_chain(&i386die_chain, DIE_DEBUG, &args) == > + NOTIFY_STOP) Put it back using notify_die, just pass dr6 instead of error_code. > -#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */ > -#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */ > +#define DR_LOCAL_EXACT (0x100) /* Local slow the pipeline */ > +#define DR_GLOBAL_EXACT (0x200) /* Global slow the pipeline */ Don't mess with these names if there isn't a good reason. It's unrelated to the work you're doing. > + > +/* > + * HW breakpoint additions > + */ > + > +#include > +#include Put all this inside #ifdef __KERNEL__. > + * Appropriate macros are defined in include/asm/hw_breakpoint.h. > + * Execute breakpoints must have @len equal to the special value > + * %HW_BREAKPOINT_LEN_EXECUTE. This should be more explicit about it being machine-dependent what subset of the macros will be defined. > Execute-breakpoint traps occur before the > + * breakpointed instruction runs; all other types of trap occur after the > + * memory access has taken place. We need to say something about the restart behavior. i.e., figure out the situation with the x86 RF flag and what the story is on other machines that have instruction breakpoint registers. > + * Hardware breakpoints are implemented using the CPU's debug registers, > + * which are a limited hardware resource. Requests to register a > + * breakpoint will always succeed (provided the parameters are valid), I've said before and still maintain that there should be the option to have a NULL installed callback and fail immediately if it can't be installed right now. The wording below about installed being NULL is not clear on what this means. > +/* QUESTIONS > + > + Error code in ptrace_triggered? Zero is right. > + Set RF flag bit for execution faults? Yes, I don't think you'll ever progress if you haven't set it. However, setting it opens a can of worms. Once you set it, the bit could leak into a signal context, or be seen via ptrace, or stay set when ptrace changes the pc, etc. It requires some investigation. > + TF flag bit for single-step exceptions in kernel space? What's the question? I don't think hw_breakpoint has anything to do with TF or single-step at all. > + CPU hotplug, kexec, etc? Using register_cpu_notifier in an __init function seems to be what everything else does, or you can hack __cpu_up directly. For kexec, clearing dr7 in machine_kexec seems like a good idea. > Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c Though you've split the header file into generic and i386, I see this is all still in the i386 file. I'd like to see the shareable code (list handling, priority stuff, multi-CPU coordination) in a common file. For x86_64, all the machine-specific code is all but identical (literally just one bit different), so it might as well actually share the i386 source file. But I'm keen to do the powerpc64 machine-dependent bits as soon as your generic code is making it easy enough for me. :-) I think the ia64 version will be straightforward too, though I won't do that one myself. A notable difference on these processors (and maybe all other ones that have breakpoint registers) is that execute breakpoints and data breakpoints are separate disjoint resources. There may be zero execute breakpoint registers or a few, and may be one data breakpoint register or a few, but there is no single HB_NUM of how many breakpoint slots for any kind whatever. > +#include Surely this should be . > + /* Block kernel breakpoint updates from other CPUs */ > + local_irq_save(flags); I have a feeling this is more costly than we want, though I don't really know. It seems to me that things in struct cpu_hw_breakpoint are not really per-CPU, except for bp_task. They are "current global state", right? So I think it fits well to have the per_cpu struct contain just bp_task and a pointer to the current state struct, and use RCU to replace that struct when registrations change. Perhaps rather than actually updating other CPU's pointers in the RCU fashion, that would just be done by each CPU for itself from the IPI handler that changes the hardware. Still, it just requires rcu_read_lock+rcu_dereference at context switch. > + switch (chbi->num_kbps) { Doesn't this unnecessarily do all four settings in a thread that only uses one? The old code did that too, but now it seems easy enough to cache the number of active thread breakpoints and switch on that. > + /* Mask in the parts of DR7 that refer to the new thread */ > + chbi->dr7 = chbi->kdr7 | (~chbi->kdr7_mask & thbi->tdr7); > + set_debugreg(chbi->dr7, 7); I'd say it's worth saving the loads here to recompute tdr7 with the kernel-used bits masked out and cache it that way, so the fast path looks at only thbi->tdr7|chbi->kdr7. It does not seem necessary to cache the dr7 value. Save the store. set_debugreg(chbi->kdr7 | thbi->tdr7, 7); The only use of the saved value is in hw_breakpoint_handler, which can juse use chbi->kdr7 | thbi->tdr7. Thanks, Roland