Frederic Weisbecker wrote:
> On Sat, Sep 12, 2009 at 12:09:40AM +0200, Jan Kiszka wrote:
>> Frederic Weisbecker wrote:
>>> This patch rebases the implementation of the breakpoints API on top of
>>> perf counter instances.
>>>
>>> The core breakpoint API has changed a bit:
>>>
>>> - register_kernel_hw_breakpoint() now takes a cpu as a parameter. For
>>>   now it doesn't support breakpoints that span all cpus, but this may
>>>   be implemented soon.
>>>
>>> - unregister_kernel_hw_breakpoint() and unregister_user_hw_breakpoint()
>>>   have been unified in a single unregister_hw_breakpoint()
>>>
>>> Each breakpoint now matches a perf counter, which handles the
>>> register scheduling, thread/cpu attachment, etc.
>>>
>>> The new layering is as follows:
>>>
>>>        ptrace       kgdb      ftrace   perf syscall
>>>           \          |          /         /
>>>            \         |         /         /
>> kgdb doesn't fit here as it requires nmi-safe services.
>>
>> I don't think you want to make the whole stack nmi-safe but rather
>> provide a separate interface that allows kgdb to announce to the kernel
>> when it uses some slot. Those slots should simply be excluded from
>> hardware updates. That's roughly the logic we use in KVM for guest
>> debugging: when the host starts to use debug registers for that purpose,
>> the guest's setting will not affect the real hardware anymore.
>
> I don't quite understand what must be NMI-safe here. Is it when
> we request a breakpoint or when we hit one?
> 

Both. With kgdb, the kernel may be interrupted (almost) anywhere, and
the operator may then decide to add or remove hardware breakpoints
during this interruption.
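
To make the slot idea from my earlier mail a bit more concrete, here is
a rough userspace sketch of what I have in mind. All names (bp_slots,
bp_slot_reserve(), bp_slot_release(), bp_slot_hw_update()) are made up
for illustration and are not existing kernel interfaces, and the array
only stands in for the real debug registers. The sketch takes no locks,
but it does not try to solve atomicity against a concurrent update; a
real version would have to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SLOTS 4	/* x86 has four address debug registers */

/* Hypothetical bookkeeping, not an existing kernel structure. */
struct bp_slots {
	unsigned int reserved;		/* bit n set: slot n belongs to the debugger */
	uintptr_t hw_addr[NR_SLOTS];	/* stand-in for the real DR0..DR3 */
};

/* Debugger side: announce that a slot is in use. No locks, no allocation. */
bool bp_slot_reserve(struct bp_slots *s, unsigned int slot)
{
	if (slot >= NR_SLOTS || (s->reserved & (1u << slot)))
		return false;
	s->reserved |= 1u << slot;
	return true;
}

/* Debugger side: hand the slot back to the regular users. */
void bp_slot_release(struct bp_slots *s, unsigned int slot)
{
	assert(slot < NR_SLOTS);
	s->reserved &= ~(1u << slot);
}

/*
 * Regular path: reserved slots are simply excluded from hardware updates.
 * Returns true if the (simulated) register was written.
 */
bool bp_slot_hw_update(struct bp_slots *s, unsigned int slot, uintptr_t addr)
{
	if (slot >= NR_SLOTS || (s->reserved & (1u << slot)))
		return false;
	s->hw_addr[slot] = addr;
	return true;
}
```

The point is only that the debugger never has to call into the
scheduling machinery: it flips a bit, and everyone else skips that slot.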

>
>> Still on my wishlist for KVM is a cheap & easy way to obtain the current
>> register content or to refresh it in hardware. It's not yet clear to me
>> where to hook this in the given design. It looks like this information
>> can be scattered over the current thread and some perf counters.
>
> With this design approach, the debug registers are no longer stored
> in the thread structure. In fact, they are not stored anywhere anymore.
> 
> That is mainly because breakpoints are no longer assigned to a
> specific address register. The register is chosen when the counter
> is enabled, and the counter is often toggled on/off, depending on
> whether we start or stop profiling the desired context. That can be a
> single task, in which case the counter is enabled while the task is
> sched in and disabled when it is sched out.
> Between two sched atoms, the register used for a breakpoint
> can be different.
> 
> The arch information about the breakpoints (len/type/addr) is stored
> in the counter structure, and the address/control register contents
> are now computed dynamically.
> 
> For your needs, the control basically has to go through perfcounters.
> When you switch from host to guest, the counter must be sched out,
> and in the reverse direction it must be sched in.
> Perf will then take care of the rest by itself.

Actually, we wanted to avoid sched-out activity, and so far this is
possible. But if both steps are cheap enough, specifically if the
sched-out does _not_ touch the hardware and is very cheap if no
breakpoints are set, KVM will likely be a happy user.
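
Roughly, this is the behaviour I would hope for, as a userspace sketch.
The names (bp_counter, bp_cpu, bp_sched_in(), bp_sched_out()) and the
hw_writes counter are invented here and do not describe the patch; the
slot is picked at sched-in time as you describe, and sched-out only
drops the software ownership. A real implementation would also have to
make sure a stale slot cannot fire in the wrong context (e.g. via the
control register), which is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 4

/* Hypothetical per-breakpoint state; only the address is modelled. */
struct bp_counter {
	uintptr_t addr;
	int slot;				/* -1 while scheduled out */
};

/* Hypothetical per-cpu state. */
struct bp_cpu {
	struct bp_counter *owner[NR_SLOTS];	/* software view of each slot */
	uintptr_t hw_addr[NR_SLOTS];		/* stand-in for DR0..DR3 */
	unsigned int nr_active;
	unsigned int hw_writes;			/* counts simulated register writes */
};

/* Pick the first free slot and program it, skipping redundant writes. */
int bp_sched_in(struct bp_cpu *cpu, struct bp_counter *bp)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		if (cpu->owner[i])
			continue;
		cpu->owner[i] = bp;
		bp->slot = i;
		cpu->nr_active++;
		if (cpu->hw_addr[i] != bp->addr) {
			cpu->hw_addr[i] = bp->addr;
			cpu->hw_writes++;
		}
		return i;
	}
	return -1;	/* all slots busy */
}

/* Drop ownership only; the simulated hardware is not touched. */
void bp_sched_out(struct bp_cpu *cpu, struct bp_counter *bp)
{
	if (cpu->nr_active == 0 || bp->slot < 0)
		return;	/* fast path: nothing is scheduled in */
	assert(cpu->owner[bp->slot] == bp);
	cpu->owner[bp->slot] = NULL;
	bp->slot = -1;
	cpu->nr_active--;
}
```

With no breakpoints set, bp_sched_out() is a single comparison, and
scheduling the same breakpoint back into the same slot costs no
register write at all.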

Does that API already exist, or what additional work would be required?

Jan