On 02/08/2016 07:30 AM, Stefan Hajnoczi wrote:
> On Wed, Jan 13, 2016 at 03:13:02PM -0800, Hollis Blanchard wrote:
>> The event timestamps are host time (get_clock()). I'm correlating qemu
>> events with other logs (using icount), so host time is unhelpful. Could
>> we use cpu_get_clock() instead? (Trace events are used in other tools
>> like qemu-io, where guest time doesn't exist, and there we could
>> continue to use get_clock().)
> It must be possible to trace code while the guest CPUs are stopped, so
> cpu_get_clock() on its own breaks existing users.
>
> If the CPU clock is more convenient for you, perhaps you can add an
> option to use that clocksource?

"Breaks"? Is that because every trace record is assumed to have a unique
timestamp? Who makes that assumption? Qemu's tracing infrastructure works
fine without it...

Hmm, an option. I guess if you want to debug Qemu behavior, like "why are
block operations taking so long", seeing host time is the most important
thing. However, if you want debug visibility into guest behavior, like
"how long did I run with IRQs disabled", guest time is the most important
thing. I'm not sure when a developer should be forced to make that
decision...?

>> When trying to understand multi-core guest behavior, it's pretty
>> important to know which core is performing the traced action (e.g.
>> MMIO). Would it make sense to automatically embed the core index, like
>> the timestamp, or do you think it should be encoded in each individual
>> tracepoint?
> I think that Lluís has some of this stuff automated in his TCG tracing
> work. He has added special trace event types for TCG that can be
> planted at code generation time as well as TB execution time. They can
> include the vcpu number automatically IIRC.

Neat.

> Regarding I/O emulation, we have to be careful because architecturally
> some types of devices (e.g. PCI devices) don't have the concept of which
> core is performing an action. QEMU takes advantage of that with
> ioeventfd where a lightweight vmexit simply marks an eventfd file
> descriptor readable in the kernel and quickly returns back to vcpu
> execution. Another QEMU thread monitors the eventfd and processes the
> notification and there is no concept of the current vcpu at that point.

Yeah, by "MMIO" I was thinking in particular of memory.c instrumentation,
which almost always executes in the context of a vcpu. I'm not sure a vcpu
property could work there. cpu_physical_memory_rw() may be executed by a
machine init function, before any VCPUs are running. To handle that, I
have a hack like this in memory_region_read_accessor():

    int cpu_index = -1;
    if (current_cpu) {
        cpu_index = current_cpu->cpu_index;
    }
    trace_memory_region_ops_read(cpu_index, mr, addr, tmp, size);

... and then my analysis tools, which display guest activity, know to
ignore events with cpu_index == -1. I'm not sure if a vcpu property could
or should do the same...?

> It might be easiest to include the vcpu id in the trace event explicitly
> for now.

Mmm hmm. That could cause a lot of churn to the current set of trace
events, but they seem to be mostly IO operations that execute outside VCPU
context.
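For concreteness, the kind of change I'm picturing would just add an
explicit cpu_index argument to the affected trace-events entries, roughly
like this (the exact signature and format string below are only a sketch,
not a patch):

    # sketch: prepend a cpu_index argument to the existing memory.c event
    memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr %#"PRIx64" value %#"PRIx64" size %u"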
>> Right now, the simple tracepoint backend creates a trace file even when
>> no events are active and no -trace options are present. Compounding the
>> situation, omitting "-trace file=foo" results in the directory being
>> littered with trace-NNNN files... could we create the file only when
>> tracing is activated? And maybe with a more predictable name?
> It should be possible to create the file only if traces are actually
> generated. I guess this wasn't done for simplicity and because the
> write-out thread needs to empty the ring buffer as quickly as possible
> to avoid dropping trace data.
>
> The names contain the QEMU pid. Although the pid isn't predictable it
> is useful for correlating with system logs, host performance tools, etc.
>
> You can override the name with -trace file=foo.log if you want.
>
> Do you have an idea for another automatic file naming scheme?

"qemu.trace" or something static. If the log is important, the user will
copy it anyway to give it a descriptive name, e.g. "out_of_memory.trace".
If the log is not important, it should be overwritten by the next run;
otherwise the workflow becomes:

    qemu-system-... args args args
    ls trace-*
    rm trace-NNNN
    qemu-system-... args args args
    ls trace-*
    rm trace-MMMM

Hollis Blanchard
Mentor Graphics Emulation Division