Marcelo Tosatti wrote:
> On Thu, May 07, 2009 at 01:03:45PM -0400, Gregory Haskins wrote:
>
>> Chris Wright wrote:
>>
>>> * Gregory Haskins (ghaskins@novell.com) wrote:
>>>
>>>> Chris Wright wrote:
>>>>
>>>>> VF drivers can also have this issue (and typically use mmio).
>>>>> I at least have a better idea what your proposal is, thanks for
>>>>> explanation.  Are you able to demonstrate concrete benefit with it yet
>>>>> (improved latency numbers for example)?
>>>>>
>>>> I had a test-harness/numbers for this kind of thing, but its a bit
>>>> crufty since its from ~1.5 years ago.  I will dig it up, update it, and
>>>> generate/post new numbers.
>>>>
>>> That would be useful, because I keep coming back to pio and shared
>>> page(s) when think of why not to do this.  Seems I'm not alone in that.
>>>
>>> thanks,
>>> -chris
>>>
>> I completed the resurrection of the test and wrote up a little wiki on
>> the subject, which you can find here:
>>
>> http://developer.novell.com/wiki/index.php/WhyHypercalls
>>
>> Hopefully this answers Chris' "show me the numbers" and Anthony's "Why
>> reinvent the wheel?" questions.
>>
>> I will include this information when I publish the updated v2 series
>> with the s/hypercall/dynhc changes.
>>
>> Let me know if you have any questions.
>>
>
> Greg,
>
> I think comparison is not entirely fair. You're using
> KVM_HC_VAPIC_POLL_IRQ ("null" hypercall) and the compiler optimizes that
> (on Intel) to only one register read:
>
>         nr = kvm_register_read(vcpu, VCPU_REGS_RAX);
>
> Whereas in a real hypercall for (say) PIO you would need the address,
> size, direction and data.
>

Hi Marcelo,
I'll have to respectfully disagree with you here.  What you are
proposing is actually a different test: a 4th type I would call "PIO
over HC".  It is distinctly different than the existing MMIO, PIO, and
HC tests already present.

I assert that the current HC test remains valid because for pure
hypercalls, the "nr" *is* the address.
It identifies the function to be executed (e.g. VAPIC_POLL_IRQ = null),
just like the PIO address of my nullio device identifies the function to
be executed (i.e. nullio_write() = null).

My argument is that the HC test emulates the "dynhc()" concept I have
been talking about, whereas the PIOoHC is more like the
pv_io_ops->iowrite approach.

That said, your 4th test type would actually be a very interesting
data-point to add to the suite (especially since we are still kicking
around the notion of doing something like this).  I will update the
patches.

> Also for PIO/MMIO you're adding this unoptimized lookup to the
> measurement:
>
>         pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
>         if (pio_dev) {
>                 kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
>                 complete_pio(vcpu);
>                 return 1;
>         }
>
> Whereas for hypercall measurement you don't.

In theory they should both share about the same algorithmic complexity
in the decode-stage, but due to the possible optimization you mention
you may have a point.  I need to take some steps to ensure the HC path
isn't artificially simplified by GCC (like making the execute stage do
some trivial work like you mention below).

> I believe a fair comparison
> would be have a shared guest/host memory area where you store guest/host
> TSC values and then do, on guest:
>
>         rdtscll(&shared_area->guest_tsc);
>         pio/mmio/hypercall
>         ... back to host
>         rdtscll(&shared_area->host_tsc);
>
> And then calculate the difference (minus guests TSC_OFFSET of course)?
>

I'm not sure I need that much complexity.  I can probably just change
the test harness to generate an ioread32(), and have the functions
return the TSC value as a return parameter for all test types.  The
important thing is that we pick something extremely cheap (yet dynamic)
to compute so the execution time doesn't invalidate the measurement
granularity with a large constant.

Regards,
-Greg
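
A rough userspace sketch of the two ideas above: the call number acting
as the "address" in the decode stage, and an execute stage that returns
something cheap but dynamic so the compiler cannot collapse the path.
All names here (hc_dispatch, hc_table, fake_tsc) are made up for
illustration and come from neither the KVM tree nor the test harness;
the counter merely stands in for a real TSC read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: every entry point returns a 64-bit value. */
typedef uint64_t (*hc_handler_t)(void);

/* Stand-in for the TSC (an assumption, not a real rdtsc): a volatile
 * counter that changes on every read, so the handler's result cannot
 * be folded into a constant at compile time. */
static volatile uint64_t fake_tsc;

/* "Null" execute stage: does no work at all. */
static uint64_t hc_null(void)
{
    return 0;
}

/* Cheap-but-dynamic execute stage: bump and return the counter. */
static uint64_t hc_read_tsc(void)
{
    return ++fake_tsc;
}

enum { HC_NULL = 0, HC_READ_TSC = 1, HC_MAX };

/* The table index plays the role a port number plays for PIO. */
static const hc_handler_t hc_table[HC_MAX] = {
    [HC_NULL]     = hc_null,
    [HC_READ_TSC] = hc_read_tsc,
};

/* Decode stage: look up the handler by number, then execute it. */
uint64_t hc_dispatch(unsigned long nr)
{
    if (nr >= HC_MAX || hc_table[nr] == NULL)
        return UINT64_MAX;  /* unknown call number */
    return hc_table[nr]();
}
```

With this shape, each test type would go through a comparable lookup
before doing the same trivial work, which is the property the
comparison needs; how the real harness would wire that up is left open.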