From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Xu Subject: Re: [kvm-unit-tests PATCH 11/14] pci: edu: introduce pci-edu helpers Date: Tue, 25 Oct 2016 19:33:39 +0800 Message-ID: <20161025113339.GA13839@pxdev.xzpeter.org> References: <1476448852-30062-1-git-send-email-peterx@redhat.com> <1476448852-30062-12-git-send-email-peterx@redhat.com> <20161020131928.shuxwj7u7wyctoy3@kamzik.brq.redhat.com> <20161025033422.GA15168@pxdev.xzpeter.org> <20161025104341.rvyr4eo7grteoq3j@kamzik.brq.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: kvm@vger.kernel.org, jan.kiszka@web.de, agordeev@redhat.com, rkrcmar@redhat.com, pbonzini@redhat.com To: Andrew Jones Return-path: Received: from mx1.redhat.com ([209.132.183.28]:45074 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753454AbcJYLdl (ORCPT ); Tue, 25 Oct 2016 07:33:41 -0400 Content-Disposition: inline In-Reply-To: <20161025104341.rvyr4eo7grteoq3j@kamzik.brq.redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On Tue, Oct 25, 2016 at 12:43:41PM +0200, Andrew Jones wrote: > On Tue, Oct 25, 2016 at 11:34:22AM +0800, Peter Xu wrote: > > > You may consider adding a cpu_relax() to lib/x86/asm/barrier.h, like > > > arm and ppc have. I noticed a big difference when running with tcg > > > on how fast a 'while (state-not-changed);' loop can complete when > > > adding that barrier. > > > > Should I just do: > > > > #include > > > > in x86/asm/barrier.h, just like ppc? > > Nope. See below > > > > > Btw, could you elaborate what would be the difference? I see > > cpu_relax() is defined as: > > > > #define cpu_relax() asm volatile("" : : : "memory") > > > > Then how would the two differ: > > > > (1) while (xxx); > > (2) while (xxx) cpu_relax(); > > It inserts a compiler-generated memory barrier, which for ARM TCG > helps relinquish some time to QEMU, allowing QEMU to do what it > needs to do in order for the condition to be fulfilled. I just > checked Linux code for x86's definition of cpu_relax(), though, and > see that it's > > /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */ > static __always_inline void rep_nop(void) > { > asm volatile("rep; nop" ::: "memory"); > } > > static __always_inline void cpu_relax(void) > { > rep_nop(); > } > > So that may work better for you. A lesson for cpu_relax(). Thanks. :-) So I guess "rep; nop" is a arch-specific hint for x86. Will add one more patch for it. > > > > > (I thought they are still the same? maybe I should assume variables in > > xxx should have volatile keywords) > > Don't assume they do, make sure they do. Without volatile for those > variables you may never stop looping. Noted. [...] > > > Why wrap pci_dev in this pci_edu_dev struct? Will there ever > > > be more state? Should we just wait and see? Also, why use a > > > typedef for pci_dev (assuming we want it)? 'struct pci_dev' > > > would be more kernel like. > > > > My own preference is to use typedef just like how QEMU is using it, to > > avoid typing "struct", and with more flexibility (e.g., I can just > > change a struct into union without touching other part of code, as > > long as I keep all the existing fields' name there). However I think I > > should follow how kvm-unit-tests/Linux's coding style to use struct. > > I would. Note, the kernel tends to wrap unions in structs, avoiding > a need to switch the outer type. Hmm... right. > > > > > But should I still keep the wrapper even if there is only one field > > which is "struct pci_dev"? Benefits: > > > > (1) people won't be able to edu_dma() a PCI device which is not a edu > > device, or say "struct pci_edu_dev" type is checked during > > compilation. > > > > (2) in case edu device will need more fields, we won't need to touch > > everywhere and change the old "struct pci_dev" into "struct > > pci_edu_dev". > > Fair enough, but I think we need to decide if we even need struct > pci_dev first :-) There's no need to introduce any structs if all > we really need to do is pass around devid. I think that may depend on how we target kvm-unit-tests, and how we will try to extend PCI functions in kvm-unit-tests in the future. For now, I needed it mostly because of MSI offset. For PCI bar, indeed we can fetch it everytime we need it. However for MSI offset, it's hidden inside capability list. If we don't cache it, we will need to traverse the capability list every time, which is really not that efficient. If one day we want to play with MSI-X? Or more? If we have such a requirement, it seems just natural to have a "struct pci_dev" and keep all the device specific information along with the object. :-) Thanks! -- peterx