From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756114AbcDDSbK (ORCPT ); Mon, 4 Apr 2016 14:31:10 -0400 Received: from mx2.suse.de ([195.135.220.15]:40072 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755401AbcDDSbI (ORCPT ); Mon, 4 Apr 2016 14:31:08 -0400 Date: Mon, 4 Apr 2016 20:31:03 +0200 From: "Luis R. Rodriguez" To: Takashi Iwai , Boris Ostrovsky Cc: Andy Lutomirski , Konstantin Ozerkov , Thomas Gleixner , Borislav Petkov , Stephen Hemminger , Andrew Cooper , George Dunlap , xen-devel , Steven Rostedt , Ingo Molnar , Jim Gettys , ALSA development , Matt Fleming , "Luis R. Rodriguez" , Dario Faggioli , Stefano Stabellini , Joerg Roedel , "linux-kernel@vger.kernel.org" , Mel Gorman Subject: Re: Getting rid of inside_vm in intel8x0 Message-ID: <20160404183103.GC1990@wotan.suse.de> References: <20160331222618.GE1990@wotan.suse.de> <20160401222831.GX1990@wotan.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Apr 02, 2016 at 10:22:44PM +0200, Takashi Iwai wrote: > On Sat, 02 Apr 2016 20:05:21 +0200, > Andy Lutomirski wrote: > > > > On Apr 2, 2016 12:07 PM, "Takashi Iwai" wrote: > > > > > > On Sat, 02 Apr 2016 14:57:44 +0200, > > > Andy Lutomirski wrote: > > > > > > > > On Fri, Apr 1, 2016 at 10:33 PM, Takashi Iwai wrote: > > > > > On Sat, 02 Apr 2016 00:28:31 +0200, > > > > > Luis R. Rodriguez wrote: > > > > >> If the former, could a we somehow detect an emulated device other than through > > > > >> this type of check ? Or could we *add* a capability of some sort to detect it > > > > >> on the driver ? This would not address the removal, but it could mean finding a > > > > >> way to address emulation issues. > > > > >> > > > > >> If its an IO issue -- exactly what is the causing the delays in IO ? > > > > > > > > > > Luis, there is no problem about emulation itself. It's rather an > > > > > optimization to lighten the host side load, as I/O access on a VM is > > > > > much heavier. > > > > > > > > > >> > > > This is satisfied mostly only on VM, and can't > > > > >> > > > be measured easily unlike the IO read speed. > > > > >> > > > > > > >> > > Interesting, note the original patch claimed it was for KVM and > > > > >> > > Parallels hypervisor only, but since the code uses: > > > > >> > > > > > > >> > > +#if defined(__i386__) || defined(__x86_64__) > > > > >> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR); > > > > >> > > +#endif > > > > >> > > > > > > >> > > This makes it apply also to Xen as well, this makes this hack more > > > > >> > > broad, but does is it only applicable when an emulated device is > > > > >> > > used ? What about if a hypervisor is used and PCI passthrough is > > > > >> > > used ? > > > > >> > > > > > >> > A good question. Xen was added there at the time from positive > > > > >> > results by quick tests, but it might show an issue if it's running on > > > > >> > a very old chip with PCI passthrough. But I'm not sure whether PCI > > > > >> > passthrough would work on such old chipsets at all. > > > > >> > > > > >> If it did have an issue then that would have to be special cased, that > > > > >> is the module parameter would not need to be enabled for such type of > > > > >> systems, and heuristics would be needed. As you note, fortunately this > > > > >> may not be common though... > > > > > > > > > > Actually this *is* module parametered. If set to a boolean value, it > > > > > can be applied / skipped forcibly. So, if there has been a problem on > > > > > Xen, this should have been reported. That's why I wrote it's no > > > > > common case. This comes from the real experience. > > > > > > > > > >> but if this type of work around may be > > > > >> taken as a precedent to enable other types of hacks in other drivers > > > > >> I'm very fearful of more hacks later needing these considerations as > > > > >> well. > > > > >> > > > > >> > > > > There are a pile of nonsensical "are we in a VM" checks of various > > > > >> > > > > sorts scattered throughout the kernel, they're all a mess to maintain > > > > >> > > > > (there are lots of kinds of VMs in the world, and Linux may not even > > > > >> > > > > know it's a guest), and, in most cases, it appears that the correct > > > > >> > > > > solution is to delete the checks. I just removed a nasty one in the > > > > >> > > > > x86_32 entry asm, and this one is written in C so it should be a piece > > > > >> > > > > of cake :) > > > > >> > > > > > > > >> > > > This cake looks sweet, but a worm is hidden behind the cream. > > > > >> > > > The loop in the code itself is already a kludge for the buggy hardware > > > > >> > > > where the inconsistent read happens not so often (only at the boundary > > > > >> > > > and in a racy way). It would be nice if we can have a more reliably > > > > >> > > > way to know the hardware buggyness, but it's difficult, > > > > >> > > > unsurprisingly. > > > > >> > > > > > > >> > > The concern here is setting precedents for VM cases sprinkled in the kernel. > > > > >> > > The assumption here is such special cases are really paper'ing over another > > > > >> > > type of issue, so its best to ultimately try to root cause the issue in > > > > >> > > a more generalized fashion. > > > > >> > > > > > >> > Well, it's rather bare metal that shows the buggy behavior, thus we > > > > >> > need to paper over it. In that sense, it's other way round; we don't > > > > >> > tune for VM. The VM check we're discussing is rather for skipping the > > > > >> > strange workaround. > > > > >> > > > > >> What is it exactly about a VM that enables this work around to be skipped? > > > > >> I don't quite get it yet. > > > > > > > > > > VM -- at least the full one with the sound hardware emulation -- > > > > > doesn't have the hardware bug. So, the check isn't needed. > > > > > > > > Here's the issue, though: asking "am I in a VM" is not a good way to > > > > learn properties of hardware. Just off the top of my head, here are > > > > some types of VM and what they might imply about hardware: > > > > > > > > Intel Kernel Guard: your sound card is passed through from real hardware. > > > > > > > > Xen: could go either way. In dom0, it's likely passed through. In > > > > domU, it could be passed through or emulated, and I believe this is > > > > the case for all of the Xen variants. > > > > > > > > KVM: Probably emulated, but could be passed through. > > > > > > > > I think the main reason that Luis and I are both uncomfortable with > > > > "am I in a VM" checks is that they're rarely the right thing to be > > > > detecting, the APIs are poorly designed, and most of the use cases in > > > > the kernel are using them as a proxy for something else and would be > > > > clearer and more future proof if they tested what they actually need > > > > to test more directly. > > > > > > Please, guys, take a look at the code more closely. This is applied > > > only to the known emulated PCI devices, and the driver shows the > > > kernel message: > > > > > > static int snd_intel8x0_inside_vm(struct pci_dev *pci) > > > .... > > > /* check for known (emulated) devices */ > > > if (pci->subsystem_vendor == PCI_SUBVENDOR_ID_REDHAT_QUMRANET && > > > pci->subsystem_device == PCI_SUBDEVICE_ID_QEMU) { > > > /* KVM emulated sound, PCI SSID: 1af4:1100 */ > > > msg = "enable KVM"; > > > } else if (pci->subsystem_vendor == 0x1ab8) { > > > /* Parallels VM emulated sound, PCI SSID: 1ab8:xxxx */ > > > msg = "enable Parallels VM"; > > > } else { > > > msg = "disable (unknown or VT-d) VM"; > > > result = 0; > > > } > > > > Now I'm more confused. Why are you checking the PCI IDs *and* whether > > a hypervisor is detected? Why not check only the IDs? > > > > In any event, at the very least the comment is misleading: > > > > /* detect KVM and Parallels virtual environments */ > > result = kvm_para_available(); > > #ifdef X86_FEATURE_HYPERVISOR > > result = result || boot_cpu_has(X86_FEATURE_HYPERVISOR); > > #endif > > > > You're detecting KVM (sometimes) and the x86 "hypervisor" bit. The > > latter has no particularly well-defined meaning. You're also missing > > Xen PV, I believe, and I think that Xen PV + QEMU is a real thing, and > > you'll fail to detect it, even though it can present a QEMU-emulated > > card. Indeed, I think some Xen instances would fail to check with this. > > In other words, how is this code any different from a simple whitelist > > of two specific cards that work a little differently from others? > > The PCI ID whitelist was introduced later in the commit 7fb4f392bd27, > and before that, we relied on the VM detection as a switch for > skipping the workaround. The VM detection code was kept there just to > be sure, in case the whitelist isn't 100% correct. > > Looking at the current status, the whitelist alone seems enough, so > the VM detection code could be dropped, I suppose. Great, such a change should perhaps highligh the impact also to Xen then. Luis