From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751800AbcDBFdr (ORCPT <rfc822;w@1wt.eu>);
	Sat, 2 Apr 2016 01:33:47 -0400
Received: from mx2.suse.de ([195.135.220.15]:48968 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750940AbcDBFdp (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Sat, 2 Apr 2016 01:33:45 -0400
Date: Sat, 02 Apr 2016 07:33:40 +0200
Message-ID: <s5hwpog377v.wl-tiwai@suse.de>
From: Takashi Iwai <tiwai@suse.de>
To: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
        Andrew Cooper <andrew.cooper3@citrix.com>,
        Stephen Hemminger <stephen@networkplumber.org>,
        Jim Gettys <jg@freedesktop.org>,
        Dario Faggioli <dario.faggioli@citrix.com>,
        George.Dunlap@eu.citrix.com, Ingo Molnar <mingo@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Steven Rostedt <rostedt@goodmis.org>, Mel Gorman <mgorman@suse.de>,
        Andy Lutomirski <luto@amacapital.net>,
        Konstantin Ozerkov <kozerkov@parallels.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        ALSA development <alsa-devel@alsa-project.org>,
        Matt Fleming <matt@codeblueprint.co.uk>,
        Borislav Petkov <bp@alien8.de>,
        xen-devel <xen-devel@lists.xenproject.org>,
        Joerg Roedel <joro@8bytes.org>
Subject: Re: Getting rid of inside_vm in intel8x0
In-Reply-To: <20160401222831.GX1990@wotan.suse.de>
References: <CALCETrWnsn4CoRXcKDXpMTjxCYj8==mZQHkZRuXfZQyvERB3Gw@mail.gmail.com>
	<s5h60w48po7.wl-tiwai@suse.de>
	<20160331222618.GE1990@wotan.suse.de>
	<s5hvb4151v1.wl-tiwai@suse.de>
	<20160401222831.GX1990@wotan.suse.de>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka)
 FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/24.5
 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 02 Apr 2016 00:28:31 +0200,
Luis R. Rodriguez wrote:
> 
> On Fri, Apr 01, 2016 at 07:34:10AM +0200, Takashi Iwai wrote:
> > On Fri, 01 Apr 2016 00:26:18 +0200,
> > Luis R. Rodriguez wrote:
> > > 
> > > On Wed, Mar 30, 2016 at 08:07:04AM +0200, Takashi Iwai wrote:
> > > > On Tue, 29 Mar 2016 23:37:32 +0200,
> > > > Andy Lutomirski wrote:
> > > > > 
> > > > > Would it be possible to revert:
> > > > > 
> > > > > commit 228cf79376f13b98f2e1ac10586311312757675c
> > > > > Author: Konstantin Ozerkov <kozerkov@parallels.com>
> > > > > Date:   Wed Oct 26 19:11:01 2011 +0400
> > > > > 
> > > > >     ALSA: intel8x0: Improve performance in virtual environment
> > > > > 
> > > > > Presumably one or more of the following is true:
> > > > > 
> > > > > a) The inside_vm == true case is just an optimization and should apply
> > > > > unconditionally.
> > > > > 
> > > > > b) The inside_vm == true case is incorrect and should be fixed or disabled.
> > > > > 
> > > > > c) The inside_vm == true case is a special case that makes sense then
> > > > > IO is very very slow but doesn't make sense when IO is fast.  If so,
> > > > > why not literally measure the time that the IO takes and switch over
> > > > > to the "inside VM" path when IO is slow?
> > > 
> > > BTW can we simulate this on bare metal by throttling an IO bus, or
> > > perhaps mucking with the scheduler ?
> > > 
> > > I ask as I wonder if similar type of optimization may be useful
> > > to first simulate with other types of buses for other IO devices
> > > we might use in virtualization environments. If so, I'd be curious
> > > to know if similar type of optimizations might be possible for
> > > other sounds cards, or other IO devices.
> > 
> > There aren't so many sound devices requiring such a workaround.
> 
> Why not, what makes this special?

The hardware buggyness.

> > > > More important condition is rather that the register updates of CIV
> > > > and PICB are atomic.
> > > 
> > > To help with this can you perhaps elaborate a bit more on what the code
> > > does? As I read it snd_intel8x0_pcm_pointer() gets a pointer to some
> > > sort of audio frame we are in and uses two values to see if we are
> > > going to be evaluating the right frame, we use an optimization of
> > > some sort to skip one check for virtual environments. We seem to need
> > > this given that on a virtual environment it is assumed that the sound
> > > card is emulated, and as such an IO read there is rather expensive.
> > > 
> > > Can you confirm and/or elaborate a bit more what this does ?
> > > 
> > > To try to help understand what is going on can you describe what CIV
> > > and PICB are exactly ?
> > 
> > CIV and PICB registers are a pair and we calculate the linear position
> > in a ring buffer from both two.
> 
> What type of ring buffer is this ?

A normal PCM ring buffer via PCI DMA transfer.

> > However, they are divorced sometimes
> > under stress, and the position calculated from such values may go
> > backward wrongly.  For avoiding it, there is the second read of the
> > PICB register and compare with the previous value, and loop until it
> > matches.  This check is skipped on VM.
> 
> I see. Is this a software emulation *bug*, or an IO issue due to
> virtualization? I tried to read what the pointer() (that's what its called)
> callback does but since there is no documentation for any of the callbacks I
> have no clue what so ever.
> 
> If the former, could a we somehow detect an emulated device other than through
> this type of check ? Or could we *add* a capability of some sort to detect it
> on the driver ? This would not address the removal, but it could mean finding a
> way to address emulation issues.
> 
> If its an IO issue -- exactly what is the causing the delays in IO ?

Luis, there is no problem about emulation itself.  It's rather an
optimization to lighten the host side load, as I/O access on a VM is
much heavier.

> > > > This is satisfied mostly only on VM, and can't
> > > > be measured easily unlike the IO read speed.
> > > 
> > > Interesting, note the original patch claimed it was for KVM and
> > > Parallels hypervisor only, but since the code uses:
> > > 
> > > +#if defined(__i386__) || defined(__x86_64__)
> > > +               inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > > +#endif
> > > 
> > > This makes it apply also to Xen as well, this makes this hack more
> > > broad, but does is it only applicable when an emulated device is
> > > used ? What about if a hypervisor is used and PCI passthrough is
> > > used ?
> > 
> > A good question.  Xen was added there at the time from positive
> > results by quick tests, but it might show an issue if it's running on
> > a very old chip with PCI passthrough.  But I'm not sure whether PCI
> > passthrough would work on such old chipsets at all.
> 
> If it did have an issue then that would have to be special cased, that
> is the module parameter would not need to be enabled for such type of
> systems, and heuristics would be needed. As you note, fortunately this
> may not be common though...

Actually this *is* module parametered.  If set to a boolean value, it
can be applied / skipped forcibly.  So, if there has been a problem on
Xen, this should have been reported.  That's why I wrote it's no
common case.  This comes from the real experience.

> but if this type of work around may be
> taken as a precedent to enable other types of hacks in other drivers
> I'm very fearful of more hacks later needing these considerations as
> well.
> 
> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > > > know it's a guest), and, in most cases, it appears that the correct
> > > > > solution is to delete the checks.  I just removed a nasty one in the
> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > > > of cake :)
> > > > 
> > > > This cake looks sweet, but a worm is hidden behind the cream.
> > > > The loop in the code itself is already a kludge for the buggy hardware
> > > > where the inconsistent read happens not so often (only at the boundary
> > > > and in a racy way).  It would be nice if we can have a more reliably
> > > > way to know the hardware buggyness, but it's difficult,
> > > > unsurprisingly.
> > > 
> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > > The assumption here is such special cases are really paper'ing over another
> > > type of issue, so its best to ultimately try to root cause the issue in
> > > a more generalized fashion.
> >  
> > Well, it's rather bare metal that shows the buggy behavior, thus we
> > need to paper over it.   In that sense, it's other way round; we don't
> > tune for VM.  The VM check we're discussing is rather for skipping the
> > strange workaround.
> 
> What is it exactly about a VM that enables this work around to be skipped?
> I don't quite get it yet.

VM -- at least the full one with the sound hardware emulation --
doesn't have the hardware bug.  So, the check isn't needed.

> > You may ask whether we can reduce the whole workaround instead.  It's
> > practically impossible.  We don't know which models doing so and which
> > not.  And, the hardware in question are (literally) thousands of
> > variants of damn old PC mobos.  Any fundamental change needs to be
> > verified on all these machines...
> 
> What if we can come up with algorithm on the ring buffer that would
> satisfy both cases without special casing it ? Is removing this VM
> check impossible really?

Yes, it's impossible practically, see my comment above.
Whatever you change, you need to verify it on real machines.  And it's
very difficult to achieve.


thanks,

Takashi