Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further

From: Andy Lutomirski <luto@kernel.org>
To: Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Willy Tarreau <w@1wt.eu>, Juergen Gross <jgross@suse.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further
Date: Sun, 10 Nov 2019 09:17:39 -0800	[thread overview]
Message-ID: <54ae3c3a-f49a-f5cc-d174-de6e64afe899@kernel.org> (raw)
In-Reply-To: <20191107082541.GF30739@gmail.com>

On 11/7/19 12:25 AM, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Wed, Nov 6, 2019 at 12:57 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Calculate both the position of the first zero bit and the last zero bit to
>>> limit the range which needs to be copied. This does not solve the problem
>>> when the previous tasked had only byte 0 cleared and the next one has only
>>> byte 65535 cleared, but trying to solve that would be too complex and
>>> heavyweight for the context switch path. As the ioperm() usage is very rare
>>> the case which is optimized is the single task/process which uses ioperm().
>>
>> Hmm.
>>
>> I may read this patch wrong, but from what I can tell, if we really
>> have just one process with an io bitmap, we're doing unnecessary
>> copies.
>>
>> If we really have just one process that has an iobitmap, I think we
>> could just keep the bitmap of that process entirely unchanged. Then,
>> when we switch away from it, we set the io_bitmap_base to an invalid
>> base outside the TSS segment, and when we switch back, we set it back
>> to the valid one. No actual bitmap copies at all.
>>
>> So I think that rather than the "begin/end offset" games, we should
>> perhaps have a "what was the last process that used the IO bitmap for
>> this TSS" pointer (and, I think, some sequence counter, so that when
>> the process updates its bitmap, it invalidates that case)?
>>
>>  Of course, you can do *nboth*, but if we really think that the common
>> case is "one special process", then I think the begin/end offset is
>> useless, but a "last bitmap process" would be very useful.
>>
>> Am I missing something?
> 
> In fact on SMP systems this would result in a very nice optimization: 
> pretty quickly *all* TSS's would be populated with that single task's 
> bitmap, and it would persist even across migrations from CPU to CPU.
> 
> I'd love to get rid of the offset caching and bit scanning games as well 
> - it doesn't really help in a number of common scenarios and it 
> complicates this non-trivial piece of code a *LOT* - and we probably 
> don't really have the natural testing density of this code anymore to 
> find any regressions quickly.

I think we should not over-optimize this.  I am all for penalizing
ioperm() and iopl() users as much as is convenient for us.  There is
simply no legitimate use case.  Sorry, DPDK, but "virtio-legacy sucks,
let's optimize the crap out of something that is slow anyway and use
iopl()" is not a good excuse.  Just use the %*!7 syscall to write to
/sys/.../resource0 and suck up the probably negligible performance hit.
And tell your customers to upgrade their hypervisors.  And quite
kvetching about performance of the control place on an old
software-emulated NIC while you're at it.

For the TLB case, it's worth tracking who last used which ASID and
whether it's still up to date, since *everyone* uses the MMU.  For
ioperm, I don't really believe this is worth it.

> 
> So intuitively I'd suggest we gravitate towards the simplest 
> implementation, with a good caching optimization for the single-task 
> case.

I agree with the first bit, but caching on an SMP system is necessarily
subtle.  Some kind of invalidation is needed.

> 
> I.e. the model I'm suggesting is that if a task uses ioperm() or iopl() 
> then it should have a bitmap from that point on until exit(), even if 
> it's all zeroes or all ones. Most applications that are using those 
> primitives really need it all the time and are using just a few ioports, 
> so all the tracking doesn't help much anyway.
> 
> On a related note, another simplification would be that in principle we 
> could also use just a single bitmap and emulate iopl() as an ioperm(all) 
> or ioperm(none) calls. Yeah, it's not fully ABI compatible for mixed 
> ioperm()/iopl() uses, but is that ABI actually being relied on in 
> practice?
> 

Let's please keep the ABI.  Or rather, let's attempt to eventually
remove the ABI, but let's not change it in the mean time please.