linux-kernel.vger.kernel.org archive mirror
From: Fenghua Yu <fenghua.yu@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Jacob Jun Pan <jacob.jun.pan@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	Ravi V Shankar <ravi.v.shankar@intel.com>,
	iommu@lists.linux-foundation.org, x86 <x86@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting
Date: Tue, 28 Sep 2021 16:36:25 +0000
Message-ID: <YVNEiUMUTQezzH6f@otcwcpicx3.sc.intel.com>
In-Reply-To: <87mto0ckpd.ffs@tglx>

Hi, Thomas,

On Sun, Sep 26, 2021 at 01:13:50AM +0200, Thomas Gleixner wrote:
> Fenghua,
> 
> On Fri, Sep 24 2021 at 16:12, Fenghua Yu wrote:
> > On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
> >> But OTOH why do you need a per task reference count on the PASID at all?
> >> 
> >> The PASID is fundamentally tied to the mm and the mm can't go away before
> >> the threads have gone away unless this magically changed after I checked
> >> that ~20 years ago.
> >
> > There are up to 1M PASIDs because a PASID is 20 bits wide. I think there are
> > a few ways to allocate and free a PASID:
> >
> > 1. Statically allocate a PASID when an mm is created and free it on mm
> >    exit. No PASID allocation/free happens during the mm's lifetime, so
> >    at most 1M processes can exist because of the 1M PASID limit.
> >    We don't want this method because of that 1M process limit.
> 
> I'm not so worried about the 1M limitation, but it obviously makes sense
> to avoid that because allocating stuff which is not used is pointless in
> general.
> 
> > 2. A PASID is allocated to the mm in open(dev)->bind(dev, mm). There
> >    are three ways to free it:
> >    (a) Actively free it in close(fd)->unbind(dev, mm) by sending
> >        IPIs to tell all tasks using the PASID to clear the IA32_PASID
> >        MSR. This has locking issues similar to those of actively loading the
> >        IA32_PASID MSR, which was force-disabled upstream, so it won't work.
> 
> Exactly.
> 
> >    (b) Passively free the PASID in destroy_context(mm) on mm exit. Once
> >        the PASID is allocated, it stays with the process for its lifetime.
> >        It's better than #1 because the PASID is allocated only on demand.
> 
> Which is simple and makes a lot of sense. See below.
> 
> >    (c) Passively free the PASID in deactivate_mm(mm) or unbind() whenever there
> >        is no usage as implemented in this series. Tracking the PASID usage
> >        per task provides a chance to free the PASID on task exit. The
> >        PASID has a better chance to be freed earlier than mm exit in #(b).
> >
> > This series uses #2 and #(c) to allocate and free the PASID for a better
> > chance to ease the 1M PASIDs limitation pressure. For example, a thread
> > doing open(dev)->ENQCMD->close(fd)->exit(2) will not occupy a PASID while
> > its sibling threads are still running.
> 
> I'm not seeing that as a realistic problem. Applications which use this
> kind of device are unlikely to behave exactly that way.
> 
> 2^20 PASIDs are really plenty and just adding code for the theoretical
> case of PASID pressure is a pointless exercise IMO. It just adds
> complexity for no reason.
> 
> IMO reality will be that either you have long lived processes with tons
> of threads which use such devices over and over or short lived forked
> processes which open the device, do the job, close and exit. Both
> scenarios are fine with allocate on first use and drop on process exit.
> 
> I think with your approach you create overhead for applications which
> use thread pools where the threads get work thrown at them and do open()
> -> do_stuff() -> close() and then go back to wait for the next job which
> will do exactly the same thing. So you add the overhead of refcounts in
> general and in the worst case if the refcount drops to zero then the
> next worker has to allocate a new PASID instead of just moving on.
> 
> So unless you have a really compelling real world usecase argument, I'm
> arguing that the PASID pressure problem is a purely academic exercise.
> 
> I think you are conflating two things here:
> 
>   1) PASID lifetime
>   2) PASID MSR overhead
> 
> Which is not correct: You still can and have to optimize the per thread
> behaviour vs. the PASID MSR: Track per thread whether it ever needed the
> PASID and act upon that.
> 
> If the thread just does ENQCMD once in its lifetime, then so be
> it. That's not a realistic use case, really.
> 
> And if someone does this then this does not mean we have to optimize for
> that. Optimizing for possible stupid implementations is the wrong
> approach. There is no technical measure against stupidity. If that would
> exist the world would be a much better place.
> 
> You really have to think about the problem space you are working
> on. There are problems which need a 'get it right at the first shot'
> solution because they create user space ABI or other hard to fix
> dependencies.
> 
> That's absolutely not the case here.
> 
> Get the basic simple support correct and work from there. Trying to
> solve all possible theoretical problems upfront is simply not possible
> and a guarantee for not making progress.
> 
> "Keep it simple" and "correctness first" are still the best working
> engineering principles.
> 
> They do not prevent us from revisiting this _if_ there is a real world
> problem which makes enough sense to implement a finer grained solution.

Sure. I will free the PASID in destroy_context() on mm exit and won't track
PASID usage per task. The code will be simpler and clearer.
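
To make the agreed lifetime concrete, here is a rough user-space model of
it, not kernel code and not the actual patch. Every model_* name, the bitmap
allocator, and the choice of 0 as the "no PASID" value are assumptions made
only for illustration. It shows the PASID being allocated on the first bind,
kept across unbind, lazily marked per thread on first use, and freed only
when the mm is torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PASID_BITS    20
#define PASID_MAX     (1u << PASID_BITS)  /* 2^20 = 1,048,576 values */
#define PASID_INVALID 0u                  /* assumption: 0 means "none" */

/* Toy bitmap allocator standing in for the real PASID allocator. */
static uint8_t pasid_used[PASID_MAX / 8];

struct model_mm {
	uint32_t pasid;                   /* one PASID per mm, shared by its threads */
};

struct model_task {
	struct model_mm *mm;
	bool pasid_activated;             /* has this thread ever needed the PASID? */
};

uint32_t model_pasid_alloc(void)
{
	for (uint32_t id = 1; id < PASID_MAX; id++) {
		if (!(pasid_used[id / 8] & (1u << (id % 8)))) {
			pasid_used[id / 8] |= (uint8_t)(1u << (id % 8));
			return id;
		}
	}
	return PASID_INVALID;             /* PASID space exhausted */
}

void model_pasid_free(uint32_t id)
{
	assert(id != PASID_INVALID && id < PASID_MAX);
	pasid_used[id / 8] &= (uint8_t)~(1u << (id % 8));
}

/* Models open(dev)->bind(dev, mm): allocate on first use, reuse afterwards. */
uint32_t model_bind(struct model_mm *mm)
{
	if (mm->pasid == PASID_INVALID)
		mm->pasid = model_pasid_alloc();
	return mm->pasid;
}

/* Models close(fd)->unbind(dev, mm): deliberately keeps the PASID. */
void model_unbind(struct model_mm *mm)
{
	(void)mm;
}

/*
 * Models the first ENQCMD in a thread: per-thread state is populated
 * lazily, and only if the mm already has a PASID.
 */
bool model_first_enqcmd(struct model_task *t)
{
	if (t->pasid_activated)
		return true;
	if (t->mm->pasid == PASID_INVALID)
		return false;
	t->pasid_activated = true;
	return true;
}

/* Models destroy_context(mm) on mm exit: the only place the PASID is freed. */
void model_destroy_context(struct model_mm *mm)
{
	if (mm->pasid != PASID_INVALID) {
		model_pasid_free(mm->pasid);
		mm->pasid = PASID_INVALID;
	}
}
```

In this model a thread-pool worker that repeatedly does bind, work, unbind
always gets the same PASID back without touching the allocator or a
refcount, which is the case Thomas describes above.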

Thank you very much for your insight!

-Fenghua
