From: Thomas Gleixner <tglx@linutronix.de>
To: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	iommu@lists.linux-foundation.org, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Jacob Jun Pan <jacob.jun.pan@intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting
Date: Sun, 26 Sep 2021 01:13:50 +0200
Message-ID: <87mto0ckpd.ffs@tglx>
In-Reply-To: <YU34+1J4v0cn9ZRs@otcwcpicx3.sc.intel.com>

Fenghua,

On Fri, Sep 24 2021 at 16:12, Fenghua Yu wrote:
> On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
>> But OTOH why do you need a per task reference count on the PASID at all?
>> 
>> The PASID is fundamentally tied to the mm and the mm can't go away before
>> the threads have gone away unless this magically changed after I checked
>> that ~20 years ago.
>
> There are up to 1M PASIDs because the PASID is 20 bits wide. I think there are
> a few ways to allocate and free a PASID:
>
> 1. Statically allocate a PASID when an mm is created and free it at mm
>    exit. No PASID allocation/free happens during the mm's lifetime. This
>    caps the system at 1M processes because there are only 1M PASIDs.
>    We don't want this method because of that 1M-process limit.

I'm not so worried about the 1M limitation, but it obviously makes sense
to avoid that because allocating stuff which is not used is pointless in
general.

> 2. A PASID is allocated to the mm in open(dev)->bind(dev, mm). There
>    are three ways to free it:
>    (a) Actively free it in close(fd)->unbind(dev, mm) by sending
>        IPIs to tell all tasks using the PASID to clear the IA32_PASID
>        MSR. This has locking issues similar to those of actively loading
>        the IA32_PASID MSR, which was force-disabled upstream. So it won't work.

Exactly.

>    (b) Passively free the PASID in destroy_context(mm) at mm exit. Once
>        the PASID is allocated, it stays with the process for its lifetime.
>        This is better than #1 because the PASID is allocated only on demand.

Which is simple and makes a lot of sense. See below.

>    (c) Passively free the PASID in deactivate_mm(mm) or unbind() whenever
>        it is no longer used, as implemented in this series. Tracking PASID
>        usage per task makes it possible to free the PASID on task exit, so
>        the PASID has a better chance of being freed earlier than at mm exit
>        as in #(b).
>
> This series uses #2 and #(c) to allocate and free the PASID to better
> ease pressure on the 1M PASID limit. For example, a thread doing
> open(dev)->ENQCMD->close(fd)->exit(2) will not occupy a PASID while
> its sibling threads are still running.

I'm not seeing that as a realistic problem. Applications which use these
kinds of devices are unlikely to behave exactly that way.

2^20 (roughly one million) PASIDs are plenty, and adding code for the
theoretical case of PASID pressure is a pointless exercise IMO. It just
adds complexity for no reason.

IMO reality will be that you have either long-lived processes with tons
of threads which use such devices over and over, or short-lived forked
processes which open the device, do the job, close it and exit. Both
scenarios are fine with allocating on first use and dropping on process exit.
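To make "allocate on first use, drop on process exit" concrete, here is a
minimal user-space model in C. It is a sketch under stated assumptions,
not the kernel implementation: struct mm_model, mm_bind(), mm_unbind(),
mm_destroy() and the bitmap allocator pasid_alloc()/pasid_free() are
hypothetical stand-ins for the mm, bind/unbind and the real PASID
allocator, and reserving PASID 0 to mean "no PASID" is an assumption of
the model.

```c
#include <assert.h>
#include <stdint.h>

/* The PASID is 20 bits wide: 2^20 = 1048576 possible values. */
#define PASID_BITS	20
#define PASID_MAX	(1u << PASID_BITS)
#define PASID_INVALID	0u	/* assumption: 0 means "no PASID" */

/* Hypothetical allocator: one bit per PASID (128 KiB in total). */
static uint64_t pasid_bitmap[PASID_MAX / 64];

/* Return the lowest free PASID, or PASID_INVALID if all are taken. */
static uint32_t pasid_alloc(void)
{
	for (uint32_t id = 1; id < PASID_MAX; id++) {
		uint64_t bit = UINT64_C(1) << (id % 64);

		if (!(pasid_bitmap[id / 64] & bit)) {
			pasid_bitmap[id / 64] |= bit;
			return id;
		}
	}
	return PASID_INVALID;
}

static void pasid_free(uint32_t id)
{
	assert(id != PASID_INVALID && id < PASID_MAX);
	pasid_bitmap[id / 64] &= ~(UINT64_C(1) << (id % 64));
}

/* Simplified stand-in for struct mm_struct: the PASID belongs to the mm. */
struct mm_model {
	uint32_t pasid;
};

/* bind(dev, mm): allocate on first use only; later binds reuse it. */
static uint32_t mm_bind(struct mm_model *mm)
{
	if (mm->pasid == PASID_INVALID)
		mm->pasid = pasid_alloc();
	return mm->pasid;
}

/* unbind(dev, mm): deliberately leaves the PASID with the mm. */
static void mm_unbind(struct mm_model *mm)
{
	(void)mm;
}

/* mm teardown (the destroy_context() step): the only place it is freed. */
static void mm_destroy(struct mm_model *mm)
{
	if (mm->pasid != PASID_INVALID) {
		pasid_free(mm->pasid);
		mm->pasid = PASID_INVALID;
	}
}
```

Because unbind never touches the PASID, repeated open()/close() cycles in
one process keep reusing the same PASID, and the single free happens when
the mm goes away, so no per-task reference count is needed.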

I think with your approach you create overhead for applications which
use thread pools, where the threads get work thrown at them, do open()
-> do_stuff() -> close() and then go back to wait for the next job, which
will do exactly the same thing. So you add the overhead of refcounts in
general, and in the worst case, when the refcount drops to zero, the
next worker has to allocate a new PASID instead of just moving on.
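The thread-pool worst case can be counted with an equally simplified
model. It assumes the jobs never overlap, so under approach (c) the
refcount drops to zero after every close(). The names (struct
pasid_state, get_pasid(), run_jobs() and the two put variants) are
hypothetical, and nothing here is kernel code.

```c
#include <assert.h>

/* Bookkeeping for one mm's PASID in this toy model. */
struct pasid_state {
	unsigned int refcount;	/* users of the PASID (only approach (c) acts on it) */
	int allocated;		/* does the mm currently hold a PASID? */
	unsigned int nr_allocs;	/* total allocations performed */
};

/* open(dev) -> bind: allocate only if the mm has no PASID right now. */
static void get_pasid(struct pasid_state *s)
{
	if (!s->allocated) {
		s->allocated = 1;
		s->nr_allocs++;
	}
	s->refcount++;
}

/* Approach (c): release the PASID as soon as the last user goes away. */
static void put_pasid_refcounted(struct pasid_state *s)
{
	assert(s->refcount > 0);
	if (--s->refcount == 0)
		s->allocated = 0;
}

/* Approach (b): keep the PASID until the mm itself is torn down. */
static void put_pasid_mm_lifetime(struct pasid_state *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/* Run `jobs` back-to-back jobs, each doing open() -> do_stuff() -> close(). */
static unsigned int run_jobs(unsigned int jobs,
			     void (*put)(struct pasid_state *))
{
	struct pasid_state s = { 0, 0, 0 };

	for (unsigned int i = 0; i < jobs; i++) {
		get_pasid(&s);	/* worker: open(dev) -> bind */
		/* do_stuff(): submit work with ENQCMD */
		put(&s);	/* worker: close(fd) -> unbind */
	}
	return s.nr_allocs;
}
```

With these definitions run_jobs(1000, put_pasid_refcounted) performs 1000
allocations, while run_jobs(1000, put_pasid_mm_lifetime) performs exactly one.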

So unless you have a really compelling real-world use case, I'm
arguing that the PASID pressure problem is a purely academic exercise.

I think you are conflating two things here:

  1) PASID lifetime
  2) PASID MSR overhead

That is not correct: you can, and still have to, optimize the per-thread
behaviour regarding the PASID MSR: track per thread whether it ever needed
the PASID and act upon that.
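Kept separate from PASID lifetime, the per-thread part only needs a flag
recording whether the thread ever needed the PASID. The sketch below
models that in user space, loosely following the series' idea of
demand-populating the MSR from the #GP handler: the first faulting
ENQCMD loads the MSR and sets the flag, and a context switch writes the
MSR only when its value actually has to change. All names are
hypothetical; the one hardware detail assumed is that bit 31 of
IA32_PASID is its valid bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PASID_MSR_VALID	(UINT64_C(1) << 31)	/* IA32_PASID valid bit */

/* Simplified stand-ins; the PASID lives in the mm, the flag in the task. */
struct mm_model {
	uint32_t pasid;		/* 0 = no PASID allocated (model assumption) */
};

struct task_model {
	struct mm_model *mm;
	bool pasid_activated;	/* has this thread ever needed the PASID MSR? */
};

static uint64_t ia32_pasid_msr;	/* models the current CPU's IA32_PASID */
static unsigned int msr_writes;	/* counts (expensive) MSR writes */

static void write_pasid_msr(uint64_t val)
{
	ia32_pasid_msr = val;
	msr_writes++;
}

/*
 * Fault path for an ENQCMD issued without IA32_PASID loaded: load the
 * mm's PASID and remember that this thread uses it. Returns true if the
 * fault was fixed up and the instruction can be retried.
 */
static bool fixup_pasid_fault(struct task_model *t)
{
	if (t->pasid_activated || t->mm->pasid == 0)
		return false;	/* not a missing-PASID fault */
	write_pasid_msr(t->mm->pasid | PASID_MSR_VALID);
	t->pasid_activated = true;
	return true;
}

/* Context switch: threads that never used ENQCMD get a cleared MSR. */
static void switch_to(struct task_model *next)
{
	uint64_t want = next->pasid_activated ?
			(next->mm->pasid | PASID_MSR_VALID) : 0;

	if (ia32_pasid_msr != want)	/* skip redundant MSR writes */
		write_pasid_msr(want);
}
```

Threads that never issue ENQCMD cause no MSR writes at all in this model,
regardless of how long the mm keeps its PASID.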

If a thread executes ENQCMD only once in its lifetime, then so be it.
That's not a realistic use case, really.

And if someone does this, that does not mean we have to optimize for
it. Optimizing for possible stupid implementations is the wrong
approach. There is no technical measure against stupidity. If there
were, the world would be a much better place.

You really have to think about the problem space you are working
on. There are problems which need a 'get it right on the first shot'
solution because they create user space ABI or other hard-to-fix
dependencies.

That's absolutely not the case here.

Get the basic, simple support correct and work from there. Trying to
solve all possible theoretical problems upfront is simply not possible
and a guarantee of not making progress.

"Keep it simple" and "correctness first" are still the best working
engineering principles.

They do not prevent us from revisiting this _if_ there is a real-world
problem which makes enough sense to implement a finer-grained solution.

Thanks,

        tglx
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
