From: Nicholas Piggin <npiggin@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
Randy Dunlap <rdunlap@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable
Date: Mon, 14 Jun 2021 15:21:55 +1000
Message-ID: <1623647326.0np4yc0lo0.astroid@bobo.none>
In-Reply-To: <1623645385.u2cqbcn3co.astroid@bobo.none>
Excerpts from Nicholas Piggin's message of June 14, 2021 2:47 pm:
> Excerpts from Nicholas Piggin's message of June 14, 2021 2:14 pm:
>> Excerpts from Andy Lutomirski's message of June 14, 2021 1:52 pm:
>>> On 6/13/21 5:45 PM, Nicholas Piggin wrote:
>>>> Excerpts from Andy Lutomirski's message of June 9, 2021 2:20 am:
>>>>> On 6/4/21 6:42 PM, Nicholas Piggin wrote:
>>>>>> Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm
>>>>>> when it is context switched. This can be disabled by architectures that
>>>>>> don't require this refcounting if they clean up lazy tlb mms when the
>>>>>> last refcount is dropped. Currently this is always enabled, which is
>>>>>> what existing code does, so the patch is effectively a no-op.
>>>>>>
>>>>>> Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
>>>>>
>>>>> I am in favor of this approach, but I would be a lot more comfortable
>>>>> with the resulting code if task->active_mm were at least better
>>>>> documented and possibly even guarded by ifdefs.
>>>>
>>>> active_mm is fairly well documented in Documentation/active_mm.rst IMO.
>>>> I don't think anything has changed in 20 years, and I don't know what more
>>>> is needed, but if you can add to documentation that would be nice. Maybe
>>>> moving a bit of that into .c and .h files?
>>>>
>>>
>>> Quoting from that file:
>>>
>>> - however, we obviously need to keep track of which address space we
>>> "stole" for such an anonymous user. For that, we have "tsk->active_mm",
>>> which shows what the currently active address space is.
>>>
>>> This isn't even true right now on x86.
>>
>> From the perspective of core code, it is. x86 might do something crazy
>> with it, but it has to make it appear this way to non-arch code that
>> uses active_mm.
>>
>> Is x86's scheme documented?
>>
>>> With your patch applied:
>>>
>>> To support all that, the "struct mm_struct" now has two counters: a
>>> "mm_users" counter that is how many "real address space users" there are,
>>> and a "mm_count" counter that is the number of "lazy" users (ie anonymous
>>> users) plus one if there are any real users.
>>>
>>> isn't even true any more.
>>
>> Well yeah but the active_mm concept hasn't changed. The refcounting
>> change is hopefully reasonably documented?
>>
>>>
>>>
>>>>> x86 bare metal currently does not need the core lazy mm refcounting, and
>>>>> x86 bare metal *also* does not need ->active_mm. Under the x86 scheme,
>>>>> if lazy mm refcounting were configured out, ->active_mm could become a
>>>>> dangling pointer, and this makes me extremely uncomfortable.
>>>>>
>>>>> So I tend to think that, depending on config, the core code should
>>>>> either keep ->active_mm [1] alive or get rid of it entirely.
>>>>
>>>> I don't actually know what you mean.
>>>>
>>>> Core code needs the concept of an "active_mm". This is the mm that your
>>>> kernel threads are using. Even in the unmerged CONFIG_LAZY_TLB=n patch,
>>>> active_mm still points to init_mm for kernel threads.
>>>
>>> Core code does *not* need this concept. First, it's wrong on x86 since
>>> at least 4.15. Any core code that actually assumes that ->active_mm is
>>> "active" for any sensible definition of the word active is wrong.
>>> Fortunately there is no such code.
>>>
>>> I looked through all active_mm references in core code. We have:
>>>
>>> kernel/sched/core.c: it's all refcounting, although it's a bit tangled
>>> with membarrier.
>>>
>>> kernel/kthread.c: same. refcounting and membarrier stuff.
>>>
>>> kernel/exit.c: exit_mm() has a BUG_ON().
>>>
>>> kernel/fork.c: initialization code and a warning.
>>>
>>> kernel/cpu.c: cpu offline stuff. wouldn't be needed if active_mm went away.
>>>
>>> fs/exec.c: nothing of interest
>>
>> I might not have been clear. Core code doesn't need active_mm if
>> active_mm somehow goes away. I'm saying active_mm can't go away because
>> it's needed to support (most) archs that do lazy tlb mm switching.
>>
>> The part I don't understand is when you say it can just go away. How?
>>
>>> I didn't go through drivers, but I maintain my point. active_mm is
>>> there for refcounting. So please don't just make it even more confusing
>>> -- do your performance improvement, but improve the code at the same
>>> time: get rid of active_mm, at least on architectures that opt out of
>>> the refcounting.
>>
>> powerpc opts out of the refcounting and cannot "get rid of active_mm".
>> Not even in theory.
>
> That is to say, it does do a type of reference management that requires
> active_mm so you can argue it has not entirely opted out of refcounting.
> But we're not just doing refcounting for the sake of refcounting! That
> would make no sense.
>
> active_mm is required because that's the mm that we have switched to
> (from core code's perspective), and it is integral to know when to
> switch to a different mm. See how active_mm is a fundamental concept
> in core code? It's part of the contract between core code and the
> arch mm context management calls. Reference counting follows from there
> but it's not the _reason_ for this code.
>
> Pretend the reference problem does not exist (whether by refcounting or
> shootdown or garbage collection or whatever). We still can't remove
> active_mm! We need it to know how to call into arch functions like
> switch_mm.
>
> I don't know if you just forgot that critical requirement in your above
> list, or you actually are entirely using x86's mental model for this
> code which is doing something entirely different that does not need it
> at all. If that is the case I really don't mind some cleanup or wrapper
> functions for x86 to entirely do its own thing, but if that's the case
> you can't criticize core code's use of active_mm due to the current
> state of x86. It's x86 that needs documentation and cleaning up.
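To make the contract described above concrete, here is a minimal user-space model of the lazy-TLB path of a context switch, loosely following the shape of context_switch() in kernel/sched/core.c. It is a sketch, not kernel code: the struct layouts, the single-CPU globals, and the LAZY_TLB_REFCOUNT macro (a hypothetical stand-in for the config option this patch adds) are all simplifications. It shows that even with refcounting compiled out, core code still needs active_mm to know what to pass as prev to switch_mm() when leaving a kernel thread, whose ->mm is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the config option in this patch:
 * 1 = core code refcounts the lazily borrowed mm,
 * 0 = the arch guarantees its lifetime some other way. */
#ifndef LAZY_TLB_REFCOUNT
#define LAZY_TLB_REFCOUNT 1
#endif

struct mm {
	int mm_count;              /* references pinning the struct */
};

struct task {
	struct mm *mm;             /* NULL for kernel threads */
	struct mm *active_mm;      /* mm this task runs on, owned or borrowed */
};

static struct mm *cpu_loaded_mm;   /* what the (single) CPU has loaded */
static int nr_switch_mm;           /* real address-space switches */

/* Simplified arch hook: core code must tell it what is loaded (prev). */
static void switch_mm(struct mm *prev, struct mm *next)
{
	assert(next != NULL);
	if (prev != next)
		nr_switch_mm++;
	cpu_loaded_mm = next;
}

/* Returns the mm to drop once the switch has finished (what the patch
 * calls rq->prev_lazy_mm), or NULL. */
static struct mm *context_switch(struct task *prev, struct task *next)
{
	struct mm *prev_lazy_mm = NULL;

	if (!next->mm) {                        /* to kernel thread */
		next->active_mm = prev->active_mm;  /* borrow; no switch_mm */
		if (prev->mm) {                     /* from user: new lazy ref */
#if LAZY_TLB_REFCOUNT
			prev->active_mm->mm_count++;
#endif
		} else {                            /* from kthread: ref moves */
			prev->active_mm = NULL;
		}
	} else {                                /* to user task */
		/* prev->mm may be NULL; only active_mm says what is loaded. */
		switch_mm(prev->active_mm, next->mm);
		if (!prev->mm) {                    /* from kthread */
			prev_lazy_mm = prev->active_mm;
			prev->active_mm = NULL;
		}
	}
	return prev_lazy_mm;
}

/* Drop the lazy reference after the switch, as finish_task_switch() would. */
static void finish_task_switch(struct mm *prev_lazy_mm)
{
#if LAZY_TLB_REFCOUNT
	if (prev_lazy_mm)
		prev_lazy_mm->mm_count--;
#else
	(void)prev_lazy_mm;
#endif
}
```

Going from a user task on mm A to a kernel thread leaves A loaded and bumps its count only when LAZY_TLB_REFCOUNT is 1. Going on to a user task on mm B then calls switch_mm(A, B), which core code can only do because the kernel thread's active_mm recorded A.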
Ah, that must be where your confusion is coming from: x86's switch_mm
doesn't use prev anywhere, and the reference scheme it is using appears
to be under-documented, although vague references in changelogs suggest
it has not actually "opted out" of active_mm refcounting.
That's understandable, but please redirect your objections to the proper
place. git blame suggests 3d28ebceaffab.
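A hypothetical illustration of why an arch whose switch_mm ignores prev can regard active_mm as irrelevant: the sketch contrasts a hook that trusts prev (and so depends on core code's active_mm being accurate) with one that keeps its own per-CPU record of the loaded mm, loosely like x86's cpu_tlbstate.loaded_mm. The per-mm cpumask bookkeeping, the function names and NR_CPUS are invented for illustration and are not any architecture's real code.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct mm {
	unsigned long cpumask;   /* hypothetical: bit n set = loaded on CPU n */
};

/* Style A: trusts prev, i.e. core code's active_mm. */
static void switch_mm_trust_prev(struct mm *prev, struct mm *next, int cpu)
{
	if (prev == next)
		return;
	if (prev)
		prev->cpumask &= ~(1UL << cpu);
	next->cpumask |= 1UL << cpu;
}

/* Style B: the arch remembers what each CPU really has loaded and
 * ignores prev entirely. */
static struct mm *loaded_mm[NR_CPUS];

static void switch_mm_own_state(struct mm *prev, struct mm *next, int cpu)
{
	struct mm *real_prev;

	(void)prev;                  /* deliberately not consulted */
	assert(cpu >= 0 && cpu < NR_CPUS);
	real_prev = loaded_mm[cpu];
	if (real_prev == next)
		return;
	if (real_prev)
		real_prev->cpumask &= ~(1UL << cpu);
	next->cpumask |= 1UL << cpu;
	loaded_mm[cpu] = next;
}
```

Given a stale prev, style A leaves the outgoing mm's bit set, while style B clears the right one. That is why such an arch need not care what active_mm holds, whereas core code and arches that use prev still do.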
Thanks,
Nick