linux-kernel.vger.kernel.org archive mirror
From: Mateusz Guzik <mjguzik@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Dennis Zhou <dennis@kernel.org>,
	linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shakeelb@google.com,
	linux-mm@kvack.org
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
Date: Wed, 23 Aug 2023 14:13:20 +0200	[thread overview]
Message-ID: <CAGudoHFFt5wvYWrwNkz813KaXBmROJ7YJ67s1h3_CBgcoV2fCA@mail.gmail.com> (raw)
In-Reply-To: <20230823094915.ggv3spzevgyoov6i@quack3>

On 8/23/23, Jan Kara <jack@suse.cz> wrote:
> On Tue 22-08-23 16:24:56, Mateusz Guzik wrote:
>> On 8/22/23, Jan Kara <jack@suse.cz> wrote:
>> > On Tue 22-08-23 00:29:49, Mateusz Guzik wrote:
>> >> On 8/21/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
>> >> > True Fix(tm) is a longer story.
>> >> >
>> >> > Maybe let's sort out this patchset first, whichever way. :)
>> >> >
>> >>
>> >> So I found the discussion around the original patch with a perf
>> >> regression report.
>> >>
>> >> https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/
>> >>
>> >> The reporter suggests dodging the problem by only allocating per-cpu
>> >> counters when the process goes multithreaded. Given that there are
>> >> still plenty of forever-single-threaded procs out there, I think that
>> >> does sound like a great plan regardless of what happens with this
>> >> patchset.
>> >>
>> >> Almost all access is already done using dedicated routines, so this
>> >> should be an afternoon churn to sort out, unless I missed a
>> >> showstopper. (maybe there is no good place to stuff a flag/whatever
>> >> other indicator about the state of counters?)
>> >>
>> >> That said I'll look into it some time this or next week.
>> >
>> > Good, just let me know how it went, I also wanted to start looking into
>> > this to come up with some concrete patches :). What I had in mind was
>> > that
>> > we could use 'counters == NULL' as an indication that the counter is
>> > still
>> > in 'single counter mode'.
>> >
>>
>> In the current state there are only pointers to counters in mm_struct
>> and there is no storage for them in task_struct. So I don't think
>> merely null-checking the per-cpu stuff is going to cut it -- where
>> should the single-threaded counters land?
>
> I think you misunderstood. What I wanted to do it to provide a new flavor
> of percpu_counter (sharing most of code and definitions) which would have
> an option to start as simple counter (indicated by pcc->counters == NULL
> and using pcc->count for counting) and then be upgraded by a call to real
> percpu thing. Because I think such counters would be useful also on other
> occasions than as rss counters.
>

Indeed I did -- I had tunnel vision on dodging atomics for current in
the face of remote modifications, which won't happen in your proposal.

I concede your idea solves the problem at hand; I question whether it
is the right thing to do, though. Not my call to make.

>> Then for the single-threaded case an area is allocated for NR_MM_COUNTERS
>> counters * 2 -- the first set updated without any synchro by the current
>> thread. The second set is only to be modified by others and protected with
>> mm->arg_lock. The lock protects remote access to the union to begin
>> with.
>
> arg_lock seems a bit like a hack. How is it related to rss_stat? The scheme
> with two counters is clever but I'm not 100% convinced the complexity is
> really worth it. I'm not sure the overhead of always using an atomic
> counter would really be measurable as atomic counter ops in local CPU cache
> tend to be cheap. Did you try to measure the difference?
>

arg_lock would not stay as is, it would have to be renamed to something
more generic.

Atomics on x86-64 are very expensive to this very day. Here is a
sample measurement by someone else where 2 atomics show up:
https://lore.kernel.org/oe-lkp/202308141149.d38fdf91-oliver.sang@intel.com/T/#u

tl;dr it is *really* bad.

> If the second counter proves to be worth it, we could make just that one
> atomic to avoid the need for abusing some spinlock.
>

The spinlock would be there to synchronize against the transition to
per-cpu -- any trickery is avoided and we trivially know for a fact
the remote party either sees the per-cpu state if transitioned, or
the local one if not. Then one easily knows no updates have been lost
and the buffer for the 2 sets of counters can be safely freed.

While writing down the idea previously I did not realize the per-cpu
counter ops disable interrupts around the op. That's already very slow
and the trip should be comparable to paying for an atomic (as in, the
patch which introduced percpu counters here slowed things down for
single-threaded processes).

With your proposal the atomic would be there, but the interrupt trip
could be avoided. This would roughly maintain the current cost of doing
the op (as in, it would not get /worse/). My patch would make it lower.

All that said, I'm going to refrain from writing a patch for the time
being. If the powers that be decide on your approach, I'm not going to
argue -- I don't think either is a clear winner over the other.

-- 
Mateusz Guzik <mjguzik gmail.com>

Thread overview: 31+ messages
2023-08-21 20:28 [PATCH 0/2] execve scalability issues, part 1 Mateusz Guzik
2023-08-21 20:28 ` [PATCH 1/2] pcpcntr: add group allocation/free Mateusz Guzik
2023-08-22 13:37   ` Vegard Nossum
2023-08-22 14:06     ` Mateusz Guzik
2023-08-22 17:02   ` Dennis Zhou
2023-08-21 20:28 ` [PATCH 2/2] fork: group allocation of per-cpu counters for mm struct Mateusz Guzik
2023-08-21 21:20   ` Matthew Wilcox
2023-08-21 20:42 ` [PATCH 0/2] execve scalability issues, part 1 Matthew Wilcox
2023-08-21 20:44   ` [PATCH 1/7] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
2023-08-21 20:44     ` [PATCH 2/7] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
2023-08-21 20:44     ` [PATCH 3/7] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
2023-08-21 20:44     ` [PATCH 4/7] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
2023-08-21 20:44     ` [PATCH 5/7] memcg: Add mem_cgroup_uncharge_batch() Matthew Wilcox (Oracle)
2023-08-21 20:44     ` [PATCH 6/7] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
2023-08-21 20:44     ` [PATCH 7/7] mm: Use free_unref_folios() in put_pages_list() Matthew Wilcox (Oracle)
2023-08-21 21:07 ` [PATCH 0/2] execve scalability issues, part 1 Dennis Zhou
2023-08-21 21:39   ` Mateusz Guzik
2023-08-21 22:29     ` Mateusz Guzik
2023-08-22  9:51       ` Jan Kara
2023-08-22 14:24         ` Mateusz Guzik
2023-08-23  9:49           ` Jan Kara
2023-08-23 10:49             ` David Laight
2023-08-23 12:01               ` Mateusz Guzik
2023-08-23 12:13             ` Mateusz Guzik [this message]
2023-08-23 15:47               ` Jan Kara
2023-08-23 16:10                 ` Mateusz Guzik
2023-08-23 16:41                   ` Jan Kara
2023-08-23 17:12                     ` Mateusz Guzik
2023-08-23 20:27             ` Dennis Zhou
2023-08-24  9:19               ` Jan Kara
2023-08-26 18:33 ` Mateusz Guzik
