linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Aaron Lu <aaron.lu@intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	<linux-kernel@vger.kernel.org>, Olivier Dion <odion@efficios.com>,
	<michael.christie@oracle.com>
Subject: Re: [RFC PATCH v9 2/2] sched: Fix performance regression introduced by mm_cid
Date: Thu, 20 Apr 2023 22:39:10 +0800	[thread overview]
Message-ID: <20230420143910.GB154815@ziqianlu-desk2> (raw)
In-Reply-To: <20230420141852.GA154815@ziqianlu-desk2>

On Thu, Apr 20, 2023 at 10:18:52PM +0800, Aaron Lu wrote:
> On Thu, Apr 20, 2023 at 09:54:29AM -0400, Mathieu Desnoyers wrote:
> > On 2023-04-20 09:35, Aaron Lu wrote:
> > [...]
> > > > > > 
> > > > > > Then we clearly have another member of mm_struct on the same cache line as
> > > > > > pcpu_cid which is bouncing all over the place and causing false-sharing. Any
> > > > > > idea which field(s) are causing this ?
> > > > > 
> > > > > That's my first reaction too but as I said in an earlier reply:
> > > > > https://lore.kernel.org/lkml/20230419080606.GA4247@ziqianlu-desk2/
> > > > > I've tried to place pcpu_cid into a dedicate cacheline with no other
> > > > > fields sharing a cacheline with it in mm_struct but it didn't help...
> > > > 
> > > > I see two possible culprits there:
> > > > 
> > > > 1) The mm_struct pcpu_cid field is suffering from false-sharing. I would be
> > > >     interested to look at your attempt to move it to a separate cache line to
> > > >     try to figure out what is going on.
> > > 
> > > Brain damaged...my mistake, I only made sure its following fields not
> > > share the same cacheline but forgot to exclude its preceding fields and
> > > turned out it's one(some?) of the preceeding fields that caused false
> > > sharing. When I did:
> > > 
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 5eab61156f0e..a6f9d815991c 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -606,6 +606,7 @@ struct mm_struct {
> > >                   */
> > >                  atomic_t mm_count;
> > >   #ifdef CONFIG_SCHED_MM_CID
> > > +               CACHELINE_PADDING(_pad1_);
> > >                  /**
> > >                   * @pcpu_cid: Per-cpu current cid.
> > >                   *
> > > mm_cid_get() dropped to 0.0x% when running hackbench :-)
> > 
> > Now we are talking! :)
> > 
> > > 
> > > sched_mm_cid_migrate_to() is about 4% with most cycles spent on
> > > accessing mm->mm_users:
> > > 
> > >         │     dst_cid = READ_ONCE(dst_pcpu_cid->cid);
> > >    0.03 │       mov     0x8(%r12),%r15d
> > >         │     if (!mm_cid_is_unset(dst_cid) &&
> > >    0.07 │       cmp     $0xffffffff,%r15d
> > >         │     ↓ je      87
> > >         │     arch_atomic_read():
> > >         │     {
> > >         │     /*
> > >         │     * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> > >         │     * it's non-inlined function that increases binary size and stack usage.
> > >         │     */
> > >         │     return __READ_ONCE((v)->counter);
> > >   76.13 │       mov     0x54(%r13),%eax
> > >         │     sched_mm_cid_migrate_to():
> > >         │       cmp     %eax,0x410(%rdx)
> > >   21.71 │     ↓ jle     1d8
> > >         │     atomic_read(&mm->mm_users) >= t->nr_cpus_allowed)
> > > 
> > > With this info, it should be mm_users that caused false sharing for
> > > pcpu_cid previously. Looks like mm_users is bouncing.
> > 
> > I suspect that the culprit here is mm_count rather than mm_users. mm_users
> > just happens to share the same cache line as mm_count.
> > 
> > mm_count is incremented/decremented with mmgrab()/mmdrop() during
> > context switch.
> > 
> > This is likely causing other issues, for instance, the
> > membarrier_state field is AFAIR read-mostly, used for
> > membarrier_mm_sync_core_before_usermode() to issue core
> > sync before every return to usermode if needed.
> > 
> > Other things like mm_struct pgd pointer appear to be likely
> > read-mostly variables.
> > 
> > I suspect it's mm_count which should be moved to its own cache line
> > to eliminate false-sharing with all the other read-mostly fields
> > of mm_struct.
> > 
> > Thoughts ?
> 
> Makes sesne, I was wondering where the write side of mm_user is. Let me
> see how that goes by placing mm_count aside from other read mostly fields.

With the following naive padding for mm_count:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5eab61156f0e..866696e2d83e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -604,7 +604,9 @@ struct mm_struct {
                 * Use mmgrab()/mmdrop() to modify. When this drops to 0, the
                 * &struct mm_struct is freed.
                 */
+               CACHELINE_PADDING(_pad1_);
                atomic_t mm_count;
+               CACHELINE_PADDING(_pad2_);
 #ifdef CONFIG_SCHED_MM_CID
                /**
                 * @pcpu_cid: Per-cpu current cid.

mm_cid_get() is about 0.1% and sched_mm_cid_migrate_to() is about 0.2%
for hackbench on SPR :-)

Thanks,
Aaron

  reply	other threads:[~2023-04-20 14:39 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-19 15:50 [RFC PATCH v9 1/2] sched: Fix: Handle target_cpu offlining in active_load_balance_cpu_stop Mathieu Desnoyers
2023-04-19 15:50 ` [RFC PATCH v9 2/2] sched: Fix performance regression introduced by mm_cid Mathieu Desnoyers
2023-04-20  9:56   ` Aaron Lu
2023-04-20 12:41     ` Mathieu Desnoyers
2023-04-20 12:50       ` Aaron Lu
2023-04-20 13:10         ` Mathieu Desnoyers
2023-04-20 13:35           ` Aaron Lu
2023-04-20 13:54             ` Mathieu Desnoyers
2023-04-20 14:18               ` Aaron Lu
2023-04-20 14:39                 ` Aaron Lu [this message]
2023-04-20 14:41                   ` Mathieu Desnoyers
2023-04-20 14:35               ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230420143910.GB154815@ziqianlu-desk2 \
    --to=aaron.lu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=odion@efficios.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).