From: Daniel Vacek <neelx.g@gmail.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>, Hugh Dickins <hughd@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Thu, 5 Aug 2021 20:47:58 +0200	[thread overview]
Message-ID: <CAA7rmPFTBvO+=+Wh36nQPG5JsekXpDC6L0EuC7fA8ztovqn8iQ@mail.gmail.com> (raw)
In-Reply-To: <20210805140458.GF6464@techsingularity.net>

Hi Mel.

On Thu, Aug 5, 2021 at 4:48 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Aug 05, 2021 at 02:56:53PM +0200, Thomas Gleixner wrote:
> > On Wed, Aug 04 2021 at 15:23, Mel Gorman wrote:
> > Mel,
> >
> > > On Wed, Aug 04, 2021 at 03:42:25PM +0200, Vlastimil Babka wrote:
> > >> The idea was not build-time, but runtime (hidden behind lockdep, VM_DEBUG or
> > >> whatnot), i.e.:
> > >>
> > >> <sched_expert> what that code needs is switch(item) { case foo1: case foo2:
> > >> lockdep_assert_irqs_disabled(); break; case bar1: case bar2:
> > >> lockdep_assert_preempt_disabled(); lockdep_assert_no_in_irq(); break; } or
> > >> something along those lines
> > >>
> > > Ok, that would potentially work. It may not even need to split the stats
> > > into different enums. Simply document which stats need protection from
> > > IRQ or preemption and use PROVE_LOCKING to check if preemption or IRQs
> > > are disabled depending on the kernel config. I don't think it gets rid
> > > of preempt_disable_rt unless the API was completely reworked with entry
> > > points that describe the locking requirements. That would be tricky
> > > because the requirements differ between kernel configurations.
> >
> > Right. This won't get rid of the preempt disabling on RT, but I think we
> > should rather open code this
> >
> >        if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >                       preempt_dis/enable();
> >
> > instead of proliferating these helper macros which have only one user left.
> >
>
> Ok, that is reasonable. I tried creating a vmstat-specific helper but the
> names were misleading so I ended up with the patch below which open-codes
> it as you suggest. The comment is not accurate because "locking/local_lock:
> Add RT support" is not upstream but it'll eventually be accurate.
>
> Is this ok?
>
> --8<--
> From e5b7a2ffcf55e4b4030fd54e49f5c5a1d1864ebe Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 3 Jul 2009 08:30:13 -0500
> Subject: [PATCH] mm/vmstat: Protect per cpu variables with preempt disable on
>  RT
>
> Disable preemption on -RT for the vmstat code. On vanilla the code runs
> in IRQ-off regions while on -RT it may not when stats are updated under
> a local_lock. "preempt_disable" ensures that the same resource is not
> updated in parallel due to preemption.
>
> This patch differs from the preempt-rt version where __count_vm_event and
> __count_vm_events are also protected. The counters are explicitly "allowed
> to be racy" so there is no need to protect them from preemption. Only
> the accurate page stats that are updated by a read-modify-write need
> protection. This patch also differs in that a preempt_[en|dis]able_rt
> helper is not used. As vmstat is the only user of the helper, it was
> suggested that it be open-coded in vmstat.c instead of risking the helper
> being used in unnecessary contexts.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/vmstat.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b0534e068166..2c7e7569a453 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -319,6 +319,16 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>         long x;
>         long t;
>
> +       /*
> +        * Accurate vmstat updates require an RMW. On !PREEMPT_RT kernels,
> +        * atomicity is provided by IRQs being disabled -- either explicitly
> +        * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
> +        * CPU migrations and preemption potentially corrupts a counter so
> +        * disable preemption.
> +        */
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         x = delta + __this_cpu_read(*p);
>
>         t = __this_cpu_read(pcp->stat_threshold);
> @@ -328,6 +338,9 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>                 x = 0;
>         }
>         __this_cpu_write(*p, x);
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>  EXPORT_SYMBOL(__mod_zone_page_state);
>
> @@ -350,6 +363,10 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>                 delta >>= PAGE_SHIFT;
>         }
>
> +       /* See __mod_node_page_state */

__mod_zone_page_state?

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         x = delta + __this_cpu_read(*p);
>
>         t = __this_cpu_read(pcp->stat_threshold);
> @@ -359,6 +376,9 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>                 x = 0;
>         }
>         __this_cpu_write(*p, x);
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>  EXPORT_SYMBOL(__mod_node_page_state);
>
> @@ -391,6 +411,10 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>         s8 __percpu *p = pcp->vm_stat_diff + item;
>         s8 v, t;
>
> +       /* See __mod_node_page_state */

ditto.

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_inc_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v > t)) {
> @@ -399,6 +423,9 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>                 zone_page_state_add(v + overstep, zone, item);
>                 __this_cpu_write(*p, -overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -409,6 +436,10 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>
>         VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>
> +       /* See __mod_node_page_state */

""

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_inc_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v > t)) {
> @@ -417,6 +448,9 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>                 node_page_state_add(v + overstep, pgdat, item);
>                 __this_cpu_write(*p, -overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
> @@ -437,6 +471,10 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>         s8 __percpu *p = pcp->vm_stat_diff + item;
>         s8 v, t;
>
> +       /* See __mod_node_page_state */

...

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_dec_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v < - t)) {
> @@ -445,6 +483,9 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>                 zone_page_state_add(v - overstep, zone, item);
>                 __this_cpu_write(*p, overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -455,6 +496,10 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>
>         VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>
> +       /* See __mod_node_page_state */

and one more time here

--nX

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_dec_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v < - t)) {
> @@ -463,6 +508,9 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>                 node_page_state_add(v - overstep, pgdat, item);
>                 __this_cpu_write(*p, overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __dec_zone_page_state(struct page *page, enum zone_stat_item item)

