linux-kernel.vger.kernel.org archive mirror
From: Daniel Vacek <neelx.g@gmail.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>, Hugh Dickins <hughd@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Thu, 5 Aug 2021 20:47:58 +0200	[thread overview]
Message-ID: <CAA7rmPFTBvO+=+Wh36nQPG5JsekXpDC6L0EuC7fA8ztovqn8iQ@mail.gmail.com> (raw)
In-Reply-To: <20210805140458.GF6464@techsingularity.net>

Hi Mel.

On Thu, Aug 5, 2021 at 4:48 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Aug 05, 2021 at 02:56:53PM +0200, Thomas Gleixner wrote:
> > On Wed, Aug 04 2021 at 15:23, Mel Gorman wrote:
> > Mel,
> >
> > > On Wed, Aug 04, 2021 at 03:42:25PM +0200, Vlastimil Babka wrote:
> > >> The idea was not build-time, but runtime (hidden behind lockdep, VM_DEBUG or
> > >> whatnot), i.e.:
> > >>
> > >> <sched_expert> what that code needs is switch(item) { case foo1: case foo2:
> > >> lockdep_assert_irqs_disabled(); break; case bar1: case bar2:
> > >> lockdep_assert_preempt_disabled(); lockdep_assert_no_in_irq(); break; } or
> > >> something along those lines
> > >>
> > > Ok, that would potentially work. It may not even need to split the stats
> > > into different enums. Simply document which stats need protection from
> > > IRQ or preemption and use PROVE_LOCKING to check if preemption or IRQs
> > > are disabled depending on the kernel config. I don't think it gets rid
> > > of preempt_disable_rt unless the API was completely reworked with entry
> > > points that describe the locking requirements. That would be tricky
> > > because the requirements differ between kernel configurations.
> >
> > Right. This won't get rid of the preempt disabling on RT, but I think we
> > should rather open code this
> >
> >        if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >                       preempt_dis/enable();
> >
> > instead of proliferating these helper macros which have only one user left.
> >
>
> Ok, that is reasonable. I tried creating a vmstat-specific helper but the
> names were misleading so I ended up with the patch below which open-codes
> it as you suggest. The comment is not accurate because "locking/local_lock:
> Add RT support" is not upstream but it'll eventually be accurate.
>
> Is this ok?
>
> --8<--
> From e5b7a2ffcf55e4b4030fd54e49f5c5a1d1864ebe Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 3 Jul 2009 08:30:13 -0500
> Subject: [PATCH] mm/vmstat: Protect per cpu variables with preempt disable on
>  RT
>
> Disable preemption on -RT for the vmstat code. On vanilla kernels the code
> runs in IRQ-off regions while on -RT it may not when stats are updated under
> a local_lock. "preempt_disable" ensures that the same resource is not
> updated in parallel due to preemption.
>
> This patch differs from the preempt-rt version where __count_vm_event and
> __count_vm_events are also protected. The counters are explicitly "allowed
> to be racy" so there is no need to protect them from preemption. Only
> the accurate page stats that are updated by a read-modify-write need
> protection. This patch also differs in that a preempt_[en|dis]able_rt
> helper is not used. As vmstat is the only user of the helper, it was
> suggested that it be open-coded in vmstat.c instead of risking the helper
> being used in unnecessary contexts.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/vmstat.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index b0534e068166..2c7e7569a453 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -319,6 +319,16 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>         long x;
>         long t;
>
> +       /*
> +        * Accurate vmstat updates require a RMW. On !PREEMPT_RT kernels,
> +        * atomicity is provided by IRQs being disabled -- either explicitly
> +        * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
> +        * CPU migrations and preemption potentially corrupts a counter so
> +        * disable preemption.
> +        */
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         x = delta + __this_cpu_read(*p);
>
>         t = __this_cpu_read(pcp->stat_threshold);
> @@ -328,6 +338,9 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>                 x = 0;
>         }
>         __this_cpu_write(*p, x);
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>  EXPORT_SYMBOL(__mod_zone_page_state);
>
> @@ -350,6 +363,10 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>                 delta >>= PAGE_SHIFT;
>         }
>
> +       /* See __mod_node_page_state */

__mod_zone_page_state?

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         x = delta + __this_cpu_read(*p);
>
>         t = __this_cpu_read(pcp->stat_threshold);
> @@ -359,6 +376,9 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>                 x = 0;
>         }
>         __this_cpu_write(*p, x);
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>  EXPORT_SYMBOL(__mod_node_page_state);
>
> @@ -391,6 +411,10 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>         s8 __percpu *p = pcp->vm_stat_diff + item;
>         s8 v, t;
>
> +       /* See __mod_node_page_state */

ditto.

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_inc_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v > t)) {
> @@ -399,6 +423,9 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>                 zone_page_state_add(v + overstep, zone, item);
>                 __this_cpu_write(*p, -overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -409,6 +436,10 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>
>         VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>
> +       /* See __mod_node_page_state */

""

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_inc_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v > t)) {
> @@ -417,6 +448,9 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>                 node_page_state_add(v + overstep, pgdat, item);
>                 __this_cpu_write(*p, -overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
> @@ -437,6 +471,10 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>         s8 __percpu *p = pcp->vm_stat_diff + item;
>         s8 v, t;
>
> +       /* See __mod_node_page_state */

...

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_dec_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v < - t)) {
> @@ -445,6 +483,9 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>                 zone_page_state_add(v - overstep, zone, item);
>                 __this_cpu_write(*p, overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -455,6 +496,10 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>
>         VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>
> +       /* See __mod_node_page_state */

and one more time here

--nX

> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
> +
>         v = __this_cpu_dec_return(*p);
>         t = __this_cpu_read(pcp->stat_threshold);
>         if (unlikely(v < - t)) {
> @@ -463,6 +508,9 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>                 node_page_state_add(v - overstep, pgdat, item);
>                 __this_cpu_write(*p, overstep);
>         }
> +
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>  }
>
>  void __dec_zone_page_state(struct page *page, enum zone_stat_item item)


Thread overview: 11+ messages
2021-07-23 10:00 [PATCH 0/2] Protect vmstats on PREEMPT_RT Mel Gorman
2021-07-23 10:00 ` [PATCH 1/2] preempt: Provide preempt_*_(no)rt variants Mel Gorman
2021-07-23 10:00 ` [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT Mel Gorman
2021-08-03 23:54   ` Thomas Gleixner
2021-08-04  9:54     ` Mel Gorman
2021-08-04 13:42       ` Vlastimil Babka
2021-08-04 14:23         ` Mel Gorman
2021-08-05 12:56           ` Thomas Gleixner
2021-08-05 14:04             ` Mel Gorman
2021-08-05 15:42               ` Thomas Gleixner
2021-08-05 18:47               ` Daniel Vacek [this message]
