Linux-rt-users Archive on lore.kernel.org
 help / color / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Michal Hocko <mhocko@kernel.org>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock
Date: Fri, 9 Apr 2021 08:59:39 +0100
Message-ID: <20210409075939.GJ3697@techsingularity.net> (raw)
In-Reply-To: <YG/2scd9ADdrIyCM@hirez.programming.kicks-ass.net>

On Fri, Apr 09, 2021 at 08:39:45AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 08, 2021 at 06:42:44PM +0100, Mel Gorman wrote:
> > On Thu, Apr 08, 2021 at 12:52:07PM +0200, Peter Zijlstra wrote:
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index a68bacddcae0..e9e60d1a85d4 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -112,6 +112,13 @@ typedef int __bitwise fpi_t;
> > > >  static DEFINE_MUTEX(pcp_batch_high_lock);
> > > >  #define MIN_PERCPU_PAGELIST_FRACTION	(8)
> > > >  
> > > > +struct pagesets {
> > > > +	local_lock_t lock;
> > > > +};
> > > > +static DEFINE_PER_CPU(struct pagesets, pagesets) = {
> > > > +	.lock = INIT_LOCAL_LOCK(lock),
> > > > +};
> > > 
> > > So why isn't the local_lock_t in struct per_cpu_pages ? That seems to be
> > > the actual object that is protected by it and is already per-cpu.
> > > 
> > > Is that because you want to avoid the duplication across zones? Is that
> > > worth the effort?
> > 
> > When I wrote the patch, the problem was that zone_pcp_reset freed the
> > per_cpu_pages structure and it was "protected" by local_irq_save(). If
> > that was converted to local_lock_irq then the structure containing the
> > lock is freed before it is released which is obviously bad.
> > 
> > Much later when trying to make the allocator RT-safe in general, I realised
> > that locking was broken and fixed it in patch 3 of this series. With that,
> > the local_lock could potentially be embedded within per_cpu_pages safely
> > at the end of this series.
> 
> Fair enough; I was just wondering why the obvious solution wasn't chosen
> and neither changelog nor comment explain, so I had to ask :-)

It's a fair question and it was my first approach before I hit problems.
Thinking again this morning, I remembered that another problem I hit was
patterns like this

        local_lock_irqsave(&pagesets.lock, flags);
        pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);

turning into

	cpu = get_cpu();
        pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
        local_lock_irqsave(&pcp->lock, flags);

That has its own problems if zone->lock was acquired within the
local_lock_irqsave section (Section "spinlock_t and rwlock_t" in
Documentation/locking/locktypes.rst) so it has to turn into

	migrate_disable();
	pcp = this_cpu_ptr(zone->per_cpu_pageset);
        local_lock_irqsave(&pcp->lock, flags);

I did not want to start adding migrate_disable() in multiple places like
this because I'm guessing that new users of migrate_disable() need strong
justification and adding such code in page_alloc.c might cause cargo-cult
copy&paste in the future. Maybe it could be addressed with a helper like
this_cpu_local_lock or this_cpu_local_lock_irq but that means in some
cases that the PCP structure is looked up twice with patterns like this one

        local_lock_irqsave(&pagesets.lock, flags);
        free_unref_page_commit(page, pfn, migratetype);
        local_unlock_irqrestore(&pagesets.lock, flags);

To get around multiple lookups the helper becomes something that disables
migration, looks up the PCP structure, locks it and returns it with
pcp then passed around as appropriate. Not sure what I would call that
helper :P

In the end I just gave up and kept it simple as there is no benefit to
!PREEMPT_RT which just disables IRQs. Maybe it'll be worth considering when
PREEMPT_RT is upstream and can be enabled. The series was functionally
tested on the PREEMPT_RT tree by reverting the page_alloc.c patch and
applies this series and all of its prerequisites on top.

-- 
Mel Gorman
SUSE Labs

  reply index

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-07 20:24 [PATCH 0/11 v2] Use local_lock for pcp protection and reduce stat overhead Mel Gorman
2021-04-07 20:24 ` [PATCH 01/11] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-04-12 17:43   ` Vlastimil Babka
2021-04-13 13:27     ` Mel Gorman
2021-04-07 20:24 ` [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-04-08 10:52   ` Peter Zijlstra
2021-04-08 17:42     ` Mel Gorman
2021-04-09  6:39       ` Peter Zijlstra
2021-04-09  7:59         ` Mel Gorman [this message]
2021-04-09  8:24           ` Peter Zijlstra
2021-04-09 13:32             ` Mel Gorman
2021-04-09 18:55               ` Peter Zijlstra
2021-04-12 11:56                 ` Mel Gorman
2021-04-12 21:47                   ` Thomas Gleixner
2021-04-13 16:52                     ` Mel Gorman
2021-04-07 20:24 ` [PATCH 03/11] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove Mel Gorman
2021-04-07 20:24 ` [PATCH 04/11] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-04-14 12:56   ` Vlastimil Babka
2021-04-14 15:18     ` Mel Gorman
2021-04-14 15:56       ` Vlastimil Babka
2021-04-15 10:06         ` Mel Gorman
2021-04-07 20:24 ` [PATCH 05/11] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-04-07 20:24 ` [PATCH 06/11] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-04-07 20:24 ` [PATCH 07/11] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-04-07 20:24 ` [PATCH 08/11] mm/page_alloc: Remove duplicate checks if migratetype should be isolated Mel Gorman
2021-04-07 20:24 ` [PATCH 09/11] mm/page_alloc: Explicitly acquire the zone lock in __free_pages_ok Mel Gorman
2021-04-07 20:24 ` [PATCH 10/11] mm/page_alloc: Avoid conflating IRQs disabled with zone->lock Mel Gorman
2021-04-07 20:24 ` [PATCH 11/11] mm/page_alloc: Update PGFREE outside the zone lock in __free_pages_ok Mel Gorman
2021-04-08 10:56 ` [PATCH 0/11 v2] Use local_lock for pcp protection and reduce stat overhead Peter Zijlstra
2021-04-08 17:48   ` Mel Gorman
2021-04-14 13:39 [PATCH 0/11 v3] " Mel Gorman
2021-04-14 13:39 ` [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210409075939.GJ3697@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=brouer@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-rt-users Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git