From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org,
    tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com,
    nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org,
    vbabka@suse.cz, cl@linux.com, paulmck@kernel.org, ppandit@redhat.com,
    Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [RFC 2/3] mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly
Date: Fri, 8 Oct 2021 18:19:21 +0200
Message-Id: <20211008161922.942459-3-nsaenzju@redhat.com>
In-Reply-To: <20211008161922.942459-1-nsaenzju@redhat.com>
References: <20211008161922.942459-1-nsaenzju@redhat.com>

In preparation for adding remote pcplists drain support, let's bundle
'struct per_cpu_pages' list heads and page count into a new structure,
'struct pcplists', and have all code access it indirectly through a
pointer. It'll be used by upcoming patches, which will maintain multiple
versions of pcplists and switch the pointer atomically.

free_pcppages_bulk() also gains a new argument, since we want to avoid
dereferencing the pcplists pointer twice per critical section (delimited
by the pagesets local lock).

'struct pcplists' data is marked as __private, to make sure nobody
accesses it directly, except for the initialization code. Note that the
backing 'struct pcplists' is embedded in 'struct per_cpu_pages' rather
than allocated dynamically, since 'struct per_cpu_pages' is already used
during boot, when no allocation is possible.
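To make the new layout easier to picture, here is a minimal user-space
sketch (not kernel code, and not part of this patch) of the indirection
described above; NR_PCP_LISTS, the list type and the helper names are
simplified stand-ins for the kernel definitions:

/*
 * Illustration only: a user-space model of the indirection this patch
 * introduces. NR_PCP_LISTS, the list type and the helpers are
 * simplified stand-ins for the kernel's definitions.
 */
#include <stdio.h>

#define NR_PCP_LISTS 4

struct pcplists {
	int count;                   /* pages currently on the lists */
	int lists[NR_PCP_LISTS];     /* stand-in for struct list_head */
};

struct per_cpu_pages {
	int high;
	int batch;
	struct pcplists *lp;         /* all accessors go through this */
	struct pcplists pcplists;    /* backing storage, "__private" */
};

static void per_cpu_pages_init(struct per_cpu_pages *pcp)
{
	/* Only init touches the embedded storage, as with ACCESS_PRIVATE(). */
	pcp->lp = &pcp->pcplists;
	pcp->lp->count = 0;
}

static void pcp_queue_page(struct per_cpu_pages *pcp, int pindex)
{
	struct pcplists *lp = pcp->lp;  /* dereference once per section */

	lp->lists[pindex]++;
	lp->count++;
}

int main(void)
{
	struct per_cpu_pages pcp = { .high = 8, .batch = 4 };

	per_cpu_pages_init(&pcp);
	pcp_queue_page(&pcp, 1);
	printf("count=%d\n", pcp.lp->count);
	return 0;
}

Because every reader now goes through pcp->lp, repointing it at a
different 'struct pcplists' instance becomes a single store, which is
what the remote-drain patches later in the series build on.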
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 include/linux/mmzone.h | 10 +++++--
 mm/page_alloc.c        | 66 +++++++++++++++++++++++++-----------------
 mm/vmstat.c            |  6 ++--
 3 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6a1d79d84675..fb023da9a181 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -358,7 +358,6 @@ enum zone_watermarks {
 
 /* Fields and list protected by pagesets local_lock in page_alloc.c */
 struct per_cpu_pages {
-	int count;		/* number of pages in the list */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
 	short free_factor;	/* batch scaling factor during free */
@@ -366,8 +365,13 @@ struct per_cpu_pages {
 	short expire;		/* When 0, remote pagesets are drained */
 #endif
 
-	/* Lists of pages, one per migrate type stored on the pcp-lists */
-	struct list_head lists[NR_PCP_LISTS];
+	struct pcplists *lp;
+	struct pcplists {
+		/* Number of pages in the lists */
+		int count;
+		/* Lists of pages, one per migrate type stored on the pcp-lists */
+		struct list_head lists[NR_PCP_LISTS];
+	} __private pcplists;
 };
 
 struct per_cpu_zonestat {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd89933503b4..842816f269da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1438,7 +1438,8 @@ static inline void prefetch_buddy(struct page *page)
  * pinned" detection logic.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
-					struct per_cpu_pages *pcp)
+					struct per_cpu_pages *pcp,
+					struct pcplists *lp)
 {
 	int pindex = 0;
 	int batch_free = 0;
@@ -1453,7 +1454,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	 * Ensure proper count is passed which otherwise would stuck in the
 	 * below while (list_empty(list)) loop.
 	 */
-	count = min(pcp->count, count);
+	count = min(lp->count, count);
 	while (count > 0) {
 		struct list_head *list;
 
@@ -1468,7 +1469,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free++;
 			if (++pindex == NR_PCP_LISTS)
 				pindex = 0;
-			list = &pcp->lists[pindex];
+			list = &lp->lists[pindex];
 		} while (list_empty(list));
 
 		/* This is the only non-empty list. Free them all. */
@@ -1508,7 +1509,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			}
 		} while (count > 0 && --batch_free && !list_empty(list));
 	}
-	pcp->count -= nr_freed;
+	lp->count -= nr_freed;
 
 	/*
 	 * local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3069,14 +3070,16 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
  */
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 {
+	struct pcplists *lp;
 	unsigned long flags;
 	int to_drain, batch;
 
 	local_lock_irqsave(&pagesets.lock, flags);
 	batch = READ_ONCE(pcp->batch);
-	to_drain = min(pcp->count, batch);
+	lp = pcp->lp;
+	to_drain = min(lp->count, batch);
 	if (to_drain > 0)
-		free_pcppages_bulk(zone, to_drain, pcp);
+		free_pcppages_bulk(zone, to_drain, pcp, lp);
 	local_unlock_irqrestore(&pagesets.lock, flags);
 }
 #endif
@@ -3092,12 +3095,14 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
 	unsigned long flags;
 	struct per_cpu_pages *pcp;
+	struct pcplists *lp;
 
 	local_lock_irqsave(&pagesets.lock, flags);
 
 	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	if (pcp->count)
-		free_pcppages_bulk(zone, pcp->count, pcp);
+	lp = pcp->lp;
+	if (lp->count)
+		free_pcppages_bulk(zone, lp->count, pcp, lp);
 
 	local_unlock_irqrestore(&pagesets.lock, flags);
 }
@@ -3158,7 +3163,7 @@ static void drain_local_pages_wq(struct work_struct *work)
  *
  * drain_all_pages() is optimized to only execute on cpus where pcplists are
  * not empty. The check for non-emptiness can however race with a free to
- * pcplist that has not yet increased the pcp->count from 0 to 1. Callers
+ * pcplist that has not yet increased the lp->count from 0 to 1. Callers
 * that need the guarantee that every CPU has drained can disable the
 * optimizing racy check.
 */
@@ -3200,21 +3205,22 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 		struct per_cpu_pages *pcp;
 		struct zone *z;
 		bool has_pcps = false;
+		struct pcplists *lp;
 
 		if (force_all_cpus) {
 			/*
-			 * The pcp.count check is racy, some callers need a
+			 * The lp->count check is racy, some callers need a
 			 * guarantee that no cpu is missed.
 			 */
 			has_pcps = true;
 		} else if (zone) {
-			pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-			if (pcp->count)
+			lp = per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp;
+			if (lp->count)
 				has_pcps = true;
 		} else {
 			for_each_populated_zone(z) {
-				pcp = per_cpu_ptr(z->per_cpu_pageset, cpu);
-				if (pcp->count) {
+				lp = per_cpu_ptr(z->per_cpu_pageset, cpu)->lp;
+				if (lp->count) {
 					has_pcps = true;
 					break;
 				}
@@ -3366,19 +3372,21 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn,
 {
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
+	struct pcplists *lp;
 	int high;
 	int pindex;
 
 	__count_vm_event(PGFREE);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
+	lp = pcp->lp;
 	pindex = order_to_pindex(migratetype, order);
-	list_add(&page->lru, &pcp->lists[pindex]);
-	pcp->count += 1 << order;
+	list_add(&page->lru, &lp->lists[pindex]);
+	lp->count += 1 << order;
 	high = nr_pcp_high(pcp, zone);
-	if (pcp->count >= high) {
+	if (lp->count >= high) {
 		int batch = READ_ONCE(pcp->batch);
 
-		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp);
+		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, lp);
 	}
 }
 
@@ -3603,9 +3611,11 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			struct per_cpu_pages *pcp)
 {
 	struct list_head *list;
+	struct pcplists *lp;
 	struct page *page;
 
-	list = &pcp->lists[order_to_pindex(migratetype, order)];
+	lp = pcp->lp;
+	list = &lp->lists[order_to_pindex(migratetype, order)];
 
 	do {
 		if (list_empty(list)) {
@@ -3625,14 +3635,14 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 					batch, list,
 					migratetype, alloc_flags);
 
-			pcp->count += alloced << order;
+			lp->count += alloced << order;
 			if (unlikely(list_empty(list)))
 				return NULL;
 		}
 
 		page = list_first_entry(list, struct page, lru);
 		list_del(&page->lru);
-		pcp->count -= 1 << order;
+		lp->count -= 1 << order;
 	} while (check_new_pcp(page));
 
 	return page;
@@ -5877,7 +5887,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			continue;
 
 		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
 	}
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
@@ -5971,7 +5981,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 		free_pcp = 0;
 		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
 
 		show_node(zone);
 		printk(KERN_CONT
@@ -6012,7 +6022,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(zone_page_state(zone, NR_MLOCK)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->per_cpu_pageset->count)),
+			K(this_cpu_read(zone->per_cpu_pageset)->lp->count),
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
@@ -6848,7 +6858,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 
 /*
  * pcp->high and pcp->batch values are related and generally batch is lower
- * than high. They are also related to pcp->count such that count is lower
+ * than high. They are also related to pcp->lp->count such that count is lower
 * than high, and as soon as it reaches high, the pcplist is flushed.
 *
 * However, guaranteeing these relations at all times would require e.g. write
@@ -6856,7 +6866,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 * thus be prone to error and bad for performance. Thus the update only prevents
 * store tearing. Any new users of pcp->batch and pcp->high should ensure they
 * can cope with those fields changing asynchronously, and fully trust only the
- * pcp->count field on the local CPU with interrupts disabled.
+ * pcp->lp->count field on the local CPU with interrupts disabled.
 *
 * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
 * outside of boot time (or some other assurance that no concurrent updaters
@@ -6876,8 +6886,10 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 	memset(pcp, 0, sizeof(*pcp));
 	memset(pzstats, 0, sizeof(*pzstats));
 
+	pcp->lp = &ACCESS_PRIVATE(pcp, pcplists);
+
 	for (pindex = 0; pindex < NR_PCP_LISTS; pindex++)
-		INIT_LIST_HEAD(&pcp->lists[pindex]);
+		INIT_LIST_HEAD(&pcp->lp->lists[pindex]);
 
 	/*
 	 * Set batch and high values safe for a boot pageset. A true percpu
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ce2620344b2..5279d3f34e0b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -856,7 +856,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			 * if not then there is nothing to expire.
 			 */
 			if (!__this_cpu_read(pcp->expire) ||
-			    !__this_cpu_read(pcp->count))
+			    !this_cpu_ptr(pcp)->lp->count)
 				continue;
 
 			/*
@@ -870,7 +870,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			if (__this_cpu_dec_return(pcp->expire))
 				continue;
 
-			if (__this_cpu_read(pcp->count)) {
+			if (this_cpu_ptr(pcp)->lp->count) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
 				changes++;
 			}
@@ -1707,7 +1707,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 			   "\n              high:  %i"
 			   "\n              batch: %i",
 			   i,
-			   pcp->count,
+			   pcp->lp->count,
 			   pcp->high,
 			   pcp->batch);
 #ifdef CONFIG_SMP
-- 
2.31.1
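The commit message says upcoming patches will maintain multiple versions
of pcplists and switch the pointer atomically so another CPU can drain
the detached set. Purely as an illustration of that idea (the
double-buffer layout, helper name and GCC __atomic builtins below are
assumptions, not the mechanism the series actually uses):

/*
 * Illustration only: double-buffered pcplists with an atomic pointer
 * switch, so the detached copy can be drained by another CPU. Names,
 * layout and the __atomic builtins are assumptions, not this series'
 * actual mechanism.
 */
#include <stdio.h>

struct pcplists {
	int count;                      /* pages queued on this instance */
};

struct per_cpu_pages {
	struct pcplists *lp;            /* what the hot path dereferences */
	struct pcplists pcplists[2];    /* two generations of lists */
};

/* Publish the spare instance and take ownership of the old one. */
static struct pcplists *pcp_switch_lists(struct per_cpu_pages *pcp,
					 struct pcplists *spare)
{
	return __atomic_exchange_n(&pcp->lp, spare, __ATOMIC_ACQ_REL);
}

int main(void)
{
	struct per_cpu_pages pcp = { .lp = &pcp.pcplists[0] };
	struct pcplists *old;

	pcp.lp->count = 3;      /* pages queued by the local CPU */
	old = pcp_switch_lists(&pcp, &pcp.pcplists[1]);
	printf("drain %d pages from the old lists; new count=%d\n",
	       old->count, pcp.lp->count);
	return 0;
}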