From: Andrew Morton
To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 mhocko@suse.com, mm-commits@vger.kernel.org, osalvador@suse.de,
 torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 126/200] mm, page_alloc: simplify pageset_update()
Date: Mon, 14 Dec 2020 19:10:50 -0800
Message-ID: <20201215031050.rkdPr6y_m%akpm@linux-foundation.org>
In-Reply-To: <20201214190237.a17b70ae14f129e2dca3d204@linux-foundation.org>

From: Vlastimil Babka
Subject: mm, page_alloc: simplify pageset_update()

pageset_update() attempts to update pcplist's high and batch values in a way
that readers don't observe batch > high.  It uses smp_wmb() to order the
updates in a way to achieve this.  However, without properly paired read
barriers in the readers this guarantee doesn't hold, and there are no such
barriers in e.g. free_unref_page_commit().
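
[Editorial aside, not part of Vlastimil's patch: a minimal userspace sketch of
the problem described above, with C11 fences standing in for smp_wmb() and
plain globals standing in for pcp->high/pcp->batch.  All names and values
below are made up for illustration.]

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static unsigned long high  = 100;	/* stand-in for pcp->high  */
static unsigned long batch = 10;	/* stand-in for pcp->batch */

/* The old pageset_update() dance: order the stores with write barriers. */
static void *updater(void *arg)
{
	batch = 1;					/* fail-safe value */
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	high = 600;
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	batch = 150;
	return NULL;
}

/*
 * A reader in the style of free_unref_page_commit(): it loads high and then
 * batch with no read barrier in between.  Nothing stops the updater from
 * running entirely between the two loads (nor the loads from being
 * reordered), so this thread can pair the stale high (100) with the fresh
 * batch (150) and observe batch > high despite the carefully ordered stores.
 */
static void *reader(void *arg)
{
	unsigned long h = high;
	unsigned long b = batch;

	if (b > h)
		printf("observed batch %lu > high %lu\n", b, h);
	return NULL;
}

int main(void)
{
	pthread_t u, r;

	pthread_create(&u, NULL, updater, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(u, NULL);
	pthread_join(r, NULL);
	return 0;
}

The window is tiny and the plain racing accesses are themselves formally a
data race, so the program will rarely print anything, but it illustrates why
write barriers with no read-side counterpart buy the readers nothing.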
Commit 88e8ac11d2ea ("mm, page_alloc: fix core hung in free_pcppages_bulk()")
already showed this is problematic, and solved it by ultimately only trusting
pcp->count of the current cpu with interrupts disabled.

The update dance with unpaired write barriers thus makes no sense.  Replace
them with plain WRITE_ONCE to prevent store tearing, and document that the
values can change asynchronously and should not be trusted for correctness.

All current readers appear to be OK after 88e8ac11d2ea.  Convert them to
READ_ONCE to prevent unnecessary read tearing, but mainly to alert anybody
making future changes to the code that special care is needed.

Link: https://lkml.kernel.org/r/20201111092812.11329-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Acked-by: David Hildenbrand
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |   42 +++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-simplify-pageset_update
+++ a/mm/page_alloc.c
@@ -1344,7 +1344,7 @@ static void free_pcppages_bulk(struct zo
 {
 	int migratetype = 0;
 	int batch_free = 0;
-	int prefetch_nr = 0;
+	int prefetch_nr = READ_ONCE(pcp->batch);
 	bool isolated_pageblocks;
 	struct page *page, *tmp;
 	LIST_HEAD(head);
@@ -1395,8 +1395,10 @@ static void free_pcppages_bulk(struct zo
 		 * avoid excessive prefetching due to large count, only
 		 * prefetch buddy for the first pcp->batch nr of pages.
 		 */
-		if (prefetch_nr++ < pcp->batch)
+		if (prefetch_nr) {
 			prefetch_buddy(page);
+			prefetch_nr--;
+		}
 	} while (--count && --batch_free && !list_empty(list));
 }
 
@@ -3197,10 +3199,8 @@ static void free_unref_page_commit(struc
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
-	if (pcp->count >= pcp->high) {
-		unsigned long batch = READ_ONCE(pcp->batch);
-		free_pcppages_bulk(zone, batch, pcp);
-	}
+	if (pcp->count >= READ_ONCE(pcp->high))
+		free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp);
 }
 
 /*
@@ -3385,7 +3385,7 @@ static struct page *__rmqueue_pcplist(st
 	do {
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
+					READ_ONCE(pcp->batch), list,
 					migratetype, alloc_flags);
 			if (unlikely(list_empty(list)))
 				return NULL;
@@ -6270,13 +6270,16 @@ static int zone_batchsize(struct zone *z
 }
 
 /*
- * pcp->high and pcp->batch values are related and dependent on one another:
- * ->batch must never be higher then ->high.
- * The following function updates them in a safe manner without read side
- * locking.
- *
- * Any new users of pcp->batch and pcp->high should ensure they can cope with
- * those fields changing asynchronously (acording to the above rule).
+ * pcp->high and pcp->batch values are related and generally batch is lower
+ * than high. They are also related to pcp->count such that count is lower
+ * than high, and as soon as it reaches high, the pcplist is flushed.
+ *
+ * However, guaranteeing these relations at all times would require e.g. write
+ * barriers here but also careful usage of read barriers at the read side, and
+ * thus be prone to error and bad for performance. Thus the update only prevents
+ * store tearing. Any new users of pcp->batch and pcp->high should ensure they
+ * can cope with those fields changing asynchronously, and fully trust only the
+ * pcp->count field on the local CPU with interrupts disabled.
  *
  * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
  * outside of boot time (or some other assurance that no concurrent updaters
@@ -6285,15 +6288,8 @@ static int zone_batchsize(struct zone *z
 static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 		unsigned long batch)
 {
-	/* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
-
-	/* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
-
-	pcp->batch = batch;
+	WRITE_ONCE(pcp->batch, batch);
+	WRITE_ONCE(pcp->high, high);
 }
 
 static void pageset_init(struct per_cpu_pageset *p)
_
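
[Editorial aside, not part of the patch: for readers unfamiliar with
READ_ONCE()/WRITE_ONCE(), here is a standalone sketch of the pattern the patch
ends up with.  The macro definitions are simplified userspace stand-ins for
the kernel's linux/compiler.h versions, and struct pcp_like, update() and
free_one() are invented names.]

#include <stdio.h>

/*
 * Simplified, GCC-style stand-ins for the kernel macros: a single volatile
 * access keeps the compiler from tearing, fusing or re-reading the value,
 * but imposes no ordering at all.
 */
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))

/* Illustrative subset of struct per_cpu_pages. */
struct pcp_like {
	int count;		/* the only field a reader may fully trust,
				 * and only on the local CPU with IRQs off */
	unsigned long high;	/* advisory, may change asynchronously */
	unsigned long batch;	/* advisory, may change asynchronously */
};

/* Writer side after the patch: no barriers, just tear-free stores. */
static void update(struct pcp_like *pcp, unsigned long high, unsigned long batch)
{
	WRITE_ONCE(pcp->batch, batch);
	WRITE_ONCE(pcp->high, high);
}

/*
 * Reader side after the patch, in the style of free_unref_page_commit():
 * high and batch are each read once, tear-free, but only used as hints.
 */
static void free_one(struct pcp_like *pcp)
{
	pcp->count++;
	if (pcp->count >= READ_ONCE(pcp->high)) {
		unsigned long flush = READ_ONCE(pcp->batch);

		printf("flushing %lu pages (count was %d)\n", flush, pcp->count);
		pcp->count -= (int)flush;
	}
}

int main(void)
{
	struct pcp_like pcp = { .count = 0, .high = 6, .batch = 2 };

	update(&pcp, 4, 1);		/* lower both limits on the fly */
	for (int i = 0; i < 8; i++)
		free_one(&pcp);
	return 0;
}

The division of labour is the point of the patch: high and batch are hints
that may be stale, while any decision that has to be correct rests on
pcp->count, which only the owning CPU modifies with interrupts disabled.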