Date: Wed, 03 Jun 2020 16:03:09 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, hannes@cmpxchg.org, iamjoonsoo.kim@lge.com, linux-mm@kvack.org, mhocko@suse.com, minchan@kernel.org, mm-commits@vger.kernel.org, riel@surriel.com, torvalds@linux-foundation.org
Subject: [patch 115/131] mm: vmscan: reclaim writepage is IO cost
Message-ID: <20200603230309.JD8tPaup2%akpm@linux-foundation.org>
In-Reply-To: <20200603155549.e041363450869eaae4c7f05b@linux-foundation.org>
From: Johannes Weiner
Subject: mm: vmscan: reclaim writepage is IO cost

The VM tries to balance reclaim pressure between anon and file so as to
reduce the amount of IO incurred due to the memory shortage.  It already
counts refaults and swapins, but in addition it should also count
writepage calls during reclaim.

For swap, this is obvious: it's IO that wouldn't have occurred if the
anonymous memory hadn't been under memory pressure.  From a relative
balancing point of view this makes sense as well: even if anon is cold
and reclaimable, a cache that isn't thrashing may have equally cold pages
that don't require IO to reclaim.

For file writeback, it's trickier: some of the reclaim writepage IO would
have likely occurred anyway due to dirty expiration.  But not all of it -
premature writeback reduces batching and generates additional writes.
Since the flushers are already woken up by the time the VM starts writing
cache pages one by one, let's assume that we're likely causing writes
that wouldn't have happened without memory pressure.  In addition, the
per-page cost of IO would have probably been much cheaper if written in
larger batches from the flusher thread rather than the single-page writes
from kswapd.

For our purposes - getting the trend right to accelerate convergence on a
stable state that doesn't require paging at all - this is sufficiently
accurate.  If we later wanted to optimize for sustained thrashing, we can
still refine the measurements.

Count all writepage calls from kswapd as IO cost toward the LRU that the
page belongs to.

Why do this dynamically?  Don't we know in advance that anon pages
require IO to reclaim, and so could build in a static bias?

First, scanning is not the same as reclaiming.  If all the anon pages are
referenced, we may not swap for a while just because we're scanning the
anon list.  During this time, however, it's important that we age
anonymous memory and the page cache at the same rate so that their
hot-cold gradients are comparable.  Everything else being equal, we still
want to reclaim the coldest memory overall.

Second, we keep copies in swap unless the page changes.  If there is
swap-backed data that's mostly read (tmpfs file) and has been swapped out
before, we can reclaim it without incurring additional IO.
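To make the accounting concrete, here is a small stand-alone sketch of the
idea in plain user-space C.  It is not kernel code: note_cost(),
split_scan(), the decay threshold and the proportional split are all made
up for illustration.  Each IO event is charged to the list that caused it,
old events are decayed, and scan pressure is then split so that the list
which recently cost more IO is scanned less hard.

#include <stdio.h>

static unsigned long anon_cost, file_cost;

/* Charge reclaim IO (swapin, refault, reclaim writepage) to one LRU side. */
static void note_cost(int file, unsigned long nr_pages)
{
	if (file)
		file_cost += nr_pages;
	else
		anon_cost += nr_pages;

	/* Decay old events so that recent IO dominates; threshold is arbitrary. */
	if (anon_cost + file_cost > (1UL << 16)) {
		anon_cost /= 2;
		file_cost /= 2;
	}
}

/* Split a scan target between anon and file based on accumulated cost. */
static void split_scan(unsigned long nr_to_scan,
		       unsigned long *scan_anon, unsigned long *scan_file)
{
	unsigned long total = anon_cost + file_cost + 1;

	/* The side that recently caused more IO gets scanned less hard. */
	*scan_anon = nr_to_scan * (file_cost + 1) / total;
	*scan_file = nr_to_scan - *scan_anon;
}

int main(void)
{
	unsigned long scan_anon, scan_file;

	note_cost(0, 512);	/* swap writepage IO charged to anon */
	note_cost(1, 64);	/* dirty-cache writepage IO charged to file */

	split_scan(1024, &scan_anon, &scan_file);
	printf("scan anon=%lu file=%lu\n", scan_anon, scan_file);
	return 0;
}

In the kernel, lru_note_cost() plays the role of note_cost() above, and
the anon/file split is derived from lruvec->anon_cost and
lruvec->file_cost when get_scan_count() distributes scan pressure.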
Link: http://lkml.kernel.org/r/20200520232525.798933-14-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Rik van Riel
Signed-off-by: Andrew Morton
---

 include/linux/swap.h   |    4 +++-
 include/linux/vmstat.h |    1 +
 mm/swap.c              |   16 ++++++++++------
 mm/swap_state.c        |    2 +-
 mm/vmscan.c            |    3 +++
 mm/workingset.c        |    2 +-
 6 files changed, 19 insertions(+), 9 deletions(-)

--- a/include/linux/swap.h~mm-vmscan-reclaim-writepage-is-io-cost
+++ a/include/linux/swap.h
@@ -334,7 +334,9 @@ extern unsigned long nr_free_pagecache_p
 
 /* linux/mm/swap.c */
-extern void lru_note_cost(struct page *);
+extern void lru_note_cost(struct lruvec *lruvec, bool file,
+			  unsigned int nr_pages);
+extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			 struct lruvec *lruvec, struct list_head *head);
--- a/include/linux/vmstat.h~mm-vmscan-reclaim-writepage-is-io-cost
+++ a/include/linux/vmstat.h
@@ -26,6 +26,7 @@ struct reclaim_stat {
 	unsigned nr_congested;
 	unsigned nr_writeback;
 	unsigned nr_immediate;
+	unsigned nr_pageout;
 	unsigned nr_activate[2];
 	unsigned nr_ref_keep;
 	unsigned nr_unmap_fail;
--- a/mm/swap.c~mm-vmscan-reclaim-writepage-is-io-cost
+++ a/mm/swap.c
@@ -278,18 +278,16 @@ void rotate_reclaimable_page(struct page
 	}
 }
 
-void lru_note_cost(struct page *page)
+void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
 {
-	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-
 	do {
 		unsigned long lrusize;
 
 		/* Record cost event */
-		if (page_is_file_lru(page))
-			lruvec->file_cost++;
+		if (file)
+			lruvec->file_cost += nr_pages;
 		else
-			lruvec->anon_cost++;
+			lruvec->anon_cost += nr_pages;
 
 		/*
 		 * Decay previous events
@@ -311,6 +309,12 @@ void lru_note_cost(struct page *page)
 	} while ((lruvec = parent_lruvec(lruvec)));
 }
 
+void lru_note_cost_page(struct page *page)
+{
+	lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
+		      page_is_file_lru(page), hpage_nr_pages(page));
+}
+
 static void __activate_page(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
--- a/mm/swap_state.c~mm-vmscan-reclaim-writepage-is-io-cost
+++ a/mm/swap_state.c
@@ -442,7 +442,7 @@ struct page *__read_swap_cache_async(swp
 
 	/* XXX: Move to lru_cache_add() when it supports new vs putback */
 	spin_lock_irq(&page_pgdat(page)->lru_lock);
-	lru_note_cost(page);
+	lru_note_cost_page(page);
 	spin_unlock_irq(&page_pgdat(page)->lru_lock);
 
 	/* Caller will initiate read into locked page */
--- a/mm/vmscan.c~mm-vmscan-reclaim-writepage-is-io-cost
+++ a/mm/vmscan.c
@@ -1359,6 +1359,8 @@ static unsigned int shrink_page_list(str
 			case PAGE_ACTIVATE:
 				goto activate_locked;
 			case PAGE_SUCCESS:
+				stat->nr_pageout += hpage_nr_pages(page);
+
 				if (PageWriteback(page))
 					goto keep;
 				if (PageDirty(page))
@@ -1964,6 +1966,7 @@ shrink_inactive_list(unsigned long nr_to
 	move_pages_to_lru(lruvec, &page_list);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	lru_note_cost(lruvec, file, stat.nr_pageout);
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
--- a/mm/workingset.c~mm-vmscan-reclaim-writepage-is-io-cost
+++ a/mm/workingset.c
@@ -367,7 +367,7 @@ void workingset_refault(struct page *pag
 		SetPageWorkingset(page);
 		/* XXX: Move to lru_cache_add() when it supports new vs putback */
 		spin_lock_irq(&page_pgdat(page)->lru_lock);
-		lru_note_cost(page);
+		lru_note_cost_page(page);
 		spin_unlock_irq(&page_pgdat(page)->lru_lock);
 		inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
 	}
_