From patchwork Wed May 20 23:25:12 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245269
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 01/14] mm: fix LRU balancing effect of new transparent huge pages
Date: Wed, 20 May 2020 19:25:12 -0400
Message-Id: <20200520232525.798933-2-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

Currently, THP are counted as single pages until they are split right
before being swapped out. However, at that point the VM is already in
the middle of reclaim, and adjusting the LRU balance then is useless.

Always account THP by the number of basepages, and remove the fixup
from the splitting path.

Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Acked-by: Michal Hocko
Acked-by: Minchan Kim
Reviewed-by: Shakeel Butt
---
 mm/swap.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index bf9a79fed62d..68eae1e2787a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -263,13 +263,14 @@ void rotate_reclaimable_page(struct page *page)
 }
 
 static void update_page_reclaim_stat(struct lruvec *lruvec,
-				     int file, int rotated)
+				     int file, int rotated,
+				     unsigned int nr_pages)
 {
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 
-	reclaim_stat->recent_scanned[file]++;
+	reclaim_stat->recent_scanned[file] += nr_pages;
 	if (rotated)
-		reclaim_stat->recent_rotated[file]++;
+		reclaim_stat->recent_rotated[file] += nr_pages;
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec,
@@ -286,7 +287,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 		trace_mm_lru_activate(page);
 
 		__count_vm_event(PGACTIVATE);
-		update_page_reclaim_stat(lruvec, file, 1);
+		update_page_reclaim_stat(lruvec, file, 1, hpage_nr_pages(page));
 	}
 }
 
@@ -541,7 +542,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 
 	if (active)
 		__count_vm_event(PGDEACTIVATE);
-	update_page_reclaim_stat(lruvec, file, 0);
+	update_page_reclaim_stat(lruvec, file, 0, hpage_nr_pages(page));
 }
 
 static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
@@ -557,7 +558,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
 		add_page_to_lru_list(page, lruvec, lru);
 
 		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
-		update_page_reclaim_stat(lruvec, file, 0);
+		update_page_reclaim_stat(lruvec, file, 0, hpage_nr_pages(page));
 	}
 }
 
@@ -582,7 +583,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 
 		__count_vm_events(PGLAZYFREE, hpage_nr_pages(page));
 		count_memcg_page_event(page, PGLAZYFREE);
-		update_page_reclaim_stat(lruvec, 1, 0);
+		update_page_reclaim_stat(lruvec, 1, 0, hpage_nr_pages(page));
 	}
 }
 
@@ -890,8 +891,6 @@ EXPORT_SYMBOL(__pagevec_release);
 void lru_add_page_tail(struct page *page, struct page *page_tail,
 		       struct lruvec *lruvec, struct list_head *list)
 {
-	const int file = 0;
-
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
 	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
@@ -917,9 +916,6 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 		add_page_to_lru_list_tail(page_tail, lruvec,
 					  page_lru(page_tail));
 	}
-
-	if (!PageUnevictable(page))
-		update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -962,8 +958,9 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 
 	if (page_evictable(page)) {
 		lru = page_lru(page);
-		update_page_reclaim_stat(lruvec, page_is_file_lru(page),
-					 PageActive(page));
+		update_page_reclaim_stat(lruvec, is_file_lru(lru),
+					 PageActive(page),
+					 hpage_nr_pages(page));
 		if (was_unevictable)
 			count_vm_event(UNEVICTABLE_PGRESCUED);
 	} else {
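To illustrate the accounting change above (worked example, not part of the
patch): hpage_nr_pages() reports the number of base pages backing the page,
so on x86-64 with 4kB base pages

	base page:	recent_scanned[file] += 1
	2MB THP:	recent_scanned[file] += 512	(HPAGE_PMD_NR)

and a huge page now shifts the LRU balance by the amount of memory it
actually occupies, rather than counting as a single page until the split.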
From patchwork Wed May 20 23:25:13 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245256
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 02/14] mm: keep separate anon and file statistics on page reclaim activity
Date: Wed, 20 May 2020 19:25:13 -0400
Message-Id: <20200520232525.798933-3-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

Having statistics on pages scanned and pages reclaimed for both anon
and file pages makes it easier to evaluate changes to LRU balancing.

While at it, clean up the stat-keeping mess for isolation, putback,
reclaim stats etc. a bit: first the physical LRU operation (isolation
and putback), followed by vmstats, reclaim_stats, and then vm events.

Signed-off-by: Johannes Weiner
---
 include/linux/vm_event_item.h |  4 ++++
 mm/vmscan.c                   | 17 +++++++++--------
 mm/vmstat.c                   |  4 ++++
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ffef0f279747..24fc7c3ae7d6 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -35,6 +35,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		PGSCAN_KSWAPD,
 		PGSCAN_DIRECT,
 		PGSCAN_DIRECT_THROTTLE,
+		PGSCAN_ANON,
+		PGSCAN_FILE,
+		PGSTEAL_ANON,
+		PGSTEAL_FILE,
 #ifdef CONFIG_NUMA
 		PGSCAN_ZONE_RECLAIM_FAILED,
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cc555903a332..70b0e2c6a4b9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1913,7 +1913,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	unsigned int nr_reclaimed = 0;
 	unsigned long nr_taken;
 	struct reclaim_stat stat;
-	int file = is_file_lru(lru);
+	bool file = is_file_lru(lru);
 	enum vm_event_item item;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
@@ -1941,11 +1941,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
-
 	item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
+	__count_vm_events(PGSCAN_ANON + file, nr_scanned);
+
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	if (nr_taken == 0)
@@ -1956,16 +1957,16 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 	spin_lock_irq(&pgdat->lru_lock);
 
+	move_pages_to_lru(lruvec, &page_list);
+
+	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	reclaim_stat->recent_rotated[0] += stat.nr_activate[0];
+	reclaim_stat->recent_rotated[1] += stat.nr_activate[1];
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
-	reclaim_stat->recent_rotated[0] += stat.nr_activate[0];
-	reclaim_stat->recent_rotated[1] += stat.nr_activate[1];
-
-	move_pages_to_lru(lruvec, &page_list);
-
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
 
 	spin_unlock_irq(&pgdat->lru_lock);
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4b0c90e4de71..3d06bd89d9ec 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1201,6 +1201,10 @@ const char * const vmstat_text[] = {
 	"pgscan_kswapd",
 	"pgscan_direct",
 	"pgscan_direct_throttle",
+	"pgscan_anon",
+	"pgscan_file",
+	"pgsteal_anon",
+	"pgsteal_file",
 #ifdef CONFIG_NUMA
 	"zone_reclaim_failed",
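For evaluating LRU balancing changes, the counters added above appear in
/proc/vmstat next to the existing pgscan/pgsteal entries. A minimal
userspace sketch (illustration only, not part of the patch) that samples
them:

	#include <stdio.h>
	#include <string.h>

	/* Print the per-type scan/steal counters introduced by this patch. */
	int main(void)
	{
		static const char *keys[] = {
			"pgscan_anon", "pgscan_file",
			"pgsteal_anon", "pgsteal_file",
		};
		char name[64];
		unsigned long long val;
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f) {
			perror("/proc/vmstat");
			return 1;
		}
		while (fscanf(f, "%63s %llu", name, &val) == 2) {
			for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
				if (!strcmp(name, keys[i]))
					printf("%-12s %llu\n", name, val);
		}
		fclose(f);
		return 0;
	}

Sampling the anon/file split of pgscan and pgsteal before and after a
change makes shifts in reclaim pressure between the two LRUs directly
visible.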
From patchwork Wed May 20 23:25:14 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245267
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 03/14] mm: allow swappiness that prefers reclaiming anon over the file workingset
Date: Wed, 20 May 2020 19:25:14 -0400
Message-Id: <20200520232525.798933-4-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

With the advent of fast random IO devices (SSDs, PMEM) and in-memory
swap devices such as zswap, it's possible for swap to be much faster
than filesystems, and for swapping to be preferable over thrashing
filesystem caches.

Allow setting swappiness - which defines the rough relative IO cost of
cache misses between page cache and swap-backed pages - to reflect
such situations by making the swap-preferred range configurable.

v2: clarify how to calculate swappiness (Minchan Kim)

Signed-off-by: Johannes Weiner
---
 Documentation/admin-guide/sysctl/vm.rst | 23 ++++++++++++++++++-----
 kernel/sysctl.c                         |  3 ++-
 mm/vmscan.c                             |  2 +-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 0329a4d3fa9e..d46d5b7013c6 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -831,14 +831,27 @@ When page allocation performance is not a bottleneck and you want all
 swappiness
 ==========
 
-This control is used to define how aggressive the kernel will swap
-memory pages. Higher values will increase aggressiveness, lower values
-decrease the amount of swap. A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
+This control is used to define the rough relative IO cost of swapping
+and filesystem paging, as a value between 0 and 200. At 100, the VM
+assumes equal IO cost and will thus apply memory pressure to the page
+cache and swap-backed pages equally; lower values signify more
+expensive swap IO, higher values indicates cheaper.
+
+Keep in mind that filesystem IO patterns under memory pressure tend to
+be more efficient than swap's random IO. An optimal value will require
+experimentation and will also be workload-dependent.
 
 The default value is 60.
 
+For in-memory swap, like zram or zswap, as well as hybrid setups that
+have swap on faster devices than the filesystem, values beyond 100 can
+be considered. For example, if the random IO against the swap device
+is on average 2x faster than IO from the filesystem, swappiness should
+be 133 (x + 2x = 200, 2x = 133.33).
+
+At 0, the kernel will not initiate swap until the amount of free and
+file-backed pages is less than the high watermark in a zone.
+
 unprivileged_userfaultfd
 ========================
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..7f15d292e44c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -131,6 +131,7 @@ static unsigned long zero_ul;
 static unsigned long one_ul = 1;
 static unsigned long long_max = LONG_MAX;
 static int one_hundred = 100;
+static int two_hundred = 200;
 static int one_thousand = 1000;
 #ifdef CONFIG_PRINTK
 static int ten_thousand = 10000;
@@ -1391,7 +1392,7 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
-		.extra2		= &one_hundred,
+		.extra2		= &two_hundred,
 	},
 #ifdef CONFIG_HUGETLB_PAGE
 	{
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 70b0e2c6a4b9..43f88b1a4f14 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -161,7 +161,7 @@ struct scan_control {
 #endif
 
 /*
- * From 0 .. 100. Higher means more swappy.
+ * From 0 .. 200. Higher means more swappy.
 */
 int vm_swappiness = 60;
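One way to read the documentation example above (illustration only, not
part of the patch): treat swappiness as the share of the 0..200 range
assigned to the anon side, derived from the relative per-page IO costs of
the two types of reclaim:

	#include <stdio.h>

	/*
	 * swappiness / (200 - swappiness) should mirror file_cost / swap_cost,
	 * i.e. swappiness = 200 * file_cost / (file_cost + swap_cost).
	 */
	static int suggested_swappiness(double swap_cost, double file_cost)
	{
		return (int)(200.0 * file_cost / (file_cost + swap_cost) + 0.5);
	}

	int main(void)
	{
		/* Doc example: swap IO is 2x faster, so file IO costs twice as much. */
		printf("%d\n", suggested_swappiness(1.0, 2.0));	/* prints 133 */
		return 0;
	}

With equal costs this yields 100, and with swap twice as fast it reproduces
the 133 from the example in vm.rst.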
From patchwork Wed May 20 23:25:15 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245268
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 04/14] mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()
Date: Wed, 20 May 2020 19:25:15 -0400
Message-Id: <20200520232525.798933-5-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

They're the same function, and for the purpose of all callers they are
equivalent to lru_cache_add().

Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Acked-by: Michal Hocko
Acked-by: Minchan Kim
---
 fs/cifs/file.c       | 10 +++++-----
 fs/fuse/dev.c        |  2 +-
 include/linux/swap.h |  2 --
 mm/khugepaged.c      |  8 ++------
 mm/memory.c          |  2 +-
 mm/shmem.c           |  6 +++---
 mm/swap.c            | 38 ++++++++------------------------------
 mm/swap_state.c      |  2 +-
 8 files changed, 21 insertions(+), 49 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0b1528edebcf..169ab37bb3aa 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4162,7 +4162,7 @@ cifs_readv_complete(struct work_struct *work)
 	for (i = 0; i < rdata->nr_pages; i++) {
 		struct page *page = rdata->pages[i];
 
-		lru_cache_add_file(page);
+		lru_cache_add(page);
 
 		if (rdata->result == 0 ||
 		    (rdata->result == -EAGAIN && got_bytes)) {
@@ -4232,7 +4232,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
			 * fill them until the writes are flushed.
*/ zero_user(page, 0, PAGE_SIZE); - lru_cache_add_file(page); + lru_cache_add(page); flush_dcache_page(page); SetPageUptodate(page); unlock_page(page); @@ -4242,7 +4242,7 @@ readpages_fill_pages(struct TCP_Server_Info *server, continue; } else { /* no need to hold page hostage */ - lru_cache_add_file(page); + lru_cache_add(page); unlock_page(page); put_page(page); rdata->pages[i] = NULL; @@ -4437,7 +4437,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, /* best to give up if we're out of mem */ list_for_each_entry_safe(page, tpage, &tmplist, lru) { list_del(&page->lru); - lru_cache_add_file(page); + lru_cache_add(page); unlock_page(page); put_page(page); } @@ -4475,7 +4475,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping, add_credits_and_wake_if(server, &rdata->credits, 0); for (i = 0; i < rdata->nr_pages; i++) { page = rdata->pages[i]; - lru_cache_add_file(page); + lru_cache_add(page); unlock_page(page); put_page(page); } diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 97eec7522bf2..8c0cc79d8071 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -840,7 +840,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) get_page(newpage); if (!(buf->flags & PIPE_BUF_FLAG_LRU)) - lru_cache_add_file(newpage); + lru_cache_add(newpage); err = 0; spin_lock(&cs->req->waitq.lock); diff --git a/include/linux/swap.h b/include/linux/swap.h index b42fb47d8cbe..30fd4641890e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -334,8 +334,6 @@ extern unsigned long nr_free_pagecache_pages(void); /* linux/mm/swap.c */ extern void lru_cache_add(struct page *); -extern void lru_cache_add_anon(struct page *page); -extern void lru_cache_add_file(struct page *page); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); extern void activate_page(struct page *); diff --git a/mm/khugepaged.c b/mm/khugepaged.c index f2e0a5e5cfbb..d458c61d6195 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1869,13 +1869,9 @@ static void collapse_file(struct mm_struct *mm, SetPageUptodate(new_page); page_ref_add(new_page, HPAGE_PMD_NR - 1); - - if (is_shmem) { + if (is_shmem) set_page_dirty(new_page); - lru_cache_add_anon(new_page); - } else { - lru_cache_add_file(new_page); - } + lru_cache_add(new_page); /* * Remove pte page tables, so we can re-fault the page as huge. 
diff --git a/mm/memory.c b/mm/memory.c index b4efc125ae09..a1813ca388a8 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3139,7 +3139,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (err) goto out_page; - lru_cache_add_anon(page); + lru_cache_add(page); swap_readpage(page, true); } } else { diff --git a/mm/shmem.c b/mm/shmem.c index e83de27ce8f4..ea95a3e46fbb 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1609,7 +1609,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp, */ oldpage = newpage; } else { - lru_cache_add_anon(newpage); + lru_cache_add(newpage); *pagep = newpage; } @@ -1860,7 +1860,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, charge_mm); if (error) goto unacct; - lru_cache_add_anon(page); + lru_cache_add(page); spin_lock_irq(&info->lock); info->alloced += compound_nr(page); @@ -2376,7 +2376,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, if (!pte_none(*dst_pte)) goto out_release_unlock; - lru_cache_add_anon(page); + lru_cache_add(page); spin_lock_irq(&info->lock); info->alloced++; diff --git a/mm/swap.c b/mm/swap.c index 68eae1e2787a..c93a6c84464c 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -403,35 +403,6 @@ void mark_page_accessed(struct page *page) } EXPORT_SYMBOL(mark_page_accessed); -static void __lru_cache_add(struct page *page) -{ - struct pagevec *pvec = &get_cpu_var(lru_add_pvec); - - get_page(page); - if (!pagevec_add(pvec, page) || PageCompound(page)) - __pagevec_lru_add(pvec); - put_cpu_var(lru_add_pvec); -} - -/** - * lru_cache_add_anon - add a page to the page lists - * @page: the page to add - */ -void lru_cache_add_anon(struct page *page) -{ - if (PageActive(page)) - ClearPageActive(page); - __lru_cache_add(page); -} - -void lru_cache_add_file(struct page *page) -{ - if (PageActive(page)) - ClearPageActive(page); - __lru_cache_add(page); -} -EXPORT_SYMBOL(lru_cache_add_file); - /** * lru_cache_add - add a page to a page list * @page: the page to be added to the LRU. 
@@ -443,10 +414,17 @@ EXPORT_SYMBOL(lru_cache_add_file);
 */
 void lru_cache_add(struct page *page)
 {
+	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+
 	VM_BUG_ON_PAGE(PageActive(page) && PageUnevictable(page), page);
 	VM_BUG_ON_PAGE(PageLRU(page), page);
-	__lru_cache_add(page);
+
+	get_page(page);
+	if (!pagevec_add(pvec, page) || PageCompound(page))
+		__pagevec_lru_add(pvec);
+	put_cpu_var(lru_add_pvec);
 }
+EXPORT_SYMBOL(lru_cache_add);
 
 /**
  * lru_cache_add_active_or_unevictable
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 17123caec6dd..b5e08ff00e1e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -432,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 	/* Initiate read into locked page */
 	SetPageWorkingset(page);
-	lru_cache_add_anon(page);
+	lru_cache_add(page);
 	*new_page_allocated = true;
 	return page;

From patchwork Wed May 20 23:25:16 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245257
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 05/14] mm: workingset: let cache workingset challenge anon
Date: Wed, 20 May 2020 19:25:16 -0400
Message-Id: <20200520232525.798933-6-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

We activate cache refaults with reuse distances in pages smaller than
the size of the total cache. This allows new pages with competitive
access frequencies to establish themselves, as well as challenge and
potentially displace pages on the active list that have gone cold.

However, that assumes that active cache can only replace other active
cache in a competition for the hottest memory. This is not a great
default assumption. The page cache might be thrashing while there are
enough completely cold and unused anonymous pages sitting around that
we'd only have to write to swap once to stop all IO from the cache.

Activate cache refaults when their reuse distance in pages is smaller
than the total userspace workingset, including anonymous pages.
Reclaim can still decide how to balance pressure among the two LRUs
depending on the IO situation. Rotational drives will prefer avoiding
random IO from swap and go harder after cache. But fundamentally, hot
cache should be able to compete with anon pages for a place in RAM.

Reported-by: Joonsoo Kim
Acked-by: Joonsoo Kim
Signed-off-by: Johannes Weiner
---
 mm/workingset.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 474186b76ced..e69865739539 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -277,8 +277,8 @@ void workingset_refault(struct page *page, void *shadow)
 	struct mem_cgroup *eviction_memcg;
 	struct lruvec *eviction_lruvec;
 	unsigned long refault_distance;
+	unsigned long workingset_size;
 	struct pglist_data *pgdat;
-	unsigned long active_file;
 	struct mem_cgroup *memcg;
 	unsigned long eviction;
 	struct lruvec *lruvec;
@@ -310,7 +310,6 @@ void workingset_refault(struct page *page, void *shadow)
 		goto out;
 	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
 	refault = atomic_long_read(&eviction_lruvec->inactive_age);
-	active_file = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
 
 	/*
 	 * Calculate the refault distance
@@ -345,10 +344,18 @@
 	/*
 	 * Compare the distance to the existing workingset size. We
-	 * don't act on pages that couldn't stay resident even if all
-	 * the memory was available to the page cache.
+	 * don't activate pages that couldn't stay resident even if
+	 * all the memory was available to the page cache. Whether
+	 * cache can compete with anon or not depends on having swap.
 	 */
-	if (refault_distance > active_file)
+	workingset_size = lruvec_page_state(eviction_lruvec, NR_ACTIVE_FILE);
+	if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+		workingset_size += lruvec_page_state(eviction_lruvec,
+						     NR_INACTIVE_ANON);
+		workingset_size += lruvec_page_state(eviction_lruvec,
+						     NR_ACTIVE_ANON);
+	}
+	if (refault_distance > workingset_size)
 		goto out;
 
 	SetPageActive(page);
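A worked example of the new condition, with made-up numbers: suppose a
node has 2GB of active file cache and 6GB of mostly idle anonymous memory,
and swap is available. Under the old check, a refaulting cache page with a
reuse distance equivalent to 3GB would not be activated, because 3GB > 2GB
of active file. With the change above the comparison is against active
file + inactive anon + active anon = 8GB, so the page is activated and can
apply pressure that pushes the cold anon pages toward swap. Without swap,
the comparison stays against the active file list alone, preserving the
old behavior.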
From patchwork Wed May 20 23:25:17 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245266
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 06/14] mm: remove use-once cache bias from LRU balancing
Date: Wed, 20 May 2020 19:25:17 -0400
Message-Id: <20200520232525.798933-7-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

When the splitlru patches divided page cache and swap-backed pages
into separate LRU lists, the pressure balance between the lists was
biased to account for the fact that streaming IO can cause memory
pressure with a flood of pages that are used only once. New page cache
additions would tip the balance toward the file LRU, and repeat access
would neutralize that bias again. This ensured that page reclaim would
always go for used-once cache first.

Since e9868505987a ("mm,vmscan: only evict file pages when we have
plenty"), page reclaim generally skips over swap-backed memory
entirely as long as there is used-once cache present, and will apply
the LRU balancing when only repeatedly accessed cache pages are left -
at which point the previous use-once bias will have been neutralized.
This makes the use-once cache balancing bias unnecessary.

Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: Minchan Kim
---
 mm/swap.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index c93a6c84464c..3b8c81bc93cd 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -277,7 +277,6 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
-		int file = page_is_file_lru(page);
 		int lru = page_lru_base_type(page);
 
 		del_page_from_lru_list(page, lruvec, lru);
@@ -287,7 +286,6 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 		trace_mm_lru_activate(page);
 
 		__count_vm_event(PGACTIVATE);
-		update_page_reclaim_stat(lruvec, file, 1, hpage_nr_pages(page));
 	}
 }
 
@@ -936,9 +934,6 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 
 	if (page_evictable(page)) {
 		lru = page_lru(page);
-		update_page_reclaim_stat(lruvec, is_file_lru(lru),
-					 PageActive(page),
-					 hpage_nr_pages(page));
 		if (was_unevictable)
 			count_vm_event(UNEVICTABLE_PGRESCUED);
 	} else {
From patchwork Wed May 20 23:25:18 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245259
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 07/14] mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count()
Date: Wed, 20 May 2020 19:25:18 -0400
Message-Id: <20200520232525.798933-8-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

When we calculate the relative scan pressure between the anon and file
LRU lists, we have to assume that reclaim_stat can contain zeroes. To
avoid div0 crashes, we add 1 to all denominators like so:

	anon_prio = swappiness;
	file_prio = 200 - anon_prio;

	[...]

	/*
	 * The amount of pressure on anon vs file pages is inversely
	 * proportional to the fraction of recently scanned pages on
	 * each list that were recently referenced and in active use.
	 */
	ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
	ap /= reclaim_stat->recent_rotated[0] + 1;

	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
	fp /= reclaim_stat->recent_rotated[1] + 1;
	spin_unlock_irq(&pgdat->lru_lock);

	fraction[0] = ap;
	fraction[1] = fp;
	denominator = ap + fp + 1;

While reclaim_stat can contain 0, it's not actually possible for ap +
fp to be 0. One of anon_prio or file_prio could be zero, but they must
still add up to 200. And the reclaim_stat fraction, due to the +1 in
there, is always at least 1. So if one of the two numerators is 0, the
other one can't be. ap + fp is always at least 1. Drop the + 1.

Signed-off-by: Johannes Weiner
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 43f88b1a4f14..6cd1029ea9d4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2348,7 +2348,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 
 	fraction[0] = ap;
 	fraction[1] = fp;
-	denominator = ap + fp + 1;
+	denominator = ap + fp;
 out:
 	for_each_evictable_lru(lru) {
 		int file = is_file_lru(lru);
From patchwork Wed May 20 23:25:19 2020
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 1245258
From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Rik van Riel, Minchan Kim, Michal Hocko, Andrew Morton, Joonsoo Kim,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 08/14] mm: base LRU balancing on an explicit cost model
Date: Wed, 20 May 2020 19:25:19 -0400
Message-Id: <20200520232525.798933-9-hannes@cmpxchg.org>
In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org>
References: <20200520232525.798933-1-hannes@cmpxchg.org>

Currently, scan pressure between the anon and file LRU lists is
balanced based on a mixture of reclaim efficiency and a somewhat vague
notion of "value" of having certain pages in memory over others.

That concept of value is problematic, because it has caused us to
count any event that remotely makes one LRU list more or less
preferable for reclaim, even when these events are not directly
comparable and impose very different costs on the system. One example
is referenced file pages that we still deactivate and referenced
anonymous pages that we actually rotate back to the head of the list.

There is also conceptual overlap with the LRU algorithm itself. By
rotating recently used pages instead of reclaiming them, the algorithm
already biases the applied scan pressure based on page value. Thus,
when rebalancing scan pressure due to rotations, we should think of
reclaim cost, and leave assessing the page value to the LRU algorithm.

Lastly, considering both value-increasing as well as value-decreasing
events can sometimes cause the same type of event to be counted twice,
i.e. how rotating a page increases the LRU value, while reclaiming it
successfully decreases the value. In itself this will balance out
fine, but it quietly skews the impact of events that are only recorded
once.

The abstract metric of "value", the murky relationship with the LRU
algorithm, and accounting both negative and positive events make the
current pressure balancing model hard to reason about and modify.

This patch switches to a balancing model of accounting the concrete,
actually observed cost of reclaiming one LRU over another. For now,
that cost includes pages that are scanned but rotated back to the list
head. Subsequent patches will add consideration for IO caused by
refaulting of recently evicted pages.

Replace struct zone_reclaim_stat with two cost counters in the lruvec,
and make everything that affects cost go through a new lru_note_cost()
function.
v2: remove superfluous cost denominator (Minchan Kim) improve cost variable naming (Michal Hocko) Signed-off-by: Johannes Weiner Acked-by: Michal Hocko --- include/linux/mmzone.h | 21 +++++++------------- include/linux/swap.h | 2 ++ mm/memcontrol.c | 18 ++++++----------- mm/swap.c | 19 ++++++++---------- mm/vmscan.c | 44 +++++++++++++++++++++--------------------- 5 files changed, 45 insertions(+), 59 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c1fbda9ddd1f..e959602140b4 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -240,19 +240,6 @@ static inline bool is_active_lru(enum lru_list lru) return (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE); } -struct zone_reclaim_stat { - /* - * The pageout code in vmscan.c keeps track of how many of the - * mem/swap backed and file backed pages are referenced. - * The higher the rotated/scanned ratio, the more valuable - * that cache is. - * - * The anon LRU stats live in [0], file LRU stats in [1] - */ - unsigned long recent_rotated[2]; - unsigned long recent_scanned[2]; -}; - enum lruvec_flags { LRUVEC_CONGESTED, /* lruvec has many dirty pages * backed by a congested BDI @@ -261,7 +248,13 @@ enum lruvec_flags { struct lruvec { struct list_head lists[NR_LRU_LISTS]; - struct zone_reclaim_stat reclaim_stat; + /* + * These track the cost of reclaiming one LRU - file or anon - + * over the other. As the observed cost of reclaiming one LRU + * increases, the reclaim scan balance tips toward the other. + */ + unsigned long anon_cost; + unsigned long file_cost; /* Evictions & activations on the inactive file list */ atomic_long_t inactive_age; /* Refaults at the time of last reclaim cycle */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 30fd4641890e..5ace6d8a33bd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -333,6 +333,8 @@ extern unsigned long nr_free_pagecache_pages(void); /* linux/mm/swap.c */ +extern void lru_note_cost(struct lruvec *lruvec, bool file, + unsigned int nr_pages); extern void lru_cache_add(struct page *); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index fe4f4d96ae3e..3e000a316b59 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3790,23 +3790,17 @@ static int memcg_stat_show(struct seq_file *m, void *v) { pg_data_t *pgdat; struct mem_cgroup_per_node *mz; - struct zone_reclaim_stat *rstat; - unsigned long recent_rotated[2] = {0, 0}; - unsigned long recent_scanned[2] = {0, 0}; + unsigned long anon_cost = 0; + unsigned long file_cost = 0; for_each_online_pgdat(pgdat) { mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id); - rstat = &mz->lruvec.reclaim_stat; - recent_rotated[0] += rstat->recent_rotated[0]; - recent_rotated[1] += rstat->recent_rotated[1]; - recent_scanned[0] += rstat->recent_scanned[0]; - recent_scanned[1] += rstat->recent_scanned[1]; + anon_cost += mz->lruvec.anon_cost; + file_cost += mz->lruvec.file_cost; } - seq_printf(m, "recent_rotated_anon %lu\n", recent_rotated[0]); - seq_printf(m, "recent_rotated_file %lu\n", recent_rotated[1]); - seq_printf(m, "recent_scanned_anon %lu\n", recent_scanned[0]); - seq_printf(m, "recent_scanned_file %lu\n", recent_scanned[1]); + seq_printf(m, "anon_cost %lu\n", anon_cost); + seq_printf(m, "file_cost %lu\n", file_cost); } #endif diff --git a/mm/swap.c b/mm/swap.c index 3b8c81bc93cd..5d62c5a0c651 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -262,15 +262,12 @@ void 
rotate_reclaimable_page(struct page *page) } } -static void update_page_reclaim_stat(struct lruvec *lruvec, - int file, int rotated, - unsigned int nr_pages) +void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) { - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; - - reclaim_stat->recent_scanned[file] += nr_pages; - if (rotated) - reclaim_stat->recent_rotated[file] += nr_pages; + if (file) + lruvec->file_cost += nr_pages; + else + lruvec->anon_cost += nr_pages; } static void __activate_page(struct page *page, struct lruvec *lruvec, @@ -518,7 +515,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, if (active) __count_vm_event(PGDEACTIVATE); - update_page_reclaim_stat(lruvec, file, 0, hpage_nr_pages(page)); + lru_note_cost(lruvec, !file, hpage_nr_pages(page)); } static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, @@ -534,7 +531,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, add_page_to_lru_list(page, lruvec, lru); __count_vm_events(PGDEACTIVATE, hpage_nr_pages(page)); - update_page_reclaim_stat(lruvec, file, 0, hpage_nr_pages(page)); + lru_note_cost(lruvec, !file, hpage_nr_pages(page)); } } @@ -559,7 +556,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, __count_vm_events(PGLAZYFREE, hpage_nr_pages(page)); count_memcg_page_event(page, PGLAZYFREE); - update_page_reclaim_stat(lruvec, 1, 0, hpage_nr_pages(page)); + lru_note_cost(lruvec, 0, hpage_nr_pages(page)); } } diff --git a/mm/vmscan.c b/mm/vmscan.c index 6cd1029ea9d4..6ff63906a288 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1916,7 +1916,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, bool file = is_file_lru(lru); enum vm_event_item item; struct pglist_data *pgdat = lruvec_pgdat(lruvec); - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; bool stalled = false; while (unlikely(too_many_isolated(pgdat, file, sc))) { @@ -1940,7 +1939,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, &nr_scanned, sc, lru); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); - reclaim_stat->recent_scanned[file] += nr_taken; item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_scanned); @@ -1960,8 +1958,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - reclaim_stat->recent_rotated[0] += stat.nr_activate[0]; - reclaim_stat->recent_rotated[1] += stat.nr_activate[1]; + /* + * Rotating pages costs CPU without actually + * progressing toward the reclaim goal. + */ + lru_note_cost(lruvec, 0, stat.nr_activate[0]); + lru_note_cost(lruvec, 1, stat.nr_activate[1]); item = current_is_kswapd() ? 
PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); @@ -2013,7 +2015,6 @@ static void shrink_active_list(unsigned long nr_to_scan, LIST_HEAD(l_active); LIST_HEAD(l_inactive); struct page *page; - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; unsigned nr_deactivate, nr_activate; unsigned nr_rotated = 0; int file = is_file_lru(lru); @@ -2027,7 +2028,6 @@ static void shrink_active_list(unsigned long nr_to_scan, &nr_scanned, sc, lru); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); - reclaim_stat->recent_scanned[file] += nr_taken; __count_vm_events(PGREFILL, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); @@ -2085,7 +2085,7 @@ static void shrink_active_list(unsigned long nr_to_scan, * helps balance scan pressure between file and anonymous pages in * get_scan_count. */ - reclaim_stat->recent_rotated[file] += nr_rotated; + lru_note_cost(lruvec, file, nr_rotated); nr_activate = move_pages_to_lru(lruvec, &l_active); nr_deactivate = move_pages_to_lru(lruvec, &l_inactive); @@ -2242,13 +2242,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, { struct mem_cgroup *memcg = lruvec_memcg(lruvec); int swappiness = mem_cgroup_swappiness(memcg); - struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat; u64 fraction[2]; u64 denominator = 0; /* gcc */ struct pglist_data *pgdat = lruvec_pgdat(lruvec); unsigned long anon_prio, file_prio; enum scan_balance scan_balance; unsigned long anon, file; + unsigned long totalcost; unsigned long ap, fp; enum lru_list lru; @@ -2324,26 +2324,26 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES); spin_lock_irq(&pgdat->lru_lock); - if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) { - reclaim_stat->recent_scanned[0] /= 2; - reclaim_stat->recent_rotated[0] /= 2; - } - - if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) { - reclaim_stat->recent_scanned[1] /= 2; - reclaim_stat->recent_rotated[1] /= 2; + totalcost = lruvec->anon_cost + lruvec->file_cost; + if (unlikely(totalcost > (anon + file) / 4)) { + lruvec->anon_cost /= 2; + lruvec->file_cost /= 2; + totalcost /= 2; } /* * The amount of pressure on anon vs file pages is inversely - * proportional to the fraction of recently scanned pages on - * each list that were recently referenced and in active use. + * proportional to the assumed cost of reclaiming each list, + * as determined by the share of pages that are likely going + * to refault or rotate on each list (recently referenced), + * times the relative IO cost of bringing back a swapped out + * anonymous page vs reloading a filesystem page (swappiness). 
*/ - ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1); - ap /= reclaim_stat->recent_rotated[0] + 1; + ap = anon_prio * (totalcost + 1); + ap /= lruvec->anon_cost + 1; - fp = file_prio * (reclaim_stat->recent_scanned[1] + 1); - fp /= reclaim_stat->recent_rotated[1] + 1; + fp = file_prio * (totalcost + 1); + fp /= lruvec->file_cost + 1; spin_unlock_irq(&pgdat->lru_lock); fraction[0] = ap; From patchwork Wed May 20 23:25:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Weiner X-Patchwork-Id: 1245263 From:
Johannes Weiner To: linux-mm@kvack.org Cc: Rik van Riel , Minchan Kim , Michal Hocko , Andrew Morton , Joonsoo Kim , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 09/14] mm: deactivations shouldn't bias the LRU balance Date: Wed, 20 May 2020 19:25:20 -0400 Message-Id: <20200520232525.798933-10-hannes@cmpxchg.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org> References: <20200520232525.798933-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Operations like MADV_FREE, FADV_DONTNEED etc. currently move any affected active pages to the inactive list to accelerate their reclaim (good) but also steer page reclaim toward that LRU type, or away from the other (bad). The reason why this is undesirable is that such operations are not part of the regular page aging cycle, but rather a fluke that doesn't say much about the remaining pages on that list; they might all be in heavy use, and once the chunk of easy victims has been purged, the VM continues to apply elevated pressure on those remaining hot pages. The other LRU, meanwhile, might have easily reclaimable pages, and there was never a need to steer away from it in the first place. As the previous patch outlined, we should focus on recording actually observed cost to steer the balance rather than speculating about the potential value of one LRU list over the other. In that spirit, leave explicitly deactivated pages to the LRU algorithm to pick up, and let rotations decide which list is the easiest to reclaim. Signed-off-by: Johannes Weiner Acked-by: Minchan Kim Acked-by: Michal Hocko --- mm/swap.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 5d62c5a0c651..d7912bfb597f 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -515,14 +515,12 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, if (active) __count_vm_event(PGDEACTIVATE); - lru_note_cost(lruvec, !file, hpage_nr_pages(page)); } static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, void *arg) { if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { - int file = page_is_file_lru(page); int lru = page_lru_base_type(page); del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE); @@ -531,7 +529,6 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, add_page_to_lru_list(page, lruvec, lru); __count_vm_events(PGDEACTIVATE, hpage_nr_pages(page)); - lru_note_cost(lruvec, !file, hpage_nr_pages(page)); } } @@ -556,7 +553,6 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, __count_vm_events(PGLAZYFREE, hpage_nr_pages(page)); count_memcg_page_event(page, PGLAZYFREE); - lru_note_cost(lruvec, 0, hpage_nr_pages(page)); } }
From patchwork Wed May 20 23:25:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Weiner X-Patchwork-Id: 1245265 From: Johannes Weiner To: linux-mm@kvack.org Cc: Rik van Riel , Minchan Kim , Michal Hocko , Andrew Morton , Joonsoo Kim , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 10/14] mm: only count actual rotations as LRU reclaim cost Date: Wed, 20 May 2020 19:25:21 -0400 Message-Id: <20200520232525.798933-11-hannes@cmpxchg.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org> References: <20200520232525.798933-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When shrinking the active file list we rotate referenced pages only when they're in an executable mapping. The others get deactivated. When it comes to balancing scan pressure, though, we count all referenced pages as rotated, even the deactivated ones.
Yet they do not carry the same cost to the system: the deactivated page *might* refault later on, but the deactivation is tangible progress toward freeing pages; rotations on the other hand cost time and effort without getting any closer to freeing memory. Don't treat both events as equal. The following patch will hook up LRU balancing to cache and anon refaults, which are a much more concrete cost signal for reclaiming one list over the other. Thus, remove the maybe-IO cost bias from page references, and only note the CPU cost for actual rotations that prevent the pages from getting reclaimed. v2: readable changelog (Michal Hocko) Signed-off-by: Johannes Weiner Acked-by: Minchan Kim Acked-by: Michal Hocko --- mm/vmscan.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 6ff63906a288..2c3fb8dd1159 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2054,7 +2054,6 @@ static void shrink_active_list(unsigned long nr_to_scan, if (page_referenced(page, 0, sc->target_mem_cgroup, &vm_flags)) { - nr_rotated += hpage_nr_pages(page); /* * Identify referenced, file-backed active pages and * give them one more trip around the active list. So @@ -2065,6 +2064,7 @@ static void shrink_active_list(unsigned long nr_to_scan, * so we ignore them here. */ if ((vm_flags & VM_EXEC) && page_is_file_lru(page)) { + nr_rotated += hpage_nr_pages(page); list_add(&page->lru, &l_active); continue; } @@ -2080,10 +2080,8 @@ static void shrink_active_list(unsigned long nr_to_scan, */ spin_lock_irq(&pgdat->lru_lock); /* - * Count referenced pages from currently used mappings as rotated, - * even though only some of them are actually re-activated. This - * helps balance scan pressure between file and anonymous pages in - * get_scan_count. + * Rotating pages costs CPU without actually + * progressing toward the reclaim goal. 
*/ lru_note_cost(lruvec, file, nr_rotated); From patchwork Wed May 20 23:25:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Weiner X-Patchwork-Id: 1245260 Received: from localhost
git-send-email 2.26.2 In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org> References: <20200520232525.798933-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Since the LRUs were split into anon and file lists, the VM has been balancing between page cache and anonymous pages based on per-list ratios of scanned vs. rotated pages. In most cases that tips page reclaim towards the list that is easier to reclaim and has the fewest actively used pages, but there are a few problems with it: 1. Refaults and LRU rotations are weighted the same way, even though one costs IO and the other costs a bit of CPU. 2. The less we scan an LRU list based on already observed rotations, the more we increase the sampling interval for new references, and rotations become even more likely on that list. This can enter a death spiral in which we stop looking at one list completely until the other one is all but annihilated by page reclaim. Since commit a528910e12ec ("mm: thrash detection-based file cache sizing") we have refault detection for the page cache. Along with swapin events, they are good indicators of when the file or anon list, respectively, is too small for its workingset and needs to grow. For example, if the page cache is thrashing, the cache pages need more time in memory, while there may be colder pages on the anonymous list. Likewise, if swapped pages are faulting back in, it indicates that we reclaim anonymous pages too aggressively and should back off. Replace LRU rotations with refaults and swapins as the basis for relative reclaim cost of the two LRUs. This will have the VM target list balances that incur the least amount of IO on aggregate. Signed-off-by: Johannes Weiner --- include/linux/swap.h | 3 +-- mm/swap.c | 11 +++++++---- mm/swap_state.c | 5 +++++ mm/vmscan.c | 39 ++++++++++----------------------------- mm/workingset.c | 4 ++++ 5 files changed, 27 insertions(+), 35 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 5ace6d8a33bd..818a94b41d82 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -333,8 +333,7 @@ extern unsigned long nr_free_pagecache_pages(void); /* linux/mm/swap.c */ -extern void lru_note_cost(struct lruvec *lruvec, bool file, - unsigned int nr_pages); +extern void lru_note_cost(struct page *); extern void lru_cache_add(struct page *); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); diff --git a/mm/swap.c b/mm/swap.c index d7912bfb597f..2ff91656dea2 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -262,12 +262,15 @@ void rotate_reclaimable_page(struct page *page) } } -void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) +void lru_note_cost(struct page *page) { - if (file) - lruvec->file_cost += nr_pages; + struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + + /* Record new data point */ + if (page_is_file_lru(page)) + lruvec->file_cost++; else - lruvec->anon_cost += nr_pages; + lruvec->anon_cost++; } static void __activate_page(struct page *page, struct lruvec *lruvec, diff --git a/mm/swap_state.c b/mm/swap_state.c index b5e08ff00e1e..8b902897a867 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -430,6 +430,11 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, if (mem_cgroup_charge(page, NULL, gfp_mask & GFP_KERNEL)) goto fail_delete; + /* XXX: Move to lru_cache_add() when it supports new 
vs putback */ + spin_lock_irq(&page_pgdat(page)->lru_lock); + lru_note_cost(page); + spin_unlock_irq(&page_pgdat(page)->lru_lock); + /* Initiate read into locked page */ SetPageWorkingset(page); lru_cache_add(page); diff --git a/mm/vmscan.c b/mm/vmscan.c index 2c3fb8dd1159..e7e6868bcbf7 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1958,12 +1958,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - /* - * Rotating pages costs CPU without actually - * progressing toward the reclaim goal. - */ - lru_note_cost(lruvec, 0, stat.nr_activate[0]); - lru_note_cost(lruvec, 1, stat.nr_activate[1]); item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); @@ -2079,11 +2073,6 @@ static void shrink_active_list(unsigned long nr_to_scan, * Move pages back to the lru list. */ spin_lock_irq(&pgdat->lru_lock); - /* - * Rotating pages costs CPU without actually - * progressing toward the reclaim goal. - */ - lru_note_cost(lruvec, file, nr_rotated); nr_activate = move_pages_to_lru(lruvec, &l_active); nr_deactivate = move_pages_to_lru(lruvec, &l_inactive); @@ -2298,22 +2287,23 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, scan_balance = SCAN_FRACT; /* - * With swappiness at 100, anonymous and file have the same priority. - * This scanning priority is essentially the inverse of IO cost. + * Calculate the pressure balance between anon and file pages. + * + * The amount of pressure we put on each LRU is inversely + * proportional to the cost of reclaiming each list, as + * determined by the share of pages that are refaulting, times + * the relative IO cost of bringing back a swapped out + * anonymous page vs reloading a filesystem page (swappiness). + * + * With swappiness at 100, anon and file have equal IO cost. */ anon_prio = swappiness; file_prio = 200 - anon_prio; /* - * OK, so we have swap space and a fair amount of page cache - * pages. We use the recently rotated / recently scanned - * ratios to determine how valuable each cache is. - * * Because workloads change over time (and to avoid overflow) * we keep these statistics as a floating average, which ends - * up weighing recent references more than old ones. - * - * anon in [0], file in [1] + * up weighing recent refaults more than old ones. */ anon = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) + @@ -2328,15 +2318,6 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, lruvec->file_cost /= 2; totalcost /= 2; } - - /* - * The amount of pressure on anon vs file pages is inversely - * proportional to the assumed cost of reclaiming each list, - * as determined by the share of pages that are likely going - * to refault or rotate on each list (recently referenced), - * times the relative IO cost of bringing back a swapped out - * anonymous page vs reloading a filesystem page (swappiness). 
- */ ap = anon_prio * (totalcost + 1); ap /= lruvec->anon_cost + 1; diff --git a/mm/workingset.c b/mm/workingset.c index e69865739539..a6a2a740ed0b 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -365,6 +365,10 @@ void workingset_refault(struct page *page, void *shadow) /* Page was active prior to eviction */ if (workingset) { SetPageWorkingset(page); + /* XXX: Move to lru_cache_add() when it supports new vs putback */ + spin_lock_irq(&page_pgdat(page)->lru_lock); + lru_note_cost(page); + spin_unlock_irq(&page_pgdat(page)->lru_lock); inc_lruvec_state(lruvec, WORKINGSET_RESTORE); } out: From patchwork Wed May 20 23:25:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Weiner X-Patchwork-Id: 1245261 Received: from localhost
([2620:10d:c091:480::1:4708]) by smtp.gmail.com with ESMTPSA id t88sm3665378qtd.5.2020.05.20.16.26.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2020 16:26:18 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Rik van Riel , Minchan Kim , Michal Hocko , Andrew Morton , Joonsoo Kim , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 12/14] mm: vmscan: determine anon/file pressure balance at the reclaim root Date: Wed, 20 May 2020 19:25:23 -0400 Message-Id: <20200520232525.798933-13-hannes@cmpxchg.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org> References: <20200520232525.798933-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We split the LRU lists into anon and file, and we rebalance the scan pressure between them when one of them begins thrashing: if the file cache experiences workingset refaults, we increase the pressure on anonymous pages; if the workload is stalled on swapins, we increase the pressure on the file cache instead. With cgroups and their nested LRU lists, we currently don't do this correctly. While recursive cgroup reclaim establishes a relative LRU order among the pages of all involved cgroups, LRU pressure balancing is done on an individual cgroup LRU level. As a result, when one cgroup is thrashing on the filesystem cache while a sibling may have cold anonymous pages, pressure doesn't get equalized between them. This patch moves LRU balancing decision to the root of reclaim - the same level where the LRU order is established. It does this by tracking LRU cost recursively, so that every level of the cgroup tree knows the aggregate LRU cost of all memory within its domain. When the page scanner calculates the scan balance for any given individual cgroup's LRU list, it uses the values from the ancestor cgroup that initiated the reclaim cycle. If one sibling is then thrashing on the cache, it will tip the pressure balance inside its ancestors, and the next hierarchical reclaim iteration will go more after the anon pages in the tree. 
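As a rough standalone illustration of the recursive accounting (a userspace sketch; struct node, charge_cost() and anon_share_permille() are invented names, not kernel API; the real changes are in the diff below): a cost event charged to one cgroup is also recorded in all of its ancestors, so the cgroup that initiates reclaim sees the aggregate cost of its subtree and can balance pressure across a thrashing sibling and an idle one alike.

#include <stdio.h>

/*
 * Userspace sketch of recursive cost accounting; the names are
 * invented for illustration and do not exist in the kernel.
 */
struct node {
	struct node *parent;
	unsigned long anon_cost;
	unsigned long file_cost;
};

/* Charge a cost event to a cgroup and to every ancestor above it */
static void charge_cost(struct node *n, int file, unsigned long nr_pages)
{
	for (; n; n = n->parent) {
		if (file)
			n->file_cost += nr_pages;
		else
			n->anon_cost += nr_pages;
	}
}

/* Share of pressure that should go to anon, read at the reclaim root */
static unsigned long anon_share_permille(const struct node *root)
{
	unsigned long total = root->anon_cost + root->file_cost;

	/* pressure on anon grows with the cost observed on file */
	return total ? root->file_cost * 1000 / total : 500;
}

int main(void)
{
	struct node parent = { NULL, 0, 0 };
	struct node child_a = { &parent, 0, 0 };
	struct node child_b = { &parent, 0, 0 };

	charge_cost(&child_a, 1, 300);	/* child A thrashes on the file cache */
	charge_cost(&child_b, 0, 100);	/* child B rotates some anon pages */

	/* the parent sees the aggregate and tips pressure toward anon */
	printf("anon share at reclaim root: %lu/1000\n",
	       anon_share_permille(&parent));	/* 750/1000 */
	return 0;
}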
Signed-off-by: Johannes Weiner --- include/linux/memcontrol.h | 13 ++++++++++++ mm/swap.c | 32 ++++++++++++++++++++++++----- mm/vmscan.c | 41 ++++++++++++++++---------------------- 3 files changed, 57 insertions(+), 29 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 32a0b4d47540..d982c80da157 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1303,6 +1303,19 @@ static inline void dec_lruvec_page_state(struct page *page, mod_lruvec_page_state(page, idx, -1); } +static inline struct lruvec *parent_lruvec(struct lruvec *lruvec) +{ + struct mem_cgroup *memcg; + + memcg = lruvec_memcg(lruvec); + if (!memcg) + return NULL; + memcg = parent_mem_cgroup(memcg); + if (!memcg) + return NULL; + return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec)); +} + #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); diff --git a/mm/swap.c b/mm/swap.c index 2ff91656dea2..3d8aa46c47ff 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -266,11 +266,33 @@ void lru_note_cost(struct page *page) { struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - /* Record new data point */ - if (page_is_file_lru(page)) - lruvec->file_cost++; - else - lruvec->anon_cost++; + do { + unsigned long lrusize; + + /* Record cost event */ + if (page_is_file_lru(page)) + lruvec->file_cost++; + else + lruvec->anon_cost++; + + /* + * Decay previous events + * + * Because workloads change over time (and to avoid + * overflow) we keep these statistics as a floating + * average, which ends up weighing recent refaults + * more than old ones. + */ + lrusize = lruvec_page_state(lruvec, NR_INACTIVE_ANON) + + lruvec_page_state(lruvec, NR_ACTIVE_ANON) + + lruvec_page_state(lruvec, NR_INACTIVE_FILE) + + lruvec_page_state(lruvec, NR_ACTIVE_FILE); + + if (lruvec->file_cost + lruvec->anon_cost > lrusize / 4) { + lruvec->file_cost /= 2; + lruvec->anon_cost /= 2; + } + } while ((lruvec = parent_lruvec(lruvec))); } static void __activate_page(struct page *page, struct lruvec *lruvec, diff --git a/mm/vmscan.c b/mm/vmscan.c index e7e6868bcbf7..1487ff5d4698 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -79,6 +79,12 @@ struct scan_control { */ struct mem_cgroup *target_mem_cgroup; + /* + * Scan pressure balancing between anon and file LRUs + */ + unsigned long anon_cost; + unsigned long file_cost; + /* Can active pages be deactivated as part of reclaim? */ #define DEACTIVATE_ANON 1 #define DEACTIVATE_FILE 2 @@ -2231,10 +2237,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, int swappiness = mem_cgroup_swappiness(memcg); u64 fraction[2]; u64 denominator = 0; /* gcc */ - struct pglist_data *pgdat = lruvec_pgdat(lruvec); unsigned long anon_prio, file_prio; enum scan_balance scan_balance; - unsigned long anon, file; unsigned long totalcost; unsigned long ap, fp; enum lru_list lru; @@ -2285,7 +2289,6 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, } scan_balance = SCAN_FRACT; - /* * Calculate the pressure balance between anon and file pages. * @@ -2300,30 +2303,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, anon_prio = swappiness; file_prio = 200 - anon_prio; - /* - * Because workloads change over time (and to avoid overflow) - * we keep these statistics as a floating average, which ends - * up weighing recent refaults more than old ones. 
- */ - - anon = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) + - lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES); - file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) + - lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES); - - spin_lock_irq(&pgdat->lru_lock); - totalcost = lruvec->anon_cost + lruvec->file_cost; - if (unlikely(totalcost > (anon + file) / 4)) { - lruvec->anon_cost /= 2; - lruvec->file_cost /= 2; - totalcost /= 2; - } + totalcost = sc->anon_cost + sc->file_cost; ap = anon_prio * (totalcost + 1); - ap /= lruvec->anon_cost + 1; + ap /= sc->anon_cost + 1; fp = file_prio * (totalcost + 1); - fp /= lruvec->file_cost + 1; - spin_unlock_irq(&pgdat->lru_lock); + fp /= sc->file_cost + 1; fraction[0] = ap; fraction[1] = fp; @@ -2679,6 +2664,14 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) nr_reclaimed = sc->nr_reclaimed; nr_scanned = sc->nr_scanned; + /* + * Determine the scan balance between anon and file LRUs. + */ + spin_lock_irq(&pgdat->lru_lock); + sc->anon_cost = target_lruvec->anon_cost; + sc->file_cost = target_lruvec->file_cost; + spin_unlock_irq(&pgdat->lru_lock); + /* * Target desirable inactive:active list ratios for the anon * and file LRU lists. From patchwork Wed May 20 23:25:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Weiner X-Patchwork-Id: 1245264
From: Johannes Weiner To: linux-mm@kvack.org Cc: Rik van Riel , Minchan Kim , Michal Hocko , Andrew Morton , Joonsoo Kim , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 13/14] mm: vmscan: reclaim writepage is IO cost Date: Wed, 20 May 2020 19:25:24 -0400 Message-Id: <20200520232525.798933-14-hannes@cmpxchg.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org> References: <20200520232525.798933-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The VM tries to balance reclaim pressure between anon and file so as to reduce the amount of IO incurred due to the memory shortage. It already counts refaults and swapins, but in addition it should also count writepage calls during reclaim. For swap, this is obvious: it's IO that wouldn't have occurred if the anonymous memory hadn't been under memory pressure. From a relative balancing point of view this makes sense as well: even if anon is cold and reclaimable, a cache that isn't thrashing may have equally cold pages that don't require IO to reclaim. For file writeback, it's trickier: some of the reclaim writepage IO would have likely occurred anyway due to dirty expiration. But not all of it - premature writeback reduces batching and generates additional writes. Since the flushers are already woken up by the time the VM starts writing cache pages one by one, let's assume that we're likely causing writes that wouldn't have happened without memory pressure. In addition, the per-page cost of IO would have probably been much lower if written in larger batches from the flusher thread rather than the single-page-writes from kswapd. For our purposes - getting the trend right to accelerate convergence on a stable state that doesn't require paging at all - this is sufficiently accurate. If we later wanted to optimize for sustained thrashing, we can still refine the measurements. Count all writepage calls from kswapd as IO cost toward the LRU that the page belongs to. Why do this dynamically? Don't we know in advance that anon pages require IO to reclaim, and so could build in a static bias? First, scanning is not the same as reclaiming. If all the anon pages are referenced, we may not swap for a while just because we're scanning the anon list. During this time, however, it's important that we age anonymous memory and the page cache at the same rate so that their hot-cold gradients are comparable. Everything else being equal, we still want to reclaim the coldest memory overall.
Second, we keep copies in swap unless the page changes. If there is swap-backed data that's mostly read (tmpfs file) and has been swapped out before, we can reclaim it without incurring additional IO. Signed-off-by: Johannes Weiner --- include/linux/swap.h | 4 +++- include/linux/vmstat.h | 1 + mm/swap.c | 16 ++++++++++------ mm/swap_state.c | 2 +- mm/vmscan.c | 3 +++ mm/workingset.c | 2 +- 6 files changed, 19 insertions(+), 9 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 818a94b41d82..157e5081bf98 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -333,7 +333,9 @@ extern unsigned long nr_free_pagecache_pages(void); /* linux/mm/swap.c */ -extern void lru_note_cost(struct page *); +extern void lru_note_cost(struct lruvec *lruvec, bool file, + unsigned int nr_pages); +extern void lru_note_cost_page(struct page *); extern void lru_cache_add(struct page *); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 10cc932e209a..3d12c34cd42a 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -26,6 +26,7 @@ struct reclaim_stat { unsigned nr_congested; unsigned nr_writeback; unsigned nr_immediate; + unsigned nr_pageout; unsigned nr_activate[2]; unsigned nr_ref_keep; unsigned nr_unmap_fail; diff --git a/mm/swap.c b/mm/swap.c index 3d8aa46c47ff..ffc457911be2 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -262,18 +262,16 @@ void rotate_reclaimable_page(struct page *page) } } -void lru_note_cost(struct page *page) +void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) { - struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - do { unsigned long lrusize; /* Record cost event */ - if (page_is_file_lru(page)) - lruvec->file_cost++; + if (file) + lruvec->file_cost += nr_pages; else - lruvec->anon_cost++; + lruvec->anon_cost += nr_pages; /* * Decay previous events @@ -295,6 +293,12 @@ void lru_note_cost(struct page *page) } while ((lruvec = parent_lruvec(lruvec))); } +void lru_note_cost_page(struct page *page) +{ + lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)), + page_is_file_lru(page), hpage_nr_pages(page)); +} + static void __activate_page(struct page *page, struct lruvec *lruvec, void *arg) { diff --git a/mm/swap_state.c b/mm/swap_state.c index 8b902897a867..1e744e6c9c20 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -432,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* XXX: Move to lru_cache_add() when it supports new vs putback */ spin_lock_irq(&page_pgdat(page)->lru_lock); - lru_note_cost(page); + lru_note_cost_page(page); spin_unlock_irq(&page_pgdat(page)->lru_lock); /* Initiate read into locked page */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 1487ff5d4698..5453b4ef2ea1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1359,6 +1359,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, case PAGE_ACTIVATE: goto activate_locked; case PAGE_SUCCESS: + stat->nr_pageout += hpage_nr_pages(page); + if (PageWriteback(page)) goto keep; if (PageDirty(page)) @@ -1964,6 +1966,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); + lru_note_cost(lruvec, file, stat.nr_pageout); item = current_is_kswapd() ? 
PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); diff --git a/mm/workingset.c b/mm/workingset.c index a6a2a740ed0b..d481ea452eeb 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -367,7 +367,7 @@ void workingset_refault(struct page *page, void *shadow) SetPageWorkingset(page); /* XXX: Move to lru_cache_add() when it supports new vs putback */ spin_lock_irq(&page_pgdat(page)->lru_lock); - lru_note_cost(page); + lru_note_cost_page(page); spin_unlock_irq(&page_pgdat(page)->lru_lock); inc_lruvec_state(lruvec, WORKINGSET_RESTORE); } From patchwork Wed May 20 23:25:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Weiner X-Patchwork-Id: 1245262 Received: from localhost
([2620:10d:c091:480::1:4708]) by smtp.gmail.com with ESMTPSA id o3sm3597138qtt.56.2020.05.20.16.26.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2020 16:26:21 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Rik van Riel , Minchan Kim , Michal Hocko , Andrew Morton , Joonsoo Kim , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 14/14] mm: vmscan: limit the range of LRU type balancing Date: Wed, 20 May 2020 19:25:25 -0400 Message-Id: <20200520232525.798933-15-hannes@cmpxchg.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520232525.798933-1-hannes@cmpxchg.org> References: <20200520232525.798933-1-hannes@cmpxchg.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When LRU cost only shows up on one list, we abruptly stop scanning that list altogether. That's an extreme reaction: by the time the other list starts thrashing and the pendulum swings back, we may have no recent age information on the first list anymore, and we could have significant latencies until the scanner has caught up. Soften this change in the feedback system by ensuring that no list receives less than a third of overall pressure, and only distribute the other 66% according to LRU cost. This ensures that we maintain a minimum rate of aging on the entire workingset while it's being pressured, while still allowing a generous rate of convergence when the relative sizes of the lists need to adjust. Signed-off-by: Johannes Weiner --- mm/vmscan.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 5453b4ef2ea1..c628f9ab886b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2237,12 +2237,11 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, unsigned long *nr) { struct mem_cgroup *memcg = lruvec_memcg(lruvec); + unsigned long anon_cost, file_cost, total_cost; int swappiness = mem_cgroup_swappiness(memcg); u64 fraction[2]; u64 denominator = 0; /* gcc */ - unsigned long anon_prio, file_prio; enum scan_balance scan_balance; - unsigned long totalcost; unsigned long ap, fp; enum lru_list lru; @@ -2301,17 +2300,22 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, * the relative IO cost of bringing back a swapped out * anonymous page vs reloading a filesystem page (swappiness). * + * Although we limit that influence to ensure no list gets + * left behind completely: at least a third of the pressure is + * applied, before swappiness. + * * With swappiness at 100, anon and file have equal IO cost. */ - anon_prio = swappiness; - file_prio = 200 - anon_prio; + total_cost = sc->anon_cost + sc->file_cost; + anon_cost = total_cost + sc->anon_cost; + file_cost = total_cost + sc->file_cost; + total_cost = anon_cost + file_cost; - totalcost = sc->anon_cost + sc->file_cost; - ap = anon_prio * (totalcost + 1); - ap /= sc->anon_cost + 1; + ap = swappiness * (total_cost + 1); + ap /= anon_cost + 1; - fp = file_prio * (totalcost + 1); - fp /= sc->file_cost + 1; + fp = (200 - swappiness) * (total_cost + 1); + fp /= file_cost + 1; fraction[0] = ap; fraction[1] = fp;
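To make the clamp concrete, here is a small standalone sketch (plain userspace C mirroring the arithmetic in the hunk above; the cost values are made up) for the extreme case where all observed reclaim cost sits on the file LRU and swappiness is 100:

#include <stdio.h>

/*
 * Standalone sketch of the pressure clamp above: even when all
 * observed reclaim cost is on one list, that list still receives
 * roughly a third of the scan pressure. The cost values here are
 * made up for illustration.
 */
int main(void)
{
	unsigned long sc_anon_cost = 0;		/* no anon cost observed */
	unsigned long sc_file_cost = 1000;	/* all cost on the file LRU */
	unsigned int swappiness = 100;		/* anon and file IO cost treated as equal */
	unsigned long total_cost, anon_cost, file_cost, ap, fp;

	total_cost = sc_anon_cost + sc_file_cost;
	anon_cost = total_cost + sc_anon_cost;	/* 1000 */
	file_cost = total_cost + sc_file_cost;	/* 2000 */
	total_cost = anon_cost + file_cost;	/* 3000 */

	ap = swappiness * (total_cost + 1) / (anon_cost + 1);		/* 299 */
	fp = (200 - swappiness) * (total_cost + 1) / (file_cost + 1);	/* 149 */

	/* file share = fp / (ap + fp) = 149/448, roughly one third */
	printf("anon %lu : file %lu of %lu total\n", ap, fp, ap + fp);
	return 0;
}

Even though the file list accounts for all of the observed cost in this example, it still receives about a third of the scan pressure, so it keeps aging at a minimum rate while the anon list takes most of the pressure.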