From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933075AbdA0K7D (ORCPT ); Fri, 27 Jan 2017 05:59:03 -0500 Received: from mx2.suse.de ([195.135.220.15]:60560 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932521AbdA0K7A (ORCPT ); Fri, 27 Jan 2017 05:59:00 -0500 X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "References" From: Jiri Slaby To: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Shaohua Li , Johannes Weiner , Michal Hocko , Vladimir Davydov , Andrew Morton , Linus Torvalds , Jiri Slaby Subject: [PATCH 3.12 034/235] mm/vmscan.c: set correct defer count for shrinker Date: Fri, 27 Jan 2017 11:52:47 +0100 Message-Id: <9ba6fb6a61aeac873544e8805d584c42f78319b1.1485514374.git.jslaby@suse.cz> X-Mailer: git-send-email 2.11.0 In-Reply-To: <5b46dc789ca2be4046e4e40a131858d386cac741.1485514374.git.jslaby@suse.cz> References: <5b46dc789ca2be4046e4e40a131858d386cac741.1485514374.git.jslaby@suse.cz> In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Shaohua Li 3.12-stable review patch. If anyone has any objections, please let me know. =============== commit 5f33a0803bbd781de916f5c7448cbbbbc763d911 upstream. Our system uses significantly more slab memory with memcg enabled with the latest kernel. With 3.10 kernel, slab uses 2G memory, while with 4.6 kernel, 6G memory is used. The shrinker has problem. Let's see we have two memcg for one shrinker. In do_shrink_slab: 1. Check cg1. nr_deferred = 0, assume total_scan = 700. batch size is 1024, then no memory is freed. nr_deferred = 700 2. Check cg2. nr_deferred = 700. Assume freeable = 20, then total_scan = 10 or 40. Let's assume it's 10. No memory is freed. nr_deferred = 10. The deferred share of cg1 is lost in this case. kswapd will free no memory even run above steps again and again. The fix makes sure one memcg's deferred share isn't lost. Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.com Signed-off-by: Shaohua Li Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Jiri Slaby --- mm/vmscan.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 6dc33d9dc2cf..dc23ad3ecf4c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -231,6 +231,7 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker, int nid = shrinkctl->nid; long batch_size = shrinker->batch ? shrinker->batch : SHRINK_BATCH; + long scanned = 0, next_deferred; freeable = shrinker->count_objects(shrinker, shrinkctl); if (freeable == 0) @@ -253,7 +254,9 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker, "shrink_slab: %pF negative objects to delete nr=%ld\n", shrinker->scan_objects, total_scan); total_scan = freeable; - } + next_deferred = nr; + } else + next_deferred = total_scan; /* * We need to avoid excessive windup on filesystem shrinkers @@ -310,17 +313,22 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker, count_vm_events(SLABS_SCANNED, nr_to_scan); total_scan -= nr_to_scan; + scanned += nr_to_scan; cond_resched(); } + if (next_deferred >= scanned) + next_deferred -= scanned; + else + next_deferred = 0; /* * move the unused scan count back into the shrinker in a * manner that handles concurrent updates. If we exhausted the * scan, there is no need to do an update. */ - if (total_scan > 0) - new_nr = atomic_long_add_return(total_scan, + if (next_deferred > 0) + new_nr = atomic_long_add_return(next_deferred, &shrinker->nr_deferred[nid]); else new_nr = atomic_long_read(&shrinker->nr_deferred[nid]); -- 2.11.0