From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751588AbdB1X5j (ORCPT ); Tue, 28 Feb 2017 18:57:39 -0500 Received: from gum.cmpxchg.org ([85.214.110.215]:37882 "EHLO gum.cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751450AbdB1X5T (ORCPT ); Tue, 28 Feb 2017 18:57:19 -0500 From: Johannes Weiner To: Andrew Morton Cc: Jia He , Michal Hocko , Mel Gorman , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 0/9] mm: kswapd spinning on unreclaimable nodes - fixes and cleanups Date: Tue, 28 Feb 2017 16:39:58 -0500 Message-Id: <20170228214007.5621-1-hannes@cmpxchg.org> X-Mailer: git-send-email 2.11.1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Jia reported a scenario in which the kswapd of a node indefinitely spins at 100% CPU usage. We have seen similar cases at Facebook. The kernel's current method of judging its ability to reclaim a node (or whether to back off and sleep) is based on the amount of scanned pages in proportion to the amount of reclaimable pages. In Jia's and our scenarios, there are no reclaimable pages in the node, however, and the condition for backing off is never met. Kswapd busyloops in an attempt to restore the watermarks while having nothing to work with. This series reworks the definition of an unreclaimable node based not on scanning but on whether kswapd is able to actually reclaim pages in MAX_RECLAIM_RETRIES (16) consecutive runs. This is the same criteria the page allocator uses for giving up on direct reclaim and invoking the OOM killer. If it cannot free any pages, kswapd will go to sleep and leave further attempts to direct reclaim invocations, which will either make progress and re-enable kswapd, or invoke the OOM killer. Patch #1 fixes the immediate problem Jia reported, the remainder are smaller fixlets, cleanups, and overall phasing out of the old method. Patch #6 is the odd one out. It's a nice cleanup to get_scan_count(), and directly related to #5, but in itself not relevant to the series. If the whole series is too ambitious for 4.11, I would consider the first three patches fixes, the rest cleanups. Thanks include/linux/mmzone.h | 3 +- mm/internal.h | 7 +- mm/migrate.c | 3 - mm/page_alloc.c | 39 +++-------- mm/vmscan.c | 169 ++++++++++++++++++----------------------------- mm/vmstat.c | 24 ++----- 6 files changed, 89 insertions(+), 156 deletions(-) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 2AA566B0038 for ; Tue, 28 Feb 2017 16:46:13 -0500 (EST) Received: by mail-wm0-f72.google.com with SMTP id u63so9373733wmu.0 for ; Tue, 28 Feb 2017 13:46:13 -0800 (PST) Received: from gum.cmpxchg.org (gum.cmpxchg.org. [85.214.110.215]) by mx.google.com with ESMTPS id e1si3985580wrd.138.2017.02.28.13.46.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 28 Feb 2017 13:46:12 -0800 (PST) From: Johannes Weiner Subject: [PATCH 0/9] mm: kswapd spinning on unreclaimable nodes - fixes and cleanups Date: Tue, 28 Feb 2017 16:39:58 -0500 Message-Id: <20170228214007.5621-1-hannes@cmpxchg.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Jia He , Michal Hocko , Mel Gorman , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Hi, Jia reported a scenario in which the kswapd of a node indefinitely spins at 100% CPU usage. We have seen similar cases at Facebook. The kernel's current method of judging its ability to reclaim a node (or whether to back off and sleep) is based on the amount of scanned pages in proportion to the amount of reclaimable pages. In Jia's and our scenarios, there are no reclaimable pages in the node, however, and the condition for backing off is never met. Kswapd busyloops in an attempt to restore the watermarks while having nothing to work with. This series reworks the definition of an unreclaimable node based not on scanning but on whether kswapd is able to actually reclaim pages in MAX_RECLAIM_RETRIES (16) consecutive runs. This is the same criteria the page allocator uses for giving up on direct reclaim and invoking the OOM killer. If it cannot free any pages, kswapd will go to sleep and leave further attempts to direct reclaim invocations, which will either make progress and re-enable kswapd, or invoke the OOM killer. Patch #1 fixes the immediate problem Jia reported, the remainder are smaller fixlets, cleanups, and overall phasing out of the old method. Patch #6 is the odd one out. It's a nice cleanup to get_scan_count(), and directly related to #5, but in itself not relevant to the series. If the whole series is too ambitious for 4.11, I would consider the first three patches fixes, the rest cleanups. Thanks include/linux/mmzone.h | 3 +- mm/internal.h | 7 +- mm/migrate.c | 3 - mm/page_alloc.c | 39 +++-------- mm/vmscan.c | 169 ++++++++++++++++++----------------------------- mm/vmstat.c | 24 ++----- 6 files changed, 89 insertions(+), 156 deletions(-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org