Date: Tue, 4 Apr 2017 18:29:52 -0400
From: Johannes Weiner
To: Andrew Morton
Cc: Rik van Riel, Mel Gorman, Michal Hocko, Vladimir Davydov, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: vmscan: fix IO/refault regression in cache workingset transition
Message-ID: <20170404222952.GA28930@cmpxchg.org>
References: <20170404220052.27593-1-hannes@cmpxchg.org> <20170404150703.742c49d73921df6369ed3dbd@linux-foundation.org>
In-Reply-To: <20170404150703.742c49d73921df6369ed3dbd@linux-foundation.org>
User-Agent: Mutt/1.8.0 (2017-02-23)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 04, 2017 at 03:07:03PM -0700, Andrew Morton wrote:
> On Tue, 4 Apr 2017 18:00:52 -0400 Johannes Weiner wrote:
>
> > Since 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
> > we noticed bigger IO spikes during changes in cache access patterns.
> >
> > The patch in question shrunk the inactive list size to leave more room
> > for the current workingset in the presence of streaming IO. However,
> > workingset transitions that previously happened on the inactive list
> > are now pushed out of memory and incur more refaults to complete.
> >
> > This patch disables active list protection when refaults are being
> > observed.
> > This accelerates workingset transitions, and allows more of
> > the new set to establish itself from memory, without eating into the
> > ability to protect the established workingset during stable periods.
> >
> > Fixes: 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
> > Signed-off-by: Johannes Weiner
> > Cc: # 4.7+
>
> That's a pretty large patch and the problem has been there for a year.
> I'm not sure that it's 4.11 material, let alone -stable. Care to
> explain further?

The problem statement is a little terse, my apologies.

The workloads that were measurably affected for us were hit pretty
badly by it, with refault/majfault rates doubling and tripling during
cache transitions, and the machines sustaining half-hour periods of
100% IO utilization, where they'd previously have sub-minute peaks at
60-90%.

Stateful services that handle user data tend to be more conservative
with kernel upgrades. As a result we hit most page cache issues with
some delay, as was the case here.

The severity seemed to warrant a stable tag, but I agree that holding
out until 4.11.1 is probably better, given the invasiveness of this
patch.
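The mechanism described in the quoted changelog (sizing the inactive file list against the active list, and dropping active list protection while refaults are being observed) can be illustrated with a small standalone C model. This is a sketch under stated assumptions, loosely modeled on the kernel's inactive_list_is_low() heuristic in mm/vmscan.c and not copied from the patch: struct lru_model, its fields, isqrt(), inactive_is_low() and PAGE_SHIFT_MODEL are hypothetical names; 4 KiB pages are assumed; the sqrt(10 * size in GiB) target ratio is assumed to match the sizing introduced by 59dc76b0d4df; and only a single file LRU is modeled (no anon lists, memcg, or node distinctions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: assume 4 KiB pages, so 1 GiB = 1 << (30 - 12) pages. */
#define PAGE_SHIFT_MODEL 12

/* Hypothetical model of a single file LRU; sizes are in pages. */
struct lru_model {
    unsigned long inactive;      /* pages on the inactive list */
    unsigned long active;        /* pages on the active list */
    unsigned long refaults;      /* running count of refaults */
    unsigned long refaults_snap; /* count saved at the end of the last reclaim cycle */
};

/* Floor of the square root; stands in for the kernel's int_sqrt(). */
unsigned long isqrt(unsigned long x)
{
    unsigned long r = 0;

    while ((r + 1) * (r + 1) <= x)
        r++;
    return r;
}

/*
 * Returns true when the inactive list counts as too small, meaning reclaim
 * should move pages from the active list to the inactive list.
 */
bool inactive_is_low(const struct lru_model *lru)
{
    unsigned long ratio;

    assert(lru != NULL);

    if (lru->refaults != lru->refaults_snap) {
        /* Refaults since the last cycle: a new workingset is being
         * established, so drop active list protection (ratio 0 makes
         * any non-empty active list count as too big). */
        ratio = 0;
    } else {
        /* Stable period: allow the active list to be up to
         * sqrt(10 * size in GiB) times the inactive list, or 1:1 below 1 GiB. */
        unsigned long gb = (lru->inactive + lru->active) >> (30 - PAGE_SHIFT_MODEL);

        ratio = gb ? isqrt(10 * gb) : 1;
    }
    return lru->inactive * ratio < lru->active;
}
```

For example, with 16 GiB of file cache split 2 GiB inactive and 14 GiB active, gb is 16 and the target ratio is isqrt(160) = 12; since 2 GiB times 12 exceeds 14 GiB, the model leaves the active list protected. If the refault counter has moved since the last snapshot, the ratio drops to 0 and the same list sizes report the inactive list as low, so active pages get deactivated and the new workingset can compete for memory.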