From: Johannes Weiner
To: Andres Freund, Rik van Riel, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 0/3] mm: support bigger cache workingsets and protect against writes
Date: Mon, 4 Apr 2016 13:13:35 -0400
Message-Id: <1459790018-6630-1-git-send-email-hannes@cmpxchg.org>

Hi,

this is a follow-up to http://www.spinics.net/lists/linux-mm/msg101739.html
where Andres reported his database workingset being pushed out by the
minimum size enforcement of the inactive file list - currently 50% of
cache - as well as repeatedly written file pages that are never
actually read.

Two changes fell out of the discussions. The first change observes
that pages that are only ever written don't benefit from caching
beyond what the writeback cache does for partial page writes, and so
we shouldn't promote them to the active file list where they compete
with pages whose cached data is actually accessed repeatedly. This
change comes in two patches - one for in-cache write accesses and one
for refaults triggered by writes, neither of which should promote a
cache page.

Second, with the refault detection we don't need to set 50% of the
cache aside for used-once cache anymore since we can detect frequently
used pages even when they are evicted between accesses. We can allow
the active list to be bigger and thus protect a bigger workingset that
isn't challenged by streamers.
Depending on the access patterns, this can increase major faults during
workingset transitions in exchange for better performance during stable
phases.

Andres, I tried reproducing your postgres scenario, but I could never
get the WAL to interfere even with wal_level = hot_standby mode. It's an
8G machine, I set shared_buffers = 2GB, ran pgbench -i -s 290, and then
-c 32 -j 32 -M prepared -t 150000. Any input on how to trigger the
thrashing you observed would be appreciated. Either way, it would be
great if you could test these patches on your known-problematic setup
as well.

Thanks!

 include/linux/memcontrol.h |  25 -----------
 mm/filemap.c               |   8 +++-
 mm/page_alloc.c            |  44 ------------------
 mm/vmscan.c                | 104 +++++++++++++++++--------------------------
 4 files changed, 48 insertions(+), 133 deletions(-)