From: Hasan Al Maruf
To: dave.hansen@linux.intel.com, ying.huang@intel.com, yang.shi@linux.alibaba.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/5] Transparent Page Placement for Tiered-Memory
Date: Wed, 24 Nov 2021 13:58:25 -0500

[resend in proper format]

With the advent of new memory types and technologies, a single system can now combine different types of memory, e.g. DRAM, PMEM, and CXL-enabled memory. In the near future, CXL memory is expected to appear in the physical address space as a CPU-less NUMA node alongside the native DDR memory channels. Because different types of memory have different performance characteristics, how pages are managed across these NUMA nodes matters.

Dave Hansen's patchset "Migrate Pages in lieu of discard" demotes toptier pages to a slow-tier node during reclaim.
https://lwn.net/Articles/860215/

However, that patchset does not promote pages from a slow-tier node back to the toptier node. As a result, pages that are demoted to, or newly allocated on, the slow-tier node suffer NUMA latency and hurt application performance.

In this patchset, we augment the existing AutoNUMA mechanism to promote pages from slow-tier nodes to toptier nodes. We decouple the reclaim and allocation logic for the toptier node so that reclaim is triggered at a higher watermark and demotes colder pages to slow-tier memory. As a result, toptier nodes maintain some free space to accept both new allocations and promotions from slow-tier nodes.

During promotion, we add hysteresis and only promote pages that are unlikely to be demoted again within a short period of time. This reduces the chance of a page ping-ponging between NUMA nodes due to frequent demotion and promotion.

We tested this patchset on systems with CXL-enabled DRAM and PMEM tiers.
We find that this patchset brings hotter pages to the toptier node while moving colder pages to the slow-tier nodes for a wide range of Meta production workloads with live traffic. As a result, toptier nodes serve more hot pages and application performance improves.

Case Study of a Meta cache application with two NUMA nodes
==========================================================

Toptier node: DRAM directly attached to the CPU
Slow-tier node: DRAM attached through CXL
Toptier vs. slow-tier memory capacity ratio: 1:4

With the default page placement policy, file caches fill up the toptier node and anons get trapped on the slow-tier node. Only 14% of the total anons reside on the toptier node, and remote NUMA reads account for 80% of the read bandwidth. Throughput regresses by 18% compared to serving all memory from the toptier node.

This patchset brings 80% of the anons to the toptier node; the anons left on the slow-tier node are mostly cold. Because the toptier node cannot host all of the hot memory, some hot file pages still remain on the slow-tier node. Even so, remote NUMA read bandwidth drops from 80% to 40%. With this patchset, the throughput regression is only 5% compared to the baseline of the toptier node serving the whole working set.
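The stated 1:4 capacity ratio means the toptier node provides one fifth of the machine's total memory capacity. Here C_top and C_slow are symbols introduced only to denote the two tiers' capacities:

$$\frac{C_{\mathrm{top}}}{C_{\mathrm{top}} + C_{\mathrm{slow}}} = \frac{1}{1 + 4} = 20\%$$

So the 80% of anons brought to the toptier node must fit, together with any file pages placed there, within that fifth of the total capacity.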
Hasan Al Maruf (5):
  Promotion and demotion related statistics
  NUMA balancing for tiered-memory system
  Decouple reclaim and allocation for toptier nodes
  Reclaim to satisfy WMARK_DEMOTE on toptier nodes
  active LRU-based promotion to avoid ping-pong

 Documentation/admin-guide/sysctl/kernel.rst | 18 +++++
 Documentation/admin-guide/sysctl/vm.rst     | 12 ++++
 include/linux/mempolicy.h                   | 11 ++-
 include/linux/mm.h                          |  4 ++
 include/linux/mmzone.h                      |  5 ++
 include/linux/node.h                        |  7 ++
 include/linux/page-flags.h                  |  9 +++
 include/linux/page_ext.h                    |  3 +
 include/linux/sched/numa_balancing.h        | 63 ++++++++++++++++-
 include/linux/sched/sysctl.h                |  6 ++
 include/linux/vm_event_item.h               | 13 ++++
 include/trace/events/mmflags.h              | 10 ++-
 kernel/sched/core.c                         | 36 ++++++++--
 kernel/sched/fair.c                         | 23 ++++++-
 kernel/sched/sched.h                        |  2 +
 kernel/sysctl.c                             | 19 ++++--
 mm/huge_memory.c                            | 29 +++++---
 mm/memory.c                                 | 15 +++-
 mm/mempolicy.c                              | 30 +++++++-
 mm/migrate.c                                | 48 ++++++++++---
 mm/mprotect.c                               |  8 ++-
 mm/page_alloc.c                             | 34 ++++++++-
 mm/vmscan.c                                 | 76 +++++++++++++++++++--
 mm/vmstat.c                                 | 20 +++++-
 24 files changed, 451 insertions(+), 50 deletions(-)

-- 
2.30.2