From mboxrd@z Thu Jan 1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Wed, 1 Sep 2021 10:09:24 +0800
Subject: Re: Is it possible to implement the per-node page cache for programs/libraries?
To: Shijie Huang
Cc: Linus Torvalds, viro@zeniv.linux.org.uk, Andrew Morton, linux-mm@kvack.org, Barry Song, LKML, Frank Wang
Content-Type: text/plain; charset="UTF-8"
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 1, 2021 at 11:09 AM Shijie Huang wrote:
>
> Hi Everyone,
>
> On NUMA systems, we have only one page cache for each file. For
> programs and shared libraries, remote access takes longer than
> local access.
>
> So, is it possible to implement the per-node page cache for
> programs/libraries?

This is a very interesting topic. As far as I know, we do have some
"solutions" for this.

The MIPS kernel supports kernel text replication:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/mips/sgi-ip27/Kconfig
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/mips/sgi-ip27/ip27-klnuma.c

config REPLICATE_KTEXT
        bool "Kernel text replication support"
        depends on SGI_IP27
        select MAPPED_KERNEL
        help
          Say Y here to enable replicating the kernel text across multiple
          nodes in a NUMA cluster.  This trades memory for speed.

For x86, RedHawk Linux (https://www.concurrent-rt.com/solutions/linux/)
supports kernel text replication. Here are some benchmarks:
https://www.concurrent-rt.com/wp-content/uploads/2016/11/kernel-page-replication.pdf

For userspace, dplace from SGI can help replicate text:
https://www.spec.org/cpu2006/flags/SGI-platform.html

-r bl: specifies that text should be replicated on the NUMA node or
nodes where the process is running. 'b' indicates that binary (a.out)
text should be replicated; 'l' indicates that library text should be
replicated.

But all of the above, except MIPS ktext replication, are out of tree.

Please count me in if you have any solution or any pending patch. I am
interested in this topic.

>
> We can do it like this:
>
> 1.) Add a new system call to mark specific files as NUMA-aware,
> such as:
>
>         set_numa_aware("/usr/lib/libc.so", enable);
>
> After the system call, the page cache of libc.so has the
> flag "NUMA_ENABLED".
>
> 2.) When a new process tries to set up the MMU page tables for
> libc.so, the kernel checks whether NUMA_ENABLED is set. If it is
> set, the kernel provides a page that is bound to the process's
> NUMA node.
>
> In this way, we can eliminate remote access for programs and
> shared libraries.
>
> Is this proposal ok? Or do you have a better idea?
>
> Thanks
>
> Huang Shijie

Thanks
barry
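
For discussion, here is a minimal userspace sketch of the two quoted steps,
written as a toy Python model rather than kernel code. Everything in it is
an assumption made for illustration: set_numa_aware() and NUMA_ENABLED come
from the proposal and do not exist in Linux, and CachedFile, map_page() and
home_node are made-up names. It only models read-only text pages and ignores
writes, invalidation and reclaim.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CachedFile:
    """Toy stand-in for one file's page cache (not a kernel structure)."""
    pages: Dict[int, bytes]            # page index -> contents of the shared copy
    home_node: int = 0                 # node that holds the single shared copy
    numa_enabled: bool = False         # the proposed "NUMA_ENABLED" flag
    # (node, page index) -> node-local replica, filled lazily
    replicas: Dict[Tuple[int, int], bytes] = field(default_factory=dict)


def set_numa_aware(f: CachedFile, enable: bool) -> None:
    """Step 1 (hypothetical call): flag the file's page cache."""
    f.numa_enabled = enable
    if not enable:
        f.replicas.clear()             # back to the single shared copy


def map_page(f: CachedFile, index: int, node: int) -> Tuple[int, bytes]:
    """Step 2: return (node backing the mapping, page contents)."""
    if not f.numa_enabled or node == f.home_node:
        return f.home_node, f.pages[index]
    key = (node, index)
    if key not in f.replicas:
        f.replicas[key] = bytes(f.pages[index])   # copy once per node
    return node, f.replicas[key]


libc = CachedFile(pages={0: b"\x7fELF"}, home_node=0)
print(map_page(libc, 0, node=1)[0])    # 0: flag clear, node 1 reads remotely
set_numa_aware(libc, True)
print(map_page(libc, 0, node=1)[0])    # 1: node 1 now gets a local replica
print(len(libc.replicas))              # 1: one extra page of memory used
```

The replica count is the "trades memory for speed" cost that the
REPLICATE_KTEXT help text mentions: each node that touches a page keeps its
own copy.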