From mboxrd@z Thu Jan 1 00:00:00 1970
From: sj38.park@gmail.com
To: akpm@linux-foundation.org
Cc: SeongJae Park, Jonathan.Cameron@Huawei.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, amit@kernel.org,
	benh@kernel.crashing.org, brendanhiggins@google.com, corbet@lwn.net,
	david@redhat.com, dwmw@amazon.com, elver@google.com, fan.du@intel.com,
	foersleo@amazon.de, greg@kroah.com, gthelen@google.com,
	guoju.fgj@alibaba-inc.com, mgorman@suse.de, minchan@kernel.org,
	mingo@redhat.com, namhyung@kernel.org, peterz@infradead.org,
	riel@surriel.com, rientjes@google.com, rostedt@goodmis.org,
	rppt@kernel.org, shakeelb@google.com, shuah@kernel.org,
	sj38.park@gmail.com, snu@amazon.de, vbabka@suse.cz,
	vdavydov.dev@gmail.com, zgf574564920@gmail.com, linux-damon@amazon.com,
	linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v26 10/13] Documentation: Add documents for DAMON
Date: Tue, 30 Mar 2021 09:05:34 +0000
Message-Id: <20210330090537.12143-11-sj38.park@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210330090537.12143-1-sj38.park@gmail.com>
References: <20210330090537.12143-1-sj38.park@gmail.com>

From: SeongJae Park

This commit adds documents for DAMON under
`Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`.

Signed-off-by: SeongJae Park
---
 Documentation/admin-guide/mm/damon/guide.rst | 159 +++++++++++++
 Documentation/admin-guide/mm/damon/index.rst |  15 ++
 Documentation/admin-guide/mm/damon/plans.rst |  29 +++
 Documentation/admin-guide/mm/damon/start.rst |  97 ++++++++
 Documentation/admin-guide/mm/damon/usage.rst | 112 +++++++++
 Documentation/admin-guide/mm/index.rst       |   1 +
 Documentation/vm/damon/api.rst               |  20 ++
 Documentation/vm/damon/design.rst            | 166 +++++++++++++
 Documentation/vm/damon/eval.rst              | 232 +++++++++++++++++++
 Documentation/vm/damon/faq.rst               |  58 +++++
 Documentation/vm/damon/index.rst             |  31 +++
 Documentation/vm/index.rst                   |   1 +
 12 files changed, 921 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/damon/guide.rst
 create mode 100644 Documentation/admin-guide/mm/damon/index.rst
 create mode 100644 Documentation/admin-guide/mm/damon/plans.rst
 create mode 100644 Documentation/admin-guide/mm/damon/start.rst
 create mode 100644 Documentation/admin-guide/mm/damon/usage.rst
 create mode 100644 Documentation/vm/damon/api.rst
 create mode 100644 Documentation/vm/damon/design.rst
 create mode 100644 Documentation/vm/damon/eval.rst
 create mode 100644 Documentation/vm/damon/faq.rst
 create mode 100644 Documentation/vm/damon/index.rst

diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst
new file mode 100644
index 000000000000..49da40bc4ba9
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/guide.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+Optimization Guide
+==================
+
+This document helps you estimate the amount of benefit that you could get
+from DAMON-based optimizations, and describes how you can achieve it.  You
+are assumed to have already read :doc:`start`.
+
+
+Check The Signs
+===============
+
+No optimization can provide the same extent of benefit in every case.
+Therefore you should first guess how much improvement you could get using
+DAMON.  If some of the below conditions match your situation, you could
+consider using DAMON.
+
+- *Low IPC and High Cache Miss Ratios.*  Low IPC means most of the CPU time is
+  spent waiting for the completion of time-consuming operations such as memory
+  accesses, while high cache miss ratios mean the caches don't help them well.
+  DAMON is not for cache level optimization, but for DRAM level.  However,
+  improving DRAM management will also help this case by reducing the memory
+  operation latency.
+- *Memory Over-commitment and Unknown Users.*  If you are doing memory
+  overcommitment and you cannot control every user of your system, a memory
+  bank run could happen at any time.  You can estimate when it will happen
+  based on DAMON's monitoring results and act earlier to avoid or better deal
+  with the crisis.
+- *Frequent Memory Pressure.*  Frequent memory pressure means your system has
+  wrong configurations or memory hogs.  DAMON will help you find the right
+  configuration and/or the culprits.
+- *Heterogeneous Memory System.*  If your system is utilizing memory devices
+  that are placed between DRAM and traditional hard disks, such as non-volatile
+  memory or fast SSDs, DAMON could help you utilize the devices more
+  efficiently.
+
+
+Profile
+=======
+
+If you found some positive signals, you could start by profiling your
+workloads using DAMON.
+Find major workloads on your systems and analyze their data access patterns
+to find anything wrong or anything that can be improved.  The DAMON user
+space tool (``damo``) will be useful for this.  You can get ``damo`` from the
+``tools/damon/`` directory in the DAMON development tree (``damon/master``
+branch of https://github.com/sjp38/linux.git).
+
+We recommend starting with a working set size distribution check using ``damo
+report wss``.  If the distribution is non-uniform or quite different from what
+you estimated, you could consider `Memory Configuration`_ optimization.
+
+Then, review the overall access pattern in heatmap form using ``damo report
+heats``.  If it shows a simple pattern consisting of a small number of memory
+regions with a high contrast of access temperature, you could consider manual
+`Program Modification`_.
+
+If you still want more benefit, you could develop a `Personalized DAMON
+Application`_ for your special case.
+
+You don't need to take only one of the above approaches; you could use
+multiple of them together to maximize the benefit.
+
+
+Optimize
+========
+
+If the profiling result also says it's worth trying some optimization, you
+could consider the below approaches.  Note that some of them assume that your
+systems are configured with swap devices or other types of auxiliary memory,
+so that you are not strictly required to accommodate the whole working set in
+the main memory.  Most of the detailed optimizations should be based on a
+concrete understanding of your memory devices.
+
+
+Memory Configuration
+--------------------
+
+DRAM should be large enough to accommodate the important working sets, but no
+larger, because DRAM is highly performance critical but expensive and consumes
+a lot of power.  However, knowing the size of the really important working
+sets is difficult.  As a consequence, systems are usually equipped with
+unnecessarily large or too small DRAM.  Many problems stem from such wrong
+configurations.
+
+Using the working set size distribution report provided by ``damo report
+wss``, you can determine the appropriate DRAM size for your system.  For
+example, roughly speaking, if you only worry about the 95th percentile
+latency, you don't need to equip DRAM larger than the 95th percentile working
+set size.
+
+Let's see a real example.  This `page
+`_
+shows the heatmap and the working set size distributions/changes of the
+``freqmine`` workload in the PARSEC3 benchmark suite.  The working set size
+spikes up to 180 MiB, but stays smaller than 50 MiB for more than 95% of the
+time.  Even if you give only 50 MiB of memory to the workload, it will work
+well for 95% of the time, while saving 130 MiB of memory.
+
+
+Program Modification
+--------------------
+
+If the data access pattern heatmap plotted by ``damo report heats`` is simple
+enough for you to understand what is going on in the workload with your own
+eyes, you could manually optimize the memory management.
+
+For example, suppose that the workload has two big memory objects, but only
+one object is frequently accessed while the other one is only occasionally
+accessed.  Then, you could modify the program source code to keep the hot
+object in the main memory by invoking ``mlock()`` or ``madvise()`` with
+``MADV_WILLNEED``.  Or, you could proactively evict the cold object using
+``madvise()`` with ``MADV_COLD`` or ``MADV_PAGEOUT``.  Using both together
+would also be worthwhile.
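+
+Below is a minimal C sketch of this idea, given only as an illustration.  The
+use of anonymous mappings and the object sizes are hypothetical; apply the
+calls to whatever objects your heatmap identifies as hot or cold.  Note that
+``MADV_COLD`` and ``MADV_PAGEOUT`` need Linux 5.4 or later and a libc that
+exposes them, and that error handling is omitted for brevity. ::
+
+    #include <stddef.h>
+    #include <sys/mman.h>
+
+    int main(void)
+    {
+        size_t hot_sz = 100 << 20, cold_sz = 100 << 20;
+        /* anonymous mappings standing in for the two big objects */
+        char *hot = mmap(NULL, hot_sz, PROT_READ | PROT_WRITE,
+                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        char *cold = mmap(NULL, cold_sz, PROT_READ | PROT_WRITE,
+                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+        mlock(hot, hot_sz);                    /* keep the hot object in DRAM */
+        madvise(cold, cold_sz, MADV_PAGEOUT);  /* proactively evict the cold one */
+
+        /* ... the real workload would run here ... */
+        return 0;
+    }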
+
+A research work [1]_ using ``mlock()`` in this way achieved up to 2.55x
+performance speedup.
+
+Let's see another realistic example access pattern for this kind of
+optimization.  This `page
+`_
+shows the visualized access patterns of the streamcluster workload in the
+PARSEC3 benchmark suite.  We can easily identify the 100 MiB sized hot object.
+
+
+Personalized DAMON Application
+------------------------------
+
+The above approaches will work well for many general cases, but might not be
+enough for some special cases.
+
+If this is the case, it might be the time to forget the comfortable use of the
+user space tool and dive into the debugfs interface (refer to :doc:`usage` for
+the details) of DAMON.  Using the interface, you can control DAMON more
+flexibly.  Therefore, you can write your personalized DAMON application that
+controls the monitoring via the debugfs interface, analyzes the result, and
+applies complex optimizations itself.  This allows more creative and
+sophisticated optimizations.
+
+If you are a kernel space programmer, writing kernel space DAMON applications
+using the API (refer to :doc:`/vm/damon/api` for more detail) would be an
+option.
+
+
+Reference Practices
+===================
+
+Referencing previous successful practices could help you get a sense of this
+kind of optimization.  There is an academic paper [1]_ reporting the
+visualized access patterns and manual `Program Modification`_ results for a
+number of realistic workloads.  You can also get the visualized access
+patterns [3]_ [4]_ [5]_ and automated DAMON-based memory operation results for
+other realistic workloads, collected with the latest version of DAMON [2]_.
+
+.. [1] https://dl.acm.org/doi/10.1145/3366626.3368125
+.. [2] https://damonitor.github.io/test/result/perf/latest/html/
+.. [3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
+.. [4] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
+.. [5] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
diff --git a/Documentation/admin-guide/mm/damon/index.rst b/Documentation/admin-guide/mm/damon/index.rst
new file mode 100644
index 000000000000..0baae7a5402b
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/index.rst
@@ -0,0 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+Monitoring Data Accesses
+========================
+
+:doc:`DAMON </vm/damon/index>` allows light-weight data access monitoring.
+Using this, users can analyze and optimize their systems.
+
+.. toctree::
+   :maxdepth: 2
+
+   start
+   guide
+   usage
diff --git a/Documentation/admin-guide/mm/damon/plans.rst b/Documentation/admin-guide/mm/damon/plans.rst
new file mode 100644
index 000000000000..e3aa5ab96c29
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/plans.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Future Plans
+============
+
+DAMON is still in its first stage.  The below plans are still under
+development.
+
+
+Automate Data Access Monitoring-based Memory Operation Schemes Execution
+=========================================================================
+
+The ultimate goal of DAMON is to be used as a building block for data access
+pattern aware kernel memory management optimization.  It will make the system
+just work efficiently.  However, some users with very special workloads will
+want to do further optimization on their own.  DAMON will automate most of the
+tasks for such manual optimizations in the near future.  Users will only be
+required to describe, in a simple form, what kind of data access pattern-based
+operation schemes they want.
+
+By applying a very simple scheme for THP promotion/demotion with a prototype
+implementation, DAMON reduced 60% of the THP memory footprint overhead while
+preserving 50% of the THP performance benefit.  The detailed results can be
+seen on an external web page [1]_.
+
+Several RFC patchsets for this plan are available [2]_.
+
+.. [1] https://damonitor.github.io/test/result/perf/latest/html/
+.. [2] https://lore.kernel.org/linux-mm/20200616073828.16509-1-sjpark@amazon.com/
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
new file mode 100644
index 000000000000..69bac6782624
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Getting Started
+===============
+
+This document briefly describes how you can use DAMON by demonstrating its
+default user space tool.  Please note that this document describes only a part
+of its features for brevity.  Please refer to :doc:`usage` for more details.
+
+
+TL; DR
+======
+
+Follow the below five commands to monitor and visualize the access pattern of
+your workload. ::
+
+    $ git clone https://github.com/sjp38/linux -b damon/master
+    /* build the kernel with CONFIG_DAMON=y, install, reboot */
+    $ mount -t debugfs none /sys/kernel/debug/
+    $ cd linux/tools/damon
+    $ ./damo record $(pidof <your workload name>)
+    $ ./damo report heats --heatmap access_pattern.png
+
+
+Prerequisites
+=============
+
+Kernel
+------
+
+You should first ensure your system is running on a kernel built with
+``CONFIG_DAMON=y``.
+
+
+User Space Tool
+---------------
+
+For the demonstration, we will use the default user space tool for DAMON,
+called DAMON Operator (DAMO).  It is located at ``tools/damon/damo`` of the
+DAMON development kernel source tree (``damon/master`` branch of
+https://github.com/sjp38/linux).  For brevity, the below examples assume you
+have added it to your ``$PATH``.  That's not mandatory, though.
+
+Because DAMO uses the debugfs interface of DAMON (refer to :doc:`usage` for
+details), you should ensure debugfs is mounted.  Mount it manually as below::
+
+    # mount -t debugfs none /sys/kernel/debug/
+
+or append the below line to your ``/etc/fstab`` file so that your system
+automatically mounts debugfs at the next boot::
+
+    debugfs /sys/kernel/debug debugfs defaults 0 0
+
+
+Recording Data Access Patterns
+==============================
+
+The below commands record the memory access pattern of a program and save the
+monitoring results in a file. ::
+
+    $ git clone https://github.com/sjp38/masim
+    $ cd masim; make; ./masim ./configs/zigzag.cfg &
+    $ sudo damo record -o damon.data $(pidof masim)
+
+The first two lines of the commands download an artificial memory access
+generator program and run it in the background.  It will repeatedly access two
+100 MiB sized memory regions one by one.  You can substitute this with your
+real workload.  The last line asks ``damo`` to record the access pattern in
+the ``damon.data`` file.
+
+
+Visualizing Recorded Patterns
+=============================
+
+The below three commands visualize the recorded access patterns into three
+image files. ::
+
+    $ damo report heats --heatmap access_pattern_heatmap.png
+    $ damo report wss --range 0 101 1 --plot wss_dist.png
+    $ damo report wss --range 0 101 1 --sortby time --plot wss_chron_change.png
+
+- ``access_pattern_heatmap.png`` will show the data access pattern in a
+  heatmap, which shows when (x-axis) which memory region (y-axis) was how
+  frequently accessed (color).
+- ``wss_dist.png`` will show the distribution of the working set size.
+- ``wss_chron_change.png`` will show how the working set size has
+  chronologically changed.
+
+You can view the images in a web page [1]_.  Those made with other realistic
+workloads are also available [2]_ [3]_ [4]_.
+
+.. [1] https://damonitor.github.io/doc/html/v17/admin-guide/mm/damon/start.html#visualizing-recorded-patterns
+.. [2] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
+.. [3] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
+.. [4] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
new file mode 100644
index 000000000000..ccf631c3c2c2
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -0,0 +1,112 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Detailed Usages
+===============
+
+DAMON provides the below three interfaces for different users.
+
+- *DAMON user space tool.*
+  This is for privileged people such as system administrators who want a
+  just-working, human-friendly interface.  Using this, users can use DAMON's
+  major features in a human-friendly way.  It may not be highly tuned for
+  special cases, though.  It supports only virtual address space monitoring.
+- *debugfs interface.*
+  This is for privileged user space programmers who want more optimized use of
+  DAMON.  Using this, users can use DAMON's major features by reading from and
+  writing to special debugfs files.  Therefore, you can write and use your
+  personalized DAMON debugfs wrapper programs that read and write the debugfs
+  files for you (a minimal sketch of such a wrapper follows this list).  The
+  DAMON user space tool is also a reference implementation of such programs.
+  It supports only virtual address space monitoring.
+- *Kernel Space Programming Interface.*
+  This is for kernel space programmers.  Using this, users can utilize every
+  feature of DAMON most flexibly and efficiently by writing kernel space
+  DAMON application programs themselves.  You can even extend DAMON for
+  various address spaces.
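+
+For illustration only, below is a minimal sketch of such a wrapper in C.  It
+assumes debugfs is mounted on ``/sys/kernel/debug``, a kernel built with
+``CONFIG_DAMON=y``, and the ``attrs``, ``target_ids``, and ``monitor_on``
+debugfs files described in the following sections.  The attribute values and
+the 10 second monitoring duration are arbitrary examples, and error handling
+is omitted for brevity. ::
+
+    #include <stdio.h>
+    #include <unistd.h>
+
+    static void damon_write(const char *file, const char *val)
+    {
+        char path[128];
+        FILE *f;
+
+        snprintf(path, sizeof(path), "/sys/kernel/debug/damon/%s", file);
+        f = fopen(path, "w");
+        fprintf(f, "%s\n", val);
+        fclose(f);
+    }
+
+    int main(int argc, char *argv[])
+    {
+        damon_write("attrs", "5000 100000 1000000 10 1000");
+        damon_write("target_ids", argv[1]);  /* pid of the target process */
+        damon_write("monitor_on", "on");     /* start monitoring */
+        sleep(10);
+        damon_write("monitor_on", "off");    /* stop monitoring */
+        return 0;
+    }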
+
+This document describes only the debugfs interface, because the user space
+tool is available only in the development tree (``damon/master`` branch of
+https://github.com/sjp38/linux); for the kernel space programming interface,
+please refer to :doc:`/vm/damon/api`.
+
+
+debugfs Interface
+=================
+
+DAMON exports three files, ``attrs``, ``target_ids``, and ``monitor_on``,
+under its debugfs directory, ``<debugfs>/damon/``.
+
+
+Attributes
+----------
+
+Users can get and set the ``sampling interval``, ``aggregation interval``,
+``regions update interval``, and min/max number of monitoring target regions
+by reading from and writing to the ``attrs`` file.  To know about the
+monitoring attributes in detail, please refer to :doc:`/vm/damon/design`.  For
+example, the below commands set those values to 5 ms, 100 ms, 1,000 ms, 10,
+and 1000, and then check them again::
+
+    # cd <debugfs>/damon
+    # echo 5000 100000 1000000 10 1000 > attrs
+    # cat attrs
+    5000 100000 1000000 10 1000
+
+
+Target IDs
+----------
+
+Some types of address spaces support multiple monitoring targets.  For
+example, virtual memory address space monitoring can have multiple processes
+as the monitoring targets.  Users can set the targets by writing the relevant
+id values of the targets to, and get the ids of the current targets by reading
+from, the ``target_ids`` file.  In the case of virtual address space
+monitoring, the values should be the pids of the monitoring target processes.
+For example, the below commands set processes having pids 42 and 4242 as the
+monitoring targets and check them again::
+
+    # cd <debugfs>/damon
+    # echo 42 4242 > target_ids
+    # cat target_ids
+    42 4242
+
+Note that setting the target ids doesn't start the monitoring.
+
+
+Turning On/Off
+--------------
+
+Setting the files as described above doesn't have any effect unless you
+explicitly start the monitoring.  You can start, stop, and check the current
+status of the monitoring by writing to and reading from the ``monitor_on``
+file.  Writing ``on`` to the file starts the monitoring of the targets with
+the attributes.  Writing ``off`` to the file stops the monitoring.  DAMON also
+stops if every target process is terminated.  The below example commands turn
+DAMON on and off and check its status::
+
+    # cd <debugfs>/damon
+    # echo on > monitor_on
+    # echo off > monitor_on
+    # cat monitor_on
+    off
+
+Please note that you cannot write to the above-mentioned debugfs files while
+the monitoring is turned on.  If you write to the files while DAMON is
+running, an error code such as ``-EBUSY`` will be returned.
+
+
+Tracepoint for Monitoring Results
+==================================
+
+DAMON provides the monitoring results via a tracepoint,
+``damon:damon_aggregated``.  While the monitoring is turned on, you could
+record the tracepoint events and show the results using tracepoint supporting
+tools like ``perf``.  For example::
+
+    # echo on > monitor_on
+    # perf record -e damon:damon_aggregated &
+    # sleep 5
+    # kill -9 $(pidof perf)
+    # echo off > monitor_on
+    # perf script
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 4b14d8b50e9e..cbd19d5e625f 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -27,6 +27,7 @@ the Linux memory management.
 
    concepts
    cma_debugfs
+   damon/index
    hugetlbpage
    idle_page_tracking
    ksm
diff --git a/Documentation/vm/damon/api.rst b/Documentation/vm/damon/api.rst
new file mode 100644
index 000000000000..08f34df45523
--- /dev/null
+++ b/Documentation/vm/damon/api.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+API Reference
+=============
+
+Kernel space programs can use every feature of DAMON using the below APIs.
+All you need to do is include ``damon.h``, which is located in
+``include/linux/`` of the source tree.
+
+Structures
+==========
+
+.. kernel-doc:: include/linux/damon.h
+
+
+Functions
+=========
+
+.. kernel-doc:: mm/damon/core.c
diff --git a/Documentation/vm/damon/design.rst b/Documentation/vm/damon/design.rst
new file mode 100644
index 000000000000..727d72093f8f
--- /dev/null
+++ b/Documentation/vm/damon/design.rst
@@ -0,0 +1,166 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+Design
+======
+
+Configurable Layers
+===================
+
+DAMON provides data access monitoring functionality while keeping the accuracy
+and the overhead controllable.  The fundamental access monitoring requires
+primitives that depend on, and are optimized for, the target address space.
+On the other hand, the accuracy and overhead tradeoff mechanism, which is the
+core of DAMON, is in pure logic space.  DAMON separates the two parts into
+different layers and defines an interface between them, so that various low
+level primitive implementations can be configured with the core logic.
+
+Due to this separated design and the configurable interface, users can extend
+DAMON for any address space by configuring the core logic with appropriate low
+level primitive implementations.  If an appropriate one is not provided, users
+can implement the primitives on their own.
+
+For example, physical memory, virtual memory, swap space, those for specific
+processes, NUMA nodes, files, and backing memory devices would be supportable.
+Also, if some architectures or devices support special optimized access check
+primitives, those will be easily configurable.
+
+
+Reference Implementations of Address Space Specific Primitives
+===============================================================
+
+The low level primitives for the fundamental access monitoring are defined in
+two parts:
+
+1. Identification of the monitoring target address range for the address
+   space.
+2. Access check of specific address ranges in the target space.
+
+DAMON currently provides implementations of the primitives for only the
+virtual address space.  The below two subsections describe how they work.
+
+
+PTE Accessed-bit Based Access Check
+-----------------------------------
+
+The implementation for the virtual address space uses the PTE Accessed bit for
+basic access checks.  It finds the relevant PTE Accessed bit for an address by
+walking the page table of the target task.  In this way, the implementation
+finds and clears the bit for the next sampling target address and checks
+whether the bit is set again after one sampling period.  This could disturb
+other kernel subsystems using the Accessed bits, namely Idle page tracking and
+the reclaim logic.
+To avoid such disturbances, DAMON makes it mutually exclusive with Idle page
+tracking and uses the ``PG_idle`` and ``PG_young`` page flags to solve the
+conflict with the reclaim logic, as Idle page tracking does.
+
+
+VMA-based Target Address Range Construction
+-------------------------------------------
+
+Only small parts of the super-huge virtual address space of a process are
+mapped to physical memory and accessed.  Thus, tracking the unmapped address
+regions is just wasteful.  However, because DAMON can deal with some level of
+noise using the adaptive regions adjustment mechanism, tracking every mapping
+is not strictly required, and could even incur a high overhead in some cases.
+That said, very huge unmapped areas inside the monitoring target should be
+removed so that the adaptive mechanism does not waste time on them.
+
+For this reason, this implementation converts the complex mappings to three
+distinct regions that cover every mapped area of the address space.  The two
+gaps between the three regions are the two biggest unmapped areas in the given
+address space.  In most cases, the two biggest unmapped areas are the gap
+between the heap and the uppermost mmap()-ed region, and the gap between the
+lowermost mmap()-ed region and the stack.  Because these gaps are
+exceptionally huge in usual address spaces, excluding them is sufficient to
+make a reasonable trade-off.  The below shows this in detail::
+
+    <heap>
+    <BIG UNMAPPED REGION 1>
+    <uppermost mmap()-ed region>
+    (small mmap()-ed regions and munmap()-ed regions)
+    <lowermost mmap()-ed region>
+    <BIG UNMAPPED REGION 2>
+    <stack>
+
+
+Address Space Independent Core Mechanisms
+==========================================
+
+The below four sections describe each of the DAMON core mechanisms and the
+five monitoring attributes: ``sampling interval``, ``aggregation interval``,
+``regions update interval``, ``minimum number of regions``, and ``maximum
+number of regions``.
+
+
+Access Frequency Monitoring
+---------------------------
+
+The output of DAMON describes which pages were how frequently accessed during
+a given duration.  The resolution of the access frequency is controlled by
+setting the ``sampling interval`` and the ``aggregation interval``.  In
+detail, DAMON checks access to each page per ``sampling interval`` and
+aggregates the results; in other words, it counts the number of accesses to
+each page.  After each ``aggregation interval`` passes, DAMON calls the
+callback functions that users have previously registered, so that users can
+read the aggregated results, and then clears the results.  This can be
+described by the below simple pseudo-code::
+
+    while monitoring_on:
+        for page in monitoring_target:
+            if accessed(page):
+                nr_accesses[page] += 1
+        if time() % aggregation_interval == 0:
+            for callback in user_registered_callbacks:
+                callback(monitoring_target, nr_accesses)
+            for page in monitoring_target:
+                nr_accesses[page] = 0
+        sleep(sampling interval)
+
+The monitoring overhead of this mechanism will arbitrarily increase as the
+size of the target workload grows.
+
+
+Region Based Sampling
+---------------------
+
+To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
+that are assumed to have the same access frequency into a region.  As long as
+the assumption (pages in a region have the same access frequency) is kept,
+only one page in the region needs to be checked.
+Thus, for each ``sampling interval``, DAMON randomly picks one page in each
+region, waits for one ``sampling interval``, checks whether the page was
+accessed in the meantime, and increases the access frequency of the region if
+so.  Therefore, the monitoring overhead is controllable by setting the number
+of regions.  DAMON allows users to set the minimum and the maximum number of
+regions for the trade-off.
+
+This scheme, however, cannot preserve the quality of the output if the
+assumption is not guaranteed.
+
+
+Adaptive Regions Adjustment
+---------------------------
+
+Even if the initial monitoring target regions are well constructed to fulfill
+the assumption (pages in the same region have similar access frequencies), the
+data access pattern can change dynamically.  This will result in low
+monitoring quality.  To keep the assumption as much as possible, DAMON
+adaptively merges and splits each region based on its access frequency.
+
+For each ``aggregation interval``, it compares the access frequencies of
+adjacent regions and merges them if the frequency difference is small.  Then,
+after it reports and clears the aggregated access frequency of each region, it
+splits each region into two or three regions if the total number of regions
+will not exceed the user-specified maximum number of regions after the split.
+
+In this way, DAMON provides its best-effort quality and minimal overhead while
+keeping the bounds users set for their trade-off.
+
+
+Dynamic Target Space Updates Handling
+-------------------------------------
+
+The monitoring target address range could change dynamically.  For example,
+virtual memory could be dynamically mapped and unmapped.  Physical memory
+could be hot-plugged.
+
+As such changes could be quite frequent in some cases, DAMON checks the
+dynamic memory mapping changes and applies them to the abstracted target area
+only after each user-specified time interval (``regions update interval``).
diff --git a/Documentation/vm/damon/eval.rst b/Documentation/vm/damon/eval.rst
new file mode 100644
index 000000000000..4ce1a6d86036
--- /dev/null
+++ b/Documentation/vm/damon/eval.rst
@@ -0,0 +1,232 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+Evaluation
+==========
+
+DAMON is lightweight.  It increases system memory usage by 0.39% and slows
+target workloads down by 1.16%.
+
+DAMON is accurate and useful for memory management optimizations.  An
+experimental DAMON-based operation scheme for THP, namely 'ethp', removes
+76.15% of THP memory overheads while preserving 51.25% of the THP speedup.
+Another experimental DAMON-based 'proactive reclamation' implementation,
+namely 'prcl', reduces resident sets by 93.38% and system memory footprint by
+23.63% while incurring only 1.22% runtime overhead in the best case
+(parsec3/freqmine).
+
+
+Setup
+=====
+
+On QEMU/KVM based virtual machines utilizing 130GB of RAM and 36 vCPUs, hosted
+by AWS EC2 i3.metal instances and running a kernel with the v24 DAMON patchset
+applied, I measure runtime and consumed system memory while running various
+realistic workloads with several configurations.  From each of the PARSEC3
+[3]_ and SPLASH-2X [4]_ benchmark suites I pick 12 workloads, so I use 24
+workloads in total.  I use wrapper scripts [5]_ for convenient setup and
+running of the workloads.
+
+
+Measurement
+-----------
+
+To measure the amount of memory consumed in system-global scope, I drop caches
+before starting each of the workloads and monitor 'MemFree' in the
+'/proc/meminfo' file.  To make the results more stable, I repeat each run 5
+times and average the results.
+
+
+Configurations
+--------------
+
+The configurations I use are as below.
+
+- orig: Linux v5.10 with the 'madvise' THP policy
+- rec: 'orig' plus DAMON running with virtual memory access recording
+- prec: 'orig' plus DAMON running with physical memory access recording
+- thp: same as 'orig', but with the 'always' THP policy
+- ethp: 'orig' plus a DAMON operation scheme, 'efficient THP'
+- prcl: 'orig' plus a DAMON operation scheme, 'proactive reclaim [6]_'
+
+I use 'rec' to measure DAMON's overhead on the target workloads and system
+memory.  'prec' is for physical memory monitoring and recording.  It monitors
+a 17GB sized 'System RAM' region.  The remaining configs, including 'thp',
+'ethp', and 'prcl', are for measuring DAMON's monitoring accuracy.
+
+'ethp' and 'prcl' are simple DAMON-based operation schemes developed as proofs
+of concept of DAMON.  'ethp' reduces the memory space waste of THP by using
+DAMON for the decisions on huge page promotions and demotions, while 'prcl' is
+similar to the original proactive reclamation work.  For example, they can be
+implemented as below::
+
+    # format: <min/max size> <min/max access rate> <min/max age> <action>
+    # ethp: Use huge pages if a region shows >=5% access rate, use regular
+    # pages if a region >=2MB shows 0 access rate for >=7 seconds
+    min max 5 max min max hugepage
+    2M max min min 7s max nohugepage
+
+    # prcl: If a region >=4KB shows 0 access rate for >=5 seconds, page out.
+    4K max 0 0 5s max pageout
+
+Note that these examples are designed only with straightforward intuition,
+because they are only for proof of concept and for checking DAMON's monitoring
+accuracy.  In other words, they are not for production use; for production,
+they should be further tuned.  For automation of such tuning, you can use a
+user space tool called DAMOOS [8]_.  For the evaluation, we use 'ethp' exactly
+as in the above example, but we use a DAMOOS-tuned 'prcl' for each workload.
+
+The evaluation is done using the tests package for DAMON, ``damon-tests``
+[7]_.  Using it, you can do the evaluation and generate a report on your own.
+
+.. [1] "Redis latency problems troubleshooting", https://redis.io/topics/latency
+.. [2] "Disable Transparent Huge Pages (THP)",
+   https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
+.. [3] "The PARSEC Benchmark Suite", https://parsec.cs.princeton.edu/index.htm
+.. [4] "SPLASH-2x", https://parsec.cs.princeton.edu/parsec3-doc.htm#splash2x
+.. [5] "parsec3_on_ubuntu", https://github.com/sjp38/parsec3_on_ubuntu
+.. [6] "Proactively reclaiming idle memory", https://lwn.net/Articles/787611/
+.. [7] "damon-tests", https://github.com/awslabs/damon-tests
+.. [8] "DAMOOS", https://github.com/awslabs/damoos
+
+
+Results
+=======
+
+The below two tables show the measurement results.  The runtimes are in
+seconds while the memory usages are in KiB.
+Each configuration except 'orig' shows its overhead relative to 'orig' in
+percent within parentheses. ::
+
+    runtime orig rec (overhead) prec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
+    parsec3/blackscholes 139.658 140.168 (0.37) 139.385 (-0.20) 138.367 (-0.92) 139.279 (-0.27) 147.024 (5.27)
+    parsec3/bodytrack 123.788 124.622 (0.67) 123.636 (-0.12) 125.115 (1.07) 123.840 (0.04) 141.928 (14.65)
+    parsec3/canneal 207.491 210.318 (1.36) 217.927 (5.03) 174.287 (-16.00) 202.609 (-2.35) 225.483 (8.67)
+    parsec3/dedup 18.292 18.301 (0.05) 18.378 (0.47) 18.264 (-0.15) 18.298 (0.03) 20.541 (12.30)
+    parsec3/facesim 343.893 340.286 (-1.05) 338.217 (-1.65) 332.953 (-3.18) 333.840 (-2.92) 365.650 (6.33)
+    parsec3/fluidanimate 339.959 326.886 (-3.85) 330.286 (-2.85) 331.239 (-2.57) 326.011 (-4.10) 341.684 (0.51)
+    parsec3/freqmine 445.987 436.332 (-2.16) 435.946 (-2.25) 435.704 (-2.31) 437.595 (-1.88) 451.414 (1.22)
+    parsec3/raytrace 184.106 182.158 (-1.06) 182.056 (-1.11) 183.180 (-0.50) 183.545 (-0.30) 202.197 (9.83)
+    parsec3/streamcluster 599.990 674.091 (12.35) 617.314 (2.89) 521.864 (-13.02) 551.971 (-8.00) 696.127 (16.02)
+    parsec3/swaptions 220.462 222.637 (0.99) 220.449 (-0.01) 219.921 (-0.25) 221.607 (0.52) 223.956 (1.59)
+    parsec3/vips 87.767 88.700 (1.06) 87.461 (-0.35) 87.466 (-0.34) 87.875 (0.12) 91.768 (4.56)
+    parsec3/x264 110.843 117.856 (6.33) 113.023 (1.97) 108.665 (-1.97) 115.434 (4.14) 117.811 (6.29)
+    splash2x/barnes 131.441 129.275 (-1.65) 128.341 (-2.36) 119.317 (-9.22) 126.199 (-3.99) 147.602 (12.30)
+    splash2x/fft 59.705 58.382 (-2.22) 58.858 (-1.42) 45.949 (-23.04) 59.939 (0.39) 64.548 (8.11)
+    splash2x/lu_cb 132.552 131.604 (-0.72) 131.846 (-0.53) 132.320 (-0.18) 132.100 (-0.34) 140.289 (5.84)
+    splash2x/lu_ncb 150.215 149.670 (-0.36) 149.646 (-0.38) 148.823 (-0.93) 149.416 (-0.53) 152.338 (1.41)
+    splash2x/ocean_cp 84.033 76.405 (-9.08) 75.104 (-10.63) 73.487 (-12.55) 77.789 (-7.43) 77.380 (-7.92)
+    splash2x/ocean_ncp 153.833 154.247 (0.27) 156.227 (1.56) 106.619 (-30.69) 139.299 (-9.45) 165.030 (7.28)
+    splash2x/radiosity 143.566 143.654 (0.06) 142.426 (-0.79) 141.193 (-1.65) 141.740 (-1.27) 157.817 (9.93)
+    splash2x/radix 49.984 49.996 (0.02) 50.519 (1.07) 46.573 (-6.82) 50.724 (1.48) 50.695 (1.42)
+    splash2x/raytrace 133.238 134.337 (0.83) 134.389 (0.86) 134.833 (1.20) 131.073 (-1.62) 145.541 (9.23)
+    splash2x/volrend 121.700 120.652 (-0.86) 120.560 (-0.94) 120.629 (-0.88) 119.581 (-1.74) 129.422 (6.35)
+    splash2x/water_nsquared 370.771 375.236 (1.20) 376.829 (1.63) 355.592 (-4.09) 354.087 (-4.50) 419.606 (13.17)
+    splash2x/water_spatial 133.295 132.931 (-0.27) 132.762 (-0.40) 133.090 (-0.15) 133.809 (0.39) 153.647 (15.27)
+    total 4486.580 4538.750 (1.16) 4481.600 (-0.11) 4235.430 (-5.60) 4357.660 (-2.87) 4829.510 (7.64)
+
+
+    memused.avg orig rec (overhead) prec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
+    parsec3/blackscholes 1828693.600 1834084.000 (0.29) 1823839.800 (-0.27) 1819296.600 (-0.51) 1830281.800 (0.09) 1603975.800 (-12.29)
+    parsec3/bodytrack 1424963.400 1440085.800 (1.06) 1438384.200 (0.94) 1421718.400 (-0.23) 1432834.600 (0.55) 1439283.000 (1.00)
+    parsec3/canneal 1036782.600 1052828.800 (1.55) 1050148.600 (1.29) 1035104.400 (-0.16) 1051145.400 (1.39) 1050019.400 (1.28)
+    parsec3/dedup 2511841.400 2507374.000 (-0.18) 2472450.600 (-1.57) 2523557.600 (0.47) 2508912.000 (-0.12) 2493347.200 (-0.74)
+    parsec3/facesim 537769.800 550740.800 (2.41) 548683.600 (2.03) 543547.800 (1.07) 560556.600 (4.24) 482782.600 (-10.23)
+    parsec3/fluidanimate 570268.600 585598.000 (2.69) 579837.800 (1.68) 571433.000 (0.20) 582112.800 (2.08) 470073.400 (-17.57)
+    parsec3/freqmine 982941.400 996253.200 (1.35) 993919.800 (1.12) 990531.800 (0.77) 1000994.400 (1.84) 750685.800 (-23.63)
+    parsec3/raytrace 1737446.000 1749908.800 (0.72) 1741183.800 (0.22) 1726674.800 (-0.62) 1748530.200 (0.64) 1552275.600 (-10.66)
+    parsec3/streamcluster 115857.000 155194.400 (33.95) 158272.800 (36.61) 122125.200 (5.41) 134545.600 (16.13) 133448.600 (15.18)
+    parsec3/swaptions 13694.200 28451.800 (107.76) 28464.600 (107.86) 12797.800 (-6.55) 25328.200 (84.96) 28138.400 (105.48)
+    parsec3/vips 2976126.400 3002408.600 (0.88) 3008218.800 (1.08) 2978258.600 (0.07) 2995428.600 (0.65) 2936338.600 (-1.34)
+    parsec3/x264 3233886.200 3258790.200 (0.77) 3248355.000 (0.45) 3232070.000 (-0.06) 3256360.200 (0.69) 3254707.400 (0.64)
+    splash2x/barnes 1210470.600 1211918.600 (0.12) 1204507.000 (-0.49) 1210892.800 (0.03) 1217414.800 (0.57) 944053.400 (-22.01)
+    splash2x/fft 9697440.000 9604535.600 (-0.96) 9210571.800 (-5.02) 9867368.000 (1.75) 9637571.800 (-0.62) 9804092.000 (1.10)
+    splash2x/lu_cb 510680.400 521792.600 (2.18) 517724.600 (1.38) 513500.800 (0.55) 519980.600 (1.82) 351787.000 (-31.11)
+    splash2x/lu_ncb 512896.200 529353.600 (3.21) 521248.600 (1.63) 513493.200 (0.12) 523793.400 (2.12) 418701.600 (-18.37)
+    splash2x/ocean_cp 3320800.200 3313688.400 (-0.21) 3225585.000 (-2.87) 3359032.200 (1.15) 3316591.800 (-0.13) 3304702.200 (-0.48)
+    splash2x/ocean_ncp 3915132.400 3917401.000 (0.06) 3884086.400 (-0.79) 7050398.600 (80.08) 4532528.600 (15.77) 3920395.800 (0.13)
+    splash2x/radiosity 1456908.200 1467611.800 (0.73) 1453612.600 (-0.23) 1466695.400 (0.67) 1467495.600 (0.73) 421197.600 (-71.09)
+    splash2x/radix 2345874.600 2318202.200 (-1.18) 2261499.200 (-3.60) 2438228.400 (3.94) 2373697.800 (1.19) 2336605.600 (-0.40)
+    splash2x/raytrace 43258.800 57624.200 (33.21) 55164.600 (27.52) 46204.400 (6.81) 60475.000 (39.80) 48865.400 (12.96)
+    splash2x/volrend 149615.000 163809.400 (9.49) 162115.400 (8.36) 149119.600 (-0.33) 162747.800 (8.78) 157734.600 (5.43)
+    splash2x/water_nsquared 40384.400 54848.600 (35.82) 53796.600 (33.21) 41455.800 (2.65) 53226.400 (31.80) 58260.600 (44.27)
+    splash2x/water_spatial 670580.200 680444.200 (1.47) 670020.400 (-0.08) 668262.400 (-0.35) 678552.000 (1.19) 372931.000 (-44.39)
+    total 40844300.000 41002900.000 (0.39) 40311600.000 (-1.30) 44301900.000 (8.47) 41671052.000 (2.02) 38334431.000 (-6.14)
+
+
+DAMON Overheads
+---------------
+
+In total, the DAMON virtual memory access recording feature ('rec') incurs
+1.16% runtime overhead and 0.39% memory space overhead.  Even though the size
+of the monitoring target region becomes much larger with the physical memory
+access recording ('prec'), it still shows only a modest amount of overhead
+(-0.11% for runtime and -1.30% for memory footprint).
+
+For a convenient test run of 'rec' and 'prec', I use a Python wrapper.  The
+wrapper constantly consumes about 10-15MB of memory.  This becomes a high
+memory overhead if the target workload has a small memory footprint.
+Nonetheless, the overheads are not from DAMON, but from the wrapper, and thus
+should be ignored.  This fake memory overhead continues in 'ethp' and 'prcl',
+as those configurations also use the Python wrapper.
+
+
+Efficient THP
+-------------
+
+The THP 'always' policy achieves 5.60% speedup but incurs 8.47% memory
+overhead.  It achieves 30.69% speedup in the best case, but 80.08% memory
+overhead in the worst case.  Interestingly, both the best and the worst case
+are with 'splash2x/ocean_ncp'.
+
+The two-line implementation of the data access monitoring based THP version
+('ethp') shows 2.87% speedup and 2.02% memory overhead.  In other words,
+'ethp' removes 76.15% of the THP memory waste while preserving 51.25% of the
+THP speedup in total.  In the case of 'splash2x/ocean_ncp', 'ethp' removes
+80.30% of the THP memory waste while preserving 30.79% of the THP speedup.
+
+
+Proactive Reclamation
+---------------------
+
+Similar to the original work, I use a 4G 'zram' swap device for this
+configuration.  Also note that we use a DAMOOS-tuned 'prcl' scheme for each
+workload.
+
+Our one-line implementation of proactive reclamation, 'prcl', incurs 7.64%
+runtime overhead in total while achieving a 6.14% system memory footprint
+reduction.  Even in the worst case, the runtime overhead is only 16.02%.
+
+Nonetheless, as the memory usage is calculated with 'MemFree' in
+'/proc/meminfo', it includes the SwapCached pages.  As the SwapCached pages
+can be easily evicted, I also measured the resident set size of the
+workloads::
+
+    rss.avg orig rec (overhead) prec (overhead) thp (overhead) ethp (overhead) prcl (overhead)
+    parsec3/blackscholes 587536.800 585720.000 (-0.31) 586233.400 (-0.22) 587045.400 (-0.08) 586753.400 (-0.13) 252207.400 (-57.07)
+    parsec3/bodytrack 32302.200 32290.600 (-0.04) 32261.800 (-0.13) 32215.800 (-0.27) 32173.000 (-0.40) 6798.800 (-78.95)
+    parsec3/canneal 842370.600 841443.400 (-0.11) 844012.400 (0.19) 838074.400 (-0.51) 841700.800 (-0.08) 840804.000 (-0.19)
+    parsec3/dedup 1180414.800 1164634.600 (-1.34) 1188886.200 (0.72) 1207821.000 (2.32) 1193896.200 (1.14) 572359.200 (-51.51)
+    parsec3/facesim 311848.400 311709.800 (-0.04) 311790.800 (-0.02) 317345.800 (1.76) 315443.400 (1.15) 188488.000 (-39.56)
+    parsec3/fluidanimate 531868.000 531885.600 (0.00) 531828.800 (-0.01) 532988.000 (0.21) 532959.600 (0.21) 415153.200 (-21.94)
+    parsec3/freqmine 552491.000 552718.600 (0.04) 552807.200 (0.06) 556574.200 (0.74) 554374.600 (0.34) 36573.400 (-93.38)
+    parsec3/raytrace 879683.400 880752.200 (0.12) 879907.000 (0.03) 870631.000 (-1.03) 880952.200 (0.14) 293119.200 (-66.68)
+    parsec3/streamcluster 110991.800 110937.200 (-0.05) 110964.600 (-0.02) 115606.800 (4.16) 116199.000 (4.69) 110108.200 (-0.80)
+    parsec3/swaptions 5665.000 5718.400 (0.94) 5720.600 (0.98) 5682.200 (0.30) 5628.600 (-0.64) 3613.800 (-36.21)
+    parsec3/vips 32143.600 31823.200 (-1.00) 31912.200 (-0.72) 33164.200 (3.18) 33925.800 (5.54) 27813.600 (-13.47)
+    parsec3/x264 81534.000 81811.000 (0.34) 81708.400 (0.21) 83052.400 (1.86) 83758.800 (2.73) 81691.800 (0.19)
+    splash2x/barnes 1220515.200 1218291.200 (-0.18) 1217699.600 (-0.23) 1228551.600 (0.66) 1220669.800 (0.01) 681096.000 (-44.20)
+    splash2x/fft 9915850.400 10036461.000 (1.22) 9881242.800 (-0.35) 10334603.600 (4.22) 10006993.200 (0.92) 8975181.200 (-9.49)
+    splash2x/lu_cb 511327.200 511679.000 (0.07) 511761.600 (0.08) 511971.600 (0.13) 511711.200 (0.08) 338005.000 (-33.90)
+    splash2x/lu_ncb 511505.000 506816.800 (-0.92) 511392.800 (-0.02) 496623.000 (-2.91) 511410.200 (-0.02) 404734.000 (-20.87)
+    splash2x/ocean_cp 3398834.000 3405017.800 (0.18) 3415287.800 (0.48) 3443604.600 (1.32) 3416264.200 (0.51) 3387134.000 (-0.34)
+    splash2x/ocean_ncp 3947092.800 3939805.400 (-0.18) 3952311.600 (0.13) 7165858.800 (81.55) 4610075.000 (16.80) 3944753.400 (-0.06)
+    splash2x/radiosity 1475024.000 1474053.200 (-0.07) 1475032.400 (0.00) 1483718.800 (0.59) 1475919.600 (0.06) 99637.200 (-93.25)
+    splash2x/radix 2431302.200 2416928.600 (-0.59) 2455596.800 (1.00) 2568526.400 (5.64) 2479966.800 (2.00) 2437406.600 (0.25)
+    splash2x/raytrace 23274.400 23278.400 (0.02) 23287.200 (0.05) 28828.000 (23.86) 27800.200 (19.45) 5667.000 (-75.65)
+    splash2x/volrend 44106.800 44151.400 (0.10) 44186.000 (0.18) 45200.400 (2.48) 44751.200 (1.46) 16912.000 (-61.66)
+    splash2x/water_nsquared 29427.200 29425.600 (-0.01) 29402.400 (-0.08) 28055.400 (-4.66) 28572.400 (-2.90) 13207.800 (-55.12)
+    splash2x/water_spatial 664312.200 664095.600 (-0.03) 663025.200 (-0.19) 664100.600 (-0.03) 663597.400 (-0.11) 261214.200 (-60.68)
+    total 29321300.000 29401500.000 (0.27) 29338300.000 (0.06) 33179900.000 (13.16) 30175600.000 (2.91) 23393600.000 (-20.22)
+
+In total, resident sets were reduced by 20.22%.
+
+With parsec3/freqmine, 'prcl' reduced the resident set by 93.38% and the
+system memory usage by 23.63% while incurring only 1.22% runtime overhead.
diff --git a/Documentation/vm/damon/faq.rst b/Documentation/vm/damon/faq.rst
new file mode 100644
index 000000000000..088128bbf22b
--- /dev/null
+++ b/Documentation/vm/damon/faq.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+Frequently Asked Questions
+==========================
+
+Why a new subsystem, instead of extending perf or other user space tools?
+==========================================================================
+
+First, because it needs to be as lightweight as possible so that it can be
+used online, any unnecessary overhead, such as the kernel-user space context
+switching cost, should be avoided.  Second, DAMON aims to be used by other
+programs including the kernel.  Therefore, having a dependency on specific
+tools like perf is not desirable.  These are the two biggest reasons why DAMON
+is implemented in the kernel space.
+
+
+Can 'idle pages tracking' or 'perf mem' substitute DAMON?
+===========================================================
+
+Idle page tracking is a low level primitive for access checks of the physical
+address space.  'perf mem' is similar, though it can use sampling to minimize
+the overhead.  On the other hand, DAMON is a higher-level framework for the
+monitoring of various address spaces.  It is focused on memory management
+optimization and provides sophisticated accuracy/overhead handling mechanisms.
+Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of
+DAMON's output, but cannot substitute DAMON.
+
+
+How can I optimize my system's memory management using DAMON?
+===============================================================
+
+Because there are several ways to do DAMON-based optimizations, we wrote a
+separate document, :doc:`/admin-guide/mm/damon/guide`.  Please refer to it.
+
+
+Does DAMON support virtual memory only?
+========================================
+
+No.  The core of DAMON is address space independent.  The address space
+specific low level primitive parts, including monitoring target region
+construction and actual access checks, can be implemented and configured on
+the DAMON core by the users.  In this way, DAMON users can monitor any address
+space with any access check technique.
+
+Nonetheless, DAMON provides VMA tracking and PTE Accessed bit check based
+implementations of the address space dependent functions for virtual memory by
+default, for reference and convenient use.  In the near future, we will
+provide those for the physical memory address space.
+
+
+Can I simply monitor page granularity?
+=======================================
+
+Yes.  You can do so by setting the ``min_nr_regions`` attribute higher than
+the working set size divided by the page size.  Because the monitoring target
+region size is forced to be ``>=page size``, the region split will have no
+effect.
diff --git a/Documentation/vm/damon/index.rst b/Documentation/vm/damon/index.rst
new file mode 100644
index 000000000000..17dca3c12aad
--- /dev/null
+++ b/Documentation/vm/damon/index.rst
@@ -0,0 +1,31 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+DAMON: Data Access MONitor
+==========================
+
+DAMON is a data access monitoring framework subsystem for the Linux kernel.
+The core mechanisms of DAMON (refer to :doc:`design` for the details) make it
+
+ - *accurate* (the monitoring output is useful enough for DRAM level memory
+   management; it might not be appropriate for CPU cache levels, though),
+ - *light-weight* (the monitoring overhead is low enough to be applied online),
+   and
+ - *scalable* (the upper-bound of the overhead is in a constant range
+   regardless of the size of target workloads).
+
+Using this framework, therefore, the kernel's memory management mechanisms can
+make advanced decisions.  Experimental memory management optimization works
+that incurred high data access monitoring overhead could be implemented again.
+In user space, meanwhile, users who have some special workloads can write
+personalized applications for better understanding and optimizations of their
+workloads and systems.
+
+.. toctree::
+   :maxdepth: 2
+
+   faq
+   design
+   eval
+   api
+   plans
diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index eff5fbd492d0..b51f0d8992f8 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -32,6 +32,7 @@ descriptions of data structures and algorithms.
    arch_pgtable_helpers
    balance
    cleancache
+   damon/index
    free_page_reporting
    frontswap
    highmem
-- 
2.17.1