All of lore.kernel.org
 help / color / mirror / Atom feed
From: bot@edi.works
To: yuzhao@google.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, corbet@lwn.net,
	michael@michaellarabel.com, sofia.trinh@edi.works
Subject: Re: [PATCH v5 00/10] Multigenerational LRU Framework
Date: Wed,  1 Dec 2021 22:28:06 -0800	[thread overview]
Message-ID: <20211202062806.80365-1-bot@edi.works> (raw)
In-Reply-To: <20211111041510.402534-1-yuzhao@google.com>

Kernel / Apache Cassandra benchmark with MGLRU

TLDR
====
With the MGLRU, Apache Cassandra achieved 95% CIs [1.06, 4.10]%,
[1.94, 5.43]% and [4.11, 7.50]% more operations per second (OPS),
respectively, for exponential (distribution) access, random access
and Zipfian access, when swap was off; 95% CIs [0.50, 2.60]%, [6.51,
8.77]% and [3.29, 6.75]% more OPS, respectively, for exponential
access, random access and Zipfian access, when swap was set to
minimum (vm.swappiness=1).

Background
==========
Memory overcommit can increase utilization and, if carried out
properly, can also increase throughput. The challenges are to improve
working set estimation and to optimize page reclaim. The risks are
performance degradation and OOM kills. Short of overcoming the
challenges, the only way to reduce the risks is to underutilize
memory.

Apache Cassandra is one of the most popular open-source NoSQL
databases. YCSB is the leading open-source NoSQL database
benchmarking software that supports multiple access distributions.
Swap can have a negative effect, as Apache Cassandra cautions "Do
never allow your system to swap" [1].

[1]: https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L394

Matrix
======
Kernels: version [+ patchset]
* Baseline: 5.15
* Patched: 5.15 + MGLRU

Swap configurations:
* Off
* Minimum (vm.swappiness=1)

Concurrency: average # of users per CPU
* Medium: 3

Access distributions (2kB objects, 10% update):
* Exponential
* Uniform random
* Zipfian

Total configurations: 12
Data points per configuration: 10
Total run duration (minutes) per data point: ~40

Note that Apache Cassandra reached the peak performance for this
benchmark with 2-3 users per CPU, i.e., its performance started
degrading with fewer or more users.

Procedure
=========
The latest MGLRU patchset for the 5.15 kernel is available at
git fetch https://linux-mm.googlesource.com/page-reclaim \
    refs/changes/30/1430/2

Baseline and patched 5.15 kernel images are available at
https://drive.google.com/drive/folders/1eMkQleAFGkP2vzM_JyRA21oKE0ESHBqp

<install and configure OS>
ycsb_load.sh
systemctl stop cassandra
e2image <backup /mnt/data>

<for each kernel>
    grub-set-default <baseline, patched>
    <for each swap configuration>
        <swapoff, swapon>
        <for each access distribution>
            <update ycsb_run.sh>
            <for each data point>
                systemctl stop cassandra
                e2image <restore /mnt/data>
                reboot
                ycsb_run.sh
                <collect OPS>

Hardware
========
Memory (GB): 256
CPU (total #): 48
NVMe SSD (GB): 1024

OS
==
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=21.10
DISTRIB_CODENAME=impish
DISTRIB_DESCRIPTION="Ubuntu 21.10"

$ cat /proc/swaps
Filename          Type          Size          Used     Priority
/dev/nvme0n1p3    partition     32970748      0        -2

$ cat /sys/fs/cgroup/user.slice/memory.min
4294967296

$ cat /proc/sys/vm/overcommit_memory
1

$ cat /proc/sys/vm/swappiness
1

$ cat /proc/sys/vm/max_map_count
1048575

Apache Cassandra
================
$ nodetool version
ReleaseVersion: 4.0.1

$ cat jvm8-server.options
<existing parameters>

#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=10000
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
#-XX:+CMSClassUnloadingEnabled

-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:MaxGCPauseMillis=400

<existing parameters>

$ cat cassandra.yaml
<existing parameters>

data_file_directories: /mnt/data/
key_cache_size_in_mb: 5000
file_cache_enabled: true
file_cache_size_in_mb: 10000
buffer_pool_use_heap_if_exhausted: false
memtable_offheap_space_in_mb: 10000
memtable_allocation_type: offheap_buffers

<existing parameters>

YCSB
====
$ git log
commit ce3eb9ce51c84ee9e236998cdd2cefaeb96798a8 (HEAD -> master,
origin/master, origin/HEAD)
Author: Ivan <john.koepi@gmail.com>
Date:   Tue Feb 16 17:38:00 2021 +0200

    [scylla] enable token aware LB by default, improve the docs (#1507)

$ cat ycsb_load.sh
# load objects
cqlsh -e "create keyspace ycsb WITH REPLICATION = {'class' : \
    'SimpleStrategy', 'replication_factor': 1};"
cqlsh -k ycsb -e "create table usertable (y_id varchar primary key, \
    field0 varchar, field1 varchar, field2 varchar, field3 varchar, \
    field4 varchar, field5 varchar ,field6 varchar, field7 varchar, \
    field8 varchar, field9 varchar);"
ycsb load cassandra-cql -s -threads 24 -p hosts=localhost \
    -p workload=site.ycsb.workloads.CoreWorkload -p fieldlength=200 \
    -p recordcount=130000000

$ cat ycsb_run.sh
# run benchmark
ycsb run cassandra-cql -s -threads 144 -p hosts=localhost \
    -p workload=site.ycsb.workloads.CoreWorkload \
    -p recordcount=130000000 -p operationcount=130000000 \
    -p readproportion=0.9 -p updateproportion=0.1 \
    -p maxexecutiontime=1800 \
    -p requestdistribution=<exponential, uniform, zipfian>

Results
=======
Comparing the patched with the baseline kernel, Apache Cassandra
achieved 95% CIs [1.06, 4.10]%, [1.94, 5.43]% and [4.11, 7.50]% more
OPS, respectively, for exponential access, random access and Zipfian
access, when swap was off; 95% CIs [0.50, 2.60]%, [6.51, 8.77]% and
[3.29, 6.75]% more OPS, respectively, for exponential access, random
access and Zipfian access, when swap was set to minimum
(vm.swappiness=1).

+--------------------+--------------------+---------------------+
| Mean OPS [95% CI]  | No swap            | Minimum swap        |
+--------------------+--------------------+---------------------+
| Exponential access | 71084.9 / 72917.5  | 71499.6 / 72607.9   |
|                    | [751.42, 2913.77]  | [358.40, 1858.19]   |
+--------------------+--------------------+---------------------+
| Random access      | 47127.2 / 48862.8  | 47585.4 / 51220.1   |
|                    | [912.68, 2558.51]  | [3097.39, 4172.00]  |
+--------------------+--------------------+---------------------+
| Zipfian access     | 70271.5 / 74348.8  | 70698.2 / 74248.3   |
|                    | [2887.20, 5267.39] | [2326.69, 4773.50]  |
+--------------------+--------------------+---------------------+
Table 1. Comparison between the baseline and the patched kernels

Comparing minimum swap with no swap, Apache Cassandra achieved 95%
CIs [4.05, 5.60]% more OPS for random access, when using the patched
kernel. There were no statistically significant changes in OPS under
other conditions.

+--------------------+--------------------+---------------------+
| Mean OPS [95% CI]  | Baseline kernel    |  Patched kernel     |
+--------------------+--------------------+---------------------+
| Exponential access | 71084.9 / 71499.6  | 72917.5 / 72607.9   |
|                    | [-358.97, 1188.37] | [-1376.93, 757.73]  |
+--------------------+--------------------+---------------------+
| Random access      | 47127.2 / 47585.4  | 48862.8 / 51220.1   |
|                    | [-424.55, 1340.95] | [1977.09, 2737.50]  |
+--------------------+--------------------+---------------------+
| Zipfian access     | 70271.5 / 70698.2  | 74348.8 / 74248.3   |
|                    | [-749.39, 1602.79] | [-1337.07, 1136.07] |
+--------------------+--------------------+---------------------+
Table 2. Comparison between no swap and minimum swap

Metrics collected during each run are available at
https://github.com/ediworks/KernelPerf/tree/master/mglru/cassandra/5.15

Appendix
========
$ cat raw_data_cassandra.r
v <- c(
    # baseline swapoff exp
    69952, 70274, 70286, 70818, 70946, 71202, 71244, 71615, 71787, 72725,
    # baseline swapoff uni
    45309, 46056, 46086, 46188, 47275, 47524, 47797, 48243, 48329, 48465,
    # baseline swapoff zip
    69096, 69194, 69386, 69408, 69412, 70795, 70890, 71170, 71232, 72132,
    # baseline swapon exp
    69836, 70783, 70951, 71188, 71521, 71764, 72035, 72166, 72287, 72465,
    # baseline swapon uni
    46089, 46963, 47308, 47599, 47776, 47822, 47952, 48042, 48092, 48211,
    # baseline swapon zip
    68986, 69279, 69290, 69805, 70146, 70913, 71462, 71978, 72370, 72753,
    # patched swapoff exp
    70701, 71328, 71458, 72846, 72885, 73078, 73702, 74077, 74415, 74685,
    # patched swapoff uni
    48275, 48460, 48735, 48813, 48902, 48969, 48996, 49007, 49213, 49258,
    # patched swapoff zip
    71829, 72909, 73259, 73835, 74200, 74544, 75318, 75514, 76031, 76049,
    # patched swapon exp
    71169, 71968, 72208, 72374, 72401, 72755, 72861, 72942, 73469, 73932,
    # patched swapon uni
    50292, 50529, 50981, 51224, 51414, 51420, 51480, 51608, 51625, 51628,
    # patched swapon zip
    72032, 72325, 73834, 74366, 74482, 74573, 74810, 75044, 75371, 75646
)

a <- array(v, dim = c(10, 3, 2, 2))

# baseline vs patched
for (swap in 1:2) {
    for (dist in 1:3) {
        r <- t.test(a[, dist, swap, 1], a[, dist, swap, 2])
        print(r)

        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("swap%d dist%d: no significance", swap, dist)
        } else {
            s <- sprintf("swap%d dist%d: [%.2f, %.2f]%%", swap, dist, -p[2], -p[1])
        }
        print(s)
    }
}

# swapoff vs swapon
for (kern in 1:2) {
    for (dist in 1:3) {
        r <- t.test(a[, dist, 1, kern], a[, dist, 2, kern])
        print(r)

        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("kern%d dist%d: no significance", kern, dist)
        } else {
            s <- sprintf("kern%d dist%d: [%.2f, %.2f]%%", kern, dist, -p[2], -p[1])
        }
        print(s)
    }
}

$ R -q -s -f raw_data_cassandra.r

        Welch Two Sample t-test

data:  a[, dist, swap, 1] and a[, dist, swap, 2]
t = -3.6172, df = 14.793, p-value = 0.002585
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2913.7703  -751.4297
sample estimates:
mean of x mean of y
  71084.9   72917.5

[1] "swap1 dist1: [1.06, 4.10]%"

        Welch Two Sample t-test

data:  a[, dist, swap, 1] and a[, dist, swap, 2]
t = -4.679, df = 10.331, p-value = 0.0007961
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2558.5199  -912.6801
sample estimates:
mean of x mean of y
  47127.2   48862.8

[1] "swap1 dist2: [1.94, 5.43]%"

        Welch Two Sample t-test

data:  a[, dist, swap, 1] and a[, dist, swap, 2]
t = -7.2315, df = 16.902, p-value = 1.452e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5267.396 -2887.204
sample estimates:
mean of x mean of y
  70271.5   74348.8

[1] "swap1 dist3: [4.11, 7.50]%"

        Welch Two Sample t-test

data:  a[, dist, swap, 1] and a[, dist, swap, 2]
t = -3.1057, df = 17.95, p-value = 0.006118
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1858.191  -358.409
sample estimates:
mean of x mean of y
  71499.6   72607.9

[1] "swap2 dist1: [0.50, 2.60]%"

        Welch Two Sample t-test

data:  a[, dist, swap, 1] and a[, dist, swap, 2]
t = -14.307, df = 16.479, p-value = 1.022e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4172.006 -3097.394
sample estimates:
mean of x mean of y
  47585.4   51220.1

[1] "swap2 dist2: [6.51, 8.77]%"

        Welch Two Sample t-test

data:  a[, dist, swap, 1] and a[, dist, swap, 2]
t = -6.1048, df = 17.664, p-value = 9.877e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4773.504 -2326.696
sample estimates:
mean of x mean of y
  70698.2   74248.3

[1] "swap2 dist3: [3.29, 6.75]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -1.1261, df = 17.998, p-value = 0.2749
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1188.3785   358.9785
sample estimates:
mean of x mean of y
  71084.9   71499.6

[1] "kern1 dist1: no significance"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -1.1108, df = 14.338, p-value = 0.2849
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1340.9555   424.5555
sample estimates:
mean of x mean of y
  47127.2   47585.4

[1] "kern1 dist2: no significance"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -0.76534, df = 17.035, p-value = 0.4545
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1602.7926   749.3926
sample estimates:
mean of x mean of y
  70271.5   70698.2

[1] "kern1 dist3: no significance"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = 0.62117, df = 14.235, p-value = 0.5443
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -757.7355 1376.9355
sample estimates:
mean of x mean of y
  72917.5   72607.9

[1] "kern2 dist1: no significance"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -13.18, df = 15.466, p-value = 8.07e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2737.509 -1977.091
sample estimates:
mean of x mean of y
  48862.8   51220.1

[1] "kern2 dist2: [4.05, 5.60]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = 0.17104, df = 17.575, p-value = 0.8661
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1136.076  1337.076
sample estimates:
mean of x mean of y
  74348.8   74248.3

[1] "kern2 dist3: no significance"

  parent reply	other threads:[~2021-12-02  6:28 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-11  4:15 [PATCH v5 00/10] Multigenerational LRU Framework Yu Zhao
2021-11-11  4:15 ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 01/10] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  8:59   ` Will Deacon
2021-11-11  8:59     ` Will Deacon
2021-11-11 11:11     ` Yu Zhao
2021-11-11 11:11       ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 02/10] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 03/10] mm/vmscan.c: refactor shrink_node() Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 04/10] mm: multigenerational lru: groundwork Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 05/10] mm: multigenerational lru: mm_struct list Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 06/10] mm: multigenerational lru: aging Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 07/10] mm: multigenerational lru: eviction Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 08/10] mm: multigenerational lru: user interface Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 09/10] mm: multigenerational lru: Kconfig Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-11  4:15 ` [PATCH v5 10/10] mm: multigenerational lru: documentation Yu Zhao
2021-11-11  4:15   ` Yu Zhao
2021-11-22  5:32 ` [PATCH v5 00/10] Multigenerational LRU Framework bot
2021-12-02  6:28 ` bot [this message]
2021-12-09  7:24 ` bot
2021-12-18  7:10 ` bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211202062806.80365-1-bot@edi.works \
    --to=bot@edi.works \
    --cc=corbet@lwn.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=michael@michaellarabel.com \
    --cc=page-reclaim@google.com \
    --cc=sofia.trinh@edi.works \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.