linux-mm.kvack.org archive mirror
From: Alex Shi <alex.shi@linux.alibaba.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	mgorman@techsingularity.net, tj@kernel.org,
	khlebnikov@yandex-team.ru, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com
Subject: Re: [PATCH v18 00/32] per memcg lru_lock
Date: Wed, 26 Aug 2020 16:59:28 +0800
Message-ID: <01ed6e45-3853-dcba-61cb-b429a49a7572@linux.alibaba.com>
In-Reply-To: <20200826011946.spknwjt44d2szrdo@ca-dmjordan1.us.oracle.com>

[-- Attachment #1: Type: text/plain, Size: 3373 bytes --]



On 2020/8/26 9:19 AM, Daniel Jordan wrote:
> On Tue, Aug 25, 2020 at 11:26:58AM +0800, Alex Shi wrote:
>> On 2020/8/25 9:56 AM, Daniel Jordan wrote:
>>> Alex, do you have a pointer to the modified readtwice case?
>>
>> Sorry, no. My development machine crashed, so I lost my container and the modified
>> case. I am struggling to get my container back from a repository with an account problem.
>>
>> But some testing scripts are here. Generally, the original readtwice case runs
>> one thread on each CPU. The new case runs one container on each CPU,
>> with just one readtwice thread in each container.
> 
> Ok, what you've sent so far gives me an idea of what you did.  My readtwice
> changes were similar, except I used the cgroup interface directly instead of
> docker and shared a filesystem between all the cgroups whereas it looks like
> you had one per memcg.  30 second runs on 5.9-rc2 and v18 gave 11% more data
> read with v18.  This was using 16 cgroups (32 dd tasks) on a 40 CPU, 2 socket
> machine.

I cleaned up my testing and made it reproducible with a Dockerfile and a case patch, both
attached.
Users can build a container image from the Dockerfile and then run the test as follows:

#start some testing containers
for ((i=0; i< 80; i++)); do docker run --privileged=true --rm lrulock bash -c " sleep 20000" & done

#set up the testing env
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do docker exec --privileged=true -it $i bash -c "cd vm-scalability/; bash -x ./case-lru-file-readtwice m"& done

#kick off testing
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do docker exec --privileged=true -it $i bash -c "cd vm-scalability/; bash -x ./case-lru-file-readtwice r"& done

#show result
for i in `docker ps | sed '1 d' | awk '{print $1 }'` ;do echo === $i ===; docker exec $i bash -c 'cat /tmp/vm-scalability-tmp/dd-output-* '  & done | grep MB | awk 'BEGIN {a=0;} { a+=$10 } END {print NR, a/(NR)}'
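The awk step in the last command sums the 10th whitespace-separated field of each matching dd summary line, then prints the number of lines and the mean. Below is a minimal sketch of just that step, assuming GNU dd's summary format as printed by coreutils 8.30 (where field 10 is the rate). The function name avg_dd_throughput and the two sample lines are made up for illustration and do the filtering inside awk rather than with a separate grep:

```shell
# Hypothetical helper: read dd summary lines on stdin and print
# "<number of lines> <mean of field 10>" for lines whose rate is in MB/s.
avg_dd_throughput() {
    awk '/MB\/s/ { n++; sum += $10 }
         END { if (n) print n, sum / n; else print 0, 0 }'
}

# Made-up sample input in the assumed dd summary format.
printf '%s\n' \
    '1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.5 s, 430 MB/s' \
    '1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5 s, 215 MB/s' \
    | avg_dd_throughput
```

The sketch matches on MB/s rather than MB so that a line whose rate is reported in GB/s is not averaged in as if it were MB/s.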

This time, on a 2P * 20 core * 2 HT machine, readtwice performance
is 252% compared to the v5.9-rc2 kernel. A pleasant surprise!

> 
>>> Even better would be a description of the problem you're having in production
>>> with lru_lock.  We might be able to create at least a simulation of it to show
>>> what the expected improvement of your real workload is.
>>
>> We are using thousands of memcgs on a machine, but as a simulation, I guess the above
>> case could be helpful to show the problem.
> 
> Using thousands of memcgs to do what?  Any particulars about the type of
> workload?  Surely it's more complicated than page cache reads :)

Yes, the workloads are quite different across businesses: some use a lot of CPU, some use
a lot of memory, and some are mixed. The number of containers also varies widely,
from tens to hundreds to thousands.

> 
>>> I ran a few benchmarks on v17 last week (sysbench oltp readonly, kerndevel from
>>> mmtests, a memcg-ized version of the readtwice case I cooked up) and then today
>>> discovered there's a chance I wasn't running the right kernels, so I'm redoing
>>> them on v18.
> 
> Neither kernel compile nor git checkout in the root cgroup changed much, just
> 0.31% slower on elapsed time for the compile, so no significant regressions
> there.  Now for sysbench again.
> 

Thanks a lot for the testing report!
Alex

[-- Attachment #2: Dockerfile --]
[-- Type: text/plain, Size: 509 bytes --]

FROM centos:8
MAINTAINER Alexs 
#WORKDIR /vm-scalability 
#RUN yum update -y && yum groupinstall "Development Tools" -y && yum clean all && \
#examples https://www.linuxtechi.com/build-docker-container-images-with-dockerfile/
RUN yum install git xfsprogs patch make gcc -y && yum clean all && \
git clone  https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/ && \
cd vm-scalability && make usemem

COPY readtwice.patch /vm-scalability/

RUN cd vm-scalability && patch -p1 < readtwice.patch

[-- Attachment #3: readtwice.patch --]
[-- Type: text/plain, Size: 2243 bytes --]

diff --git a/case-lru-file-readtwice b/case-lru-file-readtwice
index 85533b248634..57cb97d121ae 100755
--- a/case-lru-file-readtwice
+++ b/case-lru-file-readtwice
@@ -15,23 +15,30 @@
 
 . ./hw_vars
 
-for i in `seq 1 $nr_task`
-do
-	create_sparse_file $SPARSE_FILE-$i $((ROTATE_BYTES / nr_task))
-	timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-1-$i 2>&1 &
-	timeout --foreground -s INT ${runtime:-600} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-2-$i 2>&1 &
-done
+OUT_DIR=$(hostname)-${nr_task}c-$(((mem + (1<<29))>>30))g
+TEST_CASES=${@:-$(echo case-*)}
+
+echo $((1<<30)) > /proc/sys/vm/max_map_count
+echo $((1<<20)) > /proc/sys/kernel/threads-max
+echo 1 > /proc/sys/vm/overcommit_memory
+#echo 3 > /proc/sys/vm/drop_caches
+
+
+i=1
+
+if [ "$1" == "m" ];then
+	mount_tmpfs
+	create_sparse_root
+	create_sparse_file $SPARSE_FILE-$i $((ROTATE_BYTES))
+	exit
+fi
+
+
+if [ "$1" == "r" ];then
+	(timeout --foreground -s INT ${runtime:-300} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-1-$i 2>&1)&
+	(timeout --foreground -s INT ${runtime:-300} dd bs=4k if=$SPARSE_FILE-$i of=/dev/null > $TMPFS_MNT/dd-output-2-$i 2>&1)&
+fi
 
 wait
 sleep 1
 
-for file in $TMPFS_MNT/dd-output-*
-do
-	[ -s "$file" ] || {
-		echo "dd output file empty: $file" >&2
-	}
-	cat $file
-	rm  $file
-done
-
-rm `seq -f $SPARSE_FILE-%g 1 $nr_task`
diff --git a/hw_vars b/hw_vars
index 8731cefb9f57..ceeaa9f17c0b 100755
--- a/hw_vars
+++ b/hw_vars
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/sh -ex
 
 if [ -n "$runtime" ]; then
 	USEMEM="$CMD ./usemem --runtime $runtime"
@@ -43,7 +43,7 @@ create_loop_devices()
 	modprobe loop 2>/dev/null
 	[ -e "/dev/loop0" ] || modprobe loop 2>/dev/null
 
-	for i in $(seq 0 8)
+	for i in $(seq 0 104)
 	do
 		[ -e "/dev/loop$i" ] && continue
 		mknod /dev/loop$i b 7 $i
@@ -101,11 +101,11 @@ remove_sparse_root () {
 create_sparse_file () {
 	name=$1
 	size=$2
-	# echo "$name is of size $size"
+	echo "$name is of size $size"
 	$CMD truncate $name -s $size
 	# dd if=/dev/zero of=$name bs=1k count=1 seek=$((size >> 10)) 2>/dev/null
-	# ls $SPARSE_ROOT
-	# ls /tmp/vm-scalability/*
+	ls $SPARSE_ROOT
+	ls /tmp/vm-scalability/*
 }
 
 


