From: Yu Zhao <yuzhao@google.com>
To: Ivan Babrou <ivan@cloudflare.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	cgroups@vger.kernel.org, kernel-team <kernel-team@cloudflare.com>
Subject: Re: Low TCP throughput due to vmpressure with swap enabled
Date: Tue, 22 Nov 2022 12:46:15 -0700
Message-ID: <CAOUHufYd-5cqLsQvPBwcmWeph2pQyQYFRWynyg0UVpzUBWKbxw@mail.gmail.com>
In-Reply-To: <CABWYdi0G7cyNFbndM-ELTDAR3x4Ngm0AehEp5aP0tfNkXUE+Uw@mail.gmail.com>

On Mon, Nov 21, 2022 at 5:53 PM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> Hello,
>
> We have observed a negative TCP throughput behavior from the following commit:
>
> * 8e8ae645249b mm: memcontrol: hook up vmpressure to socket pressure
>
> It landed back in 2016 in v4.5, so it's not exactly a new issue.
>
> The crux of the issue is that in some cases with swap present the
> workload can be unfairly throttled in terms of TCP throughput.
>
> I am able to reproduce this issue in a VM locally on v6.1-rc6 with 8
> GiB of RAM with zram enabled.
>
> The setup is fairly simple:
>
> 1. Run the following go proxy in one cgroup (it has some memory
> ballast to simulate useful memory usage):
>
> * https://gist.github.com/bobrik/2c1a8a19b921fefe22caac21fda1be82
>
> sudo systemd-run --scope -p MemoryLimit=6G go run main.go
>
> 2. Run the following fio config in another cgroup to simulate mmapped
> page cache usage:
>
> [global]
> size=8g
> bs=256k
> iodepth=256
> direct=0
> ioengine=mmap
> group_reporting
> time_based
> runtime=86400
> numjobs=8
> name=randread
> rw=randread

Is it practical for your workload to apply a madvise/fadvise hint?
For the above repro, that would be fadvise_hint=1, which is mapped to
MADV_RANDOM automatically. The kernel also supports MADV_SEQUENTIAL,
but not POSIX_FADV_NOREUSE at the moment.
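
For a workload that maps files itself rather than through fio, applying
the same hint could look roughly like the sketch below. It assumes Linux
and Python 3.8 or later (where mmap.madvise() and mmap.MADV_RANDOM are
available); the helper name map_with_random_hint is hypothetical and
only for illustration, not something from the repro.

```python
import mmap
import os


def map_with_random_hint(path):
    """Map a non-empty file read-only and hint random access.

    Hypothetical helper for illustration; assumes Linux and Python 3.8+.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Shared, read-only mapping of the whole file.
        mapping = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                            prot=mmap.PROT_READ)
    finally:
        # The mapping keeps its own reference to the file, so the
        # descriptor opened here can be closed right away.
        os.close(fd)
    # Tell the kernel to expect random access over the whole mapping.
    mapping.madvise(mmap.MADV_RANDOM)
    return mapping
```

The hint is advisory: it changes no semantics for the caller, and
MADV_SEQUENTIAL can be passed the same way if that matches the access
pattern better.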

We actually have similar issues, but unfortunately I haven't been able
to come up with any solution beyond recommending the above flags.
The problem is that harvesting the accessed bit from mmapped memory is
costly, and when random accesses happen fast enough, that cost
prevents the LRU from collecting enough information to make better
decisions. In a nutshell, the LRU can't tell whether there is genuine
memory locality in your test case.

It's a very difficult problem to solve from LRU's POV. I'd like to
hear more about your workloads and see whether there are workarounds
other than tackling the problem head-on, if applying hints is not
practical or preferable.
