From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>, Tejun Heo <tj@kernel.org>,
	<netdev@vger.kernel.org>, <linux-mm@kvack.org>,
	<cgroups@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy
Date: Wed, 28 Oct 2015 11:20:03 +0300
Message-ID: <20151028082003.GK13221@esperanza>
In-Reply-To: <20151027155833.GB4665@cmpxchg.org>

On Tue, Oct 27, 2015 at 09:01:08AM -0700, Johannes Weiner wrote:
...
> > > But regardless of tcp window control, we need to account socket memory
> > > in the main memory accounting pool where pressure is shared (to the
> > > best of our abilities) between all accounted memory consumers.
> > > 
> > 
> > No objections to this point. However, I really don't like the idea of
> > charging tcp window size to memory.current instead of charging the
> > individual pages the workload consumes for storing socket buffers,
> > because it is inconsistent with what we have now. Can't we charge
> > individual skb pages as we do for other kmem allocations?
> 
> Absolutely, both work for me. I chose that route because it's where
> the networking code already tracks and accounts memory consumed, so it
> seemed like a better site to hook into.
> 
> But I understand your concerns. We want to track this stuff as close
> to the memory allocators as possible.

Exactly.
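
To make the hook-point distinction concrete, here is a rough sketch of
the two options. All of the memcg helpers below (mem_cgroup_charge_skmem,
memcg_charge_skb_page) and the sk->sk_memcg pointer are made up for
illustration; they are not existing kernel API:

#include <linux/gfp.h>
#include <linux/memcontrol.h>
#include <net/sock.h>

/*
 * (a) Charge at the protocol level, i.e. at the site where the
 *     networking code already tracks socket buffer memory.  This is
 *     roughly what your patch set does.
 */
static int sk_charge_proto_mem(struct sock *sk, unsigned int nr_pages)
{
	if (!mem_cgroup_charge_skmem(sk->sk_memcg, nr_pages))
		return -ENOMEM;		/* memcg is over its limit */
	return 0;
}

/*
 * (b) Charge at the allocator level, like other kmem allocations,
 *     i.e. right where the skb data pages are actually allocated.
 */
static struct page *alloc_skb_data_page(struct sock *sk, gfp_t gfp)
{
	struct page *page = alloc_page(gfp);

	if (page && memcg_charge_skb_page(sk->sk_memcg, page)) {
		__free_page(page);
		return NULL;
	}
	return page;
}

Option (b) is what I mean by keeping the accounting consistent with the
rest of kmem.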

> 
> > > But also, there are people right now for whom the socket buffers cause
> > > system OOM, but the existing memcg's hard tcp window limit that
> > > exists absolutely wrecks network performance for them. It's not usable
> > > the way it is. It'd be much better to have the socket buffers exert
> > > pressure on the shared pool, and then propagate the overall pressure
> > > back to individual consumers with reclaim, shrinkers, vmpressure etc.
> > 
> > This might or might not work; I'm not expert enough to judge. But if you
> > do this only for memcg, leaving the global case as it is, networking
> > people won't budge IMO. So could you please start such a major rework
> > from the global case? Could you please try to deprecate the tcp window
> > limits not only in the legacy memcg hierarchy, but also system-wide, in
> > order to attract the attention of networking experts?
> 
> I'm definitely interested in addressing this globally as well.
> 
> The idea behind this was to use the memcg part as a testbed. cgroup2
> is going to be new and people are prepared for hiccups when migrating
> their applications to it; and they can roll back to cgroup1 and tcp
> window limits at any time should they run into problems in production.

Then you'd better not touch the existing tcp limits at all, because they
just work, and the logic behind them is very close to that of the global
tcp limits; I don't think it can be simplified much. Moreover, frankly, I
still have my reservations about the vmpressure propagation to skb
allocation you're proposing. It might work, but I doubt it will allow us
to throw away the explicit tcp limit, as I explained previously. So, even
with your approach, I think we may still need a per-memcg tcp limit
*unless* you get rid of the global tcp limit somehow.
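
For clarity, here is roughly how I understand the pressure propagation
you have in mind. Again, this is only a sketch:
mem_cgroup_under_socket_pressure() is a hypothetical predicate fed by
vmpressure/reclaim, and sk->sk_memcg is assumed to point at the owning
memcg:

#include <linux/memcontrol.h>
#include <net/sock.h>

static bool sk_under_any_memory_pressure(struct sock *sk)
{
	/* the global tcp limits keep working exactly as before */
	if (sk_memory_allocated(sk) > sk_prot_mem_limits(sk, 2))
		return true;

	/* reclaim/vmpressure state propagated from the memcg */
	if (sk->sk_memcg &&
	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
		return true;

	return false;
}

The sketch only shows where such a check would sit; whether a pressure
signal alone is enough to replace a hard limit is exactly what I have
doubts about.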

> 
> So this seemed like a good way to prove a new mechanism before rolling
> it out to every single Linux setup, rather than switching everybody over
> after the limited-scope testing I can do as a developer on my own.
> 
> Keep in mind that my patches are not committing anything in terms of
> interface, so we retain all the freedom to fix and tune the way this
> is implemented, including the freedom to re-add tcp window limits in
> case the pressure balancing is not a comprehensive solution.
> 

I really dislike this kind of proof. It looks like you're trying to push
something you think is right covertly, w/o a proper discussion with
networking people, and then to claim that it just works and hence should
be done globally. But what if it doesn't work? Revert it? We already have
a lot of dubious stuff in memcg that should be reverted, so let's please
try to avoid this kind of mistake in the future. Note that I say "w/o a
proper discussion with networking people" because I don't think they will
really care *unless* you change the global logic, simply because most of
them aren't very interested in memcg AFAICS.

That effectively means you lose the chance to hear from networking
experts, who could point out design flaws and propose improvements right
away. Let's please not miss such an opportunity. You said you'd seen this
problem happen w/o cgroups, so you have a use case that might need fixing
at the global level. IMO it shouldn't be difficult to prepare an RFC
patch for the global case first and see what people think about it.

Thanks,
Vladimir
