linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
@ 2019-07-05  7:05 Yafang Shao
  2019-07-05  9:09 ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Yafang Shao @ 2019-07-05  7:05 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, Yafang Shao, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, Shakeel Butt, Yafang Shao

We always deploy many containers on one host. Some of these containers
are with high priority, while others are with low priority.
memory.{min, low} is useful to help us protect page cache of a specified
container to gain better performance.
But currently it is only supported in cgroup v2.
To support it in cgroup v1, we only need to make small changes, as the
facility is already exist.
This patch exposed two files to user in cgroup v1, which are memory.min
and memory.low. The usage to set these two files is same with cgroup v2.
Both hierarchical and non-hierarchical mode are supported.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
---
 Documentation/cgroup-v1/memory.txt |  4 ++++
 mm/memcontrol.c                    | 20 +++++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt
index a33cedf..7178247 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -63,6 +63,10 @@ Brief summary of control files.
 				 (See 5.5 for details)
  memory.limit_in_bytes		 # set/show limit of memory usage
  memory.memsw.limit_in_bytes	 # set/show limit of memory+Swap usage
+ memory.min			 # set/show hard memory protection
+				 (See ../admin-guide/cgroup-v2.rst for details)
+ memory.low			 # set/show best-effort memory protection
+				 (See ../admin-guide/cgroup-v2.rst for details)
  memory.failcnt			 # show the number of memory usage hits limits
  memory.memsw.failcnt		 # show the number of memory+Swap hits limits
  memory.max_usage_in_bytes	 # show max memory usage recorded
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3ee806b..58dce75 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -169,6 +169,12 @@ struct mem_cgroup_event {
 
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
 static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
+static int memory_min_show(struct seq_file *m, void *v);
+static ssize_t memory_min_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off);
+static int memory_low_show(struct seq_file *m, void *v);
+static ssize_t memory_low_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off);
 
 /* Stuffs for move charges at task migration. */
 /*
@@ -4288,6 +4294,18 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		.read_u64 = mem_cgroup_read_u64,
 	},
 	{
+		.name = "min",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_min_show,
+		.write = memory_min_write,
+	},
+	{
+		.name = "low",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_low_show,
+		.write = memory_low_write,
+	},
+	{
 		.name = "failcnt",
 		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
 		.write = mem_cgroup_reset,
@@ -5925,7 +5943,7 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	parent = parent_mem_cgroup(memcg);
 	/* No parent means a non-hierarchical mode on v1 memcg */
 	if (!parent)
-		return MEMCG_PROT_NONE;
+		goto exit;
 
 	if (parent == root)
 		goto exit;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05  7:05 [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1 Yafang Shao
@ 2019-07-05  9:09 ` Michal Hocko
  2019-07-05  9:41   ` Yafang Shao
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2019-07-05  9:09 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, linux-mm, Johannes Weiner, Vladimir Davydov, Shakeel Butt,
	Yafang Shao

On Fri 05-07-19 15:05:30, Yafang Shao wrote:
> We always deploy many containers on one host. Some of these containers
> are with high priority, while others are with low priority.
> memory.{min, low} is useful to help us protect page cache of a specified
> container to gain better performance.
> But currently it is only supported in cgroup v2.
> To support it in cgroup v1, we only need to make small changes, as the
> facility is already exist.
> This patch exposed two files to user in cgroup v1, which are memory.min
> and memory.low. The usage to set these two files is same with cgroup v2.
> Both hierarchical and non-hierarchical mode are supported.

Cgroup v1 API is considered frozen with new features added only to v2.
Why cannot you move over to v2 and have to stick with v1?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05  9:09 ` Michal Hocko
@ 2019-07-05  9:41   ` Yafang Shao
  2019-07-05 11:10     ` Michal Hocko
  2019-07-05 15:52     ` Chris Down
  0 siblings, 2 replies; 14+ messages in thread
From: Yafang Shao @ 2019-07-05  9:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Linux MM, Johannes Weiner, Vladimir Davydov,
	Shakeel Butt, Yafang Shao

On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 05-07-19 15:05:30, Yafang Shao wrote:
> > We always deploy many containers on one host. Some of these containers
> > are with high priority, while others are with low priority.
> > memory.{min, low} is useful to help us protect page cache of a specified
> > container to gain better performance.
> > But currently it is only supported in cgroup v2.
> > To support it in cgroup v1, we only need to make small changes, as the
> > facility is already exist.
> > This patch exposed two files to user in cgroup v1, which are memory.min
> > and memory.low. The usage to set these two files is same with cgroup v2.
> > Both hierarchical and non-hierarchical mode are supported.
>
> Cgroup v1 API is considered frozen with new features added only to v2.

The facilities support both cgroup v1 and cgroup v2, and what we need
to do is only exposing the interface.
If the cgroup v1 API is frozen, it will be a pity.

> Why cannot you move over to v2 and have to stick with v1?
Because the interfaces between cgroup v1 and cgroup v2 are changed too
much, which is unacceptable by our customer.
It may take long time to use cgroup v2 in production envrioment, per
my understanding.
BTW, the filesystem on our servers is XFS, but the cgroup  v2
writeback throttle is not supported on XFS by now, that is beyond my
comprehension.

Thanks
Yafang


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05  9:41   ` Yafang Shao
@ 2019-07-05 11:10     ` Michal Hocko
  2019-07-05 14:33       ` Yafang Shao
  2019-07-05 15:52     ` Chris Down
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2019-07-05 11:10 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Linux MM, Johannes Weiner, Vladimir Davydov,
	Shakeel Butt, Yafang Shao

On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > Why cannot you move over to v2 and have to stick with v1?
> Because the interfaces between cgroup v1 and cgroup v2 are changed too
> much, which is unacceptable by our customer.

Could you be more specific about obstacles with respect to interfaces
please?

> It may take long time to use cgroup v2 in production envrioment, per
> my understanding.
> BTW, the filesystem on our servers is XFS, but the cgroup  v2
> writeback throttle is not supported on XFS by now, that is beyond my
> comprehension.

Are you sure? I would be surprised if v1 throttling would work while v2
wouldn't. As far as I remember it is v2 writeback throttling which
actually works. The only throttling we have for v1 is reclaim based one
which is a huge hammer.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 11:10     ` Michal Hocko
@ 2019-07-05 14:33       ` Yafang Shao
  2019-07-05 15:10         ` Brian Foster
  2019-07-05 19:54         ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: Yafang Shao @ 2019-07-05 14:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Linux MM, Johannes Weiner, Vladimir Davydov,
	Shakeel Butt, Yafang Shao

On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > Why cannot you move over to v2 and have to stick with v1?
> > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > much, which is unacceptable by our customer.
>
> Could you be more specific about obstacles with respect to interfaces
> please?
>

Lots of applications will be changed.
Kubernetes, Docker and some other applications which are using cgroup v1,
that will be a trouble, because they are not maintained by us.

> > It may take long time to use cgroup v2 in production envrioment, per
> > my understanding.
> > BTW, the filesystem on our servers is XFS, but the cgroup  v2
> > writeback throttle is not supported on XFS by now, that is beyond my
> > comprehension.
>
> Are you sure? I would be surprised if v1 throttling would work while v2
> wouldn't. As far as I remember it is v2 writeback throttling which
> actually works. The only throttling we have for v1 is reclaim based one
> which is a huge hammer.
> --

We did it in cgroup v1 in our kernel.
But the upstream still don't support it in cgroup v2.
So my real question is why upstream can't support such an import file system ?
Do you know which companies  besides facebook are using cgroup v2  in
their product enviroment?

Thanks
Yafang


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 14:33       ` Yafang Shao
@ 2019-07-05 15:10         ` Brian Foster
  2019-07-05 23:39           ` Yafang Shao
  2019-07-05 23:52           ` Dave Chinner
  2019-07-05 19:54         ` Michal Hocko
  1 sibling, 2 replies; 14+ messages in thread
From: Brian Foster @ 2019-07-05 15:10 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Michal Hocko, Andrew Morton, Linux MM, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Yafang Shao, linux-xfs

cc linux-xfs

On Fri, Jul 05, 2019 at 10:33:04PM +0800, Yafang Shao wrote:
> On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > > > Why cannot you move over to v2 and have to stick with v1?
> > > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > > much, which is unacceptable by our customer.
> >
> > Could you be more specific about obstacles with respect to interfaces
> > please?
> >
> 
> Lots of applications will be changed.
> Kubernetes, Docker and some other applications which are using cgroup v1,
> that will be a trouble, because they are not maintained by us.
> 
> > > It may take long time to use cgroup v2 in production envrioment, per
> > > my understanding.
> > > BTW, the filesystem on our servers is XFS, but the cgroup  v2
> > > writeback throttle is not supported on XFS by now, that is beyond my
> > > comprehension.
> >
> > Are you sure? I would be surprised if v1 throttling would work while v2
> > wouldn't. As far as I remember it is v2 writeback throttling which
> > actually works. The only throttling we have for v1 is reclaim based one
> > which is a huge hammer.
> > --
> 
> We did it in cgroup v1 in our kernel.
> But the upstream still don't support it in cgroup v2.
> So my real question is why upstream can't support such an import file system ?
> Do you know which companies  besides facebook are using cgroup v2  in
> their product enviroment?
> 

I think the original issue with regard to XFS cgroupv2 writeback
throttling support was that at the time the XFS patch was proposed,
there wasn't any test coverage to prove that the code worked (and the
original author never followed up). That has since been resolved and
Christoph has recently posted a new patch [1], which appears to have
been accepted by the maintainer.

Brian

[1] https://marc.info/?l=linux-xfs&m=156138379906141&w=2

> Thanks
> Yafang
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05  9:41   ` Yafang Shao
  2019-07-05 11:10     ` Michal Hocko
@ 2019-07-05 15:52     ` Chris Down
  2019-07-05 23:47       ` Yafang Shao
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Down @ 2019-07-05 15:52 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Michal Hocko, Andrew Morton, Linux MM, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Yafang Shao

Yafang Shao writes:
>> Cgroup v1 API is considered frozen with new features added only to v2.
>
>The facilities support both cgroup v1 and cgroup v2, and what we need
>to do is only exposing the interface.
>If the cgroup v1 API is frozen, it will be a pity.

This might be true in the absolute purest technical sense, but not in a 
practical one. Just exposing the memory protection interface without making it 
comprehend v1's API semantics seems a bad move to me -- for example, how it 
(and things like effective protections) interact without the no internal 
process constraint, and certainly many many more things that nobody has 
realised are not going to work yet.

And to that extent, you're really implicitly asking for a lot of work and 
evaluation to be done on memory protections for an interface which is already 
frozen. I'm quite strongly against that.

>Because the interfaces between cgroup v1 and cgroup v2 are changed too
>much, which is unacceptable by our customer.

The problem is that you're explicitly requesting to use functionality which 
under the hood relies on that new interface while also requesting to not use 
that new interface at the same time :-)

While it may superficially work without it, I'm sceptical that simply adding 
memory.low and memory.min to the v1 hierarchy is going to end up with 
reasonable results under scrutiny, or a coherent system or API.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 14:33       ` Yafang Shao
  2019-07-05 15:10         ` Brian Foster
@ 2019-07-05 19:54         ` Michal Hocko
  2019-07-05 23:54           ` Yafang Shao
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2019-07-05 19:54 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Linux MM, Johannes Weiner, Vladimir Davydov,
	Shakeel Butt, Yafang Shao

On Fri 05-07-19 22:33:04, Yafang Shao wrote:
> On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > > > Why cannot you move over to v2 and have to stick with v1?
> > > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > > much, which is unacceptable by our customer.
> >
> > Could you be more specific about obstacles with respect to interfaces
> > please?
> >
> 
> Lots of applications will be changed.
> Kubernetes, Docker and some other applications which are using cgroup v1,
> that will be a trouble, because they are not maintained by us.

Do they actually have to change or they can simply use v2? I mean, how
many of them really do rely on having tasks in intermediate nodes or
rely on per-thread cgroups? Those should be the most visibile changes in
the interface except for control files naming. If it is purely about the
naming then it should be quite trivial to update, no?

Brian has already answered the xfs part I believe. I am not really
familiar with that topic so I cannot comment anyway.

> Do you know which companies  besides facebook are using cgroup v2  in
> their product enviroment?

I do not really know who those users are but it has been made a wider
decision that v2 is going to be a rework of a new interface and the the
v1 will be preserved and maintain for ever for backward compatibility.
If there are usecases which cannot use v2 because of some fundamental
reasons then we really want to hear about those. And if v2 really is not
usable we can think of adding features to v1 of course.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 15:10         ` Brian Foster
@ 2019-07-05 23:39           ` Yafang Shao
  2019-07-05 23:52           ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Yafang Shao @ 2019-07-05 23:39 UTC (permalink / raw)
  To: Brian Foster
  Cc: Michal Hocko, Andrew Morton, Linux MM, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Yafang Shao, linux-xfs

On Fri, Jul 5, 2019 at 11:11 PM Brian Foster <bfoster@redhat.com> wrote:
>
> cc linux-xfs
>
> On Fri, Jul 05, 2019 at 10:33:04PM +0800, Yafang Shao wrote:
> > On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > > > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > [...]
> > > > > Why cannot you move over to v2 and have to stick with v1?
> > > > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > > > much, which is unacceptable by our customer.
> > >
> > > Could you be more specific about obstacles with respect to interfaces
> > > please?
> > >
> >
> > Lots of applications will be changed.
> > Kubernetes, Docker and some other applications which are using cgroup v1,
> > that will be a trouble, because they are not maintained by us.
> >
> > > > It may take long time to use cgroup v2 in production envrioment, per
> > > > my understanding.
> > > > BTW, the filesystem on our servers is XFS, but the cgroup  v2
> > > > writeback throttle is not supported on XFS by now, that is beyond my
> > > > comprehension.
> > >
> > > Are you sure? I would be surprised if v1 throttling would work while v2
> > > wouldn't. As far as I remember it is v2 writeback throttling which
> > > actually works. The only throttling we have for v1 is reclaim based one
> > > which is a huge hammer.
> > > --
> >
> > We did it in cgroup v1 in our kernel.
> > But the upstream still don't support it in cgroup v2.
> > So my real question is why upstream can't support such an import file system ?
> > Do you know which companies  besides facebook are using cgroup v2  in
> > their product enviroment?
> >
>
> I think the original issue with regard to XFS cgroupv2 writeback
> throttling support was that at the time the XFS patch was proposed,
> there wasn't any test coverage to prove that the code worked (and the
> original author never followed up). That has since been resolved and
> Christoph has recently posted a new patch [1], which appears to have
> been accepted by the maintainer.
>
> Brian
>
> [1] https://marc.info/?l=linux-xfs&m=156138379906141&w=2
>

Thanks for your reference.
I will pay attention to that thread.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 15:52     ` Chris Down
@ 2019-07-05 23:47       ` Yafang Shao
  2019-07-06 11:26         ` Chris Down
  0 siblings, 1 reply; 14+ messages in thread
From: Yafang Shao @ 2019-07-05 23:47 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, Andrew Morton, Linux MM, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Yafang Shao

On Fri, Jul 5, 2019 at 11:52 PM Chris Down <chris@chrisdown.name> wrote:
>
> Yafang Shao writes:
> >> Cgroup v1 API is considered frozen with new features added only to v2.
> >
> >The facilities support both cgroup v1 and cgroup v2, and what we need
> >to do is only exposing the interface.
> >If the cgroup v1 API is frozen, it will be a pity.
>
> This might be true in the absolute purest technical sense, but not in a
> practical one. Just exposing the memory protection interface without making it
> comprehend v1's API semantics seems a bad move to me -- for example, how it
> (and things like effective protections) interact without the no internal
> process constraint, and certainly many many more things that nobody has
> realised are not going to work yet.
>

Hmm ?
Would be more specific about the issues without the o internal process
constraint ?
The memcg LRU scan/reclaim works fine on both cgroup v1 and cgroup v2,
so the memcg LRU protection should works fine on both cgroup v1 and
cgroup v2 as well.
IOW, if there're some issues without internal process contraint, then
there must be some issues
in cgroup v1 LRU.

> And to that extent, you're really implicitly asking for a lot of work and
> evaluation to be done on memory protections for an interface which is already
> frozen. I'm quite strongly against that.
>
> >Because the interfaces between cgroup v1 and cgroup v2 are changed too
> >much, which is unacceptable by our customer.
>
> The problem is that you're explicitly requesting to use functionality which
> under the hood relies on that new interface while also requesting to not use
> that new interface at the same time :-)
>

I'm sorry that I can't get your point really.
As I said, the facility is already there.
The memcg LRU is not designed for cgroup v2 only.

> While it may superficially work without it, I'm sceptical that simply adding
> memory.low and memory.min to the v1 hierarchy is going to end up with
> reasonable results under scrutiny, or a coherent system or API.

I have finished some simple stress tests, and the result are fine.
Would you pls give me some suggestion on how to test it if you think
there may be some issues ?

Thanks
Yafang


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 15:10         ` Brian Foster
  2019-07-05 23:39           ` Yafang Shao
@ 2019-07-05 23:52           ` Dave Chinner
  2019-07-08 12:15             ` Brian Foster
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2019-07-05 23:52 UTC (permalink / raw)
  To: Brian Foster
  Cc: Yafang Shao, Michal Hocko, Andrew Morton, Linux MM,
	Johannes Weiner, Vladimir Davydov, Shakeel Butt, Yafang Shao,
	linux-xfs

On Fri, Jul 05, 2019 at 11:10:45AM -0400, Brian Foster wrote:
> cc linux-xfs
> 
> On Fri, Jul 05, 2019 at 10:33:04PM +0800, Yafang Shao wrote:
> > On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > > > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > [...]
> > > > > Why cannot you move over to v2 and have to stick with v1?
> > > > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > > > much, which is unacceptable by our customer.
> > >
> > > Could you be more specific about obstacles with respect to interfaces
> > > please?
> > >
> > 
> > Lots of applications will be changed.
> > Kubernetes, Docker and some other applications which are using cgroup v1,
> > that will be a trouble, because they are not maintained by us.
> > 
> > > > It may take long time to use cgroup v2 in production envrioment, per
> > > > my understanding.
> > > > BTW, the filesystem on our servers is XFS, but the cgroup  v2
> > > > writeback throttle is not supported on XFS by now, that is beyond my
> > > > comprehension.
> > >
> > > Are you sure? I would be surprised if v1 throttling would work while v2
> > > wouldn't. As far as I remember it is v2 writeback throttling which
> > > actually works. The only throttling we have for v1 is reclaim based one
> > > which is a huge hammer.
> > > --
> > 
> > We did it in cgroup v1 in our kernel.
> > But the upstream still don't support it in cgroup v2.
> > So my real question is why upstream can't support such an import file system ?
> > Do you know which companies  besides facebook are using cgroup v2  in
> > their product enviroment?
> > 
> 
> I think the original issue with regard to XFS cgroupv2 writeback
> throttling support was that at the time the XFS patch was proposed,
> there wasn't any test coverage to prove that the code worked (and the
> original author never followed up). That has since been resolved and
> Christoph has recently posted a new patch [1], which appears to have
> been accepted by the maintainer.

I don't think the validation issue has been resolved.

i.e. we still don't have regression tests that ensure it keeps
working it in future, or that it works correctly in any specific
distro setting/configuration. The lack of repeatable QoS validation
infrastructure was the reason I never merged support for this in the
first place.

So while the (simple) patch to support it has been merged now,
there's no guarantee that it will work as expected or continue to do
so over the long run as nobody upstream or in distro land has a way
of validating that it is working correctly.

From that perspective, it is still my opinion that one-off "works
for me" testing isn't sufficient validation for a QoS feature that
people will use to implement SLAs with $$$ penalities attached to
QoS failures....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 19:54         ` Michal Hocko
@ 2019-07-05 23:54           ` Yafang Shao
  0 siblings, 0 replies; 14+ messages in thread
From: Yafang Shao @ 2019-07-05 23:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Linux MM, Johannes Weiner, Vladimir Davydov,
	Shakeel Butt, Yafang Shao

On Sat, Jul 6, 2019 at 3:54 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 05-07-19 22:33:04, Yafang Shao wrote:
> > On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > > > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > [...]
> > > > > Why cannot you move over to v2 and have to stick with v1?
> > > > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > > > much, which is unacceptable by our customer.
> > >
> > > Could you be more specific about obstacles with respect to interfaces
> > > please?
> > >
> >
> > Lots of applications will be changed.
> > Kubernetes, Docker and some other applications which are using cgroup v1,
> > that will be a trouble, because they are not maintained by us.
>
> Do they actually have to change or they can simply use v2? I mean, how
> many of them really do rely on having tasks in intermediate nodes or
> rely on per-thread cgroups? Those should be the most visibile changes in
> the interface except for control files naming. If it is purely about the
> naming then it should be quite trivial to update, no?
>

This is not a technical issue.
There're many other factors we have to consider, i.e. the cost.
One simple example is in publich cloud, if we upgrade our system and
force the customers to modify their user code to run on our new
system, we may lost these customers.
As this is not a techical issue really, I don't think we can make a
clear conclusion here.

> Brian has already answered the xfs part I believe. I am not really
> familiar with that topic so I cannot comment anyway.
>
> > Do you know which companies  besides facebook are using cgroup v2  in
> > their product enviroment?
>
> I do not really know who those users are but it has been made a wider
> decision that v2 is going to be a rework of a new interface and the the
> v1 will be preserved and maintain for ever for backward compatibility.
> If there are usecases which cannot use v2 because of some fundamental
> reasons then we really want to hear about those. And if v2 really is not
> usable we can think of adding features to v1 of course.
> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 23:47       ` Yafang Shao
@ 2019-07-06 11:26         ` Chris Down
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Down @ 2019-07-06 11:26 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Michal Hocko, Andrew Morton, Linux MM, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Yafang Shao

Yafang Shao writes:
>> While it may superficially work without it, I'm sceptical that simply adding
>> memory.low and memory.min to the v1 hierarchy is going to end up with
>> reasonable results under scrutiny, or a coherent system or API.
>
>I have finished some simple stress tests, and the result are fine.
>Would you pls give me some suggestion on how to test it if you think
>there may be some issues ?

My concerns are about introducing an API that requires propagation of control 
into an API that doesn't delineate between using values for *propagation* and 
using values for *consumption*, and this becomes significantly more important 
when we're talking about something that necessitates active propagation of 
values like memory.{low,min}. That's one reason why I bring up concerns related 
to the fact that v1 allows processes in intermediate cgroups.

So again, I'm not expecting that anything necessarily technically goes wrong 
(hence my comment at the beginning), just that it doesn't necessarily compose 
to a reasonable thing to have in v1. My concerns are almost strictly about 
exposing this as an interface in an API that doesn't have qualities that make 
the end result reasonable.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1
  2019-07-05 23:52           ` Dave Chinner
@ 2019-07-08 12:15             ` Brian Foster
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Foster @ 2019-07-08 12:15 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Yafang Shao, Michal Hocko, Andrew Morton, Linux MM,
	Johannes Weiner, Vladimir Davydov, Shakeel Butt, Yafang Shao,
	linux-xfs

On Sat, Jul 06, 2019 at 09:52:22AM +1000, Dave Chinner wrote:
> On Fri, Jul 05, 2019 at 11:10:45AM -0400, Brian Foster wrote:
> > cc linux-xfs
> > 
> > On Fri, Jul 05, 2019 at 10:33:04PM +0800, Yafang Shao wrote:
> > > On Fri, Jul 5, 2019 at 7:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Fri 05-07-19 17:41:44, Yafang Shao wrote:
> > > > > On Fri, Jul 5, 2019 at 5:09 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > [...]
> > > > > > Why cannot you move over to v2 and have to stick with v1?
> > > > > Because the interfaces between cgroup v1 and cgroup v2 are changed too
> > > > > much, which is unacceptable by our customer.
> > > >
> > > > Could you be more specific about obstacles with respect to interfaces
> > > > please?
> > > >
> > > 
> > > Lots of applications will be changed.
> > > Kubernetes, Docker and some other applications which are using cgroup v1,
> > > that will be a trouble, because they are not maintained by us.
> > > 
> > > > > It may take long time to use cgroup v2 in production envrioment, per
> > > > > my understanding.
> > > > > BTW, the filesystem on our servers is XFS, but the cgroup  v2
> > > > > writeback throttle is not supported on XFS by now, that is beyond my
> > > > > comprehension.
> > > >
> > > > Are you sure? I would be surprised if v1 throttling would work while v2
> > > > wouldn't. As far as I remember it is v2 writeback throttling which
> > > > actually works. The only throttling we have for v1 is reclaim based one
> > > > which is a huge hammer.
> > > > --
> > > 
> > > We did it in cgroup v1 in our kernel.
> > > But the upstream still don't support it in cgroup v2.
> > > So my real question is why upstream can't support such an import file system ?
> > > Do you know which companies  besides facebook are using cgroup v2  in
> > > their product enviroment?
> > > 
> > 
> > I think the original issue with regard to XFS cgroupv2 writeback
> > throttling support was that at the time the XFS patch was proposed,
> > there wasn't any test coverage to prove that the code worked (and the
> > original author never followed up). That has since been resolved and
> > Christoph has recently posted a new patch [1], which appears to have
> > been accepted by the maintainer.
> 
> I don't think the validation issue has been resolved.
> 
> i.e. we still don't have regression tests that ensure it keeps
> working it in future, or that it works correctly in any specific
> distro setting/configuration. The lack of repeatable QoS validation
> infrastructure was the reason I never merged support for this in the
> first place.
> 
> So while the (simple) patch to support it has been merged now,
> there's no guarantee that it will work as expected or continue to do
> so over the long run as nobody upstream or in distro land has a way
> of validating that it is working correctly.
> 
> From that perspective, it is still my opinion that one-off "works
> for me" testing isn't sufficient validation for a QoS feature that
> people will use to implement SLAs with $$$ penalities attached to
> QoS failures....
> 

We do have an fstest to cover the accounting bits (which is what the fs
is responsible for). Christoph also sent a patch[1] to enable that on
XFS. I'm sure there's plenty of room for additional/broader test
coverage, of course...

Brian

[1] https://marc.info/?l=fstests&m=156138385006173&w=2

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-07-08 12:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-05  7:05 [PATCH] mm, memcg: support memory.{min, low} protection in cgroup v1 Yafang Shao
2019-07-05  9:09 ` Michal Hocko
2019-07-05  9:41   ` Yafang Shao
2019-07-05 11:10     ` Michal Hocko
2019-07-05 14:33       ` Yafang Shao
2019-07-05 15:10         ` Brian Foster
2019-07-05 23:39           ` Yafang Shao
2019-07-05 23:52           ` Dave Chinner
2019-07-08 12:15             ` Brian Foster
2019-07-05 19:54         ` Michal Hocko
2019-07-05 23:54           ` Yafang Shao
2019-07-05 15:52     ` Chris Down
2019-07-05 23:47       ` Yafang Shao
2019-07-06 11:26         ` Chris Down

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).