[PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2
@ 2016-12-20 16:12 Tejun Heo
  2016-12-20 16:12 ` [PATCH 1/5] kernfs: make kernfs_open_file->mmapped a bitfield Tejun Heo
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-20 16:12 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team

On cgroup v1, the pid listings in "cgroup.procs" and "tasks" are
sorted which adds a lot of complications and overhead.  v2 doesn't
have such requirement and has been intentionally using a modified
sorting order so that the output doesn't look sorted to users.

This patchset re-implements "cgroup.procs" reading for v2 which simply
keeps a css_task_iter open while the file is being read.  Keeping the
iterator open makes it unnecessary to skip to the right position on
each read segment and associated errors - e.g. incorrectly skipping
over pids because earlier pids disappeared between the reads.

Using persistent iterator across multiple read calls requires
->release() callback to clean it up.  kernfs operations
->open/release() are added and piped through cftype.

This patchset contains the following five patches.

 0001-kernfs-make-kernfs_open_file-mmapped-a-bitfield.patch
 0002-kernfs-add-kernfs_ops-open-release-callbacks.patch
 0003-cgroup-add-cftype-open-release-callbacks.patch
 0004-cgroup-reimplement-reading-cgroup.procs-on-cgroup-v2.patch
 0005-cgroup-remove-cgroup_pid_fry-and-friends.patch

0001 is a misc kernfs patch and 0002 adds ->open/release() to kernfs.
0003 pipes ->open/release() through cftype.  0004 implements the new
cgroup.procs for v2 and 0005 removes the now unused sort order frying
logic.

Greg, would it be okay to route the kernfs patches through
cgroup/for-4.11?

The patches are also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup2-procs

diffstat follows.  Thanks.

 fs/kernfs/dir.c             |    2 
 fs/kernfs/file.c            |   53 +++++++++++++++--
 fs/kernfs/kernfs-internal.h |    2 
 include/linux/cgroup-defs.h |    3 +
 include/linux/kernfs.h      |   12 +++-
 kernel/cgroup.c             |  130 +++++++++++++++++++++++++++++---------------
 6 files changed, 148 insertions(+), 54 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/5] kernfs: make kernfs_open_file->mmapped a bitfield
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
@ 2016-12-20 16:12 ` Tejun Heo
  2016-12-20 16:12 ` [PATCH 2/5] kernfs: add kernfs_ops->open/release() callbacks Tejun Heo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-20 16:12 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

More kernfs_open_file->mutex synchronized flags are planned to be
added.  Convert ->mmapped to a bitfield in preparation.

While at it, make kernfs_fop_mmap() use "true" instead of "1" on
->mmapped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/kernfs/file.c       | 2 +-
 include/linux/kernfs.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 78219d5..d0b6fb1 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -516,7 +516,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 		goto out_put;
 
 	rc = 0;
-	of->mmapped = 1;
+	of->mmapped = true;
 	of->vm_ops = vma->vm_ops;
 	vma->vm_ops = &kernfs_vm_ops;
 out_put:
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 7056238..afd4e5a 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -185,7 +185,7 @@ struct kernfs_open_file {
 	char			*prealloc_buf;
 
 	size_t			atomic_write_len;
-	bool			mmapped;
+	bool			mmapped:1;
 	const struct vm_operations_struct *vm_ops;
 };
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/5] kernfs: add kernfs_ops->open/release() callbacks
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
  2016-12-20 16:12 ` [PATCH 1/5] kernfs: make kernfs_open_file->mmapped a bitfield Tejun Heo
@ 2016-12-20 16:12 ` Tejun Heo
  2016-12-20 16:12 ` [PATCH 3/5] cgroup add cftype->open/release() callbacks Tejun Heo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-20 16:12 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

Add ->open/release() methods to kernfs_ops.  ->open() is called when
the file is opened and ->release() when the file is either released or
severed.  These callbacks can be used, for example, to manage
persistent caching objects over multiple seq_file iterations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/kernfs/dir.c             |  2 +-
 fs/kernfs/file.c            | 51 +++++++++++++++++++++++++++++++++++++++------
 fs/kernfs/kernfs-internal.h |  2 +-
 include/linux/kernfs.h      | 10 +++++++++
 4 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index cf4c636..0b9d23b 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -468,7 +468,7 @@ static void kernfs_drain(struct kernfs_node *kn)
 		rwsem_release(&kn->dep_map, 1, _RET_IP_);
 	}
 
-	kernfs_unmap_bin_file(kn);
+	kernfs_drain_open_files(kn);
 
 	mutex_lock(&kernfs_mutex);
 }
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index d0b6fb1..20c3962 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -708,7 +708,8 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
 	if (error)
 		goto err_free;
 
-	((struct seq_file *)file->private_data)->private = of;
+	of->seq_file = file->private_data;
+	of->seq_file->private = of;
 
 	/* seq_file clears PWRITE unconditionally, restore it if WRITE */
 	if (file->f_mode & FMODE_WRITE)
@@ -717,13 +718,22 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
 	/* make sure we have open node struct */
 	error = kernfs_get_open_node(kn, of);
 	if (error)
-		goto err_close;
+		goto err_seq_release;
+
+	if (ops->open) {
+		/* nobody has access to @of yet, skip @of->mutex */
+		error = ops->open(of);
+		if (error)
+			goto err_put_node;
+	}
 
 	/* open succeeded, put active references */
 	kernfs_put_active(kn);
 	return 0;
 
-err_close:
+err_put_node:
+	kernfs_put_open_node(kn, of);
+err_seq_release:
 	seq_release(inode, file);
 err_free:
 	kfree(of->prealloc_buf);
@@ -733,11 +743,32 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
 	return error;
 }
 
+/* used from release/drain to ensure that ->release() is called exactly once */
+static void kernfs_release_file(struct kernfs_node *kn,
+				struct kernfs_open_file *of)
+{
+	if (!(kn->flags & KERNFS_HAS_RELEASE))
+		return;
+
+	mutex_lock(&of->mutex);
+	if (!of->released) {
+		/*
+		 * A file is never detached without being released and we
+		 * need to be able to release files which are deactivated
+		 * and being drained.  Don't use kernfs_ops().
+		 */
+		kn->attr.ops->release(of);
+		of->released = true;
+	}
+	mutex_unlock(&of->mutex);
+}
+
 static int kernfs_fop_release(struct inode *inode, struct file *filp)
 {
 	struct kernfs_node *kn = filp->f_path.dentry->d_fsdata;
 	struct kernfs_open_file *of = kernfs_of(filp);
 
+	kernfs_release_file(kn, of);
 	kernfs_put_open_node(kn, of);
 	seq_release(inode, filp);
 	kfree(of->prealloc_buf);
@@ -746,12 +777,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-void kernfs_unmap_bin_file(struct kernfs_node *kn)
+void kernfs_drain_open_files(struct kernfs_node *kn)
 {
 	struct kernfs_open_node *on;
 	struct kernfs_open_file *of;
 
-	if (!(kn->flags & KERNFS_HAS_MMAP))
+	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
 		return;
 
 	spin_lock_irq(&kernfs_open_node_lock);
@@ -763,10 +794,16 @@ void kernfs_unmap_bin_file(struct kernfs_node *kn)
 		return;
 
 	mutex_lock(&kernfs_open_file_mutex);
+
 	list_for_each_entry(of, &on->files, list) {
 		struct inode *inode = file_inode(of->file);
-		unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+
+		if (kn->flags & KERNFS_HAS_MMAP)
+			unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+
+		kernfs_release_file(kn, of);
 	}
+
 	mutex_unlock(&kernfs_open_file_mutex);
 
 	kernfs_put_open_node(kn, NULL);
@@ -965,6 +1002,8 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 		kn->flags |= KERNFS_HAS_SEQ_SHOW;
 	if (ops->mmap)
 		kn->flags |= KERNFS_HAS_MMAP;
+	if (ops->release)
+		kn->flags |= KERNFS_HAS_RELEASE;
 
 	rc = kernfs_add_one(kn);
 	if (rc) {
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index bfd551b..3100987 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -104,7 +104,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
  */
 extern const struct file_operations kernfs_file_fops;
 
-void kernfs_unmap_bin_file(struct kernfs_node *kn);
+void kernfs_drain_open_files(struct kernfs_node *kn);
 
 /*
  * symlink.c
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index afd4e5a..a9b11b8 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -46,6 +46,7 @@ enum kernfs_node_flag {
 	KERNFS_SUICIDAL		= 0x0400,
 	KERNFS_SUICIDED		= 0x0800,
 	KERNFS_EMPTY_DIR	= 0x1000,
+	KERNFS_HAS_RELEASE	= 0x2000,
 };
 
 /* @flags for kernfs_create_root() */
@@ -175,6 +176,7 @@ struct kernfs_open_file {
 	/* published fields */
 	struct kernfs_node	*kn;
 	struct file		*file;
+	struct seq_file		*seq_file;
 	void			*priv;
 
 	/* private fields, do not use outside kernfs proper */
@@ -186,11 +188,19 @@ struct kernfs_open_file {
 
 	size_t			atomic_write_len;
 	bool			mmapped:1;
+	bool			released:1;
 	const struct vm_operations_struct *vm_ops;
 };
 
 struct kernfs_ops {
 	/*
+	 * Optional open/release methods.  Both are called with
+	 * @of->seq_file populated.
+	 */
+	int (*open)(struct kernfs_open_file *of);
+	void (*release)(struct kernfs_open_file *of);
+
+	/*
 	 * Read is handled by either seq_file or raw_read().
 	 *
 	 * If seq_show() is present, seq_file path is active.  Other seq
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/5] cgroup add cftype->open/release() callbacks
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
  2016-12-20 16:12 ` [PATCH 1/5] kernfs: make kernfs_open_file->mmapped a bitfield Tejun Heo
  2016-12-20 16:12 ` [PATCH 2/5] kernfs: add kernfs_ops->open/release() callbacks Tejun Heo
@ 2016-12-20 16:12 ` Tejun Heo
  2016-12-20 16:12 ` [PATCH 4/5] cgroup: reimplement reading "cgroup.procs" on cgroup v2 Tejun Heo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-20 16:12 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

Pipe the newly added kernfs->open/release() callbacks through cftype.
While at it, as cleanup operations now can be performed from
->release() instead of ->seq_stop(), make the latter optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h |  3 +++
 kernel/cgroup.c             | 24 +++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 861b467..8a916dc 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -388,6 +388,9 @@ struct cftype {
 	struct list_head node;		/* anchored at ss->cfts */
 	struct kernfs_ops *kf_ops;
 
+	int (*open)(struct kernfs_open_file *of);
+	void (*release)(struct kernfs_open_file *of);
+
 	/*
 	 * read_u64() is a shortcut for the common case of returning a
 	 * single integer. Use it in place of read()
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2ee9ec3..87167e4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3503,6 +3503,23 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static int cgroup_file_open(struct kernfs_open_file *of)
+{
+	struct cftype *cft = of->kn->priv;
+
+	if (cft->open)
+		return cft->open(of);
+	return 0;
+}
+
+static void cgroup_file_release(struct kernfs_open_file *of)
+{
+	struct cftype *cft = of->kn->priv;
+
+	if (cft->release)
+		cft->release(of);
+}
+
 static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off)
 {
@@ -3553,7 +3570,8 @@ static void *cgroup_seqfile_next(struct seq_file *seq, void *v, loff_t *ppos)
 
 static void cgroup_seqfile_stop(struct seq_file *seq, void *v)
 {
-	seq_cft(seq)->seq_stop(seq, v);
+	if (seq_cft(seq)->seq_stop)
+		seq_cft(seq)->seq_stop(seq, v);
 }
 
 static int cgroup_seqfile_show(struct seq_file *m, void *arg)
@@ -3575,12 +3593,16 @@ static int cgroup_seqfile_show(struct seq_file *m, void *arg)
 
 static struct kernfs_ops cgroup_kf_single_ops = {
 	.atomic_write_len	= PAGE_SIZE,
+	.open			= cgroup_file_open,
+	.release		= cgroup_file_release,
 	.write			= cgroup_file_write,
 	.seq_show		= cgroup_seqfile_show,
 };
 
 static struct kernfs_ops cgroup_kf_ops = {
 	.atomic_write_len	= PAGE_SIZE,
+	.open			= cgroup_file_open,
+	.release		= cgroup_file_release,
 	.write			= cgroup_file_write,
 	.seq_start		= cgroup_seqfile_start,
 	.seq_next		= cgroup_seqfile_next,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/5] cgroup: reimplement reading "cgroup.procs" on cgroup v2
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
                   ` (2 preceding siblings ...)
  2016-12-20 16:12 ` [PATCH 3/5] cgroup add cftype->open/release() callbacks Tejun Heo
@ 2016-12-20 16:12 ` Tejun Heo
  2016-12-20 16:12 ` [PATCH 5/5] cgroup: remove cgroup_pid_fry() and friends Tejun Heo
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-20 16:12 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

On v1, "tasks" and "cgroup.procs" are expected to be sorted which
makes the implementation expensive and unnecessarily complicated
involving result cache management.

v2 doesn't have the sorting requirement, so it can just iterate and
print processes one by one.  seq_files are either read sequentially or
reset to position zero, so the implementation doesn't even need to
worry about seeking.

This keeps the css_task_iter across multiple read(2) calls and
migrations of new processes always append won't miss processes which
are newly migrated in before each read(2).

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 87167e4..fd684bfd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4426,6 +4426,60 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	return ret;
 }
 
+static void cgroup_procs_release(struct kernfs_open_file *of)
+{
+	if (of->priv) {
+		css_task_iter_end(of->priv);
+		kfree(of->priv);
+	}
+}
+
+static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct kernfs_open_file *of = s->private;
+	struct css_task_iter *it = of->priv;
+	struct task_struct *task;
+
+	do {
+		task = css_task_iter_next(it);
+	} while (task && !thread_group_leader(task));
+
+	return task;
+}
+
+static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+{
+	struct kernfs_open_file *of = s->private;
+	struct cgroup *cgrp = seq_css(s)->cgroup;
+	struct css_task_iter *it = of->priv;
+
+	/*
+	 * When a seq_file is seeked, it's always traversed sequentially
+	 * from position 0, so we can simply keep iterating on !0 *pos.
+	 */
+	if (!it) {
+		if (WARN_ON_ONCE((*pos)++))
+			return ERR_PTR(-EINVAL);
+
+		it = kzalloc(sizeof(*it), GFP_KERNEL);
+		if (!it)
+			return ERR_PTR(-ENOMEM);
+		of->priv = it;
+		css_task_iter_start(&cgrp->self, it);
+	} else if (!(*pos)++) {
+		css_task_iter_end(it);
+		css_task_iter_start(&cgrp->self, it);
+	}
+
+	return cgroup_procs_next(s, NULL, NULL);
+}
+
+static int cgroup_procs_show(struct seq_file *s, void *v)
+{
+	seq_printf(s, "%d\n", task_tgid_vnr(v));
+	return 0;
+}
+
 /*
  * Stuff for reading the 'tasks'/'procs' files.
  *
@@ -4914,11 +4968,10 @@ static struct cftype cgroup_dfl_base_files[] = {
 	{
 		.name = "cgroup.procs",
 		.file_offset = offsetof(struct cgroup, procs_file),
-		.seq_start = cgroup_pidlist_start,
-		.seq_next = cgroup_pidlist_next,
-		.seq_stop = cgroup_pidlist_stop,
-		.seq_show = cgroup_pidlist_show,
-		.private = CGROUP_FILE_PROCS,
+		.release = cgroup_procs_release,
+		.seq_start = cgroup_procs_start,
+		.seq_next = cgroup_procs_next,
+		.seq_show = cgroup_procs_show,
 		.write = cgroup_procs_write,
 	},
 	{
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/5] cgroup: remove cgroup_pid_fry() and friends
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
                   ` (3 preceding siblings ...)
  2016-12-20 16:12 ` [PATCH 4/5] cgroup: reimplement reading "cgroup.procs" on cgroup v2 Tejun Heo
@ 2016-12-20 16:12 ` Tejun Heo
  2016-12-21  9:59 ` [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Greg KH
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-20 16:12 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team, Tejun Heo

cgroup_pid_fry() was added to mangle cgroup.procs pid listing order on
v2 to make it clear that the output is not sorted.  Now that v2 now
uses a separate "cgroup.procs" read method, this is no longer used.
Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 43 +++++--------------------------------------
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index fd684bfd..b9e2d85 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4618,42 +4618,12 @@ static int pidlist_uniq(pid_t *list, int length)
  * sorted by task pointer.  As pidlists can be fairly large, allocating one
  * per open file is dangerous, so cgroup had to implement shared pool of
  * pidlists keyed by cgroup and namespace.
- *
- * All this extra complexity was caused by the original implementation
- * committing to an entirely unnecessary property.  In the long term, we
- * want to do away with it.  Explicitly scramble sort order if on the
- * default hierarchy so that no such expectation exists in the new
- * interface.
- *
- * Scrambling is done by swapping every two consecutive bits, which is
- * non-identity one-to-one mapping which disturbs sort order sufficiently.
  */
-static pid_t pid_fry(pid_t pid)
-{
-	unsigned a = pid & 0x55555555;
-	unsigned b = pid & 0xAAAAAAAA;
-
-	return (a << 1) | (b >> 1);
-}
-
-static pid_t cgroup_pid_fry(struct cgroup *cgrp, pid_t pid)
-{
-	if (cgroup_on_dfl(cgrp))
-		return pid_fry(pid);
-	else
-		return pid;
-}
-
 static int cmppid(const void *a, const void *b)
 {
 	return *(pid_t *)a - *(pid_t *)b;
 }
 
-static int fried_cmppid(const void *a, const void *b)
-{
-	return pid_fry(*(pid_t *)a) - pid_fry(*(pid_t *)b);
-}
-
 static struct cgroup_pidlist *cgroup_pidlist_find(struct cgroup *cgrp,
 						  enum cgroup_filetype type)
 {
@@ -4741,10 +4711,7 @@ static int pidlist_array_load(struct cgroup *cgrp, enum cgroup_filetype type,
 	css_task_iter_end(&it);
 	length = n;
 	/* now sort & (if procs) strip out duplicates */
-	if (cgroup_on_dfl(cgrp))
-		sort(array, length, sizeof(pid_t), fried_cmppid, NULL);
-	else
-		sort(array, length, sizeof(pid_t), cmppid, NULL);
+	sort(array, length, sizeof(pid_t), cmppid, NULL);
 	if (type == CGROUP_FILE_PROCS)
 		length = pidlist_uniq(array, length);
 
@@ -4876,10 +4843,10 @@ static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos)
 
 		while (index < end) {
 			int mid = (index + end) / 2;
-			if (cgroup_pid_fry(cgrp, l->list[mid]) == pid) {
+			if (l->list[mid] == pid) {
 				index = mid;
 				break;
-			} else if (cgroup_pid_fry(cgrp, l->list[mid]) <= pid)
+			} else if (l->list[mid] <= pid)
 				index = mid + 1;
 			else
 				end = mid;
@@ -4890,7 +4857,7 @@ static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos)
 		return NULL;
 	/* Update the abstract position to be the actual pid that we found */
 	iter = l->list + index;
-	*pos = cgroup_pid_fry(cgrp, *iter);
+	*pos = *iter;
 	return iter;
 }
 
@@ -4919,7 +4886,7 @@ static void *cgroup_pidlist_next(struct seq_file *s, void *v, loff_t *pos)
 	if (p >= end) {
 		return NULL;
 	} else {
-		*pos = cgroup_pid_fry(seq_css(s)->cgroup, *p);
+		*pos = *p;
 		return p;
 	}
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
                   ` (4 preceding siblings ...)
  2016-12-20 16:12 ` [PATCH 5/5] cgroup: remove cgroup_pid_fry() and friends Tejun Heo
@ 2016-12-21  9:59 ` Greg KH
  2016-12-26  6:22 ` Zefan Li
  2016-12-27 19:53 ` Tejun Heo
  7 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2016-12-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lizefan, hannes, linux-kernel, cgroups, kernel-team

On Tue, Dec 20, 2016 at 11:12:17AM -0500, Tejun Heo wrote:
> On cgroup v1, the pid listings in "cgroup.procs" and "tasks" are
> sorted which adds a lot of complications and overhead.  v2 doesn't
> have such requirement and has been intentionally using a modified
> sorting order so that the output doesn't look sorted to users.
> 
> This patchset re-implements "cgroup.procs" reading for v2 which simply
> keeps a css_task_iter open while the file is being read.  Keeping the
> iterator open makes it unnecessary to skip to the right position on
> each read segment and associated errors - e.g. incorrectly skipping
> over pids because earlier pids disappeared between the reads.
> 
> Using persistent iterator across multiple read calls requires
> ->release() callback to clean it up.  kernfs operations
> ->open/release() are added and piped through cftype.
> 
> This patchset contains the following five patches.
> 
>  0001-kernfs-make-kernfs_open_file-mmapped-a-bitfield.patch
>  0002-kernfs-add-kernfs_ops-open-release-callbacks.patch
>  0003-cgroup-add-cftype-open-release-callbacks.patch
>  0004-cgroup-reimplement-reading-cgroup.procs-on-cgroup-v2.patch
>  0005-cgroup-remove-cgroup_pid_fry-and-friends.patch
> 
> 0001 is a misc kernfs patch and 0002 adds ->open/release() to kernfs.
> 0003 pipes ->open/release() through cftype.  0004 implements the new
> cgroup.procs for v2 and 0005 removes the now unused sort order frying
> logic.
> 
> Greg, would it be okay to route the kernfs patches through
> cgroup/for-4.11?

No objection from me at all:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
                   ` (5 preceding siblings ...)
  2016-12-21  9:59 ` [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Greg KH
@ 2016-12-26  6:22 ` Zefan Li
  2016-12-27 19:53 ` Tejun Heo
  7 siblings, 0 replies; 9+ messages in thread
From: Zefan Li @ 2016-12-26  6:22 UTC (permalink / raw)
  To: Tejun Heo, gregkh, hannes; +Cc: linux-kernel, cgroups, kernel-team

On 2016/12/21 0:12, Tejun Heo wrote:
> On cgroup v1, the pid listings in "cgroup.procs" and "tasks" are
> sorted which adds a lot of complications and overhead.  v2 doesn't
> have such requirement and has been intentionally using a modified
> sorting order so that the output doesn't look sorted to users.
> 
> This patchset re-implements "cgroup.procs" reading for v2 which simply
> keeps a css_task_iter open while the file is being read.  Keeping the
> iterator open makes it unnecessary to skip to the right position on
> each read segment and associated errors - e.g. incorrectly skipping
> over pids because earlier pids disappeared between the reads.
> 
> Using persistent iterator across multiple read calls requires
> ->release() callback to clean it up.  kernfs operations
> ->open/release() are added and piped through cftype.
> 
> This patchset contains the following five patches.
> 
>  0001-kernfs-make-kernfs_open_file-mmapped-a-bitfield.patch
>  0002-kernfs-add-kernfs_ops-open-release-callbacks.patch
>  0003-cgroup-add-cftype-open-release-callbacks.patch
>  0004-cgroup-reimplement-reading-cgroup.procs-on-cgroup-v2.patch
>  0005-cgroup-remove-cgroup_pid_fry-and-friends.patch
> 
> 0001 is a misc kernfs patch and 0002 adds ->open/release() to kernfs.
> 0003 pipes ->open/release() through cftype.  0004 implements the new
> cgroup.procs for v2 and 0005 removes the now unused sort order frying
> logic.
> 
> Greg, would it be okay to route the kernfs patches through
> cgroup/for-4.11?
> 
> The patches are also available in the following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup2-procs
> 
> diffstat follows.  Thanks.
> 
>  fs/kernfs/dir.c             |    2 
>  fs/kernfs/file.c            |   53 +++++++++++++++--
>  fs/kernfs/kernfs-internal.h |    2 
>  include/linux/cgroup-defs.h |    3 +
>  include/linux/kernfs.h      |   12 +++-
>  kernel/cgroup.c             |  130 +++++++++++++++++++++++++++++---------------
>  6 files changed, 148 insertions(+), 54 deletions(-)
> 

Acked-by: Zefan Li <lizefan@huawei.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2
  2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
                   ` (6 preceding siblings ...)
  2016-12-26  6:22 ` Zefan Li
@ 2016-12-27 19:53 ` Tejun Heo
  7 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-12-27 19:53 UTC (permalink / raw)
  To: gregkh, lizefan, hannes; +Cc: linux-kernel, cgroups, kernel-team

On Tue, Dec 20, 2016 at 11:12:17AM -0500, Tejun Heo wrote:
> On cgroup v1, the pid listings in "cgroup.procs" and "tasks" are
> sorted which adds a lot of complications and overhead.  v2 doesn't
> have such requirement and has been intentionally using a modified
> sorting order so that the output doesn't look sorted to users.
> 
> This patchset re-implements "cgroup.procs" reading for v2 which simply
> keeps a css_task_iter open while the file is being read.  Keeping the
> iterator open makes it unnecessary to skip to the right position on
> each read segment and associated errors - e.g. incorrectly skipping
> over pids because earlier pids disappeared between the reads.
> 
> Using persistent iterator across multiple read calls requires
> ->release() callback to clean it up.  kernfs operations
> ->open/release() are added and piped through cftype.

Applied to cgroup/for-4.11.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-12-27 20:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-20 16:12 [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Tejun Heo
2016-12-20 16:12 ` [PATCH 1/5] kernfs: make kernfs_open_file->mmapped a bitfield Tejun Heo
2016-12-20 16:12 ` [PATCH 2/5] kernfs: add kernfs_ops->open/release() callbacks Tejun Heo
2016-12-20 16:12 ` [PATCH 3/5] cgroup add cftype->open/release() callbacks Tejun Heo
2016-12-20 16:12 ` [PATCH 4/5] cgroup: reimplement reading "cgroup.procs" on cgroup v2 Tejun Heo
2016-12-20 16:12 ` [PATCH 5/5] cgroup: remove cgroup_pid_fry() and friends Tejun Heo
2016-12-21  9:59 ` [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2 Greg KH
2016-12-26  6:22 ` Zefan Li
2016-12-27 19:53 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).