linux-kernel.vger.kernel.org archive mirror
* [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
@ 2019-11-04 23:59 Tejun Heo
  2019-11-04 23:59 ` [PATCH 01/10] kernfs: fix ino wrap-around detection Tejun Heo
                   ` (12 more replies)
  0 siblings, 13 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel

Hello,

Currently, there are three IDs which are being used to identify a
cgroup.

1. cgroup->id
2. cgroupfs 32bit ino
3. cgroupfs 32bit ino + 32bit gen

All three IDs are visible to userland through different interfaces.
This is very confusing, and #1 can't even be resolved to a cgroup from
userland.

A 64bit number is sufficient to identify a cgroup instance uniquely
and ino_t is 64bit on all archs except for alpha.  There's no reason
for three different IDs at all.  This patchset updates kernfs so that
it supports 64bit ino and associated exportfs operations and unifies
the cgroup IDs.

* On 64bit ino archs, ino is the kernfs node ID which is also the
  cgroup ID.  The ino can be passed directly into open_by_handle_at(2)
  w/ the new key type FILEID_KERNFS (see the userland sketch after
  this list).  Backward compatibility is maintained for
  FILEID_INO32_GEN keys.

* On 32bit ino archs, the kernfs node ID is still 64bit and is the
  cgroup ID.  ino is the low 32bits and gen the high 32bits.  If the
  high 32bits are zero, open_by_handle_at(2) only matches the ino part
  of the ID, allowing userland to resolve inos to cgroups as long as
  distinguishing recycled inos isn't necessary.
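
For illustration, a minimal userland sketch of resolving a cgroup ID
with the new fid type.  This is not part of the patches; it assumes a
cgroup2 mount at /sys/fs/cgroup, CAP_DAC_READ_SEARCH, and takes the
FILEID_KERNFS value (0xfe) from patch 0008.

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define FILEID_KERNFS	0xfe	/* from patch 0008; not in uapi headers */

  /* resolve a 64bit cgroup ID to an open fd on the cgroup directory */
  static int open_cgrp_by_id(uint64_t cgrp_id)
  {
  	struct file_handle *fh;
  	int mnt_fd, fd;

  	mnt_fd = open("/sys/fs/cgroup", O_RDONLY | O_DIRECTORY);
  	if (mnt_fd < 0)
  		return -1;

  	fh = malloc(sizeof(*fh) + sizeof(cgrp_id));
  	if (!fh) {
  		close(mnt_fd);
  		return -1;
  	}

  	fh->handle_bytes = sizeof(cgrp_id);	/* the handle is the raw 64bit ID */
  	fh->handle_type = FILEID_KERNFS;
  	memcpy(fh->f_handle, &cgrp_id, sizeof(cgrp_id));

  	fd = open_by_handle_at(mnt_fd, fh, O_RDONLY);
  	free(fh);
  	close(mnt_fd);
  	return fd;
  }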

This patchset contains the following 10 patches.

 0001-kernfs-fix-ino-wrap-around-detection.patch
 0002-writeback-use-ino_t-for-inodes-in-tracepoints.patch
 0003-netprio-use-css-ID-instead-of-cgroup-ID.patch
 0004-kernfs-use-dumber-locking-for-kernfs_find_and_get_no.patch
 0005-kernfs-kernfs_find_and_get_node_by_ino-should-only-l.patch
 0006-kernfs-convert-kernfs_node-id-from-union-kernfs_node.patch
 0007-kernfs-combine-ino-id-lookup-functions-into-kernfs_f.patch
 0008-kernfs-implement-custom-exportfs-ops-and-fid-type.patch
 0009-kernfs-use-64bit-inos-if-ino_t-is-64bit.patch
 0010-cgroup-use-cgrp-kn-id-as-the-cgroup-ID.patch

0001 is a fix which should be backported through -stable.  0002 and
0003 are prep patches.  0004-0009 make kernfs_node->id a u64 and use
it as ino on 64bit ino archs.  0010 replaces cgroup->id with the
kernfs node ID.

Greg, how do you want to route the patches?  We can route 0001-0009
through your tree and the last one through cgroup after pulling in.
I'd be happy to route them all too.

This patchset is also available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified-cgid

diffstat follows.  Thanks.

 fs/kernfs/dir.c                  |  101 +++++++++++++++++++-------------------
 fs/kernfs/file.c                 |    4 -
 fs/kernfs/inode.c                |    4 -
 fs/kernfs/kernfs-internal.h      |    2 
 fs/kernfs/mount.c                |  102 ++++++++++++++++++++++-----------------
 include/linux/cgroup-defs.h      |   17 ------
 include/linux/cgroup.h           |   26 ++++-----
 include/linux/exportfs.h         |    5 +
 include/linux/kernfs.h           |   57 ++++++++++++++-------
 include/net/netprio_cgroup.h     |    2 
 include/trace/events/cgroup.h    |    6 +-
 include/trace/events/writeback.h |   92 +++++++++++++++++------------------
 kernel/bpf/helpers.c             |    2 
 kernel/bpf/local_storage.c       |    2 
 kernel/cgroup/cgroup.c           |   81 ++++++++++--------------------
 kernel/trace/blktrace.c          |   84 +++++++++++++++++---------------
 net/core/filter.c                |    4 -
 net/core/netprio_cgroup.c        |    8 +--
 18 files changed, 301 insertions(+), 298 deletions(-)

--
tejun



* [PATCH 01/10] kernfs: fix ino wrap-around detection
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 02/10] writeback: use ino_t for inodes in tracepoints Tejun Heo
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo, stable

When the 32bit ino wraps around, kernfs increments the generation
number to distinguish reused ino instances.  The wrap-around detection
tests whether the allocated ino is lower than the cursor, but the
cursor points to the next ino to allocate, so the condition never
triggers.

Fix it by remembering the last ino and comparing against that.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 4a3ef68acacf ("kernfs: implement i_generation")
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org # v4.14+
---
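For reference, a walk-through of the new check with made-up values
around the wrap point; this only restates the hunk below.

  	/* previous allocation sat at the top of the ino space */
  	root->last_ino = 0x7fffffff;

  	/* the next cyclic allocation wraps back to a low ino, e.g. 1 */
  	ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);

  	if (ret >= 0 && ret < root->last_ino)	/* 1 < 0x7fffffff: wrap detected */
  		root->next_generation++;
  	root->last_ino = ret;
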
 fs/kernfs/dir.c        | 5 ++---
 include/linux/kernfs.h | 1 +
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 6ebae6bbe6a5..7d4af6cea2a6 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -622,7 +622,6 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 {
 	struct kernfs_node *kn;
 	u32 gen;
-	int cursor;
 	int ret;
 
 	name = kstrdup_const(name, GFP_KERNEL);
@@ -635,11 +634,11 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&kernfs_idr_lock);
-	cursor = idr_get_cursor(&root->ino_idr);
 	ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);
-	if (ret >= 0 && ret < cursor)
+	if (ret >= 0 && ret < root->last_ino)
 		root->next_generation++;
 	gen = root->next_generation;
+	root->last_ino = ret;
 	spin_unlock(&kernfs_idr_lock);
 	idr_preload_end();
 	if (ret < 0)
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 936b61bd504e..f797ccc650e7 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -187,6 +187,7 @@ struct kernfs_root {
 
 	/* private fields, do not use outside kernfs proper */
 	struct idr		ino_idr;
+	u32			last_ino;
 	u32			next_generation;
 	struct kernfs_syscall_ops *syscall_ops;
 
-- 
2.17.1



* [PATCH 02/10] writeback: use ino_t for inodes in tracepoints
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
  2019-11-04 23:59 ` [PATCH 01/10] kernfs: fix ino wrap-around detection Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-05  9:11   ` Jan Kara
  2019-11-04 23:59 ` [PATCH 03/10] netprio: use css ID instead of cgroup ID Tejun Heo
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo, Jan Kara, Jens Axboe

Writeback TPs currently use a mix of 32 and 64bit types for inos.
This isn't broken today because only cgroup inos use the 32bit fields
and they are currently limited to 32bits; however, cgroup inos are
about to start using 64bits.  Let's uniformly use ino_t.

While at it, switch the default cgroup ino value used when cgroup is
disabled to 1 instead of -1U, as the root cgroup always uses ino 1.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Namhyung Kim <namhyung@kernel.org>
---
Hello,

This is to prepare for kernfs 64bit ino support.  It'd be best to
route this with the rest of the kernfs patchset.

Thanks.

 include/trace/events/writeback.h | 88 ++++++++++++++++----------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index c2ce6480b4b1..95e50677476b 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -61,7 +61,7 @@ DECLARE_EVENT_CLASS(writeback_page_template,
 
 	TP_STRUCT__entry (
 		__array(char, name, 32)
-		__field(unsigned long, ino)
+		__field(ino_t, ino)
 		__field(pgoff_t, index)
 	),
 
@@ -102,7 +102,7 @@ DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
 
 	TP_STRUCT__entry (
 		__array(char, name, 32)
-		__field(unsigned long, ino)
+		__field(ino_t, ino)
 		__field(unsigned long, state)
 		__field(unsigned long, flags)
 	),
@@ -150,28 +150,28 @@ DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
 #ifdef CREATE_TRACE_POINTS
 #ifdef CONFIG_CGROUP_WRITEBACK
 
-static inline unsigned int __trace_wb_assign_cgroup(struct bdi_writeback *wb)
+static inline ino_t __trace_wb_assign_cgroup(struct bdi_writeback *wb)
 {
 	return wb->memcg_css->cgroup->kn->id.ino;
 }
 
-static inline unsigned int __trace_wbc_assign_cgroup(struct writeback_control *wbc)
+static inline ino_t __trace_wbc_assign_cgroup(struct writeback_control *wbc)
 {
 	if (wbc->wb)
 		return __trace_wb_assign_cgroup(wbc->wb);
 	else
-		return -1U;
+		return 1;
 }
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
-static inline unsigned int __trace_wb_assign_cgroup(struct bdi_writeback *wb)
+static inline ino_t __trace_wb_assign_cgroup(struct bdi_writeback *wb)
 {
-	return -1U;
+	return 1;
 }
 
-static inline unsigned int __trace_wbc_assign_cgroup(struct writeback_control *wbc)
+static inline ino_t __trace_wbc_assign_cgroup(struct writeback_control *wbc)
 {
-	return -1U;
+	return 1;
 }
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
@@ -187,8 +187,8 @@ TRACE_EVENT(inode_foreign_history,
 
 	TP_STRUCT__entry(
 		__array(char,		name, 32)
-		__field(unsigned long,	ino)
-		__field(unsigned int,	cgroup_ino)
+		__field(ino_t,		ino)
+		__field(ino_t,		cgroup_ino)
 		__field(unsigned int,	history)
 	),
 
@@ -199,7 +199,7 @@ TRACE_EVENT(inode_foreign_history,
 		__entry->history	= history;
 	),
 
-	TP_printk("bdi %s: ino=%lu cgroup_ino=%u history=0x%x",
+	TP_printk("bdi %s: ino=%lu cgroup_ino=%lu history=0x%x",
 		__entry->name,
 		__entry->ino,
 		__entry->cgroup_ino,
@@ -216,9 +216,9 @@ TRACE_EVENT(inode_switch_wbs,
 
 	TP_STRUCT__entry(
 		__array(char,		name, 32)
-		__field(unsigned long,	ino)
-		__field(unsigned int,	old_cgroup_ino)
-		__field(unsigned int,	new_cgroup_ino)
+		__field(ino_t,		ino)
+		__field(ino_t,		old_cgroup_ino)
+		__field(ino_t,		new_cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -228,7 +228,7 @@ TRACE_EVENT(inode_switch_wbs,
 		__entry->new_cgroup_ino	= __trace_wb_assign_cgroup(new_wb);
 	),
 
-	TP_printk("bdi %s: ino=%lu old_cgroup_ino=%u new_cgroup_ino=%u",
+	TP_printk("bdi %s: ino=%lu old_cgroup_ino=%lu new_cgroup_ino=%lu",
 		__entry->name,
 		__entry->ino,
 		__entry->old_cgroup_ino,
@@ -245,10 +245,10 @@ TRACE_EVENT(track_foreign_dirty,
 	TP_STRUCT__entry(
 		__array(char,		name, 32)
 		__field(u64,		bdi_id)
-		__field(unsigned long,	ino)
+		__field(ino_t,		ino)
 		__field(unsigned int,	memcg_id)
-		__field(unsigned int,	cgroup_ino)
-		__field(unsigned int,	page_cgroup_ino)
+		__field(ino_t,		cgroup_ino)
+		__field(ino_t,		page_cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -263,7 +263,7 @@ TRACE_EVENT(track_foreign_dirty,
 		__entry->page_cgroup_ino = page->mem_cgroup->css.cgroup->kn->id.ino;
 	),
 
-	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%u page_cgroup_ino=%u",
+	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu",
 		__entry->name,
 		__entry->bdi_id,
 		__entry->ino,
@@ -282,7 +282,7 @@ TRACE_EVENT(flush_foreign,
 
 	TP_STRUCT__entry(
 		__array(char,		name, 32)
-		__field(unsigned int,	cgroup_ino)
+		__field(ino_t,		cgroup_ino)
 		__field(unsigned int,	frn_bdi_id)
 		__field(unsigned int,	frn_memcg_id)
 	),
@@ -294,7 +294,7 @@ TRACE_EVENT(flush_foreign,
 		__entry->frn_memcg_id	= frn_memcg_id;
 	),
 
-	TP_printk("bdi %s: cgroup_ino=%u frn_bdi_id=%u frn_memcg_id=%u",
+	TP_printk("bdi %s: cgroup_ino=%lu frn_bdi_id=%u frn_memcg_id=%u",
 		__entry->name,
 		__entry->cgroup_ino,
 		__entry->frn_bdi_id,
@@ -311,9 +311,9 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
 
 	TP_STRUCT__entry (
 		__array(char, name, 32)
-		__field(unsigned long, ino)
+		__field(ino_t, ino)
 		__field(int, sync_mode)
-		__field(unsigned int, cgroup_ino)
+		__field(ino_t, cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -324,7 +324,7 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
 	),
 
-	TP_printk("bdi %s: ino=%lu sync_mode=%d cgroup_ino=%u",
+	TP_printk("bdi %s: ino=%lu sync_mode=%d cgroup_ino=%lu",
 		__entry->name,
 		__entry->ino,
 		__entry->sync_mode,
@@ -358,7 +358,7 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		__field(int, range_cyclic)
 		__field(int, for_background)
 		__field(int, reason)
-		__field(unsigned int, cgroup_ino)
+		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
 		strscpy_pad(__entry->name,
@@ -374,7 +374,7 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
 	),
 	TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d "
-		  "kupdate=%d range_cyclic=%d background=%d reason=%s cgroup_ino=%u",
+		  "kupdate=%d range_cyclic=%d background=%d reason=%s cgroup_ino=%lu",
 		  __entry->name,
 		  MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev),
 		  __entry->nr_pages,
@@ -413,13 +413,13 @@ DECLARE_EVENT_CLASS(writeback_class,
 	TP_ARGS(wb),
 	TP_STRUCT__entry(
 		__array(char, name, 32)
-		__field(unsigned int, cgroup_ino)
+		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
 		strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32);
 		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
 	),
-	TP_printk("bdi %s: cgroup_ino=%u",
+	TP_printk("bdi %s: cgroup_ino=%lu",
 		  __entry->name,
 		  __entry->cgroup_ino
 	)
@@ -459,7 +459,7 @@ DECLARE_EVENT_CLASS(wbc_class,
 		__field(int, range_cyclic)
 		__field(long, range_start)
 		__field(long, range_end)
-		__field(unsigned int, cgroup_ino)
+		__field(ino_t, cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -478,7 +478,7 @@ DECLARE_EVENT_CLASS(wbc_class,
 
 	TP_printk("bdi %s: towrt=%ld skip=%ld mode=%d kupd=%d "
 		"bgrd=%d reclm=%d cyclic=%d "
-		"start=0x%lx end=0x%lx cgroup_ino=%u",
+		"start=0x%lx end=0x%lx cgroup_ino=%lu",
 		__entry->name,
 		__entry->nr_to_write,
 		__entry->pages_skipped,
@@ -510,7 +510,7 @@ TRACE_EVENT(writeback_queue_io,
 		__field(long,		age)
 		__field(int,		moved)
 		__field(int,		reason)
-		__field(unsigned int,	cgroup_ino)
+		__field(ino_t,		cgroup_ino)
 	),
 	TP_fast_assign(
 		unsigned long *older_than_this = work->older_than_this;
@@ -522,7 +522,7 @@ TRACE_EVENT(writeback_queue_io,
 		__entry->reason	= work->reason;
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
 	),
-	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s cgroup_ino=%u",
+	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s cgroup_ino=%lu",
 		__entry->name,
 		__entry->older,	/* older_than_this in jiffies */
 		__entry->age,	/* older_than_this in relative milliseconds */
@@ -596,7 +596,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
 		__field(unsigned long,	dirty_ratelimit)
 		__field(unsigned long,	task_ratelimit)
 		__field(unsigned long,	balanced_dirty_ratelimit)
-		__field(unsigned int,	cgroup_ino)
+		__field(ino_t,		cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -614,7 +614,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
 	TP_printk("bdi %s: "
 		  "write_bw=%lu awrite_bw=%lu dirty_rate=%lu "
 		  "dirty_ratelimit=%lu task_ratelimit=%lu "
-		  "balanced_dirty_ratelimit=%lu cgroup_ino=%u",
+		  "balanced_dirty_ratelimit=%lu cgroup_ino=%lu",
 		  __entry->bdi,
 		  __entry->write_bw,		/* write bandwidth */
 		  __entry->avg_write_bw,	/* avg write bandwidth */
@@ -660,7 +660,7 @@ TRACE_EVENT(balance_dirty_pages,
 		__field(	 long,	pause)
 		__field(unsigned long,	period)
 		__field(	 long,	think)
-		__field(unsigned int,	cgroup_ino)
+		__field(ino_t,		cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -692,7 +692,7 @@ TRACE_EVENT(balance_dirty_pages,
 		  "bdi_setpoint=%lu bdi_dirty=%lu "
 		  "dirty_ratelimit=%lu task_ratelimit=%lu "
 		  "dirtied=%u dirtied_pause=%u "
-		  "paused=%lu pause=%ld period=%lu think=%ld cgroup_ino=%u",
+		  "paused=%lu pause=%ld period=%lu think=%ld cgroup_ino=%lu",
 		  __entry->bdi,
 		  __entry->limit,
 		  __entry->setpoint,
@@ -718,10 +718,10 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 
 	TP_STRUCT__entry(
 		__array(char, name, 32)
-		__field(unsigned long, ino)
+		__field(ino_t, ino)
 		__field(unsigned long, state)
 		__field(unsigned long, dirtied_when)
-		__field(unsigned int, cgroup_ino)
+		__field(ino_t, cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -733,7 +733,7 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(inode_to_wb(inode));
 	),
 
-	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu cgroup_ino=%u",
+	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu cgroup_ino=%lu",
 		  __entry->name,
 		  __entry->ino,
 		  show_inode_state(__entry->state),
@@ -789,13 +789,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 
 	TP_STRUCT__entry(
 		__array(char, name, 32)
-		__field(unsigned long, ino)
+		__field(ino_t, ino)
 		__field(unsigned long, state)
 		__field(unsigned long, dirtied_when)
 		__field(unsigned long, writeback_index)
 		__field(long, nr_to_write)
 		__field(unsigned long, wrote)
-		__field(unsigned int, cgroup_ino)
+		__field(ino_t, cgroup_ino)
 	),
 
 	TP_fast_assign(
@@ -811,7 +811,7 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 	),
 
 	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu "
-		  "index=%lu to_write=%ld wrote=%lu cgroup_ino=%u",
+		  "index=%lu to_write=%ld wrote=%lu cgroup_ino=%lu",
 		  __entry->name,
 		  __entry->ino,
 		  show_inode_state(__entry->state),
@@ -845,7 +845,7 @@ DECLARE_EVENT_CLASS(writeback_inode_template,
 
 	TP_STRUCT__entry(
 		__field(	dev_t,	dev			)
-		__field(unsigned long,	ino			)
+		__field(	ino_t,	ino			)
 		__field(unsigned long,	state			)
 		__field(	__u16, mode			)
 		__field(unsigned long, dirtied_when		)
-- 
2.17.1



* [PATCH 03/10] netprio: use css ID instead of cgroup ID
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
  2019-11-04 23:59 ` [PATCH 01/10] kernfs: fix ino wrap-around detection Tejun Heo
  2019-11-04 23:59 ` [PATCH 02/10] writeback: use ino_t for inodes in tracepoints Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-05 13:27   ` Neil Horman
  2019-11-04 23:59 ` [PATCH 04/10] kernfs: use dumber locking for kernfs_find_and_get_node_by_ino() Tejun Heo
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo, David S. Miller, Neil Horman

netprio uses the cgroup ID to index its priority mapping table.  This
is currently okay as cgroup IDs are allocated using an idr and thus
packed.  However, cgroup IDs will be changed to use the full 64bit
range and won't be packed, making this impractical.  netprio doesn't
care which type of ID it uses as long as it can identify the
controller instance and is packed.  Let's switch to css IDs instead of
cgroup IDs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
Hello,

This is to prepare for kernfs 64bit ino support.  It'd be best to
route this with the rest of the kernfs patchset.

Thanks.
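
For context, the per-device table that the ID indexes looks roughly
like this (from include/net/netprio_cgroup.h); a sparse 64bit ID would
make priomap grow unbounded, which is why a packed index matters:

  struct netprio_map {
  	struct rcu_head rcu;
  	u32 priomap_len;
  	u32 priomap[];	/* indexed directly by the (css) ID */
  };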

 include/net/netprio_cgroup.h | 2 +-
 net/core/netprio_cgroup.c    | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
index cfc9441ef074..dec7522b6ce1 100644
--- a/include/net/netprio_cgroup.h
+++ b/include/net/netprio_cgroup.h
@@ -26,7 +26,7 @@ static inline u32 task_netprioidx(struct task_struct *p)
 
 	rcu_read_lock();
 	css = task_css(p, net_prio_cgrp_id);
-	idx = css->cgroup->id;
+	idx = css->id;
 	rcu_read_unlock();
 	return idx;
 }
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 256b7954b720..8881dd943dd0 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -93,7 +93,7 @@ static int extend_netdev_table(struct net_device *dev, u32 target_idx)
 static u32 netprio_prio(struct cgroup_subsys_state *css, struct net_device *dev)
 {
 	struct netprio_map *map = rcu_dereference_rtnl(dev->priomap);
-	int id = css->cgroup->id;
+	int id = css->id;
 
 	if (map && id < map->priomap_len)
 		return map->priomap[id];
@@ -113,7 +113,7 @@ static int netprio_set_prio(struct cgroup_subsys_state *css,
 			    struct net_device *dev, u32 prio)
 {
 	struct netprio_map *map;
-	int id = css->cgroup->id;
+	int id = css->id;
 	int ret;
 
 	/* avoid extending priomap for zero writes */
@@ -177,7 +177,7 @@ static void cgrp_css_free(struct cgroup_subsys_state *css)
 
 static u64 read_prioidx(struct cgroup_subsys_state *css, struct cftype *cft)
 {
-	return css->cgroup->id;
+	return css->id;
 }
 
 static int read_priomap(struct seq_file *sf, void *v)
@@ -237,7 +237,7 @@ static void net_prio_attach(struct cgroup_taskset *tset)
 	struct cgroup_subsys_state *css;
 
 	cgroup_taskset_for_each(p, css, tset) {
-		void *v = (void *)(unsigned long)css->cgroup->id;
+		void *v = (void *)(unsigned long)css->id;
 
 		task_lock(p);
 		iterate_fd(p->files, 0, update_netprio, v);
-- 
2.17.1



* [PATCH 04/10] kernfs: use dumber locking for kernfs_find_and_get_node_by_ino()
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (2 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 03/10] netprio: use css ID instead of cgroup ID Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 05/10] kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes Tejun Heo
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo

kernfs_find_and_get_node_by_ino() uses RCU protection.  It's currently
a bit buggy because it can look up a node which hasn't been activated
yet and thus may end up exposing a node that the kernfs user is still
prepping.

While it can be fixed by pushing further in the current direction,
it's already complicated and it isn't clear whether the complexity is
justified.  The main use of kernfs_find_and_get_node_by_ino() is for
exportfs operations.  They aren't super hot and all the follow-up
operations (e.g. mapping to path) use normal locking anyway.

Let's switch to a dumber locking scheme and protect the lookup with
kernfs_idr_lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 fs/kernfs/dir.c   | 45 +++++++++------------------------------------
 fs/kernfs/mount.c | 11 +----------
 2 files changed, 10 insertions(+), 46 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 7d4af6cea2a6..798f0f03b62b 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -508,10 +508,6 @@ void kernfs_put(struct kernfs_node *kn)
 	struct kernfs_node *parent;
 	struct kernfs_root *root;
 
-	/*
-	 * kernfs_node is freed with ->count 0, kernfs_find_and_get_node_by_ino
-	 * depends on this to filter reused stale node
-	 */
 	if (!kn || !atomic_dec_and_test(&kn->count))
 		return;
 	root = kernfs_root(kn);
@@ -646,11 +642,7 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	kn->id.ino = ret;
 	kn->id.generation = gen;
 
-	/*
-	 * set ino first. This RELEASE is paired with atomic_inc_not_zero in
-	 * kernfs_find_and_get_node_by_ino
-	 */
-	atomic_set_release(&kn->count, 1);
+	atomic_set(&kn->count, 1);
 	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
 	RB_CLEAR_NODE(&kn->rb);
 
@@ -716,38 +708,19 @@ struct kernfs_node *kernfs_find_and_get_node_by_ino(struct kernfs_root *root,
 {
 	struct kernfs_node *kn;
 
-	rcu_read_lock();
+	spin_lock(&kernfs_idr_lock);
+
 	kn = idr_find(&root->ino_idr, ino);
 	if (!kn)
-		goto out;
+		goto err_unlock;
 
-	/*
-	 * Since kernfs_node is freed in RCU, it's possible an old node for ino
-	 * is freed, but reused before RCU grace period. But a freed node (see
-	 * kernfs_put) or an incompletedly initialized node (see
-	 * __kernfs_new_node) should have 'count' 0. We can use this fact to
-	 * filter out such node.
-	 */
-	if (!atomic_inc_not_zero(&kn->count)) {
-		kn = NULL;
-		goto out;
-	}
-
-	/*
-	 * The node could be a new node or a reused node. If it's a new node,
-	 * we are ok. If it's reused because of RCU (because of
-	 * SLAB_TYPESAFE_BY_RCU), the __kernfs_new_node always sets its 'ino'
-	 * before 'count'. So if 'count' is uptodate, 'ino' should be uptodate,
-	 * hence we can use 'ino' to filter stale node.
-	 */
-	if (kn->id.ino != ino)
-		goto out;
-	rcu_read_unlock();
+	if (unlikely(!atomic_inc_not_zero(&kn->count)))
+		goto err_unlock;
 
+	spin_unlock(&kernfs_idr_lock);
 	return kn;
-out:
-	rcu_read_unlock();
-	kernfs_put(kn);
+err_unlock:
+	spin_unlock(&kernfs_idr_lock);
 	return NULL;
 }
 
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 6c12fac2c287..067b7c380056 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -363,18 +363,9 @@ void kernfs_kill_sb(struct super_block *sb)
 
 void __init kernfs_init(void)
 {
-
-	/*
-	 * the slab is freed in RCU context, so kernfs_find_and_get_node_by_ino
-	 * can access the slab lock free. This could introduce stale nodes,
-	 * please see how kernfs_find_and_get_node_by_ino filters out stale
-	 * nodes.
-	 */
 	kernfs_node_cache = kmem_cache_create("kernfs_node_cache",
 					      sizeof(struct kernfs_node),
-					      0,
-					      SLAB_PANIC | SLAB_TYPESAFE_BY_RCU,
-					      NULL);
+					      0, SLAB_PANIC, NULL);
 
 	/* Creates slab cache for kernfs inode attributes */
 	kernfs_iattrs_cache  = kmem_cache_create("kernfs_iattrs_cache",
-- 
2.17.1



* [PATCH 05/10] kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (3 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 04/10] kernfs: use dumber locking for kernfs_find_and_get_node_by_ino() Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 06/10] kernfs: convert kernfs_node->id from union kernfs_node_id to u64 Tejun Heo
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo

A kernfs node can be created in two separate steps - allocation and
activation.  This is used to make kernfs nodes visible only after the
internal state attached to the node is fully initialized.
kernfs_find_and_get_node_by_ino() currently allows lookups of nodes
which aren't activated yet and thus can expose nodes which are still
being prepped by kernfs users.

Fix it by making kernfs_find_and_get_node_by_ino() disallow lookups of
nodes which aren't activated yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
---
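A rough sketch of the two-step pattern referred to above, assuming the
kernfs root was created with KERNFS_ROOT_CREATE_DEACTIVATED (as cgroup
does); "priv" is a placeholder for the caller's private data.

  	struct kernfs_node *kn;

  	/* step 1: allocate - the node exists but is not yet active */
  	kn = kernfs_create_dir(parent, "foo", 0755, priv);

  	/* ... create files under it, initialize the attached state ... */

  	/* step 2: activate - only now should lookups be able to find it */
  	kernfs_activate(kn);
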
 fs/kernfs/dir.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 798f0f03b62b..beabb585a7d8 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -714,7 +714,13 @@ struct kernfs_node *kernfs_find_and_get_node_by_ino(struct kernfs_root *root,
 	if (!kn)
 		goto err_unlock;
 
-	if (unlikely(!atomic_inc_not_zero(&kn->count)))
+	/*
+	 * ACTIVATED is protected with kernfs_mutex but it was clear when
+	 * @kn was added to idr and we just wanna see it set.  No need to
+	 * grab kernfs_mutex.
+	 */
+	if (unlikely(!(kn->flags & KERNFS_ACTIVATED) ||
+		     !atomic_inc_not_zero(&kn->count)))
 		goto err_unlock;
 
 	spin_unlock(&kernfs_idr_lock);
-- 
2.17.1



* [PATCH 06/10] kernfs: convert kernfs_node->id from union kernfs_node_id to u64
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (4 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 05/10] kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 07/10] kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id() Tejun Heo
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo, Jens Axboe

kernfs_node->id is currently a union kernfs_node_id which represents
either a 32bit (ino, gen) pair or a u64 value.  I can't see much value
in the usage of the union - all that's needed is a 64bit ID which the
current code is already limited to.  Using a union makes the code
unnecessarily complicated and prevents using 64bit inos without adding
practical benefits.

This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
ino is stored in the lower 32bits and gen in the upper 32bits.
Accessors - kernfs[_id]_ino() and kernfs[_id]_gen() - are added to
retrieve the ino and gen.  This makes ID handling less cumbersome and
will allow using 64bit inos on supported archs.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexei Starovoitov <ast@kernel.org>
---
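At a glance, the new ID layout and accessors; this just summarizes the
hunks below.

  	/* pack: gen in the high 32bits, ino in the low (see __kernfs_new_node) */
  	kn->id = (u64)gen << 32 | ino;

  	/* unpack with the new helpers */
  	kernfs_id_ino(kn->id);	/* -> ino (low 32bits)  */
  	kernfs_id_gen(kn->id);	/* -> gen (high 32bits) */
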
 fs/kernfs/dir.c                  | 10 ++---
 fs/kernfs/file.c                 |  4 +-
 fs/kernfs/inode.c                |  4 +-
 fs/kernfs/mount.c                |  7 ++--
 include/linux/cgroup.h           | 17 ++++----
 include/linux/kernfs.h           | 45 ++++++++++++---------
 include/trace/events/writeback.h |  4 +-
 kernel/bpf/helpers.c             |  2 +-
 kernel/bpf/local_storage.c       |  2 +-
 kernel/cgroup/cgroup.c           |  3 +-
 kernel/trace/blktrace.c          | 67 +++++++++++++++-----------------
 net/core/filter.c                |  4 +-
 12 files changed, 85 insertions(+), 84 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index beabb585a7d8..c67afb591e5b 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -532,7 +532,7 @@ void kernfs_put(struct kernfs_node *kn)
 		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
 	}
 	spin_lock(&kernfs_idr_lock);
-	idr_remove(&root->ino_idr, kn->id.ino);
+	idr_remove(&root->ino_idr, kernfs_ino(kn));
 	spin_unlock(&kernfs_idr_lock);
 	kmem_cache_free(kernfs_node_cache, kn);
 
@@ -639,8 +639,8 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	idr_preload_end();
 	if (ret < 0)
 		goto err_out2;
-	kn->id.ino = ret;
-	kn->id.generation = gen;
+
+	kn->id = (u64)gen << 32 | ret;
 
 	atomic_set(&kn->count, 1);
 	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
@@ -671,7 +671,7 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	return kn;
 
  err_out3:
-	idr_remove(&root->ino_idr, kn->id.ino);
+	idr_remove(&root->ino_idr, kernfs_ino(kn));
  err_out2:
 	kmem_cache_free(kernfs_node_cache, kn);
  err_out1:
@@ -1656,7 +1656,7 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 		const char *name = pos->name;
 		unsigned int type = dt_type(pos);
 		int len = strlen(name);
-		ino_t ino = pos->id.ino;
+		ino_t ino = kernfs_ino(pos);
 
 		ctx->pos = pos->hash;
 		file->private_data = pos;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e8c792b49616..34366db3620d 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -892,7 +892,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		 * have the matching @file available.  Look up the inodes
 		 * and generate the events manually.
 		 */
-		inode = ilookup(info->sb, kn->id.ino);
+		inode = ilookup(info->sb, kernfs_ino(kn));
 		if (!inode)
 			continue;
 
@@ -901,7 +901,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		if (parent) {
 			struct inode *p_inode;
 
-			p_inode = ilookup(info->sb, parent->id.ino);
+			p_inode = ilookup(info->sb, kernfs_ino(parent));
 			if (p_inode) {
 				fsnotify(p_inode, FS_MODIFY | FS_EVENT_ON_CHILD,
 					 inode, FSNOTIFY_EVENT_INODE, &name, 0);
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index f3eaa8869f42..eac277c63d42 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -201,7 +201,7 @@ static void kernfs_init_inode(struct kernfs_node *kn, struct inode *inode)
 	inode->i_private = kn;
 	inode->i_mapping->a_ops = &kernfs_aops;
 	inode->i_op = &kernfs_iops;
-	inode->i_generation = kn->id.generation;
+	inode->i_generation = kernfs_gen(kn);
 
 	set_default_inode_attr(inode, kn->mode);
 	kernfs_refresh_inode(kn, inode);
@@ -247,7 +247,7 @@ struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn)
 {
 	struct inode *inode;
 
-	inode = iget_locked(sb, kn->id.ino);
+	inode = iget_locked(sb, kernfs_ino(kn));
 	if (inode && (inode->i_state & I_NEW))
 		kernfs_init_inode(kn, inode);
 
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 067b7c380056..f05d5d6f926d 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -57,15 +57,14 @@ const struct super_operations kernfs_sops = {
  * Similar to kernfs_fh_get_inode, this one gets kernfs node from inode
  * number and generation
  */
-struct kernfs_node *kernfs_get_node_by_id(struct kernfs_root *root,
-	const union kernfs_node_id *id)
+struct kernfs_node *kernfs_get_node_by_id(struct kernfs_root *root, u64 id)
 {
 	struct kernfs_node *kn;
 
-	kn = kernfs_find_and_get_node_by_ino(root, id->ino);
+	kn = kernfs_find_and_get_node_by_ino(root, kernfs_id_ino(id));
 	if (!kn)
 		return NULL;
-	if (kn->id.generation != id->generation) {
+	if (kernfs_gen(kn) != kernfs_id_gen(id)) {
 		kernfs_put(kn);
 		return NULL;
 	}
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f6b048902d6c..815fff49d555 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -616,7 +616,7 @@ static inline bool cgroup_is_populated(struct cgroup *cgrp)
 /* returns ino associated with a cgroup */
 static inline ino_t cgroup_ino(struct cgroup *cgrp)
 {
-	return cgrp->kn->id.ino;
+	return kernfs_ino(cgrp->kn);
 }
 
 /* cft/css accessors for cftype->write() operation */
@@ -687,13 +687,12 @@ static inline void cgroup_kthread_ready(void)
 	current->no_cgroup_migration = 0;
 }
 
-static inline union kernfs_node_id *cgroup_get_kernfs_id(struct cgroup *cgrp)
+static inline u64 cgroup_get_kernfs_id(struct cgroup *cgrp)
 {
-	return &cgrp->kn->id;
+	return cgrp->kn->id;
 }
 
-void cgroup_path_from_kernfs_id(const union kernfs_node_id *id,
-					char *buf, size_t buflen);
+void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
 #else /* !CONFIG_CGROUPS */
 
 struct cgroup_subsys_state;
@@ -718,9 +717,9 @@ static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
 static inline void cgroup_init_kthreadd(void) {}
 static inline void cgroup_kthread_ready(void) {}
-static inline union kernfs_node_id *cgroup_get_kernfs_id(struct cgroup *cgrp)
+static inline u64 cgroup_get_kernfs_id(struct cgroup *cgrp)
 {
-	return NULL;
+	return 0;
 }
 
 static inline struct cgroup *cgroup_parent(struct cgroup *cgrp)
@@ -739,8 +738,8 @@ static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
 	return true;
 }
 
-static inline void cgroup_path_from_kernfs_id(const union kernfs_node_id *id,
-	char *buf, size_t buflen) {}
+static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
+{}
 #endif /* !CONFIG_CGROUPS */
 
 #ifdef CONFIG_CGROUPS
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index f797ccc650e7..b2fc5c8ef6d9 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -104,21 +104,6 @@ struct kernfs_elem_attr {
 	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
 };
 
-/* represent a kernfs node */
-union kernfs_node_id {
-	struct {
-		/*
-		 * blktrace will export this struct as a simplified 'struct
-		 * fid' (which is a big data struction), so userspace can use
-		 * it to find kernfs node. The layout must match the first two
-		 * fields of 'struct fid' exactly.
-		 */
-		u32		ino;
-		u32		generation;
-	};
-	u64			id;
-};
-
 /*
  * kernfs_node - the building block of kernfs hierarchy.  Each and every
  * kernfs node is represented by single kernfs_node.  Most fields are
@@ -155,7 +140,12 @@ struct kernfs_node {
 
 	void			*priv;
 
-	union kernfs_node_id	id;
+	/*
+	 * 64bit unique ID.  Lower 32bits carry the ino and upper 32bits
+	 * the generation.
+	 */
+	u64			id;
+
 	unsigned short		flags;
 	umode_t			mode;
 	struct kernfs_iattrs	*iattr;
@@ -292,6 +282,26 @@ static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
 	return kn->flags & KERNFS_TYPE_MASK;
 }
 
+static inline ino_t kernfs_id_ino(u64 id)
+{
+	return (u32)id;
+}
+
+static inline u32 kernfs_id_gen(u64 id)
+{
+	return id >> 32;
+}
+
+static inline ino_t kernfs_ino(struct kernfs_node *kn)
+{
+	return kernfs_id_ino(kn->id);
+}
+
+static inline ino_t kernfs_gen(struct kernfs_node *kn)
+{
+	return kernfs_id_gen(kn->id);
+}
+
 /**
  * kernfs_enable_ns - enable namespace under a directory
  * @kn: directory of interest, should be empty
@@ -383,8 +393,7 @@ void kernfs_kill_sb(struct super_block *sb);
 
 void kernfs_init(void);
 
-struct kernfs_node *kernfs_get_node_by_id(struct kernfs_root *root,
-	const union kernfs_node_id *id);
+struct kernfs_node *kernfs_get_node_by_id(struct kernfs_root *root, u64 id);
 #else	/* CONFIG_KERNFS */
 
 static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 95e50677476b..b4f0ffe1817e 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -152,7 +152,7 @@ DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
 
 static inline ino_t __trace_wb_assign_cgroup(struct bdi_writeback *wb)
 {
-	return wb->memcg_css->cgroup->kn->id.ino;
+	return cgroup_ino(wb->memcg_css->cgroup);
 }
 
 static inline ino_t __trace_wbc_assign_cgroup(struct writeback_control *wbc)
@@ -260,7 +260,7 @@ TRACE_EVENT(track_foreign_dirty,
 		__entry->ino		= inode ? inode->i_ino : 0;
 		__entry->memcg_id	= wb->memcg_css->id;
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
-		__entry->page_cgroup_ino = page->mem_cgroup->css.cgroup->kn->id.ino;
+		__entry->page_cgroup_ino = cgroup_ino(page->mem_cgroup->css.cgroup);
 	),
 
 	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu",
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..912e761cd17a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -317,7 +317,7 @@ BPF_CALL_0(bpf_get_current_cgroup_id)
 {
 	struct cgroup *cgrp = task_dfl_cgroup(current);
 
-	return cgrp->kn->id.id;
+	return cgrp->kn->id;
 }
 
 const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index addd6fdceec8..5d867f6d7204 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -569,7 +569,7 @@ void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
 		return;
 
 	storage->key.attach_type = type;
-	storage->key.cgroup_inode_id = cgroup->kn->id.id;
+	storage->key.cgroup_inode_id = cgroup->kn->id;
 
 	map = storage->map;
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index cf32c0c7a45d..c6bd1a5a1977 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5786,8 +5786,7 @@ static int __init cgroup_wq_init(void)
 }
 core_initcall(cgroup_wq_init);
 
-void cgroup_path_from_kernfs_id(const union kernfs_node_id *id,
-					char *buf, size_t buflen)
+void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 {
 	struct kernfs_node *kn;
 
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 2d6e93ab0478..a986d2e74ca2 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -64,8 +64,7 @@ static void blk_unregister_tracepoints(void);
  * Send out a notify message.
  */
 static void trace_note(struct blk_trace *bt, pid_t pid, int action,
-		       const void *data, size_t len,
-		       union kernfs_node_id *cgid)
+		       const void *data, size_t len, u64 cgid)
 {
 	struct blk_io_trace *t;
 	struct ring_buffer_event *event = NULL;
@@ -73,7 +72,7 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
 	int pc = 0;
 	int cpu = smp_processor_id();
 	bool blk_tracer = blk_tracer_enabled;
-	ssize_t cgid_len = cgid ? sizeof(*cgid) : 0;
+	ssize_t cgid_len = cgid ? sizeof(cgid) : 0;
 
 	if (blk_tracer) {
 		buffer = blk_tr->trace_buffer.buffer;
@@ -100,8 +99,8 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
 		t->pid = pid;
 		t->cpu = cpu;
 		t->pdu_len = len + cgid_len;
-		if (cgid)
-			memcpy((void *)t + sizeof(*t), cgid, cgid_len);
+		if (cgid_len)
+			memcpy((void *)t + sizeof(*t), &cgid, cgid_len);
 		memcpy((void *) t + sizeof(*t) + cgid_len, data, len);
 
 		if (blk_tracer)
@@ -122,7 +121,7 @@ static void trace_note_tsk(struct task_struct *tsk)
 	spin_lock_irqsave(&running_trace_lock, flags);
 	list_for_each_entry(bt, &running_trace_list, running_list) {
 		trace_note(bt, tsk->pid, BLK_TN_PROCESS, tsk->comm,
-			   sizeof(tsk->comm), NULL);
+			   sizeof(tsk->comm), 0);
 	}
 	spin_unlock_irqrestore(&running_trace_lock, flags);
 }
@@ -139,7 +138,7 @@ static void trace_note_time(struct blk_trace *bt)
 	words[1] = now.tv_nsec;
 
 	local_irq_save(flags);
-	trace_note(bt, 0, BLK_TN_TIMESTAMP, words, sizeof(words), NULL);
+	trace_note(bt, 0, BLK_TN_TIMESTAMP, words, sizeof(words), 0);
 	local_irq_restore(flags);
 }
 
@@ -172,9 +171,9 @@ void __trace_note_message(struct blk_trace *bt, struct blkcg *blkcg,
 		blkcg = NULL;
 #ifdef CONFIG_BLK_CGROUP
 	trace_note(bt, 0, BLK_TN_MESSAGE, buf, n,
-		blkcg ? cgroup_get_kernfs_id(blkcg->css.cgroup) : NULL);
+		blkcg ? cgroup_get_kernfs_id(blkcg->css.cgroup) : 0);
 #else
-	trace_note(bt, 0, BLK_TN_MESSAGE, buf, n, NULL);
+	trace_note(bt, 0, BLK_TN_MESSAGE, buf, n, 0);
 #endif
 	local_irq_restore(flags);
 }
@@ -212,7 +211,7 @@ static const u32 ddir_act[2] = { BLK_TC_ACT(BLK_TC_READ),
  */
 static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 		     int op, int op_flags, u32 what, int error, int pdu_len,
-		     void *pdu_data, union kernfs_node_id *cgid)
+		     void *pdu_data, u64 cgid)
 {
 	struct task_struct *tsk = current;
 	struct ring_buffer_event *event = NULL;
@@ -223,7 +222,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 	pid_t pid;
 	int cpu, pc = 0;
 	bool blk_tracer = blk_tracer_enabled;
-	ssize_t cgid_len = cgid ? sizeof(*cgid) : 0;
+	ssize_t cgid_len = cgid ? sizeof(cgid) : 0;
 
 	if (unlikely(bt->trace_state != Blktrace_running && !blk_tracer))
 		return;
@@ -294,7 +293,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 		t->pdu_len = pdu_len + cgid_len;
 
 		if (cgid_len)
-			memcpy((void *)t + sizeof(*t), cgid, cgid_len);
+			memcpy((void *)t + sizeof(*t), &cgid, cgid_len);
 		if (pdu_len)
 			memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
 
@@ -751,31 +750,29 @@ void blk_trace_shutdown(struct request_queue *q)
 }
 
 #ifdef CONFIG_BLK_CGROUP
-static union kernfs_node_id *
-blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
+static u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
 {
 	struct blk_trace *bt = q->blk_trace;
 
 	if (!bt || !(blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP))
-		return NULL;
+		return 0;
 
 	if (!bio->bi_blkg)
-		return NULL;
+		return 0;
 	return cgroup_get_kernfs_id(bio_blkcg(bio)->css.cgroup);
 }
 #else
-static union kernfs_node_id *
-blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
+u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
 {
-	return NULL;
+	return 0;
 }
 #endif
 
-static union kernfs_node_id *
+static u64
 blk_trace_request_get_cgid(struct request_queue *q, struct request *rq)
 {
 	if (!rq->bio)
-		return NULL;
+		return 0;
 	/* Use the first bio */
 	return blk_trace_bio_get_cgid(q, rq->bio);
 }
@@ -797,8 +794,7 @@ blk_trace_request_get_cgid(struct request_queue *q, struct request *rq)
  *
  **/
 static void blk_add_trace_rq(struct request *rq, int error,
-			     unsigned int nr_bytes, u32 what,
-			     union kernfs_node_id *cgid)
+			     unsigned int nr_bytes, u32 what, u64 cgid)
 {
 	struct blk_trace *bt = rq->q->blk_trace;
 
@@ -913,7 +909,7 @@ static void blk_add_trace_getrq(void *ignore,
 
 		if (bt)
 			__blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_GETRQ, 0, 0,
-					NULL, NULL);
+					NULL, 0);
 	}
 }
 
@@ -929,7 +925,7 @@ static void blk_add_trace_sleeprq(void *ignore,
 
 		if (bt)
 			__blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_SLEEPRQ,
-					0, 0, NULL, NULL);
+					0, 0, NULL, 0);
 	}
 }
 
@@ -938,7 +934,7 @@ static void blk_add_trace_plug(void *ignore, struct request_queue *q)
 	struct blk_trace *bt = q->blk_trace;
 
 	if (bt)
-		__blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, NULL);
+		__blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, 0);
 }
 
 static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
@@ -955,7 +951,7 @@ static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
 		else
 			what = BLK_TA_UNPLUG_TIMER;
 
-		__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, NULL);
+		__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, 0);
 	}
 }
 
@@ -1172,19 +1168,17 @@ const struct blk_io_trace *te_blk_io_trace(const struct trace_entry *ent)
 
 static inline const void *pdu_start(const struct trace_entry *ent, bool has_cg)
 {
-	return (void *)(te_blk_io_trace(ent) + 1) +
-		(has_cg ? sizeof(union kernfs_node_id) : 0);
+	return (void *)(te_blk_io_trace(ent) + 1) + (has_cg ? sizeof(u64) : 0);
 }
 
-static inline const void *cgid_start(const struct trace_entry *ent)
+static inline u64 t_cgid(const struct trace_entry *ent)
 {
-	return (void *)(te_blk_io_trace(ent) + 1);
+	return *(u64 *)(te_blk_io_trace(ent) + 1);
 }
 
 static inline int pdu_real_len(const struct trace_entry *ent, bool has_cg)
 {
-	return te_blk_io_trace(ent)->pdu_len -
-			(has_cg ? sizeof(union kernfs_node_id) : 0);
+	return te_blk_io_trace(ent)->pdu_len - (has_cg ? sizeof(u64) : 0);
 }
 
 static inline u32 t_action(const struct trace_entry *ent)
@@ -1257,7 +1251,7 @@ static void blk_log_action(struct trace_iterator *iter, const char *act,
 
 	fill_rwbs(rwbs, t);
 	if (has_cg) {
-		const union kernfs_node_id *id = cgid_start(iter->ent);
+		u64 id = t_cgid(iter->ent);
 
 		if (blk_tracer_flags.val & TRACE_BLK_OPT_CGNAME) {
 			char blkcg_name_buf[NAME_MAX + 1] = "<...>";
@@ -1269,9 +1263,10 @@ static void blk_log_action(struct trace_iterator *iter, const char *act,
 				 blkcg_name_buf, act, rwbs);
 		} else
 			trace_seq_printf(&iter->seq,
-				 "%3d,%-3d %x,%-x %2s %3s ",
+				 "%3d,%-3d %lx,%-x %2s %3s ",
 				 MAJOR(t->device), MINOR(t->device),
-				 id->ino, id->generation, act, rwbs);
+				 kernfs_id_ino(id), kernfs_id_gen(id),
+				 act, rwbs);
 	} else
 		trace_seq_printf(&iter->seq, "%3d,%-3d %2s %3s ",
 				 MAJOR(t->device), MINOR(t->device), act, rwbs);
diff --git a/net/core/filter.c b/net/core/filter.c
index ed6563622ce3..b360a7beb6fc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4089,7 +4089,7 @@ BPF_CALL_1(bpf_skb_cgroup_id, const struct sk_buff *, skb)
 		return 0;
 
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
-	return cgrp->kn->id.id;
+	return cgrp->kn->id;
 }
 
 static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
@@ -4114,7 +4114,7 @@ BPF_CALL_2(bpf_skb_ancestor_cgroup_id, const struct sk_buff *, skb, int,
 	if (!ancestor)
 		return 0;
 
-	return ancestor->kn->id.id;
+	return ancestor->kn->id;
 }
 
 static const struct bpf_func_proto bpf_skb_ancestor_cgroup_id_proto = {
-- 
2.17.1



* [PATCH 07/10] kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id()
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (5 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 06/10] kernfs: convert kernfs_node->id from union kernfs_node_id to u64 Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 08/10] kernfs: implement custom exportfs ops and fid type Tejun Heo
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo

kernfs_find_and_get_node_by_ino() looks up the kernfs_node matching
the specified ino.  On top of that, kernfs_get_node_by_id() and
kernfs_fh_get_inode() implement full ID matching by testing the rest
of the ID.

On the surface, confusingly, the two are slightly different in that
the latter uses 0 gen as a wildcard while the former doesn't - does
that mean the latter can't uniquely identify inodes w/ 0 gen?  In
practice, this is a distinction without a difference because the
generation number starts at 1.  There are no actual IDs with 0 gen, so
it can always safely be used as a wildcard.

Let's simplify the code by renaming kernfs_find_and_get_node_by_ino()
to kernfs_find_and_get_node_by_id(), moving all lookup logic into it,
and removing the now unnecessary kernfs_get_node_by_id().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
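Lookup semantics after this patch, in short (sketch):

  	struct kernfs_node *kn;

  	/* exact match on (ino, gen) */
  	kn = kernfs_find_and_get_node_by_id(root, (u64)gen << 32 | ino);

  	/* gen == 0 acts as a wildcard: match the ino in any generation */
  	kn = kernfs_find_and_get_node_by_id(root, ino);
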
 fs/kernfs/dir.c             | 17 +++++++++++++----
 fs/kernfs/kernfs-internal.h |  2 --
 fs/kernfs/mount.c           | 26 ++------------------------
 include/linux/kernfs.h      |  3 ++-
 kernel/cgroup/cgroup.c      |  2 +-
 5 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index c67afb591e5b..5dcf19d4adbc 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -696,17 +696,22 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
 }
 
 /*
- * kernfs_find_and_get_node_by_ino - get kernfs_node from inode number
+ * kernfs_find_and_get_node_by_id - get kernfs_node from node id
  * @root: the kernfs root
- * @ino: inode number
+ * @id: the target node id
+ *
+ * @id's lower 32bits encode ino and upper gen.  If the gen portion is
+ * zero, all generations are matched.
  *
  * RETURNS:
  * NULL on failure. Return a kernfs node with reference counter incremented
  */
-struct kernfs_node *kernfs_find_and_get_node_by_ino(struct kernfs_root *root,
-						    unsigned int ino)
+struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
+						   u64 id)
 {
 	struct kernfs_node *kn;
+	ino_t ino = kernfs_id_ino(id);
+	u32 gen = kernfs_id_gen(id);
 
 	spin_lock(&kernfs_idr_lock);
 
@@ -714,6 +719,10 @@ struct kernfs_node *kernfs_find_and_get_node_by_ino(struct kernfs_root *root,
 	if (!kn)
 		goto err_unlock;
 
+	/* 0 matches all generations */
+	if (unlikely(gen && kernfs_gen(kn) != gen))
+		goto err_unlock;
+
 	/*
 	 * ACTIVATED is protected with kernfs_mutex but it was clear when
 	 * @kn was added to idr and we just wanna see it set.  No need to
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 02ce570a9a3c..2f3c51d55261 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -109,8 +109,6 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
 				    const char *name, umode_t mode,
 				    kuid_t uid, kgid_t gid,
 				    unsigned flags);
-struct kernfs_node *kernfs_find_and_get_node_by_ino(struct kernfs_root *root,
-						    unsigned int ino);
 
 /*
  * file.c
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index f05d5d6f926d..8aed2cccd002 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -53,24 +53,6 @@ const struct super_operations kernfs_sops = {
 	.show_path	= kernfs_sop_show_path,
 };
 
-/*
- * Similar to kernfs_fh_get_inode, this one gets kernfs node from inode
- * number and generation
- */
-struct kernfs_node *kernfs_get_node_by_id(struct kernfs_root *root, u64 id)
-{
-	struct kernfs_node *kn;
-
-	kn = kernfs_find_and_get_node_by_ino(root, kernfs_id_ino(id));
-	if (!kn)
-		return NULL;
-	if (kernfs_gen(kn) != kernfs_id_gen(id)) {
-		kernfs_put(kn);
-		return NULL;
-	}
-	return kn;
-}
-
 static struct inode *kernfs_fh_get_inode(struct super_block *sb,
 		u64 ino, u32 generation)
 {
@@ -81,7 +63,8 @@ static struct inode *kernfs_fh_get_inode(struct super_block *sb,
 	if (ino == 0)
 		return ERR_PTR(-ESTALE);
 
-	kn = kernfs_find_and_get_node_by_ino(info->root, ino);
+	kn = kernfs_find_and_get_node_by_id(info->root,
+					    ino | ((u64)generation << 32));
 	if (!kn)
 		return ERR_PTR(-ESTALE);
 	inode = kernfs_get_inode(sb, kn);
@@ -89,11 +72,6 @@ static struct inode *kernfs_fh_get_inode(struct super_block *sb,
 	if (!inode)
 		return ERR_PTR(-ESTALE);
 
-	if (generation && inode->i_generation != generation) {
-		/* we didn't find the right inode.. */
-		iput(inode);
-		return ERR_PTR(-ESTALE);
-	}
 	return inode;
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index b2fc5c8ef6d9..38267cc9420c 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -393,7 +393,8 @@ void kernfs_kill_sb(struct super_block *sb);
 
 void kernfs_init(void);
 
-struct kernfs_node *kernfs_get_node_by_id(struct kernfs_root *root, u64 id);
+struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
+						   u64 id);
 #else	/* CONFIG_KERNFS */
 
 static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c6bd1a5a1977..b5dcbee5aa6c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5790,7 +5790,7 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 {
 	struct kernfs_node *kn;
 
-	kn = kernfs_get_node_by_id(cgrp_dfl_root.kf_root, id);
+	kn = kernfs_find_and_get_node_by_id(cgrp_dfl_root.kf_root, id);
 	if (!kn)
 		return;
 	kernfs_path(kn, buf, buflen);
-- 
2.17.1



* [PATCH 08/10] kernfs: implement custom exportfs ops and fid type
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (6 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 07/10] kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id() Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 09/10] kernfs: use 64bit inos if ino_t is 64bit Tejun Heo
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo

The current kernfs exportfs implementation uses the generic_fh_*()
helpers and the FILEID_INO32_GEN[_PARENT] fid types, which limit inos
to 32bits.  Let's implement custom exportfs operations and a fid type
to remove the restriction.

* FILEID_KERNFS is a single u64 value whose content is
  kernfs_node->id.  This is the only native fid type.

* For backward compatibility with the blk_log_action() path, which
  exposes (ino,gen) pairs that userland assembles into FILEID_INO32_GEN
  keys, combine the generic keys into 64bit IDs in the same order (a
  userland sketch follows below).

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
---
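A hypothetical userland helper for the backward-compatible path: build
a FILEID_INO32_GEN handle from the (ino,gen) pair printed by
blk_log_action().  cgrp_mnt_fd is assumed to be an fd on the cgroup2
mount; the FILEID_INO32_GEN value (1) comes from
include/linux/exportfs.h and isn't exposed via uapi.

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  #define FILEID_INO32_GEN	1	/* not exposed via uapi headers */

  static int open_cgrp_by_ino_gen(int cgrp_mnt_fd, uint32_t ino, uint32_t gen)
  {
  	uint32_t data[2] = { ino, gen };	/* matches fid->i32.ino, fid->i32.gen */
  	struct file_handle *fh;
  	int fd;

  	fh = malloc(sizeof(*fh) + sizeof(data));
  	if (!fh)
  		return -1;

  	fh->handle_bytes = sizeof(data);
  	fh->handle_type = FILEID_INO32_GEN;
  	memcpy(fh->f_handle, data, sizeof(data));

  	fd = open_by_handle_at(cgrp_mnt_fd, fh, O_RDONLY);	/* needs CAP_DAC_READ_SEARCH */
  	free(fh);
  	return fd;
  }
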
 fs/kernfs/mount.c        | 77 +++++++++++++++++++++++++++++++---------
 include/linux/exportfs.h |  5 +++
 2 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 8aed2cccd002..37a1e5df117a 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -53,40 +53,84 @@ const struct super_operations kernfs_sops = {
 	.show_path	= kernfs_sop_show_path,
 };
 
-static struct inode *kernfs_fh_get_inode(struct super_block *sb,
-		u64 ino, u32 generation)
+static int kernfs_encode_fh(struct inode *inode, __u32 *fh, int *max_len,
+			    struct inode *parent)
+{
+	struct kernfs_node *kn = inode->i_private;
+
+	if (*max_len < 2) {
+		*max_len = 2;
+		return FILEID_INVALID;
+	}
+
+	*max_len = 2;
+	*(u64 *)fh = kn->id;
+	return FILEID_KERNFS;
+}
+
+static struct dentry *__kernfs_fh_to_dentry(struct super_block *sb,
+					    struct fid *fid, int fh_len,
+					    int fh_type, bool get_parent)
 {
 	struct kernfs_super_info *info = kernfs_info(sb);
-	struct inode *inode;
 	struct kernfs_node *kn;
+	struct inode *inode;
+	u64 id;
 
-	if (ino == 0)
-		return ERR_PTR(-ESTALE);
+	if (fh_len < 2)
+		return NULL;
+
+	switch (fh_type) {
+	case FILEID_KERNFS:
+		id = *(u64 *)fid;
+		break;
+	case FILEID_INO32_GEN:
+	case FILEID_INO32_GEN_PARENT:
+		/*
+		 * blk_log_action() exposes (ino,gen) pair without type and
+		 * userland can call us with generic fid constructed from
+		 * them.  Combine it back to ID.  See blk_log_action().
+		 */
+		id = ((u64)fid->i32.gen << 32) | fid->i32.ino;
+		break;
+	default:
+		return NULL;
+	}
 
-	kn = kernfs_find_and_get_node_by_id(info->root,
-					    ino | ((u64)generation << 32));
+	kn = kernfs_find_and_get_node_by_id(info->root, id);
 	if (!kn)
 		return ERR_PTR(-ESTALE);
+
+	if (get_parent) {
+		struct kernfs_node *parent;
+
+		parent = kernfs_get_parent(kn);
+		kernfs_put(kn);
+		kn = parent;
+		if (!kn)
+			return ERR_PTR(-ESTALE);
+	}
+
 	inode = kernfs_get_inode(sb, kn);
 	kernfs_put(kn);
 	if (!inode)
 		return ERR_PTR(-ESTALE);
 
-	return inode;
+	return d_obtain_alias(inode);
 }
 
-static struct dentry *kernfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
-		int fh_len, int fh_type)
+static struct dentry *kernfs_fh_to_dentry(struct super_block *sb,
+					  struct fid *fid, int fh_len,
+					  int fh_type)
 {
-	return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
-				    kernfs_fh_get_inode);
+	return __kernfs_fh_to_dentry(sb, fid, fh_len, fh_type, false);
 }
 
-static struct dentry *kernfs_fh_to_parent(struct super_block *sb, struct fid *fid,
-		int fh_len, int fh_type)
+static struct dentry *kernfs_fh_to_parent(struct super_block *sb,
+					  struct fid *fid, int fh_len,
+					  int fh_type)
 {
-	return generic_fh_to_parent(sb, fid, fh_len, fh_type,
-				    kernfs_fh_get_inode);
+	return __kernfs_fh_to_dentry(sb, fid, fh_len, fh_type, true);
 }
 
 static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
@@ -97,6 +141,7 @@ static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
 }
 
 static const struct export_operations kernfs_export_ops = {
+	.encode_fh	= kernfs_encode_fh,
 	.fh_to_dentry	= kernfs_fh_to_dentry,
 	.fh_to_parent	= kernfs_fh_to_parent,
 	.get_parent	= kernfs_get_parent_dentry,
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index cf6571fc9c01..d896b8657085 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -104,6 +104,11 @@ enum fid_type {
 	 */
 	FILEID_LUSTRE = 0x97,
 
+	/*
+	 * 64 bit unique kernfs id
+	 */
+	FILEID_KERNFS = 0xfe,
+
 	/*
 	 * Filesystems must not use 0xff file ID.
 	 */
-- 
2.17.1



* [PATCH 09/10] kernfs: use 64bit inos if ino_t is 64bit
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (7 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 08/10] kernfs: implement custom exportfs ops and fid type Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-04 23:59 ` [PATCH 10/10] cgroup: use cgrp->kn->id as the cgroup ID Tejun Heo
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo

Each kernfs_node is identified with a 64bit ID.  The low 32bits are
exposed as the ino and the high 32bits as the gen.  While this already
allows using inos as keys by looking up with a wildcard generation
number of 0, it adds unnecessary complications for 64bit ino archs,
which can directly use kernfs_node IDs as inos to uniquely identify
each cgroup instance.

This patch exposes IDs directly as inos on 64bit ino archs.  The
conversion is mostly straightforward.

* 32bit ino archs behave the same as before.  64bit ino archs now use
  the whole 64bit ID as ino and the generation number is fixed at 1.

* 64bit inos still use the same idr allocator, which guarantees that
  the lower 32bits identify the current live instance uniquely and
  that the high 32bits are incremented whenever the low bits wrap.  As
  the upper 32bits are no longer used as gen and we don't want to
  start ino allocation with the 33rd bit set, the initial value for
  highbits allocation is changed to 0 on 64bit ino archs.

* blktrace exposes two 32bit numbers - an (INO,GEN) pair - to identify
  the issuing cgroup.  Userland builds FILEID_INO32_GEN fids from
  these numbers to look up the cgroups.  To remain compatible with
  that behavior, always output (LOW32,HIGH32), which will be
  constructed back to the original 64bit ID by __kernfs_fh_to_dentry()
  (see the sketch below).
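
A rough sketch of that userland side, rebuilding the 64bit ID from the
"LOW32,HIGH32" pair printed by blk_log_action(); only the bit layout
comes from the patch, the parsing helper itself is made up.

#include <stdint.h>
#include <stdio.h>

/*
 * hypothetical helper: rebuild the 64bit kernfs ID from a blktrace
 * cgid field such as "2af,0" ("LOW32,HIGH32", both in hex)
 */
static uint64_t cgid_from_trace(const char *field)
{
	unsigned int low, high;

	if (sscanf(field, "%x,%x", &low, &high) != 2)
		return 0;

	/* the same combination __kernfs_fh_to_dentry() performs on lookup */
	return ((uint64_t)high << 32) | low;
}

int main(void)
{
	printf("%llx\n", (unsigned long long)cgid_from_trace("2af,0"));
	return 0;
}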

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 fs/kernfs/dir.c         | 42 ++++++++++++++++++++++++++++-------------
 fs/kernfs/mount.c       |  7 ++++---
 include/linux/kernfs.h  | 20 ++++++++++++++------
 kernel/trace/blktrace.c | 21 +++++++++++++++++----
 4 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 5dcf19d4adbc..b2d9f79c4a7c 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -532,7 +532,7 @@ void kernfs_put(struct kernfs_node *kn)
 		kmem_cache_free(kernfs_iattrs_cache, kn->iattr);
 	}
 	spin_lock(&kernfs_idr_lock);
-	idr_remove(&root->ino_idr, kernfs_ino(kn));
+	idr_remove(&root->ino_idr, (u32)kernfs_ino(kn));
 	spin_unlock(&kernfs_idr_lock);
 	kmem_cache_free(kernfs_node_cache, kn);
 
@@ -617,7 +617,7 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 					     unsigned flags)
 {
 	struct kernfs_node *kn;
-	u32 gen;
+	u32 id_highbits;
 	int ret;
 
 	name = kstrdup_const(name, GFP_KERNEL);
@@ -631,16 +631,16 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	idr_preload(GFP_KERNEL);
 	spin_lock(&kernfs_idr_lock);
 	ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);
-	if (ret >= 0 && ret < root->last_ino)
-		root->next_generation++;
-	gen = root->next_generation;
-	root->last_ino = ret;
+	if (ret >= 0 && ret < root->last_id_lowbits)
+		root->id_highbits++;
+	id_highbits = root->id_highbits;
+	root->last_id_lowbits = ret;
 	spin_unlock(&kernfs_idr_lock);
 	idr_preload_end();
 	if (ret < 0)
 		goto err_out2;
 
-	kn->id = (u64)gen << 32 | ret;
+	kn->id = (u64)id_highbits << 32 | ret;
 
 	atomic_set(&kn->count, 1);
 	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
@@ -671,7 +671,7 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root,
 	return kn;
 
  err_out3:
-	idr_remove(&root->ino_idr, kernfs_ino(kn));
+	idr_remove(&root->ino_idr, (u32)kernfs_ino(kn));
  err_out2:
 	kmem_cache_free(kernfs_node_cache, kn);
  err_out1:
@@ -715,13 +715,19 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
 
 	spin_lock(&kernfs_idr_lock);
 
-	kn = idr_find(&root->ino_idr, ino);
+	kn = idr_find(&root->ino_idr, (u32)ino);
 	if (!kn)
 		goto err_unlock;
 
-	/* 0 matches all generations */
-	if (unlikely(gen && kernfs_gen(kn) != gen))
-		goto err_unlock;
+	if (sizeof(ino_t) >= sizeof(u64)) {
+		/* we looked up with the low 32bits, compare the whole */
+		if (kernfs_ino(kn) != ino)
+			goto err_unlock;
+	} else {
+		/* 0 matches all generations */
+		if (unlikely(gen && kernfs_gen(kn) != gen))
+			goto err_unlock;
+	}
 
 	/*
 	 * ACTIVATED is protected with kernfs_mutex but it was clear when
@@ -949,7 +955,17 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 
 	idr_init(&root->ino_idr);
 	INIT_LIST_HEAD(&root->supers);
-	root->next_generation = 1;
+
+	/*
+	 * On 64bit ino setups, id is the ino.  On 32bit, the low 32bits
+	 * are the ino and the high bits the generation.  The starting
+	 * value for both ino and generation is 1.  Initialize the upper
+	 * 32bit allocation accordingly.
+	 */
+	if (sizeof(ino_t) >= sizeof(u64))
+		root->id_highbits = 0;
+	else
+		root->id_highbits = 1;
 
 	kn = __kernfs_new_node(root, NULL, "", S_IFDIR | S_IRUGO | S_IXUGO,
 			       GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 37a1e5df117a..4d31503abaee 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -87,9 +87,10 @@ static struct dentry *__kernfs_fh_to_dentry(struct super_block *sb,
 	case FILEID_INO32_GEN:
 	case FILEID_INO32_GEN_PARENT:
 		/*
-		 * blk_log_action() exposes (ino,gen) pair without type and
-		 * userland can call us with generic fid constructed from
-		 * them.  Combine it back to ID.  See blk_log_action().
+		 * blk_log_action() exposes "LOW32,HIGH32" pair without
+		 * type and userland can call us with generic fid
+		 * constructed from them.  Combine it back to ID.  See
+		 * blk_log_action().
 		 */
 		id = ((u64)fid->i32.gen << 32) | fid->i32.ino;
 		break;
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 38267cc9420c..dded2e5a9f42 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -141,8 +141,8 @@ struct kernfs_node {
 	void			*priv;
 
 	/*
-	 * 64bit unique ID.  Lower 32bits carry the inode number and lower
-	 * generation.
+	 * 64bit unique ID.  On 64bit ino setups, id is the ino.  On 32bit,
+	 * the low 32bits are the ino and the upper 32bits the generation.
 	 */
 	u64			id;
 
@@ -177,8 +177,8 @@ struct kernfs_root {
 
 	/* private fields, do not use outside kernfs proper */
 	struct idr		ino_idr;
-	u32			last_ino;
-	u32			next_generation;
+	u32			last_id_lowbits;
+	u32			id_highbits;
 	struct kernfs_syscall_ops *syscall_ops;
 
 	/* list of kernfs_super_info of this root, protected by kernfs_mutex */
@@ -284,12 +284,20 @@ static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
 
 static inline ino_t kernfs_id_ino(u64 id)
 {
-	return (u32)id;
+	/* id is ino if ino_t is 64bit; otherwise, low 32bits */
+	if (sizeof(ino_t) >= sizeof(u64))
+		return id;
+	else
+		return (u32)id;
 }
 
 static inline u32 kernfs_id_gen(u64 id)
 {
-	return id >> 32;
+	/* gen is fixed at 1 if ino_t is 64bit; otherwise, high 32bits */
+	if (sizeof(ino_t) >= sizeof(u64))
+		return 1;
+	else
+		return id >> 32;
 }
 
 static inline ino_t kernfs_ino(struct kernfs_node *kn)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index a986d2e74ca2..a7dac5b63f3f 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -1261,12 +1261,25 @@ static void blk_log_action(struct trace_iterator *iter, const char *act,
 			trace_seq_printf(&iter->seq, "%3d,%-3d %s %2s %3s ",
 				 MAJOR(t->device), MINOR(t->device),
 				 blkcg_name_buf, act, rwbs);
-		} else
+		} else {
+			/*
+			 * The cgid portion used to be "INO,GEN".  Userland
+			 * builds a FILEID_INO32_GEN fid out of them and
+			 * opens the cgroup using open_by_handle_at(2).
+			 * While 32bit ino setups are still the same, 64bit
+			 * ones now use the 64bit ino as the whole ID and
+			 * no longer use generation.
+			 *
+			 * Regardless of the content, always output
+			 * "LOW32,HIGH32" so that FILEID_INO32_GEN fid can
+			 * be mapped back to @id on both 64 and 32bit ino
+			 * setups.  See __kernfs_fh_to_dentry().
+			 */
 			trace_seq_printf(&iter->seq,
-				 "%3d,%-3d %lx,%-x %2s %3s ",
+				 "%3d,%-3d %llx,%-llx %2s %3s ",
 				 MAJOR(t->device), MINOR(t->device),
-				 kernfs_id_ino(id), kernfs_id_gen(id),
-				 act, rwbs);
+				 id & U32_MAX, id >> 32, act, rwbs);
+		}
 	} else
 		trace_seq_printf(&iter->seq, "%3d,%-3d %2s %3s ",
 				 MAJOR(t->device), MINOR(t->device), act, rwbs);
-- 
2.17.1



* [PATCH 10/10] cgroup: use cgrp->kn->id as the cgroup ID
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (8 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 09/10] kernfs: use 64bit inos if ino_t is 64bit Tejun Heo
@ 2019-11-04 23:59 ` Tejun Heo
  2019-11-06 13:50 ` [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Namhyung Kim
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-04 23:59 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel, Tejun Heo

The cgroup ID is currently allocated using a dedicated per-hierarchy
idr and is used internally and exposed through tracepoints and bpf.
This is confusing because there are tracepoints and other interfaces
which use the cgroupfs ino as the ID.

The preceding changes expose kn->id as the ino on archs with a 64bit
ino_t, and as ino+gen (low 32bits as ino, high 32bits as gen)
elsewhere.  There's no reason for cgroup to use different IDs.  The
kernfs IDs are unique, and userland can easily discover them and map
them back to paths using standard file operations.
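
For example (a sketch with a made-up path and ID, not part of the
patch), on an arch with 64bit ino_t an ID reported by
bpf_get_current_cgroup_id() or a tracepoint can be matched against a
cgroup directory with a plain stat(2):

#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* hypothetical check: does this cgroupfs directory have the given ID? */
static int cgroup_matches_id(const char *cgrp_path, uint64_t id)
{
	struct stat st;

	if (stat(cgrp_path, &st))
		return 0;

	/* on 64bit ino archs st_ino carries the full kernfs/cgroup ID */
	return (uint64_t)st.st_ino == id;
}

int main(void)
{
	uint64_t id = 0x1234;	/* e.g. reported by a bpf program */

	printf("match: %d\n",
	       cgroup_matches_id("/sys/fs/cgroup/system.slice", id));
	return 0;
}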

This patch replaces cgroup IDs with kernfs IDs.

* cgroup_id() is added and all cgroup ID users are converted to use it.

* kernfs_node creation is moved to earlier during cgroup init so that
  cgroup_id() is available during init.

* While at it, s/cgroup/cgrp/ in psi helpers for consistency.

* Fallback ID value is changed to 1 to be consistent with root cgroup
  ID.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/cgroup-defs.h   | 17 +-------
 include/linux/cgroup.h        | 17 ++++----
 include/trace/events/cgroup.h |  6 +--
 kernel/bpf/helpers.c          |  2 +-
 kernel/bpf/local_storage.c    |  2 +-
 kernel/cgroup/cgroup.c        | 76 ++++++++++++-----------------------
 kernel/trace/blktrace.c       |  4 +-
 net/core/filter.c             |  4 +-
 8 files changed, 43 insertions(+), 85 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 430e219e3aba..cc38408aa065 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -354,16 +354,6 @@ struct cgroup {
 
 	unsigned long flags;		/* "unsigned long" so bitops work */
 
-	/*
-	 * idr allocated in-hierarchy ID.
-	 *
-	 * ID 0 is not used, the ID of the root cgroup is always 1, and a
-	 * new cgroup will be assigned with a smallest available ID.
-	 *
-	 * Allocating/Removing ID must be protected by cgroup_mutex.
-	 */
-	int id;
-
 	/*
 	 * The depth this cgroup is at.  The root is at depth zero and each
 	 * step down the hierarchy increments the level.  This along with
@@ -488,7 +478,7 @@ struct cgroup {
 	struct cgroup_freezer_state freezer;
 
 	/* ids of the ancestors at each level including self */
-	int ancestor_ids[];
+	u64 ancestor_ids[];
 };
 
 /*
@@ -509,7 +499,7 @@ struct cgroup_root {
 	struct cgroup cgrp;
 
 	/* for cgrp->ancestor_ids[0] */
-	int cgrp_ancestor_id_storage;
+	u64 cgrp_ancestor_id_storage;
 
 	/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
 	atomic_t nr_cgrps;
@@ -520,9 +510,6 @@ struct cgroup_root {
 	/* Hierarchy-specific flags */
 	unsigned int flags;
 
-	/* IDs for cgroups in this hierarchy */
-	struct idr cgroup_idr;
-
 	/* The path to use for release notifications. */
 	char release_agent_path[PATH_MAX];
 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 815fff49d555..d7ddebd0cdec 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -304,6 +304,11 @@ void css_task_iter_end(struct css_task_iter *it);
  * Inline functions.
  */
 
+static inline u64 cgroup_id(struct cgroup *cgrp)
+{
+	return cgrp->kn->id;
+}
+
 /**
  * css_get - obtain a reference on the specified css
  * @css: target css
@@ -565,7 +570,7 @@ static inline bool cgroup_is_descendant(struct cgroup *cgrp,
 {
 	if (cgrp->root != ancestor->root || cgrp->level < ancestor->level)
 		return false;
-	return cgrp->ancestor_ids[ancestor->level] == ancestor->id;
+	return cgrp->ancestor_ids[ancestor->level] == cgroup_id(ancestor);
 }
 
 /**
@@ -687,17 +692,13 @@ static inline void cgroup_kthread_ready(void)
 	current->no_cgroup_migration = 0;
 }
 
-static inline u64 cgroup_get_kernfs_id(struct cgroup *cgrp)
-{
-	return cgrp->kn->id;
-}
-
 void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
 #else /* !CONFIG_CGROUPS */
 
 struct cgroup_subsys_state;
 struct cgroup;
 
+static inline u64 cgroup_id(struct cgroup *cgrp) { return 1; }
 static inline void css_get(struct cgroup_subsys_state *css) {}
 static inline void css_put(struct cgroup_subsys_state *css) {}
 static inline int cgroup_attach_task_all(struct task_struct *from,
@@ -717,10 +718,6 @@ static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
 static inline void cgroup_init_kthreadd(void) {}
 static inline void cgroup_kthread_ready(void) {}
-static inline union u64 cgroup_get_kernfs_id(struct cgroup *cgrp)
-{
-	return 0;
-}
 
 static inline struct cgroup *cgroup_parent(struct cgroup *cgrp)
 {
diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index a566cc521476..7f42a3de59e6 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -66,7 +66,7 @@ DECLARE_EVENT_CLASS(cgroup,
 
 	TP_fast_assign(
 		__entry->root = cgrp->root->hierarchy_id;
-		__entry->id = cgrp->id;
+		__entry->id = cgroup_id(cgrp);
 		__entry->level = cgrp->level;
 		__assign_str(path, path);
 	),
@@ -135,7 +135,7 @@ DECLARE_EVENT_CLASS(cgroup_migrate,
 
 	TP_fast_assign(
 		__entry->dst_root = dst_cgrp->root->hierarchy_id;
-		__entry->dst_id = dst_cgrp->id;
+		__entry->dst_id = cgroup_id(dst_cgrp);
 		__entry->dst_level = dst_cgrp->level;
 		__assign_str(dst_path, path);
 		__entry->pid = task->pid;
@@ -179,7 +179,7 @@ DECLARE_EVENT_CLASS(cgroup_event,
 
 	TP_fast_assign(
 		__entry->root = cgrp->root->hierarchy_id;
-		__entry->id = cgrp->id;
+		__entry->id = cgroup_id(cgrp);
 		__entry->level = cgrp->level;
 		__assign_str(path, path);
 		__entry->val = val;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 912e761cd17a..cada974c9f4e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -317,7 +317,7 @@ BPF_CALL_0(bpf_get_current_cgroup_id)
 {
 	struct cgroup *cgrp = task_dfl_cgroup(current);
 
-	return cgrp->kn->id;
+	return cgroup_id(cgrp);
 }
 
 const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 5d867f6d7204..2ba750725cb2 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -569,7 +569,7 @@ void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
 		return;
 
 	storage->key.attach_type = type;
-	storage->key.cgroup_inode_id = cgroup->kn->id;
+	storage->key.cgroup_inode_id = cgroup_id(cgroup);
 
 	map = storage->map;
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b5dcbee5aa6c..c12dcf7dc432 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1308,10 +1308,7 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	if (root) {
-		idr_destroy(&root->cgroup_idr);
-		kfree(root);
-	}
+	kfree(root);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
@@ -1917,7 +1914,6 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
 	atomic_set(&root->nr_cgrps, 1);
 	cgrp->root = root;
 	init_cgroup_housekeeping(cgrp);
-	idr_init(&root->cgroup_idr);
 
 	root->flags = ctx->flags;
 	if (ctx->release_agent)
@@ -1938,12 +1934,6 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	ret = cgroup_idr_alloc(&root->cgroup_idr, root_cgrp, 1, 2, GFP_KERNEL);
-	if (ret < 0)
-		goto out;
-	root_cgrp->id = ret;
-	root_cgrp->ancestor_ids[0] = ret;
-
 	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
 			      0, GFP_KERNEL);
 	if (ret)
@@ -1976,6 +1966,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 		goto exit_root_id;
 	}
 	root_cgrp->kn = root->kf_root->kn;
+	WARN_ON_ONCE(cgroup_id(root_cgrp) != 1);
+	root_cgrp->ancestor_ids[0] = cgroup_id(root_cgrp);
 
 	ret = css_populate_dir(&root_cgrp->self);
 	if (ret)
@@ -3552,22 +3544,22 @@ static int cpu_stat_show(struct seq_file *seq, void *v)
 #ifdef CONFIG_PSI
 static int cgroup_io_pressure_show(struct seq_file *seq, void *v)
 {
-	struct cgroup *cgroup = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup->id == 1 ? &psi_system : &cgroup->psi;
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_id(cgrp) == 1 ? &psi_system : &cgrp->psi;
 
 	return psi_show(seq, psi, PSI_IO);
 }
 static int cgroup_memory_pressure_show(struct seq_file *seq, void *v)
 {
-	struct cgroup *cgroup = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup->id == 1 ? &psi_system : &cgroup->psi;
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_id(cgrp) == 1 ? &psi_system : &cgrp->psi;
 
 	return psi_show(seq, psi, PSI_MEM);
 }
 static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 {
-	struct cgroup *cgroup = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup->id == 1 ? &psi_system : &cgroup->psi;
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_id(cgrp) == 1 ? &psi_system : &cgrp->psi;
 
 	return psi_show(seq, psi, PSI_CPU);
 }
@@ -4987,9 +4979,6 @@ static void css_release_work_fn(struct work_struct *work)
 			tcgrp->nr_dying_descendants--;
 		spin_unlock_irq(&css_set_lock);
 
-		cgroup_idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
-		cgrp->id = -1;
-
 		/*
 		 * There are two control paths which try to determine
 		 * cgroup from dentry without going through kernfs -
@@ -5154,10 +5143,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
  * it isn't associated with its kernfs_node and doesn't have the control
  * mask applied.
  */
-static struct cgroup *cgroup_create(struct cgroup *parent)
+static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
+				    umode_t mode)
 {
 	struct cgroup_root *root = parent->root;
 	struct cgroup *cgrp, *tcgrp;
+	struct kernfs_node *kn;
 	int level = parent->level + 1;
 	int ret;
 
@@ -5177,15 +5168,13 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 			goto out_cancel_ref;
 	}
 
-	/*
-	 * Temporarily set the pointer to NULL, so idr_find() won't return
-	 * a half-baked cgroup.
-	 */
-	cgrp->id = cgroup_idr_alloc(&root->cgroup_idr, NULL, 2, 0, GFP_KERNEL);
-	if (cgrp->id < 0) {
-		ret = -ENOMEM;
+	/* create the directory */
+	kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
 		goto out_stat_exit;
 	}
+	cgrp->kn = kn;
 
 	init_cgroup_housekeeping(cgrp);
 
@@ -5195,7 +5184,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 
 	ret = psi_cgroup_alloc(cgrp);
 	if (ret)
-		goto out_idr_free;
+		goto out_kernfs_remove;
 
 	ret = cgroup_bpf_inherit(cgrp);
 	if (ret)
@@ -5219,7 +5208,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 
 	spin_lock_irq(&css_set_lock);
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
-		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
+		cgrp->ancestor_ids[tcgrp->level] = cgroup_id(tcgrp);
 
 		if (tcgrp != cgrp) {
 			tcgrp->nr_descendants++;
@@ -5248,12 +5237,6 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	atomic_inc(&root->nr_cgrps);
 	cgroup_get_live(parent);
 
-	/*
-	 * @cgrp is now fully operational.  If something fails after this
-	 * point, it'll be released via the normal destruction path.
-	 */
-	cgroup_idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
-
 	/*
 	 * On the default hierarchy, a child doesn't automatically inherit
 	 * subtree_control from the parent.  Each is configured manually.
@@ -5267,8 +5250,8 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 
 out_psi_free:
 	psi_cgroup_free(cgrp);
-out_idr_free:
-	cgroup_idr_remove(&root->cgroup_idr, cgrp->id);
+out_kernfs_remove:
+	kernfs_remove(cgrp->kn);
 out_stat_exit:
 	if (cgroup_on_dfl(parent))
 		cgroup_rstat_exit(cgrp);
@@ -5305,7 +5288,6 @@ static bool cgroup_check_hierarchy_limits(struct cgroup *parent)
 int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 {
 	struct cgroup *parent, *cgrp;
-	struct kernfs_node *kn;
 	int ret;
 
 	/* do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable */
@@ -5321,27 +5303,19 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 		goto out_unlock;
 	}
 
-	cgrp = cgroup_create(parent);
+	cgrp = cgroup_create(parent, name, mode);
 	if (IS_ERR(cgrp)) {
 		ret = PTR_ERR(cgrp);
 		goto out_unlock;
 	}
 
-	/* create the directory */
-	kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
-	if (IS_ERR(kn)) {
-		ret = PTR_ERR(kn);
-		goto out_destroy;
-	}
-	cgrp->kn = kn;
-
 	/*
 	 * This extra ref will be put in cgroup_free_fn() and guarantees
 	 * that @cgrp->kn is always accessible.
 	 */
-	kernfs_get(kn);
+	kernfs_get(cgrp->kn);
 
-	ret = cgroup_kn_set_ugid(kn);
+	ret = cgroup_kn_set_ugid(cgrp->kn);
 	if (ret)
 		goto out_destroy;
 
@@ -5356,7 +5330,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 	TRACE_CGROUP_PATH(mkdir, cgrp);
 
 	/* let's create and online css's */
-	kernfs_activate(kn);
+	kernfs_activate(cgrp->kn);
 
 	ret = 0;
 	goto out_unlock;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index a7dac5b63f3f..475e29498bca 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -171,7 +171,7 @@ void __trace_note_message(struct blk_trace *bt, struct blkcg *blkcg,
 		blkcg = NULL;
 #ifdef CONFIG_BLK_CGROUP
 	trace_note(bt, 0, BLK_TN_MESSAGE, buf, n,
-		blkcg ? cgroup_get_kernfs_id(blkcg->css.cgroup) : 0);
+		   blkcg ? cgroup_id(blkcg->css.cgroup) : 1);
 #else
 	trace_note(bt, 0, BLK_TN_MESSAGE, buf, n, 0);
 #endif
@@ -759,7 +759,7 @@ static u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
 
 	if (!bio->bi_blkg)
 		return 0;
-	return cgroup_get_kernfs_id(bio_blkcg(bio)->css.cgroup);
+	return cgroup_id(bio_blkcg(bio)->css.cgroup);
 }
 #else
 u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
diff --git a/net/core/filter.c b/net/core/filter.c
index b360a7beb6fc..caef7c74cad5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4089,7 +4089,7 @@ BPF_CALL_1(bpf_skb_cgroup_id, const struct sk_buff *, skb)
 		return 0;
 
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
-	return cgrp->kn->id;
+	return cgroup_id(cgrp);
 }
 
 static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
@@ -4114,7 +4114,7 @@ BPF_CALL_2(bpf_skb_ancestor_cgroup_id, const struct sk_buff *, skb, int,
 	if (!ancestor)
 		return 0;
 
-	return ancestor->kn->id;
+	return cgroup_id(ancestor);
 }
 
 static const struct bpf_func_proto bpf_skb_ancestor_cgroup_id_proto = {
-- 
2.17.1



* Re: [PATCH 02/10] writeback: use ino_t for inodes in tracepoints
  2019-11-04 23:59 ` [PATCH 02/10] writeback: use ino_t for inodes in tracepoints Tejun Heo
@ 2019-11-05  9:11   ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2019-11-05  9:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: gregkh, kernel-team, linux-kernel, cgroups, lizefan, hannes,
	namhyung, ast, daniel, Jan Kara, Jens Axboe

On Mon 04-11-19 15:59:36, Tejun Heo wrote:
> Writeback TPs currently use a mix of 32 and 64bits for inos.  This isn't
> currently broken because only cgroup inos are using 32bits and they're
> limited to 32bits.  cgroup inos will make use of 64bits.  Let's
> uniformly use ino_t.
> 
> While at it, switch the default cgroup ino value used when cgroup is
> disabled to 1 instead of -1U as root cgroup always uses ino 1.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Namhyung Kim <namhyung@kernel.org>

Thanks! The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> Hello,
> 
> This is to prepare for kernfs 64bit ino support.  It'd be best to
> route this with the rest of kernfs patchset.
> 
> Thanks.
> 
>  include/trace/events/writeback.h | 88 ++++++++++++++++----------------
>  1 file changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> index c2ce6480b4b1..95e50677476b 100644
> --- a/include/trace/events/writeback.h
> +++ b/include/trace/events/writeback.h
> @@ -61,7 +61,7 @@ DECLARE_EVENT_CLASS(writeback_page_template,
>  
>  	TP_STRUCT__entry (
>  		__array(char, name, 32)
> -		__field(unsigned long, ino)
> +		__field(ino_t, ino)
>  		__field(pgoff_t, index)
>  	),
>  
> @@ -102,7 +102,7 @@ DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
>  
>  	TP_STRUCT__entry (
>  		__array(char, name, 32)
> -		__field(unsigned long, ino)
> +		__field(ino_t, ino)
>  		__field(unsigned long, state)
>  		__field(unsigned long, flags)
>  	),
> @@ -150,28 +150,28 @@ DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
>  #ifdef CREATE_TRACE_POINTS
>  #ifdef CONFIG_CGROUP_WRITEBACK
>  
> -static inline unsigned int __trace_wb_assign_cgroup(struct bdi_writeback *wb)
> +static inline ino_t __trace_wb_assign_cgroup(struct bdi_writeback *wb)
>  {
>  	return wb->memcg_css->cgroup->kn->id.ino;
>  }
>  
> -static inline unsigned int __trace_wbc_assign_cgroup(struct writeback_control *wbc)
> +static inline ino_t __trace_wbc_assign_cgroup(struct writeback_control *wbc)
>  {
>  	if (wbc->wb)
>  		return __trace_wb_assign_cgroup(wbc->wb);
>  	else
> -		return -1U;
> +		return 1;
>  }
>  #else	/* CONFIG_CGROUP_WRITEBACK */
>  
> -static inline unsigned int __trace_wb_assign_cgroup(struct bdi_writeback *wb)
> +static inline ino_t __trace_wb_assign_cgroup(struct bdi_writeback *wb)
>  {
> -	return -1U;
> +	return 1;
>  }
>  
> -static inline unsigned int __trace_wbc_assign_cgroup(struct writeback_control *wbc)
> +static inline ino_t __trace_wbc_assign_cgroup(struct writeback_control *wbc)
>  {
> -	return -1U;
> +	return 1;
>  }
>  
>  #endif	/* CONFIG_CGROUP_WRITEBACK */
> @@ -187,8 +187,8 @@ TRACE_EVENT(inode_foreign_history,
>  
>  	TP_STRUCT__entry(
>  		__array(char,		name, 32)
> -		__field(unsigned long,	ino)
> -		__field(unsigned int,	cgroup_ino)
> +		__field(ino_t,		ino)
> +		__field(ino_t,		cgroup_ino)
>  		__field(unsigned int,	history)
>  	),
>  
> @@ -199,7 +199,7 @@ TRACE_EVENT(inode_foreign_history,
>  		__entry->history	= history;
>  	),
>  
> -	TP_printk("bdi %s: ino=%lu cgroup_ino=%u history=0x%x",
> +	TP_printk("bdi %s: ino=%lu cgroup_ino=%lu history=0x%x",
>  		__entry->name,
>  		__entry->ino,
>  		__entry->cgroup_ino,
> @@ -216,9 +216,9 @@ TRACE_EVENT(inode_switch_wbs,
>  
>  	TP_STRUCT__entry(
>  		__array(char,		name, 32)
> -		__field(unsigned long,	ino)
> -		__field(unsigned int,	old_cgroup_ino)
> -		__field(unsigned int,	new_cgroup_ino)
> +		__field(ino_t,		ino)
> +		__field(ino_t,		old_cgroup_ino)
> +		__field(ino_t,		new_cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -228,7 +228,7 @@ TRACE_EVENT(inode_switch_wbs,
>  		__entry->new_cgroup_ino	= __trace_wb_assign_cgroup(new_wb);
>  	),
>  
> -	TP_printk("bdi %s: ino=%lu old_cgroup_ino=%u new_cgroup_ino=%u",
> +	TP_printk("bdi %s: ino=%lu old_cgroup_ino=%lu new_cgroup_ino=%lu",
>  		__entry->name,
>  		__entry->ino,
>  		__entry->old_cgroup_ino,
> @@ -245,10 +245,10 @@ TRACE_EVENT(track_foreign_dirty,
>  	TP_STRUCT__entry(
>  		__array(char,		name, 32)
>  		__field(u64,		bdi_id)
> -		__field(unsigned long,	ino)
> +		__field(ino_t,		ino)
>  		__field(unsigned int,	memcg_id)
> -		__field(unsigned int,	cgroup_ino)
> -		__field(unsigned int,	page_cgroup_ino)
> +		__field(ino_t,		cgroup_ino)
> +		__field(ino_t,		page_cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -263,7 +263,7 @@ TRACE_EVENT(track_foreign_dirty,
>  		__entry->page_cgroup_ino = page->mem_cgroup->css.cgroup->kn->id.ino;
>  	),
>  
> -	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%u page_cgroup_ino=%u",
> +	TP_printk("bdi %s[%llu]: ino=%lu memcg_id=%u cgroup_ino=%lu page_cgroup_ino=%lu",
>  		__entry->name,
>  		__entry->bdi_id,
>  		__entry->ino,
> @@ -282,7 +282,7 @@ TRACE_EVENT(flush_foreign,
>  
>  	TP_STRUCT__entry(
>  		__array(char,		name, 32)
> -		__field(unsigned int,	cgroup_ino)
> +		__field(ino_t,		cgroup_ino)
>  		__field(unsigned int,	frn_bdi_id)
>  		__field(unsigned int,	frn_memcg_id)
>  	),
> @@ -294,7 +294,7 @@ TRACE_EVENT(flush_foreign,
>  		__entry->frn_memcg_id	= frn_memcg_id;
>  	),
>  
> -	TP_printk("bdi %s: cgroup_ino=%u frn_bdi_id=%u frn_memcg_id=%u",
> +	TP_printk("bdi %s: cgroup_ino=%lu frn_bdi_id=%u frn_memcg_id=%u",
>  		__entry->name,
>  		__entry->cgroup_ino,
>  		__entry->frn_bdi_id,
> @@ -311,9 +311,9 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
>  
>  	TP_STRUCT__entry (
>  		__array(char, name, 32)
> -		__field(unsigned long, ino)
> +		__field(ino_t, ino)
>  		__field(int, sync_mode)
> -		__field(unsigned int, cgroup_ino)
> +		__field(ino_t, cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -324,7 +324,7 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
>  		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
>  	),
>  
> -	TP_printk("bdi %s: ino=%lu sync_mode=%d cgroup_ino=%u",
> +	TP_printk("bdi %s: ino=%lu sync_mode=%d cgroup_ino=%lu",
>  		__entry->name,
>  		__entry->ino,
>  		__entry->sync_mode,
> @@ -358,7 +358,7 @@ DECLARE_EVENT_CLASS(writeback_work_class,
>  		__field(int, range_cyclic)
>  		__field(int, for_background)
>  		__field(int, reason)
> -		__field(unsigned int, cgroup_ino)
> +		__field(ino_t, cgroup_ino)
>  	),
>  	TP_fast_assign(
>  		strscpy_pad(__entry->name,
> @@ -374,7 +374,7 @@ DECLARE_EVENT_CLASS(writeback_work_class,
>  		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
>  	),
>  	TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d "
> -		  "kupdate=%d range_cyclic=%d background=%d reason=%s cgroup_ino=%u",
> +		  "kupdate=%d range_cyclic=%d background=%d reason=%s cgroup_ino=%lu",
>  		  __entry->name,
>  		  MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev),
>  		  __entry->nr_pages,
> @@ -413,13 +413,13 @@ DECLARE_EVENT_CLASS(writeback_class,
>  	TP_ARGS(wb),
>  	TP_STRUCT__entry(
>  		__array(char, name, 32)
> -		__field(unsigned int, cgroup_ino)
> +		__field(ino_t, cgroup_ino)
>  	),
>  	TP_fast_assign(
>  		strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32);
>  		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
>  	),
> -	TP_printk("bdi %s: cgroup_ino=%u",
> +	TP_printk("bdi %s: cgroup_ino=%lu",
>  		  __entry->name,
>  		  __entry->cgroup_ino
>  	)
> @@ -459,7 +459,7 @@ DECLARE_EVENT_CLASS(wbc_class,
>  		__field(int, range_cyclic)
>  		__field(long, range_start)
>  		__field(long, range_end)
> -		__field(unsigned int, cgroup_ino)
> +		__field(ino_t, cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -478,7 +478,7 @@ DECLARE_EVENT_CLASS(wbc_class,
>  
>  	TP_printk("bdi %s: towrt=%ld skip=%ld mode=%d kupd=%d "
>  		"bgrd=%d reclm=%d cyclic=%d "
> -		"start=0x%lx end=0x%lx cgroup_ino=%u",
> +		"start=0x%lx end=0x%lx cgroup_ino=%lu",
>  		__entry->name,
>  		__entry->nr_to_write,
>  		__entry->pages_skipped,
> @@ -510,7 +510,7 @@ TRACE_EVENT(writeback_queue_io,
>  		__field(long,		age)
>  		__field(int,		moved)
>  		__field(int,		reason)
> -		__field(unsigned int,	cgroup_ino)
> +		__field(ino_t,		cgroup_ino)
>  	),
>  	TP_fast_assign(
>  		unsigned long *older_than_this = work->older_than_this;
> @@ -522,7 +522,7 @@ TRACE_EVENT(writeback_queue_io,
>  		__entry->reason	= work->reason;
>  		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
>  	),
> -	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s cgroup_ino=%u",
> +	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s cgroup_ino=%lu",
>  		__entry->name,
>  		__entry->older,	/* older_than_this in jiffies */
>  		__entry->age,	/* older_than_this in relative milliseconds */
> @@ -596,7 +596,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
>  		__field(unsigned long,	dirty_ratelimit)
>  		__field(unsigned long,	task_ratelimit)
>  		__field(unsigned long,	balanced_dirty_ratelimit)
> -		__field(unsigned int,	cgroup_ino)
> +		__field(ino_t,		cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -614,7 +614,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
>  	TP_printk("bdi %s: "
>  		  "write_bw=%lu awrite_bw=%lu dirty_rate=%lu "
>  		  "dirty_ratelimit=%lu task_ratelimit=%lu "
> -		  "balanced_dirty_ratelimit=%lu cgroup_ino=%u",
> +		  "balanced_dirty_ratelimit=%lu cgroup_ino=%lu",
>  		  __entry->bdi,
>  		  __entry->write_bw,		/* write bandwidth */
>  		  __entry->avg_write_bw,	/* avg write bandwidth */
> @@ -660,7 +660,7 @@ TRACE_EVENT(balance_dirty_pages,
>  		__field(	 long,	pause)
>  		__field(unsigned long,	period)
>  		__field(	 long,	think)
> -		__field(unsigned int,	cgroup_ino)
> +		__field(ino_t,		cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -692,7 +692,7 @@ TRACE_EVENT(balance_dirty_pages,
>  		  "bdi_setpoint=%lu bdi_dirty=%lu "
>  		  "dirty_ratelimit=%lu task_ratelimit=%lu "
>  		  "dirtied=%u dirtied_pause=%u "
> -		  "paused=%lu pause=%ld period=%lu think=%ld cgroup_ino=%u",
> +		  "paused=%lu pause=%ld period=%lu think=%ld cgroup_ino=%lu",
>  		  __entry->bdi,
>  		  __entry->limit,
>  		  __entry->setpoint,
> @@ -718,10 +718,10 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
>  
>  	TP_STRUCT__entry(
>  		__array(char, name, 32)
> -		__field(unsigned long, ino)
> +		__field(ino_t, ino)
>  		__field(unsigned long, state)
>  		__field(unsigned long, dirtied_when)
> -		__field(unsigned int, cgroup_ino)
> +		__field(ino_t, cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -733,7 +733,7 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
>  		__entry->cgroup_ino	= __trace_wb_assign_cgroup(inode_to_wb(inode));
>  	),
>  
> -	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu cgroup_ino=%u",
> +	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu cgroup_ino=%lu",
>  		  __entry->name,
>  		  __entry->ino,
>  		  show_inode_state(__entry->state),
> @@ -789,13 +789,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
>  
>  	TP_STRUCT__entry(
>  		__array(char, name, 32)
> -		__field(unsigned long, ino)
> +		__field(ino_t, ino)
>  		__field(unsigned long, state)
>  		__field(unsigned long, dirtied_when)
>  		__field(unsigned long, writeback_index)
>  		__field(long, nr_to_write)
>  		__field(unsigned long, wrote)
> -		__field(unsigned int, cgroup_ino)
> +		__field(ino_t, cgroup_ino)
>  	),
>  
>  	TP_fast_assign(
> @@ -811,7 +811,7 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
>  	),
>  
>  	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu "
> -		  "index=%lu to_write=%ld wrote=%lu cgroup_ino=%u",
> +		  "index=%lu to_write=%ld wrote=%lu cgroup_ino=%lu",
>  		  __entry->name,
>  		  __entry->ino,
>  		  show_inode_state(__entry->state),
> @@ -845,7 +845,7 @@ DECLARE_EVENT_CLASS(writeback_inode_template,
>  
>  	TP_STRUCT__entry(
>  		__field(	dev_t,	dev			)
> -		__field(unsigned long,	ino			)
> +		__field(	ino_t,	ino			)
>  		__field(unsigned long,	state			)
>  		__field(	__u16, mode			)
>  		__field(unsigned long, dirtied_when		)
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 03/10] netprio: use css ID instead of cgroup ID
  2019-11-04 23:59 ` [PATCH 03/10] netprio: use css ID instead of cgroup ID Tejun Heo
@ 2019-11-05 13:27   ` Neil Horman
  0 siblings, 0 replies; 19+ messages in thread
From: Neil Horman @ 2019-11-05 13:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: gregkh, kernel-team, linux-kernel, cgroups, lizefan, hannes,
	namhyung, ast, daniel, David S. Miller

On Mon, Nov 04, 2019 at 03:59:37PM -0800, Tejun Heo wrote:
> netprio uses cgroup ID to index the priority mapping table.  This is
> currently okay as cgroup IDs are allocated using idr and packed.
> However, cgroup IDs will be changed to use full 64bit range and won't
> be packed making this impractical.  netprio doesn't care what type of
> IDs it uses as long as they can identify the controller instances and
> are packed.  Let's switch to css IDs instead of cgroup IDs.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
> Hello,
> 
> This is to prepare for kernfs 64bit ino support.  It'd be best to
> route this with the rest of kernfs patchset.
> 
> Thanks.
> 
>  include/net/netprio_cgroup.h | 2 +-
>  net/core/netprio_cgroup.c    | 8 ++++----
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
> index cfc9441ef074..dec7522b6ce1 100644
> --- a/include/net/netprio_cgroup.h
> +++ b/include/net/netprio_cgroup.h
> @@ -26,7 +26,7 @@ static inline u32 task_netprioidx(struct task_struct *p)
>  
>  	rcu_read_lock();
>  	css = task_css(p, net_prio_cgrp_id);
> -	idx = css->cgroup->id;
> +	idx = css->id;
>  	rcu_read_unlock();
>  	return idx;
>  }
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index 256b7954b720..8881dd943dd0 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -93,7 +93,7 @@ static int extend_netdev_table(struct net_device *dev, u32 target_idx)
>  static u32 netprio_prio(struct cgroup_subsys_state *css, struct net_device *dev)
>  {
>  	struct netprio_map *map = rcu_dereference_rtnl(dev->priomap);
> -	int id = css->cgroup->id;
> +	int id = css->id;
>  
>  	if (map && id < map->priomap_len)
>  		return map->priomap[id];
> @@ -113,7 +113,7 @@ static int netprio_set_prio(struct cgroup_subsys_state *css,
>  			    struct net_device *dev, u32 prio)
>  {
>  	struct netprio_map *map;
> -	int id = css->cgroup->id;
> +	int id = css->id;
>  	int ret;
>  
>  	/* avoid extending priomap for zero writes */
> @@ -177,7 +177,7 @@ static void cgrp_css_free(struct cgroup_subsys_state *css)
>  
>  static u64 read_prioidx(struct cgroup_subsys_state *css, struct cftype *cft)
>  {
> -	return css->cgroup->id;
> +	return css->id;
>  }
>  
>  static int read_priomap(struct seq_file *sf, void *v)
> @@ -237,7 +237,7 @@ static void net_prio_attach(struct cgroup_taskset *tset)
>  	struct cgroup_subsys_state *css;
>  
>  	cgroup_taskset_for_each(p, css, tset) {
> -		void *v = (void *)(unsigned long)css->cgroup->id;
> +		void *v = (void *)(unsigned long)css->id;
>  
>  		task_lock(p);
>  		iterate_fd(p->files, 0, update_netprio, v);
> -- 
> 2.17.1
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>


* Re: [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (9 preceding siblings ...)
  2019-11-04 23:59 ` [PATCH 10/10] cgroup: use cgrp->kn->id as the cgroup ID Tejun Heo
@ 2019-11-06 13:50 ` Namhyung Kim
  2019-11-07 15:31   ` Tejun Heo
  2019-11-12 15:51 ` Tejun Heo
  2019-11-12 16:09 ` Greg KH
  12 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2019-11-06 13:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, kernel-team, linux-kernel, cgroups, Li Zefan,
	Johannes Weiner, Alexei Starovoitov, Daniel Borkmann

Hi Tejun,

On Tue, Nov 5, 2019 at 8:59 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> Currently, there are three IDs which are being used to identify a
> cgroup.
>
> 1. cgroup->id
> 2. cgroupfs 32bit ino
> 3. cgroupfs 32bit ino + 32bit gen
>
> All three IDs are visible to userland through different interfaces.
> This is very confusing and #1 can't even be resolved to cgroups from
> userland.
>
> A 64bit number is sufficient to identify a cgroup instance uniquely
> and ino_t is 64bit on all archs except for alpha.  There's no reason
> for three different IDs at all.  This patchset updates kernfs so that
> it supports 64bit ino and associated exportfs operations and unifies
> the cgroup IDs.
>
> * On 64bit ino archs, ino is kernfs node ID which is also the cgroup
>   ID.  The ino can be passed directly into open_by_handle_at(2) w/ the
>   new key type FILEID_KERNFS.  Backward compatibility is maintained
>   for FILEID_INO32_GEN keys.
>
> * On 32bit ino archs, kernfs node ID is still 64bit and the cgroup ID.
>   ino is the low 32bits and gen is the high 32bits.  If the high
>   32bits is zero, open_by_handle_at(2) only matches the ino part of
>   the ID allowing userland to resolve inos to cgroups as long as
>   distinguishing recycled inos isn't necessary.
>
> This patchset contains the following 10 patches.
>
>  0001-kernfs-fix-ino-wrap-around-detection.patch
>  0002-writeback-use-ino_t-for-inodes-in-tracepoints.patch
>  0003-netprio-use-css-ID-instead-of-cgroup-ID.patch
>  0004-kernfs-use-dumber-locking-for-kernfs_find_and_get_no.patch
>  0005-kernfs-kernfs_find_and_get_node_by_ino-should-only-l.patch
>  0006-kernfs-convert-kernfs_node-id-from-union-kernfs_node.patch
>  0007-kernfs-combine-ino-id-lookup-functions-into-kernfs_f.patch
>  0008-kernfs-implement-custom-exportfs-ops-and-fid-type.patch
>  0009-kernfs-use-64bit-inos-if-ino_t-is-64bit.patch
>  0010-cgroup-use-cgrp-kn-id-as-the-cgroup-ID.patch
>
> 0001 is a fix which should be backported through -stable.  0002 and
> 0003 are prep patches.  0004-0009 make kernfs_node->id a u64 and use
> it as ino on 64bit ino archs.  0010 replaces cgroup->id with the
> kernfs node ID.
>
> Greg, how do you want to route the patches?  We can route 0001-0009
> through your tree and the last one through cgroup after pulling in.
> I'd be happy to route them all too.
>
> This patchset is also available in the following git branch.
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-unified-cgid
>
> diffstat follows.  Thanks.

Thanks a lot for doing this!  I've tested it with my perf cgroup patchset
based on top of this series and it looks good.  You can add my

Tested-by: Namhyung Kim <namhyung@kernel.org>

Thanks
Namhyung


* Re: [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
  2019-11-06 13:50 ` [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Namhyung Kim
@ 2019-11-07 15:31   ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-07 15:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Greg Kroah-Hartman, kernel-team, linux-kernel, cgroups, Li Zefan,
	Johannes Weiner, Alexei Starovoitov, Daniel Borkmann

On Wed, Nov 06, 2019 at 10:50:15PM +0900, Namhyung Kim wrote:
> Thanks a lot for doing this!  I've tested it with my perf cgroup patchset
> based on top of this series and it looks good.  You can add my
> 
> Tested-by: Namhyung Kim <namhyung@kernel.org>

Thanks for testing, sorry about the delay!

-- 
tejun


* Re: [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (10 preceding siblings ...)
  2019-11-06 13:50 ` [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Namhyung Kim
@ 2019-11-12 15:51 ` Tejun Heo
  2019-11-12 16:08   ` Greg KH
  2019-11-12 16:09 ` Greg KH
  12 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2019-11-12 15:51 UTC (permalink / raw)
  To: gregkh
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel

Hello, Greg.

A gentle ping.  I can route these through cgroup/for-5.5 if you're
okay with the patch series.

Thanks.

-- 
tejun


* Re: [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
  2019-11-12 15:51 ` Tejun Heo
@ 2019-11-12 16:08   ` Greg KH
  0 siblings, 0 replies; 19+ messages in thread
From: Greg KH @ 2019-11-12 16:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel

On Tue, Nov 12, 2019 at 07:51:14AM -0800, Tejun Heo wrote:
> Hello, Greg.
> 
> A gentle ping.  I can route these through cgroup/for-5.5 if you're
> okay with the patch series.

Oops, let me go review...


* Re: [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
  2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
                   ` (11 preceding siblings ...)
  2019-11-12 15:51 ` Tejun Heo
@ 2019-11-12 16:09 ` Greg KH
  2019-11-12 16:19   ` Tejun Heo
  12 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2019-11-12 16:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel

On Mon, Nov 04, 2019 at 03:59:34PM -0800, Tejun Heo wrote:
> Hello,
> 
> Currently, there are three IDs which are being used to identify a
> cgroup.
> 
> 1. cgroup->id
> 2. cgroupfs 32bit ino
> 3. cgroupfs 32bit ino + 32bit gen
> 
> All three IDs are visible to userland through different interfaces.
> This is very confusing and #1 can't even be resolved to cgroups from
> userland.
> 
> A 64bit number is sufficient to identify a cgroup instance uniquely
> and ino_t is 64bit on all archs except for alpha.  There's no reason
> for three different IDs at all.  This patchset updates kernfs so that
> it supports 64bit ino and associated exportfs operations and unifies
> the cgroup IDs.
> 
> * On 64bit ino archs, ino is kernfs node ID which is also the cgroup
>   ID.  The ino can be passed directly into open_by_handle_at(2) w/ the
>   new key type FILEID_KERNFS.  Backward compatibility is maintained
>   for FILEID_INO32_GEN keys.
> 
> * On 32bit ino archs, kernfs node ID is still 64bit and the cgroup ID.
>   ino is the low 32bits and gen is the high 32bits.  If the high
>   32bits is zero, open_by_handle_at(2) only matches the ino part of
>   the ID allowing userland to resolve inos to cgroups as long as
>   distinguishing recycled inos isn't necessary.
> 
> This patchset contains the following 10 patches.
> 
>  0001-kernfs-fix-ino-wrap-around-detection.patch
>  0002-writeback-use-ino_t-for-inodes-in-tracepoints.patch
>  0003-netprio-use-css-ID-instead-of-cgroup-ID.patch
>  0004-kernfs-use-dumber-locking-for-kernfs_find_and_get_no.patch
>  0005-kernfs-kernfs_find_and_get_node_by_ino-should-only-l.patch
>  0006-kernfs-convert-kernfs_node-id-from-union-kernfs_node.patch
>  0007-kernfs-combine-ino-id-lookup-functions-into-kernfs_f.patch
>  0008-kernfs-implement-custom-exportfs-ops-and-fid-type.patch
>  0009-kernfs-use-64bit-inos-if-ino_t-is-64bit.patch
>  0010-cgroup-use-cgrp-kn-id-as-the-cgroup-ID.patch
> 
> 0001 is a fix which should be backported through -stable.  0002 and
> 0003 are prep patches.  0004-0009 make kernfs_node->id a u64 and use
> it as ino on 64bit ino archs.  0010 replaces cgroup->id with the
> kernfs node ID.
> 
> Greg, how do you want to route the patches?  We can route 0001-0009
> through your tree and the last one through cgroup after pulling in.
> I'd be happy to route them all too.

Sorry for the delay.  Feel free to take all of these through your tree,
that probably is easiest:

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


* Re: [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs
  2019-11-12 16:09 ` Greg KH
@ 2019-11-12 16:19   ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2019-11-12 16:19 UTC (permalink / raw)
  To: Greg KH
  Cc: kernel-team, linux-kernel, cgroups, lizefan, hannes, namhyung,
	ast, daniel

Hello, Greg.

On Tue, Nov 12, 2019 at 05:09:33PM +0100, Greg KH wrote:
> Sorry for the delay.  Feel free to take all of these through your tree,
> that probably is easiest:
> 
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thanks a lot for the review!  Applying 1-10 to cgroup/for-5.5.

-- 
tejun


Thread overview: 19+ messages
2019-11-04 23:59 [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Tejun Heo
2019-11-04 23:59 ` [PATCH 01/10] kernfs: fix ino wrap-around detection Tejun Heo
2019-11-04 23:59 ` [PATCH 02/10] writeback: use ino_t for inodes in tracepoints Tejun Heo
2019-11-05  9:11   ` Jan Kara
2019-11-04 23:59 ` [PATCH 03/10] netprio: use css ID instead of cgroup ID Tejun Heo
2019-11-05 13:27   ` Neil Horman
2019-11-04 23:59 ` [PATCH 04/10] kernfs: use dumber locking for kernfs_find_and_get_node_by_ino() Tejun Heo
2019-11-04 23:59 ` [PATCH 05/10] kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes Tejun Heo
2019-11-04 23:59 ` [PATCH 06/10] kernfs: convert kernfs_node->id from union kernfs_node_id to u64 Tejun Heo
2019-11-04 23:59 ` [PATCH 07/10] kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id() Tejun Heo
2019-11-04 23:59 ` [PATCH 08/10] kernfs: implement custom exportfs ops and fid type Tejun Heo
2019-11-04 23:59 ` [PATCH 09/10] kernfs: use 64bit inos if ino_t is 64bit Tejun Heo
2019-11-04 23:59 ` [PATCH 10/10] cgroup: use cgrp->kn->id as the cgroup ID Tejun Heo
2019-11-06 13:50 ` [PATCHSET cgroup/for-5.5] kernfs,cgroup: support 64bit inos and unify cgroup IDs Namhyung Kim
2019-11-07 15:31   ` Tejun Heo
2019-11-12 15:51 ` Tejun Heo
2019-11-12 16:08   ` Greg KH
2019-11-12 16:09 ` Greg KH
2019-11-12 16:19   ` Tejun Heo
