* [PATCHSET block/for-4.3] writeback: cgroup writeback updates
@ 2015-07-07 15:10 Tejun Heo
  2015-07-07 15:10 ` [PATCH 1/5] writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg Tejun Heo
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:10 UTC (permalink / raw)
  To: axboe; +Cc: jack, linux-kernel, cgroups, kernel-team

Hello,

This patchset contains the following updates to cgroup writeback
support:

* bdi_writeback iteration fix for a bug which could lead to some wb's
  being skipped or repeated during e.g. sync under memory pressure.

* Simplification of wb work wait mechanism.

* Writeback tracepoints updated to report cgroup.

It consists of the following five patches.

 0001-writeback-bdi_for_each_wb-iteration-is-memcg-ID-base.patch
 0002-writeback-remove-wb_writeback_work-single_wait-done.patch
 0003-writeback-explain-why-inode-is-allowed-to-be-NULL-fo.patch
 0004-kernfs-implement-kernfs_path_len.patch
 0005-writeback-update-writeback-tracepoints-to-report-cgr.patch

The patchset is on top of d770e558e219 ("Linux 4.2-rc1") and available
in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-writeback-updates-for-4.3

diffstat follows.  Thanks.

 fs/fs-writeback.c                |  139 +++++++++---------------------
 fs/kernfs/dir.c                  |   23 ++++
 include/linux/backing-dev.h      |   24 ++---
 include/linux/kernfs.h           |    4 
 include/trace/events/writeback.h |  180 ++++++++++++++++++++++++++++++---------
 mm/page-writeback.c              |    6 -
 6 files changed, 226 insertions(+), 150 deletions(-)

--
tejun


* [PATCH 1/5] writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg
  2015-07-07 15:10 [PATCHSET block/for-4.3] writeback: cgroup writeback updates Tejun Heo
@ 2015-07-07 15:10 ` Tejun Heo
  2015-07-07 15:10 ` [PATCH 2/5] writeback: remove wb_writeback_work->single_wait/done Tejun Heo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:10 UTC (permalink / raw)
  To: axboe; +Cc: jack, linux-kernel, cgroups, kernel-team, Tejun Heo

wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an
earlier implementation, wb's were keyed by blkcg ID.
bdi_for_each_wb() walks bdi->cgwb_tree in ascending ID order and
allows iterations to start from an arbitrary ID, which is used to
interrupt and resume iterations.

Unfortunately, while changing wb to be keyed by memcg ID instead of
blkcg ID, bdi_for_each_wb() was missed and still assumes that wb's
are keyed by blkcg ID.  This doesn't affect iterations which don't get
interrupted, but bdi_split_work_to_wbs() makes use of iteration
resuming on allocation failures and thus may incorrectly skip or
repeat wb's.

Fix it by changing bdi_for_each_wb() to take memcg IDs instead of
blkcg IDs and updating bdi_split_work_to_wbs() accordingly.
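
For illustration, the resume-and-restart pattern which depends on
consistent keying looks roughly like this (a simplified sketch of the
bdi_split_work_to_wbs() loop; queue_one_work() and wait_for_it() are
hypothetical stand-ins for the actual queueing and waiting logic):

	int next_memcg_id = 0;	/* resume point in bdi->cgwb_tree */
	struct bdi_writeback *wb;
	struct wb_iter iter;

restart:
	rcu_read_lock();
	bdi_for_each_wb(wb, bdi, &iter, next_memcg_id) {
		if (!queue_one_work(wb)) {
			/*
			 * cgwb_tree is indexed by memcg ID, so the
			 * resume point must be a memcg ID as well;
			 * resuming from wb->blkcg_css->id is what
			 * caused wb's to be skipped or repeated.
			 */
			next_memcg_id = wb->memcg_css->id + 1;
			rcu_read_unlock();
			wait_for_it();
			goto restart;
		}
	}
	rcu_read_unlock();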

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fs-writeback.c           |  6 +++---
 include/linux/backing-dev.h | 24 ++++++++++++------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f0520bc..6079922 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -838,7 +838,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 				  bool skip_if_busy)
 {
 	long nr_pages = base_work->nr_pages;
-	int next_blkcg_id = 0;
+	int next_memcg_id = 0;
 	struct bdi_writeback *wb;
 	struct wb_iter iter;
 
@@ -848,14 +848,14 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 		return;
 restart:
 	rcu_read_lock();
-	bdi_for_each_wb(wb, bdi, &iter, next_blkcg_id) {
+	bdi_for_each_wb(wb, bdi, &iter, next_memcg_id) {
 		if (!wb_has_dirty_io(wb) ||
 		    (skip_if_busy && writeback_in_progress(wb)))
 			continue;
 
 		base_work->nr_pages = wb_split_bdi_pages(wb, nr_pages);
 		if (!wb_clone_and_queue_work(wb, base_work)) {
-			next_blkcg_id = wb->blkcg_css->id + 1;
+			next_memcg_id = wb->memcg_css->id + 1;
 			rcu_read_unlock();
 			wb_wait_for_single_work(bdi, base_work);
 			goto restart;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 0fe9df9..23ebb94 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -402,7 +402,7 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
 }
 
 struct wb_iter {
-	int			start_blkcg_id;
+	int			start_memcg_id;
 	struct radix_tree_iter	tree_iter;
 	void			**slot;
 };
@@ -414,9 +414,9 @@ static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter,
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	if (iter->start_blkcg_id >= 0) {
-		iter->slot = radix_tree_iter_init(titer, iter->start_blkcg_id);
-		iter->start_blkcg_id = -1;
+	if (iter->start_memcg_id >= 0) {
+		iter->slot = radix_tree_iter_init(titer, iter->start_memcg_id);
+		iter->start_memcg_id = -1;
 	} else {
 		iter->slot = radix_tree_next_slot(iter->slot, titer, 0);
 	}
@@ -430,30 +430,30 @@ static inline struct bdi_writeback *__wb_iter_next(struct wb_iter *iter,
 
 static inline struct bdi_writeback *__wb_iter_init(struct wb_iter *iter,
 						   struct backing_dev_info *bdi,
-						   int start_blkcg_id)
+						   int start_memcg_id)
 {
-	iter->start_blkcg_id = start_blkcg_id;
+	iter->start_memcg_id = start_memcg_id;
 
-	if (start_blkcg_id)
+	if (start_memcg_id)
 		return __wb_iter_next(iter, bdi);
 	else
 		return &bdi->wb;
 }
 
 /**
- * bdi_for_each_wb - walk all wb's of a bdi in ascending blkcg ID order
+ * bdi_for_each_wb - walk all wb's of a bdi in ascending memcg ID order
  * @wb_cur: cursor struct bdi_writeback pointer
  * @bdi: bdi to walk wb's of
  * @iter: pointer to struct wb_iter to be used as iteration buffer
- * @start_blkcg_id: blkcg ID to start iteration from
+ * @start_memcg_id: memcg ID to start iteration from
  *
  * Iterate @wb_cur through the wb's (bdi_writeback's) of @bdi in ascending
- * blkcg ID order starting from @start_blkcg_id.  @iter is struct wb_iter
+ * memcg ID order starting from @start_memcg_id.  @iter is struct wb_iter
  * to be used as temp storage during iteration.  rcu_read_lock() must be
  * held throughout iteration.
  */
-#define bdi_for_each_wb(wb_cur, bdi, iter, start_blkcg_id)		\
-	for ((wb_cur) = __wb_iter_init(iter, bdi, start_blkcg_id);	\
+#define bdi_for_each_wb(wb_cur, bdi, iter, start_memcg_id)		\
+	for ((wb_cur) = __wb_iter_init(iter, bdi, start_memcg_id);	\
 	     (wb_cur); (wb_cur) = __wb_iter_next(iter, bdi))
 
 #else	/* CONFIG_CGROUP_WRITEBACK */
-- 
2.4.3



* [PATCH 2/5] writeback: remove wb_writeback_work->single_wait/done
  2015-07-07 15:10 [PATCHSET block/for-4.3] writeback: cgroup writeback updates Tejun Heo
  2015-07-07 15:10 ` [PATCH 1/5] writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg Tejun Heo
@ 2015-07-07 15:10 ` Tejun Heo
  2015-07-07 15:10 ` [PATCH 3/5] writeback: explain why @inode is allowed to be NULL for inode_congested() Tejun Heo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:10 UTC (permalink / raw)
  To: axboe; +Cc: jack, linux-kernel, cgroups, kernel-team, Tejun Heo

wb_writeback_work->single_wait/done are used as the wait mechanism
for synchronous wb_work (wb_writeback_work) items which are issued
when bdi_split_work_to_wbs() fails to allocate memory for asynchronous
wb_work items; however, there's no reason to use a separate wait
mechanism for this.  bdi_split_work_to_wbs() can simply use an
on-stack fallback wb_work item and a separate wb_completion to wait
for it.

This patch removes wb_work->single_wait/done and the related code and
makes bdi_split_work_to_wbs() use an on-stack fallback wb_work and
wb_completion instead.
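
The allocation-failure path then follows the usual on-stack completion
pattern, roughly like this (a condensed sketch of the code added
below, with the resume-and-restart logic elided):

	DEFINE_WB_COMPLETION_ONSTACK(fallback_work_done);
	struct wb_writeback_work fallback_work;
	struct wb_writeback_work *work;

	work = kmalloc(sizeof(*work), GFP_ATOMIC);
	if (!work) {
		/* alloc failed, queue the on-stack copy and wait */
		work = &fallback_work;
		*work = *base_work;
		work->auto_free = 0;	/* on-stack, must not be kfree'd */
		work->done = &fallback_work_done;
		wb_queue_work(wb, work);
		wb_wait_for_completion(bdi, &fallback_work_done);
	}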

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 116 ++++++++++++++----------------------------------------
 1 file changed, 30 insertions(+), 86 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6079922..f15148c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -53,8 +53,6 @@ struct wb_writeback_work {
 	unsigned int for_background:1;
 	unsigned int for_sync:1;	/* sync(2) WB_SYNC_ALL writeback */
 	unsigned int auto_free:1;	/* free on completion */
-	unsigned int single_wait:1;
-	unsigned int single_done:1;
 	enum wb_reason reason;		/* why was writeback initiated? */
 
 	struct list_head list;		/* pending work list */
@@ -181,11 +179,8 @@ static void wb_queue_work(struct bdi_writeback *wb,
 	trace_writeback_queue(wb->bdi, work);
 
 	spin_lock_bh(&wb->work_lock);
-	if (!test_bit(WB_registered, &wb->state)) {
-		if (work->single_wait)
-			work->single_done = 1;
+	if (!test_bit(WB_registered, &wb->state))
 		goto out_unlock;
-	}
 	if (work->done)
 		atomic_inc(&work->done->cnt);
 	list_add_tail(&work->list, &wb->work_list);
@@ -737,32 +732,6 @@ int inode_congested(struct inode *inode, int cong_bits)
 EXPORT_SYMBOL_GPL(inode_congested);
 
 /**
- * wb_wait_for_single_work - wait for completion of a single bdi_writeback_work
- * @bdi: bdi the work item was issued to
- * @work: work item to wait for
- *
- * Wait for the completion of @work which was issued to one of @bdi's
- * bdi_writeback's.  The caller must have set @work->single_wait before
- * issuing it.  This wait operates independently fo
- * wb_wait_for_completion() and also disables automatic freeing of @work.
- */
-static void wb_wait_for_single_work(struct backing_dev_info *bdi,
-				    struct wb_writeback_work *work)
-{
-	if (WARN_ON_ONCE(!work->single_wait))
-		return;
-
-	wait_event(bdi->wb_waitq, work->single_done);
-
-	/*
-	 * Paired with smp_wmb() in wb_do_writeback() and ensures that all
-	 * modifications to @work prior to assertion of ->single_done is
-	 * visible to the caller once this function returns.
-	 */
-	smp_rmb();
-}
-
-/**
  * wb_split_bdi_pages - split nr_pages to write according to bandwidth
  * @wb: target bdi_writeback to split @nr_pages to
  * @nr_pages: number of pages to write for the whole bdi
@@ -791,38 +760,6 @@ static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
 }
 
 /**
- * wb_clone_and_queue_work - clone a wb_writeback_work and issue it to a wb
- * @wb: target bdi_writeback
- * @base_work: source wb_writeback_work
- *
- * Try to make a clone of @base_work and issue it to @wb.  If cloning
- * succeeds, %true is returned; otherwise, @base_work is issued directly
- * and %false is returned.  In the latter case, the caller is required to
- * wait for @base_work's completion using wb_wait_for_single_work().
- *
- * A clone is auto-freed on completion.  @base_work never is.
- */
-static bool wb_clone_and_queue_work(struct bdi_writeback *wb,
-				    struct wb_writeback_work *base_work)
-{
-	struct wb_writeback_work *work;
-
-	work = kmalloc(sizeof(*work), GFP_ATOMIC);
-	if (work) {
-		*work = *base_work;
-		work->auto_free = 1;
-		work->single_wait = 0;
-	} else {
-		work = base_work;
-		work->auto_free = 0;
-		work->single_wait = 1;
-	}
-	work->single_done = 0;
-	wb_queue_work(wb, work);
-	return work != base_work;
-}
-
-/**
  * bdi_split_work_to_wbs - split a wb_writeback_work to all wb's of a bdi
  * @bdi: target backing_dev_info
  * @base_work: wb_writeback_work to issue
@@ -837,7 +774,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 				  struct wb_writeback_work *base_work,
 				  bool skip_if_busy)
 {
-	long nr_pages = base_work->nr_pages;
 	int next_memcg_id = 0;
 	struct bdi_writeback *wb;
 	struct wb_iter iter;
@@ -849,17 +785,39 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 restart:
 	rcu_read_lock();
 	bdi_for_each_wb(wb, bdi, &iter, next_memcg_id) {
+		DEFINE_WB_COMPLETION_ONSTACK(fallback_work_done);
+		struct wb_writeback_work fallback_work;
+		struct wb_writeback_work *work;
+		long nr_pages;
+
 		if (!wb_has_dirty_io(wb) ||
 		    (skip_if_busy && writeback_in_progress(wb)))
 			continue;
 
-		base_work->nr_pages = wb_split_bdi_pages(wb, nr_pages);
-		if (!wb_clone_and_queue_work(wb, base_work)) {
-			next_memcg_id = wb->memcg_css->id + 1;
-			rcu_read_unlock();
-			wb_wait_for_single_work(bdi, base_work);
-			goto restart;
+		nr_pages = wb_split_bdi_pages(wb, base_work->nr_pages);
+
+		work = kmalloc(sizeof(*work), GFP_ATOMIC);
+		if (work) {
+			*work = *base_work;
+			work->nr_pages = nr_pages;
+			work->auto_free = 1;
+			wb_queue_work(wb, work);
+			continue;
 		}
+
+		/* alloc failed, execute synchronously using on-stack fallback */
+		work = &fallback_work;
+		*work = *base_work;
+		work->nr_pages = nr_pages;
+		work->auto_free = 0;
+		work->done = &fallback_work_done;
+
+		wb_queue_work(wb, work);
+
+		next_memcg_id = wb->memcg_css->id + 1;
+		rcu_read_unlock();
+		wb_wait_for_completion(bdi, &fallback_work_done);
+		goto restart;
 	}
 	rcu_read_unlock();
 }
@@ -901,8 +859,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 	if (bdi_has_dirty_io(bdi) &&
 	    (!skip_if_busy || !writeback_in_progress(&bdi->wb))) {
 		base_work->auto_free = 0;
-		base_work->single_wait = 0;
-		base_work->single_done = 0;
 		wb_queue_work(&bdi->wb, base_work);
 	}
 }
@@ -1793,26 +1749,14 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 	set_bit(WB_writeback_running, &wb->state);
 	while ((work = get_next_work_item(wb)) != NULL) {
 		struct wb_completion *done = work->done;
-		bool need_wake_up = false;
 
 		trace_writeback_exec(wb->bdi, work);
 
 		wrote += wb_writeback(wb, work);
 
-		if (work->single_wait) {
-			WARN_ON_ONCE(work->auto_free);
-			/* paired w/ rmb in wb_wait_for_single_work() */
-			smp_wmb();
-			work->single_done = 1;
-			need_wake_up = true;
-		} else if (work->auto_free) {
+		if (work->auto_free)
 			kfree(work);
-		}
-
 		if (done && atomic_dec_and_test(&done->cnt))
-			need_wake_up = true;
-
-		if (need_wake_up)
 			wake_up_all(&wb->bdi->wb_waitq);
 	}
 
-- 
2.4.3



* [PATCH 3/5] writeback: explain why @inode is allowed to be NULL for inode_congested()
  2015-07-07 15:10 [PATCHSET block/for-4.3] writeback: cgroup writeback updates Tejun Heo
  2015-07-07 15:10 ` [PATCH 1/5] writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg Tejun Heo
  2015-07-07 15:10 ` [PATCH 2/5] writeback: remove wb_writeback_work->single_wait/done Tejun Heo
@ 2015-07-07 15:10 ` Tejun Heo
  2015-07-07 15:10 ` [PATCH 4/5] kernfs: implement kernfs_path_len() Tejun Heo
  2015-07-07 15:10 ` [PATCH 5/5] writeback: update writeback tracepoints to report cgroup Tejun Heo
  4 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:10 UTC (permalink / raw)
  To: axboe; +Cc: jack, linux-kernel, cgroups, kernel-team, Tejun Heo

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f15148c..b5022c3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -700,7 +700,7 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
 
 /**
  * inode_congested - test whether an inode is congested
- * @inode: inode to test for congestion
+ * @inode: inode to test for congestion (may be NULL)
  * @cong_bits: mask of WB_[a]sync_congested bits to test
  *
  * Tests whether @inode is congested.  @cong_bits is the mask of congestion
@@ -710,6 +710,9 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
  * determined by whether the cgwb (cgroup bdi_writeback) for the blkcg
  * associated with @inode is congested; otherwise, the root wb's congestion
  * state is used.
+ *
+ * @inode is allowed to be NULL as this function is often called on
+ * mapping->host which is NULL for the swapper space.
  */
 int inode_congested(struct inode *inode, int cong_bits)
 {
-- 
2.4.3



* [PATCH 4/5] kernfs: implement kernfs_path_len()
  2015-07-07 15:10 [PATCHSET block/for-4.3] writeback: cgroup writeback updates Tejun Heo
                   ` (2 preceding siblings ...)
  2015-07-07 15:10 ` [PATCH 3/5] writeback: explain why @inode is allowed to be NULL for inode_congested() Tejun Heo
@ 2015-07-07 15:10 ` Tejun Heo
  2015-07-07 15:12   ` Tejun Heo
       [not found]   ` <559CDA8B.6040909@siteground.com>
  2015-07-07 15:10 ` [PATCH 5/5] writeback: update writeback tracepoints to report cgroup Tejun Heo
  4 siblings, 2 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:10 UTC (permalink / raw)
  To: axboe
  Cc: jack, linux-kernel, cgroups, kernel-team, Tejun Heo, Greg Kroah-Hartman

Add a function to determine the path length of a kernfs node.  This
for now will be used by writeback tracepoint updates.
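
A typical use is sizing a buffer before building the path, along these
lines (an illustrative sketch, not code from this series; the node
could be renamed between the two calls, so a robust caller treats the
length as a hint and checks the result of kernfs_path()):

	size_t len = kernfs_path_len(kn);
	char *buf = kmalloc(len + 1, GFP_KERNEL);	/* +1 for '\0' */

	if (buf && kernfs_path(kn, buf, len + 1))
		pr_info("kernfs path: %s\n", buf);
	kfree(buf);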

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/kernfs/dir.c        | 23 +++++++++++++++++++++++
 include/linux/kernfs.h |  4 ++++
 2 files changed, 27 insertions(+)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 2d48d28..91e0045 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -92,6 +92,29 @@ int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 }
 
 /**
+ * kernfs_path_len - determine the length of the full path of a given node
+ * @kn: kernfs_node of interest
+ *
+ * The returned length doesn't include the space for the terminating '\0'.
+ */
+size_t kernfs_path_len(struct kernfs_node *kn)
+{
+	size_t len = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+
+	do {
+		len += strlen(kn->name) + 1;
+		kn = kn->parent;
+	} while (kn && kn->parent);
+
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+
+	return len;
+}
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 123be25..5d4e9c4 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -266,6 +266,7 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 }
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
+size_t kernfs_path_len(struct kernfs_node *kn);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
@@ -332,6 +333,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 { return -ENOSYS; }
 
+static inline size_t kernfs_path_len(struct kernfs_node *kn)
+{ return 0; }
+
 static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 					      size_t buflen)
 { return NULL; }
-- 
2.4.3



* [PATCH 5/5] writeback: update writeback tracepoints to report cgroup
  2015-07-07 15:10 [PATCHSET block/for-4.3] writeback: cgroup writeback updates Tejun Heo
                   ` (3 preceding siblings ...)
  2015-07-07 15:10 ` [PATCH 4/5] kernfs: implement kernfs_path_len() Tejun Heo
@ 2015-07-07 15:10 ` Tejun Heo
  4 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:10 UTC (permalink / raw)
  To: axboe; +Cc: jack, linux-kernel, cgroups, kernel-team, Tejun Heo

The following tracepoints are updated to report the cgroup used during
cgroup writeback.

* writeback_write_inode[_start]
* writeback_queue
* writeback_exec
* writeback_start
* writeback_written
* writeback_wait
* writeback_nowork
* writeback_wake_background
* wbc_writepage
* writeback_queue_io
* bdi_dirty_ratelimit
* balance_dirty_pages
* writeback_sb_inodes_requeue
* writeback_single_inode[_start]

Note that writeback_bdi_register is separated out from writeback_class
as reporting the cgroup doesn't make sense for it.  Tracepoints which
take a bdi are updated to take a bdi_writeback instead.
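
Each converted event reserves a dynamic array sized from the wb's
cgroup path and fills it in TP_fast_assign.  A new event following the
same pattern would look roughly like this (an illustrative sketch;
writeback_example is a hypothetical event, not part of this series):

	TRACE_EVENT(writeback_example,
		TP_PROTO(struct bdi_writeback *wb),
		TP_ARGS(wb),
		TP_STRUCT__entry(
			__array(char, name, 32)
			__dynamic_array(char, cgroup,
					__trace_wb_cgroup_size(wb))
		),
		TP_fast_assign(
			strncpy(__entry->name, dev_name(wb->bdi->dev), 32);
			__trace_wb_assign_cgroup(__get_str(cgroup), wb);
		),
		TP_printk("bdi %s: cgroup=%s",
			  __entry->name,
			  __get_str(cgroup)
		)
	);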

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c                |  14 +--
 include/trace/events/writeback.h | 180 ++++++++++++++++++++++++++++++---------
 mm/page-writeback.c              |   6 +-
 3 files changed, 151 insertions(+), 49 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b5022c3..3fa5aa3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -176,7 +176,7 @@ static void wb_wakeup(struct bdi_writeback *wb)
 static void wb_queue_work(struct bdi_writeback *wb,
 			  struct wb_writeback_work *work)
 {
-	trace_writeback_queue(wb->bdi, work);
+	trace_writeback_queue(wb, work);
 
 	spin_lock_bh(&wb->work_lock);
 	if (!test_bit(WB_registered, &wb->state))
@@ -882,7 +882,7 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 	 */
 	work = kzalloc(sizeof(*work), GFP_ATOMIC);
 	if (!work) {
-		trace_writeback_nowork(wb->bdi);
+		trace_writeback_nowork(wb);
 		wb_wakeup(wb);
 		return;
 	}
@@ -912,7 +912,7 @@ void wb_start_background_writeback(struct bdi_writeback *wb)
 	 * We just wake up the flusher thread. It will perform background
 	 * writeback as soon as there is no other work to do.
 	 */
-	trace_writeback_wake_background(wb->bdi);
+	trace_writeback_wake_background(wb);
 	wb_wakeup(wb);
 }
 
@@ -1615,14 +1615,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 		} else if (work->for_background)
 			oldest_jif = jiffies;
 
-		trace_writeback_start(wb->bdi, work);
+		trace_writeback_start(wb, work);
 		if (list_empty(&wb->b_io))
 			queue_io(wb, work);
 		if (work->sb)
 			progress = writeback_sb_inodes(work->sb, wb, work);
 		else
 			progress = __writeback_inodes_wb(wb, work);
-		trace_writeback_written(wb->bdi, work);
+		trace_writeback_written(wb, work);
 
 		wb_update_bandwidth(wb, wb_start);
 
@@ -1647,7 +1647,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 * we'll just busyloop.
 		 */
 		if (!list_empty(&wb->b_more_io))  {
-			trace_writeback_wait(wb->bdi, work);
+			trace_writeback_wait(wb, work);
 			inode = wb_inode(wb->b_more_io.prev);
 			spin_lock(&inode->i_lock);
 			spin_unlock(&wb->list_lock);
@@ -1753,7 +1753,7 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 	while ((work = get_next_work_item(wb)) != NULL) {
 		struct wb_completion *done = work->done;
 
-		trace_writeback_exec(wb->bdi, work);
+		trace_writeback_exec(wb, work);
 
 		wrote += wb_writeback(wb, work);
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index a7aa607..fff846b 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -131,6 +131,66 @@ DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
 	TP_ARGS(inode, flags)
 );
 
+#ifdef CREATE_TRACE_POINTS
+#ifdef CONFIG_CGROUP_WRITEBACK
+
+static inline size_t __trace_wb_cgroup_size(struct bdi_writeback *wb)
+{
+	return kernfs_path_len(wb->memcg_css->cgroup->kn) + 1;
+}
+
+static inline void __trace_wb_assign_cgroup(char *buf, struct bdi_writeback *wb)
+{
+	struct cgroup *cgrp = wb->memcg_css->cgroup;
+	char *path;
+
+	path = cgroup_path(cgrp, buf, kernfs_path_len(cgrp->kn) + 1);
+	WARN_ON_ONCE(path != buf);
+}
+
+static inline size_t __trace_wbc_cgroup_size(struct writeback_control *wbc)
+{
+	if (wbc->wb)
+		return __trace_wb_cgroup_size(wbc->wb);
+	else
+		return 2;
+}
+
+static inline void __trace_wbc_assign_cgroup(char *buf,
+					     struct writeback_control *wbc)
+{
+	if (wbc->wb)
+		__trace_wb_assign_cgroup(buf, wbc->wb);
+	else
+		strcpy(buf, "/");
+}
+
+#else	/* CONFIG_CGROUP_WRITEBACK */
+
+static inline size_t __trace_wb_cgroup_size(struct bdi_writeback *wb)
+{
+	return 2;
+}
+
+static inline void __trace_wb_assign_cgroup(char *buf, struct bdi_writeback *wb)
+{
+	strcpy(buf, "/");
+}
+
+static inline size_t __trace_wbc_cgroup_size(struct writeback_control *wbc)
+{
+	return 2;
+}
+
+static inline void __trace_wbc_assign_cgroup(char *buf,
+					     struct writeback_control *wbc)
+{
+	strcpy(buf, "/");
+}
+
+#endif	/* CONFIG_CGROUP_WRITEBACK */
+#endif	/* CREATE_TRACE_POINTS */
+
 DECLARE_EVENT_CLASS(writeback_write_inode_template,
 
 	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
@@ -141,6 +201,7 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
 		__array(char, name, 32)
 		__field(unsigned long, ino)
 		__field(int, sync_mode)
+		__dynamic_array(char, cgroup, __trace_wbc_cgroup_size(wbc))
 	),
 
 	TP_fast_assign(
@@ -148,12 +209,14 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
 			dev_name(inode_to_bdi(inode)->dev), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->sync_mode	= wbc->sync_mode;
+		__trace_wbc_assign_cgroup(__get_str(cgroup), wbc);
 	),
 
-	TP_printk("bdi %s: ino=%lu sync_mode=%d",
+	TP_printk("bdi %s: ino=%lu sync_mode=%d cgroup=%s",
 		__entry->name,
 		__entry->ino,
-		__entry->sync_mode
+		__entry->sync_mode,
+		__get_str(cgroup)
 	)
 );
 
@@ -172,8 +235,8 @@ DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode,
 );
 
 DECLARE_EVENT_CLASS(writeback_work_class,
-	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
-	TP_ARGS(bdi, work),
+	TP_PROTO(struct bdi_writeback *wb, struct wb_writeback_work *work),
+	TP_ARGS(wb, work),
 	TP_STRUCT__entry(
 		__array(char, name, 32)
 		__field(long, nr_pages)
@@ -183,10 +246,11 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		__field(int, range_cyclic)
 		__field(int, for_background)
 		__field(int, reason)
+		__dynamic_array(char, cgroup, __trace_wb_cgroup_size(wb))
 	),
 	TP_fast_assign(
 		strncpy(__entry->name,
-			bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32);
+			wb->bdi->dev ? dev_name(wb->bdi->dev) : "(unknown)", 32);
 		__entry->nr_pages = work->nr_pages;
 		__entry->sb_dev = work->sb ? work->sb->s_dev : 0;
 		__entry->sync_mode = work->sync_mode;
@@ -194,9 +258,10 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		__entry->range_cyclic = work->range_cyclic;
 		__entry->for_background	= work->for_background;
 		__entry->reason = work->reason;
+		__trace_wb_assign_cgroup(__get_str(cgroup), wb);
 	),
 	TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d "
-		  "kupdate=%d range_cyclic=%d background=%d reason=%s",
+		  "kupdate=%d range_cyclic=%d background=%d reason=%s cgroup=%s",
 		  __entry->name,
 		  MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev),
 		  __entry->nr_pages,
@@ -204,13 +269,14 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		  __entry->for_kupdate,
 		  __entry->range_cyclic,
 		  __entry->for_background,
-		  __print_symbolic(__entry->reason, WB_WORK_REASON)
+		  __print_symbolic(__entry->reason, WB_WORK_REASON),
+		  __get_str(cgroup)
 	)
 );
 #define DEFINE_WRITEBACK_WORK_EVENT(name) \
 DEFINE_EVENT(writeback_work_class, name, \
-	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work), \
-	TP_ARGS(bdi, work))
+	TP_PROTO(struct bdi_writeback *wb, struct wb_writeback_work *work), \
+	TP_ARGS(wb, work))
 DEFINE_WRITEBACK_WORK_EVENT(writeback_queue);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_exec);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_start);
@@ -230,26 +296,42 @@ TRACE_EVENT(writeback_pages_written,
 );
 
 DECLARE_EVENT_CLASS(writeback_class,
-	TP_PROTO(struct backing_dev_info *bdi),
-	TP_ARGS(bdi),
+	TP_PROTO(struct bdi_writeback *wb),
+	TP_ARGS(wb),
 	TP_STRUCT__entry(
 		__array(char, name, 32)
+		__dynamic_array(char, cgroup, __trace_wb_cgroup_size(wb))
 	),
 	TP_fast_assign(
-		strncpy(__entry->name, dev_name(bdi->dev), 32);
+		strncpy(__entry->name, dev_name(wb->bdi->dev), 32);
+		__trace_wb_assign_cgroup(__get_str(cgroup), wb);
 	),
-	TP_printk("bdi %s",
-		  __entry->name
+	TP_printk("bdi %s: cgroup=%s",
+		  __entry->name,
+		  __get_str(cgroup)
 	)
 );
 #define DEFINE_WRITEBACK_EVENT(name) \
 DEFINE_EVENT(writeback_class, name, \
-	TP_PROTO(struct backing_dev_info *bdi), \
-	TP_ARGS(bdi))
+	TP_PROTO(struct bdi_writeback *wb), \
+	TP_ARGS(wb))
 
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
+
+TRACE_EVENT(writeback_bdi_register,
+	TP_PROTO(struct backing_dev_info *bdi),
+	TP_ARGS(bdi),
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+	),
+	TP_fast_assign(
+		strncpy(__entry->name, dev_name(bdi->dev), 32);
+	),
+	TP_printk("bdi %s",
+		__entry->name
+	)
+);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
@@ -265,6 +347,7 @@ DECLARE_EVENT_CLASS(wbc_class,
 		__field(int, range_cyclic)
 		__field(long, range_start)
 		__field(long, range_end)
+		__dynamic_array(char, cgroup, __trace_wbc_cgroup_size(wbc))
 	),
 
 	TP_fast_assign(
@@ -278,11 +361,12 @@ DECLARE_EVENT_CLASS(wbc_class,
 		__entry->range_cyclic	= wbc->range_cyclic;
 		__entry->range_start	= (long)wbc->range_start;
 		__entry->range_end	= (long)wbc->range_end;
+		__trace_wbc_assign_cgroup(__get_str(cgroup), wbc);
 	),
 
 	TP_printk("bdi %s: towrt=%ld skip=%ld mode=%d kupd=%d "
 		"bgrd=%d reclm=%d cyclic=%d "
-		"start=0x%lx end=0x%lx",
+		"start=0x%lx end=0x%lx cgroup=%s",
 		__entry->name,
 		__entry->nr_to_write,
 		__entry->pages_skipped,
@@ -292,7 +376,9 @@ DECLARE_EVENT_CLASS(wbc_class,
 		__entry->for_reclaim,
 		__entry->range_cyclic,
 		__entry->range_start,
-		__entry->range_end)
+		__entry->range_end,
+		__get_str(cgroup)
+	)
 )
 
 #define DEFINE_WBC_EVENT(name) \
@@ -312,6 +398,7 @@ TRACE_EVENT(writeback_queue_io,
 		__field(long,		age)
 		__field(int,		moved)
 		__field(int,		reason)
+		__dynamic_array(char, cgroup, __trace_wb_cgroup_size(wb))
 	),
 	TP_fast_assign(
 		unsigned long *older_than_this = work->older_than_this;
@@ -321,13 +408,15 @@ TRACE_EVENT(writeback_queue_io,
 				  (jiffies - *older_than_this) * 1000 / HZ : -1;
 		__entry->moved	= moved;
 		__entry->reason	= work->reason;
+		__trace_wb_assign_cgroup(__get_str(cgroup), wb);
 	),
-	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s",
+	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s cgroup=%s",
 		__entry->name,
 		__entry->older,	/* older_than_this in jiffies */
 		__entry->age,	/* older_than_this in relative milliseconds */
 		__entry->moved,
-		__print_symbolic(__entry->reason, WB_WORK_REASON)
+		__print_symbolic(__entry->reason, WB_WORK_REASON),
+		__get_str(cgroup)
 	)
 );
 
@@ -381,11 +470,11 @@ TRACE_EVENT(global_dirty_state,
 
 TRACE_EVENT(bdi_dirty_ratelimit,
 
-	TP_PROTO(struct backing_dev_info *bdi,
+	TP_PROTO(struct bdi_writeback *wb,
 		 unsigned long dirty_rate,
 		 unsigned long task_ratelimit),
 
-	TP_ARGS(bdi, dirty_rate, task_ratelimit),
+	TP_ARGS(wb, dirty_rate, task_ratelimit),
 
 	TP_STRUCT__entry(
 		__array(char,		bdi, 32)
@@ -395,36 +484,39 @@ TRACE_EVENT(bdi_dirty_ratelimit,
 		__field(unsigned long,	dirty_ratelimit)
 		__field(unsigned long,	task_ratelimit)
 		__field(unsigned long,	balanced_dirty_ratelimit)
+		__dynamic_array(char, cgroup, __trace_wb_cgroup_size(wb))
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->bdi, dev_name(bdi->dev), 32);
-		__entry->write_bw	= KBps(bdi->wb.write_bandwidth);
-		__entry->avg_write_bw	= KBps(bdi->wb.avg_write_bandwidth);
+		strlcpy(__entry->bdi, dev_name(wb->bdi->dev), 32);
+		__entry->write_bw	= KBps(wb->write_bandwidth);
+		__entry->avg_write_bw	= KBps(wb->avg_write_bandwidth);
 		__entry->dirty_rate	= KBps(dirty_rate);
-		__entry->dirty_ratelimit = KBps(bdi->wb.dirty_ratelimit);
+		__entry->dirty_ratelimit = KBps(wb->dirty_ratelimit);
 		__entry->task_ratelimit	= KBps(task_ratelimit);
 		__entry->balanced_dirty_ratelimit =
-					KBps(bdi->wb.balanced_dirty_ratelimit);
+					KBps(wb->balanced_dirty_ratelimit);
+		__trace_wb_assign_cgroup(__get_str(cgroup), wb);
 	),
 
 	TP_printk("bdi %s: "
 		  "write_bw=%lu awrite_bw=%lu dirty_rate=%lu "
 		  "dirty_ratelimit=%lu task_ratelimit=%lu "
-		  "balanced_dirty_ratelimit=%lu",
+		  "balanced_dirty_ratelimit=%lu cgroup=%s",
 		  __entry->bdi,
 		  __entry->write_bw,		/* write bandwidth */
 		  __entry->avg_write_bw,	/* avg write bandwidth */
 		  __entry->dirty_rate,		/* bdi dirty rate */
 		  __entry->dirty_ratelimit,	/* base ratelimit */
 		  __entry->task_ratelimit, /* ratelimit with position control */
-		  __entry->balanced_dirty_ratelimit /* the balanced ratelimit */
+		  __entry->balanced_dirty_ratelimit, /* the balanced ratelimit */
+		  __get_str(cgroup)
 	)
 );
 
 TRACE_EVENT(balance_dirty_pages,
 
-	TP_PROTO(struct backing_dev_info *bdi,
+	TP_PROTO(struct bdi_writeback *wb,
 		 unsigned long thresh,
 		 unsigned long bg_thresh,
 		 unsigned long dirty,
@@ -437,7 +529,7 @@ TRACE_EVENT(balance_dirty_pages,
 		 long pause,
 		 unsigned long start_time),
 
-	TP_ARGS(bdi, thresh, bg_thresh, dirty, bdi_thresh, bdi_dirty,
+	TP_ARGS(wb, thresh, bg_thresh, dirty, bdi_thresh, bdi_dirty,
 		dirty_ratelimit, task_ratelimit,
 		dirtied, period, pause, start_time),
 
@@ -456,11 +548,12 @@ TRACE_EVENT(balance_dirty_pages,
 		__field(	 long,	pause)
 		__field(unsigned long,	period)
 		__field(	 long,	think)
+		__dynamic_array(char, cgroup, __trace_wb_cgroup_size(wb))
 	),
 
 	TP_fast_assign(
 		unsigned long freerun = (thresh + bg_thresh) / 2;
-		strlcpy(__entry->bdi, dev_name(bdi->dev), 32);
+		strlcpy(__entry->bdi, dev_name(wb->bdi->dev), 32);
 
 		__entry->limit		= global_wb_domain.dirty_limit;
 		__entry->setpoint	= (global_wb_domain.dirty_limit +
@@ -478,6 +571,7 @@ TRACE_EVENT(balance_dirty_pages,
 		__entry->period		= period * 1000 / HZ;
 		__entry->pause		= pause * 1000 / HZ;
 		__entry->paused		= (jiffies - start_time) * 1000 / HZ;
+		__trace_wb_assign_cgroup(__get_str(cgroup), wb);
 	),
 
 
@@ -486,7 +580,7 @@ TRACE_EVENT(balance_dirty_pages,
 		  "bdi_setpoint=%lu bdi_dirty=%lu "
 		  "dirty_ratelimit=%lu task_ratelimit=%lu "
 		  "dirtied=%u dirtied_pause=%u "
-		  "paused=%lu pause=%ld period=%lu think=%ld",
+		  "paused=%lu pause=%ld period=%lu think=%ld cgroup=%s",
 		  __entry->bdi,
 		  __entry->limit,
 		  __entry->setpoint,
@@ -500,7 +594,8 @@ TRACE_EVENT(balance_dirty_pages,
 		  __entry->paused,	/* ms */
 		  __entry->pause,	/* ms */
 		  __entry->period,	/* ms */
-		  __entry->think	/* ms */
+		  __entry->think,	/* ms */
+		  __get_str(cgroup)
 	  )
 );
 
@@ -514,6 +609,8 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 		__field(unsigned long, ino)
 		__field(unsigned long, state)
 		__field(unsigned long, dirtied_when)
+		__dynamic_array(char, cgroup,
+				__trace_wb_cgroup_size(inode_to_wb(inode)))
 	),
 
 	TP_fast_assign(
@@ -522,14 +619,16 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
+		__trace_wb_assign_cgroup(__get_str(cgroup), inode_to_wb(inode));
 	),
 
-	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu",
+	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu cgroup=%s",
 		  __entry->name,
 		  __entry->ino,
 		  show_inode_state(__entry->state),
 		  __entry->dirtied_when,
-		  (jiffies - __entry->dirtied_when) / HZ
+		  (jiffies - __entry->dirtied_when) / HZ,
+		  __get_str(cgroup)
 	)
 );
 
@@ -585,6 +684,7 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 		__field(unsigned long, writeback_index)
 		__field(long, nr_to_write)
 		__field(unsigned long, wrote)
+		__dynamic_array(char, cgroup, __trace_wbc_cgroup_size(wbc))
 	),
 
 	TP_fast_assign(
@@ -596,10 +696,11 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 		__entry->writeback_index = inode->i_mapping->writeback_index;
 		__entry->nr_to_write	= nr_to_write;
 		__entry->wrote		= nr_to_write - wbc->nr_to_write;
+		__trace_wbc_assign_cgroup(__get_str(cgroup), wbc);
 	),
 
 	TP_printk("bdi %s: ino=%lu state=%s dirtied_when=%lu age=%lu "
-		  "index=%lu to_write=%ld wrote=%lu",
+		  "index=%lu to_write=%ld wrote=%lu cgroup=%s",
 		  __entry->name,
 		  __entry->ino,
 		  show_inode_state(__entry->state),
@@ -607,7 +708,8 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 		  (jiffies - __entry->dirtied_when) / HZ,
 		  __entry->writeback_index,
 		  __entry->nr_to_write,
-		  __entry->wrote
+		  __entry->wrote,
+		  __get_str(cgroup)
 	)
 );
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 22cddd3..802beeb 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1289,7 +1289,7 @@ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc,
 	wb->dirty_ratelimit = max(dirty_ratelimit, 1UL);
 	wb->balanced_dirty_ratelimit = balanced_dirty_ratelimit;
 
-	trace_bdi_dirty_ratelimit(wb->bdi, dirty_rate, task_ratelimit);
+	trace_bdi_dirty_ratelimit(wb, dirty_rate, task_ratelimit);
 }
 
 static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc,
@@ -1683,7 +1683,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 		 * do a reset, as it may be a light dirtier.
 		 */
 		if (pause < min_pause) {
-			trace_balance_dirty_pages(bdi,
+			trace_balance_dirty_pages(wb,
 						  sdtc->thresh,
 						  sdtc->bg_thresh,
 						  sdtc->dirty,
@@ -1712,7 +1712,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 		}
 
 pause:
-		trace_balance_dirty_pages(bdi,
+		trace_balance_dirty_pages(wb,
 					  sdtc->thresh,
 					  sdtc->bg_thresh,
 					  sdtc->dirty,
-- 
2.4.3



* Re: [PATCH 4/5] kernfs: implement kernfs_path_len()
  2015-07-07 15:10 ` [PATCH 4/5] kernfs: implement kernfs_path_len() Tejun Heo
@ 2015-07-07 15:12   ` Tejun Heo
  2015-07-07 22:39     ` Greg Kroah-Hartman
       [not found]   ` <559CDA8B.6040909@siteground.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2015-07-07 15:12 UTC (permalink / raw)
  To: axboe; +Cc: jack, linux-kernel, cgroups, kernel-team, Greg Kroah-Hartman

On Tue, Jul 07, 2015 at 11:10:22AM -0400, Tejun Heo wrote:
> Add a function to determine the path length of a kernfs node.  This
> for now will be used by writeback tracepoint updates.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Greg, would it be okay to route this together with writeback changes?

Thanks.

-- 
tejun


* Re: [PATCH 4/5] kernfs: implement kernfs_path_len()
  2015-07-07 15:12   ` Tejun Heo
@ 2015-07-07 22:39     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2015-07-07 22:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, jack, linux-kernel, cgroups, kernel-team

On Tue, Jul 07, 2015 at 11:12:18AM -0400, Tejun Heo wrote:
> On Tue, Jul 07, 2015 at 11:10:22AM -0400, Tejun Heo wrote:
> > Add a function to determine the path length of a kernfs node.  This
> > for now will be used by writeback tracepoint updates.
> > 
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Greg, would it be okay to route this together with writeback changes?

No objection from me at all:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


* Re: [PATCH 4/5] kernfs: implement kernfs_path_len()
       [not found]   ` <559CDA8B.6040909@siteground.com>
@ 2015-07-09 21:41     ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-09 21:41 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Greg Kroah-Hartman, axboe, jack, linux-kernel, cgroups, kernel-team

Hello, Nikolay.

I restored the cc list.  Please use reply-to-all.

On Wed, Jul 08, 2015 at 11:08:43AM +0300, Nikolay Borisov wrote:
> > +size_t kernfs_path_len(struct kernfs_node *kn)
> > +{
> > +	size_t len = 0;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&kernfs_rename_lock, flags);
> > +
> > +	do {
> > +		len += strlen(kn->name) + 1;
> > +		kn = kn->parent;
> > +	} while (kn && kn->parent);
> > +
> > +	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
> > +
> > +	return len;
> > +}
> > +
> 
> Can you explain the reason why you need to disable irqs while
> executing this function?  Presumably it has to do with the context of
> its usage (tracepoints), but I wasn't able to find any information
> about the implications of interrupts being enabled while in a
> tracepoint.

It doesn't have much to do with the specific usage.
kernfs_rename_lock is irq-safe because we want to be able to call
functions like kernfs_name() and kernfs_path() regardless of the
current context.
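
Concretely, the scenario the irqsave variant guards against looks like
this (an illustrative sketch, not code from the patch):

	/* with the plain, non-irqsave lock: */
	spin_lock(&kernfs_rename_lock);
	/*
	 * If an interrupt arrives on this CPU here and its handler
	 * calls kernfs_path() or kernfs_name(), that call spins on a
	 * lock this CPU already holds: self-deadlock.  Disabling
	 * interrupts while the lock is held closes that window.
	 */
	spin_unlock(&kernfs_rename_lock);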

Thanks.

-- 
tejun


