All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4][V3] metadata throttling in writeback patches
@ 2016-09-20 20:57 ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

This is the latest set of patches based on my conversations with Jan and
Johannes.  The biggest change has been changing the metadata accounting counters
to be in bytes intead of pages in order to better support varying blocksizes.
I've also stopped messing with the other pagecache related counters so we can
keep them truly separate.  Johannes suggested this change and I simply convert
the bytes counter to pages when calculating dirty limits and such.

The other big change is changing WB_WRITTEN/WB_DIRTIED to be in bytes instead of
pages as well.  This is just a name and accounting change, it doesn't really
change the core logic at all.

I'm sending this out ahead of my full battery of tests, but I want to get
feedback on this direction as soon as possible.  In the meantime I've changed my
btrfs specific patches to work with these patches and am running long running
tests now to verify everything still works.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/4][V3] metadata throttling in writeback patches
@ 2016-09-20 20:57 ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

This is the latest set of patches based on my conversations with Jan and
Johannes.  The biggest change has been changing the metadata accounting counters
to be in bytes intead of pages in order to better support varying blocksizes.
I've also stopped messing with the other pagecache related counters so we can
keep them truly separate.  Johannes suggested this change and I simply convert
the bytes counter to pages when calculating dirty limits and such.

The other big change is changing WB_WRITTEN/WB_DIRTIED to be in bytes instead of
pages as well.  This is just a name and accounting change, it doesn't really
change the core logic at all.

I'm sending this out ahead of my full battery of tests, but I want to get
feedback on this direction as soon as possible.  In the meantime I've changed my
btrfs specific patches to work with these patches and am running long running
tests now to verify everything still works.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/4][V3] metadata throttling in writeback patches
@ 2016-09-20 20:57 ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

This is the latest set of patches based on my conversations with Jan and
Johannes.  The biggest change has been changing the metadata accounting counters
to be in bytes intead of pages in order to better support varying blocksizes.
I've also stopped messing with the other pagecache related counters so we can
keep them truly separate.  Johannes suggested this change and I simply convert
the bytes counter to pages when calculating dirty limits and such.

The other big change is changing WB_WRITTEN/WB_DIRTIED to be in bytes instead of
pages as well.  This is just a name and accounting change, it doesn't really
change the core logic at all.

I'm sending this out ahead of my full battery of tests, but I want to get
feedback on this direction as soon as possible.  In the meantime I've changed my
btrfs specific patches to work with these patches and am running long running
tests now to verify everything still works.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] remove mapping from balance_dirty_pages*()
  2016-09-20 20:57 ` Josef Bacik
  (?)
@ 2016-09-20 20:57   ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

The only reason we pass in the mapping is to get the inode in order to see if
writeback cgroups is enabled, and even then it only checks the bdi and a super
block flag.  balance_dirty_pages() doesn't even use the mapping.  Since
balance_dirty_pages*() works on a bdi level, just pass in the bdi and super
block directly so we can avoid using mapping.  This will allow us to still use
balance_dirty_pages for dirty metadata pages that are not backed by an
address_mapping.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/mtd/devices/block2mtd.c | 12 ++++++++----
 fs/btrfs/disk-io.c              |  4 ++--
 fs/btrfs/file.c                 |  3 ++-
 fs/btrfs/ioctl.c                |  3 ++-
 fs/btrfs/relocation.c           |  3 ++-
 fs/buffer.c                     |  3 ++-
 fs/iomap.c                      |  3 ++-
 fs/ntfs/attrib.c                | 10 +++++++---
 fs/ntfs/file.c                  |  4 ++--
 include/linux/backing-dev.h     | 29 +++++++++++++++++++++++------
 include/linux/writeback.h       |  3 ++-
 mm/filemap.c                    |  4 +++-
 mm/memory.c                     |  9 +++++++--
 mm/page-writeback.c             | 15 +++++++--------
 14 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 7c887f1..7892d0b 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -52,7 +52,8 @@ static struct page *page_read(struct address_space *mapping, int index)
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
 	int index = to >> PAGE_SHIFT;	// page index
 	int pages = len >> PAGE_SHIFT;
@@ -71,7 +72,8 @@ static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 				memset(page_address(page), 0xff, PAGE_SIZE);
 				set_page_dirty(page);
 				unlock_page(page);
-				balance_dirty_pages_ratelimited(mapping);
+				balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+								inode->i_sb);
 				break;
 			}
 
@@ -141,7 +143,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 		loff_t to, size_t len, size_t *retlen)
 {
 	struct page *page;
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	int index = to >> PAGE_SHIFT;	// page index
 	int offset = to & ~PAGE_MASK;	// page offset
 	int cpylen;
@@ -162,7 +165,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 			memcpy(page_address(page) + offset, buf, cpylen);
 			set_page_dirty(page);
 			unlock_page(page);
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 		put_page(page);
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87dad55..4034ad6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4024,8 +4024,8 @@ static void __btrfs_btree_balance_dirty(struct btrfs_root *root,
 	ret = percpu_counter_compare(&root->fs_info->dirty_metadata_bytes,
 				     BTRFS_DIRTY_METADATA_THRESH);
 	if (ret > 0) {
-		balance_dirty_pages_ratelimited(
-				   root->fs_info->btree_inode->i_mapping);
+		balance_dirty_pages_ratelimited(&root->fs_info->bdi,
+						root->fs_info->sb);
 	}
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9404121..f060b08 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1686,7 +1686,8 @@ again:
 
 		cond_resched();
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		if (dirty_pages < (root->nodesize >> PAGE_SHIFT) + 1)
 			btrfs_btree_balance_dirty(root);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14ed1e9..a222bad 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1410,7 +1410,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		}
 
 		defrag_count += ret;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		inode_unlock(inode);
 
 		if (newer_than) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b26a5ae..6e194a5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3202,7 +3202,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		put_page(page);
 
 		index++;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		btrfs_throttle(BTRFS_I(inode)->root);
 	}
 	WARN_ON(nr != cluster->nr);
diff --git a/fs/buffer.c b/fs/buffer.c
index 9c8eb9b..9bbe30d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2386,7 +2386,8 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
 		BUG_ON(err != len);
 		err = 0;
 
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 
 		if (unlikely(fatal_signal_pending(current))) {
 			err = -EINTR;
diff --git a/fs/iomap.c b/fs/iomap.c
index 48141b8..937e266 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -226,7 +226,8 @@ again:
 		written += copied;
 		length -= copied;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	} while (iov_iter_count(i) && length);
 
 	return written ? written : status;
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
index 44a39a0..0a8a39e 100644
--- a/fs/ntfs/attrib.c
+++ b/fs/ntfs/attrib.c
@@ -2493,6 +2493,7 @@ conv_err_out:
 int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 {
 	ntfs_volume *vol = ni->vol;
+	struct inode *inode = VFS_I(ni);
 	struct address_space *mapping;
 	struct page *page;
 	u8 *kaddr;
@@ -2545,7 +2546,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		kunmap_atomic(kaddr);
 		set_page_dirty(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 		if (idx == end)
 			goto done;
@@ -2586,7 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		/* Finally unlock and release the page. */
 		unlock_page(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 	}
 	/* If there is a last partial page, need to do it the slow way. */
@@ -2603,7 +2606,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		kunmap_atomic(kaddr);
 		set_page_dirty(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 	}
 done:
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index f548629..66082eb 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -276,7 +276,7 @@ do_non_resident_extend:
 		 * number of pages we read and make dirty in the case of sparse
 		 * files.
 		 */
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(vi), vi->i_sb);
 		cond_resched();
 	} while (++index < end_index);
 	read_lock_irqsave(&ni->size_lock, flags);
@@ -1914,7 +1914,7 @@ again:
 		iov_iter_advance(i, copied);
 		pos += copied;
 		written += copied;
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(vi), vi->i_sb);
 		if (fatal_signal_pending(current)) {
 			status = -EINTR;
 			break;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 491a917..089acf6 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -252,8 +252,9 @@ void wb_blkcg_offline(struct blkcg *blkcg);
 int inode_congested(struct inode *inode, int cong_bits);
 
 /**
- * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
- * @inode: inode of interest
+ * bdi_cgwb_enabled - test wether cgroup writeback is enabled on a filesystem
+ * @bdi: the bdi we care about
+ * @sb: the super for the bdi
  *
  * cgroup writeback requires support from both the bdi and filesystem.
  * Also, both memcg and iocg have to be on the default hierarchy.  Test
@@ -262,15 +263,25 @@ int inode_congested(struct inode *inode, int cong_bits);
  * Note that the test result may change dynamically on the same inode
  * depending on how memcg and iocg are configured.
  */
-static inline bool inode_cgwb_enabled(struct inode *inode)
+static inline bool bdi_cgwb_enabled(struct backing_dev_info *bdi,
+				    struct super_block *sb)
 {
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
-
 	return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
 		cgroup_subsys_on_dfl(io_cgrp_subsys) &&
 		bdi_cap_account_dirty(bdi) &&
 		(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
-		(inode->i_sb->s_iflags & SB_I_CGROUPWB);
+		(sb->s_iflags & SB_I_CGROUPWB);
+}
+
+/**
+ * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
+ * @inode: inode of interest
+ *
+ * Does the inode have cgroup writeback support.
+ */
+static inline bool inode_cgwb_enabled(struct inode *inode)
+{
+	return bdi_cgwb_enabled(inode_to_bdi(inode), inode->i_sb);
 }
 
 /**
@@ -413,6 +424,12 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
 
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
+static inline bool bdi_cgwb_enabled(struct backing_dev_info *bdi,
+				    struct super_block *sb)
+{
+	return false;
+}
+
 static inline bool inode_cgwb_enabled(struct inode *inode)
 {
 	return false;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fc1e16c..256ffc3 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -364,7 +364,8 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
 
 void wb_update_bandwidth(struct bdi_writeback *wb, unsigned long start_time);
 void page_writeback_init(void);
-void balance_dirty_pages_ratelimited(struct address_space *mapping);
+void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
+				     struct super_block *sb);
 bool wb_over_bg_thresh(struct bdi_writeback *wb);
 
 typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/filemap.c b/mm/filemap.c
index 3083ded..abb0e98 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2667,6 +2667,7 @@ ssize_t generic_perform_write(struct file *file,
 				struct iov_iter *i, loff_t pos)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
 	const struct address_space_operations *a_ops = mapping->a_ops;
 	long status = 0;
 	ssize_t written = 0;
@@ -2746,7 +2747,8 @@ again:
 		pos += copied;
 		written += copied;
 
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	} while (iov_iter_count(i));
 
 	return written ? written : status;
diff --git a/mm/memory.c b/mm/memory.c
index 83be99d..d43e73b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -64,6 +64,7 @@
 #include <linux/debugfs.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/dax.h>
+#include <linux/backing-dev.h>
 
 #include <asm/io.h>
 #include <asm/mmu_context.h>
@@ -2105,11 +2106,13 @@ static inline int wp_page_reuse(struct fault_env *fe, pte_t orig_pte,
 		put_page(page);
 
 		if ((dirtied || page_mkwrite) && mapping) {
+			struct inode *inode = mapping->host;
 			/*
 			 * Some device drivers do not set page.mapping
 			 * but still dirty their pages
 			 */
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 
 		if (!page_mkwrite)
@@ -3291,11 +3294,13 @@ static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 	mapping = page_rmapping(fault_page);
 	unlock_page(fault_page);
 	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
+		struct inode *inode = mapping->host;
 		/*
 		 * Some device drivers do not set page.mapping but still
 		 * dirty their pages
 		 */
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	}
 
 	if (!vma->vm_ops->page_mkwrite)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f4cd7d8..121a6e3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1559,8 +1559,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
  * If we're over `background_thresh' then the writeback threads are woken to
  * perform some writeout.
  */
-static void balance_dirty_pages(struct address_space *mapping,
-				struct bdi_writeback *wb,
+static void balance_dirty_pages(struct bdi_writeback *wb,
 				unsigned long pages_dirtied)
 {
 	struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
@@ -1849,7 +1848,8 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
 
 /**
  * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
+ * @bdi: the bdi that was dirtied
+ * @sb: the super block that was dirtied
  *
  * Processes which are dirtying memory should call in here once for each page
  * which was newly dirtied.  The function will periodically check the system's
@@ -1860,10 +1860,9 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
  * limit we decrease the ratelimiting by a lot, to prevent individual processes
  * from overshooting the limit by (ratelimit_pages) each.
  */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
+				     struct super_block *sb)
 {
-	struct inode *inode = mapping->host;
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
 	struct bdi_writeback *wb = NULL;
 	int ratelimit;
 	int *p;
@@ -1871,7 +1870,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	if (!bdi_cap_account_dirty(bdi))
 		return;
 
-	if (inode_cgwb_enabled(inode))
+	if (bdi_cgwb_enabled(bdi, sb))
 		wb = wb_get_create_current(bdi, GFP_KERNEL);
 	if (!wb)
 		wb = &bdi->wb;
@@ -1909,7 +1908,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(mapping, wb, current->nr_dirtied);
+		balance_dirty_pages(wb, current->nr_dirtied);
 
 	wb_put(wb);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 1/4] remove mapping from balance_dirty_pages*()
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

The only reason we pass in the mapping is to get the inode in order to see if
writeback cgroups is enabled, and even then it only checks the bdi and a super
block flag.  balance_dirty_pages() doesn't even use the mapping.  Since
balance_dirty_pages*() works on a bdi level, just pass in the bdi and super
block directly so we can avoid using mapping.  This will allow us to still use
balance_dirty_pages for dirty metadata pages that are not backed by an
address_mapping.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/mtd/devices/block2mtd.c | 12 ++++++++----
 fs/btrfs/disk-io.c              |  4 ++--
 fs/btrfs/file.c                 |  3 ++-
 fs/btrfs/ioctl.c                |  3 ++-
 fs/btrfs/relocation.c           |  3 ++-
 fs/buffer.c                     |  3 ++-
 fs/iomap.c                      |  3 ++-
 fs/ntfs/attrib.c                | 10 +++++++---
 fs/ntfs/file.c                  |  4 ++--
 include/linux/backing-dev.h     | 29 +++++++++++++++++++++++------
 include/linux/writeback.h       |  3 ++-
 mm/filemap.c                    |  4 +++-
 mm/memory.c                     |  9 +++++++--
 mm/page-writeback.c             | 15 +++++++--------
 14 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 7c887f1..7892d0b 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -52,7 +52,8 @@ static struct page *page_read(struct address_space *mapping, int index)
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
 	int index = to >> PAGE_SHIFT;	// page index
 	int pages = len >> PAGE_SHIFT;
@@ -71,7 +72,8 @@ static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 				memset(page_address(page), 0xff, PAGE_SIZE);
 				set_page_dirty(page);
 				unlock_page(page);
-				balance_dirty_pages_ratelimited(mapping);
+				balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+								inode->i_sb);
 				break;
 			}
 
@@ -141,7 +143,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 		loff_t to, size_t len, size_t *retlen)
 {
 	struct page *page;
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	int index = to >> PAGE_SHIFT;	// page index
 	int offset = to & ~PAGE_MASK;	// page offset
 	int cpylen;
@@ -162,7 +165,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 			memcpy(page_address(page) + offset, buf, cpylen);
 			set_page_dirty(page);
 			unlock_page(page);
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 		put_page(page);
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87dad55..4034ad6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4024,8 +4024,8 @@ static void __btrfs_btree_balance_dirty(struct btrfs_root *root,
 	ret = percpu_counter_compare(&root->fs_info->dirty_metadata_bytes,
 				     BTRFS_DIRTY_METADATA_THRESH);
 	if (ret > 0) {
-		balance_dirty_pages_ratelimited(
-				   root->fs_info->btree_inode->i_mapping);
+		balance_dirty_pages_ratelimited(&root->fs_info->bdi,
+						root->fs_info->sb);
 	}
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9404121..f060b08 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1686,7 +1686,8 @@ again:
 
 		cond_resched();
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		if (dirty_pages < (root->nodesize >> PAGE_SHIFT) + 1)
 			btrfs_btree_balance_dirty(root);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14ed1e9..a222bad 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1410,7 +1410,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		}
 
 		defrag_count += ret;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		inode_unlock(inode);
 
 		if (newer_than) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b26a5ae..6e194a5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3202,7 +3202,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		put_page(page);
 
 		index++;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		btrfs_throttle(BTRFS_I(inode)->root);
 	}
 	WARN_ON(nr != cluster->nr);
diff --git a/fs/buffer.c b/fs/buffer.c
index 9c8eb9b..9bbe30d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2386,7 +2386,8 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
 		BUG_ON(err != len);
 		err = 0;
 
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 
 		if (unlikely(fatal_signal_pending(current))) {
 			err = -EINTR;
diff --git a/fs/iomap.c b/fs/iomap.c
index 48141b8..937e266 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -226,7 +226,8 @@ again:
 		written += copied;
 		length -= copied;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	} while (iov_iter_count(i) && length);
 
 	return written ? written : status;
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
index 44a39a0..0a8a39e 100644
--- a/fs/ntfs/attrib.c
+++ b/fs/ntfs/attrib.c
@@ -2493,6 +2493,7 @@ conv_err_out:
 int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 {
 	ntfs_volume *vol = ni->vol;
+	struct inode *inode = VFS_I(ni);
 	struct address_space *mapping;
 	struct page *page;
 	u8 *kaddr;
@@ -2545,7 +2546,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		kunmap_atomic(kaddr);
 		set_page_dirty(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 		if (idx == end)
 			goto done;
@@ -2586,7 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		/* Finally unlock and release the page. */
 		unlock_page(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 	}
 	/* If there is a last partial page, need to do it the slow way. */
@@ -2603,7 +2606,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		kunmap_atomic(kaddr);
 		set_page_dirty(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 	}
 done:
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index f548629..66082eb 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -276,7 +276,7 @@ do_non_resident_extend:
 		 * number of pages we read and make dirty in the case of sparse
 		 * files.
 		 */
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(vi), vi->i_sb);
 		cond_resched();
 	} while (++index < end_index);
 	read_lock_irqsave(&ni->size_lock, flags);
@@ -1914,7 +1914,7 @@ again:
 		iov_iter_advance(i, copied);
 		pos += copied;
 		written += copied;
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(vi), vi->i_sb);
 		if (fatal_signal_pending(current)) {
 			status = -EINTR;
 			break;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 491a917..089acf6 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -252,8 +252,9 @@ void wb_blkcg_offline(struct blkcg *blkcg);
 int inode_congested(struct inode *inode, int cong_bits);
 
 /**
- * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
- * @inode: inode of interest
+ * bdi_cgwb_enabled - test wether cgroup writeback is enabled on a filesystem
+ * @bdi: the bdi we care about
+ * @sb: the super for the bdi
  *
  * cgroup writeback requires support from both the bdi and filesystem.
  * Also, both memcg and iocg have to be on the default hierarchy.  Test
@@ -262,15 +263,25 @@ int inode_congested(struct inode *inode, int cong_bits);
  * Note that the test result may change dynamically on the same inode
  * depending on how memcg and iocg are configured.
  */
-static inline bool inode_cgwb_enabled(struct inode *inode)
+static inline bool bdi_cgwb_enabled(struct backing_dev_info *bdi,
+				    struct super_block *sb)
 {
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
-
 	return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
 		cgroup_subsys_on_dfl(io_cgrp_subsys) &&
 		bdi_cap_account_dirty(bdi) &&
 		(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
-		(inode->i_sb->s_iflags & SB_I_CGROUPWB);
+		(sb->s_iflags & SB_I_CGROUPWB);
+}
+
+/**
+ * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
+ * @inode: inode of interest
+ *
+ * Does the inode have cgroup writeback support.
+ */
+static inline bool inode_cgwb_enabled(struct inode *inode)
+{
+	return bdi_cgwb_enabled(inode_to_bdi(inode), inode->i_sb);
 }
 
 /**
@@ -413,6 +424,12 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
 
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
+static inline bool bdi_cgwb_enabled(struct backing_dev_info *bdi,
+				    struct super_block *sb)
+{
+	return false;
+}
+
 static inline bool inode_cgwb_enabled(struct inode *inode)
 {
 	return false;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fc1e16c..256ffc3 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -364,7 +364,8 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
 
 void wb_update_bandwidth(struct bdi_writeback *wb, unsigned long start_time);
 void page_writeback_init(void);
-void balance_dirty_pages_ratelimited(struct address_space *mapping);
+void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
+				     struct super_block *sb);
 bool wb_over_bg_thresh(struct bdi_writeback *wb);
 
 typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/filemap.c b/mm/filemap.c
index 3083ded..abb0e98 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2667,6 +2667,7 @@ ssize_t generic_perform_write(struct file *file,
 				struct iov_iter *i, loff_t pos)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
 	const struct address_space_operations *a_ops = mapping->a_ops;
 	long status = 0;
 	ssize_t written = 0;
@@ -2746,7 +2747,8 @@ again:
 		pos += copied;
 		written += copied;
 
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	} while (iov_iter_count(i));
 
 	return written ? written : status;
diff --git a/mm/memory.c b/mm/memory.c
index 83be99d..d43e73b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -64,6 +64,7 @@
 #include <linux/debugfs.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/dax.h>
+#include <linux/backing-dev.h>
 
 #include <asm/io.h>
 #include <asm/mmu_context.h>
@@ -2105,11 +2106,13 @@ static inline int wp_page_reuse(struct fault_env *fe, pte_t orig_pte,
 		put_page(page);
 
 		if ((dirtied || page_mkwrite) && mapping) {
+			struct inode *inode = mapping->host;
 			/*
 			 * Some device drivers do not set page.mapping
 			 * but still dirty their pages
 			 */
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 
 		if (!page_mkwrite)
@@ -3291,11 +3294,13 @@ static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 	mapping = page_rmapping(fault_page);
 	unlock_page(fault_page);
 	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
+		struct inode *inode = mapping->host;
 		/*
 		 * Some device drivers do not set page.mapping but still
 		 * dirty their pages
 		 */
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	}
 
 	if (!vma->vm_ops->page_mkwrite)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f4cd7d8..121a6e3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1559,8 +1559,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
  * If we're over `background_thresh' then the writeback threads are woken to
  * perform some writeout.
  */
-static void balance_dirty_pages(struct address_space *mapping,
-				struct bdi_writeback *wb,
+static void balance_dirty_pages(struct bdi_writeback *wb,
 				unsigned long pages_dirtied)
 {
 	struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
@@ -1849,7 +1848,8 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
 
 /**
  * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
+ * @bdi: the bdi that was dirtied
+ * @sb: the super block that was dirtied
  *
  * Processes which are dirtying memory should call in here once for each page
  * which was newly dirtied.  The function will periodically check the system's
@@ -1860,10 +1860,9 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
  * limit we decrease the ratelimiting by a lot, to prevent individual processes
  * from overshooting the limit by (ratelimit_pages) each.
  */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
+				     struct super_block *sb)
 {
-	struct inode *inode = mapping->host;
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
 	struct bdi_writeback *wb = NULL;
 	int ratelimit;
 	int *p;
@@ -1871,7 +1870,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	if (!bdi_cap_account_dirty(bdi))
 		return;
 
-	if (inode_cgwb_enabled(inode))
+	if (bdi_cgwb_enabled(bdi, sb))
 		wb = wb_get_create_current(bdi, GFP_KERNEL);
 	if (!wb)
 		wb = &bdi->wb;
@@ -1909,7 +1908,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(mapping, wb, current->nr_dirtied);
+		balance_dirty_pages(wb, current->nr_dirtied);
 
 	wb_put(wb);
 }
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 1/4] remove mapping from balance_dirty_pages*()
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

The only reason we pass in the mapping is to get the inode in order to see if
writeback cgroups is enabled, and even then it only checks the bdi and a super
block flag.  balance_dirty_pages() doesn't even use the mapping.  Since
balance_dirty_pages*() works on a bdi level, just pass in the bdi and super
block directly so we can avoid using mapping.  This will allow us to still use
balance_dirty_pages for dirty metadata pages that are not backed by an
address_mapping.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/mtd/devices/block2mtd.c | 12 ++++++++----
 fs/btrfs/disk-io.c              |  4 ++--
 fs/btrfs/file.c                 |  3 ++-
 fs/btrfs/ioctl.c                |  3 ++-
 fs/btrfs/relocation.c           |  3 ++-
 fs/buffer.c                     |  3 ++-
 fs/iomap.c                      |  3 ++-
 fs/ntfs/attrib.c                | 10 +++++++---
 fs/ntfs/file.c                  |  4 ++--
 include/linux/backing-dev.h     | 29 +++++++++++++++++++++++------
 include/linux/writeback.h       |  3 ++-
 mm/filemap.c                    |  4 +++-
 mm/memory.c                     |  9 +++++++--
 mm/page-writeback.c             | 15 +++++++--------
 14 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 7c887f1..7892d0b 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -52,7 +52,8 @@ static struct page *page_read(struct address_space *mapping, int index)
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
 	int index = to >> PAGE_SHIFT;	// page index
 	int pages = len >> PAGE_SHIFT;
@@ -71,7 +72,8 @@ static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 				memset(page_address(page), 0xff, PAGE_SIZE);
 				set_page_dirty(page);
 				unlock_page(page);
-				balance_dirty_pages_ratelimited(mapping);
+				balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+								inode->i_sb);
 				break;
 			}
 
@@ -141,7 +143,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 		loff_t to, size_t len, size_t *retlen)
 {
 	struct page *page;
-	struct address_space *mapping = dev->blkdev->bd_inode->i_mapping;
+	struct inode *inode = dev->blkdev->bd_inode;
+	struct address_space *mapping = inode->i_mapping;
 	int index = to >> PAGE_SHIFT;	// page index
 	int offset = to & ~PAGE_MASK;	// page offset
 	int cpylen;
@@ -162,7 +165,8 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 			memcpy(page_address(page) + offset, buf, cpylen);
 			set_page_dirty(page);
 			unlock_page(page);
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 		put_page(page);
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87dad55..4034ad6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4024,8 +4024,8 @@ static void __btrfs_btree_balance_dirty(struct btrfs_root *root,
 	ret = percpu_counter_compare(&root->fs_info->dirty_metadata_bytes,
 				     BTRFS_DIRTY_METADATA_THRESH);
 	if (ret > 0) {
-		balance_dirty_pages_ratelimited(
-				   root->fs_info->btree_inode->i_mapping);
+		balance_dirty_pages_ratelimited(&root->fs_info->bdi,
+						root->fs_info->sb);
 	}
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9404121..f060b08 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1686,7 +1686,8 @@ again:
 
 		cond_resched();
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		if (dirty_pages < (root->nodesize >> PAGE_SHIFT) + 1)
 			btrfs_btree_balance_dirty(root);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14ed1e9..a222bad 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1410,7 +1410,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		}
 
 		defrag_count += ret;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		inode_unlock(inode);
 
 		if (newer_than) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b26a5ae..6e194a5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3202,7 +3202,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		put_page(page);
 
 		index++;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		btrfs_throttle(BTRFS_I(inode)->root);
 	}
 	WARN_ON(nr != cluster->nr);
diff --git a/fs/buffer.c b/fs/buffer.c
index 9c8eb9b..9bbe30d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2386,7 +2386,8 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
 		BUG_ON(err != len);
 		err = 0;
 
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 
 		if (unlikely(fatal_signal_pending(current))) {
 			err = -EINTR;
diff --git a/fs/iomap.c b/fs/iomap.c
index 48141b8..937e266 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -226,7 +226,8 @@ again:
 		written += copied;
 		length -= copied;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	} while (iov_iter_count(i) && length);
 
 	return written ? written : status;
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
index 44a39a0..0a8a39e 100644
--- a/fs/ntfs/attrib.c
+++ b/fs/ntfs/attrib.c
@@ -2493,6 +2493,7 @@ conv_err_out:
 int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 {
 	ntfs_volume *vol = ni->vol;
+	struct inode *inode = VFS_I(ni);
 	struct address_space *mapping;
 	struct page *page;
 	u8 *kaddr;
@@ -2545,7 +2546,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		kunmap_atomic(kaddr);
 		set_page_dirty(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 		if (idx == end)
 			goto done;
@@ -2586,7 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		/* Finally unlock and release the page. */
 		unlock_page(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 	}
 	/* If there is a last partial page, need to do it the slow way. */
@@ -2603,7 +2606,8 @@ int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
 		kunmap_atomic(kaddr);
 		set_page_dirty(page);
 		put_page(page);
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 		cond_resched();
 	}
 done:
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index f548629..66082eb 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -276,7 +276,7 @@ do_non_resident_extend:
 		 * number of pages we read and make dirty in the case of sparse
 		 * files.
 		 */
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(vi), vi->i_sb);
 		cond_resched();
 	} while (++index < end_index);
 	read_lock_irqsave(&ni->size_lock, flags);
@@ -1914,7 +1914,7 @@ again:
 		iov_iter_advance(i, copied);
 		pos += copied;
 		written += copied;
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(vi), vi->i_sb);
 		if (fatal_signal_pending(current)) {
 			status = -EINTR;
 			break;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 491a917..089acf6 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -252,8 +252,9 @@ void wb_blkcg_offline(struct blkcg *blkcg);
 int inode_congested(struct inode *inode, int cong_bits);
 
 /**
- * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
- * @inode: inode of interest
+ * bdi_cgwb_enabled - test wether cgroup writeback is enabled on a filesystem
+ * @bdi: the bdi we care about
+ * @sb: the super for the bdi
  *
  * cgroup writeback requires support from both the bdi and filesystem.
  * Also, both memcg and iocg have to be on the default hierarchy.  Test
@@ -262,15 +263,25 @@ int inode_congested(struct inode *inode, int cong_bits);
  * Note that the test result may change dynamically on the same inode
  * depending on how memcg and iocg are configured.
  */
-static inline bool inode_cgwb_enabled(struct inode *inode)
+static inline bool bdi_cgwb_enabled(struct backing_dev_info *bdi,
+				    struct super_block *sb)
 {
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
-
 	return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
 		cgroup_subsys_on_dfl(io_cgrp_subsys) &&
 		bdi_cap_account_dirty(bdi) &&
 		(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
-		(inode->i_sb->s_iflags & SB_I_CGROUPWB);
+		(sb->s_iflags & SB_I_CGROUPWB);
+}
+
+/**
+ * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
+ * @inode: inode of interest
+ *
+ * Does the inode have cgroup writeback support.
+ */
+static inline bool inode_cgwb_enabled(struct inode *inode)
+{
+	return bdi_cgwb_enabled(inode_to_bdi(inode), inode->i_sb);
 }
 
 /**
@@ -413,6 +424,12 @@ static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
 
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
+static inline bool bdi_cgwb_enabled(struct backing_dev_info *bdi,
+				    struct super_block *sb)
+{
+	return false;
+}
+
 static inline bool inode_cgwb_enabled(struct inode *inode)
 {
 	return false;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fc1e16c..256ffc3 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -364,7 +364,8 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
 
 void wb_update_bandwidth(struct bdi_writeback *wb, unsigned long start_time);
 void page_writeback_init(void);
-void balance_dirty_pages_ratelimited(struct address_space *mapping);
+void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
+				     struct super_block *sb);
 bool wb_over_bg_thresh(struct bdi_writeback *wb);
 
 typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/filemap.c b/mm/filemap.c
index 3083ded..abb0e98 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2667,6 +2667,7 @@ ssize_t generic_perform_write(struct file *file,
 				struct iov_iter *i, loff_t pos)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
 	const struct address_space_operations *a_ops = mapping->a_ops;
 	long status = 0;
 	ssize_t written = 0;
@@ -2746,7 +2747,8 @@ again:
 		pos += copied;
 		written += copied;
 
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	} while (iov_iter_count(i));
 
 	return written ? written : status;
diff --git a/mm/memory.c b/mm/memory.c
index 83be99d..d43e73b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -64,6 +64,7 @@
 #include <linux/debugfs.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/dax.h>
+#include <linux/backing-dev.h>
 
 #include <asm/io.h>
 #include <asm/mmu_context.h>
@@ -2105,11 +2106,13 @@ static inline int wp_page_reuse(struct fault_env *fe, pte_t orig_pte,
 		put_page(page);
 
 		if ((dirtied || page_mkwrite) && mapping) {
+			struct inode *inode = mapping->host;
 			/*
 			 * Some device drivers do not set page.mapping
 			 * but still dirty their pages
 			 */
-			balance_dirty_pages_ratelimited(mapping);
+			balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+							inode->i_sb);
 		}
 
 		if (!page_mkwrite)
@@ -3291,11 +3294,13 @@ static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 	mapping = page_rmapping(fault_page);
 	unlock_page(fault_page);
 	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
+		struct inode *inode = mapping->host;
 		/*
 		 * Some device drivers do not set page.mapping but still
 		 * dirty their pages
 		 */
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited(inode_to_bdi(inode),
+						inode->i_sb);
 	}
 
 	if (!vma->vm_ops->page_mkwrite)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f4cd7d8..121a6e3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1559,8 +1559,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
  * If we're over `background_thresh' then the writeback threads are woken to
  * perform some writeout.
  */
-static void balance_dirty_pages(struct address_space *mapping,
-				struct bdi_writeback *wb,
+static void balance_dirty_pages(struct bdi_writeback *wb,
 				unsigned long pages_dirtied)
 {
 	struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
@@ -1849,7 +1848,8 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
 
 /**
  * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
+ * @bdi: the bdi that was dirtied
+ * @sb: the super block that was dirtied
  *
  * Processes which are dirtying memory should call in here once for each page
  * which was newly dirtied.  The function will periodically check the system's
@@ -1860,10 +1860,9 @@ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
  * limit we decrease the ratelimiting by a lot, to prevent individual processes
  * from overshooting the limit by (ratelimit_pages) each.
  */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
+				     struct super_block *sb)
 {
-	struct inode *inode = mapping->host;
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
 	struct bdi_writeback *wb = NULL;
 	int ratelimit;
 	int *p;
@@ -1871,7 +1870,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	if (!bdi_cap_account_dirty(bdi))
 		return;
 
-	if (inode_cgwb_enabled(inode))
+	if (bdi_cgwb_enabled(bdi, sb))
 		wb = wb_get_create_current(bdi, GFP_KERNEL);
 	if (!wb)
 		wb = &bdi->wb;
@@ -1909,7 +1908,7 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(mapping, wb, current->nr_dirtied);
+		balance_dirty_pages(wb, current->nr_dirtied);
 
 	wb_put(wb);
 }
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] writeback: allow for dirty metadata accounting
  2016-09-20 20:57 ` Josef Bacik
  (?)
@ 2016-09-20 20:57   ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

Btrfs has no bounds except memory on the amount of dirty memory that we have in
use for metadata.  Historically we have used a special inode so we could take
advantage of the balance_dirty_pages throttling that comes with using pagecache.
However as we'd like to support different blocksizes it would be nice to not
have to rely on pagecache, but still get the balance_dirty_pages throttling
without having to do it ourselves.

So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
zone and bdi_writeback counters to keep track of how many bytes we have in
flight for METADATA.  We need to count in bytes as blocksizes could be
percentages of pagesize.  We simply convert the bytes to number of pages where
it is needed for the throttling.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 arch/tile/mm/pgtable.c           |   3 +-
 drivers/base/node.c              |   6 ++
 fs/fs-writeback.c                |   2 +
 fs/proc/meminfo.c                |   5 ++
 include/linux/backing-dev-defs.h |   2 +
 include/linux/mm.h               |   9 +++
 include/linux/mmzone.h           |   2 +
 include/trace/events/writeback.h |  13 +++-
 mm/backing-dev.c                 |   5 ++
 mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                  |  16 +++-
 mm/vmscan.c                      |   4 +-
 12 files changed, 200 insertions(+), 24 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 7cc6ee7..9543468 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
 {
 	struct zone *zone;
 
-	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
+	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
 	       (global_node_page_state(NR_ACTIVE_ANON) +
 		global_node_page_state(NR_ACTIVE_FILE)),
 	       (global_node_page_state(NR_INACTIVE_ANON) +
 		global_node_page_state(NR_INACTIVE_FILE)),
 	       global_node_page_state(NR_FILE_DIRTY),
+	       global_node_page_state(NR_METADATA_DIRTY),
 	       global_node_page_state(NR_WRITEBACK),
 	       global_node_page_state(NR_UNSTABLE_NFS),
 	       global_page_state(NR_FREE_PAGES),
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f96..3615264 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -51,6 +51,8 @@ static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
+
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
 {
@@ -99,7 +101,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 	n += sprintf(buf + n,
 		       "Node %d Dirty:          %8lu kB\n"
+		       "Node %d MetadataDirty:	%8lu kB\n"
 		       "Node %d Writeback:      %8lu kB\n"
+		       "Node %d MetaWriteback:  %8lu kB\n"
 		       "Node %d FilePages:      %8lu kB\n"
 		       "Node %d Mapped:         %8lu kB\n"
 		       "Node %d AnonPages:      %8lu kB\n"
@@ -119,7 +123,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 			,
 		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
+		       nid, BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
+		       nid, BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
 		       nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda..aafdb11 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1801,6 +1801,7 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb)
 	return work;
 }
 
+#define BtoP(x) ((x) >> PAGE_SHIFT)
 /*
  * Add in the number of potentially dirty inodes, because each inode
  * write can dirty pagecache in the underlying blockdev.
@@ -1809,6 +1810,7 @@ static unsigned long get_nr_dirty_pages(void)
 {
 	return global_node_page_state(NR_FILE_DIRTY) +
 		global_node_page_state(NR_UNSTABLE_NFS) +
+		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)) +
 		get_nr_dirty_inodes();
 }
 
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 09e18fd..95b0d8a 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -36,6 +36,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
  * display in kilobytes.
  */
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
 	si_meminfo(&i);
 	si_swapinfo(&i);
 	committed = percpu_counter_read_positive(&vm_committed_as);
@@ -80,7 +81,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		"SwapTotal:      %8lu kB\n"
 		"SwapFree:       %8lu kB\n"
 		"Dirty:          %8lu kB\n"
+		"MetadataDirty:  %8lu kB\n"
 		"Writeback:      %8lu kB\n"
+		"MetaWriteback:  %8lu kB\n"
 		"AnonPages:      %8lu kB\n"
 		"Mapped:         %8lu kB\n"
 		"Shmem:          %8lu kB\n"
@@ -139,7 +142,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		K(i.totalswap),
 		K(i.freeswap),
 		K(global_node_page_state(NR_FILE_DIRTY)),
+		BtoK(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
 		K(global_node_page_state(NR_WRITEBACK)),
+		BtoK(global_node_page_state(NR_METADATA_WRITEBACK_BYTES)),
 		K(global_node_page_state(NR_ANON_MAPPED)),
 		K(global_node_page_state(NR_FILE_MAPPED)),
 		K(i.sharedram),
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 3f10307..1a7c3c1 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -34,6 +34,8 @@ typedef int (congested_fn)(void *, int);
 enum wb_stat_item {
 	WB_RECLAIMABLE,
 	WB_WRITEBACK,
+	WB_METADATA_DIRTY_BYTES,
+	WB_METADATA_WRITEBACK_BYTES,
 	WB_DIRTIED,
 	WB_WRITTEN,
 	NR_WB_STAT_ITEMS
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..6d2e3e8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -31,6 +31,7 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 struct bdi_writeback;
+struct backing_dev_info;
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES	/* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
@@ -1363,6 +1364,14 @@ int redirty_page_for_writepage(struct writeback_control *wbc,
 void account_page_dirtied(struct page *page, struct address_space *mapping);
 void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb);
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
+			      long bytes);
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi,
+			      long bytes);
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi, long bytes);
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi, long bytes);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 void cancel_dirty_page(struct page *page);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f2e4e90..5d4c443 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -167,6 +167,8 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_METADATA_DIRTY_BYTES,	/* Metadata dirty bytes */
+	NR_METADATA_WRITEBACK_BYTES,	/* Metadata writeback bytes */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 2ccd9cc..f97c8de 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -390,6 +390,8 @@ TRACE_EVENT(writeback_queue_io,
 	)
 );
 
+#define BtoP(x) ((x) >> PAGE_SHIFT)
+
 TRACE_EVENT(global_dirty_state,
 
 	TP_PROTO(unsigned long background_thresh,
@@ -402,7 +404,9 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_STRUCT__entry(
 		__field(unsigned long,	nr_dirty)
+		__field(unsigned long,	nr_metadata_dirty)
 		__field(unsigned long,	nr_writeback)
+		__field(unsigned long,	nr_metadata_writeback)
 		__field(unsigned long,	nr_unstable)
 		__field(unsigned long,	background_thresh)
 		__field(unsigned long,	dirty_thresh)
@@ -413,7 +417,9 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_fast_assign(
 		__entry->nr_dirty	= global_node_page_state(NR_FILE_DIRTY);
+		__entry->nr_metadata_dirty = BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES));
 		__entry->nr_writeback	= global_node_page_state(NR_WRITEBACK);
+		__entry->nr_metadata_dirty = BtoP(global_node_page_state(NR_METADATA_WRITEBACK_BYTES));
 		__entry->nr_unstable	= global_node_page_state(NR_UNSTABLE_NFS);
 		__entry->nr_dirtied	= global_node_page_state(NR_DIRTIED);
 		__entry->nr_written	= global_node_page_state(NR_WRITTEN);
@@ -424,7 +430,8 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_printk("dirty=%lu writeback=%lu unstable=%lu "
 		  "bg_thresh=%lu thresh=%lu limit=%lu "
-		  "dirtied=%lu written=%lu",
+		  "dirtied=%lu written=%lu metadata_dirty=%lu "
+		  "metadata_writeback=%lu",
 		  __entry->nr_dirty,
 		  __entry->nr_writeback,
 		  __entry->nr_unstable,
@@ -432,7 +439,9 @@ TRACE_EVENT(global_dirty_state,
 		  __entry->dirty_thresh,
 		  __entry->dirty_limit,
 		  __entry->nr_dirtied,
-		  __entry->nr_written
+		  __entry->nr_written,
+		  __entry->nr_metadata_dirty,
+		  __entry->nr_metadata_writeback
 	)
 );
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index efe2377..d76f432 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -70,6 +70,7 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 	wb_thresh = wb_calc_thresh(wb, dirty_thresh);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
 	seq_printf(m,
 		   "BdiWriteback:       %10lu kB\n"
 		   "BdiReclaimable:     %10lu kB\n"
@@ -78,6 +79,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BackgroundThresh:   %10lu kB\n"
 		   "BdiDirtied:         %10lu kB\n"
 		   "BdiWritten:         %10lu kB\n"
+		   "BdiMetadataDirty:   %10lu kB\n"
+		   "BdiMetaWriteback:	%10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
 		   "b_dirty:            %10lu\n"
 		   "b_io:               %10lu\n"
@@ -92,6 +95,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(background_thresh),
 		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
 		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_DIRTY_BYTES)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES)),
 		   (unsigned long) K(wb->write_bandwidth),
 		   nr_dirty,
 		   nr_io,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 121a6e3..423d2f5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -380,6 +380,30 @@ static unsigned long global_dirtyable_memory(void)
 }
 
 /**
+ * global_dirty_memory - the number of globally dirty pages
+ *
+ * Returns the global number of pages that are dirty in pagecache and metadata.
+ */
+static unsigned long global_dirty_memory(void)
+{
+	return global_node_page_state(NR_FILE_DIRTY) +
+		global_node_page_state(NR_UNSTABLE_NFS) +
+		(global_node_page_state(NR_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+}
+
+/**
+ * global_writeback_memory - the number of pages under writeback globally
+ *
+ * Returns the global number of pages under writeback both in pagecache and in
+ * metadata.
+ */
+static unsigned long global_writeback_memory(void)
+{
+	return global_node_page_state(NR_WRITEBACK) +
+		(global_node_page_state(NR_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
+}
+
+/**
  * domain_dirty_limits - calculate thresh and bg_thresh for a wb_domain
  * @dtc: dirty_throttle_control of interest
  *
@@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
 	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
 	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
 	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
+	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
+		     PAGE_SHIFT);
+	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
+		     PAGE_SHIFT);
 
 	return nr_pages <= limit;
 }
@@ -1514,7 +1542,7 @@ static long wb_min_pause(struct bdi_writeback *wb,
 static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 {
 	struct bdi_writeback *wb = dtc->wb;
-	unsigned long wb_reclaimable;
+	unsigned long wb_reclaimable, wb_writeback;
 
 	/*
 	 * wb_thresh is not treated as some limiting factor as
@@ -1544,12 +1572,17 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 	 * deltas.
 	 */
 	if (dtc->wb_thresh < 2 * wb_stat_error(wb)) {
-		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
-		dtc->wb_dirty = wb_reclaimable + wb_stat_sum(wb, WB_WRITEBACK);
+		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE) +
+			(wb_stat_sum(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+		wb_writeback = wb_stat_sum(wb, WB_WRITEBACK) +
+			(wb_stat_sum(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
 	} else {
-		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE);
-		dtc->wb_dirty = wb_reclaimable + wb_stat(wb, WB_WRITEBACK);
+		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
+			(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+		wb_writeback = wb_stat(wb, WB_WRITEBACK) +
+			(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
 	}
+	dtc->wb_dirty = wb_reclaimable + wb_writeback;
 }
 
 /*
@@ -1594,10 +1627,9 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		 * written to the server's write cache, but has not yet
 		 * been flushed to permanent storage.
 		 */
-		nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
-					global_node_page_state(NR_UNSTABLE_NFS);
+		nr_reclaimable = global_dirty_memory();
 		gdtc->avail = global_dirtyable_memory();
-		gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
+		gdtc->dirty = nr_reclaimable + global_writeback_memory();
 
 		domain_dirty_limits(gdtc);
 
@@ -1928,20 +1960,22 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 	struct dirty_throttle_control * const gdtc = &gdtc_stor;
 	struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ?
 						     &mdtc_stor : NULL;
+	unsigned long wb_reclaimable;
 
 	/*
 	 * Similar to balance_dirty_pages() but ignores pages being written
 	 * as we're trying to decide whether to put more under writeback.
 	 */
 	gdtc->avail = global_dirtyable_memory();
-	gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
-		      global_node_page_state(NR_UNSTABLE_NFS);
+	gdtc->dirty = global_dirty_memory();
 	domain_dirty_limits(gdtc);
 
 	if (gdtc->dirty > gdtc->bg_thresh)
 		return true;
 
-	if (wb_stat(wb, WB_RECLAIMABLE) >
+	wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
+		(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+	if (wb_reclaimable >
 	    wb_calc_thresh(gdtc->wb, gdtc->bg_thresh))
 		return true;
 
@@ -1956,7 +1990,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 		if (mdtc->dirty > mdtc->bg_thresh)
 			return true;
 
-		if (wb_stat(wb, WB_RECLAIMABLE) >
+		if (wb_reclaimable >
 		    wb_calc_thresh(mdtc->wb, mdtc->bg_thresh))
 			return true;
 	}
@@ -1980,8 +2014,8 @@ void throttle_vm_writeout(gfp_t gfp_mask)
                 dirty_thresh += dirty_thresh / 10;      /* wheeee... */
 
                 if (global_node_page_state(NR_UNSTABLE_NFS) +
-			global_node_page_state(NR_WRITEBACK) <= dirty_thresh)
-                        	break;
+		    global_writeback_memory() <= dirty_thresh)
+			break;
                 congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		/*
@@ -2008,8 +2042,7 @@ int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
 void laptop_mode_timer_fn(unsigned long data)
 {
 	struct request_queue *q = (struct request_queue *)data;
-	int nr_pages = global_node_page_state(NR_FILE_DIRTY) +
-		global_node_page_state(NR_UNSTABLE_NFS);
+	int nr_pages = global_dirty_memory();
 	struct bdi_writeback *wb;
 
 	/*
@@ -2473,6 +2506,98 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * account_metadata_dirtied
+ * @page - the page being dirited
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes being dirtied
+ *
+ * Do the dirty page accounting for metadata pages that aren't backed by an
+ * address_space.
+ */
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
+			      long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+			      bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
+	current->nr_dirtied++;
+	task_io_account_write(bytes);
+	this_cpu_inc(bdp_ratelimits);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_dirtied);
+
+/*
+ * account_metadata_cleaned
+ * @page - the page being cleaned
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes cleaned
+ *
+ * Called on a no longer dirty metadata page.
+ */
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi,
+			      long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+			      -bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, -bytes);
+	task_io_account_cancelled_write(bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_cleaned);
+
+/*
+ * account_metadata_writeback
+ * @page - the page being marked as writeback
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes we are submitting for writeback
+ *
+ * Called on a metadata page that has been marked writeback.
+ */
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi, long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, -bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+					 -bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
+					 bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_writeback);
+
+/*
+ * account_metadata_end_writeback
+ * @page - the page we are ending writeback on
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes that just ended writeback
+ *
+ * Called on a metadata page that has completed writeback.
+ */
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi, long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
+					 -bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_end_writeback);
+
+/*
  * Helper function for deaccounting dirty page without writeback.
  *
  * Caller must hold lock_page_memcg().
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39a372a..978ae3e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4164,6 +4164,8 @@ out:
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
+#define BtoK(x) ((x) >> 10)
+#define BtoP(x) ((x) >> PAGE_SHIFT)
 
 static void show_migration_types(unsigned char type)
 {
@@ -4218,10 +4220,10 @@ void show_free_areas(unsigned int filter)
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
-		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
-		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" unevictable:%lu dirty:%lu metadata_dirty:%lu writeback:%lu\n"
+	        " unstable:%lu metadata_writeback:%lu slab_reclaimable:%lu\n"
+	        " slab_unreclaimable:%lu mapped:%lu shmem:%lu pagetables:%lu\n"
+	        " bounce:%lu free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
 		global_node_page_state(NR_INACTIVE_ANON),
 		global_node_page_state(NR_ISOLATED_ANON),
@@ -4230,8 +4232,10 @@ void show_free_areas(unsigned int filter)
 		global_node_page_state(NR_ISOLATED_FILE),
 		global_node_page_state(NR_UNEVICTABLE),
 		global_node_page_state(NR_FILE_DIRTY),
+		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
 		global_node_page_state(NR_WRITEBACK),
 		global_node_page_state(NR_UNSTABLE_NFS),
+		BtoP(global_node_page_state(NR_METADATA_WRITEBACK_BYTES)),
 		global_page_state(NR_SLAB_RECLAIMABLE),
 		global_page_state(NR_SLAB_UNRECLAIMABLE),
 		global_node_page_state(NR_FILE_MAPPED),
@@ -4253,7 +4257,9 @@ void show_free_areas(unsigned int filter)
 			" isolated(file):%lukB"
 			" mapped:%lukB"
 			" dirty:%lukB"
+			" metadata_dirty:%lukB"
 			" writeback:%lukB"
+			" metadata_writeback:%lukB"
 			" shmem:%lukB"
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			" shmem_thp: %lukB"
@@ -4275,7 +4281,9 @@ void show_free_areas(unsigned int filter)
 			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
 			K(node_page_state(pgdat, NR_FILE_MAPPED)),
 			K(node_page_state(pgdat, NR_FILE_DIRTY)),
+			BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
 			K(node_page_state(pgdat, NR_WRITEBACK)),
+			BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			K(node_page_state(pgdat, NR_SHMEM_THPS) * HPAGE_PMD_NR),
 			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 374d95d..c3be15c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
 
 	/* If we can't clean pages, remove dirty pages from consideration */
 	if (!(node_reclaim_mode & RECLAIM_WRITE))
-		delta += node_page_state(pgdat, NR_FILE_DIRTY);
+		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
+			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
+			 PAGE_SHIFT);
 
 	/* Watch for any possible underflows due to delta */
 	if (unlikely(delta > nr_pagecache_reclaimable))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] writeback: allow for dirty metadata accounting
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

Btrfs has no bounds except memory on the amount of dirty memory that we have in
use for metadata.  Historically we have used a special inode so we could take
advantage of the balance_dirty_pages throttling that comes with using pagecache.
However as we'd like to support different blocksizes it would be nice to not
have to rely on pagecache, but still get the balance_dirty_pages throttling
without having to do it ourselves.

So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
zone and bdi_writeback counters to keep track of how many bytes we have in
flight for METADATA.  We need to count in bytes as blocksizes could be
percentages of pagesize.  We simply convert the bytes to number of pages where
it is needed for the throttling.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 arch/tile/mm/pgtable.c           |   3 +-
 drivers/base/node.c              |   6 ++
 fs/fs-writeback.c                |   2 +
 fs/proc/meminfo.c                |   5 ++
 include/linux/backing-dev-defs.h |   2 +
 include/linux/mm.h               |   9 +++
 include/linux/mmzone.h           |   2 +
 include/trace/events/writeback.h |  13 +++-
 mm/backing-dev.c                 |   5 ++
 mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                  |  16 +++-
 mm/vmscan.c                      |   4 +-
 12 files changed, 200 insertions(+), 24 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 7cc6ee7..9543468 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
 {
 	struct zone *zone;
 
-	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
+	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
 	       (global_node_page_state(NR_ACTIVE_ANON) +
 		global_node_page_state(NR_ACTIVE_FILE)),
 	       (global_node_page_state(NR_INACTIVE_ANON) +
 		global_node_page_state(NR_INACTIVE_FILE)),
 	       global_node_page_state(NR_FILE_DIRTY),
+	       global_node_page_state(NR_METADATA_DIRTY),
 	       global_node_page_state(NR_WRITEBACK),
 	       global_node_page_state(NR_UNSTABLE_NFS),
 	       global_page_state(NR_FREE_PAGES),
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f96..3615264 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -51,6 +51,8 @@ static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
+
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
 {
@@ -99,7 +101,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 	n += sprintf(buf + n,
 		       "Node %d Dirty:          %8lu kB\n"
+		       "Node %d MetadataDirty:	%8lu kB\n"
 		       "Node %d Writeback:      %8lu kB\n"
+		       "Node %d MetaWriteback:  %8lu kB\n"
 		       "Node %d FilePages:      %8lu kB\n"
 		       "Node %d Mapped:         %8lu kB\n"
 		       "Node %d AnonPages:      %8lu kB\n"
@@ -119,7 +123,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 			,
 		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
+		       nid, BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
+		       nid, BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
 		       nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda..aafdb11 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1801,6 +1801,7 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb)
 	return work;
 }
 
+#define BtoP(x) ((x) >> PAGE_SHIFT)
 /*
  * Add in the number of potentially dirty inodes, because each inode
  * write can dirty pagecache in the underlying blockdev.
@@ -1809,6 +1810,7 @@ static unsigned long get_nr_dirty_pages(void)
 {
 	return global_node_page_state(NR_FILE_DIRTY) +
 		global_node_page_state(NR_UNSTABLE_NFS) +
+		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)) +
 		get_nr_dirty_inodes();
 }
 
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 09e18fd..95b0d8a 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -36,6 +36,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
  * display in kilobytes.
  */
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
 	si_meminfo(&i);
 	si_swapinfo(&i);
 	committed = percpu_counter_read_positive(&vm_committed_as);
@@ -80,7 +81,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		"SwapTotal:      %8lu kB\n"
 		"SwapFree:       %8lu kB\n"
 		"Dirty:          %8lu kB\n"
+		"MetadataDirty:  %8lu kB\n"
 		"Writeback:      %8lu kB\n"
+		"MetaWriteback:  %8lu kB\n"
 		"AnonPages:      %8lu kB\n"
 		"Mapped:         %8lu kB\n"
 		"Shmem:          %8lu kB\n"
@@ -139,7 +142,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		K(i.totalswap),
 		K(i.freeswap),
 		K(global_node_page_state(NR_FILE_DIRTY)),
+		BtoK(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
 		K(global_node_page_state(NR_WRITEBACK)),
+		BtoK(global_node_page_state(NR_METADATA_WRITEBACK_BYTES)),
 		K(global_node_page_state(NR_ANON_MAPPED)),
 		K(global_node_page_state(NR_FILE_MAPPED)),
 		K(i.sharedram),
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 3f10307..1a7c3c1 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -34,6 +34,8 @@ typedef int (congested_fn)(void *, int);
 enum wb_stat_item {
 	WB_RECLAIMABLE,
 	WB_WRITEBACK,
+	WB_METADATA_DIRTY_BYTES,
+	WB_METADATA_WRITEBACK_BYTES,
 	WB_DIRTIED,
 	WB_WRITTEN,
 	NR_WB_STAT_ITEMS
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..6d2e3e8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -31,6 +31,7 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 struct bdi_writeback;
+struct backing_dev_info;
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES	/* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
@@ -1363,6 +1364,14 @@ int redirty_page_for_writepage(struct writeback_control *wbc,
 void account_page_dirtied(struct page *page, struct address_space *mapping);
 void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb);
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
+			      long bytes);
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi,
+			      long bytes);
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi, long bytes);
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi, long bytes);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 void cancel_dirty_page(struct page *page);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f2e4e90..5d4c443 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -167,6 +167,8 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_METADATA_DIRTY_BYTES,	/* Metadata dirty bytes */
+	NR_METADATA_WRITEBACK_BYTES,	/* Metadata writeback bytes */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 2ccd9cc..f97c8de 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -390,6 +390,8 @@ TRACE_EVENT(writeback_queue_io,
 	)
 );
 
+#define BtoP(x) ((x) >> PAGE_SHIFT)
+
 TRACE_EVENT(global_dirty_state,
 
 	TP_PROTO(unsigned long background_thresh,
@@ -402,7 +404,9 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_STRUCT__entry(
 		__field(unsigned long,	nr_dirty)
+		__field(unsigned long,	nr_metadata_dirty)
 		__field(unsigned long,	nr_writeback)
+		__field(unsigned long,	nr_metadata_writeback)
 		__field(unsigned long,	nr_unstable)
 		__field(unsigned long,	background_thresh)
 		__field(unsigned long,	dirty_thresh)
@@ -413,7 +417,9 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_fast_assign(
 		__entry->nr_dirty	= global_node_page_state(NR_FILE_DIRTY);
+		__entry->nr_metadata_dirty = BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES));
 		__entry->nr_writeback	= global_node_page_state(NR_WRITEBACK);
+		__entry->nr_metadata_dirty = BtoP(global_node_page_state(NR_METADATA_WRITEBACK_BYTES));
 		__entry->nr_unstable	= global_node_page_state(NR_UNSTABLE_NFS);
 		__entry->nr_dirtied	= global_node_page_state(NR_DIRTIED);
 		__entry->nr_written	= global_node_page_state(NR_WRITTEN);
@@ -424,7 +430,8 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_printk("dirty=%lu writeback=%lu unstable=%lu "
 		  "bg_thresh=%lu thresh=%lu limit=%lu "
-		  "dirtied=%lu written=%lu",
+		  "dirtied=%lu written=%lu metadata_dirty=%lu "
+		  "metadata_writeback=%lu",
 		  __entry->nr_dirty,
 		  __entry->nr_writeback,
 		  __entry->nr_unstable,
@@ -432,7 +439,9 @@ TRACE_EVENT(global_dirty_state,
 		  __entry->dirty_thresh,
 		  __entry->dirty_limit,
 		  __entry->nr_dirtied,
-		  __entry->nr_written
+		  __entry->nr_written,
+		  __entry->nr_metadata_dirty,
+		  __entry->nr_metadata_writeback
 	)
 );
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index efe2377..d76f432 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -70,6 +70,7 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 	wb_thresh = wb_calc_thresh(wb, dirty_thresh);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
 	seq_printf(m,
 		   "BdiWriteback:       %10lu kB\n"
 		   "BdiReclaimable:     %10lu kB\n"
@@ -78,6 +79,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BackgroundThresh:   %10lu kB\n"
 		   "BdiDirtied:         %10lu kB\n"
 		   "BdiWritten:         %10lu kB\n"
+		   "BdiMetadataDirty:   %10lu kB\n"
+		   "BdiMetaWriteback:	%10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
 		   "b_dirty:            %10lu\n"
 		   "b_io:               %10lu\n"
@@ -92,6 +95,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(background_thresh),
 		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
 		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_DIRTY_BYTES)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES)),
 		   (unsigned long) K(wb->write_bandwidth),
 		   nr_dirty,
 		   nr_io,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 121a6e3..423d2f5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -380,6 +380,30 @@ static unsigned long global_dirtyable_memory(void)
 }
 
 /**
+ * global_dirty_memory - the number of globally dirty pages
+ *
+ * Returns the global number of pages that are dirty in pagecache and metadata.
+ */
+static unsigned long global_dirty_memory(void)
+{
+	return global_node_page_state(NR_FILE_DIRTY) +
+		global_node_page_state(NR_UNSTABLE_NFS) +
+		(global_node_page_state(NR_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+}
+
+/**
+ * global_writeback_memory - the number of pages under writeback globally
+ *
+ * Returns the global number of pages under writeback both in pagecache and in
+ * metadata.
+ */
+static unsigned long global_writeback_memory(void)
+{
+	return global_node_page_state(NR_WRITEBACK) +
+		(global_node_page_state(NR_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
+}
+
+/**
  * domain_dirty_limits - calculate thresh and bg_thresh for a wb_domain
  * @dtc: dirty_throttle_control of interest
  *
@@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
 	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
 	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
 	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
+	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
+		     PAGE_SHIFT);
+	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
+		     PAGE_SHIFT);
 
 	return nr_pages <= limit;
 }
@@ -1514,7 +1542,7 @@ static long wb_min_pause(struct bdi_writeback *wb,
 static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 {
 	struct bdi_writeback *wb = dtc->wb;
-	unsigned long wb_reclaimable;
+	unsigned long wb_reclaimable, wb_writeback;
 
 	/*
 	 * wb_thresh is not treated as some limiting factor as
@@ -1544,12 +1572,17 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 	 * deltas.
 	 */
 	if (dtc->wb_thresh < 2 * wb_stat_error(wb)) {
-		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
-		dtc->wb_dirty = wb_reclaimable + wb_stat_sum(wb, WB_WRITEBACK);
+		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE) +
+			(wb_stat_sum(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+		wb_writeback = wb_stat_sum(wb, WB_WRITEBACK) +
+			(wb_stat_sum(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
 	} else {
-		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE);
-		dtc->wb_dirty = wb_reclaimable + wb_stat(wb, WB_WRITEBACK);
+		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
+			(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+		wb_writeback = wb_stat(wb, WB_WRITEBACK) +
+			(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
 	}
+	dtc->wb_dirty = wb_reclaimable + wb_writeback;
 }
 
 /*
@@ -1594,10 +1627,9 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		 * written to the server's write cache, but has not yet
 		 * been flushed to permanent storage.
 		 */
-		nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
-					global_node_page_state(NR_UNSTABLE_NFS);
+		nr_reclaimable = global_dirty_memory();
 		gdtc->avail = global_dirtyable_memory();
-		gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
+		gdtc->dirty = nr_reclaimable + global_writeback_memory();
 
 		domain_dirty_limits(gdtc);
 
@@ -1928,20 +1960,22 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 	struct dirty_throttle_control * const gdtc = &gdtc_stor;
 	struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ?
 						     &mdtc_stor : NULL;
+	unsigned long wb_reclaimable;
 
 	/*
 	 * Similar to balance_dirty_pages() but ignores pages being written
 	 * as we're trying to decide whether to put more under writeback.
 	 */
 	gdtc->avail = global_dirtyable_memory();
-	gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
-		      global_node_page_state(NR_UNSTABLE_NFS);
+	gdtc->dirty = global_dirty_memory();
 	domain_dirty_limits(gdtc);
 
 	if (gdtc->dirty > gdtc->bg_thresh)
 		return true;
 
-	if (wb_stat(wb, WB_RECLAIMABLE) >
+	wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
+		(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+	if (wb_reclaimable >
 	    wb_calc_thresh(gdtc->wb, gdtc->bg_thresh))
 		return true;
 
@@ -1956,7 +1990,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 		if (mdtc->dirty > mdtc->bg_thresh)
 			return true;
 
-		if (wb_stat(wb, WB_RECLAIMABLE) >
+		if (wb_reclaimable >
 		    wb_calc_thresh(mdtc->wb, mdtc->bg_thresh))
 			return true;
 	}
@@ -1980,8 +2014,8 @@ void throttle_vm_writeout(gfp_t gfp_mask)
                 dirty_thresh += dirty_thresh / 10;      /* wheeee... */
 
                 if (global_node_page_state(NR_UNSTABLE_NFS) +
-			global_node_page_state(NR_WRITEBACK) <= dirty_thresh)
-                        	break;
+		    global_writeback_memory() <= dirty_thresh)
+			break;
                 congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		/*
@@ -2008,8 +2042,7 @@ int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
 void laptop_mode_timer_fn(unsigned long data)
 {
 	struct request_queue *q = (struct request_queue *)data;
-	int nr_pages = global_node_page_state(NR_FILE_DIRTY) +
-		global_node_page_state(NR_UNSTABLE_NFS);
+	int nr_pages = global_dirty_memory();
 	struct bdi_writeback *wb;
 
 	/*
@@ -2473,6 +2506,98 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * account_metadata_dirtied
+ * @page - the page being dirited
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes being dirtied
+ *
+ * Do the dirty page accounting for metadata pages that aren't backed by an
+ * address_space.
+ */
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
+			      long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+			      bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
+	current->nr_dirtied++;
+	task_io_account_write(bytes);
+	this_cpu_inc(bdp_ratelimits);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_dirtied);
+
+/*
+ * account_metadata_cleaned
+ * @page - the page being cleaned
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes cleaned
+ *
+ * Called on a no longer dirty metadata page.
+ */
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi,
+			      long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+			      -bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, -bytes);
+	task_io_account_cancelled_write(bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_cleaned);
+
+/*
+ * account_metadata_writeback
+ * @page - the page being marked as writeback
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes we are submitting for writeback
+ *
+ * Called on a metadata page that has been marked writeback.
+ */
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi, long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, -bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+					 -bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
+					 bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_writeback);
+
+/*
+ * account_metadata_end_writeback
+ * @page - the page we are ending writeback on
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes that just ended writeback
+ *
+ * Called on a metadata page that has completed writeback.
+ */
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi, long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
+					 -bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_end_writeback);
+
+/*
  * Helper function for deaccounting dirty page without writeback.
  *
  * Caller must hold lock_page_memcg().
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39a372a..978ae3e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4164,6 +4164,8 @@ out:
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
+#define BtoK(x) ((x) >> 10)
+#define BtoP(x) ((x) >> PAGE_SHIFT)
 
 static void show_migration_types(unsigned char type)
 {
@@ -4218,10 +4220,10 @@ void show_free_areas(unsigned int filter)
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
-		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
-		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" unevictable:%lu dirty:%lu metadata_dirty:%lu writeback:%lu\n"
+	        " unstable:%lu metadata_writeback:%lu slab_reclaimable:%lu\n"
+	        " slab_unreclaimable:%lu mapped:%lu shmem:%lu pagetables:%lu\n"
+	        " bounce:%lu free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
 		global_node_page_state(NR_INACTIVE_ANON),
 		global_node_page_state(NR_ISOLATED_ANON),
@@ -4230,8 +4232,10 @@ void show_free_areas(unsigned int filter)
 		global_node_page_state(NR_ISOLATED_FILE),
 		global_node_page_state(NR_UNEVICTABLE),
 		global_node_page_state(NR_FILE_DIRTY),
+		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
 		global_node_page_state(NR_WRITEBACK),
 		global_node_page_state(NR_UNSTABLE_NFS),
+		BtoP(global_node_page_state(NR_METADATA_WRITEBACK_BYTES)),
 		global_page_state(NR_SLAB_RECLAIMABLE),
 		global_page_state(NR_SLAB_UNRECLAIMABLE),
 		global_node_page_state(NR_FILE_MAPPED),
@@ -4253,7 +4257,9 @@ void show_free_areas(unsigned int filter)
 			" isolated(file):%lukB"
 			" mapped:%lukB"
 			" dirty:%lukB"
+			" metadata_dirty:%lukB"
 			" writeback:%lukB"
+			" metadata_writeback:%lukB"
 			" shmem:%lukB"
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			" shmem_thp: %lukB"
@@ -4275,7 +4281,9 @@ void show_free_areas(unsigned int filter)
 			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
 			K(node_page_state(pgdat, NR_FILE_MAPPED)),
 			K(node_page_state(pgdat, NR_FILE_DIRTY)),
+			BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
 			K(node_page_state(pgdat, NR_WRITEBACK)),
+			BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			K(node_page_state(pgdat, NR_SHMEM_THPS) * HPAGE_PMD_NR),
 			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 374d95d..c3be15c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
 
 	/* If we can't clean pages, remove dirty pages from consideration */
 	if (!(node_reclaim_mode & RECLAIM_WRITE))
-		delta += node_page_state(pgdat, NR_FILE_DIRTY);
+		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
+			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
+			 PAGE_SHIFT);
 
 	/* Watch for any possible underflows due to delta */
 	if (unlikely(delta > nr_pagecache_reclaimable))
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] writeback: allow for dirty metadata accounting
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

Btrfs has no bounds except memory on the amount of dirty memory that we have in
use for metadata.  Historically we have used a special inode so we could take
advantage of the balance_dirty_pages throttling that comes with using pagecache.
However as we'd like to support different blocksizes it would be nice to not
have to rely on pagecache, but still get the balance_dirty_pages throttling
without having to do it ourselves.

So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
zone and bdi_writeback counters to keep track of how many bytes we have in
flight for METADATA.  We need to count in bytes as blocksizes could be
percentages of pagesize.  We simply convert the bytes to number of pages where
it is needed for the throttling.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 arch/tile/mm/pgtable.c           |   3 +-
 drivers/base/node.c              |   6 ++
 fs/fs-writeback.c                |   2 +
 fs/proc/meminfo.c                |   5 ++
 include/linux/backing-dev-defs.h |   2 +
 include/linux/mm.h               |   9 +++
 include/linux/mmzone.h           |   2 +
 include/trace/events/writeback.h |  13 +++-
 mm/backing-dev.c                 |   5 ++
 mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                  |  16 +++-
 mm/vmscan.c                      |   4 +-
 12 files changed, 200 insertions(+), 24 deletions(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 7cc6ee7..9543468 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
 {
 	struct zone *zone;
 
-	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
+	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
 	       (global_node_page_state(NR_ACTIVE_ANON) +
 		global_node_page_state(NR_ACTIVE_FILE)),
 	       (global_node_page_state(NR_INACTIVE_ANON) +
 		global_node_page_state(NR_INACTIVE_FILE)),
 	       global_node_page_state(NR_FILE_DIRTY),
+	       global_node_page_state(NR_METADATA_DIRTY),
 	       global_node_page_state(NR_WRITEBACK),
 	       global_node_page_state(NR_UNSTABLE_NFS),
 	       global_page_state(NR_FREE_PAGES),
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f96..3615264 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -51,6 +51,8 @@ static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
+
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
 {
@@ -99,7 +101,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 	n += sprintf(buf + n,
 		       "Node %d Dirty:          %8lu kB\n"
+		       "Node %d MetadataDirty:	%8lu kB\n"
 		       "Node %d Writeback:      %8lu kB\n"
+		       "Node %d MetaWriteback:  %8lu kB\n"
 		       "Node %d FilePages:      %8lu kB\n"
 		       "Node %d Mapped:         %8lu kB\n"
 		       "Node %d AnonPages:      %8lu kB\n"
@@ -119,7 +123,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 			,
 		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
+		       nid, BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
 		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
+		       nid, BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
 		       nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
 		       nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda..aafdb11 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1801,6 +1801,7 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb)
 	return work;
 }
 
+#define BtoP(x) ((x) >> PAGE_SHIFT)
 /*
  * Add in the number of potentially dirty inodes, because each inode
  * write can dirty pagecache in the underlying blockdev.
@@ -1809,6 +1810,7 @@ static unsigned long get_nr_dirty_pages(void)
 {
 	return global_node_page_state(NR_FILE_DIRTY) +
 		global_node_page_state(NR_UNSTABLE_NFS) +
+		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)) +
 		get_nr_dirty_inodes();
 }
 
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 09e18fd..95b0d8a 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -36,6 +36,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
  * display in kilobytes.
  */
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
 	si_meminfo(&i);
 	si_swapinfo(&i);
 	committed = percpu_counter_read_positive(&vm_committed_as);
@@ -80,7 +81,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		"SwapTotal:      %8lu kB\n"
 		"SwapFree:       %8lu kB\n"
 		"Dirty:          %8lu kB\n"
+		"MetadataDirty:  %8lu kB\n"
 		"Writeback:      %8lu kB\n"
+		"MetaWriteback:  %8lu kB\n"
 		"AnonPages:      %8lu kB\n"
 		"Mapped:         %8lu kB\n"
 		"Shmem:          %8lu kB\n"
@@ -139,7 +142,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		K(i.totalswap),
 		K(i.freeswap),
 		K(global_node_page_state(NR_FILE_DIRTY)),
+		BtoK(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
 		K(global_node_page_state(NR_WRITEBACK)),
+		BtoK(global_node_page_state(NR_METADATA_WRITEBACK_BYTES)),
 		K(global_node_page_state(NR_ANON_MAPPED)),
 		K(global_node_page_state(NR_FILE_MAPPED)),
 		K(i.sharedram),
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 3f10307..1a7c3c1 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -34,6 +34,8 @@ typedef int (congested_fn)(void *, int);
 enum wb_stat_item {
 	WB_RECLAIMABLE,
 	WB_WRITEBACK,
+	WB_METADATA_DIRTY_BYTES,
+	WB_METADATA_WRITEBACK_BYTES,
 	WB_DIRTIED,
 	WB_WRITTEN,
 	NR_WB_STAT_ITEMS
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..6d2e3e8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -31,6 +31,7 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 struct bdi_writeback;
+struct backing_dev_info;
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES	/* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
@@ -1363,6 +1364,14 @@ int redirty_page_for_writepage(struct writeback_control *wbc,
 void account_page_dirtied(struct page *page, struct address_space *mapping);
 void account_page_cleaned(struct page *page, struct address_space *mapping,
 			  struct bdi_writeback *wb);
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
+			      long bytes);
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi,
+			      long bytes);
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi, long bytes);
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi, long bytes);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 void cancel_dirty_page(struct page *page);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f2e4e90..5d4c443 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -167,6 +167,8 @@ enum node_stat_item {
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
 	NR_WRITTEN,		/* page writings since bootup */
+	NR_METADATA_DIRTY_BYTES,	/* Metadata dirty bytes */
+	NR_METADATA_WRITEBACK_BYTES,	/* Metadata writeback bytes */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 2ccd9cc..f97c8de 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -390,6 +390,8 @@ TRACE_EVENT(writeback_queue_io,
 	)
 );
 
+#define BtoP(x) ((x) >> PAGE_SHIFT)
+
 TRACE_EVENT(global_dirty_state,
 
 	TP_PROTO(unsigned long background_thresh,
@@ -402,7 +404,9 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_STRUCT__entry(
 		__field(unsigned long,	nr_dirty)
+		__field(unsigned long,	nr_metadata_dirty)
 		__field(unsigned long,	nr_writeback)
+		__field(unsigned long,	nr_metadata_writeback)
 		__field(unsigned long,	nr_unstable)
 		__field(unsigned long,	background_thresh)
 		__field(unsigned long,	dirty_thresh)
@@ -413,7 +417,9 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_fast_assign(
 		__entry->nr_dirty	= global_node_page_state(NR_FILE_DIRTY);
+		__entry->nr_metadata_dirty = BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES));
 		__entry->nr_writeback	= global_node_page_state(NR_WRITEBACK);
+		__entry->nr_metadata_dirty = BtoP(global_node_page_state(NR_METADATA_WRITEBACK_BYTES));
 		__entry->nr_unstable	= global_node_page_state(NR_UNSTABLE_NFS);
 		__entry->nr_dirtied	= global_node_page_state(NR_DIRTIED);
 		__entry->nr_written	= global_node_page_state(NR_WRITTEN);
@@ -424,7 +430,8 @@ TRACE_EVENT(global_dirty_state,
 
 	TP_printk("dirty=%lu writeback=%lu unstable=%lu "
 		  "bg_thresh=%lu thresh=%lu limit=%lu "
-		  "dirtied=%lu written=%lu",
+		  "dirtied=%lu written=%lu metadata_dirty=%lu "
+		  "metadata_writeback=%lu",
 		  __entry->nr_dirty,
 		  __entry->nr_writeback,
 		  __entry->nr_unstable,
@@ -432,7 +439,9 @@ TRACE_EVENT(global_dirty_state,
 		  __entry->dirty_thresh,
 		  __entry->dirty_limit,
 		  __entry->nr_dirtied,
-		  __entry->nr_written
+		  __entry->nr_written,
+		  __entry->nr_metadata_dirty,
+		  __entry->nr_metadata_writeback
 	)
 );
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index efe2377..d76f432 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -70,6 +70,7 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 	wb_thresh = wb_calc_thresh(wb, dirty_thresh);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
+#define BtoK(x) ((x) >> 10)
 	seq_printf(m,
 		   "BdiWriteback:       %10lu kB\n"
 		   "BdiReclaimable:     %10lu kB\n"
@@ -78,6 +79,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BackgroundThresh:   %10lu kB\n"
 		   "BdiDirtied:         %10lu kB\n"
 		   "BdiWritten:         %10lu kB\n"
+		   "BdiMetadataDirty:   %10lu kB\n"
+		   "BdiMetaWriteback:	%10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
 		   "b_dirty:            %10lu\n"
 		   "b_io:               %10lu\n"
@@ -92,6 +95,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(background_thresh),
 		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
 		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_DIRTY_BYTES)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES)),
 		   (unsigned long) K(wb->write_bandwidth),
 		   nr_dirty,
 		   nr_io,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 121a6e3..423d2f5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -380,6 +380,30 @@ static unsigned long global_dirtyable_memory(void)
 }
 
 /**
+ * global_dirty_memory - the number of globally dirty pages
+ *
+ * Returns the global number of pages that are dirty in pagecache and metadata.
+ */
+static unsigned long global_dirty_memory(void)
+{
+	return global_node_page_state(NR_FILE_DIRTY) +
+		global_node_page_state(NR_UNSTABLE_NFS) +
+		(global_node_page_state(NR_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+}
+
+/**
+ * global_writeback_memory - the number of pages under writeback globally
+ *
+ * Returns the global number of pages under writeback both in pagecache and in
+ * metadata.
+ */
+static unsigned long global_writeback_memory(void)
+{
+	return global_node_page_state(NR_WRITEBACK) +
+		(global_node_page_state(NR_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
+}
+
+/**
  * domain_dirty_limits - calculate thresh and bg_thresh for a wb_domain
  * @dtc: dirty_throttle_control of interest
  *
@@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
 	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
 	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
 	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
+	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
+		     PAGE_SHIFT);
+	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
+		     PAGE_SHIFT);
 
 	return nr_pages <= limit;
 }
@@ -1514,7 +1542,7 @@ static long wb_min_pause(struct bdi_writeback *wb,
 static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 {
 	struct bdi_writeback *wb = dtc->wb;
-	unsigned long wb_reclaimable;
+	unsigned long wb_reclaimable, wb_writeback;
 
 	/*
 	 * wb_thresh is not treated as some limiting factor as
@@ -1544,12 +1572,17 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 	 * deltas.
 	 */
 	if (dtc->wb_thresh < 2 * wb_stat_error(wb)) {
-		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
-		dtc->wb_dirty = wb_reclaimable + wb_stat_sum(wb, WB_WRITEBACK);
+		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE) +
+			(wb_stat_sum(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+		wb_writeback = wb_stat_sum(wb, WB_WRITEBACK) +
+			(wb_stat_sum(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
 	} else {
-		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE);
-		dtc->wb_dirty = wb_reclaimable + wb_stat(wb, WB_WRITEBACK);
+		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
+			(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+		wb_writeback = wb_stat(wb, WB_WRITEBACK) +
+			(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
 	}
+	dtc->wb_dirty = wb_reclaimable + wb_writeback;
 }
 
 /*
@@ -1594,10 +1627,9 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		 * written to the server's write cache, but has not yet
 		 * been flushed to permanent storage.
 		 */
-		nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
-					global_node_page_state(NR_UNSTABLE_NFS);
+		nr_reclaimable = global_dirty_memory();
 		gdtc->avail = global_dirtyable_memory();
-		gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
+		gdtc->dirty = nr_reclaimable + global_writeback_memory();
 
 		domain_dirty_limits(gdtc);
 
@@ -1928,20 +1960,22 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 	struct dirty_throttle_control * const gdtc = &gdtc_stor;
 	struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ?
 						     &mdtc_stor : NULL;
+	unsigned long wb_reclaimable;
 
 	/*
 	 * Similar to balance_dirty_pages() but ignores pages being written
 	 * as we're trying to decide whether to put more under writeback.
 	 */
 	gdtc->avail = global_dirtyable_memory();
-	gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
-		      global_node_page_state(NR_UNSTABLE_NFS);
+	gdtc->dirty = global_dirty_memory();
 	domain_dirty_limits(gdtc);
 
 	if (gdtc->dirty > gdtc->bg_thresh)
 		return true;
 
-	if (wb_stat(wb, WB_RECLAIMABLE) >
+	wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
+		(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
+	if (wb_reclaimable >
 	    wb_calc_thresh(gdtc->wb, gdtc->bg_thresh))
 		return true;
 
@@ -1956,7 +1990,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
 		if (mdtc->dirty > mdtc->bg_thresh)
 			return true;
 
-		if (wb_stat(wb, WB_RECLAIMABLE) >
+		if (wb_reclaimable >
 		    wb_calc_thresh(mdtc->wb, mdtc->bg_thresh))
 			return true;
 	}
@@ -1980,8 +2014,8 @@ void throttle_vm_writeout(gfp_t gfp_mask)
                 dirty_thresh += dirty_thresh / 10;      /* wheeee... */
 
                 if (global_node_page_state(NR_UNSTABLE_NFS) +
-			global_node_page_state(NR_WRITEBACK) <= dirty_thresh)
-                        	break;
+		    global_writeback_memory() <= dirty_thresh)
+			break;
                 congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		/*
@@ -2008,8 +2042,7 @@ int dirty_writeback_centisecs_handler(struct ctl_table *table, int write,
 void laptop_mode_timer_fn(unsigned long data)
 {
 	struct request_queue *q = (struct request_queue *)data;
-	int nr_pages = global_node_page_state(NR_FILE_DIRTY) +
-		global_node_page_state(NR_UNSTABLE_NFS);
+	int nr_pages = global_dirty_memory();
 	struct bdi_writeback *wb;
 
 	/*
@@ -2473,6 +2506,98 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * account_metadata_dirtied
+ * @page - the page being dirited
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes being dirtied
+ *
+ * Do the dirty page accounting for metadata pages that aren't backed by an
+ * address_space.
+ */
+void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
+			      long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+			      bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
+	current->nr_dirtied++;
+	task_io_account_write(bytes);
+	this_cpu_inc(bdp_ratelimits);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_dirtied);
+
+/*
+ * account_metadata_cleaned
+ * @page - the page being cleaned
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes cleaned
+ *
+ * Called on a no longer dirty metadata page.
+ */
+void account_metadata_cleaned(struct page *page, struct backing_dev_info *bdi,
+			      long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+			      -bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, -bytes);
+	task_io_account_cancelled_write(bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_cleaned);
+
+/*
+ * account_metadata_writeback
+ * @page - the page being marked as writeback
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes we are submitting for writeback
+ *
+ * Called on a metadata page that has been marked writeback.
+ */
+void account_metadata_writeback(struct page *page,
+				struct backing_dev_info *bdi, long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, -bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
+					 -bytes);
+	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
+					 bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_writeback);
+
+/*
+ * account_metadata_end_writeback
+ * @page - the page we are ending writeback on
+ * @bdi - the bdi that owns this page
+ * @bytes - the number of bytes that just ended writeback
+ *
+ * Called on a metadata page that has completed writeback.
+ */
+void account_metadata_end_writeback(struct page *page,
+				    struct backing_dev_info *bdi, long bytes)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
+	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
+					 -bytes);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(account_metadata_end_writeback);
+
+/*
  * Helper function for deaccounting dirty page without writeback.
  *
  * Caller must hold lock_page_memcg().
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39a372a..978ae3e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4164,6 +4164,8 @@ out:
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
+#define BtoK(x) ((x) >> 10)
+#define BtoP(x) ((x) >> PAGE_SHIFT)
 
 static void show_migration_types(unsigned char type)
 {
@@ -4218,10 +4220,10 @@ void show_free_areas(unsigned int filter)
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
-		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
-		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
-		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" unevictable:%lu dirty:%lu metadata_dirty:%lu writeback:%lu\n"
+	        " unstable:%lu metadata_writeback:%lu slab_reclaimable:%lu\n"
+	        " slab_unreclaimable:%lu mapped:%lu shmem:%lu pagetables:%lu\n"
+	        " bounce:%lu free:%lu free_pcp:%lu free_cma:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
 		global_node_page_state(NR_INACTIVE_ANON),
 		global_node_page_state(NR_ISOLATED_ANON),
@@ -4230,8 +4232,10 @@ void show_free_areas(unsigned int filter)
 		global_node_page_state(NR_ISOLATED_FILE),
 		global_node_page_state(NR_UNEVICTABLE),
 		global_node_page_state(NR_FILE_DIRTY),
+		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
 		global_node_page_state(NR_WRITEBACK),
 		global_node_page_state(NR_UNSTABLE_NFS),
+		BtoP(global_node_page_state(NR_METADATA_WRITEBACK_BYTES)),
 		global_page_state(NR_SLAB_RECLAIMABLE),
 		global_page_state(NR_SLAB_UNRECLAIMABLE),
 		global_node_page_state(NR_FILE_MAPPED),
@@ -4253,7 +4257,9 @@ void show_free_areas(unsigned int filter)
 			" isolated(file):%lukB"
 			" mapped:%lukB"
 			" dirty:%lukB"
+			" metadata_dirty:%lukB"
 			" writeback:%lukB"
+			" metadata_writeback:%lukB"
 			" shmem:%lukB"
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			" shmem_thp: %lukB"
@@ -4275,7 +4281,9 @@ void show_free_areas(unsigned int filter)
 			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
 			K(node_page_state(pgdat, NR_FILE_MAPPED)),
 			K(node_page_state(pgdat, NR_FILE_DIRTY)),
+			BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
 			K(node_page_state(pgdat, NR_WRITEBACK)),
+			BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 			K(node_page_state(pgdat, NR_SHMEM_THPS) * HPAGE_PMD_NR),
 			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 374d95d..c3be15c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
 
 	/* If we can't clean pages, remove dirty pages from consideration */
 	if (!(node_reclaim_mode & RECLAIM_WRITE))
-		delta += node_page_state(pgdat, NR_FILE_DIRTY);
+		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
+			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
+			 PAGE_SHIFT);
 
 	/* Watch for any possible underflows due to delta */
 	if (unlikely(delta > nr_pagecache_reclaimable))
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
  2016-09-20 20:57 ` Josef Bacik
  (?)
@ 2016-09-20 20:57   ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

These are counters that constantly go up in order to do bandwidth calculations.
It isn't important what the units are in, as long as they are consistent between
the two of them, so convert them to count bytes written/dirtied, and allow the
metadata accounting stuff to change the counters as well.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/fuse/file.c                   |  4 ++--
 include/linux/backing-dev-defs.h |  4 ++--
 include/linux/backing-dev.h      |  2 +-
 mm/backing-dev.c                 |  8 ++++----
 mm/page-writeback.c              | 26 ++++++++++++++++----------
 5 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f394aff..3f5991e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++) {
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
 	}
 	wake_up(&fi->page_waitq);
 }
@@ -1770,7 +1770,7 @@ static bool fuse_writepage_in_flight(struct fuse_req *new_req,
 
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		dec_node_page_state(page, NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
 		fuse_writepage_free(fc, new_req);
 		fuse_request_free(new_req);
 		goto out;
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 1a7c3c1..cef0f24 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -36,8 +36,8 @@ enum wb_stat_item {
 	WB_WRITEBACK,
 	WB_METADATA_DIRTY_BYTES,
 	WB_METADATA_WRITEBACK_BYTES,
-	WB_DIRTIED,
-	WB_WRITTEN,
+	WB_DIRTIED_BYTES,
+	WB_WRITTEN_BYTES,
 	NR_WB_STAT_ITEMS
 };
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 089acf6..742238a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -113,7 +113,7 @@ static inline s64 wb_stat_sum(struct bdi_writeback *wb, enum wb_stat_item item)
 	return sum;
 }
 
-extern void wb_writeout_inc(struct bdi_writeback *wb);
+extern void wb_writeout_inc(struct bdi_writeback *wb, long bytes);
 
 /*
  * maximal error of a stat counter.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d76f432..f0695b0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -77,8 +77,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BdiDirtyThresh:     %10lu kB\n"
 		   "DirtyThresh:        %10lu kB\n"
 		   "BackgroundThresh:   %10lu kB\n"
-		   "BdiDirtied:         %10lu kB\n"
-		   "BdiWritten:         %10lu kB\n"
+		   "BdiDirtiedBytes:    %10lu kB\n"
+		   "BdiWrittenBytes:    %10lu kB\n"
 		   "BdiMetadataDirty:   %10lu kB\n"
 		   "BdiMetaWriteback:	%10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
@@ -93,8 +93,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(wb_thresh),
 		   K(dirty_thresh),
 		   K(background_thresh),
-		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
-		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_DIRTIED_BYTES)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_WRITTEN_BYTES)),
 		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_DIRTY_BYTES)),
 		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES)),
 		   (unsigned long) K(wb->write_bandwidth),
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 423d2f5..6d08673 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -624,11 +624,11 @@ static void wb_domain_writeout_inc(struct wb_domain *dom,
  * Increment @wb's writeout completion count and the global writeout
  * completion count. Called from test_clear_page_writeback().
  */
-static inline void __wb_writeout_inc(struct bdi_writeback *wb)
+static inline void __wb_writeout_inc(struct bdi_writeback *wb, long bytes)
 {
 	struct wb_domain *cgdom;
 
-	__inc_wb_stat(wb, WB_WRITTEN);
+	__add_wb_stat(wb, WB_WRITTEN_BYTES, bytes);
 	wb_domain_writeout_inc(&global_wb_domain, &wb->completions,
 			       wb->bdi->max_prop_frac);
 
@@ -638,12 +638,12 @@ static inline void __wb_writeout_inc(struct bdi_writeback *wb)
 				       wb->bdi->max_prop_frac);
 }
 
-void wb_writeout_inc(struct bdi_writeback *wb)
+void wb_writeout_inc(struct bdi_writeback *wb, long bytes)
 {
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__wb_writeout_inc(wb);
+	__wb_writeout_inc(wb, bytes);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(wb_writeout_inc);
@@ -1390,8 +1390,8 @@ static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc,
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
 
-	dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED]);
-	written = percpu_counter_read(&wb->stat[WB_WRITTEN]);
+	dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED_BYTES]);
+	written = percpu_counter_read(&wb->stat[WB_WRITTEN_BYTES]);
 
 	/*
 	 * Skip quiet periods when disk bandwidth is under-utilized.
@@ -2497,7 +2497,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
 		__inc_wb_stat(wb, WB_RECLAIMABLE);
-		__inc_wb_stat(wb, WB_DIRTIED);
+		__add_wb_stat(wb, WB_DIRTIED_BYTES, PAGE_SIZE);
 		task_io_account_write(PAGE_SIZE);
 		current->nr_dirtied++;
 		this_cpu_inc(bdp_ratelimits);
@@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
 	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
 			      bytes);
 	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
+	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
 	current->nr_dirtied++;
 	task_io_account_write(bytes);
 	this_cpu_inc(bdp_ratelimits);
@@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
 	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
 	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
 					 -bytes);
+	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(account_metadata_end_writeback);
@@ -2672,12 +2674,16 @@ void account_page_redirty(struct page *page)
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		struct inode *inode = mapping->host;
 		struct bdi_writeback *wb;
+		unsigned long flags;
 		bool locked;
 
 		wb = unlocked_inode_to_wb_begin(inode, &locked);
 		current->nr_dirtied--;
-		dec_node_page_state(page, NR_DIRTIED);
-		dec_wb_stat(wb, WB_DIRTIED);
+
+		local_irq_save(flags);
+		__dec_node_page_state(page, NR_DIRTIED);
+		__add_wb_stat(wb, WB_DIRTIED_BYTES, -PAGE_SIZE);
+		local_irq_restore(flags);
 		unlocked_inode_to_wb_end(inode, locked);
 	}
 }
@@ -2897,7 +2903,7 @@ int test_clear_page_writeback(struct page *page)
 				struct bdi_writeback *wb = inode_to_wb(inode);
 
 				__dec_wb_stat(wb, WB_WRITEBACK);
-				__wb_writeout_inc(wb);
+				__wb_writeout_inc(wb, PAGE_SIZE);
 			}
 		}
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

These are counters that constantly go up in order to do bandwidth calculations.
It isn't important what the units are in, as long as they are consistent between
the two of them, so convert them to count bytes written/dirtied, and allow the
metadata accounting stuff to change the counters as well.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/fuse/file.c                   |  4 ++--
 include/linux/backing-dev-defs.h |  4 ++--
 include/linux/backing-dev.h      |  2 +-
 mm/backing-dev.c                 |  8 ++++----
 mm/page-writeback.c              | 26 ++++++++++++++++----------
 5 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f394aff..3f5991e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++) {
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
 	}
 	wake_up(&fi->page_waitq);
 }
@@ -1770,7 +1770,7 @@ static bool fuse_writepage_in_flight(struct fuse_req *new_req,
 
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		dec_node_page_state(page, NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
 		fuse_writepage_free(fc, new_req);
 		fuse_request_free(new_req);
 		goto out;
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 1a7c3c1..cef0f24 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -36,8 +36,8 @@ enum wb_stat_item {
 	WB_WRITEBACK,
 	WB_METADATA_DIRTY_BYTES,
 	WB_METADATA_WRITEBACK_BYTES,
-	WB_DIRTIED,
-	WB_WRITTEN,
+	WB_DIRTIED_BYTES,
+	WB_WRITTEN_BYTES,
 	NR_WB_STAT_ITEMS
 };
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 089acf6..742238a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -113,7 +113,7 @@ static inline s64 wb_stat_sum(struct bdi_writeback *wb, enum wb_stat_item item)
 	return sum;
 }
 
-extern void wb_writeout_inc(struct bdi_writeback *wb);
+extern void wb_writeout_inc(struct bdi_writeback *wb, long bytes);
 
 /*
  * maximal error of a stat counter.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d76f432..f0695b0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -77,8 +77,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BdiDirtyThresh:     %10lu kB\n"
 		   "DirtyThresh:        %10lu kB\n"
 		   "BackgroundThresh:   %10lu kB\n"
-		   "BdiDirtied:         %10lu kB\n"
-		   "BdiWritten:         %10lu kB\n"
+		   "BdiDirtiedBytes:    %10lu kB\n"
+		   "BdiWrittenBytes:    %10lu kB\n"
 		   "BdiMetadataDirty:   %10lu kB\n"
 		   "BdiMetaWriteback:	%10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
@@ -93,8 +93,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(wb_thresh),
 		   K(dirty_thresh),
 		   K(background_thresh),
-		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
-		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_DIRTIED_BYTES)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_WRITTEN_BYTES)),
 		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_DIRTY_BYTES)),
 		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES)),
 		   (unsigned long) K(wb->write_bandwidth),
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 423d2f5..6d08673 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -624,11 +624,11 @@ static void wb_domain_writeout_inc(struct wb_domain *dom,
  * Increment @wb's writeout completion count and the global writeout
  * completion count. Called from test_clear_page_writeback().
  */
-static inline void __wb_writeout_inc(struct bdi_writeback *wb)
+static inline void __wb_writeout_inc(struct bdi_writeback *wb, long bytes)
 {
 	struct wb_domain *cgdom;
 
-	__inc_wb_stat(wb, WB_WRITTEN);
+	__add_wb_stat(wb, WB_WRITTEN_BYTES, bytes);
 	wb_domain_writeout_inc(&global_wb_domain, &wb->completions,
 			       wb->bdi->max_prop_frac);
 
@@ -638,12 +638,12 @@ static inline void __wb_writeout_inc(struct bdi_writeback *wb)
 				       wb->bdi->max_prop_frac);
 }
 
-void wb_writeout_inc(struct bdi_writeback *wb)
+void wb_writeout_inc(struct bdi_writeback *wb, long bytes)
 {
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__wb_writeout_inc(wb);
+	__wb_writeout_inc(wb, bytes);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(wb_writeout_inc);
@@ -1390,8 +1390,8 @@ static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc,
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
 
-	dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED]);
-	written = percpu_counter_read(&wb->stat[WB_WRITTEN]);
+	dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED_BYTES]);
+	written = percpu_counter_read(&wb->stat[WB_WRITTEN_BYTES]);
 
 	/*
 	 * Skip quiet periods when disk bandwidth is under-utilized.
@@ -2497,7 +2497,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
 		__inc_wb_stat(wb, WB_RECLAIMABLE);
-		__inc_wb_stat(wb, WB_DIRTIED);
+		__add_wb_stat(wb, WB_DIRTIED_BYTES, PAGE_SIZE);
 		task_io_account_write(PAGE_SIZE);
 		current->nr_dirtied++;
 		this_cpu_inc(bdp_ratelimits);
@@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
 	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
 			      bytes);
 	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
+	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
 	current->nr_dirtied++;
 	task_io_account_write(bytes);
 	this_cpu_inc(bdp_ratelimits);
@@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
 	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
 	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
 					 -bytes);
+	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(account_metadata_end_writeback);
@@ -2672,12 +2674,16 @@ void account_page_redirty(struct page *page)
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		struct inode *inode = mapping->host;
 		struct bdi_writeback *wb;
+		unsigned long flags;
 		bool locked;
 
 		wb = unlocked_inode_to_wb_begin(inode, &locked);
 		current->nr_dirtied--;
-		dec_node_page_state(page, NR_DIRTIED);
-		dec_wb_stat(wb, WB_DIRTIED);
+
+		local_irq_save(flags);
+		__dec_node_page_state(page, NR_DIRTIED);
+		__add_wb_stat(wb, WB_DIRTIED_BYTES, -PAGE_SIZE);
+		local_irq_restore(flags);
 		unlocked_inode_to_wb_end(inode, locked);
 	}
 }
@@ -2897,7 +2903,7 @@ int test_clear_page_writeback(struct page *page)
 				struct bdi_writeback *wb = inode_to_wb(inode);
 
 				__dec_wb_stat(wb, WB_WRITEBACK);
-				__wb_writeout_inc(wb);
+				__wb_writeout_inc(wb, PAGE_SIZE);
 			}
 		}
 
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

These are counters that constantly go up in order to do bandwidth calculations.
It isn't important what the units are in, as long as they are consistent between
the two of them, so convert them to count bytes written/dirtied, and allow the
metadata accounting stuff to change the counters as well.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/fuse/file.c                   |  4 ++--
 include/linux/backing-dev-defs.h |  4 ++--
 include/linux/backing-dev.h      |  2 +-
 mm/backing-dev.c                 |  8 ++++----
 mm/page-writeback.c              | 26 ++++++++++++++++----------
 5 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f394aff..3f5991e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++) {
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
 	}
 	wake_up(&fi->page_waitq);
 }
@@ -1770,7 +1770,7 @@ static bool fuse_writepage_in_flight(struct fuse_req *new_req,
 
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		dec_node_page_state(page, NR_WRITEBACK_TEMP);
-		wb_writeout_inc(&bdi->wb);
+		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
 		fuse_writepage_free(fc, new_req);
 		fuse_request_free(new_req);
 		goto out;
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 1a7c3c1..cef0f24 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -36,8 +36,8 @@ enum wb_stat_item {
 	WB_WRITEBACK,
 	WB_METADATA_DIRTY_BYTES,
 	WB_METADATA_WRITEBACK_BYTES,
-	WB_DIRTIED,
-	WB_WRITTEN,
+	WB_DIRTIED_BYTES,
+	WB_WRITTEN_BYTES,
 	NR_WB_STAT_ITEMS
 };
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 089acf6..742238a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -113,7 +113,7 @@ static inline s64 wb_stat_sum(struct bdi_writeback *wb, enum wb_stat_item item)
 	return sum;
 }
 
-extern void wb_writeout_inc(struct bdi_writeback *wb);
+extern void wb_writeout_inc(struct bdi_writeback *wb, long bytes);
 
 /*
  * maximal error of a stat counter.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d76f432..f0695b0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -77,8 +77,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BdiDirtyThresh:     %10lu kB\n"
 		   "DirtyThresh:        %10lu kB\n"
 		   "BackgroundThresh:   %10lu kB\n"
-		   "BdiDirtied:         %10lu kB\n"
-		   "BdiWritten:         %10lu kB\n"
+		   "BdiDirtiedBytes:    %10lu kB\n"
+		   "BdiWrittenBytes:    %10lu kB\n"
 		   "BdiMetadataDirty:   %10lu kB\n"
 		   "BdiMetaWriteback:	%10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
@@ -93,8 +93,8 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   K(wb_thresh),
 		   K(dirty_thresh),
 		   K(background_thresh),
-		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
-		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_DIRTIED_BYTES)),
+		   (unsigned long) BtoK(wb_stat(wb, WB_WRITTEN_BYTES)),
 		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_DIRTY_BYTES)),
 		   (unsigned long) BtoK(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES)),
 		   (unsigned long) K(wb->write_bandwidth),
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 423d2f5..6d08673 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -624,11 +624,11 @@ static void wb_domain_writeout_inc(struct wb_domain *dom,
  * Increment @wb's writeout completion count and the global writeout
  * completion count. Called from test_clear_page_writeback().
  */
-static inline void __wb_writeout_inc(struct bdi_writeback *wb)
+static inline void __wb_writeout_inc(struct bdi_writeback *wb, long bytes)
 {
 	struct wb_domain *cgdom;
 
-	__inc_wb_stat(wb, WB_WRITTEN);
+	__add_wb_stat(wb, WB_WRITTEN_BYTES, bytes);
 	wb_domain_writeout_inc(&global_wb_domain, &wb->completions,
 			       wb->bdi->max_prop_frac);
 
@@ -638,12 +638,12 @@ static inline void __wb_writeout_inc(struct bdi_writeback *wb)
 				       wb->bdi->max_prop_frac);
 }
 
-void wb_writeout_inc(struct bdi_writeback *wb)
+void wb_writeout_inc(struct bdi_writeback *wb, long bytes)
 {
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__wb_writeout_inc(wb);
+	__wb_writeout_inc(wb, bytes);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(wb_writeout_inc);
@@ -1390,8 +1390,8 @@ static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc,
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
 
-	dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED]);
-	written = percpu_counter_read(&wb->stat[WB_WRITTEN]);
+	dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED_BYTES]);
+	written = percpu_counter_read(&wb->stat[WB_WRITTEN_BYTES]);
 
 	/*
 	 * Skip quiet periods when disk bandwidth is under-utilized.
@@ -2497,7 +2497,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
 		__inc_wb_stat(wb, WB_RECLAIMABLE);
-		__inc_wb_stat(wb, WB_DIRTIED);
+		__add_wb_stat(wb, WB_DIRTIED_BYTES, PAGE_SIZE);
 		task_io_account_write(PAGE_SIZE);
 		current->nr_dirtied++;
 		this_cpu_inc(bdp_ratelimits);
@@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
 	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
 			      bytes);
 	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
+	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
 	current->nr_dirtied++;
 	task_io_account_write(bytes);
 	this_cpu_inc(bdp_ratelimits);
@@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
 	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
 	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
 					 -bytes);
+	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(account_metadata_end_writeback);
@@ -2672,12 +2674,16 @@ void account_page_redirty(struct page *page)
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		struct inode *inode = mapping->host;
 		struct bdi_writeback *wb;
+		unsigned long flags;
 		bool locked;
 
 		wb = unlocked_inode_to_wb_begin(inode, &locked);
 		current->nr_dirtied--;
-		dec_node_page_state(page, NR_DIRTIED);
-		dec_wb_stat(wb, WB_DIRTIED);
+
+		local_irq_save(flags);
+		__dec_node_page_state(page, NR_DIRTIED);
+		__add_wb_stat(wb, WB_DIRTIED_BYTES, -PAGE_SIZE);
+		local_irq_restore(flags);
 		unlocked_inode_to_wb_end(inode, locked);
 	}
 }
@@ -2897,7 +2903,7 @@ int test_clear_page_writeback(struct page *page)
 				struct bdi_writeback *wb = inode_to_wb(inode);
 
 				__dec_wb_stat(wb, WB_WRITEBACK);
-				__wb_writeout_inc(wb);
+				__wb_writeout_inc(wb, PAGE_SIZE);
 			}
 		}
 
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] writeback: introduce super_operations->write_metadata
  2016-09-20 20:57 ` Josef Bacik
  (?)
@ 2016-09-20 20:57   ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

Now that we have metadata counters in the VM, we need to provide a way to kick
writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
allows file systems to deal with writing back any dirty metadata we need based
on the writeback needs of the system.  Since there is no inode to key off of we
need a list in the bdi for dirty super blocks to be added.  From there we can
find any dirty sb's on the bdi we are currently doing writeback on and call into
their ->write_metadata callback.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
 fs/super.c                       |  7 ++++
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/fs.h               |  4 +++
 mm/backing-dev.c                 |  2 ++
 5 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index aafdb11..8cd072e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1464,6 +1464,31 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	return pages;
 }
 
+static long writeback_sb_metadata(struct super_block *sb,
+				  struct bdi_writeback *wb,
+				  struct wb_writeback_work *work)
+{
+	struct writeback_control wbc = {
+		.sync_mode		= work->sync_mode,
+		.tagged_writepages	= work->tagged_writepages,
+		.for_kupdate		= work->for_kupdate,
+		.for_background		= work->for_background,
+		.for_sync		= work->for_sync,
+		.range_cyclic		= work->range_cyclic,
+		.range_start		= 0,
+		.range_end		= LLONG_MAX,
+	};
+	long write_chunk;
+
+	write_chunk = writeback_chunk_size(wb, work);
+	wbc.nr_to_write = write_chunk;
+	sb->s_op->write_metadata(sb, &wbc);
+	work->nr_pages -= write_chunk - wbc.nr_to_write;
+
+	return write_chunk - wbc.nr_to_write;
+}
+
+
 /*
  * Write a portion of b_io inodes which belong to @sb.
  *
@@ -1490,6 +1515,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 	unsigned long start_time = jiffies;
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
+	bool done = false;
 
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1606,12 +1632,18 @@ static long writeback_sb_inodes(struct super_block *sb,
 		 * background threshold and other termination conditions.
 		 */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
 		}
 	}
+	if (!done && sb->s_op->write_metadata) {
+		spin_unlock(&wb->list_lock);
+		wrote += writeback_sb_metadata(sb, wb, work);
+		spin_unlock(&wb->list_lock);
+	}
 	return wrote;
 }
 
@@ -1620,6 +1652,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 {
 	unsigned long start_time = jiffies;
 	long wrote = 0;
+	bool done = false;
 
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1639,12 +1672,39 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 
 		/* refer to the same tests at the end of writeback_sb_inodes */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
 		}
 	}
+
+	if (!done && wb_stat(wb, WB_METADATA_DIRTY_BYTES)) {
+		LIST_HEAD(list);
+
+		spin_unlock(&wb->list_lock);
+		spin_lock(&wb->bdi->sb_list_lock);
+		list_splice_init(&wb->bdi->dirty_sb_list, &list);
+		while (!list_empty(&list)) {
+			struct super_block *sb;
+
+			sb = list_first_entry(&list, struct super_block,
+					      s_bdi_dirty_list);
+			list_move_tail(&sb->s_bdi_dirty_list,
+				       &wb->bdi->dirty_sb_list);
+			if (!sb->s_op->write_metadata)
+				continue;
+			if (!trylock_super(sb))
+				continue;
+			spin_unlock(&wb->bdi->sb_list_lock);
+			wrote += writeback_sb_metadata(sb, wb, work);
+			spin_lock(&wb->bdi->sb_list_lock);
+			up_read(&sb->s_umount);
+		}
+		spin_unlock(&wb->bdi->sb_list_lock);
+		spin_lock(&wb->list_lock);
+	}
 	/* Leave any unwritten inodes on b_io */
 	return wrote;
 }
diff --git a/fs/super.c b/fs/super.c
index c2ff475..eb32913 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -215,6 +215,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	spin_lock_init(&s->s_inode_list_lock);
 	INIT_LIST_HEAD(&s->s_inodes_wb);
 	spin_lock_init(&s->s_inode_wblist_lock);
+	INIT_LIST_HEAD(&s->s_bdi_dirty_list);
 
 	if (list_lru_init_memcg(&s->s_dentry_lru))
 		goto fail;
@@ -305,6 +306,8 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
+		struct backing_dev_info *bdi = s->s_bdi;
+
 		cleancache_invalidate_fs(s);
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
@@ -317,6 +320,10 @@ void deactivate_locked_super(struct super_block *s)
 		list_lru_destroy(&s->s_dentry_lru);
 		list_lru_destroy(&s->s_inode_lru);
 
+		spin_lock(&bdi->sb_list_lock);
+		list_del_init(&s->s_bdi_dirty_list);
+		spin_unlock(&bdi->sb_list_lock);
+
 		put_filesystem(fs);
 		put_super(s);
 	} else {
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index cef0f24..11cea63 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -168,6 +168,8 @@ struct backing_dev_info {
 
 	struct timer_list laptop_mode_wb_timer;
 
+	spinlock_t sb_list_lock;
+	struct list_head dirty_sb_list;
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *debug_dir;
 	struct dentry *debug_stats;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f3f0b4c8..f521187 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1430,6 +1430,8 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	struct list_head	s_bdi_dirty_list;
 };
 
 /* Helper functions so that in most cases filesystems will
@@ -1805,6 +1807,8 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*write_metadata)(struct super_block *sb,
+			       struct writeback_control *wbc);
 };
 
 /*
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index f0695b0..541f532 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -785,6 +785,8 @@ int bdi_init(struct backing_dev_info *bdi)
 	bdi->max_prop_frac = FPROP_FRAC_BASE;
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->wb_list);
+	INIT_LIST_HEAD(&bdi->dirty_sb_list);
+	spin_lock_init(&bdi->sb_list_lock);
 	init_waitqueue_head(&bdi->wb_waitq);
 
 	ret = cgwb_bdi_init(bdi);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] writeback: introduce super_operations->write_metadata
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

Now that we have metadata counters in the VM, we need to provide a way to kick
writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
allows file systems to deal with writing back any dirty metadata we need based
on the writeback needs of the system.  Since there is no inode to key off of we
need a list in the bdi for dirty super blocks to be added.  From there we can
find any dirty sb's on the bdi we are currently doing writeback on and call into
their ->write_metadata callback.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
 fs/super.c                       |  7 ++++
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/fs.h               |  4 +++
 mm/backing-dev.c                 |  2 ++
 5 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index aafdb11..8cd072e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1464,6 +1464,31 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	return pages;
 }
 
+static long writeback_sb_metadata(struct super_block *sb,
+				  struct bdi_writeback *wb,
+				  struct wb_writeback_work *work)
+{
+	struct writeback_control wbc = {
+		.sync_mode		= work->sync_mode,
+		.tagged_writepages	= work->tagged_writepages,
+		.for_kupdate		= work->for_kupdate,
+		.for_background		= work->for_background,
+		.for_sync		= work->for_sync,
+		.range_cyclic		= work->range_cyclic,
+		.range_start		= 0,
+		.range_end		= LLONG_MAX,
+	};
+	long write_chunk;
+
+	write_chunk = writeback_chunk_size(wb, work);
+	wbc.nr_to_write = write_chunk;
+	sb->s_op->write_metadata(sb, &wbc);
+	work->nr_pages -= write_chunk - wbc.nr_to_write;
+
+	return write_chunk - wbc.nr_to_write;
+}
+
+
 /*
  * Write a portion of b_io inodes which belong to @sb.
  *
@@ -1490,6 +1515,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 	unsigned long start_time = jiffies;
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
+	bool done = false;
 
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1606,12 +1632,18 @@ static long writeback_sb_inodes(struct super_block *sb,
 		 * background threshold and other termination conditions.
 		 */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
 		}
 	}
+	if (!done && sb->s_op->write_metadata) {
+		spin_unlock(&wb->list_lock);
+		wrote += writeback_sb_metadata(sb, wb, work);
+		spin_unlock(&wb->list_lock);
+	}
 	return wrote;
 }
 
@@ -1620,6 +1652,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 {
 	unsigned long start_time = jiffies;
 	long wrote = 0;
+	bool done = false;
 
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1639,12 +1672,39 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 
 		/* refer to the same tests at the end of writeback_sb_inodes */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
 		}
 	}
+
+	if (!done && wb_stat(wb, WB_METADATA_DIRTY_BYTES)) {
+		LIST_HEAD(list);
+
+		spin_unlock(&wb->list_lock);
+		spin_lock(&wb->bdi->sb_list_lock);
+		list_splice_init(&wb->bdi->dirty_sb_list, &list);
+		while (!list_empty(&list)) {
+			struct super_block *sb;
+
+			sb = list_first_entry(&list, struct super_block,
+					      s_bdi_dirty_list);
+			list_move_tail(&sb->s_bdi_dirty_list,
+				       &wb->bdi->dirty_sb_list);
+			if (!sb->s_op->write_metadata)
+				continue;
+			if (!trylock_super(sb))
+				continue;
+			spin_unlock(&wb->bdi->sb_list_lock);
+			wrote += writeback_sb_metadata(sb, wb, work);
+			spin_lock(&wb->bdi->sb_list_lock);
+			up_read(&sb->s_umount);
+		}
+		spin_unlock(&wb->bdi->sb_list_lock);
+		spin_lock(&wb->list_lock);
+	}
 	/* Leave any unwritten inodes on b_io */
 	return wrote;
 }
diff --git a/fs/super.c b/fs/super.c
index c2ff475..eb32913 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -215,6 +215,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	spin_lock_init(&s->s_inode_list_lock);
 	INIT_LIST_HEAD(&s->s_inodes_wb);
 	spin_lock_init(&s->s_inode_wblist_lock);
+	INIT_LIST_HEAD(&s->s_bdi_dirty_list);
 
 	if (list_lru_init_memcg(&s->s_dentry_lru))
 		goto fail;
@@ -305,6 +306,8 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
+		struct backing_dev_info *bdi = s->s_bdi;
+
 		cleancache_invalidate_fs(s);
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
@@ -317,6 +320,10 @@ void deactivate_locked_super(struct super_block *s)
 		list_lru_destroy(&s->s_dentry_lru);
 		list_lru_destroy(&s->s_inode_lru);
 
+		spin_lock(&bdi->sb_list_lock);
+		list_del_init(&s->s_bdi_dirty_list);
+		spin_unlock(&bdi->sb_list_lock);
+
 		put_filesystem(fs);
 		put_super(s);
 	} else {
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index cef0f24..11cea63 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -168,6 +168,8 @@ struct backing_dev_info {
 
 	struct timer_list laptop_mode_wb_timer;
 
+	spinlock_t sb_list_lock;
+	struct list_head dirty_sb_list;
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *debug_dir;
 	struct dentry *debug_stats;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f3f0b4c8..f521187 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1430,6 +1430,8 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	struct list_head	s_bdi_dirty_list;
 };
 
 /* Helper functions so that in most cases filesystems will
@@ -1805,6 +1807,8 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*write_metadata)(struct super_block *sb,
+			       struct writeback_control *wbc);
 };
 
 /*
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index f0695b0..541f532 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -785,6 +785,8 @@ int bdi_init(struct backing_dev_info *bdi)
 	bdi->max_prop_frac = FPROP_FRAC_BASE;
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->wb_list);
+	INIT_LIST_HEAD(&bdi->dirty_sb_list);
+	spin_lock_init(&bdi->sb_list_lock);
 	init_waitqueue_head(&bdi->wb_waitq);
 
 	ret = cgwb_bdi_init(bdi);
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] writeback: introduce super_operations->write_metadata
@ 2016-09-20 20:57   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-20 20:57 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

Now that we have metadata counters in the VM, we need to provide a way to kick
writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
allows file systems to deal with writing back any dirty metadata we need based
on the writeback needs of the system.  Since there is no inode to key off of we
need a list in the bdi for dirty super blocks to be added.  From there we can
find any dirty sb's on the bdi we are currently doing writeback on and call into
their ->write_metadata callback.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
 fs/super.c                       |  7 ++++
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/fs.h               |  4 +++
 mm/backing-dev.c                 |  2 ++
 5 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index aafdb11..8cd072e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1464,6 +1464,31 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	return pages;
 }
 
+static long writeback_sb_metadata(struct super_block *sb,
+				  struct bdi_writeback *wb,
+				  struct wb_writeback_work *work)
+{
+	struct writeback_control wbc = {
+		.sync_mode		= work->sync_mode,
+		.tagged_writepages	= work->tagged_writepages,
+		.for_kupdate		= work->for_kupdate,
+		.for_background		= work->for_background,
+		.for_sync		= work->for_sync,
+		.range_cyclic		= work->range_cyclic,
+		.range_start		= 0,
+		.range_end		= LLONG_MAX,
+	};
+	long write_chunk;
+
+	write_chunk = writeback_chunk_size(wb, work);
+	wbc.nr_to_write = write_chunk;
+	sb->s_op->write_metadata(sb, &wbc);
+	work->nr_pages -= write_chunk - wbc.nr_to_write;
+
+	return write_chunk - wbc.nr_to_write;
+}
+
+
 /*
  * Write a portion of b_io inodes which belong to @sb.
  *
@@ -1490,6 +1515,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 	unsigned long start_time = jiffies;
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
+	bool done = false;
 
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1606,12 +1632,18 @@ static long writeback_sb_inodes(struct super_block *sb,
 		 * background threshold and other termination conditions.
 		 */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
 		}
 	}
+	if (!done && sb->s_op->write_metadata) {
+		spin_unlock(&wb->list_lock);
+		wrote += writeback_sb_metadata(sb, wb, work);
+		spin_unlock(&wb->list_lock);
+	}
 	return wrote;
 }
 
@@ -1620,6 +1652,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 {
 	unsigned long start_time = jiffies;
 	long wrote = 0;
+	bool done = false;
 
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1639,12 +1672,39 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 
 		/* refer to the same tests at the end of writeback_sb_inodes */
 		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
+			if (time_is_before_jiffies(start_time + HZ / 10UL) ||
+			    work->nr_pages <= 0) {
+				done = true;
 				break;
+			}
 		}
 	}
+
+	if (!done && wb_stat(wb, WB_METADATA_DIRTY_BYTES)) {
+		LIST_HEAD(list);
+
+		spin_unlock(&wb->list_lock);
+		spin_lock(&wb->bdi->sb_list_lock);
+		list_splice_init(&wb->bdi->dirty_sb_list, &list);
+		while (!list_empty(&list)) {
+			struct super_block *sb;
+
+			sb = list_first_entry(&list, struct super_block,
+					      s_bdi_dirty_list);
+			list_move_tail(&sb->s_bdi_dirty_list,
+				       &wb->bdi->dirty_sb_list);
+			if (!sb->s_op->write_metadata)
+				continue;
+			if (!trylock_super(sb))
+				continue;
+			spin_unlock(&wb->bdi->sb_list_lock);
+			wrote += writeback_sb_metadata(sb, wb, work);
+			spin_lock(&wb->bdi->sb_list_lock);
+			up_read(&sb->s_umount);
+		}
+		spin_unlock(&wb->bdi->sb_list_lock);
+		spin_lock(&wb->list_lock);
+	}
 	/* Leave any unwritten inodes on b_io */
 	return wrote;
 }
diff --git a/fs/super.c b/fs/super.c
index c2ff475..eb32913 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -215,6 +215,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	spin_lock_init(&s->s_inode_list_lock);
 	INIT_LIST_HEAD(&s->s_inodes_wb);
 	spin_lock_init(&s->s_inode_wblist_lock);
+	INIT_LIST_HEAD(&s->s_bdi_dirty_list);
 
 	if (list_lru_init_memcg(&s->s_dentry_lru))
 		goto fail;
@@ -305,6 +306,8 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
+		struct backing_dev_info *bdi = s->s_bdi;
+
 		cleancache_invalidate_fs(s);
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
@@ -317,6 +320,10 @@ void deactivate_locked_super(struct super_block *s)
 		list_lru_destroy(&s->s_dentry_lru);
 		list_lru_destroy(&s->s_inode_lru);
 
+		spin_lock(&bdi->sb_list_lock);
+		list_del_init(&s->s_bdi_dirty_list);
+		spin_unlock(&bdi->sb_list_lock);
+
 		put_filesystem(fs);
 		put_super(s);
 	} else {
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index cef0f24..11cea63 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -168,6 +168,8 @@ struct backing_dev_info {
 
 	struct timer_list laptop_mode_wb_timer;
 
+	spinlock_t sb_list_lock;
+	struct list_head dirty_sb_list;
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *debug_dir;
 	struct dentry *debug_stats;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f3f0b4c8..f521187 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1430,6 +1430,8 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	struct list_head	s_bdi_dirty_list;
 };
 
 /* Helper functions so that in most cases filesystems will
@@ -1805,6 +1807,8 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*write_metadata)(struct super_block *sb,
+			       struct writeback_control *wbc);
 };
 
 /*
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index f0695b0..541f532 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -785,6 +785,8 @@ int bdi_init(struct backing_dev_info *bdi)
 	bdi->max_prop_frac = FPROP_FRAC_BASE;
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->wb_list);
+	INIT_LIST_HEAD(&bdi->dirty_sb_list);
+	spin_lock_init(&bdi->sb_list_lock);
 	init_waitqueue_head(&bdi->wb_waitq);
 
 	ret = cgwb_bdi_init(bdi);
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
  2016-09-20 20:57   ` Josef Bacik
@ 2016-09-22 11:18     ` Jan Kara
  -1 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2016-09-22 11:18 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On Tue 20-09-16 16:57:46, Josef Bacik wrote:
> Btrfs has no bounds except memory on the amount of dirty memory that we have in
> use for metadata.  Historically we have used a special inode so we could take
> advantage of the balance_dirty_pages throttling that comes with using pagecache.
> However as we'd like to support different blocksizes it would be nice to not
> have to rely on pagecache, but still get the balance_dirty_pages throttling
> without having to do it ourselves.
> 
> So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
> zone and bdi_writeback counters to keep track of how many bytes we have in
> flight for METADATA.  We need to count in bytes as blocksizes could be
> percentages of pagesize.  We simply convert the bytes to number of pages where
> it is needed for the throttling.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  arch/tile/mm/pgtable.c           |   3 +-
>  drivers/base/node.c              |   6 ++
>  fs/fs-writeback.c                |   2 +
>  fs/proc/meminfo.c                |   5 ++
>  include/linux/backing-dev-defs.h |   2 +
>  include/linux/mm.h               |   9 +++
>  include/linux/mmzone.h           |   2 +
>  include/trace/events/writeback.h |  13 +++-
>  mm/backing-dev.c                 |   5 ++
>  mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
>  mm/page_alloc.c                  |  16 +++-
>  mm/vmscan.c                      |   4 +-
>  12 files changed, 200 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
> index 7cc6ee7..9543468 100644
> --- a/arch/tile/mm/pgtable.c
> +++ b/arch/tile/mm/pgtable.c
> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>  {
>  	struct zone *zone;
>  
> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>  		global_node_page_state(NR_ACTIVE_FILE)),
>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>  		global_node_page_state(NR_INACTIVE_FILE)),
>  	       global_node_page_state(NR_FILE_DIRTY),
> +	       global_node_page_state(NR_METADATA_DIRTY),

Leftover from previous version? Ah, it is tile architecture so I see how it
could have passed testing ;)

> @@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>  	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>  	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>  	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
> +	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
> +		     PAGE_SHIFT);
> +	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
> +		     PAGE_SHIFT);
>  
>  	return nr_pages <= limit;
>  }

I still don't think this is correct. It currently achieves the same
behavior as before the patch but once you start accounting something else
than pagecache pages into these counters, things will go wrong. This
function is used to control distribution of pagecache pages among NUMA
nodes and as such it should IMHO only account for pagecache pages...

> @@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>  
>  	/* If we can't clean pages, remove dirty pages from consideration */
>  	if (!(node_reclaim_mode & RECLAIM_WRITE))
> -		delta += node_page_state(pgdat, NR_FILE_DIRTY);
> +		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
> +			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
> +			 PAGE_SHIFT);
>  
>  	/* Watch for any possible underflows due to delta */
>  	if (unlikely(delta > nr_pagecache_reclaimable))

The same comment as above applies here.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
@ 2016-09-22 11:18     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2016-09-22 11:18 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On Tue 20-09-16 16:57:46, Josef Bacik wrote:
> Btrfs has no bounds except memory on the amount of dirty memory that we have in
> use for metadata.  Historically we have used a special inode so we could take
> advantage of the balance_dirty_pages throttling that comes with using pagecache.
> However as we'd like to support different blocksizes it would be nice to not
> have to rely on pagecache, but still get the balance_dirty_pages throttling
> without having to do it ourselves.
> 
> So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
> zone and bdi_writeback counters to keep track of how many bytes we have in
> flight for METADATA.  We need to count in bytes as blocksizes could be
> percentages of pagesize.  We simply convert the bytes to number of pages where
> it is needed for the throttling.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  arch/tile/mm/pgtable.c           |   3 +-
>  drivers/base/node.c              |   6 ++
>  fs/fs-writeback.c                |   2 +
>  fs/proc/meminfo.c                |   5 ++
>  include/linux/backing-dev-defs.h |   2 +
>  include/linux/mm.h               |   9 +++
>  include/linux/mmzone.h           |   2 +
>  include/trace/events/writeback.h |  13 +++-
>  mm/backing-dev.c                 |   5 ++
>  mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
>  mm/page_alloc.c                  |  16 +++-
>  mm/vmscan.c                      |   4 +-
>  12 files changed, 200 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
> index 7cc6ee7..9543468 100644
> --- a/arch/tile/mm/pgtable.c
> +++ b/arch/tile/mm/pgtable.c
> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>  {
>  	struct zone *zone;
>  
> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>  		global_node_page_state(NR_ACTIVE_FILE)),
>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>  		global_node_page_state(NR_INACTIVE_FILE)),
>  	       global_node_page_state(NR_FILE_DIRTY),
> +	       global_node_page_state(NR_METADATA_DIRTY),

Leftover from previous version? Ah, it is tile architecture so I see how it
could have passed testing ;)

> @@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>  	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>  	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>  	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
> +	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
> +		     PAGE_SHIFT);
> +	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
> +		     PAGE_SHIFT);
>  
>  	return nr_pages <= limit;
>  }

I still don't think this is correct. It currently achieves the same
behavior as before the patch but once you start accounting something else
than pagecache pages into these counters, things will go wrong. This
function is used to control distribution of pagecache pages among NUMA
nodes and as such it should IMHO only account for pagecache pages...

> @@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>  
>  	/* If we can't clean pages, remove dirty pages from consideration */
>  	if (!(node_reclaim_mode & RECLAIM_WRITE))
> -		delta += node_page_state(pgdat, NR_FILE_DIRTY);
> +		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
> +			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
> +			 PAGE_SHIFT);
>  
>  	/* Watch for any possible underflows due to delta */
>  	if (unlikely(delta > nr_pagecache_reclaimable))

The same comment as above applies here.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
  2016-09-20 20:57   ` Josef Bacik
@ 2016-09-22 11:34     ` Jan Kara
  -1 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2016-09-22 11:34 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On Tue 20-09-16 16:57:47, Josef Bacik wrote:
> These are counters that constantly go up in order to do bandwidth calculations.
> It isn't important what the units are in, as long as they are consistent between
> the two of them, so convert them to count bytes written/dirtied, and allow the
> metadata accounting stuff to change the counters as well.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/fuse/file.c                   |  4 ++--
>  include/linux/backing-dev-defs.h |  4 ++--
>  include/linux/backing-dev.h      |  2 +-
>  mm/backing-dev.c                 |  8 ++++----
>  mm/page-writeback.c              | 26 ++++++++++++++++----------
>  5 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index f394aff..3f5991e 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>  	for (i = 0; i < req->num_pages; i++) {
>  		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
>  		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
> -		wb_writeout_inc(&bdi->wb);
> +		wb_writeout_inc(&bdi->wb, PAGE_SIZE);

Nitpick: Rename this to wb_writeout_add()? You have to change all the call
sites anyway and it is more consistent with other naming.

> @@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
>  			      bytes);
>  	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
> +	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
>  	current->nr_dirtied++;
>  	task_io_account_write(bytes);
>  	this_cpu_inc(bdp_ratelimits);
> @@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
>  	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
>  					 -bytes);
> +	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
>  	local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL(account_metadata_end_writeback);

It seems it would make sense to move this patch to be second in the
series so that above two functions could do the right thing from the
beginning.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
@ 2016-09-22 11:34     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2016-09-22 11:34 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On Tue 20-09-16 16:57:47, Josef Bacik wrote:
> These are counters that constantly go up in order to do bandwidth calculations.
> It isn't important what the units are in, as long as they are consistent between
> the two of them, so convert them to count bytes written/dirtied, and allow the
> metadata accounting stuff to change the counters as well.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/fuse/file.c                   |  4 ++--
>  include/linux/backing-dev-defs.h |  4 ++--
>  include/linux/backing-dev.h      |  2 +-
>  mm/backing-dev.c                 |  8 ++++----
>  mm/page-writeback.c              | 26 ++++++++++++++++----------
>  5 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index f394aff..3f5991e 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>  	for (i = 0; i < req->num_pages; i++) {
>  		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
>  		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
> -		wb_writeout_inc(&bdi->wb);
> +		wb_writeout_inc(&bdi->wb, PAGE_SIZE);

Nitpick: Rename this to wb_writeout_add()? You have to change all the call
sites anyway and it is more consistent with other naming.

> @@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
>  			      bytes);
>  	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
> +	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
>  	current->nr_dirtied++;
>  	task_io_account_write(bytes);
>  	this_cpu_inc(bdp_ratelimits);
> @@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
>  	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
>  					 -bytes);
> +	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
>  	local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL(account_metadata_end_writeback);

It seems it would make sense to move this patch to be second in the
series so that above two functions could do the right thing from the
beginning.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] writeback: introduce super_operations->write_metadata
  2016-09-20 20:57   ` Josef Bacik
@ 2016-09-22 11:48     ` Jan Kara
  -1 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2016-09-22 11:48 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On Tue 20-09-16 16:57:48, Josef Bacik wrote:
> Now that we have metadata counters in the VM, we need to provide a way to kick
> writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
> allows file systems to deal with writing back any dirty metadata we need based
> on the writeback needs of the system.  Since there is no inode to key off of we
> need a list in the bdi for dirty super blocks to be added.  From there we can
> find any dirty sb's on the bdi we are currently doing writeback on and call into
> their ->write_metadata callback.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
>  fs/super.c                       |  7 ++++
>  include/linux/backing-dev-defs.h |  2 ++
>  include/linux/fs.h               |  4 +++
>  mm/backing-dev.c                 |  2 ++
>  5 files changed, 81 insertions(+), 6 deletions(-)
> 

...

> +	if (!done && sb->s_op->write_metadata) {
> +		spin_unlock(&wb->list_lock);
> +		wrote += writeback_sb_metadata(sb, wb, work);
> +		spin_unlock(&wb->list_lock);
		^^^
		spin_lock();

Otherwise the patch looks good to me. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

after fixing the above.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] writeback: introduce super_operations->write_metadata
@ 2016-09-22 11:48     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2016-09-22 11:48 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On Tue 20-09-16 16:57:48, Josef Bacik wrote:
> Now that we have metadata counters in the VM, we need to provide a way to kick
> writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
> allows file systems to deal with writing back any dirty metadata we need based
> on the writeback needs of the system.  Since there is no inode to key off of we
> need a list in the bdi for dirty super blocks to be added.  From there we can
> find any dirty sb's on the bdi we are currently doing writeback on and call into
> their ->write_metadata callback.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
>  fs/super.c                       |  7 ++++
>  include/linux/backing-dev-defs.h |  2 ++
>  include/linux/fs.h               |  4 +++
>  mm/backing-dev.c                 |  2 ++
>  5 files changed, 81 insertions(+), 6 deletions(-)
> 

...

> +	if (!done && sb->s_op->write_metadata) {
> +		spin_unlock(&wb->list_lock);
> +		wrote += writeback_sb_metadata(sb, wb, work);
> +		spin_unlock(&wb->list_lock);
		^^^
		spin_lock();

Otherwise the patch looks good to me. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

after fixing the above.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
  2016-09-22 11:18     ` Jan Kara
  (?)
@ 2016-09-22 13:34       ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:18 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:46, Josef Bacik wrote:
>> Btrfs has no bounds except memory on the amount of dirty memory that we have in
>> use for metadata.  Historically we have used a special inode so we could take
>> advantage of the balance_dirty_pages throttling that comes with using pagecache.
>> However as we'd like to support different blocksizes it would be nice to not
>> have to rely on pagecache, but still get the balance_dirty_pages throttling
>> without having to do it ourselves.
>>
>> So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
>> zone and bdi_writeback counters to keep track of how many bytes we have in
>> flight for METADATA.  We need to count in bytes as blocksizes could be
>> percentages of pagesize.  We simply convert the bytes to number of pages where
>> it is needed for the throttling.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  arch/tile/mm/pgtable.c           |   3 +-
>>  drivers/base/node.c              |   6 ++
>>  fs/fs-writeback.c                |   2 +
>>  fs/proc/meminfo.c                |   5 ++
>>  include/linux/backing-dev-defs.h |   2 +
>>  include/linux/mm.h               |   9 +++
>>  include/linux/mmzone.h           |   2 +
>>  include/trace/events/writeback.h |  13 +++-
>>  mm/backing-dev.c                 |   5 ++
>>  mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
>>  mm/page_alloc.c                  |  16 +++-
>>  mm/vmscan.c                      |   4 +-
>>  12 files changed, 200 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
>> index 7cc6ee7..9543468 100644
>> --- a/arch/tile/mm/pgtable.c
>> +++ b/arch/tile/mm/pgtable.c
>> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>>  {
>>  	struct zone *zone;
>>
>> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>>  		global_node_page_state(NR_ACTIVE_FILE)),
>>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>>  		global_node_page_state(NR_INACTIVE_FILE)),
>>  	       global_node_page_state(NR_FILE_DIRTY),
>> +	       global_node_page_state(NR_METADATA_DIRTY),
>
> Leftover from previous version? Ah, it is tile architecture so I see how it
> could have passed testing ;)
>

Ah now I understand the kbuild error I got, oops ;)

>> @@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>>  	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>>  	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>>  	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
>> +	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
>> +		     PAGE_SHIFT);
>> +	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
>> +		     PAGE_SHIFT);
>>
>>  	return nr_pages <= limit;
>>  }
>
> I still don't think this is correct. It currently achieves the same
> behavior as before the patch but once you start accounting something else
> than pagecache pages into these counters, things will go wrong. This
> function is used to control distribution of pagecache pages among NUMA
> nodes and as such it should IMHO only account for pagecache pages...
>
>> @@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>>
>>  	/* If we can't clean pages, remove dirty pages from consideration */
>>  	if (!(node_reclaim_mode & RECLAIM_WRITE))
>> -		delta += node_page_state(pgdat, NR_FILE_DIRTY);
>> +		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
>> +			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
>> +			 PAGE_SHIFT);
>>
>>  	/* Watch for any possible underflows due to delta */
>>  	if (unlikely(delta > nr_pagecache_reclaimable))
>
> The same comment as above applies here.
>

Ok that sounds reasonable, I'll make this change.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
@ 2016-09-22 13:34       ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:18 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:46, Josef Bacik wrote:
>> Btrfs has no bounds except memory on the amount of dirty memory that we have in
>> use for metadata.  Historically we have used a special inode so we could take
>> advantage of the balance_dirty_pages throttling that comes with using pagecache.
>> However as we'd like to support different blocksizes it would be nice to not
>> have to rely on pagecache, but still get the balance_dirty_pages throttling
>> without having to do it ourselves.
>>
>> So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
>> zone and bdi_writeback counters to keep track of how many bytes we have in
>> flight for METADATA.  We need to count in bytes as blocksizes could be
>> percentages of pagesize.  We simply convert the bytes to number of pages where
>> it is needed for the throttling.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  arch/tile/mm/pgtable.c           |   3 +-
>>  drivers/base/node.c              |   6 ++
>>  fs/fs-writeback.c                |   2 +
>>  fs/proc/meminfo.c                |   5 ++
>>  include/linux/backing-dev-defs.h |   2 +
>>  include/linux/mm.h               |   9 +++
>>  include/linux/mmzone.h           |   2 +
>>  include/trace/events/writeback.h |  13 +++-
>>  mm/backing-dev.c                 |   5 ++
>>  mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
>>  mm/page_alloc.c                  |  16 +++-
>>  mm/vmscan.c                      |   4 +-
>>  12 files changed, 200 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
>> index 7cc6ee7..9543468 100644
>> --- a/arch/tile/mm/pgtable.c
>> +++ b/arch/tile/mm/pgtable.c
>> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>>  {
>>  	struct zone *zone;
>>
>> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>>  		global_node_page_state(NR_ACTIVE_FILE)),
>>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>>  		global_node_page_state(NR_INACTIVE_FILE)),
>>  	       global_node_page_state(NR_FILE_DIRTY),
>> +	       global_node_page_state(NR_METADATA_DIRTY),
>
> Leftover from previous version? Ah, it is tile architecture so I see how it
> could have passed testing ;)
>

Ah now I understand the kbuild error I got, oops ;)

>> @@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>>  	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>>  	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>>  	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
>> +	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
>> +		     PAGE_SHIFT);
>> +	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
>> +		     PAGE_SHIFT);
>>
>>  	return nr_pages <= limit;
>>  }
>
> I still don't think this is correct. It currently achieves the same
> behavior as before the patch but once you start accounting something else
> than pagecache pages into these counters, things will go wrong. This
> function is used to control distribution of pagecache pages among NUMA
> nodes and as such it should IMHO only account for pagecache pages...
>
>> @@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>>
>>  	/* If we can't clean pages, remove dirty pages from consideration */
>>  	if (!(node_reclaim_mode & RECLAIM_WRITE))
>> -		delta += node_page_state(pgdat, NR_FILE_DIRTY);
>> +		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
>> +			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
>> +			 PAGE_SHIFT);
>>
>>  	/* Watch for any possible underflows due to delta */
>>  	if (unlikely(delta > nr_pagecache_reclaimable))
>
> The same comment as above applies here.
>

Ok that sounds reasonable, I'll make this change.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
@ 2016-09-22 13:34       ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:18 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:46, Josef Bacik wrote:
>> Btrfs has no bounds except memory on the amount of dirty memory that we have in
>> use for metadata.  Historically we have used a special inode so we could take
>> advantage of the balance_dirty_pages throttling that comes with using pagecache.
>> However as we'd like to support different blocksizes it would be nice to not
>> have to rely on pagecache, but still get the balance_dirty_pages throttling
>> without having to do it ourselves.
>>
>> So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
>> zone and bdi_writeback counters to keep track of how many bytes we have in
>> flight for METADATA.  We need to count in bytes as blocksizes could be
>> percentages of pagesize.  We simply convert the bytes to number of pages where
>> it is needed for the throttling.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  arch/tile/mm/pgtable.c           |   3 +-
>>  drivers/base/node.c              |   6 ++
>>  fs/fs-writeback.c                |   2 +
>>  fs/proc/meminfo.c                |   5 ++
>>  include/linux/backing-dev-defs.h |   2 +
>>  include/linux/mm.h               |   9 +++
>>  include/linux/mmzone.h           |   2 +
>>  include/trace/events/writeback.h |  13 +++-
>>  mm/backing-dev.c                 |   5 ++
>>  mm/page-writeback.c              | 157 +++++++++++++++++++++++++++++++++++----
>>  mm/page_alloc.c                  |  16 +++-
>>  mm/vmscan.c                      |   4 +-
>>  12 files changed, 200 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
>> index 7cc6ee7..9543468 100644
>> --- a/arch/tile/mm/pgtable.c
>> +++ b/arch/tile/mm/pgtable.c
>> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>>  {
>>  	struct zone *zone;
>>
>> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>>  		global_node_page_state(NR_ACTIVE_FILE)),
>>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>>  		global_node_page_state(NR_INACTIVE_FILE)),
>>  	       global_node_page_state(NR_FILE_DIRTY),
>> +	       global_node_page_state(NR_METADATA_DIRTY),
>
> Leftover from previous version? Ah, it is tile architecture so I see how it
> could have passed testing ;)
>

Ah now I understand the kbuild error I got, oops ;)

>> @@ -506,6 +530,10 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>>  	nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>>  	nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>>  	nr_pages += node_page_state(pgdat, NR_WRITEBACK);
>> +	nr_pages += (node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
>> +		     PAGE_SHIFT);
>> +	nr_pages += (node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES) >>
>> +		     PAGE_SHIFT);
>>
>>  	return nr_pages <= limit;
>>  }
>
> I still don't think this is correct. It currently achieves the same
> behavior as before the patch but once you start accounting something else
> than pagecache pages into these counters, things will go wrong. This
> function is used to control distribution of pagecache pages among NUMA
> nodes and as such it should IMHO only account for pagecache pages...
>
>> @@ -3714,7 +3714,9 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>>
>>  	/* If we can't clean pages, remove dirty pages from consideration */
>>  	if (!(node_reclaim_mode & RECLAIM_WRITE))
>> -		delta += node_page_state(pgdat, NR_FILE_DIRTY);
>> +		delta += node_page_state(pgdat, NR_FILE_DIRTY) +
>> +			(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) >>
>> +			 PAGE_SHIFT);
>>
>>  	/* Watch for any possible underflows due to delta */
>>  	if (unlikely(delta > nr_pagecache_reclaimable))
>
> The same comment as above applies here.
>

Ok that sounds reasonable, I'll make this change.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
  2016-09-22 11:34     ` Jan Kara
  (?)
@ 2016-09-22 13:35       ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:34 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:47, Josef Bacik wrote:
>> These are counters that constantly go up in order to do bandwidth calculations.
>> It isn't important what the units are in, as long as they are consistent between
>> the two of them, so convert them to count bytes written/dirtied, and allow the
>> metadata accounting stuff to change the counters as well.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  fs/fuse/file.c                   |  4 ++--
>>  include/linux/backing-dev-defs.h |  4 ++--
>>  include/linux/backing-dev.h      |  2 +-
>>  mm/backing-dev.c                 |  8 ++++----
>>  mm/page-writeback.c              | 26 ++++++++++++++++----------
>>  5 files changed, 25 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index f394aff..3f5991e 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>>  	for (i = 0; i < req->num_pages; i++) {
>>  		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
>>  		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
>> -		wb_writeout_inc(&bdi->wb);
>> +		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
>
> Nitpick: Rename this to wb_writeout_add()? You have to change all the call
> sites anyway and it is more consistent with other naming.
>
>> @@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
>>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
>>  			      bytes);
>>  	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
>> +	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
>>  	current->nr_dirtied++;
>>  	task_io_account_write(bytes);
>>  	this_cpu_inc(bdp_ratelimits);
>> @@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
>>  	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
>>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
>>  					 -bytes);
>> +	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
>>  	local_irq_restore(flags);
>>  }
>>  EXPORT_SYMBOL(account_metadata_end_writeback);
>
> It seems it would make sense to move this patch to be second in the
> series so that above two functions could do the right thing from the
> beginning.
>

Yup that's fair, I'll move this around and rename wb_writeout_inc.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
@ 2016-09-22 13:35       ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:34 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:47, Josef Bacik wrote:
>> These are counters that constantly go up in order to do bandwidth calculations.
>> It isn't important what the units are in, as long as they are consistent between
>> the two of them, so convert them to count bytes written/dirtied, and allow the
>> metadata accounting stuff to change the counters as well.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  fs/fuse/file.c                   |  4 ++--
>>  include/linux/backing-dev-defs.h |  4 ++--
>>  include/linux/backing-dev.h      |  2 +-
>>  mm/backing-dev.c                 |  8 ++++----
>>  mm/page-writeback.c              | 26 ++++++++++++++++----------
>>  5 files changed, 25 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index f394aff..3f5991e 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>>  	for (i = 0; i < req->num_pages; i++) {
>>  		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
>>  		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
>> -		wb_writeout_inc(&bdi->wb);
>> +		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
>
> Nitpick: Rename this to wb_writeout_add()? You have to change all the call
> sites anyway and it is more consistent with other naming.
>
>> @@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
>>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
>>  			      bytes);
>>  	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
>> +	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
>>  	current->nr_dirtied++;
>>  	task_io_account_write(bytes);
>>  	this_cpu_inc(bdp_ratelimits);
>> @@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
>>  	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
>>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
>>  					 -bytes);
>> +	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
>>  	local_irq_restore(flags);
>>  }
>>  EXPORT_SYMBOL(account_metadata_end_writeback);
>
> It seems it would make sense to move this patch to be second in the
> series so that above two functions could do the right thing from the
> beginning.
>

Yup that's fair, I'll move this around and rename wb_writeout_inc.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes
@ 2016-09-22 13:35       ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:34 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:47, Josef Bacik wrote:
>> These are counters that constantly go up in order to do bandwidth calculations.
>> It isn't important what the units are in, as long as they are consistent between
>> the two of them, so convert them to count bytes written/dirtied, and allow the
>> metadata accounting stuff to change the counters as well.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  fs/fuse/file.c                   |  4 ++--
>>  include/linux/backing-dev-defs.h |  4 ++--
>>  include/linux/backing-dev.h      |  2 +-
>>  mm/backing-dev.c                 |  8 ++++----
>>  mm/page-writeback.c              | 26 ++++++++++++++++----------
>>  5 files changed, 25 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index f394aff..3f5991e 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -1466,7 +1466,7 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>>  	for (i = 0; i < req->num_pages; i++) {
>>  		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
>>  		dec_node_page_state(req->pages[i], NR_WRITEBACK_TEMP);
>> -		wb_writeout_inc(&bdi->wb);
>> +		wb_writeout_inc(&bdi->wb, PAGE_SIZE);
>
> Nitpick: Rename this to wb_writeout_add()? You have to change all the call
> sites anyway and it is more consistent with other naming.
>
>> @@ -2523,6 +2523,7 @@ void account_metadata_dirtied(struct page *page, struct backing_dev_info *bdi,
>>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_DIRTY_BYTES,
>>  			      bytes);
>>  	__add_wb_stat(&bdi->wb, WB_METADATA_DIRTY_BYTES, bytes);
>> +	__add_wb_stat(&bdi->wb, WB_DIRTIED_BYTES, bytes);
>>  	current->nr_dirtied++;
>>  	task_io_account_write(bytes);
>>  	this_cpu_inc(bdp_ratelimits);
>> @@ -2593,6 +2594,7 @@ void account_metadata_end_writeback(struct page *page,
>>  	__add_wb_stat(&bdi->wb, WB_METADATA_WRITEBACK_BYTES, -bytes);
>>  	__mod_node_page_state(page_pgdat(page), NR_METADATA_WRITEBACK_BYTES,
>>  					 -bytes);
>> +	__add_wb_stat(&bdi->wb, WB_WRITTEN_BYTES, bytes);
>>  	local_irq_restore(flags);
>>  }
>>  EXPORT_SYMBOL(account_metadata_end_writeback);
>
> It seems it would make sense to move this patch to be second in the
> series so that above two functions could do the right thing from the
> beginning.
>

Yup that's fair, I'll move this around and rename wb_writeout_inc.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] writeback: introduce super_operations->write_metadata
  2016-09-22 11:48     ` Jan Kara
  (?)
@ 2016-09-22 13:36       ` Josef Bacik
  -1 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:48 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:48, Josef Bacik wrote:
>> Now that we have metadata counters in the VM, we need to provide a way to kick
>> writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
>> allows file systems to deal with writing back any dirty metadata we need based
>> on the writeback needs of the system.  Since there is no inode to key off of we
>> need a list in the bdi for dirty super blocks to be added.  From there we can
>> find any dirty sb's on the bdi we are currently doing writeback on and call into
>> their ->write_metadata callback.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
>>  fs/super.c                       |  7 ++++
>>  include/linux/backing-dev-defs.h |  2 ++
>>  include/linux/fs.h               |  4 +++
>>  mm/backing-dev.c                 |  2 ++
>>  5 files changed, 81 insertions(+), 6 deletions(-)
>>
>
> ...
>
>> +	if (!done && sb->s_op->write_metadata) {
>> +		spin_unlock(&wb->list_lock);
>> +		wrote += writeback_sb_metadata(sb, wb, work);
>> +		spin_unlock(&wb->list_lock);
> 		^^^
> 		spin_lock();
>
> Otherwise the patch looks good to me. So feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> after fixing the above.
>

Yup I hit this as soon as I started testing so I'll go ahead and add your 
reviewed-by.  I'll resend the whole series after these changes have actually 
gone through some testing since it seems you are happy with the overall 
direction.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] writeback: introduce super_operations->write_metadata
@ 2016-09-22 13:36       ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:48 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:48, Josef Bacik wrote:
>> Now that we have metadata counters in the VM, we need to provide a way to kick
>> writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
>> allows file systems to deal with writing back any dirty metadata we need based
>> on the writeback needs of the system.  Since there is no inode to key off of we
>> need a list in the bdi for dirty super blocks to be added.  From there we can
>> find any dirty sb's on the bdi we are currently doing writeback on and call into
>> their ->write_metadata callback.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
>>  fs/super.c                       |  7 ++++
>>  include/linux/backing-dev-defs.h |  2 ++
>>  include/linux/fs.h               |  4 +++
>>  mm/backing-dev.c                 |  2 ++
>>  5 files changed, 81 insertions(+), 6 deletions(-)
>>
>
> ...
>
>> +	if (!done && sb->s_op->write_metadata) {
>> +		spin_unlock(&wb->list_lock);
>> +		wrote += writeback_sb_metadata(sb, wb, work);
>> +		spin_unlock(&wb->list_lock);
> 		^^^
> 		spin_lock();
>
> Otherwise the patch looks good to me. So feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> after fixing the above.
>

Yup I hit this as soon as I started testing so I'll go ahead and add your 
reviewed-by.  I'll resend the whole series after these changes have actually 
gone through some testing since it seems you are happy with the overall 
direction.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] writeback: introduce super_operations->write_metadata
@ 2016-09-22 13:36       ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-09-22 13:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm, hannes

On 09/22/2016 07:48 AM, Jan Kara wrote:
> On Tue 20-09-16 16:57:48, Josef Bacik wrote:
>> Now that we have metadata counters in the VM, we need to provide a way to kick
>> writeback on dirty metadata.  Introduce super_operations->write_metadata.  This
>> allows file systems to deal with writing back any dirty metadata we need based
>> on the writeback needs of the system.  Since there is no inode to key off of we
>> need a list in the bdi for dirty super blocks to be added.  From there we can
>> find any dirty sb's on the bdi we are currently doing writeback on and call into
>> their ->write_metadata callback.
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>> ---
>>  fs/fs-writeback.c                | 72 ++++++++++++++++++++++++++++++++++++----
>>  fs/super.c                       |  7 ++++
>>  include/linux/backing-dev-defs.h |  2 ++
>>  include/linux/fs.h               |  4 +++
>>  mm/backing-dev.c                 |  2 ++
>>  5 files changed, 81 insertions(+), 6 deletions(-)
>>
>
> ...
>
>> +	if (!done && sb->s_op->write_metadata) {
>> +		spin_unlock(&wb->list_lock);
>> +		wrote += writeback_sb_metadata(sb, wb, work);
>> +		spin_unlock(&wb->list_lock);
> 		^^^
> 		spin_lock();
>
> Otherwise the patch looks good to me. So feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> after fixing the above.
>

Yup I hit this as soon as I started testing so I'll go ahead and add your 
reviewed-by.  I'll resend the whole series after these changes have actually 
gone through some testing since it seems you are happy with the overall 
direction.  Thanks,

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
  2016-09-20 20:57   ` Josef Bacik
@ 2016-09-22 19:48     ` Johannes Weiner
  -1 siblings, 0 replies; 32+ messages in thread
From: Johannes Weiner @ 2016-09-22 19:48 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm

Hi Josef,

as we talked off line, I think the idea of maintaining a byte counter
and rounding in balance_dirty_pages() is the best way to do this. And
Jan spotted all the actual bugs, so I only have a few nitpicks :)

On Tue, Sep 20, 2016 at 04:57:46PM -0400, Josef Bacik wrote:
> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>  {
>  	struct zone *zone;
>  
> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>  		global_node_page_state(NR_ACTIVE_FILE)),
>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>  		global_node_page_state(NR_INACTIVE_FILE)),
>  	       global_node_page_state(NR_FILE_DIRTY),
> +	       global_node_page_state(NR_METADATA_DIRTY),
>  	       global_node_page_state(NR_WRITEBACK),

Print NR_METADATA_WRITEBACK here as well?

> @@ -51,6 +51,8 @@ static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
>  static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
>  
>  #define K(x) ((x) << (PAGE_SHIFT - 10))
> +#define BtoK(x) ((x) >> 10)
> +
>  static ssize_t node_read_meminfo(struct device *dev,
>  			struct device_attribute *attr, char *buf)
>  {
> @@ -99,7 +101,9 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #endif
>  	n += sprintf(buf + n,
>  		       "Node %d Dirty:          %8lu kB\n"
> +		       "Node %d MetadataDirty:	%8lu kB\n"
>  		       "Node %d Writeback:      %8lu kB\n"
> +		       "Node %d MetaWriteback:  %8lu kB\n"

Between the enums and stat printing, the naming is kind of all over
the place. How about NR_META_DIRTY_BYTES and NR_META_WRITEBACK_BYTES
as a separate group than the existing dirty & writeback stats?

 	n += sprintf(buf + n,
 		       "Node %d Dirty:          %8lu kB\n"
 		       "Node %d Writeback:      %8lu kB\n"
+		       "Node %d MetaDirty:	%8lu kB\n"
+		       "Node %d MetaWriteback:  %8lu kB\n"

>  		       "Node %d FilePages:      %8lu kB\n"
>  		       "Node %d Mapped:         %8lu kB\n"
>  		       "Node %d AnonPages:      %8lu kB\n"
> @@ -119,7 +123,9 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #endif
>  			,
>  		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
> +		       nid, BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
>  		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
> +		       nid, BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
>  		       nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
>  		       nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
>  		       nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 56c8fda..aafdb11 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1801,6 +1801,7 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb)
>  	return work;
>  }
>  
> +#define BtoP(x) ((x) >> PAGE_SHIFT)

Might be more readable inline:

> @@ -1809,6 +1810,7 @@ static unsigned long get_nr_dirty_pages(void)
>  {
>  	return global_node_page_state(NR_FILE_DIRTY) +
>  		global_node_page_state(NR_UNSTABLE_NFS) +
> +		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)) +

		global_node_page_state(NR_META_DIRTY_BYTES) / PAGE_SIZE +

>  		get_nr_dirty_inodes();
>  }

> @@ -80,7 +81,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  		"SwapTotal:      %8lu kB\n"
>  		"SwapFree:       %8lu kB\n"
>  		"Dirty:          %8lu kB\n"
> +		"MetadataDirty:  %8lu kB\n"
>  		"Writeback:      %8lu kB\n"
> +		"MetaWriteback:  %8lu kB\n"

 		"Dirty:          %8lu kB\n"
 		"Writeback:      %8lu kB\n"
+		"MetaDirty:      %8lu kB\n"
+		"MetaWriteback:  %8lu kB\n"


>  		"AnonPages:      %8lu kB\n"
>  		"Mapped:         %8lu kB\n"
>  		"Shmem:          %8lu kB\n"
> @@ -139,7 +142,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  		K(i.totalswap),
>  		K(i.freeswap),
>  		K(global_node_page_state(NR_FILE_DIRTY)),
> +		BtoK(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
>  		K(global_node_page_state(NR_WRITEBACK)),
> +		BtoK(global_node_page_state(NR_META_WRITEBACK_BYTES)),

		K(global_node_page_state(NR_META_WRITEBACK_BYTES / PAGE_SIZE)),

and drop BtoK?

> @@ -34,6 +34,8 @@ typedef int (congested_fn)(void *, int);
>  enum wb_stat_item {
>  	WB_RECLAIMABLE,
>  	WB_WRITEBACK,
> +	WB_METADATA_DIRTY_BYTES,
> +	WB_METADATA_WRITEBACK_BYTES,

	WB_META_DIRTY_BYTES,
	WB_META_WRITEBACK_BYTES,

etc.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] writeback: allow for dirty metadata accounting
@ 2016-09-22 19:48     ` Johannes Weiner
  0 siblings, 0 replies; 32+ messages in thread
From: Johannes Weiner @ 2016-09-22 19:48 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, linux-fsdevel, kernel-team, jack, viro, dchinner,
	hch, linux-mm

Hi Josef,

as we talked off line, I think the idea of maintaining a byte counter
and rounding in balance_dirty_pages() is the best way to do this. And
Jan spotted all the actual bugs, so I only have a few nitpicks :)

On Tue, Sep 20, 2016 at 04:57:46PM -0400, Josef Bacik wrote:
> @@ -44,12 +44,13 @@ void show_mem(unsigned int filter)
>  {
>  	struct zone *zone;
>  
> -	pr_err("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
> +	pr_err("Active:%lu inactive:%lu dirty:%lu metadata_dirty:%lu writeback:%lu unstable:%lu free:%lu\n slab:%lu mapped:%lu pagetables:%lu bounce:%lu pagecache:%lu swap:%lu\n",
>  	       (global_node_page_state(NR_ACTIVE_ANON) +
>  		global_node_page_state(NR_ACTIVE_FILE)),
>  	       (global_node_page_state(NR_INACTIVE_ANON) +
>  		global_node_page_state(NR_INACTIVE_FILE)),
>  	       global_node_page_state(NR_FILE_DIRTY),
> +	       global_node_page_state(NR_METADATA_DIRTY),
>  	       global_node_page_state(NR_WRITEBACK),

Print NR_METADATA_WRITEBACK here as well?

> @@ -51,6 +51,8 @@ static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
>  static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
>  
>  #define K(x) ((x) << (PAGE_SHIFT - 10))
> +#define BtoK(x) ((x) >> 10)
> +
>  static ssize_t node_read_meminfo(struct device *dev,
>  			struct device_attribute *attr, char *buf)
>  {
> @@ -99,7 +101,9 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #endif
>  	n += sprintf(buf + n,
>  		       "Node %d Dirty:          %8lu kB\n"
> +		       "Node %d MetadataDirty:	%8lu kB\n"
>  		       "Node %d Writeback:      %8lu kB\n"
> +		       "Node %d MetaWriteback:  %8lu kB\n"

Between the enums and stat printing, the naming is kind of all over
the place. How about NR_META_DIRTY_BYTES and NR_META_WRITEBACK_BYTES
as a separate group than the existing dirty & writeback stats?

 	n += sprintf(buf + n,
 		       "Node %d Dirty:          %8lu kB\n"
 		       "Node %d Writeback:      %8lu kB\n"
+		       "Node %d MetaDirty:	%8lu kB\n"
+		       "Node %d MetaWriteback:  %8lu kB\n"

>  		       "Node %d FilePages:      %8lu kB\n"
>  		       "Node %d Mapped:         %8lu kB\n"
>  		       "Node %d AnonPages:      %8lu kB\n"
> @@ -119,7 +123,9 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #endif
>  			,
>  		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
> +		       nid, BtoK(node_page_state(pgdat, NR_METADATA_DIRTY_BYTES)),
>  		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
> +		       nid, BtoK(node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES)),
>  		       nid, K(node_page_state(pgdat, NR_FILE_PAGES)),
>  		       nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
>  		       nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 56c8fda..aafdb11 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1801,6 +1801,7 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb)
>  	return work;
>  }
>  
> +#define BtoP(x) ((x) >> PAGE_SHIFT)

Might be more readable inline:

> @@ -1809,6 +1810,7 @@ static unsigned long get_nr_dirty_pages(void)
>  {
>  	return global_node_page_state(NR_FILE_DIRTY) +
>  		global_node_page_state(NR_UNSTABLE_NFS) +
> +		BtoP(global_node_page_state(NR_METADATA_DIRTY_BYTES)) +

		global_node_page_state(NR_META_DIRTY_BYTES) / PAGE_SIZE +

>  		get_nr_dirty_inodes();
>  }

> @@ -80,7 +81,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  		"SwapTotal:      %8lu kB\n"
>  		"SwapFree:       %8lu kB\n"
>  		"Dirty:          %8lu kB\n"
> +		"MetadataDirty:  %8lu kB\n"
>  		"Writeback:      %8lu kB\n"
> +		"MetaWriteback:  %8lu kB\n"

 		"Dirty:          %8lu kB\n"
 		"Writeback:      %8lu kB\n"
+		"MetaDirty:      %8lu kB\n"
+		"MetaWriteback:  %8lu kB\n"


>  		"AnonPages:      %8lu kB\n"
>  		"Mapped:         %8lu kB\n"
>  		"Shmem:          %8lu kB\n"
> @@ -139,7 +142,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  		K(i.totalswap),
>  		K(i.freeswap),
>  		K(global_node_page_state(NR_FILE_DIRTY)),
> +		BtoK(global_node_page_state(NR_METADATA_DIRTY_BYTES)),
>  		K(global_node_page_state(NR_WRITEBACK)),
> +		BtoK(global_node_page_state(NR_META_WRITEBACK_BYTES)),

		K(global_node_page_state(NR_META_WRITEBACK_BYTES / PAGE_SIZE)),

and drop BtoK?

> @@ -34,6 +34,8 @@ typedef int (congested_fn)(void *, int);
>  enum wb_stat_item {
>  	WB_RECLAIMABLE,
>  	WB_WRITEBACK,
> +	WB_METADATA_DIRTY_BYTES,
> +	WB_METADATA_WRITEBACK_BYTES,

	WB_META_DIRTY_BYTES,
	WB_META_WRITEBACK_BYTES,

etc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-09-22 19:48 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-20 20:57 [PATCH 0/4][V3] metadata throttling in writeback patches Josef Bacik
2016-09-20 20:57 ` Josef Bacik
2016-09-20 20:57 ` Josef Bacik
2016-09-20 20:57 ` [PATCH 1/4] remove mapping from balance_dirty_pages*() Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-20 20:57 ` [PATCH 2/4] writeback: allow for dirty metadata accounting Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-22 11:18   ` Jan Kara
2016-09-22 11:18     ` Jan Kara
2016-09-22 13:34     ` Josef Bacik
2016-09-22 13:34       ` Josef Bacik
2016-09-22 13:34       ` Josef Bacik
2016-09-22 19:48   ` Johannes Weiner
2016-09-22 19:48     ` Johannes Weiner
2016-09-20 20:57 ` [PATCH 3/4] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-22 11:34   ` Jan Kara
2016-09-22 11:34     ` Jan Kara
2016-09-22 13:35     ` Josef Bacik
2016-09-22 13:35       ` Josef Bacik
2016-09-22 13:35       ` Josef Bacik
2016-09-20 20:57 ` [PATCH 4/4] writeback: introduce super_operations->write_metadata Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-20 20:57   ` Josef Bacik
2016-09-22 11:48   ` Jan Kara
2016-09-22 11:48     ` Jan Kara
2016-09-22 13:36     ` Josef Bacik
2016-09-22 13:36       ` Josef Bacik
2016-09-22 13:36       ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.