linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/9] Remove remaining parts of congestions tracking code.
@ 2022-01-27  2:46 NeilBrown
  2022-01-27  2:46 ` [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions NeilBrown
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

Congestion hasn't been reliably tracked for quite some time.
Most MM uses of it for guiding writeback decisions were removed in 5.16.
Some other uses were removed in 5.17-rc1.

This series removes the remaining places that test for congestion, and
the few places which still set it.

The second patch touches a few filesystems.  I didn't think there was
much value in splitting this out by filesystems, but if maintainers
would rather I did that, I will.

The f2fs, cephfs, fuse, NFS, and block patches can go through their
respective trees, provided the final patch doesn't land until after they
all do - so maybe it should be held for 5.18-rc2 if all the rest lands
by 5.18-rc1.

Thanks,
NeilBrown

---

NeilBrown (9):
      Remove inode_congested()
      Remove bdi_congested() and wb_congested() and related functions
      f2fs: change retry waiting for f2fs_write_single_data_page()
      f2fs: replace some congestion_wait() calls with io_schedule_timeout()
      cephfs: don't set/clear bdi_congestion
      fuse: don't set/clear bdi_congested
      NFS: remove congestion control.
      block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"
      Remove congestion tracking framework.


 block/bfq-iosched.c              |  2 +-
 drivers/block/drbd/drbd_int.h    |  3 --
 drivers/block/drbd/drbd_req.c    |  3 +-
 fs/ceph/addr.c                   | 27 ---------------
 fs/ceph/super.c                  |  2 --
 fs/ceph/super.h                  |  2 --
 fs/ext2/ialloc.c                 |  2 --
 fs/f2fs/compress.c               |  6 ++--
 fs/f2fs/data.c                   |  9 +++--
 fs/f2fs/segment.c                | 14 ++++----
 fs/f2fs/super.c                  |  8 ++---
 fs/fuse/control.c                | 17 ----------
 fs/fuse/dev.c                    |  8 -----
 fs/nfs/sysctl.c                  |  7 ----
 fs/nfs/write.c                   | 53 +----------------------------
 fs/nilfs2/segbuf.c               | 11 ------
 fs/xfs/xfs_buf.c                 |  3 --
 include/linux/backing-dev-defs.h |  8 -----
 include/linux/backing-dev.h      | 28 ----------------
 include/linux/nfs_fs.h           |  1 -
 include/linux/nfs_fs_sb.h        |  1 -
 include/trace/events/writeback.h | 28 ----------------
 mm/backing-dev.c                 | 57 --------------------------------
 mm/vmscan.c                      |  4 +--
 24 files changed, 25 insertions(+), 279 deletions(-)




* [PATCH 1/9] Remove inode_congested()
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (3 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 8/9] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC" NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-28  9:37   ` Miklos Szeredi
  2022-01-27  2:46 ` [PATCH 6/9] fuse: don't set/clear bdi_congested NeilBrown
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

inode_congested() reports if the backing-device for the inode is
congested.  Few bdi report congestion any more, only ceph, fuse, and
nfs.  Having support just for those is unlikely to be useful.

The places which test inode_congested() or its variants, such as
inode_write_congested(), avoid initiating IO if congestion is present.
We now have to rely on other places in the stack to back off, or abort
requests - we already do for everything except these 3 filesystems.

So remove inode_congested() and related functions, and remove the call
sites, assuming that inode_congested() always returns 'false'.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/fs-writeback.c           |   37 -------------------------------------
 include/linux/backing-dev.h |   22 ----------------------
 mm/fadvise.c                |    5 ++---
 mm/readahead.c              |    6 ------
 mm/vmscan.c                 |   17 +----------------
 5 files changed, 3 insertions(+), 84 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f8d7fe6db989..42a3dfad40b8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -893,43 +893,6 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page,
 }
 EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner);
 
-/**
- * inode_congested - test whether an inode is congested
- * @inode: inode to test for congestion (may be NULL)
- * @cong_bits: mask of WB_[a]sync_congested bits to test
- *
- * Tests whether @inode is congested.  @cong_bits is the mask of congestion
- * bits to test and the return value is the mask of set bits.
- *
- * If cgroup writeback is enabled for @inode, the congestion state is
- * determined by whether the cgwb (cgroup bdi_writeback) for the blkcg
- * associated with @inode is congested; otherwise, the root wb's congestion
- * state is used.
- *
- * @inode is allowed to be NULL as this function is often called on
- * mapping->host which is NULL for the swapper space.
- */
-int inode_congested(struct inode *inode, int cong_bits)
-{
-	/*
-	 * Once set, ->i_wb never becomes NULL while the inode is alive.
-	 * Start transaction iff ->i_wb is visible.
-	 */
-	if (inode && inode_to_wb_is_valid(inode)) {
-		struct bdi_writeback *wb;
-		struct wb_lock_cookie lock_cookie = {};
-		bool congested;
-
-		wb = unlocked_inode_to_wb_begin(inode, &lock_cookie);
-		congested = wb_congested(wb, cong_bits);
-		unlocked_inode_to_wb_end(inode, &lock_cookie);
-		return congested;
-	}
-
-	return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-EXPORT_SYMBOL_GPL(inode_congested);
-
 /**
  * wb_split_bdi_pages - split nr_pages to write according to bandwidth
  * @wb: target bdi_writeback to split @nr_pages to
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 483979c1b9f4..860b675c2929 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -162,7 +162,6 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
 				    gfp_t gfp);
 void wb_memcg_offline(struct mem_cgroup *memcg);
 void wb_blkcg_offline(struct blkcg *blkcg);
-int inode_congested(struct inode *inode, int cong_bits);
 
 /**
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
@@ -390,29 +389,8 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg)
 {
 }
 
-static inline int inode_congested(struct inode *inode, int cong_bits)
-{
-	return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
-static inline int inode_read_congested(struct inode *inode)
-{
-	return inode_congested(inode, 1 << WB_sync_congested);
-}
-
-static inline int inode_write_congested(struct inode *inode)
-{
-	return inode_congested(inode, 1 << WB_async_congested);
-}
-
-static inline int inode_rw_congested(struct inode *inode)
-{
-	return inode_congested(inode, (1 << WB_sync_congested) |
-				      (1 << WB_async_congested));
-}
-
 static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
 {
 	return wb_congested(&bdi->wb, cong_bits);
diff --git a/mm/fadvise.c b/mm/fadvise.c
index d6baa4f451c5..338f16022012 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -109,9 +109,8 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 	case POSIX_FADV_NOREUSE:
 		break;
 	case POSIX_FADV_DONTNEED:
-		if (!inode_write_congested(mapping->host))
-			__filemap_fdatawrite_range(mapping, offset, endbyte,
-						   WB_SYNC_NONE);
+		__filemap_fdatawrite_range(mapping, offset, endbyte,
+					   WB_SYNC_NONE);
 
 		/*
 		 * First and last FULL page! Partial pages are deliberately
diff --git a/mm/readahead.c b/mm/readahead.c
index cf0dcf89eb69..feda2b1702f1 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -595,12 +595,6 @@ void page_cache_async_ra(struct readahead_control *ractl,
 
 	folio_clear_readahead(folio);
 
-	/*
-	 * Defer asynchronous read-ahead on IO congestion.
-	 */
-	if (inode_read_congested(ractl->mapping->host))
-		return;
-
 	if (blk_cgroup_congested())
 		return;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 090bfb605ecf..ce8492939bd3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -989,17 +989,6 @@ static inline int is_page_cache_freeable(struct page *page)
 	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
 }
 
-static int may_write_to_inode(struct inode *inode)
-{
-	if (current->flags & PF_SWAPWRITE)
-		return 1;
-	if (!inode_write_congested(inode))
-		return 1;
-	if (inode_to_bdi(inode) == current->backing_dev_info)
-		return 1;
-	return 0;
-}
-
 /*
  * We detected a synchronous write error writing a page out.  Probably
  * -ENOSPC.  We need to propagate that into the address_space for a subsequent
@@ -1199,8 +1188,6 @@ static pageout_t pageout(struct page *page, struct address_space *mapping)
 	}
 	if (mapping->a_ops->writepage == NULL)
 		return PAGE_ACTIVATE;
-	if (!may_write_to_inode(mapping->host))
-		return PAGE_KEEP;
 
 	if (clear_page_dirty_for_io(page)) {
 		int res;
@@ -1576,9 +1563,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		 * end of the LRU a second time.
 		 */
 		mapping = page_mapping(page);
-		if (((dirty || writeback) && mapping &&
-		     inode_write_congested(mapping->host)) ||
-		    (writeback && PageReclaim(page)))
+		if (writeback && PageReclaim(page))
 			stat->nr_congested++;
 
 		/*




* [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-27 22:10   ` Ryusuke Konishi
  2022-01-27  2:46 ` [PATCH 5/9] cephfs: don't set/clear bdi_congestion NeilBrown
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

These functions are no longer useful as the only bdis that report
congestion are in ceph, fuse, and nfs.  None of those bdis can be the
target of the calls in drbd, ext2, nilfs2, or xfs.

Removing the test on bdi_write_congested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.

So replace the calls by 'false' and simplify the code - and remove the
functions.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/block/drbd/drbd_int.h |    3 ---
 drivers/block/drbd/drbd_req.c |    3 +--
 fs/ext2/ialloc.c              |    2 --
 fs/nilfs2/segbuf.c            |   11 -----------
 fs/xfs/xfs_buf.c              |    3 ---
 include/linux/backing-dev.h   |   26 --------------------------
 mm/vmscan.c                   |    4 +---
 7 files changed, 2 insertions(+), 50 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index f27d5b0f9a0b..f804b1bfb3e6 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -638,9 +638,6 @@ enum {
 	STATE_SENT,		/* Do not change state/UUIDs while this is set */
 	CALLBACK_PENDING,	/* Whether we have a call_usermodehelper(, UMH_WAIT_PROC)
 				 * pending, from drbd worker context.
-				 * If set, bdi_write_congested() returns true,
-				 * so shrink_page_list() would not recurse into,
-				 * and potentially deadlock on, this drbd worker.
 				 */
 	DISCONNECT_SENT,
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 3235532ae077..2e5fb7e442e3 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -909,8 +909,7 @@ static bool remote_due_to_read_balancing(struct drbd_device *device, sector_t se
 
 	switch (rbm) {
 	case RB_CONGESTED_REMOTE:
-		return bdi_read_congested(
-			device->ldev->backing_bdev->bd_disk->bdi);
+		return 0;
 	case RB_LEAST_PENDING:
 		return atomic_read(&device->local_cnt) >
 			atomic_read(&device->ap_pending_cnt) + atomic_read(&device->rs_pending_cnt);
diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
index df14e750e9fe..d632764da240 100644
--- a/fs/ext2/ialloc.c
+++ b/fs/ext2/ialloc.c
@@ -173,8 +173,6 @@ static void ext2_preread_inode(struct inode *inode)
 	struct backing_dev_info *bdi;
 
 	bdi = inode_to_bdi(inode);
-	if (bdi_rw_congested(bdi))
-		return;
 
 	block_group = (inode->i_ino - 1) / EXT2_INODES_PER_GROUP(inode->i_sb);
 	gdp = ext2_get_group_desc(inode->i_sb, block_group, NULL);
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 43287b0d3e9b..d1ebc9da7130 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -343,17 +343,6 @@ static int nilfs_segbuf_submit_bio(struct nilfs_segment_buffer *segbuf,
 	struct bio *bio = wi->bio;
 	int err;
 
-	if (segbuf->sb_nbio > 0 &&
-	    bdi_write_congested(segbuf->sb_super->s_bdi)) {
-		wait_for_completion(&segbuf->sb_bio_event);
-		segbuf->sb_nbio--;
-		if (unlikely(atomic_read(&segbuf->sb_err))) {
-			bio_put(bio);
-			err = -EIO;
-			goto failed;
-		}
-	}
-
 	bio->bi_end_io = nilfs_end_bio_write;
 	bio->bi_private = segbuf;
 	bio_set_op_attrs(bio, mode, mode_flags);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index b45e0d50a405..b7ebcfe6b8d3 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -843,9 +843,6 @@ xfs_buf_readahead_map(
 {
 	struct xfs_buf		*bp;
 
-	if (bdi_read_congested(target->bt_bdev->bd_disk->bdi))
-		return;
-
 	xfs_buf_read_map(target, map, nmaps,
 		     XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
 		     __this_address);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 860b675c2929..2d764566280c 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -135,11 +135,6 @@ static inline bool writeback_in_progress(struct bdi_writeback *wb)
 
 struct backing_dev_info *inode_to_bdi(struct inode *inode);
 
-static inline int wb_congested(struct bdi_writeback *wb, int cong_bits)
-{
-	return wb->congested & cong_bits;
-}
-
 long congestion_wait(int sync, long timeout);
 
 static inline bool mapping_can_writeback(struct address_space *mapping)
@@ -391,27 +386,6 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg)
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
-static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
-{
-	return wb_congested(&bdi->wb, cong_bits);
-}
-
-static inline int bdi_read_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, 1 << WB_sync_congested);
-}
-
-static inline int bdi_write_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, 1 << WB_async_congested);
-}
-
-static inline int bdi_rw_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, (1 << WB_sync_congested) |
-				  (1 << WB_async_congested));
-}
-
 const char *bdi_dev_name(struct backing_dev_info *bdi);
 
 #endif	/* _LINUX_BACKING_DEV_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ce8492939bd3..0b930556c4f2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2362,9 +2362,7 @@ static unsigned int move_pages_to_lru(struct lruvec *lruvec,
  */
 static int current_may_throttle(void)
 {
-	return !(current->flags & PF_LOCAL_THROTTLE) ||
-		current->backing_dev_info == NULL ||
-		bdi_write_congested(current->backing_dev_info);
+	return !(current->flags & PF_LOCAL_THROTTLE);
 }
 
 /*
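
The behaviour change mentioned for current_may_throttle() can be checked
by enumeration.  What follows is a minimal userspace sketch, not kernel
code: struct throttle_inputs and the three helper functions are
hypothetical names that model the old and new predicates from the hunk
above as pure functions of three booleans.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three inputs of current_may_throttle(). */
struct throttle_inputs {
	bool local_throttle;	/* PF_LOCAL_THROTTLE set in current->flags */
	bool has_bdi;		/* current->backing_dev_info != NULL */
	bool write_congested;	/* result of the removed bdi_write_congested() */
};

/* Predicate as written before this patch. */
static bool may_throttle_old(struct throttle_inputs in)
{
	return !in.local_throttle || !in.has_bdi || in.write_congested;
}

/* Predicate as written after this patch. */
static bool may_throttle_new(struct throttle_inputs in)
{
	return !in.local_throttle;
}

/*
 * Count the input combinations on which the two predicates disagree,
 * restricted to one value of the PF_LOCAL_THROTTLE input.
 */
static int count_differences(bool require_local_throttle)
{
	int diffs = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct throttle_inputs in = {
			.local_throttle = bits & 1,
			.has_bdi = bits & 2,
			.write_congested = bits & 4,
		};

		if (in.local_throttle != require_local_throttle)
			continue;
		if (may_throttle_old(in) != may_throttle_new(in))
			diffs++;
	}
	return diffs;
}
```

In this model the predicates never disagree when PF_LOCAL_THROTTLE is
clear, and disagree only when it is set, which matches the claim in the
commit message.  Whether any of the disagreeing combinations still occur
in practice is outside what the sketch can show.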




* [PATCH 3/9] f2fs: change retry waiting for f2fs_write_single_data_page()
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (5 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 6/9] fuse: don't set/clear bdi_congested NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-28  1:34   ` Jaegeuk Kim
  2022-01-27  2:46 ` [PATCH 7/9] NFS: remove congestion control NeilBrown
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

f2fs_write_single_data_page() can return -EAGAIN if it cannot get
the cp_rwsem lock - it holds a page lock and so cannot wait for it.

Some code which calls f2fs_write_single_data_page() uses
congestion_wait() and then tries again.  congestion_wait() doesn't do
anything useful as congestion is no longer tracked.  So this is just a
simple sleep.

A better approach is to wait until the cp_rwsem lock can be taken - then
try again.  There is certainly no point trying again *before* the lock
can be taken.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/f2fs/compress.c |    6 +++---
 fs/f2fs/data.c     |    9 ++++++---
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index d0c3aeba5945..58ff7f4b296c 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1505,9 +1505,9 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
 				if (IS_NOQUOTA(cc->inode))
 					return 0;
 				ret = 0;
-				cond_resched();
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				/* Wait until we can get the lock, then try again. */
+				f2fs_lock_op(F2FS_I_SB(cc->inode));
+				f2fs_unlock_op(F2FS_I_SB(cc->inode));
 				goto retry_write;
 			}
 			return ret;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8c417864c66a..1d2341163e2c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3047,9 +3047,12 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
 				} else if (ret == -EAGAIN) {
 					ret = 0;
 					if (wbc->sync_mode == WB_SYNC_ALL) {
-						cond_resched();
-						congestion_wait(BLK_RW_ASYNC,
-							DEFAULT_IO_TIMEOUT);
+						/* Wait until we can get the
+						 * lock, then try again.
+						 */
+						f2fs_lock_op(F2FS_I_SB(mapping->host));
+						f2fs_unlock_op(F2FS_I_SB(mapping->host));
+
 						goto retry_write;
 					}
 					goto next;




* [PATCH 4/9] f2fs: replace some congestion_wait() calls with io_schedule_timeout()
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (7 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 7/9] NFS: remove congestion control NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-28  1:27   ` Jaegeuk Kim
  2022-01-27 22:42 ` [PATCH 0/9] Remove remaining parts of congestions tracking code Andrew Morton
  2022-01-28  0:58 ` Jens Axboe
  10 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout().
It isn't clear to me what these congestion_wait() calls are waiting
for, so I cannot change them to wait for some particular event.
So simply change them to io_schedule_timeout(), which will have
exactly the same behaviour.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/f2fs/segment.c |   14 ++++++++------
 fs/f2fs/super.c   |    8 ++++----
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 1dabc8244083..78e3fbc24e77 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -313,8 +313,8 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
 skip:
 		iput(inode);
 	}
-	congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
-	cond_resched();
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	if (gc_failure) {
 		if (++looped >= count)
 			return;
@@ -802,9 +802,10 @@ int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
 
 		do {
 			ret = __submit_flush_wait(sbi, FDEV(i).bdev);
-			if (ret)
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+			if (ret) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				io_schedule_timeout(DEFAULT_IO_TIMEOUT);
+			}
 		} while (ret && --count);
 
 		if (ret) {
@@ -3133,7 +3134,8 @@ static unsigned int __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
 			blk_finish_plug(&plug);
 			mutex_unlock(&dcc->cmd_lock);
 			trimmed += __wait_all_discard_cmd(sbi, NULL);
-			congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 			goto next;
 		}
 skip:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 76e6a3df9aba..ae8dcbb71596 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2135,8 +2135,8 @@ static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi)
 	/* we should flush all the data to keep data consistency */
 	do {
 		sync_inodes_sb(sbi->sb);
-		cond_resched();
-		congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);
 
 	if (unlikely(retry < 0))
@@ -2504,8 +2504,8 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type,
 							&page, &fsdata);
 		if (unlikely(err)) {
 			if (err == -ENOMEM) {
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 				goto retry;
 			}
 			set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);




* [PATCH 5/9] cephfs: don't set/clear bdi_congestion
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
  2022-01-27  2:46 ` [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-27 11:12   ` Jeff Layton
  2022-01-27  2:46 ` [PATCH 9/9] Remove congestion tracking framework NeilBrown
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

The bdi congestion framework is no longer used - writeback uses other
mechanisms to manage throughput.

So remove calls to set_bdi_congested() and clear_bdi_congested(), and
remove the writeback_count which is used only to guide the setting and
clearing.

The congestion_kb mount option is no longer meaningful, but as it is
visible to user-space, removing it needs more consideration.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/ceph/addr.c  |   27 ---------------------------
 fs/ceph/super.c |    2 --
 fs/ceph/super.h |    2 --
 3 files changed, 31 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index c98e5238a1b6..9147667f8cd5 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -57,11 +57,6 @@
  * accounting is preserved.
  */
 
-#define CONGESTION_ON_THRESH(congestion_kb) (congestion_kb >> (PAGE_SHIFT-10))
-#define CONGESTION_OFF_THRESH(congestion_kb)				\
-	(CONGESTION_ON_THRESH(congestion_kb) -				\
-	 (CONGESTION_ON_THRESH(congestion_kb) >> 2))
-
 static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,
 					struct folio *folio, void **_fsdata);
 
@@ -561,10 +556,6 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 	dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n",
 	     inode, page, page->index, page_off, len, snapc, snapc->seq);
 
-	if (atomic_long_inc_return(&fsc->writeback_count) >
-	    CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
-		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
-
 	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
 				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
 				    ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
@@ -621,10 +612,6 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 	ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
 	ceph_put_snap_context(snapc);  /* page's reference */
 
-	if (atomic_long_dec_return(&fsc->writeback_count) <
-	    CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
-		clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
-
 	return err;
 }
 
@@ -704,12 +691,6 @@ static void writepages_finish(struct ceph_osd_request *req)
 			BUG_ON(!page);
 			WARN_ON(!PageUptodate(page));
 
-			if (atomic_long_dec_return(&fsc->writeback_count) <
-			     CONGESTION_OFF_THRESH(
-					fsc->mount_options->congestion_kb))
-				clear_bdi_congested(inode_to_bdi(inode),
-						    BLK_RW_ASYNC);
-
 			ceph_put_snap_context(detach_page_private(page));
 			end_page_writeback(page);
 			dout("unlocking %p\n", page);
@@ -952,14 +933,6 @@ static int ceph_writepages_start(struct address_space *mapping,
 			dout("%p will write page %p idx %lu\n",
 			     inode, page, page->index);
 
-			if (atomic_long_inc_return(&fsc->writeback_count) >
-			    CONGESTION_ON_THRESH(
-				    fsc->mount_options->congestion_kb)) {
-				set_bdi_congested(inode_to_bdi(inode),
-						  BLK_RW_ASYNC);
-			}
-
-
 			pages[locked_pages++] = page;
 			pvec.pages[i] = NULL;
 
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index bf79f369aec6..b2f38af9fca8 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -801,8 +801,6 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
 	fsc->filp_gen = 1;
 	fsc->have_copy_from2 = true;
 
-	atomic_long_set(&fsc->writeback_count, 0);
-
 	err = -ENOMEM;
 	/*
 	 * The number of concurrent works can be high but they don't need
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 67f145e1ae7a..fc58adf1d36a 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -120,8 +120,6 @@ struct ceph_fs_client {
 
 	struct ceph_mds_client *mdsc;
 
-	atomic_long_t writeback_count;
-
 	struct workqueue_struct *inode_wq;
 	struct workqueue_struct *cap_wq;
 

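For reference, the arithmetic of the removed CONGESTION_ON_THRESH and
CONGESTION_OFF_THRESH macros can be worked through in userspace.  This is
a sketch under assumptions: SKETCH_PAGE_SHIFT fixes 4 KiB pages, and
struct wb_model with its two helpers is a hypothetical single-threaded
model of the writeback_count accounting, not the ceph code itself.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Same arithmetic as the removed CONGESTION_ON_THRESH: KiB to pages. */
static long congestion_on_thresh(long congestion_kb)
{
	return congestion_kb >> (SKETCH_PAGE_SHIFT - 10);
}

/* Same arithmetic as CONGESTION_OFF_THRESH: "on" minus a quarter of it. */
static long congestion_off_thresh(long congestion_kb)
{
	long on = congestion_on_thresh(congestion_kb);

	return on - (on >> 2);
}

/* Hypothetical model of the per-client state the patch removes. */
struct wb_model {
	long writeback_count;	/* pages currently under writeback */
	bool congested;		/* models set/clear_bdi_congested() */
	long congestion_kb;	/* models the congestion_kb mount option */
};

/* A page enters writeback: mark congested once above the "on" mark. */
static void page_writeback_start(struct wb_model *m)
{
	if (++m->writeback_count > congestion_on_thresh(m->congestion_kb))
		m->congested = true;
}

/* A page finishes writeback: clear once below the "off" mark. */
static void page_writeback_end(struct wb_model *m)
{
	if (--m->writeback_count < congestion_off_thresh(m->congestion_kb))
		m->congested = false;
}
```

With a hypothetical congestion_kb of 4096, the "on" mark is 1024 pages and
the "off" mark is 768 pages.  The gap between the two gave the flag some
hysteresis, so it did not flip on every page while the count hovered near
the limit.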



* [PATCH 6/9] fuse: don't set/clear bdi_congested
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (4 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 1/9] Remove inode_congested() NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-27  2:46 ` [PATCH 3/9] f2fs: change retry waiting for f2fs_write_single_data_page() NeilBrown
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

The bdi congestion framework is no longer used to manage writeout etc,
so drop updating it in fuse.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/fuse/control.c |   17 -----------------
 fs/fuse/dev.c     |    8 --------
 2 files changed, 25 deletions(-)

diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 000d2e5627e9..7cede9a3bc96 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -164,7 +164,6 @@ static ssize_t fuse_conn_congestion_threshold_write(struct file *file,
 {
 	unsigned val;
 	struct fuse_conn *fc;
-	struct fuse_mount *fm;
 	ssize_t ret;
 
 	ret = fuse_conn_limit_write(file, buf, count, ppos, &val,
@@ -178,22 +177,6 @@ static ssize_t fuse_conn_congestion_threshold_write(struct file *file,
 	down_read(&fc->killsb);
 	spin_lock(&fc->bg_lock);
 	fc->congestion_threshold = val;
-
-	/*
-	 * Get any fuse_mount belonging to this fuse_conn; s_bdi is
-	 * shared between all of them
-	 */
-
-	if (!list_empty(&fc->mounts)) {
-		fm = list_first_entry(&fc->mounts, struct fuse_mount, fc_entry);
-		if (fc->num_background < fc->congestion_threshold) {
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		} else {
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		}
-	}
 	spin_unlock(&fc->bg_lock);
 	up_read(&fc->killsb);
 	fuse_conn_put(fc);
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cd54a529460d..e1b4a846c90d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -315,10 +315,6 @@ void fuse_request_end(struct fuse_req *req)
 				wake_up(&fc->blocked_waitq);
 		}
 
-		if (fc->num_background == fc->congestion_threshold && fm->sb) {
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		}
 		fc->num_background--;
 		fc->active_background--;
 		flush_bg_queue(fc);
@@ -540,10 +536,6 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 		fc->num_background++;
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
-		if (fc->num_background == fc->congestion_threshold && fm->sb) {
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		}
 		list_add_tail(&req->list, &fc->bg_queue);
 		flush_bg_queue(fc);
 		queued = true;




* [PATCH 7/9] NFS: remove congestion control.
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (6 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 3/9] f2fs: change retry waiting for f2fs_write_single_data_page() NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-27  2:46 ` [PATCH 4/9] f2fs: replace some congestion_wait() calls with io_schedule_timeout() NeilBrown
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

Linux no longer uses the bdi congestion tracking framework.
So remove code from NFS which tries to support it.

Also remove the "nfs_congestion_kb" sysctl.  This is a user-visible
change, but unlikely to be a problematic one.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/sysctl.c           |    7 ------
 fs/nfs/write.c            |   53 +--------------------------------------------
 include/linux/nfs_fs.h    |    1 -
 include/linux/nfs_fs_sb.h |    1 -
 4 files changed, 1 insertion(+), 61 deletions(-)

diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 7aea195ddb35..18f3ff77fd0c 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -22,13 +22,6 @@ static struct ctl_table nfs_cb_sysctls[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
-	{
-		.procname	= "nfs_congestion_kb",
-		.data		= &nfs_congestion_kb,
-		.maxlen		= sizeof(nfs_congestion_kb),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
 	{ }
 };
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 987a187bd39a..1c22ea6f23c3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -397,33 +397,8 @@ static int wb_priority(struct writeback_control *wbc)
 	return ret;
 }
 
-/*
- * NFS congestion control
- */
-
-int nfs_congestion_kb;
-
-#define NFS_CONGESTION_ON_THRESH 	(nfs_congestion_kb >> (PAGE_SHIFT-10))
-#define NFS_CONGESTION_OFF_THRESH	\
-	(NFS_CONGESTION_ON_THRESH - (NFS_CONGESTION_ON_THRESH >> 2))
-
-static void nfs_set_page_writeback(struct page *page)
-{
-	struct inode *inode = page_file_mapping(page)->host;
-	struct nfs_server *nfss = NFS_SERVER(inode);
-	int ret = test_set_page_writeback(page);
-
-	WARN_ON_ONCE(ret != 0);
-
-	if (atomic_long_inc_return(&nfss->writeback) >
-			NFS_CONGESTION_ON_THRESH)
-		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
-}
-
 static void nfs_end_page_writeback(struct nfs_page *req)
 {
-	struct inode *inode = page_file_mapping(req->wb_page)->host;
-	struct nfs_server *nfss = NFS_SERVER(inode);
 	bool is_done;
 
 	is_done = nfs_page_group_sync_on_bit(req, PG_WB_END);
@@ -432,8 +407,6 @@ static void nfs_end_page_writeback(struct nfs_page *req)
 		return;
 
 	end_page_writeback(req->wb_page);
-	if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
-		clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 }
 
 /*
@@ -617,7 +590,7 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
 	if (IS_ERR(req))
 		goto out;
 
-	nfs_set_page_writeback(page);
+	set_page_writeback(page);
 	WARN_ON_ONCE(test_bit(PG_CLEAN, &req->wb_flags));
 
 	/* If there is a fatal error that covers this write, just exit */
@@ -1850,7 +1823,6 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 	struct nfs_page	*req;
 	int status = data->task.tk_status;
 	struct nfs_commit_info cinfo;
-	struct nfs_server *nfss;
 
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
@@ -1891,9 +1863,6 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 		/* Latency breaker */
 		cond_resched();
 	}
-	nfss = NFS_SERVER(data->inode);
-	if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
-		clear_bdi_congested(inode_to_bdi(data->inode), BLK_RW_ASYNC);
 
 	nfs_init_cinfo(&cinfo, data->inode, data->dreq);
 	nfs_commit_end(cinfo.mds);
@@ -2162,26 +2131,6 @@ int __init nfs_init_writepagecache(void)
 	if (nfs_commit_mempool == NULL)
 		goto out_destroy_commit_cache;
 
-	/*
-	 * NFS congestion size, scale with available memory.
-	 *
-	 *  64MB:    8192k
-	 * 128MB:   11585k
-	 * 256MB:   16384k
-	 * 512MB:   23170k
-	 *   1GB:   32768k
-	 *   2GB:   46340k
-	 *   4GB:   65536k
-	 *   8GB:   92681k
-	 *  16GB:  131072k
-	 *
-	 * This allows larger machines to have larger/more transfers.
-	 * Limit the default to 256M
-	 */
-	nfs_congestion_kb = (16*int_sqrt(totalram_pages())) << (PAGE_SHIFT-10);
-	if (nfs_congestion_kb > 256*1024)
-		nfs_congestion_kb = 256*1024;
-
 	return 0;
 
 out_destroy_commit_cache:
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 02aa49323d1d..17045c229277 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -569,7 +569,6 @@ extern void nfs_complete_unlink(struct dentry *dentry, struct inode *);
 /*
  * linux/fs/nfs/write.c
  */
-extern int  nfs_congestion_kb;
 extern int  nfs_writepage(struct page *page, struct writeback_control *wbc);
 extern int  nfs_writepages(struct address_space *, struct writeback_control *);
 extern int  nfs_flush_incompatible(struct file *file, struct page *page);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index ca0959e51e81..3444ebbc63b6 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -137,7 +137,6 @@ struct nfs_server {
 	struct rpc_clnt *	client_acl;	/* ACL RPC client handle */
 	struct nlm_host		*nlm_host;	/* NLM client handle */
 	struct nfs_iostats __percpu *io_stats;	/* I/O statistics */
-	atomic_long_t		writeback;	/* number of writeback pages */
 	unsigned int		flags;		/* various flags */
 
 /* The following are for internal use only. Also see uapi/linux/nfs_mount.h */
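
For reference, the arithmetic being removed can be reproduced in user
space. This is a minimal sketch assuming 4 KiB pages (PAGE_SHIFT = 12).
isqrt_floor() is a hypothetical stand-in for the kernel's int_sqrt(), and
the function names are illustrative only. Rows of the removed table whose
page count is not a perfect square may differ slightly, since this sketch
rounds the square root down.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Floor integer square root; stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt_floor(unsigned long x)
{
	unsigned long r = 0;

	/* (r + 1)^2 <= x, written as a division to avoid overflow */
	while (r + 1 <= x / (r + 1))
		r++;
	return r;
}

/* Default nfs_congestion_kb: scales with sqrt(RAM), capped at 256M. */
static long default_congestion_kb(unsigned long total_pages)
{
	long kb = (long)(16 * isqrt_floor(total_pages)) << (PAGE_SHIFT - 10);

	if (kb > 256 * 1024)
		kb = 256 * 1024;
	return kb;
}

/* NFS_CONGESTION_ON_THRESH: the kilobyte limit converted to pages. */
static long congestion_on_thresh(long kb)
{
	assert(kb >= 0);
	return kb >> (PAGE_SHIFT - 10);
}

/* NFS_CONGESTION_OFF_THRESH: three quarters of the "on" threshold. */
static long congestion_off_thresh(long kb)
{
	long on = congestion_on_thresh(kb);

	return on - (on >> 2);
}
```

With 64MB of RAM (16384 pages) this gives 8192k, i.e. congestion was set
above 2048 pages under writeback and cleared below 1536, which is the
hysteresis the removed macros implemented.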




* [PATCH 8/9] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (2 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 9/9] Remove congestion tracking framework NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-27  2:46 ` [PATCH 1/9] Remove inode_congested() NeilBrown
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

bfq_get_queue() expects a "bool" for the third arg, so pass "false"
rather than "BLK_RW_ASYNC" which will soon be removed.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 block/bfq-iosched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0c612a911696..4e645ae1e066 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5448,7 +5448,7 @@ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
 	bfqq = bic_to_bfqq(bic, false);
 	if (bfqq) {
 		bfq_release_process_ref(bfqd, bfqq);
-		bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic, true);
+		bfqq = bfq_get_queue(bfqd, bio, false, bic, true);
 		bic_set_bfqq(bic, bfqq, false);
 	}
 




* [PATCH 9/9] Remove congestion tracking framework.
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
  2022-01-27  2:46 ` [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions NeilBrown
  2022-01-27  2:46 ` [PATCH 5/9] cephfs: don't set/clear bdi_congestion NeilBrown
@ 2022-01-27  2:46 ` NeilBrown
  2022-01-27  2:46 ` [PATCH 8/9] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC" NeilBrown
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2022-01-27  2:46 UTC (permalink / raw)
  To: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

This framework is no longer used - so discard it.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/backing-dev-defs.h |    8 -----
 include/linux/backing-dev.h      |    2 -
 include/trace/events/writeback.h |   28 -------------------
 mm/backing-dev.c                 |   57 --------------------------------------
 4 files changed, 95 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 993c5628a726..e863c88df95f 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -207,14 +207,6 @@ struct backing_dev_info {
 #endif
 };
 
-enum {
-	BLK_RW_ASYNC	= 0,
-	BLK_RW_SYNC	= 1,
-};
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
-void set_bdi_congested(struct backing_dev_info *bdi, int sync);
-
 struct wb_lock_cookie {
 	bool locked;
 	unsigned long flags;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 2d764566280c..87ce24d238f3 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -135,8 +135,6 @@ static inline bool writeback_in_progress(struct bdi_writeback *wb)
 
 struct backing_dev_info *inode_to_bdi(struct inode *inode);
 
-long congestion_wait(int sync, long timeout);
-
 static inline bool mapping_can_writeback(struct address_space *mapping)
 {
 	return inode_to_bdi(mapping->host)->capabilities & BDI_CAP_WRITEBACK;
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index a345b1e12daf..86b2a82da546 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -735,34 +735,6 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 	)
 );
 
-DECLARE_EVENT_CLASS(writeback_congest_waited_template,
-
-	TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
-
-	TP_ARGS(usec_timeout, usec_delayed),
-
-	TP_STRUCT__entry(
-		__field(	unsigned int,	usec_timeout	)
-		__field(	unsigned int,	usec_delayed	)
-	),
-
-	TP_fast_assign(
-		__entry->usec_timeout	= usec_timeout;
-		__entry->usec_delayed	= usec_delayed;
-	),
-
-	TP_printk("usec_timeout=%u usec_delayed=%u",
-			__entry->usec_timeout,
-			__entry->usec_delayed)
-);
-
-DEFINE_EVENT(writeback_congest_waited_template, writeback_congestion_wait,
-
-	TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
-
-	TP_ARGS(usec_timeout, usec_delayed)
-);
-
 DECLARE_EVENT_CLASS(writeback_single_inode_template,
 
 	TP_PROTO(struct inode *inode,
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index eae96dfe0261..7176af65b103 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -1005,60 +1005,3 @@ const char *bdi_dev_name(struct backing_dev_info *bdi)
 	return bdi->dev_name;
 }
 EXPORT_SYMBOL_GPL(bdi_dev_name);
-
-static wait_queue_head_t congestion_wqh[2] = {
-		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[0]),
-		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1])
-	};
-static atomic_t nr_wb_congested[2];
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
-{
-	wait_queue_head_t *wqh = &congestion_wqh[sync];
-	enum wb_congested_state bit;
-
-	bit = sync ? WB_sync_congested : WB_async_congested;
-	if (test_and_clear_bit(bit, &bdi->wb.congested))
-		atomic_dec(&nr_wb_congested[sync]);
-	smp_mb__after_atomic();
-	if (waitqueue_active(wqh))
-		wake_up(wqh);
-}
-EXPORT_SYMBOL(clear_bdi_congested);
-
-void set_bdi_congested(struct backing_dev_info *bdi, int sync)
-{
-	enum wb_congested_state bit;
-
-	bit = sync ? WB_sync_congested : WB_async_congested;
-	if (!test_and_set_bit(bit, &bdi->wb.congested))
-		atomic_inc(&nr_wb_congested[sync]);
-}
-EXPORT_SYMBOL(set_bdi_congested);
-
-/**
- * congestion_wait - wait for a backing_dev to become uncongested
- * @sync: SYNC or ASYNC IO
- * @timeout: timeout in jiffies
- *
- * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
- * write congestion.  If no backing_devs are congested then just wait for the
- * next write to be completed.
- */
-long congestion_wait(int sync, long timeout)
-{
-	long ret;
-	unsigned long start = jiffies;
-	DEFINE_WAIT(wait);
-	wait_queue_head_t *wqh = &congestion_wqh[sync];
-
-	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
-	ret = io_schedule_timeout(timeout);
-	finish_wait(wqh, &wait);
-
-	trace_writeback_congestion_wait(jiffies_to_usecs(timeout),
-					jiffies_to_usecs(jiffies - start));
-
-	return ret;
-}
-EXPORT_SYMBOL(congestion_wait);
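
The set/clear pair removed above is a test-and-set / test-and-clear on a
per-bdi bit, with a global count adjusted only when the bit actually
changes. A rough user-space model using C11 atomics is sketched below. All
names (bdi_model, model_set_congested, model_clear_congested,
model_nr_congested) are hypothetical, and the wait queue wakeup done by
clear_bdi_congested() is left out.

```c
#include <assert.h>
#include <stdatomic.h>

enum { MODEL_ASYNC = 0, MODEL_SYNC = 1 };

/* Hypothetical stand-in for the congested bits in bdi->wb. */
struct bdi_model {
	atomic_uint congested;
};

/* Stand-in for nr_wb_congested[2]. */
static atomic_int model_nr_congested[2];

static void model_set_congested(struct bdi_model *bdi, int sync)
{
	unsigned int bit = 1u << sync;

	assert(sync == MODEL_ASYNC || sync == MODEL_SYNC);
	/* Count only the 0 -> 1 transition, like !test_and_set_bit(). */
	if (!(atomic_fetch_or(&bdi->congested, bit) & bit))
		atomic_fetch_add(&model_nr_congested[sync], 1);
}

static void model_clear_congested(struct bdi_model *bdi, int sync)
{
	unsigned int bit = 1u << sync;

	assert(sync == MODEL_ASYNC || sync == MODEL_SYNC);
	/* Count only the 1 -> 0 transition, like test_and_clear_bit(). */
	if (atomic_fetch_and(&bdi->congested, ~bit) & bit)
		atomic_fetch_sub(&model_nr_congested[sync], 1);
}
```

Note that congestion_wait() as shown in the diff never reads the counter:
it sleeps until it is woken or the timeout expires, so with nothing setting
congestion any more it reduces to a timed sleep.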




* Re: [PATCH 5/9] cephfs: don't set/clear bdi_congestion
  2022-01-27  2:46 ` [PATCH 5/9] cephfs: don't set/clear bdi_congestion NeilBrown
@ 2022-01-27 11:12   ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2022-01-27 11:12 UTC (permalink / raw)
  To: NeilBrown, Andrew Morton, Jaegeuk Kim, Chao Yu, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

On Thu, 2022-01-27 at 13:46 +1100, NeilBrown wrote:
> The bdi congestion framework is no-longer used - writeback uses other
> mechanisms to manage throughput.
> 
> So remove calls to set_bdi_congested() and clear_bdi_congested(), and
> remove the writeback_count which is used only to guide the setting and
> clearing.
> 
> The congestion_kb mount option is no longer meaningful, but as it is
> visible to user-space, removing it needs more consideration.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/ceph/addr.c  |   27 ---------------------------
>  fs/ceph/super.c |    2 --
>  fs/ceph/super.h |    2 --
>  3 files changed, 31 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index c98e5238a1b6..9147667f8cd5 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -57,11 +57,6 @@
>   * accounting is preserved.
>   */
>  
> -#define CONGESTION_ON_THRESH(congestion_kb) (congestion_kb >> (PAGE_SHIFT-10))
> -#define CONGESTION_OFF_THRESH(congestion_kb)				\
> -	(CONGESTION_ON_THRESH(congestion_kb) -				\
> -	 (CONGESTION_ON_THRESH(congestion_kb) >> 2))
> -
>  static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,
>  					struct folio *folio, void **_fsdata);
>  
> @@ -561,10 +556,6 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
>  	dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n",
>  	     inode, page, page->index, page_off, len, snapc, snapc->seq);
>  
> -	if (atomic_long_inc_return(&fsc->writeback_count) >
> -	    CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
> -		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
> -
>  	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
>  				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
>  				    ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
> @@ -621,10 +612,6 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
>  	ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
>  	ceph_put_snap_context(snapc);  /* page's reference */
>  
> -	if (atomic_long_dec_return(&fsc->writeback_count) <
> -	    CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
> -		clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
> -
>  	return err;
>  }
>  
> @@ -704,12 +691,6 @@ static void writepages_finish(struct ceph_osd_request *req)
>  			BUG_ON(!page);
>  			WARN_ON(!PageUptodate(page));
>  
> -			if (atomic_long_dec_return(&fsc->writeback_count) <
> -			     CONGESTION_OFF_THRESH(
> -					fsc->mount_options->congestion_kb))
> -				clear_bdi_congested(inode_to_bdi(inode),
> -						    BLK_RW_ASYNC);
> -
>  			ceph_put_snap_context(detach_page_private(page));
>  			end_page_writeback(page);
>  			dout("unlocking %p\n", page);
> @@ -952,14 +933,6 @@ static int ceph_writepages_start(struct address_space *mapping,
>  			dout("%p will write page %p idx %lu\n",
>  			     inode, page, page->index);
>  
> -			if (atomic_long_inc_return(&fsc->writeback_count) >
> -			    CONGESTION_ON_THRESH(
> -				    fsc->mount_options->congestion_kb)) {
> -				set_bdi_congested(inode_to_bdi(inode),
> -						  BLK_RW_ASYNC);
> -			}
> -
> -
>  			pages[locked_pages++] = page;
>  			pvec.pages[i] = NULL;
>  
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index bf79f369aec6..b2f38af9fca8 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -801,8 +801,6 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
>  	fsc->filp_gen = 1;
>  	fsc->have_copy_from2 = true;
>  
> -	atomic_long_set(&fsc->writeback_count, 0);
> -
>  	err = -ENOMEM;
>  	/*
>  	 * The number of concurrent works can be high but they don't need
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 67f145e1ae7a..fc58adf1d36a 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -120,8 +120,6 @@ struct ceph_fs_client {
>  
>  	struct ceph_mds_client *mdsc;
>  
> -	atomic_long_t writeback_count;
> -
>  	struct workqueue_struct *inode_wq;
>  	struct workqueue_struct *cap_wq;
>  
> 
> 

Thanks Neil.

I'll plan to pull this into the ceph testing branch and do some testing
with it, but at a quick glance I don't foresee any issues. This should
make v5.18, but we may be able to get it in sooner.
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions
  2022-01-27  2:46 ` [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions NeilBrown
@ 2022-01-27 22:10   ` Ryusuke Konishi
  0 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2022-01-27 22:10 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Darrick J. Wong,
	Philipp Reisner, Lars Ellenberg, Paolo Valente, Jens Axboe,
	Linux MM, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, LKML,
	linux-block

On Thu, Jan 27, 2022 at 11:47 AM NeilBrown <neilb@suse.de> wrote:
>
> These functions are no longer useful as the only bdis that report
> congestion are in ceph, fuse, and nfs.  None of those bdis can be the
> target of the calls in drbd, ext2, nilfs2, or xfs.
>
> Removing the test on bdi_write_congested() in current_may_throttle()
> could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
> is set.
>
> So replace the calls by 'false' and simplify the code - and remove the
> functions.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  drivers/block/drbd/drbd_int.h |    3 ---
>  drivers/block/drbd/drbd_req.c |    3 +--
>  fs/ext2/ialloc.c              |    2 --
>  fs/nilfs2/segbuf.c            |   11 -----------
>  fs/xfs/xfs_buf.c              |    3 ---
>  include/linux/backing-dev.h   |   26 --------------------------
>  mm/vmscan.c                   |    4 +---
>  7 files changed, 2 insertions(+), 50 deletions(-)

for nilfs2 bits,

Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>

Thanks,
Ryusuke Konishi


* Re: [PATCH 0/9] Remove remaining parts of congestions tracking code.
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (8 preceding siblings ...)
  2022-01-27  2:46 ` [PATCH 4/9] f2f2: replace some congestion_wait() calls with io_schedule_timeout() NeilBrown
@ 2022-01-27 22:42 ` Andrew Morton
  2022-01-28  0:58 ` Jens Axboe
  10 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2022-01-27 22:42 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov, Miklos Szeredi,
	Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe, linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

On Thu, 27 Jan 2022 13:46:29 +1100 NeilBrown <neilb@suse.de> wrote:

> Congestion hasn't been reliably tracked for quite some time.
> Most MM uses of it for guiding writeback decisions were removed in 5.16.
> Some other uses were removed in 5.17-rc1.
> 
> This series removes the remaining places that test for congestion, and
> the few places which still set it.
> 
> The second patch touches a few filesystems.  I didn't think there was
> much value in splitting this out by filesystems, but if maintainers
> would rather I did that, I will.
> 
> The f2fs, cephfs, fuse, NFS, and block patches can go through the
> respective trees providing the final patch doesn't land until after they
> all do - so maybe it should be held for 5.18-rc2 if all the rest lands
> by 5.18-rc1.

Plan B: I'll just take everything.  While collecting tested-bys and
acked-bys from filesystem maintainers (please).



* Re: [PATCH 0/9] Remove remaining parts of congestions tracking code.
  2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
                   ` (9 preceding siblings ...)
  2022-01-27 22:42 ` [PATCH 0/9] Remove remaining parts of congestions tracking code Andrew Morton
@ 2022-01-28  0:58 ` Jens Axboe
  10 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2022-01-28  0:58 UTC (permalink / raw)
  To: NeilBrown, Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton,
	Ilya Dryomov, Miklos Szeredi, Trond Myklebust, Anna Schumaker,
	Ryusuke Konishi, Darrick J. Wong, Philipp Reisner,
	Lars Ellenberg, Paolo Valente
  Cc: linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

On 1/26/22 7:46 PM, NeilBrown wrote:
> Congestion hasn't been reliably tracked for quite some time.
> Most MM uses of it for guiding writeback decisions were removed in 5.16.
> Some other uses were removed in 5.17-rc1.
> 
> This series removes the remaining places that test for congestion, and
> the few places which still set it.
> 
> The second patch touches a few filesystems.  I didn't think there was
> much value in splitting this out by filesystems, but if maintainers
> would rather I did that, I will.
> 
> The f2fs, cephfs, fuse, NFS, and block patches can go through the
> respective trees providing the final patch doesn't land until after they
> all do - so maybe it should be held for 5.18-rc2 if all the rest lands
> by 5.18-rc1.

For the series:

Acked-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe



* Re: [PATCH 4/9] f2f2: replace some congestion_wait() calls with io_schedule_timeout()
  2022-01-27  2:46 ` [PATCH 4/9] f2f2: replace some congestion_wait() calls with io_schedule_timeout() NeilBrown
@ 2022-01-28  1:27   ` Jaegeuk Kim
  0 siblings, 0 replies; 18+ messages in thread
From: Jaegeuk Kim @ 2022-01-28  1:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe, linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

I saw some missing cases. Could you please consider this instead?
And, please fix "f2f2:" to "f2fs:".

---
 fs/f2fs/compress.c |  4 +---
 fs/f2fs/data.c     | 13 ++++++-------
 fs/f2fs/f2fs.h     |  6 ++++++
 fs/f2fs/segment.c  |  8 +++-----
 fs/f2fs/super.c    |  6 ++----
 5 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 67bac2792e57..6b22d407a4a4 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1505,9 +1505,7 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
 				if (IS_NOQUOTA(cc->inode))
 					return 0;
 				ret = 0;
-				cond_resched();
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 				goto retry_write;
 			}
 			return ret;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 0f124e8de1d4..c9285c88cb85 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3046,13 +3046,12 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
 					goto next;
 				} else if (ret == -EAGAIN) {
 					ret = 0;
-					if (wbc->sync_mode == WB_SYNC_ALL) {
-						cond_resched();
-						congestion_wait(BLK_RW_ASYNC,
-							DEFAULT_IO_TIMEOUT);
-						goto retry_write;
-					}
-					goto next;
+					if (wbc->sync_mode != WB_SYNC_ALL)
+						goto next;
+
+					f2fs_io_schedule_timeout(
+						DEFAULT_IO_TIMEOUT);
+					goto retry_write;
 				}
 				done_index = page->index + 1;
 				done = 1;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6ddb98ff0b7c..dbd650a5a8fc 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4501,6 +4501,12 @@ static inline bool f2fs_block_unit_discard(struct f2fs_sb_info *sbi)
 	return F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK;
 }
 
+static inline void f2fs_io_schedule_timeout(long timeout)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	io_schedule_timeout(timeout);
+}
+
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
 #define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 56211e201d51..885b27d7e491 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -313,8 +313,7 @@ void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
 skip:
 		iput(inode);
 	}
-	congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
-	cond_resched();
+	f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	if (gc_failure) {
 		if (++looped >= count)
 			return;
@@ -803,8 +802,7 @@ int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
 		do {
 			ret = __submit_flush_wait(sbi, FDEV(i).bdev);
 			if (ret)
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 		} while (ret && --count);
 
 		if (ret) {
@@ -3137,7 +3135,7 @@ static unsigned int __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
 			blk_finish_plug(&plug);
 			mutex_unlock(&dcc->cmd_lock);
 			trimmed += __wait_all_discard_cmd(sbi, NULL);
-			congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+			f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 			goto next;
 		}
 skip:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9af6c20532ec..f484a839fc52 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2135,8 +2135,7 @@ static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi)
 	/* we should flush all the data to keep data consistency */
 	do {
 		sync_inodes_sb(sbi->sb);
-		cond_resched();
-		congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);
 
 	if (unlikely(retry < 0))
@@ -2504,8 +2503,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type,
 							&page, &fsdata);
 		if (unlikely(err)) {
 			if (err == -ENOMEM) {
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 				goto retry;
 			}
 			set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
-- 
2.35.0.rc0.227.g00780c9af4-goog
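
The call sites converted above share one shape: try an operation, and on a
transient failure sleep for a fixed interval before trying again, usually a
bounded number of times. A user-space analogue is sketched below under
stated assumptions: nanosleep() replaces io_schedule_timeout(), and
retry_on_eagain() and fail_n_times() are hypothetical names, not f2fs
functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Plain timed sleep; stands in for f2fs_io_schedule_timeout(). */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/* Call op() up to max_tries times, sleeping after each -EAGAIN. */
static int retry_on_eagain(int (*op)(void *), void *arg,
			   int max_tries, long delay_ms)
{
	int ret;

	assert(max_tries > 0);
	do {
		ret = op(arg);
		if (ret != -EAGAIN)
			break;
		sleep_ms(delay_ms);
	} while (--max_tries);

	return ret;
}

/* Example callback: fails with -EAGAIN until the counter reaches zero. */
static int fail_n_times(void *arg)
{
	int *left = arg;

	if (*left > 0) {
		(*left)--;
		return -EAGAIN;
	}
	return 0;
}
```

In the kernel helper the set_current_state(TASK_UNINTERRUPTIBLE) call
matters: without it the task is still runnable and the timeout would not
actually put it to sleep.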



* Re: [PATCH 3/9] f2fs: change retry waiting for f2fs_write_single_data_page()
  2022-01-27  2:46 ` [PATCH 3/9] f2fs: change retry waiting for f2fs_write_single_data_page() NeilBrown
@ 2022-01-28  1:34   ` Jaegeuk Kim
  0 siblings, 0 replies; 18+ messages in thread
From: Jaegeuk Kim @ 2022-01-28  1:34 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Chao Yu, Jeff Layton, Ilya Dryomov,
	Miklos Szeredi, Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe, linux-mm, linux-nilfs, linux-nfs, linux-fsdevel,
	linux-f2fs-devel, linux-ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

On 01/27, NeilBrown wrote:
> f2fs_write_single_data_page() can return -EAGAIN if it cannot get
> the cp_rwsem lock - it holds a page lock and so cannot wait for it.
> 
> Some code which calls f2fs_write_single_data_page() uses
> congestion_wait() and then tries again.  congestion_wait() doesn't do
> anything useful as congestion is no longer tracked.  So this is just a
> simple sleep.
> 
> A better approach is to wait until the cp_rwsem lock can be taken - then
> try again.  There is certainly no point trying again *before* the lock
> can be taken.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/f2fs/compress.c |    6 +++---
>  fs/f2fs/data.c     |    9 ++++++---
>  2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index d0c3aeba5945..58ff7f4b296c 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -1505,9 +1505,9 @@ static int f2fs_write_raw_pages(struct compress_ctx *cc,
>  				if (IS_NOQUOTA(cc->inode))
>  					return 0;
>  				ret = 0;
> -				cond_resched();
> -				congestion_wait(BLK_RW_ASYNC,
> -						DEFAULT_IO_TIMEOUT);
> +				/* Wait until we can get the lock, then try again. */
> +				f2fs_lock_op(F2FS_I_SB(cc->inode));
> +				f2fs_unlock_op(F2FS_I_SB(cc->inode));

Since checkpoint uses down_write(cp_rwsem), I'm not sure the write path is safe
and needs to wait for checkpoint. Can we just do io_schedule_timeout()?

>  				goto retry_write;
>  			}
>  			return ret;
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 8c417864c66a..1d2341163e2c 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -3047,9 +3047,12 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
>  				} else if (ret == -EAGAIN) {
>  					ret = 0;
>  					if (wbc->sync_mode == WB_SYNC_ALL) {
> -						cond_resched();
> -						congestion_wait(BLK_RW_ASYNC,
> -							DEFAULT_IO_TIMEOUT);
> +						/* Wait until we can get the
> +						 * lock, then try again.
> +						 */
> +						f2fs_lock_op(F2FS_I_SB(mapping->host));
> +						f2fs_unlock_op(F2FS_I_SB(mapping->host));
> +
>  						goto retry_write;
>  					}
>  					goto next;
> 


* Re: [PATCH 1/9] Remove inode_congested()
  2022-01-27  2:46 ` [PATCH 1/9] Remove inode_congested() NeilBrown
@ 2022-01-28  9:37   ` Miklos Szeredi
  2022-01-28 21:36     ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2022-01-28  9:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe, linux-mm, linux-nilfs, Linux NFS list, linux-fsdevel,
	linux-f2fs-devel, Ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

On Thu, 27 Jan 2022 at 03:47, NeilBrown <neilb@suse.de> wrote:
>
> inode_congested() reports if the backing-device for the inode is
> congested.  Few bdi report congestion any more, only ceph, fuse, and
> nfs.  Having support just for those is unlikely to be useful.
>
> The places which test inode_congested() or its variants like
> inode_write_congested() avoid initiating IO if congestion is present.
> We now have to rely on other places in the stack to back off, or abort
> requests - we already do for everything except these 3 filesystems.
>
> So remove inode_congested() and related functions, and remove the call
> sites, assuming that inode_congested() always returns 'false'.

Looks to me this is going to "break" fuse; e.g. readahead path will go
ahead and try to submit more requests, even if the queue is getting
congested.   In this case the readahead submission will eventually
block, which is counterproductive.

I think we should *first* make sure all call sites are substituted
with appropriate mechanisms in the affected filesystems and as a last
step remove the superfluous bdi congestion mechanism.

You are saying that all fs except these three already have such
mechanisms in place, right?  Can you elaborate on that?

Thanks,
Miklos

* Re: [PATCH 1/9] Remove inode_congested()
  2022-01-28  9:37   ` Miklos Szeredi
@ 2022-01-28 21:36     ` NeilBrown
  0 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2022-01-28 21:36 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, Jaegeuk Kim, Chao Yu, Jeff Layton, Ilya Dryomov,
	Trond Myklebust, Anna Schumaker, Ryusuke Konishi,
	Darrick J. Wong, Philipp Reisner, Lars Ellenberg, Paolo Valente,
	Jens Axboe, linux-mm, linux-nilfs, Linux NFS list, linux-fsdevel,
	linux-f2fs-devel, Ext4, ceph-devel, drbd-dev, linux-kernel,
	linux-block

On Fri, 28 Jan 2022, Miklos Szeredi wrote:
> On Thu, 27 Jan 2022 at 03:47, NeilBrown <neilb@suse.de> wrote:
> >
> > inode_congested() reports if the backing-device for the inode is
> > congested.  Few bdi report congestion any more, only ceph, fuse, and
> > nfs.  Having support just for those is unlikely to be useful.
> >
> > The places which test inode_congested(), or its variants like
> > inode_write_congested(), avoid initiating IO if congestion is present.
> > We now have to rely on other places in the stack to back off, or abort
> > requests - we already do for everything except these 3 filesystems.
> >
> > So remove inode_congested() and related functions, and remove the call
> > sites, assuming that inode_congested() always returns 'false'.
> 
> Looks to me this is going to "break" fuse; e.g. readahead path will go
> ahead and try to submit more requests, even if the queue is getting
> congested.   In this case the readahead submission will eventually
> block, which is counterproductive.
> 
> I think we should *first* make sure all call sites are substituted
> with appropriate mechanisms in the affected filesystems and as a last
> step remove the superfluous bdi congestion mechanism.
> 
> You are saying that all fs except these three already have such
> mechanisms in place, right?  Can you elaborate on that?

Not much.  I haven't looked into how other filesystems cope, I just know
that they must because no other filesystem ever has a congested bdi
(with one or two minor exceptions, like filesystems over drbd).

Surely read-ahead should never block.  If it hits congestion, the
read-ahead request should simply fail.  Block-based filesystems seem to
set REQ_RAHEAD which might get mapped to REQ_FAILFAST_MASK, though I
don't know how that is ultimately used.

Maybe fuse and others should continue to track 'congestion' and reject
read-ahead requests when congested.
Maybe also skip WB_SYNC_NONE writes.

Or maybe this doesn't really matter in practice...  I wonder if we can
measure the usefulness of congestion.

Thanks,
NeilBrown
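
The idea floated in the message above (keep tracking congestion inside
the filesystem, refuse read-ahead while congested, and perhaps skip
WB_SYNC_NONE writes too) can be modelled roughly as below.  This is a
userspace sketch only, not fuse or kernel code: struct conn_model, the
req_kind values, and all function names are hypothetical, and the
threshold test is an assumption about what "congested" would mean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection state; not a real fuse or kernel structure. */
struct conn_model {
	unsigned int in_flight;      /* requests submitted, not yet completed */
	unsigned int congest_thresh; /* at or above this we call it congested */
};

/* Hypothetical request classes, loosely mirroring the cases in the thread. */
enum req_kind {
	REQ_KIND_READ,       /* a read that something is waiting for */
	REQ_KIND_READAHEAD,  /* speculative, safe to drop */
	REQ_KIND_WRITE_SYNC, /* integrity writeback, in the spirit of WB_SYNC_ALL */
	REQ_KIND_WRITE_BG,   /* background writeback, in the spirit of WB_SYNC_NONE */
};

static bool conn_congested(const struct conn_model *c)
{
	return c->in_flight >= c->congest_thresh;
}

/*
 * Try to submit one request.  Optional work (read-ahead, background
 * writeback) is rejected while congested instead of being queued.
 * Returns true if the request was accepted.  Never blocks.
 */
static bool conn_submit(struct conn_model *c, enum req_kind kind)
{
	bool optional = kind == REQ_KIND_READAHEAD || kind == REQ_KIND_WRITE_BG;

	if (optional && conn_congested(c))
		return false;

	c->in_flight++;
	return true;
}

/* Account for one completed request. */
static void conn_complete(struct conn_model *c)
{
	assert(c->in_flight > 0);
	c->in_flight--;
}
```

In this model a rejected read-ahead simply fails, so the submitter
never ends up blocked on a full queue, while reads and sync writes are
always accepted.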

Thread overview: 18+ messages
2022-01-27  2:46 [PATCH 0/9] Remove remaining parts of congestions tracking code NeilBrown
2022-01-27  2:46 ` [PATCH 2/9] Remove bdi_congested() and wb_congested() and related functions NeilBrown
2022-01-27 22:10   ` Ryusuke Konishi
2022-01-27  2:46 ` [PATCH 5/9] cephfs: don't set/clear bdi_congestion NeilBrown
2022-01-27 11:12   ` Jeff Layton
2022-01-27  2:46 ` [PATCH 9/9] Remove congestion tracking framework NeilBrown
2022-01-27  2:46 ` [PATCH 8/9] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC" NeilBrown
2022-01-27  2:46 ` [PATCH 1/9] Remove inode_congested() NeilBrown
2022-01-28  9:37   ` Miklos Szeredi
2022-01-28 21:36     ` NeilBrown
2022-01-27  2:46 ` [PATCH 6/9] fuse: don't set/clear bdi_congested NeilBrown
2022-01-27  2:46 ` [PATCH 3/9] f2fs: change retry waiting for f2fs_write_single_data_page() NeilBrown
2022-01-28  1:34   ` Jaegeuk Kim
2022-01-27  2:46 ` [PATCH 7/9] NFS: remove congestion control NeilBrown
2022-01-27  2:46 ` [PATCH 4/9] f2f2: replace some congestion_wait() calls with io_schedule_timeout() NeilBrown
2022-01-28  1:27   ` Jaegeuk Kim
2022-01-27 22:42 ` [PATCH 0/9] Remove remaining parts of congestions tracking code Andrew Morton
2022-01-28  0:58 ` Jens Axboe
