linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Remove some 'congested' tests
@ 2021-12-13  4:14 NeilBrown
  2021-12-13  4:14 ` [PATCH 1/2] Remove inode_congested() NeilBrown
  2021-12-13  4:14 ` [PATCH 2/2] Remove bdi_congested() and wb_congested() and related functions NeilBrown
  0 siblings, 2 replies; 7+ messages in thread
From: NeilBrown @ 2021-12-13  4:14 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong
  Cc: linux-xfs, linux-ext4, linux-nilfs, linux-mm, linux-fsdevel,
	linux-kernel

The framework for reporting congestion for "bdi"s is no longer widely
used.  bdis for block devices don't report congestion at all.
bdis for nfs, ceph, and fuse do, but any code which depends on that
is not going to work for most filesystems.

So we should remove it.

These two patches remove {inode,bdi,wb}_congested() and related
functions, and change all call sites to assume the result was "false",
which it (almost) always is.

NeilBrown
---

NeilBrown (2):
      Remove inode_congested()
      Remove bdi_congested() and wb_congested() and related functions


 drivers/block/drbd/drbd_int.h |  3 ---
 drivers/block/drbd/drbd_req.c |  3 +--
 fs/ext2/ialloc.c              |  2 --
 fs/nilfs2/segbuf.c            | 11 -----------
 fs/xfs/xfs_buf.c              |  3 ---
 include/linux/backing-dev.h   | 26 --------------------------
 mm/vmscan.c                   |  4 +---
 7 files changed, 2 insertions(+), 50 deletions(-)




* [PATCH 1/2] Remove inode_congested()
  2021-12-13  4:14 [PATCH 0/2] Remove some 'congested' tests NeilBrown
@ 2021-12-13  4:14 ` NeilBrown
  2021-12-13  4:22   ` Matthew Wilcox
  2021-12-13  4:14 ` [PATCH 2/2] Remove bdi_congested() and wb_congested() and related functions NeilBrown
  1 sibling, 1 reply; 7+ messages in thread
From: NeilBrown @ 2021-12-13  4:14 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong
  Cc: linux-xfs, linux-ext4, linux-nilfs, linux-mm, linux-fsdevel,
	linux-kernel

inode_congested() reports whether the backing device for the inode is
congested.  Few bdis report congestion any more: only ceph, fuse, and
nfs.  Having support for just those three is unlikely to be useful.

The places which test inode_congested(), or its variants like
inode_write_congested(), avoid initiating IO if congestion is present.
We now have to rely on other places in the stack to back off or abort
requests - we already do for everything except these 3 filesystems.

So remove inode_congested() and related functions, and remove the call
sites, assuming that inode_congested() always returns 'false'.
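
For readers unfamiliar with the helpers being removed, here is a minimal
userspace sketch of how they compose a bit mask and test it.  The struct,
the enum values, and every "_model" name are assumptions made for
illustration; only the shape (read tests the sync bit, write tests the
async bit, rw tests both) is taken from the diff below.

```c
/* Bit positions; the numeric values are an assumption for this sketch. */
enum {
	MODEL_ASYNC_CONGESTED,	/* tested by the *_write_congested() helpers */
	MODEL_SYNC_CONGESTED,	/* tested by the *_read_congested() helpers */
};

/* Stand-in for the 'congested' word of struct bdi_writeback. */
struct wb_model {
	unsigned long congested;
};

/* Mirrors wb_congested(): return the subset of cong_bits that is set. */
static inline int wb_congested_model(const struct wb_model *wb, int cong_bits)
{
	return (int)(wb->congested & (unsigned long)cong_bits);
}

static inline int read_congested_model(const struct wb_model *wb)
{
	return wb_congested_model(wb, 1 << MODEL_SYNC_CONGESTED);
}

static inline int write_congested_model(const struct wb_model *wb)
{
	return wb_congested_model(wb, 1 << MODEL_ASYNC_CONGESTED);
}

static inline int rw_congested_model(const struct wb_model *wb)
{
	return wb_congested_model(wb, (1 << MODEL_SYNC_CONGESTED) |
				      (1 << MODEL_ASYNC_CONGESTED));
}
```

If nothing ever sets a bit in the congested word, all three helpers
return 0, which is the value this patch substitutes at each call site.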

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/fs-writeback.c           |   37 -------------------------------------
 include/linux/backing-dev.h |   22 ----------------------
 mm/fadvise.c                |    5 ++---
 mm/readahead.c              |    6 ------
 mm/vmscan.c                 |   17 +----------------
 5 files changed, 3 insertions(+), 84 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 67f0e88eed01..ce41d8413654 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -891,43 +891,6 @@ void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page *page,
 }
 EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner);
 
-/**
- * inode_congested - test whether an inode is congested
- * @inode: inode to test for congestion (may be NULL)
- * @cong_bits: mask of WB_[a]sync_congested bits to test
- *
- * Tests whether @inode is congested.  @cong_bits is the mask of congestion
- * bits to test and the return value is the mask of set bits.
- *
- * If cgroup writeback is enabled for @inode, the congestion state is
- * determined by whether the cgwb (cgroup bdi_writeback) for the blkcg
- * associated with @inode is congested; otherwise, the root wb's congestion
- * state is used.
- *
- * @inode is allowed to be NULL as this function is often called on
- * mapping->host which is NULL for the swapper space.
- */
-int inode_congested(struct inode *inode, int cong_bits)
-{
-	/*
-	 * Once set, ->i_wb never becomes NULL while the inode is alive.
-	 * Start transaction iff ->i_wb is visible.
-	 */
-	if (inode && inode_to_wb_is_valid(inode)) {
-		struct bdi_writeback *wb;
-		struct wb_lock_cookie lock_cookie = {};
-		bool congested;
-
-		wb = unlocked_inode_to_wb_begin(inode, &lock_cookie);
-		congested = wb_congested(wb, cong_bits);
-		unlocked_inode_to_wb_end(inode, &lock_cookie);
-		return congested;
-	}
-
-	return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-EXPORT_SYMBOL_GPL(inode_congested);
-
 /**
  * wb_split_bdi_pages - split nr_pages to write according to bandwidth
  * @wb: target bdi_writeback to split @nr_pages to
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 483979c1b9f4..860b675c2929 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -162,7 +162,6 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
 				    gfp_t gfp);
 void wb_memcg_offline(struct mem_cgroup *memcg);
 void wb_blkcg_offline(struct blkcg *blkcg);
-int inode_congested(struct inode *inode, int cong_bits);
 
 /**
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
@@ -390,29 +389,8 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg)
 {
 }
 
-static inline int inode_congested(struct inode *inode, int cong_bits)
-{
-	return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
-static inline int inode_read_congested(struct inode *inode)
-{
-	return inode_congested(inode, 1 << WB_sync_congested);
-}
-
-static inline int inode_write_congested(struct inode *inode)
-{
-	return inode_congested(inode, 1 << WB_async_congested);
-}
-
-static inline int inode_rw_congested(struct inode *inode)
-{
-	return inode_congested(inode, (1 << WB_sync_congested) |
-				      (1 << WB_async_congested));
-}
-
 static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
 {
 	return wb_congested(&bdi->wb, cong_bits);
diff --git a/mm/fadvise.c b/mm/fadvise.c
index d6baa4f451c5..338f16022012 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -109,9 +109,8 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 	case POSIX_FADV_NOREUSE:
 		break;
 	case POSIX_FADV_DONTNEED:
-		if (!inode_write_congested(mapping->host))
-			__filemap_fdatawrite_range(mapping, offset, endbyte,
-						   WB_SYNC_NONE);
+		__filemap_fdatawrite_range(mapping, offset, endbyte,
+					   WB_SYNC_NONE);
 
 		/*
 		 * First and last FULL page! Partial pages are deliberately
diff --git a/mm/readahead.c b/mm/readahead.c
index 6ae5693de28c..cc5845b8c7c3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -595,12 +595,6 @@ void page_cache_async_ra(struct readahead_control *ractl,
 
 	ClearPageReadahead(page);
 
-	/*
-	 * Defer asynchronous read-ahead on IO congestion.
-	 */
-	if (inode_read_congested(ractl->mapping->host))
-		return;
-
 	if (blk_cgroup_congested())
 		return;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..540aa0ea67ff 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -989,17 +989,6 @@ static inline int is_page_cache_freeable(struct page *page)
 	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
 }
 
-static int may_write_to_inode(struct inode *inode)
-{
-	if (current->flags & PF_SWAPWRITE)
-		return 1;
-	if (!inode_write_congested(inode))
-		return 1;
-	if (inode_to_bdi(inode) == current->backing_dev_info)
-		return 1;
-	return 0;
-}
-
 /*
  * We detected a synchronous write error writing a page out.  Probably
  * -ENOSPC.  We need to propagate that into the address_space for a subsequent
@@ -1158,8 +1147,6 @@ static pageout_t pageout(struct page *page, struct address_space *mapping)
 	}
 	if (mapping->a_ops->writepage == NULL)
 		return PAGE_ACTIVATE;
-	if (!may_write_to_inode(mapping->host))
-		return PAGE_KEEP;
 
 	if (clear_page_dirty_for_io(page)) {
 		int res;
@@ -1535,9 +1522,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		 * end of the LRU a second time.
 		 */
 		mapping = page_mapping(page);
-		if (((dirty || writeback) && mapping &&
-		     inode_write_congested(mapping->host)) ||
-		    (writeback && PageReclaim(page)))
+		if (writeback && PageReclaim(page))
 			stat->nr_congested++;
 
 		/*




* [PATCH 2/2] Remove bdi_congested() and wb_congested() and related functions
  2021-12-13  4:14 [PATCH 0/2] Remove some 'congested' tests NeilBrown
  2021-12-13  4:14 ` [PATCH 1/2] Remove inode_congested() NeilBrown
@ 2021-12-13  4:14 ` NeilBrown
  2021-12-13  5:07   ` Dave Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: NeilBrown @ 2021-12-13  4:14 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong
  Cc: linux-xfs, linux-ext4, linux-nilfs, linux-mm, linux-fsdevel,
	linux-kernel

These functions are no longer useful as the only bdis that report
congestion are in ceph, fuse, and nfs.  None of those bdis can be the
target of the calls in drbd, ext2, nilfs2, or xfs.

Removing the test on bdi_write_congested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.

So replace the calls by 'false' and simplify the code - and remove the
functions.
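
The current_may_throttle() change can be checked by enumerating inputs.
This is a minimal userspace sketch, not kernel code: the three booleans
stand in for (current->flags & PF_LOCAL_THROTTLE),
(current->backing_dev_info != NULL) and bdi_write_congested(), and all
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the expression removed from current_may_throttle(). */
static inline bool old_may_throttle(bool local_throttle, bool has_bdi,
				    bool congested)
{
	/* '||' short-circuits, so 'congested' only matters when has_bdi. */
	return !local_throttle || !has_bdi || congested;
}

/* Model of the expression after this patch. */
static inline bool new_may_throttle(bool local_throttle)
{
	return !local_throttle;
}

/*
 * Count the input combinations where old and new disagree, asserting
 * that each disagreement has the local-throttle flag set.
 */
static inline int count_behaviour_changes(void)
{
	int changes = 0;

	for (int l = 0; l <= 1; l++)
		for (int b = 0; b <= 1; b++)
			for (int c = 0; c <= 1; c++)
				if (old_may_throttle(l, b, c) !=
				    new_may_throttle(l)) {
					assert(l);
					changes++;
				}
	return changes;
}
```

In this model the two versions disagree only when the local-throttle
flag is set and either no bdi is recorded for the task or that bdi
reports write congestion.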

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/block/drbd/drbd_int.h |    3 ---
 drivers/block/drbd/drbd_req.c |    3 +--
 fs/ext2/ialloc.c              |    2 --
 fs/nilfs2/segbuf.c            |   11 -----------
 fs/xfs/xfs_buf.c              |    3 ---
 include/linux/backing-dev.h   |   26 --------------------------
 mm/vmscan.c                   |    4 +---
 7 files changed, 2 insertions(+), 50 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index f27d5b0f9a0b..f804b1bfb3e6 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -638,9 +638,6 @@ enum {
 	STATE_SENT,		/* Do not change state/UUIDs while this is set */
 	CALLBACK_PENDING,	/* Whether we have a call_usermodehelper(, UMH_WAIT_PROC)
 				 * pending, from drbd worker context.
-				 * If set, bdi_write_congested() returns true,
-				 * so shrink_page_list() would not recurse into,
-				 * and potentially deadlock on, this drbd worker.
 				 */
 	DISCONNECT_SENT,
 
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 3235532ae077..2e5fb7e442e3 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -909,8 +909,7 @@ static bool remote_due_to_read_balancing(struct drbd_device *device, sector_t se
 
 	switch (rbm) {
 	case RB_CONGESTED_REMOTE:
-		return bdi_read_congested(
-			device->ldev->backing_bdev->bd_disk->bdi);
+		return 0;
 	case RB_LEAST_PENDING:
 		return atomic_read(&device->local_cnt) >
 			atomic_read(&device->ap_pending_cnt) + atomic_read(&device->rs_pending_cnt);
diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
index df14e750e9fe..d632764da240 100644
--- a/fs/ext2/ialloc.c
+++ b/fs/ext2/ialloc.c
@@ -173,8 +173,6 @@ static void ext2_preread_inode(struct inode *inode)
 	struct backing_dev_info *bdi;
 
 	bdi = inode_to_bdi(inode);
-	if (bdi_rw_congested(bdi))
-		return;
 
 	block_group = (inode->i_ino - 1) / EXT2_INODES_PER_GROUP(inode->i_sb);
 	gdp = ext2_get_group_desc(inode->i_sb, block_group, NULL);
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 43287b0d3e9b..d1ebc9da7130 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -343,17 +343,6 @@ static int nilfs_segbuf_submit_bio(struct nilfs_segment_buffer *segbuf,
 	struct bio *bio = wi->bio;
 	int err;
 
-	if (segbuf->sb_nbio > 0 &&
-	    bdi_write_congested(segbuf->sb_super->s_bdi)) {
-		wait_for_completion(&segbuf->sb_bio_event);
-		segbuf->sb_nbio--;
-		if (unlikely(atomic_read(&segbuf->sb_err))) {
-			bio_put(bio);
-			err = -EIO;
-			goto failed;
-		}
-	}
-
 	bio->bi_end_io = nilfs_end_bio_write;
 	bio->bi_private = segbuf;
 	bio_set_op_attrs(bio, mode, mode_flags);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 631c5a61d89b..22f73b3e888e 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -843,9 +843,6 @@ xfs_buf_readahead_map(
 {
 	struct xfs_buf		*bp;
 
-	if (bdi_read_congested(target->bt_bdev->bd_disk->bdi))
-		return;
-
 	xfs_buf_read_map(target, map, nmaps,
 		     XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
 		     __this_address);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 860b675c2929..2d764566280c 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -135,11 +135,6 @@ static inline bool writeback_in_progress(struct bdi_writeback *wb)
 
 struct backing_dev_info *inode_to_bdi(struct inode *inode);
 
-static inline int wb_congested(struct bdi_writeback *wb, int cong_bits)
-{
-	return wb->congested & cong_bits;
-}
-
 long congestion_wait(int sync, long timeout);
 
 static inline bool mapping_can_writeback(struct address_space *mapping)
@@ -391,27 +386,6 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg)
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
-static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
-{
-	return wb_congested(&bdi->wb, cong_bits);
-}
-
-static inline int bdi_read_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, 1 << WB_sync_congested);
-}
-
-static inline int bdi_write_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, 1 << WB_async_congested);
-}
-
-static inline int bdi_rw_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, (1 << WB_sync_congested) |
-				  (1 << WB_async_congested));
-}
-
 const char *bdi_dev_name(struct backing_dev_info *bdi);
 
 #endif	/* _LINUX_BACKING_DEV_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 540aa0ea67ff..f46a7a17dc49 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2321,9 +2321,7 @@ static unsigned int move_pages_to_lru(struct lruvec *lruvec,
  */
 static int current_may_throttle(void)
 {
-	return !(current->flags & PF_LOCAL_THROTTLE) ||
-		current->backing_dev_info == NULL ||
-		bdi_write_congested(current->backing_dev_info);
+	return !(current->flags & PF_LOCAL_THROTTLE);
 }
 
 /*




* Re: [PATCH 1/2] Remove inode_congested()
  2021-12-13  4:14 ` [PATCH 1/2] Remove inode_congested() NeilBrown
@ 2021-12-13  4:22   ` Matthew Wilcox
  2021-12-13  4:59     ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2021-12-13  4:22 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong, linux-xfs,
	linux-ext4, linux-nilfs, linux-mm, linux-fsdevel, linux-kernel

On Mon, Dec 13, 2021 at 03:14:27PM +1100, NeilBrown wrote:
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fb9584641ac7..540aa0ea67ff 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -989,17 +989,6 @@ static inline int is_page_cache_freeable(struct page *page)
>  	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
>  }
>  
> -static int may_write_to_inode(struct inode *inode)
> -{
> -	if (current->flags & PF_SWAPWRITE)
> -		return 1;
> -	if (!inode_write_congested(inode))
> -		return 1;
> -	if (inode_to_bdi(inode) == current->backing_dev_info)
> -		return 1;
> -	return 0;
> -}

Why is it safe to get rid of the PF_SWAPWRITE and current->backing_dev_info
checks?

> @@ -1158,8 +1147,6 @@ static pageout_t pageout(struct page *page, struct address_space *mapping)
>  	}
>  	if (mapping->a_ops->writepage == NULL)
>  		return PAGE_ACTIVATE;
> -	if (!may_write_to_inode(mapping->host))
> -		return PAGE_KEEP;
>  
>  	if (clear_page_dirty_for_io(page)) {
>  		int res;


* Re: [PATCH 1/2] Remove inode_congested()
  2021-12-13  4:22   ` Matthew Wilcox
@ 2021-12-13  4:59     ` NeilBrown
  0 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2021-12-13  4:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong, linux-xfs,
	linux-ext4, linux-nilfs, linux-mm, linux-fsdevel, linux-kernel

On Mon, 13 Dec 2021, Matthew Wilcox wrote:
> On Mon, Dec 13, 2021 at 03:14:27PM +1100, NeilBrown wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index fb9584641ac7..540aa0ea67ff 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -989,17 +989,6 @@ static inline int is_page_cache_freeable(struct page *page)
> >  	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
> >  }
> >  
> > -static int may_write_to_inode(struct inode *inode)
> > -{
> > -	if (current->flags & PF_SWAPWRITE)
> > -		return 1;
> > -	if (!inode_write_congested(inode))
> > -		return 1;
> > -	if (inode_to_bdi(inode) == current->backing_dev_info)
> > -		return 1;
> > -	return 0;
> > -}
> 
> Why is it safe to get rid of the PF_SWAPWRITE and current->backing_dev_info
> checks?

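Ask George Boole.
If inode_write_congested() returns false, then the second test in
may_write_to_inode() always succeeds, so the function always returns
true whatever the PF_SWAPWRITE and backing_dev_info checks say.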
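
A minimal userspace sketch of that argument, with each kernel condition
replaced by a plain boolean.  The function and parameter names are
illustrative only, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the removed may_write_to_inode().  Hypothetical parameters:
 *   swapwrite ~ current->flags & PF_SWAPWRITE
 *   congested ~ inode_write_congested(inode)
 *   own_bdi   ~ inode_to_bdi(inode) == current->backing_dev_info
 */
static inline bool may_write_model(bool swapwrite, bool congested,
				   bool own_bdi)
{
	if (swapwrite)
		return true;
	if (!congested)
		return true;
	if (own_bdi)
		return true;
	return false;
}

/* With congested fixed to false, every remaining input yields true. */
static inline void check_uncongested_always_true(void)
{
	for (int s = 0; s <= 1; s++)
		for (int o = 0; o <= 1; o++)
			assert(may_write_model(s, false, o));
}
```

The only input for which the model returns false has congested set, so
once congestion is never reported the other two checks cannot change the
result.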

NeilBrown


> 
> > @@ -1158,8 +1147,6 @@ static pageout_t pageout(struct page *page, struct address_space *mapping)
> >  	}
> >  	if (mapping->a_ops->writepage == NULL)
> >  		return PAGE_ACTIVATE;
> > -	if (!may_write_to_inode(mapping->host))
> > -		return PAGE_KEEP;
> >  
> >  	if (clear_page_dirty_for_io(page)) {
> >  		int res;
> 
> 


* Re: [PATCH 2/2] Remove bdi_congested() and wb_congested() and related functions
  2021-12-13  4:14 ` [PATCH 2/2] Remove bdi_congested() and wb_congested() and related functions NeilBrown
@ 2021-12-13  5:07   ` Dave Chinner
  2021-12-13  7:04     ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2021-12-13  5:07 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong, linux-xfs,
	linux-ext4, linux-nilfs, linux-mm, linux-fsdevel, linux-kernel

On Mon, Dec 13, 2021 at 03:14:27PM +1100, NeilBrown wrote:
> These functions are no longer useful as the only bdis that report
> congestion are in ceph, fuse, and nfs.  None of those bdis can be the
> target of the calls in drbd, ext2, nilfs2, or xfs.
> 
> Removing the test on bdi_write_congested() in current_may_throttle()
> could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
> is set.
> 
> So replace the calls by 'false' and simplify the code - and remove the
> functions.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
....
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 631c5a61d89b..22f73b3e888e 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -843,9 +843,6 @@ xfs_buf_readahead_map(
>  {
>  	struct xfs_buf		*bp;
>  
> -	if (bdi_read_congested(target->bt_bdev->bd_disk->bdi))
> -		return;

Ok, but this isn't a "throttle writeback" test here - it's trying to
avoid having speculative readahead blocking on a full request queue
instead of just skipping the readahead IO. i.e. prevent readahead
thrashing and/or adding unnecessary read load when we already have a
full read queue...

So what is the replacement for that? We want to skip the entire
buffer lookup/setup/read overhead if we're likely to block on IO
submission - is there anything we can use to do this these days?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/2] Remove bdi_congested() and wb_congested() and related functions
  2021-12-13  5:07   ` Dave Chinner
@ 2021-12-13  7:04     ` NeilBrown
  0 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2021-12-13  7:04 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, Mel Gorman, Philipp Reisner, Lars Ellenberg,
	Jan Kara, Ryusuke Konishi, Darrick J. Wong, linux-xfs,
	linux-ext4, linux-nilfs, linux-mm, linux-fsdevel, linux-kernel

On Mon, 13 Dec 2021, Dave Chinner wrote:
> On Mon, Dec 13, 2021 at 03:14:27PM +1100, NeilBrown wrote:
> > These functions are no longer useful as the only bdis that report
> > congestion are in ceph, fuse, and nfs.  None of those bdis can be the
> > target of the calls in drbd, ext2, nilfs2, or xfs.
> > 
> > Removing the test on bdi_write_congested() in current_may_throttle()
> > could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
> > is set.
> > 
> > So replace the calls by 'false' and simplify the code - and remove the
> > functions.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> ....
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 631c5a61d89b..22f73b3e888e 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -843,9 +843,6 @@ xfs_buf_readahead_map(
> >  {
> >  	struct xfs_buf		*bp;
> >  
> > -	if (bdi_read_congested(target->bt_bdev->bd_disk->bdi))
> > -		return;
> 
> Ok, but this isn't a "throttle writeback" test here - it's trying to
> avoid having speculative readahead blocking on a full request queue
> instead of just skipping the readahead IO. i.e. prevent readahead
> thrashing and/or adding unnecessary read load when we already have a
> full read queue...
> 
> So what is the replacement for that? We want to skip the entire
> buffer lookup/setup/read overhead if we're likely to block on IO
> submission - is there anything we can use to do this these days?

I don't think there is a concept of a "full read queue" any more.
There are things that can block an IO submission though.
There is allocation of the bio from a mempool, and there is
rq_qos_throttle, and there are probably other places where submission
can block.  I don't think you can tell in advance if a submission is
likely to block.

I think the idea is that the top level of the submission stack should
rate-limit based on the estimated throughput of the stack.  I think
write-back does this.  I don't know about read-ahead.
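
To make "rate-limit at the top of the stack" concrete, here is a purely
hypothetical userspace sketch of a submitter that skips speculative reads
once its own in-flight bytes exceed a budget derived from an estimated
throughput.  None of these names exist in the kernel, and this is not a
description of how write-back or read-ahead actually behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical submitter-side limiter; nothing here is a kernel API. */
struct ra_limiter {
	uint64_t est_bytes_per_sec;	/* estimated throughput of the stack */
	uint64_t target_latency_ms;	/* how much queued work is tolerated */
	uint64_t inflight_bytes;	/* submitted but not yet completed */
};

/* Bytes that may be in flight without exceeding the latency target. */
static inline uint64_t ra_budget(const struct ra_limiter *l)
{
	return l->est_bytes_per_sec * l->target_latency_ms / 1000;
}

/* Returns false when the speculative read should be skipped. */
static inline bool ra_try_submit(struct ra_limiter *l, uint64_t len)
{
	if (l->inflight_bytes + len > ra_budget(l))
		return false;
	l->inflight_bytes += len;
	return true;
}

/* Called on IO completion; clamps at zero to stay well defined. */
static inline void ra_complete(struct ra_limiter *l, uint64_t len)
{
	l->inflight_bytes -= len < l->inflight_bytes ? len : l->inflight_bytes;
}
```

The decision uses only state the submitter already owns, so it needs no
congestion signal from lower layers.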

NeilBrown

