Subject: writeback: dirty position control - bdi reserve area
Date: Thu Aug 04 22:16:46 CST 2011

Keep a minimal pool of dirty pages for each bdi, so that the disk IO
queues won't underrun. It's particularly useful for JBOD and small
memory systems.

XXX:
When memory is small (in comparison to write bandwidth), this control
line may result in (pos_ratio > 1) at the setpoint and push the dirty
pages high. This is more or less intended because the bdi is in danger
of IO queue underflow. However the global dirty pages, when pushed close
to the limit, will eventually counteract our desire to push up the low
bdi_dirty. In low memory JBOD tests we do see disks under-utilized from
time to time.

One scheme that may completely fix this is to add a BDI_queue_empty to
indicate the block IO queue emptiness (but still there may be in flight
IOs on the driver/hardware side) and to unthrottle the tasks regardless
of the global limit on seeing BDI_queue_empty.

Signed-off-by: Wu Fengguang
---
 mm/page-writeback.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

--- linux-next.orig/mm/page-writeback.c	2011-08-16 09:06:46.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-16 09:06:50.000000000 +0800
@@ -488,6 +488,16 @@ unsigned long bdi_dirty_limit(struct bac
  * 0 +------------.------------------.----------------------*------------->
  *         freerun^          setpoint^                 limit^   dirty pages
  *
+ * (o) bdi reserve area
+ *
+ * The bdi reserve area tries to keep a reasonable number of dirty pages for
+ * preventing block queue underrun.
+ *
+ * reserve area, scale up rate as dirty pages drop low
+ * |<----------------------------------------------->|
+ * |-------------------------------------------------------*-------|----------
+ * 0                                           bdi setpoint^       ^bdi_thresh
+ *
  * (o) bdi control lines
  *
  * The control lines for the global/bdi setpoints both stretch up to @limit.
@@ -571,6 +581,19 @@ static unsigned long bdi_position_ratio(
 	pos_ratio += 1 << RATELIMIT_CALC_SHIFT;
 
 	/*
+	 * bdi reserve area, safeguard against dirty pool underrun and disk idle
+	 */
+	x_intercept = min(bdi->avg_write_bandwidth + 2 * MIN_WRITEBACK_PAGES,
+			  freerun);
+	if (bdi_dirty < x_intercept) {
+		if (bdi_dirty > x_intercept / 8) {
+			pos_ratio *= x_intercept;
+			do_div(pos_ratio, bdi_dirty);
+		} else
+			pos_ratio *= 8;
+	}
+
+	/*
 	 * bdi setpoint
 	 *
 	 *        f(dirty) := 1.0 + k * (dirty - setpoint)
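
For illustration only, the reserve-area scaling in the hunk above can be
tried out in userspace with a sketch like the one below. It is not the
kernel code: CALC_SHIFT and MIN_WB_PAGES are hypothetical stand-ins for
RATELIMIT_CALC_SHIFT and MIN_WRITEBACK_PAGES, min_ul() and
reserve_area_scale() are made-up helper names, and a plain 64-bit
division is assumed in place of do_div().

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for RATELIMIT_CALC_SHIFT: pos_ratio is treated
 * as fixed point, with (1 << CALC_SHIFT) meaning 1.0.
 */
#define CALC_SHIFT	10

/* Hypothetical stand-in for MIN_WRITEBACK_PAGES (assumed value). */
#define MIN_WB_PAGES	1024UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Scale pos_ratio up when bdi_dirty falls inside the reserve area:
 *
 *   bdi_dirty >= x_intercept                   unchanged
 *   x_intercept / 8 < bdi_dirty < x_intercept  times x_intercept / bdi_dirty
 *   bdi_dirty <= x_intercept / 8               times 8 (cap)
 *
 * The division is only reached when bdi_dirty > x_intercept / 8 >= 0,
 * so bdi_dirty is never zero there.
 */
static uint64_t reserve_area_scale(uint64_t pos_ratio,
				   unsigned long bdi_dirty,
				   unsigned long avg_write_bandwidth,
				   unsigned long freerun)
{
	unsigned long x_intercept;

	x_intercept = min_ul(avg_write_bandwidth + 2 * MIN_WB_PAGES, freerun);

	if (bdi_dirty < x_intercept) {
		if (bdi_dirty > x_intercept / 8)
			pos_ratio = pos_ratio * x_intercept / bdi_dirty;
		else
			pos_ratio *= 8;
	}

	return pos_ratio;
}
```

With avg_write_bandwidth = 2048 and a large freerun, x_intercept comes
out as 4096 pages under these assumed constants. A pos_ratio of 1.0 then
doubles at bdi_dirty = 2048, quadruples at bdi_dirty = 1024, and is
capped at 8x once bdi_dirty drops to 512 or below.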