On Wed, Aug 10, 2011 at 03:16:22AM +0800, Vivek Goyal wrote:
> On Sat, Aug 06, 2011 at 04:44:52PM +0800, Wu Fengguang wrote:
>
> > [..]
> > -/*
> > - * task_dirty_limit - scale down dirty throttling threshold for one task
> > - *
> > - * task specific dirty limit:
> > - *
> > - *   dirty -= (dirty/8) * p_{t}
> > - *
> > - * To protect light/slow dirtying tasks from heavier/fast ones, we start
> > - * throttling individual tasks before reaching the bdi dirty limit.
> > - * Relatively low thresholds will be allocated to heavy dirtiers. So when
> > - * dirty pages grow large, heavy dirtiers will be throttled first, which will
> > - * effectively curb the growth of dirty pages. Light dirtiers with high enough
> > - * dirty threshold may never get throttled.
> > - */
>
> Hi Fengguang,
>
> So we have got rid of the notion of per task dirty limit based on their
> fraction? What replaces it.

It's simply removed :)

> I can't see any code which is replacing it.

The think time compensation feature (patch attached) will be providing
the same protection for light/slow dirtiers. With it, the slower
dirtiers won't be throttled at all, because the pause time calculated
by

        period = pages_dirtied / rate
        pause = period - think

will be <= 0.

For example, given write_bw = 100MB/s and

- 2 dd tasks that dirty pages as fast as possible
- 1 scp whose dirty rate is limited by network bandwidth 10MB/s

Then with think time compensation, the real dirty rates will be

- 2 dd tasks: (100-10)/2 = 45MB/s (each)
- 1 scp task: 10MB/s

The scp task won't be throttled by balance_dirty_pages() any more.

This is a tested feature.
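
To make the arithmetic above concrete, here is a rough sketch in Python
(not the kernel code). The function names, the use of MB instead of
pages, the 1MB chunk size, and the assumption that every task sees the
same 45MB/s throttle rate are all illustrative choices of mine; only the
two formulas and the 100/10/45 numbers come from the example above.

```python
def pause_time(dirtied, rate, think):
    """Return the pause in seconds after a task dirtied `dirtied` MB.

    rate  -- throttle rate applied to the task, in MB/s
    think -- time the task spent outside throttling since its last
             pause, in seconds (hypothetical input for this sketch)
    """
    period = dirtied / rate      # period = pages_dirtied / rate
    return period - think        # pause = period - think


def greedy_share(write_bw, limited_rates, n_greedy):
    """Bandwidth left for each full-speed dirtier, in MB/s.

    Assumes the rate-limited tasks together need less than write_bw,
    as in the dd/scp example.
    """
    return (write_bw - sum(limited_rates)) / n_greedy


# 2 dd tasks plus 1 scp limited to 10MB/s on a 100MB/s device.
dd_rate = greedy_share(100, [10], 2)

# scp needs 0.1s to produce 1MB at 10MB/s, so its think time already
# exceeds the period and the computed pause is negative: no throttling.
scp_pause = pause_time(1, dd_rate, think=0.1)

# dd has (almost) no think time, so it gets a positive pause.
dd_pause = pause_time(1, dd_rate, think=0.0)

print(dd_rate, scp_pause <= 0, dd_pause > 0)
```

A pause <= 0 simply means the task is not put to sleep, which is why the
scp task keeps its full 10MB/s while the dd tasks split the rest.
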
In the graph below, the dirty rates (the slopes of the lines) of the
last 3 tasks are 2, 4 and 8 MB/s

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/RATES-2-4-8/btrfs-fio-rates-128k-8p-2975M-2.6.38-rc6-dt6+-2011-03-01-20-45/balance_dirty_pages-task-bw.png

given this fio workload, which started one full speed dirtier and four
rate limited dirtiers at 1, 2, 4 and 8 MB/s

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/RATES-2-4-8/btrfs-fio-rates-128k-8p-2975M-2.6.38-rc6-dt6+-2011-03-01-20-45/fio-rates

> If yes, I am wondering how
> do you get fairness among tasks which share this bdi.
>
> Also wondering what did this patch series to do make sure that tasks
> share bdi more fairly and get write_bw/N bandwidth.

Each of the N dd tasks will be rate limited by

        rate = base_rate * pos_ratio

At any time snapshot, each bdi task will see almost the same base_rate
and pos_ratio, so will be throttled at almost the same rate. This is a
strong guarantee of fairness under all situations.

Since pos_ratio is fluctuating (evenly) around 1.0, and
base_rate=bdi->dirty_ratelimit is fluctuating around (write_bw/N), on
average we get

        avg_rate = (write_bw/N) * 1.0

(I'll explain the "dirty_ratelimit = write_bw/N" magic in other emails.)

The graphs below demonstrate the dirty progress of the last 3 dd tasks.
The slope of each curve is the dirty rate. They vividly show three
curves progressing at the same pace in all of the 3 stages

- rampup stage (20-100s)
- disturbed stage (120s-160s)
  (disturbed by starting a 1GB read dd in the middle of the tests)
- stable stage (after 160s)

and the tasks dirtied almost the same number of pages during the test.

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/8G/xfs-10dd-4k-32p-6802M-20:10-3.0.0-next-20110802+-2011-08-06.16:26/balance_dirty_pages-task-bw.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/2G/xfs-10dd-4k-8p-1947M-20:10-3.0.0-next-20110802+-2011-08-06.15:49/balance_dirty_pages-task-bw.png

Thanks,
Fengguang