From: "John Stoffel" <john@stoffel.org>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, miklos@szeredi.hu,
    akpm@linux-foundation.org, neilb@suse.de, dgc@sgi.com,
    tomoki.sekiyama.qu@hitachi.com, nikita@clusterfs.com,
    trond.myklebust@fys.uio.no, yingchao.zhou@gmail.com,
    richard@rsk.demon.co.uk, torvalds@linux-foundation.org
Subject: Re: [PATCH 21/23] mm: per device dirty threshold
Date: Tue, 11 Sep 2007 22:36:12 -0400
Message-ID: <18151.20636.425784.226044@stoffel.org>
In-Reply-To: <20070911200015.732492000@chello.nl>

Peter> Scale writeback cache per backing device, proportional to its
Peter> writeout speed. By decoupling the BDI dirty thresholds a
Peter> number of problems we currently have will go away, namely:

Ah, this clarifies my questions! Thanks!

Peter> - mutual interference starvation (for any number of BDIs);
Peter> - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).

Peter> It might be that all dirty pages are for a single BDI while
Peter> other BDIs are idling. By giving each BDI a 'fair' share of the
Peter> dirty limit, each one can have dirty pages outstanding and make
Peter> progress.

Question: can you change (shrink) the limit on a BDI while it has IO
in flight? And what will that do to the system? For example, if one
device is doing IO and so holds a majority of the dirty limit, and
then another, *faster* device starts IO, how quickly or slowly do the
BDI dirty limits change for both the old and the new device?

Peter> A global threshold also creates a deadlock for stacked BDIs;
Peter> when A writes to B, and A generates enough dirty pages to get
Peter> throttled, B will never start writeback until the dirty pages
Peter> go away. Again, by giving each BDI its own 'independent' dirty
Peter> limit, this problem is avoided.

Peter> So the problem is to determine how to distribute the total
Peter> dirty limit across the BDIs fairly and efficiently. A DBI that

You mean BDI here, not DBI.

Peter> has a large dirty limit but does not have any dirty pages
Peter> outstanding is a waste.

Peter> What is done is to keep a floating proportion between the DBIs
Peter> based on writeback completions. This way faster/more active
Peter> devices get a larger share than slower/idle devices.

Does a slower device get a BDI limit that is calculated to keep it
under a certain number of seconds of outstanding IO? That way no
device could build up more than, say, 15 seconds of outstanding IO to
flush at any one time.

Thanks!
John
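
To make the two questions above concrete, here is a toy model of the idea in the quoted description: each device's share of the global dirty limit follows its fraction of recent writeback completions. This is only a sketch, not the code from the patch series. The class name, the `period` parameter, the halve-every-period decay, and the `time_capped_limit` helper (which models the "N seconds of outstanding IO" suggestion, not anything the patch is stated to do) are all assumptions made for illustration.

```python
class FloatingProportion:
    """Toy model: a device's share of the dirty limit tracks its
    fraction of recent writeback completions (hypothetical names)."""

    def __init__(self, period):
        self.period = period    # completions per decay period (assumed knob)
        self.events = {}        # device name -> decayed completion count
        self.since_decay = 0    # completions seen since the last decay

    def completion(self, dev, n=1):
        """Record n writeback completions for dev."""
        for _ in range(n):
            self.events[dev] = self.events.get(dev, 0.0) + 1.0
            self.since_decay += 1
            if self.since_decay >= self.period:
                # Assumed decay scheme: halve all history once per
                # period so that old activity fades out over time.
                for d in self.events:
                    self.events[d] /= 2.0
                self.since_decay = 0

    def share(self, dev):
        """Fraction of recent completions that belong to dev."""
        total = sum(self.events.values())
        if total == 0:
            return 0.0
        return self.events.get(dev, 0.0) / total

    def dirty_limit(self, dev, global_limit):
        """Per-device limit in pages: global limit scaled by the share."""
        return int(global_limit * self.share(dev))


def time_capped_limit(limit_pages, pages_per_sec, max_seconds):
    """Hypothetical cap: never allow more than max_seconds of queued IO."""
    return min(limit_pages, int(pages_per_sec * max_seconds))


fp = FloatingProportion(period=100)
fp.completion("slow", 1000)          # only the slow device is active
before = fp.share("slow")
for _ in range(200):                 # a device 4x faster starts writing
    fp.completion("fast", 4)
    fp.completion("slow", 1)
after = fp.share("slow")
```

In this model the slow device's limit is not cut abruptly when the faster device appears; its share shrinks over a handful of decay periods, so `period` sets how quickly the limits move. Whether the real implementation behaves the same way is exactly what the first question asks.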