* fadvise DONTNEED implementation (or lack thereof) @ 2010-11-04 5:58 Ben Gamari 2010-11-09 7:28 ` KOSAKI Motohiro 0 siblings, 1 reply; 23+ messages in thread From: Ben Gamari @ 2010-11-04 5:58 UTC (permalink / raw) To: linux-kernel, rsync, linux-mm I've recently been trying to track down the root cause of my server's persistent issue of thrashing horribly after being left inactive. It seems that the issue is likely my nightly backup schedule (using rsync) which traverses my entire 50GB home directory. I was surprised to find that rsync does not use fadvise to notify the kernel of its use-once data usage pattern. It looks like a patch[1] was written (although never merged, it seems) incorporating fadvise support, but I found its implementation rather odd, using mincore() and FADV_DONTNEED to kick out only regions brought in by rsync. It seemed to me the simpler and more appropriate solution would be to simply flag every touched file with FADV_NOREUSE and let the kernel manage automatically expelling used pages. After looking deeper into the kernel implementation[2] of fadvise() the reason for using DONTNEED became more apparent. It seems that the kernel implements NOREUSE as a no-op. A little googling revealed[3] that I am not the first person to encounter this limitation. It looks like a few folks[4] have discussed addressing the issue in the past, but nothing has happened as of 2.6.36. Are there plans to implement this functionality in the near future? It seems like the utility of fadvise is severely limited by lacking support for NOREUSE.
Cheers, - Ben [1] http://insights.oetiker.ch/linux/fadvise.html [2] http://lxr.free-electrons.com/source/mm/fadvise.c?a=avr32 [3] https://issues.apache.org/jira/browse/CASSANDRA-1470 http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html [4] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0442.html ^ permalink raw reply [flat|nested] 23+ messages in thread
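For readers following along, here is a minimal sketch of the simplest DONTNEED usage discussed above: read a file once, then hint that its cached pages can be dropped. The helper name read_once_and_drop() is hypothetical and this is not rsync code; only posix_fadvise() and POSIX_FADV_DONTNEED come from the thread.

```c
#define _POSIX_C_SOURCE 200809L   /* expose posix_fadvise() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file once, then tell the kernel the
 * cached pages are no longer needed. Returns bytes read, or -1 on error.
 */
long long read_once_and_drop(const char *path)
{
    char buf[64 * 1024];
    long long total = 0;
    ssize_t n;
    int rc;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    if (n < 0) {
        close(fd);
        return -1;
    }

    /*
     * offset 0 with len 0 means "from the start to the end of the file".
     * posix_fadvise() returns an error number directly; it does not set errno.
     */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise failed with error %d (ignored)\n", rc);

    close(fd);
    return total;
}
```

Note that this drops every cached page of the file, including pages that some other process had already pulled in before the read, which is presumably why the patch in [1] combines DONTNEED with mincore().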
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-04 5:58 fadvise DONTNEED implementation (or lack thereof) Ben Gamari @ 2010-11-09 7:28 ` KOSAKI Motohiro 2010-11-09 8:03 ` KOSAKI Motohiro 2010-11-09 12:54 ` Ben Gamari 0 siblings, 2 replies; 23+ messages in thread From: KOSAKI Motohiro @ 2010-11-09 7:28 UTC (permalink / raw) To: Ben Gamari; +Cc: kosaki.motohiro, linux-kernel, rsync, linux-mm > I've recently been trying to track down the root cause of my server's > persistent issue of thrashing horribly after being left inactive. It > seems that the issue is likely my nightly backup schedule (using rsync) > which traverses my entire 50GB home directory. I was surprised to find > that rsync does not use fadvise to notify the kernel of its use-once > data usage pattern. > > It looks like a patch[1] was written (although never merged, it seems) > incorporating fadvise support, but I found its implementation rather > odd, using mincore() and FADV_DONTNEED to kick out only regions brought > in by rsync. It seemed to me the simpler and more appropriate solution > would be to simply flag every touched file with FADV_NOREUSE and let the > kernel manage automatically expelling used pages. > > After looking deeper into the kernel implementation[2] of fadvise() the > reason for using DONTNEED became more apparant. It seems that the kernel > implements NOREUSE as a noop. A little googling revealed[3] that I not > the first person to encounter this limitation. It looks like a few > folks[4] have discussed addressing the issue in the past, but nothing > has happened as of 2.6.36. Are there plans to implement this > functionality in the near future? It seems like the utility of fadvise > is severely limited by lacking support for NOREUSE. By the way, other OSes don't seem to implement it either.
For example, http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/posix_fadvise.c

/*
 * SUSv3 - file advisory information
 *
 * This function does nothing, but that's OK because the
 * Posix specification doesn't require it to do anything
 * other than return appropriate error numbers.
 *
 * In the future, a file system dependent fadvise() or fcntl()
 * interface, similar to madvise(), should be developed to enable
 * the kernel to optimize I/O operations based on the given advice.
 */

/* ARGSUSED1 */
int
posix_fadvise(int fd, off_t offset, off_t len, int advice)
{
    struct stat64 statb;

    switch (advice) {
    case POSIX_FADV_NORMAL:
    case POSIX_FADV_RANDOM:
    case POSIX_FADV_SEQUENTIAL:
    case POSIX_FADV_WILLNEED:
    case POSIX_FADV_DONTNEED:
    case POSIX_FADV_NOREUSE:
        break;
    default:
        return (EINVAL);
    }
    if (len < 0)
        return (EINVAL);
    if (fstat64(fd, &statb) != 0)
        return (EBADF);
    if (S_ISFIFO(statb.st_mode))
        return (ESPIPE);
    return (0);
}

So, I don't think application developers will use fadvise() aggressively, because we don't have cross-platform agreement on fadvise() behavior.
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-09 7:28 ` KOSAKI Motohiro @ 2010-11-09 8:03 ` KOSAKI Motohiro 2010-11-09 12:54 ` Ben Gamari 1 sibling, 0 replies; 23+ messages in thread From: KOSAKI Motohiro @ 2010-11-09 8:03 UTC (permalink / raw) To: KOSAKI Motohiro Cc: kosaki.motohiro, Ben Gamari, linux-kernel, rsync, linux-mm > > I've recently been trying to track down the root cause of my server's > > persistent issue of thrashing horribly after being left inactive. It > > seems that the issue is likely my nightly backup schedule (using rsync) > > which traverses my entire 50GB home directory. I was surprised to find > > that rsync does not use fadvise to notify the kernel of its use-once > > data usage pattern. > > > > It looks like a patch[1] was written (although never merged, it seems) > > incorporating fadvise support, but I found its implementation rather > > odd, using mincore() and FADV_DONTNEED to kick out only regions brought > > in by rsync. It seemed to me the simpler and more appropriate solution > > would be to simply flag every touched file with FADV_NOREUSE and let the > > kernel manage automatically expelling used pages. > > > > After looking deeper into the kernel implementation[2] of fadvise() the > > reason for using DONTNEED became more apparant. It seems that the kernel > > implements NOREUSE as a noop. A little googling revealed[3] that I not > > the first person to encounter this limitation. It looks like a few > > folks[4] have discussed addressing the issue in the past, but nothing > > has happened as of 2.6.36. Are there plans to implement this > > functionality in the near future? It seems like the utility of fadvise > > is severely limited by lacking support for NOREUSE. > > btw, Other OSs seems to also don't implement it. > example, I've heard about the status of fadvise() on other OSes via private mail:
NetBSD: no-op (as on Linux)
FreeBSD/DragonFlyBSD/OpenBSD: posix_fadvise(2) does not exist
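Given that mix of platforms (a no-op on some, missing entirely on others), an application could hide the difference behind a small wrapper. The following is only a sketch: advise_dontneed() is a hypothetical name, and it assumes that testing for the POSIX_FADV_DONTNEED macro is an adequate way to detect posix_fadvise() support at compile time.

```c
#define _POSIX_C_SOURCE 200809L   /* expose posix_fadvise() where available */
#include <sys/types.h>
#include <fcntl.h>

/*
 * Hypothetical wrapper: ask the kernel to drop cached pages for the byte
 * range [offset, offset + len) where the hint exists, and quietly succeed
 * where it does not. A len of 0 means "to the end of the file".
 */
int advise_dontneed(int fd, off_t offset, off_t len)
{
#if defined(POSIX_FADV_DONTNEED)
    /* Returns 0 on success or an error number; errno is not set. */
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
#else
    (void)fd;
    (void)offset;
    (void)len;
    return 0;   /* no posix_fadvise(2) here: treat the advice as a no-op */
#endif
}
```

Because the call is purely advisory, a caller can ignore a nonzero result without affecting correctness; the worst case is the behavior the platform already has today.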
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-09 7:28 ` KOSAKI Motohiro 2010-11-09 8:03 ` KOSAKI Motohiro @ 2010-11-09 12:54 ` Ben Gamari 2010-11-14 5:09 ` KOSAKI Motohiro 1 sibling, 1 reply; 23+ messages in thread From: Ben Gamari @ 2010-11-09 12:54 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: kosaki.motohiro, linux-kernel, rsync, linux-mm On Tue, 9 Nov 2010 16:28:02 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > So, I don't think application developers will use fadvise() aggressively > because we don't have a cross platform agreement of a fadvice behavior. > I strongly disagree. For a long time I have been trying to resolve interactivity issues caused by my rsync-based backup script. Many kernel developers have said that there is nothing the kernel can do without more information from user-space (e.g. cgroups, madvise). While cgroups help, the fix is round-about at best and requires configuration where really none should be necessary. The easiest solution for everyone involved would be for rsync to use FADV_DONTNEED. The behavior doesn't need to be perfectly consistent between platforms for the flag to be useful so long as each implementation does something sane to help use-once access patterns. People seem to mention frequently that there are no users of FADV_DONTNEED and therefore we don't need to implement it. It seems like this is ignoring an obvious catch-22. Currently rsync has no fadvise support at all, since using[1] the implemented hints to get the desired effect is far too complicated^M^M^M^Mhacky to be considered merge-worthy. Considering the number of Google hits returned for fadvise, I wouldn't be surprised if there were countless other projects with this same difficulty. We want to be able to tell the kernel about our usage patterns, but the kernel won't listen. Cheers, - Ben [1] http://insights.oetiker.ch/linux/fadvise.html
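To make the "complicated/hacky" approach concrete, here is a rough sketch of the mincore() plus FADV_DONTNEED idea as described earlier in the thread: remember which pages were already cached before the read, and afterwards drop only the ones that were not. This is not the code from the patch in [1]; snapshot_residency() and drop_new_pages() are hypothetical helpers, and mincore() is a non-POSIX call whose exact reporting can vary between kernels.

```c
#define _DEFAULT_SOURCE           /* mincore() is not part of POSIX */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return one byte per page of fd, with bit 0 set for
 * pages currently resident in memory. The caller frees the result. Returns
 * NULL on failure or for an empty file; *npages receives the page count.
 */
unsigned char *snapshot_residency(int fd, size_t *npages)
{
    struct stat st;
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *vec;
    void *map;

    *npages = 0;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return NULL;

    *npages = ((size_t)st.st_size + psz - 1) / psz;
    map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    vec = malloc(*npages);
    if (vec != NULL && mincore(map, (size_t)st.st_size, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, (size_t)st.st_size);
    return vec;
}

/*
 * Hypothetical helper: issue DONTNEED only for pages that were NOT resident
 * in the earlier snapshot. Returns the number of pages advice was issued for;
 * failures of the advisory call itself are ignored.
 */
size_t drop_new_pages(int fd, const unsigned char *before, size_t npages)
{
    off_t psz = (off_t)sysconf(_SC_PAGESIZE);
    size_t advised = 0;
    size_t i;

    for (i = 0; i < npages; i++) {
        if (before[i] & 1)
            continue;             /* was cached before we came: leave it */
        (void)posix_fadvise(fd, (off_t)i * psz, psz, POSIX_FADV_DONTNEED);
        advised++;
    }
    return advised;
}
```

The intended call order would be snapshot_residency(), then the normal read of the file, then drop_new_pages(). Even in this stripped-down form it takes an extra mmap() and a per-page loop for every file, which gives some sense of why a single NOREUSE hint looked more attractive.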
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-09 12:54 ` Ben Gamari @ 2010-11-14 5:09 ` KOSAKI Motohiro 2010-11-14 5:20 ` Ben Gamari 2010-11-15 6:07 ` Minchan Kim 0 siblings, 2 replies; 23+ messages in thread From: KOSAKI Motohiro @ 2010-11-14 5:09 UTC (permalink / raw) To: Ben Gamari; +Cc: kosaki.motohiro, linux-kernel, rsync, linux-mm > On Tue, 9 Nov 2010 16:28:02 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > > So, I don't think application developers will use fadvise() aggressively > > because we don't have a cross platform agreement of a fadvice behavior. > > > I strongly disagree. For a long time I have been trying to resolve > interactivity issues caused by my rsync-based backup script. Many kernel > developers have said that there is nothing the kernel can do without > more information from user-space (e.g. cgroups, madvise). While cgroups > help, the fix is round-about at best and requires configuration where > really none should be necessary. The easiest solution for everyone > involved would be for rsync to use FADV_DONTNEED. The behavior doesn't > need to be perfectly consistent between platforms for the flag to be > useful so long as each implementation does something sane to help > use-once access patterns. > > People seem to mention frequently that there are no users of > FADV_DONTNEED and therefore we don't need to implement it. It seems like > this is ignoring an obvious catch-22. Currently rsync has no fadvise > support at all, since using[1] the implemented hints to get the desired > effect is far too complicated^M^M^M^Mhacky to be considered > merge-worthy. Considering the number of Google hits returned for > fadvise, I wouldn't be surprised if there were countless other projects > with this same difficulty. We want to be able to tell the kernel about > our useage patterns, but the kernel won't listen. Because we have an alternative solution already. 
please try memcgroup :) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-14 5:09 ` KOSAKI Motohiro @ 2010-11-14 5:20 ` Ben Gamari 2010-11-14 21:33 ` Brian K. White 2010-11-15 6:07 ` Minchan Kim 1 sibling, 1 reply; 23+ messages in thread From: Ben Gamari @ 2010-11-14 5:20 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: kosaki.motohiro, linux-kernel, rsync, linux-mm On Sun, 14 Nov 2010 14:09:29 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > Because we have an alternative solution already. please try memcgroup :) > Alright, fair enough. It still seems like there are many cases where fadvise seems more appropriate, but memcg should at least satisfy my personal needs so I'll shut up now. Thanks! - Ben ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-14 5:20 ` Ben Gamari @ 2010-11-14 21:33 ` Brian K. White 0 siblings, 0 replies; 23+ messages in thread From: Brian K. White @ 2010-11-14 21:33 UTC (permalink / raw) To: rsync; +Cc: linux-kernel, linux-mm On 11/14/2010 12:20 AM, Ben Gamari wrote: > On Sun, 14 Nov 2010 14:09:29 +0900 (JST), KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote: >> Because we have an alternative solution already. please try memcgroup :) >> > Alright, fair enough. It still seems like there are many cases where > fadvise seems more appropriate, but memcg should at least satisfy my > personal needs so I'll shut up now. Thanks! > > - Ben Could someone expand on this a little? The "there are no users of this feature" argument is indeed a silly one. I've only wanted the ability to perform i/o without poisoning the cache since oh, 10 or more years ago at least. It really hurts my users since they are all direct login interactive db app users. No load balancing web interface can hide the fact when a box goes to a crawl. How would one use memcgroup to prevent a backup or other large file operation from wiping out the cache with used-once garbage? (note for rsync in particular, how does this help rsync on other platforms?) -- bkw ^ permalink raw reply [flat|nested] 23+ messages in thread
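As a partial illustration of the memcgroup suggestion, here is a sketch of how a backup wrapper might place itself into a memory cgroup before exec'ing rsync. It assumes the cgroup v1 memory controller, whose per-group control files include memory.limit_in_bytes and tasks; the group directory, the limit value, and the helper names write_file() and confine_self() are all hypothetical, and the group must already have been created by an administrator.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write a short string to dir/name. Returns 0 on success, -1 on failure. */
static int write_file(const char *dir, const char *name, const char *value)
{
    char path[4096];
    FILE *f;
    int ok;

    if (snprintf(path, sizeof path, "%s/%s", dir, name) >= (int)sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: cap the memory (page cache included) charged to the
 * group at `limit` bytes, then move the calling process into the group.
 * group_dir must be an existing memory-cgroup directory.
 */
int confine_self(const char *group_dir, const char *limit)
{
    char pid[32];

    if (write_file(group_dir, "memory.limit_in_bytes", limit) != 0)
        return -1;
    snprintf(pid, sizeof pid, "%ld", (long)getpid());
    return write_file(group_dir, "tasks", pid);
}
```

A wrapper would call something like confine_self("/cgroup/backup", "268435456") (both values made up for illustration) and then execvp() rsync, so that pages read by the backup are charged to the limited group. This is Linux-specific, so it does not answer the question about rsync on other platforms.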
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-14 5:09 ` KOSAKI Motohiro 2010-11-14 5:20 ` Ben Gamari @ 2010-11-15 6:07 ` Minchan Kim 2010-11-15 7:09 ` KOSAKI Motohiro 2010-11-15 8:47 ` Peter Zijlstra 1 sibling, 2 replies; 23+ messages in thread From: Minchan Kim @ 2010-11-15 6:07 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Ben Gamari, linux-kernel, rsync, linux-mm, Peter Zijlstra, Wu Fengguang On Sun, Nov 14, 2010 at 2:09 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> On Tue, 9 Nov 2010 16:28:02 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> > So, I don't think application developers will use fadvise() aggressively >> > because we don't have a cross platform agreement of a fadvice behavior. >> > >> I strongly disagree. For a long time I have been trying to resolve >> interactivity issues caused by my rsync-based backup script. Many kernel >> developers have said that there is nothing the kernel can do without >> more information from user-space (e.g. cgroups, madvise). While cgroups >> help, the fix is round-about at best and requires configuration where >> really none should be necessary. The easiest solution for everyone >> involved would be for rsync to use FADV_DONTNEED. The behavior doesn't >> need to be perfectly consistent between platforms for the flag to be >> useful so long as each implementation does something sane to help >> use-once access patterns. >> >> People seem to mention frequently that there are no users of >> FADV_DONTNEED and therefore we don't need to implement it. It seems like >> this is ignoring an obvious catch-22. Currently rsync has no fadvise >> support at all, since using[1] the implemented hints to get the desired >> effect is far too complicated^M^M^M^Mhacky to be considered >> merge-worthy. Considering the number of Google hits returned for >> fadvise, I wouldn't be surprised if there were countless other projects >> with this same difficulty. 
We want to be able to tell the kernel about >> our useage patterns, but the kernel won't listen. > > Because we have an alternative solution already. please try memcgroup :) I think memcg could be a solution for them, but the fundamental solution is to cure it in the VM itself. I feel it's absolutely absurd to have to enable and use memcg to work around it. I wonder what the problem was with Peter's 'drop behind' patch. http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html Could anyone tell me why it wasn't accepted upstream? -- Kind regards, Minchan Kim
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 6:07 ` Minchan Kim @ 2010-11-15 7:09 ` KOSAKI Motohiro 2010-11-15 7:19 ` Minchan Kim 2010-11-15 8:47 ` Peter Zijlstra 1 sibling, 1 reply; 23+ messages in thread From: KOSAKI Motohiro @ 2010-11-15 7:09 UTC (permalink / raw) To: Minchan Kim Cc: kosaki.motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Peter Zijlstra, Wu Fengguang > > Because we have an alternative solution already. please try memcgroup :) > > I think memcg could be a solution of them but fundamental solution is > that we have to cure it in VM itself. > I feel it's absolutely absurd to enable and use memcg for amending it. > > I wonder what's the problem in Peter's patch 'drop behind'. > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html > > Could anyone tell me why it can't accept upstream? I don't know the reason, and this one looks reasonable to me. I'm curious whether the above patch solves the rsync issue. Minchan, have you tested it yourself?
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 7:09 ` KOSAKI Motohiro @ 2010-11-15 7:19 ` Minchan Kim 2010-11-15 7:28 ` KOSAKI Motohiro 0 siblings, 1 reply; 23+ messages in thread From: Minchan Kim @ 2010-11-15 7:19 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Ben Gamari, linux-kernel, rsync, linux-mm, Peter Zijlstra, Wu Fengguang On Mon, Nov 15, 2010 at 4:09 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> > Because we have an alternative solution already. please try memcgroup :) >> >> I think memcg could be a solution of them but fundamental solution is >> that we have to cure it in VM itself. >> I feel it's absolutely absurd to enable and use memcg for amending it. >> >> I wonder what's the problem in Peter's patch 'drop behind'. >> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html >> >> Could anyone tell me why it can't accept upstream? > > I don't know the reason. And this one looks reasonable to me. I'm curious the above > patch solve rsync issue or not. > Minchan, have you tested it yourself? Not yet. :) If we all think it's reasonable, it would be valuable to adapt it to current mmotm and see the effect. -- Kind regards, Minchan Kim
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 7:19 ` Minchan Kim @ 2010-11-15 7:28 ` KOSAKI Motohiro 2010-11-15 7:46 ` Minchan Kim 2010-11-15 12:46 ` Ben Gamari 0 siblings, 2 replies; 23+ messages in thread From: KOSAKI Motohiro @ 2010-11-15 7:28 UTC (permalink / raw) To: Minchan Kim Cc: kosaki.motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Peter Zijlstra, Wu Fengguang > On Mon, Nov 15, 2010 at 4:09 PM, KOSAKI Motohiro > <kosaki.motohiro@jp.fujitsu.com> wrote: > >> > Because we have an alternative solution already. please try memcgroup :) > >> > >> I think memcg could be a solution of them but fundamental solution is > >> that we have to cure it in VM itself. > >> I feel it's absolutely absurd to enable and use memcg for amending it. > >> > >> I wonder what's the problem in Peter's patch 'drop behind'. > >> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html > >> > >> Could anyone tell me why it can't accept upstream? > > > > I don't know the reason. And this one looks reasonable to me. I'm curious the above > > patch solve rsync issue or not. > > Minchan, have you tested it yourself? > > Still yet. :) > If we all think it's reasonable, it would be valuable to adjust it > with current mmotm and see the effect. Who can write a test suite with an rsync-like I/O pattern? A code change is easy, but confirming that it is justified is harder work.
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 7:28 ` KOSAKI Motohiro @ 2010-11-15 7:46 ` Minchan Kim 2010-11-15 12:46 ` Ben Gamari 1 sibling, 0 replies; 23+ messages in thread From: Minchan Kim @ 2010-11-15 7:46 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Ben Gamari, linux-kernel, rsync, linux-mm, Peter Zijlstra, Wu Fengguang On Mon, Nov 15, 2010 at 4:28 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> On Mon, Nov 15, 2010 at 4:09 PM, KOSAKI Motohiro >> <kosaki.motohiro@jp.fujitsu.com> wrote: >> >> > Because we have an alternative solution already. please try memcgroup :) >> >> >> >> I think memcg could be a solution of them but fundamental solution is >> >> that we have to cure it in VM itself. >> >> I feel it's absolutely absurd to enable and use memcg for amending it. >> >> >> >> I wonder what's the problem in Peter's patch 'drop behind'. >> >> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html >> >> >> >> Could anyone tell me why it can't accept upstream? >> > >> > I don't know the reason. And this one looks reasonable to me. I'm curious the above >> > patch solve rsync issue or not. >> > Minchan, have you tested it yourself? >> >> Still yet. :) >> If we all think it's reasonable, it would be valuable to adjust it >> with current mmotm and see the effect. > > Who can make rsync like io pattern test suite? a code change is easy. but > to comfirm justification is more harder work. Maybe Ben or Brian, who reported the problem. :) -- Kind regards, Minchan Kim
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 7:28 ` KOSAKI Motohiro 2010-11-15 7:46 ` Minchan Kim @ 2010-11-15 12:46 ` Ben Gamari 1 sibling, 0 replies; 23+ messages in thread From: Ben Gamari @ 2010-11-15 12:46 UTC (permalink / raw) To: KOSAKI Motohiro, Minchan Kim Cc: kosaki.motohiro, linux-kernel, rsync, linux-mm, Peter Zijlstra, Wu Fengguang On Mon, 15 Nov 2010 16:28:32 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > Who can make rsync like io pattern test suite? a code change is easy. but > to comfirm justification is more harder work. > I'm afraid I don't have time to work up any code. I would be happy to try the patch with my backup use-case though. I'll just have to think of an objective way of measuring the result. - Ben ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 6:07 ` Minchan Kim 2010-11-15 7:09 ` KOSAKI Motohiro @ 2010-11-15 8:47 ` Peter Zijlstra 2010-11-15 9:05 ` Minchan Kim 2010-11-15 9:10 ` KOSAKI Motohiro 1 sibling, 2 replies; 23+ messages in thread From: Peter Zijlstra @ 2010-11-15 8:47 UTC (permalink / raw) To: Minchan Kim Cc: KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang On Mon, 2010-11-15 at 15:07 +0900, Minchan Kim wrote: > On Sun, Nov 14, 2010 at 2:09 PM, KOSAKI Motohiro > <kosaki.motohiro@jp.fujitsu.com> wrote: > >> On Tue, 9 Nov 2010 16:28:02 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > >> > So, I don't think application developers will use fadvise() aggressively > >> > because we don't have a cross platform agreement of a fadvice behavior. > >> > > >> I strongly disagree. For a long time I have been trying to resolve > >> interactivity issues caused by my rsync-based backup script. Many kernel > >> developers have said that there is nothing the kernel can do without > >> more information from user-space (e.g. cgroups, madvise). While cgroups > >> help, the fix is round-about at best and requires configuration where > >> really none should be necessary. The easiest solution for everyone > >> involved would be for rsync to use FADV_DONTNEED. The behavior doesn't > >> need to be perfectly consistent between platforms for the flag to be > >> useful so long as each implementation does something sane to help > >> use-once access patterns. > >> > >> People seem to mention frequently that there are no users of > >> FADV_DONTNEED and therefore we don't need to implement it. It seems like > >> this is ignoring an obvious catch-22. Currently rsync has no fadvise > >> support at all, since using[1] the implemented hints to get the desired > >> effect is far too complicated^M^M^M^Mhacky to be considered > >> merge-worthy. 
Considering the number of Google hits returned for > >> fadvise, I wouldn't be surprised if there were countless other projects > >> with this same difficulty. We want to be able to tell the kernel about > >> our useage patterns, but the kernel won't listen. > > > > Because we have an alternative solution already. please try memcgroup :) Using memcgroup for this is utter crap, it just contains the trainwreck, it doesn't solve it in any way. > I think memcg could be a solution of them but fundamental solution is > that we have to cure it in VM itself. > I feel it's absolutely absurd to enable and use memcg for amending it. Agreed.. > I wonder what's the problem in Peter's patch 'drop behind'. > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html > > Could anyone tell me why it can't accept upstream? Read the thread, it's quite clear nobody got convinced it was a good idea and people wanted to fix the use-once policy, then Rik rewrote all of page-reclaim.
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 8:47 ` Peter Zijlstra @ 2010-11-15 9:05 ` Minchan Kim 2010-11-15 14:48 ` Rik van Riel 2010-11-15 9:10 ` KOSAKI Motohiro 1 sibling, 1 reply; 23+ messages in thread From: Minchan Kim @ 2010-11-15 9:05 UTC (permalink / raw) To: Peter Zijlstra Cc: KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang, Rik van Riel On Mon, Nov 15, 2010 at 5:47 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2010-11-15 at 15:07 +0900, Minchan Kim wrote: >> On Sun, Nov 14, 2010 at 2:09 PM, KOSAKI Motohiro >> <kosaki.motohiro@jp.fujitsu.com> wrote: >> >> On Tue, 9 Nov 2010 16:28:02 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> >> > So, I don't think application developers will use fadvise() aggressively >> >> > because we don't have a cross platform agreement of a fadvice behavior. >> >> > >> >> I strongly disagree. For a long time I have been trying to resolve >> >> interactivity issues caused by my rsync-based backup script. Many kernel >> >> developers have said that there is nothing the kernel can do without >> >> more information from user-space (e.g. cgroups, madvise). While cgroups >> >> help, the fix is round-about at best and requires configuration where >> >> really none should be necessary. The easiest solution for everyone >> >> involved would be for rsync to use FADV_DONTNEED. The behavior doesn't >> >> need to be perfectly consistent between platforms for the flag to be >> >> useful so long as each implementation does something sane to help >> >> use-once access patterns. >> >> >> >> People seem to mention frequently that there are no users of >> >> FADV_DONTNEED and therefore we don't need to implement it. It seems like >> >> this is ignoring an obvious catch-22. Currently rsync has no fadvise >> >> support at all, since using[1] the implemented hints to get the desired >> >> effect is far too complicated^M^M^M^Mhacky to be considered >> >> merge-worthy. 
Considering the number of Google hits returned for >> >> fadvise, I wouldn't be surprised if there were countless other projects >> >> with this same difficulty. We want to be able to tell the kernel about >> >> our useage patterns, but the kernel won't listen. >> > >> > Because we have an alternative solution already. please try memcgroup :) > > Using memcgroup for this is utter crap, it just contains the trainwreck, > it doesn't solve it in any way. > >> I think memcg could be a solution of them but fundamental solution is >> that we have to cure it in VM itself. >> I feel it's absolutely absurd to enable and use memcg for amending it. > > Agreed.. > >> I wonder what's the problem in Peter's patch 'drop behind'. >> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html >> >> Could anyone tell me why it can't accept upstream? > > Read the thread, its quite clear nobody got convinced it was a good idea > and wanted to fix the use-once policy, then Rik rewrote all of > page-reclaim. > Thanks for the information. I hope this is a chance to rethink it. Rik, could you give us any comments on this idea? -- Kind regards, Minchan Kim
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 9:05 ` Minchan Kim @ 2010-11-15 14:48 ` Rik van Riel 2010-11-17 10:16 ` Minchan Kim 0 siblings, 1 reply; 23+ messages in thread From: Rik van Riel @ 2010-11-15 14:48 UTC (permalink / raw) To: Minchan Kim Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang On 11/15/2010 04:05 AM, Minchan Kim wrote: > On Mon, Nov 15, 2010 at 5:47 PM, Peter Zijlstra<peterz@infradead.org> wrote: >> On Mon, 2010-11-15 at 15:07 +0900, Minchan Kim wrote: >>> I wonder what's the problem in Peter's patch 'drop behind'. >>> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html >>> >>> Could anyone tell me why it can't accept upstream? >> >> Read the thread, its quite clear nobody got convinced it was a good idea >> and wanted to fix the use-once policy, then Rik rewrote all of >> page-reclaim. >> > > Thanks for the information. > I hope this is a chance to rethink about it. > Rik, Could you give us to any comment about this idea? At the time, there were all kinds of general problems in page reclaim that all needed to be fixed. Peter's patch was mostly a band-aid for streaming IO. However, now that most of the other page reclaim problems seem to have been resolved, it would be worthwhile to test whether Peter's drop-behind approach gives an additional improvement. I could see it help by getting rid of already-read pages earlier, leaving more space for read-ahead data. I suspect it would do fairly little to protect the working set, because we do not scan the active file list at all unless it grows to be larger than the inactive file list. -- All rights reversed ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: fadvise DONTNEED implementation (or lack thereof) 2010-11-15 14:48 ` Rik van Riel @ 2010-11-17 10:16 ` Minchan Kim 2010-11-17 11:15 ` Minchan Kim 2010-11-17 16:22 ` Rik van Riel 0 siblings, 2 replies; 23+ messages in thread From: Minchan Kim @ 2010-11-17 10:16 UTC (permalink / raw) To: Rik van Riel Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang On Mon, Nov 15, 2010 at 11:48 PM, Rik van Riel <riel@redhat.com> wrote: > On 11/15/2010 04:05 AM, Minchan Kim wrote: >> >> On Mon, Nov 15, 2010 at 5:47 PM, Peter Zijlstra<peterz@infradead.org> >> wrote: >>> >>> On Mon, 2010-11-15 at 15:07 +0900, Minchan Kim wrote: > >>>> I wonder what's the problem in Peter's patch 'drop behind'. >>>> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html >>>> >>>> Could anyone tell me why it can't accept upstream? >>> >>> Read the thread, its quite clear nobody got convinced it was a good idea >>> and wanted to fix the use-once policy, then Rik rewrote all of >>> page-reclaim. >>> >> >> Thanks for the information. >> I hope this is a chance to rethink about it. >> Rik, Could you give us to any comment about this idea? Sorry for the late reply, Rik. > At the time, there were all kinds of general problems > in page reclaim that all needed to be fixed. Peter's > patch was mostly a band-aid for streaming IO. > > However, now that most of the other page reclaim problems > seem to have been resolved, it would be worthwhile to test > whether Peter's drop-behind approach gives an additional > improvement. Okay. I will take some time to put together a workload for testing. > > I could see it help by getting rid of already-read pages > earlier, leaving more space for read-ahead data. Yes. Peter's logic breaks demotion if the page is on the active list. But I think that if a page is only active because of something like rsync's two touches, we have to move it to the tail of the inactive list even though it is on the active list. I will look into this, too.
> > I suspect it would do fairly little to protect the working > set, because we do not scan the active file list at all > unless it grows to be larger than the inactive file list. Absolutely. But what about rsync's two touches? They can evict the working set. I need time to investigate. Thanks for the comment. > > -- > All rights reversed > -- Kind regards, Minchan Kim
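The "two touch" concern can be illustrated with a deliberately tiny model of the two file LRU lists. This is a toy and not kernel code: it assumes only that a page enters the inactive list on its first reference and is promoted to the active list on a further reference, and it ignores reclaim, list ordering, and everything else. The names page_state, touch_page(), and count_state() are made up for this sketch.

```c
#include <stddef.h>

/* Toy states for a cached file page; not the kernel's representation. */
enum page_state { NOT_CACHED, INACTIVE, ACTIVE };

/*
 * Toy rule: the first reference caches the page on the inactive list,
 * any later reference while it is cached promotes it to the active list.
 */
void touch_page(enum page_state *pages, size_t idx)
{
    if (pages[idx] == NOT_CACHED)
        pages[idx] = INACTIVE;
    else
        pages[idx] = ACTIVE;
}

/* Count how many of the n pages are currently in state s. */
size_t count_state(const enum page_state *pages, size_t n, enum page_state s)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (pages[i] == s)
            count++;
    return count;
}
```

Under this model, a reader that streams through a file once leaves all of its pages on the inactive list, where they are the first candidates for reclaim. A program that references each page twice ends up with all of them on the active list, competing with the genuine working set, which is the scenario raised above.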
* Re: fadvise DONTNEED implementation (or lack thereof)
  2010-11-17 10:16 ` Minchan Kim
@ 2010-11-17 11:15 ` Minchan Kim
  2010-11-17 16:22 ` Rik van Riel
  1 sibling, 0 replies; 23+ messages in thread
From: Minchan Kim @ 2010-11-17 11:15 UTC
To: Rik van Riel
Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang

On Wed, Nov 17, 2010 at 7:16 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Mon, Nov 15, 2010 at 11:48 PM, Rik van Riel <riel@redhat.com> wrote:
>> On 11/15/2010 04:05 AM, Minchan Kim wrote:
>>>
>>> On Mon, Nov 15, 2010 at 5:47 PM, Peter Zijlstra<peterz@infradead.org>
>>> wrote:
>>>>
>>>> On Mon, 2010-11-15 at 15:07 +0900, Minchan Kim wrote:
>>
>>>>> I wonder what's the problem in Peter's patch 'drop behind'.
>>>>> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html
>>>>>
>>>>> Could anyone tell me why it can't accept upstream?
>>>>
>>>> Read the thread, its quite clear nobody got convinced it was a good idea
>>>> and wanted to fix the use-once policy, then Rik rewrote all of
>>>> page-reclaim.
>>>>
>>>
>>> Thanks for the information.
>>> I hope this is a chance to rethink about it.
>>> Rik, Could you give us to any comment about this idea?
>
>
> Sorry for late reply, Rik.
>
>> At the time, there were all kinds of general problems
>> in page reclaim that all needed to be fixed. Peter's
>> patch was mostly a band-aid for streaming IO.
>>
>> However, now that most of the other page reclaim problems
>> seem to have been resolved, it would be worthwhile to test
>> whether Peter's drop-behind approach gives an additional
>> improvement.
>
> Okay. I will have a time to make the workload for testing.
>
>>
>> I could see it help by getting rid of already-read pages
>> earlier, leaving more space for read-ahead data.
>
> Yes. Peter's logic breaks demotion if the page is in active list.
> But I think if it's just active page like rsync's two touch, we have
> to move tail of inactive although it's in active list.
> I will look into this, too.

The most important thing is how to tell whether a page belongs to the
real working set or was only activated by the two touches.
If that is very hard, Mandeep's recent patch could be another solution.
http://thread.gmane.org/gmane.linux.kernel.mm/54572

I will try it, too.

--
Kind regards,
Minchan Kim
* Re: fadvise DONTNEED implementation (or lack thereof)
  2010-11-17 10:16 ` Minchan Kim
  2010-11-17 11:15 ` Minchan Kim
@ 2010-11-17 16:22 ` Rik van Riel
  2010-11-18 2:47 ` Minchan Kim
  1 sibling, 1 reply; 23+ messages in thread
From: Rik van Riel @ 2010-11-17 16:22 UTC
To: Minchan Kim
Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang

On 11/17/2010 05:16 AM, Minchan Kim wrote:

> Absolutely. But how about rsync's two touch?
> It can evict working set.
>
> I need the time for investigation.
> Thanks for the comment.

Maybe we could exempt MADV_SEQUENTIAL and FADV_SEQUENTIAL
touches from promoting the page to the active list?

Then we just need to make sure rsync uses fadvise properly
to keep the working set protected from rsync.

--
All rights reversed
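For readers following along, here is a minimal userspace sketch of what "using fadvise properly" could look like for a streaming reader such as rsync's checksum pass. It only declares the access pattern with POSIX_FADV_SEQUENTIAL before reading. The function name `hash_file_sequential` and the chunk size are made up for this illustration, and whether the kernel then skips promotion to the active list is exactly what the message above proposes, not something the sketch demonstrates.

```python
import hashlib
import os


def hash_file_sequential(path, chunk_size=1 << 20):
    """Hash a file in one streaming pass after hinting sequential access.

    Hypothetical helper for illustration only; not taken from rsync.
    """
    digest = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 mean "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

The hint is advisory: the call does not change what is read, only what the kernel is told about the access pattern.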
* Re: fadvise DONTNEED implementation (or lack thereof)
  2010-11-17 16:22 ` Rik van Riel
@ 2010-11-18 2:47 ` Minchan Kim
  2010-11-18 3:24 ` Rik van Riel
  0 siblings, 1 reply; 23+ messages in thread
From: Minchan Kim @ 2010-11-18 2:47 UTC
To: Rik van Riel
Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang

On Thu, Nov 18, 2010 at 1:22 AM, Rik van Riel <riel@redhat.com> wrote:
> On 11/17/2010 05:16 AM, Minchan Kim wrote:
>
>> Absolutely. But how about rsync's two touch?
>> It can evict working set.
>>
>> I need the time for investigation.
>> Thanks for the comment.
>
> Maybe we could exempt MADV_SEQUENTIAL and FADV_SEQUENTIAL
> touches from promoting the page to the active list?
>

The problem is non-mapped file pages.
Promotion of a non-mapped file page happens only through mark_page_accessed,
but that function doesn't have enough information (e.g., the vma or file)
to prevent promotion.

Hmm.. Does anyone else have an idea?

Here is another idea.
The current problem is as follows.
A user can call fadvise with FADV_DONTNEED,
but it has no effect on dirty pages.
So the user has to sync dirty pages before calling fadvise with FADV_DONTNEED,
which costs performance.

Let's add some semantics to FADV_DONTNEED.
It invalidates only pages which are not dirty.
If it meets a dirty page, let's move the page to the tail or head of the
inactive list.
If we move the page to the tail, the shrinker can move it to the head again
for deferred write if it hasn't been written to the backing device yet.

> Then we just need to make sure rsync uses fadvise properly
> to keep the working set protected from rsync.
>
> --
> All rights reversed
>

--
Kind regards,
Minchan Kim
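The workaround described above (sync the dirty pages first, then call fadvise with FADV_DONTNEED) can be sketched from userspace as follows. This is only an illustration under the thread's description of the current behavior; the helper name `write_then_drop` is hypothetical and the synchronous `fdatasync` is the performance cost being complained about.

```python
import os


def write_then_drop(path, data):
    """Write data, flush it, then ask the kernel to drop the cached pages.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        written = 0
        while written < len(view):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, view[written:])
        # Per the thread, FADV_DONTNEED does not affect dirty pages,
        # so they are written back first. This is the blocking step.
        os.fdatasync(fd)
        # offset=0 and len=0 cover the whole file; its pages are now clean.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return written
```

Under the proposed semantics, the `fdatasync` call would no longer be needed just to make the hint useful, because dirty pages would be moved within the inactive list instead of being ignored.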
* Re: fadvise DONTNEED implementation (or lack thereof)
  2010-11-18 2:47 ` Minchan Kim
@ 2010-11-18 3:24 ` Rik van Riel
  2010-11-18 3:46 ` Minchan Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Rik van Riel @ 2010-11-18 3:24 UTC
To: Minchan Kim
Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang

On 11/17/2010 09:47 PM, Minchan Kim wrote:
> On Thu, Nov 18, 2010 at 1:22 AM, Rik van Riel<riel@redhat.com> wrote:
>> On 11/17/2010 05:16 AM, Minchan Kim wrote:
>>
>>> Absolutely. But how about rsync's two touch?
>>> It can evict working set.
>>>
>>> I need the time for investigation.
>>> Thanks for the comment.
>>
>> Maybe we could exempt MADV_SEQUENTIAL and FADV_SEQUENTIAL
>> touches from promoting the page to the active list?
>>
>
> The problem is non-mapped file page.
> non-mapped file page promotion happens by only mark_page_accessed.
> But it doesn't enough information to prevent promotion(ex, vma or file)

I believe we have enough information in filemap.c and can just
pass that as a parameter to mark_page_accessed.

> Here is another idea.
> Current problem is following as.
> User can use fadvise with FADV_DONTNEED.
> But problem is that it can't affect when it meet dirty pages.
> So user have to sync dirty page before calling fadvise with FADV_DONTNEED.
> It would lose performance.
>
> Let's add some semantic of FADV_DONTNEED.
> It invalidates only pages which are not dirty.
> If it meets dirty page, let's move the page into inactive's tail or head.
> If we move the page into tail, shrinker can move it into head again
> for deferred write if it isn't written the backed device.

That sounds like a good idea.

--
All rights reversed
* Re: fadvise DONTNEED implementation (or lack thereof)
  2010-11-18 3:24 ` Rik van Riel
@ 2010-11-18 3:46 ` Minchan Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Minchan Kim @ 2010-11-18 3:46 UTC
To: Rik van Riel
Cc: Peter Zijlstra, KOSAKI Motohiro, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang

On Thu, Nov 18, 2010 at 12:24 PM, Rik van Riel <riel@redhat.com> wrote:
> On 11/17/2010 09:47 PM, Minchan Kim wrote:
>>
>> On Thu, Nov 18, 2010 at 1:22 AM, Rik van Riel<riel@redhat.com> wrote:
>>>
>>> On 11/17/2010 05:16 AM, Minchan Kim wrote:
>>>
>>>> Absolutely. But how about rsync's two touch?
>>>> It can evict working set.
>>>>
>>>> I need the time for investigation.
>>>> Thanks for the comment.
>>>
>>> Maybe we could exempt MADV_SEQUENTIAL and FADV_SEQUENTIAL
>>> touches from promoting the page to the active list?
>>>
>>
>> The problem is non-mapped file page.
>> non-mapped file page promotion happens by only mark_page_accessed.
>> But it doesn't enough information to prevent promotion(ex, vma or file)
>
> I believe we have enough information in filemap.c and can just
> pass that as a parameter to mark_page_accessed.

FADV_SEQUENTIAL is a per-file/vma semantic, and it is used in many places.
I think changing all those places isn't simple, and I don't want to add a
new structure just to propagate the information to mark_page_accessed.

>
>> Here is another idea.
>> Current problem is following as.
>> User can use fadvise with FADV_DONTNEED.
>> But problem is that it can't affect when it meet dirty pages.
>> So user have to sync dirty page before calling fadvise with FADV_DONTNEED.
>> It would lose performance.
>>
>> Let's add some semantic of FADV_DONTNEED.
>> It invalidates only pages which are not dirty.
>> If it meets dirty page, let's move the page into inactive's tail or head.
>> If we move the page into tail, shrinker can move it into head again
>> for deferred write if it isn't written the backed device.
>
> That sounds like a good idea.

I will implement it.
Thanks, Rik.

>
> --
> All rights reversed
>

--
Kind regards,
Minchan Kim
* Re: fadvise DONTNEED implementation (or lack thereof)
  2010-11-15 8:47 ` Peter Zijlstra
  2010-11-15 9:05 ` Minchan Kim
@ 2010-11-15 9:10 ` KOSAKI Motohiro
  1 sibling, 0 replies; 23+ messages in thread
From: KOSAKI Motohiro @ 2010-11-15 9:10 UTC
To: Peter Zijlstra
Cc: kosaki.motohiro, Minchan Kim, Ben Gamari, linux-kernel, rsync, linux-mm, Wu Fengguang

> > I wonder what's the problem in Peter's patch 'drop behind'.
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html
> >
> > Could anyone tell me why it can't accept upstream?
>
> Read the thread, its quite clear nobody got convinced it was a good idea
> and wanted to fix the use-once policy, then Rik rewrote all of
> page-reclaim.

If my understanding is correct, rsync touches the data twice (once for the
hash calculation and once for the copy). So the current used-once
heuristic still doesn't seem to work.
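The "two touch" effect discussed in this thread can be illustrated with a deliberately simplified toy model. It is not kernel code: it assumes only that a first touch puts a page on the inactive list and a second touch promotes it to the active list, and it ignores capacity, eviction and dirty state. The class name `TwoListCache` and the function `stream` are invented for this sketch.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy inactive/active list pair; an assumption-laden model, not the kernel."""

    def __init__(self):
        self.inactive = OrderedDict()
        self.active = OrderedDict()

    def touch(self, page):
        if page in self.active:
            # Already active: just refresh its position.
            self.active.move_to_end(page)
        elif page in self.inactive:
            # Second touch: the page is promoted to the active list.
            del self.inactive[page]
            self.active[page] = None
        else:
            # First touch: the page starts on the inactive list.
            self.inactive[page] = None


def stream(cache, pages, passes):
    """Read every page in order, `passes` times (e.g. hash pass, then copy pass)."""
    for _ in range(passes):
        for page in pages:
            cache.touch(page)


once = TwoListCache()
stream(once, range(1000), passes=1)

twice = TwoListCache()
stream(twice, range(1000), passes=2)

print(len(once.active), len(twice.active))
```

In this model a single streaming pass leaves the active list empty, while a hash pass followed by a copy pass promotes every page, which is the way a use-once heuristic can be defeated by data that is in fact never needed again.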
end of thread, newest message ~2010-11-18 3:46 UTC

Thread overview: 23+ messages
2010-11-04 5:58 fadvise DONTNEED implementation (or lack thereof) Ben Gamari
2010-11-09 7:28 ` KOSAKI Motohiro
2010-11-09 8:03 ` KOSAKI Motohiro
2010-11-09 12:54 ` Ben Gamari
2010-11-14 5:09 ` KOSAKI Motohiro
2010-11-14 5:20 ` Ben Gamari
2010-11-14 21:33 ` Brian K. White
2010-11-15 6:07 ` Minchan Kim
2010-11-15 7:09 ` KOSAKI Motohiro
2010-11-15 7:19 ` Minchan Kim
2010-11-15 7:28 ` KOSAKI Motohiro
2010-11-15 7:46 ` Minchan Kim
2010-11-15 12:46 ` Ben Gamari
2010-11-15 8:47 ` Peter Zijlstra
2010-11-15 9:05 ` Minchan Kim
2010-11-15 14:48 ` Rik van Riel
2010-11-17 10:16 ` Minchan Kim
2010-11-17 11:15 ` Minchan Kim
2010-11-17 16:22 ` Rik van Riel
2010-11-18 2:47 ` Minchan Kim
2010-11-18 3:24 ` Rik van Riel
2010-11-18 3:46 ` Minchan Kim
2010-11-15 9:10 ` KOSAKI Motohiro