From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
    Michal Hocko <mhocko@suse.cz>,
    Minchan Kim <minchan@kernel.org>,
    Vladimir Davydov <vdavydov@virtuozzo.com>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Vlastimil Babka <vbabka@suse.cz>,
    Andrew Morton <akpm@linux-foundation.org>,
    Bob Peterson <rpeterso@redhat.com>,
    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
    "Huang, Ying" <ying.huang@intel.com>,
    Christoph Hellwig <hch@lst.de>,
    Wu Fengguang <fengguang.wu@intel.com>,
    LKP <lkp@01.org>,
    Tejun Heo <tj@kernel.org>,
    LKML <linux-kernel@vger.kernel.org>
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression
Date: Fri, 19 Aug 2016 07:19:49 +1000
Message-ID: <20160818211949.GE22388@dastard>
In-Reply-To: <CA+55aFyEQhjm9CU0yhk0WBAArB9soOA0JfWzjricnOqG9GB41g@mail.gmail.com>

On Thu, Aug 18, 2016 at 10:55:01AM -0700, Linus Torvalds wrote:
> On Thu, Aug 18, 2016 at 6:24 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
> > On Thu, Aug 18, 2016 at 05:11:11PM +1000, Dave Chinner wrote:
> >> FWIW, I just remembered about /proc/sys/vm/zone_reclaim_mode.
> >>
> >
> > That is a terrifying "fix" for this problem. It just happens to work
> > because there is no spillover to other nodes so only one kswapd instance
> > is potentially active.
>
> Well, it may be a terrifying fix, but it does bring up an intriguing
> notion: maybe what we should think about is to make the actual page
> cache allocations be more "node-sticky" for a particular mapping? Not
> some hard node binding, but if we were to make a single mapping *tend*
> to allocate pages primarily within the same node, that would have the
> kind of secondary advantage that it would avoid the cross-node mapping
> locking.

For streaming or use-once IO it makes a lot of sense to restrict the
locality of the page cache. The faster the IO device, the less dirty
page buffering we need to maintain full device bandwidth. And the
larger the machine, the greater the effect of global page cache
pollution on the other applications.

> Think of it as a gentler "guiding" fix to the spinlock contention
> issue than a hard hammer.
>
> And trying to (at least initially) keep the allocations of one
> particular file to one particular node sounds like it could have other
> locality advantages too.
>
> In fact, looking at the __page_cache_alloc(), we already have that
> "spread pages out" logic. I'm assuming Dave doesn't actually have that
> bit set (I don't think it's the default), but I'm also envisioning
> that maybe we could extend on that notion, and try to spread out
> allocations in general, but keep page allocations from one particular
> mapping within one node.

CONFIG_CPUSETS=y

But I don't have any cpusets configured (unless systemd is doing
something wacky under the covers) so the page spread bit should not
be set.

> The fact that zone_reclaim_mode really improves on Dave's numbers
> *that* dramatically does seem to imply that there is something to be
> said for this.
>
> We do *not* want to limit the whole page cache to a particular node -
> that sounds very unreasonable in general. But limiting any particular
> file mapping (by default - I'm sure there are things like databases
> that just want their one DB file to take over all of memory) to a
> single node sounds much less unreasonable.
>
> What do you guys think? Worth exploring?

The problem is that whenever we turn this sort of behaviour on, some
benchmark regresses because it no longer holds its working set in
the page cache, leading to the change being immediately reverted.
Enterprise Java benchmarks ring a bell, for some reason.

Hence my comment above about needing it to be tied into specific
"use-once-only" page cache behaviours. I know we have working set
estimation, fadvise modes and things like readahead that help track
sequential and use-once access patterns, but I'm not sure how we can
tie that all together....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
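
For readers unfamiliar with the "fadvise modes" mentioned above: an
application that knows it is streaming a file exactly once can already
say so from userspace with posix_fadvise(2). The sketch below is only an
illustration of that existing interface, not anything proposed in this
thread; the helper name stream_once() and the 64KiB buffer size are
assumptions made up for the example. The hints are advisory, and how the
kernel acts on them depends on the kernel version.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_fadvise() in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this example): read a file once,
 * front to back, telling the kernel about the access pattern.
 * Returns the number of bytes read, or -1 on error.
 */
long long stream_once(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /*
     * Hint: we will read sequentially. posix_fadvise() returns an error
     * number directly and does not set errno. A failed hint is not fatal.
     */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err != 0)
        fprintf(stderr, "posix_fadvise(SEQUENTIAL): %s\n", strerror(err));

    char buf[1 << 16];            /* 64KiB read buffer, arbitrary choice */
    long long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;               /* a real program would process buf here */

    if (n < 0) {
        perror("read");
        close(fd);
        return -1;
    }

    /*
     * Hint: we are done with this data. With offset 0 and length 0 the
     * advice covers the whole file. The kernel may drop the now-clean
     * cached pages, but it is not obliged to.
     */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise(DONTNEED): %s\n", strerror(err));

    close(fd);
    return total;
}
```

Calling stream_once("/path/to/file") from a main() is enough to try it.
Note that this only covers applications that volunteer the hint, which
is the gap the message above points at: tying such hints together with
working set estimation and readahead state inside the kernel is the open
question.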