From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying"
To: Dave Chinner
Cc: Linus Torvalds, "Huang, Ying", LKML, Bob Peterson, Wu Fengguang, LKP, Christoph Hellwig
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression
Date: Mon, 15 Aug 2016 10:22:43 -0700
Message-ID: <87twemndzw.fsf@yhuang-mobile.sh.intel.com>
In-Reply-To: <20160811044609.GW16044@dastard> (Dave Chinner's message of "Thu, 11 Aug 2016 14:46:09 +1000")
References: <20160809143359.GA11220@yexl-desktop> <20160810230840.GS16044@dastard> <87eg5w18iu.fsf@yhuang-mobile.sh.intel.com> <87a8gk17x7.fsf@yhuang-mobile.sh.intel.com> <8760r816wf.fsf@yhuang-mobile.sh.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Chinner,

Dave Chinner writes:

> On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote:
>> On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote:
>> >
>> > Here it is,
>>
>> Thanks.
>>
>> Appended is a munged "after" list, with the "before" values in
>> parenthesis. It actually looks fairly similar.
>>
>> The biggest difference is that we have "mark_page_accessed()" show up
>> after, and not before. There was also a lot of LRU noise in the
>> non-profile data. I wonder if that is the reason here: the old model
>> of using generic_perform_write/block_page_mkwrite didn't mark the
>> pages accessed, and now with iomap_file_buffered_write() they get
>> marked as active and that screws up the LRU list, and makes us not
>> flush out the dirty pages well (because they are seen as active and
>> not good for writeback), and then you get bad memory use.
>>
>> I'm not seeing anything that looks like locking-related.
>
> Not in that profile. I've been doing some local testing inside a
> 4-node fake-numa 16p/16GB RAM VM to see what I can find.

You ran the test in a virtual machine; I think that is why your perf
data looks strange (the very high share for _raw_spin_unlock_irqrestore).
My understanding is that a guest without a virtualized PMU falls back to
timer-based sampling, and timer samples cannot land in regions that run
with interrupts disabled, so their cost gets charged to wherever
interrupts are re-enabled, typically _raw_spin_unlock_irqrestore.

To set up KVM so that perf in the guest can use the hardware PMU, you
may refer to:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools-vPMU.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/sect-perf-mon.html

I haven't tested these myself; a web search or the perf/KVM people can
give you more details.
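
If it helps, I think the libvirt side of this is just exposing the host
CPU (and with it the PMU) to the guest. A rough sketch, untested here,
with "guest" standing in for your domain name:

  # virsh edit guest
  (make sure the domain definition contains <cpu mode='host-passthrough'/>)

  # perf stat -e cycles -a sleep 1
  (inside the guest; "<not supported>" for cycles means the vPMU is still off)

  # perf record -a -g -- xfs_io -f -c "truncate 0" -c "pwrite 0 47g" /mnt/scratch/fooey
  # perf report

With the PMU passed through, the samples should be NMI-based and the
_raw_spin_unlock_irqrestore entry should shrink to its real cost.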

> I'm yet to work out how I can trigger a profile like the one that
> was reported (I really need to see the event traces), but in the
> mean time I found this....
>
> Doing a large sequential single threaded buffered write using a 4k
> buffer (so single page per syscall to make the XFS IO path allocator
> behave the same way as in 4.7), I'm seeing a CPU profile that
> indicates we have a potential mapping->tree_lock issue:
>
> # xfs_io -f -c "truncate 0" -c "pwrite 0 47g" /mnt/scratch/fooey
> wrote 50465865728/50465865728 bytes at offset 0
> 47.000 GiB, 12320768 ops; 0:01:36.00 (499.418 MiB/sec and 127850.9132 ops/sec)
>
> ....
>
>   24.15%  [kernel]  [k] _raw_spin_unlock_irqrestore
>    9.67%  [kernel]  [k] copy_user_generic_string
>    5.64%  [kernel]  [k] _raw_spin_unlock_irq
>    3.34%  [kernel]  [k] get_page_from_freelist
>    2.57%  [kernel]  [k] mark_page_accessed
>    2.45%  [kernel]  [k] do_raw_spin_lock
>    1.83%  [kernel]  [k] shrink_page_list
>    1.70%  [kernel]  [k] free_hot_cold_page
>    1.26%  [kernel]  [k] xfs_do_writepage
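
On the mapping->tree_lock question: as far as I can see those critical
sections all run with interrupts disabled, so a timer-sampled profile
will tend to pile up on _raw_spin_unlock_irqrestore whether or not the
lock is really contended. One way to check the contention directly,
independent of the profiler, is lock statistics, if your test kernel can
be built with CONFIG_LOCK_STAT=y. Roughly:

  # echo 0 > /proc/lock_stat              # clear old statistics
  # echo 1 > /proc/sys/kernel/lock_stat   # enable collection
  # xfs_io -f -c "truncate 0" -c "pwrite 0 47g" /mnt/scratch/fooey
  # grep -A 4 tree_lock /proc/lock_stat   # contention counts and wait times
  # echo 0 > /proc/sys/kernel/lock_stat   # disable collection

I haven't run this against your test case, so take it as a sketch.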

Best Regards,
Huang, Ying