From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying"
To: Dave Chinner
Cc: Linus Torvalds, "Huang, Ying", LKML, Bob Peterson, Wu Fengguang, LKP, Christoph Hellwig
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression
Date: Mon, 15 Aug 2016 10:22:43 -0700
Message-ID: <87twemndzw.fsf@yhuang-mobile.sh.intel.com>
In-Reply-To: <20160811044609.GW16044@dastard> (Dave Chinner's message of "Thu, 11 Aug 2016 14:46:09 +1000")
References: <20160809143359.GA11220@yexl-desktop> <20160810230840.GS16044@dastard> <87eg5w18iu.fsf@yhuang-mobile.sh.intel.com> <87a8gk17x7.fsf@yhuang-mobile.sh.intel.com> <8760r816wf.fsf@yhuang-mobile.sh.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Chinner,

Dave Chinner writes:

> On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote:
>> On Wed, Aug 10, 2016 at 5:33 PM, Huang, Ying wrote:
>> >
>> > Here it is,
>>
>> Thanks.
>>
>> Appended is a munged "after" list, with the "before" values in
>> parenthesis. It actually looks fairly similar.
>>
>> The biggest difference is that we have "mark_page_accessed()" show up
>> after, and not before. There was also a lot of LRU noise in the
>> non-profile data. I wonder if that is the reason here: the old model
>> of using generic_perform_write/block_page_mkwrite didn't mark the
>> pages accessed, and now with iomap_file_buffered_write() they get
>> marked as active and that screws up the LRU list, and makes us not
>> flush out the dirty pages well (because they are seen as active and
>> not good for writeback), and then you get bad memory use.
>>
>> I'm not seeing anything that looks like locking-related.
>
> Not in that profile. I've been doing some local testing inside a
> 4-node fake-numa 16p/16GB RAM VM to see what I can find.

You ran the test in a virtual machine; I think that is why your perf
data looks strange (the very high share for _raw_spin_unlock_irqrestore).
My understanding is that a guest without a virtualized PMU falls back to
timer-based sampling, and timer samples cannot land in regions that run
with interrupts disabled, so their cost gets charged to wherever
interrupts are re-enabled, typically _raw_spin_unlock_irqrestore.

To set up KVM so that perf in the guest can use the hardware PMU, you
may refer to:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools-vPMU.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/sect-perf-mon.html

I haven't tested these myself; a web search or the perf/KVM people can
give you more details.
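
If it helps, I think the libvirt side of this is just exposing the host
CPU (and with it the PMU) to the guest. A rough sketch, untested here,
with "guest" standing in for your domain name:

  # virsh edit guest
  (make sure the domain definition contains <cpu mode='host-passthrough'/>)

  # perf stat -e cycles -a sleep 1
  (inside the guest; "<not supported>" for cycles means the vPMU is still off)

  # perf record -a -g -- xfs_io -f -c "truncate 0" -c "pwrite 0 47g" /mnt/scratch/fooey
  # perf report

With the PMU passed through, the samples should be NMI-based and the
_raw_spin_unlock_irqrestore entry should shrink to its real cost.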

> I'm yet to work out how I can trigger a profile like the one that
> was reported (I really need to see the event traces), but in the
> mean time I found this....
>
> Doing a large sequential single threaded buffered write using a 4k
> buffer (so single page per syscall to make the XFS IO path allocator
> behave the same way as in 4.7), I'm seeing a CPU profile that
> indicates we have a potential mapping->tree_lock issue:
>
> # xfs_io -f -c "truncate 0" -c "pwrite 0 47g" /mnt/scratch/fooey
> wrote 50465865728/50465865728 bytes at offset 0
> 47.000 GiB, 12320768 ops; 0:01:36.00 (499.418 MiB/sec and 127850.9132 ops/sec)
>
> ....
>
>   24.15%  [kernel]  [k] _raw_spin_unlock_irqrestore
>    9.67%  [kernel]  [k] copy_user_generic_string
>    5.64%  [kernel]  [k] _raw_spin_unlock_irq
>    3.34%  [kernel]  [k] get_page_from_freelist
>    2.57%  [kernel]  [k] mark_page_accessed
>    2.45%  [kernel]  [k] do_raw_spin_lock
>    1.83%  [kernel]  [k] shrink_page_list
>    1.70%  [kernel]  [k] free_hot_cold_page
>    1.26%  [kernel]  [k] xfs_do_writepage
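
On the mapping->tree_lock question: as far as I can see those critical
sections all run with interrupts disabled, so a timer-sampled profile
will tend to pile up on _raw_spin_unlock_irqrestore whether or not the
lock is really contended. One way to check the contention directly,
independent of the profiler, is lock statistics, if your test kernel can
be built with CONFIG_LOCK_STAT=y. Roughly:

  # echo 0 > /proc/lock_stat              # clear old statistics
  # echo 1 > /proc/sys/kernel/lock_stat   # enable collection
  # xfs_io -f -c "truncate 0" -c "pwrite 0 47g" /mnt/scratch/fooey
  # grep -A 4 tree_lock /proc/lock_stat   # contention counts and wait times
  # echo 0 > /proc/sys/kernel/lock_stat   # disable collection

I haven't run this against your test case, so take it as a sketch.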

Best Regards,
Huang, Ying