From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751983AbdHCKoX (ORCPT <rfc822;w@1wt.eu>);
        Thu, 3 Aug 2017 06:44:23 -0400
Received: from mx2.suse.de ([195.135.220.15]:44966 "EHLO mx1.suse.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1751010AbdHCKoV (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 3 Aug 2017 06:44:21 -0400
Date: Thu, 3 Aug 2017 12:44:18 +0200
From: Michal Hocko <mhocko@kernel.org>
To: Wei Wang <wei.w.wang@intel.com>
Cc: linux-kernel@vger.kernel.org,
        virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
        linux-mm@kvack.org, mst@redhat.com, mawilcox@microsoft.com,
        akpm@linux-foundation.org, virtio-dev@lists.oasis-open.org,
        david@redhat.com, cornelia.huck@de.ibm.com,
        mgorman@techsingularity.net, aarcange@redhat.com, amit.shah@redhat.com,
        pbonzini@redhat.com, liliang.opensource@gmail.com,
        yang.zhang.wz@gmail.com, quan.xu@aliyun.com
Subject: Re: [PATCH v13 4/5] mm: support reporting free page blocks
Message-ID: <20170803104417.GI12521@dhcp22.suse.cz>
References: <1501742299-4369-1-git-send-email-wei.w.wang@intel.com>
 <1501742299-4369-5-git-send-email-wei.w.wang@intel.com>
 <20170803091151.GF12521@dhcp22.suse.cz>
 <5982FE07.3040207@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5982FE07.3040207@intel.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 03-08-17 18:42:15, Wei Wang wrote:
> On 08/03/2017 05:11 PM, Michal Hocko wrote:
> >On Thu 03-08-17 14:38:18, Wei Wang wrote:
[...]
> >>+static int report_free_page_block(struct zone *zone, unsigned int order,
> >>+				  unsigned int migratetype, struct page **page)
> >This is just too ugly and wrong actually. Never provide struct page
> >pointers outside of the zone->lock. What I've had in mind was to simply
> >walk free lists of the suitable order and call the callback for each one.
> >Something as simple as
> >
> >	for (i = 0; i < MAX_NR_ZONES; i++) {
> >		struct zone *zone = &pgdat->node_zones[i];
> >
> >		if (!populated_zone(zone))
> >			continue;
> >		spin_lock_irqsave(&zone->lock, flags);
> >		for (order = min_order; order < MAX_ORDER; ++order) {
> >			struct free_area *free_area = &zone->free_area[order];
> >			enum migratetype mt;
> >			struct page *page;
> >
> >			if (!free_area->nr_free)
> >				continue;
> >
> >			for (mt = 0; mt < MIGRATE_TYPES; mt++) {
> >				list_for_each_entry(page,
> >						&free_area->free_list[mt], lru) {
> >
> >					pfn = page_to_pfn(page);
> >					visit(opaque2, pfn, 1<<order);
> >				}
> >			}
> >		}
> >
> >		spin_unlock_irqrestore(&zone->lock, flags);
> >	}
> >
> >[...]
> 
> 
> I think the above would hold the lock for too long. That's why we
> prefer to take one free page block at a time; taking them one by one
> doesn't make a difference to the performance that we need.

I think you should start with a simple approach and improve incrementally
if it turns out not to be optimal. I really detest taking struct pages
outside of the lock. You never know what might happen after the lock is
dropped. E.g. can you race with memory hotremove?
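
A userspace model of the walk sketched above may make the locking argument concrete. It is a minimal sketch under stated assumptions: struct model_zone, model_add_free(), model_walk_free() and count_pages() are hypothetical names, a pthread mutex stands in for zone->lock, and fixed-size arrays stand in for the buddy free lists. What it shows is that the callback only ever receives plain values (pfn, nr_pages) while the lock is held, so no pointer into the free lists outlives the critical section.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model; sizes are illustrative, not kernel values. */
#define MODEL_MAX_ORDER 11
#define MODEL_NR_MT 3          /* stand-in for MIGRATE_TYPES */
#define MODEL_MAX_BLOCKS 16    /* capacity of each (order, migratetype) list */

struct model_zone {
	pthread_mutex_t lock;  /* plays the role of zone->lock */
	unsigned long pfn[MODEL_MAX_ORDER][MODEL_NR_MT][MODEL_MAX_BLOCKS];
	int nr[MODEL_MAX_ORDER][MODEL_NR_MT];
};

/* The callback gets plain values, never a pointer into the free lists. */
typedef void (*visit_fn)(void *opaque, unsigned long pfn, unsigned long nr_pages);

/* Put a free block of 2^order pages starting at pfn on a free list. */
static void model_add_free(struct model_zone *z, unsigned int order, int mt,
			   unsigned long pfn)
{
	pthread_mutex_lock(&z->lock);
	assert(z->nr[order][mt] < MODEL_MAX_BLOCKS);
	z->pfn[order][mt][z->nr[order][mt]++] = pfn;
	pthread_mutex_unlock(&z->lock);
}

/* Walk every free list of order >= min_order with the lock held throughout. */
static void model_walk_free(struct model_zone *z, unsigned int min_order,
			    visit_fn visit, void *opaque)
{
	pthread_mutex_lock(&z->lock);
	for (unsigned int order = min_order; order < MODEL_MAX_ORDER; order++)
		for (int mt = 0; mt < MODEL_NR_MT; mt++)
			for (int i = 0; i < z->nr[order][mt]; i++)
				visit(opaque, z->pfn[order][mt][i], 1UL << order);
	pthread_mutex_unlock(&z->lock);
	/* Nothing that refers to the free lists survives past this point. */
}

/* Example callback: add up the number of reported free pages. */
static void count_pages(void *opaque, unsigned long pfn, unsigned long nr_pages)
{
	(void)pfn;
	*(unsigned long *)opaque += nr_pages;
}
```

The trade-off is the one raised above: the lock is held for the whole walk, so the callback must be cheap and must not sleep or take the same lock. The kernel sketch bounds the hold time per zone by dropping zone->lock between zones.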

> The struct page is used as a "state" to get the next free page block. It is
> only given to an internal function in mm (it is not seen by the outside
> caller). Would this be OK?
> If not, how about a pfn: we could pass a pfn to the function, do
> pfn_to_page each time the function starts, and then do page_to_pfn when
> it returns.

No, just do not try to play tricks with struct pages which might have
gone away.
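
The pfn-based cursor proposed above can be modelled the same way, to show why it does not remove the problem. This is again a hypothetical userspace sketch: struct model_list, model_take() and model_resume_after() are invented names, and a pthread mutex stands in for zone->lock. It only captures the milder failure: once the lock is dropped, the block named by the saved pfn may have been allocated or offlined, so there is nothing valid to resume from. In the kernel the worse case is that pfn_to_page() may return a pointer into a memmap that memory hotremove has already freed, which a userspace model cannot reproduce.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical single free list guarded by a lock (stand-in for zone->lock). */
struct model_list {
	pthread_mutex_t lock;
	unsigned long pfn[8];
	int nr;
};

/* Remove a block, e.g. because it was allocated or its memory was offlined. */
static void model_take(struct model_list *l, unsigned long pfn)
{
	pthread_mutex_lock(&l->lock);
	for (int i = 0; i < l->nr; i++) {
		if (l->pfn[i] == pfn) {
			l->pfn[i] = l->pfn[--l->nr];  /* list order is not preserved */
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
}

/*
 * Resume a walk from a pfn saved before the lock was dropped.
 * Returns the index just after the saved block, or -1 if that block is gone.
 */
static int model_resume_after(struct model_list *l, unsigned long saved_pfn)
{
	int next = -1;

	pthread_mutex_lock(&l->lock);
	for (int i = 0; i < l->nr; i++) {
		if (l->pfn[i] == saved_pfn) {
			next = i + 1;
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return next;
}
```

With the lock held across the whole walk, as in the simple approach suggested above, this situation cannot arise.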
-- 
Michal Hocko
SUSE Labs