All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	virtualization@lists.linux-foundation.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Juergen Gross <jgross@suse.com>,
	Pavel Tatashin <pavel.tatashin@microsoft.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Oscar Salvador <osalvador@suse.de>,
	Pingfan Liu <kernelfans@gmail.com>, Qian Cai <cai@lca.pw>,
	Dan Williams <dan.j.williams@intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Wei Yang <richardw.yang@linux.intel.com>,
	Alexander Potapenko <glider@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>
Subject: Re: [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
Date: Wed, 23 Oct 2019 11:43:45 +0200	[thread overview]
Message-ID: <20191023094345.GL754@dhcp22.suse.cz> (raw)
In-Reply-To: <b4be42a4-cbfc-8706-cc94-26211ddcbe4a@redhat.com>

On Tue 22-10-19 16:02:09, David Hildenbrand wrote:
[...]
> >>> MEM_CANCEL_OFFLINE could gain the reference back to balance the
> >>> MEM_GOING_OFFLINE step.
> >>
> >> The pages are already unisolated and could be used by the buddy. But again,
> >> I think you have an idea that tries to avoid putting pages to the buddy.
> > 
> > Yeah, set_page_count(page, 0) if you do not want to release that page
> > from the notifier context to reflect that the page is ok to be offlined
> > with the rest.
> >   
> 
> I neither see how you deal with __test_page_isolated_in_pageblock() nor with
> __offline_isolated_pages(). Sorry, but what I read is incomplete and you
> probably have a full proposal in your head. Please read below how I think
> you want to solve it.

Yeah, sorry that I am throwing incomplete ideas at you. I am just trying
to really nail down how to deal with reference counting here because it
is an important aspect.
 
> >>> explicit control via the reference count which is the standard way to
> >>> control the struct page life cycle.
> >>>
> >>> Anyway hooking into __put_page (which tends to be a hot path with
> >>> something that is barely used on most systems) doesn't sound nice to me.
> >>> This is the whole point which made me think about the whole reference
> >>> count approach in the first place.
> >>
> >> Again, the race I think that is possible
> >>
> >> somebody: get_page_unless_zero(page)
> >> virtio_mem: page_ref_dec(pfn_to_page(pfn)
> >> somebody: put_page() -> straight to the buddy
> > 
> > Who is that somebody? I thought that it is only the owner/driver to have
> > a control over the page. Also the above is not possible as long as the
> > owner/driver keeps a reference to the PageOffline page throughout the
> > time it is marked that way.
> >   
> 
> I was reading
> 
> include/linux/mm_types.h:
> 
> "If you want to use the refcount field, it must be used in such a way
>  that other CPUs temporarily incrementing and then decrementing the
>  refcount does not cause problems"
> 
> And that made me think "anybody can go ahead and try get_page_unless_zero()".
> 
> If I am missing something here and this can indeed not happen (e.g.,
> because PageOffline() pages are never mapped to user space), then I'll
> happily remove this code.

The point is that if the owner of the page is holding the only reference
to the page then it is clear that nothing like that's happened.
 
[...]

> Let's recap what I suggest:
> 
> "PageOffline() pages that have a reference count of 0 will be treated
>  like free pages when offlining pages, allowing the containing memory
>  block to get offlined. In case a driver wants to revive such a page, it
>  has to synchronize against memory onlining/offlining (e.g., using memory
>  notifiers) while incrementing the reference count. Also, a driver that
>  relies in this feature is aware that re-onlining the memory will require
>  to re-set the pages PageOffline() - e.g., via the online_page_callback_t."

OK

[...]
> d) __put_page() is modified to not return pages to the buddy in any
>    case as a safety net. We might be able to get rid of that.

I do not like exactly this part
 
> What I think you suggest:
> 
> a) has_unmovable_pages() skips over all PageOffline() pages.
>    This results in a lot of false negatives when trying to offline. Might be ok.
> 
> b) The driver decrements the reference count of the PageOffline pages
>    in MEM_GOING_OFFLINE.

Well, driver should make the page unreferenced or fail. What is done
really depends on the specific driver

> c) The driver increments the reference count of the PageOffline pages
>    in MEM_CANCEL_OFFLINE. One issue might be that the pages are no longer
>    isolated once we get that call. Might be ok.

Only previous PageBuddy pages are returned to the allocator IIRC. Mostly
because of MovablePage()

> d) How to make __test_page_isolated_in_pageblock() succeed?
>    Like I propose in this patch (PageOffline() + refcount == 0)?

Yep

> e) How to make __offline_isolated_pages() succeed?
>    Like I propose in this patch (PageOffline() + refcount == 0)?

Simply skip over PageOffline pages. Reference count should never be != 0
at this stage.
 
> In summary, is what you suggest simply delaying setting the reference count to 0
> in MEM_GOING_OFFLINE instead of right away when the driver unpluggs the pages?

Yes

> What's the big benefit you see and I fail to see?

Aparat from no hooks into __put_page it is also an explicit control over
the page via reference counting. Do you see any downsides?
-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Pingfan Liu <kernelfans@gmail.com>,
	virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
	Alexander Potapenko <glider@google.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Yu Zhao <yuzhao@google.com>, Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Pavel Tatashin <pavel.tatashin@microsoft.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>, Qian Cai <cai@lca.pw>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Oscar Salvador <osalvador@suse.de>,
	Juergen Gross <jgross@suse.com>,
	Yang Shi <yang.shi@linux.alibaba.c>
Subject: Re: [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0
Date: Wed, 23 Oct 2019 11:43:45 +0200	[thread overview]
Message-ID: <20191023094345.GL754@dhcp22.suse.cz> (raw)
In-Reply-To: <b4be42a4-cbfc-8706-cc94-26211ddcbe4a@redhat.com>

On Tue 22-10-19 16:02:09, David Hildenbrand wrote:
[...]
> >>> MEM_CANCEL_OFFLINE could gain the reference back to balance the
> >>> MEM_GOING_OFFLINE step.
> >>
> >> The pages are already unisolated and could be used by the buddy. But again,
> >> I think you have an idea that tries to avoid putting pages to the buddy.
> > 
> > Yeah, set_page_count(page, 0) if you do not want to release that page
> > from the notifier context to reflect that the page is ok to be offlined
> > with the rest.
> >   
> 
> I neither see how you deal with __test_page_isolated_in_pageblock() nor with
> __offline_isolated_pages(). Sorry, but what I read is incomplete and you
> probably have a full proposal in your head. Please read below how I think
> you want to solve it.

Yeah, sorry that I am throwing incomplete ideas at you. I am just trying
to really nail down how to deal with reference counting here because it
is an important aspect.
 
> >>> explicit control via the reference count which is the standard way to
> >>> control the struct page life cycle.
> >>>
> >>> Anyway hooking into __put_page (which tends to be a hot path with
> >>> something that is barely used on most systems) doesn't sound nice to me.
> >>> This is the whole point which made me think about the whole reference
> >>> count approach in the first place.
> >>
> >> Again, the race I think that is possible
> >>
> >> somebody: get_page_unless_zero(page)
> >> virtio_mem: page_ref_dec(pfn_to_page(pfn)
> >> somebody: put_page() -> straight to the buddy
> > 
> > Who is that somebody? I thought that it is only the owner/driver to have
> > a control over the page. Also the above is not possible as long as the
> > owner/driver keeps a reference to the PageOffline page throughout the
> > time it is marked that way.
> >   
> 
> I was reading
> 
> include/linux/mm_types.h:
> 
> "If you want to use the refcount field, it must be used in such a way
>  that other CPUs temporarily incrementing and then decrementing the
>  refcount does not cause problems"
> 
> And that made me think "anybody can go ahead and try get_page_unless_zero()".
> 
> If I am missing something here and this can indeed not happen (e.g.,
> because PageOffline() pages are never mapped to user space), then I'll
> happily remove this code.

The point is that if the owner of the page is holding the only reference
to the page then it is clear that nothing like that's happened.
 
[...]

> Let's recap what I suggest:
> 
> "PageOffline() pages that have a reference count of 0 will be treated
>  like free pages when offlining pages, allowing the containing memory
>  block to get offlined. In case a driver wants to revive such a page, it
>  has to synchronize against memory onlining/offlining (e.g., using memory
>  notifiers) while incrementing the reference count. Also, a driver that
>  relies in this feature is aware that re-onlining the memory will require
>  to re-set the pages PageOffline() - e.g., via the online_page_callback_t."

OK

[...]
> d) __put_page() is modified to not return pages to the buddy in any
>    case as a safety net. We might be able to get rid of that.

I do not like exactly this part
 
> What I think you suggest:
> 
> a) has_unmovable_pages() skips over all PageOffline() pages.
>    This results in a lot of false negatives when trying to offline. Might be ok.
> 
> b) The driver decrements the reference count of the PageOffline pages
>    in MEM_GOING_OFFLINE.

Well, driver should make the page unreferenced or fail. What is done
really depends on the specific driver

> c) The driver increments the reference count of the PageOffline pages
>    in MEM_CANCEL_OFFLINE. One issue might be that the pages are no longer
>    isolated once we get that call. Might be ok.

Only previous PageBuddy pages are returned to the allocator IIRC. Mostly
because of MovablePage()

> d) How to make __test_page_isolated_in_pageblock() succeed?
>    Like I propose in this patch (PageOffline() + refcount == 0)?

Yep

> e) How to make __offline_isolated_pages() succeed?
>    Like I propose in this patch (PageOffline() + refcount == 0)?

Simply skip over PageOffline pages. Reference count should never be != 0
at this stage.
 
> In summary, is what you suggest simply delaying setting the reference count to 0
> in MEM_GOING_OFFLINE instead of right away when the driver unpluggs the pages?

Yes

> What's the big benefit you see and I fail to see?

Aparat from no hooks into __put_page it is also an explicit control over
the page via reference counting. Do you see any downsides?
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2019-10-23  9:43 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 14:22 [PATCH RFC v3 0/9] virtio-mem: paravirtualized memory David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 1/9] ACPI: NUMA: export pxm_to_node David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-09-23 10:13   ` David Hildenbrand
2019-09-23 10:36     ` Michal Hocko
2019-09-23 10:36       ` Michal Hocko
2019-09-23 10:39       ` David Hildenbrand
2019-09-23 10:39       ` David Hildenbrand
2019-09-23 10:13   ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 2/9] virtio-mem: Paravirtualized memory hotplug David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 3/9] virtio-mem: Paravirtualized memory hotunplug part 1 David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 4/9] mm: Export alloc_contig_range() / free_contig_range() David Hildenbrand
2019-10-16 11:20   ` Michal Hocko
2019-10-16 12:31     ` David Hildenbrand
2019-10-16 12:31     ` David Hildenbrand
2019-10-16 11:20   ` Michal Hocko
2019-09-19 14:22 ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 5/9] virtio-mem: Paravirtualized memory hotunplug part 2 David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0 David Hildenbrand
2019-09-19 14:22   ` David Hildenbrand
2019-10-16 11:43   ` Michal Hocko
2019-10-16 12:50     ` David Hildenbrand
2019-10-16 12:50     ` David Hildenbrand
2019-10-16 13:45       ` Michal Hocko
2019-10-16 13:55         ` David Hildenbrand
2019-10-16 13:55         ` David Hildenbrand
2019-10-16 14:09           ` Michal Hocko
2019-10-16 14:16             ` David Hildenbrand
2019-10-16 14:16             ` David Hildenbrand
2019-10-16 14:09           ` Michal Hocko
2019-10-16 13:59         ` David Hildenbrand
2019-10-16 13:59           ` David Hildenbrand
2019-10-16 13:45       ` Michal Hocko
2019-10-16 13:45     ` David Hildenbrand
2019-10-16 13:45     ` David Hildenbrand
2019-10-16 14:03       ` Michal Hocko
2019-10-16 14:14         ` David Hildenbrand
2019-10-16 14:14           ` David Hildenbrand
2019-10-18  8:15           ` Michal Hocko
2019-10-18  8:50             ` David Hildenbrand
2019-10-18  8:50             ` David Hildenbrand
2019-10-18 11:20               ` Michal Hocko
2019-10-18 11:20               ` Michal Hocko
2019-10-18 12:35                 ` David Hildenbrand
2019-10-18 12:35                   ` David Hildenbrand
2019-10-22 12:23                   ` Michal Hocko
2019-10-22 12:23                     ` Michal Hocko
2019-10-22 14:02                     ` David Hildenbrand
2019-10-22 14:02                       ` David Hildenbrand
2019-10-23  9:43                       ` Michal Hocko [this message]
2019-10-23  9:43                         ` Michal Hocko
2019-10-23 10:03                         ` David Hildenbrand
2019-10-23 10:03                         ` David Hildenbrand
2019-10-24  8:42                           ` Michal Hocko
2019-10-24  8:42                           ` Michal Hocko
2019-10-24  8:51                             ` David Hildenbrand
2019-10-24  8:51                               ` David Hildenbrand
2019-10-25 11:28                               ` [PATCH RFC] mm: Allow to offline unmovable PageOffline() pages if the driver agrees David Hildenbrand
2019-10-25 11:28                                 ` David Hildenbrand
2019-10-18  8:15           ` [PATCH RFC v3 6/9] mm: Allow to offline PageOffline() pages with a reference count of 0 Michal Hocko
2019-10-16 14:03       ` Michal Hocko
2019-10-16 11:43   ` Michal Hocko
2019-09-19 14:22 ` [PATCH RFC v3 7/9] virtio-mem: Allow to offline partially unplugged memory blocks David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 8/9] mm/memory_hotplug: Introduce offline_and_remove_memory() David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-10-16 11:47   ` Michal Hocko
2019-10-16 11:47     ` Michal Hocko
2019-10-16 12:57     ` David Hildenbrand
2019-10-16 12:57       ` David Hildenbrand
2019-09-19 14:22 ` [PATCH RFC v3 9/9] virtio-mem: Offline and remove completely unplugged memory blocks David Hildenbrand
2019-09-19 14:22 ` David Hildenbrand
2019-10-16  8:12 ` [PATCH RFC v3 0/9] virtio-mem: paravirtualized memory David Hildenbrand
2019-10-16  8:12 ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191023094345.GL754@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=anshuman.khandual@arm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=aryabinin@virtuozzo.com \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=glider@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=ira.weiny@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=jgross@suse.com \
    --cc=kernelfans@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mchehab+samsung@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=minchan@kernel.org \
    --cc=osalvador@suse.de \
    --cc=pavel.tatashin@microsoft.com \
    --cc=richardw.yang@linux.intel.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=sfr@canb.auug.org.au \
    --cc=vbabka@suse.cz \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.