From: joeyli <jlee@suse.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Toshi Kani <toshi.kani@hp.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org
Subject: Re: memory hotplug and force_remove
Date: Fri, 31 Mar 2017 00:57:29 +0800
Message-ID: <20170330165729.GN28365@linux-l9pv.suse>
In-Reply-To: <20170330162031.GE4326@dhcp22.suse.cz>

On Thu, Mar 30, 2017 at 06:20:34PM +0200, Michal Hocko wrote:
> On Thu 30-03-17 10:47:52, Jiri Kosina wrote:
> > On Tue, 28 Mar 2017, Rafael J. Wysocki wrote:
> > 
> > > > > > we have been chasing the following BUG() triggering during the memory
> > > > > > hotremove (remove_memory):
> > > > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > > > 				check_memblock_offlined_cb);
> > > > > > 	if (ret)
> > > > > > 		BUG();
> > > > > > 
> > > > > > and it took a while to learn that the issue is caused by
> > > > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > > > surprised to see such an option because at least for the memory hotplug
> > > > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > > > > in use. Even if we do not BUG() here, enforcing the hotplug operation
> > > > > > > will lead to problematic behavior later, like a crash or silent memory
> > > > > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > > > 
> > > > > > I am wondering what was the motivation for introducing this behavior and
> > > > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > > > it completely. What would break in such a case?
> > > > > 
> > > > > > Honestly, I don't remember off the top of my head and I haven't looked at
> > > > > > that code for several months.
> > > > > 
> > > > > I need some time to recall that.
> > > > 
> > > > Did you have any chance to look into this?
> > > 
> > > Well, yes.
> > > 
> > > It looks like that was added for some people who depended on the old behavior
> > > at that time.
> > > 
> > > I guess we can try to drop it and see what happens. :-)
> > 
> > I'd agree with that; at the same time, a udev rule should be submitted to
> > the systemd folks though. I don't think there is anything existing in this
> > area yet (nor do distros ship their own udev rules for this, AFAIK).
> 
> Another option would be keeping the force_remove knob but making the code
> error-handling aware. In other words, rather than ignoring the offline error,
> simply propagate it up the chain and do not treat the offline as successful.
> Would that be acceptable?

Then the only difference from normal mode is that force_remove mode
doesn't send out a uevent for a container that is not offline yet.

I vote to remove force_remove, not just because it ignores offline errors
but also because it is a global ACPI knob that affects all container
devices in the system.
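
To make the two behaviors being compared concrete, here is a minimal
userspace sketch. It is not the ACPI or memory hotplug code: struct
mem_block, try_offline() and both remove_* helpers are hypothetical names
invented for illustration. It only contrasts ignoring an offline error
with propagating it up the chain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory block; not a kernel structure. */
struct mem_block {
	bool online;	/* true while the block is still in use */
};

/* Hypothetical offline attempt: fails with -EBUSY if the block is in use. */
static int try_offline(const struct mem_block *blk)
{
	return blk->online ? -EBUSY : 0;
}

/*
 * Force-style removal: offline errors are dropped and removal proceeds.
 * Returns how many blocks were removed while still in use.
 */
static int remove_ignoring_errors(const struct mem_block *blks, size_t n)
{
	int unsafe = 0;

	assert(blks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (try_offline(&blks[i]) != 0)
			unsafe++;	/* error ignored, block removed anyway */
	}
	return unsafe;
}

/*
 * Error-aware removal: the first offline failure is returned to the
 * caller and nothing is considered removed.
 */
static int remove_propagating_errors(const struct mem_block *blks, size_t n)
{
	assert(blks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = try_offline(&blks[i]);

		if (ret)
			return ret;	/* propagate up the chain */
	}
	return 0;
}
```

With one block still online, remove_ignoring_errors() reports one block
removed while in use, whereas remove_propagating_errors() returns -EBUSY
and leaves the decision to the caller.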

Thanks a lot!
Joey Lee
