From: Tetsuo Handa
Subject: Re: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
Date: Sat, 14 Oct 2017 01:41:14 +0900
To: mst@redhat.com
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org, mhocko@kernel.org
In-Reply-To: <20171013162134-mutt-send-email-mst@kernel.org>
References: <1507632457-4611-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
 <20171013162134-mutt-send-email-mst@kernel.org>

Michael S. Tsirkin wrote:
> On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote:
> > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY)
> > is called with the vb->balloon_lock mutex held. Since
> > GFP_HIGHUSER[_MOVABLE] implies __GFP_DIRECT_RECLAIM | __GFP_IO |
> > __GFP_FS, even though __GFP_NORETRY is specified, this allocation
> > attempt might indirectly depend on somebody else's __GFP_DIRECT_RECLAIM
> > memory allocation. And such an indirect __GFP_DIRECT_RECLAIM allocation
> > might call leak_balloon() via virtballoon_oom_notify() via
> > blocking_notifier_call_chain() via out_of_memory() once it reaches
> > __alloc_pages_may_oom() and holds the oom_lock mutex. Since the
> > vb->balloon_lock mutex is already held by fill_balloon(), this causes
> > an OOM lockup. Thus, do not wait for the vb->balloon_lock mutex if
> > leak_balloon() is called from out_of_memory().
> >
> >   Thread1                                  Thread2
> >     fill_balloon()
> >       takes a balloon_lock
> >       balloon_page_enqueue()
> >         alloc_page(GFP_HIGHUSER_MOVABLE)
> >           direct reclaim (__GFP_FS context)  takes a fs lock
> >             waits for that fs lock           alloc_page(GFP_NOFS)
> >                                                __alloc_pages_may_oom()
> >                                                  takes the oom_lock
> >                                                  out_of_memory()
> >                                                    blocking_notifier_call_chain()
> >                                                      leak_balloon()
> >                                                        tries to take that balloon_lock and deadlocks
> >
> > Signed-off-by: Tetsuo Handa
>
> This doesn't deflate on oom if lock is contended, and we acked
> DEFLATE_ON_OOM so host actually expects us to.

I still don't understand what is wrong with not deflating on OOM.

According to https://lists.oasis-open.org/archives/virtio-dev/201504/msg00084.html :

  If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the driver MAY
  use pages from the balloon when \field{num_pages} is less than or equal
  to the actual number of pages in the balloon if this is required for
  system stability (e.g. if memory is required by applications running
  within the guest).

It says "MAY" rather than "MUST", so I think it is legal for this path to
be a no-op. Or maybe I don't understand the difference between "deflate
the balloon" and "use pages from the balloon"?

According to https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027807.html ,
it seems to me that the expected behavior when deflating races with
inflating was not defined when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was
proposed.
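To make concrete what I mean by "being a no-op": the behavior I have in
mind is roughly the following (a simplified sketch only, not the code from
my patch; it assumes leak_balloon() is changed so that the caller holds
vb->balloon_lock, whereas today leak_balloon() takes the lock itself):

	static int virtballoon_oom_notify(struct notifier_block *self,
					  unsigned long dummy, void *parm)
	{
		struct virtio_balloon *vb = container_of(self,
							 struct virtio_balloon,
							 nb);
		unsigned long *freed = parm;

		if (!virtio_has_feature(vb->vdev,
					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
			return NOTIFY_OK;

		/*
		 * The task holding vb->balloon_lock (fill_balloon()) may
		 * itself be blocked, directly or indirectly, on the
		 * allocation that brought us here, so sleeping on the lock
		 * can deadlock. The spec says "MAY", so simply skip
		 * deflation when the lock is contended.
		 */
		if (!mutex_trylock(&vb->balloon_lock))
			return NOTIFY_OK;

		*freed += leak_balloon(vb, oom_pages);
		update_balloon_size(vb);
		mutex_unlock(&vb->balloon_lock);

		return NOTIFY_OK;
	}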
When the host increases "struct virtio_balloon_config"->num_pages and kicks
the guest, the guest's update_balloon_size_func() starts calling
fill_balloon() until "struct virtio_balloon"->num_pages reaches
"struct virtio_balloon_config"->num_pages, doesn't it?

	struct virtio_balloon_config {
		/* Number of pages host wants Guest to give up. */
		__u32 num_pages;
		/* Number of pages we've actually got in balloon. */
		__u32 actual;
	};

If leak_balloon() is called via out_of_memory(), it decreases
"struct virtio_balloon"->num_pages. But is
"struct virtio_balloon_config"->num_pages also updated when leak_balloon()
is called via out_of_memory()?

If yes, update_balloon_size_func() would stop calling fill_balloon() once
leak_balloon() has been called via out_of_memory().

If no, update_balloon_size_func() would keep calling fill_balloon() even
though leak_balloon() is being invoked via out_of_memory() from inside
fill_balloon(). That is, whenever fill_balloon() tries to increase
"struct virtio_balloon"->num_pages, leak_balloon(), which decreases
"struct virtio_balloon"->num_pages, is called through the indirect
__GFP_DIRECT_RECLAIM dependency via out_of_memory(). As long as the host
keeps "struct virtio_balloon_config"->num_pages above the guest's actual
balloon size, fill_balloon() keeps trying to increase
"struct virtio_balloon"->num_pages and leak_balloon() keeps decreasing it.
We deflate the balloon in order to inflate the balloon. That is an OOM
lockup, isn't it? How is such a situation better than invoking the OOM
killer in order to inflate the balloon?

> The proper fix isn't that hard - just avoid allocations under lock.
>
> Patch posted, pls take a look.

Your patch allocates pages outside of the lock in order to inflate the
balloon, but it still allows leak_balloon() to deflate the balloon at the
same time. How does deflating the balloon (i.e. calling leak_balloon())
make sense while we are allocating pages in order to inflate it (i.e.
calling fill_balloon())?
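To illustrate the loop I am describing, this is roughly what the inflate
path looks like today (paraphrased from memory and heavily trimmed, so
treat it as a sketch rather than the exact driver code); the comments mark
where the feedback loop closes:

	/* Sketch of the current inflate path (simplified). */
	static void update_balloon_size_func(struct work_struct *work)
	{
		struct virtio_balloon *vb = container_of(work,
							 struct virtio_balloon,
							 update_balloon_size_work);
		/* towards_target() = host's config num_pages - vb->num_pages */
		s64 diff = towards_target(vb);

		if (diff > 0) {
			/*
			 * fill_balloon() allocates pages with vb->balloon_lock
			 * held. If such an allocation ends up in
			 * out_of_memory(), the OOM notifier calls
			 * leak_balloon() and vb->num_pages goes down again,
			 * while the host's target stays the same.
			 */
			diff -= fill_balloon(vb, diff);
		} else if (diff < 0) {
			diff += leak_balloon(vb, -diff);
		}
		update_balloon_size(vb);

		/*
		 * Because the target was not reached, we requeue ourselves
		 * and call fill_balloon() again, whose allocations trigger
		 * leak_balloon() again: deflating in order to inflate,
		 * indefinitely.
		 */
		if (diff)
			queue_work(system_freezable_wq, work);
	}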