From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751706AbcDTV31 (ORCPT ); Wed, 20 Apr 2016 17:29:27 -0400 Received: from mail-yw0-f177.google.com ([209.85.161.177]:36068 "EHLO mail-yw0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751124AbcDTV3Z (ORCPT ); Wed, 20 Apr 2016 17:29:25 -0400 Date: Wed, 20 Apr 2016 17:29:22 -0400 From: Tejun Heo To: Michal Hocko Cc: Johannes Weiner , Petr Mladek , cgroups@vger.kernel.org, Cyril Hrubis , linux-kernel@vger.kernel.org Subject: Re: [PATCH for-4.6-fixes] memcg: remove lru_add_drain_all() invocation from mem_cgroup_move_charge() Message-ID: <20160420212922.GH4775@htj.duckdns.org> References: <20160413094216.GC5774@pathway.suse.cz> <20160413183309.GG3676@htj.duckdns.org> <20160413192313.GA30260@dhcp22.suse.cz> <20160414175055.GA6794@cmpxchg.org> <20160415191719.GK12583@htj.duckdns.org> <20160417120747.GC21757@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160417120747.GC21757@dhcp22.suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Michal. On Sun, Apr 17, 2016 at 08:07:48AM -0400, Michal Hocko wrote: > On Fri 15-04-16 15:17:19, Tejun Heo wrote: > > mem_cgroup_move_charge() invokes lru_add_drain_all() so that the pvec > > pages can be moved too. lru_add_drain_all() schedules and flushes > > work items on system_wq which depends on being able to create new > > kworkers to make forward progress. Since 1ed1328792ff ("sched, > > cgroup: replace signal_struct->group_rwsem with a global > > percpu_rwsem"), a new task can't be created while in the cgroup > > migration path and the described lru_add_drain_all() invocation can > > easily lead to a deadlock. > > > > Charge moving is best-effort and whether the pvec pages are migrated > > or not doesn't really matter. Don't call it during charge moving. > > Eventually, we want to move the actual charge moving outside the > > migration path. > > > > Signed-off-by: Tejun Heo > > Reported-by: Johannes Weiner > > I guess > Debugged-by: Petr Mladek > Reported-by: Cyril Hrubis Yeah, definitely. Sorry about missing them. > > Suggested-by: Michal Hocko > > Fixes: 1ed1328792ff ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem") > > Cc: stable@vger.kernel.org > > --- > > Hello, > > > > So, this deadlock seems pretty easy to trigger. We'll make the charge > > moving asynchronous eventually but let's not hold off fixing an > > immediate problem. > > Although this looks rather straightforward and it fixes the immediate > problem I am little bit nervous about it. As already pointed out in > other email mem_cgroup_move_charge still depends on mmap_sem for > read and we might hit an even more subtle lockup if the current holder > of the mmap_sem for write depends on the task creation (e.g. some of the > direct reclaim path uses WQ which is really hard to rule out and I even > think that some shrinkers do this). > > I liked your proposal when mem_cgroup_move_charge would be called from a > context which doesn't hold the problematic rwsem much more. Would that > be too intrusive for the stable backport? Yeah, I'm working on the fix but let's plug this one first as it seems really easy to trigger. I got a couple off-list reports (in and outside fb) of this triggering. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [PATCH for-4.6-fixes] memcg: remove lru_add_drain_all() invocation from mem_cgroup_move_charge() Date: Wed, 20 Apr 2016 17:29:22 -0400 Message-ID: <20160420212922.GH4775@htj.duckdns.org> References: <20160413094216.GC5774@pathway.suse.cz> <20160413183309.GG3676@htj.duckdns.org> <20160413192313.GA30260@dhcp22.suse.cz> <20160414175055.GA6794@cmpxchg.org> <20160415191719.GK12583@htj.duckdns.org> <20160417120747.GC21757@dhcp22.suse.cz> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=M/tCLm3V6RxzNdzp7cC1yrHRtL8uFdU9mo62Q+TjzQs=; b=IEjjPdbdclhlnJ8F0fl7kmupAa2sl18mwUZow7Ewr/mVVtwk3PoECf1dxCSRkpJMMf zZHI+GMehp7NGr4uUblvoI6X6wtnyriVX5nzqckpi9ZYsbD6fb0H45/0SCP31jkymN11 SP8f5/tgaVijwn5oiBL9Fsyp2tgJYhIHOGiaOPF1YddcunJlxbKzY5Az4xKFQNTIbFGY 3Qc/PqI6GII1+INLkK2Xi5Z/WVa51OLNOhEjVY1OddcAHVP58CUYXO3pdYiH0OZghqr+ XfoQ7n/Rfw2Tbwhgm81HBCWdrEaQPfrgbIENv1XvMpClHx2TfN0SfGac8Fp4FiyB9R/s QpLQ== Content-Disposition: inline In-Reply-To: <20160417120747.GC21757-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Michal Hocko Cc: Johannes Weiner , Petr Mladek , cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Cyril Hrubis , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Hello, Michal. On Sun, Apr 17, 2016 at 08:07:48AM -0400, Michal Hocko wrote: > On Fri 15-04-16 15:17:19, Tejun Heo wrote: > > mem_cgroup_move_charge() invokes lru_add_drain_all() so that the pvec > > pages can be moved too. lru_add_drain_all() schedules and flushes > > work items on system_wq which depends on being able to create new > > kworkers to make forward progress. Since 1ed1328792ff ("sched, > > cgroup: replace signal_struct->group_rwsem with a global > > percpu_rwsem"), a new task can't be created while in the cgroup > > migration path and the described lru_add_drain_all() invocation can > > easily lead to a deadlock. > > > > Charge moving is best-effort and whether the pvec pages are migrated > > or not doesn't really matter. Don't call it during charge moving. > > Eventually, we want to move the actual charge moving outside the > > migration path. > > > > Signed-off-by: Tejun Heo > > Reported-by: Johannes Weiner > > I guess > Debugged-by: Petr Mladek > Reported-by: Cyril Hrubis Yeah, definitely. Sorry about missing them. > > Suggested-by: Michal Hocko > > Fixes: 1ed1328792ff ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem") > > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > --- > > Hello, > > > > So, this deadlock seems pretty easy to trigger. We'll make the charge > > moving asynchronous eventually but let's not hold off fixing an > > immediate problem. > > Although this looks rather straightforward and it fixes the immediate > problem I am little bit nervous about it. As already pointed out in > other email mem_cgroup_move_charge still depends on mmap_sem for > read and we might hit an even more subtle lockup if the current holder > of the mmap_sem for write depends on the task creation (e.g. some of the > direct reclaim path uses WQ which is really hard to rule out and I even > think that some shrinkers do this). > > I liked your proposal when mem_cgroup_move_charge would be called from a > context which doesn't hold the problematic rwsem much more. Would that > be too intrusive for the stable backport? Yeah, I'm working on the fix but let's plug this one first as it seems really easy to trigger. I got a couple off-list reports (in and outside fb) of this triggering. Thanks. -- tejun