From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f67.google.com ([74.125.82.67]:32885 "EHLO
        mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932270AbcECPNQ (ORCPT ); Tue, 3 May 2016 11:13:16 -0400
Date: Tue, 3 May 2016 17:13:12 +0200
From: Michal Hocko
To: NeilBrown
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Andrew Morton,
        Dave Chinner, "Theodore Ts'o", Chris Mason, Jan Kara,
        ceph-devel@vger.kernel.org, cluster-devel@redhat.com,
        linux-nfs@vger.kernel.org, logfs@logfs.org, xfs@oss.sgi.com,
        linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
        linux-mtd@lists.infradead.org, reiserfs-devel@vger.kernel.org,
        linux-ntfs-dev@lists.sourceforge.net,
        linux-f2fs-devel@lists.sourceforge.net,
        linux-afs@lists.infradead.org, LKML
Subject: Re: [PATCH 0/2] scope GFP_NOFS api
Message-ID: <20160503151312.GA4470@dhcp22.suse.cz>
References: <1461671772-1269-1-git-send-email-mhocko@kernel.org>
        <8737q5ugcx.fsf@notabene.neil.brown.name>
        <20160429120418.GK21977@dhcp22.suse.cz>
        <87twiiu5gs.fsf@notabene.neil.brown.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <87twiiu5gs.fsf@notabene.neil.brown.name>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Hi,

On Sun 01-05-16 07:55:31, NeilBrown wrote:
[...]
> One particular problem with your process-context idea is that it isn't
> inherited across threads.
> Steve Whitehouse's example in gfs shows how allocation dependencies can
> even cross into user space.

Hmm, I am still not sure I understand that example completely, but making
a dependency between direct reclaim and userspace can hardly work,
especially when the direct reclaim might be sitting on top of a
hard-to-guess pile of locks. So unless I've missed anything, what Steve
has described is a clear NOFS context.

> A more localized one that I have seen is that NFSv4 sometimes needs to
> start up a state-management thread (particularly if the server
> restarted).
> It uses kthread_run(), which doesn't actually create the thread but asks
> kthreadd to do it. If NFS writeout is waiting for state management it
> would need to make sure that kthreadd runs in allocation context to
> avoid deadlock.
> I feel that I've forgotten some important detail here and this might
> have been fixed somehow, but the point still stands that the allocation
> context can cross from thread to thread and can effectively become
> anything and everything.

I am not sure I understand your point here. Relying on kthread_run from a
GFP_NOFS context has always been deadlock prone, with or without the
scope GFP_NOFS semantic. Similarly, relying on a work item which doesn't
have a dedicated WQ_MEM_RECLAIM WQ is deadlock prone. You simply
shouldn't do that.

> It is OK to wait for memory to be freed. It is not OK to wait for any
> particular piece of memory to be freed because you don't always know who
> is waiting for you, or who you really are waiting on to free that
> memory.
>
> Whenever trying to free memory I think you need to do best-effort
> without blocking.

I agree with that. Or at least you have to wait on something that is
_guaranteed_ to make forward progress. I am not really sure this is easy
to achieve with the current code base.
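
To make the kthread_run() point above concrete, here is a small userspace
model in C. It is only a sketch under assumptions: struct task,
PF_MODEL_NOFS, the MODEL_GFP_* bits, and every function name below are
made up for illustration; they are not the kernel's definitions and not
the API from the patch series. It shows that a per-task scope flag changes
the allocation mask of the task that set it, while a thread set up on its
behalf by a separate creator task (the role kthreadd plays) allocates with
the creator's flags.

```c
#include <assert.h>

/* Hypothetical allocation-mask bits; NOT the kernel's gfp values. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)

/* Hypothetical per-task flag marking "inside a NOFS scope". */
#define PF_MODEL_NOFS    0x1u

/* Minimal stand-in for a task; only the flags matter in this model. */
struct task {
    unsigned flags;
};

/* Enter the scope on task t; returns the old bit so scopes can nest. */
static unsigned nofs_scope_save(struct task *t)
{
    unsigned old = t->flags & PF_MODEL_NOFS;

    t->flags |= PF_MODEL_NOFS;
    return old;
}

/* Leave the scope by putting back the bit returned by the matching save. */
static void nofs_scope_restore(struct task *t, unsigned old)
{
    assert((old & ~PF_MODEL_NOFS) == 0); /* must come from a save call */
    t->flags = (t->flags & ~PF_MODEL_NOFS) | old;
}

/* Mask an allocation by task t would effectively use. */
static unsigned effective_gfp(const struct task *t, unsigned gfp)
{
    return (t->flags & PF_MODEL_NOFS) ? (gfp & ~MODEL_GFP_FS) : gfp;
}

/*
 * Model of thread creation delegated to another task: the allocations
 * are done by the creator, so only the creator's flags apply. The
 * requester's scope is not consulted at all.
 */
static unsigned create_thread_on_behalf(const struct task *creator)
{
    return effective_gfp(creator, MODEL_GFP_KERNEL);
}
```

With a requester task inside the scope,
effective_gfp(&requester, MODEL_GFP_KERNEL) drops the FS bit, whereas
create_thread_on_behalf(&creator) still returns the full mask. In this
model the cross-thread dependency exists whether or not a scope flag is
used, which matches the reply above: waiting on thread creation from a
NOFS context is the problem, not the scope API.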
-- 
Michal Hocko
SUSE Labs