Date: Thu, 30 Mar 2017 18:12:06 +0200
From: Michal Hocko
To: Ilya Dryomov
Cc: Greg Kroah-Hartman, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Sergey Jerusalimov, Jeff Layton, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations
Message-ID: <20170330161204.GD4326@dhcp22.suse.cz>
References: <20170329105536.GH27994@dhcp22.suse.cz> <20170329111650.GI27994@dhcp22.suse.cz> <20170330062500.GB1972@dhcp22.suse.cz> <20170330112126.GE1972@dhcp22.suse.cz> <20170330143652.GA4326@dhcp22.suse.cz>

On Thu 30-03-17 17:06:51, Ilya Dryomov wrote:
[...]
> > But if the allocation is stuck then the holder of the lock cannot make
> > forward progress and it is effectively deadlocked, because other IO
> > depends on the lock it holds. Maybe I just ask bad questions, but what
>
> Only I/O to the same OSD. A typical ceph cluster has dozens of OSDs,
> so there is plenty of room for other in-flight I/Os to finish and move
> the allocator forward. The lock in question is per-ceph_connection
> (read: per-OSD).
>
> > makes GFP_NOIO different from GFP_KERNEL here. We know that the latter
> > might need to wait for an IO to finish in the shrinker, but it itself
> > doesn't take the lock in question directly. The former depends on the
> > allocator's forward progress as well, and that in turn waits for somebody
> > else to proceed with the IO. So to me, any blocking allocation while
> > holding a lock which blocks further IO from completing is simply broken.
>
> Right, with GFP_NOIO we simply wait -- there is nothing wrong with
> a blocking allocation, at least in the general case. With GFP_KERNEL
> we deadlock, either in rbd/libceph (less likely) or in the filesystem
> above (more likely, shown in the xfs_reclaim_inodes_ag() traces you
> omitted in your quote).

I am not convinced. It seems you are relying on something that is not
fundamentally guaranteed. AFAIU, all the IO paths should _guarantee_
forward progress and use mempools for that purpose if they need to
allocate. But, hey, I will not argue, as my understanding of ceph is
close to zero. You are the maintainer, so it is your call. I would just
really appreciate it if you could document this as much as possible
(ideally at the place where you call memalloc_noio_save, describing the
lock dependency there). Thanks!
-- 
Michal Hocko
SUSE Labs
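
For reference, the patch under discussion wraps the socket allocation in
net/ceph/messenger.c in an NOIO scope via memalloc_noio_save()/
memalloc_noio_restore(). Below is a minimal sketch of that pattern with
the lock-dependency comment Michal is asking for folded in. The function
and field names follow 4.4-era libceph, but take the exact hunk as an
approximation of the committed diff, not a verbatim copy:

/* net/ceph/messenger.c -- called from con_work() with con->mutex held */
static int ceph_tcp_connect(struct ceph_connection *con)
{
	struct sockaddr_storage *paddr = &con->peer_addr.in_addr;
	struct socket *sock;
	unsigned int noio_flag;
	int ret;

	BUG_ON(con->sock);

	/*
	 * sock_create_kern() allocates with hard-coded GFP_KERNEL, and
	 * we hold con->mutex here.  A GFP_KERNEL allocation may enter
	 * direct reclaim and wait on page writeback -- writeback which
	 * may be destined for this very OSD and therefore needs
	 * con->mutex to complete: a deadlock.  memalloc_noio_save()
	 * scopes every allocation in this task to GFP_NOIO, so reclaim
	 * never issues I/O from here.
	 */
	noio_flag = memalloc_noio_save();
	ret = sock_create_kern(read_pnet(&con->msgr->net), paddr->ss_family,
			       SOCK_STREAM, IPPROTO_TCP, &sock);
	if (ret)
		goto out;

	sock->sk->sk_allocation = GFP_NOIO;
	/* ... set up sk callbacks and kernel_connect() to the OSD ... */
out:
	memalloc_noio_restore(noio_flag);
	return ret;
}

The scope API is used because sock_create_kern() offers no gfp argument;
constraining the whole task for the duration of the call is the only way
to reach the allocations it performs internally.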