From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Bqph=A2=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.0 required=3.0 tests=BAYES_00,
	FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id CE51FC433E3
	for <linux-mm@archiver.kernel.org>; Wed, 15 Jul 2020 03:53:14 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 682202065D
	for <linux-mm@archiver.kernel.org>; Wed, 15 Jul 2020 03:53:14 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 682202065D
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=sina.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id E1A116B0006; Tue, 14 Jul 2020 23:53:13 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id DC8F96B0007; Tue, 14 Jul 2020 23:53:13 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id CB82C6B0008; Tue, 14 Jul 2020 23:53:13 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0005.hostedemail.com [216.40.44.5])
	by kanga.kvack.org (Postfix) with ESMTP id B218F6B0006
	for <linux-mm@kvack.org>; Tue, 14 Jul 2020 23:53:13 -0400 (EDT)
Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay04.hostedemail.com (Postfix) with ESMTP id 3F91E1EE6
	for <linux-mm@kvack.org>; Wed, 15 Jul 2020 03:53:13 +0000 (UTC)
X-FDA: 77038939866.04.unit52_4e1683326ef6
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin04.hostedemail.com (Postfix) with ESMTP id 0C8E7800490E
	for <linux-mm@kvack.org>; Wed, 15 Jul 2020 03:53:13 +0000 (UTC)
X-HE-Tag: unit52_4e1683326ef6
X-Filterd-Recvd-Size: 8896
Received: from r3-21.sinamail.sina.com.cn (r3-21.sinamail.sina.com.cn [202.108.3.21])
	by imf10.hostedemail.com (Postfix) with SMTP
	for <linux-mm@kvack.org>; Wed, 15 Jul 2020 03:53:10 +0000 (UTC)
Received: from unknown (HELO localhost.localdomain)([123.123.24.222])
	by sina.com with ESMTP
	id 5F0E7DA100033882; Wed, 15 Jul 2020 11:53:07 +0800 (CST)
X-Sender: hdanton@sina.com
X-Auth-ID: hdanton@sina.com
X-SMAIL-MID: 944775628825
From: Hillf Danton <hdanton@sina.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>,
	Michal Hocko <mhocko@kernel.org>,
	Hridya Valsaraju <hridya@google.com>,
	Hillf Danton <hdanton@sina.com>,
	Eric Biggers <ebiggers@kernel.org>,
	syzbot <syzbot+7a0d9d0b26efefe61780@syzkaller.appspotmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	=?UTF-8?B?QXJ2ZSBIasO4bm5ldsOlZw==?= <arve@android.com>,
	Christian Brauner <christian@brauner.io>,
	"open list:ANDROID DRIVERS" <devel@driverdev.osuosl.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Martijn Coenen <maco@android.com>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Todd Kjos <tkjos@android.com>,
	Markus Elfring <Markus.Elfring@web.de>
Subject: Re: possible deadlock in shmem_fallocate (4)
Date: Wed, 15 Jul 2020 11:52:56 +0800
Message-Id: <20200715035256.13628-1-hdanton@sina.com>
In-Reply-To: <CAJuCfpFz9kEfTPxcausVj63mUvU7i6Dvv6=KNePVX2qR+-Ci2A@mail.gmail.com>
References: <0000000000000b5f9d059aa2037f@google.com> <20200714033252.8748-1-hdanton@sina.com> <20200714053205.15240-1-hdanton@sina.com> <20200714140859.15156-1-hdanton@sina.com> <20200714141815.GP24642@dhcp22.suse.cz> <CAHRSSEzbCW3E0QTR0ryf3p=5J5uhs_vY2D6fFQEzP=HeCDkPGQ@mail.gmail.com> <CAJuCfpExhJJO_xPk663-eUkmAXVVwNDd9a7ahQuwMW8JVMBJpg@mail.gmail.com>
MIME-Version: 1.0
X-Rspamd-Queue-Id: 0C8E7800490E
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam03
Content-Transfer-Encoding: quoted-printable
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


On Tue, 14 Jul 2020 10:32:20 -0700 Suren Baghdasaryan wrote:
> On Tue, Jul 14, 2020 at 9:41 AM Suren Baghdasaryan <surenb@google.com> =
wrote:
> >
> > On Tue, Jul 14, 2020 at 8:47 AM Todd Kjos <tkjos@google.com> wrote:
> > >
> > > +Suren Baghdasaryan +Hridya Valsaraju who support the ashmem driver=
.
> >
> > Thanks for looping me in.
> >
> > > On Tue, Jul 14, 2020 at 7:18 AM Michal Hocko <mhocko@kernel.org> wr=
ote:
> > > >
> > > > On Tue 14-07-20 22:08:59, Hillf Danton wrote:
> > > > >
> > > > > On Tue, 14 Jul 2020 10:26:29 +0200 Michal Hocko wrote:
> > > > > > On Tue 14-07-20 13:32:05, Hillf Danton wrote:
> > > > > > >
> > > > > > > On Mon, 13 Jul 2020 20:41:11 -0700 Eric Biggers wrote:
> > > > > > > > On Tue, Jul 14, 2020 at 11:32:52AM +0800, Hillf Danton wr=
ote:
> > > > > > > > >
> > > > > > > > > Add FALLOC_FL_NOBLOCK and on the shmem side try to lock=
 inode upon the
> > > > > > > > > new flag. And the overall upside is to keep the current=
 gfp either in
> > > > > > > > > the khugepaged context or not.
> > > > > > > > >
> > > > > > > > > --- a/include/uapi/linux/falloc.h
> > > > > > > > > +++ b/include/uapi/linux/falloc.h
> > > > > > > > > @@ -77,4 +77,6 @@
> > > > > > > > >   */
> > > > > > > > >  #define FALLOC_FL_UNSHARE_RANGE              0x40
> > > > > > > > >
> > > > > > > > > +#define FALLOC_FL_NOBLOCK            0x80
> > > > > > > > > +
> > > > > > > >
> > > > > > > > You can't add a new UAPI flag to fix a kernel-internal pr=
oblem like this.
> > > > > > >
> > > > > > > Sounds fair, see below.
> > > > > > >
> > > > > > > What the report indicates is a missing PF_MEMALLOC_NOFS and=
 it's
> > > > > > > checked on the ashmem side and added as an exception before=
 going
> > > > > > > to filesystem. On shmem side, no more than a best effort is=
 paid
> > > > > > > on the inteded exception.
> > > > > > >
> > > > > > > --- a/drivers/staging/android/ashmem.c
> > > > > > > +++ b/drivers/staging/android/ashmem.c
> > > > > > > @@ -437,6 +437,7 @@ static unsigned long
> > > > > > >  ashmem_shrink_scan(struct shrinker *shrink, struct shrink_=
control *sc)
> > > > > > >  {
> > > > > > >   unsigned long freed =3D 0;
> > > > > > > + bool nofs;
> > > > > > >
> > > > > > >   /* We might recurse into filesystem code, so bail out if =
necessary */
> > > > > > >   if (!(sc->gfp_mask & __GFP_FS))
> > > > > > > @@ -445,6 +446,11 @@ ashmem_shrink_scan(struct shrinker *sh=
ri
> > > > > > >   if (!mutex_trylock(&ashmem_mutex))
> > > > > > >           return -1;
> > > > > > >
> > > > > > > + /* enter filesystem with caution: nonblock on locking */
> > > > > > > + nofs =3D current->flags & PF_MEMALLOC_NOFS;
> > > > > > > + if (!nofs)
> > > > > > > +         current->flags |=3D PF_MEMALLOC_NOFS;
> > > > > > > +
> > > > > > >   while (!list_empty(&ashmem_lru_list)) {
> > > > > > >           struct ashmem_range *range =3D
> > > > > > >                   list_first_entry(&ashmem_lru_list, typeof=
(*range), lru);
> > > > > >
> > > > > > I do not think this is an appropriate fix. First of all is th=
is a real
> > > > > > deadlock or a lockdep false positive? Is it possible that ash=
mem just
> > > > >
> > > > > The warning matters and we can do something to quiesce it.
> > > >
> > > > The underlying issue should be fixed rather than _something_ done=
 to
> > > > silence it.
> > > >
> > > > > > needs to properly annotate its shmem inodes? Or is it possibl=
e that
> > > > > > the internal backing shmem file is visible to the userspace s=
o the write
> > > > > > path would be possible?
> > > > > >
> > > > > > If this a real problem then the proper fix would be to set in=
ternal
> > > > > > shmem mapping's gfp_mask to drop __GFP_FS.
> > > > >
> > > > > Thanks for the tip, see below.
> > > > >
> > > > > Can you expand a bit on how it helps direct reclaimers like khu=
gepaged
> > > > > in the syzbot report wrt deadlock?
> > > >
> > > > I do not understand your question.
> > > >
> > > > > TBH I have difficult time following
> > > > > up after staring at the chart below for quite a while.
> > > >
> > > > Yes, lockdep reports are quite hard to follow and they tend to co=
nfuse
> > > > one hell out of me. But this one says that there is a reclaim dep=
endency
> > > > between the shmem inode lock and the reclaim context.
> > > >
> > > > > Possible unsafe locking scenario:
> > > > >
> > > > >        CPU0                    CPU1
> > > > >        ----                    ----
> > > > >   lock(fs_reclaim);
> > > > >                                lock(&sb->s_type->i_mutex_key#15=
);
> > > > >                                lock(fs_reclaim);
> > > > >
> > > > >   lock(&sb->s_type->i_mutex_key#15);
> > > >
> > > > Please refrain from proposing fixes until the actual problem is
> > > > understood. I suspect that this might be just false positive beca=
use the
> > > > lockdep cannot tell the backing shmem which is internal to ashmem=
(?)
> > > > with any general shmem.
>=20
> Actually looking some more into this, I think you are right. Ashmem
> currently does not redirect writes into the backing shmem and
> fallocate call from ashmem_shrink_scan is always performed against
> asma->file, which is the backing shmem. IOW writes into the backing
> shmem are not supported, therefore this concurrent locking can't
> happen.

The print of generic_file_write_iter in the syzbot report backs that
concurrency because of f_op::fallocate and another is
 Reported-by: syzbot+7a0d9d0b26efefe61780@syzkaller.appspotmail.com

>=20
> I'm not sure how we can annotate the fact that the inode_lock in
> generic_file_write_iter and in shmem_fallocate always operate on
> different inodes. Ideas?
>=20
> > > >  But somebody really familiar with ashmem code
> > > > should have a look I believe.
> >
> > I believe the deadlock is possible if a write to ashmem fd coincides
> > with shrinking of ashmem caches. I just developed a possible fix here
> > https://android-review.googlesource.com/c/kernel/common/+/1361205 but
> > wanted to test it before posting upstream. The idea is to detect such
> > a race between write and cache shrinking operations and let
> > ashmem_shrink_scan bail out if the race is detected instead of taking
> > inode_lock. AFAIK writing ashmem files is not a usual usage for ashme=
m
> > (standard usage is to mmap it and use as shared memory), therefore
> > this bailing out early should not affect ashmem cache maintenance
> > much. Besides ashmem_shrink_scan already bails out early if a
> > contention on ashmem_mutex is detected, which is a much more probable
> > case (see: https://elixir.bootlin.com/linux/v5.8-rc4/source/drivers/s=
taging/android/ashmem.c#L497).
> >
> > I'll test and post the patch here in a day or so if there are no earl=
y
> > objections to it.
> > Thanks!
> >
> > > >
> > > > --
> > > > Michal Hocko
> > > > SUSE Labs