From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from verein.lst.de ([213.95.11.211]:36045 "EHLO newverein.lst.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755597AbcI2Ii0 (ORCPT ); Thu, 29 Sep 2016 04:38:26 -0400
Date: Thu, 29 Sep 2016 10:38:24 +0200
From: Christoph Hellwig
To: "Darrick J. Wong"
Cc: Christoph Hellwig , xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org,
	linux-aio@kvack.org
Subject: aio completions vs file_accessed race, was: Re: [PATCH 7/8] xfs:
	split direct I/O and DAX path
Message-ID: <20160929083824.GA12799@lst.de>
References: <1466609236-23801-1-git-send-email-hch@lst.de>
	<1466609236-23801-8-git-send-email-hch@lst.de>
	<20160929025351.GB4901@birch.djwong.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20160929025351.GB4901@birch.djwong.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Sep 28, 2016 at 07:53:52PM -0700, Darrick J. Wong wrote:
> So I noticed that generic/323 starts crashing in file_accessed -> touch_atime
> because iocb->ki_filp->f_path.dentry == NULL. For a while I thought it was
> some weird reflink bug, but I finally had time to go build a vanilla 4.8-rc8
> kernel and that blew up here too. I'm not sure why this line got inserted
> here, since it wasn't there prior to this patch, AFAICT.

This line was there before near the end of xfs_file_dio_aio_read
already, e.g. line 376 just before the above commit, but it only got
introduced a bit earlier in "xfs: stop using generic_file_read_iter for
direct I/O", which copied it over from generic_file_read_iter.

I think any new issues in these commits could just be a minor timing
change, as we're not changing struct file refcounting in any way here.
generic/323 reproduces the last struct file reference being dropped by
aio completions, so it seems like we have an issue here, which I suspect
is something in the common code.

I can't reproduce it locally, but looking at the aio_complete ->
kiocb_free callchain and the lack of other struct file refcounting in
aio.c it seems inherently unsafe to reference struct file once the
completion may have run, that is, after (__)blockdev_direct_IO returned.
I'll see if I can come up with a solution for that, most likely that
would involve moving the file_accessed call into __blockdev_direct_IO
before we drop the final reference on the dio structure.
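
The ordering problem described above can be modelled in user space. What
follows is only a minimal sketch under stated assumptions, not kernel
code and not the eventual fix: struct model_file and every model_*
function are hypothetical stand-ins for struct file, fput,
file_accessed and the aio completion, and "freeing" is recorded in a
flag so the outcome can be inspected. The I/O is assumed to hold the
last reference, as in the generic/323 scenario.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct file; not a kernel type. */
struct model_file {
	int refcount;        /* references still held on the file */
	bool atime_touched;  /* set when the access time update ran */
	bool freed;          /* set when the last reference was dropped */
};

/* Drop one reference; the last drop marks the object as gone. */
static void model_fput(struct model_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = true;
}

/*
 * Stand-in for file_accessed. Returns false when called on an object
 * that is already gone, which is where the real kernel would crash.
 */
static bool model_file_accessed(struct model_file *f)
{
	if (f->freed)
		return false;
	f->atime_touched = true;
	return true;
}

/* Stand-in for the aio completion dropping the I/O's file reference. */
static void model_aio_complete(struct model_file *f)
{
	model_fput(f);
}

/* Racy ordering: the completion may run before the atime update. */
static bool model_read_unsafe(struct model_file *f)
{
	model_aio_complete(f);
	return model_file_accessed(f);
}

/* Proposed ordering: touch the file before the final reference goes. */
static bool model_read_safe(struct model_file *f)
{
	bool ok = model_file_accessed(f);

	model_aio_complete(f);
	return ok;
}
```

With a single reference held, model_read_safe() updates the access time
and then releases the file, while model_read_unsafe() finds the file
already gone by the time it tries to touch it.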