Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	hch@infradead.org, aneesh.kumar@linux.ibm.com,
	darrick.wong@oracle.com
Subject: Re: [PATCH 0/5] ext4: direct IO via iomap infrastructure
Date: Sat, 24 Aug 2019 13:18:31 +1000
Message-ID: <20190824031830.GB2174@poseidon.bobrowski.net> (raw)
In-Reply-To: <20190822141126.70A94A407B@d06av23.portsmouth.uk.ibm.com>

On Thu, Aug 22, 2019 at 07:41:25PM +0530, Ritesh Harjani wrote:
> 
> 
> On 8/22/19 5:30 PM, Matthew Bobrowski wrote:
> > On Wed, Aug 21, 2019 at 11:14:07PM +1000, Matthew Bobrowski wrote:
> > > On Tue, Aug 13, 2019 at 05:57:22PM +0530, RITESH HARJANI wrote:
> > > > But what I meant was this (I may be wrong here since I haven't
> > > > really looked into it), but for my understanding I would like to
> > > > discuss this -
> > > > 
> > > > So earlier with this flag(EXT4_STATE_DIO_UNWRITTEN) we were determining on
> > > > whether a newextent can be merged with ex1 in function
> > > > ext4_extents_can_be_merged. But now since we have removed that flag we have
> > > > no way of knowing that whether this inode has any unwritten extents or not
> > > > from any DIO path.
> > > > 
> > > > What I meant is isn't this removal of setting/unsetting of
> > > > flag(EXT4_STATE_DIO_UNWRITTEN) changing the behavior of this func -
> > > > ext4_extents_can_be_merged?
> > > 
> > > OK, I'm stuck and looking for either clarity, revalidation of my
> > > thought process, or any input on how to solve this problem for that
> > > matter.
> > > 
> > > In the current ext4 direct IO implementation, the dynamic state flag
> > > EXT4_STATE_DIO_UNWRITTEN is set/unset for synchronous direct IO
> > > writes. On the other hand, the flag EXT4_IO_END_UNWRITTEN is set/unset
> > > for ext4_io_end->flag, and the value of i_unwritten is
> > > incremented/decremented for asynchronous direct IO writes. All
> > > mechanisms by which are used to track and determine whether the inode,
> > > or an IO in flight against a particular inode have any pending
> > > unwritten extents that need to be converted after the IO has
> > > completed. In addition to this, we have ext4_can_extents_be_merged()
> > > performing explicit checks against both EXT4_STATE_DIO_UNWRITTEN and
> > > i_unwritten and using them to determine whether it can or cannot merge
> > > a requested extent into an existing extent.
> > > 
> > > This is all fine for the current direct IO implementation. However,
> > > while switching the direct IO code paths over to make use of the iomap
> > > infrastructure, I believe that we can no longer simply track whether
> > > an inode has unwritten extents needing to be converted by simply
> > > setting and checking the EXT4_STATE_DIO_UNWRITTEN flag on the
> > > inode. The reason being is that there can be multiple direct IO
> > > operations to unwritten extents running against the inode and we don't
> > > particularly distinguish synchronous from asynchronous writes within
> > > ext4_iomap_begin() as there's really no need. So, the only way to
> > > accurately determine whether extent conversion is deemed necessary for
> > > an IO operation whether it'd be synchronous or asynchronous is by
> > > checking the IOMAP_DIO_UNWRITTEN flag within the ->end_io()
> > > callback. I'm certain that this portion of the logic is correct, but
> > > we're still left with some issues when it comes to the checks that I
> > > previously mentioned in ext4_can_extents_be_merged(), which is the
> > > part I need some input on.
> > > 
> > > I was doing some thinking and I don't believe that making use of the
> > > EXT4_STATE_DIO_UNWRITTEN flag is the solution at all here. This is not
> > > only for reasons that I've briefly mentioned above, but also because
> > > of the fact that it'll probably lead to a lot of inaccurate judgements
> > > while taking particular code paths and some really ugly code that
> > > creeps close to the definition of insanity. Rather, what if we solve
> > > this problem by continuing to just use i_unwritten to keep track of
> > > all the direct IOs to unwritten against running against an inode?
> > > Within ext4_iomap_begin() post successful creation of unwritten
> > > extents we'd call atomic_inc(&EXT4_I(inode)->i_unwritten) and
> > > subsequently within the ->end_io() callback whether we take the
> > > success or error path we'd call
> > > atomic_dec(&EXT4_I(inode)->i_unwritten) accordingly? This way we can
> > > still rely on this value to be used in the check within
> > > ext4_can_extents_be_merged(). Open for alternate suggestions if anyone
> > > has any...
> > 
> > Actually, no...
> > 
> > I've done some more thinking and what I suggested above around the use
> > of i_unwritten will also not work properly. Using iomap
> > infrastructure, there is the possibility of calling into the
> > ->iomap_begin() more than once for a single direct IO operation. This
> > means that by the time we even get to decrementing i_unwritten in the
> > ->end_io() callback after converting the unwritten extents we're
> > already running the possibility of i_unwritten becoming unbalanced
> > really quickly and staying that way. This also means that the
> > statement checking i_unwritten in ext4_can_extents_be_merged() will be
> > affected and potentially result in it being evaluated incorrectly. I
> > was thinking that we could just decrement i_unwritten in
> > ->iomap_end(), but that seems to me like it would be racy and also
> > lead to incorrect results. At this point I'm out of ideas on how to
> > solve this, so any other ideas would be appreciated!
> 
> I will let others also comment, if someone has any other better approach.
> 
> 1. One approach is to add the infrastructure in iomap with
> iomap_dio->private which is filesystem specific pointer. This can be
> updated by filesystem in ->iomap_begin call into iomap->private.
> And in case of iomap_dio_rw, this iomap->private will be copied to
> iomap_dio->private if not already set.
> 
> But I think this will eventually become hacky in the sense when you will
> have to determine on whether the dio->private is already set or not when
> iomap_apply will be called second time. It will become a problem with AIO
> DIO in ext4 since we use i_unwritten which tells us whether there is any
> unwritten extent but it does not tell whether this unwritten extent is due
> to a DIRECT AIO DIO in progress or a buffered one.
> 
> So we can ignore this approach - unless you or someone else have some good
> design ideas to build on top of above.

I'm not sure whether _this_ is the solution or not, so let's maybe
wait for others to comment. One thing that I and probably the iomap
maintainers would like to avoid is adding any special case code to
iomap infrastructure, if possible. I mean, from what you suggest it
seems to be rather generic and not overly intrusive, although I know
for a fact that iomap infrastructure exists because stuff like
buffer_heads and the old direct IO code ended up accommodating so many
different edge cases making it almost unmodifiable and unmaintainable.

> 2. Second approach which I was thinking is to track only those extents which
> are marked unwritten and are under IO. This can be done in ext4_map_blocks.
> This way we will not have to track a particular inode has any unwritten
> extents or not, but it will be extent based.
> Something similar was also done a while ago. Do you think this approach will
> work in our case?
> 
> So with this extents will be scanned in extent status tree to see if any
> among those are under IO and are unwritten in func
> ext4_can_extents_be_merged.
> 
> https://patchwork.ozlabs.org/patch/1013837/

Maybe this would be a better approach and I think that it'd
work. Based upon what I read within in that thread there weren't
really any objections to the idea, although I can't see that it made
it upstream, so I may be missing something?

--M

  reply index

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-12 12:52 Matthew Bobrowski
2019-08-12 12:52 ` [PATCH 1/5] ext4: introduce direct IO read code path using " Matthew Bobrowski
2019-08-12 17:18   ` Christoph Hellwig
2019-08-12 20:17     ` Matthew Wilcox
2019-08-13 10:45       ` Matthew Bobrowski
2019-08-12 12:52 ` [PATCH 2/5] ext4: move inode extension/truncate code out from ext4_iomap_end() Matthew Bobrowski
2019-08-12 17:18   ` Christoph Hellwig
2019-08-13 10:46     ` Matthew Bobrowski
2019-08-28 19:59   ` Jan Kara
2019-08-28 21:54     ` Matthew Bobrowski
2019-08-29  8:18       ` Jan Kara
2019-08-12 12:53 ` [PATCH 3/5] iomap: modify ->end_io() calling convention Matthew Bobrowski
2019-08-12 17:18   ` Christoph Hellwig
2019-08-13 10:43     ` Matthew Bobrowski
2019-08-12 12:53 ` [PATCH 4/5] ext4: introduce direct IO write code path using iomap infrastructure Matthew Bobrowski
2019-08-12 17:04   ` RITESH HARJANI
2019-08-13 12:58     ` Matthew Bobrowski
2019-08-13 14:35       ` Darrick J. Wong
2019-08-14  9:51         ` Matthew Bobrowski
2019-08-12 17:34   ` Christoph Hellwig
2019-08-13 10:45     ` Matthew Bobrowski
2019-08-28 20:26   ` Jan Kara
2019-08-28 22:32     ` Dave Chinner
2019-08-29  8:03       ` Jan Kara
2019-08-29 11:47       ` Matthew Bobrowski
2019-08-29 11:45     ` Matthew Bobrowski
2019-08-29 12:38       ` Jan Kara
2019-08-12 12:53 ` [PATCH 5/5] ext4: clean up redundant buffer_head direct IO code Matthew Bobrowski
2019-08-12 17:31 ` [PATCH 0/5] ext4: direct IO via iomap infrastructure RITESH HARJANI
2019-08-13 11:10   ` Matthew Bobrowski
2019-08-13 12:27     ` RITESH HARJANI
2019-08-14  9:48       ` Matthew Bobrowski
2019-08-14 11:58         ` RITESH HARJANI
2019-08-21 13:14       ` Matthew Bobrowski
2019-08-22 12:00         ` Matthew Bobrowski
2019-08-22 14:11           ` Ritesh Harjani
2019-08-24  3:18             ` Matthew Bobrowski [this message]
2019-08-24  3:55               ` Darrick J. Wong
2019-08-24 23:04                 ` Christoph Hellwig
2019-08-27  9:52                   ` Matthew Bobrowski
2019-08-28 12:05                     ` Matthew Bobrowski
2019-08-28 14:27                       ` Theodore Y. Ts'o
2019-08-28 18:02                         ` Jan Kara
2019-08-29  6:36                           ` Christoph Hellwig
2019-08-29 11:20                             ` Matthew Bobrowski
2019-08-29 14:41                               ` Christoph Hellwig
2019-08-23 13:43           ` [RFC 1/1] ext4: PoC implementation of option-1 Ritesh Harjani
2019-08-23 13:49             ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190824031830.GB2174@poseidon.bobrowski.net \
    --to=mbobrowski@mbobrowski.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=riteshh@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git