From: Tao Ma <tm@tao.ma>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [RFC][PATCH 3/3] ext4: add dio overwrite nolock
Date: Wed, 02 May 2012 14:59:34 +0800
Message-ID: <4FA0DB56.5000803@tao.ma>
In-Reply-To: <1335584346-8070-4-git-send-email-wenqing.lz@taobao.com>

On 04/28/2012 11:39 AM, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> Aligned and overwrite direct IO can be parallelized.  In ext4_file_dio_write,
> we first check whether these conditions are satisfied or not.  If so, we unlock
> the i_mutex and acquire i_data_sem directly.  Meanwhile iocb->private is set to
> indicate that this is an overwrite dio, and it will be processed in
> ext4_ext_direct_IO.
> 
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/file.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 137 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index e5d6be3..8a5f713 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -100,9 +100,21 @@ static ssize_t
>  ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
>  		    unsigned long nr_segs, loff_t pos)
>  {
> -	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
> -	int unaligned_aio = 0;
> +	struct file *file = iocb->ki_filp;
> +	struct address_space * mapping = file->f_mapping;
> +	struct inode *inode = file->f_path.dentry->d_inode;
> +	struct blk_plug plug;
>  	ssize_t ret;
> +	ssize_t written, written_buffered;
> +	size_t length = iov_length(iov, nr_segs);
> +	size_t ocount;		/* original count */
> +	size_t count;		/* after file limit checks */
> +	int unaligned_aio = 0;
> +	int overwrite = 0;
> +	loff_t *ppos = &iocb->ki_pos;
> +	loff_t endbyte;
> +
> +	BUG_ON(iocb->ki_pos != pos);
>  
>  	if (!is_sync_kiocb(iocb))
>  		unaligned_aio = ext4_unaligned_aio(inode, iov, nr_segs, pos);
> @@ -121,7 +133,129 @@ ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
>  		ext4_aiodio_wait(inode);
>  	}
>  
> -	ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
> +	mutex_lock(&inode->i_mutex);
> +	blk_start_plug(&plug);
> +
> +	ocount = 0;
> +	ret = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
> +	if (ret)
> +		goto unlock_out;
> +
> +	count = ocount;
> +	pos = *ppos;
> +
> +	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> +
> +	/* We can write back this queue in page reclaim */
> +	current->backing_dev_info = mapping->backing_dev_info;
> +	written = 0;
> +
> +	ret = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
> +	if (ret)
> +		goto out;
> +
> +	if (count == 0)
> +		goto out;
> +
> +	ret = file_remove_suid(file);
> +	if (ret)
> +		goto out;
> +
> +	file_update_time(file);
> +
> +	iocb->private = NULL;
> +
> +	if (!unaligned_aio && !file->f_mapping->nrpages &&
> +	    pos + length < i_size_read(inode) &&
Should this be pos + length <= i_size_read(inode)?
Also, reading inode->i_size directly should be fine here, since we now hold
i_mutex.
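To make the boundary case concrete, here is a minimal userspace sketch (not ext4 code; dio_within_isize() and dio_within_isize_strict() are hypothetical helpers) comparing the strict check in the patch with an inclusive one. A write covering [pos, pos + length) whose last byte is at i_size - 1 does not extend the file, yet the strict < rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the byte range [pos, pos + length)
 * lies entirely inside a file of size isize, i.e. the write extends
 * nothing.  <= is enough because the last byte written is
 * pos + length - 1. */
static bool dio_within_isize(int64_t pos, uint64_t length, int64_t isize)
{
	assert(pos >= 0 && isize >= 0);
	return pos + (int64_t)length <= isize;
}

/* The comparison as it appears in the patch, for contrast. */
static bool dio_within_isize_strict(int64_t pos, uint64_t length,
				    int64_t isize)
{
	return pos + (int64_t)length < isize;
}
```

For an 8192-byte file, a 4096-byte write at offset 4096 ends exactly at EOF: the inclusive helper accepts it and the strict one does not.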
> +	    ext4_should_dioread_nolock(inode)) {
> +		struct ext4_map_blocks map;
> +		unsigned int blkbits = inode->i_blkbits;
> +		int err;
> +		int len;
> +
> +		map.m_lblk = pos >> blkbits;
> +		map.m_len = (EXT4_BLOCK_ALIGN(pos + length, blkbits) >> blkbits)
> +			- map.m_lblk;
> +		len = map.m_len;
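The block range handed to ext4_map_blocks() can be checked by hand. A sketch of the same arithmetic, assuming EXT4_BLOCK_ALIGN() rounds its argument up to the next multiple of the block size (SKETCH_BLOCK_ALIGN and blocks_for_write() are local, hypothetical stand-ins):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for EXT4_BLOCK_ALIGN(): round size up to a multiple of the
 * block size (1 << blkbits).  Assumed behaviour, not copied from ext4.h. */
#define SKETCH_BLOCK_ALIGN(size, blkbits) \
	(((size) + ((1ULL << (blkbits)) - 1)) & ~((1ULL << (blkbits)) - 1))

struct sketch_range {
	uint64_t lblk;	/* first logical block touched (map.m_lblk) */
	uint64_t len;	/* number of blocks touched (map.m_len) */
};

/* Mirrors the m_lblk/m_len computation in the patch. */
static struct sketch_range blocks_for_write(uint64_t pos, uint64_t length,
					    unsigned int blkbits)
{
	struct sketch_range r;

	assert(blkbits < 32);
	r.lblk = pos >> blkbits;
	r.len = (SKETCH_BLOCK_ALIGN(pos + length, blkbits) >> blkbits) - r.lblk;
	return r;
}
```

With 4 KiB blocks (blkbits = 12), an 8192-byte write at offset 4096 gives m_lblk = 1 and m_len = 2, while a 10-byte write at offset 4090 straddles blocks 0 and 1, so m_len is 2 even though the write is tiny.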
> +
> +		err = ext4_map_blocks(NULL, inode, &map, 0);
> +		if (err == len && (!map.m_flags ||
> +		    map.m_flags & EXT4_MAP_MAPPED)) {
Could you please add a comment explaining how and why map.m_flags is
checked this way?
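For reference, here is the predicate as written, lifted into userspace with made-up flag values (SKETCH_MAP_* are placeholders, not the real EXT4_MAP_* constants) so the accepted and rejected cases can be enumerated. A fully found but unwritten-only mapping is rejected, a short mapping is rejected, and an empty m_flags is accepted; that last case is the one a comment most needs to explain.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits; the real EXT4_MAP_* values live in
 * fs/ext4/ext4.h and are not reproduced here. */
#define SKETCH_MAP_MAPPED    (1u << 0)
#define SKETCH_MAP_UNWRITTEN (1u << 1)

/* Same shape as the patch: every requested block must be found
 * (err == len), and m_flags must be empty or include MAPPED. */
static bool looks_like_overwrite(int err, int len, unsigned int m_flags)
{
	return err == len && (!m_flags || (m_flags & SKETCH_MAP_MAPPED));
}
```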
> +			overwrite = 1;
> +			iocb->private = &overwrite;
> +			mutex_unlock(&inode->i_mutex);
> +			down_read(&EXT4_I(inode)->i_data_sem);
Can the metadata change after we drop i_mutex but before down_read()
acquires i_data_sem?
> +		}
> +	}
> +
> +	if (file->f_mapping->nrpages && overwrite) {
> +		overwrite = 0;
> +		up_read(&EXT4_I(inode)->i_data_sem);
> +		mutex_lock(&inode->i_mutex);
I am not sure whether this can happen, but if it does, shouldn't we also
update iocb->private?
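The reason this matters: iocb->private points at the local overwrite, so clearing overwrite is visible to a consumer that dereferences the pointer, but a consumer that only tests the pointer for NULL would still take the overwrite path. A minimal sketch of both readings; struct sketch_kiocb and the two helpers are hypothetical and do not reproduce the real ext4_ext_direct_IO() logic.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_kiocb {
	void *private;	/* stands in for iocb->private */
};

/* Consumer that dereferences the flag: sees later updates to it. */
static int overwrite_by_value(const struct sketch_kiocb *iocb)
{
	return iocb->private ? *(const int *)iocb->private : 0;
}

/* Consumer that only checks for a non-NULL pointer: does not. */
static int overwrite_by_pointer(const struct sketch_kiocb *iocb)
{
	return iocb->private != NULL;
}
```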
> +	}
> +
> +	written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
> +						ppos, count, ocount);
> +	if (written < 0 || written == count)
> +		goto out;
> +	/*
> +	 * direct-io write to a hole: fall through to buffered I/O
> +	 * for completing the rest of the request.
> +	 */
> +	pos += written;
> +	count -= written;
> +	written_buffered = generic_file_buffered_write(iocb, iov,
> +					nr_segs, pos, ppos, count,
> +					written);
If we fall back to buffered I/O here, shouldn't we re-take i_mutex first,
since the buffered write is not guaranteed to be safe without it?

Thanks
Tao
> +	/*
> +	 * If generic_file_buffered_write() returned a synchronous error
> +	 * then we want to return the number of bytes which were
> +	 * direct-written, or the error code if that was zero.  Note
> +	 * that this differs from normal direct-io semantics, which
> +	 * will return -EFOO even if some bytes were written.
> +	 */
> +	if (written_buffered < 0) {
> +		ret = written_buffered;
> +		goto out;
> +	}
> +
> +	/*
> +	 * We need to ensure that the page cache pages are written to
> +	 * disk and invalidated to preserve the expected O_DIRECT
> +	 * semantics.
> +	 */
> +	endbyte = pos + written_buffered - written - 1;
> +	ret = filemap_write_and_wait_range(file->f_mapping, pos, endbyte);
> +	if (ret == 0) {
> +		written = written_buffered;
> +		invalidate_mapping_pages(mapping,
> +					 pos >> PAGE_CACHE_SHIFT,
> +					 endbyte >> PAGE_CACHE_SHIFT);
> +	} else {
> +		/*
> +		 * We don't know how much we wrote, so just return
> +		 * the number of bytes which were direct-written
> +		 */
> +	}
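Worked through with hypothetical numbers: written_buffered is the running total returned by generic_file_buffered_write() (it was passed written as its starting count), and pos has already been advanced past the direct-written part, so written_buffered - written is the buffered byte count and endbyte is the last buffered byte. A sketch, assuming 4 KiB pages for PAGE_CACHE_SHIFT (SKETCH_PAGE_SHIFT and flush_range() are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed PAGE_CACHE_SHIFT (4 KiB pages) */

struct sketch_flush {
	int64_t endbyte;	/* last byte to write back and wait on */
	uint64_t first_page;	/* first page index to invalidate */
	uint64_t last_page;	/* last page index to invalidate */
};

/* pos: start of the buffered part (already advanced past the
 * direct-written bytes); written: bytes done by direct I/O;
 * written_buffered: cumulative total from the buffered write.
 * The sketch assumes the buffered part wrote at least one byte. */
static struct sketch_flush flush_range(int64_t pos, int64_t written,
				       int64_t written_buffered)
{
	struct sketch_flush f;

	assert(written_buffered > written);
	f.endbyte = pos + written_buffered - written - 1;
	f.first_page = (uint64_t)pos >> SKETCH_PAGE_SHIFT;
	f.last_page = (uint64_t)f.endbyte >> SKETCH_PAGE_SHIFT;
	return f;
}
```

If direct I/O wrote 8192 bytes from offset 0 and the buffered fallback wrote 4096 more, then pos = 8192, written = 8192 and written_buffered = 12288, giving endbyte = 12287 and a single page (index 2) to write back and invalidate.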
> +
> +out:
> +	current->backing_dev_info = NULL;
> +	ret = written ? written : ret;
> +
> +unlock_out:
> +	if (overwrite)
> +		up_read(&EXT4_I(inode)->i_data_sem);
> +	else
> +		mutex_unlock(&inode->i_mutex);
> +
> +	if (ret > 0 || ret == -EIOCBQUEUED) {
> +		ssize_t err;
> +
> +		err = generic_write_sync(file, pos, ret);
> +		if (err < 0 && ret > 0)
> +			ret = err;
> +	}
> +	blk_finish_plug(&plug);
>  
>  	if (unaligned_aio)
>  		mutex_unlock(ext4_aio_mutex(inode));



Thread overview: 9+ messages
2012-04-28  3:39 [RFC][PATCH 0/3] ext4: dio overwrite nolock Zheng Liu
2012-04-28  3:39 ` [RFC][PATCH 1/3] ext4: split ext4_file_write into buffered IO and direct IO Zheng Liu
2012-05-02  4:11   ` Tao Ma
2012-05-02  5:50     ` Zheng Liu
2012-04-28  3:39 ` [RFC][PATCH 2/3] ext4: add a new flag for ext4_map_blocks Zheng Liu
2012-04-28  3:39 ` [RFC][PATCH 3/3] ext4: add dio overwrite nolock Zheng Liu
2012-05-02  6:59   ` Tao Ma [this message]
2012-05-02  8:16     ` Zheng Liu
2012-05-02 15:05   ` Eric Sandeen
