All of lore.kernel.org
 help / color / mirror / Atom feed
From: majianpeng <majianpeng@gmail.com>
To: "Yan, Zheng" <zheng.z.yan@intel.com>
Cc: sage <sage@inktank.com>, ceph-devel <ceph-devel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Re: [PATCH 2/2] ceph: Implement writev/pwritev for sync operation.
Date: Fri, 6 Sep 2013 08:46:31 +0800	[thread overview]
Message-ID: <201309060846283230053@gmail.com> (raw)
In-Reply-To: 5227339C.5010706@intel.com

>Hi,
>
>Thank you for the patch.
>
>On 09/03/2013 04:52 PM, majianpeng wrote:
>> For writev/pwritev sync-operatoin, ceph only do the first iov.
>> It don't think other iovs.Now implement this.
>> I divided the write-sync-operation into two functions.One for
>> direct-write,other for none-direct-sync-write.This is because for
>> none-direct-sync-write we can merge iovs to one.But for direct-write,
>> we can't merge iovs.
>> 
>> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>> ---
>>  fs/ceph/file.c | 328 +++++++++++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 248 insertions(+), 80 deletions(-)
>> 
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 7d6a3ee..42c97b3 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -533,17 +533,19 @@ static void ceph_sync_write_unsafe(struct ceph_osd_request *req, bool unsafe)
>>  	}
>>  }
>>  
>> +
>>  /*
>> - * Synchronous write, straight from __user pointer or user pages (if
>> - * O_DIRECT).
>> + * Synchronous write, straight from __user pointer or user pages.
>>   *
>>   * If write spans object boundary, just do multiple writes.  (For a
>>   * correct atomic write, we should e.g. take write locks on all
>>   * objects, rollback on failure, etc.)
>>   */
>> -static ssize_t ceph_sync_write(struct file *file, const char __user *data,
>> -			       size_t left, loff_t pos, loff_t *ppos)
>> +static ssize_t
>> +ceph_sync_direct_write(struct kiocb *iocb, const struct iovec *iov,
>> +		       unsigned long nr_segs, size_t count)
>>  {
>> +	struct file *file = iocb->ki_filp;
>>  	struct inode *inode = file_inode(file);
>>  	struct ceph_inode_info *ci = ceph_inode(inode);
>>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>> @@ -557,59 +559,55 @@ static ssize_t ceph_sync_write(struct file *file, const char __user *data,
>>  	int written = 0;
>>  	int flags;
>>  	int check_caps = 0;
>> -	int page_align, io_align;
>> -	unsigned long buf_align;
>> -	int ret;
>> +	int page_align;
>> +	int ret, i;
>>  	struct timespec mtime = CURRENT_TIME;
>> -	bool own_pages = false;
>> +	loff_t pos = iocb->ki_pos;
>>  
>>  	if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
>>  		return -EROFS;
>>  
>> -	dout("sync_write on file %p %lld~%u %s\n", file, pos,
>> -	     (unsigned)left, (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>> +	dout("sync_direct_write on file %p %lld~%u\n", file, pos,
>> +	     (unsigned)count);
>>  
>> -	ret = filemap_write_and_wait_range(inode->i_mapping, pos, pos + left);
>> +	ret = filemap_write_and_wait_range(inode->i_mapping, pos, pos + count);
>>  	if (ret < 0)
>>  		return ret;
>>  
>>  	ret = invalidate_inode_pages2_range(inode->i_mapping,
>>  					    pos >> PAGE_CACHE_SHIFT,
>> -					    (pos + left) >> PAGE_CACHE_SHIFT);
>> +					    (pos + count) >> PAGE_CACHE_SHIFT);
>>  	if (ret < 0)
>>  		dout("invalidate_inode_pages2_range returned %d\n", ret);
>>  
>>  	flags = CEPH_OSD_FLAG_ORDERSNAP |
>>  		CEPH_OSD_FLAG_ONDISK |
>>  		CEPH_OSD_FLAG_WRITE;
>> -	if ((file->f_flags & (O_SYNC|O_DIRECT)) == 0)
>> -		flags |= CEPH_OSD_FLAG_ACK;
>> -	else
>> -		num_ops++;	/* Also include a 'startsync' command. */
>> +	num_ops++;	/* Also include a 'startsync' command. */
>>  
>> -	/*
>> -	 * we may need to do multiple writes here if we span an object
>> -	 * boundary.  this isn't atomic, unfortunately.  :(
>> -	 */
>> -more:
>> -	io_align = pos & ~PAGE_MASK;
>> -	buf_align = (unsigned long)data & ~PAGE_MASK;
>> -	len = left;
>> +	for (i = 0; i < nr_segs && count; i++) {
>
>POSIX requires that write syscall is atomic. I means we should allocate a single OSD request
>for all buffer segments that belong to the same object.
>
I think we could not.
For direct write, we use ceph_get_direct_page_vector to get pages.
Given iov1 and iov2 are in the same object. But we can't make the pages of iov1/2 to join together.
Because for ceph page_vector,it only record the offset of first page.

Or am i missing something?
Maybe we can use ceph pagelist but it will copy data.

Thanks!
Jianpeng Ma
>Regards
>Yan, Zheng

  reply	other threads:[~2013-09-06  0:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-03  8:52 [PATCH 2/2] ceph: Implement writev/pwritev for sync operation majianpeng
2013-09-04 13:17 ` Yan, Zheng
2013-09-04 13:20 ` Yan, Zheng
2013-09-06  0:46   ` majianpeng [this message]
2013-09-06  1:09     ` Yan, Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201309060846283230053@gmail.com \
    --to=majianpeng@gmail.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sage@inktank.com \
    --cc=zheng.z.yan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.