From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Wed, 24 Feb 2010 14:34:23 +0100
To: Yehuda Sadeh Weinraub
Cc: Sage Weil, "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Yehuda,

Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.

The code at that location also seems to have become a bit more complex; a new #if has appeared:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)

Consequently, the code under the #else should be fixed as well.

Thanks,
Roman
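
One way to keep the two arms from drifting apart, shown as a userspace sketch rather than the actual fs/ceph/file.c code: let the #if/#else choose only which sync call runs, and keep the error handling in one shared place after it. `sync_new_api()` and `sync_old_api()` are hypothetical stand-ins for whatever each kernel version calls, and `LINUX_VERSION_CODE` is set by hand because the sketch does not include the kernel headers.

```c
#include <assert.h>
#include <sys/types.h>

/* Same encoding as the kernel's KERNEL_VERSION macro of that era. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: hand-set for the demo; normally provided by <linux/version.h>. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 31)
#endif

/* Hypothetical stand-ins: return 0 on success, negative errno on failure. */
int sync_new_api(void) { return 0; }
int sync_old_api(void) { return 0; }

/* ret holds the byte count from the write; only a sync failure may replace it. */
ssize_t sync_after_write(ssize_t ret)
{
    int err;

    /* Simplification: the sketch ignores the -EIOCBQUEUED case. */
    assert(ret >= 0);

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
    err = sync_new_api();
#else
    err = sync_old_api();
#endif
    /* Shared by both arms, so neither can clobber the byte count. */
    if (err < 0)
        ret = err;
    return ret;
}
```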

 

From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
Sent: Tuesday, February 23, 2010 8:11 PM
To: Talyansky, Roman
Cc: Sage Weil; ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman <roman.talyansky@sap.com> wrote:

Hi Sage,

As you advised, we switched to release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag, "0" is always returned, although the data is written to disk.
This poses a problem for our benchmark, which uses the return value as the number of bytes written. Such behavior also seems to violate the POSIX write() contract.
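
The symptom can be checked from userspace with a few lines of C. This is only a sketch of such a check, not the benchmark mentioned above: `osync_write()` is a hypothetical helper, and the path passed to it is up to the caller. On a correct client the return value should equal the requested length; with the bug described here it would come back as 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path with O_SYNC, issue a single write()
 * and hand back what write() returned (bytes written, or -1 on failure).
 */
ssize_t osync_write(const char *path, const void *buf, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    n = write(fd, buf, len);  /* byte count on success */
    close(fd);
    return n;
}
```

A caller can compare the result against `len` and treat anything else as a short write or an error.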

 

Yeah, thanks. A fix was pushed to the unstable branch. We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the meantime:

 

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2c4ae44..88932c9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
        struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
        loff_t endoff = pos + iov->iov_len;
        int got = 0;
-       int ret;
+       int ret, err;
 
        if (ceph_snap(inode) != CEPH_NOSNAP)
                return -EROFS;
@@ -838,9 +838,12 @@ retry_snap:
 
                if ((ret >= 0 || ret == -EIOCBQUEUED) &&
                    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
-                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
-                       ret = vfs_fsync_range(file, file->f_path.dentry,
+                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
+                       err = vfs_fsync_range(file, file->f_path.dentry,
                                              pos, pos + ret - 1, 1);
+                       if (err < 0)
+                               ret = err;
+               }
        }
        if (ret >= 0) {
                spin_lock(&inode->i_lock);

 

 

 

Yehuda
