From: "Talyansky, Roman" <roman.talyansky@sap.com>
To: Sage Weil <sage@newdream.net>
Cc: "ceph-devel@lists.sourceforge.net" <ceph-devel@lists.sourceforge.net>
Subject: Re: Write operation is stuck
Date: Tue, 23 Feb 2010 15:11:43 +0100
Message-ID: <C6A64D82E3A5D24B949315CFBC1FA1AD072A34414F@DEWDFECCR01.wdf.sap.corp>
In-Reply-To: <Pine.LNX.4.64.1002191007260.20757@cobra.newdream.net>

[-- Attachment #1: Type: text/plain, Size: 3262 bytes --]

Hi Sage,

As you advised, we switched to release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file opened with the O_SYNC flag, write() always returns 0 even though the data is written to disk.
This is a problem for our benchmark, which uses the return value as the number of bytes written. The behavior also appears to violate the POSIX write() contract.
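
For illustration only (this is not part of the attached test), here is a minimal sketch of how a caller typically depends on the write() return value. The helper name write_all is hypothetical. It assumes the usual POSIX behavior: -1 signals an error, and a positive count smaller than the request is a short write that the caller continues. Mapping a 0 return for a nonzero request to EIO is a choice made for this sketch, not something POSIX mandates.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>    // open() flags, used in the usage note below
#include <unistd.h>   // write(), ssize_t

// Hypothetical helper (not part of temp.cc): write exactly len bytes to fd.
// Returns len on success, or -1 with errno set on failure.
ssize_t write_all(int fd, const char* buf, size_t len)
{
    assert(buf != nullptr || len == 0);
    size_t done = 0;
    while (done < len) {
        ssize_t n = ::write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted before any data moved: retry
            return -1;                     // real error; errno describes it
        }
        if (n == 0) {
            // A zero return for a nonzero request is the symptom described
            // above. This sketch reports it as an error instead of looping
            // forever; EIO is an assumption of the sketch.
            errno = EIO;
            return -1;
        }
        done += static_cast<size_t>(n);    // short write: continue with the rest
    }
    return static_cast<ssize_t>(done);
}
```

A caller would open the file with O_SYNC|O_RDWR and treat any result other than len as a failure. With the behavior reported here, such a loop would fail on the first call even though the byte reached the disk.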

Attached is a small unit test in C++.
The test creates two identical files, both filled with random digits 0-9.
Both files are then closed.
Then one of them is reopened with O_SYNC and overwritten with 1's, one byte per write() call.

Running the test:
$ g++ temp.cc
$ ./a.out 100  (this is the number of bytes in the files)
Each write() call that does not return 1 is printed to the screen.
Run the executable a.out from within a directory on a ceph file system.

After the program finishes you will find 2 files:
./test  - filled with 1's
./test.start - filled with random numeric data

If you run this test on NFS and ceph you will see that no errors are printed out on the NFS file system, and 100 errors are printed out on ceph.

Thanks,

Roman & Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Friday, February 19, 2010 8:39 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Fri, 19 Feb 2010, Talyansky, Roman wrote:
> Since I test several ceph versions simultaneously I could confuse the error checking at different nodes.
> I'll double check this and let you know.

Thanks.  If you haven't switched to the just-released 0.19, now might be 
the time to do that.

> > It also looks like the IO is synchronous, which may have something 
> > to do with your performance.  Are you mounting with -o sync or using 
> > direct IO, or are multiple clients reading and writing to the same file or 
> > something?
>
> The IO is indeed synchronous. However the performance under ceph is much 
> worse than even under nfs, which looks strange. I do not mount with -o 
> sync. And in our experiments multiple clients read and write the same 
> file.

If you are accessing the same file from multiple clients, then any 
comparison with nfs is going to be misleading.  NFS provides only 
close-to-open consistency, so IO will be buffered and inconsistent.  Ceph provides 
fully consistent semantics by switching to synchronous IO when there are 
multiple clients.  Ceph will be slower, but correct; nfs will be fast, but 
incorrect.

If your application is smart enough to handle its own consistency (each 
client is writing to a different region of the file) then you probably 
want something along the lines of O_LAZY [1], so that the application can 
tell the FS not to worry about consistency and stick with buffered IO.  
Unfortunately O_LAZY doesn't exist in Linux at this point.  There is some 
preliminary support for it in Ceph... if that's what you're looking for, 
we can cook up some patches for you.
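
As an illustration of the access pattern described above (each client writing to a different region of the file), here is a minimal sketch. The helper write_own_region and the fixed-size region layout are hypothetical. The sketch shows only the disjoint-offset writes; it does not use or emulate O_LAZY, and it says nothing about how Ceph treats such writes.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>    // open() flags, used by callers
#include <unistd.h>   // pwrite(), pread()
#include <vector>

// Hypothetical layout: client k owns bytes [k * region_size, (k + 1) * region_size).
// Fills the caller's region with `fill`; returns true if the whole region was written.
bool write_own_region(int fd, int client_index, size_t region_size, char fill)
{
    assert(client_index >= 0);
    std::vector<char> buf(region_size, fill);
    const off_t base = static_cast<off_t>(client_index) * static_cast<off_t>(region_size);
    size_t done = 0;
    while (done < region_size) {
        // pwrite() takes an explicit offset, so clients never share a file position.
        ssize_t n = ::pwrite(fd, buf.data() + done, region_size - done,
                             base + static_cast<off_t>(done));
        if (n <= 0) return false;          // error or no progress
        done += static_cast<size_t>(n);
    }
    return true;
}
```

Because the regions never overlap, the order in which clients write does not change the final file contents. That property is what lets an application manage its own consistency.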

If you can find us in #ceph on irc.oftc.net that might be a quicker way to 
diagnose the performance problems with your workload.

Thanks!
sage

[1] http://www.pdl.cmu.edu/posix/docs/posix_lazy_io.pdf

[-- Attachment #2: temp.cc --]
[-- Type: application/octet-stream, Size: 1658 bytes --]

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#include <fstream>
#include <iostream>
using namespace std;

#define BEGIN_INPUT_F_NAME "test.start"
#define F_NAME "test"

int main(int argc, char** argv)
{
    if (argc != 2) {
        cerr << "Usage: " << argv[0] << " size of file" << endl;
        exit(1);
    }
    int fSize = atoi(argv[1]);
    cout << "File size is " << fSize << endl;

    // Create two identical files filled with random digits 0-9.
    ofstream beginFile;
    ofstream workFile;
    beginFile.open(BEGIN_INPUT_F_NAME);
    workFile.open(F_NAME);
    for (int i = 0; i < fSize; i++) {
        char digit = (char)('0' + rand() % 10);
        beginFile << digit;
        workFile << digit;
    }
    beginFile.close();
    workFile.close();

    // Reopen the work file with O_SYNC and overwrite it with '1' characters,
    // one byte per write() call.
    char buff[] = {'1'};
    int flags = O_SYNC | O_RDWR;
    int fd = ::open(F_NAME, flags);
    if (fd < 0) {  // open() returns -1 on failure; 0 is a valid descriptor
        cerr << "open problem with: " << F_NAME << " error=" << strerror(errno) << endl;
        exit(1);
    }

    for (int i = 0; i < fSize; i++) {
        off_t pos = ::lseek(fd, i, SEEK_SET);
        if (pos != i) {
            cerr << "seek op failed res=" << pos << " offset=" << i << endl;
        }
        errno = 0;  // avoid reporting a stale errno when write() returns 0
        ssize_t res = ::write(fd, buff, 1);
        if (res != 1) {
            cerr << "res=" << res << " write error=" << strerror(errno) << endl;
        }
    }

    int res_close = ::close(fd);
    if (res_close == -1) {
        cerr << "close error=" << strerror(errno) << endl;
    }

    exit(0);
}





Thread overview: 27+ messages
2010-02-10 21:26 Write operation is stuck Talyansky, Roman
2010-02-10 21:39 ` Sage Weil
2010-02-10 22:44   ` Talyansky, Roman
2010-02-10 22:49     ` Sage Weil
2010-02-16 17:27   ` Talyansky, Roman
2010-02-16 18:35     ` Sage Weil
2010-02-19 15:40       ` Talyansky, Roman
2010-02-19 18:39         ` Sage Weil
2010-02-23 14:11           ` Talyansky, Roman [this message]
2010-02-23 18:11             ` Yehuda Sadeh Weinraub
2010-02-24 13:34               ` Talyansky, Roman
2010-02-24 14:56                 ` Sage Weil
2010-02-24 16:42                   ` Talyansky, Roman
2010-02-24 18:43                     ` Sage Weil
2010-02-24 23:21                       ` Talyansky, Roman
2010-02-25 10:07                       ` Talyansky, Roman
2010-08-27 12:18 Bogdan Lobodzinski
2010-08-27 15:42 ` Wido den Hollander
2010-08-27 16:09 ` Sage Weil
2010-08-30 15:32   ` Bogdan Lobodzinski
2010-08-30 19:39     ` Sage Weil
2010-08-31  7:56       ` Bogdan Lobodzinski
2010-09-01 15:21         ` Bogdan Lobodzinski
2010-09-01 19:29           ` Wido den Hollander
2010-09-03 15:02             ` Bogdan Lobodzinski
2010-09-03 17:10               ` Yehuda Sadeh Weinraub
2010-09-03 19:20                 ` Yehuda Sadeh Weinraub
