* Consistently reading/writing rados objects via command line
@ 2013-01-22  1:01 Nick Bartos
  2013-01-22  1:11 ` Gregory Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Bartos @ 2013-01-22  1:01 UTC (permalink / raw)
  To: ceph-devel

I would like to store some objects in rados, and retrieve them in a
consistent manner.  In my initial tests, if I do a 'rados -p foo put
test /tmp/test', while it is uploading I can do a 'rados -p foo get
test /tmp/blah' on another machine, and it will download a partially
written file without returning an error code, so the downloader cannot
tell the file is corrupt/incomplete.

My question is: how do I read and write objects in rados via the command
line so that the downloader never gets a corrupt or incomplete file?
It's fine if the client just returns an error and I retry; I just need
to be notified of the error.


* Re: Consistently reading/writing rados objects via command line
  2013-01-22  1:01 Consistently reading/writing rados objects via command line Nick Bartos
@ 2013-01-22  1:11 ` Gregory Farnum
  2013-01-22  5:14   ` Sage Weil
  2013-01-22 16:59   ` Nick Bartos
  0 siblings, 2 replies; 13+ messages in thread
From: Gregory Farnum @ 2013-01-22  1:11 UTC (permalink / raw)
  To: Nick Bartos; +Cc: ceph-devel

On Monday, January 21, 2013 at 5:01 PM, Nick Bartos wrote:
> I would like to store some objects in rados, and retrieve them in a
> consistent manner. In my initial tests, if I do a 'rados -p foo put
> test /tmp/test', while it is uploading I can do a 'rados -p foo get
> test /tmp/blah' on another machine, and it will download a partially
> written file without returning an error code, so the downloader cannot
> tell the file is corrupt/incomplete.
>  
> My question is, how do I read/write objects in rados via the command
> line in such a way where the downloader does not get a corrupt or
> incomplete file? It's fine if it just returns an error on the client
> and I can try again, I just need to be notified on error.
>  
You must be writing large-ish objects? By default the rados tool will upload objects 4MB at a time and you're trying to download mid-way through the full object upload. You can add a "--block-size 20971520" to upload 20MB in a single operation, but make sure you don't exceed the "osd max write size" (90MB by default).
This is all client-side stuff, though — from the RADOS object store's perspective, the file is complete after each 4MB write. If you want something more sophisticated (like handling larger objects) you'll need to do at least some minimal tooling of your own, e.g. by setting an object xattr before starting and after finishing the file change, then checking for that presence when reading (and locking on reads or doing a check when the read completes). You can do that with the "setxattr", "rmxattr", and "getxattr" options.
-Greg
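
The chunking described above can be checked with a little arithmetic. This is only a sketch: the helper name `write_ops` is hypothetical, and the 4MB default and the 20971520-byte block size are the figures quoted in this message. It counts how many separate write operations an upload needs, which is also how many intermediate states a concurrent reader could observe.

```python
DEFAULT_CHUNK = 4 * 1024 * 1024   # 4MB default chunk size quoted in the message
BIGGER_CHUNK = 20971520           # the value suggested for --block-size (20MB)


def write_ops(object_size: int, chunk_size: int) -> int:
    """Number of separate writes needed to upload object_size bytes (hypothetical helper)."""
    # Integer ceiling division: a trailing partial chunk still costs one write.
    return (object_size + chunk_size - 1) // chunk_size


object_size = 20 * 1024 * 1024    # a 20MB object, as in the example above

print(write_ops(object_size, DEFAULT_CHUNK))  # 5
print(write_ops(object_size, BIGGER_CHUNK))   # 1
```

With a single write there is no window in which a reader can see a half-uploaded object, which is why the larger block size helps as long as the object fits under the "osd max write size" limit.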

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Consistently reading/writing rados objects via command line
  2013-01-22  1:11 ` Gregory Farnum
@ 2013-01-22  5:14   ` Sage Weil
  2013-01-22 16:38     ` Nick Bartos
  2013-01-22 16:59   ` Nick Bartos
  1 sibling, 1 reply; 13+ messages in thread
From: Sage Weil @ 2013-01-22  5:14 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Nick Bartos, ceph-devel

On Mon, 21 Jan 2013, Gregory Farnum wrote:
> On Monday, January 21, 2013 at 5:01 PM, Nick Bartos wrote:
> > I would like to store some objects in rados, and retrieve them in a
> > consistent manner. In my initial tests, if I do a 'rados -p foo put
> > test /tmp/test', while it is uploading I can do a 'rados -p foo get
> > test /tmp/blah' on another machine, and it will download a partially
> > written file without returning an error code, so the downloader cannot
> > tell the file is corrupt/incomplete.
> >  
> > My question is, how do I read/write objects in rados via the command
> > line in such a way where the downloader does not get a corrupt or
> > incomplete file? It's fine if it just returns an error on the client
> > and I can try again, I just need to be notified on error.
> >  
> You must be writing large-ish objects? By default the rados tool will upload objects 4MB at a time and you're trying to download mid-way through the full object upload. You can add a "--block-size 20971520" to upload 20MB in a single operation, but make sure you don't exceed the "osd max write size" (90MB by default).
> This is all client-side stuff, though -- from the RADOS object store's perspective, the file is complete after each 4MB write. If you want something more sophisticated (like handling larger objects) you'll need to do at least some minimal tooling of your own, e.g. by setting an object xattr before starting and after finishing the file change, then checking for that presence when reading (and locking on reads or doing a check when the read completes). You can do that with the "setxattr", "rmxattr", and "getxattr" options.

With a bit of additional support in the rados tool, we could write to 
object $foo.tmp with key $foo, and then clone it into position and delete 
the .tmp.

If they're really big objects, though, you may also be better off with 
radosgw, which provides striping and atomicity..

sage
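
Why write foo.tmp "with key foo"? A plausible reading is that the locator key replaces the object name when placement is computed, so the temporary object lands alongside its final name and the clone can stay local. The toy model below illustrates only that idea. It assumes a CRC32 hash and a made-up group count, neither of which is what Ceph actually uses, and `toy_placement` is a hypothetical name.

```python
import zlib


def toy_placement(name, group_count, locator=None):
    """Toy placement: hash the locator key if one is given, else the object name.

    CRC32 and the modulo are stand-ins for illustration, not Ceph's real mapping.
    """
    key = locator if locator is not None else name
    return zlib.crc32(key.encode("utf-8")) % group_count


group_count = 64  # arbitrary number of placement buckets for the demo

print(toy_placement("foo", group_count))                     # where foo lives
print(toy_placement("foo.tmp", group_count))                 # foo.tmp hashed by its own name
print(toy_placement("foo.tmp", group_count, locator="foo"))  # foo.tmp pinned next to foo
```

In this model the third call always agrees with the first, because both hash the same key.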


* Re: Consistently reading/writing rados objects via command line
  2013-01-22  5:14   ` Sage Weil
@ 2013-01-22 16:38     ` Nick Bartos
  2013-01-22 17:27       ` Sage Weil
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Bartos @ 2013-01-22 16:38 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Assuming that the clone is atomic so that the client only ever grabbed
a complete old or new version of the file, that method really seems
ideal.  How much work/time would that be?

The objects will likely average around 10-20MB, but it's possible that
in some cases they may grow to a few hundred MB.
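
The property being asked for can be pictured with an in-memory toy. This sketch assumes the clone step takes effect as one indivisible operation, which is exactly the open question above. `ToyStore` and its methods are hypothetical and have nothing to do with the real rados API.

```python
class ToyStore:
    """In-memory stand-in for an object store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}

    def append(self, name, chunk):
        # A chunked upload: each call makes the partial object visible under `name`.
        self.objects[name] = self.objects.get(name, b"") + chunk

    def clone(self, src, dst):
        # Assumed to be a single step: dst flips from old content to new content.
        self.objects[dst] = self.objects[src]

    def remove(self, name):
        del self.objects[name]

    def get(self, name):
        return self.objects[name]


store = ToyStore()
store.objects["foo"] = b"old-old-old"

seen = []
for chunk in (b"new-", b"new-", b"new"):
    store.append("foo.tmp", chunk)   # the partial data only ever lives in foo.tmp
    seen.append(store.get("foo"))    # a reader polling foo during the upload

store.clone("foo.tmp", "foo")
store.remove("foo.tmp")
seen.append(store.get("foo"))

print(seen)
```

Every read of foo during the upload returns the complete old version, and the first read after the clone returns the complete new one.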


On Mon, Jan 21, 2013 at 9:14 PM, Sage Weil <sage@inktank.com> wrote:
> With a bit of additional support in the rados tool, we could write to
> object $foo.tmp with key $foo, and then clone it into position and delete
> the .tmp.
>
> If they're really big objects, though, you may also be better off with
> radosgw, which provides striping and atomicity..
>
> sage


* Re: Consistently reading/writing rados objects via command line
  2013-01-22  1:11 ` Gregory Farnum
  2013-01-22  5:14   ` Sage Weil
@ 2013-01-22 16:59   ` Nick Bartos
  1 sibling, 0 replies; 13+ messages in thread
From: Nick Bartos @ 2013-01-22 16:59 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

I had thought about doing something like that, but I'm not sure how to
do it in a race-free way.  For example, if I were to set 'done=yes' on an
object and check it before downloading, the writer could remove the
xattr and start updating the object the instant after my check, and
the client would still end up with a corrupted file.
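
One way to narrow this race is to check the marker again after the read, and to store a value that changes on every write rather than a fixed 'done=yes'. The sketch below simulates that with an in-memory object. The generation counter and all names (`FakeObject`, `checked_read`, and so on) are assumptions for illustration; the thread does not show this scheme being used with the rados tool.

```python
class FakeObject:
    """In-memory stand-in for one object plus one xattr (hypothetical)."""

    def __init__(self):
        self.data = b""
        self.generation = None  # None plays the role of "xattr absent"


def begin_write(obj):
    obj.generation = None            # analogous to removing the xattr first


def finish_write(obj, data, generation):
    obj.data = data
    obj.generation = generation      # analogous to setting the xattr afterwards


def checked_read(obj, read_fn):
    """Return the data only if the marker was present and unchanged across the read."""
    before = obj.generation
    data = read_fn(obj)
    after = obj.generation
    if before is None or before != after:
        raise RuntimeError("object changed during read; retry")
    return data


def quiet_read(obj):
    return obj.data


def racing_read(obj):
    # Simulates a writer replacing the object halfway through the download.
    first_half = obj.data[:2]
    begin_write(obj)
    finish_write(obj, b"BBBB", generation=2)
    return first_half + obj.data[2:]   # torn result: b"AABB"


obj = FakeObject()
finish_write(obj, b"AAAA", generation=1)
print(checked_read(obj, quiet_read))

try:
    checked_read(obj, racing_read)
except RuntimeError as err:
    print("detected:", err)
```

A fixed 'done=yes' value would pass both checks in the racing case, because the writer finishes and restores the marker before the second check. The changing generation is what lets the reader notice.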

On Mon, Jan 21, 2013 at 5:11 PM, Gregory Farnum <greg@inktank.com> wrote:
> You must be writing large-ish objects? By default the rados tool will upload objects 4MB at a time and you're trying to download mid-way through the full object upload. You can add a "--block-size 20971520" to upload 20MB in a single operation, but make sure you don't exceed the "osd max write size" (90MB by default).
> This is all client-side stuff, though — from the RADOS object store's perspective, the file is complete after each 4MB write. If you want something more sophisticated (like handling larger objects) you'll need to do at least some minimal tooling of your own, e.g. by setting an object xattr before starting and after finishing the file change, then checking for that presence when reading (and locking on reads or doing a check when the read completes). You can do that with the "setxattr", "rmxattr", and "getxattr" options.
> -Greg
>

* Re: Consistently reading/writing rados objects via command line
  2013-01-22 16:38     ` Nick Bartos
@ 2013-01-22 17:27       ` Sage Weil
  2013-01-22 17:43         ` Sage Weil
  2013-01-22 18:42         ` Nick Bartos
  0 siblings, 2 replies; 13+ messages in thread
From: Sage Weil @ 2013-01-22 17:27 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Gregory Farnum, ceph-devel

On Tue, 22 Jan 2013, Nick Bartos wrote:
> Assuming that the clone is atomic so that the client only ever grabbed
> a complete old or new version of the file, that method really seems
> ideal.  How much work/time would that be?
> 
> The objects will likely average around 10-20MB, but it's possible that
> in some cases they may grow to a few hundred MB.

You're in luck--my email load was mercifully light this morning.  

  ./rados -p data ls -
  ./rados put foo.tmp /etc/passwd  -p data --object-locator foo
  ./rados clone foo.tmp foo -p data --object-locator foo
  ./rados -p data ls -
  ./rados -p data rm foo.tmp --object-locator foo
  ./rados -p data ls -
  ./rados -p data get foo -

see wip-rados-clone.

sage
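
The sequence above can be wrapped in a small helper that builds the three command lines. This is a sketch with a hypothetical function name, `publish_commands`. It uses the subcommand name `clonedata`, which is confirmed later in this thread as the actual name in the wip-rados-clone branch, so it may not exist in a given rados build. The pool, object name, and file path are the ones from the transcript.

```python
import shlex


def publish_commands(pool, name, local_path, rados="rados"):
    """Build the put / clonedata / rm command lines for one object (hypothetical helper)."""
    tmp = name + ".tmp"
    # The same locator key is passed to every step, as in the transcript above.
    locator = ["--object-locator", name]
    return [
        [rados, "-p", pool, "put", tmp, local_path] + locator,
        [rados, "-p", pool, "clonedata", tmp, name] + locator,
        [rados, "-p", pool, "rm", tmp] + locator,
    ]


commands = publish_commands("data", "foo", "/etc/passwd")
for argv in commands:
    print(shlex.join(argv))
```

Running each list with `subprocess.run(argv, check=True)` would stop at the first failing step, so a failed upload never reaches the clone.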



* Re: Consistently reading/writing rados objects via command line
  2013-01-22 17:27       ` Sage Weil
@ 2013-01-22 17:43         ` Sage Weil
  2013-01-22 18:42         ` Nick Bartos
  1 sibling, 0 replies; 13+ messages in thread
From: Sage Weil @ 2013-01-22 17:43 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Gregory Farnum, ceph-devel

On Tue, 22 Jan 2013, Sage Weil wrote:
> On Tue, 22 Jan 2013, Nick Bartos wrote:
> > Assuming that the clone is atomic so that the client only ever grabbed
> > a complete old or new version of the file, that method really seems
> > ideal.  How much work/time would that be?
> > 
> > The objects will likely average around 10-20MB, but it's possible that
> > in some cases they may grow to a few hundred MB.

Please keep in mind that a few hundred MB is on the large side for a raw 
rados object.  Be aware that it will make the relative disk utilization 
more erratic.

sage



* Re: Consistently reading/writing rados objects via command line
  2013-01-22 17:27       ` Sage Weil
  2013-01-22 17:43         ` Sage Weil
@ 2013-01-22 18:42         ` Nick Bartos
  2013-01-22 20:28           ` Sage Weil
  1 sibling, 1 reply; 13+ messages in thread
From: Nick Bartos @ 2013-01-22 18:42 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Thanks!  Is it safe to just apply that last commit to 0.56.1?  Also,
is the rados command 'clonedata' instead of 'clone'?  That's what it
looked like in the code.


* Re: Consistently reading/writing rados objects via command line
  2013-01-22 18:42         ` Nick Bartos
@ 2013-01-22 20:28           ` Sage Weil
  2013-01-23 19:01             ` Nick Bartos
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2013-01-22 20:28 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Gregory Farnum, ceph-devel

On Tue, 22 Jan 2013, Nick Bartos wrote:
> Thanks!  Is it safe to just apply that last commit to 0.56.1?  Also,
> is the rados command 'clonedata' instead of 'clone'?  That's what it
> looked like in the code.

Yep, and yep!

s


* Re: Consistently reading/writing rados objects via command line
  2013-01-22 20:28           ` Sage Weil
@ 2013-01-23 19:01             ` Nick Bartos
  2013-01-24  4:25               ` Sage Weil
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Bartos @ 2013-01-23 19:01 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

This seems to be working ok for the most part, but I noticed that
using large files gives errors getting them (but not putting them).
The problems start after 2GB which, as you said, is larger than should
be used in this method.  It shouldn't affect us since we shouldn't be
using this for files that large, but I thought it was worth reporting.

This is the test:

dd if=/dev/zero of=4.bin bs=1M count=100
export FILE=4.bin
rados -p swift_ring ls -
rados -p swift_ring put $FILE.tmp $FILE --object-locator $FILE
rados -p swift_ring clonedata $FILE.tmp $FILE --object-locator $FILE
rados -p swift_ring ls -
rados -p swift_ring rm $FILE.tmp --object-locator $FILE
rados -p swift_ring ls -
rados -p swift_ring stat $FILE
rm -f $FILE.downloaded
rados -p swift_ring get $FILE $FILE.downloaded

These are the results:

dd if=/dev/zero of=4.bin bs=1M count=1000:
# rados -p swift_ring stat $FILE
swift_ring/4.bin mtime 1358967088, size 1048576000
# rados -p swift_ring get $FILE $FILE.downloaded
<ok>

dd if=/dev/zero of=4.bin bs=1M count=2000:
# rados -p swift_ring stat $FILE
swift_ring/4.bin mtime 1358967172, size 2097152000
# rados -p swift_ring get $FILE $FILE.downloaded
<ok>

dd if=/dev/zero of=4.bin bs=1M count=3000:
# rados -p swift_ring stat $FILE
swift_ring/4.bin mtime 1358966844, size 3145728000
# rados -p swift_ring get $FILE $FILE.downloaded
error getting swift_ring/4.bin: Unknown error 1149239296

dd if=/dev/zero of=4.bin bs=1M count=8000:
# rados -p swift_ring stat $FILE
swift_ring/4.bin mtime 1358967388, size 8388608000
# rados -p swift_ring get $FILE $FILE.downloaded
error getting swift_ring/4.bin: Bad address
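
The sizes above straddle the largest value a signed 32-bit integer can hold. The sketch below shows that arithmetic. It rests on an assumption: that a length is squeezed through a signed 32-bit integer somewhere on the read path, which nothing in this thread confirms. `as_int32` is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1   # 2147483647


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


# dd count (in MB) -> object size in bytes, copied from the stat output above
sizes = {1000: 1048576000, 2000: 2097152000, 3000: 3145728000, 8000: 8388608000}

for count, size in sizes.items():
    print(count, size, "fits" if size <= INT32_MAX else "overflows", as_int32(size))

print(as_int32(3145728000))  # -1149239296
```

The 3000MB size wraps to -1149239296, the same magnitude as the "Unknown error 1149239296" reported for that case, and the two sizes that downloaded cleanly are the two that fit. The "Bad address" error for the 8000MB case is not explained by this arithmetic.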






* Re: Consistently reading/writing rados objects via command line
  2013-01-23 19:01             ` Nick Bartos
@ 2013-01-24  4:25               ` Sage Weil
  2013-01-24  5:31                 ` Sage Weil
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2013-01-24  4:25 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Gregory Farnum, ceph-devel

Hi Nick-

The problem here looks to just be that do_get() in rados.cc isn't making 
any attempt to read large objects in chunks.  I'm not sure where the 2GB 
limit is, but it is well beyond non-optimal before it gets to that point.  
That function needs to read in chunks of a few MB and keep going until it 
gets a short read, modulo some extra futzing for stdout.

Any takers?  :)

sage
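
The read loop described above, sketched in Python rather than as a patch to rados.cc. This is not the code from any Ceph branch. `read_at(offset, length)` is a hypothetical callable standing in for whatever performs one bounded read, and the 4MB chunk size is just one value for "a few MB".

```python
def read_in_chunks(read_at, chunk_size=4 * 1024 * 1024):
    """Yield an object's data chunk by chunk, stopping at the first short read."""
    offset = 0
    while True:
        chunk = read_at(offset, chunk_size)
        if chunk:
            yield chunk
        offset += len(chunk)
        # A read shorter than requested (including an empty one) means end of object.
        if len(chunk) < chunk_size:
            break


# Demo against an in-memory payload with a tiny chunk size.
payload = bytes(range(10))


def fake_read_at(offset, length):
    return payload[offset:offset + length]


chunks = list(read_in_chunks(fake_read_at, chunk_size=4))
print([len(c) for c in chunks])
print(b"".join(chunks) == payload)
```

When the object size is an exact multiple of the chunk size, the loop issues one extra read that returns nothing and then stops. Memory use stays bounded by the chunk size instead of the object size.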



* Re: Consistently reading/writing rados objects via command line
  2013-01-24  4:25               ` Sage Weil
@ 2013-01-24  5:31                 ` Sage Weil
  2013-01-24 17:37                   ` Nick Bartos
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2013-01-24  5:31 UTC (permalink / raw)
  To: Nick Bartos; +Cc: Gregory Farnum, ceph-devel

Try wip-rados-get


* Re: Consistently reading/writing rados objects via command line
  2013-01-24  5:31                 ` Sage Weil
@ 2013-01-24 17:37                   ` Nick Bartos
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Bartos @ 2013-01-24 17:37 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Thanks!

