* libcephfs create file with layout and replication
@ 2012-11-17 20:13 Noah Watkins
2012-11-17 21:35 ` Josh Durgin
2012-11-17 23:23 ` Sage Weil
0 siblings, 2 replies; 11+ messages in thread
From: Noah Watkins @ 2012-11-17 20:13 UTC (permalink / raw)
To: ceph-devel; +Cc: Sage Weil
The Hadoop VFS layer assumes that block size and replication can be
set on a per-file basis, which is important to users for file
layout/workload optimizations.
The libcephfs interface doesn't make this entirely easy. Here is one
approach, but it isn't thread safe as the default values are global
variables in the client.
orig_obj_size = ceph_get_default_object_size() //save
set_default_object_size(new size)
open(path, O_CREAT)
set_default_object_size(orig_obj_size) //reset
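A minimal sketch of the save/set/create/restore sequence above, made safe only for callers that go through one wrapper. Everything here is a hypothetical stand-in (default_object_size, mock_create, create_with_object_size); none of it is libcephfs code. It assumes the default is a single global and that a new file captures the default at creation time.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the client's global default (4 MiB here). */
static uint64_t default_object_size = 4u << 20;
static pthread_mutex_t default_lock = PTHREAD_MUTEX_INITIALIZER;

/* Mock file: records the object size it was created with. */
struct mock_file {
    uint64_t object_size;
};

/* Mock create: a new file picks up whatever the global default is right now. */
static void mock_create(struct mock_file *f)
{
    f->object_size = default_object_size;
}

/* Hold the lock across save/set/create/restore so that no other caller of
 * this wrapper can observe or clobber the temporary default. */
static void create_with_object_size(struct mock_file *f, uint64_t object_size)
{
    assert(f != NULL);
    pthread_mutex_lock(&default_lock);
    uint64_t orig = default_object_size;   /* save */
    default_object_size = object_size;     /* set */
    mock_create(f);                        /* create */
    default_object_size = orig;            /* restore */
    pthread_mutex_unlock(&default_lock);
}
```

The lock protects nothing if any other code path changes the default or creates files directly, which is the motivation for passing the layout with the open call instead.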
Something more convenient might be:
ceph_open_layout(path, flags, mode, layout, replication)
where layout and replication are used with O_CREAT | O_EXCL, or an
interface for setting these values explicitly on newly created files:
ceph_open(path, O_CREAT|O_EXCL)
ceph_set_layout(path, layout, replication)
where ceph_set_layout would succeed ostensibly on zero-length files.
Any thoughts on how to handle this?
Thanks,
Noah
* Re: libcephfs create file with layout and replication
2012-11-17 20:13 libcephfs create file with layout and replication Noah Watkins
@ 2012-11-17 21:35 ` Josh Durgin
2012-11-17 23:23 ` Sage Weil
1 sibling, 0 replies; 11+ messages in thread
From: Josh Durgin @ 2012-11-17 21:35 UTC (permalink / raw)
To: Noah Watkins; +Cc: ceph-devel, Sage Weil
On 11/17/2012 12:13 PM, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be
> set on a per-file basis, which is important to users for file
> layout/workload optimizations.
>
> The libcephfs interface doesn't make this entirely easy. Here is one
> approach, but it isn't thread safe as the default values are global
> variables in the client.
>
> orig_obj_size = ceph_get_default_object_size() //save
> set_default_object_size(new size)
> open(path, O_CREAT)
> set_default_object_size(orig_obj_size) //reset
>
> Something more convenient might be:
>
> ceph_open_layout(path, flags, mode, layout, replication)
I think this makes the most sense, since changing the layout of a
file after it's been created can't happen, and this interface
makes that the most clear. It also avoids maintaining extra state
in libcephfs between calls.
Since replication count is a per-pool setting, I think the hadoop
bindings would have to translate from a vfs request to a pool
with the requested replication level. So something like this,
where layout is a struct containing stripe unit, stripe count,
and object size (the subset of struct ceph_file_layout related to
objects that's useful currently):
ceph_open_layout(path, flags, mode, layout, pool_name)
BTW, for anyone interested, there's a nice description of
the layout parameters here:
http://ceph.com/docs/master/dev/file-striping/
> where layout and replication are used with O_CREAT | O_EXCL, or an
> interface for setting these values explicitly on newly created files:
>
> ceph_open(path, O_CREAT|O_EXCL)
> ceph_set_layout(path, layout, replication)
>
> where ceph_set_layout would succeed ostensibly on zero-length files.
>
> Any thoughts on how to handle this?
>
> Thanks,
> Noah
* Re: libcephfs create file with layout and replication
2012-11-17 20:13 libcephfs create file with layout and replication Noah Watkins
2012-11-17 21:35 ` Josh Durgin
@ 2012-11-17 23:23 ` Sage Weil
2012-11-17 23:58 ` Noah Watkins
1 sibling, 1 reply; 11+ messages in thread
From: Sage Weil @ 2012-11-17 23:23 UTC (permalink / raw)
To: Noah Watkins; +Cc: ceph-devel
On Sat, 17 Nov 2012, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be
> set on a per-file basis, which is important to users for file
> layout/workload optimizations.
>
> The libcephfs interface doesn't make this entirely easy. Here is one
> approach, but it isn't thread safe as the default values are global
> variables in the client.
>
> orig_obj_size = ceph_get_default_object_size() //save
> set_default_object_size(new size)
> open(path, O_CREAT)
> set_default_object_size(orig_obj_size) //reset
>
> Something more convenient might be:
>
> ceph_open_layout(path, flags, mode, layout, replication)
>
> where layout and replication are used with O_CREAT | O_EXCL, or an
> interface for setting these values explicitly on newly created files:
>
> ceph_open(path, O_CREAT|O_EXCL)
> ceph_set_layout(path, layout, replication)
This is basically what we have now... at least that's how things work for
the kernel client. We should make sure there is a clean way via libcephfs
to do that.
The client/mds protocol also allows you to specify the layout on file
creation. This is better since it has one less round trip to the MDS.
Let's just create a new open call with those additional arguments.
FWIW, the striping parameters are object size, stripe unit, stripe count,
and data pool.
sage
>
> where ceph_set_layout would succeed ostensibly on zero-length files.
>
> Any thoughts on how to handle this?
>
> Thanks,
> Noah
* Re: libcephfs create file with layout and replication
2012-11-17 23:23 ` Sage Weil
@ 2012-11-17 23:58 ` Noah Watkins
2012-11-18 0:15 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Noah Watkins @ 2012-11-17 23:58 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Sat, Nov 17, 2012 at 3:23 PM, Sage Weil <sage@inktank.com> wrote:
> On Sat, 17 Nov 2012, Noah Watkins wrote:
>
> FWIW, the striping parameters are object size, stripe unit, stripe count,
> and data pool.
In ceph_mds_request_args.open I see all the striping parameters
except data pool, and I don't see any places where the file_replication
parameter is being used. Should a pg_pool field be added?
-Noah
* Re: libcephfs create file with layout and replication
2012-11-17 23:58 ` Noah Watkins
@ 2012-11-18 0:15 ` Sage Weil
2012-11-18 1:20 ` Noah Watkins
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2012-11-18 0:15 UTC (permalink / raw)
To: Noah Watkins; +Cc: ceph-devel
On Sat, 17 Nov 2012, Noah Watkins wrote:
> On Sat, Nov 17, 2012 at 3:23 PM, Sage Weil <sage@inktank.com> wrote:
> > On Sat, 17 Nov 2012, Noah Watkins wrote:
> >
> > FWIW, the striping parameters are object size, stripe unit, stripe count,
> > and data pool.
>
> In ceph_mds_request_args.open I see all the striping parameters
> except data pool, and I don't see any places where the file_replication
> parameter is being used. Should a pg_pool field be added?
Yeah, I think this bit needs to be fixed in the on-wire protocol. That
is a delicate fix.
We can ignore that for the purposes of getting the libcephfs API correct,
though...
sage
* Re: libcephfs create file with layout and replication
2012-11-18 0:15 ` Sage Weil
@ 2012-11-18 1:20 ` Noah Watkins
2012-11-18 20:05 ` Noah Watkins
0 siblings, 1 reply; 11+ messages in thread
From: Noah Watkins @ 2012-11-18 1:20 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil <sage@inktank.com> wrote:
>
> We can ignore that for the purposes of getting the libcephfs API correct,
> though...
Ok, makes sense. Thanks.
Noah
* Re: libcephfs create file with layout and replication
2012-11-18 1:20 ` Noah Watkins
@ 2012-11-18 20:05 ` Noah Watkins
2012-11-20 1:04 ` Gregory Farnum
0 siblings, 1 reply; 11+ messages in thread
From: Noah Watkins @ 2012-11-18 20:05 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Wanna have a look at a first pass on this patch?
wip-client-open-layout
Thanks,
Noah
On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
> On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil <sage@inktank.com> wrote:
>>
>> We can ignore that for the purposes of getting the libcephfs API correct,
>> though...
>
> Ok, makes sense. Thanks.
>
> Noah
* Re: libcephfs create file with layout and replication
2012-11-18 20:05 ` Noah Watkins
@ 2012-11-20 1:04 ` Gregory Farnum
2012-11-20 2:48 ` Noah Watkins
0 siblings, 1 reply; 11+ messages in thread
From: Gregory Farnum @ 2012-11-20 1:04 UTC (permalink / raw)
To: Noah Watkins; +Cc: Sage Weil, ceph-devel
On Sun, Nov 18, 2012 at 12:05 PM, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
> Wanna have a look at a first pass on this patch?
>
> wip-client-open-layout
>
> Thanks,
> Noah
Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
2) There's already a ceph_file_layout struct which is used "widely"
(MDS, kernel, userspace client). It also has an accompanying function
that does basic validity checks.
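For readers who have not seen that function, here is a sketch of the kind of basic checks such a validator might perform. It is written from scratch for illustration and is not copied from the tree; the struct, the function name, and the 64 KiB granularity constant are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granularity for this sketch: sizes must be multiples of 64 KiB. */
#define SKETCH_MIN_STRIPE_UNIT 65536u

/* Illustrative struct; not the real ceph_file_layout. */
struct sketch_layout {
    uint32_t stripe_unit;
    uint32_t stripe_count;
    uint32_t object_size;
};

static bool sketch_layout_is_valid(const struct sketch_layout *l)
{
    assert(l != NULL);

    /* Stripe unit and object size must be non-zero multiples of the granularity. */
    if (l->stripe_unit == 0 || l->stripe_unit % SKETCH_MIN_STRIPE_UNIT != 0)
        return false;
    if (l->object_size == 0 || l->object_size % SKETCH_MIN_STRIPE_UNIT != 0)
        return false;

    /* An object must hold a whole number of stripe units. */
    if (l->object_size % l->stripe_unit != 0)
        return false;

    /* Striping across zero objects is meaningless. */
    if (l->stripe_count == 0)
        return false;

    return true;
}
```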
> On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
>> On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil <sage@inktank.com> wrote:
>>>
>>> We can ignore that for the purposes of getting the libcephfs API correct,
>>> though...
>>
>> Ok, makes sense. Thanks.
>>
>> Noah
FYI, there's an "unused" __le32 in the open struct (used to be for
preferred PG). We should be able to steal that away without too much
pain or massaging! :)
-Greg
* Re: libcephfs create file with layout and replication
2012-11-20 1:04 ` Gregory Farnum
@ 2012-11-20 2:48 ` Noah Watkins
2012-11-20 3:28 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Noah Watkins @ 2012-11-20 2:48 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Sage Weil, ceph-devel
On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum <greg@inktank.com> wrote:
>
> Just glanced over this, and I'm curious:
> 1) Why symlink another reference to your file_layout.h?
I followed the same pattern as page.h in librados, but may have
misunderstood its use. When libcephfs.h is installed, it includes
#include "file_layout.h"
and we assume the user has -Iprefix/cephfs/.
but in the build tree, include/cephfs isn't on the include path,
hence the symlink.
> 2) There's already a ceph_file_layout struct which is used "widely"
> (MDS, kernel, userspace client). It also has an accompanying function
> that does basic validity checks.
I avoided ceph_file_layout because I was under the impression that all
of the __le64 stuff in it was very much Linux-specific. I had run into
a lot of this hacking on an OSX port.
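To illustrate the portability concern (this is not from the patch), one common alternative to Linux-specific little-endian types is to use plain fixed-width integers in the public API and do the byte-order conversion explicitly at the encoding boundary. The helper names below are hypothetical, and the sketch uses 32-bit values; the same pattern extends to 64-bit ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v in little-endian byte order regardless of host endianness. */
static void put_le32(uint8_t *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(v & 0xff);
    buf[1] = (uint8_t)((v >> 8) & 0xff);
    buf[2] = (uint8_t)((v >> 16) & 0xff);
    buf[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into host order. */
static uint32_t get_le32(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```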
> FYI, there's an "unused" __le32 in the open struct (used to be for
> preferred PG). We should be able to steal that away without too much
> pain or massaging! :)
Nice. Do you think I should revert back to using ceph_file_layout?
Thanks,
Noah
* Re: libcephfs create file with layout and replication
2012-11-20 2:48 ` Noah Watkins
@ 2012-11-20 3:28 ` Sage Weil
2012-11-20 21:59 ` Noah Watkins
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2012-11-20 3:28 UTC (permalink / raw)
To: Noah Watkins; +Cc: Gregory Farnum, ceph-devel
On Mon, 19 Nov 2012, Noah Watkins wrote:
> On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum <greg@inktank.com> wrote:
> >
> > Just glanced over this, and I'm curious:
> > 1) Why symlink another reference to your file_layout.h?
>
> I followed the same pattern as page.h in librados, but may have
> misunderstood its use. When libcephfs.h is installed, it includes
>
> #include "file_layout.h"
>
> and we assume the user has -Iprefix/cephfs/.
>
> but in the build tree, include/cephfs isn't on the include path,
> hence the symlink.
>
> > 2) There's already a ceph_file_layout struct which is used "widely"
> > (MDS, kernel, userspace client). It also has an accompanying function
> > that does basic validity checks.
>
> I avoided ceph_file_layout because I was under the impression that all
> of the __le64 stuff in it was very much Linux-specific. I had run into
> a lot of this hacking on an OSX port.
>
> > FYI, there's an "unused" __le32 in the open struct (used to be for
> > preferred PG). We should be able to steal that away without too much
> > pain or massaging! :)
>
> Nice. Do you think I should revert back to using ceph_file_layout?
We could avoid the whole issue by passing 4 arguments to the function...
* Re: libcephfs create file with layout and replication
2012-11-20 3:28 ` Sage Weil
@ 2012-11-20 21:59 ` Noah Watkins
0 siblings, 0 replies; 11+ messages in thread
From: Noah Watkins @ 2012-11-20 21:59 UTC (permalink / raw)
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel
On Mon, Nov 19, 2012 at 7:28 PM, Sage Weil <sage@inktank.com> wrote:
>
> We could avoid the whole issue by passing 4 arguments to the function...
I pushed a new patch that takes each of the 4 new arguments.
wip-client-open-layout
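For anyone following along without the branch checked out, here is a rough mock of what an open call taking the four striping parameters as separate arguments could look like. This is a sketch and not the code in the branch: mock_open_layout and struct mock_create_request are hypothetical, and the conventions that 0 or NULL means "use the default" and that the layout is ignored without O_CREAT are assumptions of this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical record of what would be sent along with the create request. */
struct mock_create_request {
    const char *path;
    int flags;
    int stripe_unit;        /* 0 means "use the default" in this sketch */
    int stripe_count;       /* 0 means "use the default" in this sketch */
    int object_size;        /* 0 means "use the default" in this sketch */
    const char *data_pool;  /* NULL means "use the default pool" */
};

/* Mock open: the layout travels with the call, so no global state is touched. */
static int mock_open_layout(struct mock_create_request *req, const char *path,
                            int flags, int stripe_unit, int stripe_count,
                            int object_size, const char *data_pool)
{
    assert(req != NULL);
    if (path == NULL || stripe_unit < 0 || stripe_count < 0 || object_size < 0)
        return -EINVAL;

    req->path = path;
    req->flags = flags;
    if (flags & O_CREAT) {
        /* Layout only matters when the file is being created. */
        req->stripe_unit = stripe_unit;
        req->stripe_count = stripe_count;
        req->object_size = object_size;
        req->data_pool = data_pool;
    } else {
        req->stripe_unit = 0;
        req->stripe_count = 0;
        req->object_size = 0;
        req->data_pool = NULL;
    }
    return 0;
}
```

Passing scalars keeps the public header free of any layout struct, which sidesteps both the symlinked header and the endian-typed fields discussed earlier in the thread.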
Thanks,
-Noah
end of thread
Thread overview: 11+ messages
2012-11-17 20:13 libcephfs create file with layout and replication Noah Watkins
2012-11-17 21:35 ` Josh Durgin
2012-11-17 23:23 ` Sage Weil
2012-11-17 23:58 ` Noah Watkins
2012-11-18 0:15 ` Sage Weil
2012-11-18 1:20 ` Noah Watkins
2012-11-18 20:05 ` Noah Watkins
2012-11-20 1:04 ` Gregory Farnum
2012-11-20 2:48 ` Noah Watkins
2012-11-20 3:28 ` Sage Weil
2012-11-20 21:59 ` Noah Watkins