* xattrs vs. omap with radosgw
@ 2015-06-16 18:31 GuangYang
       [not found] ` <BLU175-W13D5737A24429707F8F978DFA70-MsuGFMq8XAE@public.gmane.org>
  2015-06-16 19:43 ` Sage Weil
  0 siblings, 2 replies; 13+ messages in thread
From: GuangYang @ 2015-06-16 18:31 UTC (permalink / raw)
  To: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-idqoXFIVOFJgJs9I8MT0rw

Hi Cephers,
While looking at disk utilization on the OSDs, I noticed the disks were constantly busy with a large number of small writes. Further investigation showed that radosgw stores its metadata (e.g. etag, content-type, etc.) in xattrs, and these push the object's xattrs from the inode's local (inline) format out into extents, which incurs extra I/O.
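
For reference, one way to check whether an object's xattrs still fit inside its inode on an XFS-backed OSD (a diagnostic sketch only; the object path and placeholders are illustrative):

# list the xattrs stored on the object file
getfattr -d -m '-' /var/lib/ceph/osd/ceph-0/current/<pg>_head/<object>
# a non-empty attribute-fork extent map means the xattrs spilled out of the inode
xfs_bmap -a -v /var/lib/ceph/osd/ceph-0/current/<pg>_head/<object>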

I would like to check if anybody has experience with offloading the metadata to omap:
  1> Offload everything to omap? If so, should we reduce the XFS inode size to 512 bytes (instead of 2k)? (See the mkfs sketch below.)
  2> Partially offload the metadata to omap, e.g. offload only the rgw-specific metadata.
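
A minimal sketch of what the two inode-size choices look like at OSD creation time (the device name is illustrative; changing the inode size requires reformatting the data disk):

mkfs.xfs -i size=2048 /dev/sdb1   # current layout: 2k inodes, room for inline xattrs
mkfs.xfs -i size=512 /dev/sdb1    # smaller inodes, if (nearly) everything moves to omap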

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
       [not found] ` <BLU175-W13D5737A24429707F8F978DFA70-MsuGFMq8XAE@public.gmane.org>
@ 2015-06-16 18:38   ` Somnath Roy
  0 siblings, 0 replies; 13+ messages in thread
From: Somnath Roy @ 2015-06-16 18:38 UTC (permalink / raw)
  To: GuangYang, ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw

Guang,
Try playing around with the following config options, especially filestore_max_inline_xattr_size and filestore_max_inline_xattrs:

// Use omap for xattrs for attrs over
// filestore_max_inline_xattr_size or
OPTION(filestore_max_inline_xattr_size, OPT_U32, 0)     //Override
OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

// for more than filestore_max_inline_xattrs attrs
OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)

I think the behavior for XFS is that if there are more than 10 xattrs, it will use omap.
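
For example, to push the larger rgw xattrs into omap on an XFS-backed filestore, one could lower the XFS thresholds in ceph.conf (a sketch only; the values are illustrative, the defaults are shown above):

[osd]
    # xattr values larger than this spill to omap instead of XFS xattrs
    filestore_max_inline_xattr_size_xfs = 256
    # objects with more xattrs than this spill the extras to omap
    filestore_max_inline_xattrs_xfs = 6

Note that new thresholds should only affect xattrs written after the OSDs pick up the change; existing objects keep their current layout until they are rewritten.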

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org] On Behalf Of GuangYang
Sent: Tuesday, June 16, 2015 11:31 AM
To: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
Subject: [ceph-users] xattrs vs. omap with radosgw

Hi Cephers,
While looking at disk utilization on the OSDs, I noticed the disks were constantly busy with a large number of small writes. Further investigation showed that radosgw stores its metadata (e.g. etag, content-type, etc.) in xattrs, and these push the object's xattrs from the inode's local (inline) format out into extents, which incurs extra I/O.

I would like to check if anybody has experience with offloading the metadata to omap:
  1> Offload everything to omap? If so, should we reduce the XFS inode size to 512 bytes (instead of 2k)?
  2> Partially offload the metadata to omap, e.g. offload only the rgw-specific metadata.

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
  2015-06-16 18:31 xattrs vs. omap with radosgw GuangYang
       [not found] ` <BLU175-W13D5737A24429707F8F978DFA70-MsuGFMq8XAE@public.gmane.org>
@ 2015-06-16 19:43 ` Sage Weil
  2015-06-16 20:48   ` GuangYang
                     ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: Sage Weil @ 2015-06-16 19:43 UTC (permalink / raw)
  To: GuangYang; +Cc: ceph-devel, ceph-users


On Tue, 16 Jun 2015, GuangYang wrote:
> Hi Cephers,
> While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
> 
> I would like to check if anybody has experience with offloading the metadata to omap:
>   1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
>   2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
> 
> Any sharing is deeply appreciated. Thanks!

Hi Guang,

Is this hammer or firefly?

With hammer the size of object_info_t crossed the 255 byte boundary, which 
is the max xattr value that XFS can inline.  We've since merged something 
that stripes over several small xattrs so that we can keep things inline, 
but it hasn't been backported to hammer yet.  See
c6cdb4081e366f471b372102905a1192910ab2da.  Perhaps this is what you're 
seeing?

I think we're still better off with larger XFS inodes and inline xattrs if 
it means we avoid leveldb at all for most objects.
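
The idea of that striping change, as a toy sketch in Python (this is not the actual Ceph chain_xattr code; the @N naming and chunk size are only illustrative):

import os

CHUNK = 254  # keep each piece under the ~255-byte value size that XFS can inline

def set_striped_xattr(path, name, value):
    # Store one logical xattr as several small ones: the first piece keeps the
    # plain name, later pieces get an @N suffix (value is assumed non-empty).
    for i in range(0, len(value), CHUNK):
        piece = name if i == 0 else "%s@%d" % (name, i // CHUNK)
        os.setxattr(path, piece, value[i:i + CHUNK])

open("obj", "w").close()  # assumes a local xattr-capable filesystem (e.g. XFS)
# a 600-byte value ends up as user.rgw.manifest, user.rgw.manifest@1, user.rgw.manifest@2
set_striped_xattr("obj", "user.rgw.manifest", b"x" * 600)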

sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: xattrs vs. omap with radosgw
  2015-06-16 19:43 ` Sage Weil
@ 2015-06-16 20:48   ` GuangYang
  2015-06-16 20:51     ` Mark Nelson
  2015-06-17  1:32   ` Zhou, Yuan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: GuangYang @ 2015-06-16 20:48 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, ceph-users

Thanks Sage for the quick response.

It is on Firefly v0.80.4.

When putting objects with *rados* directly, the xattrs can stay inline. The problem comes to light when using radosgw, since we have a bunch of metadata to keep via xattrs, including:
   rgw.idtag  : 15 bytes
   rgw.manifest :  381 bytes
   rgw.acl : 121 bytes
   rgw.etag : 33 bytes

Given that background, it looks like the problem is that rgw.manifest is too large, so XFS moves it out to extents. If I understand correctly, once we port the change to Firefly we should be able to keep everything inline in the inode, since the accumulated size is still less than 2K (please correct me if I am wrong here).
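
(For reference, those rgw xattrs add up to roughly 15 + 381 + 121 + 33 = 550 bytes; even together with object_info_t and the other OSD-level xattrs, striping into <255-byte pieces should fit comfortably within a 2k inode.)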

Thanks,
Guang


----------------------------------------
> Date: Tue, 16 Jun 2015 12:43:08 -0700
> From: sage@newdream.net
> To: yguang11@outlook.com
> CC: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
>
> On Tue, 16 Jun 2015, GuangYang wrote:
>> Hi Cephers,
>> While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
>>
>> I would like to check if anybody has experience with offloading the metadata to omap:
>> 1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
>> 2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
>>
>> Any sharing is deeply appreciated. Thanks!
>
> Hi Guang,
>
> Is this hammer or firefly?
>
> With hammer the size of object_info_t crossed the 255 byte boundary, which
> is the max xattr value that XFS can inline. We've since merged something
> that stripes over several small xattrs so that we can keep things inline,
> but it hasn't been backported to hammer yet. See
> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
> seeing?
>
> I think we're still better off with larger XFS inodes and inline xattrs if
> it means we avoid leveldb at all for most objects.
>
> sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
  2015-06-16 20:48   ` GuangYang
@ 2015-06-16 20:51     ` Mark Nelson
       [not found]       ` <55808C60.8000706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Nelson @ 2015-06-16 20:51 UTC (permalink / raw)
  To: GuangYang, Sage Weil; +Cc: ceph-devel, ceph-users



On 06/16/2015 03:48 PM, GuangYang wrote:
> Thanks Sage for the quick response.
>
> It is on Firefly v0.80.4.
>
> While trying to put with *rados* directly, the xattrs can be inline. The problem comes to light when using radosgw, since we have a bunch of metadata to keep via xattrs, including:
>     rgw.idtag  : 15 bytes
>     rgw.manifest :  381 bytes

Ah, that manifest will push us over the limit afaik resulting in every 
inode getting a new extent.

>     rgw.acl : 121 bytes
>     rgw.etag : 33 bytes
>
> Given the background, it looks like the problem is that the rgw.manifest is too large so that XFS make it extents. If I understand correctly, if we port the change to Firefly, we should be able to inline the inode since the accumulated size is still less than 2K (please correct me if I am wrong here).

I think you are correct so long as the patch breaks that manifest down 
into 254 byte or smaller chunks.

>
> Thanks,
> Guang
>
>
> ----------------------------------------
>> Date: Tue, 16 Jun 2015 12:43:08 -0700
>> From: sage@newdream.net
>> To: yguang11@outlook.com
>> CC: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
>> Subject: Re: xattrs vs. omap with radosgw
>>
>> On Tue, 16 Jun 2015, GuangYang wrote:
>>> Hi Cephers,
>>> While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
>>>
>>> I would like to check if anybody has experience with offloading the metadata to omap:
>>> 1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
>>> 2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
>>>
>>> Any sharing is deeply appreciated. Thanks!
>>
>> Hi Guang,
>>
>> Is this hammer or firefly?
>>
>> With hammer the size of object_info_t crossed the 255 byte boundary, which
>> is the max xattr value that XFS can inline. We've since merged something
>> that stripes over several small xattrs so that we can keep things inline,
>> but it hasn't been backported to hammer yet. See
>> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
>> seeing?
>>
>> I think we're still better off with larger XFS inodes and inline xattrs if
>> it means we avoid leveldb at all for most objects.
>>
>> sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: xattrs vs. omap with radosgw
  2015-06-16 19:43 ` Sage Weil
  2015-06-16 20:48   ` GuangYang
@ 2015-06-17  1:32   ` Zhou, Yuan
       [not found]     ` <06681238D8946F44A60AA400760A1CBF01FDC834-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-06-17  7:32   ` Nathan Cutler
  2015-06-25  5:46   ` Pete Zaitcev
  3 siblings, 1 reply; 13+ messages in thread
From: Zhou, Yuan @ 2015-06-17  1:32 UTC (permalink / raw)
  To: Sage Weil, GuangYang; +Cc: ceph-devel, ceph-users

FWIW, there was some discussion in OpenStack Swift, and their performance tests showed that 255 bytes is not the best stripe size on recent XFS. They decided to use a large xattr boundary size (65535).

https://gist.github.com/smerritt/5e7e650abaa20599ff34


-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Wednesday, June 17, 2015 3:43 AM
To: GuangYang
Cc: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
Subject: Re: xattrs vs. omap with radosgw

On Tue, 16 Jun 2015, GuangYang wrote:
> Hi Cephers,
> While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
> 
> I would like to check if anybody has experience with offloading the metadata to omap:
>   1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
>   2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
> 
> Any sharing is deeply appreciated. Thanks!

Hi Guang,

Is this hammer or firefly?

With hammer the size of object_info_t crossed the 255 byte boundary, which is the max xattr value that XFS can inline.  We've since merged something that stripes over several small xattrs so that we can keep things inline, but it hasn't been backported to hammer yet.  See c6cdb4081e366f471b372102905a1192910ab2da.  Perhaps this is what you're seeing?

I think we're still better off with larger XFS inodes and inline xattrs if it means we avoid leveldb at all for most objects.

sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
       [not found]       ` <55808C60.8000706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-06-17  3:56         ` GuangYang
  0 siblings, 0 replies; 13+ messages in thread
From: GuangYang @ 2015-06-17  3:56 UTC (permalink / raw)
  To: Mark Nelson, Sage Weil
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-idqoXFIVOFJgJs9I8MT0rw



After back-porting Sage's patch to Giant, the radosgw xattrs can stay inline. I haven't run extensive testing yet; I will update once I have some performance data to share.

Thanks,
Guang

> Date: Tue, 16 Jun 2015 15:51:44 -0500
> From: mnelson@redhat.com
> To: yguang11@outlook.com; sage@newdream.net
> CC: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
> 
> 
> 
> On 06/16/2015 03:48 PM, GuangYang wrote:
> > Thanks Sage for the quick response.
> >
> > It is on Firefly v0.80.4.
> >
> > While trying to put with *rados* directly, the xattrs can be inline. The problem comes to light when using radosgw, since we have a bunch of metadata to keep via xattrs, including:
> >     rgw.idtag  : 15 bytes
> >     rgw.manifest :  381 bytes
> 
> Ah, that manifest will push us over the limit afaik resulting in every 
> inode getting a new extent.
> 
> >     rgw.acl : 121 bytes
> >     rgw.etag : 33 bytes
> >
> > Given the background, it looks like the problem is that the rgw.manifest is too large so that XFS make it extents. If I understand correctly, if we port the change to Firefly, we should be able to inline the inode since the accumulated size is still less than 2K (please correct me if I am wrong here).
> 
> I think you are correct so long as the patch breaks that manifest down 
> into 254 byte or smaller chunks.
> 
> >
> > Thanks,
> > Guang
> >
> >
> > ----------------------------------------
> >> Date: Tue, 16 Jun 2015 12:43:08 -0700
> >> From: sage@newdream.net
> >> To: yguang11@outlook.com
> >> CC: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
> >> Subject: Re: xattrs vs. omap with radosgw
> >>
> >> On Tue, 16 Jun 2015, GuangYang wrote:
> >>> Hi Cephers,
> >>> While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
> >>>
> >>> I would like to check if anybody has experience with offloading the metadata to omap:
> >>> 1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
> >>> 2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
> >>>
> >>> Any sharing is deeply appreciated. Thanks!
> >>
> >> Hi Guang,
> >>
> >> Is this hammer or firefly?
> >>
> >> With hammer the size of object_info_t crossed the 255 byte boundary, which
> >> is the max xattr value that XFS can inline. We've since merged something
> >> that stripes over several small xattrs so that we can keep things inline,
> >> but it hasn't been backported to hammer yet. See
> >> c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
> >> seeing?
> >>
> >> I think we're still better off with larger XFS inodes and inline xattrs if
> >> it means we avoid leveldb at all for most objects.
> >>
> >> sage


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
       [not found]     ` <06681238D8946F44A60AA400760A1CBF01FDC834-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-06-17  4:08       ` GuangYang
  2015-06-17  4:11       ` Sage Weil
  1 sibling, 0 replies; 13+ messages in thread
From: GuangYang @ 2015-06-17  4:08 UTC (permalink / raw)
  To: Zhou, Yuan, Sage Weil
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-idqoXFIVOFJgJs9I8MT0rw



Hi Yuan,
Thanks for sharing the link, it is an interesting read. My understanding of the test results is that, for a fixed total xattr size, a smaller stripe size incurs larger read latency, which kind of makes sense since there are more k-v pairs, and at that total size the data has to go to extents anyway.

Correct me if I am wrong here...

Thanks,
Guang

> From: yuan.zhou@intel.com
> To: sage@newdream.net; yguang11@outlook.com
> CC: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: RE: xattrs vs. omap with radosgw
> Date: Wed, 17 Jun 2015 01:32:35 +0000
> 
> FWIW, there was some discussion in OpenStack Swift and their performance tests showed 255 is not the best in recent XFS. They decided to use large xattr boundary size(65535).
> 
> https://gist.github.com/smerritt/5e7e650abaa20599ff34
> 
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Wednesday, June 17, 2015 3:43 AM
> To: GuangYang
> Cc: ceph-devel@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: xattrs vs. omap with radosgw
> 
> On Tue, 16 Jun 2015, GuangYang wrote:
>> Hi Cephers,
>> While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
>> 
>> I would like to check if anybody has experience with offloading the metadata to omap:
>>   1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
>>   2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
>> 
>> Any sharing is deeply appreciated. Thanks!
> 
> Hi Guang,
> 
> Is this hammer or firefly?
> 
> With hammer the size of object_info_t crossed the 255 byte boundary, which is the max xattr value that XFS can inline. We've since merged something that stripes over several small xattrs so that we can keep things inline, but it hasn't been backported to hammer yet. See c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're seeing?
> 
> I think we're still better off with larger XFS inodes and inline xattrs if it means we avoid leveldb at all for most objects.
> 
> sage


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
       [not found]     ` <06681238D8946F44A60AA400760A1CBF01FDC834-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2015-06-17  4:08       ` GuangYang
@ 2015-06-17  4:11       ` Sage Weil
  1 sibling, 0 replies; 13+ messages in thread
From: Sage Weil @ 2015-06-17  4:11 UTC (permalink / raw)
  To: Zhou, Yuan
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-idqoXFIVOFJgJs9I8MT0rw


On Wed, 17 Jun 2015, Zhou, Yuan wrote:
> FWIW, there was some discussion in OpenStack Swift and their performance tests showed 255 is not the best in recent XFS. They decided to use large xattr boundary size(65535).
> 
> https://gist.github.com/smerritt/5e7e650abaa20599ff34

If I read this correctly, the total metadata they are setting is pretty big:

PILE_O_METADATA = pickle.dumps(dict(
    ("attribute%d" % i, hashlib.sha512("thingy %d" % i).hexdigest())
    for i in range(200)))

So lots of small attrs won't really help since they'll have to spill out 
into other extents eventually no matter what.

In our case, we have big (2k) inodes and can easily fit everything in
there... as long as it is in <255-byte pieces.
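
(For scale: 200 sha512 hexdigests come to 200 x 128 = 25,600 bytes of attribute values alone, before keys and pickle overhead, so that workload could never fit in a 2k inode anyway.)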

sage


> 
> 
> -----Original Message-----
> From: ceph-devel-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:ceph-devel-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Sage Weil
> Sent: Wednesday, June 17, 2015 3:43 AM
> To: GuangYang
> Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> Subject: Re: xattrs vs. omap with radosgw
> 
> On Tue, 16 Jun 2015, GuangYang wrote:
> > Hi Cephers,
> > While looking at disk utilization on OSD, I noticed the disk was constantly busy with large number of small writes, further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), which made the xattrs get from local to extents, which incurred extra I/O.
> > 
> > I would like to check if anybody has experience with offloading the metadata to omap:
> >   1> Offload everything to omap? If this is the case, should we make the inode size as 512 (instead of 2k)?
> >   2> Partial offload the metadata to omap, e.g. only offloading the rgw specified metadata to omap.
> > 
> > Any sharing is deeply appreciated. Thanks!
> 
> Hi Guang,
> 
> Is this hammer or firefly?
> 
> With hammer the size of object_info_t crossed the 255 byte boundary, which is the max xattr value that XFS can inline.  We've since merged something that stripes over several small xattrs so that we can keep things inline, but it hasn't been backported to hammer yet.  See c6cdb4081e366f471b372102905a1192910ab2da.  Perhaps this is what you're seeing?
> 
> I think we're still better off with larger XFS inodes and inline xattrs if it means we avoid leveldb at all for most objects.
> 
> sage


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
  2015-06-16 19:43 ` Sage Weil
  2015-06-16 20:48   ` GuangYang
  2015-06-17  1:32   ` Zhou, Yuan
@ 2015-06-17  7:32   ` Nathan Cutler
       [not found]     ` <55812276.2040305-AlSwsSmVLrQ@public.gmane.org>
  2015-06-17 14:38     ` Sage Weil
  2015-06-25  5:46   ` Pete Zaitcev
  3 siblings, 2 replies; 13+ messages in thread
From: Nathan Cutler @ 2015-06-17  7:32 UTC (permalink / raw)
  To: Sage Weil; +Cc: GuangYang, ceph-devel, ceph-users

> We've since merged something 
> that stripes over several small xattrs so that we can keep things inline, 
> but it hasn't been backported to hammer yet.  See
> c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for hammer backport?

Nathan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
       [not found]     ` <55812276.2040305-AlSwsSmVLrQ@public.gmane.org>
@ 2015-06-17  9:25       ` Abhishek L
  0 siblings, 0 replies; 13+ messages in thread
From: Abhishek L @ 2015-06-17  9:25 UTC (permalink / raw)
  To: Nathan Cutler
  Cc: Sage Weil, ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	ceph-users-idqoXFIVOFJgJs9I8MT0rw

On Wed, Jun 17, 2015 at 1:02 PM, Nathan Cutler <ncutler-AlSwsSmVLrQ@public.gmane.org> wrote:
>> We've since merged something
>> that stripes over several small xattrs so that we can keep things inline,
>> but it hasn't been backported to hammer yet.  See
>> c6cdb4081e366f471b372102905a1192910ab2da.
>
> Hi Sage:
>
> You wrote "yet" - should we earmark it for hammer backport?
>
I'm guessing https://github.com/ceph/ceph/pull/4973 is the backport for hammer
(issue http://tracker.ceph.com/issues/11981)

Regards
Abhishek

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
  2015-06-17  7:32   ` Nathan Cutler
       [not found]     ` <55812276.2040305-AlSwsSmVLrQ@public.gmane.org>
@ 2015-06-17 14:38     ` Sage Weil
  1 sibling, 0 replies; 13+ messages in thread
From: Sage Weil @ 2015-06-17 14:38 UTC (permalink / raw)
  To: Nathan Cutler; +Cc: GuangYang, ceph-devel, ceph-users

On Wed, 17 Jun 2015, Nathan Cutler wrote:
> > We've since merged something 
> > that stripes over several small xattrs so that we can keep things inline, 
> > but it hasn't been backported to hammer yet.  See
> > c6cdb4081e366f471b372102905a1192910ab2da.
> 
> Hi Sage:
> 
> You wrote "yet" - should we earmark it for hammer backport?

Yes, please!

sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: xattrs vs. omap with radosgw
  2015-06-16 19:43 ` Sage Weil
                     ` (2 preceding siblings ...)
  2015-06-17  7:32   ` Nathan Cutler
@ 2015-06-25  5:46   ` Pete Zaitcev
  3 siblings, 0 replies; 13+ messages in thread
From: Pete Zaitcev @ 2015-06-25  5:46 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Tue, 16 Jun 2015 12:43:08 -0700 (PDT)
Sage Weil <sage@newdream.net> wrote:

> With hammer the size of object_info_t crossed the 255 byte boundary, which 
> is the max xattr value that XFS can inline.  We've since merged something 
> that stripes over several small xattrs so that we can keep things inline, 
> but it hasn't been backported to hammer yet.  See
> c6cdb4081e366f471b372102905a1192910ab2da.

Meanwhile, Swift stopped striping altogether:
 https://github.com/openstack/swift/commit/cc2f0f4ed6f12554b7d8e8cb61e14f2b103445a0

(but yes, it's still advantageous to fit into an inode)

-- Pete

^ permalink raw reply	[flat|nested] 13+ messages in thread

Thread overview: 13+ messages
2015-06-16 18:31 xattrs vs. omap with radosgw GuangYang
     [not found] ` <BLU175-W13D5737A24429707F8F978DFA70-MsuGFMq8XAE@public.gmane.org>
2015-06-16 18:38   ` Somnath Roy
2015-06-16 19:43 ` Sage Weil
2015-06-16 20:48   ` GuangYang
2015-06-16 20:51     ` Mark Nelson
     [not found]       ` <55808C60.8000706-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-06-17  3:56         ` GuangYang
2015-06-17  1:32   ` Zhou, Yuan
     [not found]     ` <06681238D8946F44A60AA400760A1CBF01FDC834-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-06-17  4:08       ` GuangYang
2015-06-17  4:11       ` Sage Weil
2015-06-17  7:32   ` Nathan Cutler
     [not found]     ` <55812276.2040305-AlSwsSmVLrQ@public.gmane.org>
2015-06-17  9:25       ` Abhishek L
2015-06-17 14:38     ` Sage Weil
2015-06-25  5:46   ` Pete Zaitcev
