* uint32_t BlueStore::Extent::logical_offset?
@ 2016-11-22 16:58 Igor Fedotov
  2016-11-22 17:06 ` Gregory Farnum
  2016-11-22 21:47 ` Sage Weil
  0 siblings, 2 replies; 8+ messages in thread
From: Igor Fedotov @ 2016-11-22 16:58 UTC (permalink / raw)
  To: Sage Weil, ceph-devel

Hi Sage,


I'm wondering why BlueStore::Extent::logical_offset is only 32 bits wide.

IMHO it should be uint64_t unless we limit onode size to 4 GB.

It looks like we currently truncate implicitly in set_lextent/new Extent,
so issues with large onodes are possible.

Thanks,
Igor
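
The truncation concern raised above can be sketched as follows. This is a minimal illustration, assuming a stand-in struct (DemoExtent) and helper names (make_extent, make_extent_checked, fits_in_32_bits) that are hypothetical and not BlueStore's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an extent record; not BlueStore's real Extent.
struct DemoExtent {
  uint32_t logical_offset;  // 32-bit field, as in the question above
  uint32_t length;
};

// True if a 64-bit offset survives narrowing to 32 bits unchanged.
inline bool fits_in_32_bits(uint64_t offset) {
  return offset <= UINT32_MAX;
}

// Accepts a 64-bit offset and stores it in the 32-bit field.
// Only the low 32 bits are kept; nothing signals the loss.
inline DemoExtent make_extent(uint64_t offset, uint32_t length) {
  DemoExtent e;
  e.logical_offset = static_cast<uint32_t>(offset);
  e.length = length;
  return e;
}

// A checked variant that refuses offsets the field cannot represent.
inline DemoExtent make_extent_checked(uint64_t offset, uint32_t length) {
  assert(fits_in_32_bits(offset) && "offset does not fit in 32 bits");
  return make_extent(offset, length);
}
```

With these definitions, an offset of 4 GiB + 4 KiB (0x100001000) passed to make_extent ends up stored as 0x1000, which is the silent wrap-around the question is about.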


* Re: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 16:58 uint32_t BlueStore::Extent::logical_offset? Igor Fedotov
@ 2016-11-22 17:06 ` Gregory Farnum
  2016-11-22 17:15   ` Igor Fedotov
  2016-11-22 21:47 ` Sage Weil
  1 sibling, 1 reply; 8+ messages in thread
From: Gregory Farnum @ 2016-11-22 17:06 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: Sage Weil, ceph-devel

On Tue, Nov 22, 2016 at 11:58 AM, Igor Fedotov <ifedotov@mirantis.com> wrote:
> Hi Sage,
>
>
> I'm wondering why BlueStore::Extent::logical_offset is 32-bit wide.
>
> IMHO it's to be uint64_t unless we limit onode size to 4Gb.
>
> Looks like we have implicit truncate when doing set_lextent/new Extent at
> the moment and hence some issues with large onodes are possible.

onodes represent a single RADOS object within BlueStore, don't they?
So 4GB is much more than we need; the rest of the system on top is
going to fail well before you reach that size.
-Greg


* Re: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 17:06 ` Gregory Farnum
@ 2016-11-22 17:15   ` Igor Fedotov
  0 siblings, 0 replies; 8+ messages in thread
From: Igor Fedotov @ 2016-11-22 17:15 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, ceph-devel

But ObjectStore provides a 64-bit interface, so such a limit is only
implicit.



* Re: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 16:58 uint32_t BlueStore::Extent::logical_offset? Igor Fedotov
  2016-11-22 17:06 ` Gregory Farnum
@ 2016-11-22 21:47 ` Sage Weil
  2016-11-22 21:53   ` Somnath Roy
  1 sibling, 1 reply; 8+ messages in thread
From: Sage Weil @ 2016-11-22 21:47 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: ceph-devel

On Tue, 22 Nov 2016, Igor Fedotov wrote:
> Hi Sage,
> 
> 
> I'm wondering why BlueStore::Extent::logical_offset is 32-bit wide.
> 
> IMHO it's to be uint64_t unless we limit onode size to 4Gb.
> 
> Looks like we have implicit truncate when doing set_lextent/new Extent at the
> moment and hence some issues with large onodes are possible.

The max object size enforced in the OSD is ~128 MB (or in that 
neighborhood, if I remember correctly).  We really shouldn't be storing 
individual rados objects that are orders of magnitude larger than that.

sage


* RE: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 21:47 ` Sage Weil
@ 2016-11-22 21:53   ` Somnath Roy
  2016-11-22 22:42     ` Matt Benjamin
  0 siblings, 1 reply; 8+ messages in thread
From: Somnath Roy @ 2016-11-22 21:53 UTC (permalink / raw)
  To: Sage Weil, Igor Fedotov; +Cc: ceph-devel

Sage,
OSD max object size in the latest master is 100G; it used to be smaller.

OPTION(osd_max_object_size, OPT_U64, 100*1024L*1024L*1024L) // OSD's maximum object size

Thanks & Regards
Somnath
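
The arithmetic behind the quoted default can be checked directly. A minimal sketch, assuming constant names (kMaxObjectSize, kOffsetSpace32, kRatio) that are made up for illustration; only the 100*1024*1024*1024 value comes from the OPTION line above:

```cpp
#include <cstdint>

// Value quoted above for osd_max_object_size: 100 GiB.
constexpr uint64_t kMaxObjectSize = 100ULL * 1024 * 1024 * 1024;

// Number of distinct byte offsets a uint32_t can address: 2^32 (4 GiB).
constexpr uint64_t kOffsetSpace32 = 1ULL << 32;

// The configured limit does not fit in the 32-bit offset space.
static_assert(kMaxObjectSize > kOffsetSpace32,
              "100 GiB exceeds what a 32-bit offset can address");

// How many times larger the limit is than the 32-bit offset space.
constexpr uint64_t kRatio = kMaxObjectSize / kOffsetSpace32;
```

So with this default, an object at the configured maximum would span 25 times the range a 32-bit logical offset can express.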


* Re: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 21:53   ` Somnath Roy
@ 2016-11-22 22:42     ` Matt Benjamin
  2016-11-22 22:45       ` Sage Weil
  0 siblings, 1 reply; 8+ messages in thread
From: Matt Benjamin @ 2016-11-22 22:42 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Sage Weil, Igor Fedotov, ceph-devel

It would seem preferable not to bake in such a limit on extent size, even if offsets exceeding that won't immediately be used.

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


* Re: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 22:42     ` Matt Benjamin
@ 2016-11-22 22:45       ` Sage Weil
  2016-11-22 23:02         ` Matt Benjamin
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2016-11-22 22:45 UTC (permalink / raw)
  To: Matt Benjamin; +Cc: Somnath Roy, Igor Fedotov, ceph-devel

On Tue, 22 Nov 2016, Matt Benjamin wrote:
> It would seem preferable not to bake-in such a limit on extent size, 
> even if offsets exceeding that won't immediately be used.

All things being equal, sure.  But 4 GB is 1-2 orders of magnitude larger 
than we recommend or design for, and it costs us memory.
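
The memory cost can be made concrete with a minimal sketch. The two structs and their field set (Extent32, Extent64, blob_offset, length) are assumptions chosen for illustration, not BlueStore's actual in-memory layout, and exact sizes depend on the platform ABI:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical extent record with a 32-bit logical offset.
struct Extent32 {
  uint32_t logical_offset;
  uint32_t blob_offset;
  uint32_t length;
};

// The same record with the logical offset widened to 64 bits.
struct Extent64 {
  uint64_t logical_offset;
  uint32_t blob_offset;
  uint32_t length;
};

// Extra bytes needed to hold `count` extents after widening the offset.
constexpr std::size_t extra_bytes(std::size_t count) {
  return count * (sizeof(Extent64) - sizeof(Extent32));
}
```

On common 64-bit ABIs the narrow record is typically 12 bytes and the wide one 16, so the per-extent overhead is small but is paid for every extent held in memory.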

FWIW, the logical_offset is varint encoded on disk, so the disk format is 
actually unsized.

In the meantime, I think that 100GB object size limit should really be 
more like 100MB!

sage
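
The point about the on-disk format being unsized can be illustrated with a generic LEB128-style varint. This is a sketch of the general technique only; the function names are hypothetical and this is not Ceph's own encoder:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic LEB128-style varint: 7 payload bits per byte, with the high bit
// set on every byte except the last. Illustrative only; not Ceph's encoder.
inline std::vector<uint8_t> varint_encode(uint64_t v) {
  std::vector<uint8_t> out;
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
  return out;
}

// Decodes one complete varint held in `in`.
inline uint64_t varint_decode(const std::vector<uint8_t>& in) {
  assert(!in.empty() && in.size() <= 10);  // a uint64_t needs at most 10 bytes
  uint64_t v = 0;
  unsigned shift = 0;
  for (uint8_t byte : in) {
    v |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  }
  return v;
}
```

In an encoding of this kind the byte count follows the value rather than the declared field width, so an offset above 4 GiB simply takes more bytes and small offsets stay compact. Widening the in-memory type would therefore not by itself change how existing values are stored.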



* Re: uint32_t BlueStore::Extent::logical_offset?
  2016-11-22 22:45       ` Sage Weil
@ 2016-11-22 23:02         ` Matt Benjamin
  0 siblings, 0 replies; 8+ messages in thread
From: Matt Benjamin @ 2016-11-22 23:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: Somnath Roy, Igor Fedotov, ceph-devel

It would seem like folks attempting to use bluestore in "ways we never thought of" might run into this fairly quickly.  Maybe it's not as much a hard limit as I'm imagining with reference to file systems, though.

Matt

