* uint32_t BlueStore::Extent::logical_offset?
From: Igor Fedotov @ 2016-11-22 16:58 UTC
To: Sage Weil, ceph-devel

Hi Sage,

I'm wondering why BlueStore::Extent::logical_offset is only 32 bits wide.

IMHO it should be uint64_t unless we limit onode size to 4 GB.

It looks like we implicitly truncate when doing set_lextent/new Extent at the moment, so some issues with large onodes are possible.

Thanks,
Igor
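To make the truncation concern above concrete, here is a minimal sketch of what happens when a 64-bit offset is stored in a 32-bit field. `ExtentSketch`, `make_extent_unchecked` and `make_extent_checked` are hypothetical names invented for this illustration; they are not BlueStore's actual types or functions, and the real set_lextent/Extent code may well behave differently.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for an in-memory extent record (NOT BlueStore's struct).
struct ExtentSketch {
  uint32_t logical_offset;  // 32-bit field, as questioned in the thread
  uint32_t length;
};

// Narrowing without a check: the upper 32 bits of the offset are silently lost.
inline ExtentSketch make_extent_unchecked(uint64_t offset, uint32_t length) {
  return ExtentSketch{static_cast<uint32_t>(offset), length};
}

// Checked variant: reject any offset that does not fit in 32 bits.
inline ExtentSketch make_extent_checked(uint64_t offset, uint32_t length) {
  if (offset > std::numeric_limits<uint32_t>::max()) {
    throw std::out_of_range("logical offset does not fit in 32 bits");
  }
  return ExtentSketch{static_cast<uint32_t>(offset), length};
}
```

With the unchecked variant, an offset of 4 GiB + 4096 ends up stored as 4096, which would alias an extent near the start of the object. The checked variant turns that silent wraparound into an explicit error, which is one possible way to enforce a 4 GB limit if the field stays 32-bit.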
* Re: uint32_t BlueStore::Extent::logical_offset?
From: Gregory Farnum @ 2016-11-22 17:06 UTC
To: Igor Fedotov
Cc: Sage Weil, ceph-devel

On Tue, Nov 22, 2016 at 11:58 AM, Igor Fedotov <ifedotov@mirantis.com> wrote:
> Hi Sage,
>
> I'm wondering why BlueStore::Extent::logical_offset is 32-bit wide.
>
> IMHO it's to be uint64_t unless we limit onode size to 4Gb.
>
> Looks like we have implicit truncate when doing set_lextent/new Extent at
> the moment and hence some issues with large onodes are possible.

onodes represent a single RADOS object within BlueStore, don't they?
So 4GB is much more than we need; the rest of the system on top is
going to fail well before you reach that size.
-Greg
* Re: uint32_t BlueStore::Extent::logical_offset?
From: Igor Fedotov @ 2016-11-22 17:15 UTC
To: Gregory Farnum
Cc: Sage Weil, ceph-devel

But ObjectStore provides a 64-bit interface, hence such a limit is pretty implicit.

On 22.11.2016 20:06, Gregory Farnum wrote:
> onodes represent a single RADOS object within BlueStore, don't they?
> So 4GB is much more than we need; the rest of the system on top is
> going to fail well before you reach that size.
> -Greg
* Re: uint32_t BlueStore::Extent::logical_offset?
From: Sage Weil @ 2016-11-22 21:47 UTC
To: Igor Fedotov
Cc: ceph-devel

On Tue, 22 Nov 2016, Igor Fedotov wrote:
> Hi Sage,
>
> I'm wondering why BlueStore::Extent::logical_offset is 32-bit wide.
>
> IMHO it's to be uint64_t unless we limit onode size to 4Gb.
>
> Looks like we have implicit truncate when doing set_lextent/new Extent at the
> moment and hence some issues with large onodes are possible.

The max object size enforced in the OSD is ~128 MB (or in that
neighborhood, if I remember correctly). We really shouldn't be storing
individual rados objects that are orders of magnitude larger than that.

sage
* RE: uint32_t BlueStore::Extent::logical_offset?
From: Somnath Roy @ 2016-11-22 21:53 UTC
To: Sage Weil, Igor Fedotov
Cc: ceph-devel

Sage,
The OSD max object size in the latest master is 100G; it used to be smaller.

OPTION(osd_max_object_size, OPT_U64, 100*1024L*1024L*1024L) // OSD's maximum object size

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
Sent: Tuesday, November 22, 2016 1:47 PM
To: Igor Fedotov
Cc: ceph-devel
Subject: Re: uint32_t BlueStore::Extent::logical_offset?

The max object size enforced in the OSD is ~128 MB (or in that neighborhood, if I remember correctly). We really shouldn't be storing individual rados objects that are orders of magnitude larger than that.

sage
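The arithmetic behind the mismatch reported above can be checked at compile time. This is only a sketch: the constant and function names (`kGiB`, `kConfiguredMaxObjectSize`, `kMaxOffset32`, `fits_in_32_bit_offset`) are made up for illustration, and the 100 GiB figure is taken from the OPTION line quoted in the message, written here with 64-bit constants so the result does not depend on the width of `long`.

```cpp
#include <cstdint>
#include <limits>

constexpr uint64_t kGiB = 1024ULL * 1024ULL * 1024ULL;

// Value of the quoted osd_max_object_size default: 100 * 1024 * 1024 * 1024.
constexpr uint64_t kConfiguredMaxObjectSize = 100 * kGiB;

// Largest byte offset a uint32_t logical_offset can represent.
constexpr uint64_t kMaxOffset32 = std::numeric_limits<uint32_t>::max();

// True if every byte offset inside an object of this size fits in 32 bits.
constexpr bool fits_in_32_bit_offset(uint64_t object_size) {
  return object_size == 0 || object_size - 1 <= kMaxOffset32;
}

static_assert(!fits_in_32_bit_offset(kConfiguredMaxObjectSize),
              "a 100 GiB object has offsets beyond a 32-bit field");
static_assert(fits_in_32_bit_offset(4 * kGiB),
              "a 4 GiB object is the largest that still fits");
static_assert(fits_in_32_bit_offset(128ULL * 1024 * 1024),
              "a 128 MiB object fits comfortably");
```

In other words, the configured limit is 25 times larger than what a 32-bit offset can address, so the two values only agree if the limit is lowered or the field is widened.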
* Re: uint32_t BlueStore::Extent::logical_offset?
From: Matt Benjamin @ 2016-11-22 22:42 UTC
To: Somnath Roy
Cc: Sage Weil, Igor Fedotov, ceph-devel

It would seem preferable not to bake in such a limit on extent size, even if offsets exceeding that won't immediately be used.

Matt

----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@sandisk.com>
> To: "Sage Weil" <sage@newdream.net>, "Igor Fedotov" <ifedotov@mirantis.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Tuesday, November 22, 2016 4:53:18 PM
> Subject: RE: uint32_t BlueStore::Extent::logical_offset?
>
> Sage,
> OSD max obect size in the latest master is 100G , it used to be smaller..
>
> OPTION(osd_max_object_size, OPT_U64, 100*1024L*1024L*1024L) // OSD's maximum
> object size
>
> Thanks & Regards
> Somnath

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
* Re: uint32_t BlueStore::Extent::logical_offset?
From: Sage Weil @ 2016-11-22 22:45 UTC
To: Matt Benjamin
Cc: Somnath Roy, Igor Fedotov, ceph-devel

On Tue, 22 Nov 2016, Matt Benjamin wrote:
> It would seem preferable not to bake-in such a limit on extent size,
> even if offsets exceeding that won't immediately be used.

All things being equal, sure. But 4 GB is 1-2 orders of magnitude larger
than we recommend or design for, and a wider offset costs us memory.

FWIW, the logical_offset is varint encoded on disk, so the disk format is
actually unsized.

In the meantime, I think that 100 GB object size limit should really be
more like 100 MB!

sage
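The point about the on-disk format being "unsized" can be illustrated with a generic variable-length integer encoding. The sketch below uses a common LEB128-style scheme (7 payload bits per byte, high bit as a continuation flag); it is an assumption chosen for illustration and is not claimed to match the encoding routines Ceph actually uses. `put_varint_sketch` and `get_varint_sketch` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append v as 7-bit groups, lowest group first; a set high bit means
// "more bytes follow". Small values take fewer bytes than large ones.
inline void put_varint_sketch(uint64_t v, std::vector<uint8_t>& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
}

// Decode one value starting at pos and advance pos past it.
// Returns false if the input ends early or the value is longer than 10 bytes.
inline bool get_varint_sketch(const std::vector<uint8_t>& in, size_t& pos,
                              uint64_t& v) {
  uint64_t result = 0;
  for (unsigned shift = 0; shift < 64 && pos < in.size(); shift += 7) {
    const uint8_t byte = in[pos++];
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      v = result;
      return true;
    }
  }
  return false;
}
```

Under this scheme the encoded bytes carry no fixed field width, so the same byte stream can be decoded into a 32-bit or a 64-bit in-memory field. That is why widening the in-memory type later would not, by itself, require an on-disk format change.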
* Re: uint32_t BlueStore::Extent::logical_offset?
From: Matt Benjamin @ 2016-11-22 23:02 UTC
To: Sage Weil
Cc: Somnath Roy, Igor Fedotov, ceph-devel

It would seem like folks attempting to use bluestore in "ways we never thought of" might run into this fairly quickly. Maybe it's not as much a hard limit as I'm imagining with reference to file systems, though.

Matt

----- Original Message -----
> From: "Sage Weil" <sage@newdream.net>
> To: "Matt Benjamin" <mbenjamin@redhat.com>
> Cc: "Somnath Roy" <Somnath.Roy@sandisk.com>, "Igor Fedotov" <ifedotov@mirantis.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Tuesday, November 22, 2016 5:45:14 PM
> Subject: Re: uint32_t BlueStore::Extent::logical_offset?
>
> On Tue, 22 Nov 2016, Matt Benjamin wrote:
> > It would seem preferable not to bake-in such a limit on extent size,
> > even if offsets exceeding that won't immediately be used.
>
> All things being equal, sure. But 4gb is 1-2 orders of magnitude larger
> than we recommend or design for, and it costs us memory.
>
> FWIW, the logical_offset is varint encoded on disk, so the disk format is
> actually unsized.
>
> In the meantime, I think that 100GB object size limit should really be
> more like 100MB!
>
> sage

--
Matt Benjamin
Red Hat, Inc.
Thread overview: 8+ messages (end of thread)
2016-11-22 16:58 uint32_t BlueStore::Extent::logical_offset? Igor Fedotov
2016-11-22 17:06 ` Gregory Farnum
2016-11-22 17:15   ` Igor Fedotov
2016-11-22 21:47 ` Sage Weil
2016-11-22 21:53   ` Somnath Roy
2016-11-22 22:42     ` Matt Benjamin
2016-11-22 22:45       ` Sage Weil
2016-11-22 23:02         ` Matt Benjamin