* Cache Layer on NVME driver
@ 2016-05-18 6:44 Rajath Shashidhara
[not found] ` <CACJqLyYGwhymjaVSoAGArPmPzFBmUHRrjGAtiQXKr6gEv_ZsSw@mail.gmail.com>
0 siblings, 1 reply; 11+ messages in thread
From: Rajath Shashidhara @ 2016-05-18 6:44 UTC (permalink / raw)
To: ceph-devel; +Cc: Haomai Wang
Hello,
I was a GSoC'16 applicant for the project "Implementing Cache layer on
top of NVME Driver". Unfortunately, I was not selected for the
internship.
However, I would be interested in working on the project as an
independent contributor to Ceph.
I hope to receive the necessary support from the Ceph developer community.
In case I missed any important details in my project proposal, or I
have misunderstood the project, please help me figure out
the details.
Looking forward to contributing!
Thank you,
Rajath Shashidhara
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Cache Layer on NVME driver
[not found] ` <CACJqLyYGwhymjaVSoAGArPmPzFBmUHRrjGAtiQXKr6gEv_ZsSw@mail.gmail.com>
@ 2016-05-18 13:32 ` Haomai Wang
2016-05-18 17:19 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Haomai Wang @ 2016-05-18 13:32 UTC (permalink / raw)
To: Rajath Shashidhara; +Cc: ceph-devel
Hi Rajath,
We are glad to see your passion. From my view, Sage is planning to
implement a userspace cache in BlueStore itself; see
(https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
I guess the cache won't be a generic cache interface; instead, it will
be bound to the specific objects that need it. So Sage may give a brief?
On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
>
>
> On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
> <rajath.shashidhara@gmail.com> wrote:
>>
>> Hello,
>>
>> I was a GSoC'16 applicant for the project "Implementing Cache layer on
>> top of NVME Driver". Unfortunately, I was not selected for the
>> internship.
>
>
> Hi Rajath,
>
> We are glad to see your passion, from my view, sage is planning to implement
> a userspace cache in bluestore itself. Like
> (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>
> I guess the cache won't be a generic cache interface. Instead it will be
> bound to specified needed object. So sage may give a brief?
>
>>
>>
>> However, I would be interested in working on the project as an
>> independent contributor to Ceph.
>> I am expecting to receive the necessary support from Ceph developer
>> community.
>>
>> In case I missed out any important details in my project proposal or I
>> have the wrong understanding of the project, please help me figure out
>> the details.
>>
>> Looking forward to contribute!
>>
>> Thank you,
>> Rajath Shashidhara
>
>
* Re: Cache Layer on NVME driver
2016-05-18 13:32 ` Haomai Wang
@ 2016-05-18 17:19 ` Sage Weil
2016-05-18 23:50 ` Jianjian Huo
2016-05-19 5:12 ` Rajath Shashidhara
0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2016-05-18 17:19 UTC (permalink / raw)
To: Haomai Wang; +Cc: Rajath Shashidhara, ceph-devel
Hi Rajath!
Great to hear you're interested in working with us outside of GSoC!
On Wed, 18 May 2016, Haomai Wang wrote:
> Hi Rajath,
>
> We are glad to see your passion, from my view, sage is planning to
> implement a userspace cache in bluestore itself. Like
> (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>
> I guess the cache won't be a generic cache interface. Instead it will
> be bound to specified needed object. So sage may give a brief?
Part of the reason why this project wasn't at the top of our list (we got
fewer slots than we had projects) was because the BlueStore code is in
flux and moving quite quickly. For the BlueStore side, we are building a
simple buffer cache that is tied to an Onode (in-memory per-object
metadata structure) and integrated tightly with the read and write IO
paths. This will eliminate our use of the block device buffer cache for
user/object data.
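[Editorial sketch: a minimal illustration of what "a simple buffer cache tied to an Onode" could look like. The Onode/Buffer names come from the mail above; the layout and methods are hypothetical, not BlueStore's actual code.]

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-object buffer cache: extents keyed by offset, owned by
// the in-memory Onode so the read/write paths can consult it directly.
struct Buffer {
  uint64_t offset;
  std::string data;  // stand-in for a bufferlist
};

struct Onode {
  std::map<uint64_t, Buffer> buffers;  // offset -> cached extent

  // Write path: remember the data we just wrote.
  void cache_write(uint64_t off, const std::string& d) {
    buffers[off] = Buffer{off, d};
  }

  // Read path: serve from cache if the exact extent is present,
  // otherwise the caller falls through to the device.
  const Buffer* cache_read(uint64_t off) const {
    auto it = buffers.find(off);
    return it == buffers.end() ? nullptr : &it->second;
  }
};
```

Because the cache hangs off the Onode, no extra locking is needed beyond what already protects the Onode itself, which matches the lock-free property described below.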
The other half of the picture, though, is the BlueFS layer that is
consumed by rocksdb: it also needs caching in order for rocksdb to perform
at all. My hope is that the code we write for the user data can be re-used
here as well, but it is still evolving.
The main missing piece I'd say is a way to string Buffer objects together
in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
policy that makes sense) so that trimming can be done safely and
efficiently. Right now the code is lock-free because each Onode is only
touched under the collection rwlock, but in order to do trimming we need
to be able to reap cold buffers from a global context.
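[Editorial sketch: the "string Buffer objects together in a global LRU" idea could look roughly like this. Only the Buffer/Onode vocabulary comes from the mail; the std::list stand-in for an intrusive list and the trim policy are hypothetical illustration.]

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical global LRU: each cached buffer still belongs to its Onode,
// but is also linked into one shared list so a trimmer can reap cold
// buffers across all objects without walking every Onode.
struct Buffer {
  std::string data;
  std::list<Buffer*>::iterator lru_it;  // position in the global LRU
};

class GlobalBufferLRU {
  std::list<Buffer*> lru_;  // front = hottest, back = coldest
 public:
  void insert(Buffer* b) {
    lru_.push_front(b);
    b->lru_it = lru_.begin();
  }
  void touch(Buffer* b) {  // move to front on access
    lru_.splice(lru_.begin(), lru_, b->lru_it);
  }
  // Reap the coldest buffers until at most max_buffers remain. In real
  // code this is the hard part: it would have to coordinate with the
  // per-collection rwlocks that otherwise make Onode access lock-free.
  size_t trim(size_t max_buffers) {
    size_t reaped = 0;
    while (lru_.size() > max_buffers) {
      Buffer* cold = lru_.back();
      lru_.pop_back();
      cold->data.clear();  // stand-in for freeing the buffer's memory
      ++reaped;
    }
    return reaped;
  }
  size_t size() const { return lru_.size(); }
};
```

The stored iterator plays the role of the intrusive list hooks Ceph actually favors: `trim()` can walk from the cold end and unlink a buffer in O(1) without knowing which Onode owns it.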
Anyway, there is no clear or ready answer here yet, but we are ready to
discuss design/approach here on the list, and welcome your input (and
potentially, contributions to development!).
sage
>
> On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
> >
> >
> > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
> > <rajath.shashidhara@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
> >> top of NVME Driver". Unfortunately, I was not selected for the
> >> internship.
> >
> >
> > Hi Rajath,
> >
> > We are glad to see your passion, from my view, sage is planning to implement
> > a userspace cache in bluestore itself. Like
> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
> >
> > I guess the cache won't be a generic cache interface. Instead it will be
> > bound to specified needed object. So sage may give a brief?
> >
> >>
> >>
> >> However, I would be interested in working on the project as an
> >> independent contributor to Ceph.
> >> I am expecting to receive the necessary support from Ceph developer
> >> community.
> >>
> >> In case I missed out any important details in my project proposal or I
> >> have the wrong understanding of the project, please help me figure out
> >> the details.
> >>
> >> Looking forward to contribute!
> >>
> >> Thank you,
> >> Rajath Shashidhara
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
* RE: Cache Layer on NVME driver
2016-05-18 17:19 ` Sage Weil
@ 2016-05-18 23:50 ` Jianjian Huo
2016-05-19 1:35 ` Haomai Wang
2016-05-19 6:43 ` Sage Weil
2016-05-19 5:12 ` Rajath Shashidhara
1 sibling, 2 replies; 11+ messages in thread
From: Jianjian Huo @ 2016-05-18 23:50 UTC (permalink / raw)
To: Sage Weil, Haomai Wang; +Cc: Rajath Shashidhara, ceph-devel
On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@newdream.net> wrote:
>
> Hi Rajath!
>
> Great to hear you're interested in working with us outside of GSoC!
>
> On Wed, 18 May 2016, Haomai Wang wrote:
> > Hi Rajath,
> >
> > We are glad to see your passion, from my view, sage is planning to
> > implement a userspace cache in bluestore itself. Like
> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
> >
> > I guess the cache won't be a generic cache interface. Instead it will
> > be bound to specified needed object. So sage may give a brief?
>
> Part of the reason why this project wasn't at the top of our list (we got
> fewer slots than we had projects) was because the BlueStore code is in
> flux and moving quite quickly. For the BlueStore side, we are building a
> simple buffer cache that is tied to an Onode (in-memory per-object
> metadata structure) and integrated tightly with the read and write IO
> paths. This will eliminate our use of the block device buffer cache for
> user/object data.
>
> The other half of the picture, though, is the BlueFS layer that is
> consumed by rocksdb: it also needs caching in order for rocksdb to perform
> at all. My hope is that the code we write for the use data can be re-used
> here as well, but it is still evolving.
When BlueStore moves away from the kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
RocksDB has a size-configurable block cache for uncompressed data blocks; it can serve as the buffer cache,
since BlueStore doesn't compress metadata in RocksDB.
Jianjian
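[Editorial sketch: the block-cache knob referred to above is set through rocksdb's table options. The option names below are the stock rocksdb C++ API; the 1 GiB size and the helper function are illustrative assumptions, not Ceph's actual BlueStore configuration.]

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Hypothetical configuration: give rocksdb a large LRU block cache for
// uncompressed data blocks, so metadata reads can be served from memory
// instead of going back down to the device.
rocksdb::Options make_options() {
  rocksdb::Options options;
  rocksdb::BlockBasedTableOptions table_options;
  // Size-configurable block cache (here an assumed 1 GiB).
  table_options.block_cache = rocksdb::NewLRUCache(1ULL << 30);
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```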
>
> The main missing piece I'd say is a way to string Buffer objects together
> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
> policy that makes sense) so that trimming can be done safely and
> efficiently. Right now the code is lock-free because each Onode is only
> touched under the collection rwlock, but in order to do trimming we need
> to be able to reap cold buffers from a global context.
>
> Anyway, there is no clear or ready answer here yet, but we are ready to
> discuss design/approach here on the list, and welcome your input (and
> potentially, contributions to development!).
>
> sage
>
>
> >
> > On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
> > >
> > >
> > > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
> > > <rajath.shashidhara@gmail.com> wrote:
> > >>
> > >> Hello,
> > >>
> > >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
> > >> top of NVME Driver". Unfortunately, I was not selected for the
> > >> internship.
> > >
> > >
> > > Hi Rajath,
> > >
> > > We are glad to see your passion, from my view, sage is planning to implement
> > > a userspace cache in bluestore itself. Like
> > > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
> > >
> > > I guess the cache won't be a generic cache interface. Instead it will be
> > > bound to specified needed object. So sage may give a brief?
> > >
> > >>
> > >>
> > >> However, I would be interested in working on the project as an
> > >> independent contributor to Ceph.
> > >> I am expecting to receive the necessary support from Ceph developer
> > >> community.
> > >>
> > >> In case I missed out any important details in my project proposal or I
> > >> have the wrong understanding of the project, please help me figure out
> > >> the details.
> > >>
> > >> Looking forward to contribute!
> > >>
> > >> Thank you,
> > >> Rajath Shashidhara
> > >
> > >
* Re: Cache Layer on NVME driver
2016-05-18 23:50 ` Jianjian Huo
@ 2016-05-19 1:35 ` Haomai Wang
2016-05-19 2:39 ` Jianjian Huo
2016-05-19 6:43 ` Sage Weil
1 sibling, 1 reply; 11+ messages in thread
From: Haomai Wang @ 2016-05-19 1:35 UTC (permalink / raw)
To: Jianjian Huo; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel
On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>
> On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@newdream.net> wrote:
>>
>> Hi Rajath!
>>
>> Great to hear you're interested in working with us outside of GSoC!
>>
>> On Wed, 18 May 2016, Haomai Wang wrote:
>> > Hi Rajath,
>> >
>> > We are glad to see your passion, from my view, sage is planning to
>> > implement a userspace cache in bluestore itself. Like
>> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>> >
>> > I guess the cache won't be a generic cache interface. Instead it will
>> > be bound to specified needed object. So sage may give a brief?
>>
>> Part of the reason why this project wasn't at the top of our list (we got
>> fewer slots than we had projects) was because the BlueStore code is in
>> flux and moving quite quickly. For the BlueStore side, we are building a
>> simple buffer cache that is tied to an Onode (in-memory per-object
>> metadata structure) and integrated tightly with the read and write IO
>> paths. This will eliminate our use of the block device buffer cache for
>> user/object data.
>>
>> The other half of the picture, though, is the BlueFS layer that is
>> consumed by rocksdb: it also needs caching in order for rocksdb to perform
>> at all. My hope is that the code we write for the use data can be re-used
>> here as well, but it is still evolving.
>
> When Bluestore moves away from kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
> RocksDB has this size configurable block cache to cache uncompressed data blocks, it can serve as buffer cache,
> since Bluestore don't compress meta data in RocksDB.
Actually, this does not behave as expected. In my last nvmedevice
benchmark, lots of reads still went down to the device instead of being
cached by rocksdb, even when I set a very large block cache. I guess there
are some gaps between our usage and the rocksdb implementation.
>
> Jianjian
>>
>> The main missing piece I'd say is a way to string Buffer objects together
>> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
>> policy that makes sense) so that trimming can be done safely and
>> efficiently. Right now the code is lock-free because each Onode is only
>> touched under the collection rwlock, but in order to do trimming we need
>> to be able to reap cold buffers from a global context.
>>
>> Anyway, there is no clear or ready answer here yet, but we are ready to
>> discuss design/approach here on the list, and welcome your input (and
>> potentially, contributions to development!).
>>
>> sage
>>
>>
>> >
>> > On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
>> > >
>> > >
>> > > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
>> > > <rajath.shashidhara@gmail.com> wrote:
>> > >>
>> > >> Hello,
>> > >>
>> > >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
>> > >> top of NVME Driver". Unfortunately, I was not selected for the
>> > >> internship.
>> > >
>> > >
>> > > Hi Rajath,
>> > >
>> > > We are glad to see your passion, from my view, sage is planning to implement
>> > > a userspace cache in bluestore itself. Like
>> > > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>> > >
>> > > I guess the cache won't be a generic cache interface. Instead it will be
>> > > bound to specified needed object. So sage may give a brief?
>> > >
>> > >>
>> > >>
>> > >> However, I would be interested in working on the project as an
>> > >> independent contributor to Ceph.
>> > >> I am expecting to receive the necessary support from Ceph developer
>> > >> community.
>> > >>
>> > >> In case I missed out any important details in my project proposal or I
>> > >> have the wrong understanding of the project, please help me figure out
>> > >> the details.
>> > >>
>> > >> Looking forward to contribute!
>> > >>
>> > >> Thank you,
>> > >> Rajath Shashidhara
>> > >
>> > >
* RE: Cache Layer on NVME driver
2016-05-19 1:35 ` Haomai Wang
@ 2016-05-19 2:39 ` Jianjian Huo
2016-05-19 5:34 ` Haomai Wang
0 siblings, 1 reply; 11+ messages in thread
From: Jianjian Huo @ 2016-05-19 2:39 UTC (permalink / raw)
To: Haomai Wang; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel
On Wed, May 18, 2016 at 6:35 PM, Haomai Wang <haomai@xsky.com> wrote:
> On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>
>> On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@newdream.net> wrote:
>>>
>>> Hi Rajath!
>>>
>>> Great to hear you're interested in working with us outside of GSoC!
>>>
>>> On Wed, 18 May 2016, Haomai Wang wrote:
>>> > Hi Rajath,
>>> >
>>> > We are glad to see your passion, from my view, sage is planning to
>>> > implement a userspace cache in bluestore itself. Like
>>> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>> >
>>> > I guess the cache won't be a generic cache interface. Instead it will
>>> > be bound to specified needed object. So sage may give a brief?
>>>
>>> Part of the reason why this project wasn't at the top of our list (we got
>>> fewer slots than we had projects) was because the BlueStore code is in
>>> flux and moving quite quickly. For the BlueStore side, we are building a
>>> simple buffer cache that is tied to an Onode (in-memory per-object
>>> metadata structure) and integrated tightly with the read and write IO
>>> paths. This will eliminate our use of the block device buffer cache for
>>> user/object data.
>>>
>>> The other half of the picture, though, is the BlueFS layer that is
>>> consumed by rocksdb: it also needs caching in order for rocksdb to perform
>>> at all. My hope is that the code we write for the use data can be re-used
>>> here as well, but it is still evolving.
>>
>> When Bluestore moves away from kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
>> RocksDB has this size configurable block cache to cache uncompressed data blocks, it can serve as buffer cache,
>> since Bluestore don't compress meta data in RocksDB.
>
> Actually this is not behaviored as expected. From my last nvmedevice
> benchmark, lots of read still go down device instead of caching by
> rocksdb when I set a very large block cache. I guess there exists some
> gaps between our usages and rocksdb implementation
What kind of workload did you use for that benchmark, 100% read?
>
>>
>> Jianjian
>>>
>>> The main missing piece I'd say is a way to string Buffer objects together
>>> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
>>> policy that makes sense) so that trimming can be done safely and
>>> efficiently. Right now the code is lock-free because each Onode is only
>>> touched under the collection rwlock, but in order to do trimming we need
>>> to be able to reap cold buffers from a global context.
>>>
>>> Anyway, there is no clear or ready answer here yet, but we are ready to
>>> discuss design/approach here on the list, and welcome your input (and
>>> potentially, contributions to development!).
>>>
>>> sage
>>>
>>>
>>> >
>>> > On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
>>> > >
>>> > >
>>> > > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
>>> > > <rajath.shashidhara@gmail.com> wrote:
>>> > >>
>>> > >> Hello,
>>> > >>
>>> > >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
>>> > >> top of NVME Driver". Unfortunately, I was not selected for the
>>> > >> internship.
>>> > >
>>> > >
>>> > > Hi Rajath,
>>> > >
>>> > > We are glad to see your passion, from my view, sage is planning to implement
>>> > > a userspace cache in bluestore itself. Like
>>> > > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>> > >
>>> > > I guess the cache won't be a generic cache interface. Instead it will be
>>> > > bound to specified needed object. So sage may give a brief?
>>> > >
>>> > >>
>>> > >>
>>> > >> However, I would be interested in working on the project as an
>>> > >> independent contributor to Ceph.
>>> > >> I am expecting to receive the necessary support from Ceph developer
>>> > >> community.
>>> > >>
>>> > >> In case I missed out any important details in my project proposal or I
>>> > >> have the wrong understanding of the project, please help me figure out
>>> > >> the details.
>>> > >>
>>> > >> Looking forward to contribute!
>>> > >>
>>> > >> Thank you,
>>> > >> Rajath Shashidhara
>>> > >
>>> > >
* Re: Cache Layer on NVME driver
2016-05-18 17:19 ` Sage Weil
2016-05-18 23:50 ` Jianjian Huo
@ 2016-05-19 5:12 ` Rajath Shashidhara
1 sibling, 0 replies; 11+ messages in thread
From: Rajath Shashidhara @ 2016-05-19 5:12 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi,
I will look into the new buffer cache implemented in BlueStore.
I will share my thoughts once I assess what has been done and what is
planned for the future.
As this is my first exposure to Ceph, this might take time.
Rajath
On Wed, May 18, 2016 at 10:49 PM, Sage Weil <sage@newdream.net> wrote:
> Hi Rajath!
>
> Great to hear you're interested in working with us outside of GSoC!
>
> On Wed, 18 May 2016, Haomai Wang wrote:
>> Hi Rajath,
>>
>> We are glad to see your passion, from my view, sage is planning to
>> implement a userspace cache in bluestore itself. Like
>> (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>
>> I guess the cache won't be a generic cache interface. Instead it will
>> be bound to specified needed object. So sage may give a brief?
>
> Part of the reason why this project wasn't at the top of our list (we got
> fewer slots than we had projects) was because the BlueStore code is in
> flux and moving quite quickly. For the BlueStore side, we are building a
> simple buffer cache that is tied to an Onode (in-memory per-object
> metadata structure) and integrated tightly with the read and write IO
> paths. This will eliminate our use of the block device buffer cache for
> user/object data.
>
> The other half of the picture, though, is the BlueFS layer that is
> consumed by rocksdb: it also needs caching in order for rocksdb to perform
> at all. My hope is that the code we write for the use data can be re-used
> here as well, but it is still evolving.
>
> The main missing piece I'd say is a way to string Buffer objects together
> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
> policy that makes sense) so that trimming can be done safely and
> efficiently. Right now the code is lock-free because each Onode is only
> touched under the collection rwlock, but in order to do trimming we need
> to be able to reap cold buffers from a global context.
>
> Anyway, there is no clear or ready answer here yet, but we are ready to
> discuss design/approach here on the list, and welcome your input (and
> potentially, contributions to development!).
>
> sage
>
>
>>
>> On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
>> >
>> >
>> > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
>> > <rajath.shashidhara@gmail.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
>> >> top of NVME Driver". Unfortunately, I was not selected for the
>> >> internship.
>> >
>> >
>> > Hi Rajath,
>> >
>> > We are glad to see your passion, from my view, sage is planning to implement
>> > a userspace cache in bluestore itself. Like
>> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>> >
>> > I guess the cache won't be a generic cache interface. Instead it will be
>> > bound to specified needed object. So sage may give a brief?
>> >
>> >>
>> >>
>> >> However, I would be interested in working on the project as an
>> >> independent contributor to Ceph.
>> >> I am expecting to receive the necessary support from Ceph developer
>> >> community.
>> >>
>> >> In case I missed out any important details in my project proposal or I
>> >> have the wrong understanding of the project, please help me figure out
>> >> the details.
>> >>
>> >> Looking forward to contribute!
>> >>
>> >> Thank you,
>> >> Rajath Shashidhara
>> >
>> >
* Re: Cache Layer on NVME driver
2016-05-19 2:39 ` Jianjian Huo
@ 2016-05-19 5:34 ` Haomai Wang
2016-05-19 5:34 ` Haomai Wang
0 siblings, 1 reply; 11+ messages in thread
From: Haomai Wang @ 2016-05-19 5:34 UTC (permalink / raw)
To: Jianjian Huo; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel
100%, because it's a small rbd image; all the metadata should be cached.
On Thu, May 19, 2016 at 10:39 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>
> On Wed, May 18, 2016 at 6:35 PM, Haomai Wang <haomai@xsky.com> wrote:
>> On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>>
>>> On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@newdream.net> wrote:
>>>>
>>>> Hi Rajath!
>>>>
>>>> Great to hear you're interested in working with us outside of GSoC!
>>>>
>>>> On Wed, 18 May 2016, Haomai Wang wrote:
>>>> > Hi Rajath,
>>>> >
>>>> > We are glad to see your passion, from my view, sage is planning to
>>>> > implement a userspace cache in bluestore itself. Like
>>>> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>>> >
>>>> > I guess the cache won't be a generic cache interface. Instead it will
>>>> > be bound to specified needed object. So sage may give a brief?
>>>>
>>>> Part of the reason why this project wasn't at the top of our list (we got
>>>> fewer slots than we had projects) was because the BlueStore code is in
>>>> flux and moving quite quickly. For the BlueStore side, we are building a
>>>> simple buffer cache that is tied to an Onode (in-memory per-object
>>>> metadata structure) and integrated tightly with the read and write IO
>>>> paths. This will eliminate our use of the block device buffer cache for
>>>> user/object data.
>>>>
>>>> The other half of the picture, though, is the BlueFS layer that is
>>>> consumed by rocksdb: it also needs caching in order for rocksdb to perform
>>>> at all. My hope is that the code we write for the use data can be re-used
>>>> here as well, but it is still evolving.
>>>
>>> When Bluestore moves away from kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
>>> RocksDB has this size configurable block cache to cache uncompressed data blocks, it can serve as buffer cache,
>>> since Bluestore don't compress meta data in RocksDB.
>>
>> Actually this is not behaviored as expected. From my last nvmedevice
>> benchmark, lots of read still go down device instead of caching by
>> rocksdb when I set a very large block cache. I guess there exists some
>> gaps between our usages and rocksdb implementation
>
> What kind of workload did you use for that benchmarking, 100% read?
>
>>
>>>
>>> Jianjian
>>>>
>>>> The main missing piece I'd say is a way to string Buffer objects together
>>>> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
>>>> policy that makes sense) so that trimming can be done safely and
>>>> efficiently. Right now the code is lock-free because each Onode is only
>>>> touched under the collection rwlock, but in order to do trimming we need
>>>> to be able to reap cold buffers from a global context.
>>>>
>>>> Anyway, there is no clear or ready answer here yet, but we are ready to
>>>> discuss design/approach here on the list, and welcome your input (and
>>>> potentially, contributions to development!).
>>>>
>>>> sage
>>>>
>>>>
>>>> >
>>>> > On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
>>>> > >
>>>> > >
>>>> > > On Wed, May 18, 2016 at 2:44 PM, Rajath Shashidhara
>>>> > > <rajath.shashidhara@gmail.com> wrote:
>>>> > >>
>>>> > >> Hello,
>>>> > >>
>>>> > >> I was a GSoC'16 applicant for the project "Implementing Cache layer on
>>>> > >> top of NVME Driver". Unfortunately, I was not selected for the
>>>> > >> internship.
>>>> > >
>>>> > >
>>>> > > Hi Rajath,
>>>> > >
>>>> > > We are glad to see your passion, from my view, sage is planning to implement
>>>> > > a userspace cache in bluestore itself. Like
>>>> > > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>>> > >
>>>> > > I guess the cache won't be a generic cache interface. Instead it will be
>>>> > > bound to specified needed object. So sage may give a brief?
>>>> > >
>>>> > >>
>>>> > >>
>>>> > >> However, I would be interested in working on the project as an
>>>> > >> independent contributor to Ceph.
>>>> > >> I am expecting to receive the necessary support from Ceph developer
>>>> > >> community.
>>>> > >>
>>>> > >> In case I missed out any important details in my project proposal or I
>>>> > >> have the wrong understanding of the project, please help me figure out
>>>> > >> the details.
>>>> > >>
>>>> > >> Looking forward to contribute!
>>>> > >>
>>>> > >> Thank you,
>>>> > >> Rajath Shashidhara
>>>> > >
>>>> > >
* Re: Cache Layer on NVME driver
2016-05-19 5:34 ` Haomai Wang
@ 2016-05-19 5:34 ` Haomai Wang
0 siblings, 0 replies; 11+ messages in thread
From: Haomai Wang @ 2016-05-19 5:34 UTC (permalink / raw)
To: Jianjian Huo; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel
sorry, 100% rw
On Thu, May 19, 2016 at 1:34 PM, Haomai Wang <haomai@xsky.com> wrote:
> 100%, because of it's a small rbd image. all metadata should be cached.
>
> On Thu, May 19, 2016 at 10:39 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>
>> On Wed, May 18, 2016 at 6:35 PM, Haomai Wang <haomai@xsky.com> wrote:
>>> On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>>>>
>>>> On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@newdream.net> wrote:
>>>>>
>>>>> Hi Rajath!
>>>>>
>>>>> Great to hear you're interested in working with us outside of GSoC!
>>>>>
>>>>> On Wed, 18 May 2016, Haomai Wang wrote:
>>>>> > Hi Rajath,
>>>>> >
>>>>> > We are glad to see your passion, from my view, sage is planning to
>>>>> > implement a userspace cache in bluestore itself. Like
>>>>> > (https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250).
>>>>> >
>>>>> > I guess the cache won't be a generic cache interface. Instead it will
>>>>> > be bound to specified needed object. So sage may give a brief?
>>>>>
>>>>> Part of the reason why this project wasn't at the top of our list (we got
>>>>> fewer slots than we had projects) was because the BlueStore code is in
>>>>> flux and moving quite quickly. For the BlueStore side, we are building a
>>>>> simple buffer cache that is tied to an Onode (in-memory per-object
>>>>> metadata structure) and integrated tightly with the read and write IO
>>>>> paths. This will eliminate our use of the block device buffer cache for
>>>>> user/object data.
>>>>>
>>>>> The other half of the picture, though, is the BlueFS layer that is
>>>>> consumed by rocksdb: it also needs caching in order for rocksdb to perform
>>>>> at all. My hope is that the code we write for the user data can be re-used
>>>>> here as well, but it is still evolving.
>>>>
>>>> When BlueStore moves away from the kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
>>>> RocksDB has a size-configurable block cache for uncompressed data blocks, which can serve as the buffer cache,
>>>> since BlueStore doesn't compress metadata in RocksDB.
>>>
>>> Actually this does not behave as expected. In my last NVMEDevice
>>> benchmark, a lot of reads still went down to the device instead of being
>>> cached by RocksDB, even when I set a very large block cache. I guess
>>> there are some gaps between our usage and the RocksDB implementation.
>>
>> What kind of workload did you use for that benchmarking, 100% read?
>>
>>>
>>>>
>>>> Jianjian
>>>>>
>>>>> The main missing piece I'd say is a way to string Buffer objects together
>>>>> in a global(ish) LRU (or set of LRUs, or whatever we need for the caching
>>>>> policy that makes sense) so that trimming can be done safely and
>>>>> efficiently. Right now the code is lock-free because each Onode is only
>>>>> touched under the collection rwlock, but in order to do trimming we need
>>>>> to be able to reap cold buffers from a global context.
>>>>>
>>>>> Anyway, there is no clear or ready answer here yet, but we are ready to
>>>>> discuss design/approach here on the list, and welcome your input (and
>>>>> potentially, contributions to development!).
>>>>>
>>>>> sage
>>>>>
>>>>>
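A rough illustration of the global LRU Sage describes, with buffers indexed per Onode but strung on one list so that a trimmer running in a global context can reap cold buffers, might look like the sketch below. All names here (GlobalBufferLRU, Buffer, touch, trim) are hypothetical and made up for illustration; this is not actual BlueStore code, and the real design still has to reconcile the lock-free per-Onode access with the global trim lock.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sketch: buffers belong to per-Onode caches, but are strung
// together in one global LRU so a trimmer in a global context can evict
// cold buffers safely. All names are illustrative, not BlueStore's.
struct Buffer {
  std::string onode;   // which object this buffer belongs to
  uint64_t offset;     // offset within the object
  uint64_t length;     // bytes cached
};

class GlobalBufferLRU {
  std::mutex lock;                 // serializes global trim vs. per-Onode access
  std::list<Buffer> lru;           // front = hottest, back = coldest
  using Iter = std::list<Buffer>::iterator;
  // onode -> (offset -> position in the global LRU list)
  std::unordered_map<std::string, std::unordered_map<uint64_t, Iter>> index;
  uint64_t total = 0;              // total cached bytes

 public:
  // Record a read/write hit on a buffer, inserting it if it is new.
  void touch(const std::string& onode, uint64_t off, uint64_t len) {
    std::lock_guard<std::mutex> l(lock);
    auto& per_onode = index[onode];
    auto it = per_onode.find(off);
    if (it != per_onode.end()) {
      lru.splice(lru.begin(), lru, it->second);  // move to the hot end
      return;
    }
    lru.push_front(Buffer{onode, off, len});
    per_onode[off] = lru.begin();
    total += len;
  }

  // Evict cold buffers from the tail until the cache fits under max_bytes.
  void trim(uint64_t max_bytes) {
    std::lock_guard<std::mutex> l(lock);
    while (total > max_bytes && !lru.empty()) {
      const Buffer& b = lru.back();
      total -= b.length;
      index[b.onode].erase(b.offset);
      lru.pop_back();
    }
  }

  uint64_t size() {
    std::lock_guard<std::mutex> l(lock);
    return total;
  }
};
```

In this sketch a re-touched buffer is spliced back to the hot end of the single list, so trimming from the tail always reaps the coldest buffers first regardless of which Onode owns them.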
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Cache Layer on NVME driver
2016-05-18 23:50 ` Jianjian Huo
2016-05-19 1:35 ` Haomai Wang
@ 2016-05-19 6:43 ` Sage Weil
2016-05-19 7:08 ` Haomai Wang
1 sibling, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-05-19 6:43 UTC (permalink / raw)
To: Jianjian Huo; +Cc: Haomai Wang, Rajath Shashidhara, ceph-devel
On Wed, 18 May 2016, Jianjian Huo wrote:
> When BlueStore moves away from the kernel cache to its own buffer cache, RocksDB can use its own buffer cache as well.
> RocksDB has a size-configurable block cache for uncompressed data blocks, which can serve as the buffer cache,
> since BlueStore doesn't compress metadata in RocksDB.
What options control this? I was having trouble finding it the last time
I looked. We should switch it now (and use unbuffered IO from BlueFS) if
we can.
Thanks!
sage
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Cache Layer on NVME driver
2016-05-19 6:43 ` Sage Weil
@ 2016-05-19 7:08 ` Haomai Wang
0 siblings, 0 replies; 11+ messages in thread
From: Haomai Wang @ 2016-05-19 7:08 UTC (permalink / raw)
To: Sage Weil; +Cc: Jianjian Huo, Rajath Shashidhara, ceph-devel
I think Jianjian is referring to this:
https://github.com/ceph/ceph/blob/master/src/kv/RocksDBStore.cc#L285
On Thu, May 19, 2016 at 2:43 PM, Sage Weil <sage@newdream.net> wrote:
> What options control this? I was having trouble finding it the last time
> I looked. We should switch it now (and use unbuffered IO from BlueFS) if
> we can.
>
> Thanks!
> sage
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-05-19 7:08 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-18 6:44 Cache Layer on NVME driver Rajath Shashidhara
[not found] ` <CACJqLyYGwhymjaVSoAGArPmPzFBmUHRrjGAtiQXKr6gEv_ZsSw@mail.gmail.com>
2016-05-18 13:32 ` Haomai Wang
2016-05-18 17:19 ` Sage Weil
2016-05-18 23:50 ` Jianjian Huo
2016-05-19 1:35 ` Haomai Wang
2016-05-19 2:39 ` Jianjian Huo
2016-05-19 5:34 ` Haomai Wang
2016-05-19 5:34 ` Haomai Wang
2016-05-19 6:43 ` Sage Weil
2016-05-19 7:08 ` Haomai Wang
2016-05-19 5:12 ` Rajath Shashidhara