* Cache Layer on NVME driver
@ 2016-05-18  6:44 Rajath Shashidhara
       [not found] ` <CACJqLyYGwhymjaVSoAGArPmPzFBmUHRrjGAtiQXKr6gEv_ZsSw@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Rajath Shashidhara @ 2016-05-18  6:44 UTC (permalink / raw)
  To: ceph-devel; +Cc: Haomai Wang

Hello,

I was a GSoC'16 applicant for the project "Implementing Cache layer on
top of NVME Driver". Unfortunately, I was not selected for the
internship.

However, I would still be interested in working on the project as an
independent contributor to Ceph, and I hope to receive the necessary
support from the Ceph developer community.

If I missed any important details in my project proposal, or if I have
misunderstood the project, please help me work out the details.

Looking forward to contributing!

Thank you,
Rajath Shashidhara


* Re: Cache Layer on NVME driver
       [not found] ` <CACJqLyYGwhymjaVSoAGArPmPzFBmUHRrjGAtiQXKr6gEv_ZsSw@mail.gmail.com>
@ 2016-05-18 13:32   ` Haomai Wang
  2016-05-18 17:19     ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Haomai Wang @ 2016-05-18 13:32 UTC (permalink / raw)
  To: Rajath Shashidhara; +Cc: ceph-devel

Hi Rajath,

We are glad to see your enthusiasm. From my view, Sage is planning to
implement a userspace cache in BlueStore itself; see
https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250.

I suspect the cache won't be a generic cache interface; instead it will
be bound to the specific objects that need it. Perhaps Sage can give a
brief overview?

On Wed, May 18, 2016 at 9:32 PM, Haomai Wang <haomai@xsky.com> wrote:
> [duplicate copy of the message above trimmed]


* Re: Cache Layer on NVME driver
  2016-05-18 13:32   ` Haomai Wang
@ 2016-05-18 17:19     ` Sage Weil
  2016-05-18 23:50       ` Jianjian Huo
  2016-05-19  5:12       ` Rajath Shashidhara
  0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2016-05-18 17:19 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Rajath Shashidhara, ceph-devel

Hi Rajath!

Great to hear you're interested in working with us outside of GSoC!

On Wed, 18 May 2016, Haomai Wang wrote:
> Hi Rajath,
> 
> We are glad to see your enthusiasm. From my view, Sage is planning to
> implement a userspace cache in BlueStore itself; see
> https://github.com/ceph/ceph/commit/b9ac31afe5a162176f019baa25348014a77f6ab8#commitcomment-17488250.
> 
> I suspect the cache won't be a generic cache interface; instead it will
> be bound to the specific objects that need it. Perhaps Sage can give a
> brief overview?

Part of the reason why this project wasn't at the top of our list (we got 
fewer slots than we had projects) was because the BlueStore code is in 
flux and moving quite quickly.  For the BlueStore side, we are building a 
simple buffer cache that is tied to an Onode (in-memory per-object 
metadata structure) and integrated tightly with the read and write IO
paths.  This will eliminate our use of the block device buffer cache for 
user/object data.
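
As an aside, the kernel buffer cache drops out of the picture because we
open the device with O_DIRECT.  A minimal illustration of the idea (not
the actual KernelDevice code):

  /* Illustrative sketch only: opening the block device with O_DIRECT
   * bypasses the kernel page cache, which is why a userspace buffer
   * cache is needed on the read path. */
  #define _GNU_SOURCE
  #include <fcntl.h>

  int open_raw_device(const char *path)
  {
    /* I/O goes straight to the device, no kernel buffering; the
     * caller must supply suitably aligned buffers and lengths. */
    return open(path, O_RDWR | O_DIRECT);
  }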

The other half of the picture, though, is the BlueFS layer that is 
consumed by rocksdb: it also needs caching in order for rocksdb to perform 
at all.  My hope is that the code we write for the user data can be reused 
here as well, but it is still evolving.

The main missing piece I'd say is a way to string Buffer objects together 
in a global(ish) LRU (or set of LRUs, or whatever we need for the caching 
policy that makes sense) so that trimming can be done safely and 
efficiently.  Right now the code is lock-free because each Onode is only 
touched under the collection rwlock, but in order to do trimming we need 
to be able to reap cold buffers from a global context.
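
To make that concrete, here is a rough sketch of the shape I have in
mind (illustrative names only, not actual BlueStore code):

  // Buffers are strung onto a global LRU under a single lock so a
  // trimmer can reap cold entries from any context.
  #include <boost/intrusive/list.hpp>
  #include <cstdint>
  #include <mutex>

  namespace bi = boost::intrusive;

  struct Buffer {
    uint64_t offset = 0;
    uint64_t length = 0;
    bi::list_member_hook<> lru_item;  // links into the global LRU
  };

  class GlobalBufferLRU {
    using list_t = bi::list<
        Buffer,
        bi::member_hook<Buffer, bi::list_member_hook<>,
                        &Buffer::lru_item>>;
    std::mutex lock;  // guards only the LRU; per-Onode state unchanged
    list_t lru;       // front = hottest, back = coldest
    uint64_t bytes = 0;

  public:
    void touch(Buffer& b) {  // called from the per-collection IO path
      std::lock_guard<std::mutex> l(lock);
      if (b.lru_item.is_linked())
        lru.erase(lru.iterator_to(b));
      else
        bytes += b.length;
      lru.push_front(b);
    }
    void trim(uint64_t target) {  // called from a global context
      std::lock_guard<std::mutex> l(lock);
      while (bytes > target && !lru.empty()) {
        Buffer& victim = lru.back();
        lru.pop_back();
        bytes -= victim.length;
        // releasing the buffer's data and detaching it from its Onode
        // would happen here; that is the part that needs care.
      }
    }
  };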

Anyway, there is no clear or ready answer here yet, but we are ready to 
discuss design/approach here on the list, and welcome your input (and 
potentially, contributions to development!).

sage


> [remainder of quoted thread trimmed]


* RE: Cache Layer on NVME driver
  2016-05-18 17:19     ` Sage Weil
@ 2016-05-18 23:50       ` Jianjian Huo
  2016-05-19  1:35         ` Haomai Wang
  2016-05-19  6:43         ` Sage Weil
  2016-05-19  5:12       ` Rajath Shashidhara
  1 sibling, 2 replies; 11+ messages in thread
From: Jianjian Huo @ 2016-05-18 23:50 UTC (permalink / raw)
  To: Sage Weil, Haomai Wang; +Cc: Rajath Shashidhara, ceph-devel


On Wed, May 18, 2016 at 10:19 AM, Sage Weil <sage@newdream.net> wrote:
> [...]
>
> Part of the reason why this project wasn't at the top of our list (we got
> fewer slots than we had projects) was because the BlueStore code is in
> flux and moving quite quickly.  For the BlueStore side, we are building a
> simple buffer cache that is tied to an Onode (in-memory per-object
> metadata structure) and integrated tightly with the read and write IO
> paths.  This will eliminate our use of the block device buffer cache for
> user/object data.
>
> The other half of the picture, though, is the BlueFS layer that is
> consumed by rocksdb: it also needs caching in order for rocksdb to perform
> at all.  My hope is that the code we write for the user data can be reused
> here as well, but it is still evolving.

When BlueStore moves away from the kernel cache to its own buffer cache,
RocksDB can use its own buffer cache as well. RocksDB has a
size-configurable block cache for uncompressed data blocks; it can serve
as the buffer cache, since BlueStore doesn't compress metadata in RocksDB.

Jianjian
> [remainder of quoted message trimmed]


* Re: Cache Layer on NVME driver
  2016-05-18 23:50       ` Jianjian Huo
@ 2016-05-19  1:35         ` Haomai Wang
  2016-05-19  2:39           ` Jianjian Huo
  2016-05-19  6:43         ` Sage Weil
  1 sibling, 1 reply; 11+ messages in thread
From: Haomai Wang @ 2016-05-19  1:35 UTC (permalink / raw)
  To: Jianjian Huo; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel

On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>
> [...]
>
> When BlueStore moves away from the kernel cache to its own buffer cache,
> RocksDB can use its own buffer cache as well. RocksDB has a
> size-configurable block cache for uncompressed data blocks; it can serve
> as the buffer cache, since BlueStore doesn't compress metadata in RocksDB.

Actually, this does not behave as expected. In my last NVMEDevice
benchmark, lots of reads still went down to the device instead of being
cached by rocksdb, even when I set a very large block cache. I guess
there are some gaps between our usage and the rocksdb implementation.
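
A quick way to confirm where the reads are going is rocksdb's statistics
tickers.  A minimal sketch (this is the standard rocksdb API; how it
gets wired into the benchmark is illustrative):

  // Report whether reads are served from the block cache or miss and
  // go down to the device.
  #include <cstdint>
  #include <iostream>
  #include <rocksdb/options.h>
  #include <rocksdb/statistics.h>

  void report_block_cache_hit_rate(const rocksdb::Options& opts)
  {
    // opts.statistics must be set (rocksdb::CreateDBStatistics())
    // before the DB is opened, or the tickers stay empty.
    uint64_t hits =
        opts.statistics->getTickerCount(rocksdb::BLOCK_CACHE_HIT);
    uint64_t misses =
        opts.statistics->getTickerCount(rocksdb::BLOCK_CACHE_MISS);
    double total = double(hits + misses);
    std::cout << "block cache hit rate: "
              << (total > 0 ? hits / total : 0.0) << std::endl;
  }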

> [remainder of quoted thread trimmed]


* RE: Cache Layer on NVME driver
  2016-05-19  1:35         ` Haomai Wang
@ 2016-05-19  2:39           ` Jianjian Huo
  2016-05-19  5:34             ` Haomai Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Jianjian Huo @ 2016-05-19  2:39 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel


On Wed, May 18, 2016 at 6:35 PM, Haomai Wang <haomai@xsky.com> wrote:
> On Thu, May 19, 2016 at 7:50 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
>> [...]
>>
>> When BlueStore moves away from the kernel cache to its own buffer cache,
>> RocksDB can use its own buffer cache as well. RocksDB has a
>> size-configurable block cache for uncompressed data blocks; it can serve
>> as the buffer cache, since BlueStore doesn't compress metadata in RocksDB.
>
> Actually, this does not behave as expected. In my last NVMEDevice
> benchmark, lots of reads still went down to the device instead of being
> cached by rocksdb, even when I set a very large block cache. I guess
> there are some gaps between our usage and the rocksdb implementation.

What kind of workload did you use for that benchmark, 100% read?

> [remainder of quoted thread trimmed]


* Re: Cache Layer on NVME driver
  2016-05-18 17:19     ` Sage Weil
  2016-05-18 23:50       ` Jianjian Huo
@ 2016-05-19  5:12       ` Rajath Shashidhara
  1 sibling, 0 replies; 11+ messages in thread
From: Rajath Shashidhara @ 2016-05-19  5:12 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hi,

I will look into the new buffer cache implemented in BlueStore.
I will share my thoughts once I have assessed what has been done and
what is planned for the future.
As this is my first exposure to Ceph, this might take some time.

Rajath

On Wed, May 18, 2016 at 10:49 PM, Sage Weil <sage@newdream.net> wrote:
> [quoted message trimmed]


* Re: Cache Layer on NVME driver
  2016-05-19  2:39           ` Jianjian Huo
@ 2016-05-19  5:34             ` Haomai Wang
  2016-05-19  5:34               ` Haomai Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Haomai Wang @ 2016-05-19  5:34 UTC (permalink / raw)
  To: Jianjian Huo; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel

100%, because it's a small RBD image; all the metadata should be cached.

On Thu, May 19, 2016 at 10:39 AM, Jianjian Huo <jianjian.huo@samsung.com> wrote:
> [...]
>
> What kind of workload did you use for that benchmark, 100% read?
>
> [remainder of quoted thread trimmed]


* Re: Cache Layer on NVME driver
  2016-05-19  5:34             ` Haomai Wang
@ 2016-05-19  5:34               ` Haomai Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Haomai Wang @ 2016-05-19  5:34 UTC (permalink / raw)
  To: Jianjian Huo; +Cc: Sage Weil, Rajath Shashidhara, ceph-devel

Sorry, 100% rw.

On Thu, May 19, 2016 at 1:34 PM, Haomai Wang <haomai@xsky.com> wrote:
> 100%, because it's a small RBD image; all the metadata should be cached.
> [remainder of quoted thread trimmed]


* RE: Cache Layer on NVME driver
  2016-05-18 23:50       ` Jianjian Huo
  2016-05-19  1:35         ` Haomai Wang
@ 2016-05-19  6:43         ` Sage Weil
  2016-05-19  7:08           ` Haomai Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-05-19  6:43 UTC (permalink / raw)
  To: Jianjian Huo; +Cc: Haomai Wang, Rajath Shashidhara, ceph-devel

On Wed, 18 May 2016, Jianjian Huo wrote:
> [...]
>
> When BlueStore moves away from the kernel cache to its own buffer cache,
> RocksDB can use its own buffer cache as well. RocksDB has a
> size-configurable block cache for uncompressed data blocks; it can serve
> as the buffer cache, since BlueStore doesn't compress metadata in RocksDB.

What options control this?  I had trouble finding them the last time 
I looked.  We should switch to it now (and use unbuffered IO from 
BlueFS) if we can.

Thanks!
sage


* Re: Cache Layer on NVME driver
  2016-05-19  6:43         ` Sage Weil
@ 2016-05-19  7:08           ` Haomai Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Haomai Wang @ 2016-05-19  7:08 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jianjian Huo, Rajath Shashidhara, ceph-devel

I think Jianjian is referring to this:
https://github.com/ceph/ceph/blob/master/src/kv/RocksDBStore.cc#L285
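
For reference, the knob lives in BlockBasedTableOptions.  A minimal
sketch of the standard rocksdb API (the exact plumbing in
RocksDBStore.cc may differ):

  // The block cache holds uncompressed data blocks, so sizing it
  // large effectively gives rocksdb its own buffer cache.
  #include <cstddef>
  #include <rocksdb/cache.h>
  #include <rocksdb/options.h>
  #include <rocksdb/table.h>

  rocksdb::Options make_options(size_t block_cache_bytes)
  {
    rocksdb::BlockBasedTableOptions table_opts;
    table_opts.block_cache = rocksdb::NewLRUCache(block_cache_bytes);
    rocksdb::Options opts;
    opts.table_factory.reset(
        rocksdb::NewBlockBasedTableFactory(table_opts));
    return opts;
  }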

On Thu, May 19, 2016 at 2:43 PM, Sage Weil <sage@newdream.net> wrote:
> [...]
>
> What options control this?  I had trouble finding them the last time
> I looked.  We should switch to it now (and use unbuffered IO from
> BlueFS) if we can.
>
> Thanks!
> sage


end of thread

Thread overview: 11+ messages
2016-05-18  6:44 Cache Layer on NVME driver Rajath Shashidhara
     [not found] ` <CACJqLyYGwhymjaVSoAGArPmPzFBmUHRrjGAtiQXKr6gEv_ZsSw@mail.gmail.com>
2016-05-18 13:32   ` Haomai Wang
2016-05-18 17:19     ` Sage Weil
2016-05-18 23:50       ` Jianjian Huo
2016-05-19  1:35         ` Haomai Wang
2016-05-19  2:39           ` Jianjian Huo
2016-05-19  5:34             ` Haomai Wang
2016-05-19  5:34               ` Haomai Wang
2016-05-19  6:43         ` Sage Weil
2016-05-19  7:08           ` Haomai Wang
2016-05-19  5:12       ` Rajath Shashidhara
