* Fscache support for Ceph
@ 2013-05-23 21:48 Milosz Tanski
       [not found] ` <CAKxz0mwqStYgeHnCvYokizsJXoe_cOccMSjx8L=EO9rFPyaK_A@mail.gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Milosz Tanski @ 2013-05-23 21:48 UTC (permalink / raw)
  To: ceph-devel, linux-cachefs

This is my first attempt at adding fscache support for the Ceph Linux kernel module.

My motivation for doing this work was to speed up our distributed database,
which uses the Ceph filesystem as a backing store. The vast majority of the
workload our application generates is read-only, and latency is our biggest
challenge. Being able to cache frequently used blocks on the SSD drives in
our machines dramatically speeds up our query setup time when we're fetching
multiple compressed indexes and then navigating the block tree.

The branch containing the two patches is here:
https://bitbucket.org/adfin/linux-fs.git in the forceph branch.

If you want to review it in your browser, here is the Bitbucket URL:
https://bitbucket.org/adfin/linux-fs/commits/branch/forceph

I've tested this both on mainline and on the branch that carries the
upcoming fscache changes. The patches are broken into two pieces:

01 - Sets up the fscache facility in its own independent files (a sketch of
the general integration pattern follows below)
02 - Enables fscache in the Ceph filesystem and adds a new configuration option
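
For anyone unfamiliar with fscache, here is a minimal, simplified sketch of
the classic (3.x-era) netfs registration pattern that a patch like 01 would
typically introduce. The names on the Ceph side are illustrative
placeholders, not the identifiers used in the actual patch:

/* Illustrative sketch only -- not the actual patch code. */
#include <linux/module.h>
#include <linux/fscache.h>

/* A network filesystem describes itself to fscache once at init time. */
static struct fscache_netfs ceph_cache_netfs_example = {
        .name    = "ceph",
        .version = 0,
};

static int __init ceph_fscache_example_init(void)
{
        /*
         * Registering the netfs yields a primary index cookie that the
         * per-superblock and per-inode cookies (acquired later with
         * fscache_acquire_cookie()) hang off of.
         */
        return fscache_register_netfs(&ceph_cache_netfs_example);
}

static void __exit ceph_fscache_example_exit(void)
{
        fscache_unregister_netfs(&ceph_cache_netfs_example);
}

module_init(ceph_fscache_example_init);
module_exit(ceph_fscache_example_exit);
MODULE_LICENSE("GPL");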

The patches will follow in the next few emails.

Looking ahead, there is new work being done to add write-back caching to
fscache and NFS. When that lands I'd like to integrate it into the Ceph
fscache code as well. From the author's benchmarks it seems to give much
the same benefit for NFS writes as bcache does.

I'd like to get this into ceph, and I'm looking for feedback.

Thanks,
- Milosz


* Re: Fscache support for Ceph
       [not found] ` <CAKxz0mwqStYgeHnCvYokizsJXoe_cOccMSjx8L=EO9rFPyaK_A@mail.gmail.com>
@ 2013-05-29 13:35   ` Milosz Tanski
  2013-05-29 17:46     ` Milosz Tanski
  0 siblings, 1 reply; 3+ messages in thread
From: Milosz Tanski @ 2013-05-29 13:35 UTC (permalink / raw)
  To: Elso Andras; +Cc: ceph-devel, linux-cachefs

Elbandi,

Thanks to your stack trace I can see the bug. I'll send you a fix as soon
as I get back to my office. Apparently I spent too much time testing this
in uniprocessor (UP) VMs and UML.
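
For readers following the trace below: it is stuck in __ticket_spin_lock
called from ceph_init_file(), which is the signature of a spinlock being
re-acquired while already held (or used uninitialized) on an SMP kernel. On
UP builds and UML, spin_lock() boils down to little more than disabling
preemption, so this class of bug never shows up there. A minimal
illustration of the pattern -- not the actual Ceph code or the actual
fix -- looks like this:

/*
 * Illustration only: re-taking a lock you already hold spins forever on
 * SMP ticket spinlocks, but goes unnoticed on a UP build where
 * spin_lock() is effectively a no-op.
 */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

static void example_helper(void)
{
        spin_lock(&example_lock);       /* second acquisition: deadlock on SMP */
        /* ... */
        spin_unlock(&example_lock);
}

static void example_caller(void)
{
        spin_lock(&example_lock);       /* first acquisition */
        example_helper();               /* never returns: soft lockup */
        spin_unlock(&example_lock);
}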

Thanks,
-- Milosz

On Wed, May 29, 2013 at 5:47 AM, Elso Andras <elso.andras@gmail.com> wrote:
> Hi,
>
> I tried your fscache patch on my test cluster. The client node is an
> Ubuntu Lucid (10.04) machine running a 3.8 kernel (*) plus your patch.
> A little after I mounted the cephfs, I got this:
>
> [  316.303851] Pid: 1565, comm: lighttpd Not tainted 3.8.0-22-fscache
> #33 HP ProLiant DL160 G6
> [  316.303853] RIP: 0010:[<ffffffff81045c42>]  [<ffffffff81045c42>]
> __ticket_spin_lock+0x22/0x30
> [  316.303861] RSP: 0018:ffff8804180e79f8  EFLAGS: 00000297
> [  316.303863] RAX: 0000000000000004 RBX: ffffffffa0224e53 RCX: 0000000000000004
> [  316.303865] RDX: 0000000000000005 RSI: 00000000000000d0 RDI: ffff88041eb29a50
> [  316.303866] RBP: ffff8804180e79f8 R08: ffffe8ffffa40150 R09: 0000000000000000
> [  316.303868] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88041da75050
> [  316.303869] R13: ffff880428ef0000 R14: ffffffff81702b86 R15: ffff8804180e7968
> [  316.303871] FS:  00007fbcca138700(0000) GS:ffff88042f240000(0000)
> knlGS:0000000000000000
> [  316.303873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  316.303875] CR2: 00007f5c96649f00 CR3: 00000004180c9000 CR4: 00000000000007e0
> [  316.303877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  316.303878] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  316.303880] Process lighttpd (pid: 1565, threadinfo
> ffff8804180e6000, task ffff88041cc22e80)
> [  316.303881] Stack:
> [  316.303883]  ffff8804180e7a08 ffffffff817047ae ffff8804180e7a58
> ffffffffa02c816a
> [  316.303886]  ffff8804180e7a58 ffff88041eb29a50 0000000000000000
> ffff88041eb29d50
> [  316.303889]  ffff88041eb29a50 ffff88041b29ed00 ffff88041eb29a40
> 0000000000000d01
> [  316.303892] Call Trace:
> [  316.303898]  [<ffffffff817047ae>] _raw_spin_lock+0xe/0x20
> [  316.303910]  [<ffffffffa02c816a>] ceph_init_file+0xca/0x1c0 [ceph]
> [  316.303917]  [<ffffffffa02c83e1>] ceph_open+0x181/0x3c0 [ceph]
> [  316.303925]  [<ffffffffa02c8260>] ? ceph_init_file+0x1c0/0x1c0 [ceph]
> [  316.303930]  [<ffffffff8119a62e>] do_dentry_open+0x21e/0x2a0
> [  316.303933]  [<ffffffff8119a6e5>] finish_open+0x35/0x50
> [  316.303940]  [<ffffffffa02c9304>] ceph_atomic_open+0x214/0x2f0 [ceph]
> [  316.303944]  [<ffffffff811b416f>] ? __d_alloc+0x5f/0x180
> [  316.303948]  [<ffffffff811a7fa1>] atomic_open+0xf1/0x460
> [  316.303951]  [<ffffffff811a86f4>] lookup_open+0x1a4/0x1d0
> [  316.303954]  [<ffffffff811a8fad>] do_last+0x30d/0x820
> [  316.303958]  [<ffffffff811ab413>] path_openat+0xb3/0x4d0
> [  316.303962]  [<ffffffff815da87d>] ? sock_aio_read+0x2d/0x40
> [  316.303965]  [<ffffffff8119c333>] ? do_sync_read+0xa3/0xe0
> [  316.303968]  [<ffffffff811ac232>] do_filp_open+0x42/0xa0
> [  316.303971]  [<ffffffff811b9eb5>] ? __alloc_fd+0xe5/0x170
> [  316.303974]  [<ffffffff8119be8a>] do_sys_open+0xfa/0x250
> [  316.303977]  [<ffffffff8119cacd>] ? vfs_read+0x10d/0x180
> [  316.303980]  [<ffffffff8119c001>] sys_open+0x21/0x30
> [  316.303983]  [<ffffffff8170d61d>] system_call_fastpath+0x1a/0x1f
>
> And the console prints these lines forever; the server is frozen:
> [  376.305754] BUG: soft lockup - CPU#2 stuck for 22s! [lighttpd:1565]
> [  404.294735] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:39]
> [  404.306735] BUG: soft lockup - CPU#2 stuck for 22s! [lighttpd:1565]
> [  432.295716] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:39]
>
> Have you any idea?
>
> Elbandi
>
> * http://packages.ubuntu.com/raring/linux-image-3.8.0-19-generic
>
> 2013/5/23 Milosz Tanski <milosz@adfin.com>:
>> This is my first attempt at adding fscache support for the Ceph Linux kernel module.
>>
>> My motivation for doing this work was to speed up our distributed database,
>> which uses the Ceph filesystem as a backing store. The vast majority of the
>> workload our application generates is read-only, and latency is our biggest
>> challenge. Being able to cache frequently used blocks on the SSD drives in
>> our machines dramatically speeds up our query setup time when we're fetching
>> multiple compressed indexes and then navigating the block tree.
>>
>> The branch containing the two patches is here:
>> https://bitbucket.org/adfin/linux-fs.git in the forceph branch.
>>
>> If you want to review it in your browser, here is the Bitbucket URL:
>> https://bitbucket.org/adfin/linux-fs/commits/branch/forceph
>>
>> I've tested this both on mainline and on the branch that carries the
>> upcoming fscache changes. The patches are broken into two pieces:
>>
>> 01 - Sets up the fscache facility in its own independent files
>> 02 - Enables fscache in the Ceph filesystem and adds a new configuration option
>>
>> The patches will follow in the next few emails.
>>
>> Looking ahead, there is new work being done to add write-back caching to
>> fscache and NFS. When that lands I'd like to integrate it into the Ceph
>> fscache code as well. From the author's benchmarks it seems to give much
>> the same benefit for NFS writes as bcache does.
>>
>> I'd like to get this into ceph, and I'm looking for feedback.
>>
>> Thanks,
>> - Milosz


* Re: Fscache support for Ceph
  2013-05-29 13:35   ` Milosz Tanski
@ 2013-05-29 17:46     ` Milosz Tanski
  0 siblings, 0 replies; 3+ messages in thread
From: Milosz Tanski @ 2013-05-29 17:46 UTC (permalink / raw)
  To: Elso Andras; +Cc: ceph-devel, linux-cachefs

Elso,

I have both good and bad news for you.

First, the good news is that I fixed this particular issue. You can
find the changes needed here:
https://bitbucket.org/adfin/linux-fs/commits/339c82d37ec0223733778f83111f29599f220e35.
As you can see, it's a simple fix. I also put another patch in my tree
that makes fscache a mount option.
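
For context, a boolean mount option like that is usually wired up in a
kernel filesystem with a token table and a flag bit, roughly as in the
simplified sketch below. The option string and flag names here are
placeholders I picked for illustration, not necessarily the ones the patch
actually uses:

/* Simplified sketch -- names are illustrative placeholders. */
#include <linux/parser.h>

#define EXAMPLE_MOUNT_OPT_FSCACHE       (1 << 0)

enum {
        Opt_fscache_example,
        Opt_err_example,
};

static const match_table_t example_tokens = {
        { Opt_fscache_example, "fsc" }, /* e.g. "mount -t ceph ... -o fsc" */
        { Opt_err_example,     NULL  },
};

static void example_parse_one_option(char *p, unsigned long *flags)
{
        substring_t args[MAX_OPT_ARGS];

        switch (match_token(p, example_tokens, args)) {
        case Opt_fscache_example:
                *flags |= EXAMPLE_MOUNT_OPT_FSCACHE;    /* turn fscache on */
                break;
        default:
                break;
        }
}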

The bad news is that with the Ubuntu 3.8.0-22 kernel on LTS there is a
sporadic crash. This is due to a bug in the upstream kernel code. There is
a fix for it in David Howells' tree:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/commit/?h=fscache&id=82958c45e35963c93fc6cbe6a27752e2d97e9f9a

I can't reproduce this under normal conditions, but I can reproduce it by
forcing the kernel to drop its caches.
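
(For anyone who wants to reproduce it: dropping caches goes through the
standard /proc/sys/vm/drop_caches knob. The tiny helper below, run as root,
is equivalent to echoing "3" into that file.)

/*
 * Minimal userspace helper: equivalent to "echo 3 > /proc/sys/vm/drop_caches"
 * as root. It asks the kernel to drop clean page cache plus dentries and
 * inodes, which is handy for shaking out cache-related races.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

        if (!f) {
                perror("drop_caches");
                return EXIT_FAILURE;
        }
        fputs("3\n", f);        /* 1 = pagecache, 2 = dentries+inodes, 3 = both */
        fclose(f);
        return EXIT_SUCCESS;
}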

Best,
- Milosz

On Wed, May 29, 2013 at 9:35 AM, Milosz Tanski <milosz@adfin.com> wrote:
> Elbandi,
>
> Thanks to your stack trace I can see the bug. I'll send you a fix as soon
> as I get back to my office. Apparently I spent too much time testing this
> in uniprocessor (UP) VMs and UML.
>
> Thanks,
> -- Milosz
>
> On Wed, May 29, 2013 at 5:47 AM, Elso Andras <elso.andras@gmail.com> wrote:
>> Hi,
>>
>> I tried your fscache patch on my test cluster. The client node is an
>> Ubuntu Lucid (10.04) machine running a 3.8 kernel (*) plus your patch.
>> A little after I mounted the cephfs, I got this:
>>
>> [  316.303851] Pid: 1565, comm: lighttpd Not tainted 3.8.0-22-fscache
>> #33 HP ProLiant DL160 G6
>> [  316.303853] RIP: 0010:[<ffffffff81045c42>]  [<ffffffff81045c42>]
>> __ticket_spin_lock+0x22/0x30
>> [  316.303861] RSP: 0018:ffff8804180e79f8  EFLAGS: 00000297
>> [  316.303863] RAX: 0000000000000004 RBX: ffffffffa0224e53 RCX: 0000000000000004
>> [  316.303865] RDX: 0000000000000005 RSI: 00000000000000d0 RDI: ffff88041eb29a50
>> [  316.303866] RBP: ffff8804180e79f8 R08: ffffe8ffffa40150 R09: 0000000000000000
>> [  316.303868] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88041da75050
>> [  316.303869] R13: ffff880428ef0000 R14: ffffffff81702b86 R15: ffff8804180e7968
>> [  316.303871] FS:  00007fbcca138700(0000) GS:ffff88042f240000(0000)
>> knlGS:0000000000000000
>> [  316.303873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  316.303875] CR2: 00007f5c96649f00 CR3: 00000004180c9000 CR4: 00000000000007e0
>> [  316.303877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  316.303878] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  316.303880] Process lighttpd (pid: 1565, threadinfo
>> ffff8804180e6000, task ffff88041cc22e80)
>> [  316.303881] Stack:
>> [  316.303883]  ffff8804180e7a08 ffffffff817047ae ffff8804180e7a58
>> ffffffffa02c816a
>> [  316.303886]  ffff8804180e7a58 ffff88041eb29a50 0000000000000000
>> ffff88041eb29d50
>> [  316.303889]  ffff88041eb29a50 ffff88041b29ed00 ffff88041eb29a40
>> 0000000000000d01
>> [  316.303892] Call Trace:
>> [  316.303898]  [<ffffffff817047ae>] _raw_spin_lock+0xe/0x20
>> [  316.303910]  [<ffffffffa02c816a>] ceph_init_file+0xca/0x1c0 [ceph]
>> [  316.303917]  [<ffffffffa02c83e1>] ceph_open+0x181/0x3c0 [ceph]
>> [  316.303925]  [<ffffffffa02c8260>] ? ceph_init_file+0x1c0/0x1c0 [ceph]
>> [  316.303930]  [<ffffffff8119a62e>] do_dentry_open+0x21e/0x2a0
>> [  316.303933]  [<ffffffff8119a6e5>] finish_open+0x35/0x50
>> [  316.303940]  [<ffffffffa02c9304>] ceph_atomic_open+0x214/0x2f0 [ceph]
>> [  316.303944]  [<ffffffff811b416f>] ? __d_alloc+0x5f/0x180
>> [  316.303948]  [<ffffffff811a7fa1>] atomic_open+0xf1/0x460
>> [  316.303951]  [<ffffffff811a86f4>] lookup_open+0x1a4/0x1d0
>> [  316.303954]  [<ffffffff811a8fad>] do_last+0x30d/0x820
>> [  316.303958]  [<ffffffff811ab413>] path_openat+0xb3/0x4d0
>> [  316.303962]  [<ffffffff815da87d>] ? sock_aio_read+0x2d/0x40
>> [  316.303965]  [<ffffffff8119c333>] ? do_sync_read+0xa3/0xe0
>> [  316.303968]  [<ffffffff811ac232>] do_filp_open+0x42/0xa0
>> [  316.303971]  [<ffffffff811b9eb5>] ? __alloc_fd+0xe5/0x170
>> [  316.303974]  [<ffffffff8119be8a>] do_sys_open+0xfa/0x250
>> [  316.303977]  [<ffffffff8119cacd>] ? vfs_read+0x10d/0x180
>> [  316.303980]  [<ffffffff8119c001>] sys_open+0x21/0x30
>> [  316.303983]  [<ffffffff8170d61d>] system_call_fastpath+0x1a/0x1f
>>
>> And the console prints these lines forever; the server is frozen:
>> [  376.305754] BUG: soft lockup - CPU#2 stuck for 22s! [lighttpd:1565]
>> [  404.294735] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:39]
>> [  404.306735] BUG: soft lockup - CPU#2 stuck for 22s! [lighttpd:1565]
>> [  432.295716] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:39]
>>
>> Have you any idea?
>>
>> Elbandi
>>
>> * http://packages.ubuntu.com/raring/linux-image-3.8.0-19-generic
>>
>> 2013/5/23 Milosz Tanski <milosz@adfin.com>:
>>> This is my first attempt at adding fscache support for the Ceph Linux kernel module.
>>>
>>> My motivation for doing this work was to speed up our distributed database,
>>> which uses the Ceph filesystem as a backing store. The vast majority of the
>>> workload our application generates is read-only, and latency is our biggest
>>> challenge. Being able to cache frequently used blocks on the SSD drives in
>>> our machines dramatically speeds up our query setup time when we're fetching
>>> multiple compressed indexes and then navigating the block tree.
>>>
>>> The branch containing the two patches is here:
>>> https://bitbucket.org/adfin/linux-fs.git in the forceph branch.
>>>
>>> If you want to review it in your browser, here is the Bitbucket URL:
>>> https://bitbucket.org/adfin/linux-fs/commits/branch/forceph
>>>
>>> I've tested this both on mainline and on the branch that carries the
>>> upcoming fscache changes. The patches are broken into two pieces:
>>>
>>> 01 - Sets up the fscache facility in its own independent files
>>> 02 - Enables fscache in the Ceph filesystem and adds a new configuration option
>>>
>>> The patches will follow in the next few emails.
>>>
>>> Looking ahead, there is new work being done to add write-back caching to
>>> fscache and NFS. When that lands I'd like to integrate it into the Ceph
>>> fscache code as well. From the author's benchmarks it seems to give much
>>> the same benefit for NFS writes as bcache does.
>>>
>>> I'd like to get this into ceph, and I'm looking for feedback.
>>>
>>> Thanks,
>>> - Milosz


end of thread, other threads:[~2013-05-29 17:46 UTC | newest]

Thread overview: 3+ messages
2013-05-23 21:48 Fscache support for Ceph Milosz Tanski
     [not found] ` <CAKxz0mwqStYgeHnCvYokizsJXoe_cOccMSjx8L=EO9rFPyaK_A@mail.gmail.com>
2013-05-29 13:35   ` Milosz Tanski
2013-05-29 17:46     ` Milosz Tanski
