* bcache-3.2 branch
@ 2012-06-20 12:08 Joseph Glanville
       [not found] ` <CAOzFzEh8pO37dVWoMoD+hFoUGrBoubSdktdu7SQS0UcXLcC66w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Glanville @ 2012-06-20 12:08 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Kent Overstreet

Hi Kent and list,

I have pulled down the latest bcache code and have been playing around
with it, and I noticed that I am having issues starting Xen virtual
machines using bcache + LVM.
What is interesting is that the QEMU storage emulation in userspace is
able to access the device fine; however, the blkback kernel module, which
uses the device directly, seems to fail.
How would I go about debugging any of this?

Older versions of bcache work fine, so it's a regression as far as I can tell.

Joseph.

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846


* Re: bcache-3.2 branch
       [not found] ` <CAOzFzEh8pO37dVWoMoD+hFoUGrBoubSdktdu7SQS0UcXLcC66w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-07-09 15:57   ` Kent Overstreet
       [not found]     ` <20120709155734.GA23774-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2012-07-09 15:57 UTC (permalink / raw)
  To: Joseph Glanville; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 20, 2012 at 10:08:51PM +1000, Joseph Glanville wrote:
> Hi Kent and list,
> 
> I have pulled down the latest bcache code and have been playing around
> with it when I noticed that I am having issues starting Xen virtual
> machines using bcache + LVM.
> What is interesting is the QEMU storage emulation in userspace is able
> to access the device fine however blkback kernel module which uses the
> device directly seems to fail.
> How would I go about debugging any of this?
> 
> Older versions of bcache work fine so it's a regression as far as I can tell.

Hey, sorry for the delay - I just got back from my first sort-of
vacation in... awhile :P

I'm pretty sure I know the approximate source of the regression - I
fairly recently reworked some code in the generic block layer to handle
arbitrary size bios (which enabled some major cleanups in the bcache
code). I've chased down a few bugs with that code since then.

Got some logs for me to look at? Or did you want me to give you pointers
on debugging kernel code? :)

> 
> Joseph.
> 
> -- 
> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> Phone: 1300 56 99 52 | Mobile: 0428 754 846


* Re: bcache-3.2 branch
       [not found]     ` <20120709155734.GA23774-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-07-09 16:32       ` Joseph Glanville
       [not found]         ` <CAOzFzEjovYu4eE9E_asOBVyBhqFuvhgzJ7UFyESLY0XycAfkuA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Glanville @ 2012-07-09 16:32 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 10 July 2012 01:57, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Wed, Jun 20, 2012 at 10:08:51PM +1000, Joseph Glanville wrote:
>> Hi Kent and list,
>>
>> I have pulled down the latest bcache code and have been playing around
>> with it when I noticed that I am having issues starting Xen virtual
>> machines using bcache + LVM.
>> What is interesting is the QEMU storage emulation in userspace is able
>> to access the device fine however blkback kernel module which uses the
>> device directly seems to fail.
>> How would I go about debugging any of this?
>>
>> Older versions of bcache work fine so it's a regression as far as I can tell.
>
> Hey, sorry for the delay - I just got back from my first sort-of
> vacation in... awhile :P
>
> I'm pretty sure I know the approximate source of the regression - I
> fairly recently reworked some code in the generic block layer to handle
> arbitrary size bios (which enabled some major cleanups in the bcache
> code). I've chased down a few bugs with that code since then.
>
> Got some logs for me to look at? Or did you want me to give you pointers
> on debugging kernel code? :)

A few pointers would be great. :)

Also how do I best get it to do a really verbose log that I can use to
help you track down bugs?

>
>>
>> Joseph.
>>
>> --
>> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>> Phone: 1300 56 99 52 | Mobile: 0428 754 846

Cheers,
Joseph.

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846


* Re: bcache-3.2 branch
       [not found]         ` <CAOzFzEjovYu4eE9E_asOBVyBhqFuvhgzJ7UFyESLY0XycAfkuA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-07-09 17:07           ` Kent Overstreet
       [not found]             ` <20120709170742.GA26798-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2012-07-09 17:07 UTC (permalink / raw)
  To: Joseph Glanville; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Jul 10, 2012 at 02:32:36AM +1000, Joseph Glanville wrote:
> On 10 July 2012 01:57, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> > On Wed, Jun 20, 2012 at 10:08:51PM +1000, Joseph Glanville wrote:
> >> Hi Kent and list,
> >>
> >> I have pulled down the latest bcache code and have been playing around
> >> with it when I noticed that I am having issues starting Xen virtual
> >> machines using bcache + LVM.
> >> What is interesting is the QEMU storage emulation in userspace is able
> >> to access the device fine however blkback kernel module which uses the
> >> device directly seems to fail.
> >> How would I go about debugging any of this?
> >>
> >> Older versions of bcache work fine so it's a regression as far as I can tell.
> >
> > Hey, sorry for the delay - I just got back from my first sort-of
> > vacation in... awhile :P
> >
> > I'm pretty sure I know the approximate source of the regression - I
> > fairly recently reworked some code in the generic block layer to handle
> > arbitrary size bios (which enabled some major cleanups in the bcache
> > code). I've chased down a few bugs with that code since then.
> >
> > Got some logs for me to look at? Or did you want me to give you pointers
> > on debugging kernel code? :)
> 
> A few pointers would be great. :)

More than happy to :) I'm not sure what sort of general pointers I could
give you off the top of my head - there's no Unified Theory of
Debugging, it's just a big bag of tricks you learn to narrow things down
until you figure it out. But I'll try to tell you everything I'd do with
this bug, at least (and whatever else you find :)

Also just understanding how things work so you can figure out a root
cause from the symptom.

> 
> Also how do I best get it to do a really verbose log that I can use to
> help you track down bugs?

I think for all the bugs that have shown up in the wild so far we
haven't needed any special logging, just the normal stuff has been fine.
There's all kinds of logging and tracing and whatnot buried in there but
for the most part you don't want to bother with the non default stuff
unless you have to.

But anyways, just whatever the kernel spits out is the place to start.
If you've still got that, I'll take a look and tell you what I'd get out
of it.
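
As a rough illustration of that starting point (not something from the bcache tree): a minimal Python sketch, assuming the kernel log has already been saved to a text file, that keeps only the lines likely to matter plus a little context around them. The keyword list is a guess and should be adjusted to whatever the failing stack actually prints.

```python
import re

# Assumed keywords, not an official list: adjust to what your stack prints.
PATTERN = re.compile(r"bcache|blkback|I/O error|BUG|WARNING")


def interesting_lines(log_text, context=2):
    """Return matching lines from saved kernel log text, plus
    `context` lines before and after each match, in original order."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if PATTERN.search(line):
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    return [lines[i] for i in sorted(keep)]
```

Feeding it the contents of a saved log (for example a hypothetical dmesg.txt) and posting the result alongside the full log keeps the interesting part easy to find without throwing anything away.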

> 
> >
> >>
> >> Joseph.
> >>
> >> --
> >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> >> Phone: 1300 56 99 52 | Mobile: 0428 754 846
> 
> Cheers,
> Joseph.
> 
> -- 
> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> Phone: 1300 56 99 52 | Mobile: 0428 754 846


* bcache & kernel branch that will build together
       [not found]             ` <20120709170742.GA26798-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-07-09 18:07               ` Jason Warr
       [not found]                 ` <4FFB1DD4.7030304-/cow75dQlsI@public.gmane.org>
  2012-07-10 18:52               ` bcache-3.2 branch Joseph Glanville
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Warr @ 2012-07-09 18:07 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Dumb question coming from someone who is more of an admin than a developer.  Although I did build and run a dev lab at 
Sun in the early days of Solaris Nevada.

What kernel source version or branch will your current git tree patch to and compile with?

I have tried everything from 3.5-rc1 to Linus's current git tree.  It breaks the DRBD build.  Once I disable DRBD in the
config, the build fails at target_core_iblock.c, so I never get a full build.

So either I am doing something obviously wrong or ???

I'd love to test this out and help with debugging.  I have plenty of hardware and interest in it working well.  I can 
even provide you access to debug when I hit an issue.

What error output can I provide and can you point me to good trees?

I appreciate it and I am glad to see that someone is finally making the effort to get a real, working block cache object 
into the kernel.  I will help in any way I can.

Jason


* Re: bcache & kernel branch that will build together
       [not found]                 ` <4FFB1DD4.7030304-/cow75dQlsI@public.gmane.org>
@ 2012-07-09 18:41                   ` Kent Overstreet
       [not found]                     ` <20120709184149.GA3234-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2012-07-09 18:41 UTC (permalink / raw)
  To: Jason Warr; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Mon, Jul 09, 2012 at 01:07:16PM -0500, Jason Warr wrote:
> Dumb question coming from someone who is more of an admin than a
> developer.  Although I did build and run a dev lab at Sun in the
> early days of Solaris Nevada.
> 
> What kernel source version or branch will your current git tree patch to and compile with?
> 
> I have tried everything from 3.5rc1 to Linus's current git tree.  It
> breaks DRBD.  Once I disable DRBD in the config it fails at
> target_core_iblock.c.  So I never get a full build.
> 
> So either I am doing something obviously wrong or ???

Bah, sounds like I didn't test my code when I rebased to 3.5. Lazy me.

For the moment, if you're not actually using drbd or iscsi you could
just disable both of those in the config. Assuming I didn't break dm and
md too. heh.
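
A minimal sketch of that workaround in Python, assuming the .config has been read into a string. The symbol names below (BLK_DEV_DRBD, TARGET_CORE, TCM_IBLOCK, ISCSI_TARGET) are my assumption about which options cover DRBD and the target code around target_core_iblock.c; check them against your own config before relying on this.

```python
import re

# Assumed symbols for DRBD and the target/iSCSI code; verify in your .config.
SYMBOLS = ("BLK_DEV_DRBD", "TARGET_CORE", "TCM_IBLOCK", "ISCSI_TARGET")


def disable_options(config_text, symbols=SYMBOLS):
    """Rewrite 'CONFIG_FOO=...' lines for the given symbols into the
    '# CONFIG_FOO is not set' form; every other line is kept as-is."""
    out = []
    for line in config_text.splitlines():
        match = re.match(r"CONFIG_(\w+)=", line)
        if match and match.group(1) in symbols:
            out.append(f"# CONFIG_{match.group(1)} is not set")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

After writing the result back, running `make oldconfig` lets Kconfig sort out any options that depended on the ones just disabled.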

> 
> I'd love to test this out and help with debugging.  I have plenty of
> hardware and interest in it working well.  I can even provide you
> access to debug when I hit an issue.

Cool!

> What error output can I provide and can you point me to good trees?

You could send me your .config in case it turns out to be something
specific to your config. Ought to be something simple, though.

> I appreciate it and I am glad to see that someone is finally making
> the effort to get a real, working block cache object into the
> kernel.  I will help in any way I can.

Thanks :)


* Re: bcache & kernel branch that will build together
       [not found]                     ` <20120709184149.GA3234-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-07-09 20:16                       ` Kent Overstreet
       [not found]                         ` <CAC7rs0vkasCKYwBTrvhWg8pzvHuix1ZaBJxTLBW+5AVG_31hEQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2012-07-09 20:16 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Jason Warr, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Ok, I think I got them all fixed - the code that's up now should build for you.

On Mon, Jul 9, 2012 at 11:41 AM, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, Jul 09, 2012 at 01:07:16PM -0500, Jason Warr wrote:
>> Dumb question coming from someone who is more of an admin than a
>> developer.  Although I did build and run a dev lab at Sun in the
>> early days of Solaris Nevada.
>>
>> What kernel source version or branch will your current git tree patch to and compile with?
>>
>> I have tried everything from 3.5rc1 to Linus's current git tree.  It
>> breaks DRBD.  Once I disable DRBD in the config it fails at
>> target_core_iblock.c.  So I never get a full build.
>>
>> So either I am doing something obviously wrong or ???
>
> Bah, sounds like I didn't test my code when I rebased to 3.5. Lazy me.
>
> For the moment, if you're not actually using drbd or iscsi you could
> just disable both of those in the config. Assuming I didn't break dm and
> md too. heh.
>
>>
>> I'd love to test this out and help with debugging.  I have plenty of
>> hardware and interest in it working well.  I can even provide you
>> access to debug when I hit an issue.
>
> Cool!
>
>> What error output can I provide and can you point me to good trees?
>
> You could send me your .config in case it turns out to be something
> specific to your specific config. Ought to be something simple, though.
>
>> I appreciate it and I am glad to see that someone is finally making
>> the effort to get a real, working block cache object into the
>> kernel.  I will help in any way I can.
>
> Thanks :)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: bcache & kernel branch that will build together
       [not found]                         ` <CAC7rs0vkasCKYwBTrvhWg8pzvHuix1ZaBJxTLBW+5AVG_31hEQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-07-09 21:48                           ` Jason Warr
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Warr @ 2012-07-09 21:48 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Excellent.  That builds fine for me against 3.5-rc6.

Much appreciated.

Now to see if I can get the Fedora "patched" source to build and boot.

On 07/09/2012 03:16 PM, Kent Overstreet wrote:
> Ok, I think I got them all fixed - the code that's up now should build for you.
>
> On Mon, Jul 9, 2012 at 11:41 AM, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> On Mon, Jul 09, 2012 at 01:07:16PM -0500, Jason Warr wrote:
>>> Dumb question coming from someone who is more of an admin than a
>>> developer.  Although I did build and run a dev lab at Sun in the
>>> early days of Solaris Nevada.
>>>
>>> What kernel source version or branch will your current git tree patch to and compile with?
>>>
>>> I have tried everything from 3.5rc1 to Linus's current git tree.  It
>>> breaks DRBD.  Once I disable DRBD in the config it fails at
>>> target_core_iblock.c.  So I never get a full build.
>>>
>>> So either I am doing something obviously wrong or ???
>>
>> Bah, sounds like I didn't test my code when I rebased to 3.5. Lazy me.
>>
>> For the moment, if you're not actually using drbd or iscsi you could
>> just disable both of those in the config. Assuming I didn't break dm and
>> md too. heh.
>>
>>>
>>> I'd love to test this out and help with debugging.  I have plenty of
>>> hardware and interest in it working well.  I can even provide you
>>> access to debug when I hit an issue.
>>
>> Cool!
>>
>>> What error output can I provide and can you point me to good trees?
>>
>> You could send me your .config in case it turns out to be something
>> specific to your specific config. Ought to be something simple, though.
>>
>>> I appreciate it and I am glad to see that someone is finally making
>>> the effort to get a real, working block cache object into the
>>> kernel.  I will help in any way I can.
>>
>> Thanks :)
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: bcache-3.2 branch
       [not found]             ` <20120709170742.GA26798-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-07-09 18:07               ` bcache & kernel branch that will build together Jason Warr
@ 2012-07-10 18:52               ` Joseph Glanville
       [not found]                 ` <CAOzFzEjUy9zakyBhE5CNhcP-Dv7+hQBz3uzhKsMy5kj5GxfGwg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Joseph Glanville @ 2012-07-10 18:52 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 10 July 2012 03:07, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, Jul 10, 2012 at 02:32:36AM +1000, Joseph Glanville wrote:
>> On 10 July 2012 01:57, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Wed, Jun 20, 2012 at 10:08:51PM +1000, Joseph Glanville wrote:
>> >> Hi Kent and list,
>> >>
>> >> I have pulled down the latest bcache code and have been playing around
>> >> with it when I noticed that I am having issues starting Xen virtual
>> >> machines using bcache + LVM.
>> >> What is interesting is the QEMU storage emulation in userspace is able
>> >> to access the device fine however blkback kernel module which uses the
>> >> device directly seems to fail.
>> >> How would I go about debugging any of this?
>> >>
>> >> Older versions of bcache work fine so it's a regression as far as I can tell.
>> >
>> > Hey, sorry for the delay - I just got back from my first sort-of
>> > vacation in... awhile :P
>> >
>> > I'm pretty sure I know the approximate source of the regression - I
>> > fairly recently reworked some code in the generic block layer to handle
>> > arbitrary size bios (which enabled some major cleanups in the bcache
>> > code). I've chased down a few bugs with that code since then.
>> >
>> > Got some logs for me to look at? Or did you want me to give you pointers
>> > on debugging kernel code? :)
>>
>> A few pointers would be great. :)
>
> More than happy to :) I'm not sure what sort of general pointers I could
> give you off the top of my head - there's no Unified Theory of
> Debugging, it's just a big bag of tricks you learn to narrow things down
> until you figure it out. But I'll try to tell you everything I'd do with
> this bug, at least (and whatever else you find :)
>
> Also just understanding how things work so you can figure out a root
> cause from the symptom.
>
>>
>> Also how do I best get it to do a really verbose log that I can use to
>> help you track down bugs?
>
> I think for all the bugs that have shown up in the wild so far we
> haven't needed any special logging, just the normal stuff has been fine.
> There's all kinds of logging and tracing and whatnot buried in there but
> for the most part you don't want to bother with the non default stuff
> unless you have to.
>
> But anyways, just whatever the kernel spits out is the place to start.
> If you've still got that, I'll take a look and tell you what I'd get out
> of it.

Unfortunately the kernel wasn't talking much; I didn't see anything
unusual and everything else seemed to work fine. :(
I was able to successfully use bcached LVM volumes with filesystems
too; it only became an issue when trying to use them as block devices
for virtual machines.
From the virtual machine all I could see were I/O errors, probably
caused by the xen_blkback module returning failed reads.
Debugging that beast is not all that fun, but I will see how I go
setting up a test system sometime this week with the latest bcache
code.
We are pretty entrenched in 3.2, but would it be more useful if I
carried out testing on later kernels instead, or is 3.2 fine?

>
>>
>> >
>> >>
>> >> Joseph.
>> >>
>> >> --
>> >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>> >> Phone: 1300 56 99 52 | Mobile: 0428 754 846
>>
>> Cheers,
>> Joseph.
>>
>> --
>> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>> Phone: 1300 56 99 52 | Mobile: 0428 754 846

Joseph.

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846


* Re: bcache-3.2 branch
       [not found]                 ` <CAOzFzEjUy9zakyBhE5CNhcP-Dv7+hQBz3uzhKsMy5kj5GxfGwg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-07-13  9:01                   ` Kent Overstreet
       [not found]                     ` <CAH+dOxKJs5WmHk2hb+6fLCs=fCK_6mezPueaewaHwSn3jZ2m0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2012-07-13  9:01 UTC (permalink / raw)
  To: Joseph Glanville; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Argh, weird.

That kinda sounds like it'd be a massive pain for me to reproduce too...

So you're only seeing errors with Xen, correct?

Probably have to figure out either what xen_blkback is doing differently
from everything else (in which case we should be able to reproduce the
errors without it) or track down where in the io stack the errors are
coming from.

Neither sounds very appealing :/ I've had to chase bugs that showed up
like that before, the io stack is big and messy.

If you can get a test system set up though I can try and help narrow it down.

Something that would be really useful for narrowing it down is finding
out whether LVM is required - i.e. whether xen_blkback + bcache on a
partition works.
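
One way to compare the two setups from userspace first: a minimal Python sketch that reads a run of chunks from a device and reports the first error it hits. This is only an assumption about what might be useful; it does not exercise the same path as xen_blkback, which submits its I/O from inside the kernel, and the device paths you pass in (the bcache device, the LVM volume on top of it) are whatever your system uses.

```python
import errno
import mmap
import os


def probe_reads(path, chunk_size=4096, max_chunks=256, direct=False):
    """Read up to max_chunks chunks sequentially from path.
    Returns (bytes_read, error_name), with error_name None on success."""
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux only: bypass the page cache
    fd = os.open(path, flags)
    # Anonymous mapping: page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, chunk_size)
    total = 0
    try:
        for _ in range(max_chunks):
            n = os.readv(fd, [buf])
            if n == 0:  # end of device or file
                break
            total += n
        return total, None
    except OSError as exc:
        return total, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        buf.close()
        os.close(fd)
```

Running it with direct=True against both the bare bcache device and the LVM volume, and comparing the results, would at least show whether unbuffered reads behave differently between the two.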

3.2 should be fine for debugging this (I'm keeping it up to date, and
running it on my workstation at work).

On Tue, Jul 10, 2012 at 11:52 AM, Joseph Glanville
<joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org> wrote:
> On 10 July 2012 03:07, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> On Tue, Jul 10, 2012 at 02:32:36AM +1000, Joseph Glanville wrote:
>>> On 10 July 2012 01:57, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>> > On Wed, Jun 20, 2012 at 10:08:51PM +1000, Joseph Glanville wrote:
>>> >> Hi Kent and list,
>>> >>
>>> >> I have pulled down the latest bcache code and have been playing around
>>> >> with it when I noticed that I am having issues starting Xen virtual
>>> >> machines using bcache + LVM.
>>> >> What is interesting is the QEMU storage emulation in userspace is able
>>> >> to access the device fine however blkback kernel module which uses the
>>> >> device directly seems to fail.
>>> >> How would I go about debugging any of this?
>>> >>
>>> >> Older versions of bcache work fine so it's a regression as far as I can tell.
>>> >
>>> > Hey, sorry for the delay - I just got back from my first sort-of
>>> > vacation in... awhile :P
>>> >
>>> > I'm pretty sure I know the approximate source of the regression - I
>>> > fairly recently reworked some code in the generic block layer to handle
>>> > arbitrary size bios (which enabled some major cleanups in the bcache
>>> > code). I've chased down a few bugs with that code since then.
>>> >
>>> > Got some logs for me to look at? Or did you want me to give you pointers
>>> > on debugging kernel code? :)
>>>
>>> A few pointers would be great. :)
>>
>> More than happy to :) I'm not sure what sort of general pointers I could
>> give you off the top of my head - there's no Unified Theory of
>> Debugging, it's just a big bag of tricks you learn to narrow things down
>> until you figure it out. But I'll try to tell you everything I'd do with
>> this bug, at least (and whatever else you find :)
>>
>> Also just understanding how things work so you can figure out a root
>> cause from the symptom.
>>
>>>
>>> Also how do I best get it to do a really verbose log that I can use to
>>> help you track down bugs?
>>
>> I think for all the bugs that have shown up in the wild so far we
>> haven't needed any special logging, just the normal stuff has been fine.
>> There's all kinds of logging and tracing and whatnot buried in there but
>> for the most part you don't want to bother with the non default stuff
>> unless you have to.
>>
>> But anyways, just whatever the kernel spits out is the place to start.
>> If you've still got that, I'll take a look and tell you what I'd get out
>> of it.
>
> Unfortunately the kernel wasn't talking much; I didn't see anything
> unusual and everything else seemed to work fine. :(
> I was able to successfully use bcached LVM volumes with filesystems
> too; it only became an issue when trying to use them as block devices
> for virtual machines.
> From the virtual machine all I could see were I/O errors, probably
> caused by the xen_blkback module returning failed reads.
> Debugging that beast is not all that fun, but I will see how I go
> setting up a test system sometime this week with the latest bcache
> code.
> We are pretty entrenched in 3.2, but would it be more useful if I
> carried out testing on later kernels instead, or is 3.2 fine?
>
>>
>>>
>>> >
>>> >>
>>> >> Joseph.
>>> >>
>>> >> --
>>> >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>>> >> Phone: 1300 56 99 52 | Mobile: 0428 754 846
>>>
>>> Cheers,
>>> Joseph.
>>>
>>> --
>>> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>>> Phone: 1300 56 99 52 | Mobile: 0428 754 846
>
> Joseph.
>
> --
> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> Phone: 1300 56 99 52 | Mobile: 0428 754 846


* Re: bcache-3.2 branch
       [not found]                     ` <CAH+dOxKJs5WmHk2hb+6fLCs=fCK_6mezPueaewaHwSn3jZ2m0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-07-13 21:10                       ` Joseph Glanville
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Glanville @ 2012-07-13 21:10 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 13 July 2012 19:01, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> Argh, weird.
>
> That kinda sounds like it'd be a massive pain for me to reproduce too...
>
> So you're only seeing errors with Xen, correct?

Yes, it seems fine under other workloads. I will try dropping LVM out
of it and see how that goes.

>
> Probably have to figure out either what xen_blkback is doing different
> from everything else (in which case we should be able to reproduce the
> errors without it) or track down where in the io stack the errors are
> coming from.
>
> Neither sound very appealing :/ I've had to chase bugs that showed up
> like that before, the io stack is big and messy.
>
> If you can get a test system set up though I can try and help narrow it down.

For sure, I should have something running on Monday to play with it some more.

>
> Something that would be really useful for narrowing it down is finding
> out whether LVM is required - i.e. whether xen_blkback + bcache on a
> partition works.
>
> 3.2 should be fine for debugging this (I'm keeping it up to date, and
> running it on my workstation at work).

3.2 is a good target for a stable version, most major distributions
are heavily invested in 3.2 at this point.

>
> On Tue, Jul 10, 2012 at 11:52 AM, Joseph Glanville
> <joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org> wrote:
>> On 10 July 2012 03:07, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>> On Tue, Jul 10, 2012 at 02:32:36AM +1000, Joseph Glanville wrote:
>>>> On 10 July 2012 01:57, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>>> > On Wed, Jun 20, 2012 at 10:08:51PM +1000, Joseph Glanville wrote:
>>>> >> Hi Kent and list,
>>>> >>
>>>> >> I have pulled down the latest bcache code and have been playing around
>>>> >> with it when I noticed that I am having issues starting Xen virtual
>>>> >> machines using bcache + LVM.
>>>> >> What is interesting is the QEMU storage emulation in userspace is able
>>>> >> to access the device fine however blkback kernel module which uses the
>>>> >> device directly seems to fail.
>>>> >> How would I go about debugging any of this?
>>>> >>
>>>> >> Older versions of bcache work fine so it's a regression as far as I can tell.
>>>> >
>>>> > Hey, sorry for the delay - I just got back from my first sort-of
>>>> > vacation in... awhile :P
>>>> >
>>>> > I'm pretty sure I know the approximate source of the regression - I
>>>> > fairly recently reworked some code in the generic block layer to handle
>>>> > arbitrary size bios (which enabled some major cleanups in the bcache
>>>> > code). I've chased down a few bugs with that code since then.
>>>> >
>>>> > Got some logs for me to look at? Or did you want me to give you pointers
>>>> > on debugging kernel code? :)
>>>>
>>>> A few pointers would be great. :)
>>>
>>> More than happy to :) I'm not sure what sort of general pointers I could
>>> give you off the top of my head - there's no Unified Theory of
>>> Debugging, it's just a big bag of tricks you learn to narrow things down
>>> until you figure it out. But I'll try to tell you everything I'd do with
>>> this bug, at least (and whatever else you find :)
>>>
>>> Also just understanding how things work so you can figure out a root
>>> cause from the symptom.
>>>
>>>>
>>>> Also how do I best get it to do a really verbose log that I can use to
>>>> help you track down bugs?
>>>
>>> I think for all the bugs that have shown up in the wild so far we
>>> haven't needed any special logging, just the normal stuff has been fine.
>>> There's all kinds of logging and tracing and whatnot buried in there but
>>> for the most part you don't want to bother with the non default stuff
>>> unless you have to.
>>>
>>> But anyways, just whatever the kernel spits out is the place to start.
>>> If you've still got that, I'll take a look and tell you what I'd get out
>>> of it.
>>
>> Unfortunately the kernel wasn't talking much; I didn't see anything
>> unusual and everything else seemed to work fine. :(
>> I was able to successfully use bcached LVM volumes with filesystems
>> too; it only became an issue when trying to use them as block devices
>> for virtual machines.
>> From the virtual machine all I could see were I/O errors, probably
>> caused by the xen_blkback module returning failed reads.
>> Debugging that beast is not all that fun, but I will see how I go
>> setting up a test system sometime this week with the latest bcache
>> code.
>> We are pretty entrenched in 3.2, but would it be more useful if I
>> carried out testing on later kernels instead, or is 3.2 fine?
>>
>>>
>>>>
>>>> >
>>>> >>
>>>> >> Joseph.
>>>> >>
>>>> >> --
>>>> >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>>>> >> Phone: 1300 56 99 52 | Mobile: 0428 754 846
>>>>
>>>> Cheers,
>>>> Joseph.
>>>>
>>>> --
>>>> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>>>> Phone: 1300 56 99 52 | Mobile: 0428 754 846
>>
>> Joseph.
>>
>> --
>> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
>> Phone: 1300 56 99 52 | Mobile: 0428 754 846



-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846


