* BUG: ext3 corruption in domU
@ 2013-04-16 17:39 Anthony Sheetz
  2013-04-17 13:00 ` Ian Campbell
  2013-05-06 12:46 ` Anthony Sheetz
  0 siblings, 2 replies; 25+ messages in thread
From: Anthony Sheetz @ 2013-04-16 17:39 UTC (permalink / raw)
  To: xen-devel

(re-sending, first message seems to have gotten lost)

I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.

First, I'm happy to provide more information about this bug as
requested. I recognize not all relevant data has
been collected yet.

Detailed information about this bug can be found at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124.

The executive summary is: Using Debian Testing (7.0, wheezy) dom0 with
LVM and full disk encryption with
Debian Stable (6.0, Squeeze) domU, transferring large files via scp or
rsync over openswan results in data corruption, with
eventual file system corruption. The culprit appears to be full disk
encryption, however that evidence may not be conclusive.

While I don't mind providing additional information, I'd hate to have
to repeat the information I've provided to the Debian bug hunting
folks.

Thanks in advance for any help you can provide.


* Re: BUG: ext3 corruption in domU
  2013-04-16 17:39 BUG: ext3 corruption in domU Anthony Sheetz
@ 2013-04-17 13:00 ` Ian Campbell
  2013-04-22 12:22   ` Anthony Sheetz
  2013-05-24 17:48   ` Roger Pau Monné
  2013-05-06 12:46 ` Anthony Sheetz
  1 sibling, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2013-04-17 13:00 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: Konrad Rzeszutek Wilk, Roger Pau Monne, xen-devel

On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
> (re-sending, first message seems to have gotten lost)
> 
> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.

I'm here too (different hat ;-)), thanks for posting it here. I've added
some people who know about the block stuff to the CC.

Guys, my suspicion is that the issue is that barriers issued by ext3
inside the guest aren't making it all the way down the
ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
filesystem to eventually corrupt itself.

The issue seems to relate to the use of dm-crypt since
ext3->blkfront->blkback->lvm->disk is reported to work fine.

However there is no problem with the local dom0 ext3 root filesystem
which is also in the same lvm VG on the crypt device (i.e.
ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
something is up at the blkfront->back link which means the barriers
which blkback is injecting into the block subsystem either don't make it
to the dm-crypt layer or do not DTRT once they arrive.

I'm not really sure how to proceed (or how to ask Anthony to
proceed) with verifying any part of that hypothesis though.

ISTR issues with old vs new style barriers or barriers with no data in
them or something, could this be related to that? (or am I thinking of
DISCARD?)

The issue was initially reported with Squeeze (Jeremy 2.6.32 tree) domU
on a Wheezy (mainline 3.2) dom0 but IIRC has also been repeated with
Wheezy on Wheezy now so this isn't cross version confusion about barrier
semantics AFAICT.

Ian.


* Re: BUG: ext3 corruption in domU
  2013-04-17 13:00 ` Ian Campbell
@ 2013-04-22 12:22   ` Anthony Sheetz
  2013-04-22 12:26     ` Ian Campbell
  2013-05-24 17:48   ` Roger Pau Monné
  1 sibling, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-04-22 12:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Konrad Rzeszutek Wilk, Roger Pau Monne, xen-devel

I realize folks are pretty busy, but we're still interested in getting
this problem solved, and I want to be sure it's not lost in the
shuffle.
Any chance of getting some attention for it?


* Re: BUG: ext3 corruption in domU
  2013-04-22 12:22   ` Anthony Sheetz
@ 2013-04-22 12:26     ` Ian Campbell
  2013-05-22 20:10       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2013-04-22 12:26 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: Konrad Rzeszutek Wilk, Roger Pau Monne, xen-devel

Konrad is on vacation this week, so it'll probably be next week before
this gets looked at by him.

Ian.


* Re: BUG: ext3 corruption in domU
  2013-04-16 17:39 BUG: ext3 corruption in domU Anthony Sheetz
  2013-04-17 13:00 ` Ian Campbell
@ 2013-05-06 12:46 ` Anthony Sheetz
  1 sibling, 0 replies; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-06 12:46 UTC (permalink / raw)
  To: xen-devel

I would once again like to request help with a bug in Xen. Repeating
message from April 16th:

First, I'm happy to provide more information about this bug as
requested. I recognize not all relevant data has
been collected yet.

Detailed information about this bug can be found at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124.

The executive summary is: Using Debian Testing (7.0, wheezy) dom0 with
LVM and full disk encryption with
Debian Stable (6.0, Squeeze) domU, transferring large files via scp or
rsync over openswan results in data corruption, with
eventual file system corruption. The culprit appears to be full disk
encryption, however that evidence may not be conclusive.

While I don't mind providing additional information, I'd hate to have
to repeat the information I've provided to the Debian bug hunting
folks.

Thanks in advance for any help you can provide.


* Re: BUG: ext3 corruption in domU
  2013-04-22 12:26     ` Ian Campbell
@ 2013-05-22 20:10       ` Konrad Rzeszutek Wilk
  2013-05-23 18:19         ` Anthony Sheetz
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-22 20:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Anthony Sheetz, Roger Pau Monne, xen-devel

On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
> Konrad is on vacation this week, so it'll probably be next week before
> this gets looked at by him.

And I finally got to this email in my 'vacation-mbox'
> > > ISTR issues with old vs new style barriers or barriers with no data in
> > > them or something, could this be related to that? (or am I thinking of
> > > DISCARD?)

You are using two different kernel versions. The 2.6.32 domU is only using
WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:

commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Mon Oct 10 00:42:22 2011 -0400

    xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.


which emulates the barrier request by draining all of the outstanding I/Os and then
sending the WRITE_FLUSH.

But it looks like you are hitting an issue here. Just to make sure 
that is the case, what happens if you use the _same_ kernel in both dom0 and
domU? Does it work then?


* Re: BUG: ext3 corruption in domU
  2013-05-22 20:10       ` Konrad Rzeszutek Wilk
@ 2013-05-23 18:19         ` Anthony Sheetz
  2013-05-24 14:20           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-23 18:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Roger Pau Monne, Ian Campbell, xen-devel

On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> You are using two different kernel versions. The 2.6.32 domU is only using
> WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
> The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
>
> commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date:   Mon Oct 10 00:42:22 2011 -0400
>
>     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
>
>
> which emulates the barrier request by draining all of the outstanding I/Os and then
> sending the WRITE_FLUSH.
>
> But it looks like you are hitting an issue here. Just to make sure
> that is the case, what happens if you use the _same_ kernel in both dom0 and
> domU? Does it work then?
>

First, thank you so much for getting back to me, it's really appreciated.
At this point I've forgotten if I did this with Wheezy on Wheezy, and
what the result was.
I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
get back to you. I should be able to do that early next week.


* Re: BUG: ext3 corruption in domU
  2013-05-23 18:19         ` Anthony Sheetz
@ 2013-05-24 14:20           ` Konrad Rzeszutek Wilk
  2013-05-28 14:27             ` Anthony Sheetz
  2013-05-29 11:53             ` Anthony Sheetz
  0 siblings, 2 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-24 14:20 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

On Thu, May 23, 2013 at 02:19:50PM -0400, Anthony Sheetz wrote:
> On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > You are using two different kernel versions. The 2.6.32 domU is only using
> > WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
> > The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
> >
> > commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
> > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Date:   Mon Oct 10 00:42:22 2011 -0400
> >
> >     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
> >
> >
> > which emulates the barrier request by draining all of the outstanding I/Os and then
> > sending the WRITE_FLUSH.
> >
> > But it looks like you are hitting an issue here. Just to make sure
> > that is the case, what happens if you use the _same_ kernel in both dom0 and
> > domU? Does it work then?
> >
> 
> First, thank you so much for getting back to me, it's really appreciated.
> At this point I've forgotten if I did this with Wheezy on Wheezy, and
> what the result was.
> I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
> get back to you. I should be able to do that early next week.

Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
output from dom0? And the 'dmesg' output from the guest (or at least
the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
the frontend/backend have the right negotiation parameters.
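
For reference, a minimal sketch of pulling the negotiation keys out of that
output. It assumes the flags of interest are the 'feature-barrier' and
'feature-flush-cache' xenstore keys and that the listing was saved from dom0;
the helper name blk_features and the log path are hypothetical, not part of
the Xen tools.

```shell
#!/bin/sh
# Filter saved `xenstore-ls` output (read from stdin) down to the block
# feature flags that blkfront/blkback negotiate. Works on both the tree
# form and the full-path form (`xenstore-ls -f`).
# NOTE: blk_features is a hypothetical helper name used for illustration.
blk_features() {
    grep -E 'feature-(barrier|flush-cache)' | sort
}

# Typical use in dom0 (hypothetical log path):
#   xenstore-ls -f > /tmp/xenstore-ls.txt
#   blk_features < /tmp/xenstore-ls.txt
```

If a key is missing from the filtered listing, that alone may be worth
reporting alongside the full log.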

Have a good weekend!


* Re: BUG: ext3 corruption in domU
  2013-04-17 13:00 ` Ian Campbell
  2013-04-22 12:22   ` Anthony Sheetz
@ 2013-05-24 17:48   ` Roger Pau Monné
  2013-05-28 12:10     ` Anthony Sheetz
  1 sibling, 1 reply; 25+ messages in thread
From: Roger Pau Monné @ 2013-05-24 17:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Anthony Sheetz, Konrad Rzeszutek Wilk, xen-devel

Hello,

I've been trying to reproduce this issue, but so far I haven't been able
to. I guess I'm missing something, so here are the steps I followed:

First, I've created a primary partition on my HDD, it's sda3, and then
I've executed the following in order to encrypt it and set up the LVM:

# cryptsetup luksFormat /dev/sda3
# cryptsetup luksOpen /dev/sda3 crypt
# pvcreate /dev/mapper/crypt
# vgcreate crypt /dev/mapper/crypt
# lvcreate -L 20G crypt -n debian

That gives me a block device /dev/crypt/debian, that I'm attaching to a
Debian DomU as xvdb, I've created a partition to fill the whole disk and
formatted it inside the guest using mkfs.ext3.

Then, inside the guest, I've scp'ed a 10G file from a remote host, and
checked the checksum, everything OK. So far, I've tested with a Dom0
kernel 3.2.0-0.bpo.4-amd64 and a DomU kernel 3.2.0-0.bpo.4-amd64 and
2.6.32-5-xen-amd64, both tests were OK.
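
The checksum comparison can be sketched roughly as below. This is only an
illustration: the message does not say which checksum tool was used, so
sha256sum is an assumption, and the function name same_checksum and the
example paths are hypothetical.

```shell
#!/bin/sh
# Compare the SHA-256 digests of two files.
# Returns 0 if they match, 1 if they differ, 2 if either file is unreadable.
# NOTE: same_checksum is a hypothetical helper; sha256sum is assumed here,
# any strong checksum tool would serve the same purpose.
same_checksum() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(sha256sum < "$1" | cut -d' ' -f1)
    b=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Example inside the guest (hypothetical paths): compare a reference copy
# with the file that was scp'ed onto the ext3 filesystem on xvdb.
#   same_checksum /mnt/reference/big.img /mnt/xvdb1/big.img && echo OK
```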

Regards, Roger.


* Re: BUG: ext3 corruption in domU
  2013-05-24 17:48   ` Roger Pau Monné
@ 2013-05-28 12:10     ` Anthony Sheetz
  2013-05-28 12:14       ` Roger Pau Monné
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-28 12:10 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Konrad Rzeszutek Wilk, Ian Campbell, xen-devel

Missed a reply-all...

I would guess the difference is I am using LVM with full disk
encryption. Take a look at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124 for the
details on exactly how I am able to recreate this bug.
In other words, I used the installer and chose the option to use full
disk encryption and LVM.
I'll shortly be starting the rest of the testing and data collection
that was requested.

On Fri, May 24, 2013 at 1:48 PM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On 17/04/13 15:00, Ian Campbell wrote:
>> On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
>>> (re-sending, first message seems to have gotten lost)
>>>
>>> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
>>
>> I'm here too (different hat ;-)), thanks for posting it here. I've added
>> some people who know about the block stuff to the CC.
>>
>> Guys, my suspicion is that the issue is that barriers issued by ext3
>> inside the guest aren't making it all the way down the
>> ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
>> filesystem to eventually corrupt itself.
>>
>> The issue seems to relate to the use of dm-crypt since
>> ext3->blkfront->blkback->lvm->disk is reported work fine.
>>
>> However there is no problem with the local dom0 ext3 root filesystem
>> which is also in the same lvm VG on the crypt device (i.e.
>> ext3->lvm->dm-crypt->disk), so its not purely a dm-crypt issue. I figure
>> something is up at the blkfront->back link which causes the barriers
>> which blkback is injecting into the block subsystem either don't make it
>> to the dm-crypt layer or do not DTRT once they arrive.
>>
>> I'm not really sure with how to proceed (or how to ask Anthony to
>> proceed) with verifying any part of that hypothesis though.
>>
>> ISTR issues with old vs new style barriers or barriers with no data in
>> them or something, could this be related to that? (or am I thinking of
>> DISCARD?)
>>
>> The issue was initially reported with Squeeze (Jeremy 2.6.32 tree) domU
>> on a Wheezy (mainline 3.2) dom0 but IIRC has also been repeated with
>> Wheezy on Wheezy now so this isn't cross version confusion about barrier
>> semantics AFAICT.
>
> Hello,
>
> I've been trying to reproduce this issue, but so far I haven't been able
> to. I guess I'm missing something, so here are the steps I followed:
>
> First, I've created a primary partition in my HDD, it's sda3, and then
> I've executed the following in order to encrypt it and setup the lvm:
>
> # cryptsetup luksFormat /dev/sda3
> # cryptsetup luksOpen /dev/sda3 crypt
> # pvcreate /dev/mapper/crypt
> # vgcreate crypt /dev/mapper/crypt
> # lvcreate -L 20G crypt -n debian
>
> That gives me a block device /dev/crypt/debian, that I'm attaching to a
> Debian DomU as xvdb, I've created a partition to fill the whole disk and
> formatted it inside the guest using mkfs.ext3.
>
> Then, inside the guest, I've scp'ed a 10G file from a remote host, and
> checked the checksum, everything OK. So far, I've tested with a Dom0
> kernel 3.2.0-0.bpo.4-amd64 and a DomU kernel 3.2.0-0.bpo.4-amd64 and
> 2.6.32-5-xen-amd64, both tests were OK.
>
> Regards, Roger.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 12:10     ` Anthony Sheetz
@ 2013-05-28 12:14       ` Roger Pau Monné
  2013-05-28 18:15         ` Anthony Sheetz
  0 siblings, 1 reply; 25+ messages in thread
From: Roger Pau Monné @ 2013-05-28 12:14 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: Konrad Rzeszutek Wilk, Ian Campbell, xen-devel

On 28/05/13 14:10, Anthony Sheetz wrote:
> Missed a reply-all...
> 
> I would guess the difference is I am using LVM with full disk
> encryption. Take a look at
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124 for the
> details on exactly how I am able to recreate this bug.
> In other words, I use the installer and choose the option to use full
> disk encryption and LVM.
> I'll shortly be starting the rest of the testing and data collection
> that was requested.

I would like to avoid reinstalling my whole OS, and I don't have a spare
HDD, so is there any way I can reproduce the full disk encryption
using a partition?

> 
> On Fri, May 24, 2013 at 1:48 PM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On 17/04/13 15:00, Ian Campbell wrote:
>>> On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
>>>> (re-sending, first message seems to have gotten lost)
>>>>
>>>> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
>>>
>>> I'm here too (different hat ;-)), thanks for posting it here. I've added
>>> some people who know about the block stuff to the CC.
>>>
>>> Guys, my suspicion is that the issue is that barriers issued by ext3
>>> inside the guest aren't making it all the way down the
>>> ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
>>> filesystem to eventually corrupt itself.
>>>
>>> The issue seems to relate to the use of dm-crypt since
>>> ext3->blkfront->blkback->lvm->disk is reported to work fine.
>>>
>>> However there is no problem with the local dom0 ext3 root filesystem
>>> which is also in the same lvm VG on the crypt device (i.e.
>>> ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
>>> something is up at the blkfront->back link, such that the barriers
>>> which blkback is injecting into the block subsystem either don't make it
>>> to the dm-crypt layer or do not DTRT once they arrive.
>>>
>>> I'm not really sure with how to proceed (or how to ask Anthony to
>>> proceed) with verifying any part of that hypothesis though.
>>>
>>> ISTR issues with old vs new style barriers or barriers with no data in
>>> them or something, could this be related to that? (or am I thinking of
>>> DISCARD?)
>>>
>>> The issue was initially reported with Squeeze (Jeremy 2.6.32 tree) domU
>>> on a Wheezy (mainline 3.2) dom0 but IIRC has also been repeated with
>>> Wheezy on Wheezy now so this isn't cross version confusion about barrier
>>> semantics AFAICT.
>>
>> Hello,
>>
>> I've been trying to reproduce this issue, but so far I haven't been able
>> to. I guess I'm missing something, so here are the steps I followed:
>>
>> First, I've created a primary partition in my HDD, it's sda3, and then
>> I've executed the following in order to encrypt it and setup the lvm:
>>
>> # cryptsetup luksFormat /dev/sda3
>> # cryptsetup luksOpen /dev/sda3 crypt
>> # pvcreate /dev/mapper/crypt
>> # vgcreate crypt /dev/mapper/crypt
>> # lvcreate -L 20G crypt -n debian
>>
>> That gives me a block device /dev/crypt/debian, that I'm attaching to a
>> Debian DomU as xvdb, I've created a partition to fill the whole disk and
>> formatted it inside the guest using mkfs.ext3.
>>
>> Then, inside the guest, I've scp'ed a 10G file from a remote host, and
>> checked the checksum, everything OK. So far, I've tested with a Dom0
>> kernel 3.2.0-0.bpo.4-amd64 and a DomU kernel 3.2.0-0.bpo.4-amd64 and
>> 2.6.32-5-xen-amd64, both tests were OK.
>>
>> Regards, Roger.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-24 14:20           ` Konrad Rzeszutek Wilk
@ 2013-05-28 14:27             ` Anthony Sheetz
  2013-05-28 18:02               ` Anthony Sheetz
  2013-05-29 11:53             ` Anthony Sheetz
  1 sibling, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-28 14:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

[-- Attachment #1: Type: text/plain, Size: 513 bytes --]

> Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
> output from dom0? And the 'dmesg' output from the guest (or at least
> the 'xl console <guest> | tee /tmp/log' output)? That would give me an idea if
> the frontend/backend have the right negotiation parameters.

Attached are the output of xenstore-ls from dom0 and the dmesg from a domU
with kernel 2.6.32-5-xen-amd64.
I will be working on putting a 3.2 kernel in place next, testing file
transfer, and adding the output of dmesg from that.
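
For anyone comparing guest logs across kernels, here is a minimal sketch of
pulling the write-ordering mode that blkfront reports out of a saved dmesg
dump. It assumes the dump is available as plain text and relies only on the
"blkfront: <device>: ..." line format seen in the guest logs in this thread;
the function name is illustrative and not part of any Xen tool.

```python
import re

# Matches lines such as:
#   [    0.297294] blkfront: xvda2: barriers enabled
# and captures the device name and whatever mode text follows it.
BLKFRONT_RE = re.compile(r"blkfront: (?P<dev>\w+): (?P<mode>.+?)\s*$")


def blkfront_modes(dmesg_text):
    """Return {device: mode string} for every blkfront line in a dmesg dump."""
    modes = {}
    for line in dmesg_text.splitlines():
        match = BLKFRONT_RE.search(line)
        if match:
            modes[match.group("dev")] = match.group("mode")
    return modes


# Two lines copied from the 2.6.32 guest log, plus one unrelated line.
sample = (
    "[    0.280530]   alloc kstat_irqs on node -1\n"
    "[    0.297294] blkfront: xvda2: barriers enabled\n"
    "[    0.298091] blkfront: xvda1: barriers enabled\n"
)
print(blkfront_modes(sample))
```

Running the same function over logs from two different guest kernels makes it
easy to see whether the frontend reported the same mode for each device.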

[-- Attachment #2: xenstore-ls.txt --]
[-- Type: text/plain, Size: 1412 bytes --]

tool = ""
 xenstored = ""
local = ""
 domain = ""
  0 = ""
   vm = "/vm/00000000-0000-0000-0000-000000000000"
   device = ""
   control = ""
    platform-feature-multiprocessor-suspend = "1"
   error = ""
   memory = ""
    target = "7552132"
   guest = ""
   hvmpv = ""
   data = ""
   cpu = ""
    1 = ""
     availability = "online"
    3 = ""
     availability = "online"
    2 = ""
     availability = "online"
    7 = ""
     availability = "online"
    0 = ""
     availability = "online"
    5 = ""
     availability = "online"
    6 = ""
     availability = "online"
    4 = ""
     availability = "online"
   description = ""
   console = ""
    limit = "1048576"
    type = "xenconsoled"
   domid = "0"
   name = "Domain-0"
 pool = ""
  0 = ""
   other_config = ""
   description = "Pool-0"
   uuid = "fc972176-828b-d7c6-669d-fe2e1911d99b"
   name = "Pool-0"
vm = ""
 00000000-0000-0000-0000-000000000000 = ""
  on_xend_stop = "ignore"
  pool_name = "Pool-0"
  shadow_memory = "0"
  uuid = "00000000-0000-0000-0000-000000000000"
  on_reboot = "restart"
  image = "(linux (kernel '') (superpages 0) (nomigrate 0) (tsc_mode 0))"
   ostype = "linux"
   kernel = ""
   cmdline = ""
   ramdisk = ""
  on_poweroff = "destroy"
  bootloader_args = ""
  on_xend_start = "ignore"
  on_crash = "restart"
  xend = ""
   restart_count = "0"
  vcpus = "8"
  vcpu_avail = "255"
  bootloader = ""
  name = "Domain-0"

[-- Attachment #3: dmesg.txt --]
[-- Type: text/plain, Size: 11191 bytes --]

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-48squeeze3) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Fri May 10 11:48:05 UTC 2013
[    0.000000] Command line: root=/dev/xvda2 ro root=/dev/xvda2 ro 
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] released 0 pages of unused memory
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000060000000 (usable)
[    0.000000] DMI not present or invalid.
[    0.000000] last_pfn = 0x60000 max_arch_pfn = 0x400000000
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] init_memory_mapping: 0000000000000000-0000000060000000
[    0.000000]  0000000000 - 0060000000 page 4k
[    0.000000] kernel direct mapping tables up to 60000000 @ 100000-403000
[    0.000000] RAMDISK: 0170a000 - 030dc000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000000060000000
[    0.000000] Bootmem setup node 0 0000000000000000-0000000060000000
[    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
[    0.000000]   bootmap [0000000000010000 -  000000000001bfff] pages c
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 0060000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [00033df000 - 00033fe000]   XEN PAGETABLES ==> [00033df000 - 00033fe000]
[    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #3 [0001000000 - 00016e9b04]    TEXT DATA BSS ==> [0001000000 - 00016e9b04]
[    0.000000]   #4 [000170a000 - 00030dc000]          RAMDISK ==> [000170a000 - 00030dc000]
[    0.000000]   #5 [00030dc000 - 00033df000]   XEN START INFO ==> [00030dc000 - 00033df000]
[    0.000000]   #6 [0000100000 - 00003e1000]          PGTABLE ==> [0000100000 - 00003e1000]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x000000a0
[    0.000000]     0: 0x00000100 -> 0x00060000
[    0.000000] On node 0 totalpages: 393120
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 740 pages reserved
[    0.000000]   DMA zone: 3204 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 5320 pages used for memmap
[    0.000000]   DMA32 zone: 383800 pages, LIFO batch:31
[    0.000000] SFI: Simple Firmware Interface v0.7 http://simplefirmware.org
[    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] nr_irqs_gsi: 16
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[    0.000000] Allocating PCI resources starting at 60000000 (gap: 60000000:a0000000)
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.1.4 (preserve-AD)
[    0.000000] NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 30 pages/cpu @ffff880003434000 s90392 r8192 d24296 u122880
[    0.000000] pcpu-alloc: s90392 r8192 d24296 u122880 alloc=30*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] trying to map vcpu_info 0 at ffff88000343f020, mfn 11ae7a, offset 32
[    0.000000] cpu 0 using vcpu_info at ffff88000343f020
[    0.000000] Xen: using vcpu_info placement
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 387004
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: root=/dev/xvda2 ro root=/dev/xvda2 ro 
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Calgary: detecting Calgary via BIOS EBDA area
[    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[    0.000000] Memory: 1510808k/1572864k available (3156k kernel code, 384k absent, 61672k reserved, 2068k data, 604k init)
[    0.000000] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:4352 nr_irqs:512
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] Xen: using vcpuop timer interface
[    0.000000] installing Xen timer for CPU 0
[    0.000000]   alloc irq_desc for 511 on node -1
[    0.000000]   alloc kstat_irqs on node -1
[    0.000000] Detected 2294.848 MHz processor.
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4589.69 BogoMIPS (lpj=9179392)
[    0.004000] Security Framework initialized
[    0.004000] SELinux:  Disabled at boot.
[    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.004000] Mount-cache hash table entries: 256
[    0.004000] Initializing cgroup subsys ns
[    0.004000] Initializing cgroup subsys cpuacct
[    0.004000] Initializing cgroup subsys devices
[    0.004000] Initializing cgroup subsys freezer
[    0.004000] Initializing cgroup subsys net_cls
[    0.004000] CPU: L1 I cache: 32K, L1 D cache: 32K
[    0.004000] CPU: L2 cache: 256K
[    0.004000] CPU: L3 cache: 6144K
[    0.004000] CPU 0/0x0 -> Node 0
[    0.004000] CPU: Unsupported number of siblings 16
[    0.004000] Performance Events: unsupported p6 CPU model 58 no PMU driver, software events only.
[    0.004000] SMP alternatives: switching to UP code
[    0.028304] Freeing SMP alternatives: 28k freed
[    0.028346]   alloc irq_desc for 510 on node -1
[    0.028348]   alloc kstat_irqs on node -1
[    0.028351]   alloc irq_desc for 509 on node -1
[    0.028353]   alloc kstat_irqs on node -1
[    0.028355]   alloc irq_desc for 508 on node -1
[    0.028356]   alloc kstat_irqs on node -1
[    0.028358]   alloc irq_desc for 507 on node -1
[    0.028359]   alloc kstat_irqs on node -1
[    0.028424] Brought up 1 CPUs
[    0.028438] CPU0 attaching NULL sched-domain.
[    0.028500] devtmpfs: initialized
[    0.031065] Grant table initialized
[    0.031069] regulator: core version 0.5
[    0.031103] NET: Registered protocol family 16
[    0.031137]   alloc irq_desc for 506 on node -1
[    0.031138]   alloc kstat_irqs on node -1
[    0.031698] PCI: setting up Xen PCI frontend stub
[    0.031955] bio: create slab <bio-0> at 0
[    0.031997] ACPI: Interpreter disabled.
[    0.032001] xen_balloon: Initialising balloon driver with page order 0.
[    0.032001] vgaarb: loaded
[    0.032037] PCI: System does not support PCI
[    0.032041] PCI: System does not support PCI
[    0.032092] Switching to clocksource xen
[    0.032733] pnp: PnP ACPI: disabled
[    0.032860] NET: Registered protocol family 2
[    0.032965] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.033555] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[    0.034159] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.034273] TCP: Hash tables configured (established 262144 bind 65536)
[    0.034278] TCP reno registered
[    0.034324] NET: Registered protocol family 1
[    0.034360] Unpacking initramfs...
[    0.047134] Freeing initrd memory: 26440k freed
[    0.051991] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.052149] audit: initializing netlink socket (disabled)
[    0.052159] type=2000 audit(1369750639.017:1): initialized
[    0.055146] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.055958] VFS: Disk quotas dquot_6.5.2
[    0.055995] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.056049] msgmni has been set to 3002
[    0.056272] alg: No test for stdrng (krng)
[    0.056309] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.056315] io scheduler noop registered
[    0.056317] io scheduler anticipatory registered
[    0.056321] io scheduler deadline registered
[    0.056357] io scheduler cfq registered (default)
[    0.058677] registering netback
[    0.059412]   alloc irq_desc for 505 on node -1
[    0.059414]   alloc kstat_irqs on node -1
[    0.059548] Linux agpgart interface v0.103
[    0.059569] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.059693] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[    0.059725] PNP: No PS/2 controller found. Probing ports directly.
[    0.060548] i8042.c: No controller found.
[    0.060591] mice: PS/2 mouse device common for all mice
[    0.060641] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[    0.200165] cpuidle: using governor ladder
[    0.200170] cpuidle: using governor menu
[    0.200178] No iBFT detected.
[    0.200372] TCP cubic registered
[    0.200448] NET: Registered protocol family 10
[    0.200920] Mobile IPv6
[    0.200925] NET: Registered protocol family 17
[    0.200977] PM: Resume from disk failed.
[    0.200983] registered taskstats version 1
[    0.200993] XENBUS: Device with no driver: device/vbd/51714
[    0.200997] XENBUS: Device with no driver: device/vbd/51713
[    0.201000] XENBUS: Device with no driver: device/vif/0
[    0.201003] XENBUS: Device with no driver: device/console/0
[    0.201017] /build/buildd-linux-2.6_2.6.32-48squeeze3-amd64-mcoLgp/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    0.201042] Initalizing network drop monitor service
[    0.201095] Freeing unused kernel memory: 604k freed
[    0.201200] Write protecting the kernel read-only data: 4344k
[    0.232289] udev[46]: starting version 164
[    0.272966]   alloc irq_desc for 504 on node -1
[    0.272969]   alloc kstat_irqs on node -1
[    0.273178] Initialising Xen virtual ethernet driver.
[    0.274543]   alloc irq_desc for 503 on node -1
[    0.274545]   alloc kstat_irqs on node -1
[    0.280528]   alloc irq_desc for 502 on node -1
[    0.280530]   alloc kstat_irqs on node -1
[    0.297294] blkfront: xvda2: barriers enabled
[    0.298091] blkfront: xvda1: barriers enabled
[    0.570244] kjournald starting.  Commit interval 5 seconds
[    0.570256] EXT3-fs: mounted filesystem with ordered data mode.
[    1.548362] udev[138]: starting version 164
[    1.949084] input: PC Speaker as /devices/platform/pcspkr/input/input1
[    2.335357] Error: Driver 'pcspkr' is already registered, aborting...
[    2.473577] Adding 262136k swap on /dev/xvda1.  Priority:-1 extents:1 across:262136k SS
[    2.598794] EXT3 FS on xvda2, internal journal
[   13.308074] eth0: no IPv6 routers present

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 14:27             ` Anthony Sheetz
@ 2013-05-28 18:02               ` Anthony Sheetz
  2013-05-28 18:18                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-28 18:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

On Tue, May 28, 2013 at 10:27 AM, Anthony Sheetz <sheetzam@inspire.com> wrote:
>> Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
>> output from dom0? And the 'dmesg' output from the guest (or at least
>> the 'xl console <guest> | tee /tmp/log' output)? That would give me an idea if
>> the frontend/backend have the right negotiation parameters.
>
> Attached is the output of xenstore-ls from dom0, and dmesg from a domU
> with kernel 2.6.32-5-xen-amd64
> Will be working on putting a 3.2 kernel in place next, testing file
> transfer, and adding the output of dmesg from that.

Updated to 3.2 using
http://www.cyberciti.biz/faq/debian-linux-6-apt-get-install-linux-kernel-3-2/
for instructions.
During transfer of data I saw this: BUG: scheduling while atomic:
kworker/0:2/10421/0x10000002
The transfer test resulted in a file which did not match its md5sum. Attached
is the dmesg output from the domU.
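
For reference, the integrity check used in this kind of transfer test can be
sketched as below. This is only an illustration of comparing the MD5 digest of
the source file with that of the received copy; the function names and the
1 MiB chunk size are assumptions, and running md5sum on both hosts gives the
same information.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

A mismatch only shows that the copy differs from the source; it does not say
where in the network or block path the data was damaged.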

[-- Attachment #2: dmesg.32.txt --]
[-- Type: text/plain, Size: 10391 bytes --]

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.2.0-0.bpo.4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Debian 3.2.41-2+deb7u2~bpo60+1
[    0.000000] Command line:  root=/dev/xvda2 ro 
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] Released 0 pages of unused memory
[    0.000000] Set 0 page(s) to 1-1 mapping
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000060800000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x60800 max_arch_pfn = 0x400000000
[    0.000000] initial memory mapped : 0 - 03639000
[    0.000000] Base memory trampoline at [ffff88000009b000] 9b000 size 20480
[    0.000000] init_memory_mapping: 0000000000000000-0000000060800000
[    0.000000]  0000000000 - 0060800000 page 4k
[    0.000000] kernel direct mapping tables up to 60800000 @ cf9000-1000000
[    0.000000] xen: setting RW the range fdc000 - 1000000
[    0.000000] RAMDISK: 01949000 - 03639000
[    0.000000] NUMA turned off
[    0.000000] Faking a node at 0000000000000000-0000000060800000
[    0.000000] Initmem setup node 0 0000000000000000-0000000060800000
[    0.000000]   NODE_DATA [000000005fffb000 - 000000005fffffff]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   empty
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x000000a0
[    0.000000]     0: 0x00000100 -> 0x00060800
[    0.000000] On node 0 totalpages: 395152
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 744 pages reserved
[    0.000000]   DMA zone: 3184 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 5348 pages used for memmap
[    0.000000]   DMA32 zone: 385820 pages, LIFO batch:31
[    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] APIC: switched to apic NOOP
[    0.000000] nr_irqs_gsi: 16
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[    0.000000] Allocating PCI resources starting at 60800000 (gap: 60800000:9f800000)
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.1.4 (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88005fc00000 s82880 r8192 d23616 u2097152
[    0.000000] pcpu-alloc: s82880 r8192 d23616 u2097152 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 389004
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line:  root=/dev/xvda2 ro 
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Calgary: detecting Calgary via BIOS EBDA area
[    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[    0.000000] Memory: 1504508k/1581056k available (3531k kernel code, 448k absent, 76100k reserved, 3208k data, 616k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] NR_IRQS:33024 nr_irqs:256 16
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] Xen: using vcpuop timer interface
[    0.000000] installing Xen timer for CPU 0
[    0.000000] Detected 2294.848 MHz processor.
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4589.69 BogoMIPS (lpj=9179392)
[    0.004000] pid_max: default: 32768 minimum: 301
[    0.004000] Security Framework initialized
[    0.004000] AppArmor: AppArmor disabled by boot time parameter
[    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.004000] Mount-cache hash table entries: 256
[    0.004000] Initializing cgroup subsys cpuacct
[    0.004000] Initializing cgroup subsys memory
[    0.004000] Initializing cgroup subsys devices
[    0.004000] Initializing cgroup subsys freezer
[    0.004000] Initializing cgroup subsys net_cls
[    0.004000] Initializing cgroup subsys blkio
[    0.004000] Initializing cgroup subsys perf_event
[    0.004000] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.004000] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 0
[    0.004000] SMP alternatives: switching to UP code
[    0.029088] Freeing SMP alternatives: 16k freed
[    0.029163] Performance Events: unsupported p6 CPU model 58 no PMU driver, software events only.
[    0.029293] NMI watchdog disabled (cpu0): hardware events not enabled
[    0.029318] Brought up 1 CPUs
[    0.029448] devtmpfs: initialized
[    0.032173] Grant table initialized
[    0.032244] print_constraints: dummy: 
[    0.032305] NET: Registered protocol family 16
[    0.032510] PCI: setting up Xen PCI frontend stub
[    0.032517] PCI: pci_cache_line_size set to 64 bytes
[    0.033015] bio: create slab <bio-0> at 0
[    0.033078] ACPI: Interpreter disabled.
[    0.033098] xen/balloon: Initialising balloon driver.
[    0.033098] xen-balloon: Initialising balloon driver.
[    0.033098] vgaarb: loaded
[    0.033098] PCI: System does not support PCI
[    0.033098] PCI: System does not support PCI
[    0.033098] Switching to clocksource xen
[    0.033194] pnp: PnP ACPI: disabled
[    0.034979] PCI: max bus depth: 0 pci_try_num: 1
[    0.035010] NET: Registered protocol family 2
[    0.035175] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.036322] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[    0.037073] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.037188] TCP: Hash tables configured (established 262144 bind 65536)
[    0.037193] TCP reno registered
[    0.037207] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[    0.037225] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[    0.037284] NET: Registered protocol family 1
[    0.037292] PCI: CLS 0 bytes, default 64
[    0.037327] Unpacking initramfs...
[    0.061808] Freeing initrd memory: 29632k freed
[    0.067281] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.067460] audit: initializing netlink socket (disabled)
[    0.067471] type=2000 audit(1369752979.409:1): initialized
[    0.080739] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.080910] VFS: Disk quotas dquot_6.5.2
[    0.080931] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.080980] msgmni has been set to 2996
[    0.081099] alg: No test for stdrng (krng)
[    0.081120] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.081126] io scheduler noop registered
[    0.081129] io scheduler deadline registered
[    0.081140] io scheduler cfq registered (default)
[    0.081183] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    0.081202] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    0.081206] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.197788] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.198048] Linux agpgart interface v0.103
[    0.198133] i8042: PNP: No PS/2 controller found. Probing ports directly.
[    1.200733] i8042: No controller found
[    1.200830] mousedev: PS/2 mouse device common for all mice
[    1.240666] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[    1.240721] rtc_cmos: probe of rtc_cmos failed with error -38
[    1.240885] TCP cubic registered
[    1.240933] NET: Registered protocol family 10
[    1.241267] Mobile IPv6
[    1.241274] NET: Registered protocol family 17
[    1.241283] Registering the dns_resolver key type
[    1.241388] PM: Hibernation image not present or could not be loaded.
[    1.241395] registered taskstats version 1
[    1.241410] XENBUS: Device with no driver: device/vbd/51714
[    1.241416] XENBUS: Device with no driver: device/vbd/51713
[    1.241420] XENBUS: Device with no driver: device/vif/0
[    1.241425] XENBUS: Device with no driver: device/console/0
[    1.241442] /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    1.241476] Initializing network drop monitor service
[    1.241791] Freeing unused kernel memory: 616k freed
[    1.241910] Write protecting the kernel read-only data: 6144k
[    1.244660] Freeing unused kernel memory: 548k freed
[    1.245050] Freeing unused kernel memory: 708k freed
[    1.276238] udev[45]: starting version 164
[    1.312147] Initialising Xen virtual ethernet driver.
[    1.327497] blkfront: xvda2: flush diskcache: enabled
[    1.331984] blkfront: xvda1: flush diskcache: enabled
[    1.667213] kjournald starting.  Commit interval 5 seconds
[    1.667240] EXT3-fs (xvda2): mounted filesystem with ordered data mode
[    2.738037] udev[140]: starting version 164
[    3.172340] input: PC Speaker as /devices/platform/pcspkr/input/input0
[    3.296421] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[    3.660850] Error: Driver 'pcspkr' is already registered, aborting...
[    3.965481] Adding 262140k swap on /dev/xvda1.  Priority:-1 extents:1 across:262140k SS
[    4.075322] EXT3-fs (xvda2): using internal journal
[    5.839480] sshd (534): /proc/534/oom_adj is deprecated, please use /proc/534/oom_score_adj instead.
[   15.408035] eth0: no IPv6 routers present

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 12:14       ` Roger Pau Monné
@ 2013-05-28 18:15         ` Anthony Sheetz
  2013-05-29  8:39           ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-28 18:15 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Konrad Rzeszutek Wilk, Ian Campbell, xen-devel

>> I would guess the difference is I am using LVM with full disk
>> encryption. Take a look at
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124 for the
>> details on exactly how I am able to recreate this bug.
>> In other words, I use the installer and choose the option to use full
>> disk encryption and LVM.
>> I'll shortly be starting the rest of the testing and data collection
>> that was requested.
>
> I would like to avoid reinstalling my whole OS, and I don't have a spare
> HDD, so is there any way I can reproduce the full disk encryption
> using a partition?

As my colleague points out, your setup differs from mine in one respect:
here a single encrypted device is in use by both dom0 and domU. Without
having your dom0 on the same encrypted device as your domU (even though
they use different logical volumes) I'm not sure how to test it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 18:02               ` Anthony Sheetz
@ 2013-05-28 18:18                 ` Konrad Rzeszutek Wilk
  2013-05-28 18:19                   ` Anthony Sheetz
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-28 18:18 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

On Tue, May 28, 2013 at 02:02:41PM -0400, Anthony Sheetz wrote:
> On Tue, May 28, 2013 at 10:27 AM, Anthony Sheetz <sheetzam@inspire.com> wrote:
> >> Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
> >> output from dom0? And the 'dmesg' output from the guest (or at least
> >> the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
> >> the frontend/backend have the right negotiation parameters.
> >
> > Attached is the output of xenstore-ls from dom0, and dmesg from a domU
> > with kernel 2.6.32-5-xen-amd64
> > Will be working on putting a 3.2 kernel in place next, testing file
> > transfer, and adding the output of dmesg from that.
> 
> updated to 3.2 using
> http://www.cyberciti.biz/faq/debian-linux-6-apt-get-install-linux-kernel-3-2/
> for instructions.
> During transfer of data saw this: BUG: scheduling while atomic:
> kworker/0:2/10421/0x10000002

? I don't see it here?
> Transfer test resulted in a file which did not match md5sum. Attached
> is the dmesg output from the domU.

Shouldn't the BUG be present here?

> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Linux version 3.2.0-0.bpo.4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Debian 3.2.41-2+deb7u2~bpo60+1
> [    0.000000] Command line:  root=/dev/xvda2 ro 
> [    0.000000] ACPI in unprivileged domain disabled
> [    0.000000] Released 0 pages of unused memory
> [    0.000000] Set 0 page(s) to 1-1 mapping
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
> [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
> [    0.000000]  Xen: 0000000000100000 - 0000000060800000 (usable)
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] DMI not present or invalid.
> [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
> [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
> [    0.000000] No AGP bridge found
> [    0.000000] last_pfn = 0x60800 max_arch_pfn = 0x400000000
> [    0.000000] initial memory mapped : 0 - 03639000
> [    0.000000] Base memory trampoline at [ffff88000009b000] 9b000 size 20480
> [    0.000000] init_memory_mapping: 0000000000000000-0000000060800000
> [    0.000000]  0000000000 - 0060800000 page 4k
> [    0.000000] kernel direct mapping tables up to 60800000 @ cf9000-1000000
> [    0.000000] xen: setting RW the range fdc000 - 1000000
> [    0.000000] RAMDISK: 01949000 - 03639000
> [    0.000000] NUMA turned off
> [    0.000000] Faking a node at 0000000000000000-0000000060800000
> [    0.000000] Initmem setup node 0 0000000000000000-0000000060800000
> [    0.000000]   NODE_DATA [000000005fffb000 - 000000005fffffff]
> [    0.000000] Zone PFN ranges:
> [    0.000000]   DMA      0x00000010 -> 0x00001000
> [    0.000000]   DMA32    0x00001000 -> 0x00100000
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start PFN for each node
> [    0.000000] early_node_map[2] active PFN ranges
> [    0.000000]     0: 0x00000010 -> 0x000000a0
> [    0.000000]     0: 0x00000100 -> 0x00060800
> [    0.000000] On node 0 totalpages: 395152
> [    0.000000]   DMA zone: 56 pages used for memmap
> [    0.000000]   DMA zone: 744 pages reserved
> [    0.000000]   DMA zone: 3184 pages, LIFO batch:0
> [    0.000000]   DMA32 zone: 5348 pages used for memmap
> [    0.000000]   DMA32 zone: 385820 pages, LIFO batch:31
> [    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
> [    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
> [    0.000000] No local APIC present
> [    0.000000] APIC: disable apic facility
> [    0.000000] APIC: switched to apic NOOP
> [    0.000000] nr_irqs_gsi: 16
> [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
> [    0.000000] Allocating PCI resources starting at 60800000 (gap: 60800000:9f800000)
> [    0.000000] Booting paravirtualized kernel on Xen
> [    0.000000] Xen version: 4.1.4 (preserve-AD)
> [    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
> [    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88005fc00000 s82880 r8192 d23616 u2097152
> [    0.000000] pcpu-alloc: s82880 r8192 d23616 u2097152 alloc=1*2097152
> [    0.000000] pcpu-alloc: [0] 0 
> [    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 389004
> [    0.000000] Policy zone: DMA32
> [    0.000000] Kernel command line:  root=/dev/xvda2 ro 
> [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Calgary: detecting Calgary via BIOS EBDA area
> [    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
> [    0.000000] Memory: 1504508k/1581056k available (3531k kernel code, 448k absent, 76100k reserved, 3208k data, 616k init)
> [    0.000000] Hierarchical RCU implementation.
> [    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
> [    0.000000] NR_IRQS:33024 nr_irqs:256 16
> [    0.000000] Console: colour dummy device 80x25
> [    0.000000] console [tty0] enabled
> [    0.000000] console [hvc0] enabled
> [    0.000000] Xen: using vcpuop timer interface
> [    0.000000] installing Xen timer for CPU 0
> [    0.000000] Detected 2294.848 MHz processor.
> [    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4589.69 BogoMIPS (lpj=9179392)
> [    0.004000] pid_max: default: 32768 minimum: 301
> [    0.004000] Security Framework initialized
> [    0.004000] AppArmor: AppArmor disabled by boot time parameter
> [    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
> [    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
> [    0.004000] Mount-cache hash table entries: 256
> [    0.004000] Initializing cgroup subsys cpuacct
> [    0.004000] Initializing cgroup subsys memory
> [    0.004000] Initializing cgroup subsys devices
> [    0.004000] Initializing cgroup subsys freezer
> [    0.004000] Initializing cgroup subsys net_cls
> [    0.004000] Initializing cgroup subsys blkio
> [    0.004000] Initializing cgroup subsys perf_event
> [    0.004000] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> [    0.004000] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
> [    0.004000] CPU: Physical Processor ID: 0
> [    0.004000] CPU: Processor Core ID: 0
> [    0.004000] SMP alternatives: switching to UP code
> [    0.029088] Freeing SMP alternatives: 16k freed
> [    0.029163] Performance Events: unsupported p6 CPU model 58 no PMU driver, software events only.
> [    0.029293] NMI watchdog disabled (cpu0): hardware events not enabled
> [    0.029318] Brought up 1 CPUs
> [    0.029448] devtmpfs: initialized
> [    0.032173] Grant table initialized
> [    0.032244] print_constraints: dummy: 
> [    0.032305] NET: Registered protocol family 16
> [    0.032510] PCI: setting up Xen PCI frontend stub
> [    0.032517] PCI: pci_cache_line_size set to 64 bytes
> [    0.033015] bio: create slab <bio-0> at 0
> [    0.033078] ACPI: Interpreter disabled.
> [    0.033098] xen/balloon: Initialising balloon driver.
> [    0.033098] xen-balloon: Initialising balloon driver.
> [    0.033098] vgaarb: loaded
> [    0.033098] PCI: System does not support PCI
> [    0.033098] PCI: System does not support PCI
> [    0.033098] Switching to clocksource xen
> [    0.033194] pnp: PnP ACPI: disabled
> [    0.034979] PCI: max bus depth: 0 pci_try_num: 1
> [    0.035010] NET: Registered protocol family 2
> [    0.035175] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
> [    0.036322] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
> [    0.037073] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> [    0.037188] TCP: Hash tables configured (established 262144 bind 65536)
> [    0.037193] TCP reno registered
> [    0.037207] UDP hash table entries: 1024 (order: 3, 32768 bytes)
> [    0.037225] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
> [    0.037284] NET: Registered protocol family 1
> [    0.037292] PCI: CLS 0 bytes, default 64
> [    0.037327] Unpacking initramfs...
> [    0.061808] Freeing initrd memory: 29632k freed
> [    0.067281] platform rtc_cmos: registered platform RTC device (no PNP device found)
> [    0.067460] audit: initializing netlink socket (disabled)
> [    0.067471] type=2000 audit(1369752979.409:1): initialized
> [    0.080739] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [    0.080910] VFS: Disk quotas dquot_6.5.2
> [    0.080931] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> [    0.080980] msgmni has been set to 2996
> [    0.081099] alg: No test for stdrng (krng)
> [    0.081120] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
> [    0.081126] io scheduler noop registered
> [    0.081129] io scheduler deadline registered
> [    0.081140] io scheduler cfq registered (default)
> [    0.081183] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> [    0.081202] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
> [    0.081206] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [    0.197788] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    0.198048] Linux agpgart interface v0.103
> [    0.198133] i8042: PNP: No PS/2 controller found. Probing ports directly.
> [    1.200733] i8042: No controller found
> [    1.200830] mousedev: PS/2 mouse device common for all mice
> [    1.240666] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> [    1.240721] rtc_cmos: probe of rtc_cmos failed with error -38
> [    1.240885] TCP cubic registered
> [    1.240933] NET: Registered protocol family 10
> [    1.241267] Mobile IPv6
> [    1.241274] NET: Registered protocol family 17
> [    1.241283] Registering the dns_resolver key type
> [    1.241388] PM: Hibernation image not present or could not be loaded.
> [    1.241395] registered taskstats version 1
> [    1.241410] XENBUS: Device with no driver: device/vbd/51714
> [    1.241416] XENBUS: Device with no driver: device/vbd/51713
> [    1.241420] XENBUS: Device with no driver: device/vif/0
> [    1.241425] XENBUS: Device with no driver: device/console/0
> [    1.241442] /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> [    1.241476] Initializing network drop monitor service
> [    1.241791] Freeing unused kernel memory: 616k freed
> [    1.241910] Write protecting the kernel read-only data: 6144k
> [    1.244660] Freeing unused kernel memory: 548k freed
> [    1.245050] Freeing unused kernel memory: 708k freed
> [    1.276238] udev[45]: starting version 164
> [    1.312147] Initialising Xen virtual ethernet driver.
> [    1.327497] blkfront: xvda2: flush diskcache: enabled
> [    1.331984] blkfront: xvda1: flush diskcache: enabled
> [    1.667213] kjournald starting.  Commit interval 5 seconds
> [    1.667240] EXT3-fs (xvda2): mounted filesystem with ordered data mode
> [    2.738037] udev[140]: starting version 164
> [    3.172340] input: PC Speaker as /devices/platform/pcspkr/input/input0
> [    3.296421] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
> [    3.660850] Error: Driver 'pcspkr' is already registered, aborting...
> [    3.965481] Adding 262140k swap on /dev/xvda1.  Priority:-1 extents:1 across:262140k SS
> [    4.075322] EXT3-fs (xvda2): using internal journal
> [    5.839480] sshd (534): /proc/534/oom_adj is deprecated, please use /proc/534/oom_score_adj instead.
> [   15.408035] eth0: no IPv6 routers present

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 18:18                 ` Konrad Rzeszutek Wilk
@ 2013-05-28 18:19                   ` Anthony Sheetz
  2013-05-29 15:15                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-28 18:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

I'd have thought so as well. It's possible that was console output
from dom0, come to think of it.

On Tue, May 28, 2013 at 2:18 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, May 28, 2013 at 02:02:41PM -0400, Anthony Sheetz wrote:
>> On Tue, May 28, 2013 at 10:27 AM, Anthony Sheetz <sheetzam@inspire.com> wrote:
>> >> Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
>> >> output from dom0? And the 'dmesg' output from the guest (or at least
>> >> the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
>> >> the frontend/backend have the right negotiation parameters.
>> >
>> > Attached is the output of xenstore-ls from dom0, and dmesg from a domU
>> > with kernel 2.6.32-5-xen-amd64
>> > Will be working on putting a 3.2 kernel in place next, testing file
>> > transfer, and adding the output of dmesg from that.
>>
>> updated to 3.2 using
>> http://www.cyberciti.biz/faq/debian-linux-6-apt-get-install-linux-kernel-3-2/
>> for instructions.
>> During transfer of data saw this: BUG: scheduling while atomic:
>> kworker/0:2/10421/0x10000002
>
> ? I don't see it here?
>> Transfer test resulted in a file which did not match md5sum. Attached
>> is the dmesg output from the domU.
>
> Shouldn't the BUG be present here?
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 18:15         ` Anthony Sheetz
@ 2013-05-29  8:39           ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2013-05-29  8:39 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: xen-devel, Konrad Rzeszutek Wilk, Roger Pau Monné

On Tue, 2013-05-28 at 14:15 -0400, Anthony Sheetz wrote:
> >> I would guess the difference is I am using LVM with full disk
> >> encryption. Take a look at
> >> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=705124 for the
> >> details on exactly how I am able to recreate this bug.
> >> In other words, I use the installer and chose the option to use full
> >> disk encryption and LVM.
> >> I'll be starting with the rest of the testing and data collection
> >> which was requested shortly.
> >
> > I would like to avoid reinstalling my whole OS, and I don't have a spare
> > HDD, so isn't there any way I can reproduce the full disk encryption
> > using a partition?
> 
> As my colleague points out, the setup you have misses the fact that a single
> encrypted object is in use by both dom0 and domU. Without having your
> dom0 on the same encrypted device as your domU (even though they use
> different logical volumes), I'm not sure how to test it.

Perhaps you could install a second dom0 rootfs on the LVM partition and
use that for testing. This would at least avoid blowing away the
original "primary" dom0 rootfs, which I suppose is what Roger would like
to avoid.

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-24 14:20           ` Konrad Rzeszutek Wilk
  2013-05-28 14:27             ` Anthony Sheetz
@ 2013-05-29 11:53             ` Anthony Sheetz
  2013-05-30 18:36               ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-05-29 11:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

Is there anything else I can get you at this time to help troubleshoot this?

On Fri, May 24, 2013 at 10:20 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, May 23, 2013 at 02:19:50PM -0400, Anthony Sheetz wrote:
>> On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
>> >> Konrad is on vacation this week, so it'll probably be next week before
>> >> this gets looked at by him.
>> >
>> > And I finally got to this email in my 'vacation-mbox'
>> >>
>> >> Ian.
>> >>
>> >> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
>> >> > I realize folks are pretty busy, but we're still interested in getting
>> >> > this problem solved, and I want to be sure it's not lost in the
>> >> > shuffle.
>> >> > Any chance of getting some attention for it?
>> >> >
>> >> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> >> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
>> >> > >> (re-sending, first message seems to have gotten lost)
>> >> > >>
>> >> > >> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
>> >> > >
>> >> > > I'm here too (different hat ;-)), thanks for posting it here. I've added
>> >> > > some people who know about the block stuff to the CC.
>> >> > >
>> >> > > Guys, my suspicion is that the issue is that barriers issued by ext3
>> >> > > inside the guest aren't making it all the way down the
>> >> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
>> >> > > filesystem to eventually corrupt itself.
>> >> > >
>> >> > > The issue seems to relate to the use of dm-crypt since
>> >> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
>> >> > >
>> >> > > However there is no problem with the local dom0 ext3 root filesystem
>> >> > > which is also in the same lvm VG on the crypt device (i.e.
>> >> > > ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
>> >> > > something is up at the blkfront->back link which means the barriers
>> >> > > which blkback is injecting into the block subsystem either don't make it
>> >> > > to the dm-crypt layer or do not DTRT once they arrive.
>> >> > >
>> >> > > I'm not really sure how to proceed (or how to ask Anthony to
>> >> > > proceed) with verifying any part of that hypothesis though.
>> >> > >
>> >> > > ISTR issues with old vs new style barriers or barriers with no data in
>> >> > > them or something, could this be related to that? (or am I thinking of
>> >> > > DISCARD?)
>> >
>> > You are using two different kernel versions. The 2.6.32 domU is only using
>> > WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
>> > The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
>> > commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
>> > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > Date:   Mon Oct 10 00:42:22 2011 -0400
>> >
>> >     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
>> >
>> >
>> > which emulates the barrier request by draining all of the outstanding I/Os and then
>> > sending the WRITE_FLUSH.
>> >
>> > But it looks like you are hitting an issue here. Just to make sure
>> > that is the case, what happens if you use the _same_ kernel in both dom0 and
>> > domU? Does it work then?
>> >
>>
>> First, thank you so much for getting back to me, it's really appreciated.
>> At this point I've forgotten if I did this with Wheezy on Wheezy, and
>> what the result was.
>> I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
>> get back to you. I should be able to do that early next week.
>
> Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
> output from dom0? And the 'dmesg' output from the guest (or at least
> the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
> the frontend/backend have the right negotiation parameters.
>
> Have a good weekend!
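
For anyone following along, the negotiation check requested above could be
sketched roughly as below. This is only an illustration, not a tool from this
thread: it assumes the usual indented `key = "value"` layout printed by
'xenstore-ls' and looks for the 'feature-barrier' and 'feature-flush-cache'
keys. The function name block_features, the SAMPLE text and the domain id 1
in it are made up for the example; the vbd ids 51714 and 51713 are the ones
from the dmesg output.

```python
import re

# Keys that describe barrier/flush support for a virtual block device.
FEATURES = ("feature-barrier", "feature-flush-cache")

# One xenstore-ls line: leading indentation, key, then a quoted value.
LINE = re.compile(r'^(\s*)(\S+) = "(.*)"\s*$')


def block_features(xenstore_ls_output):
    """Return {full_path: value} for barrier/flush keys in xenstore-ls text."""
    stack = []   # (indent, key) pairs for the branch currently being walked
    found = {}
    for line in xenstore_ls_output.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        indent, key, value = len(m.group(1)), m.group(2), m.group(3)
        # Leave any sibling or deeper entries before descending again.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        if key in FEATURES:
            found["/" + "/".join(k for _, k in stack)] = value
    return found


# Hypothetical sample, shaped like xenstore-ls output (not taken from the
# attachments in this thread).
SAMPLE = '''local = ""
 domain = ""
  0 = ""
   backend = ""
    vbd = ""
     1 = ""
      51714 = ""
       feature-barrier = "1"
       feature-flush-cache = "1"
      51713 = ""
       feature-barrier = "1"
'''

if __name__ == "__main__":
    for path, value in sorted(block_features(SAMPLE).items()):
        print(path, "=", value)
```

To use it on a real system, save the output of 'xenstore-ls' from dom0 to a
file and pass the file contents to block_features().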

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-28 18:19                   ` Anthony Sheetz
@ 2013-05-29 15:15                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-29 15:15 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: Roger Pau Monne, Ian Campbell, xen-devel

On Tue, May 28, 2013 at 02:19:17PM -0400, Anthony Sheetz wrote:
> I'd have thought so as well. It's possible that was console output
> from dom0, come to think of it.


OK, any chance you could capture that? Some questions below:

> 
> On Tue, May 28, 2013 at 2:18 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Tue, May 28, 2013 at 02:02:41PM -0400, Anthony Sheetz wrote:
> >> On Tue, May 28, 2013 at 10:27 AM, Anthony Sheetz <sheetzam@inspire.com> wrote:
> >> >> Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
> >> >> output from dom0? And the 'dmesg' output from the guest (or at least
> >> >> the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
> >> >> the frontend/backend have the right negotiation parameters.
> >> >
> >> > Attached is the output of xenstore-ls from dom0, and dmesg from a domU
> >> > with kernel 2.6.32-5-xen-amd64
> >> > Will be working on putting a 3.2 kernel in place next, testing file
> >> > transfer, and adding the output of dmesg from that.
> >>
> >> updated to 3.2 using
> >> http://www.cyberciti.biz/faq/debian-linux-6-apt-get-install-linux-kernel-3-2/
> >> for instructions.
> >> During transfer of data saw this: BUG: scheduling while atomic:
> >> kworker/0:2/10421/0x10000002
> >
> > ? I don't see it here?
> >> Transfer test resulted in a file which did not match md5sum. Attached
> >> is the dmesg output from the domU.

So the transfer you are speaking of: what exactly is it?
Are you using 'scp' to a disk in the guest? Can you describe to me how
your disk in the guest is set up? When you do the 'md5sum', do you
do it after you have dropped the cache?

Is the storage on an USB stick/disk?
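
To illustrate the cache question: a checksum taken right after a transfer may be served from the guest's page cache and so may not reflect what actually reached the disk. The usual system-wide approach is to run `sync` and then `echo 3 > /proc/sys/vm/drop_caches` as root in the guest before running md5sum. A per-file sketch of the same idea is below. The function name is hypothetical, and `posix_fadvise` is only an advisory hint, so this does not guarantee the pages are evicted, nor does it touch any cache below the guest.

```python
import hashlib
import os


def md5_from_disk(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of path after asking the kernel to drop its cached pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # Write back dirty pages first; clean pages are the ones that can be dropped.
            os.fsync(fd)
        except OSError:
            pass  # some platforms refuse fsync on a read-only descriptor
        if hasattr(os, "posix_fadvise"):
            # Advisory only: offset 0 with length 0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        digest = hashlib.md5()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
        return digest.hexdigest()
    finally:
        os.close(fd)
```

Comparing this value in the guest against the md5sum of the source file on the sending host shows whether the data on disk, rather than the cached copy, is what differs.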

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-29 11:53             ` Anthony Sheetz
@ 2013-05-30 18:36               ` Konrad Rzeszutek Wilk
  2013-06-04 12:55                 ` Anthony Sheetz
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-30 18:36 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: Roger Pau Monne, Ian Campbell, xen-devel

On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
> Is there anything else I can get you at this time to help troubleshoot this?

Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
the ext3 maintainer would not backport the fix. It was a
bug that caused corruption.

If I could just remember the email thread about it. 
> 
> On Fri, May 24, 2013 at 10:20 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Thu, May 23, 2013 at 02:19:50PM -0400, Anthony Sheetz wrote:
> >> On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >> > On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
> >> >> Konrad is on vacation this week, so it'll probably be next week before
> >> >> this gets looked at by him.
> >> >
> >> > And I finally got to this email in my 'vacation-mbox'
> >> >>
> >> >> Ian.
> >> >>
> >> >> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
> >> >> > I realize folks are pretty busy, but we're still interested in getting
> >> >> > this problem solved, and I want to be sure it's not lost in the
> >> >> > shuffle.
> >> >> > Any chance of getting some attention for it?
> >> >> >
> >> >> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >> >> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
> >> >> > >> (re-sending, first message seems to have gotten lost)
> >> >> > >>
> >> >> > >> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
> >> >> > >
> >> >> > > I'm here too (different hat ;-)), thanks for posting it here. I've added
> >> >> > > some people who know about the block stuff to the CC.
> >> >> > >
> >> >> > > Guys, my suspicion is that the issue is that barriers issued by ext3
> >> >> > > inside the guest aren't making it all the way down the
> >> >> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
> >> >> > > filesystem to eventually corrupt itself.
> >> >> > >
> >> >> > > The issue seems to relate to the use of dm-crypt since
> >> >> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
> >> >> > >
> >> >> > > However there is no problem with the local dom0 ext3 root filesystem
> >> >> > > which is also in the same lvm VG on the crypt device (i.e.
> >> >> > > ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
> >> >> > > something is up at the blkfront->back link such that the barriers
> >> >> > > which blkback is injecting into the block subsystem either don't make it
> >> >> > > to the dm-crypt layer or do not DTRT once they arrive.
> >> >> > >
> >> >> > > I'm not really sure how to proceed (or how to ask Anthony to
> >> >> > > proceed) with verifying any part of that hypothesis though.
> >> >> > >
> >> >> > > ISTR issues with old vs new style barriers or barriers with no data in
> >> >> > > them or something, could this be related to that? (or am I thinking of
> >> >> > > DISCARD?)
> >> >
> >> > You are using two different kernel versions. The 2.6.32 domU is only using
> >> > WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
> >> > The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
> >> > commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
> >> > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> > Date:   Mon Oct 10 00:42:22 2011 -0400
> >> >
> >> >     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
> >> >
> >> >
> >> > which emulates the barrier request by draining all of the outstanding I/Os and then
> >> > sending the WRITE_FLUSH.
> >> >
> >> > But it looks like you are hitting an issue here. Just to make sure
> >> > that is the case, what happens if you use the _same_ kernel in both dom0 and
> >> > domU? Does it work then?
> >> >
> >>
> >> First, thank you so much for getting back to me, it's really appreciated.
> >> At this point I've forgotten if I did this with Wheezy on Wheezy, and
> >> what the result was.
> >> I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
> >> get back to you. I should be able to do that early next week.
> >
> > Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
> > output from dom0? And the 'dmesg' output from the guest (or at least
> > the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
> > the frontend/backend have the right negotiation parameters.
> >
> > Have a good weekend!
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 
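
The barrier emulation described in the quoted text above (drain all outstanding I/Os, then send a WRITE_FLUSH) can be pictured with a toy model. This is not the blkback code, only a sketch of the ordering under the assumption that a WRITE_FLUSH means "flush the volatile cache, then perform the write". The class and method names are invented for illustration.

```python
class ToyBackend:
    """Toy model of a block backend with a volatile write cache (not kernel code)."""

    def __init__(self):
        self.in_flight = []  # writes submitted but not yet completed
        self.cache = {}      # completed writes still in the volatile cache
        self.stable = {}     # data that would survive a power loss
        self.log = []        # order of the steps taken for barrier requests

    def submit_write(self, sector, data):
        # An ordinary write: queued, with no ordering guarantee yet.
        self.in_flight.append((sector, data))

    def _drain(self):
        # Wait for every outstanding write to complete (it lands in the cache).
        for sector, data in self.in_flight:
            self.cache[sector] = data
        self.in_flight.clear()
        self.log.append("drain")

    def _flush(self):
        # Push everything in the volatile cache down to stable storage.
        self.stable.update(self.cache)
        self.cache.clear()
        self.log.append("flush")

    def submit_barrier(self, sector, data):
        # Old-style barrier emulated as: drain, then flush, then the write itself.
        self._drain()
        self._flush()
        self.cache[sector] = data
        self.log.append("write")
```

In this model, everything submitted before the barrier is on stable storage by the time the barrier's own write is issued. If the flush never reaches a lower layer, that guarantee is lost even though each individual write still completes.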

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-05-30 18:36               ` Konrad Rzeszutek Wilk
@ 2013-06-04 12:55                 ` Anthony Sheetz
  2013-06-04 13:41                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-06-04 12:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Roger Pau Monne, Ian Campbell, xen-devel

On Thu, May 30, 2013 at 2:36 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
>> Is there anything else I can get you at this time to help troubleshoot this?
>
> Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
> the ext3 maintainer would not backport the fix. It was a
> bug that caused corruption.
>
> If I could just remember the email thread about it.
>>
>> On Fri, May 24, 2013 at 10:20 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Thu, May 23, 2013 at 02:19:50PM -0400, Anthony Sheetz wrote:
>> >> On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
>> >> <konrad.wilk@oracle.com> wrote:
>> >> > On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
>> >> >> Konrad is on vacation this week, so it'll probably be next week before
>> >> >> this gets looked at by him.
>> >> >
>> >> > And I finally got to this email in my 'vacation-mbox'
>> >> >>
>> >> >> Ian.
>> >> >>
>> >> >> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
>> >> >> > I realize folks are pretty busy, but we're still interested in getting
>> >> >> > this problem solved, and I want to be sure it's not lost in the
>> >> >> > shuffle.
>> >> >> > Any chance of getting some attention for it?
>> >> >> >
>> >> >> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> >> >> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
>> >> >> > >> (re-sending, first message seems to have gotten lost)
>> >> >> > >>
>> >> >> > >> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
>> >> >> > >
>> >> >> > > I'm here too (different hat ;-)), thanks for posting it here. I've added
>> >> >> > > some people who know about the block stuff to the CC.
>> >> >> > >
>> >> >> > > Guys, my suspicion is that the issue is that barriers issued by ext3
>> >> >> > > inside the guest aren't making it all the way down the
>> >> >> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
>> >> >> > > filesystem to eventually corrupt itself.
>> >> >> > >
>> >> >> > > The issue seems to relate to the use of dm-crypt since
>> >> >> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
>> >> >> > >
>> >> >> > > However there is no problem with the local dom0 ext3 root filesystem
>> >> >> > > which is also in the same lvm VG on the crypt device (i.e.
>> >> >> > > ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
>> >> >> > > something is up at the blkfront->back link such that the barriers
>> >> >> > > which blkback is injecting into the block subsystem either don't make it
>> >> >> > > to the dm-crypt layer or do not DTRT once they arrive.
>> >> >> > >
>> >> >> > > I'm not really sure how to proceed (or how to ask Anthony to
>> >> >> > > proceed) with verifying any part of that hypothesis though.
>> >> >> > >
>> >> >> > > ISTR issues with old vs new style barriers or barriers with no data in
>> >> >> > > them or something, could this be related to that? (or am I thinking of
>> >> >> > > DISCARD?)
>> >> >
>> >> > You are using two different kernel versions. The 2.6.32 domU is only using
>> >> > WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
>> >> > The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
>> >> > commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
>> >> > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> > Date:   Mon Oct 10 00:42:22 2011 -0400
>> >> >
>> >> >     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
>> >> >
>> >> >
>> >> > which emulates the barrier request by draining all of the outstanding I/Os and then
>> >> > sending the WRITE_FLUSH.
>> >> >
>> >> > But it looks like you are hitting an issue here. Just to make sure
>> >> > that is the case, what happens if you use the _same_ kernel in both dom0 and
>> >> > domU? Does it work then?
>> >> >
>> >>
>> >> First, thank you so much for getting back to me, it's really appreciated.
>> >> At this point I've forgotten if I did this with Wheezy on Wheezy, and
>> >> what the result was.
>> >> I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
>> >> get back to you. I should be able to do that early next week.
>> >
>> > Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
>> > output from dom0? And the 'dmesg' output from the guest (or at least
>> > the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
>> > the frontend/backend have the right negotiation parameters.
>> >
>> > Have a good weekend!
>>

Is there anything I can do at this point to help with this bug?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-06-04 12:55                 ` Anthony Sheetz
@ 2013-06-04 13:41                   ` Konrad Rzeszutek Wilk
  2013-06-07 17:10                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-04 13:41 UTC (permalink / raw)
  To: Anthony Sheetz, Teck Choon Giam; +Cc: Roger Pau Monne, Ian Campbell, xen-devel

On Tue, Jun 04, 2013 at 08:55:26AM -0400, Anthony Sheetz wrote:
> On Thu, May 30, 2013 at 2:36 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
> >> Is there anything else I can get you at this time to help troubleshoot this?
> >
> > Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
> > the ext3 maintainer would not backport the fix. It was a
> > bug that caused corruption.
> >
> > If I could just remember the email thread about it.

Can't recall it, but maybe Teck can?

> >>
> >> On Fri, May 24, 2013 at 10:20 AM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >> > On Thu, May 23, 2013 at 02:19:50PM -0400, Anthony Sheetz wrote:
> >> >> On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
> >> >> <konrad.wilk@oracle.com> wrote:
> >> >> > On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
> >> >> >> Konrad is on vacation this week, so it'll probably be next week before
> >> >> >> this gets looked at by him.
> >> >> >
> >> >> > And I finally got to this email in my 'vacation-mbox'
> >> >> >>
> >> >> >> Ian.
> >> >> >>
> >> >> >> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
> >> >> >> > I realize folks are pretty busy, but we're still interested in getting
> >> >> >> > this problem solved, and I want to be sure it's not lost in the
> >> >> >> > shuffle.
> >> >> >> > Any chance of getting some attention for it?
> >> >> >> >
> >> >> >> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >> >> >> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
> >> >> >> > >> (re-sending, first message seems to have gotten lost)
> >> >> >> > >>
> >> >> >> > >> I was referred here by Ian Campbell ijc@hellion.org.uk from bugs.debian.org.
> >> >> >> > >
> >> >> >> > > I'm here too (different hat ;-)), thanks for posting it here. I've added
> >> >> >> > > some people who know about the block stuff to the CC.
> >> >> >> > >
> >> >> >> > > Guys, my suspicion is that the issue is that barriers issued by ext3
> >> >> >> > > inside the guest aren't making it all the way down the
> >> >> >> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain leading the
> >> >> >> > > filesystem to eventually corrupt itself.
> >> >> >> > >
> >> >> >> > > The issue seems to relate to the use of dm-crypt since
> >> >> >> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
> >> >> >> > >
> >> >> >> > > However there is no problem with the local dom0 ext3 root filesystem
> >> >> >> > > which is also in the same lvm VG on the crypt device (i.e.
> >> >> >> > > ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt issue. I figure
> >> >> >> > > something is up at the blkfront->back link such that the barriers
> >> >> >> > > which blkback is injecting into the block subsystem either don't make it
> >> >> >> > > to the dm-crypt layer or do not DTRT once they arrive.
> >> >> >> > >
> >> >> >> > > I'm not really sure how to proceed (or how to ask Anthony to
> >> >> >> > > proceed) with verifying any part of that hypothesis though.
> >> >> >> > >
> >> >> >> > > ISTR issues with old vs new style barriers or barriers with no data in
> >> >> >> > > them or something, could this be related to that? (or am I thinking of
> >> >> >> > > DISCARD?)
> >> >> >
> >> >> > You are using two different kernel versions. The 2.6.32 domU is only using
> >> >> > WRITE_BARRIERs, while in the 3.2 kernels those have been completely eliminated.
> >> >> > The mechanism they use is called 'WRITE_FLUSH'. The 3.2 kernel has a patch:
> >> >> > commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
> >> >> > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> >> > Date:   Mon Oct 10 00:42:22 2011 -0400
> >> >> >
> >> >> >     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
> >> >> >
> >> >> >
> >> >> > which emulates the barrier request by draining all of the outstanding I/Os and then
> >> >> > sending the WRITE_FLUSH.
> >> >> >
> >> >> > But it looks like you are hitting an issue here. Just to make sure
> >> >> > that is the case, what happens if you use the _same_ kernel in both dom0 and
> >> >> > domU? Does it work then?
> >> >> >
> >> >>
> >> >> First, thank you so much for getting back to me, it's really appreciated.
> >> >> At this point I've forgotten if I did this with Wheezy on Wheezy, and
> >> >> what the result was.
> >> >> I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
> >> >> get back to you. I should be able to do that early next week.
> >> >
> >> > Thank you. Also when you do this test, could you also provide the 'xenstore-ls'
> >> > output from dom0? And the 'dmesg' output from the guest (or at least
> >> > the 'xl console <guest> | tee /tmp/log')? That would give me an idea if
> >> > the frontend/backend have the right negotiation parameters.
> >> >
> >> > Have a good weekend!
> >>
> 
> Is there anything I can do at this point to help with this bug?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-06-04 13:41                   ` Konrad Rzeszutek Wilk
@ 2013-06-07 17:10                     ` Konrad Rzeszutek Wilk
  2013-06-07 18:43                       ` Anthony Sheetz
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-07 17:10 UTC (permalink / raw)
  To: Anthony Sheetz, Teck Choon Giam; +Cc: xen-devel, Ian Campbell, Roger Pau Monne

On Tue, Jun 04, 2013 at 09:41:10AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 04, 2013 at 08:55:26AM -0400, Anthony Sheetz wrote:
> > On Thu, May 30, 2013 at 2:36 PM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> > > On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
> > >> Is there anything else I can get you at this time to help troubleshoot this?
> > >
> > > Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
> > > the ext3 maintainer would not backport the fix. It was a
> > > bug that caused corruption.
> > >
> > > If I could just remember the email thread about it.
> 
> Can't recall it, but maybe Teck can?


He doesn't seem to respond.

Anthony, I have this on my queue to look at, so I will get to it.
Sadly that is not going to happen this week :-(

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-06-07 17:10                     ` Konrad Rzeszutek Wilk
@ 2013-06-07 18:43                       ` Anthony Sheetz
  2013-07-02 18:10                         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Anthony Sheetz @ 2013-06-07 18:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Teck Choon Giam, Ian Campbell, Roger Pau Monne

Not a problem. Just wanted to be sure we weren't a dependency. Thanks
for your attention!

On Fri, Jun 7, 2013 at 1:10 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Jun 04, 2013 at 09:41:10AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jun 04, 2013 at 08:55:26AM -0400, Anthony Sheetz wrote:
>> > On Thu, May 30, 2013 at 2:36 PM, Konrad Rzeszutek Wilk
>> > <konrad.wilk@oracle.com> wrote:
>> > > On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
>> > >> Is there anything else I can get you at this time to help troubleshoot this?
>> > >
>> > > Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
>> > > the ext3 maintainer would not backport the fix. It was a
>> > > bug that caused corruption.
>> > >
>> > > If I could just remember the email thread about it.
>>
>> Can't recall it, but maybe Teck can?
>
>
> He doesn't seem to respond.
>
> Anthony, I have this on my queue to look at, so I will get to it.
> Sadly that is not going to happen this week :-(

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: ext3 corruption in domU
  2013-06-07 18:43                       ` Anthony Sheetz
@ 2013-07-02 18:10                         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-02 18:10 UTC (permalink / raw)
  To: Anthony Sheetz; +Cc: Roger Pau Monne, Teck Choon Giam, Ian Campbell, xen-devel

On Fri, Jun 07, 2013 at 02:43:06PM -0400, Anthony Sheetz wrote:
> Not a problem. Just wanted to be sure we weren't a dependency. Thanks
> for your attention!
> 
> On Fri, Jun 7, 2013 at 1:10 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Tue, Jun 04, 2013 at 09:41:10AM -0400, Konrad Rzeszutek Wilk wrote:
> >> On Tue, Jun 04, 2013 at 08:55:26AM -0400, Anthony Sheetz wrote:
> >> > On Thu, May 30, 2013 at 2:36 PM, Konrad Rzeszutek Wilk
> >> > <konrad.wilk@oracle.com> wrote:
> >> > > On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
> >> > >> Is there anything else I can get you at this time to help troubleshoot this?
> >> > >
> >> > > Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
> >> > > the ext3 maintainer would not backport the fix. It was a
> >> > > bug that caused corruption.
> >> > >
> >> > > If I could just remember the email thread about it.
> >>
> >> Can't recall it, but maybe Teck can?
> >
> >
> > He doesn't seem to respond.
> >
> > Anthony, I have this on my queue to look at, so I will get to it.
> > Sadly that is not going to happen this week :-(

Installing a new box with Wheezy to try this out. The one thing I could
not find in the thread and in the bug was the guest config. Could you
please reply back with it? Thanks.
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-07-02 18:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-16 17:39 BUG: ext3 corruption in domU Anthony Sheetz
2013-04-17 13:00 ` Ian Campbell
2013-04-22 12:22   ` Anthony Sheetz
2013-04-22 12:26     ` Ian Campbell
2013-05-22 20:10       ` Konrad Rzeszutek Wilk
2013-05-23 18:19         ` Anthony Sheetz
2013-05-24 14:20           ` Konrad Rzeszutek Wilk
2013-05-28 14:27             ` Anthony Sheetz
2013-05-28 18:02               ` Anthony Sheetz
2013-05-28 18:18                 ` Konrad Rzeszutek Wilk
2013-05-28 18:19                   ` Anthony Sheetz
2013-05-29 15:15                     ` Konrad Rzeszutek Wilk
2013-05-29 11:53             ` Anthony Sheetz
2013-05-30 18:36               ` Konrad Rzeszutek Wilk
2013-06-04 12:55                 ` Anthony Sheetz
2013-06-04 13:41                   ` Konrad Rzeszutek Wilk
2013-06-07 17:10                     ` Konrad Rzeszutek Wilk
2013-06-07 18:43                       ` Anthony Sheetz
2013-07-02 18:10                         ` Konrad Rzeszutek Wilk
2013-05-24 17:48   ` Roger Pau Monné
2013-05-28 12:10     ` Anthony Sheetz
2013-05-28 12:14       ` Roger Pau Monné
2013-05-28 18:15         ` Anthony Sheetz
2013-05-29  8:39           ` Ian Campbell
2013-05-06 12:46 ` Anthony Sheetz
