* Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
@ 2013-03-22 19:09 Oliver Francke
  2013-03-22 19:30 ` Josh Durgin
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Francke @ 2013-03-22 19:09 UTC (permalink / raw)
  To: ceph-devel; +Cc: Josh Durgin <josh.durgin@inktank.com>

Hi Josh, all,

I did not want to hijack the thread dealing with a crashing VM, but perhaps there are some common things.

Today I installed a fresh cluster with mkcephfs (which went fine), imported a "master" Debian 6.0 image with "format 2", made a snapshot, protected it, and made some clones.
The clones were mounted with qemu-nbd, I fiddled a bit with IP/interfaces/hosts/net.rules… etc. and cleanly unmounted them; the VM started, took 2 seconds, and was up and running. Cool.
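
For reference, that setup roughly corresponds to the following commands (a sketch only; pool and image names are placeholders, and on 0.56.x the import flag may be spelled --format rather than --image-format):

  # import the master image as a format 2 (cloneable) RBD image
  rbd import --image-format 2 debian-6.0-master.img rbd/debian-master
  # snapshot the master and protect the snapshot so it can be cloned
  rbd snap create rbd/debian-master@base
  rbd snap protect rbd/debian-master@base
  # create a copy-on-write clone for a new VM
  rbd clone rbd/debian-master@base rbd/vm0001
  # attach the clone locally (needs the nbd kernel module) to adjust
  # IP/interfaces/hosts/udev rules, then detach it cleanly
  qemu-nbd --connect=/dev/nbd0 rbd:rbd/vm0001
  # ... mount the partition from /dev/nbd0, edit config files, unmount ...
  qemu-nbd --disconnect /dev/nbd0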

Then an ordinary shutdown was performed and a snapshot of this image was made. I started it again and did some "apt-get update… install s/t…".
Shutdown -> rbd rollback -> startup again -> login -> install s/t else… the filesystem showed "many" ext3 errors, fell into read-only mode, massive corruption.
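
The snapshot/rollback part of that sequence would look roughly like this (again a sketch with placeholder names, assuming the snapshot is taken on the clone itself):

  # VM shut down cleanly, snapshot the clone
  rbd snap create rbd/vm0001@pre-install
  # ... boot the VM, apt-get update / install, shut down again ...
  # roll the clone back to the snapshot, then boot it once more
  rbd snap rollback rbd/vm0001@pre-install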

The qemu config included ":rbd_cache=false", if it matters. The above scenario is reproducible and, as stated, no crash was detected.
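
For context, that option is passed as part of the qemu drive specification, something like the following (pool/image name is a placeholder):

  -drive file=rbd:rbd/vm0001:rbd_cache=false,if=virtio,format=raw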

Perhaps it is in the same area as the crash thread; otherwise I will provide logfiles as needed.

Kind regards,

Oliver.



* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
  2013-03-22 19:09 Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing Oliver Francke
@ 2013-03-22 19:30 ` Josh Durgin
       [not found]   ` <51502118.7060906@filoo.de>
  0 siblings, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2013-03-22 19:30 UTC (permalink / raw)
  To: Oliver Francke; +Cc: ceph-devel

On 03/22/2013 12:09 PM, Oliver Francke wrote:
> Hi Josh, all,
>
> I did not want to hijack the thread dealing with a crashing VM, but perhaps there are some common things.
>
> Today I installed a fresh cluster with mkcephfs, went fine, imported a "master" Debian 6.0 image with "format 2", made a snapshot, protected it, and made some clones.
> Clones mounted with qemu-nbd, fiddled a bit with IP/interfaces/hosts/net.rules…etc and cleanly unmounted, VM started, took 2 secs and the VM was up and running. Cool.
>
> Now an ordinary shutdown was performed, made a snapshot of this image. Started again, did some "apt-get update… install s/t…".
> Shutdown -> rbd rollback -> startup again -> login -> install s/t else… filesystem showed "many" ext3 errors, fell into read-only mode, massive corruption.

This sounds like it might be a bug in rollback. Could you try cloning
and snapshotting again, but export the image before booting, and after
rolling back, and compare the md5sums?
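
The suggested check would look roughly like this (a sketch; pool/image names are placeholders):

  # export the clone right after taking the snapshot, before booting it
  rbd export rbd/vm0001 before-boot.img
  # ... boot, make changes, shut down, rbd snap rollback ...
  rbd export rbd/vm0001 after-rollback.img
  # the checksums should match if the rollback restored the snapshot exactly
  md5sum before-boot.img after-rollback.img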

Running the rollback with:

--debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log

might help too. Does your ceph.conf where you ran the rollback have
anything related to rbd_cache in it?
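
Put together, the debug run would be something like (placeholder image/snapshot names; the log path is arbitrary):

  rbd --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log \
      snap rollback rbd/vm0001@pre-install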

> qemu config was with ":rbd_cache=false" if it matters. Above scenario is reproducible, and as I stated out, no crash detected.
>
> Perhaps it is in the same area as in the crash-thread, otherwise I will provide logfiles as needed.

It's unrelated; the other thread is an issue with the cache, which does
not cause corruption but triggers a crash.

Josh


* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
       [not found]   ` <51502118.7060906@filoo.de>
@ 2013-03-26  8:30     ` Josh Durgin
  2013-03-26  8:33       ` Oliver Francke
  0 siblings, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2013-03-26  8:30 UTC (permalink / raw)
  To: Oliver Francke; +Cc: ceph-devel

On 03/25/2013 03:04 AM, Oliver Francke wrote:
> Hi Josh,
>
> logfile is attached...

Thanks. It shows nothing out of the ordinary, but I just reproduced the
incorrect rollback locally, so it shouldn't be hard to track down from
here.

I opened http://tracker.ceph.com/issues/4551 to track it.

Josh

> On 03/22/2013 08:30 PM, Josh Durgin wrote:
>> On 03/22/2013 12:09 PM, Oliver Francke wrote:
>>> Hi Josh, all,
>>>
>>> I did not want to hijack the thread dealing with a crashing VM, but
>>> perhaps there are some common things.
>>>
>>> Today I installed a fresh cluster with mkcephfs, went fine, imported a
>>> "master" Debian 6.0 image with "format 2", made a snapshot, protected
>>> it, and made some clones.
>>> Clones mounted with qemu-nbd, fiddled a bit with
>>> IP/interfaces/hosts/net.rules…etc and cleanly unmounted, VM started,
>>> took 2 secs and the VM was up and running. Cool.
>>>
>>> Now an ordinary shutdown was performed, made a snapshot of this
>>> image. Started again, did some "apt-get update… install s/t…".
>>> Shutdown -> rbd rollback -> startup again -> login -> install s/t
>>> else… filesystem showed "many" ext3 errors, fell into read-only mode,
>>> massive corruption.
>>
>> This sounds like it might be a bug in rollback. Could you try cloning
>> and snapshotting again, but export the image before booting, and after
>> rolling back, and compare the md5sums?
>
> Done, first MD5-mismatch after 32 4MB blocks, checked with dd and a bs
> of 4MB.
>
>>
>> Running the rollback with:
>>
>> --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
>>
>> might help too. Does your ceph.conf where you ran the rollback have
>> anything related to rbd_cache in it?
>
> No cache settings in global ceph.conf.
>
> Hope it helps,
>
> Oliver.
>
>>
>>> qemu config was with ":rbd_cache=false" if it matters. Above scenario
>>> is reproducible, and as I stated out, no crash detected.
>>>
>>> Perhaps it is in the same area as in the crash-thread, otherwise I
>>> will provide logfiles as needed.
>>
>> It's unrelated, the other thread is an issue with the cache, which does
>> not cause corruption but triggers a crash.
>>
>> Josh
>
>



* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
  2013-03-26  8:30     ` Josh Durgin
@ 2013-03-26  8:33       ` Oliver Francke
  0 siblings, 0 replies; 5+ messages in thread
From: Oliver Francke @ 2013-03-26  8:33 UTC (permalink / raw)
  To: Josh Durgin; +Cc: ceph-devel

Hi Josh,

thanks for the quick response and...

On 03/26/2013 09:30 AM, Josh Durgin wrote:
> On 03/25/2013 03:04 AM, Oliver Francke wrote:
>> Hi Josh,
>>
>> logfile is attached...
>
> Thanks. It shows nothing out of the ordinary, but I just reproduced the
> incorrect rollback locally, so it shouldn't be hard to track down from
> here.
>
> I opened http://tracker.ceph.com/issues/4551 to track it.

the good news.

Oliver.

>
> Josh
>
>> On 03/22/2013 08:30 PM, Josh Durgin wrote:
>>> On 03/22/2013 12:09 PM, Oliver Francke wrote:
>>>> Hi Josh, all,
>>>>
>>>> I did not want to hijack the thread dealing with a crashing VM, but
>>>> perhaps there are some common things.
>>>>
>>>> Today I installed a fresh cluster with mkcephfs, went fine, imported a
>>>> "master" Debian 6.0 image with "format 2", made a snapshot, protected
>>>> it, and made some clones.
>>>> Clones mounted with qemu-nbd, fiddled a bit with
>>>> IP/interfaces/hosts/net.rules…etc and cleanly unmounted, VM started,
>>>> took 2 secs and the VM was up and running. Cool.
>>>>
>>>> Now an ordinary shutdown was performed, made a snapshot of this
>>>> image. Started again, did some "apt-get update… install s/t…".
>>>> Shutdown -> rbd rollback -> startup again -> login -> install s/t
>>>> else… filesystem showed "many" ext3 errors, fell into read-only mode,
>>>> massive corruption.
>>>
>>> This sounds like it might be a bug in rollback. Could you try cloning
>>> and snapshotting again, but export the image before booting, and after
>>> rolling back, and compare the md5sums?
>>
>> Done, first MD5-mismatch after 32 4MB blocks, checked with dd and a bs
>> of 4MB.
>>
>>>
>>> Running the rollback with:
>>>
>>> --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
>>>
>>> might help too. Does your ceph.conf where you ran the rollback have
>>> anything related to rbd_cache in it?
>>
>> No cache settings in global ceph.conf.
>>
>> Hope it helps,
>>
>> Oliver.
>>
>>>
>>>> qemu config was with ":rbd_cache=false" if it matters. Above scenario
>>>> is reproducible, and as I stated out, no crash detected.
>>>>
>>>> Perhaps it is in the same area as in the crash-thread, otherwise I
>>>> will provide logfiles as needed.
>>>
>>> It's unrelated, the other thread is an issue with the cache, which does
>>> not cause corruption but triggers a crash.
>>>
>>> Josh
>>
>>
>


-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing Directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh




end of thread, other threads:[~2013-03-26  8:33 UTC | newest]

Thread overview: 5+ messages
2013-03-22 19:09 Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing Oliver Francke
2013-03-22 19:30 ` Josh Durgin
     [not found]   ` <51502118.7060906@filoo.de>
2013-03-26  8:30     ` Josh Durgin
2013-03-26  8:33       ` Oliver Francke
