* Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
@ 2013-03-22 19:09 Oliver Francke
2013-03-22 19:30 ` Josh Durgin
0 siblings, 1 reply; 5+ messages in thread
From: Oliver Francke @ 2013-03-22 19:09 UTC (permalink / raw)
To: ceph-devel; +Cc: Josh Durgin <josh.durgin@inktank.com>
Hi Josh, all,
I did not want to hijack the thread dealing with a crashing VM, but perhaps there are some common things.
Today I installed a fresh cluster with mkcephfs, which went fine, imported a "master" Debian 6.0 image with "format 2", made a snapshot, protected it, and made some clones.
The clones were mounted with qemu-nbd, I fiddled a bit with IP/interfaces/hosts/net.rules etc., cleanly unmounted them, and started the VM; it took 2 secs and the VM was up and running. Cool.
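For reference, the workflow was roughly the following; the pool and image names here are made up, and the exact flag spelling may differ between rbd versions (newer releases call it --image-format):

  rbd import --image-format 2 debian-6.0.img rbd/debian6-master
  rbd snap create rbd/debian6-master@base
  rbd snap protect rbd/debian6-master@base
  rbd clone rbd/debian6-master@base rbd/vm-clone-01

  # customize the clone offline, then detach cleanly
  qemu-nbd --connect=/dev/nbd0 rbd:rbd/vm-clone-01
  mount /dev/nbd0p1 /mnt
  # edit /mnt/etc/network/interfaces, /mnt/etc/hosts, udev net rules, ...
  umount /mnt
  qemu-nbd --disconnect /dev/nbd0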
Now an ordinary shutdown was performed and a snapshot of this image was made. Started again, did some "apt-get update… install s/t…".
Shutdown -> rbd rollback -> startup again -> login -> install s/t else… the filesystem showed "many" ext3 errors, fell into read-only mode, massive corruption.
The qemu config was with ":rbd_cache=false", if it matters. The above scenario is reproducible, and as I pointed out, no crash was detected.
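In terms of commands, the reproduction is roughly (same made-up names as above):

  rbd snap create rbd/vm-clone-01@pristine
  # boot the VM, apt-get update / install something, shut it down
  rbd snap rollback rbd/vm-clone-01@pristine
  # boot again, install something else -> ext3 errors, fs goes read-only

with the guest attached more or less like this, cache explicitly off:

  qemu-system-x86_64 -m 1024 \
      -drive file=rbd:rbd/vm-clone-01:rbd_cache=false,if=virtio,format=raw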
Perhaps it is in the same area as in the crash-thread, otherwise I will provide logfiles as needed.
Kind regards,
Oliver.
* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
2013-03-22 19:09 Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing Oliver Francke
@ 2013-03-22 19:30 ` Josh Durgin
[not found] ` <51502118.7060906@filoo.de>
0 siblings, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2013-03-22 19:30 UTC (permalink / raw)
To: Oliver Francke; +Cc: ceph-devel
On 03/22/2013 12:09 PM, Oliver Francke wrote:
> Hi Josh, all,
>
> I did not want to hijack the thread dealing with a crashing VM, but perhaps there are some common things.
>
> Today I installed a fresh cluster with mkcephfs, which went fine, imported a "master" Debian 6.0 image with "format 2", made a snapshot, protected it, and made some clones.
> The clones were mounted with qemu-nbd, I fiddled a bit with IP/interfaces/hosts/net.rules etc., cleanly unmounted them, and started the VM; it took 2 secs and the VM was up and running. Cool.
>
> Now an ordinary shutdown was performed and a snapshot of this image was made. Started again, did some "apt-get update… install s/t…".
> Shutdown -> rbd rollback -> startup again -> login -> install s/t else… the filesystem showed "many" ext3 errors, fell into read-only mode, massive corruption.
This sounds like it might be a bug in rollback. Could you try cloning
and snapshotting again, but export the image before booting, and after
rolling back, and compare the md5sums?
Running the rollback with:
--debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
might help too. Does your ceph.conf where you ran the rollback have
anything related to rbd_cache in it?
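Something along these lines, with your actual image and snapshot names substituted in (the ones below are only placeholders):

  rbd export rbd/vm-clone-01 before.img        # after snapshotting, before booting
  # boot the VM, do some work, shut it down cleanly, then roll back with logging:
  rbd --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log \
      snap rollback rbd/vm-clone-01@pristine
  rbd export rbd/vm-clone-01 after.img
  md5sum before.img after.img                  # should be identical if the rollback is correct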
> The qemu config was with ":rbd_cache=false", if it matters. The above scenario is reproducible, and as I pointed out, no crash was detected.
>
> Perhaps it is in the same area as in the crash-thread, otherwise I will provide logfiles as needed.
It's unrelated, the other thread is an issue with the cache, which does
not cause corruption but triggers a crash.
Josh
* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
[not found] ` <51502118.7060906@filoo.de>
@ 2013-03-26 8:30 ` Josh Durgin
2013-03-26 8:33 ` Oliver Francke
0 siblings, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2013-03-26 8:30 UTC (permalink / raw)
To: Oliver Francke; +Cc: ceph-devel
On 03/25/2013 03:04 AM, Oliver Francke wrote:
> Hi Josh,
>
> logfile is attached...
Thanks. It shows nothing out of the ordinary, but I just reproduced the
incorrect rollback locally, so it shouldn't be hard to track down from
here.
I opened http://tracker.ceph.com/issues/4551 to track it.
Josh
> On 03/22/2013 08:30 PM, Josh Durgin wrote:
>> On 03/22/2013 12:09 PM, Oliver Francke wrote:
>>> Hi Josh, all,
>>>
>>> I did not want to hijack the thread dealing with a crashing VM, but
>>> perhaps there are some common things.
>>>
>>> Today I installed a fresh cluster with mkcephfs, which went fine, imported a
>>> "master" Debian 6.0 image with "format 2", made a snapshot, protected
>>> it, and made some clones.
>>> The clones were mounted with qemu-nbd, I fiddled a bit with
>>> IP/interfaces/hosts/net.rules etc., cleanly unmounted them, and started the
>>> VM; it took 2 secs and the VM was up and running. Cool.
>>>
>>> Now an ordinary shutdown was performed and a snapshot of this
>>> image was made. Started again, did some "apt-get update… install s/t…".
>>> Shutdown -> rbd rollback -> startup again -> login -> install s/t
>>> else… the filesystem showed "many" ext3 errors, fell into read-only mode,
>>> massive corruption.
>>
>> This sounds like it might be a bug in rollback. Could you try cloning
>> and snapshotting again, but export the image before booting, and after
>> rolling back, and compare the md5sums?
>
> Done, first MD5-mismatch after 32 4MB blocks, checked with dd and a bs
> of 4MB.
>
>>
>> Running the rollback with:
>>
>> --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
>>
>> might help too. Does your ceph.conf where you ran the rollback have
>> anything related to rbd_cache in it?
>
> No cache settings in global ceph.conf.
>
> Hope it helps,
>
> Oliver.
>
>>
>>> The qemu config was with ":rbd_cache=false", if it matters. The above scenario
>>> is reproducible, and as I pointed out, no crash was detected.
>>>
>>> Perhaps it is in the same area as in the crash-thread, otherwise I
>>> will provide logfiles as needed.
>>
>> It's unrelated, the other thread is an issue with the cache, which does
>> not cause corruption but triggers a crash.
>>
>> Josh
>
>
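For completeness, the per-block dd check described above can be scripted roughly like this (file names and block count are placeholders only):

  for i in $(seq 0 2559); do
    a=$(dd if=before.img bs=4M skip=$i count=1 2>/dev/null | md5sum)
    b=$(dd if=after.img  bs=4M skip=$i count=1 2>/dev/null | md5sum)
    if [ "$a" != "$b" ]; then echo "first differing 4MB block: $i"; break; fi
  done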
* Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing
2013-03-26 8:30 ` Josh Durgin
@ 2013-03-26 8:33 ` Oliver Francke
0 siblings, 0 replies; 5+ messages in thread
From: Oliver Francke @ 2013-03-26 8:33 UTC (permalink / raw)
To: Josh Durgin; +Cc: ceph-devel
Hi Josh,
thanks for the quick response and...
On 03/26/2013 09:30 AM, Josh Durgin wrote:
> On 03/25/2013 03:04 AM, Oliver Francke wrote:
>> Hi Josh,
>>
>> logfile is attached...
>
> Thanks. It shows nothing out of the ordinary, but I just reproduced the
> incorrect rollback locally, so it shouldn't be hard to track down from
> here.
>
> I opened http://tracker.ceph.com/issues/4551 to track it.
the good news.
Oliver.
>
> Josh
>
>> On 03/22/2013 08:30 PM, Josh Durgin wrote:
>>> On 03/22/2013 12:09 PM, Oliver Francke wrote:
>>>> Hi Josh, all,
>>>>
>>>> I did not want to hijack the thread dealing with a crashing VM, but
>>>> perhaps there are some common things.
>>>>
>>>> Today I installed a fresh cluster with mkcephfs, which went fine, imported a
>>>> "master" Debian 6.0 image with "format 2", made a snapshot, protected
>>>> it, and made some clones.
>>>> The clones were mounted with qemu-nbd, I fiddled a bit with
>>>> IP/interfaces/hosts/net.rules etc., cleanly unmounted them, and started the
>>>> VM; it took 2 secs and the VM was up and running. Cool.
>>>>
>>>> Now an ordinary shutdown was performed and a snapshot of this
>>>> image was made. Started again, did some "apt-get update… install s/t…".
>>>> Shutdown -> rbd rollback -> startup again -> login -> install s/t
>>>> else… the filesystem showed "many" ext3 errors, fell into read-only mode,
>>>> massive corruption.
>>>
>>> This sounds like it might be a bug in rollback. Could you try cloning
>>> and snapshotting again, but export the image before booting, and after
>>> rolling back, and compare the md5sums?
>>
>> Done, first MD5-mismatch after 32 4MB blocks, checked with dd and a bs
>> of 4MB.
>>
>>>
>>> Running the rollback with:
>>>
>>> --debug-ms 1 --debug-rbd 20 --log-file rbd-rollback.log
>>>
>>> might help too. Does your ceph.conf where you ran the rollback have
>>> anything related to rbd_cache in it?
>>
>> No cache settings in global ceph.conf.
>>
>> Hope it helps,
>>
>> Oliver.
>>
>>>
>>>> The qemu config was with ":rbd_cache=false", if it matters. The above scenario
>>>> is reproducible, and as I pointed out, no crash was detected.
>>>>
>>>> Perhaps it is in the same area as in the crash-thread, otherwise I
>>>> will provide logfiles as needed.
>>>
>>> It's unrelated, the other thread is an issue with the cache, which does
>>> not cause corruption but triggers a crash.
>>>
>>> Josh
>>
>>
>
--
Oliver Francke
filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh
Managing directors (Geschäftsführer): S.Grewing | J.Rehpöhler | C.Kunz
Follow us on Twitter: http://twitter.com/filoogmbh