* [radosgw] Race condition corrupting data on COPY ?
From: Sylvain Munaut @ 2013-03-18 9:50 UTC
To: ceph-devel
Hi,
I've just noticed something rather worrying on our cluster.
Some files are apparently truncated. From the first look I had at it,
it happened on files where there was a metadata update right after the
file was stored. The exact sequence was:
- PUT to store the file
- GET to get the file (which at that point is still correct and has
the proper length)
- PUT using a 'copy source' over itself to update the metadata
all of these happening sequentially, very quickly, within the same second.
Subsequent GETs then return a truncated file.
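For reference, the third step (a metadata-only update done as a COPY of the object onto itself) is a plain S3 PUT that carries a copy-source header and no body. The sketch below builds those headers. It assumes the standard S3 header names (`x-amz-copy-source`, `x-amz-metadata-directive`, `x-amz-meta-*`); the helper name `self_copy_headers` and the example metadata are hypothetical. A client such as Boto sends an equivalent request, but the exact headers it emits may differ.

```python
from urllib.parse import quote


def self_copy_headers(bucket: str, key: str, metadata: dict[str, str]) -> dict[str, str]:
    """Build request headers for a PUT that copies an object onto itself
    to replace its user metadata; the request carries no body."""
    headers = {
        # Source object as "bucket/key", URL-encoded ("/" is left as-is).
        "x-amz-copy-source": quote(f"{bucket}/{key}"),
        # Replace the metadata instead of copying it from the source.
        "x-amz-metadata-directive": "REPLACE",
    }
    # User metadata is sent as x-amz-meta-<name> headers.
    for name, value in metadata.items():
        headers[f"x-amz-meta-{name.lower()}"] = value
    return headers
```

On AWS S3, a copy onto the same key with the default directive (`COPY`) is rejected, which is why an in-place metadata update uses `REPLACE`; whether radosgw 0.56 enforces the same rule is not established here.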
I'm looking into it to narrow down the issue, but I wanted to know if
anyone had seen something similar?
Cheers,
Sylvain
* Re: [radosgw] Race condition corrupting data on COPY ?
From: Yehuda Sadeh @ 2013-03-18 13:29 UTC
To: Sylvain Munaut; +Cc: ceph-devel
On Mon, Mar 18, 2013 at 2:50 AM, Sylvain Munaut
<s.munaut@whatever-company.com> wrote:
> Hi,
>
>
> I've just noticed something rather worrying on our cluster.
>
> Some files are apparently truncated. From the first look I had at it,
> it happened on files where there was a metadata update right after the
> file was stored. The exact sequence was:
>
> - PUT to store the file
> - GET to get the file (which at that point is still correct and has
> the proper length)
> - PUT using a 'copy source' over itself to update the metadata
>
> all of these happening sequentially, very quickly, within the same second.
>
> Subsequent GETs then return a truncated file.
>
>
> I'm looking into it to narrow down the issue, but I wanted to know if
> anyone had seen something similar?
>
>
What version are you using? Do you have logs?
Thanks,
Yehuda
* Re: [radosgw] Race condition corrupting data on COPY ?
From: Sylvain Munaut @ 2013-03-18 14:40 UTC
To: Yehuda Sadeh; +Cc: ceph-devel
Hi,
> What version are you using? Do you have logs?
I'm running a custom build of 0.56.3 plus some patches (basically up
to 7889c5412, plus fixes for #4150 and #4177).
I don't have any radosgw log (the debug level is set to 0 and it didn't
output anything).
I do have the HTTP logs:
10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +0000] "PUT /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34 HTTP/1.1" 200 0 "-" "Boto/2.6.0 (linux2)"
10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +0000] "GET /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX HTTP/1.1" 200 622080 "-" "python-requests"
10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +0000] "PUT /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34 HTTP/1.1" 200 146 "-" "Boto/2.6.0 (linux2)"
10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +0000] "GET /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX HTTP/1.1" 200 461220 "-" "python-requests"
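Since the HTTP logs are the only evidence available, one way to find other affected objects is to scan them for objects whose successful GETs returned different byte counts over time (622080 bytes, then 461220 above). A minimal sketch, assuming the combined log format shown with each entry on a single line; `size_changes` is a hypothetical helper. A changed size only flags a candidate, since an object may legitimately be overwritten between two GETs.

```python
import re
from collections import defaultdict

# Extracts method, path (query string dropped), status and byte count
# from one combined-format access log entry.
LOG_RE = re.compile(
    r'\] "(?P<method>[A-Z]+) (?P<path>[^ ?"]+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def size_changes(lines):
    """Return {object_path: [sizes]} for objects whose successful (200)
    GETs returned different byte counts over time."""
    sizes = defaultdict(list)
    for line in lines:
        m = LOG_RE.search(line)
        if not m or m["method"] != "GET" or m["status"] != "200" or m["size"] == "-":
            continue
        sizes[m["path"]].append(int(m["size"]))
    # Keep only objects that were not always served with the same length.
    return {path: s for path, s in sizes.items() if len(set(s)) > 1}
```

Only `200` responses are counted, so ranged reads (`206`) do not produce false positives.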
Cheers,
Sylvain
* Re: [radosgw] Race condition corrupting data on COPY ?
From: Yehuda Sadeh @ 2013-03-18 16:25 UTC
To: Sylvain Munaut; +Cc: ceph-devel
On Mon, Mar 18, 2013 at 7:40 AM, Sylvain Munaut
<s.munaut@whatever-company.com> wrote:
> Hi,
>
>
>> What version are you using? Do you have logs?
>
> I'm running a custom build of 0.56.3 plus some patches (basically up
> to 7889c5412, plus fixes for #4150 and #4177).
>
> I don't have any radosgw log (the debug level is set to 0 and it didn't
> output anything).
> I do have the HTTP logs:
>
> 10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +0000] "PUT /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34 HTTP/1.1" 200 0 "-" "Boto/2.6.0 (linux2)"
> 10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +0000] "GET /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX HTTP/1.1" 200 622080 "-" "python-requests"
> 10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +0000] "PUT /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34 HTTP/1.1" 200 146 "-" "Boto/2.6.0 (linux2)"
> 10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +0000] "GET /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX HTTP/1.1" 200 461220 "-" "python-requests"
>
>
I can't make much out of it; I'll probably need the rgw logs (preferably
with 'debug ms = 1' as well) for this issue.
Yehuda
* Re: [radosgw] Race condition corrupting data on COPY ?
From: Sylvain Munaut @ 2013-03-18 16:39 UTC
To: Yehuda Sadeh; +Cc: ceph-devel
Hi,
> I can't make much out of it; I'll probably need the rgw logs (preferably
> with 'debug ms = 1' as well) for this issue.
Well, the problem is that I can't make it happen again. It happened
4 times during an import of ~3000 files. I'm trying to reproduce it
on a test cluster, but so far with no luck; I'll give it another shot
tomorrow.
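Given a rate of roughly 4 in ~3000, a stress loop along these lines might help the problem show up on a test cluster. It runs the PUT, GET and self-COPY sequence back to back on fresh keys and records any object whose content changed after the copy. A minimal sketch: `ObjectStore` is a hypothetical interface (its `copy_in_place` would wrap the self-copy PUT with replaced metadata), the key naming and metadata are made up, and the default size is simply the original length seen in the logs earlier in the thread.

```python
import os
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal client interface (adapt it to boto or requests)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def copy_in_place(self, key: str, metadata: dict[str, str]) -> None: ...


def stress(store: ObjectStore, iterations: int, size: int = 622080) -> list[tuple[str, int, int]]:
    """PUT, GET, then self-COPY each fresh key back to back, and return
    (key, original_length, length_after_copy) for every object whose
    content changed after the metadata-only copy."""
    bad = []
    for i in range(iterations):
        key = f"repro-{i:06d}"  # hypothetical key naming
        data = os.urandom(size)
        store.put(key, data)
        # Mirror the original sequence: the object is read once before the copy.
        if store.get(key) != data:
            raise RuntimeError(f"{key} already wrong before the copy")
        store.copy_in_place(key, {"step": "copied"})
        after = store.get(key)
        if after != data:
            bad.append((key, len(data), len(after)))
    return bad
```

Since the original failures involved requests issued within the same second, running several such loops in parallel against the same gateway may be closer to production conditions; that is an untested guess.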
And I can't enable debug logging in production for long periods: the space
for logs is limited and would fill up within minutes with all the requests.
I've also disabled the use of copy in production anyway, because I can't
have it corrupt random customer files.
Cheers,
Sylvain