* Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
       [not found] <156923D1D92AF640829D4E847271EDAE522A46F435@MXMBON06.grupa.onet>
@ 2016-02-26 14:32 ` Dominik Mostowiec
       [not found]   ` <CAMNMNTxs46Hcaq-ZJZfYfF5m8aL9ETNjtBKr8UrdAxZFM7E=nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Dominik Mostowiec @ 2016-02-26 14:32 UTC
  To: ceph-users, ceph-devel

Hi,
Maybe this is the cause of another bug?
http://tracker.ceph.com/issues/13764
The situation is very similar...

--
Regards
Dominik

2016-02-25 16:17 GMT+01:00 Ritter Sławomir <Slawomir.Ritter@dreamlab.pl>:
> Hi,
>
>
>
> We have two Ceph clusters running Dumpling 0.67.11, and some of our
> "multipart objects" are incomplete. It seems that some slow requests can
> corrupt the related S3 objects. Moreover, GETs for those objects succeed
> without any error messages: there are only HTTP 200 responses in the logs,
> and popular client tools/libs report no problems.
>
>
>
> The situation looks very similar to the one described in bug #8269, but we
> are using the fixed version, 0.67.11:  http://tracker.ceph.com/issues/8269
>
>
>
> Regards,
>
>
>
> Sławomir Ritter
>
>
>
>
>
>
>
> EXAMPLE#1
>
>
>
> slow_request
>
> ============
>
> 2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4
> slow requests, 4 included below; oldest blocked for > 30.727096 secs
>
> 2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN]
> slow request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460:
> osd_op(client.47792965.0:185007087
> default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2
> [writefull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from
> [469,9]
>
>
>
>
>
> HTTP_500 in apache.log
>
> ======================
>
> 127.0.0.1 - - [23/Feb/2016:13:49:27 +0100] "PUT
> /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=56
> HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3
> Linux/3.13.0-39-generic(syncworker)"
>
> 127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT
> /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57
> HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3
> Linux/3.13.0-39-generic(syncworker)"
>
> 127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT
> /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57
> HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3
> Linux/3.13.0-39-generic(syncworker)"
>
> 127.0.0.1 - - [23/Feb/2016:13:49:59 +0100] "PUT
> /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=58
> HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3
> Linux/3.13.0-39-generic(syncworker)"
>
>
>
>
>
> Empty RADOS object (real size = 0 bytes), list generated based on MANIFEST
>
> ==========================================================================
>
> found
> default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.56_2
> 2097152   ok          2097152       10.7acc9476 (10.1476) [278,142,436]
> [278,142,436]
>
> found
> default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57
> 0         diff        4194304       10.4f5be025 (10.25)   [57,310,428]
> [57,310,428]
>
> found
> default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1
> 4194304   ok          4194304       10.81191602 (10.1602) [441,109,420]
> [441,109,420]
>
> found
> default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2
> 2097152   ok          2097152       10.ce729ebe (10.1ebe) [260,469,9]
> [260,469,9]
>
>
>
>
>
> "Silent" GETs
>
> =============
>
> # object size from headers
>
> $ s3 -u head
> video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
> Content-Type: binary/octet-stream
>
> Content-Length: 641775701
>
> Server: nginx
>
>
>
> # but GET returns only 637581397 bytes (641775701 - 4194304 missing = 637581397)
>
> $ s3 -u get
> video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv >
> /tmp/test
>
> $  ls -al /tmp/test
>
> -rw-r--r-- 1 root root 637581397 Feb 23 17:05 /tmp/test
>
>
>
> # no error in logs
>
> 127.0.0.1 - - [23/Feb/2016:17:05:00 +0100] "GET
> /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
> HTTP/1.0" 200 637581711 "-" "Mozilla/4.0 (Compatible; s3; libs3 2.0; Linux
> x86_64)"
>
>
>
> # wget retries to fetch the missing part, but no such part exists, so it
> GETs the head/tail of the file again...
>
> $ wget
> http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
>
> --2016-02-23 17:10:11--
> http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
>
> Connecting to 127.0.0.1:88... connected.
>
> HTTP request sent, awaiting response... 200 OK
>
> Length: 641775701 (612M) [binary/octet-stream]
>
> Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv'
>
>
>
> 99%
> [==================================================================================================================>
> ] 637,581,397 63.9M/s   in 9.5s
>
>
>
> 2016-02-23 17:10:20 (64.1 MB/s) - Connection closed at byte 637581397.
> Retrying.
>
>
>
> --2016-02-23 17:10:21--  (try: 2)
> http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv
>
> Connecting to 127.0.0.1:88... connected.
>
> HTTP request sent, awaiting response... 206 Partial Content
>
> Length: 641775701 (612M), 4194304 (4.0M) remaining [binary/octet-stream]
>
> Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv'
>
>
>
> 100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>]
> 641,775,701 --.-K/s   in 0.007s
>
>
>
> 2016-02-23 17:10:21 (601 MB/s) -
> `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' saved
> [641775701/641775701]
>
>
>
>
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Regards
Dominik
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Problem: silently corrupted RadosGW objects caused by slow requests
       [not found]   ` <CAMNMNTxs46Hcaq-ZJZfYfF5m8aL9ETNjtBKr8UrdAxZFM7E=nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-03 12:55     ` Ritter Sławomir
  2016-03-03 23:40       ` [ceph-users] " Robin H. Johnson
  0 siblings, 1 reply; 6+ messages in thread
From: Ritter Sławomir @ 2016-03-03 12:55 UTC
  To: ceph-users-Qp0mS5GaXlQ, ceph-devel

Hi,

I think this is a really serious problem, again:

- we silently lost S3/RGW objects in our clusters

Moreover, our situation looks very similar to the one described in the unfixed bug #13764 (Hammer) and in the fixed #8269 (Dumpling).

Regards,

SR




* Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
  2016-03-03 12:55     ` Ritter Sławomir
@ 2016-03-03 23:40       ` Robin H. Johnson
       [not found]         ` <robbat2-20160303T232148-550781944Z-UgNl/1uUEYUufQK+DwRw3KxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Robin H. Johnson @ 2016-03-03 23:40 UTC
  To: Ritter Sławomir; +Cc: 	ceph-users@ceph.com, ceph-devel

On Thu, Mar 03, 2016 at 01:55:13PM +0100, Ritter Sławomir wrote:
> Hi,
> 
> I think this is a really serious problem, again:
> 
> - we silently lost S3/RGW objects in our clusters
> 
> Moreover, our situation looks very similar to the one described in the
> unfixed bug #13764 (Hammer) and in the fixed #8269 (Dumpling).
FYI, the fix for #8269 _is_ present in Hammer:
commit bd8e026f88b rgw: don't allow multiple writers to same multiobject part

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85

* Re: Problem: silently corrupted RadosGW objects caused by slow requests
       [not found]         ` <robbat2-20160303T232148-550781944Z-UgNl/1uUEYUufQK+DwRw3KxOck334EZe@public.gmane.org>
@ 2016-03-04 15:26           ` Ritter Sławomir
  2016-03-04 17:23             ` [ceph-users] " Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 6+ messages in thread
From: Ritter Sławomir @ 2016-03-04 15:26 UTC
  To: Robin H. Johnson; +Cc: ceph-devel, 	ceph-users-Qp0mS5GaXlQ@public.gmane.org

> From: Robin H. Johnson [mailto:robbat2@gentoo.org]
> Sent: Friday, March 04, 2016 12:40 AM
> To: Ritter Sławomir
> Cc: ceph-users@ceph.com; ceph-devel
> Subject: Re: [ceph-users] Problem: silently corrupted RadosGW objects caused
> by slow requests
> 
> On Thu, Mar 03, 2016 at 01:55:13PM +0100, Ritter Sławomir wrote:
> > Hi,
> >
> > I think this is a really serious problem, again:
> >
> > - we silently lost S3/RGW objects in our clusters
> >
> > Moreover, our situation looks very similar to the one described in the
> > unfixed bug #13764 (Hammer) and in the fixed #8269 (Dumpling).
> FYI, the fix for #8269 _is_ present in Hammer:
> commit bd8e026f88b rgw: don't allow multiple writers to same multiobject part
> 
> --
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
> E-Mail     : robbat2@gentoo.org
> GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
Yes,

the fix for #8269 is also included in our version, Dumpling 0.67.11.
The people in #13764 are using a patched Hammer version.

Both cases of corrupted files look very similar to the one described in #8269,
where the problem was two threads writing to the same RADOS objects.

Maybe there is another, still unknown, special case to fix?
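
The sequence in the original report (part 57 PUT once with HTTP 500, then again with HTTP 200) suggests one way to look for other candidates: scan the gateway access log for part uploads that were attempted more than once. A minimal sketch in Python, assuming one log entry per line in the Apache format shown in the report and the same uploadId/partNumber query order. The function name, the sample object key, and the upload ID are made up for illustration; only the part numbers and status codes mirror the report.

```python
import re
from collections import defaultdict

# Matches the request and status fields of an access-log line for a
# multipart part upload, e.g.
#   "PUT /bucket/key?uploadId=ID&partNumber=57 HTTP/1.0" 500 751
PART_PUT = re.compile(
    r'"PUT (?P<path>\S+?)\?uploadId=(?P<upload>[^&\s]+)&partNumber=(?P<part>\d+)'
    r' HTTP/[\d.]+" (?P<status>\d{3}) '
)


def retried_parts(lines):
    """Return {(path, upload_id, part_number): [statuses]} for parts PUT more than once."""
    attempts = defaultdict(list)
    for line in lines:
        match = PART_PUT.search(line)
        if match:
            key = (match["path"], match["upload"], int(match["part"]))
            attempts[key].append(int(match["status"]))
    # Keep only parts that were uploaded more than once (e.g. 500 then 200).
    return {key: codes for key, codes in attempts.items() if len(codes) > 1}


# Hypothetical sample: shortened key and upload ID, statuses as in the report.
PREFIX = '127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT /video-shbc/example.ismv?uploadId=UPLOADID&partNumber='
SAMPLE = [
    PREFIX + '56 HTTP/1.0" 200 221 "-" "Boto/2.31.1"',
    PREFIX + '57 HTTP/1.0" 500 751 "-" "Boto/2.31.1"',
    PREFIX + '57 HTTP/1.0" 200 221 "-" "Boto/2.31.1"',
    PREFIX + '58 HTTP/1.0" 200 221 "-" "Boto/2.31.1"',
]

for key, codes in retried_parts(SAMPLE).items():
    print(key, codes)
```

A part listed this way is only a candidate; whether its RADOS objects are intact still has to be checked against the manifest.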

Cheers,
SR


* Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
  2016-03-04 15:26           ` Ritter Sławomir
@ 2016-03-04 17:23             ` Yehuda Sadeh-Weinraub
  2016-03-07 13:34               ` Ritter Sławomir
  0 siblings, 1 reply; 6+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-03-04 17:23 UTC
  To: Ritter Sławomir; +Cc: Robin H. Johnson, ceph-devel, ceph-users

On Fri, Mar 4, 2016 at 7:26 AM, Ritter Sławomir
<Slawomir.Ritter@dreamlab.pl> wrote:
>> From: Robin H. Johnson [mailto:robbat2@gentoo.org]
>> Sent: Friday, March 04, 2016 12:40 AM
>> To: Ritter Sławomir
>> Cc: ceph-users@ceph.com; ceph-devel
>> Subject: Re: [ceph-users] Problem: silently corrupted RadosGW objects caused
>> by slow requests
>>
>> On Thu, Mar 03, 2016 at 01:55:13PM +0100, Ritter Sławomir wrote:
>> > Hi,
>> >
>> > I think this is a really serious problem, again:
>> >
>> > - we silently lost S3/RGW objects in our clusters
>> >
>> > Moreover, our situation looks very similar to the one described in the
>> > unfixed bug #13764 (Hammer) and in the fixed #8269 (Dumpling).
>> FYI, the fix for #8269 _is_ present in Hammer:
>> commit bd8e026f88b rgw: don't allow multiple writers to same multiobject part
>>
>> --
>> Robin Hugh Johnson
>> Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
>> E-Mail     : robbat2@gentoo.org
>> GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> Yes,
>
> the fix for #8269 is also included in our version, Dumpling 0.67.11.
> The people in #13764 are using a patched Hammer version.

I didn't notice that you were actually running Dumpling (which we
haven't supported or backported fixes to for a while). Here's one issue
that you might have hit:

http://tracker.ceph.com/issues/11604

Yehuda

>
> Both cases of corrupted files look very similar to the one described in #8269,
> where the problem was two threads writing to the same RADOS objects.
>
> Maybe there is another, still unknown, special case to fix?
>
> Cheers,
> SR
>

* RE: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
  2016-03-04 17:23             ` [ceph-users] " Yehuda Sadeh-Weinraub
@ 2016-03-07 13:34               ` Ritter Sławomir
  0 siblings, 0 replies; 6+ messages in thread
From: Ritter Sławomir @ 2016-03-07 13:34 UTC
  To: Yehuda Sadeh-Weinraub; +Cc: Robin H. Johnson, ceph-devel, ceph-users

> > Yes,
> >
> > the fix for #8269 is also included in our version, Dumpling 0.67.11.
> > The people in #13764 are using a patched Hammer version.
> 
> I didn't notice that you were actually running Dumpling (which we
> haven't supported or backported fixes to for a while). Here's one issue
> that you might have hit:
> 
> http://tracker.ceph.com/issues/11604
> 
> Yehuda
Yes, it looks very similar. We have to upgrade, probably to Hammer LTS :).
We will check again after the whole operation. Thanks a lot.
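
Because the gateway answers 200 and the clients report nothing, such a check has to compare sizes itself. A minimal sketch in Python using the figures from the original report; `missing_bytes` is a hypothetical helper, and the chunks stand for whatever the client library streams back.

```python
def missing_bytes(advertised_length, chunks):
    """Return how many bytes a download falls short of its Content-Length."""
    received = sum(len(chunk) for chunk in chunks)
    return advertised_length - received


# Figures from the original report.
ADVERTISED = 641775701  # Content-Length returned by HEAD
RECEIVED = 637581397    # bytes actually delivered by GET
EMPTY_PART = 4194304    # expected size of the empty __multipart_ object of part 57

# The shortfall is exactly the expected size of the empty part.
shortfall = ADVERTISED - RECEIVED
print(shortfall, shortfall == EMPTY_PART)  # prints "4194304 True"

# Toy stream: 10 bytes advertised, only 6 delivered.
print(missing_bytes(10, [b"abcd", b"ef"]))  # prints "4"
```

A length check alone would not flag a resumed download like the wget run in the report, since the saved file ends up with the advertised length; comparing a checksum against the source file would be needed for that case.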

--
SR
