* Rados gateway 0.58 crash in RGWProcess::_clear
@ 2013-03-10 22:48 Yann ROBIN
  2013-03-10 22:57 ` Yehuda Sadeh
From: Yann ROBIN @ 2013-03-10 22:48 UTC
  To: ceph-devel

Hi,

We recently set up a cluster running version 0.58. During massively parallel uploads to the gateway, radosgw restarted every 5 to 10 minutes.
Here is the debug log:
https://gist.github.com/kYann/5130775

Short version:
    -2> 2013-03-10 23:20:02.916521 7fc1376ee700  1 ====== req done req=0x237fc10 http_status=200 ======
    -1> 2013-03-10 23:20:02.916546 7fc1376ee700  1 RGWProcess::m_tp worker finish
     0> 2013-03-10 23:20:02.931847 7fc1efa5b780 -1 rgw/rgw_main.cc: In function 'virtual void RGWProcess::RGWWQ::_clear()' thread 7fc1efa5b780 time 2013-03-10 23:20:02.922020
rgw/rgw_main.cc: 175: FAILED assert(process->m_req_queue.empty())

ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2)
1: (RGWRESTMgr_Admin::~RGWRESTMgr_Admin()+0) [0x474910]
2: (ThreadPool::stop(bool)+0x1ed) [0x4909bd]
3: (RGWProcess::run()+0x3c7) [0x473367]
4: (main()+0x8b6) [0x447276]
5: (__libc_start_main()+0xed) [0x7fc1ec92276d]
6: /usr/bin/radosgw() [0x448871]


-- 
Yann ROBIN
YouScribe



* Re: Rados gateway 0.58 crash in RGWProcess::_clear
  2013-03-10 22:48 Rados gateway 0.58 crash in RGWProcess::_clear Yann ROBIN
@ 2013-03-10 22:57 ` Yehuda Sadeh
  2013-03-10 23:39   ` Yann ROBIN
From: Yehuda Sadeh @ 2013-03-10 22:57 UTC
  To: Yann ROBIN; +Cc: ceph-devel

On Sun, Mar 10, 2013 at 3:48 PM, Yann ROBIN <yann.robin@youscribe.com> wrote:
> Hi,
>
> We recently set up a cluster running version 0.58. During massively parallel uploads to the gateway, radosgw restarted every 5 to 10 minutes.
> Here is the debug log:
> https://gist.github.com/kYann/5130775
>
> Short version:
>     -2> 2013-03-10 23:20:02.916521 7fc1376ee700  1 ====== req done req=0x237fc10 http_status=200 ======
>     -1> 2013-03-10 23:20:02.916546 7fc1376ee700  1 RGWProcess::m_tp worker finish
>      0> 2013-03-10 23:20:02.931847 7fc1efa5b780 -1 rgw/rgw_main.cc: In function 'virtual void RGWProcess::RGWWQ::_clear()' thread 7fc1efa5b780 time 2013-03-10 23:20:02.922020
> rgw/rgw_main.cc: 175: FAILED assert(process->m_req_queue.empty())
>
> ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2)
> 1: (RGWRESTMgr_Admin::~RGWRESTMgr_Admin()+0) [0x474910]
> 2: (ThreadPool::stop(bool)+0x1ed) [0x4909bd]
> 3: (RGWProcess::run()+0x3c7) [0x473367]
> 4: (main()+0x8b6) [0x447276]
> 5: (__libc_start_main()+0xed) [0x7fc1ec92276d]
> 6: /usr/bin/radosgw() [0x448871]
>
>

This obviously shouldn't happen. Just note that this code should only
be reached either when trying to bring the gateway down, or when
there's some error on the fastcgi socket. Which web server are you
using? Which fastcgi module? How did you set up fastcgi?

Thanks,
Yehuda

* RE: Rados gateway 0.58 crash in RGWProcess::_clear
  2013-03-10 22:57 ` Yehuda Sadeh
@ 2013-03-10 23:39   ` Yann ROBIN
  2013-03-11 14:24     ` Yann ROBIN
From: Yann ROBIN @ 2013-03-10 23:39 UTC
  To: Yehuda Sadeh; +Cc: ceph-devel

The setup is multiple nginx instances connecting to the FastCGI module over a TCP socket.
I found a ticket about an issue with multiple gateways that may be related: http://tracker.ceph.com/issues/2804

We'll test with a single nginx instance to see if the issue persists.

Thanks,

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: Rados gateway 0.58 crash in RGWProcess::_clear
  2013-03-10 23:39   ` Yann ROBIN
@ 2013-03-11 14:24     ` Yann ROBIN
  2013-03-11 14:35       ` Yehuda Sadeh
From: Yann ROBIN @ 2013-03-11 14:24 UTC
  To: Yehuda Sadeh; +Cc: ceph-devel

The socket error was caused by nginx opening too many connections, which hit the gateway's limit on open file descriptors.
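The failure described here can be reproduced outside radosgw. The Python sketch below is Unix-only, and the function name `exhaust_descriptors` and the limit of 64 are illustrative choices, not anything from radosgw. It lowers the process's soft `RLIMIT_NOFILE`, opens descriptors until `open()` fails, and reports the error code. On Linux that error is `EMFILE`. A server like the gateway gets the same error from `accept()` when a new connection would exceed the per-process limit.

```python
import errno
import os
import resource


# Illustrative helper (hypothetical name), not part of radosgw.
def exhaust_descriptors(soft_limit=64):
    """Temporarily lower the soft RLIMIT_NOFILE, open /dev/null until open()
    fails, and return (descriptors_opened, errno_of_the_failure).
    Original limits are restored and all descriptors closed afterwards."""
    original = resource.getrlimit(resource.RLIMIT_NOFILE)
    hard = original[1]
    # An unprivileged process may not set its soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                return len(fds), exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, original)


if __name__ == "__main__":
    opened, err = exhaust_descriptors()
    print(f"opened {opened} descriptors before failing with {errno.errorcode[err]}")
```

There are two obvious knobs. You can raise the gateway's limit, for instance with `ulimit -n` in whatever starts radosgw, or you can bound how many connections nginx opens toward it. The thread does not say which one was applied.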

* Re: Rados gateway 0.58 crash in RGWProcess::_clear
  2013-03-11 14:24     ` Yann ROBIN
@ 2013-03-11 14:35       ` Yehuda Sadeh
From: Yehuda Sadeh @ 2013-03-11 14:35 UTC
  To: Yann ROBIN; +Cc: ceph-devel

Thanks. I opened issues #4409 and #4410: the first to improve the
logging on such an error, the second to make sure we don't die a
horrible death.
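Issue #4410 is about surviving this path instead of aborting. The thread does not show how it was resolved, so what follows is only a generic Python illustration of the idea. It models a work queue whose `stop()` logs and returns any requests still pending. By contrast, `RGWProcess::RGWWQ::_clear()` in 0.58 asserts `process->m_req_queue.empty()`. `RequestQueue`, `submit`, `wait_idle`, `stop`, and the logger name are hypothetical and are not radosgw APIs.

```python
import logging
import queue
import threading

log = logging.getLogger("request-queue-sketch")


class RequestQueue:
    """Hypothetical stand-in for a gateway work queue. Worker threads pull
    requests; stop() hands back anything still queued instead of asserting
    that the queue is empty."""

    def __init__(self, handler, workers=2):
        self._q = queue.Queue()
        self._handler = handler
        self._stopping = threading.Event()
        self._threads = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, req):
        self._q.put(req)

    def wait_idle(self):
        """Block until every submitted request has been handled."""
        self._q.join()

    def _run(self):
        # Poll with a short timeout so workers notice the stop flag promptly.
        while not self._stopping.is_set():
            try:
                req = self._q.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                self._handler(req)
            except Exception:
                log.exception("request %r failed", req)
            finally:
                self._q.task_done()

    def stop(self):
        """Stop the workers, then drain and return unprocessed requests,
        logging how many were left rather than crashing."""
        self._stopping.set()
        for t in self._threads:
            t.join()
        abandoned = []
        while True:
            try:
                abandoned.append(self._q.get_nowait())
            except queue.Empty:
                break
        if abandoned:
            log.warning("stopping with %d unprocessed request(s)", len(abandoned))
        return abandoned
```

Dropping leftover requests on shutdown is only one possible policy. A real server might instead let workers finish the backlog first. In both cases an error on the listening socket becomes a logged condition rather than an abort.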

On Mon, Mar 11, 2013 at 7:24 AM, Yann ROBIN <yann.robin@youscribe.com> wrote:
> The socket error was caused by nginx opening too many connections, which hit the gateway's limit on open file descriptors.

Thread overview: 5+ messages
2013-03-10 22:48 Rados gateway 0.58 crash in RGWProcess::_clear Yann ROBIN
2013-03-10 22:57 ` Yehuda Sadeh
2013-03-10 23:39   ` Yann ROBIN
2013-03-11 14:24     ` Yann ROBIN
2013-03-11 14:35       ` Yehuda Sadeh
