* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Gregory Farnum @ 2016-06-20 17:51 UTC
  To: Daniel Swarbrick; +Cc: Ceph Users, ceph-devel

On Mon, Jun 20, 2016 at 8:33 AM, Daniel Swarbrick
<daniel.swarbrick-EIkl63zCoXaH+58JC4qpiA@public.gmane.org> wrote:
> We have just updated our third cluster from Infernalis to Jewel, and are
> experiencing similar issues.
>
> We run a number of KVM virtual machines (qemu 2.5) with RBD images, and
> have seen a lot of D-state processes and even jbd2 timeouts and kernel
> stack traces inside the guests. At first I thought the VMs were being
> starved of IO, but this is still happening after throttling back the
> recovery with:
>
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
>
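> (For reference, these throttles can also be pushed to running OSDs without
> a restart. A rough sketch, assuming the Jewel-era ceph CLI:
>
>   # apply at runtime on every OSD
>   ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
>
>   # and persist in ceph.conf so restarted OSDs keep the same values
>   [osd]
>   osd_max_backfills = 1
>   osd_recovery_max_active = 1
>   osd_recovery_op_priority = 1
>
> The values actually in effect can be double-checked with "ceph daemon
> osd.N config show" on an OSD host.)
>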
> After upgrading the cluster to Jewel, I changed our crushmap to use the
> newer straw2 algorithm, which resulted in a little data movement, but no
> problems at that stage.
>
> Once the cluster had settled down again, I set tunables to optimal
> (hammer profile -> jewel profile), which has triggered between 50% and
> 70% misplaced PGs on our clusters. This is when the trouble started each
> time, and when we had cascading failures of VMs.
>
> However, after performing hard shutdowns on the VMs and restarting them,
> they seemed to be OK.
>
> At this stage, I have a strong suspicion that it is the introduction of
> "require_feature_tunables5 = 1" in the tunables. This seems to require
> all RADOS connections to be re-established.

Do you have any evidence of that besides the one restart?

I guess it's possible that we aren't kicking requests if the crush map
changes but the rest of the osdmap doesn't, but I'd be surprised.
-Greg
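
(A quick way to see exactly which tunables, and therefore which client
feature bits, a profile switch would flip is to dump the crush tunables
before and after the change. A sketch, assuming the Jewel-era CLI:

  ceph osd crush show-tunables > tunables-before.txt
  # ... switch the profile, e.g. "ceph osd crush tunables optimal" ...
  ceph osd crush show-tunables > tunables-after.txt
  diff tunables-before.txt tunables-after.txt

The dump should include fields like chooseleaf_stable and
require_feature_tunables5, the one under suspicion here.)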

>
>
> On 20/06/16 13:54, Andrei Mikhailovsky wrote:
>> Hi Oliver,
>>
>> I am also seeing this as strange behaviour indeed! I was going through the logs and I was not able to find any errors or issues. There were also no slow/blocked requests that I could see during the recovery process.
>>
>> Does anyone have an idea what could be the issue here? I don't want to shut down all VMs every time there is a new release with updated tunable values.
>>
>>
>> Andrei
>>
>>
>>
>> ----- Original Message -----
>>> From: "Oliver Dzombic" <info-cbyvsTkHNGAhzyAFmVfXCbNAH6kLmebB@public.gmane.org>
>>> To: "andrei" <andrei-930XJYlnu5nQT0dZR+AlfA@public.gmane.org>, "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
>>> Sent: Sunday, 19 June, 2016 10:14:35
>>> Subject: Re: [ceph-users] cluster down during backfilling, Jewel tunables and client IO optimisations
>>
>>> Hi,
>>>
>>> so far the key values for that are:
>>>
>>> osd_client_op_priority = 63 (the default anyway, but I set it explicitly as a reminder)
>>> osd_recovery_op_priority = 1
>>>
>>>
>>> In addition I set:
>>>
>>> osd_max_backfills = 1
>>> osd_recovery_max_active = 1
>>>
>
>

* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Josef Johansson @ 2016-06-20 19:22 UTC
  To: Gregory Farnum, Daniel Swarbrick; +Cc: Ceph Users, ceph-devel



Hi,

People ran into this before when changes in the tunables caused 70-100%
data movement; the solution was to find out which values had changed and
to increment them in the smallest steps possible.
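
A rough sketch of doing that step by step, assuming the stock crushtool
workflow rather than jumping straight to "ceph osd crush tunables optimal":

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt and change (or add) one tunable at a time, e.g.
  #   tunable chooseleaf_stable 1
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new

Let the resulting backfill finish completely before touching the next value.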

I've found that VMs do not necessarily survive a major data rearrangement
in Ceph (most recently on an SSD cluster), so my assumption is that Linux
guests and IO timeouts don't mix well. Which is true with any other
storage backend out there ;)
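
On the guest side, one thing that can soften this, sketched here on the
assumption that the disks are presented via virtio-scsi or another SCSI
path, is to raise the per-device command timeout so short recovery stalls
don't escalate into filesystem errors:

  # inside the guest; the default is usually 30 seconds
  echo 180 > /sys/block/sda/device/timeout

virtio-blk devices have no such timeout knob, so this only helps
SCSI-presented disks.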

Regards,
Josef

On Mon, 20 Jun 2016, 19:51 Gregory Farnum, <gfarnum-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Mon, Jun 20, 2016 at 8:33 AM, Daniel Swarbrick
> <daniel.swarbrick-EIkl63zCoXaH+58JC4qpiA@public.gmane.org> wrote:
> > [...]
> > At this stage, I have a strong suspicion that it is the introduction of
> > "require_feature_tunables5 = 1" in the tunables. This seems to require
> > all RADOS connections to be re-established.
>
> Do you have any evidence of that besides the one restart?
>
> I guess it's possible that we aren't kicking requests if the crush map
> changes but the rest of the osdmap doesn't, but I'd be surprised.
> -Greg


* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Andrei Mikhailovsky @ 2016-06-20 20:16 UTC
  To: Josef Johansson; +Cc: ceph-users, ceph-devel, Daniel Swarbrick



Hi Josef, 

are you saying that there is no Ceph config option that can be used to keep IO flowing to the VMs while the cluster is in the middle of a heavy data move? I am really struggling to believe that this could be the case. I've read so much about Ceph being the solution to modern storage needs, with all of its components designed to be redundant so that the storage stays available through upgrades and hardware failures. Has something been overlooked?

Also, judging by the low number of people with similar issues, I suspect there are a lot of Ceph users still running a non-optimal profile, either because they don't want to risk the downtime or simply because they don't know about the latest CRUSH tunables.

For any future updates, should I be scheduling a maintenance day or two and shutting down all VMs prior to upgrading the cluster? That seems like a backwards approach from the 90s and early 2000s (((
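
The closest thing I am aware of, and this is only a sketch I have not tested myself, is to gate the data movement with the cluster flags and let it run during quiet windows:

  ceph osd set norebalance     # hold back rebalancing of misplaced PGs
  ceph osd set nobackfill      # optionally hold back backfill as well
  # ... change the tunables, then during an off-peak window:
  ceph osd unset nobackfill
  ceph osd unset norebalance

That only paces the movement rather than avoiding it, so it doesn't really answer the question.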

Cheers 

Andrei 

> From: "Josef Johansson" <josef86-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> To: "Gregory Farnum" <gfarnum-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Daniel Swarbrick"
> <daniel.swarbrick-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Cc: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>, "ceph-devel"
> <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> Sent: Monday, 20 June, 2016 20:22:02
> Subject: Re: [ceph-users] cluster down during backfilling, Jewel tunables and
> client IO optimisations

> Hi,

> People ran into this before when changes in the tunables caused 70-100%
> data movement; the solution was to find out which values had changed and
> to increment them in the smallest steps possible.

> I've found that VMs do not necessarily survive a major data rearrangement
> in Ceph (most recently on an SSD cluster), so my assumption is that Linux
> guests and IO timeouts don't mix well. Which is true with any other
> storage backend out there ;)

> Regards,
> Josef

* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Daniel Swarbrick @ 2016-06-22 12:43 UTC
  To: ceph-users-idqoXFIVOFJgJs9I8MT0rw; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA

On 20/06/16 19:51, Gregory Farnum wrote:
> On Mon, Jun 20, 2016 at 8:33 AM, Daniel Swarbrick
>>
>> At this stage, I have a strong suspicion that it is the introduction of
>> "require_feature_tunables5 = 1" in the tunables. This seems to require
>> all RADOS connections to be re-established.
> 
> Do you have any evidence of that besides the one restart?
> 
> I guess it's possible that we aren't kicking requests if the crush map
> changes but the rest of the osdmap doesn't, but I'd be surprised.
> -Greg

I think the key fact to take note of is that we had long-running Qemu
processes that had been started a few months ago, using Infernalis
librbd shared libs.

If Infernalis had no concept of require_feature_tunables5, then it seems
logical that these clients would block if the cluster were upgraded to
Jewel and this tunable became mandatory.

I have just upgraded our fourth and final cluster to Jewel. Prior to
applying optimal tunables, we upgraded our hypervisor nodes' librbd
also, and migrated all VMs at least once, to start a fresh Qemu process
for each (using the updated librbd).
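
A quick way to confirm which librbd a long-running guest is actually using,
sketched here with a placeholder PID, is to look at the shared objects
mapped into the qemu process on the hypervisor:

  lsof -p <qemu-pid> | grep librbd
  # or, without lsof:
  grep librbd /proc/<qemu-pid>/maps

A "(deleted)" suffix on the library path means the process still has the
old, since-replaced library mapped, and needs a restart or live migration
to pick up the new one.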

We're seeing ~65% data movement due to chooseleaf_stable 0 => 1, but
other than that, so far so good. No clients are blocking indefinitely.


* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Andrei Mikhailovsky @ 2016-06-22 15:54 UTC
  To: Daniel Swarbrick; +Cc: ceph-users, ceph-devel

Hi Daniel,

Many thanks for your useful tests and your results.

How much IO wait do you have on your client vms? Has it significantly increased or not?

Many thanks

Andrei

----- Original Message -----
> From: "Daniel Swarbrick" <daniel.swarbrick-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> To: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
> Cc: "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> Sent: Wednesday, 22 June, 2016 13:43:37
> Subject: Re: [ceph-users] cluster down during backfilling, Jewel tunables and client IO optimisations

> 
> I think the key fact to take note of is that we had long-running Qemu
> processes that had been started a few months ago, using Infernalis
> librbd shared libs.
> 
> If Infernalis had no concept of require_feature_tunables5, then it seems
> logical that these clients would block if the cluster were upgraded to
> Jewel and this tunable became mandatory.
> 
> I have just upgraded our fourth and final cluster to Jewel. Prior to
> applying optimal tunables, we upgraded our hypervisor nodes' librbd
> also, and migrated all VMs at least once, to start a fresh Qemu process
> for each (using the updated librbd).
> 
> We're seeing ~65% data movement due to chooseleaf_stable 0 => 1, but
> other than that, so far so good. No clients are blocking indefinitely.
> 

* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Daniel Swarbrick @ 2016-06-22 16:09 UTC
  To: ceph-users-idqoXFIVOFJgJs9I8MT0rw; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA

On 22/06/16 17:54, Andrei Mikhailovsky wrote:
> Hi Daniel,
> 
> Many thanks for your useful tests and your results.
> 
> How much IO wait do you have on your client vms? Has it significantly increased or not?
> 

Hi Andrei,

Bearing in mind that this cluster is tiny (four nodes, each with four
OSDs), our metrics may not be that meaningful. However, on a VM that is
running ElasticSearch, collecting logs from Graylog, we're seeing no
more than about 5% iowait for a 5s period, and most of the time it's
below 1%. This VM is really not writing a lot of data though.

The cluster as a whole is peaking at only about 1200 write op/s,
according to ceph -w.

Executing a "sync" in a VM does of course have a noticeable delay due to
the recovery happening in the background, but nothing is waiting for IO
long enough to trigger the kernel's 120s timer / warning.
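
That 120s figure is the kernel's default hung-task timeout. If longer
stalls are expected during a big rebalance, it can be inspected or raised
inside the guests; a sketch, assuming a Linux guest with hung-task
detection enabled:

  sysctl kernel.hung_task_timeout_secs            # show the current value
  sysctl -w kernel.hung_task_timeout_secs=300     # raise it for this boot

Raising it only quietens the warning, of course; it does not make the
underlying IO any faster.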

The recovery has been running for about four hours now, and is down to
20% misplaced objects. So far we have not had any clients block
indefinitely, so I think the migration of VMs to Jewel-capable
hypervisors did the trick.

Best,
Daniel


* Re: cluster down during backfilling, Jewel tunables and client IO optimisations
From: Andrei Mikhailovsky @ 2016-06-22 16:49 UTC
  To: Daniel Swarbrick; +Cc: ceph-users, ceph-devel

Hi Daniel,

Many thanks, I will keep this in mind while performing the updates in the future.

Note to the documentation manager: perhaps it makes sense to add this solution as a note/tip to the Upgrade section of the release notes?


Andrei

----- Original Message -----
> From: "Daniel Swarbrick" <daniel.swarbrick-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> To: "ceph-users" <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
> Cc: "ceph-devel" <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> Sent: Wednesday, 22 June, 2016 17:09:48
> Subject: Re: [ceph-users] cluster down during backfilling, Jewel tunables and client IO optimisations

> On 22/06/16 17:54, Andrei Mikhailovsky wrote:
>> Hi Daniel,
>> 
>> Many thanks for your useful tests and your results.
>> 
>> How much IO wait do you have on your client vms? Has it significantly increased
>> or not?
>> 
> 
> Hi Andrei,
> 
> Bearing in mind that this cluster is tiny (four nodes, each with four
> OSDs), our metrics may not be that meaningful. However, on a VM that is
> running ElasticSearch, collecting logs from Graylog, we're seeing no
> more than about 5% iowait for a 5s period, and most of the time it's
> below 1%. This VM is really not writing a lot of data though.
> 
> The cluster as a whole is peaking at only about 1200 write op/s,
> according to ceph -w.
> 
> Executing a "sync" in a VM does of course have a noticeable delay due to
> the recovery happening in the background, but nothing is waiting for IO
> long enough to trigger the kernel's 120s timer / warning.
> 
> The recovery has been running for about four hours now, and is down to
> 20% misplaced objects. So far we have not had any clients block
> indefinitely, so I think the migration of VMs to Jewel-capable
> hypervisors did the trick.
> 
> Best,
> Daniel
> 

