* Smarter blacklisting?
From: John Spray @ 2017-04-18 16:37 UTC
  To: Ceph Development

Currently, when we add an address to the blacklist, we leave it in
there for a set period of time (24 minutes by default, which I suspect
might have been meant to be 24 hours), and then expire it.
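
For concreteness, a rough C++ sketch of the shape of that state and
its expiry handling (simplified stand-ins, not the actual OSDMap
types):

  #include <chrono>
  #include <cstdint>
  #include <map>
  #include <string>
  #include <tuple>

  // Simplified stand-ins for Ceph's entity_addr_t / utime_t.
  struct BlacklistAddr {
    std::string ip;   // client IP
    uint64_t nonce;   // per-instance nonce; changes on client restart
    bool operator<(const BlacklistAddr& o) const {
      return std::tie(ip, nonce) < std::tie(o.ip, o.nonce);
    }
  };

  using Clock = std::chrono::system_clock;
  using Blacklist = std::map<BlacklistAddr, Clock::time_point>;

  // add an entry with the (suspect) 24-minute default
  void blacklist_add(Blacklist& bl, const BlacklistAddr& a) {
    bl[a] = Clock::now() + std::chrono::minutes(24);
  }

  // drop entries whose expiry time has passed
  void blacklist_trim(Blacklist& bl) {
    const auto now = Clock::now();
    for (auto it = bl.begin(); it != bl.end(); ) {
      if (it->second <= now)
        it = bl.erase(it);
      else
        ++it;
    }
  }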

Clearly there are two problems with that:
 * We leave things in the list for much longer than necessary most of
the time, when a blacklisted client/node comes back reasonably soon
after a restart
 * We are never 100% guaranteed that a long-halted client won't come
back after its blacklist entry has expired (e.g. a paused VM with
dirty pages, wakes up a day later and writes back to OSDs).

These mostly haven't been too much trouble in practice, but we may be
(optionally) doing a lot more blacklisting on cephfs systems soon[1],
and cephfs clients are perhaps more likely to be VMs than RBD hosts.

One thought is to have an alternative type of blacklist entry that does
not have an expiration, but instead is automatically removed when we
see a client authenticate with the same auth id, from the same IP
address as the blacklist entry, but with a different nonce.
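
Roughly, the heuristic I have in mind -- a sketch only; carrying the
auth id alongside the entry would be new state, and the names here are
made up:

  #include <string>
  #include <vector>

  // BlacklistAddr is the simplified type from the sketch above.
  struct BlacklistEntry {
    BlacklistAddr addr;
    std::string auth_id;   // e.g. "client.foo"
  };

  // On a fresh authentication: same auth id and IP but a different
  // nonce means the fenced instance can never come back, so its
  // entry is safe to drop.
  void on_client_authenticate(std::vector<BlacklistEntry>& bl,
                              const std::string& auth_id,
                              const BlacklistAddr& from) {
    for (auto it = bl.begin(); it != bl.end(); ) {
      if (it->auth_id == auth_id &&
          it->addr.ip == from.ip &&
          it->addr.nonce != from.nonce)
        it = bl.erase(it);
      else
        ++it;
    }
  }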

Flushing out any blacklist entries from a host that never came back
would be an administrative operation, or we could do it automatically
on a *super* long expiration time (like a month), or in other cases,
e.g. if the auth identity associated with the blacklist entry was
removed.

Any thoughts?

John

1. https://github.com/ceph/ceph/pull/14610


* Re: Smarter blacklisting?
From: Sage Weil @ 2017-04-18 16:41 UTC
  To: John Spray; +Cc: Ceph Development

On Tue, 18 Apr 2017, John Spray wrote:
> Currently, when we add an address to the blacklist, we leave it in
> there for a set period of time (24 minutes by default, which I suspect
> might have been meant to be 24 hours), and then expire it.
> 
> Clearly there are two problems with that:
>  * We leave things in the list for much longer than necessary most of
> the time, when a blacklisted client/node comes back reasonably soon
> after a restart
>  * We are never 100% guaranteed that a long-halted client won't come
> back after its blacklist entry has expired (e.g. a paused VM with
> dirty pages, wakes up a day later and writes back to OSDs).
> 
> These mostly haven't been too much trouble in practice, but we may be
> (optionally) doing a lot more blacklisting on cephfs systems soon[1],
> and cephfs clients are perhaps more likely to be VMs than RBD hosts.
> 
> One thought is to have an alternative type of blacklist entry that does
> not have an expiration, but instead is automatically removed when we
> see a client authenticate with the same auth id, from the same IP
> address as the blacklist entry, but with a different nonce.
> 
> Flushing out any blacklist entries from a host that never came back
> would be an administrative operation, or we could do it automatically
> on a *super* long expiration time (like a month), or in other cases,
> e.g. if the auth identity associated with the blacklist entry was
> removed.
> 
> Any thoughts?

I like it!  I'm not sure it needs to be a different type of entry, 
though... we can just set the expiration to one month, and then have some 
other bit of code remove it early based on the heuristic.

I suspect the main logistical issue is who pays attention to the new auth 
or mount request from the client.  And where the cleanup heuristic 
lives..

sage



* Re: Smarter blacklisting?
From: Gregory Farnum @ 2017-04-18 18:36 UTC
  To: Sage Weil; +Cc: John Spray, Ceph Development

On Tue, Apr 18, 2017 at 12:41 PM, Sage Weil <sage@newdream.net> wrote:
> On Tue, 18 Apr 2017, John Spray wrote:
>> Currently, when we add an address to the blacklist, we leave it in
>> there for a set period of time (24 minutes by default, which I suspect
>> might have been meant to be 24 hours), and then expire it.
>>
>> Clearly there are two problems with that:
>>  * We leave things in the list for much longer than necessary most of
>> the time, when a blacklisted client/node comes back reasonably soon
>> after a restart
>>  * We are never 100% guaranteed that a long-halted client won't come
>> back after its blacklist entry has expired (e.g. a paused VM with
>> dirty pages, wakes up a day later and writes back to OSDs).
>>
>> These mostly haven't been too much trouble in practice, but we may be
>> (optionally) doing a lot more blacklisting on cephfs systems soon[1],
>> and cephfs clients are perhaps more likely to be VMs than RBD hosts.
>>
>> One thought is to have an alternative type of blacklist entry that does
>> not have an expiration, but instead is automatically removed when we
>> see a client authenticate with the same auth id, from the same IP
>> address as the blacklist entry, but with a different nonce.
>>
>> Flushing out any blacklist entries from a host that never came back
>> would be an administrative operation, or we could do it automatically
>> on a *super* long expiration time (like a month), or in other cases,
>> e.g. if the auth identity associated with the blacklist entry was
>> removed.
>>
>> Any thoughts?
>
> I like it!  I'm not sure it needs to be a different type of entry,
> though... we can just set the expiration to one month, and then have some
> other bit of code remove it early based on the heuristic.
>
> I suspect the main logistical issue is who pays attention to the new auth
> or mount request from the client.  And where the cleanup heuristic
> lives..

I'd also be a little concerned about building up blacklist entries
over a long time period. Right now they just live in the OSDMap (as a
set?), and if we're keeping them for a month that could be an awful
lot of them. We may need to see if we can pull it out into a separate
structure, or at least encode them more efficiently in incrementals.
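
Loosely, I'd imagine something delta-shaped along these lines (reusing
the simplified types from the sketch upthread; not the actual encoder):

  #include <vector>

  // Full blacklist only at bootstrap; per-epoch deltas afterwards.
  struct BlacklistDelta {
    Blacklist added;                     // new entries with expiries
    std::vector<BlacklistAddr> removed;  // expired or cleaned up
  };

  void apply_delta(Blacklist& bl, const BlacklistDelta& d) {
    for (const auto& [addr, expiry] : d.added)
      bl[addr] = expiry;   // insert, or refresh an existing expiry
    for (const auto& addr : d.removed)
      bl.erase(addr);
  }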

But I'd definitely like if we did something to try and clean them up
based on reconnecting clients. I can think of a few different ways to
go:
1) OSDs request entries be removed when they see a blacklisted client
reconnect.
2) Servers report connected clients to the manager and it compares
them to the blacklist.
3) Clients which reconnect submit requests to the monitor directly.

What approach were you thinking of, John?
-Greg


* Re: Smarter blacklisting?
From: John Spray @ 2017-04-19 12:02 UTC
  To: Gregory Farnum; +Cc: Sage Weil, Ceph Development

On Tue, Apr 18, 2017 at 7:36 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Tue, Apr 18, 2017 at 12:41 PM, Sage Weil <sage@newdream.net> wrote:
>> On Tue, 18 Apr 2017, John Spray wrote:
>>> Currently, when we add an address to the blacklist, we leave it in
>>> there for a set period of time (24 minutes by default, which I suspect
>>> might have been meant to be 24 hours), and then expire it.
>>>
>>> Clearly there are two problems with that:
>>>  * We leave things in the list for much longer than necessary most of
>>> the time, when a blacklisted client/node comes back reasonably soon
>>> after a restart
>>>  * We are never 100% guaranteed that a long-halted client won't come
>>> back after its blacklist entry has expired (e.g. a paused VM with
>>> dirty pages, wakes up a day later and writes back to OSDs).
>>>
>>> These mostly haven't been too much trouble in practice, but we may be
>>> (optionally) doing a lot more blacklisting on cephfs systems soon[1],
>>> and cephfs clients are perhaps more likely to be VMs than RBD hosts.
>>>
>>> One thought is to have an alternative type of blacklist entry that does
>>> not have an expiration, but instead is automatically removed when we
>>> see a client authenticate with the same auth id, from the same IP
>>> address as the blacklist entry, but with a different nonce.
>>>
>>> Flushing out any blacklist entries from a host that never came back
>>> would be an administrative operation, or we could do it automatically
>>> on a *super* long expiration time (like a month), or in other cases,
>>> e.g. if the auth identity associated with the blacklist entry was
>>> removed.
>>>
>>> Any thoughts?
>>
>> I like it!  I'm not sure it needs to be a different type of entry,
>> though... we can just set the expiration to one month, and then have some
>> other bit of code remove it early based on the heuristic.
>>
>> I suspect the main logistical issue is who pays attention to the new auth
>> or mount request from the client.  And where the cleanup heuristic
>> lives..
>
> I'd also be a little concerned about building up blacklist entries
> over a long time period. Right now they just live in the OSDMap (as a
> set?), and if we're keeping them for a month that could be an awful
> lot of them. We may need to see if we can pull it out into a separate
> structure, or at least encode them more efficiently in incrementals.

I was already pondering where the more detailed blacklist info (e.g.
ids of clients) should go, as it's not something that actually needs
to be shared with all the normal OSDMap subscribers (it's only the
entity that does the blacklist removal that needs to see that).  It's
already not ideal imho that we expose the list of all blacklisted
clients to all the other clients -- in general they shouldn't be able
to e.g. learn one another's addresses like this.

However, the list of blacklisted addresses of course needs to be in
the osdmap: if you're not in the list that's visible to OSDs, you're
not really blacklisted.  (Blacklist updates are already transmitted
incrementally.)

We could impose a maximum blacklist size, and automatically remove the
oldest entries beyond that threshold.  However, in practice the system
needs to be able to handle a blacklist of size O(number of clients),
so I'm not sure what we would set that limit to.
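
For the sake of argument, a sketch of what that eviction might look
like (again with the simplified types from upthread):

  #include <algorithm>
  #include <cstddef>

  // Evict the entries closest to expiry (with a uniform TTL, also
  // the oldest) once the list exceeds the cap.
  void enforce_cap(Blacklist& bl, std::size_t max_entries) {
    while (bl.size() > max_entries) {
      auto oldest = std::min_element(
          bl.begin(), bl.end(),
          [](const auto& a, const auto& b) { return a.second < b.second; });
      bl.erase(oldest);
    }
  }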

Anyway, there can be some separation of concerns here -- in the first
instance we could have a change that adds the intelligent blacklist
cleaning without increasing the overall expiry time, and then later
something that increases the expiry time and adds better mechanisms
for handling unexpected growth in the size of the blacklist.

> But I'd definitely like if we did something to try and clean them up
> based on reconnecting clients. I can think of a few different ways to
> go:
> 1) OSDs request entries be removed when they see a blacklisted
> client reconnect.
> 2) Servers report connected clients to the manager and it compares
> them to the blacklist.
> 3) Clients which reconnect submit requests to the monitor directly.
>
> What approach were you thinking of, John?

The mon already sees every client start (when its MonClient comes up
and authenticates), so I would generate blacklist removals in the
monitor when a client opens a session.  As a cluster map update it
would flow through the mon anyway, so there's probably not much value
in trying to do it anywhere else.
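
Sketching that wiring loosely (hypothetical names, reusing the entry
type and delta from upthread):

  #include <string>
  #include <vector>

  // handle_session_open is a made-up name for wherever the mon
  // processes a fresh client authentication; any removals it finds
  // ride out in the next map epoch as a normal incremental.
  void handle_session_open(const std::vector<BlacklistEntry>& bl,
                           const std::string& auth_id,
                           const BlacklistAddr& from,
                           BlacklistDelta& pending) {
    for (const auto& e : bl) {
      if (e.auth_id == auth_id &&
          e.addr.ip == from.ip &&
          e.addr.nonce != from.nonce)
        pending.removed.push_back(e.addr);
    }
  }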

John

> -Greg


* Re: Smarter blacklisting?
From: Sage Weil @ 2017-04-19 13:16 UTC
  To: John Spray; +Cc: Gregory Farnum, Ceph Development

On Wed, 19 Apr 2017, John Spray wrote:
> I was already pondering where the more detailed blacklist info (e.g.
> ids of clients) should go, as it's not something that actually needs
> to be shared with all the normal OSDMap subscribers (it's only the
> entity that does the blacklist removal that needs to see that).  It's
> already not ideal imho that we expose the list of all blacklisted
> clients to all the other clients -- in general they shouldn't be able
> to e.g. learn one another's addresses like this.

We already break the OSDMap encoding into two sections: the first part 
that clients care about, and the second part that is only used by OSDs.  
(The kernel client doesn't bother to decode the second half.)

I suspect it wouldn't take much to send abbreviated maps (and 
incrementals) to clients that don't include that second half at all...
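
Loosely, the split looks something like this (illustrative names only,
not the real encoder):

  #include <cstdint>
  #include <vector>

  using Buffer = std::vector<uint8_t>;

  void encode_client_section(Buffer&) { /* pools, up/in state, ... */ }
  void encode_osd_section(Buffer&)    { /* osd-only data; detailed
                                           blacklist bookkeeping could
                                           live here */ }

  // Clients decode only the first section; an abbreviated client map
  // would simply omit the second one.
  void encode_map(Buffer& out, bool for_client) {
    encode_client_section(out);
    if (!for_client)
      encode_osd_section(out);
  }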

sage


* Re: Smarter blacklisting?
From: Josh Durgin @ 2017-04-19 23:26 UTC
  To: Sage Weil, John Spray
  Cc: Gregory Farnum, Ceph Development, Jason Dillaman, Mike Christie

On 04/19/2017 06:16 AM, Sage Weil wrote:
> On Wed, 19 Apr 2017, John Spray wrote:
>> I was already pondering where the more detailed blacklist info (e.g.
>> ids of clients) should go, as it's not something that actually needs
>> to be shared with all the normal OSDMap subscribers (it's only the
>> entity that does the blacklist removal that needs to see that).  It's
>> already not ideal imho that we expose the list of all blacklisted
>> clients to all the other clients -- in general they shouldn't be able
>> to e.g. learn one another's addresses like this.
>
> We already break the OSDMap encoding into two sections: the first part
> that clients care about, and the second part that is only used by OSDs.
> (The kernel client doesn't bother to decode the second half.)
>
> I suspect it wouldn't take much to send abbreviated maps (and
> incrementals) to clients that don't include that second half at all...

I was talking to Jason and Mike about another use of more generalized
blacklisting at Vault, but I can't remember the details - do you guys?

IIRC it sounded like that use would make sense as a separate map that
osds and clients would subscribe to.

Josh


* Re: Smarter blacklisting?
From: Jason Dillaman @ 2017-04-20 13:09 UTC
  To: Josh Durgin
  Cc: Sage Weil, John Spray, Gregory Farnum, Ceph Development, Mike Christie

We didn't fully flesh it out, but I believe we were talking about
iSCSI persistent group reservations and how we could pseudo-blacklist
a (stale) client after the reservation has been updated. The current
approach to handling PGRs requires two round-trips for each IO: (1)
verify the current PGR state and (2) issue the IO if the reservation
is still active. Not only is this slow due to the extra round-trip,
it also takes advantage of some ambiguity in the spec with regard to
how in-flight IOs would react given a PGR update.

In order to eliminate this extra round-trip, we need some way for the
OSDs to generically enforce the PGR on each IO. The idea was that we
could have a new "user-managed epoch" sequence that would be managed
by the monitors and enforced by the OSDs. Given such a system, say I
have a target with an in-flight IO associated with epoch X (e.g. via
a new op at the start of the transaction), and the end-user
application issuing IO has a failover event that results in a PGR
update. The target path could then request epoch X + 1 and know that,
from that point on, any old in-flight IO associated with the older
sequence number could not be committed to disk.

The problem is that you would somehow need to ensure that all the
primary PGs know about the new epoch, hopefully without the need to
have librbd "ping" each of the primary PGs that could contain the
backing objects of an image. Perhaps this op could implement a small
lease window on the epoch to ensure the OSDs never have an epoch
sequence that is over X seconds old.
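
Putting the epoch and the lease together, a loose C++ sketch of the
OSD-side check (made-up names, not an existing Ceph interface):

  #include <chrono>
  #include <cstdint>

  struct EpochGuard {
    using Clock = std::chrono::steady_clock;

    uint64_t current = 0;
    Clock::time_point renewed = Clock::now();
    std::chrono::seconds lease{30};   // the "X seconds" window

    // monitor-driven bump or renewal, e.g. after a PGR update
    void bump(uint64_t e) {
      if (e >= current) {
        current = e;
        renewed = Clock::now();
      }
    }

    // IO tagged with a stale epoch must not hit disk; an expired
    // lease means our view of the epoch can no longer be trusted
    bool admit(uint64_t io_epoch) const {
      if (Clock::now() - renewed > lease)
        return false;
      return io_epoch >= current;
    }
  };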

Jason

On Wed, Apr 19, 2017 at 7:26 PM, Josh Durgin <jdurgin@redhat.com> wrote:
> On 04/19/2017 06:16 AM, Sage Weil wrote:
>>
>> On Wed, 19 Apr 2017, John Spray wrote:
>>>
>>> I was already pondering where the more detailed blacklist info (e.g.
>>> ids of clients) should go, as it's not something that actually needs
>>> to be shared with all the normal OSDMap subscribers (it's only the
>>> entity that does the blacklist removal that needs to see that).  It's
>>> already not ideal imho that we expose the list of all blacklisted
>>> clients to all the other clients -- in general they shouldn't be able
>>> to e.g. learn one another's addresses like this.
>>
>>
>> We already break the OSDMap encoding into two sections: the first part
>> that clients care about, and the second part that is only used by OSDs.
>> (The kernel client doesn't bother to decode the second half.)
>>
>> I suspect it wouldn't take much to send abbreviated maps (and
>> incrementals) to clients that don't include that second half at all...
>
>
> I was talking to Jason and Mike about another use of more generalized
> blacklisting at Vault, but I can't remember the details - do you guys?
>
> IIRC it sounded like that use would make sense as a separate map that
> osds and clients would subscribe to.
>
> Josh



-- 
Jason

