* OSDMap / osd_state questions
@ 2017-04-04 11:42 Jesse Williamson
2017-04-04 12:18 ` Jesse Williamson
2017-04-04 14:12 ` Sage Weil
0 siblings, 2 replies; 5+ messages in thread
From: Jesse Williamson @ 2017-04-04 11:42 UTC (permalink / raw)
To: Squid Cybernetic
Hello,
I have a couple of questions about OSDMap.
OSDMap has:
int num_osd; // not saved; see calc_num_osds
int num_up_osd; // not saved; see calc_num_osds
int num_in_osd; // not saved; see calc_num_osds
int32_t max_osd;
vector<uint8_t> osd_state;
int calc_num_osds();
calc_num_osds() does this:
int OSDMap::calc_num_osds()
{
num_osd = 0;
num_up_osd = 0;
num_in_osd = 0;
for (int i=0; i<max_osd; i++) {
if (osd_state[i] & CEPH_OSD_EXISTS) {
++num_osd;
if (osd_state[i] & CEPH_OSD_UP) {
++num_up_osd;
}
if (get_weight(i) != CEPH_OSD_OUT) {
++num_in_osd;
}
}
}
return num_osd;
}
...is there some condition where an osd id will appear in osd_state and
yet CEPH_OSD_EXISTS be false? It looks like the other two conditions, UP
and OUT[1] depend on that being true.
First question: Can we get away with returning osd_state.size()?
Second, should the cached/memoized member variables set as a side-effect
of calc_num_osds() /ever/ be trusted if we don't call the method first?
Finally, if we have to go through this loop anyway, perhaps in addition to
the size we'll want to know the OSD ids as well? I see several places
where we wind up looping through osd_state /again/ to get extra
information (for example, OSDMonitor::get_health()).
Perhaps a method basically doing this would be useful, along these lines?
partition_copy(begin(osd_state), end(osd_state),
begin(up_osds), begin(down_osds),
[&osdmap](const int32_t osd_id) { return osdmap.is_up(osd_id); });
Thanks for reading,
-Jesse
1. It would be amusing to check for DOWN and OUT.;-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: OSDMap / osd_state questions
2017-04-04 11:42 OSDMap / osd_state questions Jesse Williamson
@ 2017-04-04 12:18 ` Jesse Williamson
2017-04-04 14:16 ` Sage Weil
2017-04-04 14:12 ` Sage Weil
1 sibling, 1 reply; 5+ messages in thread
From: Jesse Williamson @ 2017-04-04 12:18 UTC (permalink / raw)
To: Squid Cybernetic
On Tue, 4 Apr 2017, Jesse Williamson wrote:
Ok, sorry to add extra traffic, but a partial answer to my own question leads to more questions:
> ...is there some condition where an osd id will appear in osd_state and yet
> CEPH_OSD_EXISTS be false? It looks like the other two conditions, UP and
> OUT[1] depend on that being true.
OSDMonitor::get_health() appears to circumvent the is_up()/is_down()
methods from OSDMap, instead checking manually-- and slightly
differently[1].
One of the checks is:
if (!osdmap.exists(i)) {
if (osdmap.crush->item_exists(i)) {
down_osds.insert(i);
}
...so, to be "down", something must NOT exist in the osd_state AND must
exist in the CRUSH map. Looking at definition of OSDMap::exists() and
OSDMap::is_down(), they differ in that get_health() is checking the CRUSH
map. Who's right..?
It also looks like in get_health() when something's added to down_osds,
num_down_in_osds isn't incremented. Is this intentional?
Thanks again,
-Jesse
1. From OSDMonitor::get_health():
int num_in_osds = 0;
set<int> down_osds;
for (int i = 0; i < osdmap.get_max_osd(); i++) {
if (!osdmap.exists(i)) {
if (osdmap.crush->item_exists(i)) {
down_osds.insert(i);
}
continue;
}
if (osdmap.is_out(i))
continue;
++num_in_osds;
if (!osdmap.is_up(i)) {
++num_down_in_osds;
2. exists(), etc.:
bool exists(int osd) const {
//assert(osd >= 0);
return osd >= 0 && osd < max_osd && (osd_state[osd] &
CEPH_OSD_EXISTS);
}
bool is_up(int osd) const {
return exists(osd) && (osd_state[osd] & CEPH_OSD_UP);
}
bool has_been_up_since(int osd, epoch_t epoch) const {
return is_up(osd) && get_up_from(osd) <= epoch;
}
bool is_down(int osd) const {
return !is_up(osd);
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: OSDMap / osd_state questions
2017-04-04 11:42 OSDMap / osd_state questions Jesse Williamson
2017-04-04 12:18 ` Jesse Williamson
@ 2017-04-04 14:12 ` Sage Weil
2017-04-04 14:55 ` Jesse Williamson
1 sibling, 1 reply; 5+ messages in thread
From: Sage Weil @ 2017-04-04 14:12 UTC (permalink / raw)
To: Jesse Williamson; +Cc: Squid Cybernetic
On Tue, 4 Apr 2017, Jesse Williamson wrote:
> Hello,
>
> I have a couple of questions about OSDMap.
>
> OSDMap has:
> int num_osd; // not saved; see calc_num_osds
> int num_up_osd; // not saved; see calc_num_osds
> int num_in_osd; // not saved; see calc_num_osds
>
> int32_t max_osd;
> vector<uint8_t> osd_state;
>
> int calc_num_osds();
>
> calc_num_osds() does this:
> int OSDMap::calc_num_osds()
> {
> num_osd = 0;
> num_up_osd = 0;
> num_in_osd = 0;
> for (int i=0; i<max_osd; i++) {
> if (osd_state[i] & CEPH_OSD_EXISTS) {
> ++num_osd;
> if (osd_state[i] & CEPH_OSD_UP) {
> ++num_up_osd;
> }
> if (get_weight(i) != CEPH_OSD_OUT) {
> ++num_in_osd;
> }
> }
> }
> return num_osd;
> }
>
> ...is there some condition where an osd id will appear in osd_state and yet
> CEPH_OSD_EXISTS be false?
Yes
> It looks like the other two conditions, UP and OUT[1] depend on that
> being true.
Those bits are ignored if the OSD doesn't exist.
> First question: Can we get away with returning osd_state.size()?
Nope!
> Second, should the cached/memoized member variables set as a side-effect of
> calc_num_osds() /ever/ be trusted if we don't call the method first?
Nope! We're careful to call it after decode() and apply_incremental(),
which are basically the only two paths that modify OSDMap.
> Finally, if we have to go through this loop anyway, perhaps in addition to the
> size we'll want to know the OSD ids as well? I see several places where we
> wind up looping through osd_state /again/ to get extra information (for
> example, OSDMonitor::get_health()).
>
> Perhaps a method basically doing this would be useful, along these lines?
>
> partition_copy(begin(osd_state), end(osd_state),
> begin(up_osds), begin(down_osds),
> [&osdmap](const int32_t osd_id) { return osdmap.is_up(osd_id);
> });
Maybe! Since osd_state is a vector its all very fast. Not sure that we
are bulding a set<int> of up or down often enough to justify
prebuilding it. (It also consumes memory, and on big clusters
we have lots of big OSDMaps in a cache that can't consume too much
memory.)
sage
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: OSDMap / osd_state questions
2017-04-04 12:18 ` Jesse Williamson
@ 2017-04-04 14:16 ` Sage Weil
0 siblings, 0 replies; 5+ messages in thread
From: Sage Weil @ 2017-04-04 14:16 UTC (permalink / raw)
To: Jesse Williamson; +Cc: Squid Cybernetic
On Tue, 4 Apr 2017, Jesse Williamson wrote:
> On Tue, 4 Apr 2017, Jesse Williamson wrote:
>
> Ok, sorry to add extra traffic, but a partial answer to my own question leads
> to more questions:
>
> > ...is there some condition where an osd id will appear in osd_state and yet
> > CEPH_OSD_EXISTS be false? It looks like the other two conditions, UP and
> > OUT[1] depend on that being true.
>
> OSDMonitor::get_health() appears to circumvent the is_up()/is_down() methods
> from OSDMap, instead checking manually-- and slightly differently[1].
>
> One of the checks is:
> if (!osdmap.exists(i)) {
> if (osdmap.crush->item_exists(i)) {
> down_osds.insert(i);
> }
>
> ...so, to be "down", something must NOT exist in the osd_state AND must exist
> in the CRUSH map. Looking at definition of OSDMap::exists() and
> OSDMap::is_down(), they differ in that get_health() is checking the CRUSH map.
> Who's right..?
Not sure what eversion of the code you're looking at. On master, it's
set<int> osds;
for (int i = 0; i < osdmap.get_max_osd(); i++) {
if (!osdmap.exists(i)) {
if (osdmap.crush->item_exists(i)) {
osds.insert(i);
}
continue;
}
...
}
...
if (!osds.empty()) {
ostringstream ss;
ss << "osds were removed from osdmap, but still kept in crushmap";
summary.push_back(make_pair(HEALTH_WARN, ss.str()));
if (detail) {
ss << " osds: [" << osds << "]";
detail->push_back(make_pair(HEALTH_WARN, ss.str()));
}
}
> It also looks like in get_health() when something's added to down_osds,
> num_down_in_osds isn't incremented. Is this intentional?
The code in master makes sense there too. What are you looking at?
Thanks!
sage
>
> Thanks again,
>
> -Jesse
>
> 1. From OSDMonitor::get_health():
>
> int num_in_osds = 0;
> set<int> down_osds;
> for (int i = 0; i < osdmap.get_max_osd(); i++) {
> if (!osdmap.exists(i)) {
> if (osdmap.crush->item_exists(i)) {
> down_osds.insert(i);
> }
> continue;
> }
> if (osdmap.is_out(i))
> continue;
> ++num_in_osds;
> if (!osdmap.is_up(i)) {
> ++num_down_in_osds;
>
> 2. exists(), etc.:
>
> bool exists(int osd) const {
> //assert(osd >= 0);
> return osd >= 0 && osd < max_osd && (osd_state[osd] & CEPH_OSD_EXISTS);
> }
>
> bool is_up(int osd) const {
> return exists(osd) && (osd_state[osd] & CEPH_OSD_UP);
> }
>
> bool has_been_up_since(int osd, epoch_t epoch) const {
> return is_up(osd) && get_up_from(osd) <= epoch;
> }
>
> bool is_down(int osd) const {
> return !is_up(osd);
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: OSDMap / osd_state questions
2017-04-04 14:12 ` Sage Weil
@ 2017-04-04 14:55 ` Jesse Williamson
0 siblings, 0 replies; 5+ messages in thread
From: Jesse Williamson @ 2017-04-04 14:55 UTC (permalink / raw)
To: Sage Weil; +Cc: Squid Cybernetic
On Tue, 4 Apr 2017, Sage Weil wrote:
>> First question: Can we get away with returning osd_state.size()?
>
> Nope!
Ok, that settles that! :-) Thanks!
>> Second, should the cached/memoized member variables set as a side-effect of
>> calc_num_osds() /ever/ be trusted if we don't call the method first?
>
> Nope! We're careful to call it after decode() and apply_incremental(),
> which are basically the only two paths that modify OSDMap.
Thanks, good to know!
>> Perhaps a method basically doing this would be useful, along these lines?
[...]
> Maybe! Since osd_state is a vector its all very fast. Not sure that we
> are bulding a set<int> of up or down often enough to justify
> prebuilding it. (It also consumes memory, and on big clusters
> we have lots of big OSDMaps in a cache that can't consume too much
> memory.)
Yes, it would definitely use quite a bit of extra memory, at least the
size of the osd_state over again since we're making a copy. Maybe the only
way that would make sense would be to track them seperately from "go", but
then all the code touching osd_state would have to get more complicated.
I'm just poking around in mon's status reporting, so it makes sense to to
keep it local. Thanks!
WRT my other email, I'm indeed looking in a weird branch-- confusion
unconfusifificated!
Appreciatively,
-Jesse
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-04-04 14:55 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-04 11:42 OSDMap / osd_state questions Jesse Williamson
2017-04-04 12:18 ` Jesse Williamson
2017-04-04 14:16 ` Sage Weil
2017-04-04 14:12 ` Sage Weil
2017-04-04 14:55 ` Jesse Williamson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.