All of lore.kernel.org
 help / color / mirror / Atom feed
* OSDMap / osd_state questions
@ 2017-04-04 11:42 Jesse Williamson
  2017-04-04 12:18 ` Jesse Williamson
  2017-04-04 14:12 ` Sage Weil
  0 siblings, 2 replies; 5+ messages in thread
From: Jesse Williamson @ 2017-04-04 11:42 UTC (permalink / raw)
  To: Squid Cybernetic


Hello,

I have a couple of questions about OSDMap.

OSDMap has:
   int num_osd;         // not saved; see calc_num_osds
   int num_up_osd;      // not saved; see calc_num_osds
   int num_in_osd;      // not saved; see calc_num_osds

   int32_t max_osd;
   vector<uint8_t> osd_state;

   int calc_num_osds();

calc_num_osds() does this:
int OSDMap::calc_num_osds()
{
   num_osd = 0;
   num_up_osd = 0;
   num_in_osd = 0;
   for (int i=0; i<max_osd; i++) {
     if (osd_state[i] & CEPH_OSD_EXISTS) {
       ++num_osd;
       if (osd_state[i] & CEPH_OSD_UP) {
     ++num_up_osd;
       }
       if (get_weight(i) != CEPH_OSD_OUT) {
     ++num_in_osd;
       }
     }
   }
   return num_osd;
}

...is there some condition where an osd id will appear in osd_state and 
yet CEPH_OSD_EXISTS be false? It looks like the other two conditions, UP 
and OUT[1] depend on that being true.

First question: Can we get away with returning osd_state.size()?

Second, should the cached/memoized member variables set as a side-effect 
of calc_num_osds() /ever/ be trusted if we don't call the method first?

Finally, if we have to go through this loop anyway, perhaps in addition to 
the size we'll want to know the OSD ids as well? I see several places 
where we wind up looping through osd_state /again/ to get extra 
information (for example, OSDMonitor::get_health()).

Perhaps a method basically doing this would be useful, along these lines?

partition_copy(begin(osd_state), end(osd_state),
                begin(up_osds), begin(down_osds),
                [&osdmap](const int32_t osd_id) { return osdmap.is_up(osd_id); });

Thanks for reading,

-Jesse

1. It would be amusing to check for DOWN and OUT.;-)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: OSDMap / osd_state questions
  2017-04-04 11:42 OSDMap / osd_state questions Jesse Williamson
@ 2017-04-04 12:18 ` Jesse Williamson
  2017-04-04 14:16   ` Sage Weil
  2017-04-04 14:12 ` Sage Weil
  1 sibling, 1 reply; 5+ messages in thread
From: Jesse Williamson @ 2017-04-04 12:18 UTC (permalink / raw)
  To: Squid Cybernetic

On Tue, 4 Apr 2017, Jesse Williamson wrote:

Ok, sorry to add extra traffic, but a partial answer to my own question leads to more questions:

> ...is there some condition where an osd id will appear in osd_state and yet 
> CEPH_OSD_EXISTS be false? It looks like the other two conditions, UP and 
> OUT[1] depend on that being true.

OSDMonitor::get_health() appears to circumvent the is_up()/is_down() 
methods from OSDMap, instead checking manually-- and slightly 
differently[1].

One of the checks is:
       if (!osdmap.exists(i)) {
         if (osdmap.crush->item_exists(i)) {
           down_osds.insert(i);
         }

...so, to be "down", something must NOT exist in the osd_state AND must 
exist in the CRUSH map. Looking at definition of OSDMap::exists() and 
OSDMap::is_down(), they differ in that get_health() is checking the CRUSH 
map. Who's right..?

It also looks like in get_health() when something's added to down_osds, 
num_down_in_osds isn't incremented. Is this intentional?

Thanks again,

-Jesse

1. From OSDMonitor::get_health():

     int num_in_osds = 0;
     set<int> down_osds;
     for (int i = 0; i < osdmap.get_max_osd(); i++) {
       if (!osdmap.exists(i)) {
         if (osdmap.crush->item_exists(i)) {
           down_osds.insert(i);
         }
      continue;
       }
       if (osdmap.is_out(i))
         continue;
       ++num_in_osds;
       if (!osdmap.is_up(i)) {
     ++num_down_in_osds;

2. exists(), etc.:

   bool exists(int osd) const {
     //assert(osd >= 0);
     return osd >= 0 && osd < max_osd && (osd_state[osd] & 
CEPH_OSD_EXISTS);
   }

   bool is_up(int osd) const {
     return exists(osd) && (osd_state[osd] & CEPH_OSD_UP);
   }

   bool has_been_up_since(int osd, epoch_t epoch) const {
     return is_up(osd) && get_up_from(osd) <= epoch;
   }

   bool is_down(int osd) const {
     return !is_up(osd);
   }


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: OSDMap / osd_state questions
  2017-04-04 11:42 OSDMap / osd_state questions Jesse Williamson
  2017-04-04 12:18 ` Jesse Williamson
@ 2017-04-04 14:12 ` Sage Weil
  2017-04-04 14:55   ` Jesse Williamson
  1 sibling, 1 reply; 5+ messages in thread
From: Sage Weil @ 2017-04-04 14:12 UTC (permalink / raw)
  To: Jesse Williamson; +Cc: Squid Cybernetic

On Tue, 4 Apr 2017, Jesse Williamson wrote:
> Hello,
> 
> I have a couple of questions about OSDMap.
> 
> OSDMap has:
>   int num_osd;         // not saved; see calc_num_osds
>   int num_up_osd;      // not saved; see calc_num_osds
>   int num_in_osd;      // not saved; see calc_num_osds
> 
>   int32_t max_osd;
>   vector<uint8_t> osd_state;
> 
>   int calc_num_osds();
> 
> calc_num_osds() does this:
> int OSDMap::calc_num_osds()
> {
>   num_osd = 0;
>   num_up_osd = 0;
>   num_in_osd = 0;
>   for (int i=0; i<max_osd; i++) {
>     if (osd_state[i] & CEPH_OSD_EXISTS) {
>       ++num_osd;
>       if (osd_state[i] & CEPH_OSD_UP) {
>     ++num_up_osd;
>       }
>       if (get_weight(i) != CEPH_OSD_OUT) {
>     ++num_in_osd;
>       }
>     }
>   }
>   return num_osd;
> }
> 
> ...is there some condition where an osd id will appear in osd_state and yet
> CEPH_OSD_EXISTS be false? 

Yes

> It looks like the other two conditions, UP and OUT[1] depend on that 
> being true.

Those bits are ignored if the OSD doesn't exist.

> First question: Can we get away with returning osd_state.size()?

Nope!
 
> Second, should the cached/memoized member variables set as a side-effect of
> calc_num_osds() /ever/ be trusted if we don't call the method first?

Nope!  We're careful to call it after decode() and apply_incremental(), 
which are basically the only two paths that modify OSDMap.

> Finally, if we have to go through this loop anyway, perhaps in addition to the
> size we'll want to know the OSD ids as well? I see several places where we
> wind up looping through osd_state /again/ to get extra information (for
> example, OSDMonitor::get_health()).
> 
> Perhaps a method basically doing this would be useful, along these lines?
> 
> partition_copy(begin(osd_state), end(osd_state),
>                begin(up_osds), begin(down_osds),
>                [&osdmap](const int32_t osd_id) { return osdmap.is_up(osd_id);
> });

Maybe!  Since osd_state is a vector its all very fast.  Not sure that we 
are bulding a set<int> of up or down often enough to justify 
prebuilding it.  (It also consumes memory, and on big clusters 
we have lots of big OSDMaps in a cache that can't consume too much 
memory.)

sage


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: OSDMap / osd_state questions
  2017-04-04 12:18 ` Jesse Williamson
@ 2017-04-04 14:16   ` Sage Weil
  0 siblings, 0 replies; 5+ messages in thread
From: Sage Weil @ 2017-04-04 14:16 UTC (permalink / raw)
  To: Jesse Williamson; +Cc: Squid Cybernetic

On Tue, 4 Apr 2017, Jesse Williamson wrote:
> On Tue, 4 Apr 2017, Jesse Williamson wrote:
> 
> Ok, sorry to add extra traffic, but a partial answer to my own question leads
> to more questions:
> 
> > ...is there some condition where an osd id will appear in osd_state and yet
> > CEPH_OSD_EXISTS be false? It looks like the other two conditions, UP and
> > OUT[1] depend on that being true.
> 
> OSDMonitor::get_health() appears to circumvent the is_up()/is_down() methods
> from OSDMap, instead checking manually-- and slightly differently[1].
> 
> One of the checks is:
>       if (!osdmap.exists(i)) {
>         if (osdmap.crush->item_exists(i)) {
>           down_osds.insert(i);
>         }
> 
> ...so, to be "down", something must NOT exist in the osd_state AND must exist
> in the CRUSH map. Looking at definition of OSDMap::exists() and
> OSDMap::is_down(), they differ in that get_health() is checking the CRUSH map.
> Who's right..?

Not sure what eversion of the code you're looking at.  On master, it's

    set<int> osds;
    for (int i = 0; i < osdmap.get_max_osd(); i++) {
      if (!osdmap.exists(i)) {
        if (osdmap.crush->item_exists(i)) {
          osds.insert(i);
        }
	continue;
      } 
      ...
    }
    ...
    if (!osds.empty()) {
      ostringstream ss;
      ss << "osds were removed from osdmap, but still kept in crushmap";
      summary.push_back(make_pair(HEALTH_WARN, ss.str()));
      if (detail) {
        ss << " osds: [" << osds << "]";
        detail->push_back(make_pair(HEALTH_WARN, ss.str()));
      }
    }

> It also looks like in get_health() when something's added to down_osds,
> num_down_in_osds isn't incremented. Is this intentional?

The code in master makes sense there too.  What are you looking at?

Thanks!
sage



> 
> Thanks again,
> 
> -Jesse
> 
> 1. From OSDMonitor::get_health():
> 
>     int num_in_osds = 0;
>     set<int> down_osds;
>     for (int i = 0; i < osdmap.get_max_osd(); i++) {
>       if (!osdmap.exists(i)) {
>         if (osdmap.crush->item_exists(i)) {
>           down_osds.insert(i);
>         }
>      continue;
>       }
>       if (osdmap.is_out(i))
>         continue;
>       ++num_in_osds;
>       if (!osdmap.is_up(i)) {
>     ++num_down_in_osds;
> 
> 2. exists(), etc.:
> 
>   bool exists(int osd) const {
>     //assert(osd >= 0);
>     return osd >= 0 && osd < max_osd && (osd_state[osd] & CEPH_OSD_EXISTS);
>   }
> 
>   bool is_up(int osd) const {
>     return exists(osd) && (osd_state[osd] & CEPH_OSD_UP);
>   }
> 
>   bool has_been_up_since(int osd, epoch_t epoch) const {
>     return is_up(osd) && get_up_from(osd) <= epoch;
>   }
> 
>   bool is_down(int osd) const {
>     return !is_up(osd);
>   }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: OSDMap / osd_state questions
  2017-04-04 14:12 ` Sage Weil
@ 2017-04-04 14:55   ` Jesse Williamson
  0 siblings, 0 replies; 5+ messages in thread
From: Jesse Williamson @ 2017-04-04 14:55 UTC (permalink / raw)
  To: Sage Weil; +Cc: Squid Cybernetic

On Tue, 4 Apr 2017, Sage Weil wrote:

>> First question: Can we get away with returning osd_state.size()?
>
> Nope!

Ok, that settles that! :-) Thanks!

>> Second, should the cached/memoized member variables set as a side-effect of
>> calc_num_osds() /ever/ be trusted if we don't call the method first?
>
> Nope!  We're careful to call it after decode() and apply_incremental(),
> which are basically the only two paths that modify OSDMap.

Thanks, good to know!

>> Perhaps a method basically doing this would be useful, along these lines?

[...]

> Maybe!  Since osd_state is a vector its all very fast.  Not sure that we
> are bulding a set<int> of up or down often enough to justify
> prebuilding it.  (It also consumes memory, and on big clusters
> we have lots of big OSDMaps in a cache that can't consume too much
> memory.)

Yes, it would definitely use quite a bit of extra memory, at least the 
size of the osd_state over again since we're making a copy. Maybe the only 
way that would make sense would be to track them seperately from "go", but 
then all the code touching osd_state would have to get more complicated. 
I'm just poking around in mon's status reporting, so it makes sense to to 
keep it local. Thanks!

WRT my other email, I'm indeed looking in a weird branch-- confusion 
unconfusifificated!

Appreciatively,

-Jesse

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-04-04 14:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-04 11:42 OSDMap / osd_state questions Jesse Williamson
2017-04-04 12:18 ` Jesse Williamson
2017-04-04 14:16   ` Sage Weil
2017-04-04 14:12 ` Sage Weil
2017-04-04 14:55   ` Jesse Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.