* Print error into debug log by default
From: Wang, Zhiye @ 2017-03-23 10:02 UTC
To: ceph-devel
Dear all,
This is a small problem. I could not figure out how to open an issue, so I am sharing it here.
After a mistaken operation (running the ceph-osd command as root), I was no longer able to start ceph-osd. I can see the following stack trace in the debug log.
2017-03-22 02:23:54.054907 7f0e87d8b940 -1 osd.0 0 failed to load OSD map for epoch 71, got 0 bytes
2017-03-22 02:23:54.056361 7f0e87d8b940 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f0e87d8b940 time 2017-03-22 02:23:54.054921
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/osd/OSD.h: 997: FAILED assert(ret)
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f0e88869b35]
2: (OSDService::get_map(unsigned int)+0x3d) [0x7f0e8825d13d]
3: (OSD::init()+0x1fd2) [0x7f0e8820a452]
4: (main()+0x2cda) [0x7f0e8813bf4a]
5: (__libc_start_main()+0xf5) [0x7f0e8460cb15]
6: (()+0x413da9) [0x7f0e881b7da9]
After digging into this problem for some time, I finally realized it was a file permission problem (caused by my earlier mistake). The problem is that the debug log gave no hint of this.
Looking at the source code, I guess it's because FileStore::lfn_open logs file open errors only at dout(10), which is not printed by default. Please correct me if I am wrong. I'd suggest we always print the error message to the debug log.
  r = ::open((*path)->path(), flags, 0644);
  if (r < 0) {
    r = -errno;
    dout(10) << "error opening file " << (*path)->path() << " with flags="
             << flags << ": " << cpp_strerror(-r) << dendl;
    goto fail;
  }
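The snippet follows the usual Ceph convention of turning a failed syscall into a negative errno (r = -errno) and formatting it with cpp_strerror(-r). Below is a minimal standalone sketch of the same pattern; it is not FileStore code. It assumes std::strerror in place of Ceph's cpp_strerror and std::cerr in place of dout(10), and the helper name open_negerrno is hypothetical. A file left owned by root after running ceph-osd as root would typically fail at this point with EACCES ("Permission denied") once the daemon runs as an unprivileged user again.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <string>
#include <unistd.h>

// Hypothetical helper mirroring the error path in FileStore::lfn_open:
// returns a file descriptor (>= 0) on success, or a negative errno.
int open_negerrno(const std::string& path, int flags) {
  int r = ::open(path.c_str(), flags, 0644);
  if (r < 0) {
    r = -errno;  // capture errno before any other call can overwrite it
    // Ceph logs this via dout(10); std::cerr is used only for illustration.
    std::cerr << "error opening file " << path << " with flags=" << flags
              << ": " << std::strerror(-r) << std::endl;
  }
  return r;
}
```

Because later library calls (including the logging itself) may overwrite errno, the value is copied into r immediately after ::open fails, as the original code does.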
Thanks
Mike
* Re: Print error into debug log by default
From: Sage Weil @ 2017-03-23 11:59 UTC
To: Wang, Zhiye; +Cc: ceph-devel
On Thu, 23 Mar 2017, Wang, Zhiye wrote:
> Dear all,
>
> This is a small problem. I could not figure out how to open an issue, so I am sharing it here.
>
> After a mistaken operation (running the ceph-osd command as root), I was no longer able to start ceph-osd. I can see the following stack trace in the debug log.
>
>
> 2017-03-22 02:23:54.054907 7f0e87d8b940 -1 osd.0 0 failed to load OSD map for epoch 71, got 0 bytes
> 2017-03-22 02:23:54.056361 7f0e87d8b940 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f0e87d8b940 time 2017-03-22 02:23:54.054921
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/osd/OSD.h: 997: FAILED assert(ret)
>
> ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f0e88869b35]
> 2: (OSDService::get_map(unsigned int)+0x3d) [0x7f0e8825d13d]
> 3: (OSD::init()+0x1fd2) [0x7f0e8820a452]
> 4: (main()+0x2cda) [0x7f0e8813bf4a]
> 5: (__libc_start_main()+0xf5) [0x7f0e8460cb15]
> 6: (()+0x413da9) [0x7f0e881b7da9]
>
>
> After digging into this problem for some time, I finally realized it was a file permission problem (caused by my earlier mistake). The problem is that the debug log gave no hint of this.
>
> Looking at the source code, I guess it's because FileStore::lfn_open logs file open errors only at dout(10), which is not printed by default. Please correct me if I am wrong. I'd suggest we always print the error message to the debug log.
>
>   r = ::open((*path)->path(), flags, 0644);
>   if (r < 0) {
>     r = -errno;
>     dout(10) << "error opening file " << (*path)->path() << " with flags="
>              << flags << ": " << cpp_strerror(-r) << dendl;
>     goto fail;
>   }
At this layer we can get ENOENT as a normal event (some client
request asks for an object that doesn't exist), so it doesn't make sense
to log an error here. The get_map() method should probably be modified to
indicate that it failed to load map epoch N (using derr) before asserting
or calling ceph_abort().
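A rough sketch of that suggestion: get_map() reports the epoch it could not load at error level before aborting, rather than failing on a bare assert(ret). Everything here is a simplified, hypothetical stand-in and not Ceph's API: the MapService struct, the in-memory maps table (in place of the object store), std::cerr (in place of derr) and std::abort() (in place of ceph_abort()).

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <map>
#include <memory>

using epoch_t = unsigned int;

// Hypothetical, heavily simplified stand-ins for the Ceph types.
struct OSDMap {
  epoch_t epoch;
};
using OSDMapRef = std::shared_ptr<const OSDMap>;

struct MapService {
  std::map<epoch_t, OSDMapRef> maps;  // in-memory stand-in for the store

  // Returns an empty ref if the epoch cannot be loaded; no logging here,
  // since a missing object is not necessarily an error at this layer.
  OSDMapRef try_get_map(epoch_t e) const {
    auto it = maps.find(e);
    if (it == maps.end())
      return OSDMapRef();
    return it->second;
  }

  // Callers treat a missing map as fatal, so say which epoch failed
  // (derr in Ceph) before aborting, instead of a bare assert(ret).
  OSDMapRef get_map(epoch_t e) const {
    OSDMapRef ret = try_get_map(e);
    if (!ret) {
      std::cerr << "failed to load OSD map for epoch " << e << std::endl;
      std::abort();  // stand-in for ceph_abort()
    }
    return ret;
  }
};
```

Keeping the decision in get_map() matches the point above: the store layer stays quiet about ENOENT, and only the caller that knows a missing map is fatal escalates it.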
Thanks!
sage
* RE: Print error into debug log by default
From: Wang, Zhiye @ 2017-03-24 3:01 UTC
To: Sage Weil; +Cc: ceph-devel
Thanks Sage.
Currently, it seems the error code from "FileStore::lfn_open" (propagated through store->read) is collapsed into a boolean variable in "OSDService::_get_map_bl". As a result, the upper layers have no way of knowing why the operation failed.
Maybe we can change the return type of "OSDService::_get_map_bl" to "int", and then print the failure reason in "OSDService::try_get_map".
bool OSDService::_get_map_bl(epoch_t e, bufferlist& bl)
{
  bool found = map_bl_cache.lookup(e, &bl);
  if (found)
    return true;
  found = store->read(coll_t::meta(),
                      OSD::get_osdmap_pobject_name(e), 0, 0, bl) >= 0;
  if (found)
    _add_map_bl(e, bl);
  return found;
}
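A minimal sketch of the proposed change, built on simplified, hypothetical stand-ins rather than the real Ceph types: bufferlist is a std::string, FakeStore replaces the object store (with an errors table for injecting failures such as -EACCES), and a std::map replaces map_bl_cache. _get_map_bl returns 0 or the store's negative errno, and try_get_map logs the reason alongside the epoch, using std::cerr in place of derr. To keep it short, try_get_map returns a bool here rather than an OSDMapRef.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <map>
#include <string>

using epoch_t = unsigned int;
using bufferlist = std::string;  // hypothetical stand-in for ceph::bufferlist

// Hypothetical stand-in for the object store. read() returns the number of
// bytes read, or a negative errno (-ENOENT if the object does not exist).
struct FakeStore {
  std::map<epoch_t, bufferlist> objects;
  std::map<epoch_t, int> errors;  // injected failures, e.g. -EACCES

  int read(epoch_t e, bufferlist& bl) {
    auto err = errors.find(e);
    if (err != errors.end())
      return err->second;
    auto it = objects.find(e);
    if (it == objects.end())
      return -ENOENT;
    bl = it->second;
    return static_cast<int>(bl.size());
  }
};

struct MapService {
  FakeStore store;
  std::map<epoch_t, bufferlist> map_bl_cache;

  // Proposed shape: 0 on success, or the store's negative errno,
  // instead of collapsing the result into a bool.
  int _get_map_bl(epoch_t e, bufferlist& bl) {
    auto it = map_bl_cache.find(e);
    if (it != map_bl_cache.end()) {
      bl = it->second;
      return 0;
    }
    int r = store.read(e, bl);
    if (r < 0)
      return r;
    map_bl_cache[e] = bl;  // stands in for _add_map_bl()
    return 0;
  }

  // Logs *why* the load failed (derr in Ceph), then reports failure.
  bool try_get_map(epoch_t e, bufferlist& bl) {
    int r = _get_map_bl(e, bl);
    if (r < 0 || bl.empty()) {
      std::cerr << "failed to load OSD map for epoch " << e << ", got "
                << bl.size() << " bytes: "
                << (r < 0 ? std::strerror(-r) : "empty object") << std::endl;
      return false;
    }
    return true;
  }
};
```

With this shape, the report above would read something like "failed to load OSD map for epoch 71, got 0 bytes: Permission denied" instead of stopping at the byte count.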
Thanks
Zhiye
* RE: Print error into debug log by default
From: Sage Weil @ 2017-03-24 13:20 UTC
To: Wang, Zhiye; +Cc: ceph-devel
On Fri, 24 Mar 2017, Wang, Zhiye wrote:
> Thanks Sage.
>
> Currently, it seems the error code from "FileStore::lfn_open" (propagated through store->read) is collapsed into a boolean variable in "OSDService::_get_map_bl". As a result, the upper layers have no way of knowing why the operation failed.
>
> Maybe we can change the return type of "OSDService::_get_map_bl" to "int", and then print the failure reason in "OSDService::try_get_map".
Yeah, that would be better!
sage
Thread overview: 4 messages
2017-03-23 10:02 Print error into debug log by default Wang, Zhiye
2017-03-23 11:59 ` Sage Weil
2017-03-24 3:01 ` Wang, Zhiye
2017-03-24 13:20 ` Sage Weil